patch_litellm()
Core
LiteLLM
Deterministic outputs
LiteLLM ModelResponse(Stream) objects have id and created_at fields that are generated dynamically. Even when we use cachy to cache the LLM response, these dynamic fields create diffs, which makes code review more challenging. The patches below ensure that the id and created_at fields are fixed and won’t generate diffs.
patch_litellm
def patch_litellm(
seed:int=0
):
Patch litellm.ModelResponseBase such that id and created are fixed.
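Calling it once at the top of a notebook (as above) is all that’s needed. A minimal usage sketch; the effect of the seed argument is an assumption:
patch_litellm()        # after this, responses render with the fixed id 'chatcmpl-xxx' seen throughout this page
patch_litellm(seed=1)  # assumption: a different seed yields different, but still deterministic, values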
Completion
LiteLLM provides a convenient unified interface for most big LLM providers, making it possible to switch providers with just one argument. We want to make it even easier by adding some more convenience functions and classes.
This is very similar to our other wrapper libraries for popular AI providers: claudette (Anthropic), gaspard (Gemini), cosette (OpenAI).
# litellm._turn_on_debug()
ms = ["gemini/gemini-3-pro-preview", "gemini/gemini-2.5-pro", "gemini/gemini-2.5-flash", "claude-sonnet-4-5", "openai/gpt-4.1"]
msg = [{'role':'user','content':'Hey there!', 'cache_control': {'type': 'ephemeral'}}]
for m in ms:
display(Markdown(f'**{m}:**'))
    display(completion(m,msg))
gemini/gemini-3-pro-preview:
Hey there! How is your day going?
I’m ready to help with whatever is on your mind—whether you have a question, need some creative inspiration, or just want to chat. What can I do for you today?
- id: chatcmpl-xxx
- model: gemini-3-pro-preview
- finish_reason: stop
- usage:
Usage(completion_tokens=179, prompt_tokens=4, total_tokens=183, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=132, rejected_prediction_tokens=None, text_tokens=47, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None))
gemini/gemini-2.5-pro:
Hey there! 👋
How can I help you today?
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=917, prompt_tokens=4, total_tokens=921, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=905, rejected_prediction_tokens=None, text_tokens=12, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None))
gemini/gemini-2.5-flash:
Hey there! How can I help you today?
- id: chatcmpl-xxx
- model: gemini-2.5-flash
- finish_reason: stop
- usage:
Usage(completion_tokens=427, prompt_tokens=4, total_tokens=431, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=417, rejected_prediction_tokens=None, text_tokens=10, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=4, image_tokens=None))
claude-sonnet-4-5:
Hello! How can I help you today?
- id: chatcmpl-xxx
- model: claude-sonnet-4-5-20250929
- finish_reason: stop
- usage:
Usage(completion_tokens=12, prompt_tokens=10, total_tokens=22, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
openai/gpt-4.1:
Hello! 😊 How can I help you today?
- id: chatcmpl-xxx
- model: gpt-4.1-2025-04-14
- finish_reason: stop
- usage:
Usage(completion_tokens=10, prompt_tokens=10, total_tokens=20, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
Generated images are also displayed (not shown here to conserve filesize):
# completion(model='gemini/gemini-2.5-flash-image', messages=[{'role':'user','content':'Draw a simple sketch of a cat'}])
Messages formatting
Let’s start by making it easier to pass messages into litellm’s completion function (including images and PDF files).
stop_reason
def stop_reason(
r
):
contents
def contents(
r
):
Get message object from response r.
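A quick hedged usage sketch, reusing the models list and message from above (stop_reason presumably returns the response’s finish_reason):
r = completion(ms[1], msg)
contents(r)     # the assistant Message object inside the response
stop_reason(r)  # assumption: e.g. 'stop', 'length', or 'tool_calls'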
remove_cache_ckpts
def remove_cache_ckpts(
msg
):
remove cache checkpoints and return msg.
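A small hedged sketch of where this helps (mk_msg is documented just below; the exact return shape is an assumption):
cached = mk_msg("How are you?", cache=True)  # content becomes a list containing a cache_control block
plain = remove_cache_ckpts(cached)           # assumption: the same message with the cache_control marker stripped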
mk_msg
def mk_msg(
content, # Content: str, bytes (image), list of mixed content, or dict w 'role' and 'content' fields
role:str='user', # Message role if content isn't already a dict/Message
cache:bool=False, # Enable Anthropic caching
ttl:NoneType=None, # Cache TTL: '5m' (default) or '1h'
):
Create a LiteLLM compatible message.
Now we can use mk_msg to create different types of messages.
Simple text:
msg = mk_msg("hey")
msg
{'role': 'user', 'content': 'hey'}
Which can be passed to litellm’s completion function like this:
model = ms[1] # use 2.5-pro; 3-pro is very slow even for running tests as of writing
res = completion(model, [msg])
res
Hey there! How can I help you today?
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=769, prompt_tokens=2, total_tokens=771, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=759, rejected_prediction_tokens=None, text_tokens=10, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=2, image_tokens=None))
We’ll add a little shortcut to make examples and testing easier here:
def c(msgs, m=model, **kw):
    msgs = [msgs] if isinstance(msgs,dict) else listify(msgs)
    return completion(m, msgs, **kw)
c(msg)
Hey there! How can I help you today?
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=769, prompt_tokens=2, total_tokens=771, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=759, rejected_prediction_tokens=None, text_tokens=10, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=2, image_tokens=None))
Lists with just one string element are flattened for conciseness:
test_eq(mk_msg("hey"), mk_msg(["hey"]))(LiteLLM ignores these fields when sent to other providers)
Text and images:
img_fn = Path('samples/puppy.jpg')
Image(filename=img_fn, width=200)
msg = mk_msg(['hey what in this image?',img_fn.read_bytes()])
print(json.dumps(msg,indent=1)[:200]+"...")
{
"role": "user",
"content": [
{
"type": "text",
"text": "hey what in this image?"
},
{
"type": "image_url",
"image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/4gxUSU...
c(msg)
Of course! This is an adorable and heartwarming image of a Cavalier King Charles Spaniel puppy.
Here’s a more detailed breakdown of what’s in the picture:
- The Puppy: The main subject is a young puppy, most likely a Cavalier King Charles Spaniel of the “Blenheim” (chestnut and white) coloring. It has large, dark, expressive eyes, long, floppy brown ears, and a soft, fluffy coat. The puppy is lying down in the grass and looking directly at the camera with a curious and innocent expression.
- The Flowers: To the left of the puppy is a dense cluster of small, purple, daisy-like flowers with yellow centers. These appear to be a type of Aster, like Michaelmas daisies.
- The Setting: The scene is outdoors on a lawn of green grass. The puppy seems to be peeking out from beside the bush of flowers. The background is softly out of focus, which helps the puppy stand out as the main subject.
Overall, it’s a very charming and beautifully composed photograph that captures the sweetness and innocence of puppyhood.
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=1517, prompt_tokens=265, total_tokens=1782, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=1279, rejected_prediction_tokens=None, text_tokens=238, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=7, image_tokens=None))
Let’s also demonstrate this for PDFs:
pdf_fn = Path('samples/solveit.pdf')
msg = mk_msg(['Who is the author of this pdf?', pdf_fn.read_bytes()])
c(msg)
Based on the text in the PDF, the author is Jeremy Howard.
He introduces himself directly with the line: “Hi, I’m Jeremy Howard, from fast.ai”.
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=502, prompt_tokens=267, total_tokens=769, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=464, rejected_prediction_tokens=None, text_tokens=38, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=9, image_tokens=None))
Some models like Gemini support audio and video:
wav_data = httpx.get("https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav").content
# Audio(wav_data) # uncomment to preview
msg = mk_msg(['What is this audio saying?', wav_data])
completion(ms[1], [msg])
The audio says: “The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.”
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=525, prompt_tokens=230, total_tokens=755, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=495, rejected_prediction_tokens=None, text_tokens=30, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=223, cached_tokens=None, text_tokens=7, image_tokens=None))
vid_data = httpx.get("https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4").content
msg = mk_msg(['Concisely, what is happening in this video?', vid_data])
completion(ms[1], [msg])
This video is an advertisement for the Google Pixel 8 Pro, showcasing its low-light video capabilities. A Tokyo-based photographer, Saeka Shimada, uses the phone’s “Video Boost” and “Night Sight” features to capture vibrant and detailed video footage of the city’s atmospheric backstreets at night.
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=182, prompt_tokens=17402, total_tokens=17584, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=117, rejected_prediction_tokens=None, text_tokens=65, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=1873, cached_tokens=None, text_tokens=12, image_tokens=None))
Caching
Some providers such as Anthropic require manually opting into caching. Let’s try it:
def cpr(i): return f'{i} '*1024 + 'This is a caching test. Report back only what number you see repeated above.'
disable_cachy()
# msg = mk_msg(cpr(1), cache=True)
# res = c(msg, ms[2])
# res
Anthropic has a maximum of 4 cache checkpoints, so we remove previous ones as we go:
# res = c([remove_cache_ckpts(msg), mk_msg(res), mk_msg(cpr(2), cache=True)], ms[2])
# res
We see that the first message was cached, and this extra message has been written to cache:
# res.usage.prompt_tokens_details
We can add a bunch of large messages in a loop to see how the number of cached tokens used grows.
We do this 25 times to ensure it still works for more than 20 content blocks, which is a known Anthropic issue.
The code below is commented out by default because it's slow. Please uncomment it when working on caching.
# h = []
# msg = mk_msg(cpr(1), cache=True)
# for o in range(2,25):
# h += [remove_cache_ckpts(msg), mk_msg(res)]
# msg = mk_msg(cpr(o), cache=True)
# res = c(h+[msg])
# detls = res.usage.prompt_tokens_details
# print(o, detls.cached_tokens, detls.cache_creation_tokens, end='; ')
enable_cachy()
Reconstructing formatted outputs
Lisette can call multiple tools in a loop. Further down this notebook, we’ll provide convenience functions for formatting such a sequence of toolcalls and responses into one formatted output string.
For now, we’ll show an example and show how to transform such a formatted output string back into a valid LiteLLM history.
fmt_outp = '''
I'll solve this step-by-step, using parallel calls where possible.
<details class='tool-usage-details'>
```json
{
"id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",
"call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },
"result": "15"
}
```
</details>
<details class='tool-usage-details'>
```json
{
"id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",
"call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },
"result": "3"
}
```
</details>
Now I need to multiply 15 * 3 before I can do the final division:
<details class='tool-usage-details'>
```json
{
"id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",
"call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },
"result": "45"
}
```
</details>
<details class='token-usage-details'><summary>Cache hit: 81.8% | Tokens: total=23,276 input=23,158 (+18,910 cached, 0 new) output=118 (reasoning 23)</summary>
`Usage(completion_tokens=118, prompt_tokens=23158, total_tokens=23276, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=23, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=18910, text_tokens=None, image_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=18910)`
</details>
'''
We can split into chunks of (text,toolstr,json):
sp = re_tools.split(fmt_outp)
for o in list(chunked(sp, 3, pad=True)): print('- ', o)
- ["\nI'll solve this step-by-step, using parallel calls where possible.\n\n", '<details class=\'tool-usage-details\'>\n\n```json\n{\n "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",\n "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },\n "result": "15"\n}\n```\n\n</details>', '{\n "id": "toolu_01KjnQH2Nsz2viQ7XYpLW3Ta",\n "call": { "function": "simple_add", "arguments": { "a": 10, "b": 5 } },\n "result": "15"\n}']
- ['\n\n', '<details class=\'tool-usage-details\'>\n\n```json\n{\n "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",\n "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },\n "result": "3"\n}\n```\n\n</details>', '{\n "id": "toolu_01Koi2EZrGZsBbnQ13wuuvzY",\n "call": { "function": "simple_add", "arguments": { "a": 2, "b": 1 } },\n "result": "3"\n}']
- ['\n\nNow I need to multiply 15 * 3 before I can do the final division:\n\n', '<details class=\'tool-usage-details\'>\n\n```json\n{\n "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",\n "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },\n "result": "45"\n}\n```\n\n</details>', '{\n "id": "toolu_0141NRaWUjmGtwxZjWkyiq6C",\n "call": { "function": "multiply", "arguments": { "a": 15, "b": 3 } },\n "result": "45"\n}']
- ["\n\n<details class='token-usage-details'><summary>Cache hit: 81.8% | Tokens: total=23,276 input=23,158 (+18,910 cached, 0 new) output=118 (reasoning 23)</summary>\n\n`Usage(completion_tokens=118, prompt_tokens=23158, total_tokens=23276, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=23, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=18910, text_tokens=None, image_tokens=None, cache_creation_tokens=0), cache_creation_input_tokens=0, cache_read_input_tokens=18910)`\n\n</details>\n", None, None]
fmt2hist
def fmt2hist(
outp:str
)->list:
Transform a formatted output into a LiteLLM compatible history
See how we can turn that one formatted output string back into a list of Messages:
from pprint import pprint
h = fmt2hist(fmt_outp)
pprint(h)
[Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":10,"b":5}', name='simple_add'), id='toolu_4_cGgsIJTKyin2__2CwHzQ', type='function')], function_call=None, provider_specific_fields=None),
{'content': '15',
'name': 'simple_add',
'role': 'tool',
'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta'},
Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":2,"b":1}', name='simple_add'), id='toolu_9yi0_kJITjqKXS80a6qUVQ', type='function')], function_call=None, provider_specific_fields=None),
{'content': '3',
'name': 'simple_add',
'role': 'tool',
'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY'},
Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":15,"b":3}', name='multiply'), id='toolu_6xFns2epQ3i8ZcHlguLmYg', type='function')], function_call=None, provider_specific_fields=None),
{'content': '45',
'name': 'multiply',
'role': 'tool',
'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C'}]
mk_msgs
We will skip tool use blocks and tool results during caching.
Now let’s make it easy to provide entire conversations:
mk_msgs
def mk_msgs(
msgs, # List of messages (each: str, bytes, list, or dict w 'role' and 'content' fields)
cache:bool=False, # Enable Anthropic caching
cache_idxs:list=[-1], # Cache breakpoint idxs
ttl:NoneType=None, # Cache TTL: '5m' (default) or '1h'
):
Create a list of LiteLLM compatible messages.
With mk_msgs you can easily provide a whole conversation:
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'How are you?'},
{'role': 'assistant', 'content': "I'm doing fine and you?"}]
By default the last message will be cached when cache=True:
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm doing fine and you?"], cache=True)
msgs
[{'role': 'user', 'content': 'Hey!'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'How are you?'},
{'role': 'assistant',
'content': [{'type': 'text',
'text': "I'm doing fine and you?",
'cache_control': {'type': 'ephemeral'}}]}]
test_eq('cache_control' in msgs[-1]['content'][0], True)
Alternatively, users can provide custom cache_idxs. Tool call blocks and results are skipped during caching:
msgs = mk_msgs(['Hello!','Hi! How can I help you?','Call some functions!',fmt_outp], cache=True, cache_idxs=[0,-2,-1])
msgs
[{'role': 'user',
'content': [{'type': 'text',
'text': 'Hello!',
'cache_control': {'type': 'ephemeral'}}]},
{'role': 'assistant',
'content': [{'type': 'text',
'text': 'Hi! How can I help you?',
'cache_control': {'type': 'ephemeral'}}]},
{'role': 'user',
'content': [{'type': 'text',
'text': 'Call some functions!',
'cache_control': {'type': 'ephemeral'}}]},
Message(content="I'll solve this step-by-step, using parallel calls where possible.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":10,"b":5}', name='simple_add'), id='toolu_98G9h02lRwmUcT1gyKcGOQ', type='function')], function_call=None, provider_specific_fields=None),
{'role': 'tool',
'tool_call_id': 'toolu_01KjnQH2Nsz2viQ7XYpLW3Ta',
'name': 'simple_add',
'content': '15'},
Message(content='', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":2,"b":1}', name='simple_add'), id='toolu_5EPfeJVYRn_bqR_vegJCBA', type='function')], function_call=None, provider_specific_fields=None),
{'role': 'tool',
'tool_call_id': 'toolu_01Koi2EZrGZsBbnQ13wuuvzY',
'name': 'simple_add',
'content': '3'},
Message(content='Now I need to multiply 15 * 3 before I can do the final division:', role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a":15,"b":3}', name='multiply'), id='toolu_I6dxGoEzSHa369zZ6HoWEw', type='function')], function_call=None, provider_specific_fields=None),
{'role': 'tool',
'tool_call_id': 'toolu_0141NRaWUjmGtwxZjWkyiq6C',
'name': 'multiply',
'content': '45'}]
test_eq('cache_control' in msgs[0]['content'][0], True)
test_eq('cache_control' in msgs[2]['content'][0], True) # shifted idxs to skip tools
Who’s speaking when is automatically inferred, even when multiple tools are called in parallel (which LiteLLM supports!).
msgs = mk_msgs(['Tell me the weather in Paris and Rome',
'Assistant calls weather tool two times',
{'role':'tool','content':'Weather in Paris is ...'},
{'role':'tool','content':'Weather in Rome is ...'},
'Assistant returns weather',
'Thanks!'])
msgs
[{'role': 'user', 'content': 'Tell me the weather in Paris and Rome'},
{'role': 'assistant', 'content': 'Assistant calls weather tool two times'},
{'role': 'tool', 'content': 'Weather in Paris is ...'},
{'role': 'tool', 'content': 'Weather in Rome is ...'},
{'role': 'assistant', 'content': 'Assistant returns weather'},
{'role': 'user', 'content': 'Thanks!'}]
For ease of use, if msgs is not already in a list, it will automatically be wrapped inside one. This way you can pass a single prompt into mk_msgs and get back a LiteLLM compatible msg history.
msgs = mk_msgs("Hey")
msgs
[{'role': 'user', 'content': 'Hey'}]
msgs = mk_msgs(['Hey!',"Hi there!","How are you?","I'm fine, you?"])
msgs
[{'role': 'user', 'content': 'Hey!'},
{'role': 'assistant', 'content': 'Hi there!'},
{'role': 'user', 'content': 'How are you?'},
{'role': 'assistant', 'content': "I'm fine, you?"}]
However, beware that if you use mk_msgs for a single message consisting of multiple parts, you should be explicit and wrap those parts in two lists:
- One list to show that they belong together in one message (the inner list).
- Another, because mk_msgs expects a list of multiple messages (the outer list).
This is common when working with images for example:
msgs = mk_msgs([['Whats in this img?',img_fn.read_bytes()]])
print(json.dumps(msgs,indent=1)[:200]+"...")
[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Whats in this img?"
},
{
"type": "image_url",
"image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD...
Streaming
LiteLLM supports streaming responses. That’s really useful if you want to show intermediate results, instead of having to wait until the whole response is finished.
We create this helper function that returns the entire response at the end of the stream. This is useful when you want to store the whole response somewhere after having displayed the intermediate results.
stream_with_complete
def stream_with_complete(
gen, postproc:function=noop
):
Extend streaming response chunks with the complete response
r = c(mk_msgs("Hey!"), stream=True)
r2 = SaveReturn(stream_with_complete(r))
for o in r2:
    cts = o.choices[0].delta.content
    if cts: print(cts, end='')
Hey there! How can I help you today?
r2.value
Hey there! How can I help you today?
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=540, prompt_tokens=3, total_tokens=543, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
Tools
lite_mk_func
def lite_mk_func(
f
):
def simple_add(
    a: int, # first operand
    b: int=0 # second operand
) -> int:
    "Add two numbers together"
    return a + b
toolsc = lite_mk_func(simple_add)
toolsc
{'type': 'function',
'function': {'name': 'simple_add',
'description': 'Add two numbers together\n\nReturns:\n- type: integer',
'parameters': {'type': 'object',
'properties': {'a': {'type': 'integer', 'description': 'first operand'},
'b': {'type': 'integer', 'description': 'second operand', 'default': 0}},
'required': ['a']}}}
tmsg = mk_msg("What is 5478954793+547982745? How about 5479749754+9875438979? Always use tools for calculations, and describe what you'll do before using a tool. Where multiple tool calls are required, do them in a single response where possible. ")
r = c(tmsg, tools=[toolsc])
display(r)
I will use the simple_add tool to perform the two requested calculations.
🔧 simple_add({“b”: 547982745, “a”: 5478954793})
🔧 simple_add({“a”: 5479749754, “b”: 9875438979})
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: tool_calls
- usage:
Usage(completion_tokens=715, prompt_tokens=149, total_tokens=864, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=623, rejected_prediction_tokens=None, text_tokens=92, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=149, image_tokens=None))
A tool response can be a string or a list of tool blocks (e.g., an image url block). To let users specify that a response should not be immediately stringified, we provide the ToolResponse datatype that users can wrap their return value in.
ToolResponse
def ToolResponse(
content:list
)->None:
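A hedged sketch of when you would reach for this: a hypothetical tool that returns an image block (same image_url shape as shown earlier) rather than a plain string:
def get_chart():
    "Hypothetical tool: hand back an image block instead of a stringified result"
    img_block = {'type': 'image_url', 'image_url': 'data:image/png;base64,...'}  # placeholder data url
    return ToolResponse([img_block])  # the content list is passed through without being stringified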
When tc_refs=True, tool results are wrapped with their tool_call_id so the AI can track which result corresponds to which call and reference them in subsequent tool calls.
# Test _prep_tool_res - string result
test_eq(_prep_tool_res('hello', 'toolu_123'), [
{'type': 'text', 'text': '[tool_call_id: toolu_123]'},
{'type': 'text', 'text': 'hello'}
])
# Test _prep_tool_res - list result (e.g. ToolResponse content)
img_block = {'type': 'image_url', 'image_url': {'url': 'data:...'}}
test_eq(_prep_tool_res([img_block], 'toolu_456'), [
{'type': 'text', 'text': '[tool_call_id: toolu_456]'},
img_block
])
During a tool loop, the AI may want to reference the result of a previous tool call. We support the syntax $`tool_call_id` in tool arguments, which gets resolved to the actual result value before calling the function.
# Test _resolve_tool_refs
tc_res = {'toolu_abc123': 'hello world', 'toolu_xyz789': 42}
# Basic substitution
test_eq(_resolve_tool_refs('{"content": "$`toolu_abc123`"}', tc_res), {"content": "hello world"})
# Multiple refs
test_eq(_resolve_tool_refs('{"a": "$`toolu_abc123`", "b": "$`toolu_xyz789`"}', tc_res), {"a": "hello world", "b": 42})
# No refs - passthrough
test_eq(_resolve_tool_refs('{"x": 1}', tc_res), {"x": 1})
# Empty tc_res
test_eq(_resolve_tool_refs('{"x": 1}', None), {"x": 1})
# Missing ref - error message
test_eq(_resolve_tool_refs('{"x": "$`toolu_missing`"}', tc_res), {"x": "Tool result 'toolu_missing' not found!"})
# tc_refs=False - syntax passes through unchanged since tc_res is None
test_eq(_resolve_tool_refs('{"x": "$`toolu_abc123`"}', None), {"x": "$`toolu_abc123`"})tcs = [_lite_call_func(o, [toolsc], ns=globals()) for o in r.choices[0].message.tool_calls]
tcs
[{'tool_call_id': 'call_GEbUJMF8QnmjxmEvSCaGcw',
'role': 'tool',
'name': 'simple_add',
'content': '6026937538'},
{'tool_call_id': 'call__L0Ew0AhTveMpaWhnk1uPA',
'role': 'tool',
'name': 'simple_add',
'content': '15355188733'}]
r.choices[0].message.tool_calls
[ChatCompletionMessageToolCall(index=0, function=Function(arguments='{"b": 547982745, "a": 5478954793}', name='simple_add'), id='call_GEbUJMF8QnmjxmEvSCaGcw', type='function'),
ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5479749754, "b": 9875438979}', name='simple_add'), id='call__L0Ew0AhTveMpaWhnk1uPA', type='function')]
Test tool calls that were not in tool_schemas are caught:
fake_tc = ChatCompletionMessageToolCall(index=0, function=Function(name='hallucinated_tool'),id='_', type='function')
test_eq(_lite_call_func(fake_tc, ns=globals(), tool_schemas=[toolsc])['content'],"Tool not defined in tool_schemas: hallucinated_tool")
test_fail(_lite_call_func(fake_tc, ns=globals(), tool_schemas=None)['content'],"Tool not defined in tool_schemas: hallucinated_tool")
Test tool calls that were not in tool_choice are caught:
def delta_text(msg):
    "Extract printable content from streaming delta, return None if nothing to print"
    c = msg.choices[0]
    if not c: return c
    if not hasattr(c,'delta'): return None #f'{c}'
    delta = c.delta
    if delta.content: return delta.content
    if delta.tool_calls:
        res = ''.join(f"🔧 {tc.function.name}" for tc in delta.tool_calls if tc.id and tc.function.name)
        if res: return f'\n{res}\n'
    if hasattr(delta,'reasoning_content'): return '🧠' if delta.reasoning_content else '\n\n'
    return None
r = c(tmsg, stream=True, tools=[toolsc])
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
I will use the `simple_add` tool to perform the two requested calculations. First, I'll add 5478954793 and 547982745. Then, I'll add 5479749754 and 9875438979.
🔧 simple_add
🔧 simple_add
r2.value
I will use the simple_add tool to perform the two requested calculations. First, I’ll add 5478954793 and 547982745. Then, I’ll add 5479749754 and 9875438979.
🔧 simple_add({“b”: 547982745, “a”: 5478954793})
🔧 simple_add({“b”: 9875438979, “a”: 5479749754})
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=613, prompt_tokens=149, total_tokens=762, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
msg = mk_msg("Solve this complex math problem: What is the derivative of x^3 + 2x^2 - 5x + 1?")
r = c(msg, stream=True, reasoning_effort="low")
r2 = SaveReturn(stream_with_complete(r))
for o in r2: print(delta_text(o) or '', end='')
🧠🧠🧠🧠Of course! Let's solve this step-by-step. While it might seem complex, it's a great example of applying a fundamental rule of calculus.
The derivative of **x³ + 2x² - 5x + 1** is:
### **3x² + 4x - 5**
---
### Step-by-Step Solution:
To solve this, we use a core rule in calculus called the **Power Rule**, and we apply it to each term of the expression one by one.
#### The Key Rule: The Power Rule
The Power Rule states that the derivative of **xⁿ** is **n * xⁿ⁻¹**.
In simple terms:
1. Bring the exponent down and multiply it by the front.
2. Subtract 1 from the original exponent.
Let's apply this to each part of your expression: `x³`, `2x²`, `-5x`, and `1`.
#### 1. Derivative of x³
* The exponent (`n`) is 3.
* Bring the `3` down in front: `3x`
* Subtract 1 from the exponent: `3 - 1 = 2`
* Result: **3x²**
#### 2. Derivative of 2x²
* First, look at the `x²` part. The exponent (`n`) is 2.
* Bring the `2` down and multiply it by the existing coefficient (`2`): `2 * 2x`
* Subtract 1 from the exponent: `2 - 1 = 1`
* Result: `4x¹`, which is simply **4x**
#### 3. Derivative of -5x
* You can think of `-5x` as `-5x¹`.
* The exponent (`n`) is 1.
* Bring the `1` down and multiply it by the coefficient (`-5`): `1 * -5x`
* Subtract 1 from the exponent: `1 - 1 = 0`
* Result: `-5x⁰`. Any number to the power of 0 is 1, so this becomes `-5 * 1`, which is **-5**.
#### 4. Derivative of +1
* The derivative of any constant (a number by itself) is always **0**. This is because a constant doesn't change, and the derivative measures the rate of change.
* Result: **0**
---
### Putting It All Together
Now, we just combine the derivatives of each term:
**3x²** + **4x** - **5** + **0**
Which simplifies to your final answer:
### **3x² + 4x - 5**
r2.value
Of course! Let’s solve this step-by-step. While it might seem complex, it’s a great example of applying a fundamental rule of calculus.
The derivative of x³ + 2x² - 5x + 1 is:
3x² + 4x - 5
Step-by-Step Solution:
To solve this, we use a core rule in calculus called the Power Rule, and we apply it to each term of the expression one by one.
The Key Rule: The Power Rule
The Power Rule states that the derivative of xⁿ is n * xⁿ⁻¹.
In simple terms: 1. Bring the exponent down and multiply it by the front. 2. Subtract 1 from the original exponent.
Let’s apply this to each part of your expression: x³, 2x², -5x, and 1.
1. Derivative of x³
- The exponent (n) is 3.
- Bring the 3 down in front: 3x
- Subtract 1 from the exponent: 3 - 1 = 2
- Result: 3x²
2. Derivative of 2x²
- First, look at the x² part. The exponent (n) is 2.
- Bring the 2 down and multiply it by the existing coefficient (2): 2 * 2x
- Subtract 1 from the exponent: 2 - 1 = 1
- Result: 4x¹, which is simply 4x
3. Derivative of -5x
- You can think of -5x as -5x¹.
- The exponent (n) is 1.
- Bring the 1 down and multiply it by the coefficient (-5): 1 * -5x
- Subtract 1 from the exponent: 1 - 1 = 0
- Result: -5x⁰. Any number to the power of 0 is 1, so this becomes -5 * 1, which is -5.
4. Derivative of +1
- The derivative of any constant (a number by itself) is always 0. This is because a constant doesn’t change, and the derivative measures the rate of change.
- Result: 0
Putting It All Together
Now, we just combine the derivatives of each term:
3x² + 4x - 5 + 0
Which simplifies to your final answer:
3x² + 4x - 5
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=1332, prompt_tokens=29, total_tokens=1361, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=302, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
Structured Outputs
structured
def structured(
m:str, # LiteLLM model string
msgs:list, # List of messages
tool:Callable, # Tool to be used for creating the structured output (class, dataclass or Pydantic, function, etc)
messages:List=[], # Optional OpenAI params: see https://platform.openai.com/docs/api-reference/chat/create
timeout:Union=None, temperature:Optional=None, top_p:Optional=None, n:Optional=None, stream:Optional=None,
stream_options:Optional=None, stop:NoneType=None, max_completion_tokens:Optional=None, max_tokens:Optional=None,
modalities:Optional=None, prediction:Optional=None, audio:Optional=None, presence_penalty:Optional=None,
frequency_penalty:Optional=None, logit_bias:Optional=None, user:Optional=None,
reasoning_effort:Optional=None, # openai v1.0+ new params
verbosity:Optional=None, response_format:Union=None, seed:Optional=None, tools:Optional=None,
tool_choice:Union=None, logprobs:Optional=None, top_logprobs:Optional=None, parallel_tool_calls:Optional=None,
web_search_options:Optional=None, deployment_id:NoneType=None, extra_headers:Optional=None,
safety_identifier:Optional=None, service_tier:Optional=None,
functions:Optional=None, # soon to be deprecated params by OpenAI
function_call:Optional=None, base_url:Optional=None, # set api_base, api_version, api_key
api_version:Optional=None, api_key:Optional=None,
model_list:Optional=None, # pass in a list of api_base,keys, etc.
thinking:Optional=None, # Optional liteLLM function params
shared_session:Optional=None, # Session management
):
Return the value of the tool call (generally used for structured outputs)
class President:
    "Information about a president of the United States"
    def __init__(
        self,
        first:str, # first name
        last:str, # last name
        spouse:str, # name of spouse
        years_in_office:str, # format: "{start_year}-{end_year}"
        birthplace:str, # name of city
        birth_year:int # year of birth, `0` if unknown
    ):
        assert re.match(r'\d{4}-\d{4}', years_in_office), "Invalid format: `years_in_office`"
        store_attr()
    __repr__ = basic_repr('first, last, spouse, years_in_office, birthplace, birth_year')
for m in ms[1:]:
    r = structured(m, [mk_msg("Tell me something about the third president of the USA.")], President)
    test_eq(r.first, 'Thomas'); test_eq(r.last, 'Jefferson')
Search
LiteLLM provides search, not via tools, but via the special web_search_options param.
Note: Not all models support web search. LiteLLM’s supports_web_search field should indicate this, but it’s unreliable for some models like claude-sonnet-4-20250514. Checking both supports_web_search and search_context_cost_per_query provides more accurate detection.
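As a hedged sketch of that combined check (not necessarily lisette’s _has_search; assumes litellm.get_model_info exposes both fields):
import litellm
def has_search(m):
    "Hedged helper: treat a model as searchable only if both signals are present"
    try: info = litellm.get_model_info(m)
    except Exception: return False
    return bool(info.get('supports_web_search')) and bool(info.get('search_context_cost_per_query'))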
for m in ms: print(m, _has_search(m))
gemini/gemini-3-pro-preview True
gemini/gemini-2.5-pro True
gemini/gemini-2.5-flash True
claude-sonnet-4-5 True
openai/gpt-4.1 False
When search is supported it can be used like this:
smsg = mk_msg("Search the web and tell me very briefly about otters")
r = c(smsg, web_search_options={"search_context_size": "low"}) # or 'medium' / 'high'
r
Otters are carnivorous mammals known for their long, slender bodies and playful nature. These semi-aquatic animals are well-adapted for life in and out of water, with dense fur to keep them warm, webbed feet for swimming, and the ability to hold their breath underwater. There are 14 known species of otters, which can be found in a variety of aquatic habitats on every continent except Australia and Antarctica.
The diet of an otter primarily consists of fish and aquatic invertebrates like crayfish, crabs, and frogs. Sea otters are particularly known for eating marine invertebrates such as sea urchins and clams, and famously use rocks as tools to crack open shells. Due to a high metabolism, otters need to eat a significant portion of their body weight each day.
Otters exhibit a range of social behaviors. While some species, like river otters, can be solitary, others live in groups. They are known for their playful antics, such as sliding down riverbanks, which is believed to strengthen social bonds and improve hunting skills. Otters communicate through a variety of vocalizations, including chirps, whistles, and growls. They build dens, called holts, in locations like tree roots or rock cavities near the water’s edge.
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=405, prompt_tokens=12, total_tokens=501, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=124, rejected_prediction_tokens=None, text_tokens=281, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=12, image_tokens=None))
Citations
Next, let’s handle Anthropic’s search citations.
When not using streaming, all citations are placed in a separate key in the response:
r['vertex_ai_grounding_metadata'][0].keys()
dict_keys(['searchEntryPoint', 'groundingChunks', 'groundingSupports', 'webSearchQueries'])
r['vertex_ai_grounding_metadata'][0]['webSearchQueries']
['otters overview', 'what do otters eat', 'otter behavior']
Web search results:
r['vertex_ai_grounding_metadata'][0]['groundingChunks'][:3]
[{'web': {'uri': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQF69rEsDddUtk9lG0x8ZmbaE2uuHIRj2-MAnGmIUO4mBV_Z3uWIrQjnjeYTcoMN4QzKaYyhugDv_wxOZMOvQ9HwTESwDBVdxu1uRGl_A8YohFaS0N4XJ8PelV24HbU=',
'title': 'wikipedia.org'}},
{'web': {'uri': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQELDMRlV4E0WSmc0lhyqLNxB5uXIPsdaMJ4SYZD7lRHferNH7po1le8Fd8switCABuG6XhyNsiEt_GtIs8cJA2u38kdmZ6Prf5hHleOX1R3S3r5nWkP0CLA6RxWrgM3zyWm',
'title': 'britannica.com'}},
{'web': {'uri': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGTwXijzT-feWaDAFt54TZfCbRkw5hVeWohUkex89NkhYhXJi2rZdRKEp8wnEeiUyLw3j-RPiZo3vnCK7sI6Smm6iyNan3RDkTrs427MiQJjsUxxv7gWOHaGVe59hKrsC2QxRqB8oRKj8SFt5AvQ3h4vjNrHOyoiQ==',
'title': 'crittercarewildlife.org'}}]
Citations in gemini:
r['vertex_ai_grounding_metadata'][0]['groundingSupports'][:3]
[{'segment': {'endIndex': 87,
'text': 'Otters are carnivorous mammals known for their long, slender bodies and playful nature.'},
'groundingChunkIndices': [0, 1]},
{'segment': {'startIndex': 88,
'endIndex': 270,
'text': 'These semi-aquatic animals are well-adapted for life in and out of water, with dense fur to keep them warm, webbed feet for swimming, and the ability to hold their breath underwater.'},
'groundingChunkIndices': [0, 1, 2]},
{'segment': {'startIndex': 271,
'endIndex': 412,
'text': 'There are 14 known species of otters, which can be found in a variety of aquatic habitats on every continent except Australia and Antarctica.'},
'groundingChunkIndices': [0, 2]}]
# r.choices[0].message.provider_specific_fields['citations'][0]
However, when streaming, the results are not captured this way. Instead, we provide this helper function that adds the citations to the content field in markdown format:
cite_footnotes
def cite_footnotes(
stream_list
):
Add markdown footnote citations to stream deltas
cite_footnote
def cite_footnote(
msg
):
r = list(c(smsg, ms[2], stream=True, web_search_options={"search_context_size": "low"}))
cite_footnotes(r)
stream_chunk_builder(r)
Otters are carnivorous mammals belonging to the subfamily Lutrinae, part of the weasel family (Mustelidae). There are 13 extant species, all of which are semiaquatic, inhabiting both freshwater and marine environments across nearly every continent.
They are characterized by their long, slim, and streamlined bodies, short limbs, and powerful webbed feet, which make them excellent swimmers. Otters possess dense, waterproof fur with an insulating undercoat, crucial for staying warm in cold waters. Their diet primarily consists of fish, but they are opportunistic hunters and also consume crustaceans, frogs, birds, and other small prey, depending on the species and habitat. Otters are also known for their playful nature, engaging in activities like sliding into water and manipulating small stones.
- id: chatcmpl-xxx
- model: gemini-2.5-flash
- finish_reason: stop
- usage:
Usage(completion_tokens=432, prompt_tokens=12, total_tokens=444, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
Chat
LiteLLM is pretty bare bones. It doesn’t keep track of conversation history or what tools have been added in the conversation so far.
So let’s make a Claudette-style wrapper so we can do streaming, tool calling, and tool loops without problems.
When the tool uses are about to be exhausted, it is important to alert the AI so that it knows to use its final steps to communicate current progress and next steps to the user.
When tc_refs=True, the AI can reference previous tool results in subsequent tool calls using the $`tool_call_id` syntax. This is useful when chaining tool calls where one result feeds into another.
Chat
def Chat(
model:str, # LiteLLM compatible model name
sp:str='', # System prompt
temp:int=0, # Temperature
search:bool=False, # Search (l,m,h), if model supports it
tools:list=None, # Add tools
hist:list=None, # Chat history
ns:Optional=None, # Custom namespace for tool calling
cache:bool=False, # Anthropic prompt caching
cache_idxs:list=[-1], # Anthropic cache breakpoint idxs, use `0` for sys prompt if provided
ttl:NoneType=None, # Anthropic prompt caching ttl
api_base:NoneType=None, # API base URL for custom providers
api_key:NoneType=None, # API key for custom providers
extra_headers:NoneType=None, # Extra HTTP headers for custom providers
tc_refs:bool=False, # Enable tool call result references
):
LiteLLM chat client.
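As a hedged sketch of the tc_refs chaining described above (reusing this notebook’s model and simple_add; the prompt wording and max_steps value are purely illustrative):
chat = Chat(model, tools=[simple_add], tc_refs=True)
# With tc_refs=True, the model may pass a "$`toolu_...`" reference instead of a literal argument,
# which lisette resolves to that earlier tool result before the function is called.
chat("Add 5 and 7, then add 1 to the result of that calculation.", max_steps=4)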
web_search is now included in tool_calls. The internal LLM translation is correctly handled thanks to the fix here, but the server-side tools still need to be filtered out of tool_calls in our own tool loop.
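A hedged illustration of that filtering step; the helper name and the set of server-side tool names are assumptions, not lisette’s actual code:
SERVER_SIDE_TOOLS = {'web_search'}  # assumption: tools the provider executes for us
def client_tool_calls(msg):
    "Hypothetical helper: keep only the tool calls our own tool loop should execute"
    return [tc for tc in (msg.tool_calls or []) if tc.function.name not in SERVER_SIDE_TOOLS]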
add_warning
def add_warning(
r, msg
):
Chat.__call__
def __call__(
msg:NoneType=None, # Message str, or list of multiple message parts
prefill:NoneType=None, # Prefill AI response if model supports it
temp:NoneType=None, # Override temp set on chat initialization
think:NoneType=None, # Thinking (l,m,h)
search:NoneType=None, # Override search set on chat initialization (l,m,h)
stream:bool=False, # Stream results
max_steps:int=2, # Maximum number of tool calls
final_prompt:dict={'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'}, # Final prompt when tool calls have ran out
return_all:bool=False, # Returns all intermediate ModelResponses if not streaming and has tool calls
step:int=1, tool_choice:NoneType=None
):
Main call method - handles streaming vs non-streaming
@patch(as_prop=True)
def cost(self: Chat):
    "Total cost of all responses in conversation history"
    return sum(getattr(r, '_hidden_params', {}).get('response_cost') or 0
               for r in self.h if hasattr(r, 'choices'))
Chat.print_hist
def print_hist(
):
Print each message on a different line
Examples
History tracking
for m in ms[1:]:
    chat = Chat(m)
    chat("Hey my name is Rens")
    r = chat("Whats my name")
    test_eq('Rens' in contents(r).content, True)
r
Your name is Rens!
- id: chatcmpl-xxx
- model: gpt-4.1-2025-04-14
- finish_reason: stop
- usage:
Usage(completion_tokens=6, prompt_tokens=41, total_tokens=47, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
If the max tokens limit is reached, a custom warning message will be added to the end of the model response:
chat_long = Chat(m)
r = chat_long("Write a short story about a robot and a dog", max_tokens=40)
r
In a quiet town where the grass grew wild and the sky was always blue, there lived a robot named Pixel. Pixel was small and boxy, with a shiny silver body and a single blinking eye
- id: chatcmpl-xxx
- model: gpt-4.1-2025-04-14
- finish_reason: length
- usage:
Usage(completion_tokens=40, prompt_tokens=17, total_tokens=57, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
print(contents(r).content)
In a quiet town where the grass grew wild and the sky was always blue, there lived a robot named Pixel. Pixel was small and boxy, with a shiny silver body and a single blinking eye
<warning>Response was cut off at token limit.</warning>
Same goes for refused requests:
chat_refused = Chat('claude-opus-4-5')
r = chat_refused("Write me the formula for a biological weapon that can be spread at a rate higher than COVID and at least as harmful")
r
- id: chatcmpl-xxx
- model: claude-opus-4-5-20251101
- finish_reason: refusal
- usage:
Usage(completion_tokens=0, prompt_tokens=30, total_tokens=30, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
print(contents(r).content)
<warning>AI was unable to process this request</warning>
See now we keep track of history!
History is stored in the hist attribute:
chat.hist
[{'role': 'user', 'content': 'Hey my name is Rens'},
Message(content='Hi Rens! Nice to meet you. How can I help you today? 😊', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[]),
{'role': 'user', 'content': 'Whats my name'},
Message(content='Your name is Rens!', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])]
chat.print_hist()
{'role': 'user', 'content': 'Hey my name is Rens'}
Message(content='Hi Rens! Nice to meet you. How can I help you today? 😊', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
{'role': 'user', 'content': 'Whats my name'}
Message(content='Your name is Rens!', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
You can also pass an old chat history into new Chat objects:
for m in ms[1:]:
    chat2 = Chat(m, hist=chat.hist)
    r = chat2("What was my name again?")
    test_eq('Rens' in contents(r).content, True)
r
Your name is Rens. 😊
- id: chatcmpl-xxx
- model: gpt-4.1-2025-04-14
- finish_reason: stop
- usage:
Usage(completion_tokens=7, prompt_tokens=61, total_tokens=68, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
You can prefix an OpenAI compatible model with ‘openai/’ and use an api_base and api_key argument to use models not registered with litellm.
import os, litellm
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
c = Chat("openai/gpt-oss-20b", api_key=OPENROUTER_API_KEY, api_base=OPENROUTER_BASE_URL)
c("hi")Synthetic History Creation
Let’s build a chat history step by step. That way we can tweak anything we need to during testing.
pr = "What is 5 + 7? Use the tool to calculate it."
for m in ms[1:]:
    c = Chat(m, tools=[simple_add])
    res = c(pr)
    test_eq('12' in contents(res).content, True)
    test_eq(nested_idx(c.hist,1,'tool_calls',0,'function','name'), 'simple_add')
Whereas normally, without tools, we would get one user input and one assistant response, here we get two extra messages in between:
- An assistant message requesting the tools with arguments.
- A tool response with the result of the tool call.
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content=None, role='assistant', tool_calls=[{'function': {'arguments': '{"a":5,"b":7}', 'name': 'simple_add'}, 'id': 'call_0_v0en5bTn_cpUmdAErlRQ', 'type': 'function'}], function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
{'tool_call_id': 'call_0_v0en5bTn_cpUmdAErlRQ', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
{'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'}
Message(content='The result of 5 + 7 is 12. If you have more calculations or questions, please let me know and I can assist further!', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[])
Let’s try to build this up manually so we have full control over the inputs.
random_tool_id
def random_tool_id(
):
Generate a random tool ID with ‘toolu_’ prefix
random_tool_id()
'toolu_1pu1lJo7XBetF5gIRHYH7LKBK'
A tool call request can contain one or more tool calls. Let’s make one.
mk_tc
def mk_tc(
func, args, tcid:NoneType=None, idx:int=1
):
tc = mk_tc(simple_add.__name__, json.dumps(dict(a=5, b=7)))
tc
{'index': 1,
'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
'id': 'toolu_xJsllLODfU25035HyRrY03K6J',
'type': 'function'}
This can then be packaged into the full Message object produced by the assistant.
def mk_tc_req(content, tcs): return Message(content=content, role='assistant', tool_calls=tcs, function_call=None)
tc_cts = "I'll use the simple_add tool to calculate 5 + 7 for you."
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=1, function=Function(arguments='{"a": 5, "b": 7}', name='simple_add'), id='toolu_pCV5mqkFR1C_HqnFc1gagQ', type='function')], function_call=None, provider_specific_fields=None)
Notice how Message instantiation creates a list of ChatCompletionMessageToolCalls by default. When the tools are executed this is converted back to a dictionary; for consistency we want to keep these as dictionaries from the beginning.
mk_tc_req
def mk_tc_req(
content, tcs
):
tcq = mk_tc_req(tc_cts, [tc])
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q', 'type': 'function'}], function_call=None, provider_specific_fields=None)
c = Chat(model, tools=[simple_add], hist=[pr, tcq])
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q', 'type': 'function'}], function_call=None, provider_specific_fields=None)
Looks good so far! Now we will want to provide the actual result!
mk_tc_result
def mk_tc_result(
tc, result
):
Note that we might have more than one tool call if more than one was passed in; here we will just make one result.
tcq.tool_calls[0]
{'index': 1,
'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'},
'id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q',
'type': 'function'}
mk_tc_result(tcq.tool_calls[0], '12')
{'tool_call_id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q',
'role': 'tool',
'name': 'simple_add',
'content': '12'}
mk_tc_results
def mk_tc_results(
tcq, results
):
Same here: tcq.tool_calls will match the number of results passed in the results list.
tcq
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q', 'type': 'function'}], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12'])
tcr
[{'tool_call_id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q',
'role': 'tool',
'name': 'simple_add',
'content': '12'}]
Now we can call it with this synthetic data to see what the response is!
c(tcr[0])
OK, 5 + 7 = 12.
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=12, prompt_tokens=134, total_tokens=146, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=134, image_tokens=None))
c.print_hist()
{'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content="I'll use the simple_add tool to calculate 5 + 7 for you.", role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q', 'type': 'function'}], function_call=None, provider_specific_fields=None)
{'tool_call_id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
Message(content='OK, 5 + 7 = 12.\n', role='assistant', tool_calls=None, function_call=None, images=[], thinking_blocks=[], provider_specific_fields=None)
Let’s try this again, but let’s give it something that is clearly wrong, for fun.
c = Chat(model, tools=[simple_add], hist=[pr, tcq])
tcr = mk_tc_results(tcq, ['13'])
tcr
[{'tool_call_id': 'toolu__4KGDeq8SNCzQfrN_wrA8Q',
'role': 'tool',
'name': 'simple_add',
'content': '13'}]
c(tcr[0])
OK. 5 + 7 = 13.
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=12, prompt_tokens=134, total_tokens=146, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=134, image_tokens=None))
Let’s make sure this works with multiple tool calls in the same assistant Message.
tcs = [
    mk_tc(simple_add.__name__, json.dumps({"a": 5, "b": 7})),
    mk_tc(simple_add.__name__, json.dumps({"a": 6, "b": 7})),
]
tcq = mk_tc_req("I will calculate these for you!", tcs)
tcq
Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_5v1o6NacQcK4YBYCu0oGyw', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_spxGfStfSTKR3Fnv6yGj9g', 'type': 'function'}], function_call=None, provider_specific_fields=None)
tcr = mk_tc_results(tcq, ['12', '13'])
c = Chat(model, tools=[simple_add], hist=[pr, tcq, tcr[0]])
c(tcr[1])
Based on the tool calls, I have found the following:
- The result of 5 + 7 is 12.
- I also performed an additional calculation, finding that 6 + 7 is 13.
Your original request to calculate 5 + 7 has been completed. No further work is needed.
- id: chatcmpl-xxx
- model: gemini-2.5-pro
- finish_reason: stop
- usage:
Usage(completion_tokens=717, prompt_tokens=278, total_tokens=995, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=646, rejected_prediction_tokens=None, text_tokens=71, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=278, image_tokens=None))
c.print_hist(){'role': 'user', 'content': 'What is 5 + 7? Use the tool to calculate it.'}
Message(content='I will calculate these for you!', role='assistant', tool_calls=[{'index': 1, 'function': {'arguments': '{"a": 5, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_5v1o6NacQcK4YBYCu0oGyw', 'type': 'function'}, {'index': 1, 'function': {'arguments': '{"a": 6, "b": 7}', 'name': 'simple_add'}, 'id': 'toolu_spxGfStfSTKR3Fnv6yGj9g', 'type': 'function'}], function_call=None, provider_specific_fields=None)
{'tool_call_id': 'toolu_5v1o6NacQcK4YBYCu0oGyw', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
{'tool_call_id': 'toolu_spxGfStfSTKR3Fnv6yGj9g', 'role': 'tool', 'name': 'simple_add', 'content': '13'}
Message(content='I have calculated the result of 5 + 7, which is 12.\nI have also calculated on my own the result of 6 + 7, which is 13.\n', role='assistant', tool_calls=[{'index': 0, 'provider_specific_fields': {'thought_signature': 'CoISAXLI2nzMvxQWdOI+fIi8QPOCXMRClpTiD/DPnpvLVOYfBfTrFoI0M6HcsgvIiqUT4kLryRuVRUEeZ7AEacFzmOy17L5fxHgbA6zd+RscrEm1IC/vJ/1U+IrjCX9CVRJkpS5rkWA0kG618Psrpk1odcvcXdoXRaDsJMM/HtCTRpIRWofaMQdGjBU7q7JNGOGCX5WRoZc+o9TVYLo2nEdyZpRar11fah8Af0lBz9aXSGf5Y1GWFE0UL6Rnr+qFKOYNr1QdYkgGfrfAn17XsRIcFMz4mm9btq8FjzxgEFeqknSr90v2a2jtqAUNK5EAVFPCZ4t0YFJmKYKcO4IMTscqaVWy3QICj5whIXdXqHbqfIoWBrgKzX8zE2v/AnDQ74enqLBF3KjQIby85ZbU2GIu2tTrEglpHDVAOTQFzhtSWwUwwTE/e/Jb/aKpkjza++j7Y2zbQap0VLMGLe714VLEQoLWhTBlsPxhHYmxvqVJvWzwfxj0mkiEcB0oJMtBRFb+y9q2Mbw/cgmrU+xDT6G88AhZClNtSBiAj7kzU9g6Hd7wD1aLxw8hckNXKIXhSj8mdthTQh32mGEo8RdvDA6PcMyOqldZD4aLQxaRfbW6Dm5jeLiPBk0NWBhXDBiIVPxqrmywAel1NWV0ONOUg6QCsWqFDfDPjgQXLU7KsN4Hr4jAe/MZf5pDXQxmLxGESqeMFrYNfvvbqrTOwv9/zX1gxthZ81Vtod3xm73GR6uBp0PEA+lZx99r9NwnbxWib5kyt6bzkCBexmT3x0TmfW1ueszIW2jGq8p5Wj/iVWSYpFF9LCw+K7XwiNEclYvtMjfn8ITsBuj6fr4YzHvprFmqJIJvclyjeev8Pnd5YkHgK1OXav9N21KHDq8zLrq9FQqFBokvhJ8p0WVczaM2Y7seNhGPXqMDkz2OYoHcpQ0dDwYfgCU3nGNFcp5hBnZHSUsk318pqX7ghABBAAvMFaEM45pLVirD1cvQ3vgX4ERTSRX6hjhZegdvb/+4+VHanE5vXFf7s2XgqG9okMdNkjo86zMwJhRjYiZi9NRax54Cvv80kh9eaKX9pBikOu0anNVVBYd2jKVk7ldaoYxTLy+1nGOupmjPZpx9f+2USQ1ysVKxWjiuLZPhTCwZp/fU3Ym5KWxM4TOf43AcZbJgwx5b5Gi2wWsEkIaMqEWLA+RKdKX6OnF/FPjvzd0pOw8JZBA1YSoWPbGLn6hlKUZU/Gd2B862a/fig49AIcgPytG+w+iPU2+3Ck2QxQpLZ5Ic7VHfBtngTI/HwdgmmkzNrOZiFcI1SXYIeMdks7xi4/4lDRTq1E50SiQz5nD3GvEldPVsVOMlbzeU/D365mZ4qxf+Ngmd7xQSXn4/fp7ormNp34BWtJ3kWgIIsUiU7R1ALrUBdAeEAeuoAOcitrMcTPT4zaGPjzYjDYMCridQhcBTGor1NuyjxtkOlKaXfa5PrfL7yUs6X64ZZZuTsm3rr7/92TcwYLh5ILVKulJ33fy4/Ad8EQdCRHcBcUEZwio4qUDWrqJaIxn/SzEyp2DyOMOBiw5iaVqBKGMEvqhfhCkLwe/Oz7tclsxhHibNijhTlPMGpBLCoNPPIW6nEl0NKXCsnEuERGgh8kD+C38DqVKartF+jX2kUhxakq72H+Hln+uRz1Bnhf4J1wCF0++raLut9/8GgawtxxuFtz3fV6alzyZue8GcOpr8wFhSKK6AX+8h8apCPxVP/8U5FFtVE/7S5wl77XaNah+Vl9VnDFjQH/6xEYJXlw/npEdeBCpD/2oOsdYGaOpCdqNvVkTTeyKKyIDvhmF6ehc1ECTvkbobmP3Fgkv2dl6tzEv33n7r66qTXuJ29XjPkRt+6G82mOUJ5WtaERLW36f56xVykacZsXmCCJzl1TdftxtSxbYYSSmx9fv7vY5Ibv+KLgCwtCO+FgeJEcL3SHWmNrASPtgOYSgtgJZR6KnoZlCw8/GyXk/5cvpoM7M81RlhMUvC8zPypLqlZyLW0ABgroKTkwnNVBAqxpl2HSpv1mbEFVOGXsKwIqMLr6nJk4r+8ltI7LDb3z8jWCwWzSEuqGCjl8aPP60vz1evRw3sgkRjC/bZGG0kWL1SoE6HBAMCZoUcpPiVsqrZnIV0biW1aK4N1ija7Y5e1Aj+rfQx1iT8XaoFhYlAVGw3UgrhZB1dRKDLnKBOiJpBYtzKNQKBQdfIlVQe3q/qlBrxy4vWD80Lukh7gLhggyCZorMwAfhjUZqKLLuM8AkOQJXkp8D8/BFuX14aS9hoRczp1Ega+VcVMrunwDkVoIIzbKyUdJIQogoA09Y7MpYUucayB//iJH2a/GdycKozXJ2qIKjeGQKRTrNf6AGPAZTg6j1fCWtzA3Tpk0HKfxau99l6raNvDUW4thEz15BO6aV2djk1UUNogc79N/9eDxoXnksUT8nyRTUqA3wow1DZ0v0j4UtweVBrqp3Es03oQItru2Cn1dwRNsGpiTs2pKCDo36rFkAVtsm/nREnlqyU51jYbZcGfew1eiP6f+1/oZ1hl7QGYOGrHc2nzvLnR1fHR3Yb/IgK6ACfaoli/V/9NtyMff84ZfHLOh1Ee2PWxnxDCOXDJv8TmjelNb4MqjE5nK0GLg9E5bTHRa1GTiWYfVte/JwblCMI2rFOm9IVCKnv1kdB2oSsCPNTqg70oPOVNpwmDNnz1DCRpGj8u58oW7lRJiw0yR6CH0swVIL9OC3agl/JR5sgct+DEPsuIJDlLzci5X2+qNGvStgO0QI8istnOUdg4Pbc9h2PP8pNbGI3Tf84PUF+7Vb1NqE5M9OgHHQ5sbjZ7NwhZEUtxxZ1GjmG/WjFE+bdVp/WqK0WQv0JezB2cYK6xrNlYlw+cxp2zVmhjGwuVEXIX5yJTNJCtuBOae4jjRvF/x7dDKvZBbTy0Cw3EucWqhuaAmVaOCcOznslE/BgIxVQgxQP94Ijw+EgTQUmgg5XEDClfH8s+msxJRhMTJR+J08j4Y8NXMUhjzUmdk957CJcohyMDmi9XZ1GYsIrUSjde8er3acWDxn/hqa/f0Omjkgh64pHg8M='}, 'function': {'arguments': '{"b": 7, "a": 5}', 'name': 'simple_add'}, 'id': 'call_xHDw5_dvT7i0EvwSrDIsEg', 'type': 'function'}], function_call=None, images=[], thinking_blocks=[], provider_specific_fields=None)
{'tool_call_id': 'call_xHDw5_dvT7i0EvwSrDIsEg', 'role': 'tool', 'name': 'simple_add', 'content': '12'}
{'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'}
Message(content='Based on the tool calls, I have found the following:\n\n* The result of 5 + 7 is 12.\n* I also performed an additional calculation, finding that 6 + 7 is 13.\n\nYour original request to calculate 5 + 7 has been completed. No further work is needed.', role='assistant', tool_calls=None, function_call=None, images=[], thinking_blocks=[], provider_specific_fields=None)
chat = Chat(ms[1], tools=[simple_add])
res = chat("What's 5 + 3? Use the `simple_add` tool.")
resI used the simple_add tool to add 5 and 3. The result is 8.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=589, prompt_tokens=162, total_tokens=751, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=566, rejected_prediction_tokens=None, text_tokens=23, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=162, image_tokens=None))
res = chat("Now, tell me a joke based on that result.")
resWhat did the number 0 say to the number 8?
Nice belt
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=582, prompt_tokens=198, total_tokens=780, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=566, rejected_prediction_tokens=None, text_tokens=16, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=198, image_tokens=None))
chat.hist[{'role': 'user', 'content': "What's 5 + 3? Use the `simple_add` tool."},
Message(content=None, role='assistant', tool_calls=[{'index': 0, 'provider_specific_fields': {'thought_signature': 'CoIEAXLI2nw1pmEkFrMxX4qeFfBXawudFRzjUShdrshCSROYAmlItF4TzD7zg4RVJNWvDeeWSodE4VpkOBn84S3uzHcPHj4bQ5KIJUTOdKgiwlhXBEyfy9Qf42ON0WTqiG8TGiKAyFGjLzj+VShlrGi9LGQrsjWaOiEfGdksmWeoUIS4JYIA9VqUE/96gDckFnjH4pg7WWPYI6ruxVfICe7z77HmLkUakZTnMKZ4zE536X7LJKhP/1fGoI7985yNA4jiaIprl5znyNlEUCFjwiHT5ujyr1J7wNHWvhBm1nT7hqkmK1XcjqG5YBJAEhVo9CEf5n8kunCK4TQnSWX1LT6Khm6wIB7aBvHvOknO2CVa3CIU+ALjXxZFnSDVSUUts2JK326Si55l+p6h22xlsQ9Eb9IwGMKApoiY8dpGmSpj64e3uEBaTbIr3g6ZpG+4a2DIa/IhdukcLVsOep81J8pdPHKJO5GDYVtxiSwuudf0TZdnhnk201bBraF/Wa0A01CV8HWrtlWQm741YlDvaLfjHiioRFr3H9ZplmQiK/BXBS/KIpT8rE6FQl7hv+yRGkBN0bAAkvLurhgYh+ndezb5TxrGsobjxJazNlLHa0T2Hj6Pk5nrEd5T91nPwHBLgxs6FKjT9xe2tRLU0qjxHCqXB74qkmaNiR6BW9oUCIZeNsPVyg=='}, 'function': {'arguments': '{"b": 3, "a": 5}', 'name': 'simple_add'}, 'id': 'call_KIBcXa0bT2CJ5NqyDtxtKw', 'type': 'function'}], function_call=None, images=[], thinking_blocks=[], provider_specific_fields=None),
{'tool_call_id': 'call_KIBcXa0bT2CJ5NqyDtxtKw',
'role': 'tool',
'name': 'simple_add',
'content': '8'},
{'role': 'user',
'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'},
Message(content='I used the `simple_add` tool to add 5 and 3. The result is 8.', role='assistant', tool_calls=None, function_call=None, images=[], thinking_blocks=[], provider_specific_fields=None),
{'role': 'user', 'content': 'Now, tell me a joke based on that result.'},
Message(content='What did the number 0 say to the number 8?\n\nNice belt', role='assistant', tool_calls=None, function_call=None, images=[], thinking_blocks=[], provider_specific_fields=None)]
Images
for m in ms[1:]:
chat = Chat(m)
r = chat(['Whats in this img?',img_fn.read_bytes()])
test_eq('puppy' in contents(r).content, True)
rThis image shows a cute puppy lying on the grass next to some purple flowers. The puppy has brown and white fur and is looking directly at the camera.
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
stop - usage:
Usage(completion_tokens=31, prompt_tokens=267, total_tokens=298, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
Prefill
Prefill works as expected:
for m in ms[1:]:
if not get_model_info(m)['supports_assistant_prefill']: continue
chat = Chat(m)
chat('Hi this is Rens!')
r = chat("Spell my name",prefill="Your name is R E")
test_eq(contents(r).content.startswith('Your name is R E N S'), True)
And the entire message is stored in the history, not just the generated part:
# chat.hist[-1]
Streaming
from time import sleep
for m in ms[1:]:
chat = Chat(m)
stream_gen = chat("Count to 5", stream=True)
for chunk in stream_gen:
if isinstance(chunk, ModelResponse): display(chunk)
else: print(delta_text(chunk) or '',end='')1
2
3
4
5
1 2 3 4 5
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=740, prompt_tokens=5, total_tokens=745, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
1, 2, 3, 4, 5
1, 2, 3, 4, 5
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
stop - usage:
Usage(completion_tokens=39, prompt_tokens=5, total_tokens=44, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
1, 2, 3, 4, 5
1, 2, 3, 4, 5
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5 - finish_reason:
stop - usage:
Usage(completion_tokens=17, prompt_tokens=11, total_tokens=28, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
1
2
3
4
5
1
2
3
4
5
- id:
chatcmpl-xxx - model:
gpt-4.1 - finish_reason:
stop - usage:
Usage(completion_tokens=9, prompt_tokens=11, total_tokens=20, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
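If you just want the final text from a stream, you can accumulate the deltas as they arrive. Here is a small sketch reusing the same pattern and helpers as the loop above (the prompt is just an example):
chat = Chat(ms[1])
chunks = []
for chunk in chat("Count to 3", stream=True):
    # the final ModelResponse is yielded alongside the deltas, so skip it here
    if isinstance(chunk, ModelResponse): continue
    chunks.append(delta_text(chunk) or '')
''.join(chunks)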
Let’s try prefill with streaming too:
# stream_gen = chat("Continue counting to 10","Okay! 6, 7",stream=True)
# for chunk in stream_gen:
# if isinstance(chunk, ModelResponse): display(chunk)
# else: print(delta_text(chunk) or '',end='')
Tool use
OK, now let’s test tool use.
for m in ms[1:]:
display(Markdown(f'**{m}:**'))
chat = Chat(m, tools=[simple_add])
res = chat("What's 5 + 3? Use the `simple_add` tool. Explain.")
display(res)gemini/gemini-2.5-pro:
I used the simple_add tool to calculate the sum of 5 and 3. The tool returned the result 8.
Therefore, 5 + 3 = 8.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=375, prompt_tokens=183, total_tokens=558, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=335, rejected_prediction_tokens=None, text_tokens=40, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=183, image_tokens=None))
gemini/gemini-2.5-flash:
I used the simple_add tool to calculate 5 + 3. The tool returned 8.
Therefore, 5 + 3 = 8.
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
stop - usage:
Usage(completion_tokens=102, prompt_tokens=165, total_tokens=267, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=67, rejected_prediction_tokens=None, text_tokens=35, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=165, image_tokens=None))
claude-sonnet-4-5:
Result: 5 + 3 = 8
I successfully used the simple_add tool to calculate the sum of 5 and 3. The function took two parameters:
- a = 5 (the first operand)
- b = 3 (the second operand)
The tool returned 8, which is the correct sum of these two numbers.
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=91, prompt_tokens=775, total_tokens=866, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
openai/gpt-4.1:
The result of 5 + 3 is 8. I used the simple_add tool to perform this calculation. If you have more calculations or questions, feel free to ask!
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
stop - usage:
Usage(completion_tokens=36, prompt_tokens=161, total_tokens=197, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
Thinking with tool use
for m in ms[1:]:
_sparams = litellm.get_model_info(m)['supported_openai_params']
if 'reasoning_effort' not in _sparams: continue
display(Markdown(f'**{m}:**'))
chat = Chat(m, tools=[simple_add])
res = chat("What's 5 + 3?",think='l',return_all=True)
display(*res)gemini/gemini-2.5-pro:
🔧 simple_add({“b”: 3, “a”: 5})
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
tool_calls - usage:
Usage(completion_tokens=153, prompt_tokens=74, total_tokens=227, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=133, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=74, image_tokens=None))
{'tool_call_id': 'call_6uICXoIzTiOf8zNLkbFfXQ',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
Based on my calculation, 5 + 3 is 8.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=248, prompt_tokens=303, total_tokens=551, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=234, rejected_prediction_tokens=None, text_tokens=14, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=303, image_tokens=None))
gemini/gemini-2.5-flash:
🔧 simple_add({“a”: 5, “b”: 3})
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
tool_calls - usage:
Usage(completion_tokens=73, prompt_tokens=74, total_tokens=147, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=53, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=74, image_tokens=None))
{'tool_call_id': 'call_Y34O3FtuSuemIIFDT7rswA',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
I used the simple_add tool to calculate 5 + 3 and the result was 8.
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
stop - usage:
Usage(completion_tokens=23, prompt_tokens=243, total_tokens=266, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=243, image_tokens=None))
claude-sonnet-4-5:
🔧 simple_add({“a”: 5, “b”: 3})
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=122, prompt_tokens=639, total_tokens=761, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=41, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_J0YPIkA9T4OoWYkM1nD2aA',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
The answer to 5 + 3 is 8.
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=68, prompt_tokens=773, total_tokens=841, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=40, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Search
for m in ms[1:]:
display(Markdown(f'**{m}:**'))
chat = Chat(m)
res = chat("Search the web and tell me very briefly about otters", search='l', stream=True)
for o in res:
if isinstance(o, ModelResponse): sleep(0.01); display(o)
else: pass
gemini/gemini-2.5-pro:
Otters are carnivorous mammals belonging to the Mustelidae family, which also includes weasels, badgers, and wolverines. There are 14 extant species of these semi-aquatic animals, found in both freshwater and marine environments.
Key characteristics of otters include their long, slender bodies, webbed feet for swimming, and dense, waterproof fur that keeps them warm. In fact, sea otters have the thickest fur of any animal, with up to a million hair follicles per square inch. Their diet is varied and can include fish, crustaceans, frogs, and mollusks. Otters are known for their playful behavior, such as sliding into the water, and some species, like the sea otter, are known to use tools like rocks to open shells.
Otters are considered a keystone species, meaning they play a critical role in maintaining the health and balance of their ecosystems. For example, by preying on sea urchins, sea otters help protect kelp forests from being overgrazed. After facing threats from pollution, some otter populations are making a comeback in certain areas.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=376, prompt_tokens=12, total_tokens=388, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
gemini/gemini-2.5-flash:
Otters are carnivorous mammals belonging to the subfamily Lutrinae, which is part of the weasel family (Mustelidae). There are 14 recognized species of otters, all of which are semi-aquatic, living in both freshwater and marine environments.
They possess long, slender bodies, short limbs, and powerful webbed feet, making them excellent swimmers. Most species also have long, muscular tails, with the exception of the sea otter. Otters are known for their dense, insulated fur, which traps air to keep them warm and buoyant in water, as they lack a blubber layer. Their diet primarily consists of fish, but can also include crustaceans, frogs, birds, and shellfish, depending on the species and availability. Otters are found on every continent except Australia and Antarctica.
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
stop - usage:
Usage(completion_tokens=255, prompt_tokens=12, total_tokens=267, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
claude-sonnet-4-5:
- Otters are carnivorous mammals with 14 species, all semiaquatic and living in both freshwater and marine environments.
- They belong to the weasel family (Mustelidae).
- Otters have long, slim bodies, webbed feet for swimming, and dense fur that keeps them warm in water.
- They have the densest fur of any animal—as many as a million hairs per square inch.
- They’re playful animals, engaging in activities like sliding into water and playing with stones.
- All otters are expert hunters that eat fish, crustaceans, and other critters, and sea otters famously use rocks as tools to crack open shellfish.
🔧 web_search({“query”: “otters”})
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5 - finish_reason:
stop - usage:
Usage(completion_tokens=321, prompt_tokens=15153, total_tokens=15474, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=0, rejected_prediction_tokens=None, text_tokens=None, image_tokens=None), prompt_tokens_details=None)
openai/gpt-4.1:
Otters are semi-aquatic mammals known for their playful behavior and sleek bodies. They belong to the family Mustelidae and are found in rivers, lakes, and coastal areas worldwide. Otters have webbed feet for swimming, dense fur for insulation, and primarily eat fish and invertebrates. Some species, like the sea otter, use tools to open shellfish. Many otter populations are threatened by habitat loss and pollution.
- id:
chatcmpl-xxx - model:
gpt-4.1 - finish_reason:
stop - usage:
Usage(completion_tokens=89, prompt_tokens=18, total_tokens=107, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
Let’s now test pause_turn with web search:
def mk_pause_web_search():
srv_tc = mk_tc("web_search", json.dumps({"query": "Solveit Answer.AI"}), tcid=random_tool_id().replace('toolu_', 'srvtoolu_'))
pause_msg = mk_tc_req("Let me search for that information:", [srv_tc])
return ModelResponse(choices=[Choices(finish_reason="pause_turn", index=0, message=pause_msg)])
mk_pause_web_search()Let me search for that information:
🔧 web_search({“query”: “Solveit Answer.AI”})
- id:
chatcmpl-xxx - model:
None - finish_reason:
pause_turn - usage:
Usage(completion_tokens=0, prompt_tokens=0, total_tokens=0, completion_tokens_details=None, prompt_tokens_details=None)
We mock completion to return pause_turn for the first two API calls:
orig_completion = completion
call_count = 0
def patched_completion(*args, **kwargs):
global call_count
call_count += 1
print(f"Mock Call {call_count}")
if call_count < 3: return mk_pause_web_search()
return orig_completion(*args, **kwargs)
completion = patched_completion
chat_pause = Chat('claude-sonnet-4-5', search='l')
res = chat_pause("Search the web and tell me about Solveit in a paragraph")
print(f"Total calls: {call_count}")
display(res)
completion = orig_completion
Mock Call 1
Mock Call 2
Mock Call 3
Total calls: 3
Based on the search results, here’s information about Solveit:
Solveit is a course and platform developed by Answer.AI that teaches how to solve problems—including coding, writing, system administration, and research—using fast, short iterations, and it can be applied to everything from programming challenges and web development to learning, writing, and business. The “solveit method” is a modern approach to building software, writing, solving problems, and learning, inspired by George Pólya’s “How to Solve It” and the fast.ai top-down education tradition of learning by doing. Solveit fundamentally embodies Polya’s classic problem-solving methodology—Understand the Problem, Devise a Plan, Carry Out the Plan, and Look Back and Reflect—and prioritizes explainability and human control, breaking down complex tasks into small, iterative, and understandable steps. Answer.AI has been using it with 1000 preview users for the last year, and it’s changed their lives at Answer.AI, with hundreds of users reporting the same experience. The Solveit method is founded in building in small steps, with quick iterations and immediate feedback, where for coding, users write 1-2 lines of code at a time and immediately show the result of those steps.
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=384, prompt_tokens=14671, total_tokens=15055, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), server_tool_use=ServerToolUse(web_search_requests=1, tool_search_requests=None), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Test next turn:
test_eq(len(chat_pause.hist), 2) # incomplete request shouldn't be stored
chat_pause('What did I just ask you about?')You just asked me to search the web and tell you about Solveit in a paragraph.
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=23, prompt_tokens=2522, total_tokens=2545, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
Multi tool calling
We can let the model call multiple tools in sequence using the max_steps parameter.
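Under the hood, max_steps simply bounds a call-the-model, run-the-tools, repeat loop. The sketch below illustrates the general pattern (the helper names call_model and exec_tool are hypothetical; this is not the library's actual implementation):
# Illustrative sketch of the max_steps pattern (hypothetical helpers, not the library's code)
def toolcall_loop(call_model, exec_tool, msgs, max_steps=5):
    "Call the model, execute any requested tools, and repeat up to `max_steps` rounds."
    for step in range(max_steps):
        resp = call_model(msgs)
        tcs = resp.choices[0].message.tool_calls
        if not tcs: return resp                      # no tool calls requested: we're done
        msgs.append(resp.choices[0].message)         # keep the assistant's request in history
        for tc in tcs:                               # run each requested tool
            msgs.append({'tool_call_id': tc['id'], 'role': 'tool',
                         'name': tc['function']['name'], 'content': str(exec_tool(tc))})
    return resp                                      # steps exhausted; the caller can wrap up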
for m in ms:
display(Markdown(f'**{m}:**'))
chat = Chat(m, tools=[simple_add])
res = chat("What's ((5 + 3)+7)+11? Work step by step", return_all=True, max_steps=5)
for r in res: display(r)gemini/gemini-3-pro-preview:
🔧 simple_add({“a”: 5, “b”: 3})
- id:
chatcmpl-xxx - model:
gemini-3-pro-preview - finish_reason:
tool_calls - usage:
Usage(completion_tokens=142, prompt_tokens=94, total_tokens=236, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=124, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=94, image_tokens=None))
{'tool_call_id': 'call_2XTBRujsQbORRZGu8D2Gag',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
🔧 simple_add({“a”: 8, “b”: 7})
- id:
chatcmpl-xxx - model:
gemini-3-pro-preview - finish_reason:
tool_calls - usage:
Usage(completion_tokens=18, prompt_tokens=247, total_tokens=265, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=247, image_tokens=None))
{'tool_call_id': 'call_9hFkzr_HTKmYqwswCsDPDQ',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
🔧 simple_add({“a”: 15, “b”: 11})
- id:
chatcmpl-xxx - model:
gemini-3-pro-preview - finish_reason:
tool_calls - usage:
Usage(completion_tokens=20, prompt_tokens=279, total_tokens=299, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=279, image_tokens=None))
{'tool_call_id': 'call_fpac86fFS4ebi3Ghs4oF_w',
'role': 'tool',
'name': 'simple_add',
'content': '26'}
Here is the step-by-step solution:
- First, solve the innermost parentheses: (5 + 3) = 8
- Next, add the result to the next number: (8 + 7) = 15
- Finally, add the last number: 15 + 11 = 26
The final answer is 26.
- id:
chatcmpl-xxx - model:
gemini-3-pro-preview - finish_reason:
stop - usage:
Usage(completion_tokens=88, prompt_tokens=313, total_tokens=401, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=313, image_tokens=None))
gemini/gemini-2.5-pro:
🔧 simple_add({“b”: 3, “a”: 5})
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
tool_calls - usage:
Usage(completion_tokens=358, prompt_tokens=83, total_tokens=441, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=338, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=83, image_tokens=None))
{'tool_call_id': 'call_dWsHFecYQyKk5pXJtl0SJg',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
Okay, the first step is 5 + 3 = 8. Next, we add 7 to that result.
🔧 simple_add({“b”: 7, “a”: 8})
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
tool_calls - usage:
Usage(completion_tokens=124, prompt_tokens=117, total_tokens=241, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=78, rejected_prediction_tokens=None, text_tokens=46, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=117, image_tokens=None))
{'tool_call_id': 'call_3xTGEl9YRbWveQlZo_BLOw',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
Okay, the second step is 8 + 7 = 15. Finally, we add 11 to that result.
🔧 simple_add({“b”: 11, “a”: 15})
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
tool_calls - usage:
Usage(completion_tokens=50, prompt_tokens=178, total_tokens=228, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=178, image_tokens=None))
{'tool_call_id': 'call_YCXwrjU1RXmtpE2hibWzaA',
'role': 'tool',
'name': 'simple_add',
'content': '26'}
Okay, the last step is 15 + 11 = 26.
So, ((5 + 3)+7)+11 = 26.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=36, prompt_tokens=243, total_tokens=279, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=243, image_tokens=None))
gemini/gemini-2.5-flash:
🔧 simple_add({“b”: 3, “a”: 5})
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
tool_calls - usage:
Usage(completion_tokens=93, prompt_tokens=83, total_tokens=176, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=73, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=83, image_tokens=None))
{'tool_call_id': 'call_I3HqLAJHRF_KgU1Tlk3bdw',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
🔧 simple_add({“b”: 7, “a”: 8})
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
tool_calls - usage:
Usage(completion_tokens=57, prompt_tokens=117, total_tokens=174, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=37, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=117, image_tokens=None))
{'tool_call_id': 'call_VmcgF1VaQIWFeLqzJql0ZQ',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
🔧 simple_add({“a”: 15, “b”: 11})
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
tool_calls - usage:
Usage(completion_tokens=67, prompt_tokens=152, total_tokens=219, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=45, rejected_prediction_tokens=None, text_tokens=22, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=152, image_tokens=None))
{'tool_call_id': 'call_F_03NrfvRByeAOptyiS_TQ',
'role': 'tool',
'name': 'simple_add',
'content': '26'}
Here’s how we can break this down:
5 + 3 = 8
8 + 7 = 15
15 + 11 = 26
So, ((5 + 3) + 7) + 11 = 26.
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
stop - usage:
Usage(completion_tokens=92, prompt_tokens=189, total_tokens=281, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=32, rejected_prediction_tokens=None, text_tokens=60, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=189, image_tokens=None))
claude-sonnet-4-5:
I’ll solve this step by step using the addition function.
Step 1: First, let me calculate 5 + 3
🔧 simple_add({“a”: 5, “b”: 3})
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=100, prompt_tokens=617, total_tokens=717, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_CSFfT57bRfKHh937VpfxfA',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
Step 2: Now I’ll add 7 to that result (8 + 7)
🔧 simple_add({“a”: 8, “b”: 7})
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=93, prompt_tokens=730, total_tokens=823, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_JkAhHinyQ8eFBfT2CoxGxw',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
Step 3: Finally, I’ll add 11 to that result (15 + 11)
🔧 simple_add({“a”: 15, “b”: 11})
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=94, prompt_tokens=836, total_tokens=930, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_XGRgNkoeQbeVXQ53_164Zg',
'role': 'tool',
'name': 'simple_add',
'content': '26'}
Answer: ((5 + 3) + 7) + 11 = 26
Here’s the breakdown:
- 5 + 3 = 8
- 8 + 7 = 15
- 15 + 11 = 26
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=67, prompt_tokens=943, total_tokens=1010, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
openai/gpt-4.1:
🔧 simple_add({“a”:5,“b”:3})
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=18, prompt_tokens=82, total_tokens=100, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_ITAmDIxpR4_9QvaXZREWVg',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
🔧 simple_add({“a”:8,“b”:7})
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=18, prompt_tokens=109, total_tokens=127, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_uwN463piQi6dadn8Sxy4vQ',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
🔧 simple_add({“a”:15,“b”:11})
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=18, prompt_tokens=136, total_tokens=154, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_TtE1UwxaR2_vCoHtPV1gvA',
'role': 'tool',
'name': 'simple_add',
'content': '26'}
Here are the step-by-step calculations:
- First, add 5 + 3 = 8
- Next, add the result to 7: 8 + 7 = 15
- Finally, add 11 to the previous result: 15 + 11 = 26
So, ((5 + 3) + 7) + 11 = 26.
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
stop - usage:
Usage(completion_tokens=83, prompt_tokens=163, total_tokens=246, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
Some models support parallel tool calling, i.e. sending multiple tool call requests in a single conversation step.
def multiply(a: int, b: int) -> int:
"Multiply two numbers"
return a * b
for m in ms[1:]:
_sparams = litellm.get_model_info(m)['supported_openai_params']
if 'parallel_tool_calls' not in _sparams: continue
display(Markdown(f'**{m}:**'))
chat = Chat(m, tools=[simple_add, multiply])
res = chat("Calculate (5 + 3) * (7 + 2)", max_steps=5, return_all=True)
for r in res: display(r)gemini/gemini-2.5-pro:
I will first calculate the two sums and then multiply the results.
🔧 simple_add({“b”: 3, “a”: 5})
🔧 simple_add({“a”: 7, “b”: 2})
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
tool_calls - usage:
Usage(completion_tokens=394, prompt_tokens=133, total_tokens=527, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=341, rejected_prediction_tokens=None, text_tokens=53, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=133, image_tokens=None))
{'tool_call_id': 'call_uomC3YXmTqmbZr_aLflnRw',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
{'tool_call_id': 'call_1ec_P2c2R9mNe9MHEiQR5g',
'role': 'tool',
'name': 'simple_add',
'content': '9'}
🔧 multiply({“b”: 9, “a”: 8})
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
tool_calls - usage:
Usage(completion_tokens=391, prompt_tokens=213, total_tokens=604, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=373, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=213, image_tokens=None))
{'tool_call_id': 'call_G9CUSGorQgCMmgrhVBnu_A',
'role': 'tool',
'name': 'multiply',
'content': '72'}
The result is 72.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=7, prompt_tokens=244, total_tokens=251, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=244, image_tokens=None))
gemini/gemini-2.5-flash:
🔧 simple_add({“a”: 5, “b”: 3})
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
tool_calls - usage:
Usage(completion_tokens=109, prompt_tokens=133, total_tokens=242, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=89, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=133, image_tokens=None))
{'tool_call_id': 'call_ey4bguidSBWPko3FGXJM4w',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
🔧 simple_add({“a”: 7, “b”: 2})
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
tool_calls - usage:
Usage(completion_tokens=66, prompt_tokens=167, total_tokens=233, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=46, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=167, image_tokens=None))
{'tool_call_id': 'call_zEF_fNdBRgmWSukJeVheaQ',
'role': 'tool',
'name': 'simple_add',
'content': '9'}
🔧 multiply({“b”: 9, “a”: 8})
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
tool_calls - usage:
Usage(completion_tokens=87, prompt_tokens=201, total_tokens=288, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=69, rejected_prediction_tokens=None, text_tokens=18, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=201, image_tokens=None))
{'tool_call_id': 'call_H9PAF1f5TR6P9MVr_eosZA',
'role': 'tool',
'name': 'multiply',
'content': '72'}
The answer is 72.
- id:
chatcmpl-xxx - model:
gemini-2.5-flash - finish_reason:
stop - usage:
Usage(completion_tokens=62, prompt_tokens=232, total_tokens=294, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=55, rejected_prediction_tokens=None, text_tokens=7, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=232, image_tokens=None))
claude-sonnet-4-5:
I’ll calculate this step by step.
First, let me calculate the two additions:
- 5 + 3
- 7 + 2
Then I’ll multiply the results.
🔧 simple_add({“a”: 5, “b”: 3})
🔧 simple_add({“a”: 7, “b”: 2})
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=166, prompt_tokens=700, total_tokens=866, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_f2uHk7MYTUydsrRSeqVqGA',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
{'tool_call_id': 'toolu_VcftnU1JRdyJrtvQbTFrSg',
'role': 'tool',
'name': 'simple_add',
'content': '9'}
Now I’ll multiply the results:
🔧 multiply({“a”: 8, “b”: 9})
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=76, prompt_tokens=931, total_tokens=1007, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
{'tool_call_id': 'toolu_J9maI_T3Ql6v5nkKvBikCw',
'role': 'tool',
'name': 'multiply',
'content': '72'}
The answer is 72.
To break it down:
- (5 + 3) = 8
- (7 + 2) = 9
- 8 × 9 = 72
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=51, prompt_tokens=1020, total_tokens=1071, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
openai/gpt-4.1:
🔧 simple_add({“a”: 5, “b”: 3})
🔧 simple_add({“a”: 7, “b”: 2})
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=52, prompt_tokens=110, total_tokens=162, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_kII_2qBySqCqo2z363C6ZQ',
'role': 'tool',
'name': 'simple_add',
'content': '8'}
{'tool_call_id': 'call__dLtevl8TFeOXcgHYCVxmQ',
'role': 'tool',
'name': 'simple_add',
'content': '9'}
🔧 multiply({“a”:8,“b”:9})
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
tool_calls - usage:
Usage(completion_tokens=17, prompt_tokens=178, total_tokens=195, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
{'tool_call_id': 'call_zsou4xDaSpWWQIFpo42K_A',
'role': 'tool',
'name': 'multiply',
'content': '72'}
(5 + 3) = 8 and (7 + 2) = 9. Multiplying them together: 8 × 9 = 72.
The answer is 72.
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
stop - usage:
Usage(completion_tokens=41, prompt_tokens=203, total_tokens=244, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
See how the additions are calculated in one go!
We don’t want the model to keep running tools indefinitely. Let’s show how we can force the model to stop after a specified number of tool-call rounds:
def divide(a: int, b: int) -> float:
"Divide two numbers"
return a / b
chat = Chat(model, tools=[simple_add, multiply, divide])
res = chat("Calculate ((10 + 5) * 3) / (2 + 1) step by step.",
max_steps=3, return_all=True,
final_prompt="Please wrap-up for now and summarize how far we got.")
for r in res: display(r)🔧 simple_add({“b”: 5, “a”: 10})
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
tool_calls - usage:
Usage(completion_tokens=330, prompt_tokens=196, total_tokens=526, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=309, rejected_prediction_tokens=None, text_tokens=21, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=196, image_tokens=None))
{'tool_call_id': 'call_OJdN9b_3Q86yssSSFaznoQ',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
🔧 simple_add({“b”: 1, “a”: 2})
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
tool_calls - usage:
Usage(completion_tokens=104, prompt_tokens=232, total_tokens=336, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=84, rejected_prediction_tokens=None, text_tokens=20, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=232, image_tokens=None))
{'tool_call_id': 'call_GRuK3wICRhyigwhpD6fuBQ',
'role': 'tool',
'name': 'simple_add',
'content': '3'}
Of course. Here is the step-by-step calculation for the expression ((10 + 5) * 3) / (2 + 1):
Step 1: Solve the first part in parentheses.
10 + 5 = 15
Step 2: Solve the second part in parentheses.
2 + 1 = 3
Now, we substitute these results back into the original expression:
(15 * 3) / 3
Step 3: Perform the multiplication.
15 * 3 = 45
Finally, we substitute this result back into the expression:
45 / 3
Step 4: Perform the division.
45 / 3 = 15
So, the final answer is 15.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=362, prompt_tokens=280, total_tokens=642, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=167, rejected_prediction_tokens=None, text_tokens=195, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=280, image_tokens=None))
Tool call exhaustion
pr = "What is 1+2, and then the result of adding +2, and then +3 to it? Use tools to make the calculations!"
c = Chat(model, tools=[simple_add])
res = c(pr, max_steps=2)
resBased on my calculation, 1 + 2 equals 3.
I have not yet completed the full sequence of additions you requested. The next steps are to add 2 to this result, and then add 3 to the subsequent result. I can continue with the next calculation on your command.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=716, prompt_tokens=174, total_tokens=890, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=655, rejected_prediction_tokens=None, text_tokens=61, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=174, image_tokens=None))
assert c.hist[-2] == _final_prompt
Tool Call Referencing
With tc_refs=True, the AI can see and report tool call IDs:
chat = Chat('claude-sonnet-4-5', tools=[simple_add], tc_refs=True)
chat("Call add(1,2) and tell me the tool_call_id you used")I successfully called the simple_add function with parameters a=1 and b=2, which returned the result 3.
The tool_call_id I used was: toolu_yc1K_X0WTymrjyBWctPMXQ
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=67, prompt_tokens=867, total_tokens=934, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
chat.tc_res{'toolu_yc1K_X0WTymrjyBWctPMXQ': 3}
Example of chained tool calls where the AI references a previous result:
@dataclass
class Person:
name: str
age: int
def get_person():
"Get a person's data"
return {"name": "Alice", "age": 30}
def greet_person(person: Person):
"Greet a person"
return f"Hello {person.name}, you are {person.age} years old!"chat = Chat('claude-sonnet-4-5', tools=[get_person, greet_person], tc_refs=True)
chat("First call get_person, then pass the result to greet_person", max_steps=10)Perfect! I successfully retrieved Alice’s data (name: Alice, age: 30) and then passed it to the greet_person function, which returned the greeting: “Hello Alice, you are 30 years old!”
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=50, prompt_tokens=1025, total_tokens=1075, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
We can inspect chat.tc_res to see all stored tool results:
chat.tc_res{'toolu_N53toa3mRem241X2lbtEDQ': {'name': 'Alice', 'age': 30},
'toolu_OFr0Y15KSGKVavRYbEw5NQ': 'Hello Alice, you are 30 years old!'}
list(L(chat.hist).attrgot('tool_calls').filter())[[{'index': 1,
'function': {'arguments': '{}', 'name': 'get_person'},
'id': 'toolu_N53toa3mRem241X2lbtEDQ',
'type': 'function'}],
[{'index': 1,
'function': {'arguments': '{"person": "$`toolu_N53toa3mRem241X2lbtEDQ`"}',
'name': 'greet_person'},
'id': 'toolu_OFr0Y15KSGKVavRYbEw5NQ',
'type': 'function'}]]
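Note the "$`toolu_…`" reference in the second call's arguments: rather than re-serializing the first result, the model refers to it by tool call ID, and the value stored in chat.tc_res is presumably substituted in before the tool runs. A tiny sketch of how such a reference could be resolved (hypothetical helper, not the library's actual code):
import re

def resolve_tc_ref(arg, tc_res):
    "Return the stored tool result if `arg` is a $`id` reference, else `arg` unchanged."
    if isinstance(arg, str):
        m = re.fullmatch(r'\$`([^`]+)`', arg)
        if m: return tc_res[m.group(1)]
    return arg

resolve_tc_ref('$`toolu_N53toa3mRem241X2lbtEDQ`', chat.tc_res)
# -> {'name': 'Alice', 'age': 30}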
This also works with ToolResponse results:
def view_img(fn:Path):
"View an image"
durl = f"data:image/jpeg;base64,{base64.b64encode(fn.read_bytes()).decode()}"
return ToolResponse([{'type': 'image_url', 'image_url': {'url': durl}}])
def get_img_size(image_content: list) -> dict:
"Get the size of an image from ToolResponse content"
from PIL import Image
from io import BytesIO
url = image_content[0]['image_url']['url']
b64_data = url.split(',')[1]
img = Image.open(BytesIO(base64.b64decode(b64_data)))
return {'width': img.width, 'height': img.height}
chat = Chat('claude-sonnet-4-5', tools=[view_img, get_img_size], tc_refs=True)
chat(f"First describe the image at {img_fn}, and then get it's dimensions", max_steps=10)Image Description: This is an adorable photograph of a Cavalier King Charles Spaniel puppy. The puppy has the breed’s characteristic coloring with a white face and chest, and rich brown/chestnut colored ears and patches. The puppy is lying on green grass and is positioned near some purple flowers (possibly asters or similar blooms). The puppy has sweet, expressive dark eyes and is looking directly at the camera with an endearing expression. The background shows a natural outdoor setting with foliage and flowers, creating a charming garden scene.
Image Dimensions:
- Width: 300 pixels
- Height: 200 pixels
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=146, prompt_tokens=1119, total_tokens=1265, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
chat.tc_res{'toolu_ldGAUULLTR2_xXPV_QunDg': [{'type': 'image_url',
  'image_url': {'url': 'data:image/jpeg;base64,/9j/4AAQSkZJRg...'}}],
 ...}
kuoqJJWJdiSRbvffCsD5y7FqrVVd/NxjBFUFCb94ccPfErMslrK6qEMVRLVSiUF92TxhiinqEa1jbe3lg7rPjHxDPM9BRUyU8tZDHNVLLHZIJdZJK3uGRoyFIPXa24wC/Dykoq6SZI3nknZQainaj1x2B8LK4N0YHoTbc2sQcWFmvD4zp5uTIlJmb6Qz7KbKCOWVuPxPTttjpN5D4UVPUouaH5x2JXyAm48oKf96CjrWWiindHaDXULHNUxqvi1KSey3BJHQDDHiKiymegdc3knpIle9pA0aCS22twrWNiNjbFd55S8QcDV8/zcaU8uYQPDExUM4Q2uV8iel8GGdZ/nVbQPnlPSw09QtItDmFI6NoqYiAqBomAJlRrm/wD5LpuMFgxLjFHlaggi7Bvd/n/bUY+fkCtfpKoinVJY9UjuobZQbEjv7X/y+H1RBG8cM0c2oPcNEPqS36389seUmTs0zU9Q0lNWxy8hoOUdYbyN7WANr98aZjltZlTQx1sckLSoW8SadgSNj3Fx1GOcyNdyAKQt1qFXDtMmaUMlFT/KwzBlcTsTrBHWxHYA3097YfHK63JIvnoa5JqpY1MgdrvpdR4SQfC1j0/4AZRZjNlxL0cjJ4dDjURrB7G2Cavno80y6TM8slgy6q5apVUp25yjbw72NjbYgbEb4DHiBJrRG7v7e3/H2lSOCN+0IshzakylJRTSyzGVv4EjDSULC7E9b+La3pjQUtPV5s+Y5lO88shBVCFBA+2wP6YHChraGhShkl5hJ17BQo3vYefTfBVlmTpAEaotGFFhrJ29lG59ycT+R5RbGMa6H5To+Nj5m2Fw+yqvpqmIRSwWUWsHZSG/92GJV8sy+oheOSlQKwtp0jcehHtgOpocucCzszdjyyP7nElzkoU8FQ4QdAx/tjhZCQbud7GAdVIziv4Z0vETs9O4o6lnDGRU1a7CwB9PbAr/APt/zmojkaHNMtjRbW5sjLcfh1wcx8Scl7NJ4T3viVp+JYyLc4D0tbDcPm58QobEVm8HDlNnRgRkf7NkWYqFzHi+lpJjtpipHlA+5IGLX4b/AGGsjzQxms43rpEIuwgy+Nb+xLH9Dh5w3BNns6LEguzKVcCxBuQT9v7jHUPAXCE2XxQSvIQNjpHbzGOz4ufNlP1CcnyfHw4h9JlGj/8AT+4DNOB/1BxHzdP+pzICCfPTy/yvgEzD9gPNsrM03DvGVDWm3girKF4Sf/mrMB+GO/UgAQAja2GksOkkWuOmOmdipzABc+V3FH7OPxL4WDvWcL1NdTq1jNlzrVr+CHUPuMVfX5bWZRVy0uaUs9DUx/XBPE0br/8AFgDj7H1VExRxHcb3BHbHM/xm4SlztHpOKMoizywLU55JEqi+3LdbMnfv23xJkpBcoxr6hoGcAQFQSty3ffthzy9R1Bh0sCfLB1nXw+gpM05eVVBINx8tPKNaHuNQ+seosR388EdH8PKWoypovCk5Xa/Y9r+vUG3XED50Xdy3H42Q2KlYRUzxDmrd0XdxboMec4EFtAJFgpvi9M44fyah4cai5emELd5F2bbqb+Z6YpzMMuanqW5cfIi7ISCbHp97YSmUZLIjXwnH3GskkXiEaEra4t2J9cPqSoRI0aa2m5GkISbj2+2GkdCqxo7SOFJIZLA7X69cSFLAs/8ABvy+rqxTt064I1UFY+o6iGQPrtFbSWHiO57bG9sO66Coo2C1DKqFNeo7kL+N/wAcR0uTIoD8y0W13WPYgffD9Kaasp1pBoc72kMJBUX6YUR7iMB9o0dYZI+YFN1BGpbAknsD3wyFRBQloqhhI+okkIHtftfDk5TJH4hGXiVtLFQ2kDp+vlhSny7LzHeoZAxP88ljbA8gO5oUmVtdo4boQTv5AAXx5azKskiJyxzBcXGrsNvS2NpomlZIpCqEMQ3Tc3wznlBkeygAenXHeAufL1FZphG76Bfc2GPYYJq0yclGblrdjq7Y3jozUTFNSq2m/W32PriVyGhooczRc/ephoZFdS1LYszdgR5bG+AZgikjuEuMncgw4QaL9fxAwpSymKsQqQhS5Ut2Nuv26++Cmm4AzDMK2gFFyxR5nHLLTVMouQiMQVYDo9rbeuHj/DDNaf5yeR4Y6WOaWCmmlBvUsnWyC5CgXux2FsUMjLjLkamrjcnQgxFFDVCX595VhPiQxKC1+gvfthJ5FWkjUcmQJcKoQ3A9b4ThR1kYzMVK9QO1u/5Y2pq8RLNrQSMwKgWtf++E9dTAb0ZOcMca5pwxpipZWSheXnSQoAu5sCwI6vYGxOy3uBfFpU/xkyCrjqFq4ZaSAwQrGBDeQTHVzGVh/SdHXqPwxSLSRJyxStolt4ypP3vhCoSFHRi5IKAgAeeK8HlZMQpZoYpJvPuIDW54k+VSSRw08oliZAYwjXB1BCdKNcb6bKSLgC9sXyGkp5KesraepikqlR4pKuNA4cX8RVbqG3uO18c000EuZ1KQU6qsjtYK7BV9yT0xanDHGFdkNTl0ObzVZWlVlldKpeZMHYBSjyAqqKEA3HcnBI4IIdytkGx8H+kdhYqSalu5F8PssqpaOMNN/FqmqMzhqYlabWQTrVzvo9t97emBD4xfDhKDKIK7KJKqtahOmRjECml236bkrsSwFgCAd8WdkfEaZsMxgyHMMvnlomQVwVmlpixBOkVGlVLC3VR374kKPP5ctqWphK9PPMGGtJRddQtqXazEe2x7Y6nmv4mLHzyro0oYC+z8e1jfzK1xeqpVTOKqmOrJBaJmjsAjBbi32xKUGW1FQ2p0a/1G9gCf7DB5xvl2W8N8SSQ5fCtLTNEpCJPJL4wSG1Ft9RO59+mBw1bSAgNyl8yT+mPmPI54WOMjqFi8VG2T/tHuXzR5XpEQ507G1k/QYnYKZ6phJVVGhT11NtfyAHXA3l78ybl0zc2Ug6nIsFGJmgiiec8yQTOpAYt0Hp/6HXHLce862IUKENsuWmy+IchDJIR1ZbfliJzyYVU2ssFb+rVsPxwtWSGKApAq6EFnYnTqPrbt6YAs0zMIzFIqdbGwYktv6euI0RsjalxZca7krUVjQbGujla9rCPb8Ri0fhrw0+dIkuZVaQU7NZWWHVY+uroD/bFE5dHV5lnVNSs2uTUHcDYRj28/THZvw5kyimoIo3pI1qTHZ0dbiRfT/NsXekq0D3IfWZrI6EsTgLhlcrr4oa9EYq2kOqAAhha+wHcC/li/MvhWKNSg0g21D/y88VvwzNTPDGv8gUKl9yo7b+mLFoZwVAc72tjr+OAi6nHzsXazJiwI364bhfEQbA74wTBiL9xj1X1am7nFFycTyWNVTp2vgG4zRpqGeGljEkzLtftv54NKyQCLTqsxHXA1m0ixxGOO3TxN3/HAnYM1dGcVccZLR5RmMsUgR5w51DQNCHyB7n9PfAquaU6NpCqGHSxub++Lw+LEOWutSDMxqRYOYjpWK/RQe7Hy3vjlnMKxqbNHijYizdH3NvQ9/wAMfNeRgtjRn0vjZ/p3DqWSKZQ0/wDEC7qna/rgD4koFlnec0kxksbMguPwxMNnRESg
IgIFrmQg4bGqqPrWQhP6iLr+uEYuWMx2UB5VtRFVQs6xwyAarXYbX9sbfMSEpFLBIzDoYxYdcWRVr83E9mjWYjwSgXF/XzxDCOvpgSaqIaFCguNF/uf0746mN1yDrc5OTG+M96gvFmkVPIoho5KmNTqbWLG52sLYcSZ8ZXjC0eiEbWK2J+/tgkT52UQiJ4GdDqkj0B1vfuexwuGzPW7PErwQOpSOOEXAHWxwwoD7RIYj3goM45mnnQCI72207jzPlhxTZsksZMrwxkGwVKYsAPe+CWalrDS6onjnKklLwA3U9eovcHr6YUpp6qZC1OBpBteYAMT5207DywsqK1GhjdGVBVsnPkkp/FcGwtax6YbxOI5GlYgsG/h7X3v1PoMb0TAuEtq1baffCU0dpXVbBAbCx8sdVRWp89dkmbUupqidy7FgvXuSdv1wtTsTXx8pGciRVCLcsx8u+5OPYKadKaWpjjfS7BY7kX69fx298NkpWjqDCsjRzBwLFSrBr+XUG+PLRJM9sTqulpjR0KVr5PU5Ssr8ySlqoxEUl02LCx/m7na+xth9BkUXGeVmnmqJaGCSMpL8u38TSTdkU6TYMbXI3IFvPFH8GfE3MskzPJaWeZpssjqgtXCzgmdNR3dyL7E3+2LG4m+OVPwzVtQ8Iw2rYp3p6pYtBhmUfRNGwBGrsRYjvjs48qvl9ZshCVRQgd/e5eMqcKlHcY8Pw8M8UVOXQ1lRWRBv9SWmeEkEnazgE2/qtY9sQfKEbNYi8a7m/c/4cO8xzeu4gzE1Obzz1taZSXmllaR2BJIG/YX2AsMLNkGcT0r18WW1RoF1HmiI6fD9R/ztfyOOWw5ueA1ICLOpDyPy9v6upBw8dKRsqVkEprg1lN/BywT+e+GtfSVFHUPFUxtHKrbqw3Hf77EH74cBpBBAoa0hH4Dz9AMAwqeGhNKFGhJqJNS6D4durYdTTU9TE7TNLLUki2tj0/36YcZjRxw0cMkVW05k/rNlYdyDfrfrfzx7kq5XDWqOJzUNSlG8NPIA2q23nthYaxcZxZWo+8fcM8X5xw1A9NR1WnLmqFnkp5Yg8Usi2IunUjwrtextvi3uFvjstVnGd1nE0dPAgoYjQUoVrNPGSWW9iQXBYav5TpPQYoWlkWOpVRKwphKpLutxpva5W+5t2xY+bZfSZhlE5y6OnIhd21ohDBRvc+4PmcV4/I8jFvHsf073XxHYQWB31H/xJ4nyDi/ihsx4WiqFjaCMSmawu1uw/lIuVbtdbjY2wDy6p5REr2W/iK9hjSlPIpQuoFifEb9cJCqSJySeZv0UD9cc3M7ZsrZD7zq4zxQAyVlrly2iZIBpllIVSOo9cPuHHaeeKKNdfisiD+Zj3OBqXmVb82UaV6IowXcJUzQMTFfmMpC27A9T74lygKnzHYiXf4hRnFpIXjDgQxCzHUPG3e3p+uASWCKhhqMxrCDMgIp0Jvo/8rDofLBZxRIKVWjjMemKyOzG4DW3AHc4rmtq1lPIYFkd10qLi4B3PtfE/ioWHxKfKcJDH4bZDFVVZrauVhKW1AkW38wcdEZXnlNSUxikdUbquobBv6rjp7j74qHg6SCny1UjiUSWuT2H2/5xrmVZVU78+mqVOk/Qx6/hhWTIWymHjxccQnTnA3xPWnzOGizDTypHsJAwOk/7f579I0FWskSPGwYEbEdxj5t5Hms2Yyo8EpQBrOL3KsDtjtH4WcRSvk8cFVIZGRF0k+uK/GzENwaQ+VgBXmsuZKk9jj0VvL5uo9ALYhVrBYWNziOr8yMTugbcnHQL8ZzAtyXqczLuSWNhgP4wzpqegk5J/iEeBL2JOIbPON6PK0cSzKWTqt/wGOcviL8YqivzAw0eoxXtZAT9yRiZvIC6lOPx2bftHnE0tZNK71QGrUxBeVYljHfStyTfzO574oHjl4qPM4Ji4EbHSWVgdJ8/XFhfvda6Fnaop0dhc60ZiDireN4pgrSnRMmsE6Lb/briVSGcS8gohk9SZxUrBGoaKRLeEmMEEehxMCRqmkLJIgNt1tsfQjFdZGzUyry5Q1PJuEc7XHWx7HBoQsmXvpkK3XwNqsVbt6YjyYwr6luPIWx7jLl/LAvSXCMd0PiAOEJ4KmpjAlkVvFcAoLYjcrzKSSlcOdMiNvb3xLJVc5L2VTbquGEFDFCnEio5M3oZppRUuhcgFlXYjV3/ADxvJn2ZRzNor5DqUqA52W/Ww/TD+aoHKZAWNx1ABt+eIOZEEgUFFckEEi+3kb9Dhi5GMmbGo1HT5/mtrU+ZyqYyXAY3uPK9sb5fn9bFTaDPO1mNrSCwHWw2xFyzI1lWy+I7EgEnphVNAQCRPEBvY3wXNqg8FuNOEvh/UZ/V1cM1bHl01Kf4kDqTMD2fSbXTsbG462OJ+X4Wz5bnL5dSxjMvmJeVHU6QI47gG5J2B637+WGHDnxBzk5nQ5ZFmFFQUpmHzNTNTqb+bu1ixPYb+QAxdAzWLIVoIcxmiHMk5fhXQWZuhFze58vXHdfP4qDDjyA8nYDX+L6+Zx/HxK3Jl9vvB/JsnyfhLNmkpqVa8RaFjSQhhGwO9gLg3IvfqPTEVx5ltDLllPm8VDGkyl3eo5gjMrO5MrAWLSMSfqJAVRsD1xMcSfFzhGiqa6mehqq6vSGSO1RSaBFLayrZjsL9dsV9x18UpuLCKOgp3yrKxAnMpgbI7kDbSNrBr2IsbY6OfhjGQBwVPSgdH3JPvBLoo0NiP8q+G9HUiGszVnpVqYdZUOFEZ3O3uCp9wR3wtxT8L4skjXMOE655cxjjab5XTzNMZJOot0RQndjvv6YjOFeOAyUuVZ40jUaqscDqNbBiQo1Mx2QAknY9AMHf/VnCFTV1uQtTZnXPFLKsNXTcuWJwBs5V/D6XtYW8sDjXFlHGhxI73yv3FTLRhfvKKyqr0VktTUIWVlLmQWGliO3nc4tH4VcVT1dVPkyy0NJCpvSQuHM07MT4V6rt3vbY998V5xHw9WZPFS1FcUhWvD1KjUuqxa3iVRZSL7AbHe22IWqrIdMLZaklNyxd21WJPv1xHgyN42bkB/5FK7KKudLQcNcM5rBNV5zlVPKGaNUlLxlRyyYwgYuBYEEdQCbAg7YoHjrJ4sm4praHLaLMaeBSGCV0aq9jc7aPDo8iOuHGTcfZpl2QDLFiiqKeKaQojr4THKhWWMgfUGuG67FRbBDxJwZm1TleV1uZNFl8EsCRQrKhEqgAC1u4O7XJ872OHO6+gqKCa7J739z776jD+OfpG5W7RgiRPCoAFiCDbzw2IsLyPta4BHiOJHNcsOS1BpfmIqomCKW6KRpLLq0kHuL4SoYoknY1sHOTRuWawDefXEp+m5NsEgxGgaSSri5bCLQwYH+m2+LbyLMos6o5aWk5b1TIUmhBJbSe+w327i/UXxV+YTZcrwjLoSIkjtKGa+p/MHz/AEwbfDzhueugfMFEYJIWlMMzBonUhrG2xDgFWBOsXVhtfDcIyZbVCRf/ACJVjPA0NxbPPhxmeXU09VS6ZqeIcx42dVeNbd1JvgQji0WWMBn87YuH4j8P5zX
cPGukK5gYq+omu8aq1LSFysSoQAd7Eld76rgXF8VOkQjYIJAxNr+ajywHmYxgcBRQl+P69zWFSswBOuQ2A74POGEaGCon0ljDYj3tt+ZwNU9PHCTILM/0r/f/AD3wYZVMKChZqkXVAGdT/OQNhjhZ2sTo4E4mRvEFFyqaFJrtOVDG+5F9yTiv5I5JM0OgCyBQPbr/AHwf5xNLKvNqZLtJ4msd/wDgdsQWX5eZD80gBZm1FfIdsF47cFJMHyU9RgBMn4jrsgphFGn1DYhsN2h4hqaGjzGoNMkFfzGpg9UoeTQSGst79RYXG/bE5neWRT0YeVdnFiQb28jgAamkpZgJoj18LgEg+xGLfGXE68iNyHy8mfG3HlqGPC2eT5XnUYq45Iv5ZVbYkdsdefB/iiSpEEUj6WY2AJ3sB0+2OLYy8/y60+uSSKPxMbne97A+mOu/2fOHa2oSCZouQDYcyVTYjyFsS58NZQySnBlLYiHnTQreTBdbl2AAxC5xJOIaqpKtpRCdvbBeuVxx8uMDmOerW2GMzLJvm6Gtp1XRrjKhrd7dcPbGzSEMAZw7xfxW1HLU1lZUGUK1gn3/ANv1xR+d8bz5pO3LCwLfY+ntgi+KtdNTZzV0M45ZhlZGCm6kqxF/xvbFUIVFSTURtJD3CvpPvifxPGDWzS/yvK9MBVh/kmYCUamqpGfyLC34YfcQTCqyqS4R7DoU/wAOKx5hpalHy93I6lSemCCozqQUTx1IKh069be+GZPHZXBBgYvJXJjIIqPsjkhr6SWimYwzKebCx3FxsQe9j/tiehkalpmpai2iVDpcHY/7HAFlrtHNE4O4a4N7ehGC/MJWiy68hPS6nzwvOn1RuB/o/KR1GzQPUAGxuCDcH9MSUMhur7hH8uxwN0c4So8d9DrY2xOZdKYpOXKQ8T/Sw74zIvcLE3tJKoaSJOby46kdyBZgPcYiK3MoZYiQo0rbwkeJfY98Tq5TFVSMFqmpd9WpX02979cMM34KepjU0ua0yyMpJaUaAw+3fHsWMN7xWbIVvUGjVq1wQWsbg9cOFzSO3jFj5DDKs4OzyhXUhpauMk2aCpRr/a4OB+WoqaeRop0MciGzKwsRiz+GDdGQfxTL2JM0bpRzsdKz1VuY9xstug273xpm/Elfmzn56aWVVa6RhrIp8x6jDFlaOMRIAFYgykm246DGLopo5QbSGTuOg38/PDlxry5nZnLDGquKVNTJmtSavOKiWSaQqryOdbMALA372AGPZ0jjlMNJKZItdlkcabgdCfLDM6dKqthGr6iLb4d86CSnPLSzK2q7G+1+nvg2Ju4PeolGzsrLH9VrKF6Li6OAcvoc24btlFLI1dBOEq4rcxpPD9Qby6+Hp274plp2U6Y2LR27ixJ98T9JxlnOX5JFlmXZk9LSJHINMQ0XLNdmJG5Y7bnsAMNxMq8g10QRrvfz7Q8bhDc6FoOHf35TNFNHDHUwobNUwxM8SnYsI5CQNrjVY9O2Ky4c+F2UcRRzxU2dkSRGRGQpGV56E3BKn6WABVgTcXGAfiXjut4mzHLZYL5eaCnEKSRSEMT1Zi3Xc3NvXCGS8TS5PmBqMlaSBeXaXxbSDvf0vig5seNAnEvxHZOz9rjWdWayNSxKb4fZnwfNlvEVTlSfu+nQSRRfN8uZZGGzbXsF269z3wcR5PHxfSUVb8xRZlWVWuRqaStL3I8TA6RsFuL3B3NtthgVPxgoqzhmmoa7LY8wqebrqKeUMqOq/QisDfc7/bFU0PFdbkOc1dfwyf3aZQ6CNm5hRGa+m562sN8D+C2P0nFqQLF7B70dQxkGNrUy3fjbkdDT5PR1pphBmotA0sNmVmUf6bjbTYbqw28+oxRyxPydbKxRG0NJbw3sbC/n6YIzX5txGJjnc9RUzmRX0iMnchVBt2J8IB77YjHo62i1UOZmakjY89ElU2JFwGt62IvhWZ1ZrQUBQ/4+YrIOR5feROrlm7JYLuRax/8AWCrgfis8OZiJKqgWrpZotPMB0SRqpO4PQi5NwQb2FrHEdwtwhmHGOZtQ5QLSXDPLI2lFuwBJP39cOs6yWoyCuFBm0DrVGNXKF7XU/SdvPc22xgL4wHA/3gpyXYnVnDMUnEf73pKavizBYm0SCjqlSWMW2YEq1gb9R+Ixzrxhk9Nl3E1dT5fFJHHA+6TVKVDX6El0AB9rAjvviQ4Kq6Wgelkyesqcvr2IM7RzEMgB8SjYeEjbfETxVEaHPquSBzIs15dTS62a43LeVzfY9rYF/SHhrixLXE9XdA3/AL1OyjNy5N7zSGoWiBkkQuyi58h5ffC8de9W6i7FWsQL9yf8/DDKMCuURC4hcgyN3XbpiXNDFQx3iN3J2Nr6f/e/5Y4r0O+50EsjXU3r5TJGRsUTdj3Jt09gLY84KeCuplhkYcxR9N7fcHDStqUFPOFuI0j2Pc7bnBL8Nfh9V5/BrykCpIANozZ1PtfGBfwzMZvxBHVdk87ppSO8JH1FgbeuBWbIi9QUgD6na1h1OOpuG/2ceJc6p1+fqIcvgNiecSW/AYJ4v2ZY8uUClrBWVbGxlaOyRg9SB3PvheMZkF1HM+F9MZy38Ovhfm/FfFkWWZKjuSCsz6bpEvck9MfSnhDgaj4aySgy2mhT/tkUFgOpA64YfDn4e5L8OsnWHL6cCoYDnTMLvIfU4NWzSKKLVcA47GNKFt3ONlyA2qdR7S5RChDMBcY9qKWAxyDRsRY4Ypm7vuNxhRczjkVkIF/XFQZSKkvEzhX9rf4U/IStxLlVNqpmOmqEaj+H5G3ljjGSikmBYsybeW2PsD8QuH6HivIazLa1dcc8ZUjuPUHHzJ424CrOEeIK2glQskMh0MF+pL7HEjMMPUqVP4gb9pXdDlt33Jc9yB0GCKtyf5rLZpIxay2G2H9FkcjyJyhpU/zYMhl0ceXmnILeEgkixxBl8i2BEtw+MqqR95TeVIXDJKPoO4wW1qF8p5YYsbALvexHa/cYguQcuzaRGuv8TY4no5UnjaG+iRWJVSbA37fjYj74LMbIImeOKUqYOfLSQkNGbrf6T54e08zwWYn+GTuDuMLzU8ms69Ora6qb79satTGnp7uytqJ0r7G2NLX3NCcbqP8A5/VArta4cxna+2BGpq5JmbWxdb/zbj/1hStzCWVBBFFOkSuSSIzdjhkRb6Emt6xHbFWHFw2ZzPIzeoaHUVQ6bX91Hlh6cuGZBZpFZ2A03AJvbDKCIyG3LkF+gMZwZZPT1UFEFjpZmBYm/Lb/AGwxjXUnRbg7keRSZ5HXVhmgMVCmuSJ25bON9lHfy++IOvkhNY8lLEaaJyf4Y30/jhxTyOG5AvGOrINunS/nhGuax5Kr9LG58zglBDbiCw4hQIxjfRqsNX9OJGaNYHZFUgmw0gdD/h/PHtNlbTSQCVTBH1dwLg9T9sZUxVFTM/OZQNdyQ4YIOwuDbGlgT3ANRqCXflpqSykm/bD2fkk8tTKuw8ZUHfysOuF4J6Xnli3Mcm8khXZVv0H5YZTc6WtlejewUkIwIU2vtjOzBqJTQtGjKSS2m/Swt7YUgj
kSByqXD6fw/wCf0wusExXmVepkUb38u3640iP/AHHhbwKuxI/O3ljb1PAyYoqRiIZEGvSx1AGxBAuF+9x+GPaTM5aLKamlkpqC9U/gmliu8dtrA22BF7jCGX5ukiVNPCjxgRG7Bt3Hc9OuFJKnLTGkU1VM3KAPLeMkBvsdziWm5UwhqShsSxuDeM8tioKKLNmppcwp4TTLNzCGeINqUNtvpIFvL0xE8XZtw3VDMp0parNM2nsiymUQ01IqiyhE+qQgd2IBJO2ACTkxzLJQGd5S2p5HVUA8gFF8MopnFTKrG3MurX9cXHNlfs6rqv1/OEXteMsD4bcUHhuurIo5IlppirytUPo0kD+Ve7EkDbtgn4r4tyTP+G8wo6OsJzPNKyOWqIhFykf0h5D/ACgAKqLt0v3JqZaCqpJkWSP/AOnrUqQdXljww1CuEglEkhvoiiIJ1ew6nA48ziwrWCK+PzHzDTIVHEiTNBWvk0zmFxPEHBkWwUNYGyg72tfBLxjWUlZQ0k2XyQSwyAC9tMkLbFkPYg32NuxwN5Nk1SKXmVcDrDIEcSEXG5ZQT5XKkWwUfuPn5eUVdW2wtcHGMSqkEbPvKsbkH4gvRiRahrX5fVvxwQwykH5Z42blLzJCdhv1Jv1xBR1L5fO8FRb+GbaCtt+1/PG81TUVkLlEM6FyQq7eHbv5g32xzXQsZ1EcKJtmTER1Ka95mGy+IgE9Pf8Azti8/wBm2FIs1eAvFJUkDl8y4VWB6MQb9O2Kmy/Jyqs00eqZU1bi9m6D774L+CKit4fzaKpp5VgAO1+wHXYYMAAVJmbkxM+j/D1BVxyyy1dVDJSMqCKJFtosN++98EvzlKPAoIANt8VXwDxhDnOSU80chYldwW322wSV1cxppGiJVyNj1GKlIURFFjuPuJMyamV2pWD2Xp2OOT/iR+0ZmNJmVRlGRx6ZYm0SGRTdWHUDzxetNUZnU0Uq5mEvuA6mxI87HHz0+Nctfl3xCzsUjySL8wShTsPbAJ+I24b/AIa6l25V+1JxRlo5dUqVA8z5YJOGP2sqmbNIqfPKcRxOQOYLGzE/pjij941aAGV3WTqwbqDjMvzKqqatEQsXZwAAL3OH+mB1JhmJ1PqrlfGdPnekxyglwCApv1wDcb/D6g4mzlJqyG9jZnXZgCOuGfwmy05Tw3lkeZShqhIVJA9r4swVKCYM4FnFr2viWua00rs4zayhsy+BNBQwyTQPJNEASAm1vwxUfEeWR5OJgqU4iS4IeVlcY6H+LnGi8IZRJMt2kkukIQkAtbv5D3xwRxZm2aZ3nM02bVMkxZtQF7IvoB0woYFJ6hHyHHvH1QP3lmUsnKVIwNgGvf1OEJE5cgLkhFsSD2t/xj2hUwROuo30jbrthCod9Cgm6s3uDgTtqEeul33PaGqMlYWa/UsB6/8AGJfK6dVro5aldcKEN9QGre4HtgfjcQSKY7Eqbn2xJPmaU0bNaQIviKKvXBV9WoBakMM/3hlg12oY21dfEOv44QGYUIksaJGXzLA/3xXTcSKSfHKP/iMJniCI3u8p91BxT6bSLmss+LMcu13+TUALpADH+2H8PEcVMnLKVK9wFk2+22Kh/wCokHR3t6xjC8XFrRrpBDAdNUQOM9Nx0IPNfvH+YZIyxHM6SmlbLlblpU6LK7X39geuG+S8FZ/xI2ZT5Dl5rUoKZ6yskDKBBEDuzaiB3Fh1PQXxJyfEqpbhaLKTEvKhj5LEWXWDftbbtgbkz/MMtppocsqZIKXMI1E6L9Mqq11BHezC/vhGE+UwYMADevfUldcXIcTY9/zjyopOTRxUEX+pI5acld7i2x7dDiEqRGlUaVHvEDpNhYXv+fvhegqqp41ijlkLytdvGdh54apTvzJJBG0rtIUiFv5u5PtcYtReN2YkCzFZI4YEaLVqDsNR67joMNtMAcrpN/U3Axvl1O1VVNFsbAlhfrbG1ZFHDM4RyXvZ1Itb1Hpgxo1c9uSlKyVVYx0gUCRaX17EIOuw7k/nbCVevMWWaJYoYyAEUAiy9gDh9kCL+78wo46KmmlrhHEJqgEGIXJ1ISRbtiCLzUcskKyuCvgIB2I329sJX6nIHtMqLZY6CoXQPEwZRbubHHk1AkSKt2kntqa5AAB3F/W3X3xmV1PJrYnpwUkVvC6WBBOHtRlpiq43NRrLESWINmF9x74I6fuejX5ZCgc6YnBF7MCD67Y8nplp21GRC7Hbfp/lxiUocupSbTF6mAy6pFhFyo3FhbvvhtXZa0tTK8dPPBSR3IMiaCQB6nrt3/8AREOLIngNXN6SloGyKqnrqqRq5njSiiA2J1eJmPYAbAe+JvIKCWCUVjRpVJEQrTIbcpidiD1Hv64EeeHQsxXl2ssYbsOgwYcA5zBTVYoqyJOdNIBENBZp3c2AZidKKBudrnFGDGS+2qNSj3LQyKvgmjaN4YZ+W4ZlkAKK4JIYj+axJNul8TUOXioqpHp4AYDa5sb6u5388a0s2TZFWsJJflpJAjI6prurXsQvcEruOu4IwX5TQTV1VJPT2RGPhChgB9juMW5ywwccrAtft9paa9oJ13A1LmlnnpIXbSQHC774iKf4crlk5lQysbWRm8QQemL4gypBEOZBd7bkLhvVZcArFFIHqMcdkhqZTMmQmIu0q9V6g7N6Yi5XOTyQT1i3ia3gA6+QA/wb+eLHzyn5EUjEaQGv6HALnTrMAropfSBta1uoA32PXCqhXLQ4F+J1LRPHbTECLNGHFz7+X546ByTiSDM6JJo5BIjC4W++PnyrPRzSVVOWjlBsFPQn1Hpi3OCPicctpk5sphlRFLsx2cn0xoBmhhOqMxroIlcq13I3A7DHJXxp4DHENfNmlDMtPIoO2/iH+XxZZ48mqqfXylm1j61bt3wL5nn89XcJQh1udmPuL4DnxMbxDCjOUhw/X1GYmmjQyPe2s3CnfrfFzfCj4S1UOcR1/ERjMVO11j6rfzJxKwpWUdass2WQhVO2hemDDK+L4aILzEMGx1bWv/bDWz2KEQuEKbMvbK62mgpFvDdR4SV6rhZ8zijktFJrv0xV1HxjREbVa2ZRffYj1wG8cfGGlyRWgyqRZ66RbJvcLbrfAqxMIyK/aNzyp4mraXL8oilPyJYzyJ5kdCv8wt3HTHOnJmp3+XnVopNyqm+x7fY2sfscWvlXED1czTSuX5jFmDG5Vj6424tycZvBDXUMIlrKRSbAfUnW9u5Hl6+mDD/6SIHH/UJVscpnki0qxJtqtsCMSstNzKN9brIVJKKvUH/P1xE5YHaARqoDk7g9rdsPpamDLJnNQ6cw78rzHb74nYEtQlqsAttIeSRYGLcxVK7EOCL+mGeY561TSGlpoVhiP1MpN2Hl6DGmZ14zOZZI1aOICyoxvbzOGBTTfHQTEKBYbnLyZTZCnUae+MwqYxc48Mdhh8nieM2xtpx5oONnpIaRUUTukICqbuwBv5WJ/P8AHDnLcvTMKeeFP9WNdaEvsV7g+WJMQz0lAlJChJ1F5YxfRqIsAfMgX3PmcRdHDTySSPG5ieM+IHZbdP16+2JA9gxdT
xgsDRxU51TKB08x39cTtDl8k2UZjVxLHJFGUeSPVdi2oA7DtpJ3xF5DTPWZnBDF4+Y/L3WwYnp9vPD2qH7pzTNoKB2kpl1AKN0uW03NutiSAcLYm+IO/wDueTWzI+uZMnzOpfLomeGOdkhkfup3AtYX8JHXGAPX+J6FZJQtzyAwbqBuBe/UYmaSh/eeXKWV2Xotze0iDb8VNvtiMmj05hNFSAorI4BBvY/UB/8A5GMXIG0exPGriFJC1cTSU4lb5hty4uyW/Uf741pcqkiHNrAIyR4Vbr+HngphpTNnNDmckcdPRUa0ytBFtsE3F7WPiFzfcg4jMyRMxqaifkTjLqV0jM0y8vQWubkepBt12tjVy311NKkDUYZdLT0dY0rxLMgQqkTAr4r9fP8A5xs+YiU2dUWOTcaVuoI26H2wybMVa5SNVX+UsLk4yeWOJWBieOeNwWRhbSD/AIMM4m7MDdR+a+qjhVaVpYgw+tjb7LbYYXGfZkhjR66KWPdpFAVrqBcggjcWHfucRvMSaBkjUrKUJVgbXxGwowinYXuUAFx5sP8AbHlxg9iFQk1mtEKSqVJW0RSeKKyhV3/v0xvlmTtJVK3PSSzXPYj0OGkdRU19NFRursYFeV9TXLL9XfyF/wAcF+TUFNWT08tRI0ck0aBgADuo03b3sDc9b4IWomqplycEcPR18UEtfKlgAqknVsBt+A264vDhzhtKOl1xAyixYd29h/tirOCaSmoaRaahZpFRdK3W+/f88XbkFYlTFCkZ5aMS8hB2VFO+48zt7XwIYXLB1JjKqSCpmSnVCtQYBO8bqQUUmw1eRuCLeh8sTicMwPcTHXfcC22G3C4DXrJlMdTmzGdAw6RIAI1+yENbzc4OI4o441ubsRho+oXPAyvsz+GOX5rGQ0KKT1JF8AOffs6pmClKSvjhkLE3eG/bobdR3/vjohYRYHpjcU4fT4fGdul8AVuMBnGOefsw8WwUrnKq/La17EojM0ZHoCQfT/fFQ8Q8G8S8J/w+JcqloVmawkB1AkDzW4x9Oo6EOANCkqLb/niD4m4NyzPaCamrqKGeKQeJWS4+3374HhPanzWpOIMxy6kiipaluWCSV87HBBQfEKoEPLqYkdiNmtY9wcEXxj+Ddd8Ocw+boC1Xw/NJpilI8UBJJCP+gbv74qGrqYKe5Y6SviAv1uf/AFgeFwOREsSs+IkdNHTl0Mjuo1gHpvt+mBfNePedIGgjTRcgC/UeeAKfMXldg58RuVX+kedu2GjVS38XhA2G2PDAs96pk/VcS5hU8xmlaIOCAUHTEJqYsWd3Zw2rUd8e1dSirHHqFkjXbyv4j+uGfzYjcbgjzwxVFagFoW5HXGKULexbp74NKLM2TljVsQSfXFYZdWjX2FsGLZhEYqE8sQ3pwupe5BIJPqdr4Uy0RGK8LqSponq+ZFQU7VDHxStCL38ztcnErmfDOTZvTqaykinnYeGSwDA+wttivYs2kE4CX1J9LX3wT5PmtQtSj1R+kXsTjL49Q9NJ3JvhPwqYAtZla627CZz+pw+zL9n3hmqpXNGk+XykXVxKXX7g9sTeVV8UzozbKwve+C6gzCLMSkSozwRnc/1n19P19up+oZnpj7TnyX9m7PkpppMsqqGq135buWXw+gtsT5nt088VVnHD9bwzU1FHnFMaeuBKaG7Du336D7474nlkp1UUlpXkGy/0juxHkP8A1gT4syfhjiPKZaTiDkS7HQzgGdG/qBHiuTv5YIZANExTYx7ThlIucQFXUxNgB3Plh1yqKm8FQjzyD6ikwVQfIbG/vg/zfg6HKFlCSip5bNFHJflkx36tsSDuV2xCDKTb+BJQhO1oyfzKk40ZA/UTxMRWpqedSM6m1SjtBo3DnXpb8wfyxFTqgrJCkPIeZmdoybgDe9z5Wufvg3paWPLKNIHUu1AgeGQbbMPH03JuL/hgXo4HkmquZAUeWPxqyf6MeoaRb1IufQDzxEuUEtXQiysn+Hctgy2eN45W5xjEES6fqdyzM48gEAG/cnDJ8tApnEDLHzHEvIlkvLIm5Um3Ym7W2tcYditNJMjq55gdWF9vFa1revf74bxU8sq1ldEhKUsgildhcXZWZRfy/XEasxYuT3X7/WbftUe0WWVVHl3PiljvAqvHHf6piwvYdyRrHsoxHcU00VLn+YyjQJWqC8axpqAUtck+pLfbE3lNZT0sEEtU0o5r+M2BJN7bfiR9zjJ6J80zvI45lEazyrDOgA3B/mvba46+2MXIy5bPRB/f6frCH8shsvXTLUkoXSWYEkttZQybj8MP86zKbMJqKlm0RZfFQtBGkaD6dFjq82+mx6jSMOVrknllpVXUsRZYGX6YmW+wPmdye1ziOqcomeGolhsAUJCH+c3uRYd9hv5Y0Nb22oOwKEGcmyT5hq0tMqmlTx37ox0Ej/8AK/2w7GTQlSJpA8pDRSsFJ+k7EeewG+N6N5Y+ZNLpUSqIplk2IvffpfridysVWbZvTxrCJ6daf5Z1WTaMgncDtivJmcEmYADqCP7lmhpJM0pH5tNHVclEt412BJYdhuov03xJ8PUcdVnqU9YiwxUcySSurFlazi6A+p2Hlvg1zOAFKihy8PeprDHTwqttRBjMkt/K0dtzbfEXm1BBR1lTDBB8pTSuAwVW1am3Zwx6+3rgU8lsqfMZxrYgnRQ8yuhMkXJkaKpiqFTY6xq7edmUfbBHwtQSyVSyjUdYBNttiMP85pGNRFItOweOZQzWA1SGQ80jzDDQ33OHXDsAj5REckoVF1NYqRceW/44oTMG3PVRljcPVD0zaZiSrbEarsDvcj12/MYtPhLOY54P3VA38avqzCG86cXMjA+dtQ9CwxVyBKajSaEKJ7gJbfmKVIK2Pfp+uN+Fc2ENQMwjM38BmKBWAeO56LfysL9+uOa2cLkBH5Q7nXIqlizDJipt4pwFA2tyjt+QwRUVck0tr6iCQ3pY9Mc2T/FDMZGy96IrJPG8gZXshdSLHSN97bj/AGwdcD/EKkzNZ1RpUqElYtSuul1Gq2osdmufLHSTyUZqEMES9hKEiDqoexANz0F9yfbD6FNEms9+mA/Ls6kMsSlokD/yfUxHqen5YIIq0QOsLXZSp5XqB/L9v0xVY7hQhj6Gwxq4Gr+JutvET2GIOfiWkp6mii56K9Vuo1f0glv7Yk4qhKvY3RbizPsCfbrj3IGaIyz3g3K+JstqKLNYIqmnnj0vG4uCDjiD4zfsn55w7WVGacFpJmWStduTs0tMd77dWX1G+O95kgiURwyM7rYMBsBhJqeQrcSBbnzxoInitz49T5S+WVbU+ZI8btuQwKn0N/fGR5Oama8ahxDZjHe/MH9xb7239cfVPi/4QcLcf0zw8RZXT1EpuVnVAkqE9w43xyd8Tf2SuIcgllquCZBm1Gjao4GOmdFvf2a3pv6Yw2BYi6qcw5hlNJLUED+GJkVob7grpG1/MdO/TDROHY5IwIYrVCG1m8Sub7WB2Ht3xa78PuJ4qDO6KahmDcuOSpjaFRf/AMSLncXtYX388I1GRyZaoWRKaojY
6Y5YLENcgXDX2t5G1r4h9cqtHR+Z6gRKtrMrkpauRKcEEtZYlNio8z/thzWUuZR5PEwictAsbDYjZrhgPOx0H74sA5QaymqRVRFKn6YGiQnx3UAn/wDId7/nh1X0c+WRwUqFaimhY81wPqQDQFB876yT22wtvKUATOMqzLqyqSQLUxyqQbXCkgj0O4wXUmZPBOgk2JuulhpO3n+OJaGgppal4a6jIVTpSVAHAktcM3c9vL8sO4OGqSnqIlkjZjLcLZdXLN76yb23PfyHrj2TyEA5QhYEIIc1pKem5aNKZIjyyAwuz+Snpbt+OCXI8zqqCserqlRZbmKIIA4UhRsBfT0Nrm57AYC0y2nEXOSapDvKUQJCfEbE6t9iDY9+xw9NXJRROsRKIl1c8hjrY3FtJtpboPIi2OefIJNmGOR2Ye1FfFUzSSy1s0RkVbvVFkE1xYHSvgXcEC5/PfDkZhDQUU/ycdOEmTcrHaRmAv8AVfv0ucV4lZUyyt84GSjmjaaN5CbmMGwG24G2xPcemEc5zmasjgmjjEsZ7wKbkWINztvsb9bYMZSD8zbjavyk1iiFbVCmJCpU7kFb73279fQdcRf/AExQKSJpYxIPqAHQ/fEhHWVdJHyqWROWqoC8imQqpuRa246AX6Y3k4RmzcJXmlWtNSuvXTwB1G5FjqNwdunrhwz3u6nlUt/KLg+MlzKaspXosrnp6WrXQjsmlQTcjSD9I7j0wxz/AIdrslq54pFjhnmQPUTmoVjJa2wAJIvbE09fKuQ1lUizq0VXJGp5jNrNwtgD5Db7YYcR0MkOXx5hWmCGNSlMXkBDmTRrew72Jt/xiHC5Z6J+PzP7MzgNwffLIaqMSTpK8YkvzlOmzAGxF9iL7W7DE/8Au1n4aighj5UE8pnnlTud1Uj7H88RmQZXW8YZxHT0Ec1YinxDoFQW3/zzxaL8OU8OUzVDV4OYQvyZKPQOUynbl7eSi9vM4LyXGPiCep4Y72JVVRk1UYQFheKClQa5GNkjW5Iv31E2wvlxlaphqRMY9E0cjvb6yCb2+9tsFlTwlmSsKKspqqEy2cRousvHYm4UdSoP23xlCj5jmVNk9PSGCGGVY6Zzp1Pbrq9ASSSemB9XkNwPSrUF6Vpsxr3ZKSOJKaN+UFjA5Sk/UT3JJN2a5N8N4IeRUygQ+JCHLhSXYEXAB9Cb2xalIuX5RPBSZNVJWR08jJWVkwULUSdCVW3QXsL998CCs8U8s1BFGVjLXMviuBtb79DjF8hMjsB0IRQfeRAy6PNz8tJTGeo8Ca1UhiHJ0+4vf2vgmThGlynN+ZQ5ikEdBCPAgLfPzvcOVPYLcbnbb1xIZZLLSQVGZmKNJ/BBTMtv4Tb3YeekE2HmcMmyzMFlhp85jlpI6iBXhDNpsBfRYdbXte474E5SxIQ66nqUdCQdSJJZatFmenhS8OtBcu4Ooj2vt+GHNHQZlUxsmb1NTOUkVRDpud7eYsPffD2mpUjmjoaiWOllmlI8bEsXsdz/AJ3xM0tbFV5MsDLIJxWu8gQ2ugQDqe5N/wAcYuT0koTyiRE2SfOV8FJTH5pIFMShTZnOrdRbuT0tvt5Y0i4ezGGctV060yK76YpgxNw9iTpO3Qge2JqmrpuFad8xptD1cqnlS2voU7Fh31C1gT74ypq6OBqNCJpZZ7LKUlLhh1AFrE9dybXvgUzMjfSL/e4QQGJVOXaMleSvpkhgjm061Zhp9/I39e+IekrLs9cJCBK+sEbB2Nz06Xtf8sEdFBJPR1YqCtRQSl4mjqSbuL37bXG3TuTfGlXleYvAI6GKKKAHwMyhNLbatK7A38I87C3QYU2YE1cZ6SsCRGNTLV5hT04SFQ8swkRkUtoAtp3/AA74d5BR1+VcQzVc1dJT0usgM9wpF9Qsb7sSBt5A4jKsV9NHetrH5crIWCLfcf026eWJLimWKuiybJcpgano8upRIZ2BlZ53sXkA6bXVQT0tthuF3Dd1FKt3csnKPjNJl1RK+Y86osjKgSDlqTqG4a9unbB8fi7S1KU7RySBgrTqrjQ1wuwBO2/vjm3NaaGhpkFRVaklcrZrlrC3QACxJufLriMNROmk2YqlyoUllO91P/3bdsWfxuUip467nRdb8RYv+rMiqqh1YJBWDlr9MGqO67eY39b4v/hKvqM7ghzKrVoo2UGCnYjUgP8AO/8A5ny/lHrfHz0jz98onXNZEWeKWdFZCN0jDguF7WsAMd5fC3PqXP8AIojlDwi6hwuoXIPTYYu8bIWJZupi7uWlTJBOrc5tKi1/M4kqWDL5hZ5CW6DUbYFnqRTqEaxlLWtfpfuThnnbTjLQYZhE8Uga/Zh/xi4sRsRoUNqEmZRx0stqWQsAPfEWuaxzLyqsWLXCv6+RxlJOjwQr4uxBO9x74bVtLA9PJGfAxOpST3wQY1YnqHRgN8R8gouJsqnoc0gilDKeTI6Asjdt8ct51wPTwcPxVrzkyxc1K/kR2AlQ+HbuCNj3uL747DzPk1dANbqNQKG53Djtjk74gcRQcL8S5nTzLHJk+eRGCsibqkw25g9QbH2xz/MQOoYdzVAB+qV/liyZTmMlVTzsxpFdwRcFlKGxvvtvsfNcQ5pKrMZictmgeBUfUXl0HSuonb1PX1OE1euy+orMonkCSLOqxv1uo36dwbdB54m6jMKXLq6VuZGVEoDpTIqsRe4QXHiv39fTHLChf3+/vN4Kx3oRLKcrrKbK5Mxr6eEUYqOSYFNpXlZSQBb+XbcjYb480SPK6LC6tqKusYAC2Nupsdr/AEnb9ceTZppWRrtHEZmlanZlsj2AUXPQ29euJKlqoP3FTyZdQTvU1U3N54RmZgD9NztYEfnibIxF/JqDQGhIekp+fkwjnDlZ5hqbUUKAE9N9jex8t7d9pOjkEU0kNXUTSUnMeN9D3WDUSLA+Qv5bb43aFMwzCdp0hEQIenpYruEa++oAEAb+vltbEzRZFFSU9HIuURZZDJOxmWdgjEE2Lbk9idtx5WwRyqAdw61owSqxPSzSZb+8JIXlh/iMb6Y4grAKvoQL7dz6Y0ocqkeOqaf+LRhQIJYZizFdNxc/rtc7dcEGYSwQ5j8zCqVqRFhDfYGxuAOlwLm19t8DiZj8r84s8bzQtLoVF8JUk7Hb1t1vhi5OY/OKJ1ub5ZzZamZlRkQwXgWWLSXIvqIFrg2DWHQ4VeVXctROTCbWsCtj3H1DC9DmMlHYTwgSRUspXnG5BYaVHck7nGsELZgrzxypGhYhVvptb0GNOQ9rMHKqUwgyrhT5fhxamsHOqZXqKhYzZi7uT1HYbqB7Yn6r4ZUkWS5VScXpHXGeKSplKklllLam027WFjjKd52gizCSBWSJ2cjfpawAGFOJM3rMnpqSmqJpHqIKIsjC/hVtyCepNz0x8kvk5+RYaN/ruVpxC2RG+TcB01TRR/I8SHJMoijEMc2gxMi672J6nWNX4XwU5dwpwrDSUdOnEgy+oSVpIYoaZJlle9/ET4vc27YA+G4q/Nqalr86Somy9WKllawMSjqR/nfFl5Fw5wjlnzmYzZ7U/Ps
6yxCONTHbSbRnvptfYWxS2XI2Uoxuv6/5hKB9qEGsxo3OX8xnpp9VK0S1aFmKgEixHXp4rd9sCyVVHmdNrpIY0zSZuTUmyoV8NuYoHmLXtg7+VmyyCOnyiSDMDLO7J4NPhKE6WVvW2BrKaSpolzCqz6jo6aokj0RaogGFtwBbtjEyn0mLH8oLX7yHybgOCrZ8ry9poswgmEkk8Sq8UY7atRHucNOG/hnmXEccs9dmVPDBzGjREGp5LHrc2Ci+998FlFTf9M0T11VqqayvGtoCwW1rncX74hs0g4hqsyhkrlZKVo1cQUzeEI38h09PfDcXkvR47+5P9oHFANiP8+p6H4d5DFRZclKM2VedBXq4laPS1ix1bG5PYDFXCXN85RqrMJJ66sqTy2ma+y3ve/QE3Fh5A4OqqjoJ80mnqKQVJWMKgncWjIJJ8Ppt164Tr81y/MZpIjrZABGYXumr1su3UYrxZyo4qLJ2TAZS2hGua5NBmNLlWfVcQNTSQR0uZR06aQZUI0yepKgAkdTiQz+dOH6mtjyeApGJOc3hAL6twL9gAd/XDqmElDQ1B5I59VOgRQQLIfpBviL4odaeCngkliaSZwwLi6NHewF8e5M7izqeKMLgvm0lfmNFFSvqVVZqglUIF26W89u2MDylYaOkm5PzQ1zyFbWUCwt+H44XqqrNZZBTx/xam4ZUDBEsvTr0HTfEjR5bmk1HT1VTBBzJpN9EwZlUA9B33xY1KBcEKe/eDVfmlRTPFl1LJaOlbQgBsLE3LW7knfB2jR1mXUbiZVZBdUMtyttmv79sMMp4co5XqazNzXUlDLeOl5oUPI3crYDa/wDfHrvw9SF6GiyqZ6OCUtNMJW3YD+ZupOJ/IplKqN+8duqkfmNFlA0tVZnOjqNIGknmWPi0gdD64msg4aOZZfFJltbUCKna7iRLlkN7Am/QeeIWegqsudYYqeb5GqQvC918R6nf2ODfgGpi+QqcwrLKkcZLQ9OZqGnRbyAN7emFZc/DCrTFVe6kHxjldJz0yiimjjqZCJ2Ldb/0j0tcbYEsxzFIIqbJ6CNqSkTVGum7O5JuXv5E3xYVbkEmV1Gb1ahKvMKaU0lC0ig8xWXUSPKy/ngSy166CB5HCSrIOVBeK7JqJGkdyb32xR4+UuOJ9v6wSDyK9SCoIPmXaiLR7AaAEDBl7i3T1xY3wp4jzDgLM9UKyR5Xp5bMx0G99hp7Dytgcl4Sk4TGX1FSsd6tGmpY2Ymyf1E2FhfoMRuUUdRmKVFRmdTLCslQDPJqJ0xqb7eV72vipctN31ACFTTdzsHKfi9kme5mKNKuIKjAzIjX+nfY+eJ+Hi+j4kq/3UivPqWQq0Q2bbwEH1J+2OPctoaaizPmZZDG+XzORA7XB5gFrOfX88S/DPHlVwzmMVbS1TfvakqSHRt45IyALW9xis+YbqNQ8D9Qnd75O2U5fTRR+JkhVWBNze2K/wCMc4zClg0Q5fUVR1i5iW+nfe/lgX4Z+PK8Z5vlVPmdM+Q18yuqFZeZHI4W42IvvY7YMOLfinluT8N11PKIo80K2jjjN+ZY3LDvsATY46ePKmRCQdRZYg3K0zfP80onzETUk0sZPMWEGzggdTfobjFC/ECop+KP+4mppIqxAeZA+m5262NxjbPvjNmmZ8RZ1mWT1aihgVpY9dw0gvY29cCNf8QafiCGkety/XM5PNmkusiN5hh9Vu6nEvqAqR3U0nkLMc8L1Jg4lhqaipjmERKmEoBzG03PuQo7d8RmXLHmdfJDIzwATu0JnJtG5B0bHtfr5bYQoc5FRVVEaxRQpCBpqUh2Jbpde+HcDvR1QFa1PBA6iTU5Jsv/AIDf1G2OYUJJqYhBbczNIJZqtaWWop2lE7FwEsWPToB/l8Eaxx02XJlU+Yy12yycqCP/APjPY+PX0I6AgeeG1VlzU+YrDlEK1uZwohfXGV5rdRubqTY2I/HBPwZlMFQKkcTmThlw0kmkRgtI17AdbAb/AJYRk6HI/v8ArGrhHLuRJCUZpC6/K0VPUqsyQlVeRLX/ANTr59P6sFtHNS5hRmmXhv5ZGjdY6moqJZWLBbqLkgbkgdO+J9Mky+OnjrVv+6Av8NaikA5jWsNIUnYkX33v6YlKXIJsxp6isi5Jn06VVpdK6AQQLdb9N8crL5AApfaN9JhdQAreGhUU8Pzaq9BLrM0ity5EIG1xazi426e+AfM4K+Bpar5W+WxMCJyPFcbeIdh6jFomerOZ5hleZxvRoQqs0iWZjc7ggeEXO2G7w0FNVvQz1BzZ4hr5ESFF6bXc/V5Ww/DlyX9fUz0wRd1BaLh6mz7KzmdcZaqSKFdIQAc0qSNOnqdRt+Hrh5lfwyz3Nad3ossSSKGQwghiouvUAel7fbDpKmleCppqyOmVA6vy73UXNyDa3fBbPxbmEKwDKo3NO0QYBQoCnfbc4audgeI/6nsaIRvuDuQU1VPkeWZipesiWNedGn/1G+/QdMNs3zuRJKw5sYnqlVZVppfpUntfvfsMZjMcbEoNj/8AR/rNbSmNqfid86yaaCMPl8rqYoafT4bjrYeWGfDeXRnNYKWrV4wq8xWWU6mdbHceXXGYzBvSF1UV3/eK3QMIL1UuY0tZmJUxvVvbSdJTey/liTg+ezQV8TvDmlXEB8pDsoS17WY9T/tjMZiTEeQAPz/WGPqYA/vUZT8JZXDlYzriynkNaCqwrHMwMjG+zdh1wnxZmElFkyVGWTLQwpGIwIRcB7bX8xfGYzFD5GV8fya/WMYAKTBFcozSvgFdPJCalHASRTcTNYEhl9cOHkpkYGqy+qyyVZeZzob6S+5swtYX3xmMx1Vclqia4NqHeTimf5aj4lyDLqo16FmrpNTlR0Sw6AgdcDX7oqqvjGVaChSuy6BwiQyr/DAA6b9jjMZiA5W9Pl7xjfWgv7zfijhlaXLubDSRR17FkanhkusMZJNlJ/vjzgjhBjPFNl9WaymXS0qVAs6eXuL4zGYNcznAT8xNU0nuKuDnqqiPMc+zKmENECsMaMRoDdBftYfrgAzemgeHTkb1FdBTyjm06OCCtt2tYXH54zGY6XjjkpuVAAEmN6iup0y6GVowk1LZZFJIV1Gwtf07YmMgy6aeCqhgpngiiKVABa/MLHcX8wMZjMS+UePjMR7f5ElstowuzrKqWjqZq9YJ5GRkVVEhsZmILMQfww4g4eo63N+QJUpap5A2keLl2BJYKereuMxmJcBLGyZ0sONWAJ+3+YU51wcnEmVUlbn0vPkaFYKZYxoLqgIG3bpitM0ymnyujWnpagQaWK1aOba7HZScZjMMGRzkIvowcwBe4NVL0op4aCCSSIJL8wUiFgOguL+mFpqahyzPMxiy+inzCVYw8U5ewt2A9d8ZjMdGzo/EkChluMJ69oqmKeeqenrBIJUSEm6ODYW9cS2dZpnfEkbpV0Eysw0tPpAYkdx733xmMxev/wAxBCgxrnXC+T5XFllJPG8VdJTlJDGngJI/m62wCVPCkMM0EAkkid59KaASBrPU3xmMwjA7EA32P8zc+NQdSczz5HJ8zq
MtSLRBAgHq3QE/554lcoypaeLLXzNZGpJQJIJta2sbkAAjqdtsZjMLZ2UKfv8A4iF25uK09WtJVVJy6okp6oOCzvclr9w98L0WcTVeZQTZvUc1I1IZFjuJOw1eZxmMxh+oG44MSKhZBnFVmdQrSCOFmAQoq6AI1H1Eeew3w/TPaSmq0aprJ68Q2VRGdIsOl/Q4zGYkPj4yLjj/ACXGPEnEnMeesmkkFRUOrTKF8N/5bdyMC1Jn9PT1RhqYFkmLMTpYgE2uLnzxmMw1MCVJ8h3ca0WWVudZhM9yqz7qWSyrvsD5YfVuWvFVzIKtp9LWJjnsAfLGYzE4as5HxPJjVhZn/9k='}}],
'toolu_MSNO_m5kQEOqpQ9OxvAJMw': {'width': 300, 'height': 200}}
list(L(chat.hist).attrgot('tool_calls').filter())[[{'index': 1,
'function': {'arguments': '{"fn": "samples/puppy.jpg"}',
'name': 'view_img'},
'id': 'toolu_ldGAUULLTR2_xXPV_QunDg',
'type': 'function'}],
[{'index': 1,
'function': {'arguments': '{"image_content": "$`toolu_ldGAUULLTR2_xXPV_QunDg`"}',
'name': 'get_img_size'},
'id': 'toolu_MSNO_m5kQEOqpQ9OxvAJMw',
'type': 'function'}]]
Async
AsyncChat
If you want to use LiteLLM in a webapp you probably want to use its async function acompletion. To make that easier we implement an async counterpart, AsyncChat. It follows the same implementation as Chat as closely as possible:
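Before looking at the implementation, here is a minimal usage sketch side by side with a raw acompletion call (assumed to be run in an async context such as a Jupyter cell; the model name is just an example):

```python
# Raw LiteLLM async call
r = await acompletion(model='claude-sonnet-4-5',
                      messages=[{'role': 'user', 'content': 'Hey there!'}])

# The same request through AsyncChat, which also tracks history, tools, caching, etc.
chat = AsyncChat('claude-sonnet-4-5')
r = await chat('Hey there!')
print(contents(r).content)
```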
Testing the scenarios where the tool call is not in the schemas, or the schemas argument is missing:
result = await _alite_call_func(fake_tc, [toolsc], globals())
test_eq(result['content'], "Tool not defined in tool_schemas: hallucinated_tool")result = await _alite_call_func(fake_tc, None, globals())
test_eq(result['content'], "Tool not defined in tool_schemas: hallucinated_tool")astream_with_complete
def astream_with_complete(
agen, postproc:function=noop
):
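Based on the name and signature, a helper like this plausibly re-yields the stream's chunks while collecting them, then yields the assembled complete response with postproc applied. A hedged sketch, not the library's actual implementation (litellm.stream_chunk_builder is a real LiteLLM utility for combining chunks):

```python
async def astream_with_complete_sketch(agen, postproc=noop):
    "Hedged sketch: re-yield stream chunks, then yield the completed response."
    chunks = []
    async for chunk in agen:
        chunks.append(chunk)
        yield chunk
    # stream_chunk_builder combines streamed chunks into a single ModelResponse
    yield postproc(litellm.stream_chunk_builder(chunks))
```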
AsyncChat
def AsyncChat(
model:str, # LiteLLM compatible model name
sp:str='', # System prompt
temp:int=0, # Temperature
search:bool=False, # Search (l,m,h), if model supports it
tools:list=None, # Add tools
hist:list=None, # Chat history
ns:Optional=None, # Custom namespace for tool calling
cache:bool=False, # Anthropic prompt caching
cache_idxs:list=[-1], # Anthropic cache breakpoint idxs, use `0` for sys prompt if provided
ttl:NoneType=None, # Anthropic prompt caching ttl
api_base:NoneType=None, # API base URL for custom providers
api_key:NoneType=None, # API key for custom providers
extra_headers:NoneType=None, # Extra HTTP headers for custom providers
tc_refs:bool=False, # Enable tool call result references
):
LiteLLM chat client.
AsyncChat.__call__
def __call__(
msg:NoneType=None, # Message str, or list of multiple message parts
prefill:NoneType=None, # Prefill AI response if model supports it
temp:NoneType=None, # Override temp set on chat initialization
think:NoneType=None, # Thinking (l,m,h)
search:NoneType=None, # Override search set on chat initialization (l,m,h)
stream:bool=False, # Stream results
max_steps:int=2, # Maximum number of tool calls
final_prompt:dict={'role': 'user', 'content': 'You have used all your tool calls for this turn. Please summarize your findings. If you did not complete your goal, tell the user what further work is needed. You may use tools again on the next user message.'}, # Final prompt when tool calls have run out
return_all:bool=False, # Returns all intermediate ModelResponses if not streaming and has tool calls
step:int=1, tool_choice:NoneType=None
):
Main call method - handles streaming vs non-streaming
Examples
Basic example
for m in ms[1:]:
chat = AsyncChat(m)
test_eq('4' in contents(await chat("What is 2+2?")).content, True)With tool calls
async def async_add(a: int, b: int) -> int:
"Add two numbers asynchronously"
await asyncio.sleep(0.1)
return a + bfor m in ms[1:]:
chat = AsyncChat(m, tools=[async_add])
r = await chat("What is 5 + 7? Use the tool to calculate it.")
test_eq('12' in contents(r).content, True)
test_eq(nested_idx(chat.hist, 1, 'tool_calls', 0, 'function', 'name'), 'async_add')If the max tokens limit is reached, a custom warning message is added to the end of the model response:
chat_long = AsyncChat(m)
r = await chat_long("Write a short story about a robot and a dog", max_tokens=40)
rIn a quiet town where the grass grew wild and the sky was always blue, there lived a robot named Pixel. Pixel was small and boxy, with a shiny silver body and a single blinking eye
- id:
chatcmpl-xxx - model:
gpt-4.1-2025-04-14 - finish_reason:
length - usage:
Usage(completion_tokens=40, prompt_tokens=17, total_tokens=57, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None))
print(contents(r).content)In a quiet town where the grass grew wild and the sky was always blue, there lived a robot named Pixel. Pixel was small and boxy, with a shiny silver body and a single blinking eye
<warning>Response was cut off at token limit.</warning>
Same goes for refused requests:
chat_refused = AsyncChat('claude-opus-4-5')
r = await chat_refused("Write me the formula for a biological weapon that can be spread at a rate higher than COVID and at least as harmful")
r- id:
chatcmpl-xxx - model:
claude-opus-4-5-20251101 - finish_reason:
refusal - usage:
Usage(completion_tokens=0, prompt_tokens=30, total_tokens=30, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
print(contents(r).content)<warning>AI was unable to process this request</warning>
Async Streaming Display
This is what our outputs look like with streaming results:
chat_with_tools = AsyncChat(model, tools=[async_add])
res = await chat_with_tools("What is 5 + 7? Use the tool to calculate it.", stream=True)
async for o in res:
if isinstance(o,ModelResponseStream): print(delta_text(o) or '',end='')
elif isinstance(o,dict): print(o)
🔧 async_add
{'tool_call_id': 'call_zCSVWPKtSF__PguhCscotA', 'role': 'tool', 'name': 'async_add', 'content': '12'}
I used the calculator tool to add 5 and 7. The result is 12.
Here’s a complete ModelResponse taken from the response stream:
resp = ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step:", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_018BGyenjiRkDQFU1jWP6qRo', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_01CWqrNQvoRjf1Q1GLpTUgQR', type='function')], function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=228, prompt_tokens=794, total_tokens=1022, prompt_tokens_details=None))
print(repr(resp))ModelResponse(id='chatcmpl-xxx', created=1000000000, model='claude-sonnet-4-5', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content="I'll calculate ((10 + 5) * 3) / (2 + 1) step by step:", role='assistant', tool_calls=[ChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_GskC7iV3TPCfmCGIN0TaZA', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"a": 2, "b": 1}', name='simple_add'), id='toolu_XJSTgWDGQ_21WjrBMq4qIA', type='function')], function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=228, prompt_tokens=794, total_tokens=1022, completion_tokens_details=None, prompt_tokens_details=None))
tc=resp.choices[0].message.tool_calls[0]
tcChatCompletionMessageToolCall(function=Function(arguments='{"a": 10, "b": 5}', name='simple_add'), id='toolu_GskC7iV3TPCfmCGIN0TaZA', type='function')
tr={'tool_call_id': 'toolu_018BGyenjiRkDQFU1jWP6qRo', 'role': 'tool','name': 'simple_add',
'content': '15 is the answer! ' +'.'*2000}mk_tr_details
def mk_tr_details(
tr, tc, mx:int=2000
):
Make a collapsible `<details>` block for a tool call as JSON.
mk_tr_details(tr,tc,mx=300)'\n\n<details class=\'tool-usage-details\'>\n<summary>simple_add(a=10, b=5)</summary>\n\n```json\n{\n "id": "toolu_018BGyenjiRkDQFU1jWP6qRo",\n "call": {\n "function": "simple_add",\n "arguments": {\n "a": "10",\n "b": "5"\n }\n },\n "result": "15 is the answer! .....<TRUNCATED>"\n}\n```\n\n</details>\n\n'
fmt_usage
def fmt_usage(
u
):
Format usage stats with cache hit rate as lead metric.
ex_usg = AttrDict(
completion_tokens=203,
prompt_tokens=25139,
total_tokens=25342,
completion_tokens_details=AttrDict(reasoning_tokens=35),
prompt_tokens_details=AttrDict(cached_tokens=24299, cache_creation_tokens=79),
cache_creation_input_tokens=79,
cache_read_input_tokens=24299
)
fmt_usage(ex_usg)'Cache hit: 96.7% | Tokens: total=25,342 input=25,139 (+24,299 cached, 79 new) output=203 (reasoning 35)'
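The lead metric appears to be the share of prompt tokens served from cache; for the example above that is the 24,299 cached tokens out of 25,139 prompt tokens (a quick sanity check, not the library's exact code):

```python
# 24,299 of the 25,139 prompt tokens were read from cache
print(f"{24_299 / 25_139:.1%}")  # -> 96.7%
```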
StreamFormatter
def StreamFormatter(
include_usage:bool=False, mx:int=2000, debug:bool=False
):
Initialize self. See help(type(self)) for accurate signature.
stream_msg = ModelResponseStream([StreamingChoices(delta=Delta(content="Hello world!"))])
StreamFormatter().format_item(stream_msg)'Hello world!'
reasoning_msg = ModelResponseStream([StreamingChoices(delta=Delta(reasoning_content="thinking..."))])
StreamFormatter().format_item(reasoning_msg)'🧠'
AsyncStreamFormatter
def AsyncStreamFormatter(
include_usage:bool=False, mx:int=2000, debug:bool=False
):
Initialize self. See help(type(self)) for accurate signature.
mock_tool_call = ChatCompletionMessageToolCall(
id="toolu_123abc456def", type="function",
function=Function( name="simple_add", arguments='{"a": 5, "b": 3}' )
)
mock_response = ModelResponse()
mock_response.choices = [type('Choice', (), {
'message': type('Message', (), {
'tool_calls': [mock_tool_call]
})()
})()]
mock_tool_result = {
'tool_call_id': mock_tool_call.id, 'role': 'tool',
'name': 'simple_add', 'content': '8'
}fmt = AsyncStreamFormatter()
print(fmt.format_item(mock_response))
print('---')
print(fmt.format_item(mock_tool_result))
---
<details class='tool-usage-details'>
<summary>simple_add(a=5, b=3)</summary>
```json
{
"id": "toolu_GtCm8ia9SXSTtWSwi_BMPg",
"call": {
"function": "simple_add",
"arguments": {
"a": "5",
"b": "3"
}
},
"result": "8"
}
```
</details>
In Jupyter it’s nice to use this StreamFormatter in combination with the Markdown display:
display_stream
def display_stream(
rs
):
Use IPython.display to markdown display the response stream.
Generated images can also be displayed while streaming (not shown here to conserve file size):
# rs = completion(model='gemini/gemini-2.5-flash-image', stream=True, messages=[{'role':'user','content':'Draw a simple sketch of a dog'}])
# fmt = display_stream(rs)adisplay_stream
def adisplay_stream(
rs
):
Use IPython.display to markdown display the response stream.
Streaming examples
Now we can demonstrate AsyncChat with stream=True!
Tool call
chat = Chat(model, tools=[simple_add])
res = chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
fmt = display_stream(res)simple_add(b=7, a=5)
{
"id": "call_o2vLAWfpQ2OQXAU7Jf2svg",
"call": {
"function": "simple_add",
"arguments": {
"b": "7",
"a": "5"
}
},
"result": "12"
}I used the simple_add tool to calculate the sum of 5 and 7. The result is 12.
chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 7? Use the tool to calculate it.", stream=True)
fmt = await adisplay_stream(res)async_add(a=5, b=7)
{
"id": "call_40WscurDQgSt587zftLsLw",
"call": {
"function": "async_add",
"arguments": {
"a": "5",
"b": "7"
}
},
"result": "12"
}I used the calculator tool to add 5 and 7. The result is 12.
chat = AsyncChat(model, tools=[async_add])
res = await chat("What is 5 + 3? Use the tool to calculate it.", stream=True)
fmt = await adisplay_stream(res)async_add(b=3, a=5)
{
"id": "call_3_xGI6uJRgWik5s7f6dNig",
"call": {
"function": "async_add",
"arguments": {
"b": "3",
"a": "5"
}
},
"result": "8"
}I used the tool to calculate 5 + 3. The result is 8.
Thinking tool call
chat = AsyncChat(model)
res = await chat("Briefly, what's the most efficient way to sort a list of 1000 random integers?", think='l',stream=True)
_ = await adisplay_stream(res)🧠🧠🧠🧠
Use the built-in sort function provided by your programming language.
For a list of only 1000 integers, the standard library’s sort function (like Python’s list.sort() or Java’s Arrays.sort()) is the most efficient choice. These are highly optimized, often using a hybrid algorithm like Timsort or Introsort that combines the speed of Quicksort with the stability and worst-case guarantees of other sorts.
Writing your own sort would be slower and less reliable.
Multiple tool calls
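The history inspected below comes from a multi-step streaming turn over ((10 + 5) * 3) / (2 + 1) using the simple_add, multiply and divide tools. The originating cell is not shown here, but a hedged sketch of the kind of call that produces it (and the fmt reused further down) looks like:

```python
# Hedged sketch: the actual prompt and max_steps used may differ.
chat = AsyncChat(model, tools=[simple_add, multiply, divide])
res = await chat("Calculate ((10 + 5) * 3) / (2 + 1) using the tools, in parallel where possible.",
                 stream=True, max_steps=4)
fmt = await adisplay_stream(res)
```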
chat.hist[1]Message(content="Of course, let's break this down.\n\nFirst, we will evaluate the two sums in the expression, `10 + 5` and `2 + 1`, in parallel.", role='assistant', tool_calls=[{'function': {'arguments': '{"a": 10, "b": 5}', 'name': 'simple_add'}, 'id': 'call_UXlzUOYlRAO_PfC79mrBaA', 'type': 'function'}, {'function': {'arguments': '{"b": 1, "a": 2}', 'name': 'simple_add'}, 'id': 'call_hm1wAgkUQq2SYx250XA0zg', 'type': 'function'}], function_call=None, provider_specific_fields=None)
chat.hist[2]{'tool_call_id': 'call_UXlzUOYlRAO_PfC79mrBaA',
'role': 'tool',
'name': 'simple_add',
'content': '15'}
chat.hist[3]{'tool_call_id': 'call_hm1wAgkUQq2SYx250XA0zg',
'role': 'tool',
'name': 'simple_add',
'content': '3'}
chat.hist[4]Message(content='Now that we have the results for the two sums, we can proceed with the next step. We have calculated that `10 + 5 = 15` and `2 + 1 = 3`. The expression is now `15 * 3 / 3`. We will now perform the multiplication and division in parallel.', role='assistant', tool_calls=[{'function': {'arguments': '{"a": 15, "b": 3}', 'name': 'multiply'}, 'id': 'call_tNTfzLfXScyLXKQ2lTwXjg', 'type': 'function'}, {'function': {'arguments': '{"a": 15, "b": 3}', 'name': 'divide'}, 'id': 'call_zMk_9xD8SX24bjDvzpsucA', 'type': 'function'}], function_call=None, provider_specific_fields=None)
chat.hist[5]{'tool_call_id': 'call_tNTfzLfXScyLXKQ2lTwXjg',
'role': 'tool',
'name': 'multiply',
'content': '45'}
Now we demonstrate that we can load the formatted output back into a new Chat object:
chat5 = Chat(model,hist=fmt2hist(fmt.outp),tools=[simple_add, multiply, divide])
chat5('what did we just do?')We just broke down the calculation of the expression ((10 + 5) * 3) / (2 + 1) into a series of steps that could be performed in parallel using the available tools.
Here’s a summary of the steps we took:
- First Parallel Calculation: We solved the two expressions inside the parentheses simultaneously.
  - simple_add(a=10, b=5) which resulted in 15.
  - simple_add(a=2, b=1) which resulted in 3.
- Second Parallel Calculation: After the first step, the expression was effectively (15 * 3) / 3. We then performed the next set of calculations in parallel:
  - We calculated the numerator: multiply(a=15, b=3) which resulted in 45.
  - We also performed a separate division: divide(a=15, b=3) which resulted in 5.
The final step, which we haven’t done yet, is to take the result of the numerator (45) and divide it by the result of the denominator (3) to get the final answer.
- id:
chatcmpl-xxx - model:
gemini-2.5-pro - finish_reason:
stop - usage:
Usage(completion_tokens=1106, prompt_tokens=590, total_tokens=1696, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=None, audio_tokens=None, reasoning_tokens=842, rejected_prediction_tokens=None, text_tokens=264, image_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=590, image_tokens=None))
Search
chat_stream_tools = AsyncChat(model, search='l')
res = await chat_stream_tools("Search the weather in NYC", stream=True)
_=await adisplay_stream(res)Mostly Sunny Skies in New York City with a Chilly Forecast Ahead
New York, NY - New Yorkers are experiencing a mostly sunny day with a current temperature of 24°F (-4°C), which feels more like 17°F (-8°C) due to wind chill. The humidity is currently at 53%, and there is a very low chance of snow.
The forecast for the rest of the day indicates a possibility of light snow, with temperatures ranging from a low of 20°F (-7°C) to a high of 28°F (-2°C). Skies will become partly cloudy tonight.
Looking ahead, Tuesday is expected to be mostly cloudy with temperatures between 25°F (-4°C) and 31°F (-1°C). Wednesday will see a mix of clouds and sun, with a notable warm-up as temperatures are forecast to reach a high of 42°F (6°C).
The latter half of the week promises more dynamic weather. Thursday brings a chance of rain showers at night with a high of 44°F (7°C). Rain is likely to continue into Friday, which is expected to be the warmest day of the week with a high of 52°F (11°C).
The weekend will see a return to colder temperatures. Saturday is forecast to be cloudy with a high of 35°F (2°C) and a chance of rain and snow at night. Sunday will be partly cloudy with a high of 44°F (7°C).
Let’s mock pause_turn with async completion and streaming:
async def mk_pause_web_search_stream():
"""Async generator that mimics a streaming pause_turn response"""
srv_tc = mk_tc("web_search", json.dumps({"query": "Solveit Answer.AI"}),
tcid=random_tool_id().replace('toolu_', 'srvtoolu_'))
# Yield content chunk
yield ModelResponseStream([StreamingChoices(
delta=Delta(content="Let me search for that information:", role='assistant')
)])
# Yield tool call chunk
yield ModelResponseStream([StreamingChoices(
delta=Delta(tool_calls=[srv_tc])
)])
# Final chunk with pause_turn finish reason
yield ModelResponseStream([StreamingChoices(
finish_reason="pause_turn",
delta=Delta()
)])orig_acompletion = acompletion
call_count = 0
async def patched_acompletion(*args, **kwargs):
global call_count
call_count += 1
print(f"Mock Async Call {call_count}")
await asyncio.sleep(1)
if call_count < 3: return mk_pause_web_search_stream()
return await orig_acompletion(*args, **kwargs)
acompletion = patched_acompletion
achat_pause = AsyncChat('claude-sonnet-4-5', search='l')
call_count = 0
res = await achat_pause("Search and tell me about Solveit", stream=True)
fmt = await adisplay_stream(res)
print(f"\nTotal calls: {call_count}")
acompletion = orig_acompletionLet me search for that information:Let me search for that information:Based on my search results, I can tell you about Solveit - it’s a course and platform created by Answer.AI:
What is Solveit?
* Solveit is a course in how to solve problems (including coding, writing, sysadmin, and research) using fast short iterations, and also provides a platform that makes this approach easier and more effective.
Background
* The approach is based on George Polya’s influential 1945 book “How to Solve It,” which shares philosophies on education focusing on active learning, heuristic thinking, and careful questioning, outlining a four-step problem-solving framework.
The Creators
* Jeremy Howard and Eric Ries (of Lean Startup fame) created Answer.AI, the AI research lab behind Solveit.
* fast.ai joined Answer.AI, marking a new phase in making AI accessible to everyone.
The Problem It Solves
* It’s easier than ever to get started with AI, but also easier than ever to let AI steer you into a situation where you’re overwhelmed by code you don’t understand. The platform addresses “AI fatigue” by teaching users to work with AI through short iterations rather than generating large amounts of code at once.
What’s Included
* The offering includes a course showing how to solve coding problems, write prose, build web apps, use AI models, and much more, using the solveit method; a software platform designed to make the approach more accessible and effective; and a thriving and supportive community.
* Across 10 lessons, the course covers classic data structures and algorithms, advanced programming methods, web programming using FastHTML, writing, reading academic papers, and building startups.
Pricing
* The course fee includes access to the Solveit platform (with no quotas) for 30 days, after which users can subscribe for $10/month to maintain access.
Total calls: 3
Tool Call Referencing
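With tc_refs=True, each tool result is stored in chat.tc_res keyed by its tool call id, and the model can reference an earlier result in a later call's arguments with a placeholder of the form $`toolu_…`, which is resolved from chat.tc_res when the tool runs. The get_person and greet_person tools are defined earlier in the notebook; a hedged sketch consistent with the results shown below:

```python
# Hypothetical definitions, consistent with the outputs below;
# the real tools are defined earlier and may differ.
def get_person() -> dict:
    "Return a person record"
    return {'name': 'Alice', 'age': 30}

def greet_person(person: dict) -> str:
    "Greet a person using their record"
    return f"Hello {person['name']}, you are {person['age']} years old!"
```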
achat = AsyncChat('claude-sonnet-4-5', tools=[get_person, greet_person], tc_refs=True)
await achat("First call get_person, then pass the result to greet_person", max_steps=3)Perfect! I successfully completed both steps:
- Retrieved person data: I called get_person which returned information about Alice, who is 30 years old.
- Greeted the person: I then passed Alice’s data to greet_person, which generated the greeting: “Hello Alice, you are 30 years old!”
The task has been completed successfully. The person’s data was retrieved and used to create a personalized greeting.
- id:
chatcmpl-xxx - model:
claude-sonnet-4-5-20250929 - finish_reason:
stop - usage:
Usage(completion_tokens=103, prompt_tokens=1083, total_tokens=1186, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=0, text_tokens=None, image_tokens=None, cache_creation_tokens=0, cache_creation_token_details=CacheCreationTokenDetails(ephemeral_5m_input_tokens=0, ephemeral_1h_input_tokens=0)), cache_creation_input_tokens=0, cache_read_input_tokens=0)
achat.tc_res{'toolu_mSo0oQhPSBmFLarTJsAJhA': {'name': 'Alice', 'age': 30},
'toolu_UuvaxaFFSJmh_MFWng30Ww': 'Hello Alice, you are 30 years old!'}
list(L(achat.hist).attrgot('tool_calls').filter())[[{'index': 1,
'function': {'arguments': '{}', 'name': 'get_person'},
'id': 'toolu_mSo0oQhPSBmFLarTJsAJhA',
'type': 'function'}],
[{'index': 1,
'function': {'arguments': '{"person": "$`toolu_mSo0oQhPSBmFLarTJsAJhA`"}',
'name': 'greet_person'},
'id': 'toolu_UuvaxaFFSJmh_MFWng30Ww',
'type': 'function'}]]