
Tags: wreed4/litellm

v1.72.0.rc1

fixes: expose flag to disable token counter (BerriAI#11344)

* fixes: expose flag to disable token counter

* fix: add disable_token_counter
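
A minimal sketch of how such a flag can short-circuit token counting; the flag name `disable_token_counter` comes from the commits above, while the settings object and the 4-chars-per-token fallback here are hypothetical, not litellm's implementation:

```python
# Illustrative only: a disable flag short-circuits token counting on the
# hot path. `ProxySettings` and the fallback heuristic are hypothetical.
from dataclasses import dataclass


@dataclass
class ProxySettings:
    disable_token_counter: bool = False  # flag named in the commit


def count_tokens(settings: ProxySettings, text: str) -> int:
    if settings.disable_token_counter:
        return 0  # skip the (potentially expensive) tokenizer entirely
    return max(1, len(text) // 4)  # crude stand-in for a real tokenizer


print(count_tokens(ProxySettings(disable_token_counter=True), "hello world"))  # -> 0
```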

v1.72.0.dev3

Merge in - Gemini streaming - thinking content parsing - return in `reasoning_content` (BerriAI#11298)

* fix(base_routing_strategy.py): compress increments to redis - reduces write ops

* fix(base_routing_strategy.py): make get and reset in memory keys atomic

* fix(base_routing_strategy.py): don't reset keys - causes discrepancy on subsequent requests to the instance

* fix(parallel_request_limiter.py): retrieve values of previous slots from cache

gives more accurate rate limiting with a sliding window

* fix: fix test

* fix: fix linting error

* fix(gemini/): fix streaming handler for function calling

Closes BerriAI#11294

* fix: fix linting error

* test: update test

* fix(vertex_and_google_ai_studio_gemini.py): return none on skipped chunk

* fix(streaming_handler.py): skip none chunks on async streaming
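
The Gemini streaming fixes in this list route thinking output into `reasoning_content` and skip chunks that carry nothing. A rough sketch of that idea, assuming the `thought`-flagged part shape of the public Gemini API; the function and chunk dict are illustrative, not litellm's types:

```python
# Sketch: route Gemini "thought" parts to reasoning_content and return
# None for empty chunks so the async streaming loop can skip them.
from typing import Optional


def parse_gemini_chunk(parts: list[dict]) -> Optional[dict]:
    reasoning, content = [], []
    for part in parts:
        text = part.get("text")
        if text is None:
            continue
        (reasoning if part.get("thought") else content).append(text)
    if not reasoning and not content:
        return None  # skipped by the caller, as in the streaming-handler fix
    return {
        "reasoning_content": "".join(reasoning) or None,
        "content": "".join(content) or None,
    }


print(parse_gemini_chunk([{"text": "Let me think...", "thought": True},
                          {"text": "The answer is 42."}]))
```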

v1.72.0.dev2

fix(proxy_server.py): mount __next__ at / and /litellm

allows the UI to work when the proxy is mounted at the root path
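
A sketch of the double mount, assuming FastAPI's `StaticFiles`; the asset directory path is hypothetical. The point is that the dashboard's Next.js assets resolve whether the proxy is reached at `/` or under `/litellm`:

```python
# Sketch: mount the UI's static assets at both prefixes so asset URLs
# resolve regardless of where the proxy itself is mounted.
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()
# hypothetical path; check_dir=False keeps this sketch importable
ui_assets = StaticFiles(directory="ui/litellm-dashboard/out/__next__",
                        check_dir=False)

app.mount("/litellm/__next__", ui_assets)  # proxy served under /litellm
app.mount("/__next__", ui_assets)          # proxy mounted at root
```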

v1.72.0.dev1

fix: use handle exception on proxy

v1.72.0.rc

Rate Limiting: Check all slots on redis, Reduce number of cache writes (BerriAI#11299)

* fix(base_routing_strategy.py): compress increments to redis - reduces write ops (see the batching sketch after this list)

* fix(base_routing_strategy.py): make get and reset in memory keys atomic

* fix(base_routing_strategy.py): don't reset keys - causes discrepancy on subsequent requests to the instance

* fix(parallel_request_limiter.py): retrieve values of previous slots from cache

gives more accurate rate limiting with a sliding window

* fix: fix test

* fix: fix linting error
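
The "compress increments" fix above batches counter updates instead of issuing a Redis write per request. A rough sketch of that pattern, using redis-py with illustrative names (not litellm's `base_routing_strategy.py`):

```python
# Sketch: accumulate deltas in memory on the hot path, then flush them
# in a single Redis pipeline - many INCRBYs collapse into one round trip.
import redis


class BatchedCounter:
    def __init__(self, client: redis.Redis):
        self.client = client
        self.pending: dict[str, int] = {}

    def increment(self, key: str, amount: int = 1) -> None:
        # in-memory only: no network I/O per request
        self.pending[key] = self.pending.get(key, 0) + amount

    def flush(self) -> None:
        # periodic sync: one pipelined write for all accumulated deltas
        if not self.pending:
            return
        pipe = self.client.pipeline()
        for key, delta in self.pending.items():
            pipe.incrby(key, delta)
        pipe.execute()
        self.pending.clear()
```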

v1.72.0-nightly

feat: Add audio parameter support to Gemini TTS models (BerriAI#11287)

* feat: Add Gemini TTS audio parameter support

- Add is_model_gemini_audio_model() method to detect TTS models
- Include 'audio' parameter in supported params for TTS models
- Map OpenAI audio parameter to Gemini speechConfig format (see the sketch after this list)
- Add _extract_audio_response_from_parts() method to transform audio output to OpenAI format

* updated unit-test to use pcm16

* - created a TypedDict for speechConfig
- simplified Gemini TTS model detection
- moved the gemini_tts test to test_litellm

* simplified is_model_gemini_audio_model further
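
A rough sketch of the OpenAI-audio-to-speechConfig mapping, assuming the `voiceConfig`/`prebuiltVoiceConfig` shape of the public Gemini TTS REST API; the function name is illustrative, not litellm's:

```python
# Sketch: translate the OpenAI chat-completions `audio` parameter into
# a Gemini speechConfig dict. Only the voice needs mapping; Gemini TTS
# emits raw PCM, so "pcm16" requires no extra conversion here.
def map_openai_audio_to_speech_config(audio: dict) -> dict:
    speech_config: dict = {}
    voice = audio.get("voice")
    if voice:
        speech_config["voiceConfig"] = {
            "prebuiltVoiceConfig": {"voiceName": voice}
        }
    return speech_config


print(map_openai_audio_to_speech_config({"voice": "Kore", "format": "pcm16"}))
```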

v1.71.3-rc

feat(parallel_request_limiter_v2.py): add sliding window logic (BerriAI#11283)

* feat(parallel_request_limiter_v2.py): add sliding window logic

allows rate limiting to work across minute boundaries (see the sketch after this commit list)

* fix(parallel_request_limiter_v2.py): decrement usage on rate limit error

* fix(base_routing_strategy.py): fix merge from redis - preserve in-memory cache values during the gap between the push to redis and the read back from redis

* fix(base_routing_strategy.py): catch the delta change during redis sync

ensures values are kept in sync

* fix(parallel_request_limiter_v2.py): update tpm tracking to use slot key logic

* fix: fix linting error

* test: update testing

* test: update tests

* test: skip on rate limit or internal server errors

* test: use pytest fixture instead

* test: bump mistral model
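
A minimal sketch of the sliding-window idea: requests are counted in fixed time slots, and the previous slot is weighted by how much of it still overlaps the trailing window, so limits hold across minute boundaries. The slot math and names are illustrative, not litellm's implementation:

```python
# Sketch: estimate requests in the trailing one-minute window by blending
# the previous slot (weighted by remaining overlap) with the current slot.
import time

SLOT_SECONDS = 60


def sliding_window_count(slot_counts: dict[int, int], now: float) -> float:
    current_slot = int(now // SLOT_SECONDS)
    elapsed_fraction = (now % SLOT_SECONDS) / SLOT_SECONDS
    previous = slot_counts.get(current_slot - 1, 0)
    current = slot_counts.get(current_slot, 0)
    # weight the previous slot by how much of it the window still covers
    return previous * (1 - elapsed_fraction) + current


# 30s into the current slot: half of the previous slot's 100 requests
# still fall inside the window -> 50 + 10 = 60
slot = int(time.time() // SLOT_SECONDS)
demo_now = slot * SLOT_SECONDS + 30
print(sliding_window_count({slot - 1: 100, slot: 10}, demo_now))  # 60.0
```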

v1.71.3-nightly

ui new build

v1.71.2.dev5

test: update testing

v1.71.2.dev4

fix: bedrock guard param persistence