Tags: wreed4/litellm
fixes: expose flag to disable token counter (BerriAI#11344)
  * fixes: expose flag to disable token counter
  * fix add disable_token_counter
Merge in - Gemini streaming - thinking content parsing - return in `reasoning_content` (BerriAI#11298)
  * fix(base_routing_strategy.py): compress increments to redis - reduces write ops
  * fix(base_routing_strategy.py): make get and reset in memory keys atomic
  * fix(base_routing_strategy.py): don't reset keys - causes discrepancy on subsequent requests to instance
  * fix(parallel_request_limiter.py): retrieve values of previous slots from cache
    more accurate rate limiting with sliding window
  * fix: fix test
  * fix: fix linting error
  * fix(gemini/): fix streaming handler for function calling
    Closes BerriAI#11294
  * fix: fix linting error
  * test: update test
  * fix(vertex_and_google_ai_studio_gemini.py): return none on skipped chunk
  * fix(streaming_handler.py): skip none chunks on async streaming
fix(proxy_server.py): mount __next__ at / and /litellm
  allows it to work when proxy is mounted on root
Rate Limiting: Check all slots on redis, Reduce number of cache writes (BerriAI#11299)
  * fix(base_routing_strategy.py): compress increments to redis - reduces write ops
  * fix(base_routing_strategy.py): make get and reset in memory keys atomic
  * fix(base_routing_strategy.py): don't reset keys - causes discrepancy on subsequent requests to instance
  * fix(parallel_request_limiter.py): retrieve values of previous slots from cache
    more accurate rate limiting with sliding window
  * fix: fix test
  * fix: fix linting error
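
The "compress increments" and "make get and reset atomic" items above can be illustrated with a minimal sketch. It is not the code from base_routing_strategy.py: `IncrementBuffer`, `add`, and `drain` are hypothetical names, and the Redis write itself is left out. The idea shown is only that repeated increments to the same key are merged in memory, so each key needs one write per sync.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class IncrementBuffer:
    """Hypothetical helper: merges repeated increments to the same key."""

    def __init__(self) -> None:
        self._pending: Dict[str, float] = defaultdict(float)

    def add(self, key: str, amount: float) -> None:
        # Accumulate locally instead of issuing one cache write per call.
        self._pending[key] += amount

    def drain(self) -> List[Tuple[str, float]]:
        # Read and reset in a single swap so no increment is lost between
        # the two steps. This assumes single-threaded use; concurrent
        # callers would need a lock around add() and drain().
        pending, self._pending = self._pending, defaultdict(float)
        return sorted(pending.items())


buffer = IncrementBuffer()
for key, amount in [("model-a:tpm", 10), ("model-a:tpm", 5), ("model-b:rpm", 1)]:
    buffer.add(key, amount)

# Three increments collapse into two pending writes, one per key.
writes = buffer.drain()
```

Each pair in `writes` would then be sent to the shared cache as a single increment, which is where the reduction in write operations would come from.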
feat: Add audio parameter support to gemini tts models (BerriAI#11287)
  * feat: Add Gemini TTS audio parameter support
    - Add is_model_gemini_audio_model() method to detect TTS models
    - Include 'audio' parameter in supported params for TTS models
    - Map OpenAI audio parameter to Gemini speechConfig format
    - Add _extract_audio_response_from_parts() method to transform audio output to openai format
  * updated unit-test to use pcm16
  * - created TypedDict for speechconfig
    - simplified gemini tts model detection
    - moved gemini_tts test to test_litellm
  * simplified is_model_gemini_audio_model more
feat(parallel_request_limiter_v2.py): add sliding window logic (BerriAI#11283)
  * feat(parallel_request_limiter_v2.py): add sliding window logic
    allows rate limiting to work across minutes
  * fix(parallel_request_limiter_v2.py): decrement usage on rate limit error
  * fix(base_routing_strategy.py): fix merge from redis - preserve values in in-memory cache during gap b/w push to redis and read from redis
  * fix(base_routing_strategy.py): catch the delta change during redis sync
    ensures values are kept in sync
  * fix(parallel_request_limiter_v2.py): update tpm tracking to use slot key logic
  * fix: fix linting error
  * test: update testing
  * test: update tests
  * test: skip on rate limit or internal server errors
  * test: use pytest fixture instead
  * test: bump mistral model
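
To show why sliding window logic lets a limit hold across minute boundaries, here is a minimal in-memory sketch. It is not the implementation in parallel_request_limiter_v2.py: the class name, the 10-second slot size, and the in-process dictionary (standing in for the shared cache) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict


class SlidingWindowLimiter:
    """Hypothetical limiter that sums per-slot counters over a moving window."""

    def __init__(self, limit: int, window_seconds: int = 60, slot_seconds: int = 10) -> None:
        self.limit = limit
        self.slot_seconds = slot_seconds
        self.slots_per_window = window_seconds // slot_seconds
        # Stand-in for a shared cache: one counter per time slot.
        # Old slots are never evicted here; a real cache would use a TTL.
        self.counts: Dict[int, int] = defaultdict(int)

    def _usage(self, current_slot: int) -> int:
        # Add up the current slot and the previous slots inside the window.
        first = current_slot - self.slots_per_window + 1
        return sum(self.counts.get(s, 0) for s in range(first, current_slot + 1))

    def allow(self, now: float) -> bool:
        slot = int(now // self.slot_seconds)
        if self._usage(slot) >= self.limit:
            return False
        self.counts[slot] += 1
        return True


limiter = SlidingWindowLimiter(limit=2)
# Timestamps in seconds: two requests late in minute 0, one just after the
# minute boundary, and one almost a minute later.
results = [limiter.allow(t) for t in (55, 58, 61, 115)]
```

The request at 61 seconds is rejected because the two requests at 55 and 58 seconds are still inside the 60-second window, whereas a counter that resets every calendar minute would have let it through. By 115 seconds those slots have left the window, so the request is allowed.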