Tags: nevrax/litellm
Litellm dev 06 03 2025 p3 (BerriAI#11388)
* fix(vertex_ai/common_utils.py): Close BerriAI#11383
* feat(anthropic/batches): new transformation config for Anthropic batches in transformation.py
* feat(anthropic/batches): working token tracking for Anthropic batch calls via the `/anthropic` passthrough route
* fix(anthropic_passthrough_logging_handler.py): ruff check fixes
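The token-tracking item above amounts to reading usage off each record in an Anthropic batch results file. A minimal sketch of that idea — not litellm's actual handler — assuming the documented JSONL shape of Anthropic's Message Batches results (`result.message.usage` on succeeded entries):

```python
import json

def sum_batch_usage(results_jsonl: str) -> dict:
    """Accumulate token usage across an Anthropic Message Batches results file.

    Assumes each line looks like
    {"custom_id": ..., "result": {"type": "succeeded", "message": {"usage": {...}}}}
    as returned by Anthropic's batch results endpoint.
    """
    totals = {"input_tokens": 0, "output_tokens": 0}
    for line in results_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        result = record.get("result", {})
        if result.get("type") != "succeeded":
            continue  # errored/canceled entries carry no usage
        usage = result.get("message", {}).get("usage", {})
        totals["input_tokens"] += usage.get("input_tokens", 0)
        totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals
```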
[Feat] Performance - Don't create 1 task for every hanging request alert (BerriAI#11385)
* feat: add async_get_oldest_n_keys to the in-memory cache
* fix: add add_request_to_hanging_request_check
* test: alerting
* feat: v2 hanging request check
* fix: HangingRequestData
* fix: AlertingHangingRequestCheck
* fix: check_for_hanging_requests
* fix: use correct metadata location for hanging requests
* fix: alert formatting
* test: hanging request check
* fix: add guard flags for background alerting tasks
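The performance idea in BerriAI#11385: instead of spawning one asyncio task per request to fire a hanging-request alert, register each request in an insertion-ordered store and let a single periodic background task pull the oldest N entries and check them. A minimal sketch of that pattern — the class and method names (HangingRequestData, add_request_to_hanging_request_check, check_for_hanging_requests) come from the commit list above, while the timeout, batch size, and alert body are placeholders:

```python
import asyncio
import time
from dataclasses import dataclass, field

@dataclass
class HangingRequestData:  # name taken from the commit list
    request_id: str
    start_time: float = field(default_factory=time.monotonic)

class AlertingHangingRequestCheck:
    """One background task checks all in-flight requests, instead of
    one task per request."""

    def __init__(self, timeout_s: float = 600.0, batch_size: int = 50):
        self.timeout_s = timeout_s
        self.batch_size = batch_size
        # Insertion-ordered dict stands in for the in-memory cache's
        # async_get_oldest_n_keys from the commits.
        self._requests: dict[str, HangingRequestData] = {}

    async def add_request_to_hanging_request_check(self, data: HangingRequestData):
        self._requests[data.request_id] = data

    async def mark_completed(self, request_id: str):
        self._requests.pop(request_id, None)

    async def check_for_hanging_requests(self):
        """Periodic loop: inspect only the oldest N tracked requests."""
        while True:
            oldest = list(self._requests.values())[: self.batch_size]
            now = time.monotonic()
            for req in oldest:
                if now - req.start_time > self.timeout_s:
                    print(f"ALERT: request {req.request_id} hanging "
                          f"for {now - req.start_time:.0f}s")
                    self._requests.pop(req.request_id, None)
            await asyncio.sleep(60)
```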
fixes: expose flag to disable token counter (BerriAI#11344)
* fixes: expose flag to disable token counter
* fix: add disable_token_counter
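Conceptually, the flag just short-circuits token counting. A hypothetical guard — only the `disable_token_counter` name is taken from the commits; the wiring inside litellm differs:

```python
def count_tokens_if_enabled(text: str, settings: dict) -> int | None:
    # When disable_token_counter is set, skip the tokenizer call entirely;
    # callers must tolerate a None count.
    if settings.get("disable_token_counter", False):
        return None
    # Naive whitespace fallback for the sketch; litellm uses
    # model-specific tokenizers.
    return len(text.split())
```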
Merge in - Gemini streaming - thinking content parsing - return in `reasoning_content` (BerriAI#11298)
* fix(base_routing_strategy.py): compress increments to redis - reduces write ops
* fix(base_routing_strategy.py): make get and reset of in-memory keys atomic
* fix(base_routing_strategy.py): don't reset keys - causes discrepancy on subsequent requests to the instance
* fix(parallel_request_limiter.py): retrieve values of previous slots from cache - more accurate rate limiting with a sliding window
* fix: fix test
* fix: fix linting error
* fix(gemini/): fix streaming handler for function calling - Closes BerriAI#11294
* fix: fix linting error
* test: update test
* fix(vertex_and_google_ai_studio_gemini.py): return None on skipped chunk
* fix(streaming_handler.py): skip None chunks during async streaming
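The last two fixes interact: the Gemini transform returns None for chunks it skips, and the shared streaming handler must drop those Nones instead of yielding them. A minimal sketch of the pattern — the transform below is hypothetical, loosely modeled on Gemini parts that carry a `thought` flag for thinking content routed into `reasoning_content`:

```python
from typing import AsyncIterator, Optional

async def stream_chunks(upstream: AsyncIterator[dict]) -> AsyncIterator[dict]:
    """Yield transformed chunks, silently skipping ones the provider
    transform returns as None."""
    async for raw in upstream:
        chunk = transform_chunk(raw)  # hypothetical provider transform
        if chunk is None:
            continue  # skipped chunk: nothing renderable to emit
        yield chunk

def transform_chunk(raw: dict) -> Optional[dict]:
    # Hypothetical: route thinking parts into reasoning_content,
    # ordinary text into content; return None when nothing remains.
    parts = raw.get("parts", [])
    text = "".join(p.get("text", "") for p in parts if not p.get("thought"))
    reasoning = "".join(p.get("text", "") for p in parts if p.get("thought"))
    if not text and not reasoning:
        return None
    return {"content": text or None, "reasoning_content": reasoning or None}
```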
fix(proxy_server.py): mount __next__ at / and /litellm
Allows the UI to work when the proxy is mounted at the root path.
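In FastAPI terms (the litellm proxy is a FastAPI app), the fix amounts to serving the dashboard's static assets under both prefixes, so asset URLs resolve whether the proxy sits at the domain root or under /litellm. A sketch — the directory path is illustrative:

```python
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# Serve the UI build output under both prefixes so asset URLs resolve
# whether the proxy is mounted at "/" or at "/litellm".
ui_assets = StaticFiles(directory="ui/out/__next__")  # illustrative path
app.mount("/litellm/__next__", ui_assets)
app.mount("/__next__", ui_assets)
```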
Rate Limiting: Check all slots on redis, reduce number of cache writes (BerriAI#11299)
* fix(base_routing_strategy.py): compress increments to redis - reduces write ops
* fix(base_routing_strategy.py): make get and reset of in-memory keys atomic
* fix(base_routing_strategy.py): don't reset keys - causes discrepancy on subsequent requests to the instance
* fix(parallel_request_limiter.py): retrieve values of previous slots from cache - more accurate rate limiting with a sliding window
* fix: fix test
* fix: fix linting error
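Two techniques show up in BerriAI#11299: buffer per-key increments in process memory and flush them to Redis in one pipelined write instead of one INCR per request, and on the read side sum the counters of every slot in the sliding window rather than only the current one. A rough sketch using redis-py's asyncio client — the slot math and names are assumptions, not litellm's implementation:

```python
import time
from collections import defaultdict
from redis.asyncio import Redis

SLOT_SECONDS = 10   # width of one window slot
WINDOW_SLOTS = 6    # 6 x 10s = a 1-minute sliding window

class SlidingWindowLimiter:
    def __init__(self, redis: Redis):
        self.redis = redis
        self._pending: dict[str, int] = defaultdict(int)  # buffered increments

    def _slot(self, offset: int = 0) -> int:
        return int(time.time()) // SLOT_SECONDS - offset

    def record(self, key: str):
        # Buffer in memory: one request = one dict bump, not one Redis write.
        self._pending[f"{key}:{self._slot()}"] += 1

    async def flush(self):
        # Called periodically: compress all buffered increments into a
        # single pipelined round trip.
        if not self._pending:
            return
        pending, self._pending = self._pending, defaultdict(int)
        pipe = self.redis.pipeline(transaction=False)
        for slot_key, count in pending.items():
            pipe.incrby(slot_key, count)
            pipe.expire(slot_key, SLOT_SECONDS * (WINDOW_SLOTS + 1))
        await pipe.execute()

    async def current_usage(self, key: str) -> int:
        # Check *all* slots in the window, not just the newest one.
        slot_keys = [f"{key}:{self._slot(i)}" for i in range(WINDOW_SLOTS)]
        counts = await self.redis.mget(slot_keys)
        return sum(int(c) for c in counts if c is not None)
```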
feat: Add audio parameter support to Gemini TTS models (BerriAI#11287)
* feat: Add Gemini TTS audio parameter support
  - Add is_model_gemini_audio_model() method to detect TTS models
  - Include the 'audio' parameter in supported params for TTS models
  - Map the OpenAI audio parameter to Gemini's speechConfig format
  - Add _extract_audio_response_from_parts() method to transform audio output to the OpenAI format
* updated unit test to use pcm16
* created a TypedDict for SpeechConfig, simplified Gemini TTS model detection, and moved the gemini_tts test to test_litellm
* simplified is_model_gemini_audio_model further
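The parameter mapping is the interesting piece: OpenAI's chat `audio={"voice": ..., "format": ...}` has to become Gemini's `speechConfig` block. A hedged sketch of that transform, following Gemini's documented `speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName` shape — the helper name and defaults are illustrative, not litellm's code:

```python
from typing import TypedDict

class PrebuiltVoiceConfig(TypedDict):
    voiceName: str

class VoiceConfig(TypedDict):
    prebuiltVoiceConfig: PrebuiltVoiceConfig

class SpeechConfig(TypedDict):
    voiceConfig: VoiceConfig

def map_openai_audio_to_speech_config(audio: dict) -> SpeechConfig:
    """Map OpenAI's `audio` chat-completions parameter onto Gemini's
    speechConfig. Only the voice carries over here; the output format
    (e.g. pcm16) is handled when decoding the response parts."""
    return {
        "voiceConfig": {
            "prebuiltVoiceConfig": {"voiceName": audio.get("voice", "Kore")}
        }
    }

# e.g. map_openai_audio_to_speech_config({"voice": "Kore", "format": "pcm16"})
```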