Releases: tensorzero/tensorzero
2026.1.5
Caution
Breaking Changes
- TensorZero will normalize the reported `usage` from different model providers. Moving forward, `input_tokens` and `output_tokens` include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with `include_raw_usage`.
Warning
Planned Deprecations
- Migrate `include_original_response` to `include_raw_response`. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.
- Migrate `allow_auto_detect_region = true` to `region = "sdk"` when configuring AWS model providers. The behavior is identical.
- Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers (see the config sketch after this list). Example:
  - Before: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/messages"`
  - Now: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"`
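For reference, a minimal sketch of a custom Anthropic provider under the new convention; the model and provider names (and the Claude model version) are hypothetical, and only the `api_base` value comes from the example above:

```toml
# Hypothetical model/provider names; only api_base is from the release note.
[models.my_claude.providers.my_anthropic]
type = "anthropic"
model_name = "claude-sonnet-4-20250514"  # hypothetical model version
# Provide only the API base; the gateway appends the endpoint path itself.
api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"
```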
Bug Fixes
- Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.
- Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.
New Features
- Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.
- Support Valkey (Redis) to improve the performance of rate-limiting checks (recommended at 100+ QPS).
- Support `reasoning_effort` for Gemini 3 models (mapped to `thinkingLevel`).
- Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, `json_mode = "strict"` will use the beta structured outputs feature; `json_mode = "on"` still uses the legacy assistant message prefill (see the sketch after this list).
- Improve handling of reasoning content in the OpenRouter and xAI model providers.
- Add `extra_headers` support for embedding models. (thanks @jonaylor89!)
- Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.
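To illustrate the `json_mode` change for Anthropic reasoning models, here is a minimal sketch of a JSON function variant; the function name, variant name, schema path, and model are hypothetical:

```toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.claude_strict]
type = "chat_completion"
model = "anthropic::claude-sonnet-4-20250514"  # hypothetical model
json_mode = "strict"  # uses the beta structured outputs feature
# json_mode = "on"    # would use the legacy assistant message prefill instead
```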
& multiple under-the-hood and UI improvements (thanks @ndoherty-xyz)!
2026.1.2
New Features
- Support appending to arrays with `extra_body` using the `/my_array/-` notation (see the sketch after this list).
- Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
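The `/my_array/-` notation follows JSON Pointer, where a trailing `-` targets the position after the last element of an array (i.e. append rather than replace). A minimal sketch with hypothetical function, variant, and request-field names:

```toml
# Appends "END" to the provider request's stop_sequences array
# (hypothetical field) instead of overwriting the whole array.
[[functions.my_function.variants.my_variant.extra_body]]
pointer = "/stop_sequences/-"
value = "END"
```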
& multiple under-the-hood and UI improvements (thanks @ecalifornica!)
2026.1.1
Warning
Planned Deprecations
- In a future release, the parameter `model` will be required when initializing `DICLOptimizationConfig`. The parameter remains optional (defaults to `openai::gpt-5-mini`) in the meantime.
Bug Fixes
- Stop buffering `raw_usage` when streaming with the OpenAI-compatible inference endpoint; instead, emit `raw_usage` as soon as possible, just like in the native endpoint.
- Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.
New Features
- Support `stream_options.include_usage` for every model under the Azure provider.
& multiple under-the-hood and UI improvements!
2026.1.0
Caution
Breaking Changes
- The Prometheus metric `tensorzero_inference_latency_overhead_seconds` will report a histogram instead of a summary. You can customize the buckets using `gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets` in the configuration (default: 1ms, 10ms, 100ms). See the sketch below.
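A minimal sketch of customizing the buckets; the bucket values below are illustrative assumptions (boundaries are expressed in seconds, so the defaults correspond to `[0.001, 0.01, 0.1]`):

```toml
[gateway.metrics]
# Histogram bucket boundaries in seconds (defaults: 1ms, 10ms, 100ms).
tensorzero_inference_latency_overhead_seconds_buckets = [0.001, 0.01, 0.1, 1.0]
```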
Warning
Planned Deprecations
- Deprecate the `TENSORZERO_CLICKHOUSE_URL` environment variable from the UI. Moving forward, the UI will query data through the gateway and does not communicate directly with ClickHouse.
- Rename the Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds`. Both metrics will be emitted for now.
- Rename the configuration field `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`. Both fields are available for now.
New Features
- Add optional `include_raw_usage` parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized `usage` response field.
- Add optional `--bind-address` CLI flag to the gateway.
- Add optional `description` field to metrics in the configuration (see the sketch after this list).
- Add option to fine-tune Fireworks models without automatic deployment.
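For the new `description` field, a minimal sketch of a metric definition; the metric name and field values are hypothetical, with `description` being the addition:

```toml
[metrics.task_success]  # hypothetical metric
type = "boolean"
optimize = "max"
level = "inference"
description = "Whether the user marked the task as completed successfully."
```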
& multiple under-the-hood and UI improvements (thanks @ecalifornica @achaljhawar @rguilmont)!
2025.12.6
Caution
Breaking Changes
- Migrated the following optimization fields from the TensorZero Python SDK to the configuration (see the sketch after this list):
  - `DICLOptimizationConfig`: removed `credential_location`.
  - `FireworksSFTConfig`: moved `account_id` to `[provider_types.fireworks.sft]`; removed `api_base` and `credential_location`.
  - `GCPVertexGeminiSFTConfig`: moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[provider_types.gcp_vertex_gemini.sft]`.
  - `OpenAIRFTConfig`: removed `api_base` and `credential_location`.
  - `OpenAISFTConfig`: removed `api_base` and `credential_location`.
  - `TogetherSFTConfig`: moved `hf_api_token`, `wandb_api_key`, `wandb_base_url`, and `wandb_project_name` to `[provider_types.together.sft]`; removed `api_base` and `credential_location`.
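A minimal sketch of where the moved fields now live in the configuration; the section names come from the list above, while all values are hypothetical placeholders:

```toml
[provider_types.fireworks.sft]
account_id = "my-fireworks-account"

[provider_types.gcp_vertex_gemini.sft]
bucket_name = "my-sft-bucket"
project_id = "my-gcp-project"
region = "us-central1"

[provider_types.together.sft]
wandb_project_name = "my-sft-runs"
```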
New Features
- Support gateway relay. With gateway relay, an LLM inference request can be routed through multiple independent TensorZero Gateway deployments before reaching a model provider. This enables you to enforce organization-wide controls (e.g. auth, rate limits, credentials) without restricting how teams build their LLM features.
- Add "Try with model" button to the datapoint page in the UI.
- Add `tensorzero_inference_latency_overhead_seconds_histogram` Prometheus metric for meta-observability.
- Add `concurrency` parameter to `experimental_render_samples` (defaults to 100).
- Add `otlp_traces_extra_attributes` and `otlp_traces_extra_resources` to the TensorZero Python SDK. (thanks @jinnovation!)
& multiple under-the-hood and UI improvements (thanks @ecalifornica)
2025.12.5
Warning
Planned Deprecations
- The variant type `experimental_chain_of_thought` will be deprecated in `2026.2+`. As reasoning models are becoming prevalent, please use their native reasoning capabilities.
- The `timeout_s` configuration field for best/mixture-of-N variants will be deprecated in `2026.2+`. Please use the `[timeouts]` block in the configuration for their candidates instead (see the sketch after this list).
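A minimal sketch of the replacement, assuming the `timeouts` shape used elsewhere in the configuration (`non_streaming.total_ms`); the function, variant, and model names are hypothetical, and the best-of-N variant is abridged:

```toml
# Configure timeouts on the candidate variants instead of timeout_s
# on the best/mixture-of-N variant itself.
[functions.my_function.variants.candidate_a]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.my_function.variants.candidate_a.timeouts]
non_streaming.total_ms = 15000  # hypothetical value

[functions.my_function.variants.best_of_n]
type = "experimental_best_of_n_sampling"
candidates = ["candidate_a"]
# ... evaluator configuration omitted for brevity
```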
New Features
- Expand the dataset builder in the UI to support complex queries (e.g. filter by tags, feedback).
- Export `tensorzero_inference_latency_overhead_seconds` Prometheus metric for meta-observability.
- Allow users to disable TensorZero API keys using `--disable-api-key` in the CLI. (thanks @jinnovation!)
& multiple under-the-hood and UI improvements (thanks @ecalifornica)!
2025.12.3
Bug Fixes
- Fix a bug where negative tag filters (e.g. `user_id != 1`) matched inferences and datapoints without that tag.
- Fix a bug where metric filters covering default values (e.g. `exact_match = false`) matched inferences without that metric.
- Fix a regression affecting the logger in the UI.
New Features
- Improve the performance of the inference and datapoint list pages in the UI.
- Support filtering inferences by whether they have a demonstration.
& multiple under-the-hood and UI improvements (thanks @jinnovation @ecalifornica @simeonlee)!
2025.12.2
Bug Fixes
- Fix a performance regression affecting the inference table in the UI.
New Features
- Allow users to customize the log level in the UI (`TENSORZERO_UI_LOG_LEVEL`).
& multiple under-the-hood and UI improvements
2025.12.1
Bug Fixes
- Fix a regression that broke the dataset builder in the UI.
& multiple under-the-hood and UI improvements
2025.12.0
Caution
Breaking Changes
- Unknown content blocks now return the scope as `model_name` and `provider_name` instead of the fully-qualified `model_provider_name`.
Warning
Planned Deprecations
- The TensorZero UI now reads the configuration from the gateway (instead of reading directly from the filesystem). The environment variables `TENSORZERO_UI_CONFIG_PATH` and `TENSORZERO_UI_DEFAULT_CONFIG` are deprecated and ignored. You no longer need to mount the configuration onto the UI container.
- Use `model_name` and `provider_name` to scope provider tools (e.g. OpenAI Responses API web search) instead of `model_provider_name`. The deprecated name is still accepted in the API.
Bug Fixes
- Fix a regression in the "Try with..." modal in the UI that disregarded some parameters (e.g. `allowed_tools`).
- Fix a regression in `allowed_tools` when using custom display names for tools.
- Fix an edge case when using both the `allowed_tools` and `tool_choice` parameters with GCP Vertex AI Gemini.
New Features
- Support free-form search and filtering (e.g. by tags, metrics) in the inference and datapoint tables in the UI.
- Support creating datapoints from scratch in the UI.
- Support editing TensorZero API key descriptions in the UI (thanks @nicoestrada!).
- Support editing any kind of datapoint input and output in the UI.
- Support peeking at inferences in the episode detail page in the UI (thanks @BrianLi23!).
- Support cloning datapoints in the UI.
- Optimize the rendering performance of the code editor in the UI.
- Make `mime_type` optional for base64 file inputs (now inferred from magic bytes when not provided).
& multiple under-the-hood and UI improvements