Releases: tensorzero/tensorzero
2026.1.5
Caution
Breaking Changes
- TensorZero will normalize the reported `usage` from different model providers. Moving forward, `input_tokens` and `output_tokens` include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with `include_raw_usage`.
Warning
Planned Deprecations
- Migrate `include_original_response` to `include_raw_response`. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.
- Migrate `allow_auto_detect_region = true` to `region = "sdk"` when configuring AWS model providers. The behavior is identical.
- Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers (see the config sketch after this list). Example:
  - Before: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/messages"`
  - Now: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"`
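For reference, a minimal sketch of a custom Anthropic provider under the new convention; the model and provider names (and the Claude model version) are hypothetical, and only the `api_base` value comes from the example above:

```toml
# Hypothetical model/provider names; only api_base is from the release note.
[models.my_claude.providers.my_anthropic]
type = "anthropic"
model_name = "claude-sonnet-4-20250514"  # hypothetical model version
# Provide only the API base; the gateway appends the endpoint path itself.
api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"
```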
Bug Fixes
- Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.
- Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.
New Features
- Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.
- Support Valkey (Redis) to improve the performance of rate-limiting checks (recommended at 100+ QPS).
- Support `reasoning_effort` for Gemini 3 models (mapped to `thinkingLevel`).
- Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, `json_mode = "strict"` will use the beta structured outputs feature; `json_mode = "on"` still uses the legacy assistant message prefill (see the sketch after this list).
- Improve handling of reasoning content in the OpenRouter and xAI model providers.
- Add `extra_headers` support for embedding models. (thanks @jonaylor89!)
- Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.
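To illustrate the `json_mode` change for Anthropic reasoning models, here is a minimal sketch of a JSON function variant; the function name, variant name, schema path, and model are hypothetical:

```toml
[functions.extract_entities]
type = "json"
output_schema = "functions/extract_entities/output_schema.json"

[functions.extract_entities.variants.claude_strict]
type = "chat_completion"
model = "anthropic::claude-sonnet-4-20250514"  # hypothetical model
json_mode = "strict"  # uses the beta structured outputs feature
# json_mode = "on"    # would use the legacy assistant message prefill instead
```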
& multiple under-the-hood and UI improvements (thanks @ndoherty-xyz)!
2026.1.2
New Features
- Support appending to arrays with `extra_body` using the `/my_array/-` notation (see the sketch after this list).
- Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
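The `/my_array/-` notation follows JSON Pointer, where a trailing `-` targets the position after the last element of an array (i.e. append rather than replace). A minimal sketch with hypothetical function, variant, and request-field names:

```toml
# Appends "END" to the provider request's stop_sequences array
# (hypothetical field) instead of overwriting the whole array.
[[functions.my_function.variants.my_variant.extra_body]]
pointer = "/stop_sequences/-"
value = "END"
```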
& multiple under-the-hood and UI improvements (thanks @ecalifornica!)
2026.1.1
Warning
Planned Deprecations
- In a future release, the parameter `model` will be required when initializing `DICLOptimizationConfig`. The parameter remains optional (defaults to `openai::gpt-5-mini`) in the meantime.
Bug Fixes
- Stop buffering `raw_usage` when streaming with the OpenAI-compatible inference endpoint; instead, emit `raw_usage` as soon as possible, just like in the native endpoint.
- Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.
New Features
- Support `stream_options.include_usage` for every model under the Azure provider.
& multiple under-the-hood and UI improvements!
2026.1.0
Caution
Breaking Changes
- The Prometheus metric `tensorzero_inference_latency_overhead_seconds` will report a histogram instead of a summary. You can customize the buckets using `gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets` in the configuration (default: 1ms, 10ms, 100ms). See the sketch below.
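A minimal sketch of customizing the buckets; the bucket values below are illustrative assumptions (boundaries are expressed in seconds, so the defaults correspond to `[0.001, 0.01, 0.1]`):

```toml
[gateway.metrics]
# Histogram bucket boundaries in seconds (defaults: 1ms, 10ms, 100ms).
tensorzero_inference_latency_overhead_seconds_buckets = [0.001, 0.01, 0.1, 1.0]
```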
Warning
Planned Deprecations
- Deprecate the `TENSORZERO_CLICKHOUSE_URL` environment variable from the UI. Moving forward, the UI will query data through the gateway and does not communicate directly with ClickHouse.
- Rename the Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds`. Both metrics will be emitted for now.
- Rename the configuration field `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`. Both fields are available for now.
New Features
- Add optional `include_raw_usage` parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized `usage` response field.
- Add optional `--bind-address` CLI flag to the gateway.
- Add optional `description` field to metrics in the configuration (see the sketch after this list).
- Add option to fine-tune Fireworks models without automatic deployment.
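For the new `description` field, a minimal sketch of a metric definition; the metric name and field values are hypothetical, with `description` being the addition:

```toml
[metrics.task_success]  # hypothetical metric
type = "boolean"
optimize = "max"
level = "inference"
description = "Whether the user marked the task as completed successfully."
```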
& multiple under-the-hood and UI improvements (thanks @ecalifornica @achaljhawar @rguilmont)!
2025.12.6
Caution
Breaking Changes
- Migrated the following optimization fields from the TensorZero Python SDK to the configuration (see the sketch after this list):
  - `DICLOptimizationConfig`: removed `credential_location`.
  - `FireworksSFTConfig`: moved `account_id` to `[provider_types.fireworks.sft]`; removed `api_base` and `credential_location`.
  - `GCPVertexGeminiSFTConfig`: moved `bucket_name`, `bucket_path_prefix`, `kms_key_name`, `project_id`, `region`, and `service_account` to `[provider_types.gcp_vertex_gemini.sft]`.
  - `OpenAIRFTConfig`: removed `api_base` and `credential_location`.
  - `OpenAISFTConfig`: removed `api_base` and `credential_location`.
  - `TogetherSFTConfig`: moved `hf_api_token`, `wandb_api_key`, `wandb_base_url`, and `wandb_project_name` to `[provider_types.together.sft]`; removed `api_base` and `credential_location`.
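A minimal sketch of where the moved fields now live in the configuration; the section names come from the list above, while all values are hypothetical placeholders:

```toml
[provider_types.fireworks.sft]
account_id = "my-fireworks-account"

[provider_types.gcp_vertex_gemini.sft]
bucket_name = "my-sft-bucket"
project_id = "my-gcp-project"
region = "us-central1"

[provider_types.together.sft]
wandb_project_name = "my-sft-runs"
```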
New Features
- Support gateway relay. With gateway relay, an LLM inference request can be routed through multiple independent TensorZero Gateway deployments before reaching a model provider. This enables you to enforce organization-wide controls (e.g. auth, rate limits, credentials) without restricting how teams build their LLM features.
- Add "Try with model" button to the datapoint page in the UI.
- Add `tensorzero_inference_latency_overhead_seconds_histogram` Prometheus metric for meta-observability.
- Add `concurrency` parameter to `experimental_render_samples` (defaults to 100).
- Add `otlp_traces_extra_attributes` and `otlp_traces_extra_resources` to the TensorZero Python SDK. (thanks @jinnovation!)
& multiple under-the-hood and UI improvements (thanks @ecalifornica)
2025.12.5
Warning
Planned Deprecations
- The variant type `experimental_chain_of_thought` will be deprecated in `2026.2+`. As reasoning models are becoming prevalent, please use their native reasoning capabilities.
- The `timeout_s` configuration field for best/mixture-of-N variants will be deprecated in `2026.2+`. Please use the `[timeouts]` block in the configuration for their candidates instead (see the sketch after this list).
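A minimal sketch of the replacement, assuming the `timeouts` shape used elsewhere in the configuration (`non_streaming.total_ms`); the function, variant, and model names are hypothetical, and the best-of-N variant is abridged:

```toml
# Configure timeouts on the candidate variants instead of timeout_s
# on the best/mixture-of-N variant itself.
[functions.my_function.variants.candidate_a]
type = "chat_completion"
model = "openai::gpt-5-mini"

[functions.my_function.variants.candidate_a.timeouts]
non_streaming.total_ms = 15000  # hypothetical value

[functions.my_function.variants.best_of_n]
type = "experimental_best_of_n_sampling"
candidates = ["candidate_a"]
# ... evaluator configuration omitted for brevity
```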
New Features
- Expand the dataset builder in the UI to support complex queries (e.g. filter by tags, feedback).
- Export `tensorzero_inference_latency_overhead_seconds` Prometheus metric for meta-observability.
- Allow users to disable TensorZero API keys using `--disable-api-key` in the CLI. (thanks @jinnovation!)
& multiple under-the-hood and UI improvements (thanks @ecalifornica)!
2025.12.3
Bug Fixes
- Fix a bug where negative tag filters (e.g. `user_id != 1`) matched inferences and datapoints without that tag.
- Fix a bug where metric filters covering default values (e.g. `exact_match = false`) matched inferences without that metric.
- Fix a regression affecting the logger in the UI.
New Features
- Improve the performance of the inference and datapoint list pages in the UI.
- Support filtering inferences by whether they have a demonstration.
& multiple under-the-hood and UI improvements (thanks @jinnovation @ecalifornica @simeonlee)!
2025.12.2
Bug Fixes
- Fix a performance regression affecting the inference table in the UI.
New Features
- Allow users to customize the log level in the UI (`TENSORZERO_UI_LOG_LEVEL`).
& multiple under-the-hood and UI improvements
2025.12.1
Bug Fixes
- Fix a regression that broke the dataset builder in the UI.
& multiple under-the-hood and UI improvements
2025.12.0
Caution
Breaking Changes
- Unknown content blocks now return the scope as `model_name` and `provider_name` instead of the fully-qualified `model_provider_name`.
Warning
Planned Deprecations
- The TensorZero UI now reads the configuration from the gateway (instead of reading directly from the filesystem). The environment variables `TENSORZERO_UI_CONFIG_PATH` and `TENSORZERO_UI_DEFAULT_CONFIG` are deprecated and ignored. You no longer need to mount the configuration onto the UI container.
- Use `model_name` and `provider_name` to scope provider tools (e.g. OpenAI Responses API web search) instead of `model_provider_name`. The deprecated name is still accepted in the API.
Bug Fixes
- Fix a regression in the "Try with..." modal in the UI that disregarded some parameters (e.g. `allowed_tools`).
- Fix a regression in `allowed_tools` when using custom display names for tools.
- Fix an edge case when using both the `allowed_tools` and `tool_choice` parameters with GCP Vertex AI Gemini.
New Features
- Support free-form search and filtering (e.g. by tags, metrics) in the inference and datapoint tables in the UI.
- Support creating datapoints from scratch in the UI.
- Support editing TensorZero API key descriptions in the UI (thanks @nicoestrada!).
- Support editing any kind of datapoint input and output in the UI.
- Support peeking at inferences in the episode detail page in the UI (thanks @BrianLi23!).
- Support cloning datapoints in the UI.
- Optimize the rendering performance of the code editor in the UI.
- Make `mime_type` optional for base64 file inputs (now inferred from magic bytes when not provided).
& multiple under-the-hood and UI improvements