Developing a Model on Baseten

Baseten makes it easy to go from a trained machine learning model to a fully-deployed, production-ready API. You’ll use Truss—our open-source model packaging tool—to containerize your model code and configuration, and ship it to Baseten for deployment, testing, and scaling.

What does it mean to develop a model?

In Baseten, developing a model means:

Packaging your model code and weights: Wrap your trained model into a structured project that includes your inference logic and dependencies.
Configuring the model environment: Define everything needed to run your model—from Python packages to system dependencies and secrets.
Deploying and iterating quickly: Push your model to Baseten and iterate with live edits using truss push --watch.

Once your model works the way you want, you can promote it to production, ready for live traffic.

Development flow on Baseten

Here’s what the typical model development loop looks like:

Initialize a new model project using the Truss CLI.
Add your model logic to a Python class (model.py), specifying how to load the model and run inference.
Configure dependencies in a YAML or Python config.
Deploy the model using truss push for a published deployment, or truss push --watch for development mode.
Iterate fast with truss push --watch or truss watch to live-reload your dev deployment as you make changes.
Test and tune the model until it’s production-ready.
Promote the model to production when you’re ready to scale.

Note: Truss runs your model in a standardized container, and you don’t need Docker installed locally. It also gives you a fast developer loop and a consistent way to configure and serve models.
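
The loop above can be sketched as a short sequence of CLI calls. The snippet below only builds and prints the commands rather than running them, because the watch commands stay attached to your terminal while they live-reload. The project directory name my-model and the helper name dev_loop_commands are hypothetical, and truss init may prompt for additional details when you run it.

```python
def dev_loop_commands(project_dir):
    """Return (step, command) pairs for a typical Truss development loop.

    project_dir is a hypothetical directory name chosen for illustration.
    """
    return [
        ("Initialize a new model project", ["truss", "init", project_dir]),
        ("Create a development deployment", ["truss", "push", "--watch", project_dir]),
        ("Resume live-reloading later", ["truss", "watch", project_dir]),
        ("Create a published deployment", ["truss", "push", project_dir]),
    ]


def format_commands(steps):
    """Render each step as a comment line followed by a shell-style command."""
    lines = []
    for description, command in steps:
        lines.append(f"# {description}")
        lines.append("$ " + " ".join(command))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_commands(dev_loop_commands("my-model")))
```

Editing model.py and the config happens between the first and second commands. The final truss push is the step you run once the development deployment behaves the way you want.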

What is Truss?

Truss is the tool you use to:

Scaffold a new model project
Serve models locally or in the cloud
Package your code, config, and model files
Push to Baseten for deployment

You can think of it as the developer toolkit for building and managing model servers, built specifically for machine learning workflows. With Truss, you can create a containerized model server without needing to learn Docker, and define everything about how your model runs: Python and system packages, GPU settings, environment variables, and custom inference logic. It gives you a fast, reproducible dev loop, so you can test changes locally or in a remote environment that mirrors production.

Truss is flexible enough to support a wide range of ML stacks, including:

Model frameworks like PyTorch, transformers, and diffusers
Inference engines like TensorRT-LLM, SGLang, and vLLM
Serving technologies like Triton
Any package installable with pip or apt

We’ll use Truss throughout this guide, but the focus will stay on how you develop models, not just how Truss works.
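
To make the "define everything about how your model runs" idea concrete, here is a minimal sketch of the settings a config.yaml typically carries, written as a Python dictionary so it stays in the same language as the model code. The keys are fields we understand Truss to accept. Every value (the model name, package versions, accelerator, and secret name) is a placeholder assumption, not a recommendation. JSON is a subset of YAML 1.2, so the printed text should be loadable as a config file, although a hand-written block-style config.yaml is the more common form.

```python
import json

# Placeholder values throughout: swap in your own model name, packages, and hardware.
config = {
    "model_name": "my-model",                       # hypothetical name
    "python_version": "py311",
    "requirements": ["torch", "transformers"],       # pip packages
    "system_packages": ["ffmpeg"],                   # apt packages
    "resources": {"accelerator": "A10G", "use_gpu": True},
    "environment_variables": {"LOG_LEVEL": "info"},  # hypothetical variable
    "secrets": {"hf_access_token": None},            # value is supplied outside the config
}


def render_config(settings):
    """Serialize the settings as indented JSON text."""
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    print(render_config(config))
```

Keeping the secret's value empty in the config reflects the usual practice of storing the actual credential outside the project files.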

From model to server: the key components

When you develop a model on Baseten, you define:

A Model class: This is where you load the model, preprocess inputs, run inference, and return the results.
A configuration file (config.yaml or Python config): Defines the runtime environment, dependencies, and deployment settings.
Optional extra assets, like model weights, secrets, or external packages.

Together, these components form a Truss, which is what you deploy to Baseten. A Truss encapsulates model code, dependencies, and configuration in a portable, reproducible structure, which standardizes packaging and keeps development, deployment, and scaling consistent.
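
As a minimal sketch of the Model class described above: the method names (load, predict, and the optional preprocess and postprocess hooks) follow the layout Truss scaffolds in model.py. The toy "model" and the input key text are assumptions made so the example runs without any ML dependencies. A real project would load weights in load and call the framework of your choice in predict.

```python
class Model:
    def __init__(self, **kwargs):
        # Truss passes runtime context (such as secrets) as keyword arguments.
        self._secrets = kwargs.get("secrets")
        self._model = None

    def load(self):
        # Runs once at startup. A real model would load weights here.
        # This stand-in "model" just counts words (an assumption for illustration).
        self._model = lambda text: len(text.split())

    def preprocess(self, model_input):
        # Optional hook: normalize the raw request before inference.
        return model_input["text"].strip().lower()

    def predict(self, model_input):
        # Runs for every request, using the object prepared in load().
        return {"text": model_input, "word_count": self._model(model_input)}

    def postprocess(self, model_output):
        # Optional hook: shape the response returned to the caller.
        return {"result": model_output}


if __name__ == "__main__":
    # Calling the hooks by hand to mimic the order a model server would use.
    model = Model()
    model.load()
    request = {"text": "  Hello Baseten  "}
    print(model.postprocess(model.predict(model.preprocess(request))))
```

Calling the hooks manually like this is only a local smoke test. Once deployed, the model server is responsible for invoking them for each request.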

Development vs. published deployments

By default, truss push creates a published deployment. Adding --watch creates a development deployment instead.

Published deployment (truss push): Stable, autoscaled, and ready for live traffic, but doesn’t support live-reloading.
Development deployment (truss push --watch): Meant for iteration and testing. It supports live-reloading for quick feedback loops, but it runs on at most one replica and doesn’t autoscale.

Use development mode to build and test, then deploy a published version with truss push when you’re satisfied.
