Credit goes to docs.baseten.co

Baseten makes it easy to go from a trained machine learning model to a fully-deployed, production-ready API. You’ll use Truss—our open-source model packaging tool—to containerize your model code and configuration, and ship it to Baseten for deployment, testing, and scaling.

What does it mean to develop a model?

In Baseten, developing a model means:
  1. Packaging your model code and weights: Wrap your trained model into a structured project that includes your inference logic and dependencies.
  2. Configuring the model environment: Define everything needed to run your model—from Python packages to system dependencies and secrets.
  3. Deploying and iterating quickly: Push your model to Baseten and iterate with live edits using truss push --watch.
Once your model works the way you want, you can promote it to production, ready for live traffic.

Development flow on Baseten

Here’s what the typical model development loop looks like:
  1. Initialize a new model project using the Truss CLI.
  2. Add your model logic to a Python class (model.py), specifying how to load and run inference.
  3. Configure dependencies in a YAML or Python config.
  4. Deploy the model using truss push for a published deployment, or truss push --watch for development mode.
  5. Iterate fast with truss push --watch or truss watch to live-reload your dev deployment as you make changes.
  6. Test and tune the model until it’s production-ready.
  7. Promote the model to production when you’re ready to scale.
Note: Truss runs your model in a standardized container without needing Docker installed locally. It also gives you a fast developer loop and a consistent way to configure and serve models.
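The loop above maps onto a handful of CLI calls. The sketch below only builds the argument lists and prints them; it does not run anything. It assumes a hypothetical project directory named my-model, that the scaffolding step is truss init, and that truss push and truss watch accept the project directory as a positional argument. Check truss --help for the exact options in your installed version.

```python
import shlex


def dev_loop_commands(project_dir: str) -> list[list[str]]:
    """Return the CLI calls for one pass through the development loop.

    `project_dir` is a hypothetical directory name used for illustration.
    """
    return [
        ["truss", "init", project_dir],             # step 1: scaffold the project
        ["truss", "push", "--watch", project_dir],  # step 4: development deployment
        ["truss", "watch", project_dir],            # step 5: live-reload the dev deployment
        ["truss", "push", project_dir],             # published deployment once it works
    ]


if __name__ == "__main__":
    for cmd in dev_loop_commands("my-model"):
        # shlex.join renders each argument list as a copy-pasteable shell line.
        print(shlex.join(cmd))
```

To actually execute a step, each list could be passed to subprocess.run(cmd, check=True); keeping the commands as data makes the sequence easy to review before anything is deployed.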

What is Truss?

Truss is the tool you use to:
  • Scaffold a new model project
  • Serve models locally or in the cloud
  • Package your code, config, and model files
  • Push to Baseten for deployment
You can think of it as the developer toolkit for building and managing model servers, built specifically for machine learning workflows. With Truss, you can create a containerized model server without needing to learn Docker, and define everything about how your model runs: Python and system packages, GPU settings, environment variables, and custom inference logic.
It gives you a fast, reproducible dev loop: test changes locally or in a remote environment that mirrors production.
Truss is flexible enough to support a wide range of ML stacks, including:
  • Model frameworks like PyTorch, transformers, and diffusers
  • Inference engines like TensorRT-LLM, SGLang, vLLM
  • Serving technologies like Triton
  • Any package installable with pip or apt
We’ll use Truss throughout this guide, but the focus will stay on how you develop models, not just how Truss works.
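To make "define everything about how your model runs" concrete, here is a minimal sketch of the kind of settings a config carries, written as a plain Python dict so it can be checked in isolation. The key names follow common config.yaml fields (requirements, system_packages, resources, environment_variables, secrets), but every value is an illustrative assumption, and the helper missing_sections is hypothetical rather than part of Truss.

```python
# Illustrative settings mirroring common config.yaml fields.
# All values below are assumptions chosen for the example.
config = {
    "model_name": "my-model",                   # hypothetical name
    "python_version": "py311",
    "requirements": ["torch", "transformers"],  # Python packages (pip)
    "system_packages": ["ffmpeg"],              # system packages (apt)
    "resources": {"accelerator": "A10G", "use_gpu": True},
    "environment_variables": {"LOG_LEVEL": "info"},
    "secrets": {"hf_access_token": None},       # placeholder; never put the real value here
}

# Sections this sketch expects every config to declare (an assumption,
# not a rule enforced by Truss).
EXPECTED_SECTIONS = ("requirements", "system_packages", "resources")


def missing_sections(cfg: dict) -> list[str]:
    """Return the expected top-level sections that are absent from cfg."""
    return [key for key in EXPECTED_SECTIONS if key not in cfg]


print(missing_sections(config))
```

A check like this is only a convenience for catching an incomplete config before pushing; the authoritative list of fields is the Truss configuration reference.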

From model to server: the key components

When you develop a model on Baseten, you define:
  • A Model class: This is where your model is loaded, inputs are preprocessed, inference is run, and results are returned.
  • A configuration file (config.yaml or Python config): Defines the runtime environment, dependencies, and deployment settings.
  • Optional extra assets, like model weights, secrets, or external packages.
These components together form a Truss, which is what you deploy to Baseten. Packaging model code, dependencies, and configuration into one portable, reproducible structure standardizes how models are built, deployed, and scaled.
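As a minimal sketch of the Model class described above, the example below uses the load and predict method names that Truss conventionally looks for in model.py. The "model" here is a stand-in function that uppercases text, and the input key text is an assumption made for illustration; a real load would read weights from disk or a model hub.

```python
class Model:
    def __init__(self, **kwargs):
        # The serving runtime may pass config, secrets, and similar values
        # as keyword arguments; this sketch ignores them.
        self._model = None

    def load(self):
        # Runs once at startup. A real implementation would load weights
        # here; this stand-in just uppercases its input.
        self._model = str.upper

    def predict(self, model_input: dict) -> dict:
        # Runs per request: preprocess, run inference, return results.
        text = model_input["text"].strip()  # preprocess the input
        output = self._model(text)          # run inference
        return {"output": output}           # shape the response


if __name__ == "__main__":
    model = Model()
    model.load()
    print(model.predict({"text": " hello "}))
```

Keeping the expensive setup in load and the per-request work in predict means the weights are loaded once per replica rather than once per call.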

Development vs. published deployments

By default, truss push creates a published deployment. The two deployment types differ as follows:
  • Published deployment (truss push): Stable, autoscaled, and ready for live traffic, but doesn’t support live-reloading.
  • Development deployment (truss push --watch): Meant for iteration and testing. It supports live-reloading for quick feedback loops, but scales to at most one replica, with no autoscaling.
Use development mode to build and test, then deploy a published version with truss push when you’re satisfied.