This verified model matrix pairs with import_workflows.md. For each model family, it lists the dtype(s) used during validation.
TensorRT is a general-purpose neural-network graph execution engine, not a model zoo. In principle any NN architecture can run on TensorRT as long as it is expressible through the workflows described in the Import Workflows Guide. The Custom Plugin section covers the escape hatch for ops TensorRT does not yet implement natively.
The table below is not an exhaustive support list. It is the subset of models NVIDIA has verified and benchmarked; we publish it so you know which configurations have a known-good baseline and where the current rough edges are. If your model is not listed, the expectation is still that it works — please file an issue if it does not.
- Dtype lists the precision used for the verified baseline. Other precisions may also work.
- Component-split models (diffusion pipelines, speech models with encoder/decoder) list one row per validated component.
- LLMs / Text Generation
- Encoder-only NLP (BERT family, embeddings)
- Vision Classification & Embeddings
- Speech / Audio
- Diffusion Models
- Multimodal
- Legacy / TRT Sample Models
- Requesting New Model Coverage
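The two reading notes above (one baseline dtype per row, one row per validated component) can be illustrated with a small lookup. This is only a sketch: `MatrixRow`, `ROWS`, and `verified_dtype` are hypothetical names introduced here, not part of TensorRT, and the rows are a handful copied from the tables in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatrixRow:
    """One row of the verified matrix (hypothetical structure, for illustration)."""
    model: str
    dtype: str                       # precision used for the verified baseline
    component: Optional[str] = None  # set only for component-split models


# A few rows copied from the tables in this document.
ROWS = [
    MatrixRow("meta-llama/Llama-3.1-8B", "bfloat16"),
    MatrixRow("openai/whisper-large-v3", "float32", component="Encoder"),
    MatrixRow("openai/whisper-large-v3", "float32", component="Decoder"),
    MatrixRow("black-forest-labs/FLUX.2-dev", "bfloat16", component="DiT"),
    MatrixRow("black-forest-labs/FLUX.2-dev", "float16", component="VAE"),
]


def verified_dtype(model: str, component: Optional[str] = None) -> Optional[str]:
    """Return the baseline dtype for an exact (model, component) row.

    None means "no verified baseline for this row", not "unsupported".
    """
    for row in ROWS:
        if row.model == model and row.component == component:
            return row.dtype
    return None


if __name__ == "__main__":
    # Component-split models must be queried per component.
    print(verified_dtype("black-forest-labs/FLUX.2-dev", "VAE"))
    print(verified_dtype("openai/whisper-large-v3"))
```

A lookup that returns `None` only says the row has no published baseline; per the note above, other precisions and unlisted models may still work.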
Preferred path for LLM generation: TensorRT-LLM, which provides KV-cache, paged attention, FP8/INT4, speculative decoding, and tensor/pipeline parallelism. Use it for production LLM serving.
| Model | Dtype |
|---|---|
| meta-llama/Llama-3.1-8B | bfloat16 |
| meta-llama/Llama-3.2-1B | bfloat16 |
| Qwen/Qwen3-0.6B | bfloat16 |
| deepseek-ai/Janus-Pro-7B | bfloat16 |
For TensorRT-LLM's own coverage, see the TensorRT-LLM model support matrix.
| Model | Dtype |
|---|---|
| google-bert/bert-base-uncased | float32 |
| google-bert/bert-base-multilingual-cased | float16 |
| FacebookAI/roberta-base | float32 |
| FacebookAI/roberta-large | float32 |
| FacebookAI/xlm-roberta-base | float32 |
| distilbert/distilbert-base-uncased | float32 |
| sentence-transformers/all-MiniLM-L6-v2 | float32 |
| sentence-transformers/all-mpnet-base-v2 | float32 |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | float32 |
| BAAI/bge-base-en-v1.5 | float32 |
| nlpaueb/legal-bert-base-uncased | float32 |

| Model | Dtype |
|---|---|
| torchvision/resnet50 | float32 |
| timm/mobilenetv3_small_100.lamb_in1k | float32 |
| trpakov/vit-face-expression | float32 |
| openai/clip-vit-base-patch32 | float32 |
| openai/clip-vit-large-patch14 | float32 |
| facebook/dinov2-base | float32 |
| Falconsai/nsfw_image_detection | float32 |
| dima806/fairface_age_image_detection | float32 |

| Model (Component) | Dtype |
|---|---|
| openai/whisper-large-v3-turbo (Encoder) | float32 |
| openai/whisper-large-v3-turbo (Decoder) | float32 |
| openai/whisper-large-v3 (Encoder) | float32 |
| openai/whisper-large-v3 (Decoder) | float32 |
| laion/clap-htsat-fused | float32 |
| sesame/csm-1b (Backbone) | float32 |
| neuphonic/neutts-air | float32 |
| LiquidAI/LFM2-Audio-1.5B | float32 |
Diffusion pipelines are evaluated per component (Text Encoder / UNet or DiT / VAE) because TRT does not ingest the pipeline object directly.
| Pipeline (Component) | Dtype |
|---|---|
| stabilityai/sd-turbo | float16 |
| stabilityai/sdxl-turbo (UNet) | float16 |
| stabilityai/sdxl-turbo (VAE / Text Encoders) | mixed |
| stabilityai/stable-diffusion-xl-base-1.0 | float16 |
| CompVis/stable-diffusion-v1-4 | float16 |
| stable-diffusion-v1-5/stable-diffusion-v1-5 | float16 |
| stabilityai/stable-diffusion-2-1 | float16 |
| playgroundai/playground-v2.5-1024px-aesthetic | float16 |
| dataautogpt3/ProteusV0.3 | float16 |
| black-forest-labs/FLUX.2-dev (Text Encoder) | bfloat16 |
| black-forest-labs/FLUX.2-dev (DiT) | bfloat16 |
| black-forest-labs/FLUX.2-dev (VAE) | float16 |
| black-forest-labs/FLUX.1-schnell (DiT / TextEnc / VAE) | mixed |
| Wan-AI/Wan2.2-T2V-A14B-Diffusers (Text Encoder) | float16 |
| Wan-AI/Wan2.2-T2V-A14B-Diffusers (VAE) | float16 |
| Qwen/Qwen-Image (Text Encoder) | bfloat16 |
| Qwen/Qwen-Image (DiT / VAE) | bfloat16 |
| stabilityai/stable-diffusion-3-medium-diffusers | bfloat16 |
| stabilityai/stable-diffusion-3.5-medium / 3.5-large | mixed |
| HiDream-ai/HiDream-I1-Full | bfloat16 |
| stabilityai/stable-video-diffusion-img2vid-xt | float16 |

| Model | Dtype |
|---|---|
| openai/clip-vit-base-patch32 | float32 |
| deepseek-ai/Janus-Pro-7B | bfloat16 |
| Datadog/Toto-Open-Base-1.0 | float32 |
TensorRT ships hand-validated C++/Python samples for these classic architectures and workflows:
- MNIST digit classifiers, model parsing, dynamic-shape, plugin, and safe-runtime samples: see samples/ in this repo.
File a GitHub issue with:
- The Hugging Face ID or model source URL.
- The target dtype (fp32 / fp16 / bf16 / fp8 / int8 / int4).
- Any framework-level working example (helps us reproduce quickly).
The maintainers will benchmark the model and extend this table; no action from external contributors is needed for the benchmark step.
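As a rough illustration of the three items requested above, the sketch below assembles them into plain text and maps the short dtype names (fp32, fp16, bf16) to the long names used in the tables. The helper name, the field labels, and the alias table are assumptions made for this example; the repository may define its own issue template.

```python
# Short dtype names from the request list mapped to the long names used in the
# tables. fp8/int8/int4 are kept unchanged because the tables above do not
# show a long spelling for them.
DTYPE_ALIASES = {
    "fp32": "float32",
    "fp16": "float16",
    "bf16": "bfloat16",
    "fp8": "fp8",
    "int8": "int8",
    "int4": "int4",
}


def build_coverage_request(model_source: str, dtype: str, example_url: str = "") -> str:
    """Format a coverage request as plain text (hypothetical helper)."""
    key = dtype.strip().lower()
    if key not in DTYPE_ALIASES:
        raise ValueError(
            f"unknown dtype {dtype!r}; expected one of {sorted(DTYPE_ALIASES)}"
        )
    lines = [
        f"Model: {model_source}",
        f"Target dtype: {DTYPE_ALIASES[key]}",
    ]
    # The working example is optional but helps maintainers reproduce quickly.
    if example_url:
        lines.append(f"Working example: {example_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    # "org/model" is a placeholder, not a real Hugging Face ID.
    print(build_coverage_request("org/model", "bf16"))
```

Rejecting unknown dtype strings early keeps requests limited to the precisions named in the list above.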