index.mdx

title	What is DeepInfra
description	AI inference cloud — OpenAI-compatible API, 100s of open-source models, private GPU deployments, and GPU rental.
icon	bolt

DeepInfra is an AI inference cloud that makes it simple to run the latest machine learning models at scale — LLMs, vision, embeddings, image generation, video generation, speech, and more.

What you can do

OpenAI-compatible API for 100+ LLMs. Swap your base URL, keep your code.

Multimodal models for visual understanding and document text extraction.

State-of-the-art embedding and reranker models for search and RAG.

Image and video generation: FLUX, Stable Diffusion, text-to-video, and more.

Speech recognition (Whisper) and text-to-speech models.

Run your own fine-tuned LLM on A100 / H100 / H200 / B200 / B300 with autoscaling.

Why DeepInfra

Drop-in OpenAI replacement. Point your existing OpenAI SDK to https://api.deepinfra.com/v1/openai and your code works without changes. No migration required.

Best price for open-source models. DeepInfra consistently offers the lowest prices for open-source model inference. You only pay per token — no idle GPU time, no minimums, no seat fees. DeepInfra is also the provider with the most models on OpenRouter.
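To make per-token billing concrete, here is a minimal sketch of estimating the cost of a single request from its token counts. It assumes prices are quoted in USD per 1M tokens with separate input and output rates. The two price constants are hypothetical placeholders, not real DeepInfra rates, and `estimate_cost` is an illustrative helper rather than part of any SDK.

```python
# Hypothetical prices in USD per 1M tokens. These are placeholders,
# not real rates; substitute the values listed for the model you use.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(prompt_tokens, completion_tokens,
                  input_price=INPUT_PRICE_PER_M,
                  output_price=OUTPUT_PRICE_PER_M):
    """Return the estimated USD cost of one request.

    Assumes input and output tokens are billed at separate
    per-1M-token rates and nothing else is charged.
    """
    total = prompt_tokens * input_price + completion_tokens * output_price
    return total / 1_000_000


# 2,000 prompt tokens and 1,000 completion tokens at the placeholder rates.
cost = estimate_cost(2_000, 1_000)
print(f"${cost:.4f}")
```

With the OpenAI Python SDK, the token counts for a chat completion are available as `response.usage.prompt_tokens` and `response.usage.completion_tokens`, so they can be passed straight into a helper like this.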

Always-fresh model catalog. DeepInfra is typically among the first providers to deploy a newly released model.

Private deployments for compliance and customization. Need to run your own fine-tuned weights, or require data isolation? Deploy a dedicated instance on A100/H100/H200/B200/B300 with autoscaling and a private endpoint — competitive GPU pricing, deployable in just a few clicks.

GPU Clusters for training and full control. Rent a B200 or B300 cluster with SSH access and run whatever you want.

Get started in 60 seconds

Make your first API call — no installation required.
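As one way to make that call with nothing beyond the Python standard library, the sketch below builds and sends a chat completion request by hand. It assumes the chat route is the base URL from this page followed by the standard OpenAI-style `/chat/completions` path, and that the token is sent as a bearer token. `build_chat_request` and `chat` are illustrative helper names, not part of any DeepInfra library.

```python
import json
import urllib.request

# Assumption: the OpenAI-compatible chat route is the base URL
# plus the standard "/chat/completions" suffix.
BASE_URL = "https://api.deepinfra.com/v1/openai"


def build_chat_request(token, prompt, model="deepseek-ai/DeepSeek-V3"):
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(token, prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(token, prompt)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Calling `chat(os.environ["DEEPINFRA_TOKEN"], "Hello!")` after `import os` would send the request, provided the `DEEPINFRA_TOKEN` environment variable holds your API key.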

Quick example

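import os

from openai import OpenAI

# Read the API token from the DEEPINFRA_TOKEN environment variable.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

# Send a single-turn chat request and print the model's reply.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)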

Get your API key from the Dashboard.
