| title | What is DeepInfra |
|---|---|
| description | AI inference cloud — OpenAI-compatible API, 100s of open-source models, private GPU deployments, and GPU rental. |
| icon | bolt |
DeepInfra is an AI inference cloud that makes it simple to run the latest machine learning models at scale — LLMs, vision, embeddings, image generation, video generation, speech, and more.
OpenAI-compatible API for 100+ LLMs. Swap your base URL, keep your code.
Multimodal models for visual understanding and document text extraction.
State-of-the-art embedding and reranker models for search and RAG.
Image and video generation with FLUX, Stable Diffusion, text-to-video models, and more.
Speech recognition (Whisper) and text-to-speech models.
Run your own fine-tuned LLM on A100 / H100 / H200 / B200 / B300 with autoscaling.
Drop-in OpenAI replacement. Point your existing OpenAI SDK to https://api.deepinfra.com/v1/openai and your code works without changes. No migration required.
Best price for open-source models. DeepInfra consistently offers the lowest prices for open-source model inference. You pay only per token, with no idle GPU time, no minimums, and no seat fees. DeepInfra is also the provider with the most models on OpenRouter.
Always-fresh model catalog. DeepInfra is typically among the first providers to deploy a newly released model.
Private deployments for compliance and customization. Need to run your own fine-tuned weights, or require data isolation? Deploy a dedicated instance on A100/H100/H200/B200/B300 with autoscaling and a private endpoint — competitive GPU pricing, deployable in just a few clicks.
GPU Clusters for training and full control. Rent a B200 or B300 cluster with SSH access and run whatever you want.
Make your first API call. No installation required.
```python
import os

from openai import OpenAI

client = OpenAI(
    # Read the token from the environment instead of hardcoding it.
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
Get your API key from the Dashboard.
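The SDK call above can also be sketched with the Python standard library alone, which may help where installing the OpenAI SDK is not an option. This is a minimal sketch, not DeepInfra's reference client. It assumes the endpoint follows the OpenAI path layout (`/chat/completions` under the base URL given on this page) and accepts the token as a Bearer header. The helper names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request

# Base URL taken from this page; the /chat/completions suffix is an
# assumption based on the OpenAI API path layout.
BASE_URL = "https://api.deepinfra.com/v1/openai"


def build_chat_request(token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Bearer auth is assumed, matching the OpenAI convention.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (performs a network call, so it is left commented out):
# import os
# request = build_chat_request(
#     os.environ["DEEPINFRA_TOKEN"], "deepseek-ai/DeepSeek-V3", "Hello!"
# )
# print(send_chat_request(request))
```

Building the request is kept separate from sending it so the payload and headers can be inspected or logged before any network traffic occurs.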