| Category | Type | Context | Strength | Model |
|---|---|---|---|---|
| General | Instruction | 128K | General assistant | |
| Enterprise | Efficient | 128K | Enterprise AI | |
| Reasoning | Advanced | 128K | Reasoning | |
| Reasoning | Efficient | 128K | Distilled reasoning | |
| General | Multilingual | 128K | General chat | |
| General | Improved | 128K | General chat | |
| Fast | Low Latency | 128K | Fast inference | |
| Reasoning | High Performance | 256K | Advanced reasoning | |
| Chat | Multilingual | 128K | Conversational AI | |
| Reasoning | Multilingual | 128K | Reasoning | |
| Reasoning | Long Context | 200K | Reasoning | |
| Enterprise | Lightweight | 32K | Enterprise AI | |
| Multimodal | Chat | 128K | Vision + chat | |
| Open | Efficient | 32K | Open-weight AI | |
| Chat | Instruction | 200K | Conversational AI | |
| Multimodal | Vision + Audio | 128K | Omni AI | |
| Enterprise | High Performance | 128K | Enterprise inference | |
| Open | Large Model | 128K | Open reasoning | |
| General | Multilingual | 128K | General AI | |
| Reasoning | Advanced | 256K | Reasoning | |
| Coding | Advanced | 256K | Agentic coding | |
| Coding | Balanced | 128K | Software engineering | |
| Coding | Instruct | 128K | Code generation | |
| Coding | Large | 256K | Massive coding | |
| Coding | Fast | 128K | Fast coding | |
| Vision | Instruction | 128K | Vision understanding | |
| Vision | Reasoning | 128K | Vision reasoning | |
| Vision | Efficient | 128K | Vision tasks | |
| Vision | Lightweight | 128K | Light vision tasks | |
| Vision | Fast | 128K | Fast multimodal | |
| Vision | Advanced | 128K | Advanced multimodal | |
| General | Large | 128K | Multilingual AI | |
| General | Efficient | 128K | Efficient inference | |
| Reasoning | Balanced | 128K | Reasoning | |
| General | Massive | 256K | Large-scale AI | |
| Fast | Efficient | 128K | Fast inference | |
| General | Balanced | 128K | General AI | |
| General | Efficient | 128K | General inference | |
| Reasoning | Balanced | 128K | Reasoning | |
| General | Preview | 256K | High performance | |
| General | Advanced | 128K | Advanced AI | |
| Chat | Long Context | 200K | Long-context AI | |
| Chat | Advanced | 200K | Conversational AI | |
| Coding | Reliable | 1M | Software engineering | |
| Multimodal | Omni | 128K | Vision + chat | |
| Fast | Efficient | 128K | Lightweight multimodal | |
| Multimodal | Advanced | 1M | Long-context multimodal | |
| Fast | Efficient | 1M | Fast multimodal | |
| Fast | Preview | 1M | Next-gen flash AI | |
| General | Preview | 1M | Advanced reasoning | |
| OCR | Document AI | 128K | OCR + extraction | |
| Fast | Efficient | 200K | Ultra-fast chat | |
| Reasoning | Premium | 200K | High-end reasoning | |
| Reasoning | Advanced | 200K | Advanced reasoning | |
| Reasoning | Top Tier | 200K | Top-tier reasoning | |
| Chat | Balanced | 200K | General assistant | |
| Chat | Reasoning | 200K | Conversational reasoning | |
| General | Balanced | 128K | General AI | |
| Reasoning | Frontier | 1M | Flagship reasoning | |
| Reasoning | Efficient | 1M | Fast reasoning | |
| Fast | Ultra Efficient | 1M | Ultra-low latency | |

Run powerful AI models via simple APIs, with no infrastructure required. We handle routing, scaling, tuning, and reliability so your team can focus on building.

Need higher performance or predictable workloads? Launch dedicated GPU instances with lower latency, more control, and consistent performance.

As demand grows, scale to high-performance infrastructure. Move to bare metal and AI appliances for maximum performance and lower cost at scale.

From zero setup to dedicated compute to hyperscale infrastructure, all in one platform.