Everything you need to know about Fleek, our pricing, and how we make AI inference faster and cheaper.
AI inference platform. We serve optimized versions of popular models via API. Bring your own models too—we'll optimize them. Up to 70% lower cost, 3x faster, zero quality loss.
Most inference platforms just host your model on a GPU. We optimize two layers:

1. Model — NVFP4 quantization, custom kernels, precision tuning. 3x faster.
2. GPU — MicroVM infrastructure, sub-second cold starts, 95%+ utilization vs. 30-50% typical.

Both layers compound. That's where the 70% savings come from.
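Here's a rough sketch of how the compounding shows up in per-request cost. The request durations are illustrative assumptions, not benchmarks; only the $0.0025/GPU-second rate comes from our pricing.

```python
# Illustrative arithmetic only; the request durations below are assumptions,
# not published benchmarks.
FLEEK_RATE = 0.0025  # dollars per GPU-second

# Model layer: assume a request that takes 3.0 GPU-seconds unoptimized
# finishes in 1.0 GPU-second after optimization (the "3x faster" claim).
unoptimized_cost = 3.0 * FLEEK_RATE  # $0.0075 per request
optimized_cost = 1.0 * FLEEK_RATE    # $0.0025 per request

savings = 1 - optimized_cost / unoptimized_cost
print(f"model layer alone: ~{savings:.0%} cheaper per request")  # ~67%

# GPU layer: 95%+ utilization vs. the typical 30-50% keeps the per-GPU-second
# rate itself low, which is where the rest of the quoted ~70% comes from.
```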
At launch: popular open-source models for image, video, multimodal, and select LLMs (DeepSeek R1, Llama, FLUX, etc.). Custom model optimization is coming soon—you'll be able to upload any model (public or private) and get the same optimization at the same $0.0025/GPU-sec pricing.
Coming soon. We're building support for any model—not just open-source. Upload your fine-tuned weights, proprietary model, or any PyTorch model. Same optimization process, same pricing, no custom model premium. Launching in the coming weeks.
Our research lab. Weyl focuses on fundamental breakthroughs in efficient inference. The work there powers Fleek's products.
$0.0025 per GPU-second. Not per token, not per image. You pay for compute time. When our optimization makes models faster, you pay less automatically. Save up to 70% vs. competitors.
One second of GPU compute time. Simple, transparent pricing at $0.0025 per GPU-second. Our optimization gains pass directly to you as lower costs.
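To see how that bills out in practice, here's a quick example. The per-request duration and monthly volume below are illustrative assumptions; only the $0.0025/GPU-second rate is ours.

```python
# Hypothetical usage numbers; only the $0.0025/GPU-second rate comes from our pricing.
RATE_PER_GPU_SECOND = 0.0025

gpu_seconds_per_request = 1.2   # assumed duration of one optimized inference call
requests_per_month = 100_000    # assumed monthly volume

cost_per_request = gpu_seconds_per_request * RATE_PER_GPU_SECOND  # $0.003
monthly_cost = cost_per_request * requests_per_month              # $300

print(f"${cost_per_request:.4f} per request, ${monthly_cost:,.2f} per month")
# -> $0.0030 per request, $300.00 per month
```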
The range reflects our current B200 infrastructure and our optimized GB200 NVL72 infrastructure as it rolls out. The lower price is what you'll pay as we deploy the new hardware. You automatically get the best available rate.
No. Same rates. When custom model optimization launches (coming soon), you'll paste a HuggingFace URL or upload your private weights, we'll optimize the model, and the same $0.0025/GPU-sec pricing applies. No custom model premium.
Yes. Monthly cap in your dashboard. Alerts at 80%, hard stop at 100%.
We built our own inference stack with s4 codegen triggering native Blackwell FP4 tactics. We achieve industry-leading throughput, and our GPU-second pricing passes those gains directly to you.
Each tenant gets isolated GPU access through our abstraction layer. Your workloads can't see or interfere with other tenants. We handle the multiplexing—you just send requests.
On the roadmap. Same optimization stack, smaller devices: Jetson Orin, Jetson Xavier, and Jetson Thor.
We run on NVIDIA B200, GB200 NVL72, and RTX PRO 6000 (all Blackwell architecture) with native FP4 support. DGX Spark and Jetson Thor support coming soon for edge and embedded AI deployments.
Yes. Beyond standard encryption and SOC 2 Type II (in progress), our infrastructure is built on formally verified foundations. Core components are proven correct in Lean4 with cryptographic attestation at every layer. This isn't marketing—it's math. Enterprise customers can get private deployments with VPCs, audit logging, and access to our verification proofs. Contact us to learn more.
No. You only pay for active compute time. When your inference request completes, billing stops. No idle charges, no minimums, no reserved capacity fees.
Free tier: Email support and Discord community. Pro: Priority email support. Enterprise: Dedicated Slack channel, 24/7 support, and a named account manager.
Yes. Enterprise customers with high-volume workloads qualify for custom pricing. Contact sales to discuss your specific needs.
Yes. Enterprise on-prem deployment is available. Bring your own GPUs and we'll run our optimization stack on your hardware. Contact sales to get started.
On the roadmap. Claude, Cursor, and Windsurf integrations coming.
Full REST API with OpenAPI spec at launch. Python SDK planned.
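Until the OpenAPI spec is published, here's a minimal sketch of what a request could look like. The base URL, endpoint path, payload fields, and model identifier are placeholders, not the real API.

```python
# Hypothetical request shape: the URL, JSON fields, and response handling below
# are placeholders, not the published OpenAPI spec.
import os
import requests

API_BASE = "https://api.fleek.example/v1"   # placeholder base URL
API_KEY = os.environ["FLEEK_API_KEY"]       # assumed bearer-token auth

resp = requests.post(
    f"{API_BASE}/inference",                # placeholder endpoint path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "flux-schnell",            # placeholder model identifier
        "input": {"prompt": "a lighthouse at dusk"},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # billing is per GPU-second of compute used by this request
```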
$5 in credits. No card required.