Competitors (16 companies)

| Company | Category | Threat | Valuation | Revenue | Assessment |
|---|---|---|---|---|---|
| Fireworks AI | Inference Platform | CRITICAL | $4B | ~$280M ARR | DIRECT COMPETITOR. Pure inference-as-a-service; 10T tok/day; 10K+ customers (Cursor, DoorDash). Must match or beat $0.20/M pricing via energy cost advantage. |
| Nebius | GPU AI Cloud | CRITICAL | $9.6B (public) | $117.7M (Q3 2024) | DIRECT COMPETITOR. Token Factory = managed inference at $0.13/M (lowest published). 69% gross margin. Must undercut via energy advantage or compete on sovereign/compliance. |
| Cerebras | Custom Silicon | HIGH | $22B (pre-IPO) | Undisclosed | Non-GPU inference at 40M+ tok/sec threatens GPU-based cost assumptions. IPO Q2 2026. Evaluate as potential compute partner or technology licensor. |
| Groq | Custom Silicon | HIGH | ~$20B (Nvidia acq.) | $500M target 2025 | Nvidia acquisition signals inference hardware consolidation. LPU's deterministic latency (877 tok/s) sets the benchmark. Pricing sits above infrastructure-first providers' target range. |
| Baseten | Inference Platform | HIGH | $5B | Undisclosed | Nvidia's $150M investment signals intent. Custom C++ engine targets enterprise inference workloads. Expansion into training creates a full-stack competitor. |
| Crusoe | GPU AI Cloud | HIGH | $10B+ (Oct 2025) | ~$1B (projected 2025) | DIRECT COMPETITOR. Closest energy-to-inference model. Vertically integrated (owns energy + DCs). Key differentiation: energy cost structure and dual revenue streams. |
| DeepInfra | Inference Platform | HIGH | ~$100M (est.) | ~$3.8M | Price-floor leader at $0.03/M input. 8,000x volume growth since seed. SOC 2 + ISO 27001 certified. Lean team (~15 employees) with Blackwell GPU advantage. Monitor as cost benchmark. |
| CoreWeave | GPU AI Cloud | MEDIUM | $49B (public) | $3.6B (9-mo 2024) | Crypto-to-AI pivot. $55.6B backlog is GPU rental/training, not managed inference. Potential partner for GPU supply. Watch for an inference API launch. |
| Together AI | GPU AI Cloud | MEDIUM | $3.3B | ~$300M ARR | Prices at ~breakeven with FlashAttention optimization. Energy cost advantage is key to sustainable margins. Potential integration partner for model serving. |
| OpenRouter | Aggregator / Marketplace | MEDIUM | Undisclosed | Undisclosed | Distribution-channel opportunity: list inference endpoints on OpenRouter for demand generation. Their a16z 100T-token study shows inference demand shifting to code + reasoning. |
| Replicate | Inference Platform | MEDIUM | $350M (pre-acq.) | ~$5.3M | Acquired by Cloudflare Nov 2025. 50K-model marketplace is a distribution play. Cold-start latency (60s+) limits production use. Now part of the $30B+ Cloudflare edge network. |
| Lepton AI | Inference Platform | MEDIUM | Undisclosed | Undisclosed | Acquired by NVIDIA Apr 2025; rebranded as DGX Cloud Lepton. Founded by Caffe creator Yangqing Jia. Now NVIDIA's multi-cloud GPU marketplace connecting devs to CoreWeave, Crusoe, Lambda. |
| Modal | Inference Platform | MEDIUM | $1.1B | ~$50M ARR | Developer-first serverless GPU platform built in Rust with sub-1s cold starts. $1.1B unicorn, in talks for a $2.5B round. 90% of workloads are inference. Different approach: compute platform vs managed inference. |
| Lambda | GPU AI Cloud | LOW | $4B+ | $425M (2024) | Pure GPU rental with zero egress fees. Not in managed inference today. Potential GPU supply partner. Monitor for inference API announcements. |
| SambaNova | Custom Silicon | LOW | $1.6B (Intel offer) | Undisclosed | Cautionary tale: $5B peak valuation collapsed to a $1.6B Intel offer. Validates a GPU-agnostic approach over custom-silicon lock-in. Potential acqui-hire talent pool. |
| Inference.net | Aggregator / Marketplace | LOW | Undisclosed | Undisclosed | Marketplace model for custom LLM inference. Potential distribution partner. Claims 90% cost reduction. a16z + Multicoin backing. |
Pricing (per 1M tokens, standard models)
| Provider | Category | Model | Input $/1M | Output $/1M | Notes |
|---|---|---|---|---|---|
| CoreWeave | GPU AI Cloud | GPU hourly | -- | -- | $4.25/hr (H100 PCIe) |
| Cerebras | Custom Silicon | Llama 3 70B | $0.60 | $0.60 | |
| Fireworks AI | Inference Platform | Llama 3.1 8B | $0.20 | $0.20 | |
| Groq | Custom Silicon | Llama 3 70B | $0.59 | $0.79 | |
| Together AI | GPU AI Cloud | Llama 3.1 8B | $0.20 | $0.20 | |
| Baseten | Inference Platform | Custom models | -- | -- | |
| OpenRouter | Aggregator / Marketplace | 500+ models | -- | -- | |
| SambaNova | Custom Silicon | DeepSeek R1 671B | -- | -- | |
| Inference.net | Aggregator / Marketplace | Custom fine-tuned | -- | -- | |
| Nebius | GPU AI Cloud | Llama 3 70B | $0.13 | $0.40 | $2.60/hr (H100 SXM) |
| Replicate | Inference Platform | 50K+ open models | -- | -- | |
| Lepton AI | Inference Platform | Multi-cloud GPU | -- | -- | |
| DeepInfra | Inference Platform | Llama 3.1 8B | $0.03 | $0.05 | |
| Modal | Inference Platform | Custom deployments | -- | -- | $3.95/hr (H100) |
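Per-token and GPU-hourly prices are not directly comparable: a rented GPU's effective $/1M tokens depends entirely on sustained throughput. A minimal sketch of both conversions, using the published rates above; the 2,000 tok/s throughput figure and the sample workload are illustrative assumptions, not vendor claims:

```python
# (input $/1M tok, output $/1M tok) as published in the pricing table above
PER_TOKEN = {
    "Nebius (Llama 3 70B)":      (0.13, 0.40),
    "DeepInfra (Llama 3.1 8B)":  (0.03, 0.05),
    "Fireworks (Llama 3.1 8B)":  (0.20, 0.20),
    "Groq (Llama 3 70B)":        (0.59, 0.79),
    "Cerebras (Llama 3 70B)":    (0.60, 0.60),
}

def workload_cost(input_mtok: float, output_mtok: float, rates: tuple) -> float:
    """Monthly bill for a workload measured in millions of tokens."""
    in_rate, out_rate = rates
    return input_mtok * in_rate + output_mtok * out_rate

def gpu_hourly_to_per_mtok(dollars_per_hour: float, tok_per_sec: float) -> float:
    """Effective $/1M tokens for a rented GPU at an assumed sustained throughput."""
    tokens_per_hour = tok_per_sec * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000

# Hypothetical workload: 500M input + 100M output tokens per month
for name, rates in PER_TOKEN.items():
    print(f"{name:28s} ${workload_cost(500, 100, rates):>8,.2f}/mo")

# CoreWeave H100 at $4.25/hr with an ASSUMED 2,000 tok/s sustained throughput
print(f"H100 @ 2,000 tok/s: ${gpu_hourly_to_per_mtok(4.25, 2000):.2f}/1M tok")
```

One takeaway from the arithmetic: at $4.25/hr, an H100 must sustain roughly 5,900 tok/s (4.25 ÷ 0.20 = 21.25M tok/hr) before self-hosting undercuts a $0.20/M managed price, which is why utilization, not list price, decides the rent-vs-buy question.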