Competitors

16 companies
Fireworks AI (CRITICAL)
DIRECT COMPETITOR. Pure inference-as-a-service, 10T tok/day, 10K+ customers (Cursor, DoorDash). Must match or beat $0.20/M pricing via energy cost advantage.
Category: Inference Platform | Valuation: $4B | Revenue: ~$280M ARR
Nebius (CRITICAL)
DIRECT COMPETITOR. Token Factory = managed inference at $0.13/M (lowest published). 69% gross margin. Must undercut via energy advantage or compete on sovereign/compliance.
Category: GPU AI Cloud | Valuation: $9.6B (public) | Revenue: $117.7M (Q3 2024)
Cerebras (HIGH)
Non-GPU inference at 40M+ tok/sec threatens GPU-based cost assumptions. IPO Q2 2026. Evaluate as potential compute partner or technology licensor.
Category: Custom Silicon | Valuation: $22B (pre-IPO)
Groq (HIGH)
Nvidia acquisition signals inference hardware consolidation. The LPU's deterministic, low-latency serving (877 tok/s) sets the benchmark. Pricing is higher than infrastructure-first providers' target range.
Category: Custom Silicon | Valuation: ~$20B (Nvidia acq.) | Revenue: $500M target (2025)
Baseten (HIGH)
Nvidia's $150M investment signals strategic interest. Custom C++ engine targets enterprise inference workloads. Expanding into training would create a full-stack competitor.
Category: Inference Platform | Valuation: $5B
Crusoe (HIGH)
DIRECT COMPETITOR. Closest energy-to-inference model. Vertically integrated (owns energy + DCs). Key differentiation: energy cost structure and dual revenue streams.
Category: GPU AI Cloud | Valuation: $10B+ (Oct 2025) | Revenue: ~$1B (projected 2025)
DeepInfra (HIGH)
Price floor leader at $0.03/M input. 8,000x volume growth since seed. SOC2 + ISO 27001 certified. Lean team (~15 employees) with Blackwell GPU advantage. Must monitor as cost benchmark.
Category: Inference Platform | Valuation: ~$100M (est.) | Revenue: ~$3.8M
CoreWeave (MEDIUM)
Crypto-to-AI pivot. The $55.6B backlog is GPU rental/training, not managed inference. Potential partner for GPU supply. Watch for an inference API launch.
Category: GPU AI Cloud | Valuation: $49B (public) | Revenue: $3.6B (9-mo 2024)
Together AI (MEDIUM)
Prices at ~breakeven with FlashAttention optimization. Energy cost advantage is key to sustainable margins. Potential integration partner for model serving.
Category: GPU AI Cloud | Valuation: $3.3B | Revenue: ~$300M ARR
OpenRouter (MEDIUM)
Distribution channel opportunity: list inference endpoints on OpenRouter for demand generation. Their a16z 100T-token study shows inference demand shifting to code + reasoning.
Category: Aggregator / Marketplace | Valuation: Undisclosed
Replicate (MEDIUM)
Acquired by Cloudflare Nov 2025. Its 50K-model marketplace is a distribution play. Cold-start latency (60s+) limits production use. Now part of the $30B+ Cloudflare edge network.
Category: Inference Platform | Valuation: $350M (pre-acq.) | Revenue: ~$5.3M
Lepton AI (MEDIUM)
Acquired by NVIDIA Apr 2025 and rebranded as DGX Cloud Lepton. Founded by Caffe creator Yangqing Jia. Now NVIDIA's multi-cloud GPU marketplace connecting developers to CoreWeave, Crusoe, and Lambda.
Category: Inference Platform | Valuation: Undisclosed
Modal (MEDIUM)
Developer-first serverless GPU platform. Built in Rust with sub-1s cold starts. $1.1B unicorn, in talks for a $2.5B round. 90% of workloads are inference. Different approach: compute platform vs. managed inference.
Category: Inference Platform | Valuation: $1.1B | Revenue: ~$50M ARR
Lambda (LOW)
Pure GPU rental with zero egress fees. Not in managed inference today. Potential GPU supply partner. Monitor for inference API announcements.
Category: GPU AI Cloud | Valuation: $4B+ | Revenue: $425M (2024)
SambaNova (LOW)
Cautionary tale: a $5B peak valuation collapsed to a $1.6B Intel offer. Validates the GPU-agnostic approach over custom-silicon lock-in. Potential acqui-hire talent pool.
Category: Custom Silicon | Valuation: $1.6B (Intel offer)
Inference.net (LOW)
Marketplace model for custom LLM inference. Potential distribution partner. Claims 90% cost reduction. a16z + Multicoin backing.
Category: Aggregator / Marketplace | Valuation: Undisclosed
Pricing (per 1M tokens, standard models)

Provider        Reference model       Input                  Output
CoreWeave       GPU hourly            $4.25/hr (H100 PCIe)   --
Cerebras        Llama 3 70B           $0.60/M                $0.60/M
Fireworks AI    Llama 3.1 8B          $0.20/M                $0.20/M
Groq            Llama 3 70B           $0.59/M                $0.79/M
Together AI     Llama 3.1 8B          $0.20/M                $0.20/M
Baseten         Custom models         --                     --
OpenRouter      500+ models           --                     --
SambaNova       DeepSeek R1 671B      --                     --
Inference.net   Custom fine-tuned     --                     --
Nebius          Llama 3 70B           $0.13/M                $0.40/M
Replicate       50K+ open models      --                     --
Lepton AI       Multi-cloud GPU       --                     --
DeepInfra       Llama 3.1 8B          $0.03/M                $0.05/M
Modal           Custom deployments    $3.95/hr (H100)        --
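CoreWeave and Modal publish per-GPU hourly rates rather than per-token prices, so they are not directly comparable with the per-token rows above. A minimal sketch of the conversion from hourly rate to an effective $/1M-token cost; the 1,500 tok/s sustained throughput and 70% utilization figures are illustrative assumptions, not vendor-published numbers:

```python
# Convert a per-GPU hourly rental rate into an approximate cost per
# 1M tokens, given an assumed sustained throughput and utilization.
# All throughput/utilization inputs are assumptions for illustration.

SECONDS_PER_HOUR = 3600

def hourly_to_per_million_tokens(gpu_hourly_usd: float,
                                 tokens_per_second: float,
                                 utilization: float = 0.7) -> float:
    """Effective $ per 1M tokens for a rented GPU."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR * utilization
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Example: CoreWeave H100 PCIe at $4.25/hr, assuming ~1,500 tok/s
# sustained for a Llama-3.1-8B-class model at 70% utilization.
print(round(hourly_to_per_million_tokens(4.25, 1500), 3))  # prints 1.124
```

Under these assumptions the hourly providers land around $1.12/M tokens, well above the managed-inference price floor ($0.03-$0.20/M), which is why raw GPU rental and managed inference occupy different competitive tiers.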