Competitors

21 companies
Fireworks AI (CRITICAL)
DIRECT COMPETITOR. Pure inference-as-a-service, 10T tok/day, 10K+ customers (Cursor, DoorDash). Must match or beat $0.20/M pricing via energy cost advantage.
Inference Platform | $4B | ~$280M ARR
Nebius (CRITICAL)
DIRECT COMPETITOR. Token Factory = managed inference at $0.13/M (lowest published). 69% gross margin. Must undercut via energy advantage or compete on sovereign/compliance.
GPU AI Cloud | $9.6B (public) | $117.7M (Q3 2024)
Cerebras (HIGH)
Non-GPU inference at 40M+ tok/sec threatens GPU-based cost assumptions. IPO Q2 2026. Evaluate as potential compute partner or technology licensor.
Custom Silicon | $22B (pre-IPO)
Groq (HIGH)
Nvidia acquisition signals inference hardware consolidation. The LPU's deterministic, low-latency serving (877 tok/s) sets the speed benchmark. Pricing is higher than infrastructure-first providers' target range.
Custom Silicon | ~$20B (Nvidia acq.) | $500M target 2025
Baseten (HIGH)
Nvidia's $150M investment signals intent. Custom C++ engine targets enterprise inference workloads. Expanding to training creates a full-stack competitor.
Inference Platform | $5B
Crusoe (HIGH)
DIRECT COMPETITOR. Now a managed inference platform, not just a GPU cloud. MemoryAlloy: 9.9x faster TTFT vs vLLM. BYOM custom model support. 5x bookings growth. Product team led by ex-Google Cloud AI SVP Erwan Menard (ex-Vertex AI).
GPU AI Cloud | $10B+ (Oct 2025) | ~$1B (projected 2025)
DeepInfra (HIGH)
Price floor leader at $0.03/M input. 8,000x volume growth since seed. SOC2 + ISO 27001 certified. Lean team (~15 employees) with Blackwell GPU advantage. Must monitor as cost benchmark.
Inference Platform | ~$100M (est.) | ~$3.8M
Inferact (HIGH)
Commercializing vLLM, the open-source engine behind 400K+ GPUs. Meta, Google, Amazon in production. The software layer MARA likely uses. Monitor commercial offerings closely.
Inference Platform | $800M
Cloudflare Workers AI (HIGH)
Distribution moat: 332K paying customers, 50K+ models via Replicate acquisition. Edge inference in 330+ cities. Threat is distribution, not raw performance.
Inference Platform | $68.8B (public) | $614.5M (Q4 2025)
Taalas (HIGH)
Model-specific chips deliver 73x H200 performance at 1/10th power. Trade flexibility for extreme efficiency. Air-cooled PCIe form factor fits modular containers. Evaluate as compute partner.
Custom Silicon | ~$500M (est.) | Pre-revenue
CoreWeave (MEDIUM)
Crypto-to-AI pivot. $55.6B backlog is GPU rental/training, not managed inference. Potential partner for GPU supply. Watch for inference API launch.
GPU AI Cloud | $49B (public) | $3.6B (9-mo 2024)
Together AI (MEDIUM)
Prices at ~breakeven with FlashAttention optimization. Energy cost advantage is key to sustainable margins. Potential integration partner for model serving.
GPU AI Cloud | $3.3B | ~$300M ARR
OpenRouter (MEDIUM)
Distribution channel opportunity: list inference endpoints on OpenRouter for demand generation. Their 100T-token study with a16z shows inference demand shifting to code and reasoning workloads.
Aggregator / Marketplace | Undisclosed
Replicate (MEDIUM)
Acquired by Cloudflare Nov 2025. 50K model marketplace is a distribution play. Cold-start latency (60s+) limits production use. Now part of $30B+ Cloudflare edge network.
Inference Platform | $350M (pre-acq.) | ~$5.3M
Lepton AI (MEDIUM)
Acquired by NVIDIA Apr 2025. Rebranded as DGX Cloud Lepton. Founded by Caffe creator Yangqing Jia. Now NVIDIA's multi-cloud GPU marketplace connecting devs to CoreWeave, Crusoe, Lambda.
Inference Platform | Undisclosed
Modal (MEDIUM)
Developer-first serverless GPU platform. Built in Rust with sub-1s cold starts. $1.1B unicorn, in talks for a $2.5B round. 90% of workloads are inference. Different approach: general compute platform vs. managed inference.
Inference Platform | $1.1B | ~$50M ARR
fal.ai (MEDIUM)
Fastest ARR growth in inference: $25M to $200M in 12 months. Media/multimodal focus today, expanding to LLMs. Sequoia + NVIDIA backed. Watch for LLM pivot.
Inference Platform | $4.5B | ~$200M ARR (est.)
Nscale (MEDIUM)
Largest European AI infrastructure company. $1.1B Series B. Stargate Norway with OpenAI (100K GPUs). $14B Microsoft deal value. Sovereign cloud competitor and potential model for MARA's European expansion.
GPU AI Cloud | $2B+ | Pre-platform revenue
Lambda (LOW)
Pure GPU rental with zero egress fees. Not in managed inference today. Potential GPU supply partner. Monitor for inference API announcements.
GPU AI Cloud | $4B+ | $425M (2024)
SambaNova (LOW)
Cautionary tale: $5B peak valuation collapsed to $1.6B Intel offer. Validates GPU-agnostic approach over custom silicon lock-in. Potential acqui-hire talent pool.
Custom Silicon | $1.6B (Intel offer)
Inference.net (LOW)
Marketplace model for custom LLM inference. Potential distribution partner. Claims 90% cost reduction. a16z + Multicoin backing.
Aggregator / Marketplace | Undisclosed
Pricing (per 1M tokens, standard models)
CoreWeave (GPU hourly)
GPU: $4.25/hr (H100 PCIe)
Cerebras (Llama 3 70B)
In: $0.60/M | Out: $0.60/M
Fireworks AI (Llama 3.1 8B)
In: $0.20/M | Out: $0.20/M
Groq (Llama 3 70B)
In: $0.59/M | Out: $0.79/M
Together AI (Llama 3.1 8B)
In: $0.20/M | Out: $0.20/M
Baseten (Custom models)
--
OpenRouter (500+ models)
--
SambaNova (DeepSeek R1 671B)
--
Inference.net (Custom fine-tuned)
--
Nebius (Llama 3 70B)
In: $0.13/M | Out: $0.40/M
Replicate (50K+ open models)
--
Lepton AI (Multi-cloud GPU)
--
DeepInfra (Llama 3.1 8B)
In: $0.03/M | Out: $0.05/M
Modal (Custom deployments)
GPU: $3.95/hr (H100)
Inferact (vLLM-based)
--
Cloudflare Workers AI (Llama 3.2 1B)
In: $0.01/M | Out: $0.01/M
fal.ai (600+ media models)
--
Taalas (Llama 3.1 8B, model-specific chip)
--
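The pricing spread above is easier to compare as dollars per request. The sketch below uses the per-1M-token rates from the table; the 2,000-input / 500-output token workload and the $0.20/M break-even target are illustrative assumptions, not figures from this brief. It also converts CoreWeave's $4.25/hr H100 rate into the sustained throughput a deployment would need to match per-token pricing.

```python
# Per-request cost at the (input, output) $/1M-token rates listed above.
RATES = {
    "DeepInfra (Llama 3.1 8B)":    (0.03, 0.05),
    "Nebius (Llama 3 70B)":        (0.13, 0.40),
    "Fireworks AI (Llama 3.1 8B)": (0.20, 0.20),
    "Groq (Llama 3 70B)":          (0.59, 0.79),
    "Cerebras (Llama 3 70B)":      (0.60, 0.60),
}

def request_cost(in_tok: int, out_tok: int, rates: tuple) -> float:
    """Dollar cost of one request given (input, output) $/1M-token rates."""
    in_rate, out_rate = rates
    return in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate

# Illustrative workload: 2,000 input tokens, 500 output tokens per request.
for name, rates in RATES.items():
    print(f"{name}: ${request_cost(2_000, 500, rates):.6f}/request")

# GPU-hourly to per-token conversion: at $4.25/hr (H100 PCIe, per the table),
# matching a $0.20/M blended price requires 4.25 / 0.20 = 21.25M tok/hr.
gpu_hourly = 4.25     # $/hr, from the pricing table
target_price = 0.20   # $/1M tokens, assumed break-even target
tokens_per_hour = gpu_hourly / target_price * 1e6
print(f"Break-even throughput: {tokens_per_hour / 3600:,.0f} tok/s")
```

The ~5,900 tok/s break-even figure is why batching density and energy cost, not list price alone, determine whether GPU-hourly providers can compete with the per-token price floor.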