Competitors (16 companies)

| Company | Category | Threat | Valuation | Revenue | Assessment |
|---|---|---|---|---|---|
| Fireworks AI | Inference Platform | CRITICAL | $4B | ~$280M ARR | DIRECT COMPETITOR. Pure inference-as-a-service; 10T tok/day; 10K+ customers (Cursor, DoorDash). Must match or beat $0.20/M pricing via energy cost advantage. |
| Nebius | GPU AI Cloud | CRITICAL | $9.6B (public) | $117.7M (Q3 2024) | DIRECT COMPETITOR. Token Factory = managed inference at $0.13/M (lowest published). 69% gross margin. Must undercut via energy advantage or compete on sovereign/compliance. |
| Cerebras | Custom Silicon | HIGH | $22B (pre-IPO) | Undisclosed | Non-GPU inference at 40M+ tok/sec threatens GPU-based cost assumptions. IPO Q2 2026. Evaluate as potential compute partner or technology licensor. |
| Groq | Custom Silicon | HIGH | ~$20B (Nvidia acq.) | $500M target 2025 | Nvidia acquisition signals inference hardware consolidation. LPU's deterministic latency (877 tok/s) sets the benchmark. Pricing sits above infrastructure-first providers' target range. |
| Baseten | Inference Platform | HIGH | $5B | Undisclosed | Nvidia's $150M investment signals intent. Custom C++ engine targets enterprise inference workloads. Expansion into training creates a full-stack competitor. |
| Crusoe | GPU AI Cloud | HIGH | $10B+ (Oct 2025) | ~$1B (projected 2025) | DIRECT COMPETITOR. Closest energy-to-inference model. Vertically integrated (owns energy + DCs). Key differentiation: energy cost structure and dual revenue streams. |
| DeepInfra | Inference Platform | HIGH | ~$100M (est.) | ~$3.8M | Price-floor leader at $0.03/M input. 8,000x volume growth since seed. SOC 2 + ISO 27001 certified. Lean team (~15 employees) with Blackwell GPU advantage. Monitor as cost benchmark. |
| CoreWeave | GPU AI Cloud | MEDIUM | $49B (public) | $3.6B (9-mo 2024) | Crypto-to-AI pivot. $55.6B backlog is GPU rental/training, not managed inference. Potential partner for GPU supply. Watch for an inference API launch. |
| Together AI | GPU AI Cloud | MEDIUM | $3.3B | ~$300M ARR | Prices at ~breakeven with FlashAttention optimization. Energy cost advantage is key to sustainable margins. Potential integration partner for model serving. |
| OpenRouter | Aggregator / Marketplace | MEDIUM | Undisclosed | Undisclosed | Distribution-channel opportunity: list inference endpoints on OpenRouter for demand generation. Their a16z 100T-token study shows inference demand shifting to code + reasoning. |
| Replicate | Inference Platform | MEDIUM | $350M (pre-acq.) | ~$5.3M | Acquired by Cloudflare Nov 2025. 50K-model marketplace is a distribution play. Cold-start latency (60s+) limits production use. Now part of the $30B+ Cloudflare edge network. |
| Lepton AI | Inference Platform | MEDIUM | Undisclosed | Undisclosed | Acquired by NVIDIA Apr 2025; rebranded as DGX Cloud Lepton. Founded by Caffe creator Yangqing Jia. Now NVIDIA's multi-cloud GPU marketplace connecting devs to CoreWeave, Crusoe, Lambda. |
| Modal | Inference Platform | MEDIUM | $1.1B | ~$50M ARR | Developer-first serverless GPU platform built in Rust with sub-1s cold starts. $1.1B unicorn, in talks for a $2.5B round. 90% of workloads are inference. Different approach: compute platform vs managed inference. |
| Lambda | GPU AI Cloud | LOW | $4B+ | $425M (2024) | Pure GPU rental with zero egress fees. Not in managed inference today. Potential GPU supply partner. Monitor for inference API announcements. |
| SambaNova | Custom Silicon | LOW | $1.6B (Intel offer) | Undisclosed | Cautionary tale: $5B peak valuation collapsed to a $1.6B Intel offer. Validates a GPU-agnostic approach over custom-silicon lock-in. Potential acqui-hire talent pool. |
| Inference.net | Aggregator / Marketplace | LOW | Undisclosed | Undisclosed | Marketplace model for custom LLM inference. Potential distribution partner. Claims 90% cost reduction. a16z + Multicoin backing. |
Pricing (per 1M tokens, standard models)
| Provider | Category | Model | Input $/1M | Output $/1M | Notes |
|---|---|---|---|---|---|
| CoreWeave | GPU AI Cloud | GPU hourly | -- | -- | $4.25/hr (H100 PCIe) |
| Cerebras | Custom Silicon | Llama 3 70B | $0.60 | $0.60 | |
| Fireworks AI | Inference Platform | Llama 3.1 8B | $0.20 | $0.20 | |
| Groq | Custom Silicon | Llama 3 70B | $0.59 | $0.79 | |
| Together AI | GPU AI Cloud | Llama 3.1 8B | $0.20 | $0.20 | |
| Baseten | Inference Platform | Custom models | -- | -- | |
| OpenRouter | Aggregator / Marketplace | 500+ models | -- | -- | |
| SambaNova | Custom Silicon | DeepSeek R1 671B | -- | -- | |
| Inference.net | Aggregator / Marketplace | Custom fine-tuned | -- | -- | |
| Nebius | GPU AI Cloud | Llama 3 70B | $0.13 | $0.40 | $2.60/hr (H100 SXM) |
| Replicate | Inference Platform | 50K+ open models | -- | -- | |
| Lepton AI | Inference Platform | Multi-cloud GPU | -- | -- | |
| DeepInfra | Inference Platform | Llama 3.1 8B | $0.03 | $0.05 | |
| Modal | Inference Platform | Custom deployments | -- | -- | $3.95/hr (H100) |
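Per-token and GPU-hourly prices are not directly comparable: a rented GPU's effective $/1M tokens depends entirely on sustained throughput. A minimal sketch of both conversions, using the published rates above; the 2,000 tok/s throughput figure and the sample workload are illustrative assumptions, not vendor claims:

```python
# (input $/1M tok, output $/1M tok) as published in the pricing table above
PER_TOKEN = {
    "Nebius (Llama 3 70B)":      (0.13, 0.40),
    "DeepInfra (Llama 3.1 8B)":  (0.03, 0.05),
    "Fireworks (Llama 3.1 8B)":  (0.20, 0.20),
    "Groq (Llama 3 70B)":        (0.59, 0.79),
    "Cerebras (Llama 3 70B)":    (0.60, 0.60),
}

def workload_cost(input_mtok: float, output_mtok: float, rates: tuple) -> float:
    """Monthly bill for a workload measured in millions of tokens."""
    in_rate, out_rate = rates
    return input_mtok * in_rate + output_mtok * out_rate

def gpu_hourly_to_per_mtok(dollars_per_hour: float, tok_per_sec: float) -> float:
    """Effective $/1M tokens for a rented GPU at an assumed sustained throughput."""
    tokens_per_hour = tok_per_sec * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000

# Hypothetical workload: 500M input + 100M output tokens per month
for name, rates in PER_TOKEN.items():
    print(f"{name:28s} ${workload_cost(500, 100, rates):>8,.2f}/mo")

# CoreWeave H100 at $4.25/hr with an ASSUMED 2,000 tok/s sustained throughput
print(f"H100 @ 2,000 tok/s: ${gpu_hourly_to_per_mtok(4.25, 2000):.2f}/1M tok")
```

One takeaway from the arithmetic: at $4.25/hr, an H100 must sustain roughly 5,900 tok/s (4.25 ÷ 0.20 = 21.25M tok/hr) before self-hosting undercuts a $0.20/M managed price, which is why utilization, not list price, decides the rent-vs-buy question.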