2026 marks the inflection point where inference overtakes training as the dominant AI infrastructure workload. Gartner projects $37.5B in AI-optimized IaaS spending in 2026, with 55% ($20.6B) flowing to inference—up from $9.2B in 2025.1 Deloitte estimates inference will consume 67% of all AI compute by end of 2026, up from 50% in 2025.2
The broader AI inference platform-as-a-service market is projected to grow from $18.84B in 2025 to $105.22B by 2030 at a 41.1% CAGR.3 Three forces are accelerating this: agentic AI workflows multiplying token volume per task, reasoning models consuming 10–100x more tokens per query, and enterprise migration from proprietary APIs to open-weight models for cost and control.4
The investment thesis has shifted decisively toward inference. In H2 2025 alone, Baseten raised $150M at a $2.15B valuation (Sep), Fireworks closed a $250M Series C at $4B (Oct), and Crusoe's valuation was reported at $10B+.
Per-token costs are declining at roughly 10x per year at equivalent model quality. GPT-3-equivalent inference fell from $60/M tokens in 2021 to $0.06/M tokens in 2025—a 1,000x reduction in four years.7 This deflation rewards platforms with proprietary engine optimizations that can maintain margins while prices compress.
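As a sanity check on the deflation arithmetic, the quoted endpoints imply the following total and average annual reduction factors (a sketch using only the figures above):

```python
# Implied deflation from the GPT-3-class data points quoted above:
# $60/M tokens (2021) -> $0.06/M tokens (2025). Endpoints are the
# report's figures; the annual factor is a geometric average.
start_price, end_price = 60.0, 0.06   # $/M tokens
years = 2025 - 2021                   # 4-year span

total_reduction = start_price / end_price          # 1000x overall
annual_factor = total_reduction ** (1 / years)     # geometric mean per year

print(f"total reduction: {total_reduction:.0f}x")
print(f"implied average annual factor: {annual_factor:.1f}x")
```

Note that this single data point averages out to roughly 5.6x per year; the "roughly 10x per year" figure is an industry-wide approximation, not derived from these two endpoints alone.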
The inference PaaS market is consolidated at the top: hyperscalers (AWS, GCP, Azure) hold 66–75% share. But the independent managed inference layer—the five platforms analyzed here—is where the fastest innovation is happening and where enterprises are increasingly deploying production workloads for speed, cost, and model flexibility advantages.
This report analyzes the five leading independent managed inference platforms by funding, revenue scale, and technical differentiation. Each operates a proprietary or optimized inference engine, offers per-token API pricing, and targets enterprise production workloads.
| Platform | Valuation | Revenue | Engine | Models |
|---|---|---|---|---|
| Fireworks AI | $4.0B | >$280M ann. | FireAttention V4 | 50+ |
| Together AI | $3.3B | ~$300M ann. | FlashAttention-4 + Kernel Collection | 200+ |
| Baseten | $5.0B | 10x YoY growth | Custom C++ + TensorRT-LLM | BYOM + APIs |
| Nebius | ~$25B mkt cap | $530M FY2025 | Token Factory (vLLM+) | 60+ |
| Crusoe | $10B+ | 5x bookings growth | MemoryAlloy | 8+ |
The managed inference market has bifurcated into two tiers: API-first platforms (Fireworks, Together) competing on model breadth, developer experience, and token pricing; and infrastructure-first platforms (Baseten, Nebius, Crusoe) competing on custom deployment, BYOM, and cost-per-compute-hour. The platforms that bridge both—offering production APIs AND dedicated infrastructure—will capture the most enterprise value.
| Dimension | Fireworks AI | Together AI | Baseten | Nebius | Crusoe |
|---|---|---|---|---|---|
| Founded | 2022 | 2022 | 2019 | 2024 (ex-Yandex) | 2018 |
| HQ | Redwood City, CA | San Francisco, CA | San Francisco, CA | Amsterdam, NL | Denver, CO |
| Total Funding | ~$327M | ~$534M | ~$585M | Public (NBIS) | ~$3.9B |
| Employees | ~166 | ~320 | ~100–150 | ~1,371 | ~1,000+ |
| Inference Engine | FireAttention (custom CUDA) | FlashAttention + Kernel Collection | TensorRT-LLM + Custom C++ | Token Factory (vLLM+) | MemoryAlloy (KV-cache) |
| GPU Support | H100, H200, B200, MI300X | H100, H200, B200, GB200 | H100, H200, B200 | H100, H200, GB300 | H100, H200, B200, GB200, AMD (SkyPilot) |
| Llama 3.3 70B $/M | $0.90 / $0.90 | $0.88 / $0.88 | Dedicated only | $0.13 / $0.40 | $0.25 / $0.75 |
| Key Customers | Cursor, Uber, Samsung, Notion | Salesforce, Zoom, DuckDuckGo | Cursor, Writer, Notion | Microsoft, Meta | Cursor, Fireworks, Together AI |
| Compliance | SOC2, HIPAA, GDPR | SOC2 Type II | SOC2, HIPAA | ISO 27001, SOC2 | SOC2, ISO 27001, ISO 42001 |
| BYOM | Yes (On-Demand) | Yes (Dedicated) | Yes (Truss SDK) | Yes (Enterprise) | Yes (Contact Sales) |
| Fine-Tuning | LoRA, DPO, RFT | LoRA, Full FT | Blueprint + Training | Enterprise only | Roadmap |
Each platform has built or adopted a distinct inference optimization strategy. The engine choice defines their cost structure, performance ceiling, and hardware flexibility.
Custom CUDA kernels written from scratch for each GPU generation. V4 introduces FP4 (NVFP4) precision on NVIDIA B200 Blackwell GPUs with TensorCore Gen 5 instructions. Achieves 3.5x throughput improvement versus SGLang on H200 and >250 tok/s sustained on B200.8 Speculative decoding enabled Cursor to reach ~1,000 tok/s on Llama 70B.9 Uniquely supports both NVIDIA (H100/H200/B200) and AMD (MI300X) hardware.
Tri Dao's FlashAttention is the industry-standard attention kernel, used by virtually every LLM provider. FlashAttention-4 on Blackwell achieves 1,605 TFLOPS (71% of theoretical maximum), 22% faster than NVIDIA's own cuDNN library.10 The Together Kernel Collection provides up to 10% faster training and 75% faster inference on top of FlashAttention.11
Replaced Triton Inference Server with a custom C++ server integrating TensorRT-LLM at the executor API level. Builds TRT-LLM from source and contributes patches upstream. Adds custom CUDA kernels for structured output (via Outlines) and speculative decoding (EAGLE-3, Medusa). Engine Builder automates TRT-LLM engine creation in minutes.12 Deep NVIDIA partnership ($150M investment) ensures early access to optimizations.
Token Factory runs on optimized vLLM with proprietary extensions: speculative decoding, PagedAttention, and KV-cache reuse achieving 4x cost reductions. Nebius designs their own server chassis and operates Europe's first GB300 NVL72 deployment in Finland.13 At ~70% gross margin, Token Factory demonstrates that managed vLLM can be a high-margin business at scale.
MemoryAlloy is a distributed key-value cache architecture that decouples KV storage from GPU compute. Achieves 9.9x improvement in time-to-first-token (TTFT) and 5x throughput versus standard vLLM.14 This architecture is particularly effective for long-context and multi-turn workloads where KV-cache reuse creates compound performance gains.
Direct head-to-head comparisons are limited by vendor-specific test conditions. The table below normalizes available benchmarks to the closest comparable workloads.
| Metric | Fireworks | Together | Baseten | Nebius | Crusoe |
|---|---|---|---|---|---|
| TTFT (70B-class) | 0.30–0.40s | ~0.25s (MinjAI est.) | 0.13s (Mistral 7B)53 | ~0.35s (MinjAI est.) | 9.9x faster vs vLLM |
| Output Throughput | >250 tok/s (B200) | ~175 tok/s (H200) | 650+ tok/s (GPT-OSS 120B)27 | 4x cost-perf via KV reuse | 5x vs vLLM baseline |
| Peak Customer Deploy | ~1,000 tok/s (Cursor) | Claims 30%+ faster than Fireworks | 78% lower latency (OpenEvidence) | N/A (enterprise SLA) | N/A (GA Nov 2025) |
| Speculative Decoding | Yes (production) | Yes (Medusa) | Yes (EAGLE-3, Medusa) | Yes (vLLM native) | Roadmap |
| Multi-Hardware | NVIDIA + AMD | NVIDIA only | NVIDIA only | NVIDIA only | NVIDIA + AMD (SkyPilot)64 |
| Blackwell (B200/GB200) | FP4 via FireAttention V4 | FlashAttention-4 native | 225% cost-perf gain28 | First EU GB300 NVL72 | B200 supported |
These benchmarks are sourced from vendor claims, Artificial Analysis rankings, and customer case studies. No independent third-party has tested all five platforms under identical conditions. Baseten's TTFT benchmark is on Mistral 7B (not 70B); Crusoe's metrics are relative improvements vs. vLLM baseline. Treat as directional, not absolute.
Custom CUDA kernels (Fireworks, Baseten) → Maximum per-GPU performance, hardware-specific optimization
Research-grade kernels (Together) → Deepest attention-layer optimization, cross-platform portability
vLLM-based + extensions (Nebius) → Ecosystem compatibility, proven at scale, lower R&D cost
Architecture innovation (Crusoe) → System-level optimization, unique multi-turn advantage
Fireworks AI is the highest-revenue independent managed inference platform. Founded by ex-Meta PyTorch engineers (CEO Lin Qiao led 300+ engineers building PyTorch), the company raised $250M in Series C at a $4B valuation in October 2025.15 Revenue grew from ~$6.5M ARR (May 2024) to >$280M annualized (Oct 2025), roughly a 43x increase in 17 months.16
| Name | Role | Background |
|---|---|---|
| Lin Qiao | CEO & Co-Founder | Head of PyTorch at Meta, Sr. Director Engineering (300+ eng); PhD UCSB |
| Dmytro Dzhulgakov | CTO & Co-Founder | PyTorch core maintainer at Meta |
| Chenyu Zhao | Co-Founder | Google Vertex AI lead |
| Metric | Value | Context |
|---|---|---|
| TTFT | 0.30–0.40s | Across models, faster than Groq (0.45s) |
| gpt-oss-120b | 960 tok/s | Artificial Analysis benchmark17 |
| B200 Peak | >250 tok/s | FireAttention V4 with FP4 |
| Cursor Deploy | ~1,000 tok/s | Speculative decoding on Llama 70B |
| Round | Date | Amount | Lead Investors |
|---|---|---|---|
| Seed | 2022 | Undisclosed | Benchmark |
| Series A | Mar 2024 | $25M | Benchmark |
| Series B | Jul 2024 | $52M at $552M | Sequoia, NVIDIA |
| Series C | Oct 2025 | $250M at $4B | Lightspeed, Index, Evantic18 |
| Customer | Use Case | Verified Outcome |
|---|---|---|
| Cursor | AI code completion (Llama 70B) | ~1,000 tok/s via speculative decoding; powers Tab autocomplete for millions of developers9 |
| Notion | AI writing assistant | 4x latency reduction vs. previous provider; sub-second response times54 |
| Uber | Compound AI for ride operations | Production-scale multi-model orchestration via FireFunction; specific metrics undisclosed |
| Samsung | On-device + cloud AI features | Galaxy AI integration via Fireworks serverless API; specific metrics undisclosed |
| Cresta | Contact center AI | ~100x cost savings vs. proprietary API providers55 |
V1 (2023): Initial custom CUDA kernels for H100, replacing standard vLLM serving. Achieved ~2x throughput improvement over stock PyTorch inference.
V2 (2024): Added continuous batching, speculative decoding, and H200 support. Multi-tenant GPU sharing enabled the serverless pricing model.
V3 (2024): AMD MI300X support added—making Fireworks the only platform in this group to run on non-NVIDIA hardware. PagedAttention optimization and prefix caching.
V4 (2025): FP4 (NVFP4) precision on B200 Blackwell with TensorCore Gen 5. 3.5x throughput gain over SGLang on H200. This generation targets the AI agent/creation market where sustained high throughput matters more than single-request latency.
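Speculative decoding (introduced in V2 and behind the Cursor deployment) gains throughput by having a small draft model propose several tokens that the target model then verifies in a single forward pass. A minimal expected-value sketch following the standard speculative sampling analysis; the acceptance rate and draft length below are illustrative assumptions, not Fireworks' figures:

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model forward pass when the
    draft model proposes draft_len tokens, each accepted independently
    with probability accept_rate (one bonus token on full acceptance).
    Assumes accept_rate < 1."""
    a, k = accept_rate, draft_len
    return (1 - a ** (k + 1)) / (1 - a)

# Illustrative: an 80% acceptance rate with 4 draft tokens yields
# ~3.4 tokens per target-model pass, i.e. ~3.4x fewer expensive passes.
print(f"{expected_tokens_per_pass(0.80, 4):.2f}")
```

The lever is clear from the formula: raising the acceptance rate (a better-matched draft model) pays off faster than simply lengthening the draft.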
Moat assessment: Fireworks' moat is engineering velocity, with four major engine versions in three years, each tuned to a specific GPU generation. The risk is that NVIDIA's own TensorRT-LLM narrows the gap with each release. AMD support is a strategic hedge: if MI300X/MI400 gain traction, Fireworks is the only independent platform ready.
Key risk: Lin Qiao's PyTorch team culture means Fireworks optimizes at the kernel level, not the system level. MemoryAlloy (Crusoe) and Token Factory (Nebius) attack efficiency from the architecture layer—a different competitive angle that kernel optimization alone can't match.
Fireworks has the strongest combination of scale (10T tokens/day), revenue ($280M+), and customer logos (Cursor, Uber, Samsung, Notion). Their PyTorch founding team has deep inference optimization expertise. Multi-hardware support (NVIDIA + AMD) is unique. Primary weakness: not the cheapest on per-token pricing; competes on speed and reliability.
Together AI combines academic research credibility with production-scale infrastructure. Chief Scientist Tri Dao created FlashAttention, the industry-standard attention kernel used by virtually every LLM provider globally. The company raised $305M in February 2025 at a $3.3B valuation and has ~$534M total funding.19
| Name | Role | Background |
|---|---|---|
| Vipul Ved Prakash | CEO & Co-Founder | Founder of Topsy (acquired by Apple), serial entrepreneur |
| Tri Dao | Chief Scientist | Creator of FlashAttention 1–4; Stanford/Princeton PhD |
| Ce Zhang | Co-Founder & President | ETH Zurich professor, data systems researcher |
Revenue splits approximately 30–40% API inference and 60–70% GPU cluster rentals. Gross margins are ~45%, with infrastructure ownership (data centers in Maryland, Memphis, Sweden) expected to improve unit economics.22 Claims 80% cheaper than hyperscalers on equivalent workloads.
| Model | Input $/M | Output $/M |
|---|---|---|
| Llama 3.1 8B | $0.18 | $0.18 |
| Llama 3.3 70B | $0.88 | $0.88 |
| DeepSeek-R1 | $3.00 | $7.00 |
| Llama 4 Maverick (400B MoE) | $0.27 | $0.27 |
| Round | Date | Amount | Lead Investors |
|---|---|---|---|
| Seed | May 2023 | $20M | Lux Capital |
| Series A | Nov 2023 | $102.5M | Kleiner Perkins56 |
| Series B | Mar 2024 | $106M | Salesforce Ventures57 |
| Series C | Feb 2025 | $305M at $3.3B | Prosperity7, Coatue, a16z19 |
Partnered with Hypertec/5C for up to 100,000 GPUs in European data centers ($5B total investment). Positions Together for EU data residency requirements and sovereign AI demand.23
| Customer | Use Case | Verified Outcome |
|---|---|---|
| Salesforce | Enterprise AI features (Agentforce) | Strategic investor ($106M Series B lead); Together powers inference workloads |
| Zoom | AI Companion features | Meeting summarization, real-time AI assistance at scale |
| DuckDuckGo | AI-powered search answers | Privacy-first inference via Together API; open-weight models for data control |
| Pika Labs | AI video generation | GPU clusters for video model training and inference at scale |
| Meta | Llama launch partner | Day-one availability of Llama 4 Maverick/Scout; co-marketing partnership20 |
FlashAttention's industry impact: Tri Dao's FlashAttention is used by virtually every LLM provider—including Fireworks, Baseten, and Nebius from this report. This gives Together unparalleled visibility into attention kernel optimization requirements across the industry.
The Together Kernel Collection goes beyond FlashAttention: it includes optimized kernels for MLP layers, normalization, and embedding operations. Together claims 10% faster training and 75% faster inference vs. stock implementations. This collection is proprietary (unlike FlashAttention itself).
Acquisition strategy: The Refuel.ai acquisition (May 2025) added data quality/structuring capabilities, enabling a train→evaluate→infer loop. This is Together's answer to Baseten's Parsed acquisition—both racing to own the full model lifecycle.
Revenue composition risk: ~60-70% of revenue comes from GPU cluster rentals, not inference API. This means Together's inference margins are less proven at scale than Fireworks'. The shift to owned infrastructure (Maryland, Memphis, Sweden data centers) should improve unit economics but requires massive capex.
Open-source ecosystem leverage: Together's open models (RedPajama, OpenChatKit) and research papers (FlashAttention 1-4, Monarch Mixer) create developer mindshare that converts to paying API customers. This research-to-revenue flywheel is unique in this landscape.
Together's research moat (FlashAttention is literally the kernel everyone uses) gives them unique credibility. 200+ models is the broadest catalog among independents. European expansion addresses sovereignty demand. Primary weakness: training-heavy revenue mix means inference margins are still maturing. Aggressive pricing compresses margins.
Baseten carries a $5B valuation, the highest among the venture-backed platforms whose core business is managed inference (Crusoe's $10B+ valuation spans its broader energy-and-GPU-cloud business), driven by NVIDIA's $150M strategic investment as part of the $300M Series E (Jan 2026).24 Founded in 2019 by ex-Gumroad and ex-Clover Health engineers, Baseten pivoted from ML app building to production inference infrastructure and has seen explosive growth: a 100x increase in inference volume in 2025.
| Name | Role | Background |
|---|---|---|
| Tuhin Srivastava | CEO & Co-Founder | Ex-Gumroad (data scientist/fraud ML), ex-Macquarie (IB); USC |
| Amir Haghighat | CTO & Co-Founder | Ex-Clover Health (ML engineering), ex-Yelp; MS CS UC Irvine |
| Workload | Metric | Result |
|---|---|---|
| GPT-OSS 120B | Throughput | 650+ tok/s (Artificial Analysis #1 on OpenRouter)27 |
| Mistral 7B | TTFT | 130ms |
| Mistral 7B | Throughput | 170 tok/s |
| Embeddings (B200) | vs. vLLM | 3.3x higher throughput |
| B200 Blackwell | Cost-performance | 225% improvement (validated by Google Cloud)28 |
| Round | Date | Amount | Lead |
|---|---|---|---|
| Series C | Feb 2025 | $75M at $825M | Spark Capital |
| Series D | Sep 2025 | $150M at $2.15B | BOND |
| Series E | Jan 2026 | $300M at $5B | IVP, CapitalG, NVIDIA ($150M)29 |
| Customer | Use Case | Verified Outcome |
|---|---|---|
| Cursor | AI code editor inference | Primary inference provider alongside Fireworks; production-scale code completion |
| Writer | Enterprise AI writing platform | Custom Palmyra model deployed via Truss; dedicated GPU deployment58 |
| Zed | AI-powered code editor | 45% lower latency vs. previous provider with dedicated B200 deployment |
| OpenEvidence | Medical AI platform | 78% lower latency, enabling real-time clinical decision support59 |
| Patreon | Creator platform AI features | ~$600K/year savings vs. proprietary APIs; migrated to open-weight models on Baseten |
NVIDIA's $150M bet: This is NVIDIA's largest known investment in a managed inference startup. The strategic rationale: Baseten validates TensorRT-LLM as the enterprise inference standard. Every Baseten deployment runs NVIDIA's software stack, creating lock-in at the engine layer.
Product pivot history: Baseten started in 2019 as an ML app builder (think: Streamlit for ML). The pivot to inference infrastructure happened in 2023 when they realized the bottleneck for ML deployment wasn't the app layer but the serving layer. This pivot explains why their developer experience (Truss SDK, Chains) is best-in-class—they came from a developer tools background.
The Parsed acquisition (Dec 2025) is strategically important: it adds RL (reinforcement learning) and evaluation capabilities. Combined with Baseten Training (closed beta), this gives Baseten the only complete train→evaluate→deploy→improve loop among the five platforms.
AWS Strategic Collaboration Agreement (Dec 2025): Baseten is available on AWS Marketplace with Savings Plans support.48 This is unusual—most inference startups compete against AWS, not partner with them. It signals AWS sees Baseten as complementary (custom model serving) rather than competitive (they don't replicate SageMaker).
Key risk: Baseten's dedicated GPU model means they don't benefit from multi-tenant efficiency the way serverless platforms (Fireworks, Together) do. At small scale, customers pay for idle GPU time. This makes Baseten most compelling for customers with consistent, high-volume workloads.
Baseten's $150M NVIDIA investment creates a deep technical moat around TensorRT-LLM optimization. Three funding rounds in 12 months ($75M → $150M → $300M) show exceptional velocity. The Parsed acquisition gives them the only end-to-end inference + training + RL pipeline among the five. Primary weakness: revenue scale is likely smaller than Fireworks' or Together's, and the enterprise customer base is still growing.
Nebius is the only publicly traded company in this comparison and the largest by market capitalization. The company was spun out of Yandex's cloud infrastructure business and is led by Arkady Volozh (ex-Yandex CEO). Revenue grew 479% YoY to $530M in FY2025, with Q4 alone at $228M (+547% YoY).30
| Name | Role | Background |
|---|---|---|
| Arkady Volozh | CEO | Founded Yandex (Russia's Google); built $25B+ enterprise |
| Andrey Korolenko | Chief Product & Infrastructure Officer | 28-year Yandex/Nebius veteran (since 1998); leads data center buildouts & capacity planning74 |
| Roman Chernin | Chief Business Officer & Co-Founder | 12 years heading Yandex digital services (Search, Maps); spearheading AI cloud business since 2023 |
| Ophir Nave | COO & Executive Director | M&A lawyer; ex-Arnon Tadmor-Levy, ex-Wachtell Lipton; JSD Harvard Law75 |
| Metric | FY2025 | 2026 Guidance |
|---|---|---|
| Revenue | $529.8M | $3.0–3.4B31 |
| ARR | $1.25B | $7–9B |
| EBITDA Margin | Improving | ~40% target |
| Cash | $3.7B | — |
| Data Center | Capacity | Status |
|---|---|---|
| Finland (Mäntsälä) | 60,000 GPUs, 75 MW | Operational + expanding |
| New Jersey | Operational | Live |
| Kansas City | 35,000 GPUs, 40 MW | Coming online |
| Iceland | Planned | Under development |
| Model | Input $/M | Output $/M |
|---|---|---|
| Llama 3.1 8B | $0.02 | $0.06 |
| Llama 3.3 70B | $0.13 | $0.40 |
| DeepSeek-V3 | $0.50 | $1.50 |
| DeepSeek-R1 | $0.80 | $2.40 |
Batch inference at 50% of base pricing.33
| Event | Date | Amount / Detail |
|---|---|---|
| Yandex Restructuring | Jul 2024 | Spun out of Yandex NV; listed on NASDAQ as NBIS |
| NVIDIA Investment | Dec 2024 | $350M from NVIDIA & Accel; earmarked for GPU procurement60 |
| Secondary Offering | Feb 2025 | $700M raised; shares priced at $43 |
| Cash Position | End FY2025 | $3.7B total cash & equivalents |
| Microsoft Deal | 2025 | $17.4B (up to $19.4B) five-year infrastructure agreement32 |
| Meta Deal | 2025 | ~$3B infrastructure partnership |
| Customer | Use Case | Verified Outcome |
|---|---|---|
| Microsoft | AI infrastructure capacity | $17.4B five-year deal; largest known Nebius engagement32 |
| Meta | GPU cluster capacity | ~$3B deal for training and inference infrastructure |
| Tavily | AI search & retrieval (acquired) | Acquired to add agentic AI search capabilities to Nebius platform |
| Enterprise customers | Token Factory API | Demand exceeded capacity in Q4 2025; sold out driving 547% YoY Q4 growth |
The Yandex advantage: Nebius inherited Yandex's 25+ years of large-scale infrastructure operations. Yandex was Russia's Google—search, cloud, self-driving cars, e-commerce. This means Nebius entered the AI infrastructure market with mature operational playbooks that startups lack: data center design, GPU procurement at scale, and network engineering.
European sovereign play: Nebius is headquartered in Amsterdam and operates Europe's largest GPU cluster in Finland (60K GPUs). The EU AI Act and GDPR create demand for European-domiciled inference. Nebius is the only platform in this group with production infrastructure in the EU, giving it a first-mover advantage in the $80B sovereign cloud market.39
Unit economics at scale: ~70% gross margin on $530M revenue ($371M gross profit) is remarkable for infrastructure. The economics work because Nebius owns their data centers, procures GPUs at hyperscaler volume, and runs Token Factory at high utilization. Guidance of 40% EBITDA margin on $3.0-3.4B 2026 revenue implies ~$1.2-1.4B EBITDA potential.
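The margin arithmetic above follows directly from the reported figures:

```python
# Nebius unit economics, reproduced from the report's own numbers.
revenue_fy2025 = 530e6        # FY2025 revenue
gross_margin = 0.70           # ~70% gross margin
gross_profit = revenue_fy2025 * gross_margin        # ~$371M

guidance_2026 = (3.0e9, 3.4e9)   # 2026 revenue guidance range
ebitda_margin = 0.40             # ~40% EBITDA margin target
ebitda_range = [r * ebitda_margin for r in guidance_2026]  # ~$1.2-1.36B

print(f"FY2025 gross profit: ${gross_profit / 1e6:.0f}M")
print(f"2026 implied EBITDA: ${ebitda_range[0] / 1e9:.2f}B"
      f"-{ebitda_range[1] / 1e9:.2f}B")
```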
Capacity as the constraint: Nebius was sold out in Q4 2025. The Kansas City DC (35K GPUs, 40 MW) coming online in H1 2026 and Iceland expansion should relieve this, but demand from Microsoft/Meta absorbs most new capacity. Token Factory for external customers competes with hyperscaler contracts for GPU allocation.
Risk factors: Concentration risk (Microsoft = majority of revenue), geopolitical perception (Yandex heritage), and the Arkady Volozh single-founder dependency. EU sanctions compliance adds operational complexity.
Nebius operates at a fundamentally different scale: $3.7B cash, $17.4B Microsoft deal, publicly traded, ~70% gross margins. Token Factory pricing is the most aggressive in this group (Llama 70B at $0.13/$0.40). Their European infrastructure positions them for the $80B sovereign cloud opportunity. Primary weakness: capacity-constrained (sold out in Q4 2025), limited model catalog (60+ vs 200+ at Together).
Crusoe is the most heavily capitalized private company in this group (~$3.9B raised across equity and debt) and uniquely positioned as both a GPU cloud and a managed inference platform. Managed Inference reached general availability in November 2025, powered by the proprietary MemoryAlloy engine.34 Crusoe's structural energy cost advantage (~$0.03/kWh) underpins its long-term margin thesis.
| Name | Role | Background |
|---|---|---|
| Chase Lochmiller | CEO & Co-Founder | Stanford CS; former quant trader |
| Erwan Menard | SVP Product | Ex-Google Cloud AI (Vertex AI Director of PM); CEO of Elastifile (acquired by Google)35 |
| Eesha Pathak | Sr. Director PM | Ex-Google Cloud AI (Head of Product, Enterprise AI & International Expansion); 15+ years36 |
| Aditya Shanker | GPM, Inference | Inference product lead |
| Omar Lari | Sr. Director PM, IaaS | Infrastructure product lead |
Compliance: SOC2, ISO 27001, and ISO 42001 (Feb 2026). Crusoe achieved ISO 27001 (information security management) and ISO 42001 (AI governance) certifications, significantly closing the compliance gap with Fireworks and Baseten.68 ISO 42001 is notable: it is the first AI-specific governance standard, and Crusoe is the only platform in this group to hold it.
| Model | Input $/M | Output $/M |
|---|---|---|
| Llama 3.3 70B | $0.25 | $0.75 |
| DeepSeek R1 | $1.35 | $5.40 |
| Qwen3 235B | $0.22 | $0.80 |
| Kimi-K2 | $0.60 | $2.50 |
| Round | Date | Amount | Key Investors / Notes |
|---|---|---|---|
| Series A | Apr 2022 | $128M | Valor Equity Partners |
| Series B | Sep 2022 | $350M | G2 Venture Partners |
| Series C | Aug 2024 | $600M at ~$3B | Fidelity, NEA, Founders Fund69 |
| Debt Facility | 2024 | $225M | Infrastructure financing |
| Series D+E | 2025 | Undisclosed | Valuation reported at $10B+52 |
| Total Raised | — | ~$3.9B | Includes equity + debt |
| Metric | Value | Source / Context |
|---|---|---|
| TTFT (MemoryAlloy) | 9.9x faster vs vLLM | Internal benchmark, Nov 202514 |
| Throughput | 5x vs vLLM baseline | MemoryAlloy cluster-scale test |
| Llama 3.1 Fine-Tuning (GB200) | 3x faster vs H100 | GB200 NVL72 benchmark, Feb 202670 |
| InferenceMAX | Benchmark co-creator | Partnership with SemiAnalysis, Oct 202571 |
| Customer | Use Case | Verified Outcome |
|---|---|---|
| Cursor | AI code editor infrastructure | Multi-provider strategy; Crusoe as GPU infrastructure layer (shared with Fireworks/Baseten)37 |
| Together AI | GPU cloud customer | Runs training & inference workloads on Crusoe H100/H200 clusters (metrics undisclosed) |
| Fireworks AI | GPU cloud customer | Uses Crusoe infrastructure for compute capacity scaling (metrics undisclosed) |
| Odyssey | General-purpose world models | Pioneering world model training on Crusoe's scalable GPU cloud; featured case study Jan 202672 |
| Decart (MirageLSD) | Real-time AI video generation | MirageLSD model deployed on Crusoe Cloud; real-time video synthesis73 |
| Sony, Databricks, MIT | Enterprise AI / research | GPU cloud customers (specific metrics undisclosed) |
Crusoe's foundational advantage is structural energy cost. Originally built on stranded natural gas, now transitioning to renewable sources. At ~$0.03/kWh, Crusoe operates at roughly 50–60% lower energy cost than hyperscaler data centers, creating a durable margin advantage that compounds as inference workloads scale.
The energy moat quantified: At $0.03/kWh vs. ~$0.06-0.08/kWh for hyperscalers, Crusoe saves ~$0.03-0.05/kWh. A single H100 draws ~0.7 kW, running 24/7/365 = ~6,132 kWh/year. That's ~$184-307/year per GPU in energy savings. At 10,000 GPUs: $1.8-3.1M/year in structural cost advantage. At 100,000 GPUs: $18-31M/year. This advantage scales linearly and compounds as GPU power draw increases with each generation (B200 draws ~1kW, GB200 even higher).
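The per-GPU arithmetic above can be checked directly; the power draw and $/kWh deltas are the report's estimates, not measured values:

```python
# Back-of-envelope check of the structural energy savings described above.
gpu_draw_kw = 0.7                       # approximate H100 draw under load
hours_per_year = 24 * 365               # continuous operation
kwh_per_gpu = gpu_draw_kw * hours_per_year     # ~6,132 kWh/GPU/year

savings_per_kwh = (0.03, 0.05)          # $/kWh delta vs. hyperscalers
per_gpu = [s * kwh_per_gpu for s in savings_per_kwh]   # ~$184-307/GPU/year
fleet_10k = [p * 10_000 for p in per_gpu]              # ~$1.8-3.1M/year

print(f"kWh per GPU-year: {kwh_per_gpu:,.0f}")
print(f"per-GPU savings: ${per_gpu[0]:.0f}-{per_gpu[1]:.0f}/year")
print(f"10k-GPU fleet: ${fleet_10k[0] / 1e6:.1f}M"
      f"-{fleet_10k[1] / 1e6:.1f}M/year")
```

Because the savings scale linearly with both fleet size and per-GPU power draw, the advantage widens with each GPU generation (B200 at ~1 kW and beyond).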
MemoryAlloy architecture: Unlike other engines that optimize per-GPU efficiency, MemoryAlloy operates at the system level by decoupling KV-cache storage from GPU compute. In multi-turn conversations or long-context workloads, KV-cache data is persisted across requests, eliminating redundant prefill computation. This is why the 9.9x TTFT improvement is on time-to-first-token specifically—it's the prefill step that benefits most from cache reuse.
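The prefill-dominance argument above can be made concrete with a toy model; the conversation sizes here are hypothetical and chosen only to illustrate the mechanism, not Crusoe's benchmark setup:

```python
def prefill_work(history_tokens: int, new_turn_tokens: int,
                 kv_cache_hit: bool) -> int:
    """Toy model: prompt tokens that must go through prefill for one
    request. TTFT scales roughly with this number. On a cache hit the
    conversation history's KV entries are reused, so only the new turn
    needs prefill; on a miss the full prompt is recomputed."""
    if kv_cache_hit:
        return new_turn_tokens
    return history_tokens + new_turn_tokens

# Hypothetical multi-turn chat: 9,000 tokens of prior conversation
# plus a 300-token new user message.
cold = prefill_work(9_000, 300, kv_cache_hit=False)   # 9,300 tokens
warm = prefill_work(9_000, 300, kv_cache_hit=True)    #   300 tokens
print(f"prefill reduction: {cold / warm:.0f}x")
```

This is why the headline gain lands on TTFT specifically: decode throughput is unaffected by cache reuse, but the prefill step shrinks to just the uncached suffix.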
Product velocity (Nov 2025 – Feb 2026): Crusoe shipped an extraordinary amount in 90 days: Managed Inference GA (Nov 20), MemoryAlloy engine paper (Nov 20), Run:ai certification (Nov 17), BYOM formal launch (Feb 6), Command Center (Feb 18), AutoClusters (Feb 3), MCP Server (Feb 11), GB200 NVL72 fine-tuning benchmarks (Feb 6), AMD GPU support (Jan 13), and ISO 27001+42001 (Feb 13). This cadence suggests a well-staffed product org executing at startup speed despite 1,000+ employees.
Compliance leapfrog: The ISO 42001 certification is strategic. It's the world's first AI governance standard (ISO/IEC 42001:2023). No other platform in this group holds it. For enterprises evaluating AI risk governance, this is a differentiator—particularly in regulated industries and government contracts where AI-specific compliance frameworks are emerging requirements.
Platform customer dynamics: Crusoe's most interesting competitive dynamic is that two of its biggest competitors (Fireworks and Together) are also customers of its GPU cloud. This creates an unusual relationship: Crusoe provides the infrastructure that powers competing managed inference APIs. Erwan Menard's Feb 2026 blog framing ("Building the world's favorite AI cloud") suggests Crusoe sees this as a feature, not a conflict—the IaaS revenue from competitors funds managed inference R&D.
Go-to-market evolution: With only 8 models in the catalog vs. 200+ at Together, Crusoe is leaning into BYOM + Command Center as the enterprise play. The combination of "bring your fine-tuned model + run it on MemoryAlloy + monitor via Command Center" creates an end-to-end value proposition for enterprises that want performance without managing infrastructure. The InferenceMAX benchmark partnership with SemiAnalysis also positions Crusoe as a thought leader on inference performance measurement.
Leadership signal: Hiring Erwan Menard (ex-Vertex AI Director of PM) and Eesha Pathak (ex-Google Cloud AI, Head of Product) signals Crusoe is serious about building a Google Cloud-caliber product organization. The shipping velocity since their arrival validates this thesis.
Crusoe is uniquely positioned as the only platform in this group that owns its energy infrastructure AND holds ISO 42001 (AI governance) certification. The product velocity since Nov 2025 has been exceptional: 10+ major launches in 90 days. ISO 27001+42001 closes the compliance gap significantly. Command Center + MCP Server address the developer experience gap. GB200 NVL72 and AMD GPU support via SkyPilot expand hardware flexibility. The remaining gaps: model catalog depth (8 vs. 200+ at Together) and proven production scale at token volume comparable to Fireworks' 10T tokens/day.
Pricing is the most visible competitive dimension in managed inference. The table below normalizes per-token costs across the five platforms for comparable models.
| Platform | Input | Output | Blended (1:1) | vs. Cheapest |
|---|---|---|---|---|
| Nebius | $0.13 | $0.40 | $0.265 | Cheapest |
| Crusoe | $0.25 | $0.75 | $0.50 | +89% |
| Together AI | $0.88 | $0.88 | $0.88 | +232% |
| Fireworks AI | $0.90 | $0.90 | $0.90 | +240% |
| Baseten | Dedicated GPU deployments only (not per-token) | |||
| Platform | Input | Output | Blended (1:1) |
|---|---|---|---|
| Nebius | $0.80 | $2.40 | $1.60 |
| Together AI | $3.00 | $7.00 | $5.00 |
| Crusoe | $1.35 | $5.40 | $3.38 |
| Fireworks AI | ~$8.00 | ~$8.00 | $8.00 |
| GPU | Fireworks | Baseten (per-min) | Crusoe |
|---|---|---|---|
| H100 80GB | $4.00/hr | $6.48/hr ($0.108/min) | $3.90/hr |
| B200 180GB | $9.00/hr | $9.96/hr ($0.166/min) | TBD |
Nebius is the price leader on per-token models, leveraging owned infrastructure at scale (60K+ GPUs in Finland alone) and ~70% gross margins. Crusoe is positioned mid-market: 89% above Nebius but ~44% below Fireworks/Together on Llama 70B. Fireworks and Together compete on speed and reliability, not price. Baseten avoids per-token comparison by focusing on dedicated deployments where customers control cost per GPU-hour. Token-pricing deflation of ~10x/year means today's prices will be tomorrow's floor.
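The blended figures and premiums in the Llama 3.3 70B table can be reproduced with a small helper, assuming the 1:1 input:output token mix the table header specifies:

```python
# Per-token prices ($/M tokens) from the Llama 3.3 70B comparison table.
prices = {
    "Nebius":       (0.13, 0.40),
    "Crusoe":       (0.25, 0.75),
    "Together AI":  (0.88, 0.88),
    "Fireworks AI": (0.90, 0.90),
}

def blended(inp: float, out: float) -> float:
    """Blended $/M tokens at a 1:1 input:output mix."""
    return (inp + out) / 2

cheapest = min(blended(*p) for p in prices.values())
for name, p in prices.items():
    b = blended(*p)
    premium = (b / cheapest - 1) * 100
    print(f"{name:13s} ${b:.3f}/M  (+{premium:.0f}% vs. cheapest)")
```

Real traffic is rarely 1:1 (agentic workloads skew output-heavy), so re-running the blend with your own input:output ratio can reorder the middle of the table.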
This matrix rates each platform across eight dimensions critical to enterprise managed inference buyers. Ratings are relative within this five-platform set (5 = best-in-class, 1 = weakest).
| Dimension | Fireworks | Together | Baseten | Nebius | Crusoe |
|---|---|---|---|---|---|
| Engine Performance | 5 | 4 | 4 | 3 | 4 |
| Model Catalog | 4 | 5 | 3 | 3 | 2 |
| Per-Token Pricing | 2 | 3 | N/A | 5 | 4 |
| Enterprise Compliance | 5 | 3 | 4 | 4 | 4 |
| Developer Experience | 4 | 4 | 5 | 3 | 3 |
| BYOM / Customization | 3 | 4 | 5 | 3 | 3 |
| Infrastructure Scale | 3 | 4 | 3 | 5 | 4 |
| Cost Structure Moat | 2 | 3 | 3 | 4 | 5 |
| Dimension | Fireworks | Together | Baseten | Nebius | Crusoe |
|---|---|---|---|---|---|
| Uptime SLA | 99.9% (Enterprise) | Best effort | 99.9% (dedicated)61 | 99.95% (cloud SLA) | TBD (new GA) |
| Production Validation | 10T tok/day proven | 600K+ developers | 100x volume growth '25 | Sold out Q4 2025 | GA Nov 2025; 10+ launches in 90 days |
| Compliance | SOC2 + HIPAA + GDPR | SOC2 Type II | SOC2 + HIPAA | ISO 27001 + SOC2 | SOC2 + ISO 27001 + ISO 42001 |
| Dedicated Capacity | On-Demand deployments | GPU clusters | Per-GPU dedicated | Enterprise tiers | BYOM (contact sales) |
| Multi-Region | 18+ regions, 8+ clouds | US + EU (expanding) | US (AWS SCA) | Finland, NJ, KS | US (TX, WY) |
| Rate Limits | Custom (enterprise) | Tier-based | Dedicated = unlimited | Custom quotas | Contact sales |
Fireworks leads on enterprise compliance (the SOC2 + HIPAA + GDPR trifecta) and multi-region availability. Baseten offers the strongest dedicated SLA for custom model deployments. Nebius has the highest uptime target (99.95%), backed by its infrastructure ownership. Crusoe has closed its compliance gap significantly with ISO 27001 + 42001 (the only AI governance certification in this group). Together is still maturing its enterprise compliance posture (SOC2 only), which limits uptake in regulated industries.
| Use Case | Best Platform | Why |
|---|---|---|
| High-volume production API | Fireworks | 10T tokens/day proven scale, fastest engines, SOC2+HIPAA+GDPR |
| Research & experimentation | Together | 200+ models, FlashAttention pedigree, broadest catalog |
| Custom model deployment | Baseten | Truss SDK, Chains for pipelines, Engine Builder, best DX |
| Cost-optimized at scale | Nebius | Lowest per-token pricing, 70% gross margins, owned DCs |
| Energy-advantaged inference | Crusoe | Structural $0.03/kWh cost, MemoryAlloy architecture, BYOM |
The $20B Groq acquisition (Dec 2025) and $150M Baseten investment signal NVIDIA is building a vertically integrated inference ecosystem.38 Platforms aligned with NVIDIA (Baseten, Nebius) gain preferential access to TensorRT-LLM optimizations, Blackwell/Vera Rubin early access, and co-marketing. Platforms with custom engines (Fireworks, Crusoe) must maintain parity independently.
The sovereign cloud market is projected to reach $80B in 2026 and $823B by 2032.39 65% of governments will introduce sovereignty requirements by 2028 (Gartner). Platforms with physical infrastructure (Nebius, Crusoe) have an inherent advantage over API-only providers. Together's European expansion with 100K GPUs addresses this but through colocation, not owned infrastructure.
The market is converging toward platforms that own the complete model lifecycle: inference + fine-tuning + evaluation + post-training (RL). Baseten (via Parsed) and Together (via Refuel) have made acquisitions specifically to close this loop. Platforms offering inference-only will face pressure to expand.
The managed inference market is large enough ($20.6B in 2026) and growing fast enough (41% CAGR) to support multiple winners. No single platform dominates all dimensions. The sustainable winners will be those that combine proprietary engine optimization (Fireworks, Crusoe) with infrastructure scale (Nebius, Crusoe) and full-lifecycle capabilities (Baseten, Together). The next 12 months will determine whether the market consolidates around 2–3 platforms or remains pluralistic.
Crusoe occupies a unique position as the only platform in this group with both proprietary engine technology (MemoryAlloy) and owned energy infrastructure. This creates a structural cost advantage that scales with inference volume. Since the Managed Inference GA in November 2025, Crusoe has shipped at extraordinary velocity: 10+ major product launches in 90 days, including ISO 27001+42001 certifications, Command Center, BYOM, AutoClusters, MCP Server, and GB200 NVL72 benchmarks.
The compliance picture has changed significantly. ISO 27001 + ISO 42001 now puts Crusoe ahead of Together (SOC2 only) and at parity with Nebius (ISO 27001 + SOC2) on security certifications. The ISO 42001 AI governance certification is unique in this landscape—a differentiator for regulated enterprises and government contracts.
The hiring of ex-Google Cloud AI leadership (Erwan Menard, Eesha Pathak) signaled a deliberate pivot from an infrastructure company that does inference to an inference platform that owns its infrastructure. The shipping cadence in the 90 days since validates that this strategy is being executed.
With compliance gaps largely closed (ISO 27001+42001) and developer experience improving (Command Center, MCP Server), Crusoe's remaining strategic priorities narrow to two: (1) expand the Intelligence Foundry model catalog from 8 to 30+ models to compete with Together/Fireworks on breadth, and (2) prove production token volume at scale comparable to Fireworks' 10T tokens/day. The combination of MemoryAlloy performance + $0.03/kWh energy + ISO 42001 + owned infrastructure creates a defensible position that no other platform in this landscape can replicate.
MinjAI Competitive Intelligence Platform • Managed Inference Landscape Report • February 2026
75 Sources • 12 Sections • 5 Companies Analyzed
This report is for strategic intelligence purposes. Market data and pricing are subject to change.