This report maps who buys inference, what they pay, and where margins break. AI-native startups spend 28–130% of revenue on inference. That is unsustainable. Any provider offering 30–50% cost reduction has a direct, measurable ROI case.
Enterprise AI spending hit $37B in 2025.1 Inference now exceeds training compute.2 76% of enterprises buy, not build.4
The buyer landscape is not monolithic. Cursor and Perplexity face margin crises. JPMorgan and Epic need compliance-first deployment. Palantir requires air-gapped sovereign infrastructure.
Independent inference platforms can target all three segments. Sovereign-ready deployment serves regulated verticals. A 30–50% cost advantage serves margin-compressed AI-natives.
| Segment | Market Size | Independent Provider Fit | Conviction |
|---|---|---|---|
| AI-Native Startups (Tier 1) | $12.5B API market5 | Cost + latency | Highest |
| Financial Services | $73B total AI6 | Sovereign + compliance | High |
| Healthcare | $45B total AI6 | HIPAA + on-prem | Medium-High |
| Defense / Government | $350M+ vertical AI7 | Air-gapped sovereign | High |
| AI-Native Startups (Tier 2) | Included above | Cost optimization | Medium |
| Automotive / Edge | $16.7B mfg AI6 | Limited (edge-first) | Low |
Three structural shifts make demand-side analysis urgent now.
Shift 1: Inference surpasses training. In 2023, inference was 33% of compute; by 2026 it reaches 55%.2 That makes 2026 the first year inference exceeds training. Every new app deployed increases inference demand permanently.
Shift 2: Build-to-buy reversal. In 2024, 47% of enterprises built internally. By 2025, only 24% build.4 Enterprises want governance, audit trails, and compliance. They will pay for it.
Shift 3: Multi-model is standard. 37% of enterprises use 5+ models, up from 29%.8 Average enterprise LLM spend: $7M, up from $4.5M.8 Model diversity creates demand for provider-agnostic platforms.
Inference is no longer a cost center. It is the primary revenue driver for a new company class. Whoever controls inference economics controls the AI application layer.
| Category | Spend | Share |
|---|---|---|
| Foundation Model APIs | $12.5B1 | 70% of infra layer |
| Model Training Infrastructure | $4.0B | 22% |
| AI Infrastructure (data/orchestration) | $1.5B | 8% |
| Coding AI | $4.0B11 | 55% of dept. AI |
| IT Operations AI | $700M | 10% |
| Marketing AI | $660M | 9% |
| Customer Success AI | $630M | 9% |
Every major inference buyer profiled in this report appears in the table below, spanning AI-natives, enterprises, and government buyers. The common thread: all consume massive inference volume.
| Company | Segment | Valuation / Mcap | ARR / Revenue | Primary Provider | Compliance |
|---|---|---|---|---|---|
| Cursor | AI-Native T1 | $29.3B3 | ~$1B ARR | Anthropic, OpenAI, xAI, Google | SOC2 |
| Perplexity | AI-Native T1 | $20B12 | $148M ARR | Multi-provider + in-house | SOC2 |
| Lovable | AI-Native T1 | $6.6B13 | $200M ARR | Anthropic, OpenAI | SOC2 |
| ElevenLabs | AI-Native T1 | $6.6B14 | $200M ARR | Own infra + cloud GPU | SOC2 |
| Replit | AI-Native T1 | $9B15 | $252M ARR | OpenAI, Anthropic, Google | SOC2 |
| Harvey AI | AI-Native T1 | $8–11B16 | ~$50–100M | OpenAI (primary) | Legal compliance |
| Sierra AI | AI-Native T1 | $10B17 | ~$150M ARR | OpenAI, Anthropic, Meta | Enterprise SLA |
| Runway | AI-Native T1 | $5.3B18 | ~$90M ARR | Own GPU + GCP/AWS burst | SOC2 |
| Glean | AI-Native T1 | $7.2B19 | $270M ARR | Multi-LLM abstracted | Enterprise |
| Midjourney | AI-Native T2 | Private | $200M+ ARR | Google TPUs20 | Basic |
| Character.ai | AI-Native T2 | $1B21 | $32M | Open-source (DeepSeek, Llama) | Basic |
| Bolt.new | AI-Native T2 | $700M22 | ~$40–100M | Anthropic (primary) | Basic |
| JPMorgan | Financial | Public | ~$2B AI spend23 | AWS + OpenAI/Anthropic | Data residency |
| Goldman Sachs | Financial | Public | Undisclosed | Multi-provider via gateway | SEC/FINRA |
| Epic Systems | Healthcare | Private | Undisclosed | Azure OpenAI24 | HIPAA/HITRUST |
| Palantir | Defense | Public | $4.4B revenue25 | Model-agnostic (AIP) | IL6 / Air-gap |
| Scale AI | Defense | Private | $300M+ gov contracts26 | Multi-cloud + GovCloud | FedRAMP/TS-SCI |
| Tesla | Automotive | Public | $16.5B Samsung deal27 | Proprietary AI5/AI6 | ASIL-D |
| Waymo | Automotive | $16B28 | Part of Alphabet capex | Google TPU + edge | ISO 26262 |
The nine Tier 1 AI-natives represent a new category. Inference is not a cost line. It is their primary COGS. Unit economics depend entirely on inference pricing. They are the most price-sensitive, highest-volume buyers.
Cursor pays ~$650M/year to Anthropic; when that figure was disclosed, revenue was roughly $500M, implying negative gross margin at scale.3 This is not an outlier. It is the structural reality of AI-native business models. Every company here faces this problem.
| Metric | Value |
|---|---|
| Valuation | $29.3B (Nov 2025, Series D)3 |
| ARR | ~$1B (crossed Nov 2025) |
| Total Funding | ~$2.6B (Accel, Coatue) |
| Users | 360K+ paying, 1M+ daily active |
| Inference Providers | Anthropic, OpenAI, xAI, Google |
| AWS Spend | $12.6M/month (June 2025), doubling monthly29 |
| Anthropic Bill | ~$650M/year (est.)3 |
Verdict: Cursor is the poster child for the inference COGS crisis. At ~$1B ARR, it is the world’s largest coding AI. Inference costs consume 28–130% of revenue.29 Three responses: $200/month “Ultra” tier, credit-based pricing, and a proprietary “Composer” LLM.
Cursor is an ideal design partner for independent inference providers. A 30% cut on $650M equals $195M annual savings. Multi-year agreements with four providers. Actively seeks cost optimization.
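The savings arithmetic can be sketched directly. The $650M bill is this report's estimate; the discount rates mirror the 30–50% range discussed above:

```python
def annual_savings(current_bill: float, discount: float) -> float:
    """Annual savings from a fractional cost reduction on an inference bill."""
    return current_bill * discount

# Cursor's estimated Anthropic bill, with the 30-50% reduction range above.
bill = 650_000_000
savings_30 = annual_savings(bill, 0.30)  # ~$195M/year
savings_50 = annual_savings(bill, 0.50)  # $325M/year
```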
| Metric | Value |
|---|---|
| Valuation | $20B (Sep 2025)12 |
| ARR | $148M (June 2025 annualized) |
| Total Funding | ~$1.22B (SoftBank, NVIDIA, Accel) |
| Query Volume | 780M monthly queries |
| Inference Spend | $100–150M+ (estimated) |
| Providers | OpenAI, Anthropic, Google + in-house |
Verdict: 780M queries per month. GPU procurement is their top use of capital. Revenue: $20/month Pro, $200/month Max, enterprise contracts. Inference cost as percent of revenue: estimated 33–67%.
Perplexity runs hybrid in-house and API models. Self-hosting reduces API dependency. The opportunity is dedicated capacity, not pay-as-you-go API.
| Metric | Value |
|---|---|
| Valuation | $6.6B (Dec 2025, Series B)13 |
| ARR | $200M (Nov 2025) |
| Total Funding | ~$500M+ (CapitalG, Menlo, NVIDIA) |
| Providers | Anthropic (Claude Sonnet), OpenAI |
Verdict: Fastest-growing “vibe coding” tool. Investors flag unit economics as a concern. Every new customer compresses gross margins. No gross margin disclosed.
Claude Sonnet 3.5 enabled Lovable’s launch. Anthropic dependency is high. This creates both cost risk and switching opportunity.
| Metric | Value |
|---|---|
| Valuation | $6.6B14 |
| ARR | $200M (Sep 2025, doubled from $100M in 9 months) |
| A16z Rank | #5 in enterprise AI application spend30 |
| Pricing | Characters/month, $0.06/min overage |
Verdict: Voice synthesis requires real-time inference. Latency tolerance: <200ms. One of the most compute-intensive workloads. ElevenLabs likely runs own GPU clusters with cloud burst.
Dedicated low-latency inference outperforms web-based alternatives by 10–100x, and voice AI companies need that headroom. Dedicated capacity with predictable costs is the value proposition.
| Metric | Value |
|---|---|
| Valuation | $9B (Jan 2026 raise)15 |
| ARR | $252M (Oct 2025) |
| Prior Valuation | $3B (Sep 2025) |
| Target | $1B ARR in 2026 |
| Providers | OpenAI, Anthropic, Google |
Verdict: 15.8x growth in one year ($16M to $252M ARR). AI agent revenue drives most of the spike. Replit bills API costs through to users, partially insulating margins. But the $1B ARR target requires massive inference scale.
Multi-provider openness makes Replit a strong design partner candidate.
| Metric | Value |
|---|---|
| Valuation | $8B (Dec 2025); raising at $11B (Feb 2026)16 |
| Recent Funding | $160M (a16z), $300M Series E ($5B) |
| Customers | 200+ top law firms, Big Four, Fortune 500 legal |
| Provider | OpenAI (primary partnership) |
Verdict: Legal AI has the highest per-seat contract values. 200+ law firms generate multi-million-dollar ACV deals. OpenAI dependency is high. Compliance is strict: privilege, data residency, audit trails.
Law firms are the most compliance-sensitive buyers. OpenAI dependency creates concentration risk. Sovereign on-premises deployment is high-value for legal departments.
| Metric | Value |
|---|---|
| Valuation | $10B (Sep 2025, Series C)17 |
| ARR | ~$150M (Jan 2026 est.); $100M hit in 21 months |
| Total Funding | $635M |
| Providers | OpenAI, Anthropic, Meta (multi-provider) |
Verdict: Sierra uses a “constellation” multi-provider approach. No single provider dependency. Built for failover across models. This architecture templates independent provider integration.
Per-resolution pricing means Sierra absorbs inference costs. Multi-model routing optimizes cost vs. quality per query. Openness to alternative providers makes it a strong target.
| Metric | Value |
|---|---|
| Valuation | $5.3B (Feb 2026, Series E)18 |
| Revenue | ~$90M annualized (June 2025) |
| Recent Funding | $315M (Feb 2026) + $308M (Apr 2025) |
| Inference Model | Own GPU clusters + GCP/AWS burst |
Verdict: Video generation is the most GPU-intensive inference workload. Credit-based pricing ($0.01/credit). Runs own clusters with cloud burst for peaks. Less addressable by third-party providers.
| Metric | Value |
|---|---|
| Valuation | $7.2B (June 2025, Series D)19 |
| ARR | $270M (late 2025) |
| Market Share | ~10% of agent platforms ($750M segment)1 |
| Providers | Multi-LLM abstracted layer |
Verdict: Enterprise search at scale. Glean abstracts the model layer across multiple LLMs. Significant inference costs for knowledge retrieval. Compliance requirements are high (SSO, RBAC, DLP). Addressable by cost-efficient, compliant providers.
Smaller or earlier-stage, but inference demand is growing fast. Several face the same margin compression as Tier 1.
| Company | Valuation | Revenue | Provider | Note |
|---|---|---|---|---|
| Midjourney | Private (bootstrapped) | $200M+ ARR | Google TPUs | 65% cost cut by switching to TPUs20 |
| Character.ai | $1B (down from $2.5B)21 | $32M | Open-source (DeepSeek, Llama) | Compute costs destroyed valuation |
| Bolt.new | $700M22 | ~$40–100M | Anthropic (Claude) | 1.3M tokens/day per user |
| Poe (Quora) | ~$2.5B (Quora total)31 | ~$65M | Aggregator (all providers) | Pass-through; marks up API costs |
| Pika | $700–900M32 | ~$85M | Own video models (GPU-intensive) | 40% enterprise revenue |
| Augment Code | $977M33 | ~$20M ARR | Undisclosed | Eric Schmidt-backed Copilot rival |
| Cohere | $7B34 | $240M ARR | Self-hosted (AMD + hyperscaler) | 85% from on-prem enterprise |
| Jasper AI | Declining | $88M | OpenAI (GPT-4) | Pass-through model; facing ChatGPT threat |
| Stability AI | ~$1B (down from $4B) | ~$50M | AWS | Financially distressed; open-source pivot |
Midjourney cut inference costs 65% by switching to Google TPUs.20 Before: $2M/month. After: $700K/month. Payback: 11 days. Buyers will switch for cost. Character.ai moved to open-source for the same reason. Provider lock-in is weaker than incumbents believe.
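Midjourney's numbers make the switching calculus concrete. The before/after costs are from this report; the one-off migration cost is derived from the stated 11-day payback, not disclosed:

```python
# Midjourney's reported TPU switch (monthly inference cost, USD).
before, after = 2_000_000, 700_000
monthly_savings = before - after          # $1.3M/month
cost_reduction = 1 - after / before       # 65%
# An 11-day payback implies a one-off migration cost of roughly:
implied_migration_cost = monthly_savings * 11 / 30
```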
Character.ai peaked at $2.5B. Compute costs + free users + founder departures collapsed it to $1B.21 Inference costs outran monetization. Every AI-native company here faces this risk.
Largest enterprise AI vertical: $73B total spend.6 Compliance drives every procurement decision. Data residency, audit trails, and regulatory requirements override cost and speed.
| Metric | Value |
|---|---|
| Total Tech Budget | $18B (2025)23 |
| AI Spend | ~$2B (reclassified as core infra) |
| Annual AI Value | $1.5–2.0B |
| LLM Suite Users | 200–250K employees, ~50% daily usage |
| AI Use Cases | 450+ in production |
| Providers | AWS Bedrock/SageMaker + OpenAI + Anthropic |
| Deployment | Hybrid (private cloud + external API via compliance gateway) |
Verdict: JPMorgan is the benchmark enterprise AI buyer. $2B spend. 450+ use cases. Compliance gateway filters all inference requests before data exits. Model-agnostic: OpenAI and Anthropic through controlled gateway.
JPMorgan wants model-agnostic inference with data residency control. No single-provider lock-in. Sovereign deployment paired with multi-model support aligns directly. The compliance gateway architecture is the integration point.
| Metric | Value |
|---|---|
| Employee Access | 46,500+ |
| Adoption | >50%; targeting 100% by end 2026 |
| Providers | OpenAI GPT, Google Gemini, Anthropic Claude |
| Deployment | Private cloud + external API via compliance gateway |
| Compliance | SEC/FINRA audit trail, prompt filtering, data anonymization |
Verdict: Goldman runs model-agnostic. Routes to OpenAI, Google, and Anthropic via private gateway. AI agents handle trade accounting and compliance. Business Conduct Code revised Jan 2026 for AI. Targeting 100% adoption.
| Metric | Value |
|---|---|
| Model | BloombergGPT (50B parameters, proprietary corpus) |
| Deployment | Fully on-premises / private cloud |
| New Features | Document Search & Analysis (400M+ documents) |
| External Cloud | Zero for core financial data products |
Verdict: Pure on-premises. Data sovereignty is the business model. Zero external cloud for core products. Not addressable by third-party providers. This is maximum data control.
All three banks share one architecture: compliance gateway between users and model APIs. No raw data to external models. Model-agnostic routing. Full audit trails. Sovereign-ready inference maps directly to this pattern.
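The shared gateway pattern can be sketched in a few lines. This is an illustrative toy, not any bank's implementation; the redaction rule and audit-log shape are assumptions:

```python
import datetime
import re

AUDIT_LOG = []  # full audit trail, as the gateway pattern requires

def anonymize(prompt: str) -> str:
    """Toy redaction rule: mask anything that looks like an account number."""
    return re.sub(r"\b\d{8,}\b", "[REDACTED]", prompt)

def gateway(prompt: str, model: str) -> str:
    """Filter the prompt, record an audit entry, then route to the chosen
    external model API (the routing call itself is omitted here)."""
    safe = anonymize(prompt)
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,
        "prompt": safe,
    })
    return safe  # in production, only this filtered text leaves the perimeter
```

No raw data reaches the external model; every request is logged before it exits.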
Compliance is not optional here. HIPAA, FedRAMP, ITAR, and IL6 certifications determine who can compete. Procurement cycles: 6–24 months.
| Metric | Value |
|---|---|
| AI Adoption | 85% of Epic customers live on generative AI24 |
| AI Features | Art, Emmie, Penny copilots + Dragon Copilot |
| Insights Usage | 16M+ times/month (3x growth since Nov 2025) |
| Provider | Microsoft Azure OpenAI Service (primary) |
| Deployment | Hybrid: on-prem EHR + Azure inference with BAA |
| Compliance | HIPAA HITRUST |
Verdict: Epic dominates healthcare EHR. 260+ health systems. AI inference runs inside Azure’s HIPAA environment. AI Validation software lets hospitals test models before deployment. PHI requires contractual data processing agreements.
| Metric | Value |
|---|---|
| Compute | BioHive-2: 504x H100 GPUs (TOP500 #35)35 |
| NVIDIA Deal | $50M investment + collaboration |
| Cloud Partners | Google Cloud (burst), Oracle Cloud (overflow) |
| Model | Phenom-Beta (molecular screening) |
| Compliance | GxP Data sovereignty |
Verdict: Primary inference on-premises via BioHive-2. Bursts to Google Cloud for parallel screening. Petabyte-scale imaging stays on-prem. Pharma model: on-prem for sensitive data, cloud for burst.
| Metric | Value |
|---|---|
| Revenue | $4.4B (2025, +53% YoY)25 |
| Q3 TCV | $2.76B (+151% YoY) |
| US Commercial TCV | $1.31B (+342% YoY) |
| Platform | AIP (model-agnostic: GPT-4o, Claude, Llama) |
| Deployment | AWS GovCloud, Azure Secret, on-prem, air-gapped |
| Compliance | IL6 TS/SCI Air-gap |
Verdict: AIP deploys the same codebase across commercial, GovCloud, and air-gapped networks. Patch time: 3.5 minutes via Apollo. Rackspace (Feb 2026) adds UK Sovereign Cloud. Anduril integration connects AIP with Lattice.
Apollo is the competitive template for sovereign inference. Same code, any environment. Model-agnostic. Air-gapped capable. Any sovereign inference provider should study this architecture.
| Metric | Value |
|---|---|
| DoD Contracts | $300M+ cumulative26 |
| Key Deals | $250M JAIC, $99M Army R&D, $41–100M TS networks |
| Edge Product | Thunderforge (real-time military logistics) |
| Compliance | CMMC FedRAMP High TS/SCI |
Verdict: Primarily data infrastructure (labeling, RLHF, evaluation). Thunderforge is the inference play: real-time edge for military logistics. Commercial on multi-cloud. Classified on GovCloud or air-gapped.
| Sector | Certifications | Deployment | Procurement Cycle |
|---|---|---|---|
| Healthcare | HIPAA HITRUST SOC2 | Hybrid (on-prem EHR + cloud inference) | 6–12 months |
| Financial Services | SOC2 ISO 27001 GDPR | Private cloud + compliance gateway | 6–12 months |
| Defense (Unclass.) | FedRAMP CMMC | GovCloud | 12–18 months |
| Defense (Classified) | IL6 TS/SCI ITAR | Air-gapped / on-premises only | 18–24+ months |
| Pharma / Life Sciences | GxP HIPAA | On-prem primary + cloud burst | 12–18 months |
Automotive and e-commerce: the edge inference frontier. Tesla and Waymo run on-device. Amazon and Shopify run cloud at massive scale. Instructive for independent provider positioning.
| Company | Inference Model | Provider | Independent Provider Fit |
|---|---|---|---|
| Tesla | Edge-only (AI5 chip, no external cloud)27 | Proprietary (TSMC + Samsung fab) | None |
| Waymo | Hybrid (edge driving + GCP training/sim)28 | Google TPU + in-vehicle compute | None |
| Amazon | Cloud (AWS SageMaker, Bedrock) | Internal AWS infrastructure | None |
| Shopify | Cloud (multi-provider LLM APIs) | Multiple cloud providers | Medium |
Automotive is not addressable. Tesla and Waymo are fully vertical: own silicon, on-device inference. E-commerce is partially addressable: Shopify uses multi-provider APIs. But Amazon runs internal infrastructure. Limited opportunity for independents.
AI-native companies face a structural problem. Inference is their primary COGS. Costs scale linearly with users (or worse). Traditional SaaS: 75–85% gross margins. AI-native: 50–65% at best.
| Metric | AI-Native Companies | Traditional SaaS |
|---|---|---|
| Gross Margin | 50–65% (at maturity) | 75–85% |
| COGS Profile | Variable (scales with usage) | Fixed (server + bandwidth) |
| Unit Economics | Degrades at scale without cost reduction | Improves at scale |
| Provider Dependency | High (Anthropic, OpenAI, Google) | Low (commodity cloud) |
| Pricing Power | Limited (commodity inference) | Moderate (switching costs) |
AI-native COGS scales linearly with users. Every new Cursor subscriber adds inference calls. Every new Perplexity query costs money. Traditional SaaS marginal cost approaches zero. AI-native marginal cost does not. Three solutions: build your own models, switch for cost, or find a structurally cheaper provider.
| Company | Revenue | Est. Inference Cost | Cost/Rev % | Response |
|---|---|---|---|---|
| Cursor | ~$1B ARR | $650M/yr to Anthropic3 | ~65–130% | Built Composer LLM, credit pricing |
| Perplexity | $148M ARR | $100–150M+ | 33–67% | Hybrid in-house + API |
| ElevenLabs | $200M ARR | $30–60M | 15–30% | Own GPU clusters |
| Character.ai | $32M | $10–20M | 30–60% | Switched to open-source models |
| Midjourney | $200M+ ARR | $8.4M/yr | ~4% | Switched to Google TPUs |
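Cursor's wide band in the table follows from a roughly fixed bill against growing revenue (figures from this report):

```python
def cost_share(inference_cost: float, revenue: float) -> float:
    """Inference cost as a fraction of revenue."""
    return inference_cost / revenue

# A ~$650M/yr bill against revenue growing from ~$500M to ~$1B ARR
# spans the 65-130% band shown for Cursor above.
low = cost_share(650e6, 1_000e6)   # 0.65 at $1B ARR
high = cost_share(650e6, 500e6)    # 1.30 at $500M revenue
```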
| Priority | AI-Native Startups | Enterprise | Defense/Gov |
|---|---|---|---|
| #1 | Cost | Security8 | Data sovereignty |
| #2 | Latency / reliability | Cost | Security / audit |
| #3 | Multi-model support | Accuracy | SLA uptime |
| #4 | Scalability | Compliance | Cost |
| #5 | API compatibility | Observability | Latency |
| Model | Buyer | Commitment | Discount | Contract |
|---|---|---|---|---|
| Pay-as-you-go | SMB / Developers | None | 0% | $1K–$50K/yr |
| Volume Commit | Mid-market | Annual token volume | 15–30%36 | $50K–$500K/yr |
| Reserved Capacity | Enterprise | Dedicated instances | 20–35% | $500K–$10M+/yr |
| On-Premises | Gov / Defense / Regulated | Multi-year license | Custom | Multi-million ACV |
| Marketplace | Cloud-native enterprise | Via AWS/Azure/GCP credits | EDP-bundled | Varies |
| Segment | Cycle Length | Decision Maker | AI Conversion Rate |
|---|---|---|---|
| Developer / SMB | Days to weeks | Individual / team lead | Self-serve |
| Mid-market | 30–90 days | IT + Finance approval | 47% (vs 25% trad. SaaS)1 |
| Enterprise | 6–12 months | CIO + CISO + Legal | Enterprise sales motion |
| Government | 12–24+ months | Contracting officer + ATO | FedRAMP required |
| AI-Native ($100M+ ARR) | Weeks to months | CEO / CTO direct | Direct relationship |
AI deals convert at 47% vs. 25% for traditional SaaS.1 Fastest-selling software category in history. Bottleneck is not demand. It is compliance, security review, and procurement cycles.
Anthropic: 12% to 40% share in three years.37 OpenAI dropped from 50% to 27%. In coding: Anthropic holds 54%. Most dramatic share shift in enterprise software history. Enterprises switch fast when quality improves.
| Buyer | Anthropic | OpenAI | Google | Meta/OSS | Self-Hosted |
|---|---|---|---|---|---|
| Cursor | Primary | Secondary | Secondary | — | Building Composer |
| Lovable | Primary | Secondary | — | — | — |
| Bolt.new | Primary | — | — | — | — |
| Harvey | — | Primary | — | — | — |
| Sierra | Multi | Multi | — | Llama | — |
| JPMorgan | Via gateway | Via gateway | — | — | Compliance gateway |
| Goldman | Via gateway | Via gateway | Via gateway | — | Compliance gateway |
| Epic | — | Azure OpenAI | — | — | On-prem EHR |
| Palantir | AIP | AIP | — | Llama | Air-gapped deploy |
| Midjourney | — | — | TPU primary | — | Own models |
| Cohere | — | — | — | — | 85% on-prem |
Ideal customer: AI-native at $100M+ ARR with inference as primary COGS. Cursor ($1B ARR, 65–130% cost/revenue), Replit ($252M, $1B target), Sierra ($150M, multi-provider), Lovable ($200M) all fit. 30% cut on Cursor’s $650M bill = $195M savings. That is the pitch.
Cursor is building its own LLM. Perplexity self-hosts. Midjourney runs on TPUs. The largest buyers are actively reducing third-party dependency. If AI-natives bring inference in-house at scale, the addressable market shrinks fast. Cost advantage alone may not retain customers beyond 12–18 months.
Sovereign cloud: $154B (2025) to $823B by 2032.39 JPMorgan, Goldman, Epic, Palantir need compliance-first inference with data residency. Air-cooled modular infrastructure enables on-premises deployment. No hyperscaler matches this cost structure.
Open-source compresses margins across the stack. Character.ai switched to DeepSeek and Llama. Midjourney cut 65% with TPUs. Cursor builds its own model. The biggest buyers will reduce API dependency. Providers must offer more than cheaper tokens: latency, compliance, dedicated capacity.
Token pricing deflates ~10x per year. $10/M tokens today = $1/M next year. Cost advantage must be structural: energy, hardware efficiency. If competitors match pricing faster, the window closes. Speed to market beats perfection.
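The deflation math compounds quickly. A sketch, treating ~10x per year as a constant rate (an approximation of the report's rule of thumb, not a forecast):

```python
def token_price(p0: float, years: float, deflation: float = 10.0) -> float:
    """Price per million tokens after `years` of ~`deflation`x-per-year decline."""
    return p0 / deflation ** years

price_now = 10.0                             # $/M tokens today
price_next_year = token_price(10.0, 1)       # $1/M, per the rule of thumb
price_in_6_months = token_price(10.0, 0.5)   # ~$3.16/M at a constant rate
```

At a constant rate, any fixed cost advantage erodes by roughly two-thirds every six months unless it is structural.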
| Segment | Target Companies | Value Proposition | Priority |
|---|---|---|---|
| AI-Native Coding | Cursor, Replit, Lovable, Bolt.new | 30–50% cost reduction + low-latency SLAs | P0 |
| AI-Native Search/Agent | Perplexity, Sierra, Glean | Dedicated capacity + cost optimization | P0 |
| Financial Services | JPMorgan, Goldman Sachs | Sovereign deployment + compliance gateway | P1 |
| Healthcare | Epic ecosystem hospitals | HIPAA-compliant on-prem + hybrid | P2 |
| Defense/Gov | Scale AI, Palantir ecosystem | Air-gapped sovereign inference | P2 |
| Action | Target | Why Now |
|---|---|---|
| Sign first design partner | Cursor, Replit, or Lovable | Highest COGS pain. Fastest decision cycle. |
| Achieve OpenAI API parity | LiteLLM / LangChain compatible | 89% multi-cloud. Zero switching cost is table stakes. |
| Publish latency benchmarks | Sub-200ms TTFT on Llama 70B | Coding AI demands real-time. Prove it publicly. |
| Begin SOC2 / FedRAMP prep | SOC2 Type II within 6 months | Enterprise sales blocked without certification. |
| Scope sovereign deployment | One EU or financial services pilot | Sovereign cloud grows 5.3x by 2032. First-mover advantage. |
Demand is real: $37B in GenAI spend. 76% buying. AI-natives with negative margins need cost cuts. Regulated enterprises need sovereign deployment. Low-cost energy operators serve both. The question is speed of execution vs. token pricing deflation.
| Trigger | Example | Frequency |
|---|---|---|
| Cost spike | Cursor’s API costs doubling monthly29 | Very Common |
| Quality improvement elsewhere | Anthropic gaining 28pp market share in 3 years37 | Very Common |
| Compliance requirement | New regulation mandating data residency | Common |
| Reliability failure | Provider outage impacting production | Common |
| Vendor concentration risk | Board/investor pressure to diversify | Growing |
| Factor | Build (Self-Host) | Buy (API/Managed) |
|---|---|---|
| Upfront Cost | $1M–$50M+ (GPU clusters) | $0 (pay per token) |
| Per-Token Cost | Lower at scale | Higher but predictable |
| Time to Production | 3–12 months | Days to weeks |
| Team Required | ML engineers, infra, ops | API developers only |
| Model Flexibility | Full (any open-source) | Provider’s catalog only |
| Compliance Control | Full | Depends on provider |
| Best For | >$5M/yr inference spend | <$5M/yr inference spend |
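The table's $5M/yr threshold can be turned into a rough break-even check. All ratios below are illustrative assumptions, not the report's data:

```python
def build_beats_buy(annual_api_spend: float,
                    cluster_capex: float,
                    amortization_years: float = 3.0,
                    self_host_opex_ratio: float = 0.4) -> bool:
    """True when amortized GPU capex plus self-hosting opex (assumed as a
    fraction of equivalent API spend) undercuts the annual API bill."""
    annual_build_cost = (cluster_capex / amortization_years
                         + annual_api_spend * self_host_opex_ratio)
    return annual_build_cost < annual_api_spend

# Directionally consistent with the table's ~$5M/yr crossover:
high_spend = build_beats_buy(10e6, 5e6)  # large API bill: building wins
low_spend = build_beats_buy(1e6, 5e6)    # small API bill: buying wins
```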
| Factor | Lock-In Strength | Mitigation |
|---|---|---|
| Fine-tuned models on proprietary platforms | High | Use open-source base models (Llama, Mistral) |
| Integration depth into workflows | High | OpenAI-compatible API abstraction |
| Data stored in provider infra | Medium | Negotiate data portability clauses |
| Team expertise on specific APIs | Medium | Abstraction layers (LangChain, LiteLLM) |
| Contract/volume commitments | Low-Medium | Multi-provider routing |
New providers must be OpenAI API-compatible from day one. 89% multi-cloud. 37% use 5+ models. LiteLLM/LangChain compatibility is table stakes. Position as one provider in the stack. Reduce switching costs to zero. Win on cost and latency.
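Zero-switching-cost routing can be sketched as a thin provider abstraction. Provider names, endpoints, and prices below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    base_url: str           # OpenAI-compatible endpoint
    price_per_mtok: float   # blended $/M tokens, illustrative

def route_cheapest(providers: list) -> Provider:
    """With OpenAI API parity, switching is a base_url change, so a router
    can pick purely on price (or latency, compliance, capacity)."""
    return min(providers, key=lambda p: p.price_per_mtok)

providers = [
    Provider("incumbent", "https://api.example-a.com/v1", 10.0),
    Provider("independent", "https://api.example-b.com/v1", 5.0),
]
best = route_cheapest(providers)
```

This is the pattern LiteLLM and LangChain expose; with the OpenAI Python SDK, pointing a client at a compatible provider is typically just a `base_url` change on the client constructor.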
This report synthesizes data from 85+ sources across five categories.
| Source Type | Count | Examples |
|---|---|---|
| Industry Reports | 12 | Menlo Ventures, A16z, Gartner, IDC, MarketsandMarkets |
| Company Filings / IR | 18 | SEC filings, earnings calls, investor presentations |
| Press Coverage | 30+ | TechCrunch, CNBC, Bloomberg, CoinDesk |
| Analyst Research | 10 | Stanford HAI, Sacra, Foundamental |
| Company Disclosures | 15+ | Blog posts, product announcements, pricing pages |
All data current as of February 21, 2026. Valuations reflect most recent rounds or market data. Revenue figures annualized from latest disclosed quarter.
For strategic analysis only. Not investment advice. Data from public sources as of Feb 2026. Accuracy not guaranteed. Forward-looking statements carry execution risk.