Landscape Report — Enterprise AI Inference Demand Side

Enterprise AI Inference Buyers: The $37B Demand-Side Landscape

Cursor • Perplexity • Lovable • ElevenLabs • Replit • Harvey • Sierra • JPMorgan • Epic • Palantir • Tesla

Feb 2026 • MinjAI Agents • 85+ Sources • 14 Sections
Internal — Strategic Intelligence
Section 01

Executive Summary

$37B
Enterprise GenAI Spend (2025)1
55%
Inference Share of AI Compute (2026)2
30+
Companies Profiled
$650M
Cursor’s Annual Inference Bill3
Thesis

This report maps who buys inference, what they pay, and where margins break. AI-native startups spend 28–130% of revenue on inference. That is unsustainable. Any provider offering 30–50% cost reduction has a direct, measurable ROI case.

Enterprise AI spending hit $37B in 2025.1 Inference now exceeds training compute.2 76% of enterprises buy, not build.4

The buyer landscape is not monolithic. Cursor and Perplexity face margin crises. JPMorgan and Epic need compliance-first deployment. Palantir requires air-gapped sovereign infrastructure.

Independent inference platforms can target all three segments. Sovereign-ready deployment serves regulated verticals. A 30–50% cost advantage serves margin-compressed AI-natives.

Buyer Segment Conviction Rankings

Segment | Market Size | Independent Provider Fit | Conviction
AI-Native Startups (Tier 1) | $12.5B API market [5] | Cost + latency | Highest
Financial Services | $73B total AI [6] | Sovereign + compliance | High
Healthcare | $45B total AI [6] | HIPAA + on-prem | Medium-High
Defense / Government | $350M+ vertical AI [7] | Air-gapped sovereign | High
AI-Native Startups (Tier 2) | Included above | Cost optimization | Medium
Automotive / Edge | $16.7B mfg AI [6] | Limited (edge-first) | Low
Section 02

Why Demand Matters Now

3.2x
YoY Enterprise AI Spend Growth1
76%
Enterprises Buying (Not Building)4
37%
Use 5+ Models8

Three structural shifts make demand-side analysis urgent now.

Shift 1: Inference surpasses training. 2023: inference was 33% of compute. 2026: it reaches 55%.2 First year inference exceeds training. Every new app deployed increases inference demand permanently.

Shift 2: Build-to-buy reversal. In 2024, 47% of enterprises built internally. By 2025, only 24% build.4 Enterprises want governance, audit trails, and compliance. They will pay for it.

Shift 3: Multi-model is standard. 37% of enterprises use 5+ models, up from 29%.8 Average enterprise LLM spend: $7M, up from $4.5M.8 Model diversity creates demand for provider-agnostic platforms.

Bottom Line

Inference is no longer a cost center. It is the primary revenue driver for a new company class. Whoever controls inference economics controls the AI application layer.

Inference Demand Milestones

2023
Inference = 33% of AI compute. Enterprise GenAI spend: $1.7B.1
2024
Inference reaches 50%. Enterprise spend jumps to $11.5B. Cursor crosses $100M ARR.9
2025
Enterprise GenAI: $37B. Foundation model APIs: $12.5B. Build-to-buy shift accelerates.1
2026
Inference crosses 55% of compute. First year inference > training. Projected $20.6B inference spend.2
2030
AI inference market: $255B. CAGR 19.2% from 2025 base of $106B.10

Where Enterprise AI Dollars Go (2025)

Category | Spend | Share
Foundation Model APIs | $12.5B [1] | 70% of infra layer
Model Training Infrastructure | $4.0B | 22%
AI Infrastructure (data/orchestration) | $1.5B | 8%
Coding AI | $4.0B [11] | 55% of dept. AI
IT Operations AI | $700M | 10%
Marketing AI | $660M | 9%
Customer Success AI | $630M | 9%
Section 03

Buyer Landscape Snapshot

Every major inference buyer profiled in this report appears below, spanning AI-natives, enterprises, and government buyers. The common thread: all consume massive inference volume.

Company | Segment | Valuation / Mcap | ARR / Revenue | Primary Provider | Compliance
Cursor | AI-Native T1 | $29.3B [3] | ~$1B ARR | Anthropic, OpenAI, xAI, Google | SOC2
Perplexity | AI-Native T1 | $20B [12] | $148M ARR | Multi-provider + in-house | SOC2
Lovable | AI-Native T1 | $6.6B [13] | $200M ARR | Anthropic, OpenAI | SOC2
ElevenLabs | AI-Native T1 | $6.6B [14] | $200M ARR | Own infra + cloud GPU | SOC2
Replit | AI-Native T1 | $9B [15] | $252M ARR | OpenAI, Anthropic, Google | SOC2
Harvey AI | AI-Native T1 | $8–11B [16] | ~$50–100M | OpenAI (primary) | Legal compliance
Sierra AI | AI-Native T1 | $10B [17] | ~$150M ARR | OpenAI, Anthropic, Meta | Enterprise SLA
Runway | AI-Native T1 | $5.3B [18] | ~$90M ARR | Own GPU + GCP/AWS burst | SOC2
Glean | AI-Native T1 | $7.2B [19] | $270M ARR | Multi-LLM abstracted | Enterprise
Midjourney | AI-Native T2 | Private | $200M+ ARR | Google TPUs [20] | Basic
Character.ai | AI-Native T2 | $1B [21] | $32M | Open-source (DeepSeek, Llama) | Basic
Bolt.new | AI-Native T2 | $700M [22] | ~$40–100M | Anthropic (primary) | Basic
JPMorgan | Financial | Public | ~$2B AI spend [23] | AWS + OpenAI/Anthropic | Data residency
Goldman Sachs | Financial | Public | Undisclosed | Multi-provider via gateway | SEC/FINRA
Epic Systems | Healthcare | Private | Undisclosed | Azure OpenAI [24] | HIPAA/HITRUST
Palantir | Defense | Public ($4.4B rev) | $4.4B revenue [25] | Model-agnostic (AIP) | IL6 / Air-gap
Scale AI | Defense | Private | $300M+ gov contracts [26] | Multi-cloud + GovCloud | FedRAMP/TS-SCI
Tesla | Automotive | Public | $16.5B Samsung deal [27] | Proprietary AI5/AI6 | ASIL-D
Waymo | Automotive | $16B [28] | Part of Alphabet capex | Google TPU + edge | ISO 26262
Section 04

AI-Native Startups: The New Buyer Class (Tier 1)

These nine companies represent a new category. Inference is not a cost line. It is their primary COGS. Unit economics depend entirely on inference pricing. They are the most price-sensitive, highest-volume buyers.

The COGS Crisis

Cursor pays ~$650M/year to Anthropic on ~$500M revenue. Negative gross margin at scale.3 This is not an outlier. It is the structural reality of AI-native business models. Every company here faces this problem.

Cursor (Anysphere) — $29.3B Valuation, ~$1B ARR
Metric | Value
Valuation | $29.3B (Nov 2025, Series D) [3]
ARR | ~$1B (crossed Nov 2025)
Total Funding | ~$2.6B (Accel, Coatue)
Users | 360K+ paying, 1M+ daily active
Inference Providers | Anthropic, OpenAI, xAI, Google
AWS Spend | $12.6M/month (June 2025), doubling monthly [29]
Anthropic Bill | ~$650M/year (est.) [3]

Verdict: Cursor is the poster child for the inference COGS crisis. At ~$1B ARR, it is the world’s largest coding AI. Inference costs consume 28–130% of revenue.29 Three responses: $200/month “Ultra” tier, credit-based pricing, and a proprietary “Composer” LLM.

Provider Relevance

Cursor is an ideal design partner for independent inference providers. A 30% cut on $650M equals $195M annual savings. Multi-year agreements with four providers. Actively seeks cost optimization.
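The ROI arithmetic behind that pitch fits in a few lines. The figures come from this report; the helper function is purely illustrative:

```python
# Illustrative ROI sketch: annual savings from a provider-side cost cut,
# using the report's Cursor estimate (~$650M/yr inference bill).
def annual_savings(current_bill_usd: float, discount: float) -> float:
    """Dollars saved per year if `discount` (0..1) comes off the current bill."""
    return current_bill_usd * discount

cursor_bill = 650e6                        # ~$650M/yr to Anthropic (report estimate)
print(annual_savings(cursor_bill, 0.30))   # ~$195M/yr at a 30% cut
```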

Perplexity — $20B Valuation, $148M ARR
Metric | Value
Valuation | $20B (Sep 2025) [12]
ARR | $148M (June 2025 annualized)
Total Funding | ~$1.22B (SoftBank, NVIDIA, Accel)
Query Volume | 780M monthly queries
Inference Spend | $100–150M+ (estimated)
Providers | OpenAI, Anthropic, Google + in-house

Verdict: 780M queries per month. GPU procurement is their top use of capital. Revenue: $20/month Pro, $200/month Max, enterprise contracts. Inference cost as percent of revenue: estimated 33–67%.

Risk Factor

Perplexity runs hybrid in-house and API models. Self-hosting reduces API dependency. The opportunity is dedicated capacity, not pay-as-you-go API.

Lovable — $6.6B Valuation, $200M ARR
Metric | Value
Valuation | $6.6B (Dec 2025, Series B) [13]
ARR | $200M (Nov 2025)
Total Funding | ~$500M+ (CapitalG, Menlo, NVIDIA)
Providers | Anthropic (Claude Sonnet), OpenAI

Verdict: Fastest-growing “vibe coding” tool. Investors flag unit economics as a concern. Every new customer compresses gross margins. No gross margin disclosed.

Claude 3.5 Sonnet enabled Lovable’s launch. Anthropic dependency is high. This creates both cost risk and switching opportunity.

ElevenLabs — $6.6B Valuation, $200M ARR
Metric | Value
Valuation | $6.6B [14]
ARR | $200M (Sep 2025, doubled from $100M in 9 months)
A16z Rank | #5 in enterprise AI application spend [30]
Pricing | Characters/month, $0.06/min overage

Verdict: Voice synthesis requires real-time inference. Latency tolerance: <200ms. One of the most compute-intensive workloads. ElevenLabs likely runs own GPU clusters with cloud burst.

Market Opportunity

Low-latency inference is 10–100x better than web-based alternatives. Voice AI companies need this. Dedicated capacity with predictable costs is the value proposition.

Replit — $9B Valuation, $252M ARR
Metric | Value
Valuation | $9B (Jan 2026 raise) [15]
ARR | $252M (Oct 2025)
Prior Valuation | $3B (Sep 2025)
Target | $1B ARR in 2026
Providers | OpenAI, Anthropic, Google

Verdict: 15.8x growth in one year ($16M to $252M ARR). AI agent revenue drives most of the spike. Replit bills API costs through to users, partially insulating margins. But the $1B ARR target requires massive inference scale.

Multi-provider openness makes Replit a strong design partner candidate.

Harvey AI — $8–11B Valuation, Legal AI Leader
Metric | Value
Valuation | $8B (Dec 2025); raising at $11B (Feb 2026) [16]
Recent Funding | $160M (a16z), $300M Series E ($5B)
Customers | 200+ top law firms, Big Four, Fortune 500 legal
Provider | OpenAI (primary partnership)

Verdict: Legal AI has the highest per-seat contract values. 200+ law firms generate multi-million-dollar ACV deals. OpenAI dependency is high. Compliance is strict: privilege, data residency, audit trails.

Compliance Angle

Law firms are the most compliance-sensitive buyers. OpenAI dependency creates concentration risk. Sovereign on-premises deployment is high-value for legal departments.

Sierra AI — $10B Valuation, ~$150M ARR
Metric | Value
Valuation | $10B (Sep 2025, Series C) [17]
ARR | ~$150M (Jan 2026 est.); $100M hit in 21 months
Total Funding | $635M
Providers | OpenAI, Anthropic, Meta (multi-provider)

Verdict: Sierra uses a “constellation” multi-provider approach. No single provider dependency. Built for failover across models. This architecture is a template for integrating independent providers.

Per-resolution pricing means Sierra absorbs inference costs. Multi-model routing optimizes cost vs. quality per query. Openness to alternative providers makes it a strong target.

Runway — $5.3B Valuation, Video Generation Leader
Metric | Value
Valuation | $5.3B (Feb 2026, Series E) [18]
Revenue | ~$90M annualized (June 2025)
Recent Funding | $315M (Feb 2026) + $308M (Apr 2025)
Inference Model | Own GPU clusters + GCP/AWS burst

Verdict: Video generation is the most GPU-intensive inference workload. Credit-based pricing ($0.01/credit). Runs own clusters with cloud burst for peaks. Less addressable by third-party providers.

Glean — $7.2B Valuation, $270M ARR
Metric | Value
Valuation | $7.2B (June 2025, Series D) [19]
ARR | $270M (late 2025)
Market Share | ~10% of agent platforms ($750M segment) [1]
Providers | Multi-LLM abstracted layer

Verdict: Enterprise search at scale. Glean abstracts the model layer across multiple LLMs. Significant inference costs for knowledge retrieval. Compliance requirements are high (SSO, RBAC, DLP). Addressable by cost-efficient, compliant providers.

Section 05

AI-Native Startups: Emerging Buyers (Tier 2)

Smaller or earlier-stage, but inference demand is growing fast. Several face the same margin compression as Tier 1.

Company | Valuation | Revenue | Provider | Note
Midjourney | Private (bootstrapped) | $200M+ ARR | Google TPUs | 65% cost cut by switching to TPUs [20]
Character.ai | $1B (down from $2.5B) [21] | $32M | Open-source (DeepSeek, Llama) | Compute costs destroyed valuation
Bolt.new | $700M [22] | ~$40–100M | Anthropic (Claude) | 1.3M tokens/day per user
Poe (Quora) | ~$2.5B (Quora total) [31] | ~$65M | Aggregator (all providers) | Pass-through; marks up API costs
Pika | $700–900M [32] | ~$85M | Own video models (GPU-intensive) | 40% enterprise revenue
Augment Code | $977M [33] | ~$20M ARR | Undisclosed | Eric Schmidt-backed Copilot rival
Cohere | $7B [34] | $240M ARR | Self-hosted (AMD + hyperscaler) | 85% from on-prem enterprise
Jasper AI | Declining | $88M | OpenAI (GPT-4) | Pass-through model; facing ChatGPT threat
Stability AI | ~$1B (down from $4B) | ~$50M | AWS | Financially distressed; open-source pivot
Key Insight: Provider Switching Is Real

Midjourney cut inference costs 65% by switching to Google TPUs.20 Before: $2M/month. After: $700K/month. Payback: 11 days. Buyers will switch for cost. Character.ai moved to open-source for the same reason. Provider lock-in is weaker than incumbents believe.
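The 11-day payback also lets us back out the implied one-time switching cost, which the source does not state directly. A back-of-envelope sketch using the figures above:

```python
# Back-of-envelope check on the Midjourney TPU-switch figures cited in the text:
# $2M/month before, $700K/month after, 11-day payback.
before, after = 2_000_000, 700_000            # monthly inference cost, USD
monthly_savings = before - after              # $1.3M/month
reduction = 1 - after / before                # the 65% cut cited
daily_savings = monthly_savings / 30
implied_migration_cost = daily_savings * 11   # one-time cost recovered in 11 days
print(round(reduction, 2), round(implied_migration_cost))
```

The implied migration cost (~$0.5M) is our inference from the stated payback period, not a sourced figure.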

Cautionary Tale: Character.ai

Character.ai peaked at $2.5B. Compute costs + free users + founder departures collapsed it to $1B.21 Inference costs outran monetization. Every AI-native company here faces this risk.

Section 06

Financial Services: The Compliance-First Buyers

Largest enterprise AI vertical: $73B total spend.6 Compliance drives every procurement decision. Data residency, audit trails, and regulatory requirements override cost and speed.

JPMorgan — ~$2B AI Spend, 200K+ Employees on LLM Suite
Metric | Value
Total Tech Budget | $18B (2025) [23]
AI Spend | ~$2B (reclassified as core infra)
Annual AI Value | $1.5–2.0B
LLM Suite Users | 200–250K employees, ~50% daily usage
AI Use Cases | 450+ in production
Providers | AWS Bedrock/SageMaker + OpenAI + Anthropic
Deployment | Hybrid (private cloud + external API via compliance gateway)

Verdict: JPMorgan is the benchmark enterprise AI buyer. $2B spend. 450+ use cases. Compliance gateway filters all inference requests before data exits. Model-agnostic: OpenAI and Anthropic through controlled gateway.

Market Opportunity

JPMorgan wants model-agnostic inference with data residency control. No single-provider lock-in. Sovereign deployment paired with multi-model support aligns directly. The compliance gateway architecture is the integration point.

Goldman Sachs — Multi-Provider AI Assistant, 46K+ Users
Metric | Value
Employee Access | 46,500+
Adoption | >50%; targeting 100% by end 2026
Providers | OpenAI GPT, Google Gemini, Anthropic Claude
Deployment | Private cloud + external API via compliance gateway
Compliance | SEC/FINRA audit trail, prompt filtering, data anonymization

Verdict: Goldman runs model-agnostic. Routes to OpenAI, Google, and Anthropic via private gateway. AI agents handle trade accounting and compliance. Business Conduct Code revised Jan 2026 for AI. Targeting 100% adoption.

Bloomberg — On-Premises BloombergGPT
Metric | Value
Model | BloombergGPT (50B parameters, proprietary corpus)
Deployment | Fully on-premises / private cloud
New Features | Document Search & Analysis (400M+ documents)
External Cloud | Zero for core financial data products

Verdict: Pure on-premises. Data sovereignty is the business model. Zero external cloud for core products. Not addressable by third-party providers. This is maximum data control.

Financial Services Pattern

All three banks share one architecture: compliance gateway between users and model APIs. No raw data to external models. Model-agnostic routing. Full audit trails. Sovereign-ready inference maps directly to this pattern.
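A minimal sketch of that gateway pattern: redact sensitive identifiers, record an audit entry, then forward the sanitized prompt to whichever model API is selected. The function names, the in-memory log, and the crude regex rule are illustrative assumptions, not any bank's actual implementation:

```python
# Hypothetical compliance-gateway sketch: sanitize, audit, then forward.
import re, hashlib, datetime

ACCOUNT_RE = re.compile(r"\b\d{10,16}\b")   # crude account/card-number pattern

audit_log = []

def gateway(prompt: str, user: str, provider: str) -> str:
    """Redact identifiers and log an audit entry before any data leaves."""
    sanitized = ACCOUNT_RE.sub("[REDACTED]", prompt)
    audit_log.append({
        "ts": datetime.datetime.utcnow().isoformat(),
        "user": user,
        "provider": provider,
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "redactions": len(ACCOUNT_RE.findall(prompt)),
    })
    return sanitized  # in production: forward to the chosen model API here

out = gateway("Summarize activity on account 1234567890123", "analyst-7", "anthropic")
print(out)  # Summarize activity on account [REDACTED]
```

A real gateway would add prompt filtering, data-residency routing, and response inspection, but the control point is the same: no raw data reaches an external model.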

Section 07

Healthcare & Defense/Government

Compliance is not optional here. HIPAA, FedRAMP, ITAR, and IL6 certifications determine who can compete. Procurement cycles: 6–24 months.

Healthcare

Epic Systems — 85% AI Adoption, Azure OpenAI
Metric | Value
AI Adoption | 85% of Epic customers live on generative AI [24]
AI Features | Art, Emmie, Penny copilots + Dragon Copilot
Insights Usage | 16M+ times/month (3x growth since Nov 2025)
Provider | Microsoft Azure OpenAI Service (primary)
Deployment | Hybrid: on-prem EHR + Azure inference with BAA
Compliance | HIPAA, HITRUST

Verdict: Epic dominates healthcare EHR. 260+ health systems. AI inference runs inside Azure’s HIPAA environment. AI Validation software lets hospitals test models before deployment. PHI requires contractual data processing agreements.

Recursion Pharmaceuticals — BioHive-2 Supercomputer
Metric | Value
Compute | BioHive-2: 504x H100 GPUs (TOP500 #35) [35]
NVIDIA Deal | $50M investment + collaboration
Cloud Partners | Google Cloud (burst), Oracle Cloud (overflow)
Model | Phenom-Beta (molecular screening)
Compliance | GxP, data sovereignty

Verdict: Primary inference on-premises via BioHive-2. Bursts to Google Cloud for parallel screening. Petabyte-scale imaging stays on-prem. Pharma model: on-prem for sensitive data, cloud for burst.

Defense / Government

Palantir — $4.4B Revenue, IL6 Air-Gapped Inference
Metric | Value
Revenue | $4.4B (2025, +53% YoY) [25]
Q3 TCV | $2.76B (+151% YoY)
US Commercial TCV | $1.31B (+342% YoY)
Platform | AIP (model-agnostic: GPT-4o, Claude, Llama)
Deployment | AWS GovCloud, Azure Secret, on-prem, air-gapped
Compliance | IL6, TS/SCI, air-gap

Verdict: AIP deploys the same codebase across commercial, GovCloud, and air-gapped networks. Patch time: 3.5 minutes via Apollo. Rackspace (Feb 2026) adds UK Sovereign Cloud. Anduril integration connects AIP with Lattice.

Competitive Template

Apollo is the competitive template for sovereign inference. Same code, any environment. Model-agnostic. Air-gapped capable. Any sovereign inference provider should study this architecture.

Scale AI — $300M+ Government Contracts
Metric | Value
DoD Contracts | $300M+ cumulative [26]
Key Deals | $250M JAIC, $99M Army R&D, $41–100M TS networks
Edge Product | Thunderforge (real-time military logistics)
Compliance | CMMC, FedRAMP High, TS/SCI

Verdict: Primarily data infrastructure (labeling, RLHF, evaluation). Thunderforge is the inference play: real-time edge for military logistics. Commercial on multi-cloud. Classified on GovCloud or air-gapped.

Compliance Requirements Matrix

Sector | Certifications | Deployment | Procurement Cycle
Healthcare | HIPAA, HITRUST, SOC2 | Hybrid (on-prem EHR + cloud inference) | 6–12 months
Financial Services | SOC2, ISO 27001, GDPR | Private cloud + compliance gateway | 6–12 months
Defense (Unclass.) | FedRAMP, CMMC | GovCloud | 12–18 months
Defense (Classified) | IL6, TS/SCI, ITAR | Air-gapped / on-premises only | 18–24+ months
Pharma / Life Sciences | GxP, HIPAA | On-prem primary + cloud burst | 12–18 months
Section 08

Automotive & E-Commerce: Edge vs. Cloud

Automotive and e-commerce: the edge inference frontier. Tesla and Waymo run on-device. Amazon and Shopify run cloud at massive scale. Instructive for independent provider positioning.

Edge vs. Cloud Inference Architecture

Application Layer: FSD (Tesla), Waymo Driver, Amazon Personalize, Shopify Magic
Inference Layer: AI5 chip (Tesla edge), on-vehicle compute (Waymo), AWS SageMaker (Amazon), cloud LLM APIs (Shopify)
Compute Infrastructure: custom silicon (Tesla AI5/AI6), Google TPU (Waymo training), AWS GPU clusters (Amazon), multi-cloud (Shopify)
Data Layer: on-vehicle sensors, no cloud (Tesla); LiDAR + camera + radar (Waymo); user behavior data, cloud (Amazon); product catalog, cloud (Shopify)

Company | Inference Model | Provider | Independent Provider Fit
Tesla | Edge-only (AI5 chip, no external cloud) [27] | Proprietary (TSMC + Samsung fab) | None
Waymo | Hybrid (edge driving + GCP training/sim) [28] | Google TPU + in-vehicle compute | None
Amazon | Cloud (AWS SageMaker, Bedrock) | Internal AWS infrastructure | None
Shopify | Cloud (multi-provider LLM APIs) | Multiple cloud providers | Medium
Verdict

Automotive is not addressable. Tesla and Waymo are fully vertical: own silicon, on-device inference. E-commerce is partially addressable: Shopify uses multi-provider APIs. But Amazon runs internal infrastructure. Limited opportunity for independents.

Section 09

Inference Economics: The COGS Crisis

AI-native companies face a structural problem. Inference is their primary COGS. Costs scale linearly with users (or worse). Traditional SaaS: 75–85% gross margins. AI-native: 50–65% at best.

130%
Cursor Inference Cost / Revenue3
33–67%
Perplexity Inference / Revenue
4%
Midjourney After TPU Switch20
75–85%
Traditional SaaS Gross Margin

Inference as % of Revenue

Cursor
~130%
Perplexity
33–67%
Character.ai
30–60%
ElevenLabs
15–30%
Midjourney
4%

AI-Native vs. Traditional SaaS Margins

Metric | AI-Native Companies | Traditional SaaS
Gross Margin | 50–65% (at maturity) | 75–85%
COGS Profile | Variable (scales with usage) | Fixed (server + bandwidth)
Unit Economics | Degrades at scale without cost reduction | Improves at scale
Provider Dependency | High (Anthropic, OpenAI, Google) | Low (commodity cloud)
Pricing Power | Limited (commodity inference) | Moderate (switching costs)
The Fundamental Problem

AI-native COGS scales linearly with users. Every new Cursor subscriber adds inference calls. Every new Perplexity query costs money. Traditional SaaS marginal cost approaches zero. AI-native marginal cost does not. Three solutions: build your own models, switch for cost, or find a structurally cheaper provider.
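The divergence can be illustrated with a toy unit-economics model contrasting the two COGS profiles. All inputs are assumed numbers for illustration, not any company's actual figures:

```python
# Toy model: gross margin vs. scale for near-zero marginal cost (SaaS)
# and usage-dominated COGS (AI-native). Inputs are illustrative only.
def gross_margin(users, price_mo, fixed_cogs, var_cogs_per_user):
    revenue = users * price_mo
    cogs = fixed_cogs + var_cogs_per_user * users
    return (revenue - cogs) / revenue

for users in (10_000, 100_000, 1_000_000):
    saas = gross_margin(users, 20, 50_000, 0.50)  # marginal cost near zero
    ai = gross_margin(users, 20, 50_000, 9.00)    # inference dominates COGS
    print(users, round(saas, 3), round(ai, 3))    # SaaS margin climbs; AI plateaus
```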

Cost Benchmark Table

Company | Revenue | Est. Inference Cost | Cost/Rev % | Response
Cursor | ~$1B ARR | $650M/yr to Anthropic [3] | ~65–130% | Built Composer LLM, credit pricing
Perplexity | $148M ARR | $100–150M+ | 33–67% | Hybrid in-house + API
ElevenLabs | $200M ARR | $30–60M | 15–30% | Own GPU clusters
Character.ai | $32M | $10–20M | 30–60% | Switched to open-source models
Midjourney | $200M+ ARR | $8.4M/yr | ~4% | Switched to Google TPUs
Section 10

Procurement Criteria & Deal Structures

Priority Ranking by Buyer Type

Priority | AI-Native Startups | Enterprise | Defense/Gov
#1 | Cost | Security [8] | Data sovereignty
#2 | Latency / reliability | Cost | Security / audit
#3 | Multi-model support | Accuracy | SLA uptime
#4 | Scalability | Compliance | Cost
#5 | API compatibility | Observability | Latency

Five Purchase Models

Model | Buyer | Commitment | Discount | Contract
Pay-as-you-go | SMB / Developers | None | 0% | $1K–$50K/yr
Volume Commit | Mid-market | Annual token volume | 15–30% [36] | $50K–$500K/yr
Reserved Capacity | Enterprise | Dedicated instances | 20–35% | $500K–$10M+/yr
On-Premises | Gov / Defense / Regulated | Multi-year license | Custom | Multi-million ACV
Marketplace | Cloud-native enterprise | Via AWS/Azure/GCP credits | EDP-bundled | Varies

Procurement Cycles

Segment | Cycle Length | Decision Maker | AI Conversion Rate
Developer / SMB | Days to weeks | Individual / team lead | Self-serve
Mid-market | 30–90 days | IT + Finance approval | 47% (vs 25% trad. SaaS) [1]
Enterprise | 6–12 months | CIO + CISO + Legal | Enterprise sales motion
Government | 12–24+ months | Contracting officer + ATO | FedRAMP required
AI-Native ($100M+ ARR) | Weeks to months | CEO / CTO direct | Direct relationship
Key Insight

AI deals convert at 47% vs. 25% for traditional SaaS.1 Fastest-selling software category in history. Bottleneck is not demand. It is compliance, security review, and procurement cycles.

Section 11

Provider Selection Matrix

LLM API Market Share (Enterprise, 2025)

40%
Anthropic (up from 12%)37
27%
OpenAI (down from 50%)37
21%
Google (up from 7%)37
12%
Others (Meta, Cohere, Mistral)
The Anthropic Surge

Anthropic: 12% to 40% share in three years.37 OpenAI dropped from 50% to 27%. In coding: Anthropic holds 54%. Most dramatic share shift in enterprise software history. Enterprises switch fast when quality improves.

Provider Usage by Buyer Segment

Buyer | Anthropic | OpenAI | Google | Meta/OSS | Self-Hosted
Cursor | Primary | Secondary | Secondary | – | Building Composer
Lovable | Primary | Secondary | – | – | –
Bolt.new | Primary | – | – | – | –
Harvey | – | Primary | – | – | –
Sierra | Multi | Multi | – | Llama | –
JPMorgan | Via gateway | Via gateway | – | – | Compliance gateway
Goldman | Via gateway | Via gateway | Via gateway | – | Compliance gateway
Epic | – | Azure OpenAI | – | – | On-prem EHR
Palantir | AIP | AIP | – | Llama | Air-gapped deploy
Midjourney | – | – | TPU primary | – | Own models
Cohere | – | – | – | – | 85% on-prem
Multi-Provider Is Now Default

89% of enterprises adopt multi-cloud strategies. 37% use 5+ models.8 Even OpenAI signed a $38B AWS deal alongside Microsoft.38 Everyone hedges. Entry point for independents: become one provider in the stack.

Section 12

Strategic Implications for Independent Providers

Analyst Verdict: AI-Native ICP

Ideal customer: AI-native at $100M+ ARR with inference as primary COGS. Cursor ($1B ARR, 65–130% cost/revenue), Replit ($252M, $1B target), Sierra ($150M, multi-provider), Lovable ($200M) all fit. 30% cut on Cursor’s $650M bill = $195M savings. That is the pitch.

Bear Case: AI-Natives Self-Host

Cursor is building its own LLM. Perplexity self-hosts. Midjourney runs on TPUs. The largest buyers are actively reducing third-party dependency. If AI-natives bring inference in-house at scale, the addressable market shrinks fast. Cost advantage alone may not retain customers beyond 12–18 months.

Analyst Verdict: Sovereign Inference Wedge

Sovereign cloud: $154B (2025) to $823B by 2032.39 JPMorgan, Goldman, Epic, Palantir need compliance-first inference with data residency. Air-cooled modular infrastructure enables on-premises deployment. No hyperscaler matches this cost structure.

Analyst Verdict: Open-Source Threat

Open-source compresses margins across the stack. Character.ai switched to DeepSeek and Llama. Midjourney cut 65% with TPUs. Cursor builds its own model. The biggest buyers will reduce API dependency. Providers must offer more than cheaper tokens: latency, compliance, dedicated capacity.

Analyst Verdict: Timing Risk

Token pricing deflates ~10x per year. $10/M tokens today = $1/M next year. Cost advantage must be structural: energy, hardware efficiency. If competitors match pricing faster, the window closes. Speed to market beats perfection.
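The deflation arithmetic in one helper; the ~10x/yr rate is this report's estimate, so the function's default is an assumption, not a constant of the market:

```python
# Token-price deflation: at ~10x/yr, today's price divides by 10 each year,
# which is why a fixed percentage discount erodes unless the cost advantage
# is structural (energy, hardware efficiency).
def price_after(price_per_m_tokens: float, years: float, deflation: float = 10.0) -> float:
    return price_per_m_tokens / deflation ** years

print(price_after(10.0, 1))   # $10/M tokens -> $1/M after one year
print(price_after(10.0, 2))   # -> $0.10/M after two years
```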

Target Segments for Independent Providers

Segment | Target Companies | Value Proposition | Priority
AI-Native Coding | Cursor, Replit, Lovable, Bolt.new | 30–50% cost reduction + low-latency SLAs | P0
AI-Native Search/Agent | Perplexity, Sierra, Glean | Dedicated capacity + cost optimization | P0
Financial Services | JPMorgan, Goldman Sachs | Sovereign deployment + compliance gateway | P1
Healthcare | Epic ecosystem hospitals | HIPAA-compliant on-prem + hybrid | P2
Defense/Gov | Scale AI, Palantir ecosystem | Air-gapped sovereign inference | P2

Next 90 Days: Action Priorities

Action | Target | Why Now
Sign first design partner | Cursor, Replit, or Lovable | Highest COGS pain. Fastest decision cycle.
Achieve OpenAI API parity | LiteLLM / LangChain compatible | 89% multi-cloud. Zero switching cost is table stakes.
Publish latency benchmarks | Sub-200ms TTFT on Llama 70B | Coding AI demands real-time. Prove it publicly.
Begin SOC2 / FedRAMP prep | SOC2 Type II within 6 months | Enterprise sales blocked without certification.
Scope sovereign deployment | One EU or financial services pilot | Sovereign cloud grows 5.3x by 2032. First-mover advantage.
Bottom Line

Demand is real: $37B in GenAI spend. 76% buying. AI-natives with negative margins need cost cuts. Regulated enterprises need sovereign deployment. Low-cost energy operators serve both. The question is speed of execution vs. token pricing deflation.

Section 13

Buyer Decision Framework

What Triggers Provider Switching

Trigger | Example | Frequency
Cost spike | Cursor’s API costs doubling monthly [29] | Very Common
Quality improvement elsewhere | Anthropic gaining 28pp market share in 3 years [37] | Very Common
Compliance requirement | New regulation mandating data residency | Common
Reliability failure | Provider outage impacting production | Common
Vendor concentration risk | Board/investor pressure to diversify | Growing

Build vs. Buy Economics

Factor | Build (Self-Host) | Buy (API/Managed)
Upfront Cost | $1M–$50M+ (GPU clusters) | $0 (pay per token)
Per-Token Cost | Lower at scale | Higher but predictable
Time to Production | 3–12 months | Days to weeks
Team Required | ML engineers, infra, ops | API developers only
Model Flexibility | Full (any open-source) | Provider’s catalog only
Compliance Control | Full | Depends on provider
Best For | >$5M/yr inference spend | <$5M/yr inference spend
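The spend threshold falls out of a simple breakeven model. The capex, ops cost, and self-host cost ratio below are illustrative assumptions, not sourced figures:

```python
# Build-vs-buy breakeven sketch: years until a self-hosted cluster pays back
# versus staying on APIs. All inputs are illustrative assumptions.
def build_breakeven_years(cluster_capex: float, annual_ops: float,
                          api_spend: float, self_host_ratio: float = 0.4) -> float:
    """Payback period, assuming self-hosting serves the same load at
    `self_host_ratio` of current API cost plus fixed annual ops."""
    annual_savings = api_spend - (api_spend * self_host_ratio + annual_ops)
    if annual_savings <= 0:
        return float("inf")   # buying stays cheaper indefinitely
    return cluster_capex / annual_savings

# $10M cluster, $1M/yr ops: 2-year payback at $10M/yr API spend...
print(build_breakeven_years(10e6, 1e6, 10e6))
# ...but no payback at $1.5M/yr API spend, where ops alone eat the savings.
print(build_breakeven_years(10e6, 1e6, 1.5e6))
```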

Lock-In Factors

Factor | Lock-In Strength | Mitigation
Fine-tuned models on proprietary platforms | High | Use open-source base models (Llama, Mistral)
Integration depth into workflows | High | OpenAI-compatible API abstraction
Data stored in provider infra | Medium | Negotiate data portability clauses
Team expertise on specific APIs | Medium | Abstraction layers (LangChain, LiteLLM)
Contract/volume commitments | Low-Medium | Multi-provider routing
Integration Strategy for Independent Providers

New providers must be OpenAI API-compatible from day one. 89% multi-cloud. 37% use 5+ models. LiteLLM/LangChain compatibility is table stakes. Position as one provider in the stack. Reduce switching costs to zero. Win on cost and latency.
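"One provider in the stack" can be sketched as cost/latency routing behind a single interface. Provider names, prices, and latencies below are hypothetical placeholders:

```python
# Multi-provider routing sketch: pick the cheapest provider that meets a
# latency budget. Provider entries are hypothetical, not real price lists.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    usd_per_m_tokens: float
    p50_latency_ms: int

PROVIDERS = [
    Provider("incumbent-api", 10.0, 300),
    Provider("hyperscaler", 8.0, 250),
    Provider("independent", 4.0, 150),  # the cost + latency wedge
]

def route(latency_budget_ms: int) -> Provider:
    """Cheapest provider within the latency budget; error if none qualifies."""
    eligible = [p for p in PROVIDERS if p.p50_latency_ms <= latency_budget_ms]
    if not eligible:
        raise ValueError("no provider meets the latency budget")
    return min(eligible, key=lambda p: p.usd_per_m_tokens)

print(route(200).name)  # independent
```

In practice this routing layer is what LiteLLM-style abstractions provide, which is why OpenAI API compatibility makes an independent provider a drop-in candidate.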

Section 14

Methodology & Sources

Research Methodology

This report synthesizes data from 85+ sources across five categories.

Source Type | Count | Examples
Industry Reports | 12 | Menlo Ventures, A16z, Gartner, IDC, MarketsandMarkets
Company Filings / IR | 18 | SEC filings, earnings calls, investor presentations
Press Coverage | 30+ | TechCrunch, CNBC, Bloomberg, CoinDesk
Analyst Research | 10 | Stanford HAI, Sacra, Foundamental
Company Disclosures | 15+ | Blog posts, product announcements, pricing pages

Data Freshness

All data current as of February 21, 2026. Valuations reflect most recent rounds or market data. Revenue figures annualized from latest disclosed quarter.

Limitations

Disclaimer

For strategic analysis only. Not investment advice. Data from public sources as of Feb 2026. Accuracy not guaranteed. Forward-looking statements carry execution risk.

References & Sources

  1. [1] Menlo Ventures, “2025 State of Generative AI in the Enterprise,” 2025. Enterprise GenAI spend: $37B (3.2x YoY). Build vs buy: 76% purchasing. menlovc.com
  2. [2] MarketsandMarkets and Menlo Ventures analysis. Inference = ~50% of AI compute in 2025, crossing to 55% in 2026. First year inference exceeds training spend.
  3. [3] Foundamental analysis of Cursor/Anysphere economics. Estimated ~$650M/yr paid to Anthropic vs ~$500M revenue (June 2025). wheresyoured.at; mktclarity.com
  4. [4] Menlo Ventures, 2025. Build vs buy shift: 47% built internally (2024) → 24% built internally (2025). 76% now purchasing.
  5. [5] Menlo Ventures, 2025. Foundation model APIs = $12.5B (70% of $18B infrastructure layer). Captures Mercury bank customer data.
  6. [6] Market Clarity 2026 forecast. Financial services: $73B total AI. Healthcare: $45B+. Manufacturing: $16.7B. Retail: $17.8B. mktclarity.com
  7. [7] Menlo Ventures, 2025. Vertical AI breakdown: Healthcare $1.5B, Legal $650M, Government $350M, Creator $360M (of $3.5B total vertical AI).
  8. [8] A16z Enterprise AI CIO Survey, 2025. Average LLM spend: $7M (up from $4.5M). 37% use 5+ models (up from 29%). Security = #1 priority. a16z.com
  9. [9] Cursor/Anysphere funding timeline. $200M ARR (Mar 2025), $500M ARR (Jun 2025), ~$1B ARR (Nov 2025). techcrunch.com
  10. [10] MarketsandMarkets, “AI Inference Market.” 2025: $106B. 2030: $255B. CAGR: 19.2%. marketsandmarkets.com
  11. [11] Menlo Ventures, 2025. Coding AI = $4.0B (55% of departmental AI spend). IT Ops $700M, Marketing $660M, Customer Success $630M.
  12. [12] Perplexity raised $200M at $20B valuation, September 2025. Total funding ~$1.22B. techcrunch.com
  13. [13] Lovable (formerly GPT Engineer) raised $330M Series B at $6.6B valuation, December 2025. $200M ARR (Nov 2025). techcrunch.com; bloomberg.com
  14. [14] ElevenLabs: $200M ARR (Sep 2025), doubled from $100M in nine months. #5 in A16z enterprise AI spend ranking. economyinsights.com
  15. [15] Replit nearing $9B valuation (Jan 2026), up from $3B (Sep 2025). $252M ARR (Oct 2025). bloomberg.com
  16. [16] Harvey AI: $8B valuation (Dec 2025), raising at $11B (Feb 2026). 200+ law firms. techcrunch.com
  17. [17] Sierra AI raised $350M at $10B valuation, September 2025. $100M ARR in 21 months. techcrunch.com; sierra.ai
  18. [18] Runway raised $315M at $5.3B valuation, February 2026. techcrunch.com
  19. [19] Glean Series D at $7.2B valuation, June 2025. $270M ARR. Menlo Ventures data.
  20. [20] Midjourney switched to Google TPUs in 2024. Cost: $2M/month → $700K/month (65% reduction). Payback: 11 days.
  21. [21] Character.ai valuation declined from $2.5B to ~$1B. Revenue: $32.2M (2025). Founders left for Google. Switched to open-source models.
  22. [22] Bolt.new (StackBlitz) raised at $700M valuation, January 2025. $105.5M Series B. bloomberg.com
  23. [23] JPMorgan: $18B total tech budget, ~$2B dedicated to AI. 200–250K employees on LLM Suite. 450+ AI use cases in production. $1.5–2.0B annual AI value delivered.
  24. [24] Epic Systems: 85% of customers live on generative AI. “Insights” feature used 16M+ times/month (3x growth). Azure OpenAI Service is primary inference provider.
  25. [25] Palantir 2025 revenue: $4.4B (+53% YoY). Q3 TCV: $2.76B (+151% YoY). AIP deploys across cloud, GovCloud, and air-gapped environments.
  26. [26] Scale AI: $300M+ cumulative DoD contracts. $250M JAIC, $99M Army R&D, $41–100M TS network deal (Sep 2025).
  27. [27] Tesla AI5 chip: 40x improvement over AI4. $16.5B Samsung manufacturing deal. TSMC + Samsung at U.S. fabs. Mass production H2 2026. Dojo 3 restarted Jan 2026.
  28. [28] Waymo: $16B valuation (2025 round). Part of Alphabet $175–185B 2026 capex. 15M trips completed. Hybrid: Teacher model on GCP, Student model on-device.
  29. [29] Cursor AWS spend: $12.6M/month (June 2025), doubling monthly. AWS = 17.5–28% of revenue. Multi-year agreements with Anthropic, OpenAI, xAI, Google. cnbc.com
  30. [30] A16z AI Application Spending Report. ElevenLabs ranked #5 in actual enterprise AI application spend. a16z.com
  31. [31] Poe (Quora): $75M dedicated raise from A16z (Jan 2024). ~18M MAUs. Aggregates OpenAI, Anthropic, Google, Meta, Mistral models. techcrunch.com
  32. [32] Pika Labs: $80M Series B (Spark Capital). $700–900M valuation. ~$85M annualized revenue. 40% enterprise. maginative.com
  33. [33] Augment Code: $252M total funding ($227M Series B). $977M valuation (Apr 2024). Eric Schmidt backed. ~$20M ARR. techcrunch.com
  34. [34] Cohere: $7B valuation (Sep 2025). $240M ARR (Feb 2026). 85% from on-premises enterprise deployment. theaiinsider.tech
35. [35] Recursion Pharmaceuticals BioHive-2: 504 H100 GPUs. TOP500 #35 globally. NVIDIA $50M investment. Phenom-Beta model for molecular screening.
  36. [36] OpenAI enterprise negotiation data: 15–30% volume discounts. Scale Tier: GPT-4.1 input at $110/day (30K tokens/min). redresscompliance.com
  37. [37] Menlo Ventures, 2025. LLM API market share: Anthropic 40% (from 12% in 2023), OpenAI 27% (from 50%), Google 21% (from 7%). Coding: Anthropic 54%, OpenAI 21%.
  38. [38] OpenAI signed $38B AWS deal alongside Microsoft partnership. Multi-cloud hedging at the model provider level.
  39. [39] Sovereign cloud market: $154B (2025) → $823B (2032). Key drivers: GDPR, HIPAA, government contracts. introl.com
  40. [40] Gartner: worldwide AI spending $1.5 trillion in 2025. gartner.com
  41. [41] IDC: AI infrastructure spending to reach $758B by 2029. Data center systems: +31.7% to $650B in 2026. idc.com
  42. [42] Stanford HAI AI Index 2025: AI now accounts for majority of enterprise tech procurement decisions. hai.stanford.edu
  43. [43] A16z State of AI / OpenRouter 100T token study on model usage patterns. a16z.com
  44. [44] AI inference costs analysis 2026. GPU compute costs and token pricing deflation trends. byteiota.com
  45. [45] Deloitte AI Infrastructure & Compute Strategy 2026. deloitte.com
  46. [46] Anthropic-Deloitte enterprise partnership: first major professional services on-prem deployment. cnbc.com
47. [47] Windsurf (formerly Codeium): OpenAI's $3B acquisition deal unraveled; Google acquired the team. techcrunch.com
  48. [48] SaaStr AI gross margins analysis: AI-native companies 50–65% vs traditional SaaS 75–85%. saastr.com
  49. [49] VCs predict enterprise AI consolidation 2026: fewer vendors, larger contracts. techcrunch.com
  50. [50] Cleanlab AI survey: latency and reliability dominant for high-traffic production. Cost dominant for smaller deployments. 62% plan to improve observability.
  51. [51] AWS European Sovereign Cloud: €7.8B investment, launching Germany (late 2025). Microsoft: air-gapped France/Germany deployments.
  52. [52] Palantir Rackspace partnership, Feb 18, 2026: AIP on UK Sovereign Cloud. Anduril integration: AIP + Maven with Lattice + Menace C4 (Dec 2024).
  53. [53] Goldman Sachs AI Assistant: 46,500+ employees, >50% adoption, targeting 100% by end 2026. Multi-provider via compliance gateway (OpenAI, Google, Anthropic). Business Conduct Code revised Jan 2026.
54. [54] Bloomberg: BloombergGPT (50B parameters), entirely on-premises. Document Search & Analysis: 400M+ documents. Zero external cloud for core financial data.
  55. [55] Stripe: Acquired Metronome (Dec 2025) for usage-based billing. LLM Proxy routes calls and records token usage. 56% of AI customers on hybrid billing.
  56. [56] Scale AI Thunderforge: real-time edge inference for military logistics. CMMC, FedRAMP High, IL5/IL6, TS/SCI clearances.
  57. [57] Cursor launched proprietary “Composer” coding LLM (Oct 2025) to reduce Anthropic dependence. Also launched $200/month “Ultra” tier.
58. [58] Replit 15.8x ARR growth: $16M (2024) → $252M (Oct 2025). AI agent revenue drives the majority. $1B ARR target for 2026.
  59. [59] Bolt.new token consumption: 1.3M tokens/day reported by users for standard apps. $1,000+ cost explosions on complex projects. Tiers: $20–$200/month.
  60. [60] Multi-cloud adoption: 89% of enterprises (2025). Driver: vendor lock-in risk, negotiation leverage, reliability.
  61. [61] Menlo Ventures: 10 products generating $1B+ ARR. 50 products generating $100M+ ARR. AI deals convert at 47% vs 25% traditional SaaS.
  62. [62] Enterprise AI spend growing 75% YoY. Innovation budget allocation dropped from 25% to 7% as AI moves to core operations. A16z, 2025.
  63. [63] Latency requirements: enterprise standard sub-800ms TTFT. Voice AI <200ms. Financial trading: microsecond-level. Per-token decode times vary by model size: smaller models achieve sub-millisecond; larger models typically 1–10ms/token.
  64. [64] SLA requirements: typical minimum 99.9%. Enterprise-grade: 99.99%+. Financial remedies: 5–10% of monthly fees per SLA miss.
  65. [65] AI-specific SLAs now include accuracy rates, hallucination rates, precision scores (beyond uptime).
  66. [66] Anthropic coding market share: 54% (2025). OpenAI: 21%. Menlo Ventures survey.
  67. [67] Broader AI market: $1.5T (Gartner, 2025). $2.0–2.5T (2026, multiple analysts). AI server market 2026: $330B.
  68. [68] Regional distribution: US ~$300B (30–40% global). Europe $70B+ (25%, compliance emphasis). China $27B (8.9%). APAC: 45.7% growth.
  69. [69] Switching costs: fine-tuned models = high lock-in. Integration depth = high. Data stored = medium. Team expertise = medium. Mitigation via open-source and abstraction layers.
70. [70] Anthropic large accounts ($100K+ annual spend) grew 7x in one year. Menlo Ventures data.
  71. [71] Recursion BioHive-2 operational, TOP500 #35. Enabled Boltz-2 protein structure model training. Google Cloud for burst. Oracle Cloud for overflow.
  72. [72] Epic AI Validation software: hospitals test/monitor models before deployment. PHI requires contractual data processing agreements.
  73. [73] Palantir Apollo: deploys same AIP codebase across AWS GovCloud, Azure Secret, Azure Commercial, on-premises, air-gapped. Average patch time: 3.5 minutes.
  74. [74] Character.ai: stopped developing own models after founders left for Google. Now uses DeepSeek and Meta Llama. Cost reduction via open-source.
  75. [75] Jasper AI: $88M revenue (2025). Declining. Facing competition from ChatGPT Enterprise. Pass-through model (buys OpenAI API, packages for marketers).
  76. [76] Stability AI: ~$1B valuation (down from $4B peak). ~$50M revenue. Financially distressed. March 2025 funding round.
  77. [77] Healthcare vertical AI: $1.5B (Menlo, 2025). Ambient scribes alone: $600M. Total healthcare AI: $45B+ (Market Clarity).
  78. [78] Cohere: 85% revenue from private on-premises deployments (Oracle, RBC, Fujitsu, LG). Multi-year contracts. AMD partnership for compute.
  79. [79] Sierra AI “constellation” multi-provider approach: failover across OpenAI, Anthropic, Meta Llama. No single provider dependency.
  80. [80] OpenAI Scale Tier: GPT-4.1 input units at $110/day (30K tokens/min). Reserved capacity for latency-sensitive production.
  81. [81] Waymo Foundation Model: large Teacher model on GCP, compact Student model on-device. Vehicle maintains full autonomy without cloud. Waymo World Model for synthetic training data.
  82. [82] Tesla Dojo team disbanded Aug 2025. Dojo 3 restarted Jan 2026 using AI5/AI6 architecture. Same AI5 chip powers Optimus robot.
83. [83] Lovable investors flag unit economics concerns. Claude Sonnet 3.5 was the enabling model. NVIDIA, Google (Alphabet), Khosla among investors.
  84. [84] Pika enterprise = 40% of revenue. $141M total funding. Compute-intensive video generation on own models.
  85. [85] [Methodology] Research compiled from 85+ public sources, February 2026. All financial data represents most recent disclosed quarter or analyst estimate. Inference cost estimates are derived, not company-confirmed.
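Two of the derived cost figures in the notes above lend themselves to a quick sanity check. The Python sketch below recomputes the Midjourney savings percentage from note 20 and the effective per-million-token price implied by the Scale Tier figures in notes 36 and 80; the 30-day month, the 100% utilization assumption, and the implied one-time migration cost are our assumptions, not figures reported by the sources.

```python
# Sanity-check of two derived figures from the notes above.
# Dollar amounts come from the cited notes; the 30-day month,
# 100% utilization, and implied migration cost are assumptions.

# Note 20: Midjourney GPU -> TPU switch.
old_monthly = 2_000_000   # $/month before the switch
new_monthly = 700_000     # $/month after the switch
reduction = 1 - new_monthly / old_monthly                  # fraction saved
daily_savings = (old_monthly - new_monthly) / 30           # assume 30-day month
implied_migration_cost = 11 * daily_savings                # stated 11-day payback

# Notes 36/80: OpenAI Scale Tier, $110/day per 30K tokens/min input unit.
tokens_per_day = 30_000 * 60 * 24                          # 43.2M tokens/day
price_per_m_tokens = 110 / (tokens_per_day / 1_000_000)    # at full utilization

print(f"Midjourney reduction: {reduction:.0%}")            # 65%
print(f"Implied migration cost: ${implied_migration_cost:,.0f}")
print(f"Scale Tier: ${price_per_m_tokens:.2f} per 1M input tokens")
```

Under these assumptions, the stated 11-day payback implies a one-time switching cost of roughly $0.5M, and fully utilized Scale Tier capacity works out to about $2.55 per million input tokens.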