The global AI inference market is valued at $106–126B in 2025[1][2], with forecasts ranging from $255B by 2030 (MarketsandMarkets) to $537B by 2034 (Research & Markets)[3]. Some analysts project an intermediate milestone of $349B by 2032.[76] Hyperscalers command 66–75% of the market through bundled enterprise relationships, compliance certifications, and custom silicon cost advantages.[10][68]
Inference costs are declining roughly 10x per year at equivalent model quality[4][79]: GPT-3-equivalent inference fell from $60/M tokens in 2021 to $0.06/M in 2025, a 1,000x cumulative decline. Deloitte estimates inference will consume 67% of all AI compute by the end of 2026, up from 50% in 2025.[69] Gartner projects $37.5B in AI-optimized IaaS spending in 2026, with 55% ($20.6B) flowing to inference.[5]
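A quick sanity check on those endpoints, treated as a sketch (geometric annualization is our framing, not the sources' methodology):

```python
# Back-of-envelope: implied decline in $/M tokens at GPT-3-equivalent quality,
# using the endpoints cited above.
start_price, end_price = 60.00, 0.06   # $/M tokens, 2021 vs 2025
years = 4

cumulative = start_price / end_price        # 1,000x total decline
annualized = cumulative ** (1 / years)      # averaged over the full window

print(f"cumulative: {cumulative:,.0f}x | annualized: {annualized:.1f}x/yr")
# Prints ~5.6x/yr over the full four years; the "roughly 10x per year" headline
# in [4][79] likely annualizes over shorter, steeper windows within the period.
```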
AI infrastructure spending reached $86B in Q3 2025 alone.[77] The combined 2025–2026 capex commitments of the four hyperscalers in this report exceed $500B. Sequoia's "$600B question"[80] remains unresolved: whether inference revenue will justify this capital expenditure. SDxCentral argues that inference, not training, will be the defining workload of 2026.[78]
For independent inference providers, understanding hyperscaler strategy is existential. Custom silicon (TPU, Trainium, Maia) creates structural cost floors that NVIDIA-dependent providers cannot match. But hyperscaler one-size-fits-all approaches leave durable gaps: sovereign deployment, low-latency SLAs, BYOM flexibility, and edge inference. With 89% of enterprises using multi-cloud[66] and sovereign cloud spending projected at $80B in 2026[54], the addressable market for specialized providers remains substantial.
Note: Combined enterprise AI customer estimates (AWS 100K+ Bedrock organizations[30] plus Azure ~80K enterprise AI customers[38]) likely overlap; treat as directional.
| Provider | Cloud Revenue | AI Growth Signal | Custom Silicon | Key Product | Enterprise Customers |
|---|---|---|---|---|---|
| Google Cloud | $17.7B/qtr (+48%) | 200%+ AI revenue growth | TPU Trillium/Ironwood | Vertex AI (200+ models) | Midjourney, Shopify, GM |
| AWS | $35.6B/qtr (+24%) | 100K+ Bedrock orgs | Trainium2/3 | Bedrock + SageMaker | Robinhood, Carrier |
| Azure | $13B AI ann. (+175%) | 80% Fortune 500 | Maia 200 (3nm) | Microsoft Foundry (1900+ models) | OpenAI, Air India |
| Oracle | $8.0B cloud/qtr (+34%) | IaaS +68% growth | None (NVIDIA/AMD) | OCI AI Services | SoftBank, OpenAI |
Three distinct strategies have emerged. Custom silicon leaders (Google, AWS) build from chip to API. The partnership maximizer (Azure) leverages OpenAI exclusivity and the broadest model catalog. The scale arbitrageur (Oracle) offers raw GPU at the lowest price with no custom silicon, betting on sheer capacity and sovereign deals.
Sources: Alphabet Q4 FY2025[6], AWS Q4 FY2025[7], Azure OpenAI Statistics[8], Oracle Q2 FY2026[9]
| Dimension | Google Cloud | AWS | Azure | Oracle |
|---|---|---|---|---|
| Founded | 2008 | 2006 | 2010 | 1977 |
| HQ | Mountain View, CA | Seattle, WA | Redmond, WA | Austin, TX |
| Cloud Revenue (Qtr) | $17.7B (+48%) | $35.6B (+24%) | ~$25.6B (Azure +39%) | $8.0B (+34%) |
| AI Revenue Signal | 200%+ YoY growth | Multi-B$ Bedrock ARR | $13B AI annual (+175%) | IaaS +68% YoY |
| Capex (2025/2026) | $85B (2025) | $200B (2026 planned) | $150B (FY2026 annualized) | $50B (FY2026) |
| Custom Silicon | TPU v6e (GA) / Ironwood v7 (GA early 2026) | Trainium2 (1.4M) / T3 | Maia 200 (Jan 2026) | None |
| GPU Fleet | NVIDIA A3/A3Ultra + TPU | P5/P5e/P5en + Trainium | ND H100/H200 + Maia | NVIDIA H100/H200/B200 |
| Inference Platform | Vertex AI | Bedrock + SageMaker | Microsoft Foundry + OpenAI Service | OCI AI Services |
| Model Catalog | 200+ (Gemini, open-source) | ~100 providers (Nova, Claude, Llama) | 1900+ (OpenAI, Claude, Llama) | Growing (Llama 4, Cohere, Grok, OCI GenAI) |
| Key Customers | Midjourney, Shopify, GM, Citibank | Robinhood, OPLOG, Carrier | OpenAI, Air India, H&R Block | SoftBank, OpenAI (Stargate), Uber |
| Compliance | SOC2, HIPAA, FedRAMP, ISO 27001, PCI DSS | SOC2, HIPAA, FedRAMP, ISO 27001, PCI DSS | SOC2, HIPAA, FedRAMP, ISO 27001, PCI DSS | SOC2, HIPAA, FedRAMP, ISO 27001, PCI DSS |
| BYOM | Yes (Vertex AI Endpoints) | Yes (SageMaker, Bedrock Custom) | Yes (Microsoft Foundry Managed Compute) | Yes (OCI Data Science) |
| Regions | 40+ | 34+ | 60+ | 48+ |
| Market Share | ~12% | ~30–31% | ~23% | ~3% (fastest growth) |
Note: Azure "Cloud Revenue" (~$25.6B/qtr estimated) reflects Microsoft Cloud at $51.5B/qtr with Azure growing +39% YoY (Q2 FY2026). Azure "$13B AI annual" (Section 02) is the AI-specific subset. These are different metrics; the AI-specific figure grows faster (+175% YoY) because it's emerging from a smaller base.
Sources: Synergy Research Cloud Market Share Q4 2025[10], Oracle Q2 FY2026 Earnings[11]
| Dimension | Google TPU | AWS Trainium | Azure Maia | Oracle |
|---|---|---|---|---|
| Current Gen | Trillium (v6e) | Trainium2/3 (T3 GA) | Maia 200 (internal) | N/A (NVIDIA/AMD) |
| Next Gen | Ironwood (v7, GA early 2026) | Trainium4 (announced) | TBD | AMD MI450 (Q3 2026) |
| Process | N/A | N/A | TSMC 3nm | N/A |
| Transistors | N/A | N/A | 140B+ | N/A |
| Key Claim | 4.7x compute vs v5e | 30–40% better vs P5e | 3x FP4 of Trainium3 | Largest NVIDIA clusters |
| Chips Deployed | Millions (est.) | 1.4M Trainium2 | Internal (Des Moines), not yet GA | 131K–800K GPU superclusters |
| Pricing | $0.39/chip-hr (v6e CUD) | ~$4.80/hr (Trainium2) | 30% better $/perf | Market-rate GPU |
Custom silicon creates 30–50% cost advantages for high-volume inference. But NVIDIA retains dominance in training and frontier workloads. The emerging architecture is hybrid: custom ASICs for high-volume inference, NVIDIA GPUs for training and new model onboarding. Oracle's lack of custom silicon is both a weakness (no cost floor advantage) and a strength (full NVIDIA/AMD compatibility, no software migration burden).
NVIDIA-dependent providers (CoreWeave, Lambda, Crusoe, and other independents) face a structural cost floor. Custom ASICs are 1.4–2x more cost-efficient for inference at scale, meaning hyperscalers running TPU v6e or Trainium2 can offer the same inference workload at 30–50% lower cost than a provider running NVIDIA H100s.
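The arithmetic behind that range, as a worked example (the 1.4–2x figures are the cited estimates, not measured benchmarks):

```python
# How a 1.4-2x efficiency edge maps to the 30-50% cost advantage cited above.
# "Efficiency" here means tokens served per dollar of compute.
for efficiency_edge in (1.4, 2.0):
    # Serving efficiency_edge times more tokens per dollar means the same
    # workload costs 1/efficiency_edge as much to run.
    relative_cost = 1 / efficiency_edge
    print(f"{efficiency_edge:.1f}x efficiency -> {1 - relative_cost:.0%} lower serving cost")
# 1.4x -> 29% lower; 2.0x -> 50% lower, i.e. the 30-50% range above.
```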
Sources: The Great Decoupling[12], NVIDIA's Blackwell Moat[13], Trillium GA Blog[14], AWS Trainium docs[15], Maia 200 blog[16]
Google Cloud's Vertex AI platform provides access to 200+ models including the Gemini family (3.1 Pro, 3 Flash, 2.5 Flash-Lite), partner models (Claude, Llama), and open-source models. The Model Garden serves as a single pane for model discovery, deployment, and management. Google's first-party Gemini models are the primary differentiator, offering competitive pricing with strong reasoning capabilities.[17]
Google's TPU evolution spans six generations; the current lineup comprises v5e (cost-optimized inference), v5p (training-optimized), Trillium v6e (GA, 4.7x compute vs v5e), and the inference-optimized Ironwood v7 (GA early 2026; 192GB HBM3e, 10x peak performance over v5p, 42+ exaflops per pod). The v6e is available at $0.39/chip-hour with Committed Use Discounts. Vertex AI usage grew 20x YoY, and the cloud backlog reached $240B (a 55% sequential increase).[14][19]
GA since September 2025, the GKE Inference Gateway cuts serving costs by 30% and tail latency by 60% while improving throughput by 40%. It integrates NVIDIA NeMo Guardrails for safety and the Model Optimizer for automated routing across model variants. GKE Agent Sandbox reduces cold-start times by ~90%.[20][81]
Google's Gemini model family has rapidly evolved to its third generation. Gemini 3.1 Pro (released February 2026) is the current frontier model, while Gemini 3 Flash offers strong mid-tier performance. The legacy 2.5 Flash-Lite remains available as a budget option at $0.10/M input tokens, the cheapest first-party model from any hyperscaler.[86]
| Model | Input / 1M Tokens | Output / 1M Tokens |
|---|---|---|
| Gemini 2.5 Flash-Lite (legacy) | $0.10 | $0.40 |
| Gemini 3 Flash | $0.50 | $3.00 |
| Gemini 3.1 Pro | $2.00 | $12.00 |
TPU v6e compute is priced separately at $0.39/chip-hour (CUD) or $1.20/chip-hour (on-demand).
| Customer | Use Case | Result |
|---|---|---|
| Midjourney | Image generation inference | $16.8M/yr savings (monthly spend $2.1M to <$700K) |
| Shopify | Claude on Vertex AI | Sidekick AI commerce assistant |
| Sabre | Gemini + Agent Builder | Airline retailing AI |
| BMC | Vertex AI agents | Autonomous enterprise IT |
TPU ecosystem lock-in is the double-edged sword. Models optimized for TPU (via JAX/XLA) require significant porting effort to run on NVIDIA or other silicon. Enterprises wary of vendor lock-in may prefer the portability of NVIDIA-based platforms. Additionally, Google Cloud's 12% market share means fewer enterprise integration partners and a thinner ecosystem than AWS or Azure.
The GKE Inference Gateway matters because it attacks the three cost drivers of serving at scale: idle compute, cold starts, and suboptimal routing. Architecturally, it sits between the load balancer and model backends, making routing decisions based on real-time KV cache utilization and request priority; a simplified sketch follows the capability table below.
| Capability | Mechanism | Why It Matters for Independents |
|---|---|---|
| Model Multiplexing | Routes across TPU + GPU backends dynamically | Requires custom silicon fleet; NVIDIA-only providers can't replicate |
| KV Cache-Aware Routing | Steers requests to backends with warm caches | Reduces redundant computation; open-source routers lack this |
| Priority Scheduling | Queues by SLA tier (latency vs throughput) | Enables premium tiers; most independents offer flat SLAs |
| Agent Sandbox | Pre-warmed containers for agentic workloads | 90% cold-start reduction; critical as agent adoption grows |
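To make those mechanics concrete, here is a minimal sketch of KV-cache-aware, priority-weighted routing. Every name, threshold, and the tie-breaking rule is a hypothetical illustration of the mechanism, not the Gateway's actual algorithm:

```python
# Sketch: route requests toward backends with warm prefix caches and headroom.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    kv_cache_util: float      # 0.0-1.0, fraction of KV-cache memory in use
    warm_prefixes: set[str]   # prompt prefixes with cached KV entries

def route(prefix_hash: str, priority: int, backends: list[Backend]) -> Backend:
    """Prefer a warm prefix cache (skips recomputing shared prefill); among
    candidates, pick the least loaded. High-priority requests tolerate less
    cache pressure (assumed thresholds)."""
    max_util = 0.8 if priority > 0 else 0.95
    candidates = [b for b in backends if b.kv_cache_util < max_util]
    if not candidates:
        candidates = backends  # degrade gracefully rather than reject
    return min(candidates,
               key=lambda b: (prefix_hash not in b.warm_prefixes, b.kv_cache_util))

backends = [
    Backend("tpu-pool-a", kv_cache_util=0.91, warm_prefixes={"sys-prompt-v3"}),
    Backend("gpu-pool-b", kv_cache_util=0.40, warm_prefixes=set()),
]
# pool-a holds the warm cache but is too hot for a priority request:
print(route("sys-prompt-v3", priority=1, backends=backends).name)  # gpu-pool-b
```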
The competitive implication: Google is building inference optimization into the platform layer, not just the silicon layer. Independents must match this routing intelligence through software even without custom silicon.
Google's 78% serving cost reduction in 2025 is the most aggressive cost deflation of any hyperscaler. Combined with TPU v6e at $0.39/chip-hour, Google can offer inference at cost floors that NVIDIA-dependent providers cannot match. Ironwood (v7), now GA with 10x peak performance over v5p, will extend this advantage further. The GKE Inference Gateway achieved 35–52% TTFT latency improvements and doubled prefix cache hit rates to 70%.
Sources: Vertex AI Pricing[17], Gemini API Pricing[18], Trillium GA Blog[19], GKE Inference Gateway Blog[20], Google Cloud Next 2025[21], Alphabet Q4 Earnings[22], Google Cloud Customers[23], Midjourney TPU migration[24]
Amazon Bedrock is the managed inference juggernaut: 100K+ organizations, multi-billion dollar ARR, 4.7x customer growth in one year, and 150% QoQ spending increase. Available models span Nova 2 (Amazon's first-party), Claude (Anthropic), Llama 4 (Meta), Mistral, Cohere, Google, OpenAI, and NVIDIA. Intelligent Prompt Routing (GA) dynamically routes to the cheapest model maintaining quality, delivering 30–60% cost savings.[25]
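The routing logic reduces to cost minimization under a quality floor. A sketch follows; the prices match the Bedrock pricing table later in this section, while the quality scores and floor are illustrative assumptions, not Bedrock internals:

```python
# Sketch: pick the cheapest model whose predicted quality clears a floor.
MODELS = [
    # (name, $/M input tokens, $/M output tokens, assumed quality score 0-1)
    ("nova-lite",        0.06,  0.24, 0.70),
    ("llama-4-maverick", 0.24,  0.97, 0.80),
    ("claude-sonnet-46", 3.00, 15.00, 0.95),
]

def cheapest_meeting(quality_floor: float, in_tok: int, out_tok: int):
    eligible = [m for m in MODELS if m[3] >= quality_floor]
    return min(eligible, key=lambda m: (m[1] * in_tok + m[2] * out_tok) / 1e6)

name, in_rate, out_rate, _ = cheapest_meeting(0.75, in_tok=2_000, out_tok=500)
cost = (in_rate * 2_000 + out_rate * 500) / 1e6
print(f"routed to {name}: ${cost:.6f}/request")
# llama-4-maverick at ~$0.000965/request vs ~$0.0135 for claude-sonnet-46:
# routing down-tier whenever quality allows is where the cost savings come from.
```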
At re:Invent 2025, AWS announced the Nova 2 family (Nova 2 Lite with reasoning and 1M token context, Nova 2 Omni for multimodal I/O, Nova 2 Sonic for real-time speech), Nova Act for browser automation (90%+ reliability), AgentCore for managed agent infrastructure, and the Strands open-source agent framework. Trainium3 UltraServers (3nm, 2.52 PFLOPS/chip FP8, 4.4x over Trn2) are now GA.[82]
SageMaker inference endpoints provide rolling updates, bidirectional streaming, and deep integration with the AWS ecosystem. For enterprises needing full control over model deployment, SageMaker offers custom containers, multi-model endpoints, and auto-scaling tied to CloudWatch metrics.[31]
| Model | Input / 1M Tokens | Output / 1M Tokens |
|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Llama 4 Maverick | $0.24 | $0.97 |
| Amazon Nova Lite | $0.06 | $0.24 |
| Amazon Nova Pro | $0.80 | $3.20 |
Provisioned Throughput: $21–50/hr per model unit, with 20–50% savings on a 1-month commitment.
Pricing tiers: Standard (base), Priority (+75%), Flex (−50%), Batch (−50%).
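Applied to Claude Sonnet 4.6's $15/M output rate from the table above, those multipliers work out as follows:

```python
# Effective $/M output tokens under each Bedrock pricing tier.
base_output_rate = 15.00  # Claude Sonnet 4.6, standard tier
tiers = {"Standard": 1.00, "Priority": 1.75, "Flex": 0.50, "Batch": 0.50}
for tier, multiplier in tiers.items():
    print(f"{tier:>8}: ${base_output_rate * multiplier:5.2f}/M output tokens")
# Standard $15.00, Priority $26.25, Flex/Batch $7.50
```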
| Customer | Use Case | Result |
|---|---|---|
| Robinhood | Token scaling | 500M to 5B tokens/day in 6 months, 80% cost reduction |
| OPLOG | Production AI agents | Thousands of intelligent decisions/day |
| Totemia | Search + bookings | 65% reduction in search time, 40% more bookings |
| Bynder | Asset management | 75% reduction in asset search time |
AWS's custom silicon roadmap represents a systematic approach to cost-optimized inference:
| Chip | Performance | Memory | Key Advantage |
|---|---|---|---|
| Inferentia2 | 190 TFLOPS FP16 | 32GB HBM | Cost-optimized inference baseline |
| Trainium2 | 4x first-gen | HBM2e | 30–40% better than P5e, 54% lower cost per token |
| Trainium3 | 2.52 PFLOPS FP8 per chip, 4.4x over Trn2 | 144GB HBM3e, 4.9 TB/s | 3nm; UltraServers GA (144 chips = 362 PFLOPS) |
| Trainium4 | 6x FP4 over Trn3 | TBD | Announced; NVLink Fusion with NVIDIA Blackwell. Late 2026/2027 |
The progression from Inferentia2 to Trainium3 shows AWS building a complete silicon stack: inference-optimized chips (Inferentia) for high-volume serving, training/inference hybrid chips (Trainium) for flexibility, and frontier chips (Trainium3) for competitive positioning against NVIDIA Blackwell.
Complexity is AWS's Achilles' heel. The Bedrock vs. SageMaker vs. self-managed split confuses enterprise buyers. Neuron SDK adoption remains a fraction of CUDA's ecosystem. Trainium price-performance is strong but software maturity lags TPU (JAX/XLA) and NVIDIA (CUDA). Nova 2 Pro (the strongest first-party model) remains in preview only; AWS still relies on partnerships for frontier model quality differentiation.
AWS's sheer scale (30–31% cloud market share, 100K+ Bedrock orgs, $244B backlog) creates distribution advantage no independent can match. Trainium2 at 1.4M chips (Project Rainier: ~500K online with Anthropic) represents the largest custom silicon deployment for inference. Trainium3 UltraServers are now GA. The $200B planned capex for 2026 signals AWS will continue aggressive infrastructure investment.
Sources: Amazon Bedrock[25], Bedrock Pricing[26], AWS Trainium[27], AWS Inferentia[28], Amazon Q4 FY2025[29], Bedrock Customers[30], SageMaker 2025 Year in Review[31], re:Invent 2025[32]
Microsoft Foundry (rebranded from Azure AI Foundry in January 2026[83]) provides Models-as-a-Service (MaaS) with serverless API access to 1900+ managed models. Azure holds a unique dual position with BOTH OpenAI (exclusive until AGI) and Anthropic Claude, making it the only cloud where enterprises can access GPT-5.2 and Claude Opus 4.6 under a single billing relationship. The Azure AI Agent Service is now GA with 10,000+ customers and A2A multi-cloud support.[33]
Microsoft's Maia 200, deployed in January 2026, is built on TSMC 3nm with 140B+ transistors, 216GB HBM3e at 7 TB/s bandwidth, and native FP8/FP4 tensor cores. Microsoft claims 3x FP4 performance of Trainium3 and FP8 above TPU v7.[35][36]
GitHub Models provides free prototyping access to AI models with an upgrade path to Microsoft Foundry for production. This developer funnel captures model evaluation at the earliest stage of the development lifecycle.[41]
OpenAI remains Azure's largest customer and primary infrastructure consumer.[87]
| Model | Input / 1M Tokens | Output / 1M Tokens |
|---|---|---|
| GPT-5.2 | $1.75 | $14.00 |
| GPT-5 Mini | $0.25 | $2.00 |
Provisioned Throughput (PTU): from $2,448/month; recommended once pay-as-you-go spend exceeds ~$1,800/month.
Total cost vs. OpenAI direct: 15–40% higher once support plans, data transfer, and network infrastructure are included.[34]
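A sketch of the breakeven logic behind that guidance, using the GPT-5.2 list rates above; the request volume and token mix are illustrative assumptions:

```python
# PTU vs pay-as-you-go breakeven for Azure OpenAI at GPT-5.2 list rates.
IN_RATE, OUT_RATE = 1.75, 14.00   # $/M tokens (table above)
GUIDANCE = 1_800.00               # switch-to-PTU threshold cited above, $/month

def monthly_paygo_cost(requests_per_day: int, in_tok: int = 1_500,
                       out_tok: int = 400, days: int = 30) -> float:
    tokens_in = requests_per_day * in_tok * days
    tokens_out = requests_per_day * out_tok * days
    return (tokens_in * IN_RATE + tokens_out * OUT_RATE) / 1e6

for rpd in (1_000, 5_000, 10_000):
    cost = monthly_paygo_cost(rpd)
    verdict = "consider PTU" if cost > GUIDANCE else "stay pay-as-you-go"
    print(f"{rpd:>6} req/day -> ${cost:>8,.2f}/mo ({verdict})")
# At this mix, ~7,300 requests/day crosses the ~$1,800/month guidance line.
```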
| Partner | Relationship | Strategic Value |
|---|---|---|
| OpenAI | Exclusive cloud (until AGI) | GPT-5/5.2 exclusive, $12.43B infra spend |
| Anthropic | Claude on Azure | Only cloud with both Claude AND GPT |
| Meta | Llama enterprise deployment | Enterprise SLA on open-weight models |
| NVIDIA | NIM integration | Foundry-integrated GPU optimization |
| Customer | Use Case | Result |
|---|---|---|
| OpenAI | Inference infrastructure | $12.43B spent (CY2024–Q3 2025) |
| Air India | Customer support AI | 97% query automation |
| Schneider Electric | Troubleshooting AI | 60–80% time reduction |
| H&R Block | Tax filing assistance | Real-time AI advisory |
Azure's AI revenue is disproportionately dependent on OpenAI. The "exclusive until AGI" clause creates existential risk: if OpenAI achieves AGI (by their own or Azure's definition), the exclusivity ends. Azure's GPU pricing is the highest among hyperscalers (~$6.98/hr H100 vs. $2.50–3.00 on Oracle), and the 15–40% TCO premium over OpenAI direct limits cost-sensitive customers. Maia 200 is deployed internally but not GA for external customers; independent benchmarks remain absent.
Microsoft's Maia 200 represents the most aggressive custom silicon bet among the hyperscalers in raw specification terms: TSMC 3nm, 140B+ transistors, 216GB HBM3e at 7 TB/s, and native FP8/FP4 tensor cores.
The Maia 200 is optimized for inference, not training. Microsoft's strategy is to use Maia for high-volume Copilot and Azure OpenAI Service workloads while retaining NVIDIA GPUs for training and frontier model development.
Azure's OpenAI exclusivity means any enterprise wanting GPT-5 models with enterprise compliance MUST go through Azure. With 80% of Fortune 500 already on Microsoft Foundry, Azure's distribution moat is the deepest in enterprise AI. The Maia 200 chip (3x FP4 of Trainium3) signals Microsoft is serious about matching Google/AWS on custom silicon cost advantages. xAI Grok 3 and Perplexity ($750M cloud deal) further expand the ecosystem. Independent providers cannot replicate this model-access + silicon + distribution combination.
Sources: Foundry Models[33], Azure OpenAI Pricing[34], Maia 200 Blog[35], TechCrunch Maia 200[36], OpenAI Partnership Extension[37], Azure OpenAI Statistics[38], Microsoft AI Customer Stories[39], Ignite 2025 Recap[40], GitHub Models[41]
Oracle Cloud Infrastructure (OCI) is the fastest-growing hyperscaler with IaaS revenue up 68% YoY (cloud revenue $8.0B, +34%). Oracle has no custom silicon but operates the largest NVIDIA GPU clusters: the original Zettascale (131K GPUs, 2.4 zettaFLOPS) is operational, with Zettascale10 (up to 800K GPUs, 16 zettaFLOPS) taking orders for H2 2026 GA. OCI AI Services[84] now includes Llama 4, Cohere Command A, xAI Grok 4.1, and Google Gemini. Oracle is the only hyperscaler besides GCP offering Gemini as a managed service. AMD MI355X support (GA since Oct 2025) adds multi-vendor GPU capability.[42][43]
Oracle's strategy is distinct: lowest-price GPU at the largest scale, combined with sovereign partnerships. The $523B in remaining performance obligations (RPO, up 438% YoY) signals massive contracted future revenue, driven by OpenAI's $30B/year contract signed in July 2025. Sovereign deals span Saudi Arabia ($14B), UK ($5B), Germany ($2B), Netherlands ($1B), and Japan via SoftBank.[47]
Oracle publishes less granular AI pricing than peers. GPU hourly rates are estimated from third-party benchmarks and customer reports.
| Resource | Rate | Notes |
|---|---|---|
| NVIDIA H100 (on-demand) | ~$2.50–3.00/GPU-hr | Lowest among hyperscalers[51] |
| NVIDIA H100 (spot/flex) | ~$1.50–2.00/GPU-hr | Most aggressive spot pricing |
| OCI GenAI (Cohere Command R+) | $0.50 input / $1.50 output per 1M tokens | Published serverless rate |
| OCI GenAI (Llama 4 Maverick) | Custom enterprise pricing | Available via dedicated GPU hosting |
| Dedicated GPU Clusters | Custom enterprise pricing | 131K–800K GPU superclusters; volume discounts negotiated |
| Partner | Deal Value | Strategic Impact |
|---|---|---|
| OpenAI (Stargate) | $30B/year contract (within $500B JV) | Largest cloud infrastructure deal in history |
| SoftBank | Japan sovereign cloud | Largest GPU cluster outside US |
| AMD | MI355X support | Multi-vendor GPU strategy |
| NVIDIA | Blackwell superclusters | 131K–800K GPU superclusters |
| Initiative | Geography | Scale |
|---|---|---|
| Stargate | US (Abilene, TX) | $500B total investment (w/ SoftBank, OpenAI) |
| Stargate for Countries | Multi-national | Sovereign AI infrastructure program |
| SoftBank Partnership | Japan | National AI infrastructure |
| Oracle Sovereign Cloud | EU, Middle East | Data residency compliance |
Oracle has no custom silicon, making it fully dependent on NVIDIA/AMD pricing. This is a structural weakness vs Google/AWS/Azure on cost efficiency. But Oracle's willingness to build at massive scale (800K GPU Zettascale10), offer the lowest pricing, and pursue sovereign deals creates a distinct niche. The $523B RPO (438% YoY) and $500B Stargate JV signal that Oracle's strategy of "scale and price" is gaining traction with the largest AI customers. Oracle guides IaaS revenue from $18B in FY2026 to $144B in five years.
The risks are equally structural: with no custom silicon, Oracle's cost floor is set by NVIDIA/AMD pricing, and as Google, AWS, and Azure shift 50%+ of inference to custom ASICs by end of 2026, that disadvantage widens. The $523B RPO creates customer concentration risk: the trajectory ($130B Feb 2025 → $523B Nov 2025) is driven primarily by OpenAI/SoftBank. Q2 FY2026 missed revenue estimates by $100M (an 11% stock drop). The model catalog remains the smallest among peers, and Oracle's developer mindshare in AI is minimal compared to the top three.
The Stargate project (see the initiatives table above) represents the largest AI infrastructure investment in history. It is both Oracle's greatest opportunity (massive revenue from infrastructure deals) and its greatest risk (no platform lock-in; easily replaced if another provider undercuts on price).
Sources: Oracle Q2 FY2026 Earnings[42], Stargate announcement[43], SoftBank-Oracle Japan[44], Oracle Sovereign Cloud[45], Oracle AMD MI355X[46], Oracle RPO disclosures[47]
| GPU | Google Cloud | AWS | Azure | Oracle |
|---|---|---|---|---|
| H100 (on-demand) | ~$3.00 (A3-High) | ~$3.90 (p5.xlarge) | ~$6.98 | ~$2.50–3.00 |
| H100 (spot/preemptible) | ~$2.25 | ~$2.50 | N/A | ~$1.50–2.00 |
| Custom Silicon | $0.39/chip-hr (TPU v6e CUD) | ~$4.80/hr (Trainium2) | TBD (Maia 200) | N/A |
| Tier | Google (Vertex) | AWS (Bedrock) | Azure (OpenAI) | Oracle (OCI) |
|---|---|---|---|---|
| Frontier | Gemini 3.1 Pro: $2.00/$12 | Claude Sonnet 4.6: $3/$15 | GPT-5.2: $1.75/$14 | Cohere Command R+: $0.50/$1.50 |
| Mid-tier | Gemini 3 Flash: $0.50/$3.00 | Llama 4 Maverick: $0.24/$0.97 | GPT-5 Mini: $0.25/$2.00 | Via dedicated GPU |
| Budget | Flash-Lite: $0.10/$0.40 | Nova Lite: $0.06/$0.24 | Phi-4 (open): varies | OCI GenAI: varies |
MinjAI-normalized estimates for serving Llama 4 Maverick, based on published pricing and compute benchmarks. Actual enterprise costs vary with CUDs, volume commitments, and reserved capacity.
| Provider | Input (est.) | Output (est.) | Methodology |
|---|---|---|---|
| AWS Bedrock | $0.24 | $0.97 | Published on-demand, standard tier (Maverick) |
| Google Vertex | ~$0.20–0.50 | ~$0.50–1.00 | Estimated via GKE with A3 GPU instances (Llama requires NVIDIA) |
| Azure Foundry | ~$0.30–0.60 | ~$0.80–1.20 | Estimated from MaaS serverless list pricing |
| Oracle OCI | ~$0.25–0.50 | ~$0.60–0.90 | Estimated range from dedicated GPU hourly rates |
List pricing is unreliable for enterprise comparisons. All hyperscalers offer Committed Use Discounts (CUDs), Savings Plans, and enterprise agreements that reduce costs 20–50%+. The true cost of inference depends on volume commitments, reserved capacity, and negotiated rates. Treat these benchmarks as directional, not definitive.
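The normalization behind those estimates can be sketched as follows. Throughput and utilization are the decisive assumptions; they swing widely with batch size, context length, and serving stack, and the 10,000 tok/s figure for an 8xH100 node is illustrative, not a measured benchmark:

```python
# Sketch: convert GPU-hour pricing into $/M output tokens.
def dollars_per_m_tokens(gpu_hourly_usd: float, gpus_per_node: int,
                         tokens_per_second: float,
                         utilization: float = 0.6) -> float:
    node_cost_per_hour = gpu_hourly_usd * gpus_per_node
    effective_tokens_per_hour = tokens_per_second * 3600 * utilization
    return node_cost_per_hour / effective_tokens_per_hour * 1e6

# Oracle H100 on-demand at the low/high ends of the cited $2.50-3.00 range:
for hourly in (2.50, 3.00):
    est = dollars_per_m_tokens(hourly, gpus_per_node=8, tokens_per_second=10_000)
    print(f"H100 @ ${hourly:.2f}/GPU-hr -> ~${est:.2f}/M output tokens")
# Halving throughput roughly doubles $/M tokens, which is why the table
# reports ranges rather than point estimates.
```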
Sources: Vertex AI Pricing[48], Bedrock Pricing[49], Azure OpenAI Pricing[50], GPU Cloud Pricing Comparison[51], Introl Inference Unit Economics[52]
| Certification | Google Cloud | AWS | Azure | Oracle |
|---|---|---|---|---|
| SOC 2 Type II | Yes | Yes | Yes | Yes |
| FedRAMP High | Yes | Yes (GovCloud) | Yes (Government) | Yes (Government) |
| HIPAA BAA | Yes | Yes | Yes | Yes |
| ISO 27001 | Yes | Yes | Yes | Yes |
| PCI DSS | Yes | Yes | Yes | Yes |
| ISO 42001 (AI Gov.) | Partial | No | No | No |
| GDPR | Yes | Yes | Yes | Yes |
| C5 (Germany) | Yes | Yes | Yes | Yes |
| Provider | Sovereign Offering | Key Features |
|---|---|---|
| Google Cloud | Distributed Cloud (air-gapped) | On-prem/edge, data residency, government |
| AWS | GovCloud + Dedicated Local Zones | FedRAMP High, ITAR, Secret/Top Secret regions |
| Azure | Government + Sovereign Clouds | 15+ government regions, Microsoft Cloud for Sovereignty |
| Oracle | Sovereign Cloud + Stargate for Countries | EU sovereign, dedicated regions, national AI infra |
| Capability | Google Cloud | AWS | Azure | Oracle |
|---|---|---|---|---|
| Inference Audit Logging | Cloud Audit Logs | CloudTrail + Bedrock logs | Azure Monitor + Content Safety | OCI Audit |
| Model Provenance | Model Garden metadata | Bedrock model cards | Foundry model transparency | Limited |
| EU AI Act Readiness | Early (transparency reports) | Early (guardrails) | Leading (Copilot Impact Assessments) | Minimal |
| Data Residency for Inference | Regional endpoints | Regional + GovCloud | Regional + Sovereign | Regional + Sovereign |
Uptime SLAs: Google Cloud 99.9%, AWS 99.9%, Azure 99.95%, Oracle 99.9%.
Enterprise compliance remains the most durable hyperscaler advantage. SOC2, FedRAMP, HIPAA, and ISO 27001 certifications take 12–18 months to obtain and require ongoing investment. Most independent inference providers lack the full certification suite that regulated industries (healthcare, finance, government) require. This is the primary reason enterprises pay 20–40% premiums for hyperscaler inference.
AI-specific compliance is the next frontier. ISO 42001 (AI management systems) is gaining traction, but only Google has partial certification. EU AI Act compliance, model provenance tracking, and inference audit logging are emerging requirements that no provider fully addresses. The first provider to offer turnkey AI governance tooling alongside inference creates a new moat.
Sources: BentoML Inference Platform Buyer's Guide[53], Gartner Sovereign Cloud $80B 2026[54], AWS GovCloud[55], Azure Government[56], Google Distributed Cloud[57], Oracle Sovereign Cloud[58]
| Dimension | Google Cloud | AWS | Azure | Oracle |
|---|---|---|---|---|
| Catalog Size | 200+ models | ~100 providers | 1900+ models | Growing (50+) |
| First-Party Models | Gemini 3.1 Pro / 3 Flash / 2.5 Flash-Lite | Amazon Nova family | N/A (partner models) | N/A (partner models) |
| OpenAI Models | Via Vertex (limited) | GPT via Bedrock (new) | Exclusive (GPT-5/5.2, o-series) | Via OCI (limited) |
| Anthropic Claude | Yes (Vertex) | Yes (Bedrock, primary) | Yes (Microsoft Foundry, new) | Limited |
| Meta Llama | Yes | Yes | Yes | Yes |
| Open-Weight Breadth | Strong (Gemini + open-source) | Broadest provider list | Strongest via Foundry catalog | Growing |
| Fine-Tuning | Vertex AI tuning | Bedrock custom models | Azure fine-tuning APIs | OCI fine-tuning |
| Serverless API | Yes | Yes | Yes (MaaS) | Yes |
37% of enterprises now use 5+ models in production (up from 29% prior year)[59]. Model differentiation by use case is the primary driver: frontier reasoning (GPT-5.2, Gemini 3.1 Pro), cost-optimized (Flash-Lite, Nova 2 Lite), domain-specific (fine-tuned Llama), and specialized (code, vision, speech). AI model gateways are emerging as abstraction layers. The implication: any inference platform, hyperscaler or independent, MUST support broad model catalogs to be competitive. Azure leads on raw catalog size (1900+), but Google and AWS lead on first-party model quality.
All four hyperscalers are converging on the same models: every catalog now includes Llama, Claude, and Mistral. The differentiator is shifting from which models to how they're served: inference speed, cost per token, integration depth, and platform lock-in. For independent providers, this convergence is both threat (hyperscalers match any model catalog) and opportunity (model quality is commoditizing; execution and specialization matter more).
As enterprises adopt 5+ models, the need for a unified routing layer has created a new category: AI model gateways. These gateways abstract model selection, enforce cost/latency policies, and enable A/B testing across providers.
| Gateway Approach | Hyperscaler Example | Implication |
|---|---|---|
| Platform-native | Bedrock cross-model routing, Vertex Model Garden | Deep integration but vendor lock-in |
| Third-party | Portkey, LiteLLM, Martian | Multi-cloud flexibility; favors independents |
| Enterprise-built | Internal LLM proxies at banks, insurers | Full control; high build/maintenance cost |
The model gateway layer is where independents can compete most effectively. By offering a neutral routing layer across hyperscaler backends, independent providers can capture the orchestration margin even when they don't own the underlying compute. This is the "Switzerland strategy" for inference.
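A minimal sketch of that neutral routing layer. The per-model rates mirror the pricing tables earlier in this report; the endpoint names, latency figures, and policy fields are hypothetical:

```python
# Sketch: a provider-neutral gateway enforcing per-request policy.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    provider: str
    model: str
    usd_per_m_output: float
    p95_latency_ms: int       # hypothetical figures

ROUTES = [
    Route("vertex",  "gemini-3-flash",   3.00,  900),
    Route("bedrock", "llama-4-maverick", 0.97, 1200),
    Route("azure",   "gpt-5-mini",       2.00,  800),
]

def pick(max_latency_ms: int, objective: Callable[[Route], float]) -> Route:
    """Enforce a latency ceiling, then optimize the caller's objective.
    The gateway owns policy, not compute: this is the orchestration margin."""
    eligible = [r for r in ROUTES if r.p95_latency_ms <= max_latency_ms]
    return min(eligible, key=objective)

cheapest = pick(max_latency_ms=1_500, objective=lambda r: r.usd_per_m_output)
print(f"{cheapest.provider}:{cheapest.model}")  # bedrock:llama-4-maverick
```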
Sources: a16z Enterprise AI 2025[59], Menlo Ventures State of GenAI[60], Vertex AI Model Garden[61], Azure AI Foundry Models[62], Amazon Bedrock Models[63]
Scores reflect MinjAI analyst assessment based on market data, product capability, published benchmarks, and competitive positioning. 5 = category leader, 3 = competitive, 1 = significant gap. See Methodology (Section 14) for details.
| Dimension | Google Cloud | AWS | Azure | Oracle |
|---|---|---|---|---|
| Custom Silicon | 5 (TPU v6e/v7) | 4 (Trainium2/3) | 4 (Maia 200) | 1 (none) |
| Model Catalog | 4 (200+ models) | 4 (broadest providers) | 5 (1900+ models) | 2 (growing) |
| Pricing Competitiveness | 5 (Flash-Lite $0.10, 78% reduction) | 3 (mid-range) | 2 (highest GPU pricing) | 4 (lowest GPU rates) |
| Enterprise Compliance | 5 | 5 | 5 | 4 |
| Sovereignty | 4 (Distributed Cloud) | 4 (GovCloud) | 5 (15+ gov regions) | 4 (Stargate for Countries) |
| Developer Experience | 4 (Vertex AI, GKE) | 5 (broadest ecosystem) | 4 (Foundry, GitHub Models) | 3 (improving) |
| Scale & Reliability | 4 | 5 (largest infrastructure) | 4 | 3 (fastest growing) |
| Threat to Independents | 5 (cost floor via TPU) | 4 (distribution + scale) | 3 (enterprise lock-in) | 2 (complementary) |
| TOTAL | 36/40 | 34/40 | 32/40 | 23/40 |
| Use Case | Best Provider | Why |
|---|---|---|
| High-Volume API Inference | Google Cloud | TPU v6e cost floor + 78% cost reduction |
| Custom Silicon Optimization | Google Cloud | Most mature TPU ecosystem (6 generations) |
| Enterprise GPT/Claude | Azure | OpenAI exclusive + only cloud with both GPT-5 and Claude |
| Sovereign / Government | Azure | 15+ government regions, Microsoft Cloud for Sovereignty |
| Open-Weight Model Hosting | AWS | Broadest provider list in Bedrock, largest customer base |
| Lowest-Cost GPU | Oracle | No custom silicon markup, aggressive pricing |
| Agentic AI Workflows | AWS | Bedrock AgentCore, SageMaker ecosystem |
Each hyperscaler dominates different dimensions. Google leads on cost (custom silicon + pricing, $240B backlog). AWS leads on scale (30% market share, $35.6B/qtr, $244B backlog). Azure leads on enterprise relationships (80% Fortune 500, GPT-5 exclusive). Oracle is the dark horse with $523B RPO and Zettascale10. For independent providers, the opportunity lies in dimensions where ALL hyperscalers underperform: guaranteed low-latency SLAs, true sovereign air-gapped deployment, and rapid BYOM onboarding.
Sources: MinjAI scoring methodology[64], Synergy Research Cloud Share[65]
Custom silicon creates 30–50% cost advantages that NVIDIA-dependent independents cannot structurally match. Google's 78% serving cost reduction in 2025, AWS's 1.4M Trainium2 chips (with Trainium3 now GA), and Microsoft's Maia 200 represent a permanent cost floor. Combined 2025–2026 capex exceeds $500B across these four hyperscalers. Independent providers competing purely on token price will face margin compression as hyperscalers scale custom silicon. The "race to the bottom" on per-token pricing favors those who manufacture their own silicon.
Independent inference providers that compete effectively against hyperscalers target a specific intersection that no hyperscaler serves well: sovereign-ready, latency-guaranteed inference with hardware flexibility. Against the hyperscaler landscape analyzed above, the strongest competitive positions sharpen around three pillars:
| Independent Advantage | Hyperscaler Gap | Market Signal |
|---|---|---|
| True Air-Gapped Sovereign | Hyperscaler "sovereign" is still their cloud, their region. Not air-gapped. | Gartner: $80B sovereign cloud spend in 2026[85] |
| Guaranteed Latency SLAs | Hyperscalers optimize throughput/cost, not latency. GKE reduced tail latency 60% but from high baselines. | Real-time finance, healthcare, autonomous systems |
| Multi-Chip Flexibility | Each hyperscaler pushes proprietary silicon. No provider offers H100 + alternative accelerators under one roof. | Enterprise demand for hardware-agnostic inference[75] |
With Google's 78% cost reduction ($240B backlog), AWS's $200B capex ($244B backlog)[71], Microsoft's Maia 200 ramp[72], and Oracle's $523B RPO, the cost advantage window for NVIDIA-dependent providers is narrowing rapidly. Independent providers must prove TCO advantages against hyperscalers with custom silicon, not just against each other. Competitive benchmarking should be done against Google Cloud TPU pricing, not just other independents like Fireworks, Together, or Baseten.
| Signal | What It Means | Impact on Independents |
|---|---|---|
| Ironwood (TPU v7) production benchmarks | If the claimed 10x peak over v5p holds in production, the cost floor drops another 60–80% | Token price competition becomes unsustainable for NVIDIA-only providers |
| Maia 200 independent benchmarks | Validates or deflates Microsoft's "3x FP4" claims | If confirmed, Azure inference costs drop; if not, NVIDIA dependency remains |
| NVIDIA Rubin pricing | If Rubin narrows cost gap with custom ASICs significantly | Lifeline for NVIDIA-dependent providers; reduces urgency to diversify silicon |
| Agent framework lock-in | If one platform (Bedrock AgentCore, Vertex Agents) achieves >50% share | Multi-model matters less; platform stickiness becomes the moat |
| Open-source model quality parity | If Llama 4 Maverick/Mistral close gap with GPT-5.2/Gemini 3.1 Pro | Reduces Azure-OpenAI exclusivity premium; shifts value to infrastructure |
Sources: Flexera Multi-Cloud Survey[66], Gartner Hybrid Cloud 2027[67], Alphabet Q4 Earnings[70], Amazon Q4 Earnings[71], Microsoft Q2 FY2026[72], Oracle Q2 FY2026[73]
This report synthesizes data from Q3–Q4 2025 earnings calls[70][71][72][73], January–February 2026 product announcements, analyst reports (Gartner[5], IDC[77], Deloitte[69], MarketsandMarkets[1]), and primary product documentation. Pricing data reflects list prices as of February 2026; enterprise pricing varies 20–50%.[74] Performance claims are vendor-reported unless noted. Market share data from Synergy Research Group Q4 2025.[65]
| Category | Count | Examples |
|---|---|---|
| Earnings / Financial | 12 | Alphabet Q4[70], Amazon Q4[71], Oracle Q2[73] |
| Product Documentation | 28 | Vertex AI[17], Bedrock[25], AI Foundry[83], OCI[84] |
| Analyst Reports | 18 | Gartner[5], IDC[77], Deloitte[69] |
| Press / Tech Coverage | 15 | TechCrunch[36], SDxCentral[78] |
| Customer Case Studies | 8 | Midjourney[24], Robinhood[30] |
| Market Research | 6 | MarketsandMarkets[1], SNS Insider[76] |
MinjAI Competitive Intelligence Platform • Hyperscaler Inference Landscape Report • February 2026
87 Sources • 14 Sections • 4 Hyperscalers Analyzed
For strategic intelligence purposes. Market data and pricing are subject to change. Not investment advice.