Landscape Report — Hyperscaler Inference

Hyperscaler Managed Inference Strategies: Google Cloud, AWS, Azure & Oracle

Custom Silicon • Pricing Benchmarks • Enterprise Compliance • Implications for Independent Inference Providers

February 2026 • MinjAI Agents • 87 Sources • 14 Sections
Strategic Intelligence Report
Section 01

Market Context

$106–126B
Global AI Inference Market (2025)
$255–537B
Projected Market (2030–2034)
66–75%
Hyperscaler Market Share
10x/Year
Inference Cost Deflation

The global AI inference market is valued at $106–126B in 2025[1][2], projected to reach $255B by 2030 (MarketsandMarkets) to $537B by 2034 (Research & Markets)[3]. Some analysts project an intermediate milestone of $349B by 2032.[76] Hyperscalers command 66–75% of the market through bundled enterprise relationships, compliance certifications, and custom silicon cost advantages.[10][68]

Inference Cost Deflation

Inference costs are declining at roughly 10x per year at equivalent model quality[4][79]: GPT-3-equivalent inference fell 1,000-fold, from $60 per million tokens in late 2021 to $0.06 by 2025. Deloitte estimates inference will consume 67% of all AI compute by end of 2026, up from 50% in 2025.[69] Gartner projects $37.5B in AI-optimized IaaS spending in 2026, with 55% ($20.6B) flowing to inference.[5]
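
To make the deflation math concrete, a quick sanity check on the cited endpoints (a minimal sketch using only the report's $60 and $0.06 per-million-token figures):

```python
# Sanity check on inference cost deflation (figures from the report above).
# A 1,000x price drop over N years implies an annualized deflation factor
# of 1000**(1/N): 10x/year over 3 years, ~5.6x/year over 4 years.

start_price = 60.00   # $/M tokens, GPT-3-class inference, late 2021
end_price = 0.06      # $/M tokens, equivalent quality, 2025

total_drop = start_price / end_price  # 1000x
for years in (3, 4):
    annual = total_drop ** (1 / years)
    print(f"{total_drop:.0f}x over {years} years = {annual:.1f}x per year")
```

The "10x per year" headline fits a three-year window; stretching the same endpoints over four years implies closer to 5–6x annualized.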

Capital Concentration

AI infrastructure spending reached $86B in Q3 2025 alone.[77] The combined 2025–2026 capex commitments of the four hyperscalers in this report exceed $500B. Sequoia's "$600B question"[80] remains unresolved: whether inference revenue will justify this capital expenditure. SDxCentral argues that inference, not training, will be the defining workload of 2026.[78]

Why This Report Matters

For independent inference providers, understanding hyperscaler strategy is existential. Custom silicon (TPU, Trainium, Maia) creates structural cost floors that NVIDIA-dependent providers cannot match. But hyperscaler one-size-fits-all approaches leave durable gaps: sovereign deployment, low-latency SLAs, BYOM flexibility, and edge inference. With 89% of enterprises using multi-cloud[66] and sovereign cloud spending projected at $80B in 2026[54], the addressable market for specialized providers remains substantial.

Section 02

Executive Summary

$85B+
Combined Quarterly Cloud Revenue
$500B+
Combined Capex (2025–26)
3 of 4
Building Custom Silicon
180K+
Enterprise AI Customers (est.)*

*Estimated: AWS 100K+ Bedrock orgs[30] + Azure 80K enterprise AI customers[38]. Overlap likely; treat as directional.

Provider Cloud Revenue AI Growth Signal Custom Silicon Key Product Enterprise Customers
Google Cloud $17.7B/qtr (+48%) 200%+ AI revenue growth TPU Trillium/Ironwood Vertex AI (200+ models) Midjourney, Shopify, GM
AWS $35.6B/qtr (+24%) 100K+ Bedrock orgs Trainium2/3 Bedrock + SageMaker Robinhood, Carrier
Azure $13B AI ann. (+175%) 80% Fortune 500 Maia 200 (3nm) Microsoft Foundry (1900+ models) OpenAI, Air India
Oracle $8.0B cloud/qtr (+34%) IaaS +68% growth None (NVIDIA/AMD) OCI AI Services SoftBank, OpenAI
Key Finding

Three distinct strategies have emerged. Custom silicon leaders (Google, AWS) build from chip to API. The partnership maximizer (Azure) leverages OpenAI exclusivity and the broadest model catalog. The scale arbitrageur (Oracle) offers raw GPU at the lowest price with no custom silicon, betting on sheer capacity and sovereign deals.

Sources: Alphabet Q4 FY2025[6], AWS Q4 FY2025[7], Azure OpenAI Statistics[8], Oracle Q2 FY2026[9]

Section 03

Landscape Snapshot (Side-by-Side)

Dimension Google Cloud AWS Azure Oracle
Founded 2008 2006 2010 1977
HQ Mountain View, CA Seattle, WA Redmond, WA Austin, TX
Cloud Revenue (Qtr) $17.7B (+48%) $35.6B (+24%) ~$25.6B (Azure +39%) $8.0B (+34%)
AI Revenue Signal 200%+ YoY growth Multi-B$ Bedrock ARR $13B AI annual (+175%) IaaS +68% YoY
Capex (2025/2026) $85B (2025) $200B (2026 planned) $150B (FY2026 annualized) $50B (FY2026)
Custom Silicon TPU v6e (GA) / v7 (announced) Trainium2 (1.4M) / T3 Maia 200 (Jan 2026) None
GPU Fleet NVIDIA A3/A3Ultra + TPU P5/P5e/P5en + Trainium ND H100/H200 + Maia NVIDIA H100/H200/B200
Inference Platform Vertex AI Bedrock + SageMaker Microsoft Foundry + OpenAI Service OCI AI Services
Model Catalog 200+ (Gemini, open-source) ~100 providers (Nova, Claude, Llama) 1900+ (OpenAI, Claude, Llama) Growing (Llama 4, Cohere, Grok, OCI GenAI)
Key Customers Midjourney, Shopify, GM, Citibank Robinhood, OPLOG, Carrier OpenAI, Air India, H&R Block SoftBank, OpenAI (Stargate), Uber
Compliance SOC2, HIPAA, FedRAMP, ISO 27001, PCI DSS SOC2, HIPAA, FedRAMP, ISO 27001, PCI DSS SOC2, HIPAA, FedRAMP, ISO 27001, PCI DSS SOC2, HIPAA, FedRAMP, ISO 27001, PCI DSS
BYOM Yes (Vertex AI Endpoints) Yes (SageMaker, Bedrock Custom) Yes (Microsoft Foundry Managed Compute) Yes (OCI Data Science)
Regions 40+ 34+ 60+ 48+
Market Share ~12% ~30–31% ~23% ~3% (fastest growth)

Note: Azure "Cloud Revenue" (~$25.6B/qtr estimated) reflects Microsoft Cloud at $51.5B/qtr with Azure growing +39% YoY (Q2 FY2026). Azure "$13B AI annual" (Section 02) is the AI-specific subset. These are different metrics; the AI-specific figure grows faster (+175% YoY) because it's emerging from a smaller base.

Sources: Synergy Research Cloud Market Share Q4 2025[10], Oracle Q2 FY2026 Earnings[11]

Section 04

Custom Silicon — “The Great Decoupling”

3 of 4
Hyperscalers With Custom Silicon
50%+
Internal Workloads on Custom ASICs
1.4–2x
Cost Efficiency vs NVIDIA GPUs
75–80%
NVIDIA Share by End 2026 (from 95%)

Custom Silicon Comparison

Dimension Google TPU AWS Trainium Azure Maia Oracle
Current Gen Trillium (v6e) Trainium2/3 (T3 GA) Maia 200 (internal) N/A (NVIDIA/AMD)
Next Gen Ironwood (v7, GA early 2026) Trainium4 (announced) TBD AMD MI450 (Q3 2026)
Process N/A N/A TSMC 3nm N/A
Transistors N/A N/A 140B+ N/A
Key Claim 4.7x compute vs v5e 30–40% better vs P5e 3x FP4 of Trainium3 Largest NVIDIA clusters
Chips Deployed Millions (est.) 1.4M Trainium2 Internal (Des Moines), not yet GA 131K–800K GPU superclusters
Pricing $0.39/chip-hr (v6e CUD) ~$4.80/hr (Trainium2) 30% better $/perf Market-rate GPU

Full-Stack Architecture

Layer 4 (Application): Vertex AI API • Bedrock API • Microsoft Foundry API • OCI GenAI API
Layer 3 (Platform): Model Garden, GKE Inference Gateway • SageMaker, Bedrock AgentCore • Azure OpenAI Service, Foundry Models • OCI AI Services, Data Science
Layer 2 (Software): JAX/XLA, Pathways • Neuron SDK, NeMo • Maia SDK, Triton Compiler • CUDA, ROCm
Layer 1 (Silicon): TPU v6e/v7 + NVIDIA • Trainium2/3 + Inferentia2 • Maia 200 + NVIDIA • NVIDIA H100/H200/B200 + AMD MI355X

(Columns in each layer: Google Cloud • AWS • Azure • Oracle.)

The NVIDIA Dependency Equation

Custom silicon creates 30–50% cost advantages for high-volume inference. But NVIDIA retains dominance in training and frontier workloads. The emerging architecture is hybrid: custom ASICs for high-volume inference, NVIDIA GPUs for training and new model onboarding. Oracle's lack of custom silicon is both a weakness (no cost floor advantage) and a strength (full NVIDIA/AMD compatibility, no software migration burden).

Why Custom Silicon Matters for Independents

NVIDIA-dependent providers (CoreWeave, Lambda, Crusoe, and other independents) face a structural cost floor. Custom ASICs are 1.4–2x more cost-efficient for inference at scale, meaning hyperscalers running TPU v6e or Trainium2 can offer the same inference workload at 30–50% lower cost than a provider running NVIDIA H100s.
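
The arithmetic behind that range is worth making explicit (a minimal sketch using the report's 1.4–2x efficiency figures):

```python
# How a 1.4-2x cost-efficiency edge maps to a 30-50% price advantage:
# if a custom ASIC serves the same tokens at 1/k of the GPU cost,
# the achievable discount is 1 - 1/k.

for k in (1.4, 2.0):
    discount = 1 - 1 / k
    print(f"{k:.1f}x efficiency -> up to {discount:.0%} lower serving cost")
# 1.4x -> ~29%; 2.0x -> 50%, matching the 30-50% range cited above.
```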

The key dynamics shaping this landscape:

  • Training vs. Inference Split: Custom ASICs excel at inference (fixed model, optimized kernels). NVIDIA retains the training advantage through CUDA ecosystem maturity and multi-GPU scaling (NVLink, NVSwitch).
  • Software Lock-in Eroding: OpenAI's Triton compiler is enabling hardware-agnostic kernel development, eroding NVIDIA's CUDA moat. JAX/XLA already abstracts across TPU and GPU.
  • Predicted Market Structure: NVIDIA dominates training (80%+ share through 2027). Custom ASICs take over high-volume inference (50%+ of hyperscaler inference by end 2026). Independent providers must find cost advantages through operational efficiency, not silicon.

Sources: The Great Decoupling[12], NVIDIA's Blackwell Moat[13], Trillium GA Blog[14], AWS Trainium docs[15], Maia 200 blog[16]

Section 05

Google Cloud Profile

$17.7B
Cloud Revenue Q4 2025
+48%
YoY Growth
78%
Serving Cost Reduction (2025)
$240B
Cloud Backlog[6]

Vertex AI Model Garden

Google Cloud's Vertex AI platform provides access to 200+ models, including the Gemini family (3.1 Pro, 3 Flash, 2.5 Flash-Lite), partner models (Claude, Llama), and open-source models. The Model Garden serves as a single pane of glass for model discovery, deployment, and management. Google's first-party Gemini models are the primary differentiator, offering competitive pricing with strong reasoning capabilities.[17]
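
For reference, serving a Gemini model through Vertex AI takes a few lines with the Vertex AI Python SDK (a sketch; the project ID and the Gemini 3 model name are illustrative placeholders, not confirmed identifiers):

```python
# Minimal Vertex AI inference call via the official Python SDK.
# Assumes `pip install google-cloud-aiplatform` and application-default
# credentials; PROJECT_ID and the model name are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="PROJECT_ID", location="us-central1")

model = GenerativeModel("gemini-3-flash")  # hypothetical model ID
response = model.generate_content("Summarize our Q4 inference spend drivers.")
print(response.text)
```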

TPU Lineup

Google is now on its seventh TPU generation. The current lineup spans v5e (cost-optimized inference), v5p (training-optimized), Trillium v6e (GA, 4.7x compute vs v5e), and the inference-optimized Ironwood v7 (GA early 2026, 192GB HBM3e, 10x peak performance over v5p, 42+ exaflops per pod). The v6e is available at $0.39/chip-hour on Committed Use Discounts. Vertex AI usage grew 20x YoY, and the cloud backlog reached $240B (a 55% sequential increase).[14][19]

GKE Inference Gateway

GA since September 2025, the GKE Inference Gateway cuts serving costs by up to 30%, reduces tail latency by 60%, and improves throughput by 40%, per Google's published benchmarks. It integrates NVIDIA NeMo Guardrails for safety and the Model Optimizer for automated routing across model variants. GKE Agent Sandbox reduces cold-start latency by ~90%.[20][81]

Gemini 3 Model Family

Google's Gemini model family has rapidly evolved to its third generation. Gemini 3.1 Pro (released February 2026) is the current frontier model, while Gemini 3 Flash offers strong mid-tier performance. The legacy 2.5 Flash-Lite remains available as a budget option at $0.10/M input tokens, the cheapest first-party model from any hyperscaler.[86]

Pricing

Model Input / 1M Tokens Output / 1M Tokens
Gemini 2.5 Flash-Lite (legacy) $0.10 $0.40
Gemini 3 Flash $0.50 $3.00
Gemini 3.1 Pro $2.00 $12.00
TPU v6e chip-hour (hardware rate, not per-token) $0.39 (CUD) $1.20 (on-demand)

Customer Wins

Customer Use Case Result
Midjourney Image generation inference $16.8M/yr savings (monthly spend $2.1M to <$700K)
Shopify Claude on Vertex AI Sidekick AI commerce assistant
Sabre Gemini + Agent Builder Airline retailing AI
BMC Vertex AI agents Autonomous enterprise IT
Key Risk: Google Cloud

TPU ecosystem lock-in is the double-edged sword. Models optimized for TPU (via JAX/XLA) require significant porting effort to run on NVIDIA or other silicon. Enterprises wary of vendor lock-in may prefer the portability of NVIDIA-based platforms. Additionally, Google Cloud's 12% market share means fewer enterprise integration partners and a thinner ecosystem than AWS or Azure.

GKE Inference Gateway: Architecture & Competitive Implications

The GKE Inference Gateway matters because it attacks the three cost drivers of serving at scale: idle compute, cold starts, and suboptimal routing. Architecturally, it sits between the load balancer and model backends, making routing decisions based on real-time KV cache utilization and request priority.

Capability Mechanism Why It Matters for Independents
Model Multiplexing Routes across TPU + GPU backends dynamically Requires a custom silicon fleet; NVIDIA-only providers can't replicate
KV Cache-Aware Routing Steers requests to backends with warm caches Reduces redundant computation; open-source routers lack this
Priority Scheduling Queues by SLA tier (latency vs throughput) Enables premium tiers; most independents offer flat SLAs
Agent Sandbox Pre-warmed containers for agentic workloads 90% cold-start reduction; critical as agent adoption grows

The competitive implication: Google is building inference optimization into the platform layer, not just the silicon layer. Independents must match this routing intelligence through software even without custom silicon.
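
A toy version of the routing decision illustrates the idea (pure-Python sketch; the scoring weights and fields are our assumptions, not Google's implementation):

```python
# KV cache-aware, priority-weighted routing: prefer backends with warm
# prefix caches and spare KV-cache capacity; penalize deep queues.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    kv_cache_util: float    # 0.0-1.0, fraction of KV cache in use
    prefix_cache_hit: bool  # request prefix already cached here?
    queue_depth: int

def route(backends: list[Backend], priority: int) -> Backend:
    def score(b: Backend) -> float:
        s = 2.0 if b.prefix_cache_hit else 0.0        # avoid recomputation
        s -= b.kv_cache_util                          # avoid saturation
        s -= 0.1 * b.queue_depth                      # avoid long queues
        return s + 0.5 * priority * (1 - b.kv_cache_util)  # SLA tiering
    return max(backends, key=score)

pool = [Backend("tpu-a", 0.85, False, 12), Backend("gpu-b", 0.40, True, 3)]
print(route(pool, priority=2).name)  # -> gpu-b (warm cache, headroom)
```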

Threat to Independents

Google's 78% serving cost reduction in 2025 is the most aggressive cost deflation of any hyperscaler. Combined with TPU v6e at $0.39/chip-hour, Google can offer inference at cost floors that NVIDIA-dependent providers cannot match. Ironwood (v7), now GA with 10x peak performance over v5p, will extend this advantage further. The GKE Inference Gateway achieved 35–52% TTFT latency improvements and doubled prefix cache hit rates to 70%.

Sources: Vertex AI Pricing[17], Gemini API Pricing[18], Trillium GA Blog[19], GKE Inference Gateway Blog[20], Google Cloud Next 2025[21], Alphabet Q4 Earnings[22], Google Cloud Customers[23], Midjourney TPU migration[24]

Section 06

AWS Profile

$35.6B
AWS Revenue Q4 2025
100K+
Bedrock Organizations
1.4M
Trainium2 Chips Deployed
$200B
Planned Capex

Amazon Bedrock

Amazon Bedrock is the managed inference juggernaut: 100K+ organizations, multi-billion-dollar ARR, 4.7x customer growth in one year, and a 150% QoQ spending increase. Available models span Nova 2 (Amazon's first-party family), Claude (Anthropic), Llama 4 (Meta), Mistral, Cohere, Google, OpenAI, and NVIDIA. Intelligent Prompt Routing (GA) dynamically routes each request to the cheapest model that maintains quality, delivering 30–60% cost savings.[25]
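
Invoking a Bedrock-hosted model goes through the bedrock-runtime Converse API (a sketch; the Nova 2 model ID is an assumption extrapolated from AWS's existing naming scheme):

```python
# Minimal Amazon Bedrock inference call via boto3's Converse API.
# Assumes `pip install boto3` and configured AWS credentials.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-2-lite-v1:0",  # hypothetical Nova 2 model ID
    messages=[{"role": "user",
               "content": [{"text": "Classify this support ticket."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```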

re:Invent 2025 Launches

At re:Invent 2025, AWS announced the Nova 2 family (Nova 2 Lite with reasoning and 1M token context, Nova 2 Omni for multimodal I/O, Nova 2 Sonic for real-time speech), Nova Act for browser automation (90%+ reliability), AgentCore for managed agent infrastructure, and the Strands open-source agent framework. Trainium3 UltraServers (3nm, 2.52 PFLOPS/chip FP8, 4.4x over Trn2) are now GA.[82]

SageMaker Inference

SageMaker inference endpoints provide rolling updates, bidirectional streaming, and deep integration with the AWS ecosystem. For enterprises needing full control over model deployment, SageMaker offers custom containers, multi-model endpoints, and auto-scaling tied to CloudWatch metrics.[31]

Pricing

Model Input / 1M Tokens Output / 1M Tokens
Claude Sonnet 4.6 $3.00 $15.00
Llama 4 Maverick $0.24 $0.97
Amazon Nova Lite $0.06 $0.24
Amazon Nova Pro $0.80 $3.20
Provisioned Throughput (hourly rate, not per-token) $21–50/hr per model unit 20–50% savings (1-month commit)

Pricing tiers: Standard (base), Priority (+75%), Flex (−50%), Batch (−50%).
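
Applied to the list prices above, the tier multipliers shift effective cost substantially (a minimal sketch using the report's published multipliers and Nova Lite rates):

```python
# Effective per-1M-token cost under Bedrock's pricing tiers
# (Standard base, Priority +75%, Flex -50%, Batch -50%).
base_input, base_output = 0.06, 0.24  # Amazon Nova Lite list price, $/1M

tiers = {"Standard": 1.00, "Priority": 1.75, "Flex": 0.50, "Batch": 0.50}
for tier, mult in tiers.items():
    print(f"{tier:9s} ${base_input * mult:.3f} in / ${base_output * mult:.3f} out")
```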

Customer Wins

Customer Use Case Result
Robinhood Token scaling 500M to 5B tokens/day in 6 months, 80% cost reduction
OPLOG Production AI agents Thousands of intelligent decisions/day
Totemia Search + bookings 65% reduced search, 40% more bookings
Bynder Asset management 75% reduction in asset search time
Trainium Architecture Evolution

AWS's custom silicon roadmap represents a systematic approach to cost-optimized inference:

Chip Performance Memory Key Advantage
Inferentia2 190 TFLOPS FP16 32GB HBM Cost-optimized inference baseline
Trainium2 4x first-gen HBM2e 30–40% better than P5e, 54% lower cost per token
Trainium3 2.52 PFLOPS FP8 per chip, 4.4x over Trn2 144GB HBM3e, 4.9 TB/s 3nm, UltraServers GA (144 chips = 362 PFLOPS)
Trainium4 6x FP4 over Trn3 TBD Announced; NVLink Fusion with NVIDIA Blackwell. Late 2026/2027

The progression from Inferentia2 to Trainium3 shows AWS building a complete silicon stack: inference-optimized chips (Inferentia) for high-volume serving, training/inference hybrid chips (Trainium) for flexibility, and frontier chips (Trainium3) for competitive positioning against NVIDIA Blackwell.

Key Risk: AWS

Complexity is AWS's Achilles' heel. The Bedrock vs. SageMaker vs. self-managed split confuses enterprise buyers. Neuron SDK adoption remains a fraction of CUDA's ecosystem. Trainium price-performance is strong but software maturity lags TPU (JAX/XLA) and NVIDIA (CUDA). Nova 2 Pro (the strongest first-party model) remains in preview only; AWS still relies on partnerships for frontier model quality differentiation.

Threat to Independents

AWS's sheer scale (30–31% cloud market share, 100K+ Bedrock orgs, $244B backlog) creates distribution advantage no independent can match. Trainium2 at 1.4M chips (Project Rainier: ~500K online with Anthropic) represents the largest custom silicon deployment for inference. Trainium3 UltraServers are now GA. The $200B planned capex for 2026 signals AWS will continue aggressive infrastructure investment.

Sources: Amazon Bedrock[25], Bedrock Pricing[26], AWS Trainium[27], AWS Inferentia[28], Amazon Q4 FY2025[29], Bedrock Customers[30], SageMaker 2025 Year in Review[31], re:Invent 2025[32]

Section 07

Microsoft Azure Profile

$13B
AI Annual Revenue
+175%
AI Revenue Growth YoY
80K
Enterprise AI Customers
80%
Fortune 500 on Microsoft Foundry

Microsoft Foundry (formerly Azure AI Foundry)

Microsoft Foundry (rebranded from Azure AI Foundry in January 2026[83]) provides Models-as-a-Service (MaaS) with serverless API access to 1900+ managed models. Azure holds a unique dual position with BOTH OpenAI (exclusive until AGI) and Anthropic Claude, making it the only cloud where enterprises can access GPT-5.2 and Claude Opus 4.6 under a single billing relationship. The Azure AI Agent Service is now GA with 10,000+ customers and A2A multi-cloud support.[33]
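
Calling an Azure-hosted OpenAI deployment uses the standard openai SDK with an AzureOpenAI client (a sketch; the endpoint, API version, and GPT-5.2 deployment name are placeholders):

```python
# Minimal Azure OpenAI inference call via the openai Python SDK.
# Assumes `pip install openai` and a provisioned model deployment.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # use your resource's supported version
)

response = client.chat.completions.create(
    model="gpt-5-2",  # deployment name, not model family (placeholder)
    messages=[{"role": "user", "content": "Draft a compliance summary."}],
)
print(response.choices[0].message.content)
```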

Maia 200 Custom Silicon

Microsoft's Maia 200, deployed in January 2026, is built on TSMC 3nm with 140B+ transistors, 216GB HBM3e at 7 TB/s bandwidth, and native FP8/FP4 tensor cores. Microsoft claims 3x FP4 performance of Trainium3 and FP8 above TPU v7.[35][36]

GitHub Models

GitHub Models provides free prototyping access to AI models with an upgrade path to Microsoft Foundry for production. This developer funnel captures model evaluation at the earliest stage of the development lifecycle.[41]

OpenAI remains Azure's largest customer and primary infrastructure consumer.[87]

Pricing

Model Input / 1M Tokens Output / 1M Tokens
GPT-5.2 $1.75 $14.00
GPT-5 Mini $0.25 $2.00
Provisioned Throughput (PTU) From $2,448/month; recommended when pay-as-you-go exceeds ~$1,800/month
Total Cost vs. OpenAI Direct 15–40% higher (support plans, data transfer, network infra)[34]

Key Partnerships

Partner Relationship Strategic Value
OpenAI Exclusive cloud (until AGI) GPT-5/5.2 exclusive, $12.43B infra spend
Anthropic Claude on Azure Only cloud with both Claude AND GPT
Meta Llama enterprise deployment Enterprise SLA on open-weight models
NVIDIA NIM integration Foundry-integrated GPU optimization

Customer Wins

Customer Use Case Result
OpenAI Inference infrastructure $12.43B spent (CY2024–Q3 2025)
Air India Customer support AI 97% query automation
Schneider Electric Troubleshooting AI 60–80% time reduction
H&R Block Tax filing assistance Real-time AI advisory
Key Risk: Azure

Azure's AI revenue is disproportionately dependent on OpenAI. The "exclusive until AGI" clause creates existential risk: if OpenAI achieves AGI (by their own or Azure's definition), the exclusivity ends. Azure's GPU pricing is the highest among hyperscalers (~$6.98/hr H100 vs. $2.50–3.00 on Oracle), and the 15–40% TCO premium over OpenAI direct limits cost-sensitive customers. Maia 200 is deployed internally but not GA for external customers; independent benchmarks remain absent.

Maia 200 Technical Deep Dive

Microsoft's Maia 200 represents the most aggressive custom silicon bet among the hyperscalers in terms of raw specifications:

  • Process: TSMC 3nm, the most advanced node used in any AI accelerator
  • Transistors: 140B+, below NVIDIA Blackwell's 208B dual-die total but packed on a single monolithic die
  • Memory: 216GB HBM3e at 7 TB/s bandwidth
  • On-chip SRAM: 272MB for model weight caching
  • Compute: Native FP8/FP4 tensor cores, 3x FP4 performance of Trainium3, FP8 above TPU v7
  • Deployment: US Central (Des Moines) initially, US West 3 (Phoenix) coming H2 2026
  • Software: Maia SDK with PyTorch integration, Triton compiler support
  • Purpose: Designed to reduce the "Copilot tax" by bringing inference costs in-house

The Maia 200 is optimized for inference, not training. Microsoft's strategy is to use Maia for high-volume Copilot and Azure OpenAI Service workloads while retaining NVIDIA GPUs for training and frontier model development.

Threat to Independents

Azure's OpenAI exclusivity means any enterprise wanting GPT-5 models with enterprise compliance MUST go through Azure. With 80% of Fortune 500 already on Microsoft Foundry, Azure's distribution moat is the deepest in enterprise AI. The Maia 200 chip (3x FP4 of Trainium3) signals Microsoft is serious about matching Google/AWS on custom silicon cost advantages. xAI Grok 3 and Perplexity ($750M cloud deal) further expand the ecosystem. Independent providers cannot replicate this model-access + silicon + distribution combination.

Sources: Foundry Models[33], Azure OpenAI Pricing[34], Maia 200 Blog[35], TechCrunch Maia 200[36], OpenAI Partnership Extension[37], Azure OpenAI Statistics[38], Microsoft AI Customer Stories[39], Ignite 2025 Recap[40], GitHub Models[41]

Section 08

Oracle Cloud Profile

$8.0B
Cloud Revenue Q2 FY2026
$523B
Remaining Performance Obligations
800K
Zettascale10 GPU Target
$500B
Stargate Total Investment

OCI AI Strategy

Oracle Cloud Infrastructure (OCI) is the fastest-growing hyperscaler with IaaS revenue up 68% YoY (cloud revenue $8.0B, +34%). Oracle has no custom silicon but operates the largest NVIDIA GPU clusters: the original Zettascale (131K GPUs, 2.4 zettaFLOPS) is operational, with Zettascale10 (up to 800K GPUs, 16 zettaFLOPS) taking orders for H2 2026 GA. OCI AI Services[84] now includes Llama 4, Cohere Command A, xAI Grok 4.1, and Google Gemini. Oracle is the only hyperscaler besides GCP offering Gemini as a managed service. AMD MI355X support (GA since Oct 2025) adds multi-vendor GPU capability.[42][43]

Oracle's strategy is distinct: lowest-price GPU at the largest scale, combined with sovereign partnerships. The $523B in Remaining Performance Obligations (up 438% YoY) signals massive contracted future revenue, driven by OpenAI's $30B/year contract signed in July 2025. Sovereign deals span Saudi Arabia ($14B), UK ($5B), Germany ($2B), Netherlands ($1B), and Japan via SoftBank.[47]

Pricing

Oracle publishes less granular AI pricing than peers. GPU hourly rates are estimated from third-party benchmarks and customer reports.

Resource Rate Notes
NVIDIA H100 (on-demand) ~$2.50–3.00/GPU-hr Lowest among hyperscalers[51]
NVIDIA H100 (spot/flex) ~$1.50–2.00/GPU-hr Most aggressive spot pricing
OCI GenAI (Cohere Command R+) $0.50 input / $1.50 output per 1M tokens Published serverless rate
OCI GenAI (Llama 4 Maverick) Custom enterprise pricing Available via dedicated GPU hosting
Dedicated GPU Clusters Custom enterprise pricing 131K–800K GPU superclusters; volume discounts negotiated

Key Partnerships

Partner Deal Value Strategic Impact
OpenAI (Stargate) $30B/year contract (within $500B JV) Largest cloud infrastructure deal in history
SoftBank Japan sovereign cloud Largest GPU cluster outside US
AMD MI355X support Multi-vendor GPU strategy
NVIDIA Blackwell superclusters 131K–800K GPU superclusters

Sovereign Deals

Initiative Geography Scale
Stargate US (Abilene, TX) $500B total investment (w/ SoftBank, OpenAI)
Stargate for Countries Multi-national Sovereign AI infrastructure program
SoftBank Partnership Japan National AI infrastructure
Oracle Sovereign Cloud EU, Middle East Data residency compliance
Oracle's Unique Position

Oracle has no custom silicon, making it fully dependent on NVIDIA/AMD pricing. This is a structural weakness vs Google/AWS/Azure on cost efficiency. But Oracle's willingness to build at massive scale (800K GPU Zettascale10), offer the lowest pricing, and pursue sovereign deals creates a distinct niche. The $523B RPO (438% YoY) and $500B Stargate JV signal that Oracle's strategy of "scale and price" is gaining traction with the largest AI customers. Oracle guides IaaS revenue from $18B in FY2026 to $144B in five years.

Key Risk: Oracle

Oracle has zero custom silicon, meaning its cost floor is set by NVIDIA/AMD pricing. As Google, AWS, and Azure shift 50%+ of inference to custom ASICs by end 2026, Oracle's structural cost disadvantage widens. The $523B RPO creates customer concentration risk: the trajectory ($130B Feb 2025 → $523B Nov 2025) is driven primarily by OpenAI/SoftBank. Q2 FY2026 missed revenue estimates by $100M (11% stock drop). The model catalog remains the smallest vs peers, and developer mindshare in AI is minimal compared to the top three.

The Stargate Equation

The Stargate project represents the largest AI infrastructure investment in history:

  • Total Investment: $500B (SoftBank/OpenAI/Oracle joint venture). Over $400B committed across sites over 3 years.
  • Oracle's Role: Infrastructure backbone with OCI and Zettascale fabric. OpenAI signed $30B/year contract (July 2025).
  • First Data Center: Abilene, TX — operational and running OCI infrastructure with NVIDIA GPUs. Five additional sites announced.
  • Stargate for Countries: Extends the model to sovereign deployments globally, allowing nations to build their own AI infrastructure on Oracle's platform
  • Strategic Positioning: Oracle is not competing as an AI platform company. It is positioning as a pure-play compute supplier, the "picks and shovels" provider for frontier AI development.

This positioning is both Oracle's greatest opportunity (massive revenue from infrastructure deals) and its greatest risk (no platform lock-in, easily replaceable if another provider offers lower pricing).

Sources: Oracle Q2 FY2026 Earnings[42], Stargate announcement[43], SoftBank-Oracle Japan[44], Oracle Sovereign Cloud[45], Oracle AMD MI355X[46], Oracle RPO disclosures[47]

Section 09

Pricing Benchmarks

GPU Hourly Rates

GPU Google Cloud AWS Azure Oracle
H100 (on-demand) ~$3.00 (A3-High) ~$3.90 (P5, per GPU) ~$6.98 ~$2.50–3.00
H100 (spot/preemptible) ~$2.25 ~$2.50 N/A ~$1.50–2.00
Custom Silicon $0.39/chip-hr (TPU v6e CUD) ~$4.80/hr (Trainium2) TBD (Maia 200) N/A

Per-Token Pricing (Input/Output per 1M Tokens)

Tier Google (Vertex) AWS (Bedrock) Azure (OpenAI) Oracle (OCI)
Frontier Gemini 3.1 Pro: $2.00/$12 Claude Sonnet 4.6: $3/$15 GPT-5.2: $1.75/$14 Cohere Command R+: $0.50/$1.50
Mid-tier Gemini 3 Flash: $0.50/$3.00 Llama 4 Maverick: $0.24/$0.97 GPT-5 Mini: $0.25/$2.00 Via dedicated GPU
Budget Flash-Lite: $0.10/$0.40 Nova Lite: $0.06/$0.24 Phi-4 (open): varies OCI GenAI: varies

Blended Cost (Llama 4 Maverick per 1M Tokens)

MinjAI estimates based on published pricing and compute benchmarks. Actual enterprise costs vary with CUDs, volume, and reserved capacity.

Provider Input (est.) Output (est.) Methodology
AWS Bedrock $0.24 $0.97 Published on-demand, standard tier (Maverick)
Google Vertex ~$0.20–0.50 ~$0.50–1.00 Estimated via GKE with A3 GPU instances (Llama requires NVIDIA)
Azure Foundry ~$0.30–0.60 ~$0.80–1.20 Estimated from MaaS serverless list pricing
Oracle OCI ~$0.25–0.50 ~$0.60–0.90 Estimated range from dedicated GPU hourly rates
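
The blended figures follow from a simple mix-weighted average (sketch; the 3:1 input:output mix is an illustrative assumption, not a published workload profile):

```python
# Blended $/1M tokens given separate input/output rates and a traffic mix.
def blended_cost(input_rate: float, output_rate: float,
                 input_share: float = 0.75) -> float:
    """Mix-weighted cost per 1M tokens; rates in $/1M tokens."""
    return input_rate * input_share + output_rate * (1 - input_share)

# Llama 4 Maverick on Bedrock, published on-demand rates from the table:
print(f"${blended_cost(0.24, 0.97):.3f} per 1M tokens at a 3:1 input mix")
# -> ~$0.42; shift the mix toward output-heavy agents and cost rises fast.
```
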
Pricing Caveat

List pricing is unreliable for enterprise comparisons. All hyperscalers offer Committed Use Discounts (CUDs), Savings Plans, and enterprise agreements that reduce costs 20–50%+. The true cost of inference depends on volume commitments, reserved capacity, and negotiated rates. Treat these benchmarks as directional, not definitive.

Sources: Vertex AI Pricing[48], Bedrock Pricing[49], Azure OpenAI Pricing[50], GPU Cloud Pricing Comparison[51], Introl Inference Unit Economics[52]

Section 10

Enterprise Compliance & Sovereignty

Compliance Matrix

Certification Google Cloud AWS Azure Oracle
SOC 2 Type II Yes Yes Yes Yes
FedRAMP High Yes Yes (GovCloud) Yes (Government) Yes (Government)
HIPAA BAA Yes Yes Yes Yes
ISO 27001 Yes Yes Yes Yes
PCI DSS Yes Yes Yes Yes
ISO 42001 (AI Gov.) Partial No No No
GDPR Yes Yes Yes Yes
C5 (Germany) Yes Yes Yes Yes

Sovereign Cloud Offerings

Provider Sovereign Offering Key Features
Google Cloud Distributed Cloud (air-gapped) On-prem/edge, data residency, government
AWS GovCloud + Dedicated Local Zones FedRAMP High, ITAR, Secret/Top Secret regions
Azure Government + Sovereign Clouds 15+ government regions, Microsoft Cloud for Sovereignty
Oracle Sovereign Cloud + Stargate for Countries EU sovereign, dedicated regions, national AI infra

Inference-Specific Compliance

Capability Google Cloud AWS Azure Oracle
Inference Audit Logging Cloud Audit Logs CloudTrail + Bedrock logs Azure Monitor + Content Safety OCI Audit
Model Provenance Model Garden metadata Bedrock model cards Foundry model transparency Limited
EU AI Act Readiness Early (transparency reports) Early (guardrails) Leading (Copilot Impact Assessments) Minimal
Data Residency for Inference Regional endpoints Regional + GovCloud Regional + Sovereign Regional + Sovereign

SLA Commitments

Google Cloud: 99.9% SLA   AWS: 99.9% SLA   Azure: 99.95% SLA   Oracle: 99.9% SLA
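
The SLA deltas translate into allowed downtime as follows (a minimal sketch; monthly figures assume a 30-day month):

```python
# Allowed monthly downtime implied by each availability SLA.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, assuming a 30-day month

for provider, sla in [("Google Cloud", 0.999), ("AWS", 0.999),
                      ("Azure", 0.9995), ("Oracle", 0.999)]:
    downtime = MINUTES_PER_MONTH * (1 - sla)
    print(f"{provider:13s} {sla:.2%} -> {downtime:.1f} min/month allowed")
# 99.90% -> ~43 min/month; 99.95% -> ~22 min/month.
```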

Compliance as Moat

Enterprise compliance remains the most durable hyperscaler advantage. SOC2, FedRAMP, HIPAA, and ISO 27001 certifications take 12–18 months to obtain and require ongoing investment. Most independent inference providers lack the full certification suite that regulated industries (healthcare, finance, government) require. This is the primary reason enterprises pay 20–40% premiums for hyperscaler inference.

AI-specific compliance is the next frontier. ISO 42001 (AI management systems) is gaining traction, but only Google has partial certification. EU AI Act compliance, model provenance tracking, and inference audit logging are emerging requirements that no provider fully addresses. The first provider to offer turnkey AI governance tooling alongside inference creates a new moat.

Sources: BentoML Inference Platform Buyer's Guide[53], Gartner Sovereign Cloud $80B 2026[54], AWS GovCloud[55], Azure Government[56], Google Distributed Cloud[57], Oracle Sovereign Cloud[58]

Section 11

Model Catalog & Multi-Model Strategies

Dimension Google Cloud AWS Azure Oracle
Catalog Size 200+ models ~100 providers 1900+ models Growing (50+)
First-Party Models Gemini 3.1 Pro / 3 Flash / 2.5 Flash-Lite Amazon Nova family N/A (partner models) N/A (partner models)
OpenAI Models Via Vertex (limited) GPT via Bedrock (new) Exclusive (GPT-5/5.2, o-series) Via OCI (limited)
Anthropic Claude Yes (Vertex) Yes (Bedrock, primary) Yes (Microsoft Foundry, new) Limited
Meta Llama Yes Yes Yes Yes
Open-Weight Breadth Strong (Gemini + open-source) Broadest provider list Strongest via Foundry catalog Growing
Fine-Tuning Vertex AI tuning Bedrock custom models Azure fine-tuning APIs OCI fine-tuning
Serverless API Yes Yes Yes (MaaS) Yes
Multi-Model is Table Stakes

37% of enterprises now use 5+ models in production (up from 29% prior year)[59]. Model differentiation by use case is the primary driver: frontier reasoning (GPT-5.2, Gemini 3.1 Pro), cost-optimized (Flash-Lite, Nova 2 Lite), domain-specific (fine-tuned Llama), and specialized (code, vision, speech). AI model gateways are emerging as abstraction layers. The implication: any inference platform, hyperscaler or independent, MUST support broad model catalogs to be competitive. Azure leads on raw catalog size (1900+), but Google and AWS lead on first-party model quality.

The Convergence Problem

All four hyperscalers are converging on the same models: every catalog now includes Llama, Claude, and Mistral. The differentiator is shifting from which models to how they're served: inference speed, cost per token, integration depth, and platform lock-in. For independent providers, this convergence is both threat (hyperscalers match any model catalog) and opportunity (model quality is commoditizing; execution and specialization matter more).

AI Model Gateways: The Emerging Abstraction Layer

As enterprises adopt 5+ models, the need for a unified routing layer has created a new category: AI model gateways. These gateways abstract model selection, enforce cost/latency policies, and enable A/B testing across providers.

Gateway Approach Hyperscaler Example Implication
Platform-native Bedrock cross-model routing, Vertex Model Garden Deep integration but vendor lock-in
Third-party Portkey, LiteLLM, Martian Multi-cloud flexibility; favors independents
Enterprise-built Internal LLM proxies at banks, insurers Full control; high build/maintenance cost

The model gateway layer is where independents can compete most effectively. By offering a neutral routing layer across hyperscaler backends, independent providers can capture the orchestration margin even when they don't own the underlying compute. This is the "Switzerland strategy" for inference.
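
A neutral gateway reduces to a policy function over provider backends (pure-Python sketch; the prices and latencies are illustrative, not quotes):

```python
# Policy-based cross-cloud model routing: pick the cheapest backend that
# satisfies the request's latency budget; fall back to fastest otherwise.
from dataclasses import dataclass

@dataclass
class Route:
    provider: str
    model: str
    cost_per_1m: float   # blended $/1M tokens (illustrative)
    p95_latency_ms: int

ROUTES = [
    Route("bedrock", "llama-4-maverick", 0.42, 900),
    Route("vertex", "gemini-3-flash", 1.10, 450),
    Route("azure", "gpt-5-mini", 0.70, 600),
]

def pick(latency_budget_ms: int) -> Route:
    ok = [r for r in ROUTES if r.p95_latency_ms <= latency_budget_ms]
    if ok:
        return min(ok, key=lambda r: r.cost_per_1m)     # cheapest that fits
    return min(ROUTES, key=lambda r: r.p95_latency_ms)  # best effort

print(pick(latency_budget_ms=700).provider)  # -> azure
```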

Sources: a16z Enterprise AI 2025[59], Menlo Ventures State of GenAI[60], Vertex AI Model Garden[61], Azure AI Foundry Models[62], Amazon Bedrock Models[63]

Section 12

Competitive Head-to-Head Matrix

Scored Matrix (1–5 Scale)

Scores reflect MinjAI analyst assessment based on market data, product capability, published benchmarks, and competitive positioning. 5 = category leader, 3 = competitive, 1 = significant gap. See Methodology (Section 14) for details.

Dimension Google Cloud AWS Azure Oracle
Custom Silicon 5 (TPU v6e/v7) 4 (Trainium2/3) 4 (Maia 200) 1 (none)
Model Catalog 4 (200+ models) 4 (broadest providers) 5 (1900+ models) 2 (growing)
Pricing Competitiveness 5 (Flash-Lite $0.10, 78% reduction) 3 (mid-range) 2 (highest GPU pricing) 4 (lowest GPU rates)
Enterprise Compliance 5 5 5 4
Sovereignty 4 (Distributed Cloud) 4 (GovCloud) 5 (15+ gov regions) 4 (Stargate for Countries)
Developer Experience 4 (Vertex AI, GKE) 5 (broadest ecosystem) 4 (Foundry, GitHub Models) 3 (improving)
Scale & Reliability 4 5 (largest infrastructure) 4 3 (fastest growing)
Threat to Independents 5 (cost floor via TPU) 4 (distribution + scale) 3 (enterprise lock-in) 2 (complementary)
TOTAL 36/40 34/40 32/40 23/40

Winner by Use Case

Use Case Best Provider Why
High-Volume API Inference Google Cloud TPU v6e cost floor + 78% cost reduction
Custom Silicon Optimization Google Cloud Most mature TPU ecosystem (6 generations)
Enterprise GPT/Claude Azure OpenAI exclusive + only cloud with both GPT-5 and Claude
Sovereign / Government Azure 15+ government regions, Microsoft Cloud for Sovereignty
Open-Weight Model Hosting AWS Broadest provider list in Bedrock, largest customer base
Lowest-Cost GPU Oracle No custom silicon markup, aggressive pricing
Agentic AI Workflows AWS Bedrock AgentCore, SageMaker ecosystem
No Single Winner

Each hyperscaler dominates different dimensions. Google leads on cost (custom silicon + pricing, $240B backlog). AWS leads on scale (30% market share, $35.6B/qtr, $244B backlog). Azure leads on enterprise relationships (80% Fortune 500, GPT-5 exclusive). Oracle is the dark horse with $523B RPO and Zettascale10. For independent providers, the opportunity lies in dimensions where ALL hyperscalers underperform: guaranteed low-latency SLAs, true sovereign air-gapped deployment, and rapid BYOM onboarding.

Sources: MinjAI scoring methodology[64], Synergy Research Cloud Share[65]

Section 13

Implications for Independent Providers

Custom Silicon Deployment Timeline

Q1 2026
Maia 200 deployed in Azure (US Central, Des Moines)
Q1 2026
Trainium3 ramping at AWS, near-full 2026 supply
H1 2026
Ironwood (TPU v7) production ramp at Google
H1 2026
Oracle 131K GPU Zettascale operational; 800K GPU Zettascale10 taking orders
H2 2026
Maia 200 expands to US West 3 (Phoenix)
H2 2026
NVIDIA Rubin architecture early access (Vera Rubin 2027+)
2026
All hyperscalers running 50%+ inference on custom silicon
Reality Check: The Cost Floor Problem

Custom silicon creates 30–50% cost advantages that NVIDIA-dependent independents cannot structurally match. Google's 78% serving cost reduction in 2025, AWS's 1.4M Trainium2 chips (with Trainium3 now GA), and Microsoft's Maia 200 represent a permanent cost floor. Combined 2026 capex exceeds $500B across these four hyperscalers. Independent providers competing purely on token price will face margin compression as hyperscalers scale custom silicon. The "race to the bottom" on per-token pricing favors those who manufacture their own silicon.

The Opportunity: Where Hyperscalers Underperform
  • Sovereign & Air-Gapped Deployment: Hyperscalers offer "sovereign" but not true air-gapped. Independent providers with modular infrastructure can deploy where no hyperscaler can.
  • Low-Latency SLAs: Hyperscalers optimize for throughput and cost, not latency. Guaranteed latency SLAs are a genuine enterprise selling point for real-time applications.
  • BYOM Flexibility: Hyperscaler BYOM requires their ecosystem (SageMaker, Vertex endpoints). Independent platforms offer simpler, faster model onboarding.
  • Edge Inference: Hyperscalers are centralized by design. Edge and on-prem inference is a structural gap.
  • Multi-Cloud Arbitrage: 89% of enterprises use multi-cloud[66]. Independent providers serve as the neutral inference layer across clouds.
  • Hybrid TCO: When factoring sovereignty compliance overhead, multi-region complexity, and latency penalties, independents can offer lower TCO for the right customer segments.

Strategic Positioning for Independent Providers

Independent inference providers that compete effectively against hyperscalers target a specific intersection that no hyperscaler serves well: sovereign-ready, latency-guaranteed inference with hardware flexibility. Against the hyperscaler landscape analyzed above, the strongest competitive positions rest on three pillars:

Independent Advantage Hyperscaler Gap Market Signal
True Air-Gapped Sovereign Hyperscaler "sovereign" is still their cloud, their region. Not air-gapped. Gartner: $80B sovereign cloud spend in 2026[85]
Guaranteed Latency SLAs Hyperscalers optimize throughput/cost, not latency. GKE reduced tail latency 60% but from high baselines. Real-time finance, healthcare, autonomous systems
Multi-Chip Flexibility Each hyperscaler pushes proprietary silicon. No provider offers H100 + alternative accelerators under one roof. Enterprise demand for hardware-agnostic inference[75]
Strategic Reality Check

With Google's 78% cost reduction ($240B backlog), AWS's $200B capex ($244B backlog)[71], Microsoft's Maia 200 ramp[72], and Oracle's $523B RPO, the cost advantage window for NVIDIA-dependent providers is narrowing rapidly. Independent providers must prove TCO advantages against hyperscalers with custom silicon, not just against each other. Competitive benchmarking should be done against Google Cloud TPU pricing, not just other independents like Fireworks, Together, or Baseten.

What to Watch: Decision Points for H1–H2 2026

Signal What It Means Impact on Independents
Ironwood (TPU v7) production benchmarks If Google achieves 10x inference vs v6e, cost floor drops another 60–80% Token price competition becomes unsustainable for NVIDIA-only providers
Maia 200 independent benchmarks Validates or deflates Microsoft's "3x FP4" claims If confirmed, Azure inference costs drop; if not, NVIDIA dependency remains
NVIDIA Rubin pricing If Rubin narrows cost gap with custom ASICs significantly Lifeline for NVIDIA-dependent providers; reduces urgency to diversify silicon
Agent framework lock-in If one platform (Bedrock AgentCore, Vertex Agents) achieves >50% share Multi-model matters less; platform stickiness becomes the moat
Open-source model quality parity If Llama 4 Maverick/Mistral close gap with GPT-5.2/Gemini 3.1 Pro Reduces Azure-OpenAI exclusivity premium; shifts value to infrastructure

Sources: Flexera Multi-Cloud Survey[66], Gartner Hybrid Cloud 2027[67], Alphabet Q4 Earnings[70], Amazon Q4 Earnings[71], Microsoft Q2 FY2026[72], Oracle Q2 FY2026[73]

Section 14

Methodology

Research Methodology

This report synthesizes data from Q3–Q4 2025 earnings calls[70][71][72][73], January–February 2026 product announcements, analyst reports (Gartner[5], IDC[77], Deloitte[69], MarketsandMarkets[1]), and primary product documentation. Pricing data reflects list prices as of February 2026; enterprise pricing varies 20–50%.[74] Performance claims are vendor-reported unless noted. Market share data from Synergy Research Group Q4 2025.[65]

Data Limitations

Customer counts may overlap across providers and should be treated as directional (Section 02). Oracle publishes less granular AI pricing than peers, so its GPU rates are estimated from third-party benchmarks (Section 08). Blended per-token costs are MinjAI estimates (Section 09). Custom silicon performance claims (Ironwood, Trainium3, Maia 200) are vendor-reported and lack independent benchmarks.

Source Categories

Category Count Examples
Earnings / Financial 12 Alphabet Q4[70], Amazon Q4[71], Oracle Q2[73]
Product Documentation 28 Vertex AI[17], Bedrock[25], AI Foundry[83], OCI[84]
Analyst Reports 18 Gartner[5], IDC[77], Deloitte[69]
Press / Tech Coverage 15 TechCrunch[36], SDxCentral[78]
Customer Case Studies 8 Midjourney[24], Robinhood[30]
Market Research 6 MarketsandMarkets[1], SNS Insider[76]

Sources & References

  1. [1] MarketsandMarkets, "AI Inference Market Size, $255B by 2030." marketsandmarkets.com
  2. [2] Fortune Business Insights, "AI Inference Market." fortunebusinessinsights.com
  3. [3] Research & Markets, "AI Inference Market Outlook, $537B by 2034." researchandmarkets.com
  4. [4] a16z, "LLMflation: LLM Inference Cost Trends." a16z.com
  5. [5] Gartner, "AI-Optimized IaaS Poised to Become Next Growth Engine," Oct 2025. gartner.com
  6. [6] Futurum Group, "Alphabet Q4 FY2025" — Cloud revenue and growth metrics. futurumgroup.com
  7. [7] Futurum Group, "Amazon Q4 FY2025: AWS 24% Growth." futurumgroup.com
  8. [8] QuantumRun, "Azure OpenAI Statistics 2026." quantumrun.com
  9. [9] Oracle Q2 FY2026 Earnings Report. oracle.com/investor
  10. [10] Synergy Research Group, "Cloud Market Share Q4 2025." srgresearch.com
  11. [11] Oracle Q2 FY2026 Supplemental. oracle.com/investor
  12. [12] TokenRing, "The Great Decoupling: How Custom Silicon is Eroding NVIDIA's Grip." financialcontent.com
  13. [13] TokenRing, "The Blackwell Moat: NVIDIA's AI Hegemony." financialcontent.com
  14. [14] Google Cloud Blog, "Trillium TPU is GA" — TPU v6e architecture and 4.7x performance claims. cloud.google.com
  15. [15] AWS, "Trainium ML Chips" — architecture overview and performance claims. aws.amazon.com
  16. [16] Microsoft Blog, "Maia 200: The AI Accelerator Built for Inference." blogs.microsoft.com
  17. [17] Google Cloud, "Vertex AI GenAI Pricing" — Gemini model token pricing. cloud.google.com
  18. [18] Google AI, "Gemini API Pricing." ai.google.dev
  19. [19] Google Cloud Blog, "Trillium TPU is GA" — TPU v6e pricing ($0.39/chip-hr CUD). cloud.google.com
  20. [20] Google Cloud Blog, "GKE Inference Gateway and Quickstart are GA." cloud.google.com
  21. [21] Google Cloud Blog, "Google Cloud Next 2025 Wrap Up." cloud.google.com
  22. [22] Futurum Group, "Alphabet Q4 FY2025" — AI revenue growth and customer wins. futurumgroup.com
  23. [23] Google Cloud, "Customer Stories." cloud.google.com
  24. [24] Midjourney TPU v6e migration case study (referenced in Trillium GA blog). cloud.google.com
  25. [25] Amazon, "Amazon Bedrock." aws.amazon.com
  26. [26] Amazon, "Bedrock Pricing" — per-token and provisioned throughput rates. aws.amazon.com
  27. [27] AWS, "Trainium ML Chips" — Trainium2 pricing and deployment scale. aws.amazon.com
  28. [28] AWS, "Inferentia Machine Learning Chips." aws.amazon.com
  29. [29] Futurum Group, "Amazon Q4 FY2025 Revenue Beat." futurumgroup.com
  30. [30] Amazon, "Bedrock Customers." aws.amazon.com
  31. [31] AWS Blog, "SageMaker AI 2025 Year in Review." aws.amazon.com
  32. [32] AWS Blog, "Top Announcements of re:Invent 2025." aws.amazon.com
  33. [33] Azure, "Foundry Models." azure.microsoft.com
  34. [34] Azure, "OpenAI Service Pricing" — GPT-5.2/GPT-5 Mini/PTU rates. azure.microsoft.com
  35. [35] Microsoft Blog, "Maia 200 AI Accelerator." blogs.microsoft.com
  36. [36] TechCrunch, "Microsoft Announces Powerful New Chip for AI Inference." techcrunch.com
  37. [37] OpenAI, "OpenAI and Microsoft Extend Partnership." openai.com
  38. [38] QuantumRun, "Azure OpenAI Statistics." quantumrun.com
  39. [39] Microsoft, "AI Customer Stories." microsoft.com
  40. [40] Azure Blog, "Actioning Agentic AI: Ignite 2025." azure.microsoft.com
  41. [41] GitHub, "GitHub Models." github.com
  42. [42] Oracle, "Q2 FY2026 Earnings." oracle.com/investor
  43. [43] OpenAI/SoftBank/Oracle, "Stargate Joint Announcement," Jan 2025. openai.com
  44. [44] SoftBank-Oracle Japan Partnership Announcement. oracle.com/news
  45. [45] Oracle, "Sovereign Cloud" — EU sovereign and data residency features. oracle.com
  46. [46] Oracle-AMD MI355X Support Announcement. oracle.com
  47. [47] Oracle, "Remaining Performance Obligations." oracle.com/investor
  48. [48] Google Cloud, "Vertex AI Pricing" — GPU hourly rates and compute pricing. cloud.google.com
  49. [49] Amazon, "Bedrock Pricing" — cross-provider pricing benchmark data. aws.amazon.com
  50. [50] Azure, "OpenAI Service Pricing" — cross-provider benchmark pricing. azure.microsoft.com
  51. [51] GMI Cloud, "2025 GPU Cloud Pricing Comparison." gmicloud.ai
  52. [52] Introl, "Inference Unit Economics: True Cost per Million Tokens." introl.com
  53. [53] BentoML, "How to Vet Inference Platforms: Buyer's Guide." bentoml.com
  54. [54] Gartner, "Worldwide Sovereign Cloud IaaS Spending $80B in 2026." gartner.com
  55. [55] AWS, "GovCloud." aws.amazon.com
  56. [56] Azure, "Government Cloud." azure.microsoft.com
  57. [57] Google Cloud, "Distributed Cloud." cloud.google.com
  58. [58] Oracle, "Sovereign Cloud" — Stargate for Countries program details. oracle.com
  59. [59] a16z, "AI in the Enterprise 2025." a16z.com
  60. [60] Menlo Ventures, "2025 State of Generative AI in the Enterprise." menlovc.com
  61. [61] Google Cloud, "Model Garden Overview." cloud.google.com
  62. [62] Azure, "AI Foundry Models Overview." learn.microsoft.com
  63. [63] Amazon, "Bedrock Supported Models." aws.amazon.com
  64. [64] MinjAI scoring methodology: 1–5 scale based on market data, product capability, and competitive positioning.
  65. [65] Synergy Research Group, "Cloud Infrastructure Services Q4 2025." srgresearch.com
  66. [66] Flexera, "2025 State of the Cloud Report: 89% Multi-Cloud Adoption." flexera.com
  67. [67] Gartner, "90% Hybrid Cloud by 2027." gartner.com
  68. [68] Grand View Research, "AI as a Service Market Size & Trends Analysis." grandviewresearch.com
  69. [69] Deloitte, "Technology Predictions 2026: Compute Power." deloitte.com
  70. [70] Alphabet Inc., "Q4 2025 Earnings Call Transcript." abc.xyz/investor
  71. [71] Amazon.com Inc., "Q4 2025 Earnings Call Transcript." ir.aboutamazon.com
  72. [72] Microsoft Corp., "Q2 FY2026 Earnings: Azure Growth +39%." microsoft.com/investor
  73. [73] Oracle Corp., "Q2 FY2026 Earnings Call." oracle.com/investor
  74. [74] CloudChipr, "Amazon Bedrock Pricing Explained." cloudchipr.com
  75. [75] Wiz, "Enterprise Cloud Security." wiz.io
  76. [76] SNS Insider, "AI Inference Market to Reach $349B by 2032." finance.yahoo.com
  77. [77] IDC, "AI Infrastructure Spending." idc.com
  78. [78] SDxCentral, "AI Inferencing Will Define 2026." sdxcentral.com
  79. [79] Epoch AI, "LLM Inference Price Trends." epoch.ai
  80. [80] Sequoia Capital, "AI's $600B Question." sequoiacap.com
  81. [81] Google Cloud Blog, "GKE Agent Sandbox." cloud.google.com
  82. [82] AWS, "Nova 2 Family Announcement." aws.amazon.com
  83. [83] Microsoft, "Microsoft Foundry (formerly Azure AI Foundry)." azure.microsoft.com
  84. [84] Oracle, "OCI AI Services." oracle.com
  85. [85] Introl, "Sovereign Cloud AI Infrastructure." introl.com
  86. [86] Google DeepMind, "Gemini 3 Model Family." deepmind.google
  87. [87] Microsoft, "OpenAI Infrastructure Spending: $12.43B." quantumrun.com

MinjAI Competitive Intelligence Platform • Hyperscaler Inference Landscape Report • February 2026

87 Sources • 14 Sections • 4 Hyperscalers Analyzed

For strategic intelligence purposes. Market data and pricing are subject to change. Not investment advice.