Cloudflare (NYSE: NET) is transforming from a CDN and security company into a leading platform for edge inference.1 Workers AI runs serverless inference across 180+ GPU-enabled cities. The November 2025 acquisition of Replicate (terms undisclosed) added 50,000+ production-ready models.2
Q4 2025 results confirmed the thesis: $614.5M revenue, up 34% YoY.3 Workers AI inference requests grew 4,000% YoY as measured in early 2025.4 More recent growth data is not publicly available; Cloudflare does not break out AI-specific revenue. The company guides to $2.79B in FY2026 revenue, implying 28-29% growth.5
CEO Matthew Prince positions 2026 as the year of the "Agentic Internet." Cloudflare aims to be the platform where AI agents run, not just the network they traverse.6
Cloudflare's distribution moat is its core advantage. With 332K paying customers, adding inference is a natural upsell. MARA must emphasize what Cloudflare cannot match: dedicated GPU clusters, contractual latency guarantees, and data sovereignty. Cloudflare's shared-edge model lacks the isolation sensitive workloads require.
| Attribute | Detail |
|---|---|
| Legal Name | Cloudflare, Inc. |
| Founded | July 26, 2009 |
| Founders | Matthew Prince (CEO), Michelle Zatlyn (COO), Lee Holloway (stepped back due to health issues; no longer involved in day-to-day operations) |
| Headquarters | San Francisco, California |
| Employees | ~6,670 (Jan 2026)7 |
| Stock Ticker | NYSE: NET |
| IPO Date | September 13, 2019 at $15/share8 |
| Pre-IPO Funding | $332M across 7 rounds7 |
| Market Cap | $68.8B (Feb 2026)3 |
Cloudflare grew out of Project Honey Pot, an anti-spam initiative by Prince and Holloway.9 It launched at TechCrunch Disrupt in September 2010 with a mission: build a faster, safer internet via CDN and DDoS protection.
Three strategic phases define the evolution. Phase 1 (2010-2017): global CDN and security. Phase 2 (2017-2022): Workers serverless compute, becoming a developer platform. Phase 3 (2023-present): AI inference, vector databases, and Replicate.
Matthew Prince made TIME's 100 Most Influential People in AI (2025).10 His thesis: AI is a "platform shift" comparable to mobile, not a bubble. Michelle Zatlyn serves as COO and President, leading business operations.
| Period | Revenue | YoY Growth | Key Metric |
|---|---|---|---|
| FY2023 | $1.30B | 32% | IPO price: $15/share |
| FY2024 | $1.67B | 29%11 | 173 customers at $1M+ ARR |
| Q4 2025 | $614.5M | 34%3 | 269 customers at $1M+ ARR (+55% YoY) |
| FY2026 Guide | $2.79B | 28-29%5 | Op. income: $378-382M (14% margin) |
Q4 2025 cash: $4.1B.3 Free cash flow: $99.4M for the quarter. FY2026 EPS guidance: $1.11-$1.12, reflecting improving unit economics.
The $1M+ cohort grew 55% YoY to 269 accounts; $100K+ ARR reached 3,850 customers.3 Over 70% of large contracts include 3+ products.12 This cross-sell motion drives the inference distribution strategy.
Cloudflare holds $4.1B in cash and generates ~$400M annual free cash flow. It can subsidize AI inference pricing for the foreseeable future. MARA cannot compete on price against free inference tiers. The path forward: dedicated performance, SLAs, and sovereign compliance that Cloudflare's multi-tenant edge cannot deliver.
Cloudflare does not report Workers AI revenue separately. AI-specific contribution to the ~$2.16B total is unknown. The 4,000% YoY inference growth (early 2025) has not been updated. Without segment-level disclosure, sizing Cloudflare's inference business requires estimates.
Workers AI provides serverless GPU inference with an OpenAI-compatible API.13 Developers deploy models with a single API call. No GPU provisioning, no cluster management. Models run on the nearest GPU-enabled edge node automatically.
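Because the endpoint is OpenAI-compatible, existing OpenAI SDK code can be repointed at Cloudflare with only a base-URL change. A minimal sketch of the request shape follows; the account ID, model name, and exact path are illustrative and should be verified against Cloudflare's current API documentation.

```python
import json

# Illustrative placeholders, not real credentials or a guaranteed path.
ACCOUNT_ID = "your-account-id"
BASE_URL = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1"

def build_chat_request(model: str, prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for an OpenAI-style chat completion call."""
    url = f"{BASE_URL}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

# Workers AI model identifiers use the "@cf/" catalog prefix.
url, body = build_chat_request("@cf/meta/llama-3.1-8b-instruct", "Hello")
```

The practical consequence for the competitive analysis: any team already written against the OpenAI API can trial Workers AI without code changes beyond configuration, which is exactly what makes the upsell frictionless.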
Native integrations: Vectorize (RAG), R2 (model storage), D1 (metadata).14 No other inference provider offers this full-stack integration.
Cloudflare built Infire, a Rust-based LLM inference engine that replaces Python-based stacks like vLLM.15 The headline benchmark: 82% lower CPU overhead than vLLM, allowing profitable inference on fewer edge GPUs.
AI Gateway routes requests across model providers from a single endpoint.17 Features: BYOK for secure API key management, unified billing, and dynamic routing with fallback logic. In 2026, consolidated billing lets developers pay for third-party models (OpenAI, Anthropic) on one invoice.
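The dynamic-routing-with-fallback pattern described above can be sketched in a few lines. This is the generic logic, not Cloudflare's implementation; the provider names and error handling are stand-ins.

```python
from typing import Callable

def route_with_fallback(providers: list[tuple[str, Callable[[str], str]]],
                        prompt: str) -> tuple[str, str]:
    """Try providers in priority order; return (name, response) from the
    first one that succeeds, raising only if every provider fails."""
    last_err: Exception | None = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as err:  # in practice: timeouts, rate limits, 5xx
            last_err = err
    raise RuntimeError("all providers failed") from last_err

# Stub providers for illustration: the primary fails, the fallback answers.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("rate limited")

def fallback(prompt: str) -> str:
    return f"echo: {prompt}"

name, answer = route_with_fallback(
    [("openai", flaky_primary), ("workers-ai", fallback)], "hi")
```

The value AI Gateway adds on top of this simple loop is operational: one endpoint, one invoice, and key custody (BYOK) across providers.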
The Replicate acquisition (announced Nov 17, 2025; acquisition price not disclosed) adds 50,000+ production-ready models.2 This includes access to proprietary models like GPT-5 and Claude through a unified API. Replicate's marketplace enables one-line deployment on Cloudflare's edge. The brand operates independently post-acquisition.18
Integration timeline for Replicate's 50K+ model catalog into Cloudflare's edge network is unclear. Full edge deployment of GPU-intensive models faces physical memory constraints at individual PoPs. The speed of this integration determines when Cloudflare's model catalog becomes a true competitive advantage versus a marketing headline.
Workers AI uses a "Neuron" abstraction for billing. Each model maps its compute cost to a Neuron equivalent.19
| Tier | Included | Price | Target |
|---|---|---|---|
| Free | 10,000 Neurons/day | $0 | Hobbyists, prototyping |
| Paid (Workers) | 10,000 Neurons/day free | $0.011 / 1K Neurons | Production apps |
| Enterprise | Custom allocation | Custom pricing | Large-scale deployments |
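The tier table above implies a simple cost model: 10,000 Neurons/day free, then $0.011 per 1,000 Neurons. The workload size (500K Neurons/day) is an illustrative assumption, not a benchmark.

```python
# Published Workers AI rates (per the pricing table above).
FREE_NEURONS_PER_DAY = 10_000
PRICE_PER_1K_NEURONS = 0.011  # USD

def daily_cost(neurons_per_day: int) -> float:
    """Daily spend after the free allocation is consumed."""
    billable = max(0, neurons_per_day - FREE_NEURONS_PER_DAY)
    return billable / 1_000 * PRICE_PER_1K_NEURONS

cost = daily_cost(500_000)     # 490K billable Neurons -> ~$5.39/day
monthly = round(cost * 30, 2)  # ~$161.70 over a 30-day month
```

At that scale the bill is trivial against any dedicated-GPU contract, which is the point of the free-tier land-and-expand motion; the comparison only becomes interesting at sustained production volumes.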
| Provider | Delivery Model | Pricing Approach | Free Tier |
|---|---|---|---|
| Cloudflare | Serverless (edge) | $0.011/1K Neurons | 10K Neurons/day |
| Fireworks AI | Centralized GPU | $0.20/M input tokens (Llama 3.1 70B) | Free credits |
| Together AI | Centralized GPU | $0.88/M input tokens (Llama 3.1 70B) | $1 free credits |
| Baseten | Dedicated/serverless | Per-GPU-second | $30 free credits |
The Neuron abstraction obscures true token costs, making direct comparison difficult. The free tier (10K Neurons/day) captures developer mindshare before production workloads emerge. At enterprise scale, per-Neuron pricing can compete but lacks MARA's latency guarantees and dedicated capacity.
Workers AI sits within the broader Workers platform.20 The $5/month Paid plan includes 10M requests, 30M CPU ms, and 10K Neurons/day. Bundling means existing Workers users get inference at marginal cost.
Cloudflare added a record 37,000 paying customers in Q4 2025 alone.3 The $1M+ cohort grew 55% YoY to 269 accounts. Over 40% of the Y Combinator Winter 2025 cohort builds on Cloudflare's R2 and Workers AI platform.12
Cloudflare's developer platform is the core of its distribution moat. Key integrations:
| Product | Function | AI Relevance |
|---|---|---|
| Workers | Serverless compute | Inference orchestration |
| Pages | Frontend deployment | AI-powered app hosting |
| R2 | Object storage (S3-compatible) | Model artifacts, training data |
| D1 | Serverless SQL (SQLite) | Structured metadata |
| Vectorize | Vector database | RAG, semantic search |
| Durable Objects | Stateful compute | Agent memory, sessions |
| AI Gateway | API routing & billing | Multi-provider inference |
Cloudflare launched the Agents SDK and agents.cloudflare.com in early 2026.21 Durable Objects provide persistent state for AI agents. The "Markdown for Agents" feature auto-converts HTML to markdown for agent consumption.22 Moltworker, a self-hosted personal AI agent, demonstrates the platform's agent capabilities.23
In December 2025, Cloudflare expanded its JD Cloud partnership for global AI inference.24 The deal cuts cross-border latency by up to 80%. China traffic routes to JD Cloud; all other traffic routes to Cloudflare. This addresses data residency in China and India.
Cloudflare's fundamental bet is that inference belongs at the edge, not in centralized GPU clusters. The argument: a user in Tokyo querying a model in Virginia incurs hundreds of milliseconds in network latency alone. Edge inference eliminates this overhead.
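A back-of-envelope check supports the latency argument. Light in fiber travels at roughly 200,000 km/s (~2/3 of c), and Tokyo to Virginia is roughly 11,000 km great-circle; both figures are approximations for illustration.

```python
# Rough physical constants for the estimate (assumptions, not measurements).
FIBER_SPEED_KM_S = 200_000   # ~2/3 the speed of light in vacuum
TOKYO_VIRGINIA_KM = 11_000   # approximate great-circle distance

def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip time over fiber, ignoring routing and queuing."""
    return 2 * distance_km / FIBER_SPEED_KM_S * 1000

rtt = round_trip_ms(TOKYO_VIRGINIA_KM)  # ~110 ms floor, before any compute
```

Real paths are longer than great-circle, and TCP plus TLS handshakes cost multiple round trips, which is how a single physical ~110 ms floor becomes the "hundreds of milliseconds" the edge argument cites.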
| Dimension | Cloudflare (Edge) | Centralized Providers | MARA (Sovereign) |
|---|---|---|---|
| Latency | Low (50ms to 95% of users) | Variable (region-dependent) | Low-latency SLA (dedicated) |
| Model Size | Limited by edge GPU memory | Full range (large clusters) | Full range (dedicated H100/H200) |
| Isolation | Multi-tenant (shared edge) | Shared or dedicated | Fully dedicated clusters |
| Data Sovereignty | Data Localization Suite | Region selection | Air-gapped, sovereign-ready |
| Customization | Limited (catalog models) | Fine-tuning, custom models | Full stack customization |
| Pricing | Pay-per-Neuron (serverless) | Per-token or per-GPU-hour | 30-50% below hyperscalers |
Distribution moat. 332K paying customers already on the platform. Adding inference is an upsell, not a cold start. Over 70% of large deals include 3+ products.12
Full-stack integration. No other inference provider offers compute, storage, database, vector search, and inference in one platform. Developers build entire AI apps without leaving Cloudflare.
Developer gravity. 40%+ of YC W25 building on Cloudflare. Once developers adopt Workers + R2 + D1, switching costs rise significantly.
Infire engine. Custom Rust inference engine with 82% lower CPU overhead than vLLM.15 Enables profitable inference on edge hardware with fewer GPUs.
Edge GPU memory limits. Edge nodes run smaller GPU configurations. Large models (70B+) require centralized infrastructure that Cloudflare lacks at scale.
Multi-tenant architecture. Shared edge infrastructure cannot guarantee the isolation that regulated industries require. No dedicated GPU allocations per customer.
Neuron pricing opacity. The Neuron abstraction makes cost comparison difficult. Enterprise buyers with high-volume workloads may find centralized providers cheaper at scale.
| Dimension | Assessment | Threat to MARA |
|---|---|---|
| Distribution | 332K customers, massive cross-sell | Critical |
| Pricing | Free tier + serverless pay-per-use | High |
| Technology | Infire engine, 180+ GPU cities | Medium |
| Model Catalog | 50K+ models via Replicate | Medium |
| Enterprise AI | Multi-tenant edge, limited isolation | Low |
| Sovereign/Regulated | Data Localization Suite (basic) | Low |
Cloudflare competes as a developer platform that includes inference, not as an inference provider. This is a fundamentally different GTM from pure-play inference companies. It does not need to win on price or performance. "Good enough" inference for existing Workers/R2/D1 users is sufficient.
This is the distribution moat in action. Fireworks AI and Together AI must convince developers to adopt a new platform. Cloudflare only needs existing customers to check a box.
Dedicated infrastructure. Cloudflare's multi-tenant edge model cannot offer isolated GPU clusters. Enterprises running proprietary models on sensitive data need hardware-level isolation.
Large model inference. Edge nodes with limited GPU memory cannot serve 70B+ parameter models efficiently. Centralized or dedicated infrastructure is required.
Guaranteed latency SLAs. Edge inference latency varies by load and location. Contractual latency guarantees require dedicated, predictable hardware.
Sovereign deployments. Air-gapped, on-premises inference for defense and government workloads is outside Cloudflare's operational model entirely.
Cloudflare AI Gateway lets developers route inference requests to multiple backends. MARA could register as a premium backend provider, capturing customers who outgrow shared-edge inference. Requirements: OpenAI-compatible API, published latency SLAs, and billing integration via AI Gateway. This turns Cloudflare's 332K customer base into a lead generation channel without head-to-head competition.
Cloudflare is likely to fully integrate Replicate by mid-2026, though no timeline has been announced. Expect a unified model marketplace with one-click edge deployment. The Agentic AI platform (Durable Objects + Agents SDK) will attract AI-native startups building autonomous agents.25
The company plans to hire 1,111 interns in 2026, signaling aggressive talent acquisition.26 GPU-enabled cities will likely expand beyond 200 by end of 2026. Revenue guidance of $2.79B suggests confidence in continued 28-29% growth.
The real risk: Cloudflare makes "good enough" inference so accessible that enterprises deprioritize dedicated infrastructure. MARA must prove the ROI of dedicated inference justifies the premium over serverless.
| Unknown | Why It Matters | How to Monitor |
|---|---|---|
| Workers AI revenue | Cannot size the inference business without segment-level data. | Monitor quarterly earnings calls for AI commentary. |
| Replicate deal terms | Deal valuation signals Cloudflare's strategic commitment to AI. | Watch for SEC filing amendments or analyst estimates. |
| Edge GPU memory limits | Determines which models can actually run at the edge vs centralized. | Test model availability across different PoPs. |
| Enterprise inference adoption | Is Cloudflare winning enterprise AI workloads or only developer/startup? | Track customer announcements and AI Gateway partnerships. |