Cerebras Systems is a custom silicon company that has built the world's largest chip, the Wafer-Scale Engine (WSE), and is now leveraging it to dominate AI inference speed benchmarks.[1] Founded in 2016 by Andrew Feldman and the team behind SeaMicro (sold to AMD for $334M),[2] Cerebras has evolved from a training-focused hardware vendor into an inference-as-a-service platform powering OpenAI, Meta, and the U.S. Department of Energy.[3][4]
The company's WSE-3 chip contains 4 trillion transistors and 900,000 AI-optimized cores on a single 46,225 mm² wafer, delivering 125 petaflops of peak AI performance.[1] This architecture eliminates GPU interconnect overhead, enabling inference speeds up to 20x faster than NVIDIA-based clouds at 32% lower cost.[5]
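For a sense of scale, the short sketch below compares the on-chip SRAM bandwidth cited for the WSE-3 against the aggregate HBM bandwidth of an 8x B200 node (both round figures appear later in this report). The ratio is illustrative only; realized speedups depend on workload, batching, and precision.

```python
# Back-of-the-envelope bandwidth comparison using round figures cited in this
# report. Real inference throughput depends on batch size, precision, and
# kernel efficiency, so treat the ratio as illustrative only.

WSE3_SRAM_BW_PBPS = 21          # WSE-3 on-chip SRAM bandwidth, PB/s
B200_NODE_HBM_BW_TBPS = 64      # 8x B200 aggregate HBM3e bandwidth, TB/s

wse3_tbps = WSE3_SRAM_BW_PBPS * 1000   # convert PB/s -> TB/s
ratio = wse3_tbps / B200_NODE_HBM_BW_TBPS

print(f"WSE-3 on-chip bandwidth: {wse3_tbps:,.0f} TB/s")
print(f"8x B200 HBM bandwidth:   {B200_NODE_HBM_BW_TBPS:,.0f} TB/s")
print(f"Raw bandwidth ratio:     ~{ratio:,.0f}x")
# Roughly 300x of raw bandwidth headroom is why memory-bound decode steps can
# run far faster on-wafer, even though the realized end-to-end speedup (~20x)
# is much smaller.
```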
Cerebras represents both a threat and an opportunity for the inference platform. Its non-GPU architecture delivers inference at speeds and costs that fundamentally challenge GPU-based assumptions. The $10B OpenAI deal[3] and the Meta Llama API partnership[4] validate market demand for ultra-fast, cost-efficient inference infrastructure. The platform should evaluate Cerebras as a potential compute partner or technology licensor alongside its existing NVIDIA/alternative silicon strategy. If the platform does not partner, it must match the 20x speed advantage or accept a permanent disadvantage in latency-sensitive use cases.
Cerebras was founded in 2016 by five co-founders who previously built SeaMicro, a pioneer of energy-efficient microservers acquired by AMD in 2012 for $334M.[2] CEO Andrew Feldman holds degrees in Economics/Political Science and an MBA from Stanford. His track record of building and selling hardware companies gives Cerebras unusual credibility in the custom silicon space.
| Name | Title | Background |
|---|---|---|
| Andrew Feldman | Co-Founder & CEO[2] | Stanford (Econ/MBA). Co-founded SeaMicro (sold to AMD for $334M). Serial hardware entrepreneur. |
| Gary Lauterbach | Co-Founder & SVP Engineering[2] | SeaMicro co-founder. Former VP at Sun Microsystems. Chip architecture veteran. |
| Sean Lie | Co-Founder & Chief Hardware Architect[2] | SeaMicro. Lead architect of the Wafer-Scale Engine. |
| Michael James | Co-Founder[2] | SeaMicro founding team. |
| Jean-Philippe Fricker | Co-Founder[2] | SeaMicro founding team. |
| Round | Date | Amount | Valuation | Lead Investors |
|---|---|---|---|---|
| Seed/Series A | 2016-2018 | ~$112M | -- | Benchmark Capital, Eclipse Ventures[12] |
| Series B | Nov 2018 | $80M | -- | Benchmark Capital[12] |
| Series C | Nov 2019 | $60M | -- | Benchmark Capital[12] |
| Series D | Nov 2020 | $250M | -- | Altimeter Capital, Coatue[12] |
| Series E | Nov 2021 | $250M | $4.0B | Alpha Wave Ventures, Abu Dhabi Growth Fund[12] |
| Series F | 2023 | ~$250M | -- | G42, Alpha Wave[12] |
| Series G | Oct 2025 | $1.1B | $8.1B | Fidelity, Atreides, Tiger Global[7] |
| Series H | Feb 2026 | $1.0B | $23B | Tiger Global, Benchmark, AMD[6] |
| Total | | ~$3.1B | | |
| Period | Revenue | YoY Growth | Net Loss | Notes |
|---|---|---|---|---|
| FY 2022 | $24.6M | -- | ($177.7M) | Early commercial stage |
| FY 2023 | $78.7M | +220% | ($127.2M) | G42 = 83% of revenue |
| H1 2024 | $136.4M | +935% (vs H1'23) | ($66.6M) | G42 = 87% of revenue |
| FY 2024 (Est.) | ~$500M | +535% | -- | Rapid diversification begins |
| FY 2025 (Est.) | >$1B | +100% | -- | OpenAI, Meta, DOE contracts |
Cerebras historically relied on G42 (UAE) for 83-87% of revenue.[7] This triggered a CFIUS national security review that delayed the IPO through much of 2025. By early 2026, Cerebras restructured its investor base, moving G42 out of its primary stakeholder list to satisfy U.S. regulators.[8] The OpenAI and Meta deals have materially reduced this concentration risk, but it remains a factor to watch.
The Wafer-Scale Engine is Cerebras's core innovation: an entire silicon wafer used as a single chip, rather than being cut into hundreds of individual dies.[1] This architectural approach eliminates the multi-GPU interconnect bottleneck that limits inference speed in conventional systems.
| Specification | WSE-1 (2019) | WSE-2 (2021) | WSE-3 (2024) |
|---|---|---|---|
| Process Node | 16nm (TSMC) | 7nm (TSMC) | 5nm (TSMC)[1] |
| Transistors | 1.2 trillion | 2.6 trillion | 4 trillion[1] |
| AI Cores | 400,000 | 850,000 | 900,000[1] |
| On-Chip SRAM | 18 GB | 40 GB | 44 GB[1] |
| Memory Bandwidth | 9.6 PB/s | 20 PB/s | 21 PB/s[1] |
| Peak AI Performance | -- | -- | 125 PFLOPS[1] |
| Die Area | 46,225 mm² | 46,225 mm² | 46,225 mm²[1] |
| System | CS-1 | CS-2 | CS-3 |
The MemoryX external memory system supports models up to 24 trillion parameters.[13] If the platform partnered with Cerebras, it could offer customers access to the largest open-source models (Llama 4, DeepSeek, Qwen3-235B) at speeds no GPU-based competitor can match, without building custom silicon in-house.
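As a rough sense of why an external weight store matters at this scale, the sketch below computes the weight-memory footprint of a hypothetical 24-trillion-parameter model at common numeric precisions; the bytes-per-parameter values are standard format sizes, not figures from Cerebras.

```python
# Rough weight-memory footprint for a hypothetical 24-trillion-parameter model
# at different numeric precisions. Bytes-per-parameter values are the standard
# sizes for each format; the parameter count comes from the MemoryX claim above.

PARAMS = 24e12
bytes_per_param = {"FP32": 4, "FP16/BF16": 2, "FP8": 1}

for fmt, b in bytes_per_param.items():
    terabytes = PARAMS * b / 1e12
    print(f"{fmt:10s} ~{terabytes:,.0f} TB of weights")
# Even at FP16, weights alone are ~48 TB, far beyond any single GPU's HBM,
# which is why an external weight store is paired with the wafer.
```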
| Model | Cerebras (tok/s) | GPU Cloud (tok/s) | Speedup | Source |
|---|---|---|---|---|
| Llama 3.1 8B | 1,800 | ~90 | 20x | Cerebras[9] |
| Llama 3.1 70B | 2,100 | ~105 | 20x | Cerebras[9] |
| Llama 3.1 405B | 969 | ~50 | ~19x | Cerebras[14] |
| Llama 4 Scout | 2,600 | ~137 | 19x | Artificial Analysis[4] |
| TTFT (405B) | 240 ms | ~4,000 ms | ~17x | Cerebras[14] |
Most speed benchmarks are self-reported by Cerebras or verified by Artificial Analysis (which is not fully independent). Third-party benchmarks from SemiAnalysis confirm the cost advantage (32% lower than Blackwell B200)[5] but independent latency verification at production scale is limited. The platform should request direct benchmark access before making strategic decisions.
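To make the throughput numbers concrete, the sketch below converts the table's tokens-per-second figures into approximate wall-clock generation time for a fixed-length response; the 500-token response length is an assumption, and TTFT, batching, and network overhead are ignored.

```python
# Convert benchmark throughput (tokens/s) into approximate wall-clock time to
# generate a fixed-length response. Throughput figures come from the table
# above; TTFT, batching, and network overhead are ignored, so the results are
# illustrative only.

RESPONSE_TOKENS = 500  # hypothetical response length (assumption)

providers = {
    "Cerebras (Llama 3.1 70B)": 2_100,   # tok/s, self-reported
    "GPU cloud (Llama 3.1 70B)": 105,    # tok/s, typical per the table
}

for name, tok_per_s in providers.items():
    seconds = RESPONSE_TOKENS / tok_per_s
    print(f"{name}: ~{seconds:.2f} s for {RESPONSE_TOKENS} tokens")
# ~0.24 s versus ~4.76 s: the difference between an interactive agent step and
# a noticeable pause, which is the latency tier the surrounding analysis
# refers to.
```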
| Specification | Cerebras (CS-3) | NVIDIA (8x B200) |
|---|---|---|
| Architecture | Wafer-Scale (single chip) | Multi-GPU (8x B200) |
| AI Performance | 125 PFLOPS | ~144 PFLOPS (FP4) |
| Memory Bandwidth | 21 PB/s (on-chip SRAM) | ~64 TB/s (HBM3e) |
| Inference Speed | 21x faster (claimed) | Baseline |
| Cost per Token | 32% lower (SemiAnalysis) | Baseline |
| Power | ~23 kW per system | ~14.3 kW per system |
| Programming | Cerebras SDK (smaller ecosystem) | CUDA (dominant ecosystem) |
| Scaling | SwarmX (up to 2,048 nodes) | NVLink/InfiniBand (proven) |
As of March 2025, Cerebras operates data centers in Dallas, Oklahoma, Minnesota, Montreal, and California,[4] with a combined inference capacity exceeding 40 million tokens per second.[11] The OpenAI deal will add 750 MW of additional capacity through 2028.[3]
Even if Cerebras's claims are overstated by 50%, its systems would still be roughly 10x faster than standard GPU inference. The platform's ultra-low-latency target is achievable with standard NVIDIA hardware, but Cerebras operates at a different order of magnitude. For latency-sensitive use cases (real-time agents, voice AI, autonomous systems), Cerebras creates a market tier the platform cannot reach with GPUs alone.
| Customer | Relationship | Deal Value | Details |
|---|---|---|---|
| OpenAI | Inference Compute[3] | $10B+ | 750 MW compute through 2028. Cerebras builds/leases DCs filled with WSE chips. OpenAI pays for cloud inference. |
| Meta | Llama API Partner[4] | Undisclosed | Powers Llama API inference at 2,600 tok/s. Announced at LlamaCon (Apr 2025). |
| G42 (UAE) | Supercomputer + Investor[10] | $500M+ | Condor Galaxy network (CG-1, CG-2, CG-3). 16 exaFLOPS total. 83% of 2023 revenue. |
| U.S. Dept. of Energy | National Lab Deployments[8] | Undisclosed | Scientific computing and AI research applications at national laboratories. |
| University of Edinburgh | EPCC Supercomputing[15] | Undisclosed | 4x CS-3 cluster deployed at EPCC. 70% faster than GPU solutions for research. |
| IBM | Enterprise Compute[8] | Undisclosed | Enterprise AI infrastructure contracts. |
| System | Location | Specs | Status |
|---|---|---|---|
| CG-1 | Santa Clara, CA | 64x CS-2, 4 exaFLOPS, 54M cores | Operational (Jun 2023) |
| CG-2 | Undisclosed | 4 exaFLOPS, 54M cores | Operational (Nov 2023) |
| CG-3 | Dallas, TX | 64x CS-3, 8 exaFLOPS, 58M cores | Under construction |
| CG-4 through CG-9 | Various | Planned total: 36 exaFLOPS | Planned |
Cerebras's customer base has transformed dramatically in 12 months, from 87% G42 revenue concentration in H1 2024[7] to a portfolio that now includes OpenAI ($10B),[3] Meta,[4] IBM, the DOE, and academic institutions.[8] This rapid diversification signals that the inference cloud product has found real market demand beyond the G42 anchor relationship.
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context Window | Notes |
|---|---|---|---|---|
| Llama 3.1 8B | $0.10 | $0.10 | 128K | Free tier: 1M tokens/day |
| Llama 3.3 70B Instruct | $0.60 | $0.60 | 128K | Core offering |
| Llama 3.1 405B | $6.00 | $12.00 | 128K | Largest Llama model |
| Qwen3-235B (A22B) | ~$0.22 | ~$0.80 | 131K | MoE architecture[16] |
| Qwen3-32B | Free tier | Free tier | 64K | Developer on-ramp |
| DeepSeek R1 | $1.35 | $5.40 | 164K | Reasoning model |
| GPT-OSS 120B | $0.15 | $0.60 | 131K | OpenAI open-weight |
| Provider | Llama 3.3 70B Input ($/1M tokens) | Llama 3.3 70B Output ($/1M tokens) | Speed (tok/s) | Cost Advantage |
|---|---|---|---|---|
| Cerebras | $0.60 | $0.60 | 2,100 | Baseline |
| Together AI | $0.88 | $0.88 | ~100 | Cerebras 32% cheaper |
| Fireworks AI | $0.90 | $0.90 | ~120 | Cerebras 33% cheaper |
| AWS Bedrock | $2.50 | $3.50 | ~60 | Cerebras 76-83% cheaper |
| Azure OpenAI | $2.68 | $3.50 | ~70 | Cerebras 78-83% cheaper |
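As a rough illustration of what these per-token gaps mean at workload scale, the sketch below prices a hypothetical monthly volume against the published rates in the table above; the workload volumes are assumptions, not figures from this report.

```python
# Rough monthly cost comparison for Llama 3.3 70B inference at the published
# per-million-token rates in the table above. The workload mix below is a
# hypothetical assumption used only to illustrate the scale of the gap.

MONTHLY_INPUT_TOKENS = 2_000_000_000    # 2B input tokens/month (assumed)
MONTHLY_OUTPUT_TOKENS = 500_000_000     # 500M output tokens/month (assumed)

rates = {  # (input $/1M tokens, output $/1M tokens)
    "Cerebras":     (0.60, 0.60),
    "Together AI":  (0.88, 0.88),
    "Fireworks AI": (0.90, 0.90),
    "AWS Bedrock":  (2.50, 3.50),
    "Azure OpenAI": (2.68, 3.50),
}

for provider, (in_rate, out_rate) in rates.items():
    cost = (MONTHLY_INPUT_TOKENS / 1e6) * in_rate + (MONTHLY_OUTPUT_TOKENS / 1e6) * out_rate
    print(f"{provider:13s} ~${cost:,.0f}/month")
# Cerebras: ~$1,500/month versus AWS Bedrock: ~$6,750/month for the same
# hypothetical workload.
```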
Cerebras's pricing of $0.60/M tokens for Llama 3.3 70B[9] sets a market floor for high-speed inference. At these price points, combined with a 20x speed advantage, GPU-based providers face a structural disadvantage in cost per token. The platform must either: (1) partner with Cerebras to offer this speed tier, (2) match on price through operational efficiency with H100/H200, or (3) differentiate on sovereignty, compliance, and customization, where Cerebras has no presence. Option 3 is the most defensible near-term strategy; Option 1 is the most aggressive.
Cerebras offers 1 million free tokens per day with no waitlist.[16] This developer-acquisition strategy mirrors what worked for OpenAI and Anthropic. Available models on the free tier include Qwen3-32B and Llama 3.1 8B. Pay-as-you-go options are also available via OpenRouter and Hugging Face integrations.
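For teams evaluating the free tier, a minimal sketch of a chat-completion call is shown below. It assumes Cerebras exposes an OpenAI-compatible endpoint; the base URL and model identifier are assumptions that should be verified against current Cerebras documentation.

```python
# Minimal sketch of calling Cerebras inference via the OpenAI Python client.
# ASSUMPTIONS: the OpenAI-compatible base URL and the model identifier below
# are illustrative and should be checked against current Cerebras docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CEREBRAS_API_KEY"],   # key from free-tier signup
    base_url="https://api.cerebras.ai/v1",    # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="llama3.1-8b",                      # assumed model identifier
    messages=[{"role": "user", "content": "Summarize the Wafer-Scale Engine in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```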
Cerebras's pricing is developer-focused and API-first. There is no enterprise tier with SLAs, dedicated capacity, compliance certifications (SOC 2, HIPAA, FedRAMP), or data residency guarantees. The platform's sovereign-ready positioning can command a 2-3x premium over Cerebras's public API pricing if paired with enterprise features that regulated industries require.
| Company | Architecture | Key Advantage | Primary Use | Threat to the platform |
|---|---|---|---|---|
| Cerebras | Wafer-Scale Engine | Speed (20x GPU) | Inference + Training | HIGH |
| Groq | LPU (Tensor Streaming) | Deterministic latency | Inference only | HIGH |
| SambaNova | RDU (Reconfigurable) | Enterprise features | Training + Inference | MEDIUM (Partner) |
| Etched | Sohu ASIC | Transformer-specific | Inference only | MEDIUM (Partner) |
| Google TPU | Custom ASIC (v5p/v6) | Integration with GCP | Training + Inference | LOW (captive) |
| AWS Trainium/Inferentia | Custom ASIC | AWS ecosystem lock-in | Training + Inference | LOW (captive) |
Both Cerebras and Groq are competing for the "fastest inference" positioning. Meta's Llama API gives developers both options: Cerebras for wafer-scale speed, Groq for LPU-based deterministic latency.[4] This dual-sourcing by Meta suggests the market wants multiple non-GPU inference options. The platform's multi-chip strategy is aligned with this trend, but it lacks a wafer-scale option.
| Date | Event | Significance |
|---|---|---|
| Sep 2024 | Confidential S-1 filed with SEC[7] | Ticker: CBRS, Nasdaq. First disclosed revenue figures. |
| Late 2024 | CFIUS review launched[8] | G42's 87% revenue concentration triggered national security review. |
| Oct 2025 | S-1 withdrawn. $1.1B Series G raised.[7] | Pivoted to private round at $8.1B. Began restructuring G42 stake. |
| Jan 2026 | $10B OpenAI deal announced[3] | Materially reduced customer concentration. Cleared regulatory concerns. |
| Feb 2026 | $1B raised at $23B valuation[6] | Pre-IPO round. Tiger Global led. AMD participated as strategic investor. |
| Q2 2026 | Target IPO date[8] | Expected valuation: $22-25B. Would be largest AI chip IPO since ARM. |
| Date | Valuation | Multiple (Revenue) | Event |
|---|---|---|---|
| Nov 2021 | $4.0B | ~160x (FY22 revenue) | Series E |
| Oct 2025 | $8.1B | ~16x (est. FY24 rev) | Series G |
| Feb 2026 | $23.0B | ~23x (est. FY25 rev) | Series H |
| Q2 2026 (est.) | $22-25B | -- | IPO target |
A successful Cerebras IPO at $22-25B would give the company significant capital for: (1) building out data center infrastructure for the OpenAI contract, (2) aggressive pricing to capture inference market share, (3) hiring enterprise sales and go-to-market teams, and (4) potential acquisitions in the inference stack. Post-IPO Cerebras will be a more formidable competitor than pre-IPO Cerebras. The platform should accelerate its own go-to-market before Cerebras has public-market capital to compete on enterprise sales.
Action: License CS-3 systems or buy Cerebras Inference API capacity. Offer it as the platform's "ultra-speed" inference tier.
Pro: Instant 20x speed advantage. Differentiates from GPU-only competitors.
Con: Dependency on single-chip vendor. Margin compression. Limited customization.
Fit: High -- Aligns with a multi-chip strategy (H100 + alternative silicon + Cerebras).
Action: Optimize GPU inference stack to minimize latency gap. Compete on price with operational efficiency.
Pro: Full control. No vendor dependency. Proven GPU ecosystem.
Con: Cannot close 20x speed gap with software alone. Cerebras has structural advantage.
Fit: Medium -- Viable for enterprise workloads where speed is less critical than compliance.
Action: Position the platform for regulated industries (healthcare, finance, government). SOC 2, HIPAA, FedRAMP. Data residency guarantees.
Pro: Cerebras has zero enterprise compliance infrastructure. Clear whitespace.
Con: Smaller TAM. Cerebras will eventually build compliance. Time-limited moat.
Fit: High -- Aligns with "sovereign-ready" positioning. Defensible for 12-18 months.
Action: Partner with Cerebras for the speed tier (Option A) + build the sovereignty moat (Option C). Offer a tiered service: Standard (GPU), Fast (alternative silicon), Ultra (Cerebras), as sketched in the routing example below.
Pro: Best of both worlds. A multi-chip strategy is the platform's stated approach.
Con: Execution complexity. Multiple vendor relationships to manage.
Fit: Highest -- Maximizes market coverage and differentiation.
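A minimal sketch of how Option D's tiering could be expressed as a routing policy follows; the tier names, latency thresholds, and Request fields are illustrative assumptions, not a product specification.

```python
# Minimal sketch of a tier-routing policy for Option D. Tier names, latency
# thresholds, and request fields are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Request:
    max_latency_ms: float        # caller's end-to-end latency budget
    requires_sovereignty: bool   # data residency / compliance constraint

def route(req: Request) -> str:
    """Pick an inference tier for a request."""
    if req.requires_sovereignty:
        # Compliance-bound workloads stay on platform-controlled GPU capacity.
        return "standard-gpu"
    if req.max_latency_ms < 500:
        # Latency-critical work (real-time agents, voice) goes to the Cerebras tier.
        return "ultra-cerebras"
    if req.max_latency_ms < 2_000:
        return "fast-alt-silicon"
    return "standard-gpu"

if __name__ == "__main__":
    print(route(Request(max_latency_ms=300, requires_sovereignty=False)))    # ultra-cerebras
    print(route(Request(max_latency_ms=300, requires_sovereignty=True)))     # standard-gpu
    print(route(Request(max_latency_ms=5_000, requires_sovereignty=False)))  # standard-gpu
```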
| Dimension | the inference platform | Cerebras | Platform Advantage? |
|---|---|---|---|
| Inference Speed | Sub-120 µs/token (target) | 20x faster than GPU | No |
| Cost per Token | 30-50% below hyperscalers | 32% below NVIDIA Blackwell | Parity |
| Compute Platforms | 3+ platforms | 1 (WSE only) | Yes |
| Enterprise Compliance | Building (SOC 2, HIPAA target) | None | Yes |
| Data Sovereignty | Sovereign-ready, modular DCs | US/Canada only | Yes |
| Model Support | Open-source LLMs | Llama, Qwen, DeepSeek, GPT-OSS | Parity |
| Go-to-Market | Design partner phase | API cloud + $10B anchor deal | No |
| Capital | Platform capital | $23B valuation, IPO imminent | No |
Cerebras is not a direct competitor to the platform today. It is a potential compute supplier that could become the platform's most powerful inference accelerator. The risk is that Cerebras builds enterprise sales and compliance capabilities post-IPO, at which point it becomes a direct competitor with a 20x speed advantage. The platform's window to establish sovereign-ready, compliance-first positioning is 12-18 months. The recommended path: partner on speed, compete on everything else.