Groq is a custom silicon company that designed the Language Processing Unit (LPU), a purpose-built chip for AI inference that delivers deterministic, ultra-low-latency token generation.[1] Founded in 2016 by Jonathan Ross, one of the original architects of Google's Tensor Processing Unit (TPU),[2] Groq has built both the hardware (LPU chips, GroqRack systems) and a cloud inference platform (GroqCloud) serving over 2.8 million developers worldwide.[3]
In December 2025, Nvidia announced a landmark ~$20 billion deal to license Groq's inference technology and hire its founding leadership team, including CEO Jonathan Ross and President Sunny Madra.[4] This represents Nvidia's largest transaction ever and a clear signal that the inference hardware market is consolidating around purpose-built silicon rather than general-purpose GPUs.
Groq represents a HIGH threat to the platform's inference-as-a-service ambitions, but with important nuances. The Nvidia acquisition signals that custom inference silicon is the future of this market. Groq's LPU delivers deterministic latency that GPUs cannot match, setting a performance benchmark the platform must address. However, Groq's pricing ($0.59/M input for Llama 3.3 70B) sits above the platform's target range,[9] and the company cut its 2025 revenue projection from $2B to $500M due to data center capacity constraints.[6] The platform's multi-chip, sovereign-ready approach occupies a different market position. The key risk: Nvidia now owns Groq's technology and will integrate it into its ecosystem at massive scale.
Groq was founded in 2016 by Jonathan Ross, a high school dropout who became one of Google's most inventive engineers.[2] While at Google, Ross initiated the Tensor Processing Unit (TPU) project as a 20% side project, designing and implementing the core elements of what became Google's dominant AI training chip.[2] He later joined Google X's Rapid Evaluation Team, incubating new "Bets" for Alphabet.
Ross left Google with a conviction: the future of AI would be defined not by training (where GPUs dominate) but by inference, where a fundamentally different architecture could deliver orders-of-magnitude improvements. He co-founded Groq with Douglas Wightman, a former Google X engineer, assembling a team of ex-Google, ex-Broadcom chip architects.[2]
| Name | Title | Background | Status |
|---|---|---|---|
| Simon Edwards | CEO[11] | Former CFO of Groq; previously CFO at Conga, ServiceMax (acquired by PTC 2023) | Current |
| Jonathan Ross | Founder & Former CEO[2] | TPU co-creator at Google; Google X Rapid Eval Team | Departed to Nvidia |
| Sunny Madra | Former President[4] | Enterprise scaling, operational leadership | Departed to Nvidia |
The Nvidia deal removed Groq's founder, president, and multiple senior executives simultaneously.[4] While Groq continues as an independent entity under Simon Edwards, the loss of the visionary founder and operational president creates significant execution risk. This is a common pattern in acqui-hire structures: the acquired company's innovation velocity often declines within 12-18 months.
| Round | Date | Amount | Valuation | Lead Investors |
|---|---|---|---|---|
| Seed / Series A[7] | Dec 2016 | $10.3M | -- | Social Capital (Chamath) |
| Series B[7] | Aug 2020 | ~$57M | -- | TDK Ventures |
| Series C[7] | Apr 2021 | $300M | -- | Tiger Global, D1 Capital |
| Series D[7] | Aug 2024 | $640M | ~$2.8B | BlackRock PE, Samsung, Cisco |
| Series E[5] | Sep 2025 | $750M | $6.9B | Disruptive, BlackRock, Neuberger Berman |
| Total | | ~$1.75B | | |
| Nvidia Deal[4] | Dec 2025 | ~$20B | 2.9x last round | Nvidia (cash, asset license) |
The Language Processing Unit (LPU) is a fundamentally different processor architecture designed exclusively for inference workloads.[1] Unlike GPUs, which are general-purpose parallel processors adapted for AI, the LPU is a deterministic, single-core, programmable assembly line that eliminates the reactive hardware components (branch predictors, arbiters, reordering buffers, caches) responsible for non-deterministic behavior in GPUs.[14]
Training AI models requires massive parallel computation across many GPUs. Inference is a fundamentally sequential, memory-bandwidth-bound problem: the model must generate tokens one at a time, and each token depends on all previous tokens. GPUs are over-provisioned for this task. The LPU was designed from scratch to solve the inference bottleneck by maximizing memory bandwidth and minimizing latency, not maximizing FLOPS.[14]
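To make the sequential bottleneck concrete, here is a minimal, hypothetical sketch (not Groq's or any vendor's actual serving loop): each decode step consumes the previous output, and a weight-streaming design must read roughly the full weight set per token, so memory bandwidth sets a hard floor on single-stream speed.

```python
# Hypothetical sketch of autoregressive decoding; `model` and its methods are
# illustrative placeholders, not a real API.

def autoregressive_generate(model, prompt_tokens, n_new_tokens):
    """Generate tokens one at a time; step t depends on every token before it."""
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):
        logits = model.forward(tokens)   # each step reads (roughly) all weights
        next_token = max(range(len(logits)), key=lambda i: logits[i])  # greedy pick
        tokens.append(next_token)        # the next step cannot start without this
    return tokens

# Rough single-stream floor from memory bandwidth alone (illustrative numbers):
weights_gb = 70          # ~70B parameters at 1 byte/parameter (INT8)
hbm_tb_per_s = 3.35      # H100-class HBM bandwidth, per the comparison below
floor_ms_per_token = weights_gb / (hbm_tb_per_s * 1000) * 1000
print(f"~{floor_ms_per_token:.0f} ms/token floor -> ~{1000 / floor_ms_per_token:.0f} tok/s max at batch size 1")
```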
The LPU uses data "conveyor belts" that move instructions and data between SIMD (Single Instruction, Multiple Data) functional units in a predetermined, compiler-scheduled pipeline.[14] Every instruction's execution time and every data arrival are known at compile time, which is what makes the LPU's latency deterministic rather than statistical.
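As a loose illustration of what compile-time scheduling buys (a toy model, not Groq's compiler or instruction set; the unit names are made up): when every operation is pinned to a cycle before execution, end-to-end latency is a number you can read off the schedule rather than a distribution you have to measure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScheduledOp:
    cycle: int   # exact issue cycle, fixed by the "compiler"
    unit: str    # functional unit label (illustrative names only)
    name: str

def compile_schedule(n_layers: int) -> list[ScheduledOp]:
    """Toy scheduler: every op gets a fixed cycle, so latency is known up front."""
    schedule, cycle = [], 0
    for i in range(n_layers):
        schedule.append(ScheduledOp(cycle, "matmul_unit", f"layer{i}.matmul")); cycle += 4
        schedule.append(ScheduledOp(cycle, "vector_unit", f"layer{i}.activation")); cycle += 1
    return schedule

sched = compile_schedule(n_layers=3)
print(f"total latency = {sched[-1].cycle + 1} cycles, known before anything runs")
```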
| Specification | LPU Gen 1 (TSP) | LPU v2 (Next-Gen) |
|---|---|---|
| Process Node | 14nm[14] | Samsung 4nm (SF4X)[15] |
| Die Size | 25 x 29 mm[14] | TBD (expected smaller) |
| On-Chip SRAM | 230 MB[1] | Expected increase |
| Memory Bandwidth | 80 TB/s on-die[1] | Expected increase |
| Compute | 750 TOPS INT8[1] | Expected 3-5x improvement |
| Clock Frequency | 900 MHz[14] | TBD |
| Compute Density | 1+ TeraOp/s per mm²[14] | Expected 3x+ improvement |
| Numerics | TruePoint (100-bit intermediate accum.)[14] | TruePoint enhanced |
| Manufacturing Partner | GlobalFoundries[14] | Samsung Foundry[15] |
For large models (70B+ parameters) that exceed a single LPU's SRAM capacity, Groq developed a plesiochronous protocol that cancels natural clock drift and aligns hundreds of LPUs to act as a single logical core.[14] The compiler predicts exactly when data arrives between chips, maintaining deterministic execution across the entire system. This is how Groq serves 70B and 120B parameter models at hundreds of tokens per second.
GPUs rely on HBM (High Bandwidth Memory) with bandwidth of ~3.35 TB/s (H100). Groq's LPU has 80 TB/s on-die SRAM bandwidth, roughly 24x the bandwidth of an H100. This is the fundamental source of Groq's speed advantage: the model weights are already on-chip, eliminating the memory transfer bottleneck that defines GPU inference latency.[1] The tradeoff: 230 MB SRAM per chip means you need many more chips to serve large models (a 70B model requires hundreds of LPUs vs. 8 H100s).
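A back-of-envelope check of that tradeoff, assuming INT8 weights (1 byte per parameter) and ignoring activations, KV cache, and replication overhead:

```python
import math

SRAM_PER_LPU_MB = 230   # Gen 1 figure from the spec table above

def lpus_to_hold_weights(params_billion: float, bytes_per_param: float = 1.0) -> int:
    """Minimum chips needed just to keep the weights resident in on-chip SRAM."""
    weights_mb = params_billion * 1e9 * bytes_per_param / 1e6
    return math.ceil(weights_mb / SRAM_PER_LPU_MB)

for size in (8, 70, 120):
    print(f"{size:>3}B model: >= {lpus_to_hold_weights(size)} LPUs for weights alone")
# ~305 LPUs for a 70B model -- consistent with "hundreds of LPUs" vs. a single
# 8-GPU H100 node holding the same weights in HBM.
```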
Groq's proprietary numerical format stores 100 bits of intermediate accumulation, providing sufficient range and precision for lossless computation regardless of input bit width.[14] This eliminates the accuracy loss that plagues lower-precision GPU inference (INT4, FP8) while maintaining speed. It is a meaningful engineering differentiator for applications where inference quality cannot be compromised.
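TruePoint itself is proprietary, but the underlying point about wide accumulators is easy to demonstrate. The sketch below (plain NumPy, nothing Groq-specific) shows a 16-bit accumulator silently losing most of a long sum, while a wide accumulator does not:

```python
import numpy as np

values = np.full(10_000, 0.01, dtype=np.float16)   # exact sum is ~100

narrow = np.float16(0.0)
for v in values:
    # 16-bit accumulation: once the running total is large enough, adding 0.01
    # falls below half the representable spacing and rounds away to nothing.
    narrow = np.float16(narrow + v)

wide = values.astype(np.float64).sum()             # wide accumulator, effectively exact

print(f"fp16 accumulator: {float(narrow):.2f}")    # stalls far short of 100
print(f"fp64 accumulator: {wide:.2f}")             # ~100, as expected
```

Groq's claim is that a 100-bit intermediate accumulator removes this class of error regardless of the input bit width.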
Groq's LPU consistently tops independent inference benchmarks, particularly those measuring latency and output throughput.[8] The performance gap versus GPU-based inference providers is substantial.
| Metric | Groq LPU | Nvidia H100 (GPU) | Advantage |
|---|---|---|---|
| Llama 3 70B Output Speed | 280-300 tok/s[8] | 10-30 tok/s | ~10x faster |
| Llama 3 8B Output Speed | 1,300+ tok/s[8] | ~100 tok/s | ~13x faster |
| Time to First Token | 0.22 seconds[8] | 0.5-2.0 seconds | 2-9x faster |
| Latency Variance | Near-zero (deterministic) | High (stochastic) | Predictable SLAs |
| Energy per Token | 1-3 joules[8] | 10-30 joules | ~10x efficient |
| 500 Words Generation | ~1 second[8] | ~10 seconds | ~10x faster |
| Provider | Chip | Llama 3 70B (tok/s) | TTFT (sec) | Category |
|---|---|---|---|---|
| Groq | LPU | 280-300 | 0.22 | Custom Silicon |
| Cerebras | WSE-3 | ~200-250 | ~0.3 | Custom Silicon |
| Fireworks AI | GPU | ~60-80 | ~0.5 | GPU Cloud |
| Together AI | GPU | ~50-70 | ~0.6 | GPU Cloud |
| AWS Bedrock | GPU | ~20-40 | ~1.0 | Hyperscaler |
| Azure OpenAI | GPU | ~15-30 | ~1.5 | Hyperscaler |
The platform's ultra-low-latency target translates to roughly 8,300+ tokens per second per stream, but it measures something different from Groq's headline numbers: Groq optimizes for output throughput and time to first token, while the platform's target is per-token latency. Both metrics matter to enterprise customers, so the platform should benchmark on both dimensions to position credibly against Groq.
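In the single-stream case the two metrics are reciprocals, so the conversion is mechanical; treating the platform's 8,300+ tokens/second figure as a per-token latency target is an assumption here.

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Single stream: steady-state per-token latency is the reciprocal of throughput."""
    return 1000.0 / tokens_per_second

print(f"8,300 tok/s       -> {per_token_latency_ms(8300) * 1000:.0f} microseconds per token")
print(f"Groq, 280-300/s   -> {per_token_latency_ms(300):.1f}-{per_token_latency_ms(280):.1f} ms per token")
```

Published output-throughput benchmarks also fold in batching and TTFT, so the two numbers are not directly comparable without stating the serving conditions.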
Groq claims its architecture is up to 10x more energy-efficient than conventional GPU-based inference deployments, consuming 1-3 joules per token versus 10-30 joules for GPUs.[8] For the platform, which owns energy infrastructure, this creates an interesting dynamic: even if Groq's chip is more efficient per token, the platform's structurally lower energy cost may offset the efficiency gap at the total-cost-of-ownership level.
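A rough sketch of the energy term in that TCO comparison, using the joules-per-token figures from the table above and assumed electricity prices (the prices are illustrative, not sourced):

```python
JOULES_PER_KWH = 3_600_000

def energy_cost_per_million_tokens(joules_per_token: float, usd_per_kwh: float) -> float:
    """Electricity cost alone; chip amortization and utilization are excluded."""
    return joules_per_token * 1_000_000 / JOULES_PER_KWH * usd_per_kwh

print(f"LPU,  2 J/token @ $0.10/kWh: ${energy_cost_per_million_tokens(2, 0.10):.3f} per 1M tokens")
print(f"GPU, 20 J/token @ $0.10/kWh: ${energy_cost_per_million_tokens(20, 0.10):.3f} per 1M tokens")
print(f"GPU, 20 J/token @ $0.03/kWh: ${energy_cost_per_million_tokens(20, 0.03):.3f} per 1M tokens")
```

At these assumed prices, cheap owned power narrows the gap but does not close it on energy alone; any full offset has to come from the rest of the cost stack (capital, utilization, margins), which is why the comparison belongs at the TCO level.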
Groq operates across two primary product lines: GroqCloud (cloud API inference) and GroqRack (on-premises/colocation hardware).[17] Both are built on the LPU hardware stack.
| Model | Input ($/1M) | Output ($/1M) | Context | Notes |
|---|---|---|---|---|
| Llama 3.1 8B Instant | $0.05 | $0.08 | 128K | Fastest, cheapest option |
| GPT-OSS 20B | $0.075 | $0.30 | 128K | OpenAI open model |
| Llama 4 Scout (17Bx16E) | $0.11 | $0.34 | 128K | MoE architecture |
| GPT-OSS 120B | $0.15 | $0.60 | 128K | Largest open model |
| Llama 4 Maverick (17Bx128E) | $0.20 | $0.60 | 128K | Large MoE |
| Qwen3 32B | $0.29 | $0.59 | 131K | Strong reasoning |
| Llama 3.3 70B Versatile | $0.59 | $0.79 | 128K | Primary benchmark model |
| Kimi K2 (1T MoE) | $1.00 | $3.00 | 256K | Largest model offered |
| Tier | Rate Limits | Features |
|---|---|---|
| Free | Limited requests/min | Playground access, basic API |
| Developer | Higher limits, pay-as-you-go | Full API, batch processing, prompt caching |
| Enterprise | Custom | SLAs, dedicated capacity, GroqRack options |
Groq's Llama 3.3 70B pricing of $0.59/M input and $0.79/M output is premium pricing justified by speed. For comparison, Crusoe charges $0.25/M input for the same model.[9] The platform's target of 30-50% below hyperscalers would land it at roughly competitive rates for 70B models, significantly below Groq. The positioning is clear: Groq owns speed; the platform should own cost-per-token for sovereign and enterprise workloads.
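A worked example of what that gap means at volume, using the listed prices; the monthly token mix is an assumed workload for illustration:

```python
def monthly_bill_usd(price_in_per_m: float, price_out_per_m: float,
                     input_m_tokens: float, output_m_tokens: float) -> float:
    """Token-based bill: price per million tokens times millions of tokens."""
    return price_in_per_m * input_m_tokens + price_out_per_m * output_m_tokens

# Assumed workload: 1,000M input + 500M output tokens per month on a 70B-class model.
groq = monthly_bill_usd(0.59, 0.79, 1_000, 500)
print(f"Groq (Llama 3.3 70B):            ${groq:,.0f}/month")
print(f"Crusoe, input side only ($0.25): ${0.25 * 1_000:,.0f}/month (output price not cited here)")
```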
| Customer / Partner | Type | Deal Details | Significance |
|---|---|---|---|
| HUMAIN (Saudi Arabia)[10] | Sovereign Government | $1.5B commitment. GroqRack deployment in Dammam. OpenAI models hosted on sovereign infrastructure. | Direct competitor to the platform for sovereign inference |
| IBM[13] | Enterprise Partnership | GroqCloud integrated into watsonx Orchestrate. IBM Granite models on GroqCloud. RedHat vLLM integration. | Enterprise channel access via IBM's sales force |
| Nvidia[4] | Technology Licensing | ~$20B deal. Non-exclusive license. Key personnel transfer. | Validates LPU technology at highest level |
| Equinix[12] | Data Center Partner | Helsinki DC, US colocation. Equinix Fabric integration for private connectivity. | Global reach without owning DCs |
| Bell Canada[12] | Regional Partner | Canadian data center capacity. | Regional expansion for data sovereignty |
| Samsung[15] | Manufacturing + Investor | LPU v2 manufacturing on 4nm. Also a Series D/E investor. | Strategic supply chain alignment |
Groq's $1.5B HUMAIN deal in Saudi Arabia[10] is the most directly relevant competitive move for the platform. HUMAIN's sovereign data centers running Groq hardware to host OpenAI models are exactly the type of deployment the platform is targeting.
| Year | Revenue | Notes |
|---|---|---|
| 2024 (Actual) | $90M[6] | First full year of GroqCloud revenue |
| 2025 (Original Target) | $2.0B[6] | Told to investors during Series E raise |
| 2025 (Revised Target) | $500M[6] | 75% cut. Data center capacity constraints, Saudi delays. |
| 2026 (Projected) | $1.2B[6] | Post-Nvidia deal; uncertain given leadership exodus |
| 2027 (Projected) | $1.9B[6] | Assumes full LPU v2 ramp and global expansion |
Groq's 75% revenue projection cut (from $2B to $500M), made only months after presenting the higher figure to investors, is a major red flag.[6] The company blamed "lack of data center capacity in regions where semiconductor input was scheduled." Translation: the Saudi/HUMAIN deployment was delayed, and Groq does not own its own data centers. This is a structural weakness: Groq depends on colocation partners (Equinix, DataBank) for capacity, creating supply bottlenecks it cannot directly control. The platform, which owns physical infrastructure, does not face this constraint.
On December 24, 2025, Nvidia announced the acquisition of Groq's assets for approximately $20 billion in cash, making it Nvidia's largest deal ever, surpassing the $7 billion Mellanox acquisition in 2019.[4]
| Component | Details |
|---|---|
| Value | ~$20 billion (cash)[4] |
| Premium | 2.9x over $6.9B Series E valuation (3 months prior)[4] |
| What Nvidia Gets | All of Groq's assets (IP, silicon design, compiler tech); non-exclusive license for inference technology; Jonathan Ross, Sunny Madra, and senior leadership[4] |
| What Nvidia Does NOT Get | GroqCloud business (continues independently)[4] |
| Groq Post-Deal | Continues as independent company under new CEO Simon Edwards[11] |
| License Type | Non-exclusive (Groq can continue licensing to others)[4] |
Analysts have noted this deal is structured to "keep the fiction of competition alive."[19] By acquiring assets and talent through a licensing agreement rather than a full corporate acquisition, Nvidia avoids triggering antitrust review while effectively absorbing Groq's innovation engine. The GroqCloud business continues independently, but without its founder, president, and key architects, its competitive trajectory is uncertain.
The real threat is not Groq itself; it is what Nvidia does with Groq's technology over the next 18-24 months. If Nvidia integrates LPU concepts (deterministic execution, massive on-chip SRAM, compiler-driven scheduling) into its next-generation inference chips, every GPU-based cloud provider, including the platform's potential infrastructure, could face a significant performance disadvantage. A multi-chip strategy that includes non-Nvidia accelerators becomes even more strategically important as a hedge against Nvidia dominance in inference.
The inference-as-a-service market is segmenting into three tiers: custom silicon providers (Groq, Cerebras, Etched), GPU cloud providers (CoreWeave, Crusoe, Lambda), and hyperscalers (AWS, Azure, GCP). Groq occupies a unique position as the speed leader, but at a pricing premium.
| Metric | Groq | Cerebras | Fireworks AI | Crusoe |
|---|---|---|---|---|
| Chip | LPU (custom)[1] | WSE-3 (wafer-scale) | Nvidia GPU | Nvidia/AMD GPU |
| Llama 70B Speed | 280-300 tok/s | 200-250 tok/s | 60-80 tok/s | GPU-class (similar to other GPU clouds) |
| Llama 70B Pricing (input) | $0.59/M[9] | $0.60/M | $0.20-0.40/M | $0.25/M |
| Valuation | $20B (Nvidia)[4] | ~$4B | ~$2.5B | $10B+ |
| Revenue (2025) | $500M[6] | ~$100M | ~$200M | ~$1B |
| Own DCs | No (colocation) | No | No | Yes |
| Sovereign Play | Yes (HUMAIN) | Limited | No | Limited |
| Enterprise Partnerships | IBM, Samsung[13] | Limited | Developer focus | OpenAI, Oracle |
| Nvidia Relationship | Acquired[4] | Independent | Customer | Investor + Partner |
| Dimension | Groq | The platform |
|---|---|---|
| Category | Custom Silicon | Inference-as-a-Service |
| Chip | LPU (own design)[1] | Multi-chip architecture |
| Latency | Best-in-class (deterministic) | Target: ultra-low latency |
| Pricing | Premium ($0.59/M for 70B)[9] | Target: 30-50% below hyperscalers |
| Sovereign | Yes (HUMAIN, GroqRack)[10] | Core strategy |
| Data Centers | Colocation only[12] | Owned infrastructure |
| Revenue | $500M (2025)[6] | In development |
| Risk / Advantage | Leadership exodus, Nvidia dependency | Energy ownership, multi-chip flexibility |
Unlike vertically integrated AI clouds (Crusoe, CoreWeave), Groq does not own data centers. It relies on colocation partnerships for global reach, which provides speed of deployment but limits control and created the capacity constraints that drove the 2025 revenue miss.[6]
| Region | Location | Partner | Status | Notes |
|---|---|---|---|---|
| North America | Multiple US sites | Equinix, DataBank[12] | Live | Primary capacity |
| North America | Canada | Bell Canada[12] | Live | Canadian data sovereignty |
| Europe | Helsinki, Finland | Equinix[12] | Live (Jul 2025) | Deployed in 4 weeks; green hydro power |
| Middle East | Dammam, Saudi Arabia | HUMAIN[10] | Ramping | $1.5B commitment; 200MW DCs under construction |
For enterprises and governments requiring full physical control, Groq offers GroqRack: pre-configured rack-scale systems containing 64 to 576+ LPUs per rack.[18]
| Feature | Details |
|---|---|
| Configuration | 64-576+ LPUs per rack[18] |
| Deployment | On-premises, colocation, or air-gapped[18] |
| Target Buyers | Hyperscalers, sovereign clouds, regulated industries (defense, healthcare, finance) |
| Data Residency | Full compliance with local data sovereignty requirements |
| Pricing | Enterprise / government contract pricing (not public) |
Groq's colocation-dependent model creates a structural bottleneck. The Helsinki DC was deployed in four weeks[12] (impressive speed), but the Saudi deployment delays cost Groq $1.5B in projected revenue.[6] As demand scales, Groq must either invest in capacity it controls or remain exposed to its colocation partners' build-out timelines. The platform's owned infrastructure is a significant competitive advantage in this context.
Groq has moved into the agentic AI space with Compound, its first compound AI system, now generally available on GroqCloud. Compound enables developers to build agents that conduct research, execute code, control browsers, and navigate the web.
Agentic AI workloads generate 10-100x more inference calls than simple chat interactions, because each "thought step" in an agent chain requires a separate inference call. Groq's speed advantage therefore compounds in agentic workflows: a 10-step agent chain that takes 10 seconds on GPUs completes in roughly 1 second on Groq. This is why Groq is investing heavily in this space, and why the platform should track agentic workload patterns as a primary use case (see the sketch below).
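A simple model of that compounding effect, using the TTFT and throughput figures from the benchmark tables above; the step count and tokens-per-step are assumptions, and absolute totals shift with both, but the roughly order-of-magnitude ratio holds:

```python
def chain_latency_s(steps: int, tokens_per_step: int, tok_per_s: float, ttft_s: float) -> float:
    """Wall-clock time for a strictly sequential agent chain of LLM calls."""
    return steps * (ttft_s + tokens_per_step / tok_per_s)

lpu = chain_latency_s(steps=10, tokens_per_step=100, tok_per_s=290, ttft_s=0.22)
gpu = chain_latency_s(steps=10, tokens_per_step=100, tok_per_s=25, ttft_s=1.0)
print(f"10-step chain, LPU-class serving: ~{lpu:.0f} s")
print(f"10-step chain, GPU-class serving: ~{gpu:.0f} s   (~{gpu / lpu:.0f}x slower)")
```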
| # | Decision | Impact |
|---|---|---|
| 1 | Purpose-built silicon, not general-purpose GPUs[1] | 10x speed advantage. Set the latency benchmark for the entire industry. |
| 2 | Developer-first go-to-market[17] | 2.8M+ developers; 75% of the Fortune 100. Free tier created viral adoption (Feb 2024 launch). |
| 3 | Sovereign infrastructure play[10] | $1.5B Saudi commitment proves sovereign AI inference is a massive market. |
| 4 | Enterprise partnerships (IBM)[13] | IBM's sales force provides enterprise distribution Groq could never build alone. |
| 5 | Speed as brand[8] | Groq is synonymous with "fast inference." The brand positioning is razor-sharp. |
| # | Vulnerability | Opportunity |
|---|---|---|
| 1 | No owned infrastructure (colocation only)[12] | The platform owns physical infrastructure; no supply bottleneck |
| 2 | Leadership exodus to Nvidia[4] | Groq's innovation velocity will likely decline. Window to catch up. |
| 3 | Revenue miss (75% cut)[6] | Indicates execution challenges. Customer trust may be shaken. |
| 4 | Premium pricing ($0.59/M for 70B)[9] | The platform targets 30-50% below hyperscalers. Clear cost advantage. |
| 5 | Single-chip architecture[1] | A multi-chip strategy provides workload optimization flexibility. |
| 6 | SRAM capacity limits (230MB/chip)[1] | Large models (175B+) require hundreds of LPUs. Capital-intensive scaling. |
| 7 | Nvidia dependency post-deal | The platform offers vendor-neutral alternative for Nvidia-wary enterprises. |
Groq's 280-300 tok/s for Llama 3 70B is the performance bar.[8] The platform must publish competitive benchmarks on its multi-chip architecture to be taken seriously by enterprise buyers.
Groq's HUMAIN deal proves sovereign inference is a $1B+ market.[10] The platform's owned infrastructure and modular containers are better suited for sovereign deployment than Groq's colocation model.
Groq charges $0.59/M input for 70B models.[9] The platform should target competitive rates with strong gross margins, leveraging energy cost advantages to undercut Groq while maintaining profitability.
Nvidia is expected to integrate LPU concepts into future chips.[4] The platform must plan for a world where Nvidia offers LPU-class inference performance within 18-24 months. Diversify chip partnerships now.
Groq's Compound platform[17] signals that agentic AI is the next major inference workload pattern. The platform should design for multi-step, high-frequency inference calls from day one.
Post-Nvidia deal, Groq is no longer a fully independent competitor. Enterprises wary of Nvidia lock-in need a vendor-neutral alternative, and a multi-chip strategy (H100/H200 plus alternative silicon) is that alternative.
| Dimension | Threat Level | Assessment |
|---|---|---|
| Technology | HIGH | LPU sets the latency benchmark. Nvidia integration will amplify reach. |
| Pricing | MEDIUM | Premium pricing. The platform can undercut on cost while maintaining margins. |
| Market Access | HIGH | 2.8M developers, IBM partnership, 75% Fortune 100 presence. |
| Sovereign / Enterprise | HIGH | HUMAIN deal is a direct precedent for The platform's target market. |
| Execution | MEDIUM | 75% revenue miss, leadership exodus create execution uncertainty. |
| Overall | HIGH | Groq + Nvidia is the most formidable inference competitor long-term. |
This report was compiled from 24 primary sources including Groq's corporate website, product documentation, press releases, independent benchmarks, financial data (Tracxn, Sacra, Crunchbase), analyst reports (SemiAnalysis, Seeking Alpha, FourWeekMBA), and news publications (CNBC, Bloomberg, TrendForce, The Information, Arab News). All performance benchmarks are as reported by Groq or independent testing organizations unless otherwise noted. Revenue projections sourced from investor communications and verified through multiple reporting outlets. Report accessed and compiled February 16, 2026.