Taalas hardwires AI model weights directly into transistors.1 This eliminates memory bandwidth bottlenecks entirely. One HC1 chip stores and runs Llama 3.1 8B without external DRAM.2
On February 19, 2026, Taalas announced a $169M round, bringing total funding to $219M.3 Quiet Capital and Fidelity led; semiconductor legend Pierre Lamond participated.4
HC1 generates 17,000 tokens/sec on Llama 3.1 8B: 73x faster than NVIDIA's H200.5 At ~200W per card, it fits standard air-cooled racks.6
Taalas validates model-specific silicon at production scale. If customers converge on 2-3 dominant open-source models, Taalas could undercut MARA on cost and latency. But it cannot serve proprietary or fast-iterating models, preserving MARA's flexibility for sovereign deployments.
| Attribute | Detail |
|---|---|
| Company Name | Taalas Inc. |
| Founded | August 2023 |
| Headquarters | Toronto, Canada |
| Employees | ~25 engineers |
| R&D Spend to Date | ~$30M (to reach product launch)6 |
| Tagline | "The model is the computer" |
| Stealth Exit | March 2024 ($50M Series A)7 |
| Name | Role | Background |
|---|---|---|
| Ljubisa Bajic | Co-Founder, CEO | Founded Tenstorrent (2016). Former AMD architect and senior manager of hybrid CPU-GPU designs. One year at NVIDIA as senior architect.8 |
| Lejla Bajic | Co-Founder, COO | Former senior engineer at ATI (now AMD). Senior manager of systems engineering. Earlier career at Altera (FPGA).8 |
| Drago Ignjatovic | Co-Founder, CTO | Former director of ASIC design at AMD. VP of hardware engineering at Tenstorrent.8 |
All three founders built Tenstorrent together. They bring decades of GPU, CPU, and AI processor design across AMD, NVIDIA, ATI, and Altera. Arguably the deepest chip-design founding team among AI inference startups.
Bajic founded Tenstorrent in 2016 as a programmable AI chip company. After Jim Keller took over as CEO, Bajic left and started Taalas in August 2023 with the opposite thesis: sacrifice all programmability for maximum inference speed.9
| Round | Date | Amount | Lead Investors |
|---|---|---|---|
| Series A1 & A2 | March 2024 | $50M (across two rounds) | Pierre Lamond, Quiet Capital7 |
| Series B | February 19, 2026 | $169M | Quiet Capital, Fidelity3 |
| Total | | $219M | |
Pierre Lamond is a semiconductor industry legend. He co-founded National Semiconductor and later became a partner at Sequoia Capital. His backing provides deep credibility in chip design circles.10
Quiet Capital led both the Series A and Series B rounds. Their continued investment signals strong conviction in the model-specific silicon approach.4
Fidelity joined the Series B round. Their participation brings institutional validation and long-term capital perspective.3
Taalas reached product launch with ~$30M in R&D and 25 engineers. By comparison, Etched raised $620M and Cerebras raised $700M+ to reach comparable milestones. This efficiency stems from customizing only two metal layers per model rather than designing an entirely new architecture each time.
Taalas has not disclosed its valuation. For context, Etched was valued at $3.4B after its $500M raise.11 Groq was acquired by NVIDIA for $20B (announced December 2025).12 Taalas' $219M total raise suggests a valuation in the $800M-$1.5B range, an unconfirmed estimate based on comparable valuations.
Taalas' "Hard Coded Inference" (HC) approach stores model weights directly in transistors using mask ROM.2 A single transistor stores a weight and performs its associated multiply operation. This eliminates the compute-memory barrier that limits GPU performance.6
Traditional GPUs shuttle weights between HBM DRAM and compute units, creating a memory bandwidth bottleneck. Taalas removes it by co-locating weights with compute at the transistor level.
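To make the bottleneck concrete, here is a back-of-envelope sketch in Python (assuming FP16 weights and the H200's published ~4.8 TB/s HBM3e bandwidth; illustrative figures, not Taalas data):

```python
# Single-stream decode on a GPU is memory-bandwidth-bound: generating each
# token requires streaming every model weight from HBM to the compute units.
params = 8e9            # Llama 3.1 8B parameter count
bytes_per_param = 2     # assuming FP16/BF16 weights
hbm_bandwidth = 4.8e12  # H200 HBM3e, bytes/sec (published spec)

weight_bytes = params * bytes_per_param        # ~16 GB read per token
ceiling = hbm_bandwidth / weight_bytes         # upper bound on tokens/sec
print(f"Bandwidth-bound ceiling: ~{ceiling:.0f} tok/s")  # ~300 tok/s
```

The ~233 tok/s H200 benchmark cited below sits just under this ceiling, which is why co-locating weights with compute, rather than adding FLOPs, is what unlocks the 17K tok/s claim.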
| Specification | Detail |
|---|---|
| Process Node | TSMC N6 (6nm)2 |
| Die Size | 815 mm² (near reticle limit)2 |
| Transistor Count | 53 billion6 |
| Parameters per Chip | 8 billion (Llama 3.1 8B)5 |
| Power Consumption | ~200W per card6 |
| Weight Storage | Mask ROM recall fabric (4 bits per transistor)5 |
| Programmable Memory | SRAM for KV cache + fine-tuned weights6 |
| Form Factor | PCI-Express card |
| Server Config | 10 HC1 cards + dual-socket x86 = 2,500W total6 |
| Cooling | Standard air-cooled racks |
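As a consistency check on these specifications (assuming each weight is quantized to 4 bits, matching the recall-fabric density above):

```python
# Sanity check: at 4 bits per weight and 4 bits per transistor, one transistor
# stores one weight, so 8B parameters need roughly 8B transistors of mask ROM.
params = 8e9
bits_per_param = 4        # assumed 4-bit weight quantization
bits_per_transistor = 4   # mask ROM recall fabric, per the spec table

weight_transistors = params * bits_per_param / bits_per_transistor
total_transistors = 53e9
print(f"Weight storage: ~{weight_transistors / 1e9:.0f}B of "
      f"{total_transistors / 1e9:.0f}B transistors "
      f"({weight_transistors / total_transistors:.0%} of the die)")  # ~15%
```

On these assumptions, weight storage consumes only about 15% of the transistor budget, leaving the rest for compute, SRAM, and I/O.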
Taalas builds a nearly complete processor with approximately 100 metal layers.5 Only the final two metal layers are customized for each model. These layers encode the specific weight values as mask ROM patterns.
This approach yields three advantages:
- Most of the design is reused across models, keeping per-model engineering cost low (reflected in the ~$30M R&D figure above)
- Only the final metal layers change per model, enabling the roughly two-month model-to-silicon turnaround cited for custom chips
- Weights sit in dense mask ROM co-located with compute, eliminating HBM and its cost, power, and supply constraints
Think of the HC1 as a vinyl record. The "grooves" (weights) are physically etched into the medium. Playback is instant and power-efficient because there is no software layer translating instructions. The tradeoff: you cannot change the song.
The HC1 stores model weights in mask ROM. This is non-volatile, non-programmable storage baked into silicon during fabrication. A small SRAM block provides programmable storage for:
- The KV cache for in-flight inference requests
- Fine-tuned weights that augment the fixed base model
This eliminates all HBM from the design. HBM modules are expensive, power-hungry, and supply-constrained. Removing them cuts BOM cost significantly.5
Taalas uses TSMC N6 process. TSMC allocates capacity to high-volume customers (NVIDIA, Apple, AMD) first. A startup ordering custom ASICs faces allocation risk, longer lead times, and minimum order requirements. Whether Taalas can scale from prototype to volume production depends on TSMC capacity availability.
| Platform | Tok/s per User | vs HC1 | Architecture | Verified? |
|---|---|---|---|---|
| Taalas HC1 | 17,000⁵ | 1.0x (baseline) | Model-specific mask ROM | Self-reported |
| Cerebras CS-3 | ~2,000¹⁴ | 8.5x slower | Wafer-Scale Engine | Published benchmark |
| SambaNova SN40L | ~900¹⁴ | 19x slower | Dataflow / RDU | Published benchmark |
| Groq LPU | ~600¹⁴ | 28x slower | TSP / SRAM-based | Published benchmark |
| NVIDIA H200 | ~233⁵ | 73x slower | General-purpose GPU | Published benchmark |
Taalas' 17K tok/s figure is self-reported. Independent verification is pending. The metric is per-user single-stream throughput, not batched aggregate. GPU performance improves significantly with batch sizes >64, narrowing the gap in high-concurrency scenarios.
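A rough sketch of the batching effect (same assumptions as the bandwidth example above; batch sizes are illustrative):

```python
# Batching amortizes one HBM weight read across B concurrent requests, so
# aggregate throughput scales with B while per-user speed stays pinned at
# the single-stream ceiling.
hbm_bandwidth = 4.8e12  # bytes/sec (H200, published spec)
weight_bytes = 16e9     # 8B params at FP16 (assumed)

per_user = hbm_bandwidth / weight_bytes  # ~300 tok/s regardless of batch
for batch in (1, 8, 64, 256):
    aggregate = batch * per_user  # idealized upper bound
    print(f"batch={batch:>3}: ~{aggregate:>8,.0f} tok/s aggregate, "
          f"~{per_user:.0f} tok/s per user")
```

This is the nuance behind the caveat: batching closes the aggregate cost gap, but per-user latency, HC1's headline advantage, does not improve with batch size on a bandwidth-bound GPU.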
| Platform | Power Draw | Tok/s per Watt |
|---|---|---|
| Taalas HC1 | ~200W | ~85 |
| NVIDIA H200 | ~700W | ~0.3 |
| Groq LPU | ~300W | ~2.0 |
| Cerebras CS-3 | ~23,000W (full system) | ~0.09 |
Taalas claims 1/10th the power of GPU equivalents for the same workload.5 If validated, HC1 is ideal for edge and power-constrained sovereign installations.
Back-of-envelope estimates based on published throughput and power data. Assumes $0.10/kWh energy cost and 80% utilization.
| Platform | Throughput (tok/s) | Power (W) | Est. $/M Tokens | Basis |
|---|---|---|---|---|
| Taalas HC1 | 17,000 | ~200 | ~$0.0003 | Self-reported specs |
| NVIDIA H200 | ~233 | ~700 | ~$0.08 | Published benchmarks |
| Groq LPU | ~600 | ~300 | ~$0.01 | Published benchmarks |
| Cerebras CS-3 | ~2,100 | ~23,000 | ~$0.30 | Published benchmarks |
These are simplified estimates. Actual costs depend on GPU utilization, batch sizes, memory bandwidth, and infrastructure overhead. Taalas HC1 numbers are self-reported from single-stream tests. GPU performance improves significantly with batch sizes above 64.
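A minimal script reproducing these estimates from the stated assumptions (energy cost only at $0.10/kWh; the figures match the table before the 80% utilization factor is applied):

```python
# Energy-only cost per million tokens: watts / (tokens/sec) gives joules per
# token; convert a million tokens' worth to kWh and price it.
PRICE_PER_KWH = 0.10
JOULES_PER_KWH = 3.6e6

platforms = {  # name: (tokens/sec, watts), from the tables above
    "Taalas HC1":    (17_000,    200),
    "NVIDIA H200":   (   233,    700),
    "Groq LPU":      (   600,    300),
    "Cerebras CS-3": ( 2_100, 23_000),
}

for name, (tok_s, watts) in platforms.items():
    usd_per_m_tok = (watts / tok_s) * 1e6 / JOULES_PER_KWH * PRICE_PER_KWH
    print(f"{name:14} ~${usd_per_m_tok:.4f} per M tokens")
```

Applying the 80% utilization assumption scales every row up by 1.25x and leaves the ordering, roughly a 250x HC1 advantage over the H200 on energy alone, unchanged.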
HC1 achieves far lower latency than GPU systems without query batching.6 This matters most for real-time agents and voice assistants, where single-user latency outweighs aggregate throughput.
Taalas plans frontier-scale (100B+) support via pipeline parallelism: multiple HC cards over PCI-Express, each handling a portion of model layers.6 The company has demonstrated DeepSeek R1 671B using this multi-card approach.
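A schematic illustration of the layer-partitioning idea (card count and layer numbers are hypothetical; this is not Taalas' disclosed scheme):

```python
# Pipeline parallelism across fixed-function cards: each card's layer block is
# baked into its silicon, and only the (small) activations cross PCIe between
# cards; the (huge) weights never move.
def partition_layers(num_layers: int, num_cards: int) -> list[range]:
    """Assign each card a contiguous, near-equal block of layers."""
    per_card, extra = divmod(num_layers, num_cards)
    ranges, start = [], 0
    for card in range(num_cards):
        count = per_card + (1 if card < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges

# Hypothetical example: an 80-layer, 100B-class model across 10 HC cards.
for card, layers in enumerate(partition_layers(80, 10)):
    print(f"card {card}: layers {layers.start}-{layers.stop - 1}")
```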
Taalas is pursuing two parallel paths: (1) selling PCI-Express inference cards directly to customers, and (2) building its own inference infrastructure to offer API access on open-source models. The dual approach hedges between hardware and service revenue.16
| Segment | Use Case | Value Proposition |
|---|---|---|
| Hyperscale API Providers | Serving open-source models at scale | Lowest cost-per-token, highest throughput |
| Real-Time Applications | Voice agents, chatbots, coding assistants | Sub-millisecond latency without batching |
| Edge / Sovereign | On-premise inference in constrained environments | Air-cooled, 200W, standard rack deployment |
| Model Developers | Custom chip for proprietary model serving | 2-month turnaround, dedicated silicon |
The AI inference market is projected to represent two-thirds of all AI compute spending by 2026.12 NVIDIA's $20B Groq acquisition (announced December 2025) confirms the inference era is here. The total addressable market for inference-specific hardware exceeds $100B annually.
Taalas' best customers share three characteristics:
- They standardize on a stable, high-volume open-source model rather than iterating rapidly
- They are cost-per-token sensitive at scale
- They are latency- or power-constrained (real-time agents, edge, sovereign deployments)
Taalas has not publicly named any customers or design partners. The HC1 is described as "running inference today," but no production deployment has been confirmed by a third party. This is a notable gap for a company at this funding stage.
| Company | Approach | Funding | Flexibility | Status |
|---|---|---|---|---|
| Taalas | Model-specific mask ROM | $219M | None (one model per chip) | HC1 in production |
| Etched | Transformer-only ASIC (Sohu) | $620M11 | Any transformer model | Pre-production |
| Cerebras | Wafer-Scale Engine | $700M+17 | Any model | CS-3 shipping |
| Groq (NVIDIA) | SRAM-based LPU | Acquired $20B12 | Any model | Integrated into NVIDIA |
| NVIDIA GPUs | General-purpose GPU | Public ($3T+ mkt cap) | Universal | H200/B200 shipping |
| SambaNova | Dataflow / RDU | ~$1.6B18 | Any model | SN40L shipping |
AI inference chips exist on a spectrum from fully flexible to fully specialized:
- General-purpose GPUs (NVIDIA): run any model, at the cost of constant weight movement
- Dataflow and SRAM-based architectures (SambaNova, Groq, Cerebras): any model, with optimized data paths
- Architecture-specific ASICs (Etched's Sohu): any transformer, nothing else
- Model-specific silicon (Taalas): exactly one model per chip
Taalas occupies the extreme end: zero flexibility, maximum performance. The core bet is that popular open-source models will have long enough lifespans to justify model-specific silicon.
NVIDIA's $20B acquisition of Groq (announced December 2025) reshaped the competitive landscape.12 Key implications:
- NVIDIA itself now validates dedicated inference silicon as a category, consistent with the inference-era spending shift noted above
- Customers seeking a non-NVIDIA inference alternative have one fewer independent option
- The $20B price sets a valuation benchmark for the remaining independents, including Taalas
Model obsolescence is an existential risk. If Meta releases Llama 4 and customers migrate within months, every HC1 chip becomes a paperweight. Taalas' 2-month fab cycle is fast for silicon but slow compared to a GPU software update. Rapid model iteration favors flexible hardware like GPUs.
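The obsolescence risk can be framed as a breakeven problem (every number below is a hypothetical placeholder, not a Taalas figure):

```python
# A model-specific chip is worth fabricating only if its model stays deployed
# long enough for per-token savings to cover the chip's cost. All inputs here
# are hypothetical placeholders for illustration.
chip_cost_usd = 10_000        # hypothetical all-in cost per card
savings_per_m_tok = 0.08      # hypothetical $/M-token saving vs GPU serving
tok_per_sec = 17_000          # self-reported HC1 throughput

tok_per_month = tok_per_sec * 3600 * 24 * 30
monthly_savings = tok_per_month / 1e6 * savings_per_m_tok
print(f"~${monthly_savings:,.0f}/month saved; breakeven in "
      f"~{chip_cost_usd / monthly_savings:.1f} months at full utilization")
```

Under these placeholder inputs breakeven arrives within a few months, so the real exposure is less the raw chip cost than the two-month fab lead time plus deployment lag consumed each time the target model churns.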
| Date | Milestone | Significance |
|---|---|---|
| Aug 2023 | Founded by three ex-Tenstorrent leaders | Deep chip design pedigree from AMD/NVIDIA/Tenstorrent |
| Mar 2024 | $50M Series A; exited stealth | First public disclosure of model-specific silicon approach |
| Q3 2024 | HC1 tape-out at TSMC | Validated fabrication on TSMC N6 process |
| Q1 2025 | HC1 delivered to early customers | First model-specific inference chip in the field |
| Feb 2026 | $169M Series B; 17K tok/s benchmark | Self-reported performance milestone; institutional investor backing |
Taalas poses a significant but bounded threat to MARA's IaaS strategy. The risk peaks if customers standardize on a few dominant open-source models.
| Dimension | Threat Level | Analysis |
|---|---|---|
| Cost Competition | Critical | If HC1 delivers 73x throughput at 1/10th power, cost-per-token could be 10-50x lower than GPU-based inference. MARA cannot match this on GPUs. |
| Latency Competition | Critical | Sub-millisecond latency without batching. MARA's <120 µs/token target is competitive, but Taalas may achieve lower absolute latency on supported models. |
| Model Coverage | Low | HC1 runs exactly one model. MARA's multi-chip strategy supports arbitrary models, including proprietary and fine-tuned variants. This is MARA's strongest differentiator. |
| Sovereign Readiness | Medium | HC1's air-cooled, low-power design fits sovereign/edge deployments well. But MARA's modular container approach offers more flexibility for government requirements. |
| Time to Market | Medium | Two-month fab cycle is fast for silicon but slow vs. GPU software deploys. Customer requirements for new model versions create ongoing friction. |
Taalas represents the most radical approach in the inference chip landscape: zero flexibility, maximum performance. With $219M raised, a proven founding team from Tenstorrent/AMD/NVIDIA, and a self-reported 73x throughput advantage over the H200 (independent verification pending), this is not vaporware. But the one-model-per-chip limitation constrains its addressable market to stable, high-volume open-source model deployments. MARA's multi-chip, multi-model platform strategy remains the superior approach for enterprise customers needing flexibility, sovereignty, and proprietary model support.
| Unknown | Why It Matters | How to Monitor |
|---|---|---|
| Batch performance | 17K tok/s is single-stream. Enterprise inference runs batched. Batch performance is unknown. | Request benchmark data or wait for independent testing. |
| Production customers | No named customers running at scale. Threat is theoretical until proven. | Monitor customer announcements and case studies. |
| TSMC allocation | Fab capacity determines whether Taalas can scale beyond prototype. | Track manufacturing partnership announcements. |
| 20B parameter chip timeline | Summer 2026 target for larger models. Delay would narrow the competitive window. | Monitor product announcements and partner disclosures. |
| Model obsolescence rate | If top open-source models change every 6 months, ROI on model-specific chips is poor. | Track Llama, Mistral, DeepSeek release cadence vs HC1 fabrication cycle. |