Inference.net (formerly Kuzco) is a San Francisco-based AI inference platform that offers full-stack LLM fine-tuning and serverless inference APIs.[1] Founded in 2022 by Sam Hogan and Ibrahim "Abe" Ahmed,[2] the company operates a dual business model: a centralized enterprise inference service (inference.net) and a decentralized GPU network built on Solana (devnet.inference.net).[3] In October 2025, the company raised $11.8M in seed funding led by Multicoin Capital and a16z CSX.[4]
Inference.net is not a direct competitor but a potential distribution partner. Its marketplace model for custom LLM inference aligns with the platform's low-cost positioning, and its claimed 90% cost reduction mirrors the platform's own value proposition. The a16z + Multicoin backing signals crypto-native credibility. The platform should explore a partnership in which Inference.net serves as a marketplace channel for the platform's sovereign-grade inference capacity, particularly for cost-sensitive enterprise workloads.
Inference.net began life as Kuzco, a Solana-based distributed GPU network for LLM inference launched in early 2024.[3] The project allowed anyone with a GPU to earn crypto rewards by serving inference requests for open-source models like Mistral and Llama 2. By mid-2024, the network had scaled to 8,500 active nodes with 18x growth in online GPUs since March 2024.[3]
The company then pivoted from a pure decentralized compute network to a full-stack enterprise inference platform, rebranding the enterprise-facing product as inference.net while maintaining the decentralized network under devnet.inference.net. This dual-track approach lets them serve both crypto-native communities and traditional enterprise customers.
| Name | Title | Background |
|---|---|---|
| Sam Hogan | Co-Founder & CEO[2] | Serial entrepreneur. Active in crypto/AI intersection. Based in San Francisco. Leads product vision and fundraising. |
| Ibrahim "Abe" Ahmed | Co-Founder & CTO[2] | Technical co-founder. Leads infrastructure, model training, and distributed systems engineering. |
| Amar Singh | Research Engineer[8] | Published researcher on hybrid-attention models, trustless inference verification (LOGIC protocol). |
| Michael Ryaboy | Engineer[8] | Technical contributor. Batch inference architecture and API design. |
Inference.net has a small but technically focused team (~20 employees estimated). The hiring of a Chief of Staff[5] signals organizational scaling post-seed round. The research team publishes regularly on model optimization, indicating genuine technical depth. However, the team lacks visible enterprise sales leadership, which may limit B2B traction.
| Round | Date | Amount | Lead Investors | Notable Participants |
|---|---|---|---|---|
| Seed[4] | Oct 2025 | $11.8M | Multicoin Capital, a16z CSX | Topology Ventures, Founders Inc., angel investors |
| Total | | $11.8M | | |
The $11.8M seed is modest compared to competitors like Together AI ($426M), Fireworks AI ($552M), and Groq ($940M+). However, the a16z CSX + Multicoin combination signals strong backing from both the AI and crypto investor communities. The company's capital-efficient approach (decentralized GPU supply reduces infra CAPEX) could allow them to punch above their weight class.
Inference.net operates two complementary business lines that share underlying infrastructure but target distinct customer segments.
The primary revenue driver. Inference.net works hand-in-hand with engineering teams to train, host, and optimize custom language models.[1] The value proposition: custom models trained on private data that match frontier quality at a fraction of the cost. Target customers are organizations spending over $50,000/month on closed-source AI providers.[4]
| Product | Description | Pricing Model |
|---|---|---|
| Custom Model Training[10] | Full-stack distillation: fine-tune a teacher model, distill into 7-27B student model. Supports text, image, video, audio modalities. | Custom engagement (sales-driven) |
| Serverless API[1] | Pay-per-token APIs for open-source models (Llama, DeepSeek, Mistral, Gemma). OpenAI-compatible endpoints (see the client sketch below this table). | Pay-as-you-go per M tokens |
| Batch Inference API[1] | High-volume processing at reduced rates. Scales to billions of requests. | Discounted per M tokens |
| Dedicated Inference[1] | Private tenancy with predictable throughput/latency. Custom SLAs. | Reserved capacity pricing |
| Data Extraction[1] | Structured data extraction from documents using Schematron models. | Per-document or per M tokens |
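Because the serverless API is OpenAI-compatible, existing client code should port with a base-URL swap, which lowers switching costs for the target customers above. A minimal Python sketch; the base URL and model id are illustrative assumptions, not confirmed values:

```python
from openai import OpenAI

# Assumed endpoint and model id for illustration; confirm against
# Inference.net's current API documentation before integrating.
client = OpenAI(
    base_url="https://api.inference.net/v1",  # assumption
    api_key="YOUR_INFERENCE_NET_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # assumed model id
    messages=[{"role": "user",
               "content": "Summarize: GPU worker rewards are paid per epoch."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```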
A Solana-based protocol that crowdsources idle GPU compute from a distributed network of contributors.[6] GPU providers ("workers") earn $INT token rewards and USDC revenue for completed inference tasks. The staking protocol coordinates incentives through an epoch-based reward system.
| Component | Mechanism |
|---|---|
| GPU Workers | Anyone with a GPU can contribute compute. 8,500+ active nodes as of mid-2024.[3] |
| $INT Token | Protocol reward token distributed to operators and delegators via epoch system.[6] |
| USDC Revenue | Operators earn USDC for completed inference tasks, shareable with delegators.[6] |
| Staking | Delegators stake tokens to operator pools. Rewards split by commission rate.[6] |
| Verification | LOGIC protocol for trustless inference via log-probability verification.[8] |
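To make the epoch mechanics in the table concrete, a toy sketch of one plausible reward split: the operator takes commission off the top and delegators share the remainder pro-rata by stake. The split rule, names, and numbers are assumptions for illustration; the protocol's actual accounting may differ.

```python
def split_epoch_rewards(epoch_reward: float, commission_rate: float,
                        delegator_stakes: dict[str, float]) -> dict[str, float]:
    """Toy epoch-reward split: operator commission off the top,
    remainder distributed pro-rata by delegated stake."""
    operator_cut = epoch_reward * commission_rate
    remainder = epoch_reward - operator_cut
    total_stake = sum(delegator_stakes.values())
    payouts = {"operator": operator_cut}
    for delegator, stake in delegator_stakes.items():
        payouts[delegator] = remainder * stake / total_stake
    return payouts

# Example epoch: 1,000 $INT reward, 10% commission, two delegators.
print(split_epoch_rewards(1_000.0, 0.10, {"alice": 600.0, "bob": 400.0}))
# {'operator': 100.0, 'alice': 540.0, 'bob': 360.0}
```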
The dual-track model is clever. The decentralized network provides cheap GPU supply (no data center CAPEX) and a crypto-native distribution channel (8,500+ node operators are also potential customers and evangelists). The enterprise service provides revenue and credibility. The bridge between them is the custom model pipeline: Inference.net trains models on the enterprise side and can deploy them across the decentralized network for cost-efficient inference. For the platform, this means a potential partnership could tap both channels.
Inference.net's technical stack spans custom model development, optimized inference serving, and distributed compute orchestration.
| Model | Parameters | Use Case | Key Claim |
|---|---|---|---|
| Schematron-3B[8] | 3B | HTML-to-JSON structured extraction at scale | Near-frontier quality at 10x lower cost |
| Schematron-8B[8] | 8B | Complex structured extraction, reasoning | Specialized extraction at significantly reduced cost |
| ClipTagger-12b[8] | 12B | Video understanding, captioning, Q&A | State-of-the-art video understanding at 15x lower cost |
Key result: A distilled Gemma 12B student matches a 27B teacher's performance on a single A100 GPU instead of eight H200s, achieving ~90% token accuracy at ~4x speed with 1/3 memory.[10]
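For context on what the distillation step involves, a generic sketch of logit distillation in PyTorch (this is the standard technique, not Inference.net's actual training code): the student is optimized against a blend of the teacher's temperature-softened token distribution and the hard next-token labels.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.5):
    """Blend soft-target KL (teacher -> student) with hard cross-entropy.

    student_logits, teacher_logits: (batch, seq, vocab)
    labels: (batch, seq) ground-truth token ids
    T: temperature that softens both distributions
    alpha: weight on the distillation term
    """
    # KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradients comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard next-token cross-entropy against the hard labels.
    hard = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
    )
    return alpha * soft + (1 - alpha) * hard
```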
Inference.net positions on aggressive pricing: up to 90% lower than "legacy providers" (their term for OpenAI, Anthropic, and hyperscaler endpoints). The pricing model is pure pay-as-you-go with no upfront commitments. New users receive $25 in free credits.[1]
| Model | Quantization | Input ($/M tokens) | Output ($/M tokens) | Notes |
|---|---|---|---|---|
| DeepSeek R1 | -- | $3.00 | $3.00 | Reasoning model |
| DeepSeek R1 Distill Llama 70B | -- | $0.40 | $0.40 | Cost-optimized |
| DeepSeek V3 | -- | $1.20 | $1.20 | General purpose |
| Llama 3.1 70B Instruct | -- | $0.40 | $0.40 | Popular choice |
| Llama 3.1 8B Instruct | -- | $0.03 | $0.03 | Floor pricing |
| Qwen 2.5 7B Vision Instruct | -- | $0.20 | $0.20 | Multi-modal |
| Mistral Nemo 12B Instruct | -- | $0.10 | $0.10 | Mid-range |
| Schematron-3B | BF16 | $0.02 | $0.05 | Proprietary |
| Schematron-8B | BF16 | $0.04 | $0.10 | Proprietary |
| ClipTagger-12b | FP8 | $0.30 | $0.50 | Vision model |
| Google Gemma 3 | BF16 | $0.15 | $0.30 | Multi-modal |
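Token pricing scales linearly with volume, so the table above converts directly into budget estimates. A quick sketch using the Llama 3.1 8B floor price from the table; the workload shape is invented for illustration:

```python
def monthly_cost_usd(requests: int, in_tokens: int, out_tokens: int,
                     in_price: float, out_price: float) -> float:
    """Total cost in dollars; prices are quoted in $ per million tokens."""
    return requests * (in_tokens * in_price + out_tokens * out_price) / 1e6

# Example: 10M requests/month on Llama 3.1 8B ($0.03/M in and out),
# averaging 800 input and 200 output tokens per request (assumed shape).
print(monthly_cost_usd(10_000_000, 800, 200, 0.03, 0.03))  # 300.0
```

Ten billion tokens a month for $300 illustrates how aggressive the floor pricing is, which feeds the margin analysis below.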
| Provider | Llama 3.1 70B Input ($/M tokens) | Model Count | Custom Models |
|---|---|---|---|
| Inference.net | $0.40 | ~15 | Yes (core product) |
| Together AI | $0.54 | 200+ | Yes (fine-tuning) |
| Fireworks AI | $0.70 | 100+ | Yes (fine-tuning) |
| Groq | $0.59 | 20+ | No |
| DeepInfra | $0.35 | 100+ | Limited |
| AWS Bedrock | $2.65 | 30+ | Yes (Bedrock Custom) |
At $0.03/M tokens for Llama 3.1 8B, Inference.net is pricing at or near cost for centralized GPU infrastructure. This level is only sustainable if (a) they run primarily on their decentralized GPU network (near-zero CAPEX) or (b) the serverless API is a loss leader to drive custom model engagement revenue, which carries much higher margins. The custom model service, which requires sales-driven engagements and specialized engineering, is likely the true profit center. The platform should note: the margin opportunity is in custom model hosting, not commodity open-source serving.
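A rough way to pressure-test the "at or near cost" judgment is to compute the GPU-rental break-even price per million tokens. Every input below is an illustrative assumption (rental rate, throughput), not a measured figure:

```python
def breakeven_price_per_m_tokens(gpu_cost_per_hr: float,
                                 tokens_per_sec: float) -> float:
    """Minimum $/M tokens needed to cover GPU rental alone
    (ignores networking, storage, and engineering overhead)."""
    tokens_per_hr = tokens_per_sec * 3600
    return gpu_cost_per_hr / tokens_per_hr * 1e6

# Assume an A100 rents for ~$1.50/hr and serves a Llama-8B-class model
# at ~3,000 tokens/s aggregate under heavy batching (both assumptions).
print(round(breakeven_price_per_m_tokens(1.50, 3_000), 3))  # ~0.139
```

Under these assumptions the break-even is roughly $0.14/M tokens, well above the $0.03/M list price, which is consistent with the loss-leader or decentralized-supply interpretation above.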
| Metric | Inference.net | Together AI | Context |
|---|---|---|---|
| Generation Time | 10.65s | 17.11s | Shared model benchmark |
| Time to First Token | 0.33s | 0.73s | Shared model benchmark |
| Custom Model Latency | 50ms | N/A | Classification tasks |
| Custom vs Frontier Cost | 90% lower | N/A | Self-reported claim |
Inference.net occupies a unique niche in the inference market: a custom model specialist with crypto-native infrastructure. They do not compete head-to-head with hyperscalers or large inference platforms on model breadth. Instead, they compete on custom model performance and cost for specific enterprise use cases.
| Category | Players | Inference.net Position |
|---|---|---|
| Hyperscalers | AWS, Azure, GCP | Positions against them as the "90% cheaper alternative" |
| GPU Cloud | Crusoe, Lambda, CoreWeave | Not competing; Inference.net doesn't sell raw GPU capacity |
| Inference Platforms | Together AI, Fireworks, Groq | Overlapping on serverless API, differentiated on custom models |
| Custom Model Shops | Scale AI, Predibase, Anyscale | Direct competition on fine-tuning + hosting |
| DePIN / Crypto Infra | io.net, Render, Akash | Overlapping on decentralized GPU, differentiated on enterprise service |
| Hardware-First | Cerebras, alternative silicon | No overlap; software + marketplace only |
The "90% cheaper" claim likely holds for specific use cases (custom distilled models vs. GPT-4 class frontier models) but is misleading as a general claim. A distilled 8B model will always be cheaper than a 405B model. The real question is whether their inference infrastructure delivers competitive cost-per-quality-adjusted-token at scale. The platform should benchmark this directly before engaging on partnership terms.
| Customer | Category | Likely Use Case |
|---|---|---|
| NVIDIA | Chip manufacturer | Likely inference testing / benchmarking, not a revenue customer |
| AWS | Cloud provider | Likely integration partner or marketplace listing |
| LAION | AI research nonprofit | Open-source model training data and research collaboration |
| Grass | DePIN / Web3 | Decentralized data network; crypto-native customer |
| Cal AI | Health/Fitness tech | Custom model; achieved 66% latency reduction[1] |
| Wynd Labs | Tech startup | Batch inference; achieved 95% cost savings[1] |
The customer list mixes genuine enterprise wins (Cal AI, Wynd Labs with specific metrics) with credibility-by-association logos (NVIDIA, AWS) that likely represent partnerships or integrations rather than paid inference customers. This is common for seed-stage companies. The crypto-native customers (Grass) are a natural fit given the Solana/DePIN heritage. The key question: are there $50K+/month enterprise contracts as their messaging suggests?
| Use Case | Description | Positioning |
|---|---|---|
| Code generation | Custom models trained on private codebases. Claims higher productivity than frontier models for domain-specific code tasks. | High Volume |
| Document summarization | Extracts summaries, entities, and citations from long documents. Low cost and stable latencies using specialized smaller models. | Core Strength |
| Structured data extraction | Extracts structured data from HTML/documents using Schematron models. Custom data schemas; fast processing (see the sketch below this table). | Core Strength |
| Classification | Higher accuracy than frontier models on domain-specific classification, with latencies as low as 50ms. | Differentiated |
| Search & retrieval | Custom embedding models and rerankers to improve recall in enterprise search applications. | Emerging |
| Video understanding | ClipTagger-12b for video captioning, Q&A, and summarization at 15x lower cost than frontier vision models. | Differentiated |
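To make the extraction row above concrete, a sketch of how a Schematron-class model might map raw HTML onto a caller-defined JSON schema. The endpoint, model id, and schema-passing convention are all assumptions for illustration; Schematron's actual interface may differ.

```python
import json
from openai import OpenAI

# Assumed OpenAI-compatible endpoint and model id (not confirmed values).
client = OpenAI(base_url="https://api.inference.net/v1",
                api_key="YOUR_INFERENCE_NET_KEY")

# Caller-defined schema for the fields to pull out of the HTML.
schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price_usd": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["product_name", "price_usd"],
}

html = "<div class='item'><h2>Widget Pro</h2><span>$19.99</span>In stock</div>"

# Assumed convention: schema in the system prompt, raw HTML as the user turn.
response = client.chat.completions.create(
    model="inference-net/schematron-3b",  # assumed model id
    messages=[
        {"role": "system",
         "content": "Extract JSON matching this schema:\n" + json.dumps(schema)},
        {"role": "user", "content": html},
    ],
)
print(json.loads(response.choices[0].message.content))
```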
Inference.net's most ambitious public project is OSSAS (Open Science Summaries at Scale), which aims to process 100 million research papers using custom-trained language models. This serves as both a public good initiative and a demonstration of their custom model training and batch inference capabilities at scale. The project showcases their end-to-end pipeline: data curation, model training, and high-volume inference deployment.
The global AI inference market is projected to grow from $106B in 2025 to $255B by 2030 (19.2% CAGR).[12] Inference workloads will account for roughly two-thirds of all AI compute by 2026, surpassing training for the first time.[13] This macro trend validates both the platform's and Inference.net's strategic direction.
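Sanity check on the projection: $106B × 1.192^5 ≈ $106B × 2.41 ≈ $255B, so the stated endpoints and CAGR are internally consistent.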
| Trend | Impact on Inference.net | Platform Impact |
|---|---|---|
| Price race to the floor: open-source models served at <$0.20/M tokens | Mixed: serverless API margins compress, but the custom model premium remains | Negative for commodity inference; positive for differentiated services |
| Custom model adoption: fine-tuned 7B beats generic 70B at 10x lower cost | Strong tailwind for their core business | Opportunity to host custom models on the platform's infrastructure |
| Inference > training spend: 55% of AI cloud spend on inference by 2026[13] | TAM expansion for all inference providers | Validates the platform's timing |
| Sovereign AI demand: governments mandating data residency | Decentralized network complicates compliance | The platform's sovereign-grade infrastructure is a differentiator |
| DePIN infrastructure: crypto-incentivized compute networks | Core to their supply strategy | Potential demand channel for the platform's GPU capacity |
| Company | Total Funding | Valuation | Key Differentiator |
|---|---|---|---|
| Groq | $940M+[14] | $6.9B | Custom LPU chip, ultra-low latency |
| Together AI | $426M | $3.3B | 200+ models, open-source ecosystem |
| Fireworks AI | $552M[15] | ~$3B | FireAttention engine, HIPAA/SOC2 |
| Cerebras | $720M+ | $4.5B | Wafer-Scale Engine, 1800+ tok/s |
| Crusoe | $3.9B[16] | $10B+ | Energy-first IaaS, MemoryAlloy |
| Inference.net | $11.8M[4] | Undisclosed | Custom LLM marketplace, DePIN GPU supply |
Inference.net is a seed-stage startup competing in a market where the top 5 players have raised a combined $6.5B+. Their capital-efficient approach (decentralized GPU supply) is creative but unproven at enterprise scale. The 90% cost reduction claim is achievable for custom distilled models vs. frontier APIs, but this is table stakes: Together AI, Fireworks, and even AWS Bedrock offer similar fine-tuning capabilities with far more resources. Inference.net's edge is the combination of custom model expertise + crypto-native distribution.
Inference.net presents a low-threat, moderate-opportunity profile for the platform. The company is not competing for the same enterprise customers or the same infrastructure layer. Instead, it represents a potential distribution channel for the platform's inference capacity.
Inference.net needs reliable, cost-effective GPU supply for both its enterprise service and its decentralized network, while the platform has excess GPU capacity with competitive energy economics. The engagement options below range from a simple supply agreement to deeper integration:
| Option | Description | Risk | Reward | Timeline |
|---|---|---|---|---|
| 1. GPU Supply Partner | Wholesale GPU capacity to Inference.net at negotiated rates | Low | Medium | 3-6 months |
| 2. Marketplace Integration | List the platform's infrastructure on Inference.net's marketplace as a premium tier | Low | Medium | 6-9 months |
| 3. Custom Model Co-Development | Joint offering: Inference.net trains models, the platform provides sovereign-grade hosting | Medium | High | 6-12 months |
| 4. Monitoring Only | Track their growth but take no action | Low | Low | Ongoing |
| 5. Acqui-hire / Invest | Small investment or talent acquisition for custom model capability | High | High | 12+ months |
Start with a GPU supply agreement (low risk, fast to execute) while exploring a joint custom model offering for enterprise customers who need both sovereign infrastructure and task-specific models. This positions the platform as the infrastructure backbone while Inference.net handles the model optimization layer. The crypto-native community is a bonus demand channel, not the primary value driver.
| Risk | Severity | Likelihood | Strategic Implication |
|---|---|---|---|
| Funding runway: $11.8M seed in a capital-intensive market | High | Medium | Partner may not survive 18+ months without a Series A |
| Token regulatory risk: $INT token and Solana staking may face SEC scrutiny | Medium | Medium | Could complicate partnership optics for the platform |
| Enterprise sales gap: no visible enterprise sales leadership | Medium | High | Limits the demand they can bring to a partnership |
| Decentralized network SLAs: consumer GPUs cannot match data-center-grade reliability | Medium | High | Enterprise workloads will need data-center-grade infrastructure |
| Competitive squeeze: Together AI and Fireworks expanding custom model services | Medium | High | Inference.net may get commoditized before scaling |
| Customer concentration: likely dependent on a few enterprise accounts | Medium | Medium | Revenue volatility risk in a partnership |
| Trigger | Signal | Platform Action |
|---|---|---|
| Series A announcement | $30M+ raise validates market traction | Accelerate partnership conversations |
| Token launch ($INT) | Mainnet staking goes live | Assess regulatory implications before deepening engagement |
| Enterprise customer win | Named F500 customer or $1M+ contract | Evaluate co-selling opportunity |
| Key hire (VP Sales/CRO) | Enterprise go-to-market scaling | Initiate partnership discussion |
| Acquisition by competitor | Together AI, Fireworks, or hyperscaler acquires | Reassess competitive landscape |
| Network growth >25K nodes | Decentralized GPU supply at meaningful scale | Explore supply integration |
Inference.net is an early-stage, technically capable team building at the intersection of enterprise AI and crypto infrastructure. They are too small to be a competitive threat to the platform but represent a genuinely interesting partnership channel. Their custom model expertise (Schematron, ClipTagger, distillation pipeline) could complement the platform's infrastructure play. The a16z + Multicoin backing gives them credibility in both the AI and crypto ecosystems. Recommended action: initiate a low-commitment conversation about GPU supply before their next funding round changes the economics.