Competitive Intelligence Report

Baseten: Inference Platform Strategy Analysis

How an NVIDIA-backed inference startup built a custom C++ engine, scaled to $5B valuation, and is expanding into training to become a full-stack competitor to the platform

February 16, 2026 | Analyst: MinjAI Agents | For: AI Infrastructure Strategy & Product Leaders
Threat Level: HIGH | 28 Footnoted Sources

Executive Summary

Baseten is a serverless AI inference platform that deploys and runs machine learning models in production.[1] Founded in 2019 by Tuhin Srivastava (CEO), Amir Haghighat (CTO), Philip Howes (Chief Scientist), and Pankaj Gupta,[2] the company has evolved from a no-code ML app builder into the leading independent inference-as-a-service provider. With NVIDIA's $150M strategic investment in January 2026[3] and a custom C++ inference engine replacing the standard Triton server,[4] Baseten directly targets the same enterprise inference workloads that the platform is pursuing.

| Metric | Value |
| --- | --- |
| Valuation (Jan 2026) | $5B[3] |
| Total Funding Raised | $585M[5] |
| YoY Revenue Growth | 10x+[6] |
| Employees (Jan 2026) | ~191[7] |
| Enterprise Customers | 100+[8] |
| Inference Volume Growth (2025) | 100x[6] |
| Cloud Provider Partners | 10+[9] |
| Uptime SLA | 99.99%[10] |

Strategic Implications

Baseten is a direct, high-threat competitor to the platform's inference-as-a-service ambitions. Key concerns: (1) NVIDIA's $150M strategic investment signals intent to make Baseten a preferred inference partner;[3] (2) their custom C++ engine and TensorRT-LLM integration achieve 2-3x throughput improvements over vLLM;[4] (3) expansion into training creates a full-stack competitor capturing the entire model lifecycle;[11] (4) multi-cloud capacity management across 10+ providers gives them geographic reach the platform lacks.[9]

Five Action Items

  1. Differentiate on sovereignty and compliance. Baseten runs on public cloud infrastructure.[9] The platform's air-gapped, physically isolated inference is a genuine moat for regulated industries.
  2. Invest in a custom inference engine. Baseten's C++ server replacement shows the standard Triton/vLLM stack is not competitive enough.[4] The platform needs proprietary optimization.
  3. Leverage multi-chip as a differentiator. Baseten is NVIDIA-only on GPUs.[12] A multi-chip strategy offers workload-optimal routing competitors cannot match.
  4. Target Baseten's pricing gaps. Their serverless pay-per-use model works for bursty startups but is expensive at scale.[13] The platform should compete on dedicated, predictable pricing for enterprise.
  5. Move fast on training. Baseten launched training in Q4 2025.[11] The platform should offer integrated fine-tuning-to-inference pipelines before Baseten matures this capability.

Company Overview and Evolution

Leadership Team

| Name | Title | Background |
| --- | --- | --- |
| Tuhin Srivastava | CEO, Co-Founder[2] | Former Data Scientist at Gumroad; ML fraud detection and content moderation[14] |
| Amir Haghighat | CTO, Co-Founder[2] | Led ML teams at Clover Health; population health management[14] |
| Philip Howes | Chief Scientist, Co-Founder[2] | Former Data Scientist at Gumroad; ML infrastructure[14] |
| Pankaj Gupta | Co-Founder[2] | Engineering leadership |

Timeline: From No-Code to Inference Leader

2019
Founded in San Francisco as a no-code platform for building ML-powered applications.[2] Early focus on giving data scientists full-stack engineering superpowers.
2021
Raised $2.5M seed round led by First Round Capital. Began building ML infrastructure components.[5]
2022-2023
Pivoted to inference infrastructure. Launched Truss open-source model serving framework.[15] Raised $13.5M Series A led by Sequoia.[5]
Mar 2024
Raised $40M Series B led by IVP and Spark Capital.[5] Launched Engine Builder for TensorRT-LLM optimization.[16]
Feb 2025
Raised $75M Series C co-led by IVP and Spark Capital.[5] Revenue growing 10x+ YoY.[6]
May 2025
Launched Model APIs and Baseten Training (closed beta). Expanded from inference-only to full model lifecycle.[11]
Sep 2025
Raised $150M Series D at $2.15B valuation, led by BOND. Became a unicorn.[17] CapitalG (Alphabet) and Premji Invest join as new investors.
Nov 2025
Baseten Training reaches GA. Multi-node B200 GPU training with automated checkpointing.[11]
Jan 2026
Raised $300M Series E at $5B valuation. NVIDIA invests $150M. IVP and CapitalG co-lead.[3] Third fundraise in 12 months.
Key Insight: Speed of Capital Formation

Baseten raised three rounds totaling $525M in under 12 months (Feb 2025 to Jan 2026), going from ~$1B valuation to $5B. This velocity signals extreme investor confidence in the inference market and Baseten's position within it. The platform's fundraising narrative should emphasize the differentiated sovereign/multi-chip angle that Baseten cannot replicate.


Funding History and Investor Analysis

Complete Funding Rounds

| Round | Date | Amount | Valuation | Lead Investors |
| --- | --- | --- | --- | --- |
| Seed[5] | 2021 | $2.5M | -- | First Round Capital |
| Series A[5] | 2023 | $13.5M | -- | Sequoia |
| Series A+ (Greylock)[14] | 2023 | $20M | -- | Greylock |
| Series B[5] | Mar 2024 | $40M | ~$400M (est.) | IVP, Spark Capital |
| Series C[5] | Feb 2025 | $75M | ~$1B (est.) | IVP, Spark Capital |
| Series D[17] | Sep 2025 | $150M | $2.15B | BOND |
| Series E[3] | Jan 2026 | $300M | $5B | IVP, CapitalG |
| Total | | ~$585M | | |

Strategic Investor Map

| Investor | Type | Strategic Significance |
| --- | --- | --- |
| NVIDIA[3] | Strategic ($150M) | Secures GPU supply + TensorRT-LLM integration. Part of NVIDIA's inference ecosystem strategy. |
| CapitalG (Alphabet)[3] | Strategic | Google Cloud partnership. Baseten on Google Cloud Marketplace.[18] |
| IVP[3] | Financial (multi-round) | Led or co-led Series B, C, and E. Deep conviction. |
| Spark Capital[5] | Financial (multi-round) | Series B and C lead. Early growth investor. |
| Greylock[14] | Financial | $20M follow-on. Enterprise SaaS expertise. |
| BOND[17] | Financial | Series D lead. Late-stage growth. |
| Conviction[5] | Financial | Sarah Guo. AI infrastructure thesis. |
Strategic Implication: NVIDIA's Strategic Intent

NVIDIA's $150M investment in Baseten is not just financial. It complements NVIDIA's $20B acquisition of Groq and signals a deliberate strategy to control the inference stack.[19] For the first time, inference surpassed training in NVIDIA's total data center revenue in late 2025. Baseten becomes NVIDIA's preferred software layer for enterprise inference deployment. A multi-chip strategy (alternative silicon) directly challenges this NVIDIA lock-in, which is both the platform's biggest risk and biggest opportunity.


Product Architecture and Technical Stack

Baseten has evolved from a model deployment tool into a full inference platform with four product tiers.[1] The architecture is designed around serverless GPU orchestration with multi-cloud capacity management.
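
As a conceptual illustration of what multi-cloud capacity management implies, consider the toy scheduler below: it places each request on the cheapest healthy provider with spare GPUs and fails over automatically when a provider goes down. This is a hedged sketch of the general pattern, not Baseten's MCM internals (which are not public); provider names, prices, and capacities are hypothetical.

```python
# Toy multi-cloud placement with failover -- a conceptual sketch only,
# NOT Baseten's MCM implementation (which is not public). Provider
# names, prices, and capacities are hypothetical.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    price_per_gpu_min: float  # $ per GPU-minute (hypothetical)
    free_gpus: int
    healthy: bool = True

def place(request_gpus: int, providers: list[Provider]) -> Provider:
    """Pick the cheapest healthy provider with capacity; raise if none."""
    candidates = [p for p in providers
                  if p.healthy and p.free_gpus >= request_gpus]
    if not candidates:
        raise RuntimeError("no capacity on any provider")
    best = min(candidates, key=lambda p: p.price_per_gpu_min)
    best.free_gpus -= request_gpus
    return best

providers = [
    Provider("cloud-a", 0.09, free_gpus=8),
    Provider("cloud-b", 0.07, free_gpus=2),
    Provider("cloud-c", 0.12, free_gpus=16),
]
print(place(4, providers).name)  # cloud-a: cloud-b is cheaper but full
providers[0].healthy = False     # simulate an outage in cloud-a
print(place(4, providers).name)  # cloud-c: automatic failover
```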

Layer 4: Model APIs (Serverless Inference)[11]
  • Pre-optimized open-source models[11]
  • Engine Builder (TensorRT-LLM auto-optimization)[16]
  • Speculative decoding[4]
  • Disaggregated serving[4]
  • Structured output + function calling[16]

Layer 3: Dedicated Deployments[13]
  • Custom model hosting (any framework via Truss)[15]
  • Autoscaling (scale-to-zero, configurable)[15]
  • GPU selection (A10G, A100, H100, B200)[13]
  • Multi-region deployment[9]
  • Self-hosted (in customer VPC)[9]

Layer 2: Platform Services[9]
  • Multi-cloud capacity management (MCM)[9]
  • Baseten Training (multi-node B200, GA)[11]
  • Observability & monitoring[10]
  • OpenAI-compatible APIs[16] (see the example below)
  • Chains (multi-model workflows)[10]

Layer 1: Infrastructure (Multi-Cloud)
  • Google Cloud (A4 VMs, Blackwell)[18]
  • Vultr (Cloud GPU + bare metal)[20]
  • AWS[21]
  • 10+ cloud providers[9]
  • NVIDIA GPUs: A10G, A100, H100, H200, B200[12]
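
To make Layer 2's OpenAI-compatible surface concrete, the sketch below calls a chat model through the standard `openai` Python client. The `base_url` and model slug are illustrative assumptions, not verified Baseten endpoints; only the client library usage is standard.

```python
# Sketch: calling an OpenAI-compatible endpoint with the standard openai
# client. The base_url and model slug are ASSUMPTIONS for illustration,
# not verified Baseten values -- check the provider's docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-v3",  # hypothetical model slug
    messages=[{"role": "user", "content": "Explain speculative decoding in one line."}],
)
print(response.choices[0].message.content)
```

Because the surface matches OpenAI's, moving a workload between compatible providers is often a one-line `base_url` change, which is part of the commoditization pressure behind the per-token price war covered under Pricing and Business Model.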
Warning: Full-Stack Expansion

Baseten's May 2025 launch of Training[11] transforms them from an inference-only provider into a full model lifecycle platform: train, fine-tune, optimize, deploy, serve. Their strategy: customers own model weights, Baseten captures the inference revenue. This is the same value-chain play the platform needs to execute.


Technical Deep Dive: Custom Engine and Performance

Custom C++ Inference Server

Baseten's core technical differentiator is replacing the standard NVIDIA Triton Inference Server with a custom C++ server built directly on TensorRT-LLM.[4] This gives them tighter control over streaming output, structured generation, and request scheduling. The custom gRPC-based server eliminates Triton overhead while maintaining compatibility with TensorRT-LLM's kernel optimizations.

Key Technical Components

  1. Engine Builder: Automatically compiles TensorRT-LLM engines optimized for specific model + GPU + sequence length + batch size combinations. Supports deploy-time quantization to FP8 (Hopper) and NVFP4 (Blackwell).[16]
  2. Speculative Decoding: Coordinates a draft model (e.g., Llama 8B) and a target model (e.g., Llama 70B) on a single server. Production-grade with streaming, structured output, and request cancellation.[4] (A toy sketch of the control flow follows this list.)
  3. Disaggregated Serving: Separates prefill and decode phases onto different GPUs, eliminating resource competition. Independent scaling of each phase reduces latency.[4]
  4. BEI (Baseten Embedding Inference): High-throughput engine for embedding, reranker, and classifier models. Optimized for batch processing.[22]
  5. Multi-Cloud Capacity Management (MCM): Distributes workloads across 10+ cloud providers. Automatic failover ensures 99.99% uptime.[9]
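
For intuition, the toy sketch below shows the draft-propose/target-verify control flow that speculative decoding relies on. It is a plain-Python illustration of the algorithm, not Baseten's C++ implementation; `draft_next_tokens` and `target_accepts` are hypothetical stand-ins for real model calls, and a production system verifies all draft tokens in a single target forward pass.

```python
# Toy illustration of speculative decoding's propose/verify control flow.
# NOT Baseten's implementation (a C++ server on TensorRT-LLM[4]):
# draft_next_tokens and target_accepts are hypothetical stand-ins, and a
# real system verifies all k draft tokens in ONE target forward pass.
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def draft_next_tokens(prefix: list[str], k: int) -> list[str]:
    """Small, cheap 'draft' model: proposes k candidate tokens."""
    return [random.choice(VOCAB) for _ in range(k)]

def target_accepts(prefix: list[str], token: str) -> bool:
    """Large 'target' model verification, faked here with a coin flip."""
    return random.random() < 0.7

def speculative_decode(prompt: list[str], max_new: int, k: int = 4) -> list[str]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        for tok in draft_next_tokens(out, k):    # cheap proposals
            if target_accepts(out, tok):
                out.append(tok)                   # accepted: near-free token
            else:
                out.append(random.choice(VOCAB))  # rejected: target's own sample
                break                             # discard remaining proposals
    return out[:len(prompt) + max_new]

print(" ".join(speculative_decode(["hello"], max_new=12)))
```

The payoff is that every accepted draft token costs a fraction of a target-model step, so throughput rises whenever the draft model agrees with the target often enough.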

Performance Benchmarks (Self-Reported)[4][12]

| Metric | Improvement | Benchmark Context |
| --- | --- | --- |
| Throughput (tokens/sec) | 2-3x vs. vLLM[4] | TensorRT-LLM Engine Builder |
| Time-to-First-Token | 30% faster vs. vLLM[4] | Engine Builder deployments |
| LLM Inference Speed | 33% faster[12] | TensorRT-LLM vs. default |
| SDXL Inference | 40% faster[12] | Image generation workloads |
| Cost-Performance (Blackwell) | 225% better[18] | Google Cloud A4 VMs, high-throughput |
| Cost-Per-Token (Blackwell) | Up to 10x reduction[12] | vs. Hopper platform |
| Writer (Customer) | 60% higher tokens/sec[12] | Palmyra LLMs on Baseten |

Truss: Open-Source Model Serving

Truss is Baseten's open-source standard for packaging models.[15] It creates containerized model servers without requiring Docker knowledge, supporting any framework (PyTorch, TensorFlow, TensorRT, Triton). With 6,000+ GitHub stars,[15] Truss serves as the developer acquisition funnel: open-source users convert to paid Baseten deployments.
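
For a sense of the developer experience, here is a minimal Truss model following the `load`/`predict` class contract from the public basetenlabs/truss repository;[15] the sentiment pipeline is an illustrative placeholder, and the accompanying `config.yaml` is omitted.

```python
# model/model.py -- a minimal Truss model following the load/predict
# class contract from the public basetenlabs/truss repo. The sentiment
# pipeline is an illustrative placeholder; any framework works here.
from transformers import pipeline

class Model:
    def __init__(self, **kwargs):
        # Keep __init__ light; Truss passes packaging metadata in kwargs.
        self._pipeline = None

    def load(self):
        # Runs once per replica at startup: load weights here so the
        # server (not predict) absorbs cold-start latency.
        self._pipeline = pipeline("sentiment-analysis")

    def predict(self, model_input):
        # Runs per request; model_input is the deserialized request body.
        return self._pipeline(model_input["text"])
```

The split between `load()` (once per replica) and `predict()` (per request) is what lets a serving platform manage cold starts and scale-to-zero without touching model code.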

Technical Assessment

Baseten's custom C++ engine is a meaningful technical moat. By replacing Triton and building directly on TensorRT-LLM, they achieve performance gains that cannot be replicated by competitors using off-the-shelf serving stacks. However, this architecture is NVIDIA-locked. Baseten has no equivalent optimization for AMD, Intel, or custom ASIC chips. A multi-chip architecture (H100/H200 + alternative silicon) creates a fundamentally different, and potentially more defensible, competitive position.


Customer Analysis

Baseten serves 100+ enterprise customers along with hundreds of smaller companies.[8] Their customer base is concentrated in AI-native companies building production applications, with a growing presence in regulated industries.

Key Customers[8][23]

| Customer | Segment | Use Case | Relationship |
| --- | --- | --- | --- |
| Cursor[23] | AI-Native (Code) | AI code editor inference | Production inference |
| Notion[23] | Enterprise SaaS | AI features in productivity suite | Production inference |
| Writer[8] | Enterprise AI | Custom 70B LLM inference; 100% of inference on Baseten | Deep partnership |
| Abridge[8] | Healthcare AI | Clinical documentation; 100% of inference on Baseten | Deep partnership |
| Clay[23] | Sales Tech | AI-powered sales intelligence | Production inference |
| Descript[8] | Media/Creative | Audio/video AI processing | Inference at scale |
| Superhuman[24] | Enterprise SaaS | 80% faster embedding inference | Custom model deployment |
| Sully AI[25] | Healthcare | Clinical AI; open-source model migration | Full inference stack |
| Patreon[8] | Creator Economy | Content moderation, recommendations | ML deployment |

Customer Concentration Pattern

Insight: Customer Profile Analysis
  • AI-native companies dominate. Cursor, Writer, Abridge, Descript, Clay are all AI-first businesses. These are high-volume, cost-sensitive buyers.
  • Enterprise SaaS is growing. Notion and Superhuman represent the next wave: traditional SaaS adding AI features. This is the platform's target market too.
  • Healthcare is a beachhead. Abridge and Sully AI show Baseten penetrating regulated industries, though still via public cloud, not sovereign infrastructure.
  • "100% of inference" relationships with Writer and Abridge[8] signal deep lock-in. These customers are very hard to displace.
Opportunity: Regulated Enterprise

Baseten's customer base skews toward AI startups and tech companies. They lack sovereign, air-gapped deployment capabilities. The platform should target defense, government, financial services, and healthcare enterprises that require physically isolated inference. Baseten's public cloud architecture cannot serve these customers without fundamental redesign.


Pricing and Business Model

Baseten operates two pricing models: pay-per-token for Model APIs (serverless) and pay-per-minute for dedicated GPU deployments.[13] This dual approach captures both bursty startup workloads and sustained enterprise demand.

Model APIs: Per-Token Pricing[13]

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Notes |
| --- | --- | --- | --- |
| DeepSeek V3 | Pay-per-use | Pay-per-use | Served on Blackwell A4 VMs[18] |
| DeepSeek R1 | Pay-per-use | Pay-per-use | Reasoning model[18] |
| Llama 4 Maverick | Pay-per-use | Pay-per-use | Latest Meta model[18] |
| Custom/Fine-tuned | GPU-minute billing | GPU-minute billing | Dedicated deployment[13] |

Dedicated Deployments: Per-Minute GPU Pricing[13]

| GPU Type | Pricing Model | Key Features |
| --- | --- | --- |
| NVIDIA A10G | Per-minute billing | Entry-level; image/audio models |
| NVIDIA A100 (40/80 GB) | Per-minute billing | Standard LLM serving |
| NVIDIA H100 | Per-minute billing | High-performance LLMs |
| NVIDIA B200 | Per-minute billing | Latest Blackwell; training + inference[11] |
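
A back-of-envelope model makes the "expensive at scale" dynamic concrete. All numbers below are hypothetical placeholders, not Baseten's published rates;[13] the point is the break-even structure, not the specific figures.

```python
# Back-of-envelope: when does a dedicated GPU beat serverless per-token
# pricing? ALL NUMBERS ARE HYPOTHETICAL PLACEHOLDERS, not Baseten's
# published rates[13]; substitute real prices to reproduce.

SERVERLESS_PER_1M_OUTPUT = 2.00   # $ per 1M output tokens (hypothetical)
DEDICATED_PER_GPU_MINUTE = 0.10   # $ per GPU-minute (hypothetical)
SUSTAINED_TOKENS_PER_SEC = 1_500  # tokens/s a tuned engine sustains (hypothetical)

def serverless_monthly(tokens: float) -> float:
    """Monthly bill on pay-per-token serverless."""
    return tokens / 1e6 * SERVERLESS_PER_1M_OUTPUT

def dedicated_monthly() -> float:
    """Monthly bill for one always-on dedicated GPU (no scale-to-zero)."""
    return DEDICATED_PER_GPU_MINUTE * 60 * 24 * 30

# Break-even volume: tokens/month where the two bills are equal.
break_even = dedicated_monthly() / SERVERLESS_PER_1M_OUTPUT * 1e6
# Tokens one dedicated GPU can actually serve in a month.
capacity = SUSTAINED_TOKENS_PER_SEC * 3600 * 24 * 30

print(f"Dedicated monthly cost: ${dedicated_monthly():,.0f}")        # $4,320
print(f"Break-even volume:      {break_even / 1e9:.2f}B tokens/mo")  # 2.16B
print(f"One-GPU capacity:       {capacity / 1e9:.2f}B tokens/mo")    # 3.89B
```

Under these placeholder rates, any workload sustaining more than ~2.2B output tokens per month (while fitting within one GPU's capacity) is cheaper on dedicated billing, which is why this report recommends the platform compete on dedicated, predictable pricing rather than per-token rates.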

Pricing Strategy Assessment

Baseten Strengths

  • Scale-to-zero reduces cost for bursty workloads[15]
  • No upfront commitments on serverless tier
  • Engine Builder optimizes cost-per-token automatically[16]
  • Multi-cloud arbitrage for best GPU prices[9]

Baseten Weaknesses

  • No data center ownership = no structural cost advantage
  • NVIDIA-only GPUs limit cost optimization[12]
  • Serverless markup vs. dedicated is significant at scale
  • No energy asset ownership = exposed to compute cost inflation
Warning: Pricing Compression Risk

Baseten, Together AI, Fireworks AI, and DeepInfra are in an active price war on per-token inference.[26] NVIDIA Blackwell is enabling up to 10x cost-per-token reductions.[12] The platform should not compete on per-token pricing for open-source models. Instead, differentiate on: (1) dedicated, predictable pricing for enterprise; (2) multi-chip cost optimization; (3) sovereign deployment premiums for regulated industries.


Competitive Positioning: Inference Platform Landscape

The AI inference market is projected to account for two-thirds of all AI compute by end of 2026, up from one-third in 2023.[6] The market exceeds $100B and is one of the largest and fastest-growing in tech history.[6]

Baseten vs. Inference Platform Peers

| Metric | Baseten | Together AI | Fireworks AI | Replicate |
| --- | --- | --- | --- | --- |
| Valuation | $5B[3] | ~$3.3B[26] | ~$2B (est.)[26] | ~$1B (est.) |
| Total Funding | $585M[5] | $400M+[26] | $250M+ | $100M+ |
| Employees | ~191[7] | ~200 | ~100 | ~80 |
| Key Differentiator | Custom C++ engine[4] | 200+ models[26] | Flash-Attention v2[26] | Easy prototyping |
| Custom Models | Yes (Truss)[15] | Yes | Yes | Limited |
| Training | Yes (GA)[11] | Yes | No | No |
| Multi-Cloud | 10+ providers[9] | Limited | Limited | Single |
| Self-Hosted | Yes (VPC)[9] | No | No | No |
| NVIDIA Investment | $150M[3] | No | No | No |
| Uptime SLA | 99.99%[10] | 99.9% | 99.9% | Best effort |
| Enterprise Focus | High | Medium | Medium | Low (prototyping) |

Baseten vs. the Platform: Head-to-Head

| Dimension | Baseten | The Platform |
| --- | --- | --- |
| Architecture | Serverless, multi-cloud[9] | Dedicated, sovereign-ready |
| Chip Strategy | NVIDIA only (TensorRT-LLM)[12] | Multi-chip architecture |
| Inference Engine | Custom C++[4] | In development |
| Training | GA (B200)[11] | Planned |
| Infrastructure | No owned DCs; uses public cloud[9] | Owned DCs + air-cooled containers |
| Sovereignty | No | Yes (air-gapped) |
| Revenue | 10x YoY growth[6] | Pre-revenue (inference) |
| Customers | 100+ enterprise[8] | Design partner stage |
The Platform's Potential Advantages
  • Multi-chip architecture enables workload-optimal routing across multiple chip architectures. Baseten's NVIDIA-only approach creates vendor lock-in risk for customers.
  • Sovereign/air-gapped deployment serves defense, government, and regulated enterprise that Baseten's public cloud model cannot reach.
  • Owned infrastructure provides structural cost advantage. Baseten pays cloud provider markups on every GPU minute.
  • Energy assets reduce the largest variable cost in inference (power = a significant portion of total cost).

Strategic Implications

What Baseten Got Right (Lessons)

| # | Decision | Impact |
| --- | --- | --- |
| 1 | Built a custom inference engine[4] | 2-3x throughput improvement over vLLM. Real technical moat vs. commoditized serving stacks. |
| 2 | Secured NVIDIA as a strategic investor[3] | $150M investment + preferential access to Blackwell GPUs + TensorRT-LLM co-development. |
| 3 | Developer-first GTM via Truss[15] | 6,000+ GitHub stars. Open-source funnel converts developers to paid customers. Zero enterprise sales friction. |
| 4 | Multi-cloud capacity management[9] | 10+ cloud providers = resilience + cost optimization + 99.99% uptime SLA. |
| 5 | Expanded from inference to training[11] | Full lifecycle capture. Customers who train on Baseten have zero friction deploying inference. |
| 6 | Raised aggressively in a hot market[3] | $525M in 12 months. Capital advantage over smaller competitors. Runway to sustain price wars. |

Baseten's Vulnerabilities (Opportunities for the platform)

| # | Vulnerability | Opportunity |
| --- | --- | --- |
| 1 | No owned infrastructure[9] | The platform's owned DCs and energy assets = 30-50% lower cost. Structural advantage at scale. |
| 2 | NVIDIA-only GPU dependency[12] | A multi-chip strategy hedges supply risk and offers workload-optimal routing. |
| 3 | No sovereign/air-gapped capability | Defense, government, and healthcare enterprises need physically isolated inference. Baseten cannot serve them. |
| 4 | ~191 employees = thin engineering bench[7] | Rapid expansion creates execution risk. The platform can target Baseten's underserved customer segments. |
| 5 | Pricing compression from competitors[26] | Baseten is in a price war with Together AI, Fireworks, and DeepInfra. The platform competes on value, not price. |
| 6 | Public cloud cost structure | Every GPU-minute includes cloud provider markup. The platform's owned infrastructure avoids this margin drag. |

Recommended Actions

1. Build a Custom Inference Engine

Baseten proved that replacing Triton with a custom C++ server delivers 2-3x gains.[4] The platform needs equivalent proprietary optimization, ideally across multiple chip architectures.

2. Lead with Sovereignty

Baseten cannot offer air-gapped, physically isolated inference. The platform should make sovereign deployment the primary differentiator for enterprise sales.

3. Offer Predictable Enterprise Pricing

Baseten's serverless model is expensive at sustained volume.[13] The platform should offer flat-rate, dedicated pricing that undercuts Baseten by 30-50% for enterprise workloads.

4. Target NVIDIA Lock-in as a Weakness

Position multi-chip architecture as enterprise risk mitigation. NVIDIA supply constraints and Baseten's single-vendor dependency are real customer concerns.[12]

5. Ship Training Before Q2 2026

Baseten's training launch[11] captures the full model lifecycle. The platform needs integrated fine-tuning-to-inference before Baseten matures this capability.

6. Pursue Design Partners in Regulated Industries

Baseten's healthcare customers (Abridge, Sully AI)[8] run on public cloud. The platform can win these verticals with HIPAA-ready, air-gapped inference.


Appendix

A. Product Comparison Matrix

| Capability | Baseten | Platform (Target) |
| --- | --- | --- |
| Serverless Inference | Live | Planned |
| Dedicated Inference | Live | In Dev |
| Custom Model Deployment | Truss Framework | Planned |
| Training (Fine-Tuning) | GA | Planned |
| Multi-Chip Support | NVIDIA Only | Multi-chip |
| Sovereign/Air-Gapped | No | Yes |
| Owned Data Centers | No | Yes |
| Energy Assets | No | Yes |
| OpenAI-Compatible API | Yes | Planned |
| Self-Hosted (Customer VPC) | Yes | Planned |
| Multi-Cloud | 10+ Providers | Owned Infra Only |
| Speculative Decoding | Production | Evaluating |
| Disaggregated Serving | Production | Evaluating |

B. Key Customers by Segment

| Segment | Customers | Inference Use Case |
| --- | --- | --- |
| AI-Native | Cursor, Writer, Descript, Clay[8][23] | Core product inference at scale |
| Enterprise SaaS | Notion, Superhuman, Patreon[23][24] | AI feature integration |
| Healthcare | Abridge, Sully AI[8][25] | Clinical documentation, patient AI |
| Developer Tools | Oxen AI[27] | Dataset-to-model pipeline |

C. Inference Market Context

| Metric | Value | Source |
| --- | --- | --- |
| Inference share of AI compute (2023) | ~33% | Industry estimates[6] |
| Inference share of AI compute (end 2026) | ~66% | Analyst projections[6] |
| Total inference market size | $100B+ | Industry estimates[6] |
| NVIDIA inference vs. training revenue | Inference surpassed training (late 2025) | Deloitte[19] |
| Blackwell cost-per-token improvement | Up to 10x vs. Hopper | NVIDIA[12] |

Sources & Footnotes

  1. [1] Baseten Homepage, "Inference Platform: Deploy AI models in production," product overview, baseten.co
  2. [2] Tracxn, "Baseten Technologies 2026 Company Profile," founders (Tuhin Srivastava, Philip Howes, Amir Haghighat, Pankaj Gupta), founding year 2019, tracxn.com
  3. [3] SiliconANGLE, "AI inference startup Baseten hits $5B valuation in $300M round backed by Nvidia," Jan 20, 2026, $300M Series E, IVP + CapitalG co-lead, NVIDIA $150M, siliconangle.com
  4. [4] Baseten Blog, "How we built production-ready speculative decoding with TensorRT-LLM," custom C++ server replacing Triton, speculative decoding, disaggregated serving, baseten.co/blog
  5. [5] Baseten Blog, "Announcing Baseten's $75M Series C," complete funding history: $2.5M seed (First Round), $13.5M Series A (Sequoia), $40M Series B (IVP, Spark), $75M Series C (IVP, Spark), total $585M, baseten.co/blog
  6. [6] Fortune, "Exclusive: Baseten, AI inference unicorn, raises $150 million at $2.15 billion valuation," Sep 2025, 10x+ revenue growth, 100x inference volume growth, inference market projections ($100B+, 2/3 of compute by 2026), fortune.com
  7. [7] Tracxn, Baseten employee count: 191 employees as of Jan 31, 2026, tracxn.com
  8. [8] CNBC, "AI startup Baseten raises $75 million following DeepSeek's emergence," 100+ enterprise customers, Writer, Descript, Abridge, Patreon customer mentions, "100% of inference" relationships, cnbc.com
  9. [9] Baseten Blog, "How Baseten multi-cloud capacity management (MCM) unifies deployments," 10+ cloud providers, self-hosted VPC option, multi-region, baseten.co/blog
  10. [10] ZenML LLMOps Database, "Baseten: Mission-Critical LLM Inference Platform Architecture," 99.99% uptime, observability, multi-model workflows (Chains), zenml.io
  11. [11] VentureBeat, "Baseten takes on hyperscalers with new AI training platform that lets you own your model weights," training launch, multi-node B200 training, automated checkpointing, sub-minute scheduling, venturebeat.com
  12. [12] NVIDIA Blog, "Leading Inference Providers Cut AI Costs by up to 10x With Open Source Models on NVIDIA Blackwell," Baseten on Blackwell, 10x cost-per-token reduction, TensorRT-LLM performance gains (33% faster LLM, 40% faster SDXL, 3x throughput), Writer 60% higher tok/s, blogs.nvidia.com
  13. [13] Baseten Pricing Page, per-token pricing for Model APIs, per-minute billing for dedicated deployments, GPU tiers, baseten.co/pricing
  14. [14] Greylock, "Self-Serve Apps for ML Teams," $20M funding, founder backgrounds (Tuhin + Phil at Gumroad, Amir at Clover Health), greylock.com
  15. [15] GitHub, "basetenlabs/truss: The simplest way to serve AI/ML models in production," 6,000+ stars, open-source model packaging, multi-framework support, github.com/basetenlabs/truss
  16. [16] Baseten Blog, "Introducing automatic LLM optimization with TensorRT-LLM Engine Builder," deploy-time quantization (FP8, NVFP4), OpenAI-compatible APIs, structured output, function calling, baseten.co/blog
  17. [17] BusinessWire, "Baseten Secures $150M Series D as the Premier Inference Platform for AI's App Layer," Sep 2025, $2.15B valuation, BOND lead, CapitalG + Premji new investors, businesswire.com
  18. [18] Google Cloud Blog, "How Baseten achieves 225% better cost-performance for AI inference," Google Cloud A4 VMs (Blackwell), 225% cost-performance improvement, DeepSeek V3/R1/Llama 4 Maverick served, cloud.google.com/blog
  19. [19] AInvest, "Nvidia's $150M Bet on Baseten: Securing the AI Inference Infrastructure Layer," NVIDIA ecosystem strategy, inference surpassed training in DC revenue (late 2025, Deloitte), complements $20B Groq acquisition, ainvest.com
  20. [20] Vultr Blog, "Deploy and Scale AI Inference Faster with Baseten on Vultr Cloud GPU," Vultr partnership, Cloud GPU + Bare Metal, multi-region deployment, blogs.vultr.com
  21. [21] AWS, "Baseten Delivers Fast, Scalable Generative AI Inference with AWS and NVIDIA," AWS partnership and success story, aws.amazon.com
  22. [22] Baseten Blog, "How we built BEI: high-throughput embedding, reranker, and classifier inference," custom embedding inference engine, baseten.co/blog
  23. [23] CapitalG, "Baseten: The Foundation for Production AI," Cursor, Notion, Abridge, Clay customer mentions, market positioning, capitalg.com
  24. [24] Baseten Customer Story, "Superhuman achieves 80% faster embedding model inference with Baseten," custom embedding deployment, baseten.co
  25. [25] Baseten Customer Story, "Sully AI returns 30M clinical minutes using open source," healthcare AI migration to Baseten, baseten.co
  26. [26] TechFundingNews, "Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference," competitive landscape, Together AI comparison, pricing dynamics, techfundingnews.com
  27. [27] Baseten Customer Story, "From datasets to deployed models: How Oxen AI builds on Baseten," developer tool customer, baseten.co
  28. [28] BusinessWire, "Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future," Jan 2026, full round details, multi-model future vision, businesswire.com

D. Methodology

This report was compiled from 28 primary sources including Baseten's corporate website, blog posts, customer stories, pricing page, official press releases (BusinessWire), investor announcements (CapitalG, Greylock), third-party research (Tracxn, Fortune, CNBC, SiliconANGLE, VentureBeat, TechFundingNews, AInvest), NVIDIA official blog and case study, Google Cloud blog, AWS partner success story, Vultr blog, ZenML analysis, and GitHub repository data. Revenue figures are estimated from investor disclosures and press reports. Employee count from Tracxn (Jan 31, 2026). Performance claims are self-reported by Baseten unless otherwise noted. Report accessed and compiled February 16, 2026.