The 100-Trillion-Token X-Ray: What OpenRouter Reveals About Real AI Usage
Why the loudest debates about AI miss what's actually happening in production
There are reports based on surveys, others on benchmarks. And then there’s the OpenRouter report: 100 trillion tokens of actual usage data showing what people really do with Large Language Models when nobody’s watching. What emerges fundamentally contradicts several popular narratives.
What is OpenRouter anyway?
OpenRouter is essentially a single API layer providing access to hundreds of different language models – from GPT to Claude to open-source models like DeepSeek or Qwen. Instead of building separate integrations for each model, OpenRouter routes requests to the respective provider. This makes the platform something like an air traffic control tower for LLM inference: it doesn’t own the planes, but it sees an enormous portion of the traffic.
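To make the “single API layer” concrete: OpenRouter exposes an OpenAI-compatible endpoint, so switching providers is a one-line change. A minimal sketch – the model slugs and key here are placeholders, not an excerpt from OpenRouter’s docs:

```python
# Call two very different models through one OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # one key, many providers
)

for model in ("anthropic/claude-sonnet-4", "deepseek/deepseek-chat"):
    resp = client.chat.completions.create(
        model=model,  # OpenRouter routes this to the right provider
        messages=[{"role": "user", "content": "Summarize this diff."}],
    )
    print(model, "->", resp.choices[0].message.content[:80])
```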
Specifically: over 300 active models from 60+ providers, millions of developers and end users, and more than 50% of usage outside the US. The report draws on metadata from over 100 trillion tokens – without access to the actual prompts or responses. Usage categories come from a ~0.25% sample of requests classified with Google’s Natural Language classifier. Geography is approximated via billing data, not IP addresses.
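The sampling step is simple to picture. A hedged reconstruction of the described pipeline – classify_text() is our stand-in for Google’s classifier, not OpenRouter’s actual code:

```python
# Classify a ~0.25% Bernoulli sample of requests; the other ~99.75%
# are never inspected at all.
import random

SAMPLE_RATE = 0.0025  # ~0.25% of requests

def classify_text(text: str) -> str:
    # Stand-in for Google's Natural Language classifier (assumption).
    return "Programming" if "def " in text or "class " in text else "Other"

def maybe_classify(request_text: str) -> str | None:
    if random.random() >= SAMPLE_RATE:
        return None  # not sampled
    return classify_text(request_text)
```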
This isn’t a perfect view of the world, but it’s one of the largest and most diverse samples of actual production usage ever analyzed.
The five findings that shift the picture
1) Open source isn’t losing – and China is the real engine
Open-weight models reached about one-third of total token usage by the end of 2025. That alone is remarkable, but the real story lies in the breakdown: Chinese open-source models went from practically zero at the end of 2024 to nearly 30% weekly share at times – averaging about 13% over the year. Models like Qwen, DeepSeek, and Kimi haven’t just caught up technically; they’re now defining the dynamics in the open-source segment.
What this means: The “open vs. closed” debate is no longer just a Silicon Valley internal discussion about philosophy. It’s industrial geopolitics materializing in token flows. Western proprietary providers aren’t just competing with Meta or Mistral anymore, but with an entire ecosystem of Chinese models that iterate extremely fast and are globally available.
2) Reasoning models became standard – without anyone noticing
OpenAI’s o1 (internally “Strawberry”) marked the transition from single-pass generation to multi-step deliberation in December 2024. The report shows that reasoning-optimized models went from a fringe phenomenon to over 50% of token share in 2025.
This isn’t marketing spin. It shows up measurably:
Average prompt length: ~4× growth (from ~1.5K to >6K tokens)
Completion length: nearly 3× growth (from ~150 to ~400 tokens)
Tool usage: steady increase, concentrated on models like Claude Sonnet and Gemini Flash
The shape of LLM usage has structurally changed. It’s no longer about “chat with a bot” but rather: “Load a pile of context, iterate over multiple steps, use tools, get precise outputs.” The typical request today is an analytical workflow, not creative generation.
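Here is what that loop looks like as a hedged sketch, using the same OpenAI-compatible interface as above; the run_tool dispatcher is supplied by the caller and purely illustrative:

```python
# The dominant usage shape: big context in, multi-step deliberation,
# tool calls in a loop, a precise answer at the end.
def agent_loop(client, model, context, tools, run_tool, max_steps=8):
    messages = [{"role": "user", "content": context}]  # often >6K tokens now
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:  # no tools requested: the model is done
            return msg.content
        messages.append(msg)  # keep the tool request in the history
        for call in msg.tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": run_tool(call),  # caller-supplied dispatcher
            })
    return None  # step budget exhausted
```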
3) The “killer app” is programming – by an enormous margin
Programming rose from ~11% of token usage in early 2025 to over 50% in recent weeks. This isn’t a gradual shift; it’s market consolidation around a single use case.
And it shows in model choice: Anthropic’s Claude dominates this segment with over 60% of programming-related spend for most of the observation period. OpenAI worked its way up from ~2% to ~8%, while Google remains stable at ~15%. What stands out: open-source providers like Qwen, Mistral, and the rapidly growing MiniMax are catching up.
Practical implication: The modern LLM economy is a throughput + context-window economy. If your model can’t handle long contexts cheaply and reliably, you’re irrelevant for the highest-volume category.
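A back-of-envelope calculation shows why the input side dominates the economics (prices are illustrative, not from the report):

```python
# With today's ~6K-token prompts and ~400-token completions, the
# input side dominates per-request cost.
IN_PRICE, OUT_PRICE = 3.00, 15.00   # illustrative $/1M tokens
in_cost = 6_000 / 1e6 * IN_PRICE    # $0.018 per request
out_cost = 400 / 1e6 * OUT_PRICE    # $0.006 per request
print(f"input share of cost: {in_cost / (in_cost + out_cost):.0%}")  # ~75%
```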
4) Roleplay isn’t a niche – it’s mass demand
This is the biggest conceptual surprise in the report: roleplay accounts for ~52% of open-source token usage. And this isn’t diffuse small talk – ~60% of that is explicitly “Roleplaying Games,” with substantial shares for “Writers Resources” and adult content.
The report’s interpretation is direct: open-source models have a structural advantage here because they’re less constrained by commercial moderation layers and easier to adapt for character-driven interactions.
What this means: Roleplay isn’t a niche. It’s one of the two primary demand sources shaping model training and fine-tuning incentives. Anyone building “serious AI” while ignoring this use case is overlooking a massive part of the real economy.
5) Price barely explains usage – the market is segmented, not elastic
The report plots cost against usage across all models and finds that the trendline is practically flat: a 10% price decrease yields only ~0.5-0.7% more usage at the market level.
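Read as a textbook price elasticity of demand, that figure is strikingly small (our arithmetic, not the report’s):

```latex
\varepsilon = \frac{\%\Delta Q}{\%\Delta P}
  \approx \frac{+0.5\% \text{ to } +0.7\%}{-10\%}
  \approx -0.05 \text{ to } -0.07
```

With |ε| far below 1, aggregate demand is highly inelastic.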
Instead, you see clear segmentation:
Premium Leaders (Claude Sonnet, GPT-5 Pro): expensive, still high usage → willingness to pay for quality
Efficient Giants (Gemini Flash, DeepSeek V3): cheap, massive volume → default workhorses
Premium Specialists (GPT-4, GPT-5 Pro at ~$35/1M tokens): very expensive, low usage → reserved for high-stakes tasks
The market doesn’t behave like a commodity. There are different buyers buying different things. Closed-source models retain pricing power for mission-critical workloads. Open-source models absorb volume from cost-sensitive users.
The “Cinderella Glass Slipper” hypothesis: Why retention explains everything
One of the analytically strongest concepts in the report is the retention analysis. The thesis: Most models experience high churn, but early cohorts of some models remain extremely sticky – when a model first “cracks” an important workload, users build their pipelines around it and don’t switch.
Example: Gemini 2.5 Pro (June 2025) and Claude 4 Sonnet (May 2025) retain ~40% of users at month five – significantly higher than later cohorts.
The metaphor: There’s a latent distribution of unsolved high-value workloads. Each new frontier model gets “tried on” against these problems. When a model first meets the technical and economic constraints of such a workload, “the shoe fits” – and users stay.
Practically, this means: first-to-solve matters more than first-mover. Whoever first solves a critical workload locks in users for the long term. Later models don’t just need to be equivalent; they need to be substantially better to get users to switch.
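The measurement behind this is ordinary cohort retention. A minimal pandas sketch, assuming an events table of (user_id, model, month) rows with months as integer offsets – the column names are our assumption, not the report’s:

```python
import pandas as pd

def month_n_retention(events: pd.DataFrame, model: str, n: int = 5) -> float:
    """Share of a model's earliest cohort still active n months later."""
    ev = events[events["model"] == model]
    first_month = ev.groupby("user_id")["month"].min()  # each user's cohort month
    cohort = first_month[first_month == first_month.min()].index
    active_later = ev.loc[ev["month"] == first_month.min() + n, "user_id"]
    return cohort.isin(active_later).mean()  # ~0.40 for the sticky cohorts
```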
What the report really shows (and what’s often overlooked)
The multi-model ecosystem is reality. Nobody uses just one model. Developers and enterprises build stacks that switch between multiple models depending on the task (a minimal routing sketch follows this list). This isn’t a transitional state – this is the new normal.
Programming and roleplay are the two volume heavyweights. Everything else is comparatively noise. Anyone building AI infrastructure needs to optimize for these two categories.
Geography is shifting eastward. Asia rose from ~13% to ~31% usage share. China isn’t just a model developer but also an exporter. The notion that LLMs are a Western phenomenon is empirically refuted.
Agentic inference is taking over. Typical LLM usage is no longer an isolated request. It’s a structured, agent-like loop: invoke tools, reason over state, persist across longer contexts. Models that can’t do this fall behind.
Retention, not growth, is the signal. In a market with rapid capability jumps, what matters isn’t who acquires the most users, but who retains foundational cohorts – user segments whose loyalty holds even when new models launch.
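The routing sketch mentioned above, hedged – the task-to-model mapping and slugs are illustrative, not a recommendation from the report:

```python
# Route each task class to a different model through one client.
TASK_MODELS = {
    "code":     "anthropic/claude-sonnet-4",  # premium leader for programming
    "bulk":     "google/gemini-2.5-flash",    # efficient giant for volume
    "roleplay": "qwen/qwen3-235b-a22b",       # open-weight, easier to adapt
}

def route(client, task: str, prompt: str):
    model = TASK_MODELS.get(task, "deepseek/deepseek-chat")  # cheap default
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
```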
Why this report matters (even if you hate AI discourse)
Because it replaces a bunch of lazy arguments with measurable reality:
The real war isn’t open vs. closed – it’s multi-model stacks and fast switching unless a model nails a workload.
The center of gravity isn’t “chat” – it’s long-context, tool-using, iterative workflows, dominated by programming.
The “creative” side isn’t a niche – it’s structurally important for demand (roleplay at scale).
The data shows that LLM usage isn’t uniform, exploratory behavior. It clusters tightly around a small set of repeatable, high-volume tasks. Roleplay, programming, and reasoning workflows each have clear structure and dominant patterns.
Source: OpenRouter State of AI Report


