Anthropic's token inflation quietly doubles Claude costs
Anthropic's unchanged rate card obscures a near-doubling of real per-task costs — a pattern that breaks enterprise cost models and content visibility assumptions alike.
Key takeaways
- Claude Sonnet 5 consumes ~40% more tokens per task than its predecessor, nearly doubling effective costs at unchanged list prices.
- Anthropic has done this before: Claude Sonnet 3.7 showed the same pattern, making this a pricing strategy, not a model quirk.
- Agentic workflows compound the problem — token overhead multiplies across orchestration loops, not just single prompts.
- The correct procurement metric is cost per completed task, not cost per thousand tokens.
- Token inflation also shifts how models read and weight sources, creating a secondary risk for brands optimised to earlier Claude versions.
Claude Sonnet 5 consumes roughly 40% more tokens per task than the model it replaces, The Decoder reports. The list price per token is unchanged. The actual cost per unit of work has nearly doubled. Anthropic has not announced a price increase because, technically, it has not raised prices.
This is a sleight of hand worth examining carefully, because it is becoming a deliberate pattern rather than an incidental side effect of a more capable model.
The mechanism of the hidden hike
When a model vendor raises per-token rates, the increase is visible, comparable, and easy to object to. When a new model simply uses more tokens to accomplish the same task, the economics shift invisibly. Procurement teams see the same rate card. Finance teams approve the same unit cost. The bill arrives at least 40% higher anyway.
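The arithmetic behind an unchanged rate card and a higher bill can be sketched in a few lines. This is a minimal illustration, not a billing model: the per-token price and the baseline token count are hypothetical placeholders, and only the 40% figure comes from the report cited above.

```python
def cost_per_task(tokens_per_task: float, price_per_million_tokens: float) -> float:
    """Cost of one task at a flat per-token rate."""
    return tokens_per_task * price_per_million_tokens / 1_000_000


# Hypothetical values chosen for illustration only; not a real rate card.
PRICE_PER_MILLION = 15.0      # same list price for both models
OLD_TOKENS_PER_TASK = 10_000  # assumed baseline consumption

# The newer model uses roughly 40% more tokens for the same task.
new_tokens_per_task = OLD_TOKENS_PER_TASK * 1.40

old_cost = cost_per_task(OLD_TOKENS_PER_TASK, PRICE_PER_MILLION)
new_cost = cost_per_task(new_tokens_per_task, PRICE_PER_MILLION)

# Relative increase in cost per task, even though the rate never changed.
increase = new_cost / old_cost - 1
print(f"rate unchanged, cost per task up {increase:.0%}")
```

Because the rate cancels out of the ratio, the increase in cost per task tracks the increase in tokens per task whatever the list price is, which is why a rate card comparison alone cannot reveal it.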
Claude Sonnet 5 sits fifth in the Artificial Analysis Intelligence Index with 53 points and outperforms the more expensive Opus 4.8 on several agent-based tasks. That is a genuine capability gain. The question is whether enterprise buyers are pricing in what that gain actually costs to deploy at scale.
The answer, for most teams running agentic workflows, is almost certainly no. Agentic tasks compound token consumption: a model that is verbose in its chain-of-thought reasoning, thorough in its tool calls, and expansive in its outputs will run through tokens at rates that dwarf a simple prompt-response exchange. A 40% per-task overhead in that context does not add just 40% to an API bill. Each step's output is typically fed back as input to every later step, and extra tool calls add steps to the loop, so the total can double or triple.
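A toy model makes the compounding concrete. The sketch below assumes an agent loop in which every step re-sends the base context plus all earlier outputs as input; the step counts, token counts, and input and output prices are all hypothetical, chosen only to show the shape of the effect.

```python
def loop_tokens(steps: int, output_per_step: int, base_context: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) billed over one agent run.

    Assumes each step re-reads the base context plus every earlier output.
    """
    input_tokens = 0
    output_tokens = 0
    context = base_context
    for _ in range(steps):
        input_tokens += context        # the whole context is billed as input
        output_tokens += output_per_step
        context += output_per_step     # this step's output joins the context
    return input_tokens, output_tokens


def run_cost(steps: int, output_per_step: int, base_context: int,
             price_in: float, price_out: float) -> float:
    """Cost of one run; prices are per million tokens."""
    inp, out = loop_tokens(steps, output_per_step, base_context)
    return (inp * price_in + out * price_out) / 1_000_000


# Hypothetical prices and workload, for illustration only.
PRICE_IN, PRICE_OUT = 3.0, 15.0
BASE_CONTEXT = 2_000

baseline = run_cost(10, 500, BASE_CONTEXT, PRICE_IN, PRICE_OUT)
# 40% more output per step, same number of steps.
verbose = run_cost(10, 700, BASE_CONTEXT, PRICE_IN, PRICE_OUT)
# 40% more output per step and 40% more steps (extra tool calls).
verbose_more_steps = run_cost(14, 700, BASE_CONTEXT, PRICE_IN, PRICE_OUT)

ratio_verbose = verbose / baseline
ratio_both = verbose_more_steps / baseline
print(f"longer outputs only: {ratio_verbose:.2f}x")
print(f"longer outputs and more steps: {ratio_both:.2f}x")
```

Under these assumptions, longer outputs alone raise the run cost by about 28%, while longer outputs combined with more steps push it to roughly 2.08 times the baseline. The second case is where a 40% per-task figure stops describing the bill, because the re-read context grows with both the length and the number of steps.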
Why this matters for large-scale deployments
For a financial services firm running compliance-checking agents, or a multilateral organisation using LLMs to process policy documentation at volume, the difference between the stated and the real cost of inference is a budget risk, not an abstraction. These organisations typically run multi-year technology procurement cycles and sign contracts based on vendor rate cards. A pattern of capability-linked token inflation renders forecasts built on those rate cards structurally unreliable.
The Decoder's framing is pointed: this is not the first time Anthropic has done this. Claude Sonnet 3.7 exhibited the same behaviour relative to its predecessor. One data point is a model characteristic; two in a row are a pricing strategy. Enterprises that built cost models on Sonnet 3.x benchmarks and upgraded expecting continuity absorbed an unannounced price increase.
Anthropic is not unique in this. OpenAI's reasoning models, particularly the o-series, carry explicit warnings about elevated token consumption. The difference is that o-series models are positioned and priced as a separate, premium tier. Anthropic is using the same model family and version numbering to obscure what is functionally a tier change.
What brands building on Claude need to recalibrate