Gemini 3.5 Flash costs 5.5x more to run than its predecessor
The cheapest tier of frontier models is no longer cheap, and retrieval systems will get more selective about which sources they reason over.
Key takeaways
- Gemini 3.5 Flash costs 5.5x more to run than its predecessor on benchmark tasks.
- On agent workloads, Gemini 3.5 Flash costs 75% more than Gemini 3.1 Pro because it needs more interaction steps.
- Google, Anthropic, and OpenAI are all raising newer-model prices; the cheap-LLM era is over.
- Long PDFs and unstructured sources will be deprioritised by cost-sensitive retrieval systems.
- Brands win citations when their content is cheap to reason over: dense, structured, summarised up top.
What happened
Per The Decoder, Google's Gemini 3.5 Flash costs 5.5 times more to run than its predecessor in benchmark testing. On agent tasks, its total cost exceeds even that of the pricier Gemini 3.1 Pro by 75 percent, because the model requires more interaction steps than any rival tested.
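To see how a budget tier can end up costing more per task than a premium one, here is a minimal sketch of the arithmetic. Every number in it (step counts, tokens per step, per-token prices) is hypothetical and chosen only to illustrate the mechanism; none of it comes from The Decoder's benchmark or from any published price list.

```python
def task_cost(steps: int, tokens_per_step: int, price_per_million_tokens: float) -> float:
    """Cost of one agent task: each step bills its tokens at the per-token rate."""
    return steps * tokens_per_step * price_per_million_tokens / 1_000_000


# Hypothetical figures, for illustration only (not real prices or step counts).
# The "budget" model is half the price per token but needs 3.5x as many steps.
budget_model = task_cost(steps=35, tokens_per_step=4_000, price_per_million_tokens=1.00)
premium_model = task_cost(steps=10, tokens_per_step=4_000, price_per_million_tokens=2.00)

ratio = budget_model / premium_model

print(f"budget tier:  ${budget_model:.2f} per task")
print(f"premium tier: ${premium_model:.2f} per task")
print(f"budget / premium = {ratio:.2f}")
```

With these made-up inputs the budget tier comes out 75 percent more expensive per task despite the lower per-token price. The point is that per-task cost scales with the number of steps, so a cheaper rate card does not guarantee a cheaper workload.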
Google is not the outlier. The Decoder notes that Anthropic and OpenAI have both made their newer models significantly more expensive. The era of each successive model being cheaper at the same capability tier is over. The new generation costs more, full stop.
The reason is structural. Frontier labs have spent hundreds of billions on training and infrastructure, and that capex has to be recouped. The cheapest tier of any major lab is no longer cheap.
Why it matters for your brand
If you are budgeting AI search visibility work or generative content pipelines on last year's token economics, your forecast is wrong. A 5.5x unit cost increase on the "Flash" tier (the supposed budget option) cascades through every downstream tool that touches your brand: agent-based research assistants, RAG pipelines that surface your content, summarisation layers in enterprise search, the AI Overview substrate. None of those costs sit still.
For financial services brands, this hardens a trend that compliance teams have been quietly tracking: the AI vendors your wealth management arm or institutional research desk integrates with will pass through these costs, either as per-seat price hikes or as throttled context windows. Expect AI features that were free in 2024 to move behind enterprise contracts in 2026. If your brand's content is being retrieved through a third-party copilot, retrieval frequency will be optimised against cost. Less popular sources get dropped first.
For multilateral institutions and UN-system bodies, the implication is sharper. Your reports, frameworks, and standards documents are long. They are token-heavy. When agents reason over a UNDRR risk framework or a World Bank policy paper, they consume far more tokens than a short news summary. As model costs rise, retrieval systems will increasingly favour shorter, denser sources over comprehensive PDFs. The 200-page report is the most expensive object in the index. If your authority strategy depends on long-form publications being read whole by an LLM, that assumption is breaking. Executive summaries and standalone explainer pages are now the load-bearing assets, not the appendices.
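A rough sketch of why the long report is the expensive object: input cost grows linearly with the tokens an agent has to read. The tokens-per-page figure and the input price below are assumptions picked for illustration, not measurements or real pricing, and the two-page executive summary is a hypothetical comparison point.

```python
# Assumptions for illustration only: real token counts depend on the tokenizer,
# the layout of the page, and the language of the text.
TOKENS_PER_PAGE = 500
PRICE_PER_MILLION_INPUT_TOKENS = 1.50  # hypothetical price in dollars


def read_cost(pages: int) -> float:
    """Input cost of having a model read a document of the given length once."""
    return pages * TOKENS_PER_PAGE * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


sources = {
    "full report (200 pages)": 200,
    "executive summary (2 pages)": 2,
}

for name, pages in sources.items():
    print(f"{name}: ${read_cost(pages):.4f} per read")

cost_ratio = read_cost(200) / read_cost(2)
print(f"full report / summary = {cost_ratio:.0f}x")
```

The absolute figures are small, but the ratio is what matters: under these assumptions each read of the full report costs about 100 times as much as a read of the summary, and that multiplier applies to every query and every re-index.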
For industrial groups, the agent-task finding is the one to watch. Gemini 3.5 Flash needs more interaction steps than rivals on agent workloads. Translation: when a procurement agent, a sustainability reporting agent, or a competitive intelligence agent crawls your investor relations site or your sustainability disclosures, the cost per query is climbing. Vendors building those agents will optimise. They will prefer well-structured, machine-readable sources that resolve in fewer hops. Brands with messy IR sites, PDF-only ESG disclosures, or fragmented product taxonomies will be deprioritised on cost grounds alone, regardless of how authoritative they are.
For philanthropic and policy institutions, the squeeze hits distribution. Foundations have been experimenting with AI-powered grantee discovery, policy briefing tools, and citation tracking. The unit economics of those experiments just changed. Expect a contraction in which sources get indexed and re-indexed. The institutions that maintain clean APIs, structured metadata, and short canonical summaries of long documents will retain visibility. Those that don't will quietly disappear from the answer layer.
The signal in context
The cheap-LLM era was the foundation of a particular kind of content strategy: flood the zone, let the models pick up what they will, optimise later. That strategy assumed retrieval costs would keep falling. They are no longer falling. Across Google, Anthropic, and OpenAI, the direction is the same: better models cost more, and the cost is showing up in agent workflows first because agents make multiple calls per query.
This reshapes the calculus of LLM visibility. When every retrieval has a measurable cost, retrieval systems become more selective. The brands that win citations in 2026 will be the ones whose content is cheapest to reason over: structured, dense, well-summarised at the top, with clear canonical URLs and minimal redundancy. Authority alone will not be enough if extracting your authority is expensive. The economic layer of AI search has caught up with the editorial one.