Industry report·Model changes·7 June 2026·3 min read

Perplexity lets agents write their own search code

When the model writes the retrieval pipeline, ranking stops being a stable target and primary sources start to win.

85%

Reported maximum token cost reduction per query

Perplexity Search as Code, reported by The Decoder

Key takeaways

Perplexity's agents now write their own Python search routines instead of calling a fixed API.
Token costs reportedly fall by up to 85%, permitting deeper reads of fewer, better sources.
Programmatic deduplication penalises press-release saturation and rewards primary evidence.
Brands optimising for a stable retrieval layer are optimising for an abstraction that is dissolving.
Industrials and multilaterals with structured, machine-readable documentation gain; volume-led PR loses.

Perplexity has stopped pretending search is an API call. The Decoder reports that the company's new "Search as Code" architecture lets its agents write their own retrieval routines in Python inside a sandbox, handling filtering, deduplication and ranking themselves rather than asking a fixed endpoint to do it. The claimed reward: benchmark wins over OpenAI and Anthropic, and token costs down by as much as 85%.

This is a bigger shift than the cost line suggests. For two years the dominant pattern across LLM search products has been retrieval-augmented generation with a tidy division of labour: a search API returns a ranked list, the model summarises. Perplexity is collapsing that boundary. The model decides what to fetch, how to slice it, what to throw away, and what to read twice. Search becomes a program the agent writes on the fly, not a service it queries.
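To make the shift concrete, here is a minimal Python sketch of the kind of routine an agent might write for a single query. It is not Perplexity's implementation. The `raw_search` stub, the in-memory `CORPUS`, the URLs and the scoring rule are all hypothetical. What it shows is the shape of the change: filtering, deduplication and ranking become ordinary lines of code the agent chooses, not behaviour hidden behind an endpoint.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    url: str
    title: str
    published: date
    text: str


# Hypothetical in-memory stand-in for the raw, unranked search tool a sandbox might expose.
CORPUS = [
    Result("https://example.org/paper", "Working paper", date(2026, 5, 2),
           "credit spreads and policy rates with full data tables"),
    Result("https://example.org/pr", "Press release", date(2026, 5, 3),
           "new paper on credit spreads"),
    Result("https://example.org/pr?utm_source=feed", "Press release", date(2026, 5, 3),
           "new paper on credit spreads"),
    Result("https://example.org/archive", "Archive note", date(2019, 1, 1),
           "credit spreads policy rates commentary"),
]


def raw_search(query: str) -> list[Result]:
    """Return every document sharing at least one word with the query, unranked."""
    terms = set(query.lower().split())
    return [r for r in CORPUS if terms & set(r.text.lower().split())]


def agent_program(query: str, since: date, k: int = 2) -> list[Result]:
    """Hypothetical agent-written routine for one query: filter, dedupe, rank, truncate."""
    terms = set(query.lower().split())
    # 1. Filter: drop documents older than a cut-off the agent picked for this query.
    recent = [r for r in raw_search(query) if r.published >= since]
    # 2. Deduplicate: strip query strings so tracking-URL variants collapse to one.
    seen: set[str] = set()
    unique: list[Result] = []
    for r in recent:
        key = r.url.split("?")[0]
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Rank: more query terms present first, newest first on ties.
    def score(r: Result) -> tuple[int, date]:
        return (len(terms & set(r.text.lower().split())), r.published)

    ranked = sorted(unique, key=score, reverse=True)
    # 4. Keep only the top k for a full read.
    return ranked[:k]


for r in agent_program("credit spreads policy rates", since=date(2025, 1, 1)):
    print(r.title, r.url)
```

Run as written, it keeps the working paper and one copy of the press release; the tracking-URL duplicate and the 2019 note never reach the model.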

What changes when the agent owns the pipeline

Three things, in order of consequence for brands.

First, the unit of optimisation moves. Classical SEO optimises for a ranker. Generative engine optimisation, so far, has mostly optimised for whichever retrieval layer sits in front of the model: Bing for ChatGPT, Google's grounding for Gemini, Perplexity's own index. If the agent writes its own filter in Python, "ranking" is no longer a single function with stable behaviour. It is whatever code the model wrote for this query, this session, this user. Per-query retrieval logic is hard to reverse-engineer and harder to game.
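A toy illustration of why per-query logic is hard to target: two scoring functions an agent might plausibly write for two different questions about the same product. The document metadata and both functions are hypothetical; the point is only that identical documents can come out in opposite orders.

```python
from typing import Callable

# Hypothetical document metadata; the fields are illustrative only.
DOCS = [
    {"id": "spec-sheet", "year": 2026, "has_tables": True, "words": 900},
    {"id": "blog-post", "year": 2026, "has_tables": False, "words": 2500},
    {"id": "2021-report", "year": 2021, "has_tables": True, "words": 12000},
]

Ranker = Callable[[dict], float]


def rank_for_latest_specs(d: dict) -> float:
    """Hypothetical ranker for 'latest specs': recency dominates, tables break ties."""
    return d["year"] * 10 + (5 if d["has_tables"] else 0)


def rank_for_history(d: dict) -> float:
    """Hypothetical ranker for 'long-run history': depth dominates, recency barely counts."""
    return d["words"] / 1000 + (d["year"] - 2000) * 0.01


def order(ranker: Ranker) -> list[str]:
    """Document ids sorted best-first under the given ranker."""
    return [d["id"] for d in sorted(DOCS, key=ranker, reverse=True)]


print(order(rank_for_latest_specs))  # ['spec-sheet', 'blog-post', '2021-report']
print(order(rank_for_history))       # ['2021-report', 'blog-post', 'spec-sheet']
```

A page tuned to come first under one of these functions has no guarantee under the other, and the agent may write a third function for the next query.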

Second, deduplication moves into code the agent writes. Today a brand can win citations through repetition: push the same fact into many places and supply the cleanest, best-structured version of it. When the agent deduplicates programmatically, near-duplicates collapse earlier, and the surviving citation tends to be the most authoritative or the most distinctive, not the most numerous. Press-release saturation strategies degrade. Primary sources, original data and named analysts get a quiet lift.
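Near-duplicate collapse is a well-worn technique, and a minimal version fits in a few lines. This sketch uses word shingles and Jaccard similarity, a standard approach swapped in for illustration; Perplexity's own method is not described here. The `authority` field, the 0.5 threshold and the sample texts are assumptions. Note where the survivor is chosen: by authority within each cluster, not by cluster size, so three press releases and the paper they announce collapse to the paper alone.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Overlapping n-word sequences; near-identical texts share most of them."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two documents have in common (1.0 = identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def collapse(docs: list[dict], threshold: float = 0.5) -> list[dict]:
    """Greedy near-duplicate clustering, keeping one document per cluster."""
    clusters: list[list[dict]] = []
    for doc in docs:
        sig = shingles(doc["text"])
        for cluster in clusters:
            # Compare against the cluster's first member (a cheap, greedy choice).
            if jaccard(sig, shingles(cluster[0]["text"])) >= threshold:
                cluster.append(doc)
                break
        else:
            clusters.append([doc])
    # The survivor is the most authoritative member, however many copies existed.
    return [max(cluster, key=lambda d: d["authority"]) for cluster in clusters]


# Hypothetical inputs: three near-identical press releases, the paper they
# announce, and an unrelated analyst note. Authority scores are made up.
DOCS = [
    {"id": "release-a", "authority": 0.2,
     "text": "study finds metric x rose sharply across the sample today"},
    {"id": "release-b", "authority": 0.2,
     "text": "new study finds metric x rose sharply across the sample"},
    {"id": "release-c", "authority": 0.2,
     "text": "press office says study finds metric x rose sharply across the sample"},
    {"id": "working-paper", "authority": 0.9,
     "text": "study finds metric x rose sharply across the sample"},
    {"id": "analyst-note", "authority": 0.6,
     "text": "independent analyst note questions the sample period used"},
]

survivors = collapse(DOCS)
print(len(DOCS), "->", [d["id"] for d in survivors])
# 5 -> ['working-paper', 'analyst-note']
```

Five inputs become two citations. The three releases contributed nothing the paper did not already say, so their volume bought them nothing.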

Third, cost economics start to favour deeper reads. A token saving of up to 85% is not a rounding error; it is permission to do more expensive retrieval per query without blowing the unit economics. Expect agents to screen more candidate sources per query, discard more of them, and spend the saved tokens reading the survivors in full rather than matching shallow snippets. Documents that survive a programmatic skim will carry more weight, and thin content optimised for a snippet grab loses ground to documents that hold up under scrutiny.
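The arithmetic behind "permission to do more expensive retrieval" is simple, assuming token cost scales linearly and the reported saving applies in full (85% is the upper bound the report gives, not a typical figure). If a pipeline once spent $T$ tokens per query and now spends $(1 - r)\,T$, the original per-query budget covers $1/(1 - r)$ times the new pipeline's work:

```latex
\begin{aligned}
\text{tokens per query:}\quad & T \;\longrightarrow\; (1 - r)\,T, \qquad 0 \le r \le 0.85 \\
\text{headroom at the old budget:}\quad & \frac{T}{(1 - r)\,T} = \frac{1}{1 - r} = \frac{1}{0.15} \approx 6.7 \quad \text{at } r = 0.85
\end{aligned}
```

At the headline figure that is roughly six to seven times the retrieval per query for the same spend; at a more modest 50% saving it would be only twice.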

Who this hits

Financial services and multilaterals should read this carefully. Both sectors produce the kind of material that programmatic filters reward: dated, attributed, numerically specific, internally consistent. They also produce vast quantities of near-duplicate boilerplate (disclosures, communiques, repeated framing across regional sites) that an agent-written deduper will quietly discard. The asymmetry matters. A central bank's working paper or a UN agency's flagship report will tend to survive the cut; the fifteen press releases announcing it will not. Communications teams still measuring success by release volume are optimising for a layer that is being abstracted away.

Industrial groups face the reverse problem: the right content, hard to reach. Technical documentation, safety data, product specs and sustainability disclosures are exactly the kind of structured, verifiable content an agent will reach for. Most large industrials publish this material in PDFs buried three clicks deep, behind cookie walls, or fragmented across country sites. Search-as-code rewards whoever makes that material trivial to fetch and parse. The winners will be the firms that treat their documentation as a public API, not a compliance archive.
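What "documentation as a public API" might mean in practice: a single machine-readable index at a stable URL, which an agent-written routine can fetch and filter in one step instead of crawling PDFs behind cookie walls. The sketch assumes such an index exists; the product code `PX-200`, the field names and the example.com URLs are hypothetical and follow no particular standard.

```python
import json
from typing import Optional

# Hypothetical index an industrial group might publish at one stable URL.
# Field names are illustrative, not a published schema.
INDEX_JSON = """
[
  {"product": "PX-200", "type": "safety_data_sheet", "issued": "2026-03-01",
   "format": "text/html", "url": "https://example.com/docs/px-200/sds"},
  {"product": "PX-200", "type": "safety_data_sheet", "issued": "2024-07-15",
   "format": "application/pdf", "url": "https://example.com/docs/px-200/sds-2024.pdf"},
  {"product": "PX-200", "type": "product_spec", "issued": "2025-11-20",
   "format": "text/html", "url": "https://example.com/docs/px-200/spec"}
]
"""


def latest(index: list[dict], product: str, doc_type: str) -> Optional[dict]:
    """Return the most recently issued document of one type for one product, or None."""
    matches = [d for d in index if d["product"] == product and d["type"] == doc_type]
    # ISO 8601 date strings sort chronologically, so no date parsing is needed.
    return max(matches, key=lambda d: d["issued"], default=None)


index = json.loads(INDEX_JSON)
print(latest(index, "PX-200", "safety_data_sheet")["url"])
```

The 2024 PDF is passed over in favour of the current sheet, and a request for a document type that is not listed returns `None` rather than an error. One fetch, one filter: that is the bar search-as-code sets.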

The wider direction of travel

Perplexity is the smallest of the three names in this story, and its benchmark claims deserve the usual scepticism reserved for vendor numbers. The architectural bet, though, is the interesting part, and it will not stay proprietary. OpenAI's Deep Research and Anthropic's agentic tool use are already heading in the same direction: give the model tools, let it compose them, stop pretending a single search call is enough. Once one major lab demonstrates a cost win of this magnitude, expect the others to copy the pattern within two quarters.

The implication for anyone building visibility in LLM answers is that the stable optimisation target of 2024 to 2025, "rank well in the retrieval layer your target model uses", is dissolving. The new target is to be the document an agent keeps after it has written its own filter and run it. That favours specificity over volume, primary evidence over commentary, and machine-readable structure over prose that flatters a human editor. Brands that have spent the past year industrialising thin content for AI Overviews are about to find they optimised for the wrong abstraction.

Source: The Decoder

AI-authored, editor reviewed