Field note · Citation patterns · 30 June 2026 · 3 min read

Network traffic analysis exposes ChatGPT's real source logic

ChatGPT fetches sources selectively and favours third-party-validated facts. Brands publishing only on owned channels face a compounding visibility gap.

Key takeaways

ChatGPT triggers live retrieval only for certain queries: those with recency signals, proper nouns, or implied data staleness.
Pages cited by other credible sources outperform standalone brand content, regardless of on-page quality.
Machine-readable factual density matters more than narrative clarity for retrieval selection.
Brands without a strong entity footprint in training data are disadvantaged before retrieval even begins.
Multilaterals and specialist institutions that distribute research only via owned channels are most exposed.

Search Engine Journal's Suganthan Mohanadasan did something most generative engine optimisation (GEO) commentary skips: instead of inferring ChatGPT's source logic from what it says, he read the network traffic to see what it actually fetches. The gap between what the model says and what it fetches is where most brand strategies quietly fail.

The core finding is blunt. ChatGPT does not search for every query. It searches selectively, triggered by signals in the query itself: recency markers ("latest", "2025"), proper nouns, named entities, and questions that imply the model's training data is likely stale. For queries that lack those triggers, the model answers from its parameters alone, drawing on whatever was baked in during training. No live retrieval. No citation opportunity. A brand that has invested heavily in crawlable web content may still be invisible, not because its pages were rejected, but because the model never fetched anything at all.
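To make the triggering idea concrete, here is a toy rule-based classifier in Python. It is not OpenAI's logic, which the source observes only from the outside; the cue words, the assumed training cutoff year, and the capitalisation test for proper nouns are all illustrative assumptions. It simply shows how a query with no recency, date, or entity signal falls through to a parameters-only answer.

```python
import re

# Hypothetical cue words; the article cites "latest" as one example of a recency marker.
RECENCY_WORDS = {"latest", "current", "recent", "newest", "today", "now"}
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def retrieval_signals(query: str, training_cutoff_year: int = 2024) -> list[str]:
    """Return the toy signals suggesting the model's stored knowledge may be stale."""
    signals = []
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & RECENCY_WORDS:
        signals.append("recency_marker")
    # A year at or after the (assumed) training cutoff implies the answer may be stale.
    if any(int(y) >= training_cutoff_year for y in YEAR.findall(query)):
        signals.append("post_cutoff_year")
    # Crude proper-noun test: any capitalised token after the first word.
    if any(tok[:1].isupper() for tok in query.split()[1:]):
        signals.append("proper_noun")
    return signals


def should_retrieve(query: str) -> bool:
    """Toy decision rule: fetch live sources only if at least one signal fires."""
    return bool(retrieval_signals(query))


print(retrieval_signals("latest IMF outlook"))
print(should_retrieve("how do interest rates work"))
```

The capitalisation test is deliberately crude: it misses brand names typed in lowercase and would flag a stray "I". A production system would presumably rely on something more robust, but the consequence sketched here is the one the traffic shows: no signal, no fetch, no citation.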

What the traffic actually shows

When retrieval does fire, the network calls reveal a preference structure that generic GEO advice tends to obscure. ChatGPT favours sources with high third-party citation counts. This is not the same as domain authority in the Google sense; it is closer to the academic notion of being cited by others who are themselves cited. A page that states a fact is less valuable to the model than a page whose fact has been repeated, linked to, or referenced by credible external sources. For B2B brands in financial services or the multilateral system, this distinction is material. A policy paper published on your own domain, however well-structured, competes poorly with the same finding reported by Reuters, cited in an IMF working paper, and referenced in a Bloomberg analysis.
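The "cited by others who are themselves cited" idea maps onto PageRank-style recursive scoring. The sketch below is an analogy only: the source does not say ChatGPT computes anything like it, and the page names and link graph are hypothetical, loosely mirroring the Reuters, IMF, and Bloomberg example above. Each page passes its score on to the pages it cites, so a finding repeated across cited outlets accumulates weight that an owned PDF cited only by its publisher cannot.

```python
def citation_scores(links: dict[str, list[str]], damping: float = 0.85,
                    iterations: int = 50) -> dict[str, float]:
    """PageRank-style scores: a page is worth more when cited by pages that are themselves cited."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of citations.
        new_scores = {node: (1 - damping) / n for node in nodes}
        for source in nodes:
            targets = links.get(source, [])
            if targets:
                # A page passes its own score on, split across the pages it cites.
                share = damping * scores[source] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Pages that cite nothing spread their score evenly, so totals stay at 1.
                for target in nodes:
                    new_scores[target] += damping * scores[source] / n
        scores = new_scores
    return scores


# Hypothetical graph: each key lists the pages it cites.
links = {
    "brand_blog": ["brand_pdf"],
    "brand_pdf": [],
    "bloomberg_analysis": ["reuters_story", "imf_paper"],
    "imf_paper": ["reuters_story"],
    "reuters_story": [],
}
scores = citation_scores(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:20s} {score:.3f}")
```

Both the Reuters story and the owned PDF receive citations here, but the Reuters story ranks higher because one of its citers (the IMF paper) is itself cited, while the PDF's only citer is the brand's own blog, which nothing else references.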

The second structural preference is for machine-readable factual density. ChatGPT's retrieval logic, as visible in the traffic, gravitates toward pages where discrete facts are unambiguous and parseable: statistics with clear attribution, named entities with defined roles, and dates that anchor claims in time. Prose that contextualises without asserting gives the model less to work with. This is a more specific constraint than "write clearly." It means that content which performs well for human readers (nuanced, hedged, narrative) may perform poorly as retrieval fodder.
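One crude way to audit "factual density" in your own copy is the share of sentences that carry at least one number or an explicit attribution. The proxy below is a hypothetical heuristic, not a reconstruction of ChatGPT's selection logic; the regular expressions and both sample passages are invented for illustration.

```python
import re

# Hypothetical markers of a discrete, parseable fact.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")
ATTRIBUTION = re.compile(r"\b(?:according to|reported by|source:)", re.IGNORECASE)


def factual_density(text: str) -> float:
    """Share of sentences containing a number or an explicit attribution phrase."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    factual = sum(1 for s in sentences if NUMBER.search(s) or ATTRIBUTION.search(s))
    return factual / len(sentences)


# Invented sample passages, not real data.
dense = ("Output rose 4.2% in 2024, according to the annual survey. "
         "Emissions fell to 560 kg per tonne.")
narrative = ("Our approach reflects a long commitment to progress. "
             "We believe sustainability is a journey.")
print(factual_density(dense), factual_density(narrative))
```

A score like this says nothing about accuracy; it only flags passages where a reader, human or machine, would struggle to extract a single checkable claim.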

The entity graph underneath

Mohanadasan's analysis points to a third factor that sits beneath both of the above: entity recognition. ChatGPT treats named entities as retrieval anchors. A brand or institution that exists as a well-defined entity in the model's training data, associated with specific attributes, relationships, and facts, is more likely to appear in answers to queries that involve those attributes. This is why Wikipedia presence, structured data markup, and consistent naming conventions across the web matter disproportionately. For a multilateral institution like UNDRR or CGAP, whose names are not widely recognised, this represents a meaningful vulnerability. If the model does not have a crisp internal representation of what the organisation is and does, it will not surface it even when it is the most qualified source.
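For the structured data markup mentioned above, the common vehicle is schema.org vocabulary embedded in a page as JSON-LD. The sketch below builds a minimal Organization entity in Python; every name, URL, and description is a hypothetical placeholder, and the source does not establish how much weight ChatGPT gives such markup. The `sameAs` property is the part most relevant to entity clarity, since it links the organisation to its profiles on other sites.

```python
import json


def organization_jsonld(name: str, alternate_name: str, url: str,
                        description: str, same_as: list[str]) -> str:
    """Serialise a schema.org Organization entity as a JSON-LD string."""
    entity = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,                     # canonical name, used identically everywhere
        "alternateName": alternate_name,  # acronym or short form
        "url": url,
        "description": description,
        # sameAs ties this entity to its profiles elsewhere (in practice, e.g. Wikipedia or Wikidata).
        "sameAs": same_as,
    }
    return json.dumps(entity, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap the JSON-LD in the script element pages use to embed it."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# All values below are hypothetical placeholders.
markup = organization_jsonld(
    name="Example Research Institute",
    alternate_name="ERI",
    url="https://www.example.org",
    description="Independent institute publishing research on disaster risk finance.",
    same_as=["https://www.example.com/profiles/eri", "https://www.example.net/eri"],
)
print(as_script_tag(markup))
```

Whatever markup you publish, the canonical `name` should match how the organisation is named elsewhere on the web; inconsistent naming works against the crisp entity representation the analysis describes.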

The implication that follows is not comfortable for brands accustomed to content-volume strategies. Publishing more is not the mechanism. The mechanism is external validation: getting your facts, findings, and positions repeated by sources the model already treats as authoritative. For an industrial group like Holcim, that means trade press coverage of sustainability data, not just the sustainability report itself. For a financial institution, it means third-party analyst references to proprietary research, not just the research PDF on the website.

There is one further implication that the network traffic approach makes harder to ignore. Because ChatGPT's retrieval is query-triggered and entity-anchored, brands that do not appear in the training data as coherent entities, and whose material does not circulate in third-party sources that the model fetches, face a compounding disadvantage. Each answer generated without them reinforces their absence. The model's sense of who the authoritative voices are on any topic is not reset with each query; it is accumulated, and presence in training shapes retrieval weightings going forward.

The brands most at risk are those in specialised sectors (multilaterals, industrial groups, philanthropic institutions) that produce rigorous primary research but distribute it primarily through owned channels. Their content may be excellent. The model may never look.

Source: Search Engine Journal

AI-authored, editor reviewed