LLMs cite third-party sources for brands 6 times more than owned ones
Owned web estates account for just 14.3% of LLM brand citations. The other 85.7% belongs to third parties your content team does not control.
Key takeaways
- LLMs cite third-party sources for brand information at a 6-to-1 ratio over owned domains.
- 80% of all brand citations come from roughly 18% of domains, following a Zipf distribution.
- Wikipedia is the top-cited domain in 11 of 12 home markets studied.
- Citation source mix varies by language and market, making remediation a local problem, not a global one.
- Optimising owned content addresses the minority channel; the majority is earned and concentrated in a handful of high-authority sites.
The conventional assumption in B2B marketing is that a company's website is its authoritative voice. In AI search, that assumption is wrong by a factor of six.
A study published on arXiv, drawing on Rankfor.AI data covering 128 brands, 12 home markets, and 13 languages, analysed 167,551 URL-grounded citations to establish where large language models actually source brand information. The finding is blunt: 85.7% of citations point to domains the brand does not own. Owned properties account for just 14.3%.
That ratio inverts the logic of most corporate content strategy.
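The headline ratio follows directly from the two percentages. The short Python sketch below redoes that arithmetic. The absolute counts it prints are approximations back-calculated from the rounded shares and the 167,551 total, not figures reported by the study, and the variable names are illustrative.

```python
# Figures quoted in the article: total URL-grounded citations and the two shares.
total_citations = 167_551
owned_share = 0.143
third_party_share = 0.857

# Ratio of third-party to owned citations.
ratio = third_party_share / owned_share

# Approximate absolute counts, back-calculated from the rounded percentages
# (an assumption for illustration; the study's exact counts are not given here).
owned_count = round(total_citations * owned_share)
third_party_count = total_citations - owned_count

print(f"third-party : owned = {ratio:.2f} : 1")
print(f"approx. owned citations: {owned_count:,}")
print(f"approx. third-party citations: {third_party_count:,}")
```

On these figures the ratio comes out just under six to one, which is where the "factor of six" comes from.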
The structure of the citation pool
The source base is concentrated and long-tailed. Roughly 80% of citations come from approximately 18% of domains, a distribution that fits a Zipf law with an alpha of 0.86 and an R-squared of 0.983. At the head of that distribution sits Wikipedia, which is the most-cited domain in 11 of the 12 home markets studied. One reference site, editorially independent of every brand in the dataset, is doing more to shape what LLMs say about those brands than those brands' own digital estates.
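For readers who want to run the same kind of check on their own citation data, the sketch below shows one common way to estimate a Zipf exponent and a concentration figure. It assumes an ordinary least-squares fit on log rank against log citation count, which may not be the method the study used. The function names `fit_zipf` and `head_share` are hypothetical, and the input is synthetic, idealised data rather than the study's dataset.

```python
import math


def fit_zipf(counts):
    """Least-squares fit of log(count) = c - alpha * log(rank).

    Returns (alpha, r_squared). Assumes every count is positive.
    """
    ranked = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


def head_share(counts, target=0.80):
    """Fraction of domains, taken from the top, needed to reach `target` of all citations."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    running = 0.0
    for position, count in enumerate(ranked, start=1):
        running += count
        if running >= target * total:
            return position / len(ranked)
    return 1.0


# Synthetic, idealised citation counts per domain. NOT the study's data:
# an exact power law with the exponent the article reports (0.86).
synthetic = [1000.0 / rank ** 0.86 for rank in range(1, 5001)]

alpha, r2 = fit_zipf(synthetic)
print(f"alpha = {alpha:.2f}, R^2 = {r2:.3f}")
print(f"share of domains holding 80% of citations: {head_share(synthetic):.1%}")
```

Because the synthetic input is an exact power law, the fit recovers the exponent almost perfectly. Real citation counts are noisier, and the share of domains needed to reach 80% depends on how many domains are in the pool, so this toy input should not be expected to reproduce the study's 18% figure.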
This is not a quirk of consumer brands. The 128 brands in the dataset span major industrial and financial groups, the kind of organisations whose reputations are built over decades and whose web presence runs to thousands of pages. The LLM does not weight by publishing volume. It weights by source trust as encoded in its retrieval logic, and that logic reaches for third-party editorial corroboration first.
The implication is direct for multilateral institutions and policy bodies, which often maintain extensive owned-media operations (microsites, technical papers, official reports) while treating press coverage and Wikipedia as secondary concerns. If a model answering a question about UNDRR's work in climate resilience, or CGAP's role in financial inclusion, draws 85.7% of its citations from external sources, the institution's framing of its own work is structurally subordinate to how third parties describe it. The owned content is in the room. It is rarely the loudest voice.
Concentration matters as much as the ratio
The Zipf distribution is the second finding that deserves attention. A long-tail citation pool with 80% of weight concentrated in 18% of domains means that appearing on a small number of authoritative third-party sites is not merely useful; it is close to necessary. A brand that features substantively on, say, 50 mid-tier domains but lacks a strong Wikipedia entry and has only minimal coverage in the dozen or so domains at the head of the distribution will be poorly represented even if its own web presence is extensive.
For financial services firms and large industrial groups, this concentration effect is structurally unfavourable in one specific way. Regulatory, reputational, and legal sensitivities often limit how aggressively these organisations can pursue coverage in the kinds of editorial outlets that sit at the head of the citation distribution. The brands that benefit most are those with a long history of unprompted press coverage in high-authority outlets, not those with the most sophisticated content operations.
Languages and markets break the pattern unevenly
The study's 13-language scope adds a further complication. Citation source type varies by language and market, which means a brand's LLM visibility in, say, Japanese or Arabic is shaped by an entirely different third-party ecosystem from the one that shapes its visibility in English. For multinationals and multilaterals operating across language markets, the earned-media gap is not uniform. It may be severe in some markets and negligible in others, and the remediation is necessarily local rather than global.
ISO and IEEE, both of which operate standard-setting functions that are cited heavily in technical and policy contexts, face a version of this problem in reverse. Their authority is well established in English-language sources. Whether that authority transfers intact into AI-mediated answers in other language markets depends on the density and quality of third-party citations in those languages, not on the comprehensiveness of their own multilingual publishing.
What the 14.3% figure actually protects
The 14.3% owned-source share is not worthless. Owned citations carry precise brand-controlled language: product definitions, organisational descriptions, official positions. When they appear, they anchor the model's factual framing. The problem is that they appear infrequently, and their influence on the final answer is diluted by the weight of third-party corroboration surrounding them.
The practical consequence: brands that treat website optimisation as the primary lever for AI visibility are optimising the minority channel. The majority channel is earned, not owned, and it is governed by a small number of high-authority domains whose editorial decisions no brand controls directly.
Getting onto those domains substantively, with accurate, detailed, and regularly updated information, is now as much a search infrastructure decision as a communications one. Brands that have treated Wikipedia as a reputational liability to be managed defensively have been building in the wrong direction for some time.