Your traffic is down. Not catastrophically, not yet, but the trend line is unmistakable. You check your rankings and they are fine. The same pages still sit at positions one through three for the same queries. The traffic just is not arriving. Something between the ranking and the click has broken.

That something is the citation bottleneck. AI-generated answers now sit between your content and your reader. Those answers cite a narrow set of dominant sources, a pattern reshaping the economics of publishing from the inside out.

The citation bottleneck

Rand Fishkin’s SparkToro research on AI search has shown that AI-generated answers often cite a small number of dominant sources, with Wikipedia the most frequently cited domain in many categories (SparkToro blog posts on AI search). That is not an accident. It is a structural feature of how AI answer engines select and cite sources. They favor authoritative, well-structured, and frequently referenced material. The result is a winner-take-most dynamic that punishes the middle of the content pyramid.

The Princeton paper “Counting the Cost of Generative Search” estimates that generative AI search could reduce web traffic to publishers by 25% to 60% depending on the query type and implementation (Counting the Cost of Generative Search, Princeton). That is a wide range, but even the low end represents a structural loss that no amount of traditional SEO can recover. The problem is not about ranking. It is about citation. Your page can rank first in Google’s traditional search results and still never appear in an AI-generated answer. The two systems are decoupled.

Google’s AI Overviews (formerly Search Generative Experience) launched in the US in May 2024 and have been shown to cite a mix of e-commerce, forum, and publisher sites, with a tendency to favor high-authority domains, according to Search Engine Land. The pattern is consistent across the AI search landscape. Perplexity AI attaches inline citations as numbered links to the original URLs, and its documentation states that it prioritizes authoritative and up-to-date sources (Perplexity AI Documentation: Citations). The common thread is clear: authority and structure matter more than keyword density or backlink counts.

This is not a bug. It is the point. AI engines are not trying to replicate the search engine results page. They are trying to answer the question in a single response, and that means they need sources they can trust without human judgment. The sources they trust are the ones that look like primary sources.

Entity signals

If traditional SEO was about keywords, generative engine optimization (GEO) is about entities. An entity is a thing with a clear identity: a person, a company, a product, a place, a concept. The most direct way to declare an entity to a machine is structured markup.

Schema.org’s Article type includes properties such as ‘author’, ‘publisher’, ‘datePublished’, and ‘mainEntityOfPage’, which are used to structure content for search engines and AI systems (Schema.org: Article). Schema.org’s Organization type includes the ‘parentOrganization’ property, which allows specifying a parent organization for a sub-organization, and ‘sameAs’ for linking to external identifiers like Wikidata (Schema.org: Organization).

These are not new standards. Schema.org has been around for over a decade. But their role has shifted. In traditional SEO, structured data was a nice-to-have that sometimes produced rich snippets. In GEO, it is a prerequisite. An LLM that encounters a page with clear entity markup can confidently attribute a claim to that source. A page without it is a text blob that the model must evaluate on content alone, which is slower and less reliable.

The key is the sameAs link. When you connect your organization’s schema to its Wikidata entry, you create a machine-readable chain of identity that an AI engine can follow. Wikidata becomes the canonical reference. Your page becomes a primary source about that entity. The AI engine can cite you with confidence because it knows exactly who you are.
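As a rough illustration of that identity chain, the sketch below builds Article markup whose publisher carries a sameAs link and serializes it as a JSON-LD script tag. Every value is an assumption made for the example: the organization name, the example.com URLs, the author, and the Wikidata identifier Q0 are placeholders, and the helper name to_jsonld_script is hypothetical.

```python
import json

# All values below are hypothetical placeholders, not real records.
organization = {
    "@type": "Organization",
    "@id": "https://example.com/#organization",
    "name": "Example Media",
    "url": "https://example.com/",
    # Placeholder Wikidata URL; substitute the entity's actual identifier.
    "sameAs": ["https://www.wikidata.org/wiki/Q0"],
}

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "An example headline",
    "datePublished": "2025-01-15",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "publisher": organization,
    "mainEntityOfPage": "https://example.com/articles/example",
}


def to_jsonld_script(data: dict) -> str:
    """Serialize a dict as a JSON-LD script tag for the page head."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_jsonld_script(article))
```

Nesting the Organization under publisher keeps the article and its publisher in one block; a site with many articles might instead define the organization once and reference it by its @id.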

This is why entity density matters. Every named entity on your page, whether a tool, company, person, or methodology, should have a clear schema annotation. Not for the human reader. For the machine that decides whether to cite you.

The llms.txt question

The llms.txt file is a proposed standard that lets website owners publish a curated, Markdown-formatted index of the pages they most want large language models to read, loosely analogous to robots.txt for crawlers (llms.txt project page). The idea is elegant. Place a file at the root of your domain that points AI systems to your best content, with a short summary and annotated links. Unlike robots.txt, the proposal has no syntax for excluding pages. It is a map, not a gate. Give the machine a map of your best content.

But adoption is nascent. The standard has no enforcement mechanism. It is aspirational signaling, not deterministic control. Contrast this with robots.txt, which search engines have agreed to honor for decades. Robots.txt works because it is backed by a shared understanding between publishers and crawlers. llms.txt has no such agreement. An AI provider can ignore it with no consequence.

That does not mean it is useless. It means it is an early-stage signal that may become more valuable as adoption grows. For now, implementing llms.txt is a low-cost hedge. It tells AI providers that you are aware of the standard and that you want to participate. That awareness itself may be a signal. The first publishers to adopt it will have an advantage if compliance becomes standard.

The smarter move is to treat llms.txt as one layer in a broader strategy. Use it to surface your best content. But do not rely on it alone. The AI engines are not waiting for a standard. They are already citing sources based on the signals they can read today.
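For a sense of what such a file might contain, here is a minimal sketch that generates one. It assumes the layout described on the llms.txt project page: a title heading, a short blockquote summary, then sections of annotated Markdown links. The Link class, the build_llms_txt function, and all site names and URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[Link]]) -> str:
    """Render a Markdown llms.txt body: title, summary, linked sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and pages, for illustration only.
content = build_llms_txt(
    "Example Media",
    "Original reporting and datasets on publishing economics.",
    {
        "Research": [
            Link("Annual traffic survey",
                 "https://example.com/research/traffic-survey",
                 "original survey data"),
        ],
        "Methodology": [
            Link("How we collect data", "https://example.com/methodology"),
        ],
    },
)
print(content)
```

Writing the result to /llms.txt at the site root is left out on purpose, since the deployment path depends on the hosting setup.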

Primary-source citation patterns

The citation hierarchy that AI engines use is not random. It follows a predictable pattern: primary sources first, curated databases second, aggregators last.

Perplexity AI’s inline citation system favors authoritative and up-to-date sources (Perplexity AI Documentation: Citations). Google’s AI Overviews favor high-authority domains (Search Engine Land). Wikipedia wins in both systems because, although it is technically a tertiary source, it acts as the default reference for encyclopedic knowledge, it is heavily structured, and it is constantly updated (SparkToro blog posts on AI search). The same logic applies to official documentation, company websites, and research papers. These are sources that the AI engine can trust because they are the original authority on their subject.

Aggregators lose. A listicle that summarizes ten tools without linking to their official documentation is a secondary source. An AI engine that cites it is taking a risk. The aggregator might be wrong, outdated, or biased. The primary source is safer. So the AI engine cites the primary source, and the aggregator gets nothing.

This inverts the traditional SEO value of aggregation. In the old model, aggregators could outrank primary sources by building more backlinks and optimizing for keywords. In the new model, the AI engine skips the aggregator entirely and goes straight to the source. The aggregation strategy that worked for a decade is now a liability.

The lesson is uncomfortable but clear. If you want to be cited by AI engines, you need to be a primary source. That means original reporting, original data, original analysis. It means being the authority on your subject, not the curator of other people’s authority.

A 30-day checklist

The shift to GEO is real, and the response to it is concrete. There is a playbook, and it can be executed in roughly 30 days. Here is the sequence.

Week one: Audit existing schema. Run a crawl of your site and check every page for schema.org markup. Identify pages that are missing Article or Organization types. Flag pages that have incomplete markup, especially missing author, publisher, or datePublished properties (Schema.org: Article). This is the diagnostic phase. You cannot fix what you have not measured.
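A per-page check for that audit could look something like the sketch below, which uses only the Python standard library. It assumes the markup is embedded as JSON-LD script tags (Microdata and RDFa are not handled) and inspects top-level nodes only, so an Organization nested under publisher is not counted. The REQUIRED table is an illustrative assumption, and the names JsonLdExtractor, iter_nodes, and audit_html are hypothetical.

```python
import json
from html.parser import HTMLParser

# Illustrative assumption: which properties to flag for each type.
REQUIRED = {
    "Article": ["author", "publisher", "datePublished"],
    "Organization": ["sameAs"],
}


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of JSON-LD script tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []
        self.invalid = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                self.invalid += 1


def iter_nodes(block):
    """Yield top-level nodes, flattening lists and @graph containers."""
    if isinstance(block, list):
        for item in block:
            yield from iter_nodes(item)
    elif isinstance(block, dict):
        if "@graph" in block:
            yield from iter_nodes(block["@graph"])
        else:
            yield block


def audit_html(html: str) -> dict:
    """Report missing types and missing properties for one page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    found, missing = set(), {}
    for block in parser.blocks:
        for node in iter_nodes(block):
            types = node.get("@type", [])
            types = [types] if isinstance(types, str) else types
            for t in types:
                found.add(t)
                gaps = [p for p in REQUIRED.get(t, []) if not node.get(p)]
                if gaps:
                    missing.setdefault(t, []).extend(gaps)
    return {
        "missing_types": sorted(set(REQUIRED) - found),
        "missing_properties": missing,
        "invalid_blocks": parser.invalid,
    }


sample = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Example", "author": {"@type": "Person", "name": "Jane Doe"}}
</script></head><body></body></html>"""
print(audit_html(sample))
```

Running the same function over every crawled page and sorting by the number of gaps gives a simple priority list for the weeks that follow.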

Week two: Add sameAs links to Wikidata. For every organization on your site, check whether it has a Wikidata entry, and create or update one where it meets Wikidata’s notability criteria. Then add a sameAs link in your Organization schema pointing to that entry (Schema.org: Organization). This is the highest-leverage single action you can take. It creates the machine-readable identity chain that AI engines use to trust your content.

Week three: Implement llms.txt. Create the file at the root of your domain. List your best pages, the ones that are primary sources on their subjects. Prioritize depth over breadth. A short list of high-quality pages is better than a long list of everything (llms.txt project page). Monitor your server logs for requests to the file. If no AI providers are hitting it, that is useful information.
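One way to do that log monitoring is sketched below. It assumes access logs in the common "combined" format used by default in many Apache and nginx setups; the regular expression, the function name count_llms_txt_hits, and the sample log lines with their user agents are all hypothetical.

```python
import re
from collections import Counter

# Assumes the "combined" access log format: request line, status, size,
# referer, user agent. Adjust the pattern if your server logs differently.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(lines, path="/llms.txt"):
    """Count requests for the given path, grouped by user agent."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        # Ignore any query string when comparing paths.
        if match.group("path").split("?", 1)[0] == path:
            hits[match.group("agent")] += 1
    return hits


# Hypothetical log lines; the bot names are made up for the example.
sample_log = [
    '203.0.113.5 - - [10/Jan/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [10/Jan/2025:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [11/Jan/2025:09:00:00 +0000] "GET /llms.txt?v=2 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [11/Jan/2025:09:30:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "OtherCrawler/2.3"',
]

for agent, count in count_llms_txt_hits(sample_log).most_common():
    print(f"{count:>4}  {agent}")
```

An empty result after several weeks is itself the useful information the checklist mentions: it suggests no crawler is reading the file yet.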

Week four: Shift content strategy toward primary-source reporting. This is the hard part. Stop writing listicles that aggregate other people’s work. Start writing original analysis, original data, and original reporting. Every piece of content should be the primary source for something. If it is not, it is unlikely to be cited by an AI engine, and its traffic will continue to decline.

The checklist is simple. The execution is not. But the alternative is worse. The alternative is watching your traffic decline quarter over quarter while your rankings stay flat, wondering what changed.

The honest summary

SEO is not dead. But the version of SEO that rewarded aggregation, keyword density, and link-building at the expense of authority is dying. The new version rewards entity density, structured data, and primary-source authority. It rewards being the source, not the curator.

The AI engines are not trying to destroy publishing. They are trying to answer questions. They will cite the sources that make that job easiest. The publishers that adapt to that reality will be cited. The ones that do not will be invisible.

That is not a judgment. It is a signal. And like any signal, it can be read and acted on. The question is whether you will read it before your traffic tells you the same story in numbers.