Schema Markup for AI: What It Actually Does (With Real Code)

A vendor tells you schema markup gets you cited in ChatGPT. The same week, a controlled study tells you it does nothing. Both are talking about schema markup for AI, both have data, and both sound certain. So which is it?

They are measuring different jobs. Schema does three separate things, for three different systems, at three different times: Google’s index, an LLM’s training run, and a live AI fetching your page. Most arguments about schema collapse all three into one yes-or-no question, which is why they never resolve.

The framing I am borrowing here comes from Suganthan Mohanadasan, who laid it out as the three lives of schema markup. I am going to walk each life using the actual JSON-LD running on this site, tell you plainly what each one buys you and what it does not, and then add a fourth life that schema can’t reach on its own.

Key insight

Schema markup is closer to registering a business than running an ad. It does not produce the citation. It makes you legible to the system that decides the citation.

Key Takeaways

Schema markup does three jobs at three different times: index, training, and runtime. Stop debating it as one thing.
Life 1 (Google’s index) and Life 2 (entity canonicalization) are the durable payoff. They still work.
Life 3 (runtime): most third-party LLMs strip your JSON-LD when they fetch the page, so schema is not a citation hack.
Treat schema as entity registration, not advertising. Markup nominates you; authority canonicalizes you.
The fourth life, serving structured data to agents directly through an MCP endpoint, is how you close the runtime gap.

First, the confusion: why smart people disagree about schema and AI

The disagreement is real but the contradiction is fake. The “schema gets you cited” camp and the “schema does nothing for AI” camp are each half right, because neither one says when the schema gets read.

Pin that down and the noise clears. Structured data for AI passes through three systems at three moments: Google’s indexing pipeline at index time, an LLM’s training corpus at training time, and a live retrieval fetch at runtime. Different machine, different moment, different outcome. The rest of this is just those three, in order, with the code I actually ship.

Life	System that reads it	When it is read	What it buys you
Life 1	Google’s index pipeline	Index time	Rich results, Knowledge Graph, author attribution
Life 2	LLM pretraining corpus	Training time	Entity canonicalization via `sameAs`
Life 3	Third-party LLM at retrieval	Query time	Mostly nothing; the JSON-LD is stripped

Life 1, index time: schema feeds Google’s entity engine

The original job, and the one that still works exactly as advertised. This is schema as Google has used it since the early 2010s.

What schema does at index time

At index time, schema feeds Google’s entity identification system. That is the real function: not ranking, but recognition. You are handing Google an unambiguous, machine-readable statement of what each thing on your page is, so it can place that thing in the Knowledge Graph, attach author attribution, and render rich results like recipe cards and product carousels.

The distinction that matters for AI: this is about being a thing Google can identify and connect, not a page it ranks. An entity Google recognizes cleanly is one it can carry into every downstream surface, including its own AI features. (Google Search Central)

What I actually ship: a Person and Organization graph on every page

Here is the implementation, not the theory. Every front-end view on this site emits one linked @graph with two entities: a Person (me) and an Organization (the site), each carrying a stable @id the rest of the site points back to.

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://toddmorourke.com/#person",
      "name": "Todd M. O'Rourke",
      "url": "https://toddmorourke.com/",
      "jobTitle": "SEO Consultant",
      "sameAs": [
        "https://www.linkedin.com/in/todd-orourke/",
        "https://www.youtube.com/@todd.orourke"
      ],
      "knowsAbout": [
        "Search Engine Optimization",
        "Answer Engine Optimization",
        "Generative Engine Optimization"
      ],
      "alumniOf": [
        {
          "@type": "CollegeOrUniversity",
          "name": "Rutgers University-Newark",
          "sameAs": "https://www.newark.rutgers.edu/"
        }
      ],
      "worksFor": { "@id": "https://toddmorourke.com/#organization" }
    },
    {
      "@type": "Organization",
      "@id": "https://toddmorourke.com/#organization",
      "name": "Todd M. O'Rourke",
      "url": "https://toddmorourke.com/",
      "founder": { "@id": "https://toddmorourke.com/#person" }
    }
  ]
}

The load-bearing detail is the @id. Every other block on the site, the BlogPosting author and publisher included, references #person and #organization instead of redefining them. One canonical entity, referenced everywhere. That is what makes the graph legible to a parser instead of a pile of conflicting copies. The human-readable version of this same identity lives on my AI information page; the graph is just its machine-readable twin.
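To make the "one canonical entity, referenced everywhere" rule concrete, here is a minimal Python sketch of an @id audit. It is an illustration, not the code running on this site: the function names are hypothetical, and it assumes a single JSON-LD document with a top-level @graph, where a bare {"@id": ...} object counts as a reference and a node with an @id plus other properties counts as a definition.

```python
from collections import Counter


def defined_ids(doc: dict) -> list[str]:
    """Collect the @id of every top-level @graph node that defines an entity."""
    return [
        node["@id"]
        for node in doc.get("@graph", [])
        if "@id" in node and len(node) > 1  # @id plus at least one property
    ]


def find_dangling_refs(doc: dict) -> list[str]:
    """Return @id values that are referenced somewhere but never defined."""
    referenced: set[str] = set()

    def walk(value) -> None:
        if isinstance(value, dict):
            # A bare {"@id": "..."} object is a pointer, not a definition.
            if set(value) == {"@id"}:
                referenced.add(value["@id"])
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(doc.get("@graph", []))
    return sorted(referenced - set(defined_ids(doc)))


def find_redefinitions(doc: dict) -> list[str]:
    """Return @id values defined more than once at the top level of the graph."""
    counts = Counter(defined_ids(doc))
    return sorted(i for i, n in counts.items() if n > 1)


SITE = "https://toddmorourke.com/"
graph = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Person", "@id": SITE + "#person",
         "worksFor": {"@id": SITE + "#organization"}},
        {"@type": "Organization", "@id": SITE + "#organization",
         "founder": {"@id": SITE + "#person"}},
    ],
}

print(find_dangling_refs(graph))   # every reference resolves to a definition
print(find_redefinitions(graph))   # no entity is defined twice
```

The sketch only checks top-level definitions. An inline author object that repeats the Person's properties without an @id would slip past it, so treat it as a starting point for an audit rather than a full validator.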

The architecture decision: one plugin, zero schema in templates

One opinionated call most schema posts skip: all JSON-LD on this site comes from a single plugin. The theme templates emit none. The plugin is the only source of schema output; the theme stays the only source of the copy, and the plugin reads from it.

This matters for AI specifically because entities hate ambiguity. Schema scattered across templates drifts: a Person defined three ways on three page types, an Organization that disagrees with itself. Centralizing the output guarantees one consistent entity definition everywhere, which is the whole point of technical SEO work on structured data.
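The single-source idea can be sketched in a few lines. The example below is Python for illustration only; the site's actual plugin is not shown in this post, and the function names and the #article fragment are assumptions. What it demonstrates is the pattern: the canonical @id values live in one place, and every generated block points at them instead of redefining the entity.

```python
import json

SITE = "https://toddmorourke.com/"
PERSON_ID = SITE + "#person"          # defined once, referenced everywhere
ORG_ID = SITE + "#organization"


def blog_posting(url: str, headline: str, date_published: str) -> dict:
    """Build a BlogPosting node that references the canonical entities."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "@id": url + "#article",          # hypothetical fragment convention
        "headline": headline,
        "datePublished": date_published,
        "mainEntityOfPage": url,
        "author": {"@id": PERSON_ID},     # pointer, not a second Person
        "publisher": {"@id": ORG_ID},     # pointer, not a second Organization
    }


def to_script_tag(node: dict) -> str:
    """Serialize a node into a JSON-LD script tag.

    "<" is escaped so that copy containing "</script>" cannot close the tag early.
    """
    payload = json.dumps(node, indent=2).replace("<", "\\u003c")
    return '<script type="application/ld+json">' + payload + "</script>"


print(to_script_tag(
    blog_posting(SITE + "example-post/", "Example headline", "2025-01-01")
))
```

Because templates never build their own Person or Organization, there is no second definition that can drift out of sync with the first.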

Life 2, training time: how schema reaches an LLM indirectly

The indirect life, and the most misunderstood. This is what people mean, usually wrongly, when they say schema “trains” the model.

JSON-LD doesn’t make it into training data, but your entity can

Your raw JSON-LD almost never survives into a training corpus. Data-cleaning pipelines strip it during preprocessing, alongside most other markup. So no, the model does not read your FAQPage block and learn your answers.

What survives is the canonicalization the schema enables. Your sameAs links tie your entity to authoritative nodes like Wikidata, Wikipedia, and the Knowledge Graph. Those nodes do flow into training data. Schema is how you nominate yourself into the canonical entity record the model actually learns from. The markup is the nomination form, not the thing that gets learned. (Suganthan Mohanadasan)

My sameAs chain, and where it falls short

Here is my current sameAs, and the honest gap in it:

"sameAs": [
  "https://www.linkedin.com/in/todd-orourke/",
  "https://www.youtube.com/@todd.orourke"
]

LinkedIn and YouTube are table stakes. The strongest version of Life 2 wants a link to a canonical knowledge-base node, a Wikidata item, and I do not have one yet. The on-site change is one line:

"sameAs": [
  "https://www.linkedin.com/in/todd-orourke/",
  "https://www.youtube.com/@todd.orourke",
  "https://www.wikidata.org/wiki/Q00000000"
]

I could ship that today. The catch is that a sameAs to a Wikidata item I created myself, and that nothing else references, is a weak signal. Wikidata and the models downstream of it trust a node because other authoritative sources point at it. So the real work is off-site: earning enough referenceable coverage that the node deserves to exist. Markup nominates; authority canonicalizes. Life 2 is an entity-authority project wearing a one-line code change as a disguise.
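A quick way to see the gap is to sort a sameAs list into profile links and knowledge-base links. The Python sketch below is a hypothetical helper, not part of any schema tool. It assumes that only Wikidata counts as a knowledge-base host and that a real item path looks like /wiki/Q followed by a number with no leading zero, which is why the placeholder Q00000000 is flagged rather than accepted.

```python
import re
from urllib.parse import urlparse

# Assumption for this sketch: Wikidata is the only knowledge base checked.
KNOWLEDGE_BASE_HOSTS = {"wikidata.org", "www.wikidata.org"}
WIKIDATA_ITEM_PATH = re.compile(r"^/wiki/Q[1-9]\d*$")


def classify_same_as(urls: list[str]) -> dict[str, list[str]]:
    """Group sameAs URLs into knowledge-base nodes, profiles, and placeholders."""
    result: dict[str, list[str]] = {
        "knowledge_base": [],
        "profile": [],
        "placeholder": [],
    }
    for url in urls:
        parsed = urlparse(url)
        if parsed.netloc.lower() in KNOWLEDGE_BASE_HOSTS:
            if WIKIDATA_ITEM_PATH.match(parsed.path):
                result["knowledge_base"].append(url)
            else:
                # Right host, but not a plausible item ID (e.g. Q00000000).
                result["placeholder"].append(url)
        else:
            result["profile"].append(url)
    return result


same_as = [
    "https://www.linkedin.com/in/todd-orourke/",
    "https://www.youtube.com/@todd.orourke",
    "https://www.wikidata.org/wiki/Q00000000",
]
print(classify_same_as(same_as))
```

The check is purely about the shape of the URL. It cannot tell whether anything authoritative points at the node, and that part is the off-site work.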

Life 3, runtime: what happens when an AI actually fetches your page

The life that fuels the “schema is dead for AI” takes, and the one with the most uncomfortable evidence. This is schema at query time, when a live assistant pulls your URL.

Most LLMs strip your JSON-LD at retrieval

When a third-party AI assistant fetches your page in real time, most of them strip the JSON-LD and read only the visible HTML. One test ran ChatGPT, Claude, Perplexity, Gemini, and Google AI Mode against the same pages and found every one of them reading visible content only, with JSON-LD, Microdata, and RDFa ignored. The markup you carefully built is invisible at the exact moment you wanted it working. (searchVIU)
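To picture what "reading visible content only" means, here is a minimal Python sketch that splits one page into two views: the visible text and the JSON-LD blocks. It uses only the standard library and is an assumption about the general shape of such extraction, not a description of how any named assistant fetches pages. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class PageViews(HTMLParser):
    """Collect two views of one page: visible text and JSON-LD blocks."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.visible: list[str] = []
        self.json_ld: list[dict] = []
        self._skip_depth = 0
        self._in_json_ld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
            script_type = (dict(attrs).get("type") or "").lower()
            if tag == "script" and script_type == "application/ld+json":
                self._in_json_ld = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
            if tag == "script" and self._in_json_ld:
                self.json_ld.append(json.loads("".join(self._buffer)))
                self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)            # the structured-data view
        elif self._skip_depth == 0 and data.strip():
            self.visible.append(data.strip())    # the visible-text view


def split_views(html: str) -> tuple[str, list[dict]]:
    """Return (visible text, parsed JSON-LD blocks) for one HTML document."""
    parser = PageViews()
    parser.feed(html)
    parser.close()
    return " ".join(parser.visible), parser.json_ld


page = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Person", "jobTitle": "SEO Consultant"}'
    "</script></head><body><h1>Schema Markup for AI</h1>"
    "<p>Visible copy.</p></body></html>"
)
text, blocks = split_views(page)
print(text)
print(blocks)
```

A consumer that keeps only the first return value never sees the job title, because that fact exists only in the markup. Anything you want a runtime assistant to pick up has to appear in the visible copy as well.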

There is a split worth keeping straight. Google’s own AI surfaces, AI Overviews included, do use schema as context, because they sit on the same entity infrastructure from Life 1. Third-party runtime retrieval mostly does not. So “schema is dead for AI” is wrong; “schema is a runtime citation lever” is also wrong. It is first-party yes, third-party no.

This is also where the famous null result comes from. A controlled study tracked 1,885 pages adding JSON-LD against control pages and measured no meaningful citation lift across Google AI Overviews, AI Mode, or ChatGPT. (Ahrefs) That study measured Life 3, runtime citations. It disproved schema as a runtime citation hack. It said nothing about schema as index and training infrastructure, which is Life 1 and Life 2.

So what’s the point of schema if runtime ignores it?

Judge schema by the right job and it is obviously worth doing. Its payoff is Life 1 and Life 2: index legibility and entity canonicalization that compound quietly over years. You are registering the entity, not buying the citation. If your goal is getting cited in AI search, schema is the foundation under that work, not the lever that triggers it.

Judge it as a runtime citation trick and you will be let down, and the data will back you up. The mistake is the expectation, not the markup. Which raises the obvious practitioner question: if runtime retrieval is the gap, do you just accept it? I did not.

The Fourth Life: stop waiting for LLMs to read your markup, hand it to them

The part Mohanadasan’s framework does not cover, because it is not a life schema has. It is the one I went and built.

WebMCP: serving structured data to agents directly

If runtime LLMs will not parse your JSON-LD, the fix is not more schema. It is a different channel. I built a WebMCP endpoint on this site that exposes callable tools to any agent that connects: search_posts, get_post, list_posts, get_services, and get_consulting_info. Instead of hoping an assistant scrapes and parses my page, I let it call my content as structured tools.

The build, briefly: the tools are defined once in the shape of the W3C Community Group WebMCP draft (navigator.modelContext.registerTool), with a feature-detecting adapter and a fallback widget, and they load site-wide as a floating tile. Two of the five tools run on custom REST routes that serve my services and consulting profile as clean structured data. This is the logical end of the three-lives logic: Life 1 and 2 make you legible to indexes and training, and the fourth life makes you directly callable at runtime, the exact layer schema cannot reach. I documented the full build in how to add WebMCP to WordPress.
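The difference between scraping a page and calling a tool is easier to see in code. The sketch below is deliberately not the WebMCP API, which is registered in the browser with JavaScript. It is a small Python model of the same idea: named tools, declared required arguments, structured results. Only the tool names search_posts and get_post come from this site; the Tool class, the call_tool dispatcher, and the sample posts and slugs are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical content store standing in for the site's posts.
POSTS = [
    {"slug": "schema-markup-for-ai", "title": "Schema Markup for AI",
     "body": "Three lives of schema markup."},
    {"slug": "webmcp-wordpress", "title": "How to Add WebMCP to WordPress",
     "body": "Registering callable tools."},
]


@dataclass
class Tool:
    name: str
    description: str
    required: tuple[str, ...]     # argument names the caller must supply
    handler: Callable[..., Any]


def search_posts(query: str) -> list[dict]:
    """Return slug and title for posts whose title or body contains the query."""
    q = query.lower()
    return [
        {"slug": p["slug"], "title": p["title"]}
        for p in POSTS
        if q in p["title"].lower() or q in p["body"].lower()
    ]


def get_post(slug: str) -> dict:
    """Return the full post for a slug, or raise KeyError if it is unknown."""
    for post in POSTS:
        if post["slug"] == slug:
            return post
    raise KeyError(slug)


TOOLS = {
    tool.name: tool
    for tool in (
        Tool("search_posts", "Search posts by keyword.", ("query",), search_posts),
        Tool("get_post", "Fetch one post by slug.", ("slug",), get_post),
    )
}


def call_tool(name: str, arguments: dict) -> Any:
    """Validate required arguments, then dispatch to the named tool."""
    tool = TOOLS[name]
    missing = [key for key in tool.required if key not in arguments]
    if missing:
        raise ValueError(f"{name} is missing arguments: {missing}")
    return tool.handler(**arguments)


print(call_tool("search_posts", {"query": "webmcp"}))
```

The agent never parses HTML here. It asks a named question and gets structured data back, which is the runtime layer that markup in the page cannot provide.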

The honest caveat: nobody’s calling these tools yet

Real agent traffic to a WebMCP endpoint is roughly zero today, and the current spec leans on a localhost bridge with genuine friction to connect. This is first-mover positioning, not a traffic channel, and I am not going to pretend otherwise.

I built it anyway for the same reason the people who added schema in 2011 were not getting rich results that afternoon. They were laying entity infrastructure before it paid. The cost of standing up an agent interface now is low and the option value, if runtime moves toward agents calling tools instead of scraping pages, is high. That is a bet I am comfortable making in public.

Conclusion

Schema markup for AI has no single yes-or-no answer. It is three jobs and a fourth move. Get the jobs straight and the contradictions disappear: register the entity at index and training time, stop expecting runtime citations the markup can’t deliver, and build the direct channel schema can’t reach.

Next steps

Audit your @graph for @id consistency, so every block references one canonical entity instead of redefining it.
Centralize schema output to a single source so your entity never contradicts itself across page types.
Add a sameAs aimed at a canonical knowledge-base node, then do the off-site work to make that node real.
Consider an agent-callable interface before you need one.

Frequently Asked Questions

Does schema markup help you get cited in AI?

Not directly at runtime. Most third-party AI assistants strip your JSON-LD when they fetch the page, and controlled testing found no citation lift from schema alone. Its real payoff is upstream: index legibility and entity canonicalization that feed Google’s AI surfaces and model training.

Do AI assistants read JSON-LD when they fetch a page?

Mostly no. Independent testing showed several major AI systems discarding JSON-LD during live retrieval and reading only the visible HTML. The exception is Google’s own AI surfaces, which use schema as context because they sit on the same entity infrastructure that powers regular search.

Does schema markup still help SEO?

Yes. At index time it feeds Google’s entity engine, drives rich results, and supports Knowledge Graph membership and author attribution. That function has worked since the early 2010s and still does. The “schema is dead” arguments are about AI runtime citations, a different job entirely.

Which schema types matter most for AI search?

The entity-identity types, Organization and Person, wired together with stable @id references and sameAs links. Those build the canonical entity that index and training systems care about. Rich-result types like FAQPage and HowTo help presentation but are secondary to a clean, consistent entity graph.

Can AI generate schema markup automatically?

Yes, and generation was never the hard part. A model can emit valid JSON-LD in seconds. The hard part is entity consistency: one canonical definition referenced everywhere through @id, and sameAs links pointed at nodes authoritative enough to matter. Tools generate markup; they do not earn entity authority.

What’s the difference between schema for SEO and schema for AI?

Same markup, different lives. For classic SEO, schema works at index time to produce rich results and entity recognition. For AI, it works indirectly at training time through entity canonicalization, and barely at all at runtime, where most assistants strip it. The tag is identical; the system reading it is not.
