Topic

AI Search & Citation

How AI search engines (ChatGPT, Perplexity, Claude, Google AI Overviews) read independent restaurant content in 2026 — what they cite, where the corpus lives, and the operator moves that earn citation.

Articles

Read the playbooks.

Pillar essay · new May 2026

AI search isn’t Google with a chatbox.

By mid-2026, AI search engines (ChatGPT, Perplexity, Claude, Google AI Overviews, Bing Copilot) account for an estimated 8–14% of restaurant-discovery queries in the DMV, depending on the operator’s demographic mix. Younger diners (18–34) over-index toward AI search; older diners (55+) still go straight to Google Maps. The crossover point is moving steadily; by 2027 the AI search share of restaurant discovery will likely be in the 20–30% band.

This pillar is the field guide for an independent operator who wants to be cited by name in those answers, not just indexed somewhere a crawler might find. Three things to know before you start: AI search isn’t Google with a chatbox; the algorithms care about different signals than the local-pack does; and most of what you can do to influence citation is the same work that makes the rest of your site better.

How AI search engines actually work in 2026

Different from a Google search in three ways:

  • Retrieval, not ranking. An AI search engine doesn’t order ten blue links; it pulls 3–8 sources, summarizes them, and cites by URL or site. The retrieval is usually a vector search across an indexed corpus, plus a real-time fetch of the freshest source for time-sensitive queries (hours, prices, reservations). Your job is to be in the corpus and fetchable when asked.
  • Citation is a separate signal from indexing. Indexing means the crawler reached your URL. Citation means the LLM chose to quote you (with a link) when answering a question. The signals that earn citation: clear authoritative claims, named sources, structured data, freshness dates, and the absence of marketing fluff. The same operator who’d be cited in a New York Times piece is the operator AI search cites.
  • The unit of citation is a paragraph, not a page. AI search lifts the 40–80 word block under a question-shaped H2 (“How much does it cost to…”, “What schema does Google use for…”) far more often than it lifts a wandering hero paragraph. Write the way you’d want to be cited: a question, then a tight paragraph that answers it, then the supporting context.
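
The paragraph-level rule above can be checked mechanically. The following is a minimal sketch, assuming you already have each heading and the first paragraph under it as plain strings; the function names and the list of question starters are illustrative, not part of any AI search engine's documented behavior.

```python
# Heuristic lint for "citation register": a question-shaped heading
# followed by a 40 to 80 word answer block. The starter list is an assumption.
QUESTION_STARTERS = {
    "how", "what", "when", "where", "why", "who", "which",
    "is", "are", "can", "does", "do", "should",
}


def is_question_heading(heading: str) -> bool:
    """Return True when the heading ends in '?' or opens with a question word."""
    text = heading.strip()
    words = text.lower().split()
    if not words:
        return False
    return text.endswith("?") or words[0] in QUESTION_STARTERS


def check_block(heading: str, first_paragraph: str, low: int = 40, high: int = 80) -> dict:
    """Report whether a heading/paragraph pair fits the citable shape."""
    words = len(first_paragraph.split())
    return {
        "question_heading": is_question_heading(heading),
        "words": words,
        "in_range": low <= words <= high,
    }
```

Run it over your three highest-traffic pages first; a heading that fails the question test or a first paragraph far outside the range is a candidate for rewriting, not proof of a problem.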

The corpus surface

Most AI search engines fetch and index three specific files when they decide what a site is about:

  • /llms.txt — the curated map. Title + meta description + URL for every article, glossary term, and tool. Maintained by scripts/build-llms-txt.mjs on this site; rebuilt on every deploy.
  • /llms-full.txt — the full-body corpus. Every article + research note + glossary term in Markdown, with stable per-item front matter (title, canonical URL, locale, date, kind). The corpus on this site is ~520 KB; /es/llms-full.txt ships the Spanish.
  • /feed-llm.json — JSON Feed 1.1 with content_text per item. AI search engines that prefer feed-shape over flat text use this. Mixed locales tagged via language.
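
To make the first two files concrete, here is a minimal sketch of a builder. It is written in Python for illustration and is not the site's scripts/build-llms-txt.mjs; the `Item` fields mirror the front matter listed above, while the exact line layout of each file and the example URL are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One article, research note, or glossary term (hypothetical shape)."""
    title: str
    description: str
    url: str
    locale: str
    date: str
    kind: str
    body_markdown: str


def build_llms_txt(site_name: str, items: list[Item]) -> str:
    """Curated map: one line per item with title, URL, and meta description."""
    lines = [f"# {site_name}", ""]
    for item in items:
        lines.append(f"- [{item.title}]({item.url}): {item.description}")
    return "\n".join(lines) + "\n"


def build_llms_full_txt(items: list[Item]) -> str:
    """Full-body corpus: per-item front matter followed by the Markdown body."""
    chunks = []
    for item in items:
        front_matter = "\n".join([
            "---",
            f"title: {item.title}",
            f"canonical: {item.url}",
            f"locale: {item.locale}",
            f"date: {item.date}",
            f"kind: {item.kind}",
            "---",
        ])
        chunks.append(f"{front_matter}\n\n{item.body_markdown.strip()}\n")
    return "\n".join(chunks)
```

Calling both functions from the deploy step keeps the two files in sync with the published content, which is the property that matters more than the exact layout.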

Plus the conventional discoverability signals: /sitemap.xml (455 URLs, EN + ES), /feed.xml (RSS), /robots.txt (with explicit per-agent allows for GPTBot, ClaudeBot, anthropic-ai, Claude-Web, PerplexityBot, Google-Extended, Bytespider, Applebot-Extended). The /ai/ page documents what the studio uses AI for and what it never does — the trust posture that AI search engines look for when deciding whether to cite a source as authoritative.
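
A robots.txt with explicit per-agent allows can be generated rather than hand-edited. The sketch below assumes the agent names given in this essay; the disallowed path and sitemap URL in the usage are hypothetical, and the training-only deny list is the one named later in this essay.

```python
# Agents allowed for retrieval-time citation, as listed in this essay.
CITATION_AGENTS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "Claude-Web",
    "PerplexityBot", "Google-Extended", "Bytespider", "Applebot-Extended",
]
# Training-only scrapers that get a named deny.
TRAINING_ONLY_AGENTS = ["CCBot", "Omgilibot", "ImagesiftBot"]


def build_robots_txt(disallows: list[str], sitemap_url: str) -> str:
    """Emit a wildcard stanza plus one stanza per named agent."""
    # Disallow lines come first so that first-match parsers and
    # longest-match parsers reach the same answer.
    shared = [f"Disallow: {path}" for path in disallows] + ["Allow: /"]
    lines: list[str] = []

    def stanza(agent: str, rules: list[str]) -> None:
        lines.append(f"User-agent: {agent}")
        lines.extend(rules)
        lines.append("")  # blank line closes the stanza

    stanza("*", shared)
    for agent in CITATION_AGENTS:
        stanza(agent, shared)
    for agent in TRAINING_ONLY_AGENTS:
        stanza(agent, ["Disallow: /"])
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"
```

Each cited agent repeats the wildcard's disallows because a crawler that finds its own stanza ignores the wildcard one.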

Why schema is the second corpus

AI search engines parse JSON-LD aggressively. The same five schema types that move the needle for traditional SEO (Restaurant, LocalBusiness, OpeningHoursSpecification, Menu, Reservation) are the ones AI search lifts when answering a query like “is the Irish Inn open Christmas Eve?” A site whose JSON-LD says “closed Dec 24” gets cited; a site whose JSON-LD is silent gets second-guessed.
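
As an illustration of a site whose JSON-LD says "closed Dec 24", here is a minimal sketch that emits a schema.org Restaurant block with a `specialOpeningHoursSpecification` entry. The venue name, URL, and date are hypothetical, and marking a full-day closure with `opens` and `closes` both set to "00:00" follows the convention in Google's local business structured data guidance as I understand it; verify against the current documentation before shipping.

```python
import json


def holiday_closure_jsonld(name: str, url: str, closed_dates: list[str]) -> dict:
    """Build Restaurant JSON-LD with one closed-all-day override per date."""
    return {
        "@context": "https://schema.org",
        "@type": "Restaurant",
        "name": name,
        "url": url,
        "specialOpeningHoursSpecification": [
            {
                "@type": "OpeningHoursSpecification",
                "validFrom": day,      # ISO 8601 date, e.g. "2026-12-24"
                "validThrough": day,
                "opens": "00:00",      # assumed convention: equal times = closed
                "closes": "00:00",
            }
            for day in closed_dates
        ],
    }


def to_script_tag(data: dict) -> str:
    """Wrap the JSON-LD in the script tag that goes in the page head."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )
```

For example, `to_script_tag(holiday_closure_jsonld("Example Tavern", "https://example.com/", ["2026-12-24"]))` yields a block you can paste into the site template.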

Three implications for the operator:

  • Holiday hours in JSON-LD, not just on GBP. Use Holiday Hours Generator to emit the specialOpeningHoursSpecification override; paste into your site’s schema block.
  • Menu in HTML + JSON-LD, not as a PDF. Use Menu Converter to turn pasted menu text into both. AI search crawlers often skip or misread a PDF menu; a Menu schema block parses cleanly.
  • Reservation deeplink in schema. The Reservation type with a potentialAction pointing at your venue’s OpenTable / Resy page lets AI search engines answer “book me a table at <name>” with the right URL on the first try. The article “The 6 schema types Google actually uses” walks through the JSON-LD.
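
The menu conversion in the second item can be sketched in a few lines. This is not the Menu Converter tool; it assumes pasted text where section names sit on their own lines and each item line ends in a dollar price, and it emits schema.org `Menu`, `MenuSection`, and `MenuItem` nodes.

```python
import re

# An item line is "<name> [- or dots] $<price>"; anything else is a section name.
ITEM_RE = re.compile(r"^(?P<name>.+?)\s*[-.]*\s*\$(?P<price>\d+(?:\.\d{1,2})?)$")


def menu_text_to_jsonld(text: str, currency: str = "USD") -> dict:
    """Turn pasted menu text into a schema.org Menu structure."""
    sections: list[dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ITEM_RE.match(line)
        if match is None:
            # No price found: treat the line as a new section heading.
            current = {"@type": "MenuSection", "name": line, "hasMenuItem": []}
            sections.append(current)
            continue
        if current is None:
            # Items before any heading land in a default section.
            current = {"@type": "MenuSection", "name": "Menu", "hasMenuItem": []}
            sections.append(current)
        current["hasMenuItem"].append({
            "@type": "MenuItem",
            "name": match.group("name"),
            "offers": {
                "@type": "Offer",
                "price": match.group("price"),
                "priceCurrency": currency,
            },
        })
    return {"@context": "https://schema.org", "@type": "Menu", "hasMenuSection": sections}
```

Serialize the result with `json.dumps` and ship it alongside the server-rendered HTML menu so both humans and crawlers read the same data.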

What AI search engines won’t cite

Patterns that make a source effectively un-citable, in roughly this order:

  1. Marketing fluff. A page whose first paragraph is “Welcome to a culinary journey of authentic flavors” gets indexed but not cited. AI search models have learned to discount this register because it’s rarely true and never specific. The voice contract on this site (/methods/#voice-contract) is the citable register: short declaratives, named sources, no abstractions.
  2. Undated claims. “Restaurants typically run on 4–7% net margin” without a date or a source gets cited rarely. The same claim with “per RAMW’s 2025 industry report” gets cited often. Date everything; cite everything.
  3. Pages that lie via SEO. A page titled “Best pizza in Bethesda 2026” that’s a thinly disguised affiliate roundup loses citation weight every quarter. AI search engines are explicitly trained to discount these patterns.
  4. JS-only content. If your menu only renders after JavaScript runs, AI search crawlers (which often don’t execute JS) see an empty page. The fallback: server-render the menu, or ship the data in JSON-LD too. The free Menu Converter emits both.
  5. Sites with hostile robots.txt. Blocking GPTBot, ClaudeBot, Google-Extended, and PerplexityBot guarantees you won’t be cited. The site’s posture is the opposite: explicit per-agent allows for retrieval-time citation crawlers, with named denies only for training-only scrapers (CCBot, Omgilibot, ImagesiftBot).
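
The JS-only failure in item 4 is easy to test for yourself. The sketch below approximates what a crawler that does not execute JavaScript sees: it extracts the text of the raw HTML, ignoring script and style contents, and reports which expected phrases are missing. The class and function names are illustrative.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = ("script", "style")


class VisibleText(HTMLParser):
    """Collect text outside <script> and <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def missing_without_js(html: str, must_have: list[str]) -> list[str]:
    """Return the phrases a non-JS crawler would not find in the page text."""
    text = visible_text(html).lower()
    return [phrase for phrase in must_have if phrase.lower() not in text]
```

Feed it the raw response body for your menu page (for instance from `urllib.request.urlopen`) along with two or three dish names; any name that comes back in the list only exists after JavaScript runs.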

What to do this month

Three actions, in order of leverage:

  1. Ship a /llms-full.txt on your own site. If AI search engines read full bodies but you only ship a sitemap, most of your citable text never reaches them in one place. The build script on this site is open; copy it, adapt to your CMS, ship.
  2. Audit your robots.txt against the AI crawler list. Per-agent stanzas for GPTBot, ClaudeBot, anthropic-ai, Claude-Web, PerplexityBot, Google-Extended, Bytespider, Applebot-Extended. Allow each, with the same disallows as your wildcard. The robots.txt on this site is a working template.
  3. Rewrite your three highest-traffic pages in citation register. First paragraph: a short declarative answer to the question someone would ask. Second paragraph: the supporting math, with named sources. Third: a tight conclusion. Strip the marketing-fluff first paragraph; replace with the answer. The voice contract on this site is the model.
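
The robots.txt audit in step 2 can be scripted with Python's standard library. A minimal sketch, assuming you paste in the text of your robots.txt and pick one URL that must stay fetchable; the agent list is the one from this essay, and the parser's matching rules may differ slightly from each crawler's own.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "Claude-Web",
    "PerplexityBot", "Google-Extended", "Bytespider", "Applebot-Extended",
]


def audit_robots(robots_txt: str, test_url: str, agents: list[str] = AI_AGENTS) -> dict:
    """Map each AI crawler name to whether it may fetch test_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, test_url) for agent in agents}


def blocked_agents(robots_txt: str, test_url: str) -> list[str]:
    """List the agents that the current robots.txt shuts out."""
    results = audit_robots(robots_txt, test_url)
    return [agent for agent, allowed in results.items() if not allowed]
```

An empty list from `blocked_agents` for your menu and hours pages is the result to aim for; any name it returns needs its own allow stanza.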

The metric to watch: citations in the wild. There’s no Plausible-equivalent dashboard for this yet; the practical method is to type your business name plus a question into ChatGPT, Perplexity, and Claude monthly and see whether your site is cited as a source. The trend line over six months tells you whether the work is landing.
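
Until a dashboard exists, a spreadsheet or a few lines of code will hold the monthly checks. The sketch below assumes you record each manual check as a (month, engine, cited) tuple; the record shape is hypothetical.

```python
from collections import defaultdict


def citation_rate_by_month(checks: list[tuple[str, str, bool]]) -> dict[str, float]:
    """Return the share of checks that cited the site, keyed by 'YYYY-MM'."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [cited, asked]
    for month, _engine, cited in checks:
        totals[month][1] += 1
        if cited:
            totals[month][0] += 1
    # Sorting the keys gives the trend line in calendar order.
    return {month: cited / asked for month, (cited, asked) in sorted(totals.items())}
```

Ask the same questions every month so the rates are comparable across the six-month window.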

Where this topic touches the others

  • Local SEO & Discovery — because AI search and local SEO compete for some of the same operator attention, but the work compounds. Strong schema, strong content, strong owner-response patterns build both. Where they diverge: AI search rewards depth and citation; local SEO rewards consistency and recency.
  • Conversions & Content — because the visitor who lands on your site after an AI-search citation is, at the click, the highest-intent visitor you get. Don’t leak that visitor with a slow page or a hidden reservation button.
  • Information Security — because the trust signals AI search engines read (named author, dated claims, transparent vendor list) are the same signals that build operator-side trust. The /security/ page on this site is a single artifact serving both audiences.