Research
May 30, 2026 · 6 min read

Two-Thirds of the Web Is Invisible to AI Bots

When AI search engines became a meaningful share of how people find businesses online, a quiet assumption traveled with the trend. The assumption was that if your site shows up in Google, it will probably also show up in ChatGPT. They run on the same web. They read the same pages. Surely the underlying access pattern is similar.

It is not.

We analyzed 1,187,128 brand websites in our research dataset. Of those, only 395,022 have ever been crawled by a major AI bot. That is a base crawl rate of 33.3 percent. The other 66.7 percent are completely invisible to ChatGPT, Claude, Perplexity, and the other large language model assistants that millions of consumers now use as a primary discovery channel.

If you assumed your site was being read, the math says there is a two-in-three chance you are wrong.
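
For readers who want to check the arithmetic, here is a minimal sketch that reproduces the two percentages from the counts quoted above. The variable names are illustrative and not part of the research dataset.

```python
# Counts quoted in this article.
total_sites = 1_187_128    # brand websites in the dataset
crawled_sites = 395_022    # sites crawled at least once by a major AI bot

crawl_rate = crawled_sites / total_sites
invisible_rate = 1 - crawl_rate

print(f"Crawled by an AI bot: {crawl_rate:.1%}")    # 33.3%
print(f"Never crawled:        {invisible_rate:.1%}")  # 66.7%
```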

Why the gap exists

Traditional search engines built their indexes during a 20-year era when the dominant problem was scale: there were billions of pages and the search engine needed to crawl all of them quickly. Googlebot and Bingbot are aggressive crawlers because their job is to know about everything, then sort it.

AI bots have a different job. They do not have to know about every site. They have to know about the sites worth quoting in an answer. The cost of crawling a low-quality, machine-unreadable site is the same as crawling a good one, but the value is zero. So AI bot operators are far pickier about what they index. If your site does not present the structured signals that say "I am a real organization with verifiable information," the AI bot moves on and rarely comes back.

The shorthand for this is that AI bots crawl on quality, not coverage. Googlebot crawls on coverage. Different incentives produce different access patterns. The 67% of sites that are invisible to AI bots are the sites that pass Googlebot's much lower bar but fail the AI bots' higher one.

The structured-data effect

We trained a classifier on our dataset to predict, from structured-data signals alone, whether a site is in the AI-visible third. The model sets all other variables aside (no AEO score, no domain popularity, no business category) and varies only 16 structured-data signal flags.

Sites with none of those signals are crawled at 27.2 percent. Sites with all of them are crawled at 57.0 percent. The ratio between the two ends is 2.09. The Stage A model's AUC is 0.612, better than the 0.5 a random guess would score and consistent with the descriptive cuts in the data.
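
The ratio can be checked from the two crawl rates above. This sketch uses the rounded percentages, which give about 2.1. The reported 2.09 was presumably computed from unrounded rates, which is an assumption on our part here.

```python
# Crawl rates quoted above, in percent (rounded to one decimal place).
rate_no_signals = 27.2
rate_all_signals = 57.0

multiplier = rate_all_signals / rate_no_signals
print(f"Sites with all signals are crawled {multiplier:.1f}x as often")
```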

What this says in plain English: if your site has the structured-data signals AI bots look for, you are roughly twice as likely to be in the visible third as you are without them. The signals are not exotic. They are the same JSON-LD Organization schema, FAQ schema, Article schema, and Product schema that have been standard practice in technical SEO for a decade. The difference is that AI bots actually use them as a gate, while traditional search engines treat them as a tiebreaker.
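
To make that concrete, here is a minimal sketch of an Organization block in JSON-LD, built in Python so it can be generated from your own data. The business name, URLs, and the helper function are hypothetical placeholders. The property names come from schema.org, and this is one example of a structured-data signal rather than the full list of 16 used in the model.

```python
import json

# Hypothetical organization details; replace every value with your own.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Bakery",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
    "sameAs": ["https://www.linkedin.com/company/example-bakery"],
}


def to_jsonld_script(data: dict) -> str:
    """Wrap a dict in the script tag used to embed JSON-LD in a page."""
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_jsonld_script(organization))
```

The printed snippet would go in the page's HTML, typically inside the head element.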

The 67% problem in practical terms

If you are a local business owner, the invisible majority is your competitive set. Most of your competitors are in it. That is good news. It means your AEO investment buys you outsized differentiation cheaply.

If you are a marketing leader at a larger company, the invisible majority is harder to think about. You probably assumed your site was visible because it ranks well in Google. The two are decoupled now. You can rank in position 1 for your category and still not appear in a Perplexity answer for the same query. Your AI search visibility is governed by a different set of signals than your blue-link visibility, and those signals require explicit attention.

The hardest version of the 67% problem affects brands whose robots.txt explicitly blocks AI crawlers. We see this constantly in audits. Years ago, well-meaning developers added User-agent blocks for GPTBot, CCBot, and Google-Extended out of caution about content scraping. Those blocks were rational in 2023. They are now actively preventing the brand from appearing in AI answers, because AI assistants will not cite a source they cannot reach.

If you have not audited your robots.txt in the last 12 months, it is likely that someone on your team made a defensive decision that has since become an offensive disadvantage. This is the single fastest thing to check.
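
A rough version of that check can be run with Python's standard library. This is a minimal sketch, not a full audit: the sample robots.txt, the crawler list, and the function name are illustrative, and Python's parser may not match every crawler's own interpretation of the rules.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article; extend the list as needed.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

# Hypothetical robots.txt with the kind of defensive blocks described above.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_crawlers(robots_txt: str, url: str = "https://www.example.com/") -> list[str]:
    """Return the AI crawlers that the given robots.txt text bars from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


print(blocked_crawlers(SAMPLE_ROBOTS_TXT))  # ['GPTBot', 'CCBot']
```

To test a live site, paste the contents of its robots.txt in place of the sample text. An empty list means none of the listed crawlers is blocked by robots.txt, though meta robots tags can still block them.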

How to find out which side you're on

The free AEO score at engagemii.com/aeo includes a direct check on AI crawler access. The audit returns a yes-or-no answer on whether GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and Applebot can read your site, along with the specific robots.txt rules and meta robots tags that are blocking them if they cannot.

If the audit says you are blocking AI crawlers, you are guaranteed to be in the 67%. Removing the block does not by itself put you in the visible third (the structured-data signals still matter), but it is a necessary condition. No amount of schema work compensates for an explicit block.

If the audit says you are allowing them and your AEO score is still under 5, you are in the 67% for the second reason: the structured signals are too thin. The fix list returned with the score ranks the missing signals by impact. Most sites in this state can move into the visible third within a single week.
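
The two cases above amount to a short decision rule, sketched here. The function and its return strings are hypothetical and are not the audit's actual output. The threshold of 5 is taken from the paragraph above.

```python
def triage(blocks_ai_crawlers: bool, aeo_score: float) -> str:
    """Map the two audit results described above to a next step."""
    if blocks_ai_crawlers:
        # An explicit block overrides everything else, so fix it first.
        return "blocked: remove the robots.txt or meta robots block"
    if aeo_score < 5:
        # Crawlers are allowed, but the structured signals are too thin.
        return "thin signals: work through the ranked fix list"
    return "neither check flags a problem"


print(triage(blocks_ai_crawlers=True, aeo_score=8.0))
print(triage(blocks_ai_crawlers=False, aeo_score=3.5))
```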

Why this matters now

AI search is still in the phase where adoption is uneven and citation slots are not yet locked in. The brands that establish themselves as citable in the next 12 to 18 months will be the ones AI assistants return to repeatedly when relevant queries come in. The brands that do not will find that the citation slots have been allocated by the time they arrive.

This is the same dynamic that happened in early SEO, in early App Store optimization, and in early social platform algorithms. The brands that did the work before the channel matured got disproportionate share. The brands that waited until the channel was obvious paid more for less.

About this analysis

The visibility rate, multiplier, and AUC come from Section 3.1 of Engagemii's research brief, which analyzes structured-data signals against observed AI bot crawl data across 1,187,128 brand domains. The full methodology and the rest of the findings are at engagemii.com/research/aeo-crawl-drivers.

If you want to cite this article, the URL is engagemii.com/blog/two-thirds-web-invisible-to-ai-bots. New findings from this dataset publish here weekly.

