
Meta Robots Tags and AI: noai, noindex, and What They Do

Meta robots tags are HTML directives that tell crawlers how to handle specific pages. As AI crawlers have proliferated, new directives have emerged alongside the traditional noindex and nofollow tags. Understanding what each tag does and how it affects AI visibility is essential for anyone managing answer engine optimization (AEO) at a technical level. A single incorrectly applied meta robots tag can exclude a page from AI citation entirely.

Standard meta robots directives

The classic meta robots tag appears in the HTML head section as: <meta name="robots" content="noindex, nofollow">. The noindex directive tells compliant crawlers not to include the page in their index. The nofollow directive tells crawlers not to follow links on the page. Both apply to any crawler that honors the robots meta tag, which can include AI crawlers. A page with noindex will not appear in traditional search results from compliant engines and will generally not be used for citation by AI engines that draw on those indexes or honor the tag themselves. Review your site for noindex tags on pages you want AI engines to cite.
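To make the tag format concrete, here is a minimal Python sketch that reads the directives out of a page's meta robots tag using only the standard library. The names RobotsMetaParser, robots_directives, and sample_page are illustrative assumptions, not part of any crawler's actual implementation, and the sketch only looks at name="robots", not bot-specific meta names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" ...> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of directives found in the page's meta robots tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for illustration.
sample_page = """
<html><head>
<meta name="robots" content="noindex, nofollow">
</head><body>Hello</body></html>
"""
print(sorted(robots_directives(sample_page)))  # prints ['nofollow', 'noindex']
```

A page with no meta robots tag yields an empty set, which compliant crawlers treat as "no restrictions".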

The noai and noimageai directives

Newer, non-standard directives have emerged specifically targeting AI use: noai asks AI systems not to use the page's content for AI training or generation, and noimageai asks them not to use images from the page for AI training. These directives are honored by some AI systems (DeviantArt's AI tools, for instance) but are not part of any formal standard and are not universally respected by major AI crawlers like GPTBot or ClaudeBot. Applying noai to content pages you want cited in AI answers may therefore reduce citation on the systems that do honor it, while offering no guarantee against the ones that do not.

How AI crawlers handle meta robots tags

Most major AI crawlers (GPTBot, ClaudeBot, PerplexityBot) are documented by their operators as honoring robots.txt, and they are generally expected to respect standard noindex directives as well. The noai directive, by contrast, is not universally respected. If your goal is AI citation, avoid applying noindex or noai to any page you want cited. If you want to keep specific pages out of AI answers (internal tools, private content), use robots.txt disallow rules for AI-specific crawlers, and use noindex for crawlers you still allow to fetch the page. The two do not stack for the same crawler: a crawler blocked by robots.txt never fetches the page, so it never sees the noindex tag. Neither method is access control, so genuinely private content should also sit behind authentication. Never rely on a single blocking method.
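A quick way to check what a set of robots.txt rules actually permits is Python's standard urllib.robotparser module. The sketch below assumes a hypothetical robots.txt that blocks GPTBot and ClaudeBot from an /internal/ path; the rules, the example.com URLs, and the can_fetch wrapper are illustrative only, and a real crawler's matching logic may differ in details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: AI-specific crawlers are kept out of /internal/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /internal/

User-agent: ClaudeBot
Disallow: /internal/

User-agent: *
Allow: /
"""


def can_fetch(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    for url in ("https://example.com/internal/tool", "https://example.com/blog/post"):
        print(agent, url, can_fetch(ROBOTS_TXT, agent, url))
```

In this sketch PerplexityBot is still allowed into /internal/ because no rule names it, which shows why each AI crawler you want to exclude needs its own entry (or a matching wildcard rule).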

X-Robots-Tag for non-HTML files

Meta robots tags only work for HTML pages. For non-HTML files you want to control (PDFs, images, JSON files), use the X-Robots-Tag HTTP response header instead. This header accepts the same directives as the meta robots tag but is delivered in the HTTP response headers rather than the HTML. If you have AI-accessible PDFs with proprietary content you want to protect, configure your server to return X-Robots-Tag: noai for those file types, keeping in mind that only systems that honor noai will act on it. This is an advanced implementation that requires server configuration access.
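In practice this header is usually set in the web server or CDN configuration. For sites served by a Python application, the same idea can be sketched as a small WSGI middleware. The X_ROBOTS_POLICY mapping, the function names, and the choice of file extensions below are assumptions for illustration, not a recommended policy.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical policy: which directive to send for which file extension.
X_ROBOTS_POLICY = {
    ".pdf": "noai",
    ".png": "noimageai",
    ".jpg": "noimageai",
}


def x_robots_header(url_path):
    """Return an ("X-Robots-Tag", value) pair for the path, or None if no rule applies."""
    path = urlsplit(url_path).path  # drop any query string
    extension = posixpath.splitext(path)[1].lower()
    value = X_ROBOTS_POLICY.get(extension)
    return ("X-Robots-Tag", value) if value else None


def add_x_robots(app):
    """Wrap a WSGI app so matching responses carry the X-Robots-Tag header."""

    def wrapped(environ, start_response):
        header = x_robots_header(environ.get("PATH_INFO", ""))

        def patched_start_response(status, headers, exc_info=None):
            if header:
                headers = list(headers) + [header]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


print(x_robots_header("/docs/whitepaper.PDF?download=1"))
print(x_robots_header("/index.html"))
```

Keeping the policy in one mapping makes it easy to see, and later audit, which file types carry which directive.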

Auditing your meta robots configuration

Audit your meta robots tags by crawling your own site using a tool like Screaming Frog or a browser extension that displays meta tags. Look for any pages with noindex that you expect to be cited in AI answers. Remove noindex from pages that should be AI-accessible. For pages with noai tags, decide whether the business cost of reduced AI citation is worth the protection from AI training use. After making changes, run an AEO audit at /aeo/scores to verify that your key pages are passing the crawler access signals correctly.
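If you prefer a scripted check over a crawling tool, the audit can be roughly sketched in Python over pages you have already fetched. Everything here is an assumption for illustration: the audit_pages function, the FLAGGED set, and the sample URLs are hypothetical, and the sketch ignores bot-specific forms such as a user-agent prefix in X-Robots-Tag.

```python
from html.parser import HTMLParser

# Directives the audit should surface on pages you expect to be cited.
FLAGGED = {"noindex", "noai"}


class _MetaRobots(HTMLParser):
    """Gathers raw directive tokens from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").strip().lower() == "robots":
            self.tokens += (attr_map.get("content") or "").lower().split(",")


def audit_pages(pages):
    """pages maps URL -> (html, X-Robots-Tag header value or None).

    Returns {url: sorted list of flagged directives} for pages that have any.
    """
    report = {}
    for url, (html, header) in pages.items():
        parser = _MetaRobots()
        parser.feed(html)
        tokens = parser.tokens + (header or "").lower().split(",")
        flagged = sorted({token.strip() for token in tokens} & FLAGGED)
        if flagged:
            report[url] = flagged
    return report


# Hypothetical, already-fetched pages you expect AI engines to cite.
pages = {
    "/pricing": ('<head><meta name="robots" content="noindex, follow"></head>', None),
    "/blog/guide": ("<head><title>Guide</title></head>", None),
    "/whitepaper.pdf": ("", "noai"),
}
for url, flagged in audit_pages(pages).items():
    print(url, flagged)
```

Any URL that shows up in the report is a candidate for the review described above: remove noindex where the page should be AI-accessible, and make a deliberate decision about each noai.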
