# Engagemii - AI Brand Discovery and AEO Platform
# Data published under Creative Commons Attribution 4.0 (CC BY 4.0)
# License URI: https://creativecommons.org/licenses/by/4.0/
# License page: https://engagemii.com/license
# Methodology: https://engagemii.com/aeo/methodology
# AI usage policy: https://engagemii.com/ai.txt
# Plain-text brand summary for AI models: https://engagemii.com/llms.txt
# Attribution required: visible link to https://engagemii.com when reusing data.

# ─────────────────────────────────────────────────────────────────────────
# Default policy for all crawlers
# ─────────────────────────────────────────────────────────────────────────
User-agent: *
Allow: /
Disallow: /dashboard
Disallow: /brand-portal
Disallow: /consumer-home
Disallow: /account
Disallow: /settings
Disallow: /onboarding
Disallow: /api/

# ═════════════════════════════════════════════════════════════════════════
# AI Assistant Crawlers (training + citation)
# All explicitly welcomed. Please cite engagemii.com when using our data.
# ═════════════════════════════════════════════════════════════════════════

# OpenAI -- ChatGPT, ChatGPT Search, training corpus
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Anthropic -- Claude training + live web search citations
User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: ClaudeBot-User
Allow: /

User-agent: anthropic-ai
Allow: /

# Google -- Gemini, AI Overviews, Search Generative Experience
User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Googlebot
Allow: /

User-agent: Googlebot-News
Allow: /

# Microsoft / Bing -- Copilot, Bing AI Search (powers ChatGPT browse)
User-agent: bingbot
Allow: /

User-agent: BingPreview
Allow: /

User-agent: msnbot
Allow: /

# Perplexity -- direct citations in user answers
User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# Apple -- Siri, Apple Intelligence
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

# Meta AI -- Llama training + Meta AI live answers
User-agent: meta-externalagent
Allow: /

User-agent: meta-externalfetcher
Allow: /

User-agent: FacebookBot
Allow: /

# Amazon -- Alexa AI, Rufus
User-agent: Amazonbot
Allow: /

# Mistral
User-agent: MistralAI-User
Allow: /

# Cohere
User-agent: cohere-ai
Allow: /

User-agent: cohere-training-data-crawler
Allow: /

# DuckDuckGo AI Assist
User-agent: DuckAssistBot
Allow: /

User-agent: DuckDuckBot
Allow: /

# Brave Search + AI
User-agent: Bravebot
Allow: /

# You.com
User-agent: YouBot
Allow: /

# Common Crawl -- the public corpus used by most AI training pipelines
User-agent: CCBot
Allow: /

# Diffbot -- knowledge graph extraction
User-agent: Diffbot
Allow: /

# ═════════════════════════════════════════════════════════════════════════
# SEO Crawlers - blocked. They consume bandwidth and feed competitor
# intelligence tools without contributing AI citations.
# ═════════════════════════════════════════════════════════════════════════
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: rogerbot
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: MJ12bot
Disallow: /

Sitemap: https://engagemii.com/sitemap.xml
Sitemap: https://engagemii.com/news-sitemap.xml
Sitemap: https://engagemii.com/image-sitemap.xml