How We Trained a Two-Stage Model to Find What AI Bots Actually Look At

A reasonable reader of our research brief will arrive at Section 3 and want to know whether the findings are reproducible. The numbers are large. The cliff at AEO score 8 is dramatic. The cohort gap between the top 0.1 percent and the bottom 90 percent is a factor of 30. Before you act on any of that, you want to know how the analysis was constructed.

This article is the long-form version of the methodology section in the brief. If you are technically inclined and want to know what we actually did, here it is.

The data

Our training set is a snapshot of the Engagemii brand directory taken on 2026-05-29. Each row is one root domain. Each row carries 22 features. Most of them are binary flags indicating the presence or absence of a structured-data signal: FAQ schema, Organization schema, llms.txt, multi-page JSON-LD blocks, contact email visibility, US state location, an identity-mismatch flag for cases where the brand name does not appear in the page title or H1.

Each row also carries an AEO score (0 to 10, computed independently of the model) and an observed AI bot crawl count drawn from our bot tracking pipeline. We observe crawls from GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, PerplexityBot, Anthropic AI, Google-Extended, Applebot, AmazonBot, Meta-ExternalAgent, Bytespider, CCBot, and approximately ten other bots that AI operators have publicly identified.

The dataset has 1,187,128 brands total. 395,022 of those have been crawled at least once. The remaining 792,106 have never been touched by an AI bot. Both groups are in the training data, which is essential because the most interesting question is what predicts being in the crawled group at all.

Why two models, not one

A naive approach would train a single model to predict crawl count directly from structured-data features. That approach is statistically tempting because it gives you one regression coefficient per feature, which feels clean and reportable. It is also misleading.

The reason it misleads is multicollinearity. We already publish an AEO score for every brand. The AEO score is itself a function of the same structured-data features. If you put the score and its constituent features into a single regression, the score absorbs most of the explanatory credit and each individual feature looks tiny. The model is technically correct (the features predict the score, the score predicts crawls), but the per-feature coefficients are wrong because they have been competed away by their own sum.
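The effect is easy to reproduce on synthetic data. The sketch below is illustrative only: the three flags, their weights, the noise levels, and the 0.5 slope are all made-up assumptions, and it uses ordinary least squares rather than anything from our pipeline. It builds a score from the flags, lets crawls depend on the score, and fits the regression with and without the score as an extra column.

```python
import random

def solve_ols(rows, targets):
    """Least-squares coefficients via normal equations and Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(0)
weights = [1.0, 1.5, 2.0]  # hypothetical score contribution of each flag
features, scores, log_crawls = [], [], []
for _ in range(20000):
    flags = [rng.randint(0, 1) for _ in weights]
    score = sum(w * f for w, f in zip(weights, flags)) + rng.gauss(0, 0.5)
    features.append(flags)
    scores.append(score)
    # Assumption: flags influence crawls only through the score.
    log_crawls.append(0.5 * score + rng.gauss(0, 1.0))

# Column 0 is the intercept in both fits.
features_only = solve_ols([[1.0] + f for f in features], log_crawls)
with_score = solve_ols([[1.0] + f + [s] for f, s in zip(features, scores)], log_crawls)

print("flags only:      ", [round(c, 2) for c in features_only[1:]])
print("flags with score:", [round(c, 2) for c in with_score[1:4]])
print("score coefficient:", round(with_score[4], 2))
```

In the flags-only fit each flag keeps a clearly positive coefficient. Once the score is added, the flag coefficients collapse toward zero and the score takes the credit, even though the flags are what produced the score in the first place.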

So we built two models, in two stages, and chained them at the end.

Stage 1: features to score

The first model is a LightGBM gradient-boosted regressor that predicts a brand's AEO score from its 17 binary structured-data feature flags. We trained on 80 percent of the data and evaluated on a held-out 20 percent. The R squared on the holdout is 0.633, which means structured-data features explain about 63 percent of AEO score variance. That is high enough to attribute score lifts back to individual features with confidence.
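For readers who want the holdout metric spelled out, here is a minimal sketch of an 80/20 split and the usual R squared calculation (one minus residual sum of squares over total sum of squares). The helper names, the seeded shuffle, and the toy numbers are assumptions for illustration; the article does not specify how the split was drawn.

```python
import random

def train_holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and cut off the last fraction as holdout."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Toy usage: ten row ids split 8/2, and a hand-made set of predictions.
train, holdout = train_holdout_split(range(10))
toy_r2 = r_squared([2, 4, 6, 8], [2.5, 3.5, 6.5, 7.5])
print(len(train), len(holdout), toy_r2)
```

The metric is always computed on rows the model never saw during training, which is what makes 0.633 a statement about generalization rather than memorization.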

Per-feature attribution uses SHAP (SHapley Additive exPlanations) values, computed on a 50,000-row sample of the training data. SHAP gives a per-feature, per-brand contribution to the predicted score. Averaging the absolute SHAP values across the sampled brands gives a global feature importance. We use both: global importances drive the per-feature ranking in the brief, and per-brand SHAP values power the per-brand fix list that the public AEO score returns.
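To make "per-feature, per-brand contribution" concrete, the sketch below computes exact Shapley values for a toy three-flag scoring function by enumerating every feature ordering. The function toy_score, its weights, and the all-zeros baseline are hypothetical, and this brute-force version is a stand-in for illustration, not the tree-based SHAP computation run against the LightGBM model.

```python
from itertools import permutations

FEATURES = ("faq_schema", "org_schema", "llms_txt")

def toy_score(flags):
    """Hypothetical score: additive weights plus one interaction term."""
    faq, org, llms = flags["faq_schema"], flags["org_schema"], flags["llms_txt"]
    return 2.0 + 1.5 * faq + 1.0 * org + 0.5 * llms + 1.0 * faq * org

def shapley_values(model, brand, baseline):
    """Average each feature's marginal contribution over all orderings."""
    totals = {name: 0.0 for name in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = brand[name]  # switch this feature to the brand's value
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

baseline = {name: 0 for name in FEATURES}
brand = {"faq_schema": 1, "org_schema": 1, "llms_txt": 0}
contributions = shapley_values(toy_score, brand, baseline)
print(contributions)
```

The contributions sum to the gap between the brand's score and the baseline score. That additivity is what lets a per-brand breakdown be read as a fix list: each missing flag has its own share of the score.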

Stage 2: score to crawls

The second model is a separate LightGBM regressor that predicts the natural log of (crawl count + 1) from the AEO score alone. We deliberately exclude business category, domain popularity, and other structural controls from this model.
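The log target is worth a quick illustration, because roughly two thirds of the rows have a crawl count of zero. A minimal sketch, with hypothetical helper names, of the transform and its inverse:

```python
import math

def to_log_target(crawl_count):
    """Natural log of (crawl count + 1); defined even when the count is zero."""
    return math.log1p(crawl_count)

def from_log_target(log_value):
    """Invert the transform to get back to a crawl count."""
    return math.expm1(log_value)

for count in (0, 1, 250, 10_000):  # hypothetical crawl counts
    print(count, round(to_log_target(count), 3))
```

Adding one before taking the log keeps never-crawled brands in the data at a target of exactly zero, and the log compresses the long right tail so that a handful of very heavily crawled brands do not dominate the fit.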

Excluding controls is a methodological choice that needs justification. Including them is statistically tempting because it would isolate the score's effect from confounders. The problem is that a brand cannot change its business category in response to a fix. A dentist cannot decide to be an ecommerce store. The unconditional score-to-crawls relationship is therefore the relationship that matters for any product claim about what AEO investment buys.

The Stage 2 holdout R squared is 0.123. AEO score on its own explains about 12 percent of the variance in log crawl count. The rest of the variance (88 percent) lives in business category, domain popularity, brand recognition, backlink graph density, and brand-specific noise. This is an honest finding. It means our model is not a crystal ball for individual brands. It is, however, an accurate model of the per-point effect of AEO score on expected crawl rate, which is the variable a brand can actually move.

Chaining the two stages

To translate a feature's Stage 1 score lift into a Stage 2 crawl impact, we chain the two. For any feature whose presence raises a brand's expected score by some delta, the expected crawl multiplier is approximately (1 + 0.04)^delta. In other words, each additional score point multiplies expected crawls by roughly 1.04, and the effect compounds across points. That is the source of the compound table in Section 3.4 of the brief and the dedicated article on per-point compounding.
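The chaining arithmetic can be sketched in a few lines. The 0.04 per-point figure is the one quoted above; the function name and the example deltas are hypothetical and are not values from the brief.

```python
PER_POINT_LIFT = 0.04  # per-point figure quoted in the text

def crawl_multiplier(score_delta, per_point_lift=PER_POINT_LIFT):
    """Expected crawl multiplier for a given lift in AEO score."""
    return (1.0 + per_point_lift) ** score_delta

for delta in (0.5, 1.0, 2.0, 3.0):  # hypothetical score lifts
    print(f"score +{delta}: crawls x{crawl_multiplier(delta):.4f}")
```

Because the exponent is the score lift, fixes that each add a fraction of a point combine multiplicatively rather than additively.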

We publish the chained values in the per-feature ranking. We deliberately do not publish raw per-feature SHAP values, because they are noisier and easier to misread. Chained values are conservative and interpretable.

The model's blind spots

Three caveats are worth being explicit about. First, the Stage 2 R squared of 0.123 means most of the crawl-count variance lives outside the model. AEO score is the largest single input a brand can move, not the only input that drives crawls.

Second, the analysis is observational, not experimental. We did not randomize a treatment group to receive structured-data fixes and withhold them from a matched control. The causal interpretation (fix the signals, score rises, crawls rise) is consistent with how AI bots are documented to work, but it is supported by correlation plus a plausible mechanism rather than by direct experimental evidence. We are setting up the experimental version of this study with Engagemii Fix-It customers, but it requires 30 to 60 days of post-fix crawl data to read out.

Third, the bot sample is opportunistic. We see crawls that reach our directory pages, not crawls on the brand's own site. The directory is large and frequently crawled, which makes it a usable proxy, but a brand-side measurement instrument is a future addition we want to make. Some directionally surprising findings (a brand might be crawled heavily on its own domain but rarely on our directory page, or vice versa) will only surface once that instrument is in place.

Why publish a model that explains only 12 percent of variance

Two reasons. First, the 12 percent that AEO score explains is the part a brand can change. Business category and brand recognition explain more of the total variance, but you cannot move them. So a model that explains less of the total variance can be more useful than a model that explains more, as long as it accurately captures the part of the variance that is under your control.

Second, the descriptive cuts (the cohort table and the score-band table in Sections 3.2 and 3.3 of the brief) tell the same story as the model with much less smoothing. The top 0.1 percent of crawled brands have a 75-times higher rate of AEO score 8+ adoption than the bottom 90 percent. That ratio does not come from a model. It comes directly from counting. When the descriptive cuts and the model agree, we trust the finding. When they disagree, we report the descriptive cut and treat the model as suggestive.
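A cohort ratio of this kind needs nothing more than sorting and counting. The sketch below shows the shape of the calculation on ten made-up rows; the function names, the toy data, and the cohort fractions are illustrative assumptions, not the directory data or the exact procedure behind the brief's tables.

```python
def adoption_rate(rows, threshold=8):
    """Share of (crawl_count, aeo_score) rows scoring at or above the threshold."""
    return sum(1 for _, score in rows if score >= threshold) / len(rows)

def cohort_ratio(rows, top_fraction, bottom_fraction, threshold=8):
    """Adoption rate in the most-crawled cohort divided by the least-crawled one."""
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    bottom_n = max(1, round(len(ranked) * bottom_fraction))
    top, bottom = ranked[:top_n], ranked[len(ranked) - bottom_n:]
    bottom_rate = adoption_rate(bottom, threshold)
    if bottom_rate == 0:
        return float("inf")
    return adoption_rate(top, threshold) / bottom_rate

# Made-up (crawl_count, aeo_score) rows, with wider cohorts than the brief uses
# so that ten rows are enough to fill both groups.
toy_rows = [(900, 9), (500, 8), (100, 7), (50, 8), (20, 5),
            (10, 8), (5, 4), (2, 3), (0, 2), (0, 6)]
toy_ratio = cohort_ratio(toy_rows, top_fraction=0.2, bottom_fraction=0.5)
print(toy_ratio)
```

No parameters are fitted anywhere in this calculation, which is why a descriptive cut like this serves as an independent check on the model.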

All five major findings in the brief are supported by both the model and the descriptive data. That agreement is why we were comfortable publishing.

If you want to reproduce this

The features, AEO scores, and crawl counts that produced this analysis are derived from the public Engagemii brand directory. Anyone who wants to verify the directional findings can take any sample of public sites, score them via the free AEO endpoint at engagemii.com/aeo, observe their public crawl behavior, and replicate the cohort patterns described in Section 3.2 of the brief.

We do not publish the trained model weights. We do publish the full methodology, the descriptive cuts, the goodness-of-fit metrics, and the caveats above. That is enough for a thoughtful reader to evaluate the work.

About this analysis

The full methodology, model architecture, training procedure, and limitations are in Section 2 of Engagemii's research brief at engagemii.com/research/aeo-crawl-drivers.

If you want to cite this article, the URL is engagemii.com/blog/how-we-trained-aeo-crawl-model.
