
AI Influence

The AI Influence view shows how AI surfaces interact with your site. It covers two things you can actually measure today:

AI crawler access — which of 29 tracked AI bots your robots.txt, llms.txt and Content-Signal declarations allow or block.
AI referrals — which AI products send you human visitors (ChatGPT, Claude, Perplexity, Gemini, Copilot, DeepSeek, and 20+ others).

These are surfaced across three tabs: Overview (posture summary + robots/signals inspection), Crawlers (the full read-only catalogue table), and Metrics (AI referral analytics).

Two further layers — Citation (where your brand appears inside AI answers) and Assist (probabilistic lift in branded search after AI exposure) — are deferred to V2 because they need either a vendor or risk-aware probabilistic modelling. We won't ship vanity metrics we can't measure rigorously.

You'll find AI Influence under Intelligence → AI Influence in the dashboard.

AI Influence is available on every plan, including Free. AI traffic visibility is a baseline acquisition signal — not a paid feature.

AI as a revenue channel. This page shows AI traffic. When an AI-identified visitor converts, that conversion is also credited to a first-class AI channel in Revenue → Attribution, with the same per-product breakdown — plus a Likely AI (unspecified) bucket for heuristic-detected visits that arrive without a referrer. Use this page for traffic; use Revenue attribution for conversions. (Revenue attribution is shown on the authenticated dashboard, not on public or shared dashboards.)

Overview tab

The Overview answers one question at a glance:

"Can ChatGPT (or Claude, or Perplexity, …) crawl my site? Did I block anything by accident?"

It shows a compact Summary (how many of the 29 tracked bots can crawl you, how many are blocked at the root, AI human visits) and a Robots & Signals card that makes the verdict verifiable — direct "View raw" links to your robots.txt and llms.txt, the parsed Content-Signal declaration, and the explicit list of bots blocked at the root.

Content-Signal

Zenovay parses the Content-Signal directive that Cloudflare's "Managed robots.txt" and a growing number of sites now emit. It expresses AI-usage intent independent of crawl access:

search — may your content be used to build a search index?
ai-input — may it be used as real-time input to a generative answer (RAG/grounding)?
ai-train — may it be used to train or fine-tune a model?

Each is shown as yes, no, or unset. This is the modern, content-specific AI-consent declaration — distinct from a blanket Disallow.
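As an illustration of the yes / no / unset reading, here is a minimal sketch of parsing a Content-Signal value such as search=yes,ai-train=no. The function name and return shape are assumptions made for this example, not Zenovay's parser.

```python
from typing import Optional

# The three signals described above.
SIGNALS = ("search", "ai-input", "ai-train")


def parse_content_signal(raw: str) -> dict[str, Optional[bool]]:
    """Map a Content-Signal value to True (yes), False (no) or None (unset)."""
    result: dict[str, Optional[bool]] = {name: None for name in SIGNALS}
    for part in raw.split(","):
        key, sep, value = part.partition("=")
        key, value = key.strip().lower(), value.strip().lower()
        # Ignore unknown keys and anything that is not an explicit yes/no.
        if sep and key in result and value in ("yes", "no"):
            result[key] = value == "yes"
    return result


print(parse_content_signal("search=yes,ai-train=no"))
```

A signal that does not appear in the declaration stays unset (None) rather than defaulting to no, which mirrors the three-state display.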

Crawlers tab

A read-only table of all 29 tracked AI crawlers with each bot's configured access on your site. It is deliberately not a control panel — Zenovay reads robots.txt; it does not enforce blocks at the edge. It reports what your robots.txt declares.

The 29 bots are grouped into 5 categories. The lists below name representative bots from each category:

Live AI Assistants — fetch pages on demand to answer user questions: ChatGPT-User, Perplexity-User, ClaudeBot, Bingbot (Copilot), Applebot-Extended.
Model Training — harvest content for training corpora: GPTBot, Google-Extended, anthropic-ai, Common Crawl (CCBot), Meta-ExternalAgent, Bytespider, Amazonbot.
AI Browser Agents — autonomous task agents browsing on a user's behalf: ChatGPT-Operator, Claude-Computer-Use, You.com Agent.
Commercial Scrapers — data-mining feeds sold to AI labs: DataForSEO Bot, PetalBot, Webz.io.
Search with AI Overlays — primary search indexes whose results power AI answers: Googlebot (AI Overviews).

What the verdict means

For an AI-content-access tool the only question that matters is "can this bot reach my content?". The verdict is therefore reported in three states:

Allowed — the bot is permitted at your site root. This includes bots that are root-allowed but excluded from generic infrastructure paths (/api, /_next, /e, …) — those exclusions are shown as a caption, not an alarm, because they don't restrict content.
Blocked — the bot is disallowed at the site root by robots.txt.
Unknown — we couldn't read your robots.txt (network error, 5xx, malformed) or the site hasn't had its first check yet (new sites are checked within 24 hours).

RFC 9309 group combining (important if you use Cloudflare Managed robots.txt)

We follow RFC 9309. UA tokens are matched case-insensitively as substrings (so ClaudeBot matches Anthropic-ClaudeBot). Crucially, all User-agent groups that match a bot are combined, then longest-match precedence is applied with Allow winning ties.

This matters because Cloudflare's Managed robots.txt emits a managed group that sets Disallow: / for many AI bots, and site owners frequently add a later custom group that sets Allow: / again for the ones they want for AI visibility. A naive first-match parser would wrongly report those re-allowed bots as Blocked. Zenovay combines the groups and reports each bot's effective access, which matches what a compliant crawler actually does.
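The combining rule can be sketched as follows. This is a simplified illustration, not Zenovay's implementation: the Group structure and function name are hypothetical, paths are matched as plain prefixes (no * or $ wildcards), and the * group is used only when no specific group matches the bot.

```python
# A group is (user-agent tokens, [(directive, path pattern), ...]).
Group = tuple[list[str], list[tuple[str, str]]]


def effective_access(groups: list[Group], bot_ua: str, path: str = "/") -> str:
    """Combine every matching group, then apply longest-match with Allow winning ties."""
    bot = bot_ua.lower()
    # Case-insensitive substring match of each group token against the bot name.
    specific = [rules for agents, rules in groups
                if any(a != "*" and a.lower() in bot for a in agents)]
    fallback = [rules for agents, rules in groups if "*" in agents]
    combined = [rule for rules in (specific or fallback) for rule in rules]

    best_len, verdict = -1, "allowed"  # no matching rule means allowed
    for directive, pattern in combined:
        if not pattern or not path.startswith(pattern):
            continue  # an empty pattern matches nothing
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            verdict = "allowed" if is_allow else "blocked"
    return verdict


groups = [
    (["GPTBot", "ClaudeBot"], [("Disallow", "/")]),  # managed block
    (["ClaudeBot"], [("Allow", "/")]),               # later custom group
]
print(effective_access(groups, "ClaudeBot"))  # allowed
print(effective_access(groups, "GPTBot"))     # blocked
```

A first-match parser would stop at the managed block and report ClaudeBot as blocked. Combining both groups leaves Disallow: / and Allow: / tied at the same length, and the tie goes to Allow.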

The Re-check now button

The header has a Re-check now button. It re-fetches your robots.txt + llms.txt, re-evaluates the full 29-bot catalogue, and fires 5 live HTTP HEAD requests (one representative bot per category) to detect runtime blocks (Cloudflare WAF, CDN bot filters, geo restrictions). Rate-limited to once per minute per site.

The honest caveat

robots.txt is advisory, not enforcement.

Reputable crawlers (OpenAI, Anthropic, Google, Perplexity, Microsoft) honour robots.txt. Less reputable ones ignore it. A "Blocked" verdict means the bot has been politely asked to stay away — your server still has to enforce it if you want real protection. To actively enforce: Cloudflare AI-bot rules, a WAF/rate-limit rule matching User-Agent, or a server-side beacon (Zenovay V1.5 — measures real crawl events rather than configured access).

Metrics tab

The Metrics tab is AI referral analytics — human visitors who arrived from AI products. A visit is classified as AI-originated using four signals, in priority order:

Client-side AI source — the tracker reads a hint from the AI product's in-app browser if present.
Referrer domain match — document.referrer matches a known AI host (chat.openai.com, claude.ai, perplexity.ai, gemini.google.com, copilot.microsoft.com, deepseek.com, you.com, phind.com, t3.chat, kimi.com, +15 more).
UTM source match — the campaign URL carries utm_source=chatgpt, utm_source=claude, etc. (44 known variants).
User-Agent match — visit came from an AI product's in-app browser.

Every detected AI source is listed (not truncated). Each visit stores its ai_source and the ai_detection_method that classified it, with a confidence score 0.0–1.0, so the Detection method distribution panel reconciles exactly with the source breakdown.
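The priority order above can be sketched as a chain of checks where the first signal that fires wins. The lookup tables below are tiny samples, the method labels ("client_hint", "referrer", "utm", "user_agent") are hypothetical names chosen for this example, and the User-Agent marker is a placeholder rather than a real product string.

```python
from typing import Optional
from urllib.parse import urlparse

# Small samples only; the real lists are longer.
REFERRER_HOSTS = {"chat.openai.com": "chatgpt", "claude.ai": "claude", "perplexity.ai": "perplexity"}
UTM_SOURCES = {"chatgpt": "chatgpt", "claude": "claude"}
UA_MARKERS = {"examplechat-app": "examplechat"}  # hypothetical placeholder marker


def classify_visit(client_hint: Optional[str] = None, referrer: str = "",
                   utm_source: str = "", user_agent: str = "") -> Optional[tuple[str, str]]:
    """Return (ai_source, detection_method) for the first signal that matches, else None."""
    # 1. Client-side AI source hint.
    if client_hint:
        return client_hint, "client_hint"
    # 2. Referrer domain (exact host or a subdomain of a known host).
    host = urlparse(referrer).hostname or ""
    for known, source in REFERRER_HOSTS.items():
        if host == known or host.endswith("." + known):
            return source, "referrer"
    # 3. UTM source.
    source = UTM_SOURCES.get(utm_source.lower())
    if source:
        return source, "utm"
    # 4. User-Agent marker.
    ua = user_agent.lower()
    for marker, source in UA_MARKERS.items():
        if marker in ua:
            return source, "user_agent"
    return None


print(classify_visit(referrer="https://claude.ai/chat", utm_source="chatgpt"))
```

Because each visit records exactly one method, counts grouped by method and counts grouped by source add up to the same total.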

Dark AI

A large share of AI-arrived traffic carries no referrer — the user pasted your link from a chat window into a new tab, or the AI surface stripped the referrer. A daily behavioural heuristic catches this "Dark AI" across direct-traffic visitors:

Deep content landing (entered on a long-tail URL, not /)
Single-page focused reading (30–300 seconds, scroll depth > 70%)
First-time visitor with low click interaction
Business-hours arrival

A score ≥ 60 marks the visit as is_ai_traffic=true with ai_detection_method='behavioral_heuristic' and confidence 0.5–0.95. Detection method and confidence are stored per visit so you can filter, export, and audit honestly.
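To make the threshold concrete, here is a rough sketch of an additive score over the four signals. Only the 30–300 second window, the 70% scroll depth, and the cut-off of 60 come from this page. The weights, the click limit, the business-hours window, and every field name are invented for illustration and are not Zenovay's actual values.

```python
from dataclasses import dataclass


@dataclass
class DirectVisit:
    landing_path: str
    pages_viewed: int
    seconds_on_page: float
    scroll_depth: float  # 0.0 to 1.0
    is_first_visit: bool
    clicks: int
    local_hour: int      # 0 to 23
    gpc: bool = False    # visitor sent Sec-GPC: 1


def dark_ai_score(v: DirectVisit) -> int:
    """Additive score; all weights below are hypothetical."""
    if v.gpc:
        return 0  # GPC visitors are excluded from behavioural heuristics
    score = 0
    if v.landing_path != "/":
        score += 30  # deep content landing
    if v.pages_viewed == 1 and 30 <= v.seconds_on_page <= 300 and v.scroll_depth > 0.70:
        score += 35  # single-page focused reading
    if v.is_first_visit and v.clicks <= 2:  # click limit is an assumption
        score += 20
    if 9 <= v.local_hour < 18:  # business-hours window is an assumption
        score += 15
    return score


def is_dark_ai(v: DirectVisit) -> bool:
    return dark_ai_score(v) >= 60


visit = DirectVisit("/guides/pricing-models", 1, 140.0, 0.85, True, 1, 11)
print(dark_ai_score(visit), is_dark_ai(visit))
```

No single signal reaches the threshold in this sketch, so a visit needs several of them together before it is flagged.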

What's deferred to V2

Citation — measuring when your brand or pages appear inside AI answers. Needs a prompt-monitoring vendor or our own crawler. We won't ship numbers we can't substantiate.
Assist — probabilistic estimation of downstream lift in branded search after AI exposure. Surfaced only once we can honestly label confidence.

When V2 ships, every metric will carry an explicit measured vs. inferred label.

Compliance

GPC (Global Privacy Control) is honoured — visitors with Sec-GPC: 1 are not used for behavioural AI heuristics.
IP addresses are hashed with a daily-rotating salt — never stored plaintext.
No cookies are introduced by AI Influence — the existing cookieless tracker remains the only required script.
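One way to hash IP addresses with a daily-rotating salt is sketched below. This page does not specify the algorithm, so HMAC-SHA-256, the in-memory salt store, and the function names are all assumptions for illustration.

```python
import hashlib
import hmac
import secrets
from datetime import date

# Holds only the current day's salt (in-memory store is an assumption).
_salts: dict[date, bytes] = {}


def salt_for(day: date) -> bytes:
    """Return the random salt for a day, discarding any earlier day's salt."""
    if day not in _salts:
        _salts.clear()  # once the salt is gone, old hashes cannot be recomputed
        _salts[day] = secrets.token_bytes(32)
    return _salts[day]


def hash_ip(ip: str, day: date) -> str:
    """Keyed hash of the IP; the plaintext address is never stored."""
    return hmac.new(salt_for(day), ip.encode("utf-8"), hashlib.sha256).hexdigest()


today = date(2026, 5, 16)
print(hash_ip("203.0.113.7", today) == hash_ip("203.0.113.7", today))
```

Within one day the same address always yields the same hash, so visits can be grouped. Across days the salt changes, so hashes from different days cannot be linked.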

API

GET /api/analytics/ai-influence?websiteId={id}&period={7d|30d|...}

Returns:

{
  "crawl": {
    "summary": {
      "total": 29, "allowed": 6, "partial": 19, "blocked": 4,
      "indeterminate": 0, "never_checked": 0,
      "robots_txt_present": true, "llms_txt_present": true,
      "hostname": "example.com", "last_checked_at": "2026-05-16T08:14:23Z",
      "blocked_bots": ["Amazonbot", "Bytespider", "CCBot", "Meta-ExternalAgent"],
      "content_signals": { "search": true, "ai_train": false, "raw": "search=yes,ai-train=no" }
    },
    "crawlers": [{ "ua_token": "GPTBot", "vendor": "OpenAI", "category": "training", "verdict": "allowed", "restricted_paths": [], "...": "..." }]
  },
  "referral": { "summary": { "ai_visitors": 1247, "known_ai_visitors": 1100, "dark_ai_visitors": 147, "...": "..." }, "sources": [...], "trend": [...], "top_pages": [...] }
}

Note: in the raw payload, verdict is one of allowed / partial / blocked / indeterminate / never_checked. The UI collapses allowed + partial into a single Allowed state, and indeterminate + never_checked correspond to the cases described under Unknown (see What the verdict means).
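A client consuming this payload might build the request URL and collapse the raw verdicts into the three UI states, as in the sketch below. The base URL is a placeholder, authentication is not covered on this page and is left out, and mapping indeterminate and never_checked to Unknown is an assumption drawn from the verdict descriptions.

```python
from urllib.parse import urlencode

# Raw verdict -> UI state. The two "Unknown" rows are an assumption.
UI_STATE = {
    "allowed": "Allowed",
    "partial": "Allowed",
    "blocked": "Blocked",
    "indeterminate": "Unknown",
    "never_checked": "Unknown",
}


def build_url(base_url: str, website_id: str, period: str = "30d") -> str:
    """Build the GET URL; base_url is a placeholder for your dashboard host."""
    query = urlencode({"websiteId": website_id, "period": period})
    return f"{base_url}/api/analytics/ai-influence?{query}"


def ui_counts(payload: dict) -> dict[str, int]:
    """Count crawlers per UI state from the crawl.crawlers array."""
    counts = {"Allowed": 0, "Blocked": 0, "Unknown": 0}
    for crawler in payload["crawl"]["crawlers"]:
        counts[UI_STATE.get(crawler["verdict"], "Unknown")] += 1
    return counts


sample = {"crawl": {"crawlers": [
    {"ua_token": "GPTBot", "verdict": "allowed"},
    {"ua_token": "ClaudeBot", "verdict": "partial"},
    {"ua_token": "CCBot", "verdict": "blocked"},
]}}
print(build_url("https://dashboard.example", "site_123", "7d"))
print(ui_counts(sample))
```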

POST /api/analytics/ai-influence-probe?websiteId={id}

Triggers a synchronous re-fetch + parse + 5 live HTTP HEAD probes. Rate-limited to 1 request/minute/website.
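Since the probe endpoint accepts one request per minute per website, a caller may want a small client-side guard so it does not send requests that would be rejected. The class below is a hypothetical helper, not part of any Zenovay SDK, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable


class ProbeThrottle:
    """Allow at most one probe per website per min_interval seconds."""

    def __init__(self, min_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._last: dict[str, float] = {}

    def try_acquire(self, website_id: str) -> bool:
        """Return True and record the time if a probe may be sent now."""
        now = self._clock()
        last = self._last.get(website_id)
        if last is not None and now - last < self._min_interval:
            return False
        self._last[website_id] = now
        return True


throttle = ProbeThrottle()
if throttle.try_acquire("site_123"):
    print("ok to POST /api/analytics/ai-influence-probe?websiteId=site_123")
```

The limit is tracked per website ID, so probing one site does not delay a probe for another.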
