
Google-Agent Is Crawling Your Site and Ignoring robots.txt: What SEOs Need to Know

On March 20, 2026, Google added a new AI crawler called Google-Agent to its official user-triggered fetchers list. Unlike Googlebot, it completely bypasses robots.txt. Here's what agency SEOs need to know and do right now.

March 29, 2026

If you manage robots.txt files for clients, your playbook just changed. On March 20, 2026, Google officially added the Google-Agent crawler to its user-triggered fetchers documentation. This is not another minor tweak to Google's crawling infrastructure. Google-Agent is a new category of web visitor entirely, and it ignores robots.txt by design.

I spent the past week digging into the documentation, reading the IETF drafts, and parsing server logs. Here's everything I know, what it means for the sites you manage, and the concrete steps I'd take today.

What Is the Google-Agent Crawler?

Google-Agent is a user-triggered fetcher used by AI systems hosted on Google's infrastructure to browse the web and perform actions on behalf of a user. It is the official identity for products like Project Mariner, Google DeepMind's AI agent that navigates websites autonomously in a Chrome browser.

Unlike Googlebot, which autonomously crawls pages to build Google's search index, Google-Agent only fires when a real human asks an AI to do something. Think of a user telling Project Mariner: "Find me flights to Berlin under $400 and compare three options." The AI agent opens websites, clicks through pages, fills forms, and extracts data. Every one of those page requests arrives with the Google-Agent user agent string.

According to Google's crawling documentation, the user agent was "rolling out over the next few weeks" as of March 20. You may already be seeing this traffic in your server logs.

How Does Google-Agent Differ from Googlebot and Google-Extended?

These three crawlers serve completely different purposes, and conflating them will cause problems for your clients. Here's how they break down:

| Crawler | Purpose | Follows robots.txt? | Category |
|---|---|---|---|
| Googlebot | Indexes pages for Google Search | Yes | Autonomous crawler |
| Google-Extended | Trains Gemini AI models | Yes | Autonomous crawler |
| Google-Agent | Browses the web on behalf of a user (e.g., Project Mariner) | No | User-triggered fetcher |

The critical distinction: Googlebot crawls to index. Google-Extended crawls to train. Google-Agent navigates to act.

Google-Agent doesn't build an index. It doesn't train models. It executes a specific task for a specific user in real time. And because it's classified as a user-triggered fetcher, the same category as Google Site Verifier and Google-NotebookLM, it bypasses robots.txt entirely.

This is not a loophole. Google's documentation explicitly states that user-triggered fetchers ignore robots.txt because the fetch was requested by a human user, not by an autonomous system.

Why Does Google-Agent Bypass robots.txt?

User-triggered fetchers bypass robots.txt because the request originates from a human, not from an automated crawling schedule. Google's logic is that when a person actively asks an AI to retrieve a specific page, the fetch is functionally equivalent to that person visiting the page in their browser.

This is consistent with how other user-triggered fetchers work. Google-NotebookLM, for example, fetches pages that users explicitly add as sources, and it also ignores robots.txt directives. Google Site Verifier does the same when a user initiates ownership verification through Search Console.

The practical implication is significant: if you've blocked certain paths in robots.txt, those blocks stop Googlebot but don't stop Google-Agent. A user can direct Project Mariner to retrieve content from any URL on your site, regardless of your crawl directives.

This isn't necessarily alarming for most sites. But for clients with paywalled content, staging environments accidentally exposed, or sensitive admin paths, it's worth understanding.

What Is Project Mariner and Why Should You Care?

Project Mariner is a research prototype from Google DeepMind that automates web browsing tasks. It can navigate websites, click buttons, fill forms, compare products, and extract information, all autonomously within a Chrome browser instance.

As of early 2026, Mariner is available to Google AI Ultra subscribers in the US at $249.99/month. It can handle up to 10 simultaneous tasks and has scored 83.5% on the WebVoyager benchmark for real-world web task completion.

For SEOs, the key detail is this: when Project Mariner visits your client's site, it identifies itself as Google-Agent. It operates a full Chrome instance, meaning it renders JavaScript, interacts with dynamic content, and behaves like a real browser session. This is fundamentally different from how traditional crawlers interact with your pages.

Google has also announced plans to integrate Mariner capabilities into AI Mode in Search and the Gemini app. The traffic volume from Google-Agent will only grow.

What About the web-bot-auth Protocol?

Google's documentation includes a notable detail: the company is experimenting with the web-bot-auth protocol using the identity https://agent.bot.goog.

Web Bot Auth is an emerging IETF standard that enables cryptographic verification of bots. Instead of relying on user-agent strings (which are trivially spoofed), the agent signs its HTTP requests with a private key. Your server can verify the signature using a published public key. Think of it as a digital passport for bots.

The IETF Working Group for Web Bot Auth is actively meeting, with a virtual session scheduled for April 13, 2026. Companies including Google, Amazon, Cloudflare, Akamai, and OpenAI are pushing this standard forward.

But here's the reality check: web-bot-auth is experimental and not production-ready. As of March 2026, there is no mechanism for site owners to use this protocol to control or restrict Google-Agent access. WAF providers like Cloudflare and Akamai may eventually support automatic validation of Web Bot Auth signatures, but that's future-state.

For now, the protocol is something to monitor, not something to act on.

How Does This Fit into Google's Broader Crawling Infrastructure Changes?

Google-Agent didn't appear in isolation. Over the past 12 months, Google has been overhauling its entire crawling documentation and infrastructure:

  • November 2025: Migrated all crawling docs from Google Search Central to a dedicated crawling infrastructure site, reflecting that Google's crawlers serve far more than search (Shopping, News, Gemini, AdSense, AI agents)
  • December 2025: Migrated additional docs covering crawl budget optimization, HTTP status codes, and DNS error debugging
  • January 2026: Added Google Messages to the user-triggered fetchers list
  • February 2026: Updated file size limits. Google's crawlers now default to the first 15MB of a file, with individual products setting different limits
  • March 2026: Added Google-Agent with its own dedicated IP range file (user-triggered-agents.json)

A key insight from the Search Off the Record podcast (March 12, 2026): Googlebot is not a standalone program. It's a name used by one team within a shared crawling infrastructure that also serves NotebookLM, Gemini, AdSense, Shopping, and now AI agents. Google-Agent is simply the newest named client of that shared system.

This matters because it means Google's crawling infrastructure is increasingly multi-purpose. The days of "managing Googlebot" as a single concern are over. You're now managing access for an ecosystem of crawlers, fetchers, and agents, each with different rules.

What Should You Do Right Now?

Here's my practical checklist for agency SEOs managing client sites. I'd work through this in the next two weeks:

1. Update Your Log Analysis

  • Add "Google-Agent" as a tracked user agent in your server log analysis
  • Monitor for traffic volume, request patterns, and which URLs are being fetched
  • Compare Google-Agent traffic against your Googlebot traffic to understand the ratio
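As a starting point, here is a minimal sketch of that log check, assuming a standard combined-format access log. The sample log lines and the exact Google-Agent user agent string are illustrative assumptions; adapt the regex and the match token to what actually appears in your logs.

```python
import re
from collections import Counter

# Matches the request and user-agent fields of a combined-format log line.
# Adjust this pattern if your server uses a custom log format.
LOG_LINE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def tally_agents(lines):
    """Return hit counts per user agent, plus the paths Google-Agent touched."""
    counts, agent_paths = Counter(), []
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue
        ua = m.group("ua")
        counts[ua] += 1
        if "Google-Agent" in ua:  # the documented UA token; full string may vary
            agent_paths.append(m.group("path"))
    return counts, agent_paths

# Illustrative sample lines, not real Google traffic:
sample = [
    '1.2.3.4 - - [29/Mar/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Google-Agent)"',
    '5.6.7.8 - - [29/Mar/2026:10:00:01 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
counts, paths = tally_agents(sample)
print(paths)  # → ['/pricing']
```

Feeding real log files through this gives you both the Google-Agent/Googlebot ratio and the specific URLs agents are fetching.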

2. Audit What robots.txt Actually Protects

Review every client's robots.txt with fresh eyes. For each Disallow rule, ask: "If this path is accessible to user-triggered fetchers like Google-Agent, is that a problem?"

Paths that may need additional protection beyond robots.txt:

  • Staging or development environments
  • Admin or login pages
  • Internal search results pages generating infinite URL patterns
  • Paywalled or gated content

If a path genuinely needs to be restricted, robots.txt alone is not sufficient. Use server-side access controls (authentication, IP allowlisting, or WAF rules) for anything sensitive.
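You can make the audit question concrete with Python's standard-library robots.txt parser. The robots.txt content below is a made-up example; the point it demonstrates is that a Disallow rule binds crawlers that consult the file, and Google-Agent never consults it.

```python
from urllib.robotparser import RobotFileParser

# Example rules of the kind you might find in a client's robots.txt:
rules = """\
User-agent: *
Disallow: /staging/
Disallow: /wp-admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Autonomous crawlers such as Googlebot honor these rules:
print(parser.can_fetch("Googlebot", "https://example.com/staging/"))  # False

# But a user-triggered fetcher like Google-Agent never reads this file,
# so /staging/ stays reachable unless you add server-side protection
# (authentication, IP allowlisting, or a WAF rule).
```

Run each client's Disallow list through a check like this and ask, path by path, whether "blocked for Googlebot but open to Google-Agent" is acceptable.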

3. Verify Legitimate Google-Agent Requests

Google-Agent uses its own IP range file: user-triggered-agents.json. Use this to verify that requests claiming to be Google-Agent are actually from Google. User agent strings can be spoofed, so IP verification is your reliable check.

Google now updates its IP range JSON files daily rather than weekly, so the ranges you verify against stay current.
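The verification logic is a few lines with the standard-library ipaddress module. The JSON schema below is an assumption modeled on Google's other IP range files (such as googlebot.json), which publish a "prefixes" list of ipv4Prefix/ipv6Prefix entries, and the ranges shown are documentation placeholders; fetch the real user-triggered-agents.json and confirm its schema before relying on this.

```python
import ipaddress

# Assumed schema, mirroring googlebot.json; placeholder ranges, not Google's:
ranges_doc = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}

networks = [
    ipaddress.ip_network(p.get("ipv4Prefix") or p.get("ipv6Prefix"))
    for p in ranges_doc["prefixes"]
]

def is_verified_google_agent(remote_ip: str) -> bool:
    """True if the request IP falls inside a published Google-Agent range."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in networks)

print(is_verified_google_agent("192.0.2.77"))   # True: inside a listed range
print(is_verified_google_agent("203.0.113.9"))  # False: likely a spoofed UA
```

In production you would download the JSON file on a daily schedule (matching Google's update cadence) rather than hard-coding it, and treat any request claiming the Google-Agent user agent from an unlisted IP as untrusted.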

4. Review Your WAF and Rate-Limiting Rules

If your client's firewall or rate-limiting treats all "bots" the same, you risk blocking legitimate Google-Agent traffic. That means blocking real users who are using Google's AI tools to interact with your client's site.

Treat Google-Agent as a legitimate user-driven request. Configure your WAF to recognize and appropriately handle this traffic rather than blocking it indiscriminately.

5. Understand the Google-Extended Distinction

Blocking Google-Extended in robots.txt prevents your content from being used to train Gemini models. That's still valid and still works. But blocking Google-Extended has zero effect on Google-Agent. These are separate systems with separate purposes.

If a client asks "are we blocking AI crawlers?" the answer is now more nuanced than a single robots.txt directive.

6. Prepare Your Sites for Agentic Traffic

AI agents interact with your site differently than human visitors or traditional crawlers. They fill forms, click through multi-step flows, and interact with dynamic elements. Make sure your sites are ready:

  • Clean, semantic HTML helps agents understand page structure
  • Proper form labels and accessible markup improve agent interaction accuracy
  • Fast page loads matter even more when an agent is executing multi-step tasks
  • Review your technical SEO fundamentals to ensure your infrastructure handles this new traffic type cleanly

7. Stay Current on GEO and AI Crawler Access

The broader landscape of AI crawlers is shifting fast. Google-Agent is one piece of a larger puzzle that includes GPTBot, ClaudeBot, PerplexityBot, and others. Each has different rules about robots.txt compliance.

I'd strongly recommend reading through our AI SEO guide to understand the full picture, from llms.txt files to schema markup for AI citation. Managing AI crawler access is becoming a core part of technical SEO, not an edge case.

What This Means for the Future of Crawl Control

The introduction of Google-Agent highlights a growing tension in how the web works. Robots.txt was designed in 1994 for a web where bots autonomously crawled sites to build search indexes. It was never designed for a world where AI agents act as proxies for human users.

Google's classification of Google-Agent as a user-triggered fetcher is logically consistent: if a human asks for the content, it's a user request. But it also means site owners have less control over how AI systems access their content than many assumed.

The web-bot-auth protocol may eventually provide more granular controls. The IETF working group has milestones targeting mid-2026 for initial specifications. But until those standards are ratified and implemented by browsers and WAFs, the practical reality is:

  • robots.txt controls autonomous crawlers (Googlebot, Google-Extended, GPTBot, etc.)
  • robots.txt does not control user-triggered fetchers (Google-Agent, Google-NotebookLM, etc.)
  • Server-side access controls are the only reliable way to restrict access to sensitive content

For most sites, this isn't a crisis. Google-Agent traffic is currently limited to US-based AI Ultra subscribers using Project Mariner. But that will change as Google expands these capabilities globally and integrates them into more products.

The agencies that update their playbooks now will be ahead of the curve. The ones that discover Google-Agent in their logs six months from now and scramble to understand it won't be.

Start with your log analysis. Audit your robots.txt assumptions. Brief your clients. The agentic web is here.

Frequently Asked Questions

Can you block Google-Agent with robots.txt?

No. Google-Agent is classified as a user-triggered fetcher, which means it bypasses robots.txt directives entirely. This is by design, not a bug. Google treats user-triggered fetches the same way it treats a human visiting a page directly. To restrict access to specific paths, you need server-side controls like authentication, IP-based rules, or WAF configurations.

What is the difference between Google-Agent and Google-Extended?

Google-Extended is an autonomous crawler that collects content for training Google's Gemini AI models. You can block it via robots.txt. Google-Agent is a user-triggered fetcher used by AI products like Project Mariner to browse the web on behalf of a human user. It ignores robots.txt. They serve completely different purposes and operate under different rules.

Does Google-Agent affect my search rankings?

No. Google-Agent is not a search crawler and does not contribute to Google's search index. Your rankings in Google Search are determined by Googlebot, not by Google-Agent. However, how your site performs when an AI agent interacts with it (page speed, form usability, content clarity) will matter as agentic browsing becomes more common.

How do I verify if a request is really from Google-Agent?

Google publishes a dedicated IP range file called user-triggered-agents.json for Google-Agent. Cross-reference incoming requests that claim the Google-Agent user agent against these IP ranges. User agent strings can be spoofed, so IP verification is the reliable method. Google now updates these IP range files daily for improved accuracy.

What is the web-bot-auth protocol Google mentioned?

Web-bot-auth is an emerging IETF standard for cryptographic bot verification. Instead of relying on user-agent strings, bots sign HTTP requests with a private key that servers can verify using a published public key. Google is experimenting with this protocol for Google-Agent using the https://agent.bot.goog identity. It's not production-ready yet, but it signals the future direction of bot identification on the web.
