A 2026 analysis of 23,000 LLM citations across ChatGPT, Perplexity, Gemini, and Claude found that 92% of brands are invisible in AI search results (Omniscient Digital, 2026). For a significant chunk of those invisible stores, the root cause is not missing schema, weak content, or bad SEO. It is a single line in their robots.txt file that tells AI crawlers to stay away.
If your store blocks GPTBot, ClaudeBot, PerplexityBot, or Google-Extended, your products will never appear in AI shopping recommendations no matter how good your structured data is. This guide walks through exactly how to audit your robots.txt, identify which AI user-agents you are blocking, test what each crawler sees, and fix your configuration without opening your site to bad bots.
Why robots.txt Matters More Than Ever for Ecommerce
The robots.txt file sits at the root of every domain (yourstore.com/robots.txt) and tells automated crawlers which pages they can and cannot access. Historically, ecommerce teams used it to manage Googlebot and prevent index bloat: blocking cart pages, checkout flows, and internal search results.
But in 2026, the crawler landscape expanded dramatically. OpenAI, Anthropic, Perplexity, Google (for AI training), and a dozen smaller AI companies all run dedicated bots. With 45 billion AI search sessions happening monthly and ChatGPT alone holding 64.5% market share (Stackmatix, 2026), blocking these bots is equivalent to de-indexing your store from the fastest-growing search channel.
Bain research shows 80% of consumers rely on zero-click AI results at least 40% of the time (Bain, 2026). When a shopper asks ChatGPT “what’s the best protein powder under $50?”, your store appears only if GPTBot was allowed to crawl your product pages. If it was blocked, ChatGPT recommends your competitor instead.
The AI Crawlers You Need to Know About
Here are the major AI crawler user-agents relevant to ecommerce stores in 2026:
| User-Agent | Operator | Purpose | Default Access |
|---|---|---|---|
| GPTBot | OpenAI | ChatGPT training and live search | Often blocked |
| ChatGPT-User | OpenAI | ChatGPT real-time browsing | Often blocked |
| ClaudeBot | Anthropic | Claude training and citations | Often blocked |
| PerplexityBot | Perplexity | AI search indexing | Mixed |
| Google-Extended | Google | AI training (separate from Googlebot) | Usually allowed |
| Bytespider | ByteDance | AI training | Varies |
| cohere-ai | Cohere | Enterprise AI citations | Usually allowed |
| Amazonbot | Amazon | Product indexing, Alexa | Varies |
| Applebot-Extended | Apple | Siri, Spotlight AI | Usually allowed |
A critical distinction: Googlebot and Google-Extended are separate user-agents. Googlebot handles traditional search indexing. Google-Extended controls whether Google can use your content for AI model training and AI Overviews. Blocking Google-Extended does not affect your regular Google rankings, but it does affect whether your store appears in Google AI Mode responses.
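The distinction shows up directly in how you write rules. As a minimal illustration, this fragment keeps traditional Google Search indexing fully open while opting out of AI training only (most stores following this guide will want the opposite, with Google-Extended allowed):

```
# Traditional Google Search indexing stays on
User-agent: Googlebot
Allow: /

# Opt out of Google AI training and AI Overviews only
User-agent: Google-Extended
Disallow: /
```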
Step 1: Download and Read Your Current robots.txt
Open your terminal and fetch your current file:
curl -s https://yourstore.com/robots.txt
If you manage multiple stores, save each one:
curl -s https://yourstore.com/robots.txt -o robots-yourstore.txt
curl -s https://yourotherstore.com/robots.txt -o robots-other.txt
What to Look For
Scan the output for any Disallow rules targeting AI crawlers. The most common blocking patterns:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: *
Disallow: /
That last one (User-agent: * with Disallow: /) blocks everything, including all AI crawlers. If you see this, your store is invisible to every bot on the internet except those that ignore robots.txt entirely.
Common Hosting Platform Defaults
Many ecommerce platforms ship default robots.txt files that inadvertently block AI crawlers:
- Shopify: Generally permissive, but some themes add restrictive rules in the robots.txt.liquid template
- WooCommerce: Inherits WordPress defaults, which may block AI bots if security plugins are active
- BigCommerce: Platform-managed robots.txt with limited customization on lower tiers
- Wix/Squarespace: Platform-controlled; you cannot edit robots.txt directly in most cases
If your platform does not let you edit robots.txt, check whether it offers a “verified bots” or “AI crawlers” toggle in settings. Shopify, for example, added AI crawler controls in its admin dashboard in late 2025.
Step 2: Test What Each AI Crawler Actually Sees
Reading your robots.txt tells you what you intended. Testing tells you what actually happens. Use these tools to verify crawler access:
Google robots.txt Tester (Search Console)
Google Search Console includes a robots.txt tester under Settings > robots.txt Tester. It simulates Googlebot and Google-Extended. Enter specific URLs (like /products/protein-powder) and see whether they are allowed or blocked.
Limitation: Only tests Google user-agents, not OpenAI or Anthropic bots.
robots.txt Validator Tools
Several online validators parse robots.txt against arbitrary user-agents:
- technicalseo.com/robots-txt-tester: Free, supports custom user-agents, bulk URL testing
- ryte.com: Comprehensive crawler simulation with AI bot user-agents
- screamingfrog.co.uk: The Screaming Frog SEO Spider lets you test robots.txt against any user-agent string
Manual curl Simulation
For precise testing, simulate each crawler directly:
# Test as GPTBot
curl -s -A "GPTBot" https://yourstore.com/products/protein-powder -o /dev/null -w "%{http_code}"
# Test as ClaudeBot
curl -s -A "ClaudeBot" https://yourstore.com/products/protein-powder -o /dev/null -w "%{http_code}"
# Test as PerplexityBot
curl -s -A "PerplexityBot" https://yourstore.com/products/protein-powder -o /dev/null -w "%{http_code}"
A 200 response means the page is accessible. A 403 means something is actively blocking the request at the HTTP layer, typically a WAF rule, bot-management setting, or server-level user-agent block (robots.txt itself never produces a 403; it only asks compliant crawlers not to fetch). A 301/302 redirect is fine as long as the final destination returns 200.
Important caveat: robots.txt is a voluntary protocol. Crawlers that respect it will check the file before making requests, but the curl test above only checks HTTP access, not robots.txt compliance. Combine both: verify your robots.txt allows the user-agent, then confirm the pages return 200.
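To check the robots.txt side programmatically rather than by eye, Python's standard-library `urllib.robotparser` can evaluate any user-agent string against your rules. A minimal sketch, with placeholder rules and a placeholder path; paste in the output of `curl -s https://yourstore.com/robots.txt`, or use `set_url()` and `read()` to fetch the live file:

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules for illustration - substitute your store's actual file.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

AI_AGENTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]

def check_agents(robots_text, agents, path="/products/protein-powder"):
    """Return {user_agent: allowed?} for one path under the given robots.txt."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return {agent: rp.can_fetch(agent, path) for agent in agents}

for agent, allowed in check_agents(SAMPLE_ROBOTS, AI_AGENTS).items():
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

With the sample rules above, GPTBot reports BLOCKED while the other agents fall through to the wildcard group and are allowed everywhere except /cart.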
Step 3: Check for Secondary Blocking Layers
robots.txt is the most common blocker, but not the only one. Check these additional layers:
Cloudflare Bot Management
If your store uses Cloudflare, navigate to Security > Bots in the dashboard. Look for:
- Bot Fight Mode: Can block “verified bots” including AI crawlers depending on settings
- Super Bot Fight Mode: More aggressive; often blocks GPTBot and ClaudeBot by default
- User-Agent blocking rules: Custom WAF rules that block specific user-agents
Cloudflare maintains a list of “verified bots” that includes major AI crawlers. If Bot Fight Mode is set to “Definitely Automated” blocking, it will challenge or block AI crawlers even if your robots.txt allows them.
WordPress Security Plugins
Plugins like Wordfence, Sucuri, and All In One Security can block AI crawlers at the application level:
- Wordfence: Check Firewall > All Options > “Rate Limiting Crawlers” and “Block Bots”
- Sucuri: Check Access Control > API Whitelist
- All In One Security: Check Firewall > Bot settings
Server-Level Blocks
Check your .htaccess (Apache) or nginx.conf for user-agent denies:
# Apache - look for these patterns
RewriteCond %{HTTP_USER_AGENT} ^.*GPTBot.*$ [NC]
RewriteRule .* - [F]
# Nginx - look for these patterns
if ($http_user_agent ~* "GPTBot") {
return 403;
}
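If you find a deny rule like the Nginx pattern above but still want to refuse other scrapers, you can classify user-agents once and list the AI crawlers ahead of your deny patterns. A sketch using Nginx's map module (the deny patterns below are placeholders; keep your own, and note that regex entries in a map are tried in source order):

```
# In the http context: classify user-agents into $blocked_bot.
map $http_user_agent $blocked_bot {
    default 0;
    # AI crawlers to admit - listed first so they never reach the deny patterns
    ~*(GPTBot|ChatGPT-User|ClaudeBot|PerplexityBot) 0;
    # Bots you still want to refuse (placeholder patterns - substitute your own)
    ~*(MJ12bot|AhrefsBot) 1;
}

server {
    # ... your existing listen/server_name/location config ...
    if ($blocked_bot) {
        return 403;
    }
}
```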
Step 4: Build an AI-Friendly robots.txt
Here is a production-ready robots.txt template for ecommerce stores that want AI visibility while protecting sensitive pages:
# AI Crawlers - ALLOW for product discoverability
User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /admin
User-agent: ChatGPT-User
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /admin
User-agent: ClaudeBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /admin
User-agent: PerplexityBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /admin
User-agent: Google-Extended
Allow: /
User-agent: Googlebot
Allow: /
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /admin
# Generic crawlers
User-agent: *
Allow: /products/
Allow: /collections/
Allow: /blogs/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search
Disallow: /admin
Key Design Decisions
Allow product and collection pages: These are the pages that generate AI recommendations. Every product page should be crawlable.
Allow blog content: AI models cite informational content heavily. Your blog posts establish topical authority and increase the probability of product citations.
Block cart, checkout, account, and search: These pages add no value for AI crawlers and waste crawl budget. Keep them blocked.
Explicit per-bot rules: Under the Robots Exclusion Protocol, a crawler obeys only the most specific user-agent group that matches it and ignores the wildcard group entirely. The User-agent: * block is a fallback for unlisted bots, so each named AI crawler group must carry the complete rule set rather than relying on the wildcard to fill gaps.
For Shopify Stores
Shopify generates robots.txt automatically via a Liquid template. To customize it:
- Go to Online Store > Themes > Actions > Edit Code
- Find the robots.txt.liquid template (under Templates or Sections)
- Add AI crawler allow rules before the default Shopify rules
- Test immediately with curl -s https://yourstore.com/robots.txt
Alternatively, the Shopify admin now includes an “AI Crawlers” section under Online Store > Preferences where you can toggle access for major bots without editing Liquid.
For WooCommerce Stores
WordPress generates robots.txt dynamically. You can override it by:
- Creating a physical robots.txt file in your WordPress root directory
- Using an SEO plugin (Yoast, Rank Math) to edit robots.txt from the admin panel
- Adding a filter in your theme's functions.php:
add_filter('robots_txt', function($output, $public) {
    if ($public) {
        $output .= "\n# AI Crawlers\n";
        $output .= "User-agent: GPTBot\n";
        $output .= "Allow: /product/\n";
        $output .= "Allow: /product-category/\n";
        $output .= "Allow: /blog/\n";
        $output .= "Disallow: /cart/\n";
        $output .= "Disallow: /checkout/\n";
        // Repeat for other AI bots
    }
    return $output;
}, 10, 2);
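The Step 4 template and the filter above repeat an identical rule group per bot, which invites copy-paste drift. If you maintain the file by hand, a short script can generate the groups so every bot gets exactly the same rules; the bot and path lists below mirror this guide and are meant to be edited for your store:

```python
# Emit one identical robots.txt rule group per AI crawler.
AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]
ALLOW = ["/products/", "/collections/", "/blogs/", "/pages/"]
DISALLOW = ["/cart", "/checkout", "/account", "/search", "/admin"]

def robots_groups(bots=AI_BOTS, allow=ALLOW, disallow=DISALLOW):
    """Build the per-bot groups as a single robots.txt-ready string."""
    groups = []
    for bot in bots:
        lines = [f"User-agent: {bot}"]
        lines += [f"Allow: {path}" for path in allow]
        lines += [f"Disallow: {path}" for path in disallow]
        groups.append("\n".join(lines))
    return "\n\n".join(groups)

print(robots_groups())
```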
Step 5: Verify Your Changes Propagated
After updating robots.txt, verify the changes are live:
# Check the file itself
curl -s https://yourstore.com/robots.txt
# Verify it was not cached (should reflect immediately)
curl -s -H "Cache-Control: no-cache" https://yourstore.com/robots.txt
Most crawlers re-check robots.txt roughly once every 24 hours, and Google documents a similar cadence. OpenAI and Anthropic do not publish their re-crawl schedules, but empirical testing suggests GPTBot re-checks within 24 to 48 hours.
If you previously had AI crawlers blocked for months, do not expect immediate results. AI models build citations from crawl data accumulated over time. After unblocking, it typically takes 2 to 6 weeks for your products to start appearing in AI recommendations, depending on crawl frequency and content quality.
Step 6: Monitor Ongoing AI Crawler Activity
Set up monitoring to confirm AI crawlers are actively accessing your store:
Server Log Analysis
Search your access logs for AI bot user-agents:
# Apache
grep -E "GPTBot|ClaudeBot|PerplexityBot|ChatGPT-User" /var/log/apache2/access.log | tail -50
# Nginx
grep -E "GPTBot|ClaudeBot|PerplexityBot|ChatGPT-User" /var/log/nginx/access.log | tail -50
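Beyond tailing the raw log, a short script can total hits per AI bot across a whole access log. A sketch matching user-agent tokens in raw combined-format log lines (the bot list mirrors the grep commands above):

```python
import re
from collections import Counter

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "ChatGPT-User"]

def count_ai_hits(log_lines, bots=AI_BOTS):
    """Count requests per AI crawler by searching each log line for bot tokens."""
    pattern = re.compile("|".join(re.escape(bot) for bot in bots))
    counts = Counter()
    for line in log_lines:
        match = pattern.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts

# Usage: count_ai_hits(open("/var/log/nginx/access.log"))
```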
Cloudflare Analytics
If you use Cloudflare, navigate to Security > Bots > “Verified Bots” to see which AI crawlers are visiting your site and how frequently.
Google Search Console
The “Crawl Stats” report in Google Search Console shows Google-Extended activity. Look for an increase in crawl requests after you update robots.txt.
The Cost of Blocking AI Crawlers: By the Numbers
Putting the business impact in concrete terms:
- 45 billion AI search sessions per month across ChatGPT, Gemini, Perplexity, and Claude (Stackmatix, 2026)
- 64.5% of those sessions happen on ChatGPT alone
- 80% of consumers use zero-click AI results at least 40% of the time (Bain, 2026)
- 92% of brands are invisible in AI citations (Omniscient Digital, 2026)
- Google AI Mode drives 35% more organic clicks when citations do appear (Optimyzee, 2026)
If your robots.txt blocks GPTBot, you are voluntarily removing your store from the largest AI search platform. That is not a strategic choice. That is an oversight.
Common Mistakes to Avoid
Blocking AI Training But Wanting AI Citations
Some store owners block Google-Extended or GPTBot because they do not want their content used for AI training, but they still want to appear in AI search results. You cannot have one without the other from the same bot: AI models cannot recommend your products if they cannot access your content. A few operators do split the roles across user-agents (OpenAI uses GPTBot for training and ChatGPT-User for live browsing), but blocking every crawler an operator runs guarantees invisibility on that platform. If you want AI visibility, you must allow AI crawlers.
Over-Reliance on Wildcard Blocks
User-agent: *
Disallow: /wp-
Disallow: /cart
Disallow: /checkout
This is fine. But if you add Disallow: / under the wildcard, you block everything including AI crawlers. Always test the wildcard rule’s impact on AI user-agents.
Forgetting Mobile App Crawlers
Amazonbot and Applebot-Extended drive product recommendations in Alexa and Siri. If your products are consumer goods, these crawlers matter. Include them in your allow list.
Ignoring Subdomains
If your blog is on blog.yourstore.com and your store is on www.yourstore.com, each subdomain has its own robots.txt. Check both.
How Shopti.ai Helps
Auditing robots.txt is one piece of AI discoverability. The full picture includes structured data quality, product feed completeness, content optimization for AI citations, and ongoing monitoring. Shopti.ai runs a comprehensive audit across all of these dimensions, starting with crawler access and extending through schema validation and AI citation tracking. If you want to skip the manual process, check your store’s agent discoverability score free at shopti.ai.
FAQ
Does allowing AI crawlers hurt my Google rankings?
No. Googlebot and Google-Extended are separate user-agents. Allowing GPTBot or ClaudeBot has zero impact on your Google search rankings. Google does not penalize sites for allowing other bots.
What if my platform does not let me edit robots.txt?
Shopify, Wix, and Squarespace all offer AI crawler controls in their admin dashboards as of 2026. For platforms without direct access, check whether your CDN (Cloudflare, Fastly) offers bot management rules that can override robots.txt behavior. As a last resort, switching to a self-hosted platform gives you full control.
Should I allow AI crawlers on my entire site?
No. Allow access to product pages, collection/category pages, and blog content. Block cart, checkout, account, admin, and internal search pages. These provide no value to AI models and can waste crawl budget.
How long after updating robots.txt will AI models cite my products?
Typically 2 to 6 weeks. AI crawlers need to re-crawl your pages, process the content, and incorporate it into citation databases. Stores with higher domain authority and better structured data tend to get cited faster.
Is blocking AI crawlers ever the right choice?
Yes, if your store sells sensitive products, operates in a regulated industry where AI recommendations could cause legal issues, or if you have an explicit policy against AI training on your content. For most ecommerce stores, the visibility benefit far outweighs the risk.
Sources
Omniscient Digital. “2026 AI Search Visibility Report.” IssueWire, May 7, 2026. https://www.issuewire.com/2026-ai-search-visibility-report-which-service-actually-gets-brands-recommended-by-chatgpt-perplexity-and-google-ai-1861410847710076
Stackmatix. “AI Search Market Share 2026: The Complete Breakdown.” Stackmatix Blog, March 2026. https://www.stackmatix.com/blog/ai-search-market-share-2026
Bain & Company. “Goodbye Clicks, Hello AI: Zero-Click Search Redefines Marketing.” Bain Insights, 2026. https://www.bain.com/insights/goodbye-clicks-hello-ai-zero-click-search-redefines-marketing/
Optimyzee. “Google AI Mode Ads 2026: Everything Advertisers Need to Know.” Optimyzee Blog, 2026. https://www.optimyzee.com/blog/google-ai-mode-ads-2026
OpenAI. “GPTBot Documentation.” OpenAI Platform Docs, 2025. https://platform.openai.com/docs/gptbot
Google. “Google-Extended Documentation.” Google Search Central, 2024. https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
Related Articles
- AI Crawlers 101: What They Are, How They Work, and How to Let Them Index Your Store
- llms.txt for Ecommerce: Your Store’s Instruction Manual for AI
- Schema Validators Won’t Save You: What Actually Tests AI Discoverability in 2026
Check your store agent discoverability score free at shopti.ai.
