AI shopping agents do not see your beautifully designed product pages. They see a stripped-down text extract: product name, price, maybe a description, and whatever structured data your HTML contains. If that extract is empty, garbled, or missing critical fields, your products are invisible to ChatGPT, Google AI Mode, and Perplexity regardless of how good your storefront looks. This guide covers five free tools that show you exactly what AI agents read when they visit your ecommerce store, and how to fix the gaps.

The problem is bigger than most store owners realize. A Pragma study in 2026 found that 41% of ecommerce product feeds contain at least one critical error that prevents AI agent ingestion. But feeds are only half the picture. When ChatGPT browses your site in real time using its retrieval tool, it extracts content much the way Jina Reader does: fetch the HTML, strip the layout, return the text. If your product data lives only in JavaScript-rendered components or behind interactive widgets, the AI gets nothing useful.

BrightEdge reported that AI search results grew 850% between mid-2024 and early 2025. That growth is not slowing. Yet most ecommerce teams optimize for how their store looks in a browser, not for how it reads in plain text. The tools below close that gap.

Why You Need to Preview AI Content Extraction

When a shopper asks ChatGPT “What is the best wireless earbud under $100?”, the AI does not render your product page with images, carousels, and color swatches. It reads the raw text and structured data your server returns. The extraction process works like this:

  1. The AI crawler sends a GET request to your URL
  2. Your server returns HTML (hopefully with embedded structured data)
  3. The crawler strips CSS, JavaScript, and layout markup
  4. What remains is plain text plus any JSON-LD blocks
  5. The AI model processes that text to build its answer

If the text left over after step 3 is empty or incomplete, step 5 produces no citation for your store.
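Steps 3 and 4 can be approximated with a few lines of Python. This is a minimal sketch using only the standard library, not the extraction code any particular AI crawler runs; the class and function names are made up for illustration. It drops script and style contents, keeps the visible text, and sets JSON-LD blocks aside.

```python
from html.parser import HTMLParser


class CrawlerView(HTMLParser):
    """Collects visible text and JSON-LD blocks, roughly mirroring steps 3 and 4."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []     # visible text fragments, in document order
        self.jsonld_blocks = []  # raw JSON-LD strings
        self._skip_depth = 0     # > 0 while inside a skipped element
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self._in_jsonld = True
                self.jsonld_blocks.append("")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_blocks[-1] += data
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def crawler_view(html):
    """Return (plain_text, jsonld_blocks) for one HTML document."""
    parser = CrawlerView()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text_parts), parser.jsonld_blocks
```

Feed it the HTML your server returns. If the text comes back empty and no JSON-LD blocks are found, a crawler that does not run JavaScript likely has nothing to work with either.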

This is not theoretical. Seer Interactive analyzed 25.1 million Google AI Mode impressions in May 2026 and found that 93% of queries ended without a click. When your product content is not extractable, you lose not just the click but the citation itself. AI agents recommend what they can read.

The five tools below let you preview exactly what step 4 looks like for your store. Use them in sequence to find and fix extraction problems.

Tool 1: Jina Reader (r.jina.ai)

Jina Reader is a free API that fetches any URL and returns clean, markdown-formatted text. It is the closest approximation to what an AI agent extracts when it visits your page. Many AI retrieval systems, including implementations based on the Reader protocol, use similar extraction logic.

How to Use It

Open your browser and go to:

https://r.jina.ai/https://yourstore.com/products/your-product

The service fetches your URL, executes any JavaScript it can, strips all HTML/CSS/layout, and returns the page content as clean markdown text. What you see is approximately what an LLM processes when it retrieves your product page.
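If you want to script the same lookup, the URL pattern above is enough to go on. The sketch below only prefixes the page URL and performs a plain GET with Python's standard library; the function names are hypothetical, and rate limits, headers, and response details are up to the service, so treat it as a starting point.

```python
import urllib.request

READER_PREFIX = "https://r.jina.ai/"  # prefix from the URL pattern shown above


def reader_url(page_url):
    """Prefix a full page URL (scheme included) with the Reader endpoint."""
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("pass the full URL, including https://")
    return READER_PREFIX + page_url


def fetch_reader_text(page_url, timeout=30):
    """Fetch the Reader output for one page. Makes a network request."""
    with urllib.request.urlopen(reader_url(page_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_reader_text("https://yourstore.com/products/your-product")` should return the same markdown you see in the browser, which you can then save and compare week to week.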

What to Look For

Read the output carefully and check for these things:

Product name. Does the product name appear clearly at the top of the text? If it is buried inside a navigation menu or mixed with other text, AI agents struggle to identify it as the primary product.

Price. Is the price present as a plain number? If your price renders only through a JavaScript widget or is displayed as an image, Jina Reader might not extract it. AI agents need the price in text form.

Description. Does the full product description appear? Or do you see just a truncated snippet because the rest loads via a “Read more” JavaScript toggle?

Availability. Is “In stock” or “Out of stock” visible in the text? If availability is shown only through a badge image or a dynamic widget, AI agents cannot read it.

Reviews and ratings. Are star ratings and review counts present as text? Or are they rendered as star images with no alt text?

Structured data hints. After the main text, look for any JSON-LD blocks that Jina captures. These are critical for AI understanding.
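The checklist above can be turned into a quick pass over saved Reader output. The following is a rough sketch: the availability and rating patterns are simple English-language guesses, and the ten-line cutoff for "near the top" is an arbitrary assumption you should tune for your theme.

```python
import re


def check_extraction(text, product_name, price, top_lines=10):
    """Report which key fields appear in extracted page text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    lowered = text.lower()
    name = product_name.lower()
    return {
        "name_found": name in lowered,
        # Assumption: "near the top" means within the first `top_lines` non-empty lines.
        "name_near_top": any(name in ln.lower() for ln in lines[:top_lines]),
        "price_found": price in text,
        "availability_found": re.search(r"\b(in stock|out of stock)\b", lowered) is not None,
        "rating_found": re.search(r"\d(\.\d)?\s*(out of 5|/\s*5|stars?)", lowered) is not None,
    }
```

Any `False` in the result points you at the matching item in the checklist. A `False` does not prove the field is absent, only that these simple patterns did not find it, so read the output yourself before changing your theme.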

Common Problems Jina Reveals

  • Empty product pages: Your content is JavaScript-only and Jina cannot render it. This is the most common issue with headless commerce setups and React-based themes.
  • Navigation soup: The extracted text starts with 200 lines of menu items before reaching the product. This happens when your navigation is not properly marked up with semantic HTML.
  • Missing prices: Prices rendered by JavaScript widgets do not appear in the extraction.
  • Duplicate content: The same product description appears three times because your theme outputs it in multiple containers.
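The four problems above can also be flagged automatically from the extracted text. This is a heuristic sketch with assumed thresholds (fewer than 20 words counts as empty, 50 or more lines before the product name counts as navigation soup); neither number comes from Jina or from any AI vendor.

```python
def classify_problems(text, product_name, price, description, nav_threshold=50):
    """Map extracted text to the four common problems; thresholds are guesses."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if len(text.split()) < 20:
        problems.append("empty page")
    # Index of the first non-empty line that mentions the product name.
    first_hit = next(
        (i for i, ln in enumerate(lines) if product_name.lower() in ln.lower()), None
    )
    if first_hit is not None and first_hit >= nav_threshold:
        problems.append("navigation soup")
    if price not in text:
        problems.append("missing price")
    if text.lower().count(description.lower()) > 1:
        problems.append("duplicate content")
    return problems
```

An empty list means none of the four patterns matched. It is a first filter, not a verdict: a page can pass all four checks and still extract poorly.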

Fix

For JavaScript-heavy stores, implement server-side rendering (SSR) or static site generation (SSG) for product pages. At minimum, ensure that the product name, price, description, and availability are present in the raw HTML source. You can verify this with the curl method in Tool 2 below.

Tool 2: curl and grep

The simplest way to check what AI crawlers see is to fetch your page the way they do: with a plain HTTP request. No browser, no JavaScript execution, no rendering engine.

How to Use It

Open your terminal and run:

curl -s https://yourstore.com/products/your-product | head -200

This returns the first 200 lines of raw HTML your server sends to any crawler. Now search for your key content:

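# -F matches the text literally, so the "." in a price is not treated as a wildcard
curl -s https://yourstore.com/products/your-product | grep -iF "product name"
curl -s https://yourstore.com/products/your-product | grep -F "99.99"
# Allow any spacing around the colon; minified JSON-LD usually has none
curl -s https://yourstore.com/products/your-product | grep -E '"@type"[[:space:]]*:[[:space:]]*"Product"'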

What to Look For

Product name in HTML. If grep cannot find your product name in the raw HTML, it is being rendered by JavaScript after the page loads. Most AI crawlers do not execute JavaScript reliably.

Price in HTML. The price should appear as a text number in the source, not as an image or a JavaScript-generated element.

JSON-LD structured data. Search for "@type": "Product", allowing for different spacing around the colon, since minified JSON-LD is often written as "@type":"Product". If you find it, your structured data is server-rendered and accessible to all crawlers. If not, your schema might be injected by JavaScript, which many AI crawlers cannot process.
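The same three checks can be run in Python on HTML you have already saved, for example with curl's -o option. This is a small sketch with a hypothetical function name. It matches the price literally and tolerates any spacing inside the JSON-LD, but it will miss a product name that is HTML-escaped in the source (for instance & written as &amp;amp;).

```python
import re

# Matches "@type": "Product" with or without whitespace around the colon.
PRODUCT_TYPE = re.compile(r'"@type"\s*:\s*"Product"')


def check_raw_html(html, product_name, price):
    """Check raw, unrendered HTML for the product name, price, and Product schema."""
    return {
        "name_in_html": product_name.lower() in html.lower(),
        "price_in_html": price in html,  # literal match, "." is not a wildcard
        "product_schema_in_html": PRODUCT_TYPE.search(html) is not None,
    }
```

If all three values are `False` for a page that looks fine in your browser, the content is almost certainly being added by JavaScript after load.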

The Text-Only Test

For a cleaner view, strip all HTML tags:

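curl -s https://yourstore.com/products/your-product | sed 's/<[^>]*>//g' | sed '/^[[:space:]]*$/d' | head -100

This shows you the text content with the tags removed and blank or whitespace-only lines dropped. It is a rough filter: the contents of inline scripts and styles are still printed, and tags that span several lines are not removed, so skim past that noise. Read the first 20 lines of actual content. If they do not contain your product name and key details, your content is not accessible to non-JavaScript crawlers.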

Why This Matters

Google has improved its JavaScript rendering capabilities over the years, but AI crawlers like GPTBot, PerplexityBot, and ClaudeBot are far less capable at executing JavaScript as of mid-2026. If your product data is not in the raw HTML, these crawlers see an empty or partial page.

For a full breakdown of which crawlers access your store and how to verify they can reach your pages, see our robots.txt AI crawler access audit guide for ecommerce.

Tool 3: Browser Reader Mode

Most modern browsers have a built-in reader mode that strips away navigation, ads, and layout to show only the main content. This is a good approximation of what AI text extraction produces.

How to Use It

  1. Open a product page in Safari, Firefox, or Edge
  2. Click the reader mode icon in the address bar (Safari: four lines icon, Firefox: page icon, Edge: book icon)
  3. Read what appears

What to Look For

Reader mode algorithms are designed to find the “main content” of a page. If reader mode shows your product description clearly, it is a good sign that AI extraction will too. If reader mode gets confused and shows navigation text or nothing at all, your page structure needs work.

Specific checks:

  • Does reader mode identify the correct main content block?
  • Is the price visible in reader mode?
  • Are product options (sizes, colors) present?
  • Is the “Add to cart” button text readable?

Limitations

Reader mode is more forgiving than AI crawlers because your browser has already executed all JavaScript and rendered the full page. Use reader mode as a quick check, but always verify with curl or Jina Reader for the full picture.

Tool 4: View Page Source Inspection

Every browser lets you view the raw HTML your server sends before any JavaScript runs. This is the ground truth of what crawlers receive.

How to Use It

  1. Open a product page
  2. Right-click and select “View Page Source” (not “Inspect Element”)
  3. Search (Ctrl+F) for your product name, price, and "@type"

What to Look For

Structured data block. Search for "@type": "Product" in the source. You should find a JSON-LD block that looks like this:

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wireless Earbuds Pro",
  "image": "https://yourstore.com/images/earbuds.jpg",
  "description": "Premium wireless earbuds with active noise cancellation",
  "offers": {
    "@type": "Offer",
    "price": "89.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}

If this block is missing from the source, your structured data is being injected by JavaScript. Move it to server-rendered HTML immediately. This is the single highest-impact fix for AI discoverability.
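To pull the JSON-LD out of saved page source rather than searching by eye, a short script is enough. The sketch below assumes the schema sits in a `<script type="application/ld+json">` element and handles three common layouts: a single object, a list, and an `@graph` wrapper. The function name is made up for illustration.

```python
import json
import re

JSONLD_SCRIPT = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def find_products(html):
    """Return every JSON-LD object in the HTML whose @type includes Product."""
    products = []
    for raw in JSONLD_SCRIPT.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block; a validator will report the details
        # A block may hold one object, a list, or an @graph wrapper.
        if isinstance(data, dict) and "@graph" in data:
            candidates = data["@graph"]
        elif isinstance(data, list):
            candidates = data
        else:
            candidates = [data]
        for item in candidates:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Product" in types:
                products.append(item)
    return products
```

An empty list on the raw source, for a page that passes a rendered schema test, is the signature of JavaScript-injected structured data.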

Semantic HTML. Check if your product content uses semantic tags like <h1> for the product name, <article> for the main content, and <section> for descriptions. AI crawlers use HTML semantics to understand content hierarchy. A page built entirely with <div> tags is harder for AI to parse.

Image alt text. Search for your product images and check if they have descriptive alt text. AI agents use alt text as a primary source of product information when they process images. If your images have empty alt="" or no alt attribute at all, you are losing a data signal.
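Both checks, semantic tags and alt text, are easy to automate on saved page source. This is a minimal sketch with the standard library's HTML parser; the class and function names are illustrative. It counts a handful of semantic tags and lists images whose alt attribute is missing or empty.

```python
from html.parser import HTMLParser


class MarkupAudit(HTMLParser):
    """Counts semantic tags and records images without usable alt text."""

    SEMANTIC = ("h1", "main", "article", "section", "nav")

    def __init__(self):
        super().__init__()
        self.tag_counts = {tag: 0 for tag in self.SEMANTIC}
        self.images_missing_alt = []  # src values of images with no or empty alt

    def handle_starttag(self, tag, attrs):
        if tag in self.tag_counts:
            self.tag_counts[tag] += 1
        elif tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.images_missing_alt.append(attributes.get("src", ""))


def audit_markup(html):
    """Return (semantic_tag_counts, image_srcs_missing_alt) for one page."""
    audit = MarkupAudit()
    audit.feed(html)
    audit.close()
    return audit.tag_counts, audit.images_missing_alt
```

A page with zero `<h1>` or `<main>` tags deserves a closer look. Keep in mind that an empty alt is valid for purely decorative images, so review the list rather than filling every entry blindly.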

For a deeper dive into testing your structured data specifically, see our guide on schema validators and AI discoverability testing tools for ecommerce.

Tool 5: Google Rich Results Test

Google’s Rich Results Test (search.google.com/test/rich-results) shows you whether Google can parse your structured data. While it is designed for Google’s own systems, it serves as a good proxy for how well any AI crawler can extract your product information.

How to Use It

  1. Go to search.google.com/test/rich-results
  2. Enter your product page URL
  3. Click “Test URL”
  4. Review the detected structured data

What to Look For

Detected product schema. The tool should detect your Product schema and show the extracted fields. If it detects nothing, your structured data is not accessible.

Field completeness. Check which fields the tool extracted. Required fields for AI visibility include: name, image, description, price, availability, and brand. Optional but valuable fields include: SKU, GTIN, aggregateRating, and review.
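The field list above can be checked against a parsed Product block in a few lines. The sketch below assumes the block has already been loaded into a Python dict and that price and availability sit under `offers`, as in the example earlier in this guide. The required and optional lists are this guide's, not an official standard.

```python
REQUIRED = ("name", "image", "description", "price", "availability", "brand")
OPTIONAL = ("sku", "gtin", "aggregateRating", "review")


def missing_fields(product):
    """Split absent fields into (required, optional) for one Product JSON-LD dict."""
    offers = product.get("offers") or {}
    if isinstance(offers, list):      # several offers: inspect the first
        offers = offers[0] if offers else {}
    if not isinstance(offers, dict):
        offers = {}
    merged = {**offers, **product}    # price and availability live under offers
    # schema.org also defines gtin8, gtin12, gtin13 and gtin14; accept any of them.
    if any(key.startswith("gtin") for key in merged):
        merged.setdefault("gtin", True)
    required = [field for field in REQUIRED if not merged.get(field)]
    optional = [field for field in OPTIONAL if not merged.get(field)]
    return required, optional
```

Run against the earbuds example from the View Page Source section, this reports `brand` as the one missing required field and all four optional fields as absent.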

Errors and warnings. The tool reports specific errors like missing required fields or invalid values. Fix every error. Warnings are less critical but still worth addressing.

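Rendering method. The tool renders the page, JavaScript included, before it looks for structured data, so a passing result does not prove the schema is in your raw HTML. Cross-check with curl or View Page Source. If the schema appears here but not there, it is injected by JavaScript and most AI crawlers will miss it.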

Why This Tool Matters for AI Agents

Google’s Shopping Graph feeds not just Google Shopping and Google AI Mode, but also third-party AI systems that license Google’s product data. A Digital Applied audit of 92 domains in April 2026 found that schema-only optimization produced a 3.1% AI citation lift, but when combined with opinion-rich prose content, the lift jumped to 47%. Valid schema is the baseline. Rich, extractable content is what actually drives citations.

Comparison: Which Tool to Use When

| Tool | Best For | Speed | Accuracy | JavaScript Support |
| --- | --- | --- | --- | --- |
| Jina Reader | Full AI extraction preview | Fast | High | Partial |
| curl + grep | Raw HTML verification | Instant | Exact | None |
| Browser Reader Mode | Quick content check | Instant | Medium | Full |
| View Page Source | Structured data inspection | Instant | Exact | None |
| Rich Results Test | Schema validation | Moderate | High | Partial |

Recommended workflow:

  1. Start with curl to verify your product data is in the raw HTML. If it is not, fix your rendering first. Everything else depends on this.
  2. Run Jina Reader to see the full AI extraction. This shows you what an AI agent actually processes.
  3. Check View Source for structured data. Ensure JSON-LD is server-rendered and complete.
  4. Run Rich Results Test to validate your schema against Google’s parser.
  5. Use Reader Mode as a quick spot-check for individual pages.

Common Extraction Problems and Fixes

Problem: Product Data Loads via JavaScript

Symptoms: curl returns empty content. Jina Reader shows a partial page. View Page Source shows no product data or schema, even if Rich Results Test, which renders JavaScript, still detects it.

Fix: Implement server-side rendering for product pages. If you use Shopify, most themes render product data in the initial HTML. If you use a headless setup (Shopify Hydrogen, Next.js Commerce, custom), ensure SSR is configured for product routes.

Impact: This is the single biggest fix for AI discoverability. Without server-rendered product data, most AI crawlers cannot see your products at all.

Problem: Navigation Dominates the Extracted Text

Symptoms: Jina Reader output starts with 50+ lines of menu items before reaching the product. The extracted text is mostly navigation.

Fix: Use semantic HTML. Wrap your main product content in <main> or <article> tags. Ensure your navigation uses <nav> tags. AI extraction tools use semantic HTML to identify the primary content block. When everything is a <div>, the extraction algorithm cannot distinguish navigation from content.

Problem: Prices Missing from Extraction

Symptoms: Product name appears but price does not. The price shows on the rendered page but not in curl or Jina output.

Fix: Ensure prices are rendered as plain text in the HTML. Avoid rendering prices solely through JavaScript widgets, canvas elements, or images. If you must use dynamic pricing, include the base price in the HTML and update it with JavaScript as an enhancement.

Problem: No Structured Data Detected

Symptoms: View Source shows no JSON-LD. Rich Results Test returns empty.

Fix: Add Product schema JSON-LD to your product page template. This is the highest-ROI technical SEO task for ecommerce stores in 2026. Include at minimum: name, image, description, offers (price, currency, availability), and brand. For the complete setup guide, see our product schema markup guide for AI shopping agents.

Problem: Duplicate Content in Extraction

Symptoms: Jina Reader shows the same product description three times.

Fix: Check your theme for duplicate content containers. Some Shopify themes output the product description in a visible container, a hidden SEO container, and a structured data block. Consolidate to one clean output.

Building a Weekly Preview Routine

Set up a weekly check with these tools, spread over five short sessions:

Monday: Pick 5 key products. Choose your highest-traffic or highest-margin products. These are the ones you need AI agents to find.

Tuesday: Run curl on all 5. Verify the raw HTML contains product name, price, and JSON-LD. This takes 2 minutes per product.

Wednesday: Run Jina Reader on all 5. Read the full extraction. Check for missing content, navigation noise, or garbled text.

Thursday: Fix any issues found. Common fixes take minutes: update alt text, fix a missing schema field, adjust a JavaScript rendering issue.

Friday: Re-test. Verify your fixes worked by running the same checks.

This routine catches problems before they become visibility losses. AI crawlers re-visit your pages on unpredictable schedules. If a theme update breaks your schema on Monday and you do not notice until Friday’s traffic report, you have lost most of a week of AI citations.
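The re-test step is easier if you keep last week's results and compare. Here is one way that could look, assuming you store each product's check results as a dict of check name to pass or fail; the file layout and function names are hypothetical.

```python
import json
from pathlib import Path


def regressions(previous, current):
    """Return {url: [checks]} for checks that passed last time and fail now."""
    broken = {}
    for url, checks in current.items():
        before = previous.get(url, {})
        lost = [name for name, ok in checks.items() if not ok and before.get(name)]
        if lost:
            broken[url] = lost
    return broken


def load_results(path):
    """Load saved results, or an empty dict on the first run."""
    file = Path(path)
    return json.loads(file.read_text()) if file.exists() else {}


def save_results(path, results):
    """Write this week's results for next week's comparison."""
    Path(path).write_text(json.dumps(results, indent=2))
```

A typical run loads the saved file, compares it with this week's checks, prints anything `regressions` returns, then saves the new results. A schema check that flips from pass to fail right after a theme update is exactly the case this is meant to surface.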

Tools like shopti.ai automate this monitoring by continuously checking your store’s AI extractability across the full product catalog, not just five manual samples. For stores with hundreds or thousands of products, manual weekly checks cover only a fraction of the catalog. Automated monitoring fills the gap.

How AI Content Extraction Relates to llms.txt

Content preview tools show you what AI agents read from individual pages. But there is another layer: your llms.txt file, which tells AI models how to navigate and understand your entire site.

If your individual product pages extract well but AI agents still cannot find your store, the problem might be navigational. AI agents need to know your store exists and where to find the product catalog. That is what llms.txt does. It sits at yourstore.com/llms.txt and provides a markdown summary of your site, your product categories, and where the important content lives.

Think of it this way: the tools in this guide ensure that when an AI agent lands on a product page, it can read the content. Your llms.txt ensures the agent can find the product page in the first place. Both are necessary. For the complete setup guide, see our llms.txt guide for ecommerce stores.
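For a sense of what such a file contains, the sketch below generates a small one. It follows the layout of the public llms.txt proposal as commonly described (a title, a one-line summary in a blockquote, then sections of links). llms.txt is a proposed convention rather than a ratified standard, and the store name, URLs, and function name here are placeholders.

```python
def build_llms_txt(store_name, summary, sections):
    """Build llms.txt content; sections maps a heading to (title, url, note) tuples."""
    lines = [f"# {store_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Your Store",  # placeholder store name
        "Audio gear.",
        {"Product categories": [
            ("Wireless earbuds", "https://yourstore.com/collections/earbuds", "In-ear models"),
        ]},
    ))
```

Save the output as llms.txt at the root of your domain and keep the link list short, pointing only at the pages you most want agents to read.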

The Bottom Line

AI agent visibility starts with extractability. If your product content cannot be read as plain text with structured data, no amount of prompt engineering or feed optimization will make your store appear in AI recommendations. The five tools in this guide are free, take minutes to use, and reveal exactly what ChatGPT, Google AI, and Perplexity see when they visit your store.

Run curl first. If your product data is not in the raw HTML, fix that before anything else. Then use Jina Reader to see the full extraction, View Source for structured data, and Rich Results Test for schema validation. Build it into a weekly routine and you will catch problems before they cost you citations.

Check your store’s agent discoverability score for free at shopti.ai.

FAQ

Does Jina Reader process JavaScript on my page?

Jina Reader has partial JavaScript support. It can render some JavaScript but not all. For a true picture of what non-JS crawlers see, use curl. For a picture of what AI retrieval tools see, Jina Reader is the closer approximation since many AI retrieval systems execute JavaScript to some degree.

How is this different from Google Search Console?

Google Search Console tells you what Google sees and indexes. The tools in this guide tell you what AI agents like ChatGPT and Perplexity see, and those agents fetch and render pages very differently. Google has the most sophisticated JavaScript rendering in the industry. Most AI crawlers are far less capable. You need to test for both.

Do I need all five tools?

No. Start with curl and Jina Reader. Those two cover 80% of what you need to know. Add the others when you need to debug specific issues: Rich Results Test for schema problems, View Source for quick HTML inspection, Reader Mode for a fast spot-check.

How often should I run these checks?

At minimum, run curl and Jina Reader on your top 5 products once a week. Any time you update your theme, install a new app, or change your product page template, run the full set of tools immediately. Theme updates are the most common cause of sudden AI visibility loss.

What if my store uses Shopify?

Most Shopify themes render product data in the initial HTML, which is good for AI extraction. The most common Shopify-specific issues are: missing structured data fields (brand, GTIN), product descriptions hidden behind JavaScript “Read more” toggles, and navigation-heavy themes that confuse content extraction. Run Jina Reader on your Shopify product pages to see exactly where you stand.

Sources

  1. Pragma Partners, “State of Ecommerce Product Feeds 2026” - Analysis of product feed quality across 2,400+ ecommerce domains, finding 41% contain at least one critical ingestion-blocking error (2026).
  2. BrightEdge, “AI Search Results Growth Report” - Documentation of 850% growth in AI-generated search results between mid-2024 and early 2025, based on analysis of billions of search queries (2025).
  3. Seer Interactive, “Google AI Mode Click Behavior Study” - Analysis of 25.1 million Google AI Mode impressions showing 93% of queries end without a click (May 2026).
  4. Digital Applied, “AI Citation Factors Audit” - 92-domain audit finding schema-only optimization produced 3.1% citation lift versus 47% for opinion-rich prose content (April 2026).