AI and The Evolution of Search

This document serves as a practical guide to understanding and measuring visibility in AI-driven search engines, particularly focusing on OpenAI's ChatGPT. It outlines the evolution of OpenAI, the role of its various bots, and emphasizes the importance of log analysis for tracking content visibility and optimizing for AI search. The guide also provides best practices for content structuring to enhance discoverability in AI tools.


AI & The Evolution of Search
A practical guide to measuring your visibility with log analysis
Contents
Introduction
What is OpenAI?
Overview of OpenAI's bot ecosystem
Tracking ChatGPT indexation and visibility with Oncrawl’s log analysis
Best practices to help your content appear in ChatGPT
FAQs
Wrapping up

Introduction

It feels like we are hearing or reading A LOT about artificial intelligence everywhere we look. And while the amount of information available can sometimes be overwhelming, understanding how and where AI is integrated into our work and lives is becoming an unavoidable necessity.

Particularly when looking at the search industry, we are in an era in which AI is transforming how we search for and discover information. Being able to identify why your content appears, or doesn’t, in tools like ChatGPT is more than just a point of curiosity. It's now a legitimate part of any SEO strategy.

AI-driven search engines don’t operate quite like Google or Bing. They rely on real-time crawling, user prompts, and natural language understanding to provide answers to our questions. Furthermore, it is difficult to track if your brand is visible in the answers using traditional tools. That’s where log analysis comes in.

As ChatGPT is becoming a growing source of traffic, optimizing for it is a bit like traditional SEO: you first need to understand how the search feature actually works. Also, and arguably just as important, you need to be able to measure and monitor your efforts. Without that, you won’t know if you’re heading in the right direction.

- Jérôme Salomon

Senior Technical SEO at Oncrawl

In this e-book, we’ll break down the core concepts behind OpenAI’s bots, explain how to track them with log analysis in Oncrawl, and walk you through the steps you should take to increase your chances of being seen, cited, and clicked in the AI space.

What is OpenAI?

A quick history

OpenAI was founded in 2015 and initially focused its research on deep learning and reinforcement learning. However, the business model rapidly evolved and the company began to concentrate its efforts more on general AI development and AI for research, which quickly led to the introduction of Generative Pre-trained Transformer (GPT) models, the foundation of ChatGPT.

Fast-forward to 2022, when OpenAI officially released ChatGPT, its flagship tool and one of the most advanced chatbots powered by AI.

ChatGPT: From text generator to search engine

Although ChatGPT has now become a household name, it is only part of the OpenAI story. OpenAI’s technology powers a suite of products that are reshaping how people discover and access information.

In 2024, OpenAI introduced ChatGPT Search, a feature that can be manually initiated by the user or automatically enabled depending on the prompt.

It blends some classic search engine features with up-to-date information as well as real-time search on third-party providers, and it's all delivered via a natural language interface.

There are a few other distinctions that set it apart from traditional search engines, but we will address those in detail later.

• July 2024: ChatGPT Search prototype launched
• October 2024: Search directly integrated into the ChatGPT app
• November 2024: Sites begin to see 44% monthly growth in referring traffic from chatgpt.com
• December 2024: ChatGPT Search generated 6x more clicks than Perplexity

ChatGPT Search’s main goal is to provide better answers by considering chat context and
adding citations to the content source.

Although Google remains the dominant search engine, ChatGPT Search is disrupting historic
search usage.

The advancements brought about by ChatGPT Search are driving search marketing professionals
to change how we define “search” and how we optimize for it. In the SEO industry, there is
a lot of discussion around whether we can actually optimize for this new opportunity.

Good news - it is possible if you understand how ChatGPT Search works and are able to measure
and monitor your efforts.

The process of getting detected by AI tools in modern times goes beyond achieving Google search
engine rankings.

Nowadays, you likely turn to ChatGPT, Gemini or Perplexity AI whenever you need information. The
tools evaluate answers based on their match to your actual question instead of choosing websites
with the highest ranking.

I have observed that well-organized content, combined with clear structure, plays a crucial role in
success. Websites that take too long to load or have confusing layouts will not succeed.

The success of your content depends heavily on short paragraphs with clear language, easy-to-follow
headings and direct answers. Users, together with artificial intelligence systems, choose content that
provides simple and fast solutions.

A practical lesson I learned came from an experience that involved optimizing our FAQ pages. The
FAQs became more visible to AI-generated responses after I shortened the answers and made them
easier to understand. The result? More traffic and happier visitors.

My best advice: Put yourself in your users' shoes. Answer the actual questions your users need help
with in a straightforward manner. The content needs to be simple and organized while also loading
quickly. All users, including your audience and AI systems, value content that is straightforward to
understand and fast to retrieve.

- Veronika Höller

Head of Demand Generation at Tresorit

Overview of OpenAI's bot ecosystem

When we talk about visibility in AI-driven tools like ChatGPT, we’re really talking about how
OpenAI’s bots interact with your website. These bots are the digital scouts that crawl, cache,
and surface your content in response to user prompts.

OpenAI uses different bots, or crawlers, for different tasks and features.

Meet OpenAI’s crawlers


OpenAI currently deploys three primary crawlers. Each plays a specific role in making sure your
content is either learnable, searchable, or usable in real-time conversations.

GPTBot

Think of this one as the training bot. Its job is to crawl websites and gather information that
helps improve the general intelligence of OpenAI’s language models.
• Purpose: Model training
• Timing: Offline and asynchronous
• Implications: Your content may be used to train future versions of GPT, but won’t necessarily
show up in ChatGPT Search results

What do you do if you want to prevent this? You can block GPTBot in your robots.txt file while
still allowing your site to appear in ChatGPT answers.
User-agent: GPTBot
Disallow: /

OAI-SearchBot

This is the indexing assistant for ChatGPT Search. It works a bit like Googlebot, but with one
major difference: it doesn’t build a massive search index itself. Instead, it augments results that
come from Bing and other sources.

OAI-SearchBot mainly relies on Bing’s search index and SERPs to locate relevant pages during
web browsing.
• Purpose: Improves retrieval and ranking in ChatGPT Search
• Timing: Crawling is asynchronous to user queries
• Key takeaway: Being indexed in Bing drastically improves your discoverability by OAI-SearchBot

Tip: Ensure this bot is allowed if you want to show up in ChatGPT’s Search feature.
User-agent: OAI-SearchBot
Allow: /

The value of ChatGPT-User and OAI-SearchBot bot hits comes from what you can actually learn from
them. ChatGPT-User and OAI-SearchBot are both important in the ChatGPT Search system: ChatGPT-
User crawls webpages to answer user queries in real-time, while OAI-SearchBot indexes content to
improve the search process.

By analyzing log files, site owners can track bot visits, monitor status codes, keep an eye on server
response times, and ensure key pages are properly crawled and accessible.

The volume of ChatGPT-User bot hits serves as a reliable indicator of a site’s visibility, somewhat
equivalent to impressions on traditional search engines. When ChatGPT-User visits a page, it signals
that there’s growing interest from ChatGPT in your content—and possibly a rising trend around your
topics.

Tracking bot activity also helps understand which content attracts the most attention and allows you
to verify that critical pages are regularly being crawled. This is particularly important for brands and
sites looking to optimize their visibility in ChatGPT Search, as checking the last crawl date ensures
recent updates are captured and indexed effectively.

- Jérôme Salomon

Senior Technical SEO at Oncrawl

ChatGPT-User

If you see this bot in your logs, it means a real user query prompted ChatGPT to crawl your site in real-time.
• Purpose: Fetches up-to-date content to directly answer user questions
• Timing: Real-time or just-in-time crawling
• SEO goldmine: Visits from this bot are your best signal of visibility in ChatGPT. You can treat these hits like the
impressions metric you would find in Google Search Console.

Keep in mind, however, that ChatGPT-User does not crawl all search results. It will skip pages protected by a paywall,
inaccessible pages due to poor status codes, or pages with disallowed access (more on that later).

It’s also important to note that not all pages crawled by ChatGPT-User will be used as a citation in the answer.

Be sure to keep those pages that matter for your visibility open and accessible.
# Allow AI Search
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Allow: /

In terms of measuring AI visibility, one of the challenges with the data we currently have is that it’s limited. We know
what tools digital marketers are using to try to track visibility—for example, in GA4, you can set up a custom dimension
to track referral traffic from ChatGPT. But the problem is, there’s not enough data to fully measure your efforts.

You can also try scraping ChatGPT Search answers or using visibility tracking tools, but user queries and search volumes
are still largely based on assumptions. That’s why analyzing your own server logs is so important—it gives you direct,
reliable insights into what’s actually happening.

- Janaina Barreto-Romero

Senior Technical SEO at Oncrawl

How to identify OpenAI’s crawlers
ChatGPT-User and OAI-SearchBot are the key bots to monitor as they are the drivers behind
the ChatGPT Search system.

When you examine your log files, you can identify and track visits and referral traffic from
ChatGPT if you know what you’re looking for. Below are the user agents to keep an eye on:

GPTBot user-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot

OAI-SearchBot user-agent: OAI-SearchBot/1.0; +https://openai.com/searchbot

ChatGPT-User user-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
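
As a minimal sketch of how these user agents could be recognized in practice, the Python snippet below matches each bot's name token inside a raw user-agent string. The function and constant names are illustrative assumptions, not part of any OpenAI or Oncrawl tooling, and version numbers are deliberately ignored because they may change.

```python
from typing import Optional

# Name tokens of the three OpenAI crawlers discussed in this guide.
# Matching on the token (not the full string) tolerates version changes.
OPENAI_BOT_TOKENS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def classify_openai_bot(user_agent: str) -> Optional[str]:
    """Return the OpenAI bot name found in a user-agent string, or None."""
    lowered = user_agent.lower()
    for token in OPENAI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


if __name__ == "__main__":
    sample = "OAI-SearchBot/1.0; +https://openai.com/searchbot"
    print(classify_openai_bot(sample))
```

Keep in mind that user-agent strings can be spoofed, so a match like this is best treated as a first-pass filter rather than proof of origin.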

What role does each bot play in the AI search workflow?
Now that we have explored what the OpenAI search system looks like, let’s examine how it works
behind the scenes and what happens when someone asks ChatGPT a question.

As previously mentioned, OAI-SearchBot and ChatGPT-User bots have distinct tasks and
features, so they play specific roles in the search workflow. The workflow from prompt to click
runs through the following steps:

1. Prompt

The user enters a query or prompt.

2. Query sent to Bing

After a user enters a prompt into ChatGPT, it is turned into a search query, taking into account the conversation and user context.

3. ChatGPT-User crawls

ChatGPT will then perform a real-time search using this query. Relying on third-party sources like Bing and news partners, the bot retrieves relevant content to compose a response.

4. Response generated

Cited content is included directly in the answer, often with clickable links.

In an asynchronous process, OAI-SearchBot complements the above process by ensuring that relevant content is accessible and usable within OpenAI’s ecosystem.

Note: If either OAI-SearchBot or ChatGPT-User can’t access your page, it’s like waving a flag that says “don’t bother using this content”, even if it’s high quality.

Your technical SEO gatekeepers
During ChatGPT-User’s crawling process and OAI-SearchBot’s indexation process, traditional
SEO signals are taken into account when determining whether a page is eligible.
The bots respect robots.txt directives, meta robots (noindex) directives and status codes.
You have full control over whether to:
• Opt into ChatGPT Search
• Opt out of training
• Fine-tune what gets crawled by which bot

If you're aiming for maximum visibility in ChatGPT Search while optionally opting out of model
training, the setup looks like this:
# Allow ChatGPT Search
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Disallow model training (optional)
User-agent: GPTBot
Disallow: /
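
To sanity-check a configuration like this before deploying it, one option is Python's standard `urllib.robotparser` module. The sketch below parses the rules from a string and reports which bots may fetch a given URL. The `bot_access` helper and the example.com URL are illustrative assumptions, and the standard-library parser may not interpret every edge case exactly the way OpenAI's crawlers do.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from this section, without the comments.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def bot_access(robots_txt: str, url: str, bots: list[str]) -> dict[str, bool]:
    """Report, for each bot name, whether the rules allow fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    bots = ["ChatGPT-User", "OAI-SearchBot", "GPTBot"]
    # example.com is a placeholder domain.
    print(bot_access(ROBOTS_TXT, "https://example.com/pricing", bots))
```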

When you understand how these bots function and what you’re looking for, you’ll have a better
idea of what insights you can glean from your log analysis.

Tracking ChatGPT indexation and visibility with Oncrawl’s log analysis
Measuring and monitoring your SEO efforts in ChatGPT Search is challenging because there's no equivalent of Google Search Console for ChatGPT Search.

This means that traditional SEO metrics, like keyword search volume, average position, impressions, clicks and CTR, are either not available or meaningless.

However, there is a way to “see” your AI visibility in real time: server log analysis, your new best friend. And when it comes to making sense of those logs, Oncrawl turns your raw data into clear insights.

Why is log analysis so important?

Since most traditional SEO tools can’t help measure a site’s AI visibility, we need to examine websites through a different lens, and logs offer exactly that.

Every time an OpenAI bot accesses or crawls your site, it leaves a new line in your server log files as well as valuable clues.

Log file analysis has never been more strategic. As LLMs (ChatGPT, Gemini, Perplexity, etc.) increasingly rely on web content to generate their responses, understanding what they actually explore on your site has become a key issue. Some AI players have tried to standardize this process with the LLMS.txt file, which is meant to guide AI bots toward the right content to use. But let’s be honest: this file is neither standardized nor consistently followed. In reality, LLMs still crawl websites in their own way.

That’s exactly where log analysis becomes essential. It allows you to pinpoint which pages are visited by AI bots, helping you understand which content is most likely to be used, either directly or indirectly, as a source in their answers. For e-commerce businesses, this is especially critical: LLMs are now integrating product data to deliver transactional answers, much like Google Shopping. So it’s vital to know whether your catalog is being properly crawled or not.

Just like in SEO, log analysis in GEO (Generative Engine Optimization) acts as your compass: which pages are being explored? Which are being ignored? Are there any technical blockers? It’s a necessary step if you want your brand to show up in AI responses, and not hand the opportunity to your competitors.

- Mathieu Chapon

Director of Data and Innovation at Peak Ace

How does log analysis work?
Log analysis is the process of reading those clues left in your logs and reconstructing the path
your site’s visitors have taken.

Your log files record all activity on your website. They are the most complete and most reliable
source of information regarding what happens on your site.

Each line contains the following information used for SEO log analysis:
• URL visited
• Date and time
• User-agent (to identify the bot name)
• Referrer (to identify the source of the user visit)
• Status code of the URL
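
To make those fields concrete, here is a minimal Python sketch that extracts them from a single access-log line. It assumes the widely used "combined" log layout produced by many Apache and Nginx setups; your server's format may differ, and the sample line, IP address and names such as `parse_log_line` are made up for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: logs follow the common "combined" access-log layout.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Hypothetical log line used only for demonstration.
SAMPLE_LINE = (
    '203.0.113.7 - - [12/Mar/2025:08:15:42 +0000] '
    '"GET /blog/ai-search HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; '
    'ChatGPT-User/1.0; +https://openai.com/bot"'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return URL, time, user agent, referrer and status, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


if __name__ == "__main__":
    record = parse_log_line(SAMPLE_LINE)
    print(record["url"], record["status"], record["time"].isoformat())
```

Returning `None` for lines that do not match keeps malformed entries from silently polluting later counts.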

If you haven’t analyzed your logs before, no worries. Oncrawl can help facilitate the process.
With Oncrawl’s log analysis, you can:
• Identify which pages were visited by each bot
• Track when the visits occurred
• Monitor bot frequency and coverage trends
• Map bot behavior to page types and content segments

This lets you see what ChatGPT is actually looking at, what’s potentially missing from its view,
and where you might be overlooking opportunities to bring in more traffic.

What data can you find in Oncrawl?
When analyzing your log files in Oncrawl, a number of different filters and metrics are available
and they are very useful for extracting key insights.

1. Which pages are visited—and how often

You’ll see exactly which pages each bot hits, whether it’s your homepage, a product page, or
an obscure blog post. This helps you understand what type of content ChatGPT is gravitating
towards.
• Are your most valuable pages being crawled?
• Are certain sections ignored entirely?

This is your starting point for identifying missed opportunities.

2. When visits happen

Timing matters—especially when you want to know if updates to your site are being picked up.
Oncrawl provides:
• Daily and hourly bot hit evolution
• Trend lines that show whether activity is growing or fading
• Last seen crawl timestamps for each page

This helps you answer questions like:
• Did OAI-SearchBot pick up our latest blog post?
• Has ChatGPT-User been hitting our pricing pages recently?
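
Outside of a dedicated tool, the same two questions can be approximated with a few lines of Python. The sketch below assumes bot hits have already been parsed into (timestamp, bot name, URL) tuples; the sample data and function names are hypothetical and do not reflect how Oncrawl computes its reports.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pre-parsed hits: (timestamp, bot name, URL path).
HITS = [
    (datetime(2025, 3, 10, 9, 0), "OAI-SearchBot", "/blog/new-post"),
    (datetime(2025, 3, 11, 14, 30), "ChatGPT-User", "/pricing"),
    (datetime(2025, 3, 12, 8, 15), "ChatGPT-User", "/pricing"),
    (datetime(2025, 3, 12, 8, 45), "ChatGPT-User", "/blog/new-post"),
]


def daily_hits(hits, bot):
    """Count hits per calendar day for one bot."""
    return Counter(ts.date() for ts, name, _ in hits if name == bot)


def last_seen(hits, bot):
    """Map each URL to the most recent time the bot fetched it."""
    latest = {}
    for ts, name, url in hits:
        if name == bot and (url not in latest or ts > latest[url]):
            latest[url] = ts
    return latest


if __name__ == "__main__":
    print(daily_hits(HITS, "ChatGPT-User"))
    print(last_seen(HITS, "OAI-SearchBot"))
```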

3. Volume of visits per page group

Thanks to segmentation features in Oncrawl, you can group URLs by:
• Page type (e.g. blog, product, support)
• Topic or taxonomy
• Language or market

From there, you can compare bot activity across these groups. Are your product pages getting
all the visits while your blog remains invisible? Or vice versa?
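
The idea behind segmentation can be illustrated with a small Python sketch that assigns each crawled path to a page group by URL prefix and counts hits per group. The prefixes and group names here are assumptions for the example; a real site would define its own rules, and this is not a description of Oncrawl's segmentation engine.

```python
from collections import Counter

# Hypothetical path prefixes; replace with your own site structure.
SEGMENT_PREFIXES = {
    "blog": "/blog/",
    "product": "/products/",
    "support": "/support/",
}


def segment_for(path: str) -> str:
    """Return the page group whose prefix matches the path, else 'other'."""
    for segment, prefix in SEGMENT_PREFIXES.items():
        if path.startswith(prefix):
            return segment
    return "other"


def hits_per_segment(paths):
    """Count bot hits per page group from an iterable of URL paths."""
    return Counter(segment_for(path) for path in paths)


if __name__ == "__main__":
    crawled = ["/blog/a", "/blog/b", "/products/x", "/about"]
    print(hits_per_segment(crawled))
```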

4. Bot-specific filtering

You can isolate specific bot activity from:
• ChatGPT-User (real-time user queries)
• OAI-SearchBot (search enhancement and indexability)
• GPTBot (training crawler, if allowed)

By applying filters, you can dig into the behavior of each bot individually—ideal for diagnosing
indexation gaps or visibility wins.
How can you use this information?
Once you know what’s being crawled and what isn’t, you can then set up your action plan.

Highlight accessibility issues

Pages that return 404s or are blocked by robots.txt will never appear in ChatGPT Search. You
can catch and fix those quickly.
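
As a rough sketch of how 404s served to OpenAI bots could be surfaced from parsed log records, the Python example below counts them per URL so the most-requested broken pages appear first. The record layout and sample data are assumptions made for illustration.

```python
from collections import Counter

OPENAI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

# Hypothetical pre-parsed log records.
RECORDS = [
    {"url": "/old-guide", "status": 404,
     "user_agent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
                   "compatible; ChatGPT-User/1.0; +https://openai.com/bot"},
    {"url": "/old-guide", "status": 404,
     "user_agent": "OAI-SearchBot/1.0; +https://openai.com/searchbot"},
    {"url": "/pricing", "status": 200,
     "user_agent": "OAI-SearchBot/1.0; +https://openai.com/searchbot"},
    {"url": "/missing", "status": 404,
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
]


def not_found_urls(records):
    """Return (url, count) pairs for 404s served to OpenAI bots, busiest first."""
    counts = Counter()
    for record in records:
        is_openai = any(bot in record["user_agent"] for bot in OPENAI_BOTS)
        if is_openai and record["status"] == 404:
            counts[record["url"]] += 1
    return counts.most_common()


if __name__ == "__main__":
    for url, count in not_found_urls(RECORDS):
        print(url, count)
```

The resulting list is a natural starting point for a redirect mapping.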

Identify overlooked but valuable content

If key pages aren’t being visited by OAI-SearchBot, check if they’re indexed in Bing or if technical
SEO issues are preventing them from being crawled and indexed.

Measure visibility via ChatGPT-User hits

Every ChatGPT-User crawl is a signal that your content was fetched in response to a real prompt,
even though not every fetched page ends up cited. That’s the best available measure of SEO
visibility in the AI world.

Benchmark against your own data

Track changes in visibility over time. Identify which type of content ChatGPT is crawling the most
and the last time a page was crawled.

Did a recent update to your robots.txt or a new content strategy lead to more visits? Test and
learn to see what works best for your site.
Example filters and segments that work well
Here are a few ideas to inspire your own custom dashboards in Oncrawl:
• Segment by site sections to compare blog vs. product visibility, for example
• Filter for ChatGPT-User only to measure prompt-driven hits
• Segment by language/market to analyze international AI visibility
• Check crawl recency to confirm whether your key pages are visited often

When you look at the right data, the right way, your logs can inform strategic decisions, no guesswork needed.

The AI search ecosystem is still in flux, but thanks to log analysis, you have something concrete to hold onto.

Oncrawl lets you quantify what matters, track what changes, and steer your content towards real visibility in ChatGPT.

Best practices to help your content appear in ChatGPT

Visibility in ChatGPT isn’t just about being found—it’s about being featured.

Unlike traditional search engines, where rankings are determined by a blend of signals over time,
ChatGPT’s search functionality blends real-time crawling, user context, and search engine
results into a single seamless response.

ChatGPT isn’t simply regurgitating search results; it’s evaluating which sources are relevant,
credible, and helpful.

Once you understand how ChatGPT Search works, you have a better idea of what to leverage
in order to be visible.

How ChatGPT Search works | To be visible you need to:
Uses Bing’s SERPs to find relevant content | Index and rank your pages in Bing
Bots are used to access content | Make content accessible to ChatGPT bots
Only uses relevant sources in the answer | Provide relevant content on your site
Looks for up-to-date information | Make sure your content is up-to-date

Ranking in Bing
Bing indexation is an important starting point for visibility in ChatGPT Search. Since ChatGPT relies heavily on Bing’s index to retrieve and evaluate content, setting up a Bing Webmaster Tools account is highly recommended.

With BWT, you can check which pages are indexed, monitor performance for specific keywords, and identify technical issues that may block visibility.

This insight helps you understand which content is eligible to appear in ChatGPT responses, and where to focus your Bing optimization efforts.

Although optimizing for Bing is a very important first step, your job isn’t done just yet.

Answer Engines like ChatGPT and Perplexity are redefining search. Fundamental SEO metrics like traffic,
backlinks, and domain authority don't drive results in AI search. Instead, these engines prioritize clear,
structured, and authoritative content that immediately addresses user questions. At the same time,
there is a tidal shift in user behavior towards conversational experiences. Together, these factors are
catalyzing a shift in how people find information itself.

Our research at Profound shows only about 12% overlap between Google results and ChatGPT answers,
highlighting how distinct these systems are. Without proper indexing by Bing, your site remains invisible
to ChatGPT. Each AI platform has distinct preferences: ChatGPT is biased towards Wikipedia, Perplexity
relies on Reddit and YouTube, and Microsoft Copilot disproportionately uses industry leaders such as
Forbes and Gartner.

Winning in AI search means that your brand needs to start providing concise, definitive answers that
directly meet consumer needs. AI search is decision-oriented: users want immediate recommendations,
structured comparisons, and actionable insights.

Optimizing for AI means creating structured, authoritative content specifically tailored for these engines
(llms.txt, html tables, etc.). Brands embracing this shift now will lead the way.

AI search is here now. There will be a two-tiered internet in our very near future: one for humans
(well-designed and beautiful) and one for AI (structured and digestible). Your strategy must reflect this reality.

- Joshua Blyskal

AI Strategist at Profound

Make sure your content is accessible
Another important step you can take to make sure ChatGPT is using your site’s pages to answer queries is to simply ensure the content is accessible.

Technical accessibility

When we say technical accessibility, this includes proper status codes, crawlable page structures, and fast-loading content.

In our previous research, log analysis has shown that OpenAI’s bots keep crawling 404 pages, and that’s a problem that can be fixed!

You can use your server log files to identify which pages crawled by the OpenAI bots responded with a 404 status code and subsequently create a redirect mapping in order to ensure that ChatGPT accesses the right pages with the relevant content.

Additionally, you can improve your accessibility by including structured data where appropriate, especially for things like FAQs, how-to instructions, or product information.

While not a direct “ranking factor” for ChatGPT, structured content makes it easier for both Bing and OpenAI crawlers to interpret your content quickly.

Internal linking

ChatGPT has been known to use more than one page from a single domain to answer a question, and the bots will continue to revisit the relevant pages they have already identified.

Therefore, a well-structured, semantic internal linking strategy from your top crawled pages can boost the discovery of other pages.

JavaScript rendering

Studies have shown that many AI crawlers have limited JavaScript rendering capabilities. At the time of writing, OpenAI bots focus primarily on HTML content; therefore, you want to make sure your key content is available in your HTML or with server-side rendering.

Tip: Be sure to use your favorite crawler without JS rendering to verify your content at scale.
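
For a quick spot check on a single page, one hedged approach is to confirm that key phrases are present in the raw HTML as served, before any JavaScript runs. The Python sketch below uses only the standard library and works on an HTML string you have already fetched; the class name, helper name and sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_phrases(raw_html, phrases):
    """Return the phrases that do not appear in the HTML's static text."""
    extractor = VisibleTextExtractor()
    extractor.feed(raw_html)
    text = " ".join(extractor.chunks).lower()
    return [phrase for phrase in phrases if phrase.lower() not in text]


if __name__ == "__main__":
    # Hypothetical page: the price is only injected by JavaScript.
    html = (
        "<html><body><h1>Family breadmakers</h1><div id='app'></div>"
        "<script>render('Price: 99')</script></body></html>"
    )
    print(missing_phrases(html, ["family breadmakers", "Price: 99"]))
```

Any phrase reported as missing is a candidate for server-side rendering or for inclusion in the static HTML.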

There are two primary ways to appear in ChatGPT results, and while they’re connected, each requires a slightly different strategy.

Let’s say a user asks, “What’s the best breadmaker for a family?”

ChatGPT will search the web and return a list of recommended products. The results will incorporate and cite various sources.

1. Content from your own domain

Ideally, your own website should be one of those cited sources. You might have a PLP (Product Listing Page) for “family breadmakers” or blog content that addresses which features are best for family breadmakers.

Here, you have two key opportunities for visibility:
• First, by being included in the model’s training data. Is your site being crawled by the GPTBot?
• Second, by showing up as one of the URLs retrieved via web search, typically powered by third-party services like Bing.

Some site owners choose to block AI bots entirely, which limits visibility on both fronts.

However, the upside of allowing access has grown considerably in the past year, particularly as ChatGPT can now drive measurable traffic.

The benefit is in the search results; your domain will be linked and receive clicks. Unless you're a publisher, giving these bots access is often a no-brainer.

The added bonus? You maintain control. When you update your website, those changes can be reflected in ChatGPT results, sometimes more quickly than traditional SEO would allow. And the pages that appear in ChatGPT may not always be the same ones that rank highly in Google, giving you more room to work with.

2. Mentioned on external content

For “the best” or “which X is the best” comparison-style queries, ChatGPT often leans on sources that review multiple products or services; think listicles or product roundups.

In our breadmaker example, a page like Ideal Home’s “Best Breadmaker – the 8 Top Bread Machines” might be frequently cited.

Your goal should be to:
• Get mentioned in those lists
• Climb higher in lists where you're already featured
• Be selected for specific subcategories (e.g., “best for speed,” “best value,” “best for families”)

If ChatGPT uses two or three of these listicle-style pages as sources and you’re appearing in multiple lists, your visibility multiplies.

It’s worth noting that the sources ChatGPT uses often differ from those that rank highest on Google, which may have been the focus of your original PR coverage list.

There isn’t anything revolutionary about these tactics; they are actions that your PR and SEO teams might have already been taking. This just means that these strategies may need a slight recalibration, not a reinvention. You’ll likely have to shift your focus towards the pages and publications that AI assistants are actually surfacing and crawling.

- John Campbell

Head of Innovation and AI at ROAST

Create relevant content
“Relevant” is the key word that OpenAI’s documentation emphasizes when defining which web pages are used in their responses.

Although there is no clear definition of what they deem to be relevant, you can identify relevancy by inspecting a ChatGPT Search response’s network activity and examining how the bots interact with your site.

The underlying JSON of a search response contains several categories of URLs, including:

sources_footnote – URLs directly cited in the response.

supporting_websites – Additional sources used for context or hover citations.

search_result – URLs retrieved from the search engine (typically Bing).

safe_urls – Approved and safe-to-visit URLs.

blocked_urls – Domains filtered out for policy or quality reasons.

Only sources_footnote and supporting_websites URLs are deemed “relevant.” The rest may have been crawled but didn’t meet the bar for inclusion.

To ensure your content surfaces in these high-value URL categories, you can:

Test with prompts: Run prompts on ChatGPT Search and inspect the JSON file via browser developer tools to gather the exact URLs used.

Analyze with a crawler: Feed those URLs into a web crawler to extract metadata like:
• Page title and meta description
• Heading structure (<h1>, <h2>, etc.)
• Main content blocks
• Structured data
• N-gram frequency (to analyze phrase usage)

Compare against your content: Use the insights to evaluate what’s working and what could be missing.
• Content gaps - Are there concepts or features missing from your page?
• Content uniqueness - Does your content add new value?
• Freshness - Is your content up to date?

By dissecting what ChatGPT deems “relevant,” you can reverse-engineer the factors that lead to inclusion.
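
The n-gram frequency step mentioned in this section can be sketched in a few lines of Python. This is a deliberately simple illustration that lowercases the text, tokenizes on letters, digits and apostrophes, and counts consecutive word sequences; it is an assumption about one reasonable approach, not the method any particular crawler uses.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count n-word sequences in text (case-insensitive, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # zip over n shifted copies of the word list yields consecutive n-grams.
    # Note: sequences may span sentence boundaries in this simple version.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


if __name__ == "__main__":
    sample = "Best breadmaker for a family. The best breadmaker for speed."
    for phrase, count in ngram_counts(sample, n=2).most_common(3):
        print(phrase, count)
```

Running the same count over cited pages and over your own page gives a rough view of which phrases the cited sources use that yours does not.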

Identify any other blockers
If you’ve run through the checklist above and you still aren’t seeing any progress in
visibility, check whether your pages carry the proper meta robots tags/directives. ChatGPT’s
crawlers won’t try to bypass barriers; they’ll simply move on to the next option.
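
As a minimal sketch of such a check, the Python snippet below scans a page's HTML for a generic meta robots tag and reports whether it contains noindex. It assumes you already have the HTML as a string; it does not cover bot-specific meta names or the X-Robots-Tag HTTP header, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key.lower(): (value or "") for key, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(raw_html):
    """Return True if a generic meta robots tag includes 'noindex'."""
    parser = MetaRobotsParser()
    parser.feed(raw_html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, follow"></head>'
    print(has_noindex(page))
```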

Similarly, content behind login walls, gated forms, or paywalls will generally be excluded from
responses. If the content can’t be accessed in real time, it won’t make the cut for inclusion.

And unlike Google, which crawls and indexes content relatively often, OpenAI bots revisit content
based on triggers. If your pages haven’t been crawled in weeks, your latest updates may not
be reflected in the search assistant’s answers—even if you're technically "in the index."

Ranking in ChatGPT isn’t about gaming an algorithm—it’s about being technically accessible,
contextually relevant, and indexed in the right places. It’s SEO in a new form, and it rewards
the same things that have always mattered: clarity, structure, and consistency.

FAQs

As more SEOs tune into the role of ChatGPT and AI search tools, a lot of questions keep surfacing about bots, crawling, visibility, and what it all means in the context of traditional SEO.

We have answered a number of the more common questions throughout the e-book, but below is a cheat sheet that regroups those questions and (hopefully) clarifies some of the trickier and commonly misunderstood aspects of OpenAI.

Quick navigation
• ChatGPT visibility basics
• Search & indexing
• Bot behavior
• Content optimization
• Technical questions

ChatGPT visibility basics
How do I know if ChatGPT used my content in response to a query?

There’s no notification system (yet), but your server logs tell the story. When ChatGPT-User
visits your page, it’s because a user’s prompt led ChatGPT to fetch your content in real time.
That’s your biggest visibility signal.

If I’m ranking in Google, shouldn’t I also rank in ChatGPT?

Not necessarily. As mentioned above, ChatGPT pulls its results primarily from Bing, not Google.
So if your content isn’t indexed in Bing, your odds of visibility in ChatGPT drop sharply.

That being said, if you’re optimized for Google, you’re probably doing a lot of things right,
especially around technical health and content clarity. But for ChatGPT, you may need to recheck
your site’s indexing status and crawl accessibility with Bing in mind.

Is there a tool like Google Search Console for ChatGPT? How can I know if a page is indexed by
ChatGPT?

At the time of writing, no. There’s no first-party tool from OpenAI that lets you check
impressions or rankings in ChatGPT Search.

This is why log analysis is critical. Tools like Oncrawl fill the gap by letting you monitor bot
visits and infer visibility patterns based on real-time activity.
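
To make the log-based signal in this section concrete, here is a minimal sketch that counts visits from the three OpenAI bots in access log lines. It assumes the common "combined" log format; the sample lines, IP addresses, and simplified user-agent strings are invented for illustration, and real logs should be matched against the user agents OpenAI publishes.

```python
import re
from collections import Counter

# Bot names as used in this guide; matched as substrings of the user agent.
BOT_TOKENS = ("ChatGPT-User", "OAI-SearchBot", "GPTBot")

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def bot_hits(lines):
    """Count hits per (bot, path, status) for the OpenAI bots listed above."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        for token in BOT_TOKENS:
            if token in match.group("ua"):
                key = (token, match.group("path"), int(match.group("status")))
                hits[key] += 1
                break
    return hits


# Invented sample lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '203.0.113.6 - - [01/Jan/2025:10:05:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0"',
    '203.0.113.5 - - [01/Jan/2025:10:09:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '198.51.100.9 - - [01/Jan/2025:10:10:00 +0000] "GET /pricing HTTP/1.1" 404 312 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for (bot, path, status), count in bot_hits(sample_log).most_common():
    print(bot, path, status, count)
```

Grouping by status code as well as path makes it easy to spot pages a bot requested but could not read, which is the kind of pattern a log analysis tool surfaces at scale.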

Search & indexing
Is it true that OAI-SearchBot's web browsing is based on Bing's index and SERPs?

Yes, OAI-SearchBot’s web browsing capabilities are powered in part by Bing’s index and search
engine results pages (SERPs). This allows the bot to access up-to-date, public content and
retrieve relevant information efficiently.

Does that mean you first need to be indexed in Bing if you want to be indexed by OAI-SearchBot?

While OAI-SearchBot does leverage Bing’s infrastructure for search results, being indexed by Bing
is not strictly required to be accessed by OAI-SearchBot. However, it significantly increases
visibility and the likelihood that your content can be discovered and referenced in responses.

If your site is not indexed in Bing, it may still be reachable if the exact URL is provided or
linked from other indexed sources, but discoverability will be limited.

Does OAI-SearchBot index all visited pages?

No, OAI-SearchBot does not index all visited pages. Similar to other search engines,
OAI-SearchBot respects standard web protocols such as robots.txt directives and meta tags
(e.g., noindex) to determine whether a page should be indexed. Additionally, the bot considers
the page's status code (e.g., 200, 404) to decide if the content is eligible for indexing.

Bot behavior
Can my content appear in ChatGPT if I block GPTBot?

Yes. GPTBot is used exclusively for training OpenAI’s language models, so it doesn’t impact your
visibility in ChatGPT Search. If your content is blocked for GPTBot, but still open to
ChatGPT-User and OAI-SearchBot, you can absolutely still appear in responses.

What does it mean if none of the OpenAI bots are visiting my site?

Start by checking:

1. Is your robots.txt blocking any of the bots?

2. Are your pages indexed in Bing?

3. Do your pages return proper 200 status codes?

4. Is your content thin, gated, or dynamically rendered in a way bots can’t interpret?

If everything checks out, it could be a matter of time. Visibility in ChatGPT grows as more users
prompt for content in your niche. Use Oncrawl to monitor changes over weeks, not just days.

Content optimization
Can I improve my visibility without creating new content?

Definitely. Technical SEO updates alone can significantly impact your discoverability. Fixing
blocked pages, correcting status codes, improving crawl depth, and updating robots.txt can all
help with visibility, sometimes more than publishing something new.

You can also reformat existing content to better match user questions. Think clearer headings,
shorter paragraphs, or directly answering “what is” or “how to” prompts.
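
The robots.txt questions on this page can be checked quickly with Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the robots.txt content and the URL are invented, and the helper name `access_report` is hypothetical. It shows a setup where GPTBot is blocked while OAI-SearchBot and ChatGPT-User remain allowed.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: block the training bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

OPENAI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def access_report(robots_txt, url):
    """Return {bot_name: True/False} for whether each bot may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in OPENAI_BOTS}


report = access_report(ROBOTS_TXT, "https://www.example.com/guide")
for bot, allowed in report.items():
    print(bot, "allowed" if allowed else "blocked")
```

Running the same report against your live robots.txt for a handful of key URLs is a cheap first step before digging into logs.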

Technical questions
When a user sends a search-enabled prompt, the answers contain different types of links
(i.e., search_result, sources_footnote, supporting_websites, safe_urls, blocked_urls). What do
they mean?

search_result:

These denote the search result URLs gathered by ChatGPT for the prompt.

sources_footnote:

These are the main citation URLs used directly in the answer.

supporting_websites:

These are additional citation URLs available on hover when multiple citations are used.

safe_urls:

These are URLs that were verified as safe for the user to visit. They are included to provide
users with relevant and trustworthy resources.

blocked_urls:

These refer to URLs ChatGPT is not permitted to access or include in responses for safety,
content, or policy reasons.
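
As a sketch of how these categories could be used in practice, the snippet below takes link groups keyed by the category names above and reports where a given domain appears, and whether it appears in the two categories this guide treats as cited (sources_footnote and supporting_websites). The dictionary shape, the sample URLs, and the function names are assumptions made for illustration; they do not describe an official OpenAI data format.

```python
from urllib.parse import urlparse

# Categories this guide describes as actually cited in the answer.
RELEVANT_CATEGORIES = {"sources_footnote", "supporting_websites"}


def on_domain(url, domain):
    """True if `url` is hosted on `domain` or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def domain_presence(link_groups, domain):
    """Return {category: [urls]} restricted to URLs hosted on `domain`."""
    found = {}
    for category, urls in link_groups.items():
        matches = [url for url in urls if on_domain(url, domain)]
        if matches:
            found[category] = matches
    return found


def is_cited(found):
    """True if the domain shows up in at least one cited category."""
    return any(category in RELEVANT_CATEGORIES for category in found)


# Invented sample data, for illustration only.
sample_groups = {
    "search_result": ["https://www.example.com/guide", "https://other.example.org/post"],
    "sources_footnote": ["https://other.example.org/post"],
    "supporting_websites": [],
    "safe_urls": ["https://www.example.com/guide"],
    "blocked_urls": [],
}

found = domain_presence(sample_groups, "example.com")
print(sorted(found))    # ['safe_urls', 'search_result']
print(is_cited(found))  # False
```

In this invented case the page was retrieved and judged safe but not cited, which is the situation where comparing your content against the cited sources is most useful.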

Wrapping up

While Google still holds a majority of the search engine market share, we
are seeing some changes in the search landscape. Search is no longer
happening on search engines alone—it’s happening in conversations
with voice assistants, smart devices, and AI tools like ChatGPT. Users
are now getting answers before they even reach your site.

However, this shift doesn’t mean SEO is dead; it means it is evolving.

The rise of ChatGPT Search represents a new kind of discovery, one that blends traditional
indexing with real-time intent and conversational context. And with new ways to discover
information come new methods of optimization.

All that being said, the basics of search remain the same for traditional
search engines as for AI search. If your content can’t be reached or
understood, it won’t reach your users—no matter how good it is.

The good news? You’re not powerless. You can track how OpenAI bots
interact with your site and ensure the right ones are allowed in. You can
use log file analysis to diagnose what's blocking visibility and refine your
content to better respond to the questions your users are really asking.

Whether you’re a technical SEO expert or just getting started, this new
landscape pushes us to think about visibility in terms of presence and
no longer just rankings.

Are your pages showing up when questions are being asked? You now
have the knowledge—and the tools—to answer yes.

Contributors
Thank you to all of the talented professionals who helped us make this e-book happen with their insightful contributions.

- Jérôme Salomon, Senior Technical SEO at Oncrawl
- Janaina Barreto-Romero, Senior Technical SEO at Oncrawl
- John Campbell, Head of Innovation and AI at ROAST
- Veronika Höller, Head of Demand Generation at Tresorit
- Mathieu Chapon, Director of Data and Innovation at Peak Ace
- Joshua Blyskal, AI Strategist at Profound

Curious to see how Oncrawl can help you monitor AI
bot behavior, uncover crawl gaps,
and make smarter SEO decisions—faster?

Book a demo

www.oncrawl.com
