

“It’s Impossible To Crawl The Whole Web”





In response to a question about why SEO tools don’t show all backlinks, Google’s Search Advocate John Mueller says it’s impossible to crawl the whole web.

He makes the point in a Reddit comment, in a thread started by a frustrated SEO professional.

The poster asks why the SEO tool they use isn’t finding every link that points to a site.

Which tool the person is using isn’t important. As we learn from Mueller, it’s not possible for any tool to discover 100% of a website’s inbound links.

Here’s why.

There’s No Way To Crawl The Web “Properly”

Mueller says there’s no objectively correct way to crawl the web because the number of URLs is effectively infinite.

No one has the resources to keep an endless number of URLs in a database, so web crawlers try to determine what’s worth crawling.


As Mueller explains, that inevitably leads to URLs getting crawled infrequently or not at all.

“There’s no objective way to crawl the web properly.

It’s theoretically impossible to crawl it all, since the number of actual URLs is effectively infinite. Since nobody can afford to keep an infinite number of URLs in a database, all web crawlers make assumptions, simplifications, and guesses about what is realistically worth crawling.

And even then, for practical purposes, you can’t crawl all of that all the time, the internet doesn’t have enough connectivity & bandwidth for that, and it costs a lot of money if you want to access a lot of pages regularly (for the crawler, and for the site’s owner).

Past that, some pages change quickly, others haven’t changed for 10 years – so crawlers try to save effort by focusing more on the pages that they expect to change, rather than those that they expect not to change.”
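The idea of spending more effort on pages that are expected to change can be sketched in a few lines of Python. This is only an illustration, not how Google or any SEO tool actually schedules crawls: the `PageState` class, the halve-or-double rule, and the interval bounds are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds: never recrawl more often than hourly,
# never wait longer than a year.
MIN_INTERVAL_HOURS = 1.0
MAX_INTERVAL_HOURS = 24.0 * 365


@dataclass
class PageState:
    url: str
    interval_hours: float = 24.0      # assumed starting guess: once a day
    last_hash: Optional[str] = None   # fingerprint of the last fetched body


def update_after_fetch(page: PageState, body: bytes) -> PageState:
    """Adjust the recrawl interval based on whether the page changed."""
    digest = hashlib.sha256(body).hexdigest()
    if page.last_hash is None:
        pass  # first visit: nothing to compare against, keep the default
    elif digest != page.last_hash:
        # Page changed since the last visit: come back sooner.
        page.interval_hours = max(MIN_INTERVAL_HOURS, page.interval_hours / 2)
    else:
        # Page is unchanged: back off and spend the budget elsewhere.
        page.interval_hours = min(MAX_INTERVAL_HOURS, page.interval_hours * 2)
    page.last_hash = digest
    return page
```

Under this rule, a page that hasn’t changed in years drifts toward the maximum interval, while a frequently updated page is revisited more often. Either way, some changes are seen late or missed entirely.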

How Web Crawlers Determine What’s Worth Crawling

Mueller goes on to explain how web crawlers, including search engines and SEO tools, figure out which URLs are worth crawling.

“And then, we touch on the part where crawlers try to figure out which pages are actually useful.

The web is filled with junk that nobody cares about, pages that have been spammed into uselessness. These pages may still regularly change, they may have reasonable URLs, but they’re just destined for the landfill, and any search engine that cares about their users will ignore them.

Sometimes it’s not just obvious junk either. More & more, sites are technically ok, but just don’t reach “the bar” from a quality point of view to merit being crawled more.”
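One way to picture “the bar” is as a threshold applied before a crawler spends any of its limited budget. The sketch below is hypothetical: the quality scores, the `QUALITY_BAR` value, and the `plan_crawl` function are invented for illustration, and real crawlers do not publish how they score pages.

```python
import heapq

# Hypothetical threshold: pages scoring below it are never fetched.
QUALITY_BAR = 0.4


def plan_crawl(candidates: dict, budget: int) -> list:
    """Return up to `budget` URLs, highest estimated quality first.

    `candidates` maps a URL to an assumed quality estimate in [0, 1].
    """
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    heap = [(-score, url) for url, score in candidates.items()
            if score >= QUALITY_BAR]
    heapq.heapify(heap)

    order = []
    while heap and len(order) < budget:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order
```

With a budget of two and candidates scored 0.9, 0.7, 0.5 and 0.1, only the first two are fetched. The 0.5 page clears the bar but runs out of budget, and the 0.1 page is ignored no matter how large the budget is.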

Web Crawlers Work With A Limited Set Of URLs

Mueller concludes his response by saying all web crawlers work on a “simplified” set of URLs.


Since there’s no correct way to crawl the web, as mentioned previously, every SEO tool has its own way of deciding which URLs are worth crawling.

That’s why one tool may discover backlinks that another tool didn’t find.

“Therefore, all crawlers (including SEO tools) work on a very simplified set of URLs, they have to work out how often to crawl, which URLs to crawl more often, and which parts of the web to ignore. There are no fixed rules for any of this, so every tool will have to make their own decisions along the way. That’s why search engines have different content indexed, why SEO tools list different links, why any metrics built on top of these are so different.”
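A toy example shows why two tools can report different backlinks for the same site. Everything here is made up for illustration: the tiny `LINK_GRAPH`, the page names, the seed lists, and the fetch budgets are assumptions, not a description of any real tool.

```python
from collections import deque

# Hypothetical mini-web: each page maps to the pages it links to.
LINK_GRAPH = {
    "seed-a": ["blog-1", "blog-2"],
    "seed-b": ["forum-1", "blog-2"],
    "blog-1": ["target"],
    "blog-2": ["target"],
    "forum-1": ["target"],
    "target": [],
}


def crawl_backlinks(seeds, budget, target="target"):
    """Breadth-first crawl that stops after `budget` page fetches.

    Returns the set of fetched pages found linking to `target`.
    """
    queue = deque(seeds)
    seen = set(seeds)
    fetched = 0
    backlinks = set()
    while queue and fetched < budget:
        page = queue.popleft()
        fetched += 1
        for linked in LINK_GRAPH.get(page, []):
            if linked == target:
                backlinks.add(page)
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return backlinks


# Two "tools" that differ only in where they start crawling.
tool_a = crawl_backlinks(["seed-a"], budget=3)
tool_b = crawl_backlinks(["seed-b"], budget=3)
```

Here `tool_a` finds links from `blog-1` and `blog-2`, while `tool_b` finds `forum-1` and `blog-2`. Neither sees all three backlinks, and neither is wrong. They simply made different choices about where to start and when to stop.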

Source: Reddit

Featured Image: rangizzz/Shutterstock



Upgrade Your SEO Content Strategy With These 3 Steps [Webinar]




With Google constantly releasing algorithm updates, how do you keep up?

Gone are the days when businesses could rank on Google simply by pumping out generic, keyword-stuffed material, which wasn’t necessarily user-friendly.

Now, you can stay ahead of the curve with a forward-thinking approach to search marketing.

In today’s ever-evolving digital landscape, it’s crucial that your SEO tactics stay up to date, and your content should be one of the first places you assess.

A refreshed content strategy can increase organic traffic and drive more conversions.

The key to consistently high rankings is creating high-quality content that offers maximum value to searchers.

It’s all about matching intent so that users stay on your site longer and take action.

Join me and Carlos Meza, President and CEO at Crowd Content, as we talk about ways you can upgrade your SEO content strategy in just three steps. This webinar will detail how you can execute a winning content strategy that delivers the best ROI for your efforts.


Key Insights:

  • Drop Outdated Tactics: Learn about and let go of content strategies that used to work but don’t anymore due to Google algorithm updates.
  • Think Ahead: Discover how to future-proof your content strategy to stay ahead of market changes.
  • Level Up: Get tips to scale your content strategy, increase organic traffic, and improve search rankings.

In this live session, you’ll learn how to optimize your content for user intent and relevancy by using a meaningful content structure.

If you struggle with adapting to fast-changing SEO standards, this webinar will help keep your content strategies up-to-date.

Don’t miss out! Register now and discover how quality content can increase your online presence.

Can’t make the live session? That’s ok. Sign up for this sponsored webinar, and we’ll send you a recording after the event. Hope to see you there! Be sure to bring your questions for the live Q&A!
