Is RankBrain A Ranking Factor?

Without knowing what “RankBrain” means, people new to SEO may assume it refers to a technology Google uses to rank search results.

That assumption isn’t far off, but not every component of Google’s search algorithm is a ranking factor in and of itself.

In this article, we’ll investigate the claims around RankBrain as a ranking factor and provide clarity on what RankBrain is and how it’s used in search results.

The Claim: RankBrain Is A Ranking Factor

RankBrain is a technology that’s said to impact how Google returns search results.

Due to its association with search, RankBrain is commonly referred to as a ranking factor.

If you’re new to SEO, you may hear that and start to think RankBrain is one more signal you have to optimize for.

But that’s not exactly how it works.

The next section goes over what RankBrain is designed to do, and when it’s called upon by Google to assist with answering queries.

The Evidence: Is RankBrain A Ranking Factor?

RankBrain is an artificial intelligence (AI) system introduced in 2015 to help Google with returning results for queries that have never been searched before.

That changed somewhere between the spring of 2015 and 2016 when an unannounced update was made to RankBrain which integrated the AI into all queries.

This information was revealed in a Wired article, which notes Google isn’t clear on how RankBrain improves all queries but it does affect rankings.

From Wired:

“Google is characteristically fuzzy on exactly how it improves search (something to do with the long tail? Better interpretation of ambiguous requests?) but [Google engineer Jeff Dean] says that RankBrain is ‘involved in every query,’ and affects the actual rankings ‘probably not in every query but in a lot of queries.’”

What differentiates RankBrain from other Google algorithms is its ability to learn how to answer more ambiguous queries.

As Google’s Gary Illyes explains, this is accomplished through making educated guesses at what a user would likely click on for a never-before-seen query.

“RankBrain is a PR-sexy machine learning ranking component that uses historical search data to predict what would a user most likely click on for a previously unseen query.”

RankBrain allows Google to solve problems it used to run into with traditional algorithms.

Contrary to popular theories about how RankBrain works, it does not use data gathered from users’ interactions with a web page.

RankBrain relies more on data gathered from users’ interactions with search results.

Illyes provides further clarity:

“It is a really cool piece of engineering that saved our butts countless times whenever traditional algos were like, e.g. “oh look a “not” in the query string! let’s ignore the hell out of it!”, but it’s generally just relying on (sometimes) months old data about what happened on the results page itself, not on the landing page.”

In short – RankBrain is a machine learning system that allows Google’s search algorithm to deliver more relevant results.

This is thought to be accomplished through an improved understanding of ambiguous queries and long-tail keywords.

RankBrain uses data gathered from users’ interactions with search results to predict which pages will likely get clicked on for a brand new search query.

RankBrain As A Ranking Factor: Our Verdict

Google has confirmed that RankBrain is used to rank search results and it is involved in all queries.

In 2016, Andrey Lipattsev, a Search Quality Senior Strategist at Google, said RankBrain was one of the three most important ranking signals (along with content and links).

RankBrain continues to play an important role in search results today.

RankBrain differs from traditional ranking factors in that there’s not an obvious way to actively optimize for it.

How do you optimize for ambiguous keywords or queries that no one’s ever entered into Google before?

The only option is to provide Google with as much information about a page as possible, which is something site owners should be doing anyway if they’re creating holistic content for users.

Illyes was asked this question once and replied with a similar sentiment:

“you optimize your content for users and thus for RankBrain. That hasn’t changed.”

Search Engine Journal VIP Contributor Dave Davies provides more advanced tips for communicating information to Google regarding different entities on a page in A Complete Guide to the Google RankBrain Algorithm.


Featured Image: Robin Biong/Search Engine Journal

How & Why To Prevent Bots From Crawling Your Site

For the most part, bots and spiders are relatively harmless.

You want Google’s bot, for example, to crawl and index your website.

However, bots and spiders can sometimes be a problem and provide unwanted traffic.

This kind of unwanted traffic can result in:

  • Obfuscation of where the traffic is coming from.
  • Confusing and hard-to-understand reports.
  • Misattribution in Google Analytics.
  • Increased bandwidth costs that you pay for.
  • Other nuisances.

There are good bots and bad bots.

Good bots run in the background, seldom attacking another user or website.

Bad bots break the security behind a website or are used as part of a large-scale botnet to deliver DDoS attacks against a large organization (something that a single machine cannot take down).

Here’s what you should know about bots and how to prevent the bad ones from crawling your site.

What Is A Bot?

Looking at exactly what a bot is can help identify why we need to block it and keep it from crawling our site.

A bot, short for “robot,” is a software application designed to perform a specific task repeatedly.

For many SEO professionals, utilizing bots goes along with scaling an SEO campaign.

“Scaling” means you automate as much work as possible to get better results faster.

Common Misconceptions About Bots

You may have run into the misconception that all bots are evil and must be banned unequivocally from your site.

But this could not be further from the truth.

Google’s crawler, Googlebot, is a bot.

If you block Google, can you guess what will happen to your search engine rankings?

Some bots can be malicious, designed to create fake content or to pose as legitimate websites to steal your data.

However, bots are not always malicious scripts run by bad actors.

Some can be great tools that help make work easier for SEO professionals, such as automating common repetitive tasks or scraping useful information from search engines.

Some common bots SEO professionals use are Semrush and Ahrefs.

These bots scrape useful data from the search engines and help SEO pros automate routine tasks, which can make your job easier.

Why Would You Need to Block Bots From Crawling Your Site?

While there are many good bots, there are also bad bots.

Bad bots can help steal your private data or take down an otherwise operating website.

We want to block any bad bots we can uncover.

It’s not easy to discover every bot that may crawl your site but with a little bit of digging, you can find malicious ones that you don’t want to visit your site anymore.
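
One way to do that digging is to tally the user-agents in your server’s access log. The following is a minimal Python sketch, assuming your server writes the common “combined” log format, where the user-agent is the last quoted field on each line. The sample log lines and the bot name ExampleBadBot are made up for illustration.

```python
import re
from collections import Counter

# Matches the final quoted field of a combined-format log line (the user-agent).
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(log_lines):
    """Return a Counter of the user-agent strings found in the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines, invented for illustration only.
sample_log = [
    '203.0.113.5 - - [01/Jan/2022:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBadBot/1.0"',
    '203.0.113.5 - - [01/Jan/2022:10:00:01 +0000] "GET /contact HTTP/1.1" 200 734 "-" "ExampleBadBot/1.0"',
    '198.51.100.7 - - [01/Jan/2022:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Busiest user-agents first.
top_agents = count_user_agents(sample_log).most_common()
for agent, hits in top_agents:
    print(f"{hits:>5}  {agent}")
```

A user-agent that shows up far more often than you would expect from real visitors is a candidate for a closer look.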

So why would you need to block bots from crawling your website?

Some common reasons why you may want to block bots from crawling your site could include:

Protecting Your Valuable Data

Perhaps you found that a plugin is attracting a number of malicious bots who want to steal your valuable consumer data.

Or, you found that a bot took advantage of a security vulnerability to add bad links all over your site.

Or, someone keeps trying to spam your contact form with a bot.

This is where you need to take certain steps to protect your valuable data from getting compromised by a bot.

Bandwidth Overages

If you get an influx of bot traffic, chances are your bandwidth will skyrocket as well, leading to unforeseen overages and charges you would rather not have.

You absolutely want to block the offending bots from crawling your site in these cases.

You don’t want a situation where you’re paying thousands of dollars for bandwidth you don’t deserve to be charged for.

What’s bandwidth?

In hosting terms, bandwidth is the amount of data transferred from your server to the client (the web browser).

Every time data is sent over a connection, you use bandwidth.

When bots access your site and you waste bandwidth, you could incur overage charges from exceeding your monthly allotted bandwidth.

You should have been given at least some detailed information from your host when you signed up for your hosting package.
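
To get a feel for how quickly bot traffic can add up, here is a rough back-of-the-envelope sketch in Python. Every number in it (requests per day, average response size, plan allowance) is a hypothetical figure chosen only to show the arithmetic; substitute the values from your own logs and hosting plan.

```python
def monthly_transfer_gb(requests_per_day, avg_response_kb, days=30):
    """Estimate monthly data transfer in gigabytes (using 1 GB = 1,000,000 KB)."""
    total_kb = requests_per_day * avg_response_kb * days
    return total_kb / 1_000_000

# Hypothetical figures, chosen only to illustrate the arithmetic.
bot_requests_per_day = 50_000
avg_response_kb = 200
plan_allowance_gb = 250

bot_usage_gb = monthly_transfer_gb(bot_requests_per_day, avg_response_kb)
print(f"Bot traffic alone: {bot_usage_gb:.0f} GB of a {plan_allowance_gb} GB allowance")
```

With these made-up inputs, the bots by themselves would use more than the whole monthly allowance before a single real visitor is counted.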

Limiting Bad Behavior

If a malicious bot somehow started targeting your site, it would be appropriate to take steps to control this.

For example, you would want to ensure that this bot would not be able to access your contact forms. You want to make sure the bot can’t access your site.

Do this before the bot can compromise your most critical files.

By ensuring your site is properly locked down and secure, it is possible to block these bots so they don’t cause too much damage.

How To Block Bots From Your Site Effectively

You can use two methods to block bots from your site effectively.

The first is through robots.txt.

This is a file that sits at the root of your web server. Usually, you may not have one by default, and you would have to create one.

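Keep in mind that robots.txt is a request, not an enforcement mechanism: well-behaved crawlers honor it, while malicious bots can simply ignore it. With that caveat, these are a few highly useful robots.txt rules that you can use to block most spiders and bots from your site: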

Disallow Googlebot From Your Server

If, for some reason, you want to stop Googlebot from crawling your server at all, the following code is the code you would use:

User-agent: Googlebot
Disallow: /

You only want to use this code to keep Googlebot from crawling your site at all.

Don’t use this on a whim!

Have a specific reason for making sure you don’t want bots crawling your site at all.

For example, a common issue is wanting to keep your staging site out of the index.

You don’t want Google crawling the staging site and your real site because you are doubling up on your content and creating duplicate content issues as a result.
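
Before deploying a rule like this, it can help to test it. Here is a small sketch using Python’s standard urllib.robotparser module. The example.com URLs are placeholders, the parse_robots helper is a name invented here, and Python’s parser is only an approximation of how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

def parse_robots(text):
    """Build a parser from robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = parse_robots("""
User-agent: Googlebot
Disallow: /
""")

# Googlebot is blocked everywhere; a bot with no matching rule is still allowed.
googlebot_allowed = rules.can_fetch("Googlebot", "https://example.com/any-page")
bingbot_allowed = rules.can_fetch("Bingbot", "https://example.com/any-page")
print("Googlebot allowed:", googlebot_allowed)
print("Bingbot allowed:", bingbot_allowed)
```

Note that the rule only affects the named user-agent, so other crawlers can still reach the staging site unless you add rules for them too.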

Disallowing All Bots From Your Server

If you want to keep all bots from crawling your site at all, the following code is the one you will want to use:

User-agent: *
Disallow: /

This is the code to disallow all bots. Remember our staging site example from above?

Perhaps you want to exclude the staging site from all bots before fully deploying your site to all of them.

Or perhaps you want to keep your site private for a time before launching it to the world.

Either way, this will keep your site hidden from prying eyes.

Keeping Bots From Crawling a Specific Folder

If, for some reason, you want to keep bots from crawling a specific folder, you can do that too.

The following is the code you would use:

User-agent: *
Disallow: /folder-name/

There are many reasons someone would want to exclude bots from a folder. Perhaps you want to ensure that certain content on your site isn’t indexed.

Or maybe that particular folder will cause certain types of duplicate content issues, and you want to exclude it from crawling entirely.

Either way, this will help you do that.
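
As a quick sanity check, the folder rule can be run through Python’s standard urllib.robotparser module. This is a sketch only: the bot name AnyBot and the example.com URLs are placeholders, and real crawlers may interpret edge cases differently from Python’s parser.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /folder-name/",
])

def allowed(path):
    """Return True if a generic bot may fetch the given path."""
    return parser.can_fetch("AnyBot", "https://example.com" + path)

print(allowed("/folder-name/page.html"))  # inside the blocked folder
print(allowed("/folder-name"))            # no trailing slash, so the rule does not match
print(allowed("/other-folder/"))          # unrelated folder
```

The rule is a simple prefix match, which is why the trailing slash matters: the path without it is not covered by the rule.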

Common Mistakes With Robots.txt

There are several mistakes that SEO professionals make with robots.txt. The top common mistakes include:

  • Using both disallow in robots.txt and noindex.
  • Using the forward slash / (all folders down from root), when you really mean a specific URL.
  • Not including the correct path.
  • Not testing your robots.txt file.
  • Not knowing the correct name of the user-agent you want to block.

Using Both Disallow In Robots.txt And Noindex On The Page

Google’s John Mueller has stated you should not be using both disallow in robots.txt and noindex on the page itself.

If you do both, Google cannot crawl the page to see the noindex, so it could potentially still index the page anyway.

This is why you should only use one or the other, and not both.
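
A simple audit script can flag pages where both are in place. The sketch below is an illustration under stated assumptions: has_conflict is a hypothetical helper, the regular expression only catches a robots meta tag whose name attribute comes before content, and the robots.txt content, URL, and HTML are all made-up samples.

```python
import re
from urllib.robotparser import RobotFileParser

# Simplified check for <meta name="robots" content="...noindex...">.
# Assumes the name attribute appears before content; a real audit should use an HTML parser.
NOINDEX_META = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]*content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

def has_conflict(robots_txt, url, page_html, user_agent="Googlebot"):
    """True if the page carries noindex but robots.txt stops the crawler from seeing it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = not parser.can_fetch(user_agent, url)
    noindex = bool(NOINDEX_META.search(page_html))
    return blocked and noindex

# Hypothetical sample data.
robots_txt = "User-agent: *\nDisallow: /private/"
page_html = '<head><meta name="robots" content="noindex, follow"></head>'

print(has_conflict(robots_txt, "https://example.com/private/report.html", page_html))
print(has_conflict(robots_txt, "https://example.com/public/report.html", page_html))
```

The first URL is both disallowed and marked noindex, so the crawler would never get to read the noindex; the second is crawlable, so the noindex can do its job.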

Using The Forward Slash When You Really Mean A Specific URL

The forward slash after Disallow means “from this root folder on down, completely and entirely for eternity.”

Every page on your site will be blocked forever until you change it.

One of the most common issues I find in website audits is that someone accidentally added a forward slash to “Disallow:” and blocked Google from crawling their entire site.
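
The difference a single slash makes is easy to see in a quick test. This sketch uses Python’s standard urllib.robotparser module; is_blocked is a helper name invented here, and the example.com URLs and /old-page.html path are placeholders.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(disallow_line, url):
    """Check url against a one-rule robots.txt that applies to every bot."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", disallow_line])
    return not parser.can_fetch("AnyBot", url)

page = "https://example.com/blog/some-post"

# Compare a bare slash, an empty value, and a specific URL path.
for rule in ("Disallow: /", "Disallow:", "Disallow: /old-page.html"):
    print(f"{rule!r:28} blocks {page}: {is_blocked(rule, page)}")
```

Only the bare slash blocks the blog post. An empty Disallow value blocks nothing, and the specific path blocks only URLs that start with that path.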

Not Including The Correct Path

We understand. Sometimes coding robots.txt can be a tough job.

You couldn’t remember the exact correct path initially, so you went through the file and winged it.

The problem is that these similar paths all result in 404s because they are one character off.

This is why it’s important always to double-check the paths you use on specific URLs.

You don’t want to run the risk of adding a path to robots.txt that doesn’t match any real URL on your site.
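
One way to double-check is to compare each Disallow path against a list of paths you know exist. The following sketch uses Python’s standard difflib module to suggest the closest real path for any rule that matches nothing. The function name and the sample paths are hypothetical; in practice the known paths might come from your sitemap or a crawl export.

```python
import difflib

def check_disallow_paths(disallow_paths, known_paths):
    """Map each Disallow path that matches no known path to its closest known path."""
    problems = {}
    for path in disallow_paths:
        # A rule is fine if at least one real path starts with it.
        if not any(known.startswith(path) for known in known_paths):
            problems[path] = difflib.get_close_matches(path, known_paths, n=1)
    return problems

# Hypothetical paths for illustration.
known = ["/wp-admin/", "/staging/", "/blog/"]
suspect = check_disallow_paths(["/wp-admin/", "/stagin/"], known)
print(suspect)
```

Here the rule for /stagin/ is one character off, so it is reported along with the path it was probably meant to be.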

Not Knowing The Correct Name Of The User-Agent

If you want to block a particular user-agent but you don’t know the name of that user-agent, that’s a problem.

Rather than using the name you think you remember, do some research and figure out the exact name of the user-agent that you need.

If you are trying to block specific bots, then that name becomes extremely important in your efforts.
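
To illustrate why the exact name matters, the sketch below feeds a correctly named rule and a misspelled one through Python’s standard urllib.robotparser module. The misspelling “Ahrefs-Bot” is invented for this example, and Python’s matching logic is only an approximation of how any particular crawler matches its own name.

```python
from urllib.robotparser import RobotFileParser

def blocks_agent(rule_agent, crawler_agent):
    """True if a site-wide Disallow for rule_agent stops crawler_agent."""
    parser = RobotFileParser()
    parser.parse([f"User-agent: {rule_agent}", "Disallow: /"])
    return not parser.can_fetch(crawler_agent, "https://example.com/")

print(blocks_agent("AhrefsBot", "AhrefsBot"))   # correct name
print(blocks_agent("Ahrefs-Bot", "AhrefsBot"))  # invented misspelling
```

With the misspelled name, the rule never applies to the crawler, so the bot keeps crawling even though the file looks like it blocks it.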

Why Else Would You Block Bots And Spiders?

There are other reasons SEO pros would want to block bots from crawling their site.

Perhaps they are deep into gray hat (or black hat) PBNs, and they want to hide their private blog network from prying eyes (especially their competitors).

They can do this by using robots.txt to block common bots that SEO professionals use to assess their competition, such as Semrush and Ahrefs.

If you wanted to block Ahrefs, this is the code to do so:

User-agent: AhrefsBot
Disallow: /

This will block AhrefsBot from crawling your entire site.

If you want to block Semrush, this is the code to do so.

There are also other instructions here.

There are a lot of lines of code to add, so be careful when adding these:

To block SemrushBot from crawling your site for different SEO and technical issues:

User-agent: SiteAuditBot
Disallow: /

To block SemrushBot from crawling your site for Backlink Audit tool:

User-agent: SemrushBot-BA
Disallow: /

To block SemrushBot from crawling your site for On Page SEO Checker tool and similar tools:

User-agent: SemrushBot-SI
Disallow: /

To block SemrushBot from checking URLs on your site for SWA tool:

User-agent: SemrushBot-SWA
Disallow: /

To block SemrushBot from crawling your site for Content Analyzer and Post Tracking tools:

User-agent: SemrushBot-CT
Disallow: /

To block SemrushBot from crawling your site for Brand Monitoring:

User-agent: SemrushBot-BM
Disallow: /

To block SplitSignalBot from crawling your site for SplitSignal tool:

User-agent: SplitSignalBot
Disallow: /

To block SemrushBot-COUB from crawling your site for Content Outline Builder tool:

User-agent: SemrushBot-COUB
Disallow: /
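
With this many near-identical blocks, generating them from a list can reduce copy-and-paste mistakes. Here is a minimal Python sketch; the user-agent names are the ones listed above, while the disallow_block helper is a name made up for this example.

```python
def disallow_block(user_agent, path="/"):
    """Return one robots.txt block that disallows path for user_agent."""
    return f"User-agent: {user_agent}\nDisallow: {path}\n"

# User-agent names taken from the blocks listed above.
SEMRUSH_AGENTS = [
    "SiteAuditBot",
    "SemrushBot-BA",
    "SemrushBot-SI",
    "SemrushBot-SWA",
    "SemrushBot-CT",
    "SemrushBot-BM",
    "SplitSignalBot",
    "SemrushBot-COUB",
]

# Separate the blocks with a blank line, as in a hand-written robots.txt.
robots_txt = "\n".join(disallow_block(agent) for agent in SEMRUSH_AGENTS)
print(robots_txt)
```

The printed output can be pasted into robots.txt, and adding or removing a crawler becomes a one-line change to the list.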

Using Your HTACCESS File To Block Bots

If you are on an Apache web server, you can utilize your site’s .htaccess file to block specific bots.

For example, here is how you would use code in .htaccess to block AhrefsBot.

Please note: be careful with this code.

If you don’t know what you are doing, you could bring down your server.

We only provide this code here for example purposes.

Make sure you do your research and practice on your own before adding it to a production server.

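# Apache 2.2-style access rules (Apache 2.4 needs mod_access_compat for these).
# Allow rules are evaluated first, then any matching Deny rule rejects the request.
Order Allow,Deny
Deny from 51.222.152.133
Deny from 54.36.148.1
# A partial address matches every IP that starts with these three numbers.
Deny from 195.154.122
Allow from all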

For this to work properly, make sure you block all the IP ranges listed in this article on the Ahrefs blog.
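
If you want to confirm which addresses a rule set like the one above would catch, Python’s standard ipaddress module can model it. This is a sketch under one assumption: the partial address 195.154.122 is treated as the 195.154.122.0/24 network, which is how Apache handles a three-number partial IP. The names DENIED_NETWORKS and is_denied are made up for this example.

```python
import ipaddress

# The three "Deny from" entries shown above, written as networks.
# Assumption: the partial address 195.154.122 covers 195.154.122.0/24.
DENIED_NETWORKS = [
    ipaddress.ip_network("51.222.152.133/32"),
    ipaddress.ip_network("54.36.148.1/32"),
    ipaddress.ip_network("195.154.122.0/24"),
]

def is_denied(address):
    """Return True if the address falls inside any denied network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in DENIED_NETWORKS)

for candidate in ("54.36.148.1", "54.36.148.2", "195.154.122.77"):
    print(candidate, "denied" if is_denied(candidate) else "allowed")
```

The second address shows why blocking single IPs is fragile: a crawler that moves to the next address in its range is no longer covered.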

If you want a comprehensive introduction to .htaccess, look no further than this tutorial on Apache.org.

If you need help using your htaccess file to block specific types of bots, you can follow the tutorial here.

Blocking Bots and Spiders Can Require Some Work

But it’s well worth it in the end.

By making sure you block bots and spiders from crawling your site, you don’t fall into the same trap as others.

You can rest easy knowing your site is immune to certain automated processes.

When you can control these particular bots, it makes things that much better for you, the SEO professional.

If you have to, always make sure that you block the required bots and spiders from crawling your site.

This will result in enhanced security, a better overall online reputation, and a much better site that will be there in the years to come.

Featured Image: Roman Samborskyi/Shutterstock
