How & Why To Prevent Bots From Crawling Your Site

For the most part, bots and spiders are relatively harmless.

You want Google’s bot, for example, to crawl and index your website.

However, bots and spiders can sometimes be a problem, generating unwanted traffic.

This kind of unwanted traffic can result in:

  • Obfuscation of where the traffic is coming from.
  • Confusing, hard-to-understand reports.
  • Misattribution in Google Analytics.
  • Increased bandwidth costs.
  • Other nuisances.

There are good bots and bad bots.

Good bots run in the background and rarely interfere with other users or websites.

Bad bots try to break through a website’s security, or they are used as part of a large-scale botnet to deliver DDoS attacks against large organizations (something a single machine could not take down on its own).

Here’s what you should know about bots and how to prevent the bad ones from crawling your site.

What Is A Bot?

Looking at exactly what a bot is can help identify why we need to block it and keep it from crawling our site.

A bot, short for “robot,” is a software application designed to perform a specific task over and over.

For many SEO professionals, utilizing bots goes along with scaling an SEO campaign.

“Scaling” means you automate as much work as possible to get better results faster.

Common Misconceptions About Bots

You may have run into the misconception that all bots are evil and must be banned unequivocally from your site.

But this could not be further from the truth.

Google is a bot.

If you block Google, can you guess what will happen to your search engine rankings?

Some bots can be malicious, designed to create fake content or to pose as legitimate websites to steal your data.

However, bots are not always malicious scripts run by bad actors.

Some can be great tools that help make work easier for SEO professionals, such as automating common repetitive tasks or scraping useful information from search engines.

Some common bots SEO professionals use come from tools like Semrush and Ahrefs.

These bots scrape useful data from the search engines and help SEO pros automate and complete repetitive tasks, making day-to-day SEO work easier.

Why Would You Need to Block Bots From Crawling Your Site?

While there are many good bots, there are also bad bots.

Bad bots can steal your private data or take down an otherwise functioning website.

We want to block any bad bots we can uncover.

It’s not easy to discover every bot that may crawl your site, but with a little digging, you can find the malicious ones you don’t want visiting your site anymore.

So why would you need to block bots from crawling your website?

Some common reasons why you may want to block bots from crawling your site could include:

Protecting Your Valuable Data

Perhaps you found that a plugin is attracting a number of malicious bots who want to steal your valuable consumer data.

Or, you found that a bot took advantage of a security vulnerability to add bad links all over your site.

Or, someone keeps trying to spam your contact form with a bot.

This is where you need to take certain steps to protect your valuable data from getting compromised by a bot.

Bandwidth Overages

If you get an influx of bot traffic, chances are your bandwidth will skyrocket as well, leading to unforeseen overages and charges you would rather not have.

You absolutely want to block the offending bots from crawling your site in these cases.

You don’t want to end up paying thousands of dollars for bandwidth that bots, rather than real visitors, consumed.

What’s bandwidth?

Bandwidth is the transfer of data from your server to the client-side (web browser).

Every time data is sent over a connection, you use bandwidth.

When bots access your site and you waste bandwidth, you could incur overage charges from exceeding your monthly allotted bandwidth.

Your host should have given you detailed information about your bandwidth allotment when you signed up for your hosting package.
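
If you suspect bots are behind a bandwidth spike, one quick sanity check is to total up the bytes served to each user agent in your server’s access log. Below is a minimal sketch in Python that assumes a standard combined-format log at a hypothetical path; adjust the path and parsing to match whatever your host actually provides.

import re
from collections import defaultdict

# Hypothetical location - use the access log path your host gives you.
LOG_PATH = "/var/log/apache2/access.log"

# Combined log format: ... "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(r'"[^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"')

bytes_by_agent = defaultdict(int)

with open(LOG_PATH, encoding="utf-8", errors="replace") as log_file:
    for line in log_file:
        match = LINE_RE.search(line)
        if not match:
            continue
        _status, size, agent = match.groups()
        if size != "-":
            bytes_by_agent[agent] += int(size)

# Show the ten user agents that consumed the most bandwidth.
for agent, total in sorted(bytes_by_agent.items(), key=lambda item: item[1], reverse=True)[:10]:
    print(f"{total / 1_048_576:9.1f} MB  {agent}")

If a crawler you never invited sits near the top of that list, it is a good candidate for blocking.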

Limiting Bad Behavior

If a malicious bot somehow started targeting your site, it would be appropriate to take steps to control this.

For example, you would want to ensure that this bot cannot reach your contact forms, and ideally that it cannot access your site at all.

Do this before the bot can compromise your most critical files.

By ensuring your site is properly locked down and secure, it is possible to block these bots so they don’t cause too much damage.

How To Block Bots From Your Site Effectively

You can use two methods to block bots from your site effectively.

The first is through robots.txt.

This is a plain text file that sits at the root of your web server. Most sites don’t have one by default, so you may have to create one.

Here are a few highly useful robots.txt rules you can use to block most spiders and bots from your site:

Disallow Googlebot From Your Server

If, for some reason, you want to stop Googlebot from crawling your server at all, the following is the code you would use:

User-agent: Googlebot
Disallow: /

You only want to use this code if you genuinely intend to keep Googlebot from crawling your site at all.

Don’t use this on a whim!

Have a specific reason for making sure you don’t want bots crawling your site at all.

For example, a common issue is wanting to keep your staging site out of the index.

You don’t want Google crawling both the staging site and your real site, because doubling up on your content creates duplicate content issues.

Disallowing All Bots From Your Server

If you want to keep all bots from crawling your site at all, the following code is the one you will want to use:

User-agent: *
Disallow: /

This is the code to disallow all bots. Remember our staging site example from above?

Perhaps you want to exclude the staging site from all bots before fully deploying your site to all of them.

Or perhaps you want to keep your site private for a time before launching it to the world.

Either way, this will keep your site hidden from prying eyes.

Keeping Bots From Crawling a Specific Folder

If, for some reason, you want to keep bots from crawling a specific folder, you can do that, too.

The following is the code you would use:

User-agent: *
Disallow: /folder-name/

There are many reasons someone would want to exclude bots from a folder. Perhaps you want to ensure that certain content on your site isn’t indexed.

Or maybe that particular folder will cause certain types of duplicate content issues, and you want to exclude it from crawling entirely.

Either way, this will help you do that.

Common Mistakes With Robots.txt

There are several mistakes that SEO professionals make with robots.txt. The most common mistakes include:

  • Using both disallow in robots.txt and noindex.
  • Using the forward slash / (all folders down from root), when you really mean a specific URL.
  • Not including the correct path.
  • Not testing your robots.txt file.
  • Not knowing the correct name of the user-agent you want to block.

Using Both Disallow In Robots.txt And Noindex On The Page

Google’s John Mueller has stated you should not be using both disallow in robots.txt and noindex on the page itself.

If you do both, Google cannot crawl the page to see the noindex, so it could potentially still index the page anyway.

This is why you should only use one or the other, and not both.
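
If the goal is to keep a page out of Google’s index, leave it crawlable in robots.txt and place the standard robots meta tag in the page’s head instead:

<meta name="robots" content="noindex">

That way Googlebot can crawl the page, see the noindex instruction, and remove the page from the index.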

Using The Forward Slash When You Really Mean A Specific URL

The forward slash after Disallow means “from this root folder on down, completely and entirely for eternity.”

Every page on your site will be blocked forever until you change it.

One of the most common issues I find in website audits is that someone accidentally added a forward slash to “Disallow:” and blocked Google from crawling their entire site.
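
If what you actually mean is a single URL, spell out that path after Disallow instead of the bare slash. For example (the path here is purely hypothetical):

User-agent: *
Disallow: /thank-you.html

This blocks only that one path, whereas a bare Disallow: / blocks every page on the site.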

Not Including The Correct Path

We understand. Sometimes coding robots.txt can be a tough job.

You couldn’t remember the exact path, so you went through the file and winged it.

The problem is that these guessed paths end up one character off from the real ones, so they point to URLs that return 404s.

This is why it’s important to always double-check the paths you use for specific URLs.

You don’t want to run the risk of adding a URL to robots.txt that isn’t going to work.
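
One simple way to double-check is to test your live robots.txt against the exact URLs you care about. Here is a minimal sketch using Python’s built-in urllib.robotparser; the domain, paths, and user-agent names below are placeholders, so swap in your own.

from urllib.robotparser import RobotFileParser

# Placeholder domain and URLs - replace with your own site and paths.
parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

urls_to_check = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/private-folder/report.html",
]

for url in urls_to_check:
    for agent in ("Googlebot", "AhrefsBot", "*"):
        verdict = "allowed" if parser.can_fetch(agent, url) else "BLOCKED"
        print(f"{agent:10} {verdict:8} {url}")

If a path you meant to block shows up as allowed, the rule is probably pointing at the wrong URL.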

Not Knowing The Correct Name Of The User-Agent

If you want to block a particular user-agent but you don’t know the name of that user-agent, that’s a problem.

Rather than using the name you think you remember, do some research and figure out the exact name of the user-agent that you need.

If you are trying to block specific bots, then that name becomes extremely important in your efforts.

Why Else Would You Block Bots And Spiders?

There are other reasons SEO pros would want to block bots from crawling their site.

Perhaps they are deep into gray hat (or black hat) PBNs, and they want to hide their private blog network from prying eyes (especially their competitors).

They can do this by utilizing robots.txt to block common bots that SEO professionals use to assess their competition.

For example, Semrush and Ahrefs.

If you wanted to block Ahrefs, this is the code to do so:

User-agent: AhrefsBot
Disallow: /

This will block AhrefsBot from crawling your entire site.

If you want to block Semrush, the code to do so is below; Semrush also publishes its own blocking instructions.

There are a lot of lines of code to add, because Semrush uses a separate user-agent for each of its tools, so be careful when adding these:

To block SemrushBot from crawling your site for different SEO and technical issues:

User-agent: SiteAuditBot
Disallow: /

To block SemrushBot from crawling your site for Backlink Audit tool:

User-agent: SemrushBot-BA
Disallow: /

To block SemrushBot from crawling your site for On Page SEO Checker tool and similar tools:

User-agent: SemrushBot-SI
Disallow: /

To block SemrushBot from checking URLs on your site for SWA tool:

User-agent: SemrushBot-SWA
Disallow: /

To block SemrushBot from crawling your site for Content Analyzer and Post Tracking tools:

User-agent: SemrushBot-CT
Disallow: /

To block SemrushBot from crawling your site for Brand Monitoring:

User-agent: SemrushBot-BM
Disallow: /

To block SplitSignalBot from crawling your site for SplitSignal tool:

User-agent: SplitSignalBot
Disallow: /

To block SemrushBot-COUB from crawling your site for Content Outline Builder tool:

User-agent: SemrushBot-COUB
Disallow: /
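
If you want to block several of these crawlers at once, the user-agent groups simply stack in the same robots.txt file, one block after another. For example, using three of the user-agents above:

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot-BA
Disallow: /

User-agent: SiteAuditBot
Disallow: /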

Using Your .htaccess File To Block Bots

If you are on an Apache web server, you can use your site’s .htaccess file to block specific bots.

For example, here is how you could use your .htaccess file to block AhrefsBot by denying the IP addresses it crawls from.

Please note: be careful with this code.

If you don’t know what you are doing, you could bring down your server.

We only provide this code here for example purposes.

Make sure you do your research and practice on your own before adding it to a production server.

Order Allow,Deny
Deny from 51.222.152.133
Deny from 54.36.148.1
Deny from 195.154.122
Allow from all

For this to work properly, make sure you block all of the IP ranges Ahrefs lists on its blog.
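
If you would rather block by user-agent string than by IP (so you don’t have to keep up with IP range changes), a common Apache 2.2-style pattern uses mod_setenvif. This is only a sketch, so test it somewhere safe before using it in production:

SetEnvIfNoCase User-Agent "AhrefsBot" block_bot
Order Allow,Deny
Allow from all
Deny from env=block_bot

On Apache 2.4 and later, the equivalent is expressed with Require directives instead of Order/Allow/Deny.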

If you want a comprehensive introduction to .htaccess, look no further than the tutorial on Apache.org.

If you need help using your .htaccess file to block specific types of bots, there are detailed tutorials available for that as well.

Blocking Bots and Spiders Can Require Some Work

But it’s well worth it in the end.

By making sure you block bad bots and spiders from crawling your site, you avoid the traps that others fall into.

You can rest easier knowing your site is protected against a lot of unwanted automated activity.

When you can control these particular bots, it makes things that much better for you, the SEO professional.

When you have to, make sure you block the offending bots and spiders from crawling your site.

This will result in enhanced security, a better overall online reputation, and a much better site that will be there in the years to come.



How We Used a Video Course to Promote Ahrefs (And Got 500K+ Views)


Creating and selling educational courses can be a lucrative business. But if you already have a product to sell, you can actually use courses as a marketing tool.

Back in 2017, about two years after joining Ahrefs, I decided to create a course on content marketing.

I had a very clear understanding of how an educational course would help me promote Ahrefs.

  • People like courses – Folks like Brian Dean and Glen Allsopp were selling theirs for $500 to $2,000 a pop (and rather successfully). So a free course of comparable quality was sure to get attention.
  • Courses allow for a deeper connection – You would basically be spending a few hours one on one with your students. And if you managed to win their trust, you’d get an opportunity to promote your product to them.

That was my raw thought process going into this venture.

And I absolutely didn’t expect that the lifespan of my course would be as interesting and nuanced as it turned out to be.

The lessons of my course have generated over 500K total views, brought in mid-five figures in revenue (without even trying), and turned out to be a very helpful resource for our various marketing purposes.

So here goes the story of my “Blogging for Business” course.

1. The creation

I won’t give you any tips on how to create a successful course (well, maybe just one). There are plenty of resources (courses?) on that topic already.

All I want to say is that my own experience was quite grueling.

The 10 lessons of my course span some 40K words. I have never attempted the feat of writing a book, but I imagine creating such a lengthy course is as close as it gets.

Scripts of the course in Google Docs.

I spent a tremendous amount of time polishing each lesson. The course was going to be free, so it was critical that my content was riveting. If not, people would just bounce from it.

Paid courses are quite different in that sense. You pay money to watch them. So even if the content is boring at times, you’ll persevere anyway to ensure a return on your investment.

When I showed the draft version of the course to my friend, Ali Mese, he gave me a simple yet invaluable tip: “Break your lessons into smaller ones. Make each just three to four minutes long.”

How did I not think of this myself? 

Short, “snackable” lessons provide a better sense of completion and progress. You’re also more likely to finish a short lesson without getting distracted by something. 

I’m pretty sure that it is because of this simple tip that my course landed a Netflix comparison (i.e., best compliment ever).

2. The strategy

With the prices of similar courses ranging from $500 to $2,000, it was really tempting to make some profit with ours.

I think we had around 15,000 paying customers at Ahrefs at that time (and many more on the free plan). So if just 1% of them bought that course for $1K, that would be an easy $150K to pocket. And then we could keep upselling it to our future customers.

Alternatively, we thought about giving access to the course to our paying customers only. 

This might have boosted our sales, since the course was a cool addition to the Ahrefs subscription. 

And it could also improve user retention. The course was a great training resource for new employees, which our customers would lose access to if they canceled their Ahrefs subscription.

And yet, releasing it for free as a lead acquisition and lead nurturing play seemed to make a lot more sense than the other two options. So we stuck to that.

3. The waitlist

Teasing something to people before you let them get it seems like one of the fundamental rules of marketing.

  • Apple announces new products way before they’re available in stores. 
  • Movie studios publish trailers of upcoming movies months (sometimes years) before they hit the theaters. 
  • When you have a surprise for your significant other (or your kids), you can’t help but give them some hints before the reveal.

There’s something about “the wait” and the anticipation that we humans just love to experience.

So while I was toiling away and putting lessons of my course together, we launched a landing page to announce it and collect people’s emails.

The landing page of the course.

In case someone hesitated to leave their email, we had two cool bonuses to nudge them:

  1. Access to the private Slack community
  2. Free two-week trial of Ahrefs

The latter appealed to freebie lovers so much that it soon “leaked” to Reddit and BlackHatWorld. In hindsight, this leak was actually a nice (unplanned) promo for the course.

4. The promotion

I don’t remember our exact promotion strategy. But I’m pretty sure it went something like this:

I also added a little “sharing loop” to the welcome email. I asked people to tell their friends about the course, justifying it with the fact that taking the course with others was more fun than doing it alone.

Welcome email with a "sharing loop."

I have no idea how effective that “growth hack” was, but there was no reason not to encourage sharing.

In total, we managed to get some 16,000 people on our waitlist by the day of the course launch.

5. The launch

On a set date, the following email went out to our waitlist:

Course launch email.

Did you notice the “note” saying that the videos were only available for free for 30 days? We did that to nudge people to watch them as soon as possible and not save them to the “Watch later” folder.

In retrospect, I wish we had used this angle from the very beginning: “FREE for 30 days. Then $799.”

This would’ve killed two birds with one stone: 

  1. Added an urgency to complete the course as soon as possible
  2. Made the course more desirable by assigning a specific (and rather high) monetary value to it

(If only we could be as smart about predicting the future as we are about reflecting on the past.) 

Once it was live, the course started to promote itself, and I was seeing many super flattering tweets.

We then took the most prominent of those tweets and featured them on the course landing page for some social proof. (They’re still there, by the way.)

6. The paywall

Once the 30 days of free access ran out, we added a $799 paywall. And it didn’t take long for the first sale to arrive.

This early luck didn’t push us to focus on selling this course, though. We didn’t invest any effort into promoting it. It was just sitting passively in our Academy with a $799 price tag, and that was it.

And yet, despite the lack of promotion, that course was generating 8-10 sales every month—which were mostly coming from word of mouth.

A comment in TrafficThinkTank.
Eric Siu giving a shout-out about my course in TTT Slack.

Thanks to its hefty price, my course soon appeared on some popular websites with pirated courses. And we were actually glad that it did. Because that meant more people would learn about our content and product.

Then some people who were “late to the party” started asking me if I was ever going to reopen the course for free again. This actually seemed like a perfectly reasonable strategy at the time.

7. The giveaways

That $799 price tag also turned my free course into a pretty useful marketing tool. It was a perfect gift for all sorts of giveaways on Twitter, on podcasts, during live talks, and so on.

Me giving away the course during a live talk.

And whenever we partnered with someone, they were super happy to get a few licenses of the course, which they could give out to their audience.

8. The relaunch

Despite my original plan to update and relaunch this course once a year, I got buried under other work and didn’t manage to find time for it.

And then the pandemic hit. 

That’s when we noticed a cool trend. Many companies were providing free access to their premium educational materials. This was done to support the “stay at home” narrative and help people learn new skills.

I think it was SQ who suggested that we should jump on that train with my “Blogging for Business” course. And so we did.

We couldn’t have hoped for better timing for that relaunch. The buzz was absolutely insane. The announcement tweet alone generated a staggering 278K+ impressions (not without some paid boosts, of course).

The statistics of the course announcement tweet.

We also went ahead and reposted that course on ProductHunt once again (because why not?).

All in all, that relaunch turned out to be even more successful than the original launch itself. 

In the course of their lifespan on Wistia, the 40 video lessons of my course generated a total of 372K plays.

Play count from Wistia.

And this isn’t even the end of it.

9. The launch on YouTube

Because the course was now free, it no longer made sense to host it at Wistia. So we uploaded all lessons to YouTube and made them public.

To date, the 41 videos of my course have generated about 187K views on YouTube.

"Blogging for Business" course playlist.

It’s fair to mention that we had around 200,000 subscribers on our channel at the time of publishing my course there. A brand-new channel with no existing subscribers will likely generate fewer views.

10. The relaunch on YouTube [coming soon]

Here’s an interesting observation that both Sam and I made at around the same time. 

Many people were publishing their courses on YouTube as a single video spanning a few hours rather than cutting them into individual lessons like we did. And those long videos were generating millions of views!

Like these two, ranking at the top for “learn Python course,” which have 33M and 27M views, respectively:

"Learn python course" search on YouTube.

So we decided to run a test with Sam’s “SEO for Beginners” course. It was originally published on YouTube as 14 standalone video lessons and generated a total of 140K views.

Well, the “single video” version of that same course has blown it out of the water with over 1M views as of today.

I’m sure you can already tell where I’m going with this.

We’re soon going to republish my “Blogging for Business” course on YouTube as a single video. And hopefully, it will perform just as well.

The end

So that’s the story of my “Blogging for Business” course. From the very beginning, it was planned as a promotional tool for Ahrefs. And judging by its performance, I guess it fulfilled its purpose rather successfully.

A screenshot of a Slack message.

Don’t get me wrong, though. 

The fact that my course was conceived as a promotional tool doesn’t mean that I didn’t pour my heart and soul into it. It was a perfectly genuine and honest attempt to create a super useful educational resource for content marketing newbies.

And I’m still hoping to work on the 2.0 version of it someday. In the past four years, I have accrued quite a bit more content marketing knowledge that I’m keen to share with everyone. So follow me on Twitter, and stay tuned.


