How & Why To Prevent Bots From Crawling Your Site

For the most part, bots and spiders are relatively harmless.

You want Google’s bot, for example, to crawl and index your website.

However, bots and spiders can sometimes be a problem and provide unwanted traffic.

This kind of unwanted traffic can result in:

  • Obfuscation of where the traffic is coming from.
  • Confusing, hard-to-understand reports.
  • Misattribution in Google Analytics.
  • Increased bandwidth costs.
  • Other nuisances.

There are good bots and bad bots.

Good bots run in the background and rarely, if ever, interfere with other users or websites.

Bad bots break a website’s security or are recruited into large-scale botnets to deliver DDoS attacks against large organizations (something a single machine cannot take down).

Here’s what you should know about bots and how to prevent the bad ones from crawling your site.

What Is A Bot?

Looking at exactly what a bot is can help identify why we need to block it and keep it from crawling our site.

A bot, short for “robot,” is a software application designed to perform a specific task repeatedly.

For many SEO professionals, utilizing bots goes along with scaling an SEO campaign.

“Scaling” means you automate as much work as possible to get better results faster.

Common Misconceptions About Bots

You may have run into the misconception that all bots are evil and must be banned unequivocally from your site.

But this could not be further from the truth.

Google is a bot.

If you block Google, can you guess what will happen to your search engine rankings?

Some bots can be malicious, designed to create fake content or pose as legitimate websites to steal your data.

However, bots are not always malicious scripts run by bad actors.

Some can be great tools that help make work easier for SEO professionals, such as automating common repetitive tasks or scraping useful information from search engines.

Some common bots SEO professionals use are Semrush and Ahrefs.

These bots scrape useful data from the search engines and help SEO pros automate and complete routine tasks, making day-to-day SEO work easier.

Why Would You Need to Block Bots From Crawling Your Site?

While there are many good bots, there are also bad bots.

Bad bots can steal your private data or take down an otherwise functioning website.

We want to block any bad bots we can uncover.

It’s not easy to discover every bot that may crawl your site, but with a little digging, you can find the malicious ones you don’t want visiting your site anymore.

So why would you need to block bots from crawling your website?

Some common reasons why you may want to block bots from crawling your site include:

Protecting Your Valuable Data

Perhaps you found that a plugin is attracting a number of malicious bots that want to steal your valuable consumer data.

Or, you found that a bot took advantage of a security vulnerability to add bad links all over your site.

Or, someone keeps trying to spam your contact form with a bot.

This is where you need to take certain steps to protect your valuable data from getting compromised by a bot.

Bandwidth Overages

If you get an influx of bot traffic, chances are your bandwidth will skyrocket as well, leading to unforeseen overages and charges you would rather not have.

You absolutely want to block the offending bots from crawling your site in these cases.

You don’t want a situation where you’re paying thousands of dollars for bandwidth consumed by bots rather than real visitors.

What’s bandwidth?

Bandwidth is the transfer of data from your server to the client-side (web browser).

Every time data is sent over a connection, you use bandwidth.

When bots access your site and you waste bandwidth, you could incur overage charges from exceeding your monthly allotted bandwidth.
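A quick back-of-the-napkin illustration (with hypothetical numbers): if a scraper bot requests a 2 MB page 100,000 times in a month, that is roughly 200 GB of transfer from bot traffic alone, which can eat up a smaller hosting plan’s entire monthly allotment.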

Your host should have given you detailed information about your bandwidth allotment when you signed up for your hosting package.

Limiting Bad Behavior

If a malicious bot somehow started targeting your site, it would be appropriate to take steps to control this.

For example, you would want to ensure that the bot cannot access your contact forms, and ideally that it cannot access your site at all.

Do this before the bot can compromise your most critical files.

By ensuring your site is properly locked down and secure, it is possible to block these bots so they don’t cause too much damage.

How To Block Bots From Your Site Effectively

You can use two methods to block bots from your site effectively.

The first is through robots.txt.

This is a plain text file that sits at the root of your web server. You may not have one by default, in which case you would have to create one.

These are a few highly useful robots.txt codes that you can use to block most spiders and bots from your site:

Disallow Googlebot From Your Server

If, for some reason, you want to stop Googlebot from crawling your server at all, this is the code you would use:

User-agent: Googlebot
Disallow: /

You only want to use this code if you are sure you want to keep your site from being crawled and indexed at all.

Don’t use this on a whim!

Have a specific reason for wanting to keep bots from crawling your site at all.

For example, a common issue is wanting to keep your staging site out of the index.

You don’t want Google crawling both the staging site and your live site, because that doubles up your content and creates duplicate content issues.

Disallowing All Bots From Your Server

If you want to keep all bots from crawling your site, this is the code you will want to use:

User-agent: *
Disallow: /

This is the code to disallow all bots. Remember our staging site example from above?

Perhaps you want to exclude the staging site from all bots before you fully deploy your site.

Or perhaps you want to keep your site private for a time before launching it to the world.

Either way, this will keep your site hidden from well-behaved crawlers (keep in mind that robots.txt is only a request, so a truly private site should also be password-protected).

Keeping Bots From Crawling a Specific Folder

If, for some reason, you want to keep bots from crawling a specific folder, you can do that too.

The following is the code you would use:

User-agent: *
Disallow: /folder-name/

There are many reasons someone would want to exclude bots from a folder. Perhaps you want to ensure that certain content on your site isn’t indexed.

Or maybe that particular folder will cause certain types of duplicate content issues, and you want to exclude it from crawling entirely.

Either way, this will help you do that.
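If you need an exception, most major crawlers (including Googlebot) also support an Allow directive, so you can open up a single path inside an otherwise blocked folder. A minimal sketch, using hypothetical folder and file names:

User-agent: *
Disallow: /folder-name/
Allow: /folder-name/public-page.html

Here, everything under /folder-name/ stays blocked except the one page you explicitly allow.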

Common Mistakes With Robots.txt

There are several mistakes that SEO professionals make with robots.txt. The most common mistakes include:

  • Using both disallow in robots.txt and noindex.
  • Using the forward slash / (all folders down from root), when you really mean a specific URL.
  • Not including the correct path.
  • Not testing your robots.txt file.
  • Not knowing the correct name of the user-agent you want to block.

Using Both Disallow In Robots.txt And Noindex On The Page

Google’s John Mueller has stated you should not be using both disallow in robots.txt and noindex on the page itself.

If you do both, Google cannot crawl the page to see the noindex, so it could potentially still index the page anyway.

This is why you should only use one or the other, and not both.
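If keeping the page out of the index is the goal, the usual approach is to leave the page crawlable in robots.txt and add a robots meta tag with noindex to the page itself, so Google can actually see the directive. A minimal sketch of that tag in the page’s <head>:

<meta name="robots" content="noindex">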

Using The Forward Slash When You Really Mean A Specific URL

The forward slash after Disallow means “from this root folder on down, completely and entirely for eternity.”

Every page on your site will be blocked forever until you change it.

One of the most common issues I find in website audits is that someone accidentally added a forward slash to “Disallow:” and blocked Google from crawling their entire site.
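If you only meant to block a single page, spell out its path instead of the bare slash. A quick sketch, using a hypothetical URL path:

User-agent: *
Disallow: /old-landing-page.html

This blocks only URLs that begin with /old-landing-page.html and leaves the rest of the site crawlable.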

Not Including The Correct Path

We understand. Sometimes coding robots.txt can be a tough job.

You couldn’t remember the exact path, so you went through the file and winged it.

The problem is that these near-miss paths don’t match the real URLs, so the rules silently fail because they are one character off.

This is why it’s important always to double-check the paths you use on specific URLs.

You don’t want to risk adding rules to robots.txt that won’t actually match the URLs you intend to block.
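As a hypothetical illustration of how easy this is to get wrong, imagine the real folder is /blog/ but the rule was written from memory:

User-agent: *
Disallow: /blogs/

That extra character means the rule never matches the real /blog/ URLs, so nothing you intended to block is actually blocked.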

Not Knowing The Correct Name Of The User-Agent

If you want to block a particular user-agent but you don’t know the name of that user-agent, that’s a problem.

Rather than using the name you think you remember, do some research and figure out the exact name of the user-agent that you need.

If you are trying to block specific bots, then that name becomes extremely important in your efforts.
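For example, a guessed name quietly does nothing, while the crawler’s documented token works (using Ahrefs, covered below, as the example):

User-agent: Ahref
Disallow: /

User-agent: AhrefsBot
Disallow: /

The first record is likely ignored because no crawler announces itself as “Ahref”; the second uses AhrefsBot, the token the crawler actually looks for.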

Why Else Would You Block Bots And Spiders?

There are other reasons SEO pros would want to block bots from crawling their site.

Perhaps they are deep into gray hat (or black hat) PBNs, and they want to hide their private blog network from prying eyes (especially their competitors).

They can do this by utilizing robots.txt to block common bots that SEO professionals use to assess their competition.

For example, Semrush and Ahrefs.

If you wanted to block Ahrefs, this is the code to do so:

User-agent: AhrefsBot
Disallow: /

This will block AhrefsBot from crawling your entire site.

If you want to block Semrush, this is the code to do so.

Semrush also publishes additional instructions for its other crawlers.

There are a lot of lines of code to add, so be careful when adding these:

To block SemrushBot from crawling your site for different SEO and technical issues:

User-agent: SiteAuditBot
Disallow: /

To block SemrushBot from crawling your site for Backlink Audit tool:

User-agent: SemrushBot-BA
Disallow: /

To block SemrushBot from crawling your site for On Page SEO Checker tool and similar tools:

User-agent: SemrushBot-SI
Disallow: /

To block SemrushBot from checking URLs on your site for SWA tool:

User-agent: SemrushBot-SWA
Disallow: /

To block SemrushBot from crawling your site for Content Analyzer and Post Tracking tools:

User-agent: SemrushBot-CT
Disallow: /

To block SemrushBot from crawling your site for Brand Monitoring:

User-agent: SemrushBot-BM
Disallow: /

To block SplitSignalBot from crawling your site for SplitSignal tool:

User-agent: SplitSignalBot
Disallow: /

To block SemrushBot-COUB from crawling your site for Content Outline Builder tool:

User-agent: SemrushBot-COUB
Disallow: /

Using Your HTACCESS File To Block Bots

If you are on an Apache web server, you can use your site’s .htaccess file to block specific bots.

For example, here is how you would use code in .htaccess to block AhrefsBot.

Please note: be careful with this code.

If you don’t know what you are doing, you could bring down your server.

We only provide this code here for example purposes.

Make sure you do your research and practice on your own before adding it to a production server.

Order Allow,Deny
Deny from 51.222.152.133
Deny from 54.36.148.1
Deny from 195.154.122
Allow from all

For this to work properly, make sure you block all the IP ranges listed in this article on the Ahrefs blog.
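IP lists change over time, so as an alternative sketch (not taken from the Ahrefs article), you can match on the bot’s user-agent string instead using Apache’s mod_setenvif, in the same older Allow/Deny style as the block above. As before, treat this as an example only and test it on a staging server first:

SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot

On newer Apache 2.4 setups, the equivalent is usually written with Require directives instead of Order/Allow/Deny.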

If you want a comprehensive introduction to .htaccess, look no further than this tutorial on Apache.org.

If you need help using your htaccess file to block specific types of bots, you can follow the tutorial here.

Blocking Bots and Spiders Can Require Some Work

But it’s well worth it in the end.

By making sure you block bad bots and spiders from crawling your site, you avoid the traps that other site owners fall into.

You can rest a little easier knowing your site is protected against certain kinds of automated abuse.

When you can control these particular bots, it makes things that much better for you, the SEO professional.

If you need to, always make sure you block the offending bots and spiders from crawling your site.

This will result in enhanced security, a better overall online reputation, and a much better site that will be there in the years to come.



Featured Image: Roman Samborskyi/Shutterstock


Twitter Will Share Ad Revenue With Twitter Blue Verified Creators


Elon Musk, owner and CEO of Twitter, announced that starting today, Twitter will share ad revenue with creators. The new policy applies only to ads that appear in a creator’s reply threads.

The move comes on the heels of YouTube launching ad revenue sharing for creators through the YouTube Partner Program in a bid to become the most rewarding social platform for creators.

Social networks like Instagram, TikTok, and Snapchat have similar monetization options for creators who publish reels and video content. For example, Instagram’s Reels Play Bonus Program offers eligible creators up to $1,200 for Reel views.

The catch? Unlike other social platforms, creators on Twitter must have an active subscription to Twitter Blue and meet the eligibility requirements for the Blue Verified checkmark.

The following is an example of a Twitter ad in a reply thread (Promoted by @ASUBootcamps). It should generate revenue for the Twitter Blue Verified creator (@rowancheung), who created the thread.

Screenshot from Twitter, January 2023

To receive the ad revenue share, creators would have to pay $8 per month (or more) to maintain an active Twitter Blue subscription. Twitter Blue pricing varies based on location and is available in the United States, Canada, Australia, New Zealand, Japan, the United Kingdom, Saudi Arabia, France, Germany, Italy, Portugal, and Spain.

Eligibility for the Twitter Blue Verified checkmark includes having an active Twitter Blue subscription and meeting the following criteria.

  • Your account must have a display name, profile photo, and confirmed phone number.
  • Your account has to be older than 90 days and active within the last 30 days.
  • Recent changes to your account’s username, display name, or profile photo can affect eligibility. Modifications to those after verification can also result in a temporary loss of the blue checkmark until Twitter reviews your updated information.
  • Your account cannot appear to mislead or deceive.
  • Your account cannot spam or otherwise try to manipulate the platform for engagement or follows.

Did you receive a Blue Verified checkmark before the Twitter Blue subscription? That will not help creators who want a share of the ad revenue. The legacy Blue Verified checkmark does not make a creator account eligible for ad revenue sharing.

When asked about accounts with a legacy and Twitter Blue Verified checkmark, Musk tweeted that the legacy Blue Verified is “deeply corrupted” and will sunset in just a few months.

Regardless of how you gained your checkmark, it’s important to note that Twitter can remove a checkmark without notice.

In addition to ad revenue sharing for Twitter Blue Verified creators, Twitter Dev announced that the Twitter API would no longer be free in an ongoing effort to reduce the number of bots on the platform.

While speculation looms about a loss in Twitter ad revenue, the Wall Street Journal reported a “fire-sale” Super Bowl offer from Musk to win back advertisers.

The latest data from DataReportal shows a positive trend for Twitter advertisers. Ad reach has increased from 436.4 million users in January 2022 to 556 million in January 2023.

Twitter is also the third most popular social network based on monthly unique visitors and page views globally, according to SimilarWeb data through December 2022.


Featured Image: Ascannio/Shutterstock



AI Content Detection Software: Can They Detect ChatGPT?


We live in an age when AI technologies are booming, and the world has been taken by storm with the introduction of ChatGPT.

ChatGPT is capable of accomplishing a wide range of tasks, but one that it does particularly well is writing articles. And while there are many obvious benefits to this, it also presents a number of challenges.

In my opinion, the biggest hurdle that AI-generated written content poses for the publishing industry is the spread of misinformation.

ChatGPT, or any other AI tool, may generate articles that contain factual errors or are just flat-out incorrect.

Imagine someone who has no expertise in medicine starting a medical blog and using ChatGPT to write content for their articles.

Their content may contain errors that can only be identified by professional doctors. And if that blog content starts spreading over social media, or maybe even ranks in Search, it could cause harm to people who read it and take erroneous medical advice.

Another potential challenge ChatGPT poses is how students might leverage it within their written work.

If one can write an essay just by running a prompt (and without having to do any actual work), that greatly diminishes the quality of education – as learning about a subject and expressing your own ideas is key to essay writing.

Even before the introduction of ChatGPT, many publishers were already generating content using AI. And while some honestly disclose it, others may not.

Also, Google recently changed its wording regarding AI-generated content, so that it is not necessarily against the company’s guidelines.

Image from Twitter, November 2022

This is why I decided to try out existing tools to understand where the tech industry is when it comes to detecting content generated by ChatGPT, or AI generally.

I ran the following prompts in ChatGPT to generate written content and then ran those answers through different detection tools.

  • “What is local SEO? Why it is important? Best practices of Local SEO.”
  • “Write an essay about Napoleon Bonaparte invasion of Egypt.”
  • “What are the main differences between iPhone and Samsung galaxy?”

Here is how each tool performed.

1. Writer.com

For the first prompt’s answer, Writer.com fails, identifying ChatGPT’s content as 94% human-generated.

Writer.com results (Screenshot from writer.com, January 2023)

For the second prompt, it worked and detected it as AI-written content.

Writer.com test result (Screenshot from writer.com, January 2023)

For the third prompt, it failed again.

Sample result (Screenshot from writer.com, January 2023)

However, when I tested real human-written text, Writer.com did identify it as 100% human-generated very accurately.

2. Copyleaks

Copyleaks did a great job in detecting all three prompts as AI-written.

Sample result (Screenshot from Copyleaks, January 2023)

3. Contentatscale.ai

Contentatscale.ai did a great job in detecting all three prompts as AI-written, even though it gave the first prompt a 21% human score.

Contentatscale.ai results (Screenshot from Contentatscale.ai, January 2023)

4. Originality.ai

Originality.ai did a great job on all three prompts, accurately detecting them as AI-written.

Also, when I checked with real human-written text, it did identify it as 100% human-generated, which is essential.

Originality.ai results (Screenshot from Originality.ai, January 2023)

You will notice that Originality.ai doesn’t detect any plagiarism issues. This may change in the future.

Over time, people will use the same prompts to generate AI-written content, likely resulting in a number of very similar answers. When these articles are published, they will then be detected by plagiarism tools.

5. GPTZero

This non-commercial tool was built by Edward Tian, and specifically designed to detect ChatGPT-generated articles. And it did just that for all three prompts, recognizing them as AI-generated.

GPTZero results (Screenshot from GPTZero, January 2023)

Unlike other tools, it gives a more detailed analysis of detected issues, such as sentence-by-sentence analyses.

Sentence-by-sentence text perplexity (Screenshot from GPTZero, January 2023)

6. OpenAI’s AI Text Classifier

And finally, let’s see how OpenAI detects its own generated answers.

For the first and third prompts, it detected that AI was involved, classifying the text as “possibly AI-generated.”

AI Text Classifier. Likely AI-generated

But surprisingly, it failed for the second prompt and classified it as “unlikely AI-generated.” I played with different prompts and found that, at the moment, a few of the above tools detect AI content with higher accuracy than OpenAI’s own tool.

AI Text Classifier. Unlikely AI-generated

At the time of this check, OpenAI had released the classifier only a day earlier. I expect they will fine-tune it in the future, and it will work much better.

Conclusion

Current AI content detection tools are in good shape and are able to detect ChatGPT-generated content (with varying degrees of success).

It is still possible for someone to generate copy via ChatGPT and then paraphrase that to make it undetectable, but that might require almost as much work as writing from scratch – so the benefits aren’t as immediate.

If you think about ranking an article in Google written by ChatGPT, consider for a moment: If the tools we looked at above were able to recognize them as AI-generated, then for Google, detecting them should be a piece of cake.

On top of that, Google has quality raters who will train their system to recognize AI-written articles even better by manually marking them as they find them.

So, my advice would be not to build your content strategy on ChatGPT-generated content, but use it merely as an assistant tool.



Featured Image: /Shutterstock




5 Things You Need To Know About Optimizing Content in 2023

30-second summary:

  • As the content battleground goes through tremendous upheaval, SEO insights will continue to grow in importance
  • ChatGPT can help content marketers get an edge over their competition by efficiently creating and editing high-quality content
  • Making sure your content ranks high enough to engage the target audience requires strategic planning and implementation

Google is constantly testing and updating its algorithms in pursuit of the best possible searcher experience. As the search giant explains in its ‘How Search Works’ documentation, that means understanding the intent behind the query and bringing back results that are relevant, high-quality, and accessible for consumers.

As if the constantly shifting search landscape weren’t difficult enough to navigate, content marketers are also contending with an increasingly technology-charged environment. Competitors are upping the stakes with tools and platforms that generate smarter, real-time insights and even optimize and personalize content on the fly based on audience behavior, location, and other data points.

Set-it-and-forget-it content optimization is a thing of the past. Here’s what you need to know to help your content get found, engage your target audience, and convert searchers to customers in 2023.

AI automation is going to be integral to content optimization

Image: Technologies B2B organizations use to optimize content

As the content battleground heats up, SEO insights will continue to grow in importance as a key source of intelligence. We’re optimizing content for humans, not search engines, after all – we had better have a solid understanding of what those people need and want.

While I do not advocate automation for full content creation, I believe that next year, as resources become stretched, automation will have a bigger impact on helping optimize existing content.

ChatGPT

ChatGPT, developed by OpenAI, is a powerful language generation model that leverages the Generative Pre-trained Transformer (GPT) architecture to produce realistic, human-like text. With ChatGPT’s wide range of capabilities – from completing sentences and answering questions to generating content ideas or powering research initiatives – it can be an invaluable asset for any natural language processing project.

Image: ChatGPT for content

The introduction of ChatGPT has caused considerable debate and an explosion of content on the web. With ChatGPT, content marketers can gain an extra edge over their competition by efficiently creating and editing high-quality content. It offers assistance with generating titles for blog posts, summaries of topics or articles, and even comprehensive campaigns targeting a specific audience.

However, it is important to remember that this technology should be used to enhance human creativity rather than completely replacing it.

For many years now AI-powered technology has been helping content marketers and SEOs automate repetitive tasks such as data analysis, scanning for technical issues, and reporting, but that’s just the tip of the iceberg. AI also enables real-time analysis of a greater volume of consumer touchpoints and behavioral data points for smarter, more precise predictive analysis, opportunity forecasting, real-time content recommendations, and more.

With so much data in play and recession concerns already impacting 2023 budgets in many organizations, content marketers will have to do more with less this coming year. You’ll need to carefully balance human creative resources with AI assists where they make sense to stay flexible, agile, and ready to respond to the market.

It’s time to look at your body of content as a whole

Google’s Helpful Content update, which rolled out in August, is a sitewide signal targeting a high proportion of thin, unhelpful, low-quality content. That means the exceptional content on your site won’t rank to its greatest potential if it’s lost in a sea of mediocre, outdated assets.

It might be time for a content reboot – but don’t get carried away. Before you start unpublishing and redirecting blog posts, lean on technology for automated site auditing and see what you can fix up first. AI-assisted technology can help sniff out on-page elements, including page titles and H1 tags, as well as technical factors like page speed, redirects, and 404 errors that can support your content refreshing strategy.

Focus on your highest trafficked and most visible pages first, i.e.: those linked from the homepage or main menu. Google’s John Mueller confirmed recently that if the important pages on your website are low quality, it’s bad news for the entire site. There’s no percentage by which this is measured, he said, urging content marketers and SEOs to instead think of what the average user would think when they visit your website.

Take advantage of location-based content optimization opportunities

Consumers crave personalized experiences, and location is your low-hanging fruit. Seasonal weather trends, local events, and holidays all impact your search traffic in various ways and present opportunities for location-based optimization.

AI-assisted technology can help you discover these opportunities and evaluate topical keywords at scale so you can plan content campaigns and promotions that tap into this increased demand when it’s happening.

Make the best possible use of content created for locally relevant campaigns by repurposing and promoting it across your website, local landing pages, social media profiles, and Google Business Profiles for each location. Google Posts, for example, are a fantastic and underutilized tool for enhancing your content’s visibility and interactivity right on the search results page.

Optimize content with conversational & high-volume keywords

Look for conversational and trending terms in your keyword research, too. Top-of-funnel keywords that help generate awareness of the topic and spur conversations in social channels offer great opportunities for promotion. Use hashtags organically and target them in paid content promotion campaigns to dramatically expand your audience.

Conversational keywords are a good opportunity for enhancing that content’s visibility in search, too. Check out the ‘People Also Ask’ results and other featured snippets available on the search results page (SERP) for your keyword terms. Incorporate questions and answers in your content to naturally optimize for these and voice search queries.

Image: SEO and creating content in 2023

It’s important that you utilize SEO insights and real-time data correctly; you don’t want to be targeting what was trending last month and is already over. AI is a great assist here, as well, as an intelligent tool can be scanning and analyzing constantly, sending recommendations for new content opportunities as they arise.

Consider how you optimize content based on intent and experience

The best content comes from a deep, meaningful understanding of the searcher’s intent. What problem were they experiencing or what need did they have that caused them to seek out your content in the first place? And how does your blog post, ebook, or landing page copy enhance their experience?

Look at the search results page as a doorway to your “home”. How’s your curb appeal? What do potential customers see when they encounter one of your pages in search results? What kind of experience do you offer when they step over the threshold and click through to your website?

The best content meets visitors where they are at with relevant, high-quality information presented in a way that is accessible, fast loading, and easy to digest. This is the case for both short and long form SEO content. Ensure your content contains calls to action designed to give people options and help them discover the next step in their journey versus attempting to sell them on something they may not be ready for yet.


Conclusion

The audience is king, queen, and the entire court as we head into 2023. SEO and content marketing give you countless opportunities to connect with these people but remember they are a means to an end. Keep searcher intent and audience needs at the heart of every piece of content you create and campaign you plan for the coming year.
