9 Aspects That Might Surprise You

Building links to English content has been covered a million times. But what do we know about link building in non-English markets? A lot of that knowledge is often kept locally.

There could be significant differences from what you’re used to. Success in one market doesn’t guarantee the same in another. Moreover, you’ll often run into link opportunities that violate Google’s guidelines, enticing you to overlook ethical link building practices to deliver results.

In this article, we’ll go through nine aspects of international link building with the help of four SEO and link building experts who collectively share insights from more than 20 global markets:

  1. Anna Podruczna – Link building team lead at eVisions International, with insights across Europe
  2. Andrew Prasatya – Head of content marketing at RevoU, with insights from Southeast Asia focusing on Indonesia
  3. Aditya Mishra – SEO and link building consultant, with insights from India
  4. Sebastian Galanternik – SEO manager at Crehana, with insights from South America focusing on Argentina

Huge thanks to everyone involved.

And we can start with the most controversial tactics right away…

1. Buying links is very common

Let’s be honest—link buying is in the arsenal of many link builders. Aira’s annual survey reports that 31% of SEOs buy links. This number can jump to a whopping 74% for link builders, as shown in a recent survey by Authority Hacker.

That’s a huge number for English websites, but I’m convinced that this number is even higher for non-English content. Let me explain.

The fewer link prospecting opportunities there are, the less creative you can be when building links. You’ll waste resources trying to create fantastic link bait or launch ambitious PR campaigns in most niches in many smaller markets. Link buying often becomes an attractive (yet risky) tactic when your options get limited.

We gathered multiple insights on the topic of link buying across different markets:

Most bought links aren’t tagged as “sponsored” or “nofollow”

An obligatory reminder first:

Google considers buying or selling links for ranking purposes a link spam technique. In other words, the only acceptable form of link buying in the eyes of Google is when the link is clearly disclosed with a “nofollow” or “sponsored” link attribute.

But that’s different from what most link builders do. All contributors share this experience, and some of what they encounter goes against Google’s guidelines even further.

For example, Anna shared that if you’re negotiating link buying in Poland, you sometimes have to pay more for a “followed” link (i.e., without link attributes). I’ve heard similar stories from others, but it’s unclear how widespread the practice is. 

That said, it makes sense, as passing link equity is one of the main reasons we build links. On the other hand, it’s quite a paradox that you have to pay extra to get a link you could get punished for by Google.

The average price per link is over €100 across all monitored countries in Europe

eVisions International, the agency where Anna works, is focused on expansion across many European markets. It keeps track of all the link offers and deals, so we were able to extract the average price per link in comparable niches for each country where it has a solid sample size (100+ links): 

Average price of links with comparable quality across Europe

What surprised me is that there isn’t a single country with an average price below €100, considering the large economic disparity across European countries. For comparison, Authority Hacker reports an average price of $83, and our survey from a few years back resulted in a much higher $361 average cost.

Sidenote.

Different methodologies could play a huge role in the cost disparity. Neither of the cited surveys above segmented link cost by country. Mark Webster from Authority Hacker said he didn’t collect country/market data but that most of his respondents likely operate in English-speaking markets. Our study was focused on links in English only.

Regarding specific countries, Austria in second place stood out the most to me, as it’s not a market you’d expect to be that expensive. Anna explained that it’s due mainly to the limited number of suitable Austrian websites you can build links from. So German websites will likely be cheaper if you only care about links from German content.

Reselling links can be a common business practice

Aditya said that reselling links is popular in India. You have networks of link sellers with access to different websites for potential link placements. The same links can cost you more or less based on the reseller.

This overlaps with the everlasting use of PBNs, which are still popular in India.

Needless to say, websites involved in such unsophisticated link schemes can easily be the first target of link spam updates and SpamBrain.

Negotiating prices can make or break the deal 

Some cultures thrive on negotiating prices in all aspects of their lives. Others are on the completely opposite side with one definitive fixed price.

Some of this is also reflected in the process of buying links. According to Anna, negotiations are expected in some regions, such as Poland and the Balkans. Still, more often than not, the price is the price, and there is little flexibility. You can make or break the deal with the same response in different countries.

2. Guest posting is often about buying the placement

Guest posting is one of the most popular link building tactics worldwide.

Most popular link building tactics
Data from Authority Hacker’s survey.

What makes this different in smaller markets is the popularity of paid guest posting opportunities. Aditya said that running an outreach campaign with “paid post” in the subject line in India can easily achieve a 20%+ conversion rate, including from established high-authority websites.

I’m also regularly seeing such opportunities in Czech/Slovak SEO groups, and other contributors reported that it’s common in their markets, too.

As expected, these paid article insertions aren’t usually tagged as sponsored, allowing link equity to pass (which is against Google’s guidelines).

Free guest posting then falls into two categories:

  • You must show that you’re credible and authoritative enough in the niche to get the opportunity (well, that’s best practice anywhere in the world).
  • You post content on websites that allow UGC. Andrew mentioned this is popular in Indonesia, especially on bigger media websites where it’s otherwise difficult to land coverage. The links are usually nofollowed, but the exposure and potential SEO benefits (think E-A-T, “nofollow” being a hint, etc.) are worth it.
Guest posting example
A UGC example from one of the biggest Indonesian media sites, Kumparan, translated in Chrome. Ahrefs’ SEO Toolbar shows all the outgoing links.

3. Large websites are often selling “media packages”

Continuing on the topic of link and guest post buying, you may also come across “media packages.” It’s something large websites offer to close bigger deals and properly process selling their website space to other publishers.

Anna shared that this is common practice across most European countries. Aditya also reported a similar experience with India and explained that most of these packages come with strict terms and conditions, such as:

  • The link will be live for one year only.
  • The post will be clearly tagged as sponsored.
  • The link will be “nofollow.”
  • The article won’t be displayed on the homepage or the blog category page.
  • Payment must be made in advance.
  • Turnaround time will be more than a month.

As you can see, this is now perfectly “white hat.” Most links and articles you’d get this way would be tagged in accordance with Google’s guidelines.

4. Building relationships in your niche gives you big leverage

Your success in link building largely depends on your communication and networking skills. Getting links and coverage gets much easier if you’ve built relationships with publishers and journalists in your niche and beyond.

This is often overlooked in the world of English content. Where do you even begin when there are so many websites, bloggers, and journalists? This is where the limited number of link prospects in smaller markets becomes an advantage.

Let’s take a look at a few implications.

You can easily engage in more sophisticated link exchanges

Reciprocal linking is an outdated tactic that’s easy to spot for both link builders and search engines. But link exchanges are far from dead, and they can be highly effective if done well.

Three-way link exchanges (also known as ABC link exchanges) are thriving in many places across the world.

Three-way link exchange

For example, Sebastian shared that many website owners in Argentina have multiple websites and that this type of link exchange is common practice. Aditya claimed that webmasters in India often have a second website for this purpose as well.

But there are still some markets where this is relatively unheard of. Anna listed the example of Balkan countries.

Keep in mind that this tactic is considered a link scheme and violates Google’s guidelines.

Building personal relationships is your best bet to land top-tier links

Andrew has experienced massive success by building relationships with journalists and editors in Indonesia. He claimed that knowing senior people from top-tier media websites helped him pitch stories that got the required attention to land the coverage and links.

You can attend industry events these people go to, but Andrew suggested that doing media visits is the most straightforward way to get the results.

Referring domains report, via Ahrefs' Site Explorer
Referring domains to RevoU where Andrew works. He landed “followed” links from the biggest media outlets in Indonesia. Data from the Referring domains report in Ahrefs’ Site Explorer.

Hiring local link building experts is highly recommended

Even though Anna achieved link building success while venturing into new markets, she recommended hiring local experts with valuable connections. This value proposition alone likely beats any skill you have as a link builder. She even tested this in the Balkans and had significantly lower success rates than the locals.

5. It’s a miracle to get a “followed” link in some countries

We’ve likely moved past the period when some big media houses exclusively used the “nofollow” attribute for all external links. I believe the increasing use of “nofollow” for the sake of being on the safe side pushed Google to expand the link attributes with “sponsored” and “ugc” and proclaim them all hints rather than directives.

However, avoiding the risk of passing link equity where you shouldn’t is on a whole different level in some countries, namely Germany and Austria, in our sample.

Many German websites were hit by a link penalty in 2014, which profoundly affected the local (and Austrian) market. Still, Anna reported that too many websites in these two markets link only with “nofollow” links to this day.

6. TLDs can play a big role in your link building success

Your TLD choice can impact how your target audience perceives your website and brand. But Sebastian mentioned that this also affects how well your outreach is received.

In South America, having a gTLD instead of a ccTLD is important when building links beyond your home country. Many large Spanish-language sites in Argentina are on a .com domain, making their link building efforts easier.

For example, a Peruvian website will be more likely to link to content on example.com than on example.ar. Argentinian links may still be more important in some cases, but the gTLD opens up many more opportunities.

Referring domains report, via Ahrefs' Site Explorer
A sneak peek into the link profile of infobae.com, which attracts links across Latin America. It was originally an Argentinian online newspaper that now operates internationally.

If you add to that the benefits of PageRank consolidation and easier management and scalability, you have a pretty strong argument for using gTLDs for international SEO.

7. Take language and cultural specifics into account

Doing localized outreach should have a higher success rate than sending people emails in foreign languages. That much is obvious. English should only be used when:

  • It’s a common language in that country. Aditya mostly writes outreach emails to Indian websites in English because there are many local languages that aren’t mutually intelligible.
  • You’re making a deal with media houses or agencies.
  • You know that the other side is OK with using English.

But using the correct language is just a start for proper outreach localization. You should also adjust to local communication styles and some cultural specifics. This mainly translates to how formal the email is.

In Europe, Slavic cultures tend to use the most formal communication, but even that has its differences. Anna highlighted the Balkans as an area where you have to be formal. In contrast, you can get away with being a bit more casual in the Czech Republic or Slovakia.

This can get even more complicated in countries like Japan or South Korea, where you have complex honorific systems. A sentence said to a friend could be entirely different from the one you’d say to an older journalist you weren’t yet acquainted with—even if it had the same meaning.

On the other hand, there are also a lot of countries where informal and casual conversations win. This generally applies to Romance languages. Anna shared her experience of having friendly and casual conversations, especially with Italian webmasters and editors.

Recommendation

Even if you know a language well, you should familiarize yourself with local terms to achieve the desired effect. Different spellings and choices of words make even English distinctive across different English-speaking countries. And it can get even more complicated than that with other languages (like Spanish, Portuguese, or Chinese).

8. SEO knowledge can vastly differ depending on the country

Being on the same page as the outreach recipient is another key to success. Try to offer a “40+ DR 3W exchange” to the average Joe on the internet, and they’ll likely label it spam immediately.

Based on my conversations with contributors, there are some advanced markets where many webmasters are well-versed in link building, such as Poland, the Czech Republic, and Slovakia. 

Many other markets seem to fall into the “somewhat knowledgeable” bracket. We can name Italy, India, and Argentina here. You should still be fine using SEO terms here and there from the start.

And lastly, there’s the “lack of link building” knowledge bracket where Anna will generally put the Balkans and Germany. Here, you shouldn’t assume that the person receiving your outreach email has any solid knowledge of SEO.

9. Email isn’t necessarily the best medium

Almost every guide about outreach is focused on writing emails. There’s nothing wrong with this; it’s many people’s preferred communication style. But that doesn’t apply everywhere in the world.

Andrew shared that while email is still good for first-time introductions in Indonesia, it’s much more effective to use apps and platforms like WhatsApp later on.

In other Southeast Asian countries like Thailand or Vietnam, Andrew added that email isn’t really working at all. He recommended using social media like LinkedIn or, even better, Facebook Messenger to pitch your content.

I have a similar experience in the Chinese market. Most people I got in touch with preferred to communicate over WeChat. That often involved exchanging voice messages, bringing me to the last point.

Calling someone as a part of your outreach process could get you far ahead of the competition. Anna said that leaving your phone contact so the other side can get in touch with you on a more personal level works well in Germany. She also reported success doing all aspects of her outreach over the phone in Poland. I know Czech link builders who also got links this way.

Final thoughts

As you can see, the path to link building success and what success even means locally can vastly differ from country to country.

There are also a lot of similarities to the best link building practices you already know, naturally.

Great link bait still makes a fantastic asset that can land you links in top-tier media and websites. But getting the buy-in for carrying this out can be difficult in many niches, as the number of link prospects tends to be quite limited. Another tactic that seems to work well everywhere is ego baiting.

On the other hand, there are still a lot of people using outdated bad tactics. These include mass outreach to buy links with specific anchor texts, spamming UGC links, doing unsophisticated link exchanges, etc.

Got any questions or insights to share? Let me know on Twitter.




Google Warns Against Over-Reliance On SEO Tool Metrics

In a recent discussion on Reddit’s r/SEO forum, Google’s Search Advocate, John Mueller, cautioned against relying too heavily on third-party SEO metrics.

His comments came in response to a person’s concerns about dramatic changes in tool measurements and their perceived impact on search performance.

The conversation was sparked by a website owner who reported the following series of events:

  1. A 50% drop in their website’s Domain Authority (DA) score.
  2. A surge in spam backlinks, with 75% of all their website’s links acquired in the current year.
  3. An increase in spam comments, averaging 30 per day on a site receiving about 150 daily visits.
  4. A discrepancy between backlink data shown in different SEO tools.

The owner, who claimed never to have purchased links, is concerned about the impact of these spammy links on their site’s performance.

Mueller’s Perspective On Third-Party Metrics

Mueller addressed these concerns by highlighting the limitations of third-party SEO tools and their metrics.

He stated:

“Many SEO tools have their own metrics that are tempting to optimize for (because you see a number), but ultimately, there’s no shortcut.”

He cautioned against implementing quick fixes based on these metrics, describing many of these tactics as “smoke & mirrors.”

Mueller highlighted a crucial point: the metrics provided by SEO tools don’t directly correlate with how search engines evaluate websites.

He noted that actions like using disavow files don’t affect metrics from SEO tools, as these companies don’t have access to Google data.

This highlights the need to understand the sources and limitations of SEO tool data. Their metrics aren’t direct indicators of search engine rankings.

What To Focus On? Value, Not Numbers

Mueller suggested a holistic SEO approach, prioritizing unique value over specific metrics like Domain Authority or spam scores.

He advised:

“If you want to think about the long term, finding ways to add real value that’s unique and wanted by people on the web (together with all the usual SEO best practices as a foundation) is a good target.”

However, Mueller acknowledged that creating unique content isn’t easy, adding:

“Unique doesn’t mean a unique combination of words, but really something that nobody else is providing, and ideally, that others can’t easily provide themselves.

It’s hard, it takes a lot of work, and it can take a lot of time. If it were fast & easy, others would be – and probably are already – doing it and have more practice at it.”

Mueller’s insights encourage us to focus on what really matters: strategies that put users first.

This helps align content with Google’s goals and create lasting benefits.

Key Takeaways

  1. While potentially useful, third-party SEO metrics shouldn’t be the primary focus of optimization efforts.
  2. Dramatic changes in these metrics don’t reflect changes in how search engines view your site.
  3. Focus on creating unique content rather than chasing tool-based metrics.
  4. Understand the limitations and sources of SEO tool data.

Featured Image: JHVEPhoto/Shutterstock


A Guide To Robots.txt: Best Practices For SEO

Understanding how to use the robots.txt file is crucial for any website’s SEO strategy. Mistakes in this file can impact how your website is crawled and your pages’ search appearance. Getting it right, on the other hand, can improve crawling efficiency and mitigate crawling issues.

Google recently reminded website owners about the importance of using robots.txt to block unnecessary URLs.

Those include add-to-cart, login, or checkout pages. But the question is – how do you use it properly?

In this article, we will guide you through every nuance of how to do just that.

What Is Robots.txt?

Robots.txt is a simple text file that sits in the root directory of your site and tells crawlers what should be crawled.

Here is a quick reference to the key robots.txt directives:

  • User-agent – Specifies which crawler the rules apply to. See user agent tokens. Using * targets all crawlers.
  • Disallow – Prevents the specified URLs from being crawled.
  • Allow – Allows specific URLs to be crawled, even if a parent directory is disallowed.
  • Sitemap – Indicates the location of your XML sitemap, helping search engines discover it.
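Put together, a minimal robots.txt file using all four directives might look like this (the paths and sitemap URL are placeholders, not recommendations):

User-agent: *
Disallow: /checkout/
Allow: /checkout/help/
Sitemap: https://www.example.com/sitemap.xml

This blocks the hypothetical /checkout/ section for all crawlers, keeps its /checkout/help/ subfolder crawlable, and points search engines to the sitemap.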

For a real-world example, here is the robots.txt from ikea.com, which includes multiple rules.

Example of robots.txt from ikea.com

Note that robots.txt doesn’t support full regular expressions and only has two wildcards:

  • Asterisk (*), which matches zero or more sequences of characters.
  • Dollar sign ($), which matches the end of a URL.

Also, note that its rules are case-sensitive, e.g., “filter=” isn’t equal to “Filter=.”
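For instance, here is a small sketch of how the two wildcards behave, using a hypothetical “color” parameter and .xls files:

User-agent: *
# Matches any URL containing "color=" anywhere in the path or query string
Disallow: /*color=*
# Matches only URLs that end exactly in .xls
Disallow: /*.xls$

Because the rules are case-sensitive, the first rule would not block a URL containing “Color=” with a capital C.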

Order Of Precedence In Robots.txt

When setting up a robots.txt file, it’s important to know the order in which search engines decide which rules to apply in case of conflicting rules.

They follow these two key rules:

1. Most Specific Rule

The rule that matches more characters in the URL will be applied. For example:

User-agent: *
Disallow: /downloads/
Allow: /downloads/free/

In this case, the “Allow: /downloads/free/” rule is more specific than “Disallow: /downloads/” because it targets a subdirectory.

Google will allow crawling of subfolder “/downloads/free/” but block everything else under “/downloads/.”

2. Least Restrictive Rule

When multiple rules are equally specific, for example:

User-agent: *
Disallow: /downloads/
Allow: /downloads/

Google will choose the least restrictive one. This means Google will allow access to /downloads/.

Why Is Robots.txt Important In SEO?

Blocking unimportant pages with robots.txt helps Googlebot focus its crawl budget on valuable parts of the website and on crawling new pages. It also helps search engines save computing power, contributing to better sustainability.

Imagine you have an online store with hundreds of thousands of pages. There are sections of websites like filtered pages that may have an infinite number of versions.

Those pages don’t have unique value, essentially contain duplicate content, and may create infinite crawl space, thus wasting your server and Googlebot’s resources.

That is where robots.txt comes in, preventing search engine bots from crawling those pages.

If you don’t do that, Google may try to crawl an infinite number of URLs with different (even non-existent) search parameter values, causing spikes and a waste of crawl budget.

When To Use Robots.txt

As a general rule, you should always ask why certain pages exist and whether they offer anything worth crawling and indexing for search engines.

Following this principle, we should certainly always block:

  • URLs that contain query parameters such as:
    • Internal search.
    • Faceted navigation URLs created by filtering or sorting options if they are not part of URL structure and SEO strategy.
    • Action URLs like add to wishlist or add to cart.
  • Private parts of the website, like login pages.
  • JavaScript files not relevant to website content or rendering, such as tracking scripts.
  • Scrapers and AI chatbots, to prevent them from using your content for their training purposes.

Let’s dive into examples of how you can use robots.txt for each case.

1. Block Internal Search Pages

The most common and absolutely necessary step is to block internal search URLs from being crawled by Google and other search engines, as almost every website has an internal search functionality.

On WordPress websites, it is usually an “s” parameter, and the URL looks like this:

https://www.example.com/?s=google

Gary Illyes from Google has repeatedly warned that such “action” URLs should be blocked, as Googlebot can end up crawling them indefinitely, even non-existent URLs with different parameter combinations.

Here is the rule you can use in your robots.txt to block such URLs from being crawled:

User-agent: *
Disallow: *s=*
  1. The User-agent: * line specifies that the rule applies to all web crawlers, including Googlebot, Bingbot, etc.
  2. The Disallow: *s=* line tells all crawlers not to crawl any URLs that contain the query parameter “s=.” The wildcard “*” means it can match any sequence of characters before or after “s=.” However, it will not match URLs with uppercase “S” like “/?S=” since the rule is case-sensitive.

Here is an example of a website that managed to drastically reduce the crawling of non-existent internal search URLs after blocking them via robots.txt.

Screenshot from Crawl stats report

Note that Google may index those blocked pages, but you don’t need to worry about them as they will be dropped over time.

2. Block Faceted Navigation URLs

Faceted navigation is an integral part of every ecommerce website. There can be cases where faceted navigation is part of an SEO strategy and aimed at ranking for general product searches.

For example, Zalando uses faceted navigation URLs for color options to rank for general product keywords like “gray t-shirt.”

However, in most cases, this is not the case, and filter parameters are used merely for filtering products, creating dozens of pages with duplicate content.

Technically, those parameters work the same way as internal search parameters, with one difference: there may be multiple parameters at once. You need to make sure you disallow all of them.

For example, if you have filters with the following parameters “sortby,” “color,” and “price,” you may use this set of rules:

User-agent: *
Disallow: *sortby=*
Disallow: *color=*
Disallow: *price=*

Based on your specific case, there may be more parameters, and you may need to add all of them.

What About UTM Parameters?

UTM parameters are used for tracking purposes.

As John Mueller stated in his Reddit post, you don’t need to worry about URL parameters that link to your pages externally.

John Mueller on UTM parameters

Just make sure to block any random parameters you use internally and avoid linking internally to those pages, e.g., linking from your article pages to your search page with a search query like “https://www.example.com/?s=google.”

3. Block PDF URLs

Let’s say you have a lot of PDF documents, such as product guides, brochures, or downloadable papers, and you don’t want them crawled.

Here is a simple robots.txt rule that will block search engine bots from accessing those documents:

User-agent: *
Disallow: /*.pdf$

The “Disallow: /*.pdf$” line tells crawlers not to crawl any URLs that end with .pdf.

By using /*, the rule matches any path on the website. As a result, any URL ending with .pdf will be blocked from crawling.

If you have a WordPress website and want to disallow PDFs from the uploads directory where you upload them via the CMS, you can use the following rule:

User-agent: *
Disallow: /wp-content/uploads/*.pdf$
Allow: /wp-content/uploads/2024/09/allowed-document.pdf$

You can see that we have conflicting rules here.

In case of conflicting rules, the more specific one takes priority, which means the last line ensures that only the specific file located in folder “wp-content/uploads/2024/09/allowed-document.pdf” is allowed to be crawled.

4. Block A Directory

Let’s say you have an API endpoint where you submit your data from the form. It is likely your form has an action attribute like action=”/form/submissions/.”

The issue is that Google will try to crawl that URL, /form/submissions/, which you likely don’t want. You can block these URLs from being crawled with this rule:

User-agent: *
Disallow: /form/

By specifying a directory in the Disallow rule, you are telling the crawlers to avoid crawling all pages under that directory, and you don’t need to use the (*) wildcard anymore, like “/form/*.”

Note that you must always specify relative paths and never absolute URLs, like “https://www.example.com/form/” for Disallow and Allow directives.

Be cautious to avoid malformed rules. For example, using /form without a trailing slash will also match a page /form-design-examples/, which may be a page on your blog that you want to index.
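As a quick illustration of that difference (the paths are hypothetical), compare these two variants:

# Too broad: also matches /form-design-examples/ and any other path starting with "form"
User-agent: *
Disallow: /form

# Correct: matches only URLs under the /form/ directory
User-agent: *
Disallow: /form/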

Read: 8 Common Robots.txt Issues And How To Fix Them

5. Block User Account URLs

If you have an ecommerce website, you likely have directories that start with “/myaccount/,” such as “/myaccount/orders/” or “/myaccount/profile/.”

With the top page “/myaccount/” being a sign-in page that you want to be indexed and found by users in search, you may want to disallow the subpages from being crawled by Googlebot.

You can use the Disallow rule in combination with the Allow rule to block everything under the “/myaccount/” directory (except the /myaccount/ page).

User-agent: *
Disallow: /myaccount/
Allow: /myaccount/$


And again, since Google uses the most specific rule, it will disallow everything under the /myaccount/ directory but allow only the /myaccount/ page to be crawled.

Here’s another use case of combining the Disallow and Allow rules: if your search page lives under the /search/ directory and you want it to be found and indexed, but you want to block the actual search result URLs:

User-agent: *
Disallow: /search/
Allow: /search/$

6. Block Non-Render Related JavaScript Files

Every website uses JavaScript, and many of these scripts are not related to the rendering of content, such as tracking scripts or those used for loading AdSense.

Googlebot can crawl and render a website’s content without these scripts. Therefore, blocking them is safe and recommended, as it saves requests and resources to fetch and parse them.

Below is a sample rule disallowing a JavaScript file that contains tracking pixels.

User-agent: *
Disallow: /assets/js/pixels.js

7. Block AI Chatbots And Scrapers

Many publishers are concerned that their content is being unfairly used to train AI models without their consent, and they wish to prevent this. The set of rules below blocks a range of known AI crawlers and scrapers from accessing the entire site:

#ai chatbots
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: cohere-ai
User-agent: Bytespider
User-agent: Google-Extended
User-Agent: PerplexityBot
User-agent: Applebot-Extended
User-agent: Diffbot
Disallow: /
#scrapers
User-agent: Scrapy
User-agent: magpie-crawler
User-agent: CCBot
User-Agent: omgili
User-Agent: omgilibot
User-agent: Node/simplecrawler
Disallow: /

Here, each user agent is listed individually, and the rule Disallow: / tells those bots not to crawl any part of the site.

This, besides preventing AI training on your content, can help reduce the load on your server by minimizing unnecessary crawling.

For ideas on which bots to block, you may want to check your server log files to see which crawlers are exhausting your servers. And remember that robots.txt doesn’t prevent unauthorized access.

8. Specify Sitemaps URLs

Including your sitemap URL in the robots.txt file helps search engines easily discover all the important pages on your website. This is done by adding a specific line that points to your sitemap location, and you can specify multiple sitemaps, each on its own line.

Sitemap: https://www.example.com/sitemap/articles.xml
Sitemap: https://www.example.com/sitemap/news.xml
Sitemap: https://www.example.com/sitemap/video.xml

Unlike Allow or Disallow rules, which allow only a relative path, the Sitemap directive requires a full, absolute URL to indicate the location of the sitemap.

Ensure the sitemaps’ URLs are accessible to search engines and have proper syntax to avoid errors.

Sitemap fetch error in Search Console

9. When To Use Crawl-Delay

The crawl-delay directive in robots.txt specifies the number of seconds a bot should wait before crawling the next page. While Googlebot does not recognize the crawl-delay directive, other bots may respect it.

It helps prevent server overload by controlling how frequently bots crawl your site.

For example, if you want ClaudeBot to crawl your content for AI training but want to avoid server overload, you can set a crawl delay to manage the interval between requests.

User-agent: ClaudeBot
Crawl-delay: 60

This instructs the ClaudeBot user agent to wait 60 seconds between requests when crawling the website.

Of course, there may be AI bots that don’t respect crawl delay directives. In that case, you may need to use a web firewall to rate limit them.

Troubleshooting Robots.txt

Once you’ve composed your robots.txt, you can use these tools to check whether the syntax is correct and whether you’ve accidentally blocked an important URL.

1. Google Search Console Robots.txt Validator

Once you’ve updated your robots.txt, you must check whether it contains any errors or accidentally blocks URLs you want to be crawled, such as resources, images, or website sections.

Navigate to Settings > robots.txt, and you will find the built-in robots.txt validator, which lets you fetch and validate your robots.txt file.

2. Google Robots.txt Parser

This is Google’s official robots.txt parser, the same one used in Search Console.

Installing and running it on your local computer requires some technical skill, but it is worth taking the time to follow the instructions on that page, because it lets you validate changes to your robots.txt file against Google’s own parser before uploading them to your server.

Centralized Robots.txt Management

Each domain and subdomain must have its own robots.txt, as Googlebot doesn’t apply a root domain’s robots.txt to its subdomains.

This creates challenges when you have a website with a dozen subdomains, as it means maintaining a dozen separate robots.txt files.

However, it is possible to host a robots.txt file on a subdomain, such as https://cdn.example.com/robots.txt, and set up a redirect from https://www.example.com/robots.txt to it.

You can also do the reverse: host the file only under the root domain and redirect from the subdomains to the root.

Search engines will treat the redirected file as if it were located on the root domain. This approach allows centralized management of robots.txt rules for both your main domain and subdomains.

It helps make updates and maintenance more efficient. Otherwise, you would need to use a separate robots.txt file for each subdomain.

Conclusion

A properly optimized robots.txt file is crucial for managing a website’s crawl budget. It ensures that search engines like Googlebot spend their time on valuable pages rather than wasting resources on unnecessary ones.

On the other hand, blocking AI bots and scrapers using robots.txt can significantly reduce server load and save computing resources.

Make sure you always validate your changes to avoid unexpected crawlability issues.

However, remember that while blocking unimportant resources via robots.txt may help increase crawl efficiency, the main factors affecting crawl budget are high-quality content and page loading speed.

Happy crawling!

Featured Image: BestForBest/Shutterstock


Google Search Has A New Boss: Prabhakar Raghavan Steps Down

Google has announced that Prabhakar Raghavan, the executive overseeing the company’s search engine and advertising products, will be stepping down from his current role.

The news came on Thursday in a memo from CEO Sundar Pichai to staff.

Nick Fox To Lead Search & Ads

Taking over Raghavan’s responsibilities will be Nick Fox, a longtime Google executive with experience across various departments.

Fox will now lead the Knowledge & Information team, which includes Google’s Search, Ads, Geo, and Commerce products.

Pichai expressed confidence in Fox’s ability to lead these crucial divisions, noting:

“Throughout his career, Nick has demonstrated leadership across nearly every facet of Knowledge & Information, from Product and Design in Search and Assistant, to our Shopping, Travel, and Payments products.”

Raghavan’s New Role

Raghavan will transition to the newly created position of Chief Technologist.

He will work closely with Pichai and other Google leaders in this role to provide technical direction.

Pichai praised Raghavan’s contributions, stating:

“Prabhakar’s leadership journey at Google has been remarkable, spanning Research, Workspace, Ads, and Knowledge & Information. He led the Gmail team in launching Smart Reply and Smart Compose as early examples of using AI to improve products, and took Gmail and Drive past 1 billion users.”

Past Criticisms

This recent announcement from Google comes in the wake of earlier criticisms leveled at the company’s search division.

In April, an opinion piece from Ed Zitron highlighted concerns about the direction of Google Search under Raghavan’s leadership.

The article cited industry analysts who claimed that Raghavan’s background in advertising, rather than search technology, had led to decisions prioritizing revenue over search quality.

Critics alleged that under Raghavan’s tenure, Google had rolled back key quality improvements to boost engagement metrics and ad revenue.

The piece referenced internal emails from 2019 describing a “Code Yellow” emergency response to lagging search revenues when Raghavan was head of Ads, which reportedly resulted in boosting sites previously downranked for using spam tactics.

Google has disputed many of these claims, maintaining that its advertising systems do not influence organic search results.

More Restructuring

As part of Google’s restructuring:

  1. The Gemini app team, led by Sissie Hsiao, will join Google DeepMind under CEO Demis Hassabis.
  2. Google Assistant teams focused on devices and home experiences will move to the Platforms & Devices division.

Looking Ahead

Fox’s takeover from Raghavan could shake things up at Google.

We may see faster AI rollouts in search and ads, plus more frequent updates. Fox might revisit core search quality, addressing recent criticisms.

Fox might push for quicker adoption of new tech to fend off competitors, especially in AI. He’s also likely to be more savvy about regulatory issues.

It’s important to note that these potential changes are speculative based on the limited information available.

The actual changes in leadership style and priorities will become clearer as Fox settles into his new role.


Featured Image: One Artist/Shutterstock
