ChatGPT Study Finds Training Data Doesn’t Match Real-World Use


A study by the Data Provenance Initiative, a collective of independent and academic researchers dedicated to data transparency, reveals a mismatch between ChatGPT’s training data and its typical use cases.

The study, which analyzed 14,000 web domains, found that ChatGPT’s training data primarily consists of news articles, encyclopedias, and social media content.

However, the most common real-world applications of the tool involve creative writing, brainstorming, and seeking explanations.

As the study states,

“Whereas news websites comprise nearly 40% of all tokens… fewer than 1% of ChatGPT queries appear to be related to news or current affairs.”

Diving deeper into usage patterns, the researchers analyzed a dataset called WildChat, containing 1 million user conversations with ChatGPT. They found that over 30% of these conversations involve creative compositions such as fictional story writing or role-playing.

This mismatch suggests that ChatGPT’s performance may vary depending on the specific task and its alignment with the tool’s training data.

Marketers should know that ChatGPT might struggle to generate content based on current events, industry-specific knowledge, or niche topics.

Adapting To ChatGPT’s Strengths & Limitations

Knowing what ChatGPT is trained on can help you align prompts with the tool’s strengths and limitations.

This means you may need to add more context, specify the desired tone and style, and break down complex tasks into smaller steps.

For AI-assisted content creation, leverage ChatGPT for tasks like ideating social posts or email subject lines. Reserve human expertise for complex, industry-specific content.

Use effective prompt engineering to optimize output. Always fact-check and edit AI-generated content to ensure quality.

AI tools can accelerate ideation and content creation but don’t expect perfection. Human review is essential for accuracy, brand consistency, and channel-specific copy.

Looking Ahead

This research highlights the need for marketers to understand the limitations of AI tools like ChatGPT.

Understand what AI can and can’t do and combine it with human expertise. This combo can boost content strategies and help hit KPIs.

As the field evolves, we might see AI tools better tailored to real-world usage patterns.

Until then, remember that AI assists but doesn’t replace expert judgment.


Featured Image: Emil Kazaryan/Shutterstock



What Links Should You Build For A Natural Backlink Profile?


This week’s Ask an SEO column comes from an anonymous asker:

“What should a backlink profile look like, and how do you build good backlinks?”

Great question!

Backlinks are part of SEO because they build trust and authority for your domain, but they’re not as important as link builders claim.

You can rank a website without backlinks. The trick is focusing on your audience and having them create brand demand. This can be equal in weight to backlinks but drives more customers.

Once you are driving demand and have created solid resources, backlinks start occurring naturally. And when you have an active audience built from other channels, you can survey them to create “link worthy” pages that can result in journalists reaching out.

That said, when all else is equal, the trust and authority from a healthy and natural backlink profile can be the deciding factor in who gets the top positions and who gets no traffic.

A healthy backlink profile is one that appears to be natural.

Search engines, including Google, expect a certain amount of spammy links from directories, website monitoring tools, and even competitors that spam or try to do a negative SEO attack. These are part of a healthy backlink profile.

What is unnatural is when your website or company has done nothing to earn an actual link.

If there is nothing noteworthy (no original thought leadership or studies, nothing that goes viral and attracts media coverage), there’s no reason anyone would ever have linked to you.

Having backlinks for no reason would likely be considered an unhealthy link profile, especially if they’re mostly dofollow.

Healthy link profiles contain a mix of dofollow, nofollow, sponsored, and mentions from actual users in forums, communities, and social media shares.

Unhealthy backlink profiles include links from topically irrelevant websites, and links from articles that mention big brands and “trustworthy” or “high authority” sites, then randomly feature a smaller company or service provider alongside them.

It’s an old trick that no longer works. Unhealthy link profiles also include private blogger networks (PBNs), link farms, link wheels, link networks, and links from sites chosen purely for their high domain authority (DA), Authority Score (AS), etc.

Bonus tip: DA, AS, and other metrics are not used by search engines. They are scores that third-party SEO tools created and have absolutely no say when it comes to the quality of a website or backlink.

If someone is telling you high DA is good and Google trusts these sites, they’re selling you snake oil.

Although backlinks are not as important as they used to be, backlinks still matter. So, if you’re looking to build some, here are a few strategies to try, avoid, and tread lightly with.

Scholarship, Grants, And Sponsorships

These don’t work. Google knows you’re offering them to get .edu links, in rare cases .gov links, and links from charities and events.

It’s easy to map back to who paid or bought them, and these likely won’t count for you SEO-wise.

If they make up the majority of your links, you should expect search engines to neutralize them, or expect a manual action against your site in Google Search Console for unnatural link building.

If you’re doing a sponsorship, ask the website you’re sponsoring to mark the link as “sponsored” instead of “nofollow.”

And if you’re doing a scholarship or grant, feature the winner on your site, follow their education and progress, and have them share their story for the next few years in a monthly or quarterly column on your blog.

If you genuinely want to do good, share their story and progress. Otherwise, it was just for getting backlinks, and that works against you.

Citations And Broken Links

When you get an unlinked mention in the media, or a competitor has a naturally occurring link to a study that now points to a broken page, you have a good opportunity to build a natural link. Reach out to these sites and ask them to link to your study instead.

If it’s a broken link, you can mention that their visitors are currently hitting a dead page and present your study or resource as something of equal or better value. Or, if the current source is outdated and no longer applies, share that yours is up to date.

For citations where there is no link, try letting the website owner know that adding one saves the user a trip to a search engine to find the source. And when visitors have a good experience on the website, they’re likely to come back for more information.

Topically Relevant PR

I’m a big believer in PR to acquire backlinks naturally. But you have to do things that make sense for your business.

  • Local stores and service providers should get links from local news stations, local bloggers, and niche websites in their industry.
  • Service providers need to focus on trade publications, industry-relevant blogs and publications, events, and social networks.
  • Stores will do well with niche and audience-relevant bloggers, communities, publications or media websites, and mass media coverage that is not affiliate links or in an affiliate folder.

Think about what is newsworthy that you can do or provide that these groups would want to cover.

PR and SEO agencies that work with content will be able to provide ideas; you can then choose the ones you like and run with them. Not every campaign will work, but hang in there – the right one will happen.

You can also try surveying your audience for original data points and studies, and then publish them. That leads to the next tip.

The publications must be topically relevant to you in order to help with SEO and avoid penalties.

If your customers and users are not the reader base of the website or publication, the link and coverage will appear unnatural, and you’ll eventually get a penalty or a devaluation.

Press Releases

Press release backlinks and syndication backlinks work against you, not for you. But that doesn’t mean press releases can’t help with link acquisition. For this strategy to work, provide enough data to gauge interest.

Share some of the data points from the study as a teaser and give a way for editors, journalists, and industry professionals to reach out to you.

Don’t charge for the study. But ask them to source and cite the data on your website, or reference your company as the source of the information.

But keep in mind that if your talking points are the same as your competitors’ and you have the same type of data, there’s no reason for a publication to add another citation or to cover you.

What can you discover and share that hasn’t been covered and will enhance the publication’s articles in a new way? Put yourself in the reader’s shoes and think about what is missing or what questions were not answered.

If comments are enabled on the publications, look for questions and build a resource backed by data that answers them.

You can then reach out to the editors and make a strong case to either add you or create a new post about the new topic since the previous one did well.

Bonus tip: Even if you don’t get a backlink, being cited can go a long way, as you may be able to use the company’s logo in your PR bar as a trust builder. You can also reach out to the PR or brand team and ask for the link using the citation strategy mentioned above.

Blog And Forum Commenting

This does not work. Search engines know that anyone can go and spam these, use a bot, or pay someone to do this.

They will work against you, not for you. Just don’t. Let the communities and site owners link to you naturally.

If your customers are on the blog or in the community, join it and participate. Use it to acquire an audience and build trust for your brand, not to get backlinks.

The backlinks and community mentions will eventually happen, and that’s how they become natural.

Social Media Profile Links

This does not work because anyone can create an account and get the link.

Links for SEO must be earned. Social media is about building an audience and bringing them to your website.

The backlinks are useless for SEO, with one exception. Some search engines crawl and index accounts.

If you struggle to get crawled, an active social media account that gets crawled and indexed fast may be able to encourage spiders to find your website and pages more easily.

Focus On Being Worth Linking To

There’s no shortage of ways to get backlinks, but not all links are good. If the link can be purchased or acquired by anyone, like a directory, it won’t help you with SEO.

If your customers are not on that website, and the majority of the website isn’t topically relevant to you, chances are the backlink will work against you.

Healthy link profiles have a mix of good and bad, natural and unnatural. If your company hasn’t done or shared anything link-worthy, there are no backlinks that can bring you long-term success.

Focus on being worth linking to, and the backlinks will come naturally.



Featured Image: Paulo Bobita/Search Engine Journal




Google Warns Against Over-Reliance On SEO Tool Metrics


In a recent discussion on Reddit’s r/SEO forum, Google’s Search Advocate, John Mueller, cautioned against relying too heavily on third-party SEO metrics.

His comments came in response to a person’s concerns about dramatic changes in tool measurements and their perceived impact on search performance.

The conversation was sparked by a website owner who reported the following series of events:

  1. A 50% drop in their website’s Domain Authority (DA) score.
  2. A surge in spam backlinks, with 75% of all their website’s links acquired in the current year.
  3. An increase in spam comments, averaging 30 per day on a site receiving about 150 daily visits.
  4. A discrepancy between backlink data shown in different SEO tools.

The owner, who claimed never to have purchased links, is concerned about the impact of these spammy links on their site’s performance.

Mueller’s Perspective On Third-Party Metrics

Mueller addressed these concerns by highlighting the limitations of third-party SEO tools and their metrics.

He stated:

“Many SEO tools have their own metrics that are tempting to optimize for (because you see a number), but ultimately, there’s no shortcut.”

He cautioned against implementing quick fixes based on these metrics, describing many of these tactics as “smoke & mirrors.”

Mueller highlighted a crucial point: the metrics provided by SEO tools don’t directly correlate with how search engines evaluate websites.

He noted that actions like using disavow files don’t affect metrics from SEO tools, as these companies don’t have access to Google data.

This highlights the need to understand the sources and limitations of SEO tool data. Their metrics aren’t direct indicators of search engine rankings.

What To Focus On? Value, Not Numbers

Mueller suggested a holistic SEO approach, prioritizing unique value over specific metrics like Domain Authority or spam scores.

He advised:

“If you want to think about the long term, finding ways to add real value that’s unique and wanted by people on the web (together with all the usual SEO best practices as a foundation) is a good target.”

However, Mueller acknowledged that creating unique content isn’t easy, adding:

“Unique doesn’t mean a unique combination of words, but really something that nobody else is providing, and ideally, that others can’t easily provide themselves.

It’s hard, it takes a lot of work, and it can take a lot of time. If it were fast & easy, others would be – and probably are already – doing it and have more practice at it.”

Mueller’s insights encourage us to focus on what really matters: strategies that put users first.

This helps align content with Google’s goals and create lasting benefits.

Key Takeaways

  1. While potentially useful, third-party SEO metrics shouldn’t be the primary focus of optimization efforts.
  2. Dramatic changes in these metrics don’t reflect changes in how search engines view your site.
  3. Focus on creating unique content rather than chasing tool-based metrics.
  4. Understand the limitations and sources of SEO tool data.

Featured Image: JHVEPhoto/Shutterstock




A Guide To Robots.txt: Best Practices For SEO


Understanding how to use the robots.txt file is crucial for any website’s SEO strategy. Mistakes in this file can impact how your website is crawled and your pages’ search appearance. Getting it right, on the other hand, can improve crawling efficiency and mitigate crawling issues.

Google recently reminded website owners about the importance of using robots.txt to block unnecessary URLs.

Those include add-to-cart, login, or checkout pages. But the question is – how do you use it properly?

In this article, we will guide you through every nuance of how to do just that.

What Is Robots.txt?

Robots.txt is a simple text file that sits in the root directory of your site and tells crawlers what should be crawled.

Below is a quick reference to the key robots.txt directives:

  • User-agent – Specifies which crawler the rules apply to (see user agent tokens). Using * targets all crawlers.
  • Disallow – Prevents the specified URLs from being crawled.
  • Allow – Allows specific URLs to be crawled, even if a parent directory is disallowed.
  • Sitemap – Indicates the location of your XML sitemap, helping search engines discover it.
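
To see how these directives fit together, here is a minimal, hypothetical robots.txt (the paths and sitemap URL are placeholders for illustration only):

User-agent: *
Disallow: /cart/
Allow: /cart/info/
Sitemap: https://www.example.com/sitemap.xml

In this sketch, all crawlers are blocked from the /cart/ directory except the /cart/info/ subfolder, and the sitemap location is declared for easier discovery.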

This is an example of robots.txt from ikea.com with multiple rules.

Example of robots.txt from ikea.com

Note that robots.txt doesn’t support full regular expressions and only has two wildcards:

  • Asterisk (*), which matches zero or more sequences of characters.
  • Dollar sign ($), which matches the end of a URL.

Also, note that its rules are case-sensitive, e.g., “filter=” isn’t equal to “Filter=.”
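
As a quick, hypothetical illustration of both wildcards (the parameter and file extension here are made up for the example):

User-agent: *
Disallow: *print=*
Disallow: /*.docx$

The first rule blocks any URL containing “print=”, while the second blocks any URL ending in .docx. Because rules are case-sensitive, neither would match “Print=” or “.DOCX”.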

Order Of Precedence In Robots.txt

When setting up a robots.txt file, it’s important to know the order in which search engines decide which rules to apply in case of conflicting rules.

They follow these two key rules:

1. Most Specific Rule

The rule that matches more characters in the URL will be applied. For example:

User-agent: *
Disallow: /downloads/
Allow: /downloads/free/

In this case, the “Allow: /downloads/free/” rule is more specific than “Disallow: /downloads/” because it targets a subdirectory.

Google will allow crawling of subfolder “/downloads/free/” but block everything else under “/downloads/.”

2. Least Restrictive Rule

When multiple rules are equally specific, for example:

User-agent: *
Disallow: /downloads/
Allow: /downloads/

Google will choose the least restrictive one. This means Google will allow access to /downloads/.

Why Is Robots.txt Important In SEO?

Blocking unimportant pages with robots.txt helps Googlebot focus its crawl budget on valuable parts of the website and on crawling new pages. It also helps search engines save computing power, contributing to better sustainability.

Imagine you have an online store with hundreds of thousands of pages. There are sections of the website, like filtered pages, that may have an infinite number of versions.

Those pages don’t have unique value, essentially contain duplicate content, and may create infinite crawl space, thus wasting your server and Googlebot’s resources.

That is where robots.txt comes in, preventing search engine bots from crawling those pages.

If you don’t do that, Google may try to crawl an infinite number of URLs with different (even non-existent) search parameter values, causing spikes and a waste of crawl budget.

When To Use Robots.txt

As a general rule, you should always ask why certain pages exist and whether they have anything worth crawling and indexing for search engines.

If we start from this principle, we should certainly always block:

  • URLs that contain query parameters such as:
    • Internal search.
    • Faceted navigation URLs created by filtering or sorting options if they are not part of URL structure and SEO strategy.
    • Action URLs like add to wishlist or add to cart.
  • Private parts of the website, like login pages.
  • JavaScript files not relevant to website content or rendering, such as tracking scripts.
  • Scrapers and AI chatbots, to prevent them from using your content for their training purposes.

Let’s dive into examples of how you can use robots.txt for each case.

1. Block Internal Search Pages

The most common and absolutely necessary step is to block internal search URLs from being crawled by Google and other search engines, as almost every website has an internal search functionality.

On WordPress websites, it is usually an “s” parameter, and the URL looks like this:

https://www.example.com/?s=google

Gary Illyes from Google has repeatedly warned to block “action” URLs, as they can cause Googlebot to crawl them indefinitely, even non-existent URLs with different parameter combinations.

Here is the rule you can use in your robots.txt to block such URLs from being crawled:

User-agent: *
Disallow: *s=*
  1. The User-agent: * line specifies that the rule applies to all web crawlers, including Googlebot, Bingbot, etc.
  2. The Disallow: *s=* line tells all crawlers not to crawl any URLs that contain the query parameter “s=.” The wildcard “*” means it can match any sequence of characters before or after “s=.” However, it will not match URLs with an uppercase “S” like “/?S=” since the rule is case-sensitive.

Here is an example of a website that managed to drastically reduce the crawling of non-existent internal search URLs after blocking them via robots.txt.

Screenshot from crawl stats report

Note that Google may index those blocked pages, but you don’t need to worry about them as they will be dropped over time.

2. Block Faceted Navigation URLs

Faceted navigation is an integral part of every ecommerce website. There can be cases where faceted navigation is part of an SEO strategy and aimed at ranking for general product searches.

For example, Zalando uses faceted navigation URLs for color options to rank for general product keywords like “gray t-shirt.”

However, in most cases, this is not the case, and filter parameters are used merely for filtering products, creating dozens of pages with duplicate content.

Technically, those parameters are no different from internal search parameters, with one difference: there may be multiple parameters, and you need to make sure you disallow all of them.

For example, if you have filters with the following parameters “sortby,” “color,” and “price,” you may use this set of rules:

User-agent: *
Disallow: *sortby=*
Disallow: *color=*
Disallow: *price=*

Based on your specific case, there may be more parameters, and you may need to add all of them.

What About UTM Parameters?

UTM parameters are used for tracking purposes.

As John Mueller stated in his Reddit post, you don’t need to worry about URL parameters that link to your pages externally.

John Mueller on UTM parameters

Just make sure to block any random parameters you use internally, and avoid linking internally to those pages, e.g., linking from your article pages to your internal search page with a query URL like “https://www.example.com/?s=google.”
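
For instance, if you used a hypothetical internal tracking parameter such as “ref” on internal links (an assumption purely for illustration), a rule like this would keep those URLs out of the crawl:

User-agent: *
Disallow: *ref=*

As with the “s=” example above, the wildcard matches any URL containing “ref=” anywhere in the query string.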

3. Block PDF URLs

Let’s say you have a lot of PDF documents, such as product guides, brochures, or downloadable papers, and you don’t want them crawled.

Here is a simple robots.txt rule that will block search engine bots from accessing those documents:

User-agent: *
Disallow: /*.pdf$

The “Disallow: /*.pdf$” line tells crawlers not to crawl any URLs that end with .pdf.

By using /*, the rule matches any path on the website. As a result, any URL ending with .pdf will be blocked from crawling.

If you have a WordPress website and want to disallow PDFs from the uploads directory where you upload them via the CMS, you can use the following rule:

User-agent: *
Disallow: /wp-content/uploads/*.pdf$
Allow: /wp-content/uploads/2024/09/allowed-document.pdf$

You can see that we have conflicting rules here.

In case of conflicting rules, the more specific one takes priority, which means the last line ensures that only the specific file located in folder “wp-content/uploads/2024/09/allowed-document.pdf” is allowed to be crawled.

4. Block A Directory

Let’s say you have an API endpoint to which you submit your form data. Your form likely has an action attribute like action=”/form/submissions/.”

The issue is that Google will try to crawl that URL, /form/submissions/, which you likely don’t want. You can block these URLs from being crawled with this rule:

User-agent: *
Disallow: /form/

By specifying a directory in the Disallow rule, you are telling the crawlers to avoid crawling all pages under that directory, and you don’t need to use the (*) wildcard anymore, like “/form/*.”

Note that for Disallow and Allow directives you must always specify relative paths, never absolute URLs like “https://www.example.com/form/.”

Be cautious to avoid malformed rules. For example, using /form without a trailing slash will also match a page /form-design-examples/, which may be a page on your blog that you want to index.
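
To make the difference concrete, here is a small sketch with hypothetical paths; the # lines are comments, which robots.txt supports:

User-agent: *
# Too broad: /form would also match /form-design-examples/
# Disallow: /form
# Safer: matches only URLs under the /form/ directory
Disallow: /form/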

Read: 8 Common Robots.txt Issues And How To Fix Them

5. Block User Account URLs

If you have an ecommerce website, you likely have directories that start with “/myaccount/,” such as “/myaccount/orders/” or “/myaccount/profile/.”

With the top page “/myaccount/” being a sign-in page that you want to be indexed and found by users in search, you may want to disallow the subpages from being crawled by Googlebot.

You can use the Disallow rule in combination with the Allow rule to block everything under the “/myaccount/” directory (except the /myaccount/ page).

User-agent: *
Disallow: /myaccount/
Allow: /myaccount/$


And again, since Google uses the most specific rule, it will disallow everything under the /myaccount/ directory but allow only the /myaccount/ page to be crawled.

Here’s another use case of combining the Disallow and Allow rules: in case you have your search under the /search/ directory and want it to be found and indexed but block actual search URLs:

User-agent: *
Disallow: /search/
Allow: /search/$

6. Block Non-Render Related JavaScript Files

Every website uses JavaScript, and many of these scripts are not related to the rendering of content, such as tracking scripts or those used for loading AdSense.

Googlebot can crawl and render a website’s content without these scripts. Therefore, blocking them is safe and recommended, as it saves requests and resources to fetch and parse them.

Below is a sample rule disallowing a JavaScript file that contains tracking pixels.

User-agent: *
Disallow: /assets/js/pixels.js

7. Block AI Chatbots And Scrapers

Many publishers are concerned that their content is being unfairly used to train AI models without their consent, and they wish to prevent this. You can block these crawlers in robots.txt by listing each bot’s user agent and disallowing the entire site:

#ai chatbots
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: cohere-ai
User-agent: Bytespider
User-agent: Google-Extended
User-Agent: PerplexityBot
User-agent: Applebot-Extended
User-agent: Diffbot
Disallow: /
#scrapers
User-agent: Scrapy
User-agent: magpie-crawler
User-agent: CCBot
User-Agent: omgili
User-Agent: omgilibot
User-agent: Node/simplecrawler
Disallow: /

Here, each user agent is listed individually, and the rule Disallow: / tells those bots not to crawl any part of the site.

This, besides preventing AI training on your content, can help reduce the load on your server by minimizing unnecessary crawling.

For ideas on which bots to block, check your server log files to see which crawlers are exhausting your servers. And remember, robots.txt doesn’t prevent unauthorized access.

8. Specify Sitemaps URLs

Including your sitemap URL in the robots.txt file helps search engines easily discover all the important pages on your website. This is done by adding a specific line that points to your sitemap location, and you can specify multiple sitemaps, each on its own line.

Sitemap: https://www.example.com/sitemap/articles.xml
Sitemap: https://www.example.com/sitemap/news.xml
Sitemap: https://www.example.com/sitemap/video.xml

Unlike Allow or Disallow rules, which allow only a relative path, the Sitemap directive requires a full, absolute URL to indicate the location of the sitemap.

Ensure the sitemaps’ URLs are accessible to search engines and have proper syntax to avoid errors.

Sitemap fetch error in Search Console

9. When To Use Crawl-Delay

The crawl-delay directive in robots.txt specifies the number of seconds a bot should wait before crawling the next page. While Googlebot does not recognize the crawl-delay directive, other bots may respect it.

It helps prevent server overload by controlling how frequently bots crawl your site.

For example, if you want ClaudeBot to crawl your content for AI training but want to avoid server overload, you can set a crawl delay to manage the interval between requests.

User-agent: ClaudeBot
Crawl-delay: 60

This instructs the ClaudeBot user agent to wait 60 seconds between requests when crawling the website.

Of course, there may be AI bots that don’t respect crawl delay directives. In that case, you may need to use a web firewall to rate limit them.

Troubleshooting Robots.txt

Once you’ve composed your robots.txt, you can use these tools to check that the syntax is correct and that you didn’t accidentally block an important URL.

1. Google Search Console Robots.txt Validator

Once you’ve updated your robots.txt, you must check whether it contains any errors or accidentally blocks URLs you want to be crawled, such as resources, images, or website sections.

Navigate to Settings > robots.txt, and you will find the built-in robots.txt validator, where you can fetch and validate your robots.txt.

2. Google Robots.txt Parser

This is Google’s official robots.txt parser, the one used in Search Console.

It requires some technical skill to install and run on your local computer, but it is highly recommended to take the time and do it as instructed on that page, because it lets you validate changes to your robots.txt file against the official Google parser before uploading them to your server.

Centralized Robots.txt Management

Each domain and subdomain must have its own robots.txt, as Googlebot doesn’t apply a root domain’s robots.txt to a subdomain.

This creates challenges when you have a website with a dozen subdomains, as it means maintaining a separate robots.txt file for each of them.

However, it is possible to host a robots.txt file on a subdomain, such as https://cdn.example.com/robots.txt, and set up a redirect from https://www.example.com/robots.txt to it.

You can also do the reverse: host it only under the root domain and redirect from subdomains to the root.

Search engines will treat the redirected file as if it were located on the root domain. This approach allows centralized management of robots.txt rules for both your main domain and subdomains.

It helps make updates and maintenance more efficient. Otherwise, you would need to use a separate robots.txt file for each subdomain.

Conclusion

A properly optimized robots.txt file is crucial for managing a website’s crawl budget. It ensures that search engines like Googlebot spend their time on valuable pages rather than wasting resources on unnecessary ones.

On the other hand, blocking AI bots and scrapers using robots.txt can significantly reduce server load and save computing resources.

Make sure you always validate your changes to avoid unexpected crawlability issues.

However, remember that while blocking unimportant resources via robots.txt may help increase crawl efficiency, the main factors affecting crawl budget are high-quality content and page loading speed.

Happy crawling!



Featured Image: BestForBest/Shutterstock

