6 Common Robots.txt Issues & How To Fix Them


Robots.txt is a useful and relatively powerful tool to instruct search engine crawlers on how you want them to crawl your website.

It is not all-powerful (in Google’s own words, “it is not a mechanism for keeping a web page out of Google”) but it can help to prevent your site or server from being overloaded by crawler requests.

If you have this crawl block in place on your site, you need to be certain it’s being used properly.

This is particularly important if you use dynamic URLs or other methods that generate a theoretically infinite number of pages.

In this guide, we will look at some of the most common issues with the robots.txt file, the impact they can have on your website and your search presence, and how to fix these issues if you think they have occurred.

But first, let’s take a quick look at robots.txt and its alternatives.

What Is Robots.txt?

Robots.txt uses a plain text file format and is placed in the root directory of your website.

It must be in the topmost directory of your site; if you place it in a subdirectory, search engines will simply ignore it.

Despite its great power, robots.txt is often a relatively simple document, and a basic robots.txt file can be created in a matter of seconds using an editor like Notepad.

There are other ways to achieve some of the same goals that robots.txt is usually used for.

Individual pages can include a robots meta tag within the page code itself.

You can also use the X-Robots-Tag HTTP header to influence how (and whether) content is shown in search results.
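As a rough sketch of both alternatives (the directive values here are illustrative, not a recommendation for your site), a robots meta tag sits in the head of an individual page:

<meta name="robots" content="noindex, nofollow">

The equivalent X-Robots-Tag is sent as an HTTP response header, which is useful for non-HTML files such as PDFs:

X-Robots-Tag: noindex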

What Can Robots.txt Do?

Robots.txt can achieve a variety of results across a range of different content types:

Web pages can be blocked from being crawled.

They may still appear in search results, but will not have a text description. Non-HTML content on the page will not be crawled either.

Media files can be blocked from appearing in Google search results.

This includes images, video, and audio files.

If the file is publicly accessible, it will still ‘exist’ online and can be viewed and linked to, but it will not show in Google searches.

Resource files like unimportant external scripts can be blocked.

But this means if Google crawls a page that requires that resource to load, the Googlebot robot will ‘see’ a version of the page as if that resource did not exist, which may affect indexing.
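As a hedged illustration of these three cases, a robots.txt file might contain rules like the sketch below; the paths are hypothetical and would need to match your own site structure:

User-agent: *
# Block a set of web pages from being crawled
Disallow: /internal-search/
# Keep a media file out of Google search results
Disallow: /downloads/demo-video.mp4
# Block an unimportant resource file
Disallow: /scripts/legacy-widget.js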

You cannot use robots.txt to completely block a web page from appearing in Google’s search results.

To achieve that, you must use an alternative method such as adding a noindex meta tag to the head of the page.

How Dangerous Are Robots.txt Mistakes?

A mistake in robots.txt can have unintended consequences, but it’s often not the end of the world.

The good news is that by fixing your robots.txt file, you can recover from any errors quickly and (usually) in full.

Google’s guidance to web developers says this on the subject of robots.txt mistakes:

“Web crawlers are generally very flexible and typically will not be swayed by minor mistakes in the robots.txt file. In general, the worst that can happen is that incorrect [or] unsupported directives will be ignored.

Bear in mind though that Google can’t read minds when interpreting a robots.txt file; we have to interpret the robots.txt file we fetched. That said, if you are aware of problems in your robots.txt file, they’re usually easy to fix.”

6 Common Robots.txt Mistakes

  1. Robots.txt Not In The Root Directory.
  2. Poor Use Of Wildcards.
  3. Noindex In Robots.txt.
  4. Blocked Scripts And Stylesheets.
  5. No Sitemap URL.
  6. Access To Development Sites.

If your website is behaving strangely in the search results, your robots.txt file is a good place to look for any mistakes, syntax errors, and overreaching rules.

Let’s take a look at each of the above mistakes in more detail and see how to ensure you have a valid robots.txt file.

1. Robots.txt Not In The Root Directory

Search robots can only discover the file if it’s in your root folder.

That’s why the URL of your robots.txt file should contain only a forward slash between your domain (the .com or equivalent) and the ‘robots.txt’ filename.

If there’s a subfolder in there, your robots.txt file is probably not visible to the search robots, and your website is probably behaving as if there was no robots.txt file at all.

To fix this issue, move your robots.txt file to your root directory.

It’s worth noting that this requires root access to your server.

Some content management systems will upload files to a ‘media’ subdirectory (or something similar) by default, so you might need to circumvent this to get your robots.txt file in the right place.
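Using a hypothetical example.com domain, the difference looks like this:

https://www.example.com/robots.txt (correct: the file sits in the root directory)
https://www.example.com/media/robots.txt (ignored: the file sits in a subdirectory)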

2. Poor Use Of Wildcards

Robots.txt supports two wildcard characters:

  • Asterisk (*), which matches any sequence of valid characters, like a Joker in a deck of cards.
  • Dollar sign ($), which denotes the end of a URL, allowing you to apply rules only to the final part of the URL, such as the filetype extension.

It’s sensible to adopt a minimalist approach to using wildcards, as they have the potential to apply restrictions to a much broader portion of your website than you intend.

It’s also relatively easy to end up blocking robot access from your entire site with a poorly placed asterisk.

To fix a wildcard issue, you’ll need to locate the incorrect wildcard and move or remove it so that your robots.txt file performs as intended.
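As a rough sketch of sensible versus risky wildcard usage, with hypothetical URL patterns:

User-agent: *
# Block any URL containing a session parameter
Disallow: /*?sessionid=
# Block only URLs that end in .pdf
Disallow: /*.pdf$
# Risky: this single rule would block the entire site
# Disallow: /*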

3. Noindex In Robots.txt

This one is more common in websites that are more than a few years old.

Google stopped obeying noindex rules in robots.txt files on September 1, 2019.

If your robots.txt file was created before that date and still contains noindex instructions, you’re likely to see those pages indexed in Google’s search results.

The solution to this problem is to implement an alternative ‘noindex’ method.

One option is the robots meta tag, which you can add to the head of any web page you want to prevent Google from indexing.
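For example, an older robots.txt file might contain an unofficial line like this, which Google now ignores:

Noindex: /private-page/

The replacement is a robots meta tag in the head of the page itself (the path above and the tag below are illustrative):

<meta name="robots" content="noindex">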

4. Blocked Scripts And Stylesheets

It might seem logical to block crawler access to external JavaScript files and cascading style sheets (CSS).

However, remember that Googlebot needs access to CSS and JS files in order to “see” your HTML and PHP pages correctly.

If your pages are behaving oddly in Google’s results, or it looks like Google is not seeing them correctly, check whether you are blocking crawler access to required external files.

A simple solution to this is to remove the line from your robots.txt file that is blocking access.

Or, if you have some files you do need to block, insert an exception that restores access to the necessary CSS and JavaScript files.
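A sketch of such an exception, assuming a hypothetical /assets/ directory you otherwise want blocked:

User-agent: *
Disallow: /assets/
# Restore access to the CSS and JavaScript Googlebot needs to render pages
Allow: /assets/css/
Allow: /assets/js/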

5. No Sitemap URL

This is more about SEO than anything else.

You can include the URL of your sitemap in your robots.txt file.

Because robots.txt is one of the first places Googlebot looks when it crawls your website, including your sitemap gives the crawler a head start on the structure and main pages of your site.

While omitting a sitemap is not strictly an error, and should not negatively affect the core functionality or appearance of your website in the search results, it’s still worth adding your sitemap URL to robots.txt if you want to give your SEO efforts a boost.
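The sitemap reference is a single line, usually placed at the very top or bottom of the file; the URL below is a hypothetical example:

Sitemap: https://www.example.com/sitemap.xml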

6. Access To Development Sites

Blocking crawlers from your live website is a no-no, but so is allowing them to crawl and index your pages that are still under development.

It’s best practice to add a disallow instruction to the robots.txt file of a website under construction so the general public doesn’t see it until it’s finished.

Equally, it’s crucial to remove the disallow instruction when you launch a completed website.

Forgetting to remove this line from robots.txt is one of the most common mistakes among web developers, and can stop your entire website from being crawled and indexed correctly.

If your development site seems to be receiving real-world traffic, or your recently launched website is not performing at all well in search, look for a universal user agent disallow rule in your robots.txt file:

User-agent: *
Disallow: /

If you see this when you shouldn’t (or don’t see it when you should), make the necessary changes to your robots.txt file and check that your website’s search appearance updates accordingly.
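As a sketch, once the site is live the blanket rule should be removed or replaced with an allow-all record (an empty Disallow value permits crawling of the whole site):

User-agent: *
Disallow: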

How To Recover From A Robots.txt Error

If a mistake in robots.txt is having unwanted effects on your website’s search appearance, the most important first step is to correct robots.txt and verify that the new rules have the desired effect.

Some SEO crawling tools can help with this so you don’t have to wait for the search engines to next crawl your site.

When you are confident that robots.txt is behaving as desired, you can try to get your site re-crawled as soon as possible.

Platforms like Google Search Console and Bing Webmaster Tools can help.

Submit an updated sitemap and request a re-crawl of any pages that have been inappropriately delisted.

Unfortunately, you are at the whim of Googlebot – there’s no guarantee as to how long it might take for any missing pages to reappear in the Google search index.

All you can do is take the correct action to minimize that time as much as possible and keep checking until the fixed robots.txt is implemented by Googlebot.

Final Thoughts

Where robots.txt errors are concerned, prevention is definitely better than cure.

On a large revenue-generating website, a stray wildcard that removes your entire website from Google can have an immediate impact on earnings.

Edits to robots.txt should be made carefully by experienced developers, double-checked, and – where appropriate – subject to a second opinion.

If possible, test in a sandbox editor before pushing live on your real-world server to ensure you avoid inadvertently creating availability issues.

Remember, when the worst happens, it’s important not to panic.

Diagnose the problem, make the necessary repairs to robots.txt, and resubmit your sitemap for a new crawl.

Your place in the search rankings will hopefully be restored within a matter of days.


Top 10 Essential Website Optimization Strategies

Google officially launched 24 years ago in 1998.

A lot has changed since then, but one thing remains the same. If you simply focus on the basics, you can still be highly successful online.

Of course, the basics in 2022 are much different from the basics in 1998. It’s easy to get overwhelmed and distracted. It has never been more important to be disciplined in one’s approach to SEO.

So, the obvious question is this: What are the factors to concentrate on? How can one boost rankings? How can anyone build traffic in such a competitive environment?

This post will delve into which factors carry the most weight and how to optimize for each.

1. Search Intent

As machine learning, artificial intelligence, and deep learning continue to evolve, each will carry more weight in the Google Core Algorithm.

The end goal for Google is to understand the context of a given search query and to serve results consistent with the user intent. This makes advanced-level keyword research and keyword selection more important than ever.

Before spending time and resources trying to rank for a phrase, you will need to look at the websites that are currently at the top of the SERPs for that phrase.

A keyword’s contextual relevance must align with a search query. There will be some keywords and queries that will be impossible to rank for.

For example, if Google has determined that people searching for “Personal Injury Attorney [insert city]” want a list of lawyers to choose from, then a series of trusted law directories will appear at the top of the SERPs.

An individual or single firm will not supplant those directories. In those cases, you will need to refine your strategy.

2. Technical SEO

The foundation for technical SEO is having a solid website architecture.

One cannot simply publish a random collection of pages and posts. An SEO-friendly site architecture will guide users throughout your site and make it easy for Google to crawl and index your pages.

Once you have the right architecture in place, it’s time to perform a technical or SEO audit.

Thanks to the many SEO tools available, an SEO audit is no longer a daunting task. That said, the key is to know how to interpret the data provided and what to do with it.

For starters, you should check the following and fix any issues that are uncovered:

  • Check for status code errors and correct them.
  • Check the robots.txt for errors. Optimize if needed.
  • Check your site indexing via Google Search Console. Examine and fix any issues discovered.
  • Fix duplicate title tags and duplicate meta descriptions.
  • Audit your website content. Check the traffic stats in Google Analytics. Consider improving or pruning underperforming content.
  • Fix broken links. These are an enemy of the user experience – and potentially rankings.
  • Submit your XML sitemap to Google via Google Search Console.

3. User Experience

User experience (UX) is centered on gaining insight into users, their needs, their values, their abilities, and their limitations.

UX also takes into consideration business goals and objectives. The best UX practices focus on improving the quality of the user experience.

According to Peter Morville, factors that influence UX include:

  • Useful: Your content needs to be unique and satisfy a need.
  • Usable: Your website needs to be easy to use and navigate.
  • Desirable: Your design elements and brand should evoke emotion and appreciation.
  • Findable: Integrate design and navigation elements to make it easy for users to find what they need.
  • Accessible: Content needs to be accessible to everyone – including the 12.7% of the population with disabilities.
  • Credible: Your site needs to be trustworthy for users to believe you.
  • Valuable: Your site needs to provide value to the user in terms of experience and to the company in terms of positive ROI.

Multivariate and A/B testing is the best way to measure and create a better experience for website users. Multivariate testing is best when considering complex changes.

One can incorporate many different elements and test how they all work together. A/B testing, on the other hand, will compare two different elements on your site to determine which performs the best.

4. Mobile-First

Google officially began rolling out the mobile-first index in March 2018. Smart marketers were taking a mobile-first approach long before the official rollout.

According to Google Search Central:

“Neither mobile-friendliness nor a mobile-responsive layout are requirements for mobile-first indexing. Pages without mobile versions still work on mobile and are usable for indexing. That said, it’s about time to move from desktop-only and embrace mobile :)”

Here are some basics for making your site mobile-friendly (a brief markup sketch follows the list):

  • Make your site adaptive to any device – be it desktop, mobile, or tablet.
  • Always scale your images when using a responsive design, especially for mobile users.
  • Use short meta titles. They are easier to read on mobile devices.
  • Avoid pop-ups that cover your content and prevent visitors from getting a glimpse of what your content is all about.
  • Less can be more on mobile. In a mobile-first world, long-form content doesn’t necessarily equate to more traffic and better rankings.
  • Don’t use mobile as an excuse for cloaking. Users and search engines need to see the same content.
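A minimal markup sketch of the first two points, using a hypothetical hero image (responsive frameworks and CMS themes typically handle this for you):

<!-- Tell browsers to scale the layout to the device width -->
<meta name="viewport" content="width=device-width, initial-scale=1">

<!-- Serve appropriately sized images to different screen widths -->
<img src="hero-800.jpg"
     srcset="hero-400.jpg 400w, hero-800.jpg 800w, hero-1600.jpg 1600w"
     sizes="(max-width: 600px) 400px, 800px"
     alt="Product hero image">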

5. Core Web Vitals

In July 2021, the Page Experience Update rolled out; it is now incorporated into Google’s core algorithm as a ranking factor.

As the name implies, the core web vitals initiative was designed to quantify the essential metrics for a healthy website. This syncs up with Google’s commitment to delivering the best user experience.

According to Google, the loading experience, interactivity, and visual stability of page content combine to form the foundation of Core Web Vitals. These map to the metrics Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS).

Each one of these metrics:

  • Focuses on a unique aspect of the user experience.
  • Is measurable and quantifiable for an objective determination of the outcome.

Tools To Measure Core Web Vitals:

  • PageSpeed Insights: Measures both mobile and desktop performance and provides recommendations for improvement.
  • Lighthouse: An open-source, automated tool developed by Google to help developers improve web page quality. It has several features not available in PageSpeed Insights, including some SEO checks.
  • Search Console: A Core Web Vitals report is now included in GSC, showing URL performance as grouped by status, metric type, and URL group.

6. Schema

Schema markup, once added to a webpage, can generate a rich snippet – an enhanced description that appears in the search results.

All leading search engines, including Google, Yahoo, Bing, and Yandex, support the use of microdata. The real value of schema is that it can provide context to a webpage and improve the search experience.

There is no evidence, however, that adding schema has any direct influence on rankings.

Schema markup can be applied to many popular content types, including articles, products, reviews, events, recipes, and FAQs.

If the thought of adding schema to a page seems intimidating, it shouldn’t be: schema is quite simple to implement. If you have a WordPress site, there are several plugins that will do this for you.
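As a hedged illustration, a minimal JSON-LD Article snippet placed in the head of a page might look like the sketch below; the author and date are placeholders, and Google’s Rich Results Test can confirm whether your own markup is valid:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Top 10 Essential Website Optimization Strategies",
  "author": { "@type": "Person", "name": "Example Author" },
  "datePublished": "2022-01-01"
}
</script>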

7. Content Marketing

It is projected that 97 zettabytes of data will be created, captured, copied, and consumed worldwide this year.

To put this in perspective, that’s the equivalent of 18.7 trillion songs or 3,168 years of HD video every day.

The challenge of breaking through the clutter will become exponentially more difficult as time passes.

To break through:

  • Create a content hub in the form of a resource center.
  • Fill your resource hub with a combination of useful, informative, and entertaining content.
  • Write “spoke” pieces related to your resource hub and interlink.
  • Write news articles related to your resource and interlink.
  • Spread the word. Promote your news articles on social channels.
  • Hijack trending topics related to your content. Promote on social media.
  • Use your smartphone camera. Images and videos typically convert better than text alone.
  • Update stale and low-trafficked content.

8. Link Building

Links continue to be one of the most important ranking factors.

Over the years, Google has become more adept at identifying and devaluing spammy links, especially so after the launch of Penguin 4.0. That being the case, quality will continue to trump quantity.

The best link-building strategies for 2022 reflect that emphasis: earn relevant, high-quality links rather than acquiring them in volume.

9. Test And Document Changes

You manage what you measure.

One recent study showed that less than 50% of pages “optimized” result in more clicks. Worse yet, 34% of changes led to a decrease in clicks!

Basic steps for SEO testing:

  • Determine what you are testing and why.
  • Form a hypothesis. What do you expect will happen because of your changes?
  • Document your testing. Make sure it can be reliably replicated.
  • Publish your changes and then submit the URLs for inspection via Google Search Console.
  • Run the test for a long enough period to confirm if your hypothesis is correct or not. Document your findings and any other observations, such as changes made by competitors that may influence the outcome.
  • Take appropriate actions based on the results of your tests.

This process can be easily executed and documented by using a spreadsheet.

10. Track And Analyze KPIs

According to Roger Monti, the following are the 9 Most Important SEO KPIs to consider:

  • Customer Lifetime Value (CLV).
  • Content Efficiency.
  • Average Engagement Time.
  • Conversion Goals by Percent-Based Metrics.
  • Accurate Search Visibility.
  • Brand Visibility in Search.
  • New And Returning Users.
  • Average Time on Site.
  • Revenue Per Thousand (RPM) And Average Position.

The thing to remember about these KPIs is that they depend on your goals and objectives. Some may apply to your situation, whereas others may not.

Think of this as a good starting point for determining how to best measure the success of a campaign.

Conclusion

Because content on the internet has no expiration date, mounds of information and disinformation are served up daily in search results.

If you aren’t careful, implementing bad or outdated advice can lead to disastrous results.

Do yourself a favor and just focus on these 10 essentials. By doing so, you will be setting yourself up for long-term success.
