6 Common Robots.txt Issues And How To Fix Them


Robots.txt is a useful and relatively powerful tool to instruct search engine crawlers on how you want them to crawl your website.

It is not all-powerful (in Google’s own words, “it is not a mechanism for keeping a web page out of Google”) but it can help to prevent your site or server from being overloaded by crawler requests.

If you use robots.txt to control crawling on your site, you need to be certain it’s being used properly.

This is particularly important if you use dynamic URLs or other methods that generate a theoretically infinite number of pages.

In this guide, we will look at some of the most common issues with the robots.txt file, the impact they can have on your website and your search presence, and how to fix these issues if you think they have occurred.

But first, let’s take a quick look at robots.txt and its alternatives.

What Is Robots.txt?

Robots.txt uses a plain text file format and is placed in the root directory of your website.

It must be in the topmost directory of your site; if you place it in a subdirectory, search engines will simply ignore it.

Despite its great power, robots.txt is often a relatively simple document, and a basic robots.txt file can be created in a matter of seconds using an editor like Notepad.
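For instance, a minimal robots.txt file that lets every crawler access everything needs only two lines, with an empty Disallow value meaning nothing is blocked:

User-agent: *
Disallow: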

There are other ways to achieve some of the same goals that robots.txt is usually used for.

Individual pages can include a robots meta tag within the page code itself.

You can also use the X-Robots-Tag HTTP header to influence how (and whether) content is shown in search results.
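For example, a server can be configured to send a header like the following on a response to keep that file out of search results (exactly how you add the header depends on your server software):

X-Robots-Tag: noindex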

What Can Robots.txt Do?

Robots.txt can achieve a variety of results across a range of different content types:

Web pages can be blocked from being crawled.

They may still appear in search results, but will not have a text description. Non-HTML content on the page will not be crawled either.
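For example, a rule like the following (the directory name is just a placeholder) stops compliant crawlers from crawling anything under /private/:

User-agent: *
Disallow: /private/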

Media files can be blocked from appearing in Google search results.

This includes images, video, and audio files.

If the file is public, it will still ‘exist’ online and can be viewed and linked to, but this private content will not show in Google searches.
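For instance, because Google uses a dedicated image crawler, a rule like this would keep a site’s images out of Google Images:

User-agent: Googlebot-Image
Disallow: /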

Resource files like unimportant external scripts can be blocked.

But this means if Google crawls a page that requires that resource to load, the Googlebot robot will ‘see’ a version of the page as if that resource did not exist, which may affect indexing.

You cannot use robots.txt to completely block a web page from appearing in Google’s search results.

To achieve that, you must use an alternative method such as adding a noindex meta tag to the head of the page.

How Dangerous Are Robots.txt Mistakes?

A mistake in robots.txt can have unintended consequences, but it’s often not the end of the world.

The good news is that by fixing your robots.txt file, you can recover from any errors quickly and (usually) in full.

Google’s guidance to web developers says this on the subject of robots.txt mistakes:

“Web crawlers are generally very flexible and typically will not be swayed by minor mistakes in the robots.txt file. In general, the worst that can happen is that incorrect [or] unsupported directives will be ignored.

Bear in mind though that Google can’t read minds when interpreting a robots.txt file; we have to interpret the robots.txt file we fetched. That said, if you are aware of problems in your robots.txt file, they’re usually easy to fix.”

6 Common Robots.txt Mistakes

  1. Robots.txt Not In The Root Directory.
  2. Poor Use Of Wildcards.
  3. Noindex In Robots.txt.
  4. Blocked Scripts And Stylesheets.
  5. No Sitemap URL.
  6. Access To Development Sites.

If your website is behaving strangely in the search results, your robots.txt file is a good place to look for any mistakes, syntax errors, and overreaching rules.

Let’s take a look at each of the above mistakes in more detail and see how to ensure you have a valid robots.txt file.

1. Robots.txt Not In The Root Directory

Search robots can only discover the file if it’s in your root folder.

That’s why, in the URL of your robots.txt file, there should be nothing but a forward slash between the .com (or equivalent domain) of your website and the ‘robots.txt’ filename.
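In other words, it should look like this (with your own domain substituted):

https://www.example.com/robots.txt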

If there’s a subfolder in there, your robots.txt file is probably not visible to the search robots, and your website is probably behaving as if there were no robots.txt file at all.

To fix this issue, move your robots.txt file to your root directory.

It’s worth noting that this requires root access to your server.

Some content management systems will upload files to a ‘media’ subdirectory (or something similar) by default, so you might need to circumvent this to get your robots.txt file in the right place.

2. Poor Use Of Wildcards

Robots.txt supports two wildcard characters:

  • Asterisk (*), which matches any sequence of valid characters, like a Joker in a deck of cards.
  • Dollar sign ($), which denotes the end of a URL, allowing you to apply rules only to the final part of a URL, such as the filetype extension.
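To illustrate (the paths here are hypothetical), the first rule below blocks any URL that contains a query string, while the second blocks only URLs ending in .pdf:

User-agent: *
Disallow: /*?
Disallow: /*.pdf$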

It’s sensible to adopt a minimalist approach to using wildcards, as they have the potential to apply restrictions to a much broader portion of your website.

It’s also relatively easy to end up blocking robot access from your entire site with a poorly placed asterisk.

To fix a wildcard issue, you’ll need to locate the incorrect wildcard and move or remove it so that your robots.txt file performs as intended.

3. Noindex In Robots.txt

This one is more common in websites that are more than a few years old.

Google has stopped obeying noindex rules in robots.txt files as of September 1, 2019.

If your robots.txt file was created before that date and still contains noindex instructions, those rules are now ignored, and you’re likely to see those pages indexed in Google’s search results.

The solution to this problem is to implement an alternative ‘noindex’ method.

One option is the robots meta tag, which you can add to the head of any web page you want to prevent Google from indexing.
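For example, placing this tag in the <head> section of a page tells Google not to index it:

<meta name="robots" content="noindex">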

4. Blocked Scripts And Stylesheets

It might seem logical to block crawler access to external JavaScript files and cascading style sheets (CSS).

However, remember that Googlebot needs access to CSS and JS files in order to “see” your HTML and PHP pages correctly.

If your pages are behaving oddly in Google’s results, or it looks like Google is not seeing them correctly, check whether you are blocking crawler access to required external files.

A simple solution to this is to remove the line from your robots.txt file that is blocking access.

Or, if you have some files you do need to block, insert an exception that restores access to the necessary CSS and JavaScript files.
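As a sketch (the directory and filename here are hypothetical), a blanket block on a scripts directory can be softened with an Allow rule for the one file Googlebot needs:

User-agent: Googlebot
Disallow: /scripts/
Allow: /scripts/essential.js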

5. No Sitemap URL

This is more about SEO than anything else.

You can include the URL of your sitemap in your robots.txt file.

Because this is the first place Googlebot looks when it crawls your website, this gives the crawler a headstart in knowing the structure and main pages of your site.

While this is not strictly an error, since omitting a sitemap should not negatively affect the core functionality and appearance of your website in the search results, it’s still worth adding your sitemap URL to robots.txt if you want to give your SEO efforts a boost.
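The sitemap directive is a single line that takes the full URL of your sitemap (substitute your own):

Sitemap: https://www.example.com/sitemap.xml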

6. Access To Development Sites

Blocking crawlers from your live website is a no-no, but so is allowing them to crawl and index your pages that are still under development.

It’s best practice to add a disallow instruction to the robots.txt file of a website under construction so the general public doesn’t see it until it’s finished.

Equally, it’s crucial to remove the disallow instruction when you launch a completed website.

Forgetting to remove this line from robots.txt is one of the most common mistakes among web developers, and can stop your entire website from being crawled and indexed correctly.

If your development site seems to be receiving real-world traffic, or your recently launched website is not performing at all well in search, look for a universal user agent disallow rule in your robots.txt file:

User-Agent: *
Disallow: /

If you see this when you shouldn’t (or don’t see it when you should), make the necessary changes to your robots.txt file and check that your website’s search appearance updates accordingly.

How To Recover From A Robots.txt Error

If a mistake in robots.txt is having unwanted effects on your website’s search appearance, the most important first step is to correct robots.txt and verify that the new rules have the desired effect.

Some SEO crawling tools can help with this so you don’t have to wait for the search engines to next crawl your site.

When you are confident that robots.txt is behaving as desired, you can try to get your site re-crawled as soon as possible.

Platforms like Google Search Console and Bing Webmaster Tools can help.

Submit an updated sitemap and request a re-crawl of any pages that have been inappropriately delisted.

Unfortunately, you are at the whim of Googlebot – there’s no guarantee as to how long it might take for any missing pages to reappear in the Google search index.

All you can do is take the correct action to minimize that time as much as possible and keep checking until Googlebot picks up the fixed robots.txt.

Final Thoughts

Where robots.txt errors are concerned, prevention is definitely better than cure.

On a large revenue-generating website, a stray wildcard that removes your entire website from Google can have an immediate impact on earnings.

Edits to robots.txt should be made carefully by experienced developers, double-checked, and – where appropriate – subject to a second opinion.

If possible, test in a sandbox editor before pushing live on your real-world server to ensure you avoid inadvertently creating availability issues.

Remember, when the worst happens, it’s important not to panic.

Diagnose the problem, make the necessary repairs to robots.txt, and resubmit your sitemap for a new crawl.

Your place in the search rankings will hopefully be restored within a matter of days.

Featured Image: M-SUR/Shutterstock


Top Priorities, Challenges, And Opportunities


The world of search has seen massive change recently. Whether you’re still in the planning stages for this year or underway with your 2024 strategy, you need to know the new SEO trends to stay ahead of seismic search industry shifts.

It’s time to chart a course for SEO success in this changing landscape.

Watch this on-demand webinar as we explore exclusive survey data from today’s top SEO professionals and digital marketers to inform your strategy this year. You’ll also learn how to navigate SEO in the era of AI, and how to gain an advantage with these new tools.

You’ll hear:

  • The top SEO priorities and challenges for 2024.
  • The role of AI in SEO – how to get ahead of the anticipated disruption of SGE and AI overall, plus SGE-specific SEO priorities.
  • Winning SEO resourcing strategies and reporting insights to fuel success.

With Shannon Vize and Ryan Maloney, we’ll take a deep dive into the top trends, priorities, and challenges shaping the future of SEO.

Discover timely insights and unlock new SEO growth potential in 2024.

Check out the full webinar for all the details.


E-E-A-T’s Google Ranking Influence Decoded


The idea that something is not a ranking factor yet nevertheless plays a role in ranking websites seems logically irreconcilable. Despite seeming like a paradox that cancels itself out, SearchLiaison recently tweeted some comments that go a long way toward understanding how to think about E-E-A-T and apply it to SEO.

What A Googler Said About E-E-A-T

Marie Haynes published a video excerpt on YouTube from an event at which a Googler spoke, essentially doubling down on the importance of E-A-T.

This is what he said:

“You know this hasn’t always been there in Google and it’s something that we developed about ten to twelve or thirteen years ago. And it really is there to make sure that along the lines of what we talked about earlier is that it really is there to ensure that the content that people consume is going to be… it’s not going to be harmful and it’s going to be useful to the user. These are principles that we live by every single day.

And E-A-T, that template of how we rate an individual site based off of Expertise, Authoritativeness and Trustworthiness, we do it to every single query and every single result. So it’s actually very pervasive throughout everything that we do.

I will say that the YMYL queries, the Your Money or Your Life Queries, such as you know when I’m looking for a mortgage or when I’m looking for the local ER, those we have a particular eye on and we pay a bit more attention to those queries because clearly they’re some of the most important decisions that people can make.

So I would say that E-A-T has a bit more of an impact there but again, I will say that E-A-T applies to everything, every single query that we actually look at.”

How can something be a part of every single search query and not be a ranking factor, right?

Background, Experience & Expertise In Google Circa 2012

Something to consider is that in 2012, Google’s senior engineer at the time, Matt Cutts, said that experience and expertise bring a measure of quality to content and make it worthy of ranking.

Matt Cutts’ remarks on experience and expertise were made in an interview with Eric Enge.

The two discussed whether the website of a hypothetical person named “Jane” deserves to rank with articles that are merely original variations of what’s already in the SERPs.

Matt Cutts observed:

“While they’re not duplicates they bring nothing new to the table.

Google would seek to detect that there is no real differentiation between these results and show only one of them so we could offer users different types of sites in the other search results.

They need to ask themselves what really is their value add? …they need to figure out what… makes them special.

…if Jane is just churning out 500 words about a topic where she doesn’t have any background, experience or expertise, a searcher might not be as interested in her opinion.”

Matt then cites the example of Pulitzer Prize-winning movie reviewer Roger Ebert as a person with the background, experience, and expertise that make his opinion valuable to readers and the content worthy of ranking.

Matt didn’t say that a webpage author’s background, experience and expertise were ranking factors. But he did say that these are the kinds of things that can differentiate one webpage from another and align it to what Google wants to rank.

He specifically said that Google’s algorithm detects if there is something different about it that makes it stand out. That was in 2012 but not much has changed because Google’s John Mueller says the same thing.

For example, in 2020 John Mueller said that differentiation and being compelling is important for getting Google to notice and rank a webpage.

“So with that in mind, if you’re focused on kind of this small amount of content that is the same as everyone else then I would try to find ways to significantly differentiate yourselves to really make it clear that what you have on your website is significantly different than all of those other millions of ringtone websites that have kind of the same content.

…And that’s the same recommendation I would have for any kind of website that offers essentially the same thing as lots of other web sites do.

You really need to make sure that what you’re providing is unique and compelling and high quality so that our systems and users in general will say, I want to go to this particular website because they offer me something that is unique on the web and I don’t just want to go to any random other website.”

In 2021, in regard to getting Google to index a webpage, Mueller also said:

“Is it something the web has been waiting for? Or is it just another red widget?”

This idea of being compelling and different from other sites has been part of Google’s algorithm for a while, just as the Googler in the video said, just as Matt Cutts said, and exactly as Mueller has said as well.

Are they talking about signals?

E-E-A-T Algorithm Signals

We know there’s something in the algorithm that relates to someone’s expertise and background that Google’s looking for. The table is set and we can dig into the next step of what it all means.

A while back I remember reading something that Marie Haynes said about E-A-T: she called it a framework. And I thought, now that’s an interesting thing she just did; she’s conceptualizing E-A-T.

When SEOs discussed E-A-T it was always in the context of what to do in order to demonstrate E-A-T. So they looked at the Quality Raters Guide for guidance, which kind of makes sense since it’s a guide, right?

But what I’m proposing is that the answer isn’t really in the guidelines or anything that the quality raters are looking for.

The best way to explain it is to ask you to think about the biggest part of Google’s algorithm, relevance.

What’s relevance? Is it something you have to do? It used to be about keywords and that’s easy for SEOs to understand. But it’s not about keywords anymore because Google’s algorithm has natural language understanding (NLU). NLU is what enables machines to understand language in the way that it’s actually spoken (natural language).

So, relevance is just something that’s related or connected to something else. So, if I ask, how do I satiate my thirst? The answer can be water, because water quenches the thirst.

How is a site relevant to the search query: “how do I satiate my thirst?”

An SEO would answer the problem of relevance by saying that the webpage has to have the keywords that match the search query, which would be the words “satiate” and “thirst.”

The next step the SEO would take is to extract the related entities for “satiate” and “thirst” because every SEO “knows” they need to do entity research to understand how to make a webpage that answers the search query, “How do I satiate my thirst?”

Hypothetical related entities:

  • Thirst: water, dehydration, drink
  • Satiate: food, satisfaction, quench, fulfillment, appease

Now that the SEO has their entities and their keywords, they put it all together and write a 600-word essay that uses all their keywords and entities so that their webpage is relevant for the search query, “How do I satiate my thirst?”

I think we can stop now and see how silly that is, right? If someone asked you, “How do I satiate my thirst?” You’d answer, “With water” or “a cold refreshing beer” because that’s what it means to be relevant.

Relevance is just a concept. It doesn’t have anything to do with entities or keywords in today’s search algorithms because the machine is understanding search queries as natural language, even more so with AI search engines.

Similarly, E-E-A-T is also just a concept. It doesn’t have anything to do with author bios or LinkedIn profiles, and it has nothing at all to do with making your content say that you handled the product that’s being reviewed.

Here’s what SearchLiaison recently said about E-E-A-T, SEO, and ranking:

“….just making a claim and talking about a ‘rigorous testing process’ and following an ‘E-E-A-T checklist’ doesn’t guarantee a top ranking or somehow automatically cause a page to do better.”

Here’s the part where SearchLiaison ties a bow around the gift of E-E-A-T knowledge:

“We talk about E-E-A-T because it’s a concept that aligns with how we try to rank good content.”

E-E-A-T Can’t Be Itemized On A Checklist

Remember how we established that relevance is a concept and not a bunch of keywords and entities? Relevance is just answering the question.

E-E-A-T is the same thing. It’s not something that you do. It’s closer to something that you are.

SearchLiaison elaborated:

“…our automated systems don’t look at a page and see a claim like “I tested this!” and think it’s better just because of that. Rather, the things we talk about with E-E-A-T are related to what people find useful in content. Doing things generally for people is what our automated systems seek to reward, using different signals.”

A Better Understanding Of E-E-A-T

I think it’s clear now how E-E-A-T isn’t something that’s added to a webpage or is something that is demonstrated on the webpage. It’s a concept, just like relevance.

A good way to think of it: if someone asks you a question about your family, you answer it. Most people are expert and experienced enough to answer that question. That’s what E-E-A-T is and how it should be treated when publishing content. Regardless of whether it’s YMYL content or a product review, the expertise is just like answering a question about your family. It’s just a concept.

Featured Image by Shutterstock/Roman Samborskyi


Google Announces A New Carousel Rich Result


Google announced a new carousel rich result that can be used for local businesses, products, and events, which shows a scrolling horizontal carousel displaying all of the items in the list. It’s very flexible and can even be used to create a “top things to do in a city” list that combines hotels, restaurants, and events. This new feature is in beta, which means it’s being tested.

The new carousel rich result is for displaying lists in a carousel format. According to the announcement, the rich result is limited to the following types:

  • LocalBusiness and its subtypes, for example: Restaurant, Hotel, VacationRental
  • Product
  • Event

An example of a subtype is LodgingBusiness, which is a subset of LocalBusiness.

Here is the Schema.org hierarchical structure that shows the LodgingBusiness type as being a subset of the LocalBusiness type.

  • Thing > Organization > LocalBusiness > LodgingBusiness
  • Thing > Place > LocalBusiness > LodgingBusiness

ItemList Structured Data

The carousel displays “tiles” that contain information from the webpage about price, ratings, and images. The order of the items in the ItemList structured data is the order in which they will be displayed in the carousel.

Publishers must use the ItemList structured data in order to become eligible for the new rich result.

All information in the ItemList structured data must be on the webpage. Just like any other structured data, you can’t stuff the structured data with information that is not visible on the webpage itself.

There are two important rules when using this structured data:

  1. The ItemList type must be the top-level container for the structured data.
  2. All URLs in the list must point to different webpages on the same domain.

The part about the ItemList being the top-level container means that the structured data cannot be merged with other structured data whose top-level container is something other than ItemList.

For example, the structured data must begin like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,

A useful quality of this new carousel rich result is that publishers can mix and match the different entities as long as they’re within the eligible structured data types.

Eligible Structured Data Types

  • LocalBusiness and its subtypes
  • Product
  • Event

Google’s announcement explains how to mix and match the different structured data types:

“You can mix and match different types of entities (for example, hotels, restaurants), if needed for your scenario. For example, if you have a page that has both local events and local businesses.”

Here is an example of ItemList structured data that can be used on a webpage about Things To Do In Paris.

The following structured data is for two events and a local business (the Eiffel Tower):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "item": {
        "@type": "Event",
        "name": "Paris Seine River Dinner Cruise",
        "image": [
          "https://example.com/photos/1x1/photo.jpg",
          "https://example.com/photos/4x3/photo.jpg",
          "https://example.com/photos/16x9/photo.jpg"
        ],
        "offers": {
          "@type": "Offer",
          "price": 45.00,
          "priceCurrency": "EUR"
        },
        "aggregateRating": {
          "@type": "AggregateRating",
          "ratingValue": 4.2,
          "reviewCount": 690
        },
        "url": "https://www.example.com/event-location1"
      }
    },
    {
      "@type": "ListItem",
      "position": 2,
      "item": {
        "@type": "LocalBusiness",
        "name": "Notre-Dame Cathedral",
        "image": [
          "https://example.com/photos/1x1/photo.jpg",
          "https://example.com/photos/4x3/photo.jpg",
          "https://example.com/photos/16x9/photo.jpg"
        ],
        "priceRange": "$",
        "aggregateRating": {
          "@type": "AggregateRating",
          "ratingValue": 4.8,
          "reviewCount": 4220
        },
        "url": "https://www.example.com/localbusiness-location"
      }
    },
    {
      "@type": "ListItem",
      "position": 3,
      "item": {
        "@type": "Event",
        "name": "Eiffel Tower With Host Summit Tour",
        "image": [
          "https://example.com/photos/1x1/photo.jpg",
          "https://example.com/photos/4x3/photo.jpg",
          "https://example.com/photos/16x9/photo.jpg"
        ],
        "offers": {
          "@type": "Offer",
          "price": 59.00,
          "priceCurrency": "EUR"
        },
        "aggregateRating": {
          "@type": "AggregateRating",
          "ratingValue": 4.9,
          "reviewCount": 652
        },
        "url": "https://www.example.com/event-location2"
      }
    }
  ]
}
</script>

Be As Specific As Possible

Google’s guidelines recommend being as specific as possible, but if there isn’t a structured data type that closely matches the type of business, it’s okay to use the more generic LocalBusiness structured data type.

“Depending on your scenario, you may choose the best type to use. For example, if you have a list of hotels and vacation rentals on your page, use both Hotel and VacationRental types. While it’s ideal to use the type that’s closest to your scenario, you can choose to use a more generic type (for example, LocalBusiness).”

Can Be Used For Products

A super interesting use case for this structured data is for displaying a list of products in a carousel rich result.

The structured data for that begins as an ItemList structured data type like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "item": {
        "@type": "Product",

The structured data can list images, ratings, reviewCount, and currency just like any other product listing, but doing it like this will make the webpage eligible for the carousel rich results.

Google has a list of recommended properties that can be used with the Product version, such as offers, offers.highPrice, and offers.lowPrice.
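As a sketch (the product name and values here are hypothetical), a product tile in the list could use an AggregateOffer to express that price range:

{
  "@type": "ListItem",
  "position": 1,
  "item": {
    "@type": "Product",
    "name": "Example Widget",
    "image": ["https://example.com/photos/widget.jpg"],
    "offers": {
      "@type": "AggregateOffer",
      "lowPrice": 20.00,
      "highPrice": 45.00,
      "priceCurrency": "USD"
    },
    "aggregateRating": {
      "@type": "AggregateRating",
      "ratingValue": 4.5,
      "reviewCount": 120
    },
    "url": "https://www.example.com/product-page"
  }
}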

Good For Local Businesses and Merchants

This new structured data is a good opportunity for local businesses and publishers that list events, restaurants and lodgings to get in on a new kind of rich result.

Using this structured data doesn’t guarantee that it will display as a rich result; it only makes the page eligible for it.

This new feature is in beta, meaning that it’s a test.

Read the new developer page for this new rich result type:

Structured data carousels (beta)

Featured Image by Shutterstock/RYO Alexandre
