SEO

How to Find and Fix Orphan Pages (The Right Way)


Quicksand awaits unsuspecting SEOs when they start working on a website with a long history.

These pits of technical site errors, left behind by several generations of previous agencies, slow down SEO efforts and progress.

And when you’re the one tasked with cleaning it up, finding the quick fixes is your number one priority.

So you may start with a basic site audit and see several orphan pages. You’ve probably heard that orphan pages are bad for a site but may not fully understand what they are or how to fix them.

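In this article, you’ll learn what orphan pages are, what causes them, why they’re bad for SEO, and how to find, fix, and prevent them.

What are orphan pages?

Orphan pages are pages that search engines may have difficulty discovering because they have no internal links from elsewhere on your website.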

These URLs tend to fall through the cracks because search engine crawlers can only discover pages from the sitemap file or external backlinks, and users can only get to the page if they know the URL.

What causes orphan pages?

Usually, orphan pages are accidental and occur for various reasons. The most common cause is a lack of processes for handling site migrations, navigation changes, site redesigns, out-of-stock products, and testing or dev pages.

Orphan pages may also be intentional, as with promotional and paid advertising landing pages, or any instance where you do not want the page to be part of the user journey.

Why are orphan pages bad for SEO?

Search engines have a hard time finding orphan pages because they use links to help discover new content and understand the page’s significance.

Here’s what Google says:

Google searches the web with automated programs called crawlers, looking for pages that are new or updated. […] We find pages by many different methods, but the main method is following links from pages that we already know about.

For example, let’s say you publish a new webpage and forget to link to it from elsewhere on your site. If the page isn’t in your sitemap and has no backlinks, Google is unlikely to find or index it, because its crawler doesn’t know the page exists.

Even worse, without any internal links (or backlinks), the page cannot receive PageRank.

If you haven’t heard of the term “PageRank” before, it’s a big deal. 

Generally speaking, PageRank is Google’s way of understanding the significance of a page by counting the links (“votes”) pointing to it, with votes from more important pages carrying more weight.
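To make the “votes” idea concrete, here is a minimal sketch of the classic textbook PageRank iteration (a simplification, not how Google computes rankings today) on a hypothetical five-page site. The URLs and the 0.85 damping factor are illustrative assumptions. Because no page links to `/orphan`, it never collects any votes and is left with only the baseline share every page receives.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Textbook power-iteration PageRank over a small link graph."""
    pages = sorted(set(links) | {target for targets in links.values() for target in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same baseline ("random jump") share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            outlinks = links.get(page, [])
            if outlinks:
                # A page splits its rank evenly across the pages it links to: its "votes".
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Pages with no outlinks spread their rank across the whole site.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: /orphan links out, but nothing links to it.
site = {
    "/": ["/blog", "/about"],
    "/blog": ["/", "/blog/post-a"],
    "/blog/post-a": ["/"],
    "/about": ["/"],
    "/orphan": ["/"],
}
ranks = pagerank(site)
for url, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{url:14} {score:.3f}")
# /orphan ends up with only the baseline (1 - 0.85) / 5 = 0.03
```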

How to find orphan pages

To find orphan pages on your site, you need to compare a list of crawlable URLs (what Google can find) with a list of URLs people are hitting on your site.

This may sound quite technical, but don’t be discouraged. We have broken down how to find orphan pages into three easy steps using tools you’re familiar with. 

1. Find crawlable URLs

There are a lot of tools you can use to gather a list of all crawlable URLs. We’re going to use Ahrefs’ Site Audit because it’s completely free with an Ahrefs Webmaster Tools account and you have the option to use external backlinks as a source to find even more URLs.

Here’s how to do it:

  1. Go to Site Audit.
  2. Click + New Project.
  3. Follow the prompts until step 3. Click on the URL sources tab and check Backlinks as a URL source in addition to the default settings.
  4. Click Continue, follow the instructions to complete the setup, then run the crawl.
Scheduling a site audit in Ahrefs' Site Audit

Backlink data is useful for finding orphan pages because it brings URLs from Ahrefs’ link index into the mix. 

If a page does not have any internal links, a basic crawler won’t find it. 

But, if a page has a backlink, Ahrefs will find the URL on your site and know that the crawl found no internal links, so it must be an orphan page.

When the site audit is complete, export all internal pages from Page Explorer and save them. You’ll use this in step 3.

Page Explorer in Ahrefs' Site Audit

Before we continue…

As Site Audit uses both sitemaps and backlinks as URL sources, it does a reasonable job of finding orphan pages for you without any extra work. To see them, go to Page Explorer, click Links, and select Orphan pages:

Orphan pages in Ahrefs' Site Audit

However, you’ll only see orphan pages found via backlinks or sitemaps here. If you have orphan pages not included in sitemaps and without backlinks, Ahrefs won’t be able to find them. 
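The logic described above can be sketched in a few lines: take every URL known from sitemaps or backlinks, then subtract every URL that some other crawled page links to. This is a minimal illustration under assumed inputs (a dict mapping crawled pages to the URLs they link to, plus two sets of seed URLs), not Ahrefs’ actual implementation. It also shows the limitation: a URL that is in neither seed set can never show up.

```python
def find_orphans(
    internal_links: dict[str, set[str]],
    sitemap_urls: set[str],
    backlink_urls: set[str],
) -> set[str]:
    """Return seed URLs (from sitemaps or backlinks) that no other crawled page links to."""
    linked_to: set[str] = set()
    for source, targets in internal_links.items():
        # A page linking to itself doesn't make it discoverable, so ignore self-links.
        linked_to.update(target for target in targets if target != source)
    return (sitemap_urls | backlink_urls) - linked_to


# Hypothetical crawl data, with URLs shortened to paths.
internal_links = {
    "/": {"/blog", "/about"},
    "/blog": {"/", "/blog/post-a"},
    "/blog/post-a": {"/blog"},
    "/about": {"/"},
    "/old-promo": {"/"},           # reached only because a backlink points here
    "/2019-sale": {"/2019-sale"},  # listed in the sitemap, links only to itself
}
sitemap_urls = {"/", "/blog", "/blog/post-a", "/2019-sale"}
backlink_urls = {"/about", "/old-promo"}

print(sorted(find_orphans(internal_links, sitemap_urls, backlink_urls)))
# ['/2019-sale', '/old-promo']
```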

Keep reading if you think this may be the case for you and want to dig a little deeper for orphan pages.

2. Find URLs with hits

The next step is getting a list of all the URLs with hits on our site. 

There are quite a few ways to do this, and it’s always best to use as many data sources as you have access to. 

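If you have access to them, log files work well because they are server-side data, which is more accurate: they record requests as the server receives them rather than relying on a tracking script running in the visitor’s browser. We won’t be going into the nitty-gritty of accessing them because it depends on how the server is set up.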
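If you do go the log-file route, a minimal sketch of turning raw access-log lines into a list of paths with hits might look like this. It assumes the common/combined log format that Apache and NGINX commonly use (confirm yours), counts only successful GET requests, and strips query strings; the sample log lines are made up.

```python
import re
from collections import Counter

# Assumes the common/combined access-log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def paths_with_hits(lines):
    """Count successful GET requests per URL path, ignoring query strings."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines in an unexpected format
        if match["method"] != "GET" or not match["status"].startswith("2"):
            continue  # keep only successful (2xx) GET requests
        hits[match["path"].split("?", 1)[0]] += 1
    return hits


# Made-up sample lines for illustration.
sample_log = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /blog/post-a HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.6 - - [10/Oct/2024:13:56:01 +0000] "GET /old-promo?utm_source=ad HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2024:13:57:12 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2024:13:58:40 +0000] "GET /blog/post-a HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(paths_with_hits(sample_log))
```

Keep in mind that logs also record crawler traffic, so you may want to separate bot requests from visitor requests depending on what you’re analyzing.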

But if you choose to go this route, here are three official guides for common server types:

In this article, we will use Google Analytics (GA4) and Google Search Console because the process is basically the same for everyone. 

Here’s how to find URLs with hits in Google Analytics (GA4):

  1. Log in to your Data Studio (now Looker Studio) account.
  2. Start a new blank report.
  3. Connect Google Analytics as your data source.
  4. Choose the account you’re analyzing > select GA4 property.
  5. Add a basic table to your report.
  6. Set data source to the GA4 property you selected in step 4.
  7. Set dimension to Page path.
  8. Set metric to Views.
  9. Sort by Views in descending order.
  10. Set the default date range to start before GA4 was installed on the site, so it covers all the data collected.
Google Data Studio settings

To export the results from your table, click the three vertical dots in the top right corner and hit Export. Save with a helpful name like “date_GA_URLs_people_are_hitting_brandname” because you will need it again in just a bit.

Because we exported the page path and not the full page URL, we need to add the domain to the beginning of all cells in our spreadsheet. This is easy enough in Google Sheets. Just import the CSV into a blank sheet, insert a new column to the left, and paste this formula into cell A1 (make sure to replace example.com with your domain):

=IFERROR(ARRAYFORMULA(IF(ISBLANK(B:B),"",IF(B:B="Page Path","",IF(B:B="(not set)","","https://example.com" & B:B)))))

Formula in Google Sheets
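If you’d rather script this step than use a spreadsheet, here is a minimal Python equivalent of the formula above. The `example.com` domain and the sample paths are placeholders; as with the formula, swap in your own domain.

```python
def to_full_urls(page_paths, domain="https://example.com"):
    """Prefix GA4 page paths with the domain, skipping blanks, the header row, and "(not set)"."""
    full_urls = []
    for path in page_paths:
        cleaned = path.strip()
        # Same exclusions as the spreadsheet formula (compared case-insensitively, as Sheets does).
        if not cleaned or cleaned.lower() in {"page path", "(not set)"}:
            continue
        full_urls.append(domain.rstrip("/") + cleaned)
    return full_urls


exported_paths = ["Page path", "/", "/blog/post-a", "(not set)", "", "/old-promo"]
print(to_full_urls(exported_paths))
# ['https://example.com/', 'https://example.com/blog/post-a', 'https://example.com/old-promo']
```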

As multiple URL sources are always best, we will also pull data from Google Search Console (GSC).

GSC limits exports to the first 1,000 URLs, but Google Data Studio has a neat little trick that allows you to pull more. 

Here’s how to do it:

  1. Reopen your Data Studio report.
  2. Start a new page (command + M).
  3. Open Resource > Manage added data sources.
  4. Click ADD A DATA SOURCE.
  5. Select Search Console.
  6. Choose the site you’re analyzing > URL impression > web.
  7. Add a basic table to your report.
  8. Set dimension to Landing page.
  9. Set metric to Impressions.
  10. Expand rows per page to 5,000.
  11. Edit the date range to view at least the past three months.
  12. Export the results from your table. 

Name your sheet something helpful like “date_GSC_URLs_people_are_hitting_brandname” because you’ll need it again in a moment.

Now, combine all the URLs people are hitting from your different sources into one spreadsheet and clean up the data by removing duplicates. 

Remove duplicates Google Sheets
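The merge-and-deduplicate step can also be scripted. This minimal sketch takes any number of URL lists (the GA and GSC lists here are made-up placeholders) and treats only identical strings as duplicates, so `/blog` and `/blog/` stay separate.

```python
def combine_hit_urls(*sources):
    """Merge URL lists, dropping exact duplicates and blanks while keeping first-seen order."""
    seen = set()
    combined = []
    for source in sources:
        for url in source:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                combined.append(url)
    return combined


ga_urls = ["https://example.com/", "https://example.com/blog/post-a"]
gsc_urls = ["https://example.com/blog/post-a", "https://example.com/old-promo", ""]
print(combine_hit_urls(ga_urls, gsc_urls))
# ['https://example.com/', 'https://example.com/blog/post-a', 'https://example.com/old-promo']
```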

3. Cross-reference the two URL sources

You are in the home stretch! The last step is cross-referencing crawlable URLs (from Ahrefs’ Site Audit) and URLs with hits (from GA and GSC). To do this, create a blank Google Sheet with three tabs labeled crawl, hits, and cross reference.

The three sheets you need in Google Sheets

The first tab, crawl, is for the crawlable URLs from Ahrefs’ Site Audit.

To prepare them, open the exported CSV from step 1 and filter for rows where incomingAllLinks equals zero. This is super important: these URLs are orphan pages, so including them in the “crawl” tab would lead to inaccurate results when cross-referencing.

Remove all IncomingAllLinks that equal zero

Instead, you should copy these URLs and add them to the “hits” tab. 

Next, copy and paste the remaining URLs from the Ahrefs export into the crawl tab of your Google Sheet.

Crawl URLs in spreadsheet
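Scripted, this filtering step looks roughly like the sketch below. It assumes a comma-separated export with a `URL` column and an `incomingAllLinks` column; those headers and the sample rows are assumptions, so check the actual headers, delimiter, and encoding of your own export and adjust the parameters to match.

```python
import csv
import io


def split_crawl_export(csv_text, url_column="URL", links_column="incomingAllLinks"):
    """Split a Site Audit export into (linked URLs for the crawl tab, zero-link URLs for the hits tab)."""
    crawl_urls, zero_link = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        incoming = int(row[links_column] or 0)  # treat an empty cell as zero links
        if incoming == 0:
            zero_link.append(row[url_column])  # orphan candidates belong with the hits
        else:
            crawl_urls.append(row[url_column])
    return crawl_urls, zero_link


export = """URL,incomingAllLinks
https://example.com/,12
https://example.com/blog,5
https://example.com/old-promo,0
"""
crawl_tab, zero_link_urls = split_crawl_export(export)
print(crawl_tab)       # ['https://example.com/', 'https://example.com/blog']
print(zero_link_urls)  # ['https://example.com/old-promo']
```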

In the second tab, hits, copy and paste all URLs from step 2. These are the pages you found using Google Analytics, Google Search Console, or your site log files, i.e., webpages that users have visited.

Hit URLs in spreadsheet

In the third sheet, cross reference, enter the following function into the first cell: 

=UNIQUE(FILTER(hits!A:A, hits!A:A<>"", ISNA(MATCH(hits!A:A, crawl!A:A, 0))))

Press Enter. The formula returns every URL in hits that doesn’t appear in crawl, which gives you your orphan pages for easy analysis.

Orphan URLs in spreadsheet
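The formula is a set difference: keep each URL from hits that has no match in crawl, then remove repeats. Here is the same logic as a minimal Python sketch with placeholder URLs. One difference to be aware of: Python compares strings case-sensitively, while MATCH in Google Sheets ignores case.

```python
def cross_reference(hit_urls, crawl_urls):
    """Return unique hit URLs that are missing from the crawl list, in first-seen order."""
    crawlable = {url.strip() for url in crawl_urls}
    seen = set()
    orphans = []
    for url in hit_urls:
        url = url.strip()
        if url and url not in crawlable and url not in seen:  # skip blanks and repeats
            seen.add(url)
            orphans.append(url)
    return orphans


hits = [
    "https://example.com/",
    "https://example.com/old-promo",
    "https://example.com/2019-sale",
    "https://example.com/old-promo",
    "",
]
crawl = ["https://example.com/", "https://example.com/blog"]
print(cross_reference(hits, crawl))
# ['https://example.com/old-promo', 'https://example.com/2019-sale']
```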

How to fix orphan pages

Marketers often make the mistake of simply adding internal links to all orphan pages across the board.

The main issue with this approach is that just because a quick fix can be applied across all pages does not mean it should be. 

Some orphan pages are intentional, like PPC landing pages, while others can just be removed, like test pages.

We don’t want to waste resources fixing something that’s not broken or is unlikely to have a positive impact.

To help solve this problem, use this decision tree:

How to deal with orphan pages flowchart

The idea here is to think critically about each orphan page and decide whether noindexing, deleting, merging/consolidating, or simply adding internal links is the best fix.

For example, if a page was missed during a site migration and that page does not offer any value for visitors, deleting it is probably the best option. However, if the page has backlinks, it may also be worth redirecting the URL to another relevant page to preserve backlink equity. 
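If you’re triaging a long list, it can help to write the rules down explicitly. The sketch below encodes the four strategies described in this section as simple yes/no checks. The field names and the order of the checks are assumptions for illustration, not a transcription of the flowchart, so adapt them to your own judgment.

```python
from dataclasses import dataclass


@dataclass
class OrphanPage:
    url: str
    intentionally_unlinked: bool  # e.g. a PPC or paid-campaign landing page
    duplicates_another_page: bool
    valuable_to_visitors: bool
    has_backlinks: bool


def recommend_fix(page: OrphanPage) -> str:
    """One possible ordering of the four fixes described in this article."""
    if page.intentionally_unlinked:
        return "noindex"
    if page.duplicates_another_page:
        return "merge, then 301 redirect to the other page"
    if page.valuable_to_visitors:
        return "add internal links"
    if page.has_backlinks:
        return "delete, then redirect to a relevant page"
    return "delete"


test_page = OrphanPage("/test-page", False, False, False, False)
print(recommend_fix(test_page))  # delete
```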

TIP

Checking orphan pages for backlinks in bulk (up to 200 URLs at a time) is easy with Ahrefs’ Batch Analysis tool. Just paste URLs from your “cross reference” sheet and click Analyse.

Batch Analysis tool in Ahrefs

Let’s look at the four strategies to fix orphan pages.

Internally link

Orphan pages that are valuable for site visitors should be incorporated into your site’s internal linking structure to make them easier for visitors and search engines to find. 

For example, let’s say an article was forgotten during a site migration or redesign. We need to internally link to it from a relevant page we know Google will soon (re)crawl.

Here’s an easy way to do that in Ahrefs:

  1. Go to Site Audit.
  2. Open your site’s most recent crawl.
  3. Under Tools, open Page Explorer.
  4. Search for a word or phrase in Page text.
  5. Sort the results by Organic traffic.
Finding internal link opportunities in Ahrefs' Site Audit

This finds contextual internal linking opportunities on pages that get organic traffic, which means Google is likely to recrawl them sooner rather than later and see our changes. 
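The same idea can be expressed as a filter and a sort over page data, which may be useful if you’re working from an export rather than the Page Explorer interface. This is a minimal sketch; the field names (`url`, `text`, `organic_traffic`) and sample pages are hypothetical stand-ins for whatever columns your export contains.

```python
def internal_link_candidates(pages, phrase, orphan_url):
    """Pages that mention `phrase`, highest organic traffic first, excluding the orphan itself."""
    needle = phrase.lower()
    matches = [
        page for page in pages
        if needle in page["text"].lower() and page["url"] != orphan_url
    ]
    ranked = sorted(matches, key=lambda page: page["organic_traffic"], reverse=True)
    return [page["url"] for page in ranked]


# Hypothetical page data: URL, body text, and estimated monthly organic traffic.
pages = [
    {"url": "/blog/seo-audit", "text": "Start every SEO audit by checking for orphan pages.", "organic_traffic": 900},
    {"url": "/blog/internal-links", "text": "Orphan pages have no internal links.", "organic_traffic": 2400},
    {"url": "/blog/keyword-research", "text": "How to find keywords.", "organic_traffic": 5000},
    {"url": "/blog/orphan-pages-guide", "text": "All about orphan pages.", "organic_traffic": 0},
]
print(internal_link_candidates(pages, "orphan pages", "/blog/orphan-pages-guide"))
# ['/blog/internal-links', '/blog/seo-audit']
```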

Learn more: How to Use Page Explorer

Noindex

Orphan pages that were intentionally not internally linked to, like landing pages for ads, should be noindexed to prevent them from appearing in organic search results. 

Most SEO plugins have made this as easy as checking a box, but you can also do it manually by copying and pasting this into the <head> section of the page:

<meta name="robots" content="noindex" />

Sidenote.

Make sure these pages are still crawlable in robots.txt. Otherwise, search engines won’t see the noindex directive. 
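To sanity-check both conditions at once (the page carries a noindex directive and robots.txt doesn’t block it), here is a minimal sketch using only Python’s standard library. The URLs, HTML, and robots.txt rules are placeholders. Python’s `urllib.robotparser` applies rules in file order rather than Google’s most-specific-rule precedence, so treat the result as a rough check, not a guarantee of how Googlebot will behave.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}  # attribute names arrive lowercased
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def noindex_is_visible(url, html, robots_txt, user_agent="Googlebot"):
    """True if the page carries noindex AND robots.txt lets the crawler fetch it."""
    meta = RobotsMetaParser()
    meta.feed(html)
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return "noindex" in meta.directives and robots.can_fetch(user_agent, url)


robots_txt = "User-agent: *\nDisallow: /landing/\n"
page_html = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
print(noindex_is_visible("https://example.com/promo", page_html, robots_txt))           # True
print(noindex_is_visible("https://example.com/landing/spring", page_html, robots_txt))  # False: blocked
```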

Merge/consolidate

Orphan pages with the same or similar content to another page should be merged. This means consolidating the content and redirecting the orphan URL to the other page.

For example, let’s say you have two product listings for the same product. One of them is an orphan page; the other isn’t. You should take any unique valuable information from the orphan page and add it to the other page before redirecting the orphan page there.

Delete

Orphan pages that offer no value for visitors and serve no other purpose (e.g., a paid traffic campaign) should be deleted.

For example, an unused CMS theme page can be removed. Its URL will then return a 404 and naturally drop out of search results over time.

Sidenote.

If the page has backlinks, you may want to redirect the URL to another relevant page to preserve link equity after deleting. 

How to prevent orphan pages

As you can see, auditing orphan pages is time-intensive. So once you’ve put in the work, you’ll want to prevent orphan pages in the future. Here are a few policies and procedures to consider.

Have a plan for site migrations

Be proactive by having a plan any time you do a website migration. You can avoid broken links and confusion on your website by redirecting old pages to new versions with a 301 redirect.
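Before launching a migration, it can help to check the redirect plan itself. This minimal sketch assumes three hypothetical inputs: a list of old URLs, a list of URLs on the new site, and a planned old-to-new 301 mapping. It flags old URLs that would be left without a destination and redirects that point at pages that won’t exist.

```python
def audit_redirect_plan(old_urls, new_urls, redirects):
    """Flag old URLs with no destination and redirects that point at URLs missing from the new site."""
    new_site = set(new_urls)
    # Old URLs that neither survive unchanged nor have a 301 mapping.
    unmapped = sorted(url for url in old_urls if url not in new_site and url not in redirects)
    # Redirect sources whose targets won't exist after launch.
    broken_targets = sorted(source for source, target in redirects.items() if target not in new_site)
    return unmapped, broken_targets


old_urls = ["/", "/blog", "/products/widget", "/about-us", "/team"]
new_urls = ["/", "/blog", "/shop/widget", "/about"]
redirects = {"/products/widget": "/shop/widget", "/about-us": "/company"}
print(audit_redirect_plan(old_urls, new_urls, redirects))
# (['/team'], ['/about-us'])
```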

Set up your site structure for success

If you have to internally link to new pages manually, you’re bound to miss some and end up with orphan pages. This is why you should opt for a site structure that handles internal linking for you. 

Most types of CMS do this out of the box. For example, each time we publish a new blog post, WordPress adds an internal link from our blog homepage and archive. 

However, if you’re using a custom solution, you need to ensure the necessary code is in place for a good site structure.

Learn more: Website Structure: How to Build Your SEO Foundation

Remove discontinued products properly

If you run an e‑commerce site, you should remove discontinued products from the catalog (along with all internal links pointing to them) and have their pages return a 404 or 410 status code. Removing a product from the catalog while leaving its page live is a common cause of orphan pages.

If the page has great backlinks and there is an updated or improved version of the product, you may want to consider keeping the page to preserve the backlink equity.

To do this, update the page content to explain why the product is no longer available, introduce the new design’s features, and link to the new product page.

This way, the user is not landing on a completely unrelated page or 404.

Run regular site audits

By running the audit every month, you can stay on top of any accidental orphan pages that may slip through the cracks. You can do this easily using the scheduling feature in Ahrefs’ Site Audit.


Final thoughts

Looking at rows and rows of orphan page errors and trying to make sense of heavy technical jargon is intimidating.

While finding and fixing orphan pages is time-intensive, it doesn’t need to be painstaking. Using Ahrefs’ Site Audit and the orphan pages flowchart will help streamline your process.

Got questions? Ping me on Twitter.





WordPress Insiders Discuss WordPress Stagnation


A recent webinar featuring WordPress executives from Automattic and Elementor, along with developers and Joost de Valk, discussed the stagnation in WordPress growth, exploring the causes and potential solutions.

Stagnation Was The Webinar Topic

The webinar, “Is WordPress’ Market share Declining? And What Should Product Businesses Do About it?”, was a frank discussion about what can be done to increase WordPress’ share of the new users who are choosing a web publishing platform.

Yet the discussion also made clear that WordPress is doing exceptionally well in some areas, so it’s not all doom and gloom. As will be seen later on, the fact that the WordPress core isn’t adopting specific technologies isn’t necessarily a sign that WordPress is falling behind; it can actually be seen as a feature.

Yet there is stagnation, as mentioned at the 17:07 minute mark:

“…Basically you’re saying it’s not necessarily declining, but it’s not increasing and the energy is lagging. “

The response to the above statement acknowledged that while there are areas of growth like in the education and government sectors, the rest was “up for grabs.”

Joost de Valk directly and unambiguously acknowledged the stagnation at the 18:09 minute mark:

“I agree with Noel. I think it’s stagnant.”

That said, Joost also saw opportunities in ecommerce, given the performance of WooCommerce. WooCommerce, by the way, outperformed WordPress as a whole with a 6.80% year-over-year growth rate, so there’s good reason for Joost to be optimistic about the ecommerce sector.

However, the general sense that WordPress was stalling was not in dispute, as shown in remarks at the 31:45 minute mark:

“… the WordPress product market share is not decreasing, but it is stagnating…”

Facing Reality Is Productive

Humans have two ways to deal with a problem:

  1. Acknowledge the problem and seek solutions
  2. Pretend it’s not there and proceed as if everything is okay

WordPress is a publishing platform that’s loved around the world. It has created countless jobs and careers, powered online commerce, and helped establish new industries around developing applications that extend WordPress.

Many people have a stake in WordPress’ continued survival, so any talk of WordPress entering a stall and descent phase, like an airplane that has reached its maximum altitude, is frightening, and some people would prefer to shout it down to make it go away.

This webinar acknowledged the facts rather than brushing them aside, as a step toward identifying solutions. Everyone in the discussion has a stake in the continued growth of WordPress, and their goal was to put the issue out there so the community could get involved too.

The live webinar featured:

  • Miriam Schwab, Elementor’s Head of WP Relations
  • Rich Tabor, Automattic Product Manager
  • Joost de Valk, founder of Yoast SEO
  • Co-hosts Matt Cromwell and Amber Hinds, both members of the WordPress developer community, moderated the discussion.

WordPress Market Share Stagnation

The webinar acknowledged that WordPress market share, the percentage of websites online that use WordPress, was stagnating. Stagnation is a state in which something is moving neither forward nor backward; it is simply stuck at an in-between point. That is what was openly acknowledged, and the main point of the discussion was understanding why and what could be done about it.

Statistics gathered by the HTTP Archive and published on Joost de Valk’s blog show that WordPress experienced year-over-year growth of 1.85%, with its market share alternately growing and contracting over the year. For example, over the latest month-over-month period, market share dropped by 0.28%.

Crowing about WordPress’ 1.85% growth rate as evidence that everything is fine ignores that a large percentage of new businesses and websites coming online are increasingly going to other platforms, whose year-over-year growth rates outpace that of WordPress.

Out of the top 10 content management systems, only six experienced year-over-year (YoY) growth:

CMS YoY Growth

  1. Webflow: 25.00%
  2. Shopify: 15.61%
  3. Wix: 10.71%
  4. Squarespace: 9.04%
  5. Duda: 8.89%
  6. WordPress: 1.85%

Why Stagnation Is A Problem

An important point made in the webinar is that stagnation can have a negative trickle-down effect on the business ecosystem by reducing growth opportunities and customer acquisition. Every new business that comes online without choosing WordPress is a client that will never come looking for a WordPress theme, plugin, development, or SEO service.

Joost de Valk noted at the 4:18 minute mark:

“…when you’re investing and when you’re building a product in the WordPress space, the market share or whether WordPress is growing or not has a deep impact on how easy it is to well to get people to, to buy the software that you want to sell them.”

Perception Of Innovation

One potential reason for the struggle to achieve significant growth is a perceived lack of innovation, such as the fact that there’s still no integration with popular technologies like Next.js, an open-source web development framework optimized for quickly rolling out scalable, search-friendly websites.

It was observed at the 16:51 minute mark:

“…and still today we have no integration with next JS or anything like that…”

Someone else agreed, but also noted at the 41:52 minute mark that the lack of innovation in the WordPress core can be seen as a deliberate effort to keep WordPress extensible: if users find a gap, a developer can step in and build a plugin to make WordPress whatever users and developers want it to be.

“It’s not trying to be everything for everyone because it’s extensible. So if WordPress has a… let’s say a weakness for a particular segment or could be doing better in some way. Then you can come along and develop a plug in for it and that is one of the beautiful things about WordPress.”

Is Improved Marketing A Solution?

Marketing was identified as one area for improvement. The panelists didn’t say it would solve every problem; they simply noted that competitors are actively advertising and promoting, while WordPress, by comparison, is not really there. Extending that idea (which wasn’t expressed in the webinar), I think that if WordPress isn’t putting out a positive marketing message, the only thing consumers may be exposed to is the daily news of another vulnerability.

Someone commented at the 16:21 minute mark:

“I’m missing the excitement of WordPress and I’m not feeling that in the market. …I think a lot of that is around the product marketing and how we repackage WordPress for certain verticals because this one-size-fits-all means that in every single vertical we’re being displaced by campaigns that have paid or, you know, have received a a certain amount of funding and can go after us, right?”

The idea that marketing is a WordPress shortcoming was also raised at the 18:27 minute mark, where it was acknowledged that growth was in some respects driven by the WordPress ecosystem, with associated products like Elementor driving adoption of WordPress by new businesses.

They said:

“…the only logical conclusion is that the fact that marketing of WordPress itself is has actually always been a pain point, is now starting to actually hurt us.”

Future Of WordPress

This webinar is important because it features the voices of people who are actively involved at every level of WordPress, from development, marketing, and accessibility to WordPress security and plugin development. These are insiders with a deep interest in the continued evolution of WordPress as a viable platform for getting online.

That they’re talking about the stagnation of WordPress should concern everybody, and that they’re talking about solutions shows that the WordPress community is not in denial but is confronting the situation directly, which is how a thriving ecosystem should respond.

Watch the webinar:

Is WordPress’ Market share Declining? And What Should Product Businesses Do About it?

Featured Image by Shutterstock/Krakenimages.com


Google’s New Support For AVIF Images May Boost SEO


Google announced that images in the AVIF file format are now eligible to be shown in Google Search and Google Images, including all platforms that surface Google Search data. Because AVIF files are typically much smaller than comparable JPEG or PNG files, adopting the format can lower page weight and improve Core Web Vitals scores, particularly Largest Contentful Paint.

How AVIF Can Improve SEO

Getting pages crawled and indexed is the first step of effective SEO. Anything that lowers file size and speeds up web page rendering can help search crawlers get to the content faster and increase the number of pages crawled.

Google’s crawl budget documentation recommends speeding up page loading and rendering as a way to avoid receiving “Hostload exceeded” warnings.

It also says that faster loading times enable Googlebot to crawl more pages:

Improve your site’s crawl efficiency

Increase your page loading speed
Google’s crawling is limited by bandwidth, time, and availability of Googlebot instances. If your server responds to requests quicker, we might be able to crawl more pages on your site.

What Is AVIF?

AVIF (AV1 Image File Format) is a next-generation, open-source image file format that combines the best of the JPEG, PNG, and GIF formats in a more compressed form, producing smaller image files (about 50% smaller than JPEG).

AVIF supports transparency like PNG and photographic images like JPEG, but with a higher dynamic range, deeper blacks, and better compression (meaning smaller file sizes). AVIF even supports animation like GIF does.

AVIF Versus WebP

AVIF is generally a better file format than WebP in terms of file size (compression) and image quality. WebP is better for lossless images, where maintaining high quality regardless of file size is more important. But for everyday web usage, AVIF is the better choice.

See also: 12 Important Image SEO Tips You Need To Know

Is AVIF Supported?

AVIF is currently supported by Chrome, Edge, Firefox, Opera, and Safari browsers. Not all content management systems support AVIF. However, both WordPress and Joomla support AVIF. In terms of CDN, Cloudflare also already supports AVIF.

I couldn’t yet confirm whether Bing supports AVIF files and will update this article once I find out.

Current website usage of AVIF stands at 0.2%, but now that AVIF images can be surfaced in Google Search, expect that percentage to grow. AVIF will probably become a standard image format because its high compression can help sites perform far better than they currently do with JPEG and PNG.

Research conducted in July 2024 by Joost de Valk (founder of Yoast) found that not all social media platforms support AVIF files. LinkedIn, Mastodon, Slack, and Twitter/X do not currently support AVIF, but Facebook, Pinterest, Threads, and WhatsApp do.

AVIF Images Are Automatically Indexable By Google

According to Google’s announcement, nothing special needs to be done to make AVIF image files indexable.

“Over the recent years, AVIF has become one of the most commonly used image formats on the web. We’re happy to announce that AVIF is now a supported file type in Google Search, for Google Images as well as any place that uses images in Google Search. You don’t need to do anything special to have your AVIF files indexed by Google.”

Read Google’s announcement:

Supporting AVIF in Google Search

Featured Image by Shutterstock/Cast Of Thousands


CMOs Called Out For Reliance On AI Content For SEO


Eli Schwartz, author of Product-Led SEO, started a discussion on LinkedIn about how too many CMOs (Chief Marketing Officers) believe that AI-written content is an SEO strategy. He predicted a reckoning once their strategies end in failure.

This is what Eli had to say:

“Too many CMOs think that AI-written content is an SEO strategy that will replace actual SEO.

This mistake is going to lead to an explosion in demand for SEO strategists to help them fix their traffic when they find out they might have been wrong.”

Everyone in the discussion, which received 54 comments, strongly agreed with Eli, except for one guy.

What Is Google’s Policy On AI Generated Content?

Google’s policy hasn’t changed, although it did update its guidance and spam policies on March 5, 2024, at the same time as the rollout of the March 2024 Core Algorithm Update. Many publishers who used AI to create content subsequently reported losing rankings.

Yet Google doesn’t say that using AI is enough to merit poor rankings; what its policy targets is content created primarily to manipulate rankings.

Google wrote these guidelines specifically for autogenerated content, including AI-generated content (Wayback Machine copy dated March 6, 2024):

“Our long-standing spam policy has been that use of automation, including generative AI, is spam if the primary purpose is manipulating ranking in Search results. The updated policy is in the same spirit of our previous policy and based on the same principle. It’s been expanded to account for more sophisticated scaled content creation methods where it isn’t always clear whether low quality content was created purely through automation.

Our new policy is meant to help people focus more clearly on the idea that producing content at scale is abusive if done for the purpose of manipulating search rankings and that this applies whether automation or humans are involved.”

Many in Eli’s discussion agreed that some organizations’ reliance on AI may come back to haunt them, except for that one guy.

Read the discussion on LinkedIn:

Too many CMOs think that AI-written content is an SEO strategy that will replace actual SEO

Featured Image by Shutterstock/Cast Of Thousands
