
How to Find and Fix Orphan Pages (The Right Way)

Quicksand awaits unsuspecting SEOs when they start working on a website with a long history.

These pits of technical site errors, left behind by several generations of previous agencies, slow SEO efforts and hinder progress.

And when you’re the one tasked to clean it up, finding the quick fixes is your number one task.

So you may start with a basic site audit and see several orphan pages. You’ve probably heard that orphan pages are bad for a site but may not fully understand what they are or how to fix them.

In this article, you’ll learn what orphan pages are, what causes them, why they’re bad for SEO, and how to find and fix them.

What are orphan pages?

Orphan pages are pages that search engines may have difficulty discovering because they have no internal links from elsewhere on your website. 

These URLs tend to fall through the cracks because search engine crawlers can only discover pages from the sitemap file or external backlinks, and users can only get to the page if they know the URL.

What causes orphan pages?

Usually, orphan pages are accidental and occur for various reasons. The most common cause is not having processes for site migrations, navigation changes, site redesigns, out-of-stock products, testing, or dev pages. 

Orphan pages may also be intentional, as with promotional and paid advertising landing pages, or any instance where you do not want the page to be part of the user journey.

Why are orphan pages bad for SEO?

Search engines have a hard time finding orphan pages because they use links to help discover new content and understand the page’s significance.

Here’s what Google says:

Google searches the web with automated programs called crawlers, looking for pages that are new or updated. […] We find pages by many different methods, but the main method is following links from pages that we already know about.

For example, let’s say you publish a new webpage and forget to link to it from elsewhere on your site. If the page isn’t in your sitemap and has no backlinks, Google will not find or index it. That’s because their web crawler doesn’t know that it exists.

Even worse, the page cannot receive PageRank. 

If you haven’t heard of the term “PageRank” before, it’s a big deal. 

Generally speaking, PageRank is Google’s way of understanding the significance of the page by counting the number of “votes” a page gets. You can read more about how PageRank works and affects SEO here.
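
To build intuition for why links matter here, consider a toy version of the PageRank idea. The sketch below is purely illustrative (it is not Google’s actual algorithm, and the pages and damping factor are made up): a page that nothing links to never accumulates “votes” and is left with only the minimal baseline score.

# Toy PageRank-style power iteration (illustrative only, not Google's real formula)
links = {
    "home": ["blog", "about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],  # links out, but no page links TO it
}
damping = 0.85
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}
for _ in range(50):  # iterate until the scores stabilize
    rank = {
        p: (1 - damping) / len(pages)
        + damping * sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        for p in pages
    }
print(sorted(rank.items(), key=lambda kv: -kv[1]))  # "orphan" comes out last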

How to find orphan pages

To find orphan pages on your site, you need to compare a list of crawlable URLs (what Google can find) with a list of URLs people are hitting on your site. 

This may sound quite technical, but don’t be discouraged. We have broken down how to find orphan pages into three easy steps using tools you’re familiar with. 

1. Find crawlable URLs

There are a lot of tools you can use to gather a list of all crawlable URLs. We’re going to use Ahrefs’ Site Audit because it’s completely free with an Ahrefs Webmaster Tools account and you have the option to use external backlinks as a source to find even more URLs.

Here’s how to do it:

  1. Go to Site Audit.
  2. Click + New Project.
  3. Follow the prompts until step 3. Click on the URL sources tab and check Backlinks as a URL source in addition to the default settings.
  4. Click Continue, follow the instructions to complete the setup, then run the crawl.
Scheduling a site audit in Ahrefs' Site Audit

Backlink data is useful for finding orphan pages because it brings URLs from Ahrefs’ link index into the mix. 

If a page does not have any internal links, a basic crawler won’t find it. 

But, if a page has a backlink, Ahrefs will find the URL on your site and know that the crawl found no internal links, so it must be an orphan page.

When the site audit is complete, export all internal pages from Page Explorer and save them. You’ll use this in step 3.

Page Explorer in Ahrefs' Site Audit

Before we continue…

As Site Audit uses both sitemaps and backlinks as URL sources, it does a reasonable job of finding orphan pages for you without any extra work. To see them, go to Page Explorer, click Links, and select Orphan pages:

Orphan pages in Ahrefs' Site Audit

However, you’ll only see orphan pages found via backlinks or sitemaps here. If you have orphan pages not included in sitemaps and without backlinks, Ahrefs won’t be able to find them. 

Keep reading if you think this may be the case for you and want to dig a little deeper for orphan pages.

2. Find URLs with hits

The next step is getting a list of all the URLs with hits on your site. 

There are quite a few ways to do this, and it’s always best to use as many data sources as you have access to. 

If you have access, log files work well because they are server-side data, which is more accurate. We won’t be going into the nitty-gritty of accessing these because it depends on how the server is set up. 

But if you choose to go this route, check the official documentation for your server type.
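
If you do go the log file route, extracting the URLs is straightforward once you can read the files. Here’s a minimal sketch in Python, assuming the standard combined log format and a hypothetical access.log file (adjust the regex and domain for your setup):

import re

# Matches the request line in an Apache/NGINX "combined" log entry
log_line = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP/[\d.]+"')

paths = set()
with open("access.log") as f:
    for line in f:
        m = log_line.search(line)
        if m:
            # Strip query strings so /page?utm=x and /page count as one URL
            paths.add(m.group("path").split("?")[0])

with open("urls_with_hits.txt", "w") as out:
    out.write("\n".join("https://example.com" + p for p in sorted(paths)))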

In this article, we will use Google Analytics (GA4) and Google Search Console because the process is basically the same for everyone. 

Here’s how to find URLs with hits in Google Analytics (GA4):

  1. Log in to your Data Studio account.
  2. Start a new blank report.
  3. Connect Google Analytics as your data source.
  4. Choose the account you’re analyzing > select GA4 property.
  5. Add a basic table to your report.
  6. Set data source to the GA4 property selected in step 4.
  7. Set dimension to Page path.
  8. Set metric to Views.
  9. Sort by Views in descending order.
  10. Set the default date range to start from when GA4 was installed on the site.
Google Data Studio settings

To export the results from your table, click the three vertical dots in the top right corner and hit Export. Save with a helpful name like “date_GA_URLs_people_are_hitting_brandname” because you will need it again in just a bit.

Because we exported the page path and not the full page URL, we need to add the domain to the beginning of all cells in our spreadsheet. This is easy enough in Google Sheets. Just import the CSV into a blank sheet, insert a new column to the left, and paste this formula into cell A1 (make sure to replace example.com with your domain):

=IFERROR(ARRAYFORMULA(IF(ISBLANK(B:B),"",IF(B:B="Page Path","",IF(B:B="(not set)","","https://example.com" & B:B)))))

Formula in Google Sheets

As multiple URL sources are always best, we will also pull data from Google Search Console (GSC).

GSC limits exports to the first 1,000 URLs, but Google Data Studio has a neat little trick that allows you to pull more. 

Here’s how to do it:

  1. Reopen your Data Studio report.
  2. Start a new page (command + M).
  3. Open Resource > Manage added data sources.
  4. Click ADD A DATA SOURCE.
  5. Select Search Console.
  6. Choose the site you’re analyzing > URL impression > web.
  7. Add a basic table to your report.
  8. Set dimension to Landing page.
  9. Set metric to Impressions.
  10. Expand rows per page to 5,000.
  11. Edit the date range to view at least the past three months.
  12. Export the results from your table. 

Name your sheet something helpful like “date GSC_URLs_people_are_hitting_brandname” because you’ll need it again in a moment. 
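
If you’d rather skip the Data Studio workaround, the Search Console API can also return far more than 1,000 rows. Here’s a hedged sketch using the google-api-python-client and google-auth libraries; the service account file, site URL, and dates are placeholders, and you’ll need to grant the service account access to your GSC property first:

from google.oauth2 import service_account  # pip install google-auth
from googleapiclient.discovery import build  # pip install google-api-python-client

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
service = build("searchconsole", "v1", credentials=creds)

response = service.searchanalytics().query(
    siteUrl="https://example.com/",
    body={
        "startDate": "2022-10-01",
        "endDate": "2023-01-01",
        "dimensions": ["page"],
        "rowLimit": 25000,  # far beyond the 1,000-row UI export limit
    },
).execute()

urls = [row["keys"][0] for row in response.get("rows", [])]
print(len(urls), "URLs with impressions")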

Now, combine all the URLs people are hitting from your different sources into one spreadsheet and clean up the data by removing duplicates. 

Remove duplicates Google Sheets

3. Cross-reference the two URL sources

You are in the home stretch! The last step is cross-referencing crawlable URLs (from Ahrefs’ Site Audit) and URLs with hits (from GA and GSC). To do this, create a blank Google Sheet with three tabs. Label them crawl, hits, and cross reference. 

The three sheets you need in Google Sheets

In the first sheet (crawl), copy and paste all of the crawlable URLs from Ahrefs’ Site Audit.

To find these, open the exported CSV from step 1 and filter for results where incomingAllLinks equals zero. This step is super important because those URLs are orphan pages themselves, so including them in the “crawl” tab will lead to inaccurate results when cross-referencing. 

Remove all IncomingAllLinks that equal zero

Instead, you should copy these URLs and add them to the “hits” tab. 

Next, copy and paste the remaining URLs from the Ahrefs export into the crawl tab of your Google Sheet.

Crawl URLs in spreadsheet

In the second sheet, hits, copy/paste all URLs from step 2. These are the pages you found using Google Analytics, Google Search Console, or your site log files. It includes webpages that users have visited.

Hit URLs in spreadsheet

In the third sheet, cross reference, enter the following function into the first cell: 

=UNIQUE(FILTER(hits!A:A, ISNA(MATCH(hits!A:A, crawl!A:A, 0))))

Hit enter. The function will automatically pull all of your orphan pages for easy analysis.

Orphan URLs in spreadsheet
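
If your URL lists are too large for Sheets to handle comfortably, the same cross-reference is a simple set difference in a script. A minimal sketch, assuming you’ve saved each list as a plain text file with one URL per line (the filenames are hypothetical):

# Orphan candidates = URLs with hits that the crawl never reached via internal links
with open("crawl.txt") as f:
    crawlable = {line.strip().rstrip("/") for line in f if line.strip()}
with open("hits.txt") as f:
    hits = {line.strip().rstrip("/") for line in f if line.strip()}

for url in sorted(hits - crawlable):
    print(url)

Stripping trailing slashes on both sides avoids false positives when one source records /page and the other /page/.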

How to fix orphan pages

Marketers often make the mistake of simply adding internal links to all orphan pages across the board. 

The main issue with this approach is that just because a quick fix can be applied across all pages does not mean it should be. 

Some orphan pages are intentional, like PPC landing pages, while others can just be removed, like test pages.

We don’t want to waste resources fixing something that’s not broken or is unlikely to have a positive impact.

To help solve this problem, use this decision tree:

How to deal with orphan pages flowchart

The idea here is to think critically about each orphan page and decide whether noindexing, deleting, merging/consolidating, or simply adding internal links is the best fix.

For example, if a page was missed during a site migration and that page does not offer any value for visitors, deleting it is probably the best option. However, if the page has backlinks, it may also be worth redirecting the URL to another relevant page to preserve backlink equity. 

TIP

Checking orphan pages for backlinks in bulk (up to 200 URLs at a time) is easy with Ahrefs’ Batch Analysis tool. Just paste URLs from your “cross reference” sheet and click Analyse.

Batch Analysis tool in Ahrefs

Let’s look at the four strategies to fix orphan pages.

Internally link

Orphan pages that are valuable for site visitors should be incorporated into your site’s internal linking structure to make them easier for visitors and search engines to find. 

For example, let’s say an article was forgotten during a site migration or redesign. We need to internally link to it from a relevant page we know Google will soon (re)crawl.

Here’s an easy way to do that in Ahrefs:

  1. Go to Site Audit.
  2. Open your site’s most recent crawl.
  3. Under Tools, open Page Explorer.
  4. Search for a word or phrase in Page text.
  5. Sort the results by Organic traffic.
Finding internal link opportunities in Ahrefs' Site Audit

This finds contextual internal linking opportunities on pages that get organic traffic, which means Google is likely to recrawl them sooner rather than later and see our changes. 

Learn more: How to Use Page Explorer

Noindex

Orphan pages that were intentionally not internally linked to, like landing pages for ads, should be noindexed to prevent them from appearing in organic search results. 

Most SEO plugins have made this as easy as checking a box, but you can also do it manually by copying and pasting this into the <head> section of the page:

<meta name="robots" content="noindex" />

Sidenote.

Make sure these pages are still crawlable in robots.txt. Otherwise, search engines won’t see the noindex directive. 
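
If you want to spot-check this for a handful of landing pages, a small script can confirm both conditions at once. A rough sketch using Python’s built-in robotparser and the third-party requests library (example.com and the landing page URL are placeholders, and the noindex check is deliberately crude):

import urllib.robotparser
import requests  # pip install requests

rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

url = "https://example.com/ppc-landing-page"
if not rp.can_fetch("Googlebot", url):
    print("Blocked in robots.txt - Google will never see the noindex tag")
elif "noindex" in requests.get(url, timeout=10).text.lower():
    # Crude substring check; a real audit should parse the meta robots tag
    print("Crawlable and noindexed - OK")
else:
    print("No noindex found - this page may get indexed")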

Merge/consolidate

Orphan pages with the same or similar content to another page should be merged. This means consolidating the content and redirecting the orphan URL to the other page.

For example, let’s say you have two product listings for the same product. One of them is an orphan page; the other isn’t. You should take any unique valuable information from the orphan page and add it to the other page before redirecting the orphan page there.

Delete

Orphan pages that offer no value for visitors and serve no other purpose (e.g., paid traffic campaign) should be deleted. 

For example, an unused CMS theme page can be removed. The URL will then return a 404 and naturally drop out of search results over time.

Sidenote.

If the page has backlinks, you may want to redirect the URL to another relevant page to preserve link equity after deleting. 

How to prevent orphan pages

As you can see, auditing orphan pages is time intensive. So once you’ve put in the work, you want to prevent orphan pages in the future. Here are a few policies and procedures to consider.

Have a plan for site migrations

Be proactive by having a plan any time you do a website migration. You can avoid broken links and confusion on your website by redirecting old pages to new versions with a 301 redirect.
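
Once you have a redirect map, it’s worth verifying it automatically rather than clicking through URLs by hand. A minimal sketch using the requests library (the URL pairs are hypothetical):

import requests  # pip install requests

# Hypothetical old -> new URL pairs from your migration plan
redirect_map = {
    "https://example.com/old-page": "https://example.com/new-page",
}

for old, new in redirect_map.items():
    r = requests.get(old, allow_redirects=True, timeout=10)
    first_hop = r.history[0].status_code if r.history else r.status_code
    if first_hop == 301 and r.url.rstrip("/") == new.rstrip("/"):
        print(old, "-> OK")
    else:
        print(old, "-> CHECK: status", first_hop, "landed on", r.url)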

Set up your site structure for success

If you have to internally link to new pages manually, you’re bound to miss some and end up with orphan pages. This is why you should opt for a site structure that handles internal linking for you. 

Most types of CMS do this out of the box. For example, each time we publish a new blog post, WordPress adds an internal link from our blog homepage and archive. 

However, if you’re using a custom solution, you need to ensure the necessary code is in place for a good site structure.

Learn more: Website Structure: How to Build Your SEO Foundation

Remove discontinued products properly

If you run an e‑commerce site, you should remove discontinued products from the catalog (along with all internal links pointing to them) and set a status code of 404 or 410. Failing to remove internal links to such products is a common cause of orphan pages.

If the page has great backlinks and there is an updated or improved version of the product, you may want to consider keeping the page to preserve the backlink equity.

To do this, update the page content to explain why the product is no longer available, including introducing the new design features and linking to the new product page.

This way, the user is not landing on a completely unrelated page or 404.

Run regular site audits

By running the audit every month, you can stay on top of any accidental orphan pages that may slip through the cracks. You can do this easily using the scheduling feature in Ahrefs’ Site Audit.

Final thoughts

Looking at rows and rows of orphan page errors and trying to make sense of heavy technical jargon is intimidating.

While finding and fixing orphan pages is time intensive, it doesn’t need to be painstaking. Using Ahrefs’ Site Audit and the orphan pages flowchart will help streamline your process.

Got questions? Ping me on Twitter.





Google Business Profile Optimization For The Financial Vertical

The financial vertical is a dynamic, challenging, and highly regulated space.

As such, for businesses in this vertical, optimizing local search presence and, specifically, Google Business Profile listings requires a greater level of sensitivity and specialization than industries like retail or restaurant.

The inherent challenges stem from a host of considerations, such as internal branding guidelines, accessibility considerations, regulatory measures, and governance considerations among lines of business within the financial organization, among others.

This means that local listings in this vertical are not “one size fits all” but rather vary based on function and fall into one of several listing types, including branches, loan officers, financial advisors, and ATMs (which may include walk-up ATMs, drive-through ATMs, and “smart ATMs”).

Each of these types of listings requires a unique set of hours, categories, hyper-local content, attributes, and a unique overall optimization strategy.

The goal of this article is to dive deeper into why having a unique optimization strategy matters for businesses in the financial vertical and share financial brand-specific best practices for listing optimization strategy.

Financial Brand Listing Type Considerations

One reason listing optimization is so nuanced in the financial vertical is that, in addition to all the listing features that vary by business function as mentioned above, Google also defines different classifications (or types) of listings – each with its own set of guidelines (read: “rules”) that apply according to the listing scenario.

This includes the distinction between a listing for an organization (e.g., for a bank branch) vs. that of an individual practitioner (used to represent a loan officer that may or may not sit at the branch, which has a separate listing).

Somewhere between those two main divisions, there may be a need for a department listing (e.g., for consumer banking vs. mortgages).

Again, each listing classification has rules and criteria around how (and how many) listings can be established for a given address and how they are represented.

Disregarding Google’s guidelines here carries the risk of disabled listings or even account-level penalties.

While that outcome is relatively rare, those risks are ill-advised and potentially catastrophic to revenue and reputation in such a tightly regulated and competitive industry.

Editor’s note: If you have 10+ locations, you can request bulk verification.

Google Business Profile Category Selection

Category selection in Google Business Profile (GBP) is one of the most influential, and thus important, activities involved in creating and optimizing listings – in the context of ranking, visibility, and traffic attributable to the listing.

Keep in mind you can’t “keyword optimize” a GBP listing (unless you choose to violate Business Title guidelines), and this is by design on Google’s part.

Because of this, the primary and secondary categories that you select are collectively one of the strongest cues that you can send to Google around who should see your listing in the local search engine results pages (SERPs), and for what queries (think relevancy).

Suffice it to say this is a case where quality and specificity are more important than quantity.

This is, in part, because Google only allows one primary category to be selected – but also because spamming the secondary category field with as many entries as Google will allow (especially with categories that are only tangentially relevant to the listing) can have consequences that are both unintuitive and unintended.

The point is that too many categories can (and often do) muddy the signal for Google’s algorithm when it surfaces listings for appropriate queries and audiences.

This can lead to poor alignment with users’ needs and experiences and drive the wrong traffic.

It can also cause confusion for the algorithm around relevancy, resulting in the listing being suppressed or ranking poorly, thus driving less traffic.

Governance Vs. Cannibalization

We’ve already discussed the choice of classification types and the practice of targeting categories appropriately according to the business functions and objectives a given listing represents. Together, these considerations help frame a governance strategy within the organic local search channel.

The idea here is to create separation between lines of business (LOBs) to prevent internal competition over rankings and visibility for search terms that are misaligned for one or more LOB, such that they inappropriately cannibalize each other.

In simpler terms, users searching for a financial advisor or loan officer should not be served a listing for a consumer bank branch, and vice versa.

This creates a poor user experience that will ultimately result in frustrated users, complaints, and potential loss of revenue.

The Importance Of Category Selection

To illustrate this, see the example below.

A large investment bank might have the following recommended categories for Branches and Advisors, respectively (the asterisk marks the primary category):

Branch Categories

  • *Investment Service.
  • Investment Company.
  • Financial Institution.

Advisor Categories

  • *Financial Consultant.
  • Financial Planner.
  • Financial Broker.

Notice the Branch categories signal relevance for the institution as a whole, whereas the Advisor categories align with Advisors (i.e., individual practitioners). Obviously, these listings serve separate but complementary functions.

When optimized strategically, their visibility will align with the needs of users seeking out information about those functions accordingly.

Category selection is not the only factor involved in crafting a proper governance strategy, albeit an important one.

That said, all the other available data fields and content within the listings should be similarly planned and optimized in alignment with appropriate governance considerations, in addition to the overall relevancy and content strategy as applicable for the associated LOBs.

Specialized Financial Brand Listing Attributes

GBP attributes are data points about a listing that help communicate details about the business being represented.

They vary by primary category and are a great opportunity to serve users’ needs while boosting performance by differentiating against the competition, and feeding Google’s algorithm more relevant information about a given listing.

This is often referred to as the “listing completeness” aspect of Google’s local algorithm, which translates to “the more information Google has about a listing, the more precisely it can provide that listing to users according to the localized queries they use.”

The following is a list of attributes that are helpful for the financial vertical:

  • Online Appointments.
  • Black-Owned.
  • Family-Led.
  • Veteran-Led.
  • Women-Led.
  • Appointment Links.
  • Wheelchair Accessible Elevator.
  • Wheelchair Accessible Entrance.
  • Wheelchair Accessible Parking Lot.

The following chart helps to illustrate which attributes are best suited for listing based on listing/LOB/ORG type:

Image from Rio SEO, December 2022

Managing Hours Of Operation

This is an important and often overlooked aspect of listings management in the financial space and in general.

Hours of operation, first and foremost, should be present in the listings, not left out. While providing hours is not mandatory, not doing so will impact user experience and visibility.

Like most of the previous items, hours for a bank branch (e.g., 10 am to 5 pm) will be different than those of the drive-through ATM (open 24 hours), and that of a mortgage loan officer and financial advisor that both have offices at the same address.

Each of these services and LOBs can best be represented by separate listings, each with its own hours of operation.

Leaving these details out, or using the same set of operating hours across all of these LOBs and listing types, sets users up for frustration and prevents Google from properly serving and messaging users around a given location’s availability (such as “open now,” “closing soon,” or “closed,” as applicable.)

All of this leads to either missed opportunities when hours are omitted, allowing a competitor (that Google knows is open) to rank higher in the SERPs, or frustrated customers that arrive at an investment banking office expecting to make a consumer deposit or use an ATM.

Appointment URL With Local Attribution Tracking

This is especially relevant for individual practitioner listings such as financial advisors, mortgage loan officers, and insurance agents.

Appointment URLs allow brands to publish a link where clients can book appointments with the individual whose listing the user finds and interacts within search.

This is a low-hanging fruit tactic that can make an immediate and significant impact on lead generation and revenue.

Taking this a step further, these links can be tagged with UTM parameters (for brands using Google Analytics; similar tagging applies to other analytics platforms) to track conversion events, leads, and revenue associated with this listing feature.

Editorial note: Here is an example of a link with UTM parameters: https://www.domain.com/?utm_source=source&utm_medium=medium&utm_campaign=campaign
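
If you manage appointment links for many practitioners, tagging each one by hand gets tedious and error-prone. Here’s a small helper sketch (the function name and parameter values are just examples) that appends UTM parameters while preserving any existing query string:

from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def add_utm(url, source, medium, campaign):
    """Append UTM parameters to a URL, keeping any existing query string."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_utm("https://www.domain.com/", "gbp", "organic", "advisor-appointments"))
# -> https://www.domain.com/?utm_source=gbp&utm_medium=organic&utm_campaign=advisor-appointments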

 

Financial vertical appointment booking example. Image from Google, December 2022

Leveraging Services

Services can be added to a listing to let potential customers know what services are available at a given location.

Adding services in Google Business Profile. Screenshot from Google, January 2023

Services in GBP are subject to availability by primary category, another reason category selection is so important, as discussed above.

Specifically, once services are added to a listing, they will be prominently displayed on the listing within the mobile SERPs under the “Services” tab of the listing.

Financial brand services in Google Business Profile on mobile. Screenshot from Google, January 2023

This not only feeds more data completeness, which benefits both mobile and desktop performance, but also increases engagement in the mobile SERPs (clicks to website, calls, driving directions), which are bottom-funnel key performance indicators (KPIs) that drive revenue.

Google Posts

Google Posts represent a content marketing opportunity that is valuable on multiple levels.

An organization can post relevant, evergreen content that is strategically optimized for key localized phrases, services, and product offerings.

While there is no clear evidence or admission by Google that relevant content will have a direct impact on rankings overall for that listing, what we can say for certain from observation is that listings with well-optimized posts do present in the local SERPs landscape for keyword queries that align with that content.

This happens in the form of “related to your search” snippets and has been widely observed since 2019.

This has a few different implications, reinforcing the benefits of leveraging Google Posts in your local search strategy.

First, given that Post snippets are triggered, it is fair to infer that if a given listing did not have the relevant post, that listing may not have surfaced at all in the SERPs. Thus, we can infer a benefit around visibility, which leads to more traffic.

Second, it is well-documented that featured snippets are associated with boosts in click-through rate (CTR), which amplifies the traffic increases that result from the increased visibility alone.

Additional Post Benefits

Beyond these two very obvious benefits of Google Posts, they also provide many benefits around messaging potential visitors and clients with relevant information about the location, including products, services, promotions, events, limited-time offers, and potentially many others.

Use cases for this can include consumer banks that feature free checking or direct deposit or financial advisors that offer a free 60-minute initial consultation.

Taking the time to publish posts that highlight these differentiators could have a measurable impact on traffic, CTR, and revenue.

Another great aspect of Google Posts is that, for a while, they were designed to be visible according to specific date ranges – and, at one time, would “expire” or fall out of the SERPs once the time period passed.

However, certain post types will surface long after the expiration date of the post if there is a relevancy match between the user’s query and the content.

Concluding Thoughts

To summarize, the financial vertical requires a highly specialized, precise GBP optimization strategy, which is well-vetted for the needs of users, LOBs, and regulatory compliance.

Considerations like primary and secondary categories, hours, attributes, services, and content (in the form of Google Posts) all play a critical role in defining that overall strategy, including setting up and maintaining crucial governance boundaries between complementary LOBs.

Undertaking all these available listing features holistically and strategically allows financial institutions and practitioners to maximize visibility, engagement, traffic, revenue, and overall performance from local search while minimizing cannibalization, complaints, and poor user experience.




11 Disadvantages Of ChatGPT Content

ChatGPT produces content that is comprehensive and plausibly accurate.

But researchers, artists, and professors warn of shortcomings to be aware of which degrade the quality of the content.

In this article, we’ll look at 11 disadvantages of ChatGPT content. Let’s dive in.

1. Phrase Usage Makes It Detectable As Non-Human

Researchers studying how to detect machine-generated content have discovered patterns that make it sound unnatural.

One of these quirks is how AI struggles with idioms.

An idiom is a phrase or saying with a figurative meaning attached to it, for example, “every cloud has a silver lining.” 

A lack of idioms within a piece of content can be a signal that the content is machine-generated – and this can be part of a detection algorithm.

This is what the 2022 research paper Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers says about this quirk in machine-generated content:

“Complex phrasal features are based on the frequency of specific words and phrases within the analyzed text that occur more frequently in human text.

…Of these complex phrasal features, idiom features retain the most predictive power in detection of current generative models.”

This inability to use idioms contributes to making ChatGPT output sound and read unnaturally.

2. ChatGPT Lacks Ability For Expression

An artist commented on how the output of ChatGPT mimics what art is, but lacks the actual qualities of artistic expression.

Expression is the act of communicating thoughts or feelings.

ChatGPT output doesn’t contain expressions, only words.

It cannot produce content that touches people emotionally on the same level as a human can – because it has no actual thoughts or feelings.

Musical artist Nick Cave, in an article posted to his Red Hand Files newsletter, commented on a ChatGPT lyric that was sent to him, which was created in the style of Nick Cave.

He wrote:

“What makes a great song great is not its close resemblance to a recognizable work.

…it is the breathless confrontation with one’s vulnerability, one’s perilousness, one’s smallness, pitted against a sense of sudden shocking discovery; it is the redemptive artistic act that stirs the heart of the listener, where the listener recognizes in the inner workings of the song their own blood, their own struggle, their own suffering.”

Cave called the ChatGPT lyrics a mockery.

This is the ChatGPT lyric that resembles a Nick Cave lyric:

“I’ve got the blood of angels, on my hands
I’ve got the fire of hell, in my eyes
I’m the king of the abyss, I’m the ruler of the dark
I’m the one that they fear, in the shadows they hark”

And this is an actual Nick Cave lyric (Brother, My Cup Is Empty):

“Well I’ve been sliding down on rainbows
I’ve been swinging from the stars
Now this wretch in beggar’s clothing
Bangs his cup across the bars
Look, this cup of mine is empty!
Seems I’ve misplaced my desires
Seems I’m sweeping up the ashes
Of all my former fires”

It’s easy to see that the machine-generated lyric resembles the artist’s lyric, but it doesn’t really communicate anything.

Nick Cave’s lyrics tell a story that resonates with the pathos, desire, shame, and willful deception of the person speaking in the song. It expresses thoughts and feelings.

It’s easy to see why Nick Cave calls it a mockery.

3. ChatGPT Does Not Produce Insights

An article published in The Insider quoted an academic who noted that academic essays generated by ChatGPT lack insights about the topic.

ChatGPT summarizes the topic but does not offer a unique insight into the topic.

Humans create through knowledge, but also through their personal experience and subjective perceptions.

Professor Christopher Bartel of Appalachian State University is quoted by The Insider as saying that, while a ChatGPT essay may exhibit high grammar qualities and sophisticated ideas, it still lacked insight.

Bartel said:

“They are really fluffy. There’s no context, there’s no depth or insight.”

Insight is the hallmark of a well-done essay and it’s something that ChatGPT is not particularly good at.

This lack of insight is something to keep in mind when evaluating machine-generated content.

4. ChatGPT Is Too Wordy

A research paper published in January 2023 discovered patterns in ChatGPT content that make it less suitable for critical applications.

The paper is titled, How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection.

The research showed that humans preferred answers from ChatGPT in more than 50% of questions answered related to finance and psychology.

But ChatGPT failed at answering medical questions because humans preferred direct answers – something the AI didn’t provide.

The researchers wrote:

“…ChatGPT performs poorly in terms of helpfulness for the medical domain in both English and Chinese.

The ChatGPT often gives lengthy answers to medical consulting in our collected dataset, while human experts may directly give straightforward answers or suggestions, which may partly explain why volunteers consider human answers to be more helpful in the medical domain.”

ChatGPT tends to cover a topic from different angles, which makes it inappropriate when the best answer is a direct one.

Marketers using ChatGPT must take note of this because site visitors requiring a direct answer will not be satisfied with a verbose webpage.

And good luck ranking an overly wordy page in Google’s featured snippets, where a succinct and clearly expressed answer that can work well in Google Voice may have a better chance to rank than a long-winded answer.

OpenAI, the makers of ChatGPT, acknowledges that giving verbose answers is a known limitation.

The announcement article by OpenAI states:

“The model is often excessively verbose…”

The ChatGPT bias toward providing long-winded answers is something to be mindful of when using ChatGPT output, as you may encounter situations where shorter and more direct answers are better.

5. ChatGPT Content Is Highly Organized With Clear Logic

ChatGPT has a writing style that is not only verbose but also tends to follow a template that gives the content a unique style that isn’t human.

This inhuman quality is revealed in the differences between how humans and machines answer questions.

The movie Blade Runner has a scene featuring a series of questions designed to reveal whether the subject answering the questions is a human or an android.

These questions were a part of a fictional test called the “Voight-Kampff test.”

One of the questions is:

“You’re watching television. Suddenly you realize there’s a wasp crawling on your arm. What do you do?”

A normal human response would be to say something like they would scream, walk outside and swat it, and so on.

But when I posed this question to ChatGPT, it offered a meticulously organized answer that summarized the question and then offered logical multiple possible outcomes – failing to answer the actual question.

Screenshot Of ChatGPT Answering A Voight-Kampff Test Question

Screenshot from ChatGPT, January 2023

The answer is highly organized and logical, giving it a highly unnatural feel, which is undesirable.

6. ChatGPT Is Overly Detailed And Comprehensive

ChatGPT was trained in a way that rewarded the machine when humans were happy with the answer.

The human raters tended to prefer answers that had more details.

But sometimes, such as in a medical context, a direct answer is better than a comprehensive one.

What that means is that the machine needs to be prompted to be less comprehensive and more direct when those qualities are important.

From OpenAI:

“These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.”

7. ChatGPT Lies (Hallucinates Facts)

The above-cited research paper, How Close is ChatGPT to Human Experts?, noted that ChatGPT has a tendency to lie.

It reports:

“When answering a question that requires professional knowledge from a particular field, ChatGPT may fabricate facts in order to give an answer…

For example, in legal questions, ChatGPT may invent some non-existent legal provisions to answer the question.

…Additionally, when a user poses a question that has no existing answer, ChatGPT may also fabricate facts in order to provide a response.”

The Futurism website documented instances where machine-generated content published on CNET was wrong and full of “dumb errors.”

CNET should have had an idea this could happen, because OpenAI published a warning about incorrect output:

“ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.”

CNET claims to have submitted the machine-generated articles to human review prior to publication.

A problem with human review is that ChatGPT content is designed to sound persuasively correct, which may fool a reviewer who is not a topic expert.

8. ChatGPT Is Unnatural Because It’s Not Divergent

The research paper, How Close is ChatGPT to Human Experts? also noted that human communication can have indirect meaning, which requires a shift in topic to understand it.

ChatGPT is too literal, which causes the answers to sometimes miss the mark because the AI overlooks the actual topic.

The researchers wrote:

“ChatGPT’s responses are generally strictly focused on the given question, whereas humans’ are divergent and easily shift to other topics.

In terms of the richness of content, humans are more divergent in different aspects, while ChatGPT prefers focusing on the question itself.

Humans can answer the hidden meaning under the question based on their own common sense and knowledge, but the ChatGPT relies on the literal words of the question at hand…”

Humans are better able to diverge from the literal question, which is important for answering “what about” type questions.

For example, if I ask:

“Horses are too big to be a house pet. What about raccoons?”

The above question is not asking if a raccoon is an appropriate pet. The question is about the size of the animal.

ChatGPT focuses on the appropriateness of the raccoon as a pet instead of focusing on the size.

Screenshot of an Overly Literal ChatGPT Answer

Screenshot from ChatGPT, January 2023

9. ChatGPT Contains A Bias Towards Being Neutral

The output of ChatGPT is generally neutral and informative. It’s a bias in the output that can appear helpful but isn’t always.

The research paper we just discussed noted that neutrality is an unwanted quality when it comes to legal, medical, and technical questions.

Humans tend to pick a side when offering these kinds of opinions.

10. ChatGPT Is Biased To Be Formal

ChatGPT output has a bias that prevents it from loosening up and answering with ordinary expressions. Instead, its answers tend to be formal.

Humans, on the other hand, tend to answer questions with a more colloquial style, using everyday language and slang – the opposite of formal.

ChatGPT doesn’t use abbreviations like GOAT or TL;DR.

The answers also lack instances of irony, metaphors, and humor, which can make ChatGPT content overly formal for some content types.

The researchers write:

“…ChatGPT likes to use conjunctions and adverbs to convey a logical flow of thought, such as “In general”, “on the other hand”, “Firstly,…, Secondly,…, Finally” and so on.”

11. ChatGPT Is Still In Training

ChatGPT is currently still in the process of training and improving.

OpenAI recommends that all content generated by ChatGPT should be reviewed by a human, listing this as a best practice.

OpenAI suggests keeping humans in the loop:

“Wherever possible, we recommend having a human review outputs before they are used in practice.

This is especially critical in high-stakes domains, and for code generation.

Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs (for example, if the application summarizes notes, a human should have easy access to the original notes to refer back).”

Unwanted Qualities Of ChatGPT

It’s clear that there are many issues with ChatGPT that make it unfit for unsupervised content generation. It contains biases and fails to create content that feels natural or contains genuine insights.

Further, its inability to feel or author original thoughts makes it a poor choice for generating artistic expressions.

Users should apply detailed prompts in order to generate content that is better than what ChatGPT outputs by default.

Lastly, human review of machine-generated content is not always enough, because ChatGPT content is designed to appear correct, even when it’s not.

That means it’s important that human reviewers are subject-matter experts who can discern between correct and incorrect content on a specific topic.




9 Common Technical SEO Issues That Actually Matter

In this article, we’ll see how to find and fix technical SEO issues, but only those that can seriously affect your rankings.

If you’d like to follow along, get Ahrefs Webmaster Tools and Google Search Console (both are free) and check for the following issues.

1. Indexability issues

Indexability is a webpage’s ability to be indexed by search engines. Pages that are not indexable can’t be displayed on the search engine results pages and can’t bring in any search traffic. 

Three requirements must be met for a page to be indexable:

  1. The page must be crawlable. If you haven’t blocked Googlebot from crawling the page in robots.txt, or you have a website with fewer than 1,000 pages, you probably don’t have an issue there. 
  2. The page must not have a noindex tag (more on that in a bit).
  3. The page must be canonical (i.e., the main version). 

Solution

In Ahrefs Webmaster Tools (AWT):  

  1. Open Site Audit
  2. Go to the Indexability report 
  3. Click on issues related to canonicalization and “noindex” to see affected pages
Indexability issues in Site Audit

For canonicalization issues in this report, you will need to replace bad URLs in the link rel="canonical" tag with valid ones (i.e., returning an “HTTP 200 OK”). 

As for pages marked by “noindex” issues, these are the pages with the “noindex” meta tag placed inside their code. Chances are most of the pages found in the report should stay as is. But if you see any pages that shouldn’t be there, simply remove the tag. Do make sure those pages aren’t blocked by robots.txt first. 
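
Outside Site Audit, you can also spot-check whether a page’s canonical target actually resolves with an “HTTP 200 OK.” A rough sketch using requests (the regex assumes rel appears before href inside the tag, so treat this as a starting point rather than a robust HTML parser):

import re
import requests  # pip install requests

def canonical_status(url):
    """Return a page's canonical URL and the status code that URL responds with."""
    html = requests.get(url, timeout=10).text
    # Naive extraction; assumes rel="canonical" appears before href in the tag
    m = re.search(r'<link[^>]*rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', html, re.I)
    if not m:
        return None, None
    target = m.group(1)
    return target, requests.get(target, timeout=10).status_code

target, status = canonical_status("https://example.com/some-page")
print(target, status)  # anything other than 200 needs fixing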

Recommendation

Click on the question mark on the right to see instructions on how to fix each issue. For more detailed instructions, click on the “Learn more” link. 

Instruction on how to fix an SEO issue in Site Audit

2. Sitemap issues

A sitemap should contain only pages that you want search engines to index. 

When a sitemap isn’t regularly updated or an unreliable generator has been used to make it, a sitemap may start to show broken pages, pages that became “noindexed,” pages that were de-canonicalized, or pages blocked in robots.txt. 

Solution 

In AWT:

  1. Open Site Audit 
  2. Go to the All issues report
  3. Click on issues containing the word “sitemap” to find affected pages 
Sitemap issues shown in Site Audit

Depending on the issue, you will have to:

  • Delete the pages from the sitemap.
  • Remove the noindex tag on the pages (if you want to keep them in the sitemap). 
  • Provide a valid URL for the reported page. 

3. HTTPS issues

Google uses HTTPS encryption as a small ranking signal. This means you can experience lower rankings if you don’t have an SSL or TLS certificate securing your website. 

But even if you do, some pages and/or resources on your pages may still use the HTTP protocol. 

Solution 

Assuming you already have an SSL/TLS certificate for all subdomains (if not, do get one), open AWT and do the following: 

  1. Open Site Audit
  2. Go to the Internal pages report 
  3. Look at the protocol distribution graph and click on HTTP to see affected pages
  4. Inside the report showing pages, add a column for Final redirect URL 
  5. Make sure all HTTP pages are permanently redirected (301 or 308 redirects) to their HTTPS counterparts 
Protocol distribution graph
Internal pages issues report with added column

Finally, let’s check if any resources on the site still use HTTP: 

  1. Inside the Internal pages report, click on Issues
  2. Click on HTTPS/HTTP mixed content to view affected resources 
Site Audit reporting six HTTPS/HTTP mixed content issues

You can fix this issue by one of these methods:

  • Link to the HTTPS version of the resource (check this option first) 
  • Include the resource from a different host, if available 
  • Download and host the content on your site directly if you are legally allowed to do so
  • Exclude the resource from your site altogether
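
For a quick spot check of a single page, something like this rough sketch flags resources still loaded over plain HTTP (it only scans src attributes; stylesheets pulled in via <link href> or CSS @import would need the same treatment):

import re
import requests  # pip install requests

html = requests.get("https://example.com/", timeout=10).text
# Resources loaded over plain HTTP on an HTTPS page are mixed content
for resource in re.findall(r'src=["\'](http://[^"\']+)', html, flags=re.I):
    print("Mixed content candidate:", resource)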

Learn more: What Is HTTPS? Everything You Need to Know 

4. Duplicate content issues

Duplicate content happens when exact or near-duplicate content appears on the web in more than one place. 

It’s bad for SEO mainly for two reasons: It can cause undesirable URLs to show in search results and can dilute link equity.

Content duplication is not necessarily a case of intentional or unintentional creation of similar pages. There are other less obvious causes such as faceted navigation, tracking parameters in URLs, or using trailing and non-trailing slashes.

Solution 

First, check if your website is available under only one URL. Because if your site is accessible as:

  • http://domain.com
  • http://www.domain.com
  • https://domain.com
  • https://www.domain.com

Then Google will see all of those URLs as different websites. 

The easiest way to check if users can browse only one version of your website: type in all four variations in the browser, one by one, hit enter, and see if they get redirected to the master version (ideally, the one with HTTPS). 
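
You can script that check too. A minimal sketch that requests all four variations and prints where each one lands (replace domain.com with your own domain):

import requests  # pip install requests

variants = [
    "http://domain.com",
    "http://www.domain.com",
    "https://domain.com",
    "https://www.domain.com",
]
for v in variants:
    r = requests.get(v, allow_redirects=True, timeout=10)
    # All four should land on the same canonical host, ideally via a 301
    print(v, "->", r.url, "(status", r.status_code, ")")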

You can also go straight into Site Audit’s Duplicates report. If you see 100% bad duplicates, that is likely the reason.

Duplicates report showing 100% bad duplicates
Simulation (other types of duplicates turned off).

In this case, choose one version that will serve as canonical (likely the one with HTTPS) and permanently redirect other versions to it. 

Then run a New crawl in Site Audit to see if there are any other bad duplicates left. 

Running a new crawl in Site Audit

There are a few ways you can handle bad duplicates depending on the case. Learn how to solve them in our guide.

Learn more: Duplicate Content: Why It Happens and How to Fix It 

5. Broken pages

Pages that can’t be found (4XX errors) and pages returning server errors (5XX errors) won’t be indexed by Google, so they won’t bring you any traffic. 

Furthermore, if broken pages have backlinks pointing to them, all of that link equity goes to waste. 

Broken pages are also a waste of crawl budget—something to watch out for on bigger websites. 

Solution

In AWT, you should: 

  1. Open Site Audit.
  2. Go to the Internal pages report.
  3. See if there are any broken pages. If so, the Broken section will show a number higher than 0. Click on the number to show affected pages.
Broken pages report in Site Audit

In the report showing pages with issues, it’s a good idea to add a column for the number of referring domains. This will help you make the decision on how to fix the issue. 

Internal pages report with no. of referring domains column added

Now, fixing broken pages (4XX error codes) is quite simple, but there is more than one possibility. Here’s a short flowchart explaining the process:

How to deal with broken pages

Dealing with server errors (the ones reporting a 5XX) can be tougher, as there are different possible reasons for a server to be unresponsive. Read this short guide for troubleshooting.

Recommendation

With AWT, you can also see 404s that were caused by incorrect links to your website. While this is not a technical issue per se, reclaiming those links may give you an additional SEO boost.

  1. Go to Site Explorer
  2. Enter your domain 
  3. Go to the Best by links report
  4. Add a “404 not found” filter
  5. Then sort the report by referring domains from high to low
How to find broken backlinks in Site Explorer
In this example, someone linked to us, leaving a comma inside the URL.

6. Link issues

If you’ve already dealt with broken pages, chances are you’ve fixed most of the broken links issues. 

Other critical issues related to links are: 

  • Orphan pages – These are the pages without any internal links. Web crawlers have limited ability to access those pages (only from sitemap or backlinks), and there is no link equity flowing to them from other pages on your site. Last but not least, users won’t be able to access this page from the site navigation. 
  • HTTPS pages linking to internal HTTP pages – If an internal link on your website brings users to an HTTP URL, web browsers will likely show a warning about a non-secure page. This can damage your overall website authority and user experience.

Solution

In AWT, you can:

  1. Go to Site Audit.
  2. Open the Links report.
  3. Open the Issues tab. 
  4. Look for the following issues in the Indexable category. Click to see affected pages. 
Important SEO issues related to links

Fix HTTPS pages linking to internal HTTP pages by changing those links from HTTP to HTTPS, or simply delete them if they’re no longer needed.

As for orphan pages, each needs to be either linked to from some other page on your website or deleted if it holds no value to you.

Sidenote.

Ahrefs’ Site Audit can find orphan pages as long as they have backlinks or are included in the sitemap. For a more thorough search for this issue, you will need to analyze server logs to find orphan pages with hits. Find out how in this guide.

7. Mobile experience issues

Having a mobile-friendly website is a must for SEO. Two reasons: 

  1. Google uses mobile-first indexing – It’s mostly using the content of mobile pages for indexing and ranking.
  2. Mobile experience is part of the Page Experience signals – While Google will allegedly always “promote” the page with the best content, page experience can be a tiebreaker for pages offering content of similar quality. 

Solution

In GSC: 

  1. Go to the Mobile Usability report in the Experience section
  2. View affected pages by clicking on issues in the Why pages aren’t usable on mobile section 
Mobile Usability report in Google Search Console

You can read Google’s guide for fixing mobile issues here.  

8. Performance and stability issues 

Performance and visual stability are other aspects of Page Experience signals used by Google to rank pages. 

Google has developed a special set of metrics to measure user experience called Core Web Vitals (CWV). Site owners and SEOs can use those metrics to see how Google perceives their website in terms of UX. 

Google's search signals for page experience

While page experience can be a ranking tiebreaker, CWV is not a race. You don’t need to have the fastest website on the internet. You just need to score “good,” ideally in all three categories: loading, interactivity, and visual stability. 

Three categories of Core Web Vitals

Solution 

In GSC: 

  1. First, click on Core Web Vitals in the Experience section of the reports.
  2. Then click Open report in each section to see how your website scores. 
  3. For pages that aren’t considered good, you’ll see a special section at the bottom of the report. Use it to see pages that need your attention.
How to find Core Web Vitals in Google Search Console
CWV issue report in Google Search Console

Optimizing for CWV may take some time. This may include things like moving to a faster (or closer) server, compressing images, optimizing CSS, etc. We explain how to do this in the third part of this guide to CWV. 

9. Website structure issues

Bad website structure in the context of technical SEO is mainly about having important organic pages too deep into the website structure.

Pages that are nested too deep (i.e., users need more than six clicks from the homepage to get to them) will receive less link equity from your homepage (likely the page with the most backlinks), which may affect their rankings. This is because link value diminishes with every link “hop.” 
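
Click depth is just a shortest-path problem over your internal link graph, so you can also compute it yourself from a crawl export. A toy sketch using breadth-first search (the link graph here is hypothetical):

from collections import deque

# Hypothetical internal link graph: page -> pages it links to
links = {
    "home": ["blog", "products"],
    "blog": ["post-1"],
    "products": ["product-a"],
    "post-1": ["deep-page"],
    "product-a": [],
    "deep-page": [],
}

# Breadth-first search from the homepage gives each page's click depth
depth = {"home": 0}
queue = deque(["home"])
while queue:
    page = queue.popleft()
    for target in links.get(page, []):
        if target not in depth:
            depth[target] = depth[page] + 1
            queue.append(target)

for page, d in sorted(depth.items(), key=lambda kv: kv[1]):
    print(d, page)  # anything much deeper than six deserves a closer look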

Sidenote.

Website structure is important for other reasons too such as the overall user experience, crawl efficiency, and helping Google understand the context of your pages. Here, we’ll only focus on the technical aspect, but you can read more about the topic in our full guide: Website Structure: How to Build Your SEO Foundation.

Solution 

In AWT

  1. Open Site Audit
  2. Go to Structure explorer, switch to the Depth tab, and set the data type to Data table
  3. Configure the Segment to only valid HTML pages and click Apply
  4. Use the graph to investigate pages that are more than six clicks away from the homepage. 
How to find site structure issues in Site Audit
Adding a new segment in Site Audit

The way to fix the issue is to link to these deeper nested pages from pages closer to the homepage. More important pages could find their place in the site navigation, while less important ones can simply be linked from pages a few clicks closer to the homepage.

It’s a good idea to weigh user experience and the business role of your website when deciding what goes into sitewide navigation. 

For example, we could probably give our SEO glossary a slightly higher chance to get ahead of organic competitors by including it in the main site navigation. Yet we decided not to because it isn’t such an important page for users who are not particularly searching for this type of information. 

We’ve moved the glossary only up a notch by including a link inside the beginner’s guide to SEO (which itself is just one click away from the homepage). 

Structure explorer showing glossary page is two clicks away from the homepage
One page from the glossary folder is two clicks away from the homepage.
Link that moved SEO glossary a click closer to the homepage
Just one link, even at the bottom of a page, can move a page higher in the overall structure.

Final thoughts 

When you’re done fixing the more pressing issues, dig a little deeper to keep your site in perfect SEO health. Open Site Audit and go to the All issues report to see other issues regarding on-page SEO, image optimization, redirects, localization, and more. In each case, you will find instructions on how to deal with the issue. 

All issues report in Site Audit

You can also customize this report by turning issues on/off or changing their priority. 

Issue report in Site Audit is customizable

Did I miss any important technical issues? Let me know on Twitter or Mastodon.


