Google: It’s Normal for 20% of a Site to Not Be Indexed
Google’s John Mueller answered a question about indexing, offering insight into how overall site quality influences indexing patterns. He also noted that it’s within the bounds of normal for 20% of a site’s content to not be indexed.
Pages Discovered But Not Crawled
The person asking the question offered background information about their site.
Of particular concern was that the server was overloaded and whether that might affect how many pages Google indexes.
When a server is overloaded, a request for a web page may result in a 500 error response, because the standard response when a server cannot serve a page is a 500 Internal Server Error message.
The person asking the question did not mention that Google Search Console was reporting that Googlebot was receiving 500 error response codes.
So if Googlebot did not receive 500 error responses, then server overload is probably not the reason why 20% of the pages are not getting indexed.
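If you want to rule this out yourself, one quick check outside of Search Console is to request a handful of your own URLs and look at the status codes the server returns. Below is a minimal sketch in Python, assuming the requests library is available; the URLs and user-agent string are placeholders, not anything from the original question:

```python
import requests

# Hypothetical URLs to spot-check; replace with pages from your own site.
urls = [
    "https://example.com/",
    "https://example.com/blog/some-post/",
]

for url in urls:
    # This is a manual spot check, not a Googlebot request.
    response = requests.get(
        url, headers={"User-Agent": "manual-status-check"}, timeout=10
    )
    # A 5xx status here suggests the server is struggling to serve the page.
    print(url, response.status_code)
```

Repeated 5xx responses, especially under load, would point toward the server-capacity explanation; consistent 200s make the quality explanation more likely.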
The person asked the following question:
“20% of my pages are not getting indexed.
It says they’re discovered but not crawled.
Does this have anything to do with the fact that it’s not crawled because of potential overload of my server?
Or does it have to do with the quality of the page?”
Crawl Budget Not Generally Why Small Sites Have Non-indexed Pages
Google’s John Mueller offered an interesting explanation of how overall site quality is an important factor that determines whether Googlebot will index more web pages.
But first he discussed how the crawl budget isn’t usually a reason why pages remain non-indexed for a small site.
John Mueller answered:
“Probably a little of both.
So usually if we’re talking about a smaller site then it’s mostly not a case that we’re limited by the crawling capacity, which is the crawl budget side of things.
If we’re talking about a site that has millions of pages, then that’s something where I would consider looking at the crawl budget side of things.
But smaller sites probably less so.”
Overall Site Quality Determines Indexing
John next went into detail about how overall site quality can affect how much of a website is crawled and indexed.
This part is especially interesting because it gives a peek at how Google evaluates a site in terms of quality and how the overall impression influences indexing.
Mueller continued his answer:
“With regards to the quality, when it comes to understanding the quality of the website, that is something that we take into account quite strongly with regards to crawling and indexing of the rest of the website.
But that’s not something that’s necessarily related to the individual URL.
So if you have five pages that are not indexed at the moment, it’s not that those five pages are the ones we would consider low quality.
It’s more that …overall, we consider this website maybe to be a little bit lower quality. And therefore we won’t go off and index everything on this site.
Because if we don’t have that page indexed, then we’re not really going to know if that’s high quality or low quality.
So that’s the direction I would head there …if you have a smaller site and you’re seeing a significant part of your pages are not being indexed, then I would take a step back and try to reconsider the overall quality of the website and not focus so much on technical issues for those pages.”
Technical Factors and Indexing
Mueller next addressed technical factors and how easy it is for modern sites to get that part right, so that it doesn’t get in the way of indexing.
Mueller observed:
“Because I think, for the most part, sites nowadays are technically reasonable.
If you’re using a common CMS then it’s really hard to do something really wrong.
And it’s often more a matter of the overall quality.”
It’s Normal for 20% of a Site to Not Be Indexed
This next part is also interesting because Mueller downplays the significance of 20% of a site going unindexed, calling it within the bounds of normal.
Mueller has access to more information about how much of a typical site goes unindexed, so I take him at his word because he is speaking from Google’s perspective.
Mueller explains why it’s normal for pages to not be indexed:
“The other thing to keep in mind with regards to indexing, is it’s completely normal that we don’t index everything off of the website.
So if you look at any larger website or any even midsize or smaller website, you’ll see fluctuations in indexing.
It’ll go up and down and it’s never going to be the case that we index 100% of everything that’s on a website.
So if you have a hundred pages and (I don’t know) 80 of them are being indexed, then I wouldn’t see that as being a problem that you need to fix.
That’s sometimes just how it is for the moment.
And over time, when you get to like 200 pages on your website and we index 180 of them, then that percentage gets a little bit smaller.
But it’s always going to be the case that we don’t index 100% of everything that we know about.”
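To put numbers on that last point: with 80 of 100 pages indexed, 20% of the site is unindexed; with 180 of 200 pages indexed, the unindexed share drops to 10%, which is the shrinking percentage Mueller refers to.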
Don’t Panic if Pages Aren’t Indexed
There’s quite a lot of information Mueller shared about indexing to take in.
- It’s within the bounds of normal for 20% of a site to not be indexed.
- Technical issues probably won’t impede indexing.
- Overall site quality can determine how much of a site gets indexed.
- How much of a site gets indexed fluctuates.
- Small sites generally don’t have to worry about crawl budget.
Citation
It’s Normal for 20% of a Site to be Non-indexed
Watch Mueller discuss what counts as normal indexing at about the 27:26 mark.
Google Warns About Misuse of Its Indexing API
Google has updated its Indexing API documentation with a clear warning about spam detection and the possible consequences of misuse.
Warning Against API Misuse
The new message in the guide says:
“All submissions through the Indexing API are checked for spam. Any misuse, like using multiple accounts or going over the usage limits, could lead to access being taken away.”
This warning is aimed at people trying to abuse the system by exceeding the API’s limits or breaking Google’s rules.
What Is the Indexing API?
The Indexing API allows websites to tell Google when job posting or livestream video pages are added or removed. It helps websites with fast-changing content get their pages crawled and indexed quickly.
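For context, a notification to the Indexing API is a single authenticated request per URL. Below is a minimal sketch in Python using the google-api-python-client and google-auth libraries; the service-account file name and the example URL are placeholders, and the service account must be verified as an owner of the site in Search Console:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Scope required by the Indexing API.
SCOPES = ["https://www.googleapis.com/auth/indexing"]

# Hypothetical key file for a service account verified as a site owner
# in Search Console.
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)

service = build("indexing", "v3", credentials=credentials)

# Notify Google that a job-posting URL (placeholder) was added or updated.
body = {
    "url": "https://example.com/jobs/senior-editor",
    "type": "URL_UPDATED",  # use "URL_DELETED" when the page is removed
}
response = service.urlNotifications().publish(body=body).execute()
print(response)
```

Each notification like this counts against the daily quota, which is why the guidance below says to request a quota increase rather than spreading calls across multiple accounts.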
But it seems some users have been trying to abuse this by using multiple accounts to get more access.
Impact of the Update
Google is now closely watching how people use the Indexing API. If someone breaks the rules, they might lose access to the tool, which could make it harder for them to keep their search results updated for time-sensitive content.
How To Stay Compliant
To use the Indexing API properly, follow these rules:
- Don’t go over the usage limits, and if you need more, ask Google instead of using multiple accounts.
- Use the API only for job postings or livestream videos, and make sure your data is correct.
- Follow all of Google’s API guidelines and spam policies.
- Use sitemaps along with the API, not as a replacement.
Remember, the Indexing API isn’t a shortcut to faster indexing. Follow the rules to keep your access.
This Week in Search News: Simple and Easy-to-Read Update
Here’s what happened in the world of Google and search engines this week:
1. Google’s June 2024 Spam Update
Google finished rolling out its June 2024 spam update over a period of seven days. This update aims to reduce spammy content in search results.
2. Changes to Google Search Interface
Google has removed the continuous scroll feature for search results. Instead, it’s back to the old system of pages.
3. New Features and Tests
- Link Cards: Google is testing link cards at the top of AI-generated overviews.
- Health Overviews: There are more AI-generated health overviews showing up in search results.
- Local Panels: Google is testing AI overviews in local information panels.
4. Search Rankings and Quality
- Improving Rankings: Google said it can improve its search ranking system but will only do so on a large scale.
- Measuring Quality: Google’s Elizabeth Tucker shared how they measure search quality.
5. Advice for Content Creators
- Brand Names in Reviews: Google advises against leaving brand names out of review content.
- Fixing 404 Pages: Google explained when it’s important to fix 404 error pages.
6. New Search Features in Google Chrome
Google Chrome for mobile devices has added several new search features to enhance user experience.
7. New Tests and Features in Google Search
- Credit Card Widget: Google is testing a new widget for credit card information in search results.
- Sliding Search Results: When making a new search query, the results might slide to the right.
8. Bing’s New Feature
Bing is now using AI to write “People Also Ask” questions in search results.
9. Local Search Ranking Factors
Menu items and popular times might be factors that influence local search rankings on Google.
10. Google Ads Updates
- Query Matching and Brand Controls: Google Ads updated its query matching and brand controls, and advertisers are happy with these changes.
- Lead Credits: Google will automate lead credits for Local Service Ads. Google says this is a good change, but some advertisers are worried.
- tROAS Insights Box: Google Ads is testing a new insights box for tROAS (Target Return on Ad Spend) in Performance Max and Standard Shopping campaigns.
- WordPress Tag Code: There is a new conversion code for Google Ads on WordPress sites.
These updates highlight how Google and other search engines are continuously evolving to improve user experience and provide better advertising tools.
Exploring the Evolution of Language Translation: A Comparative Analysis of AI Chatbots and Google Translate
According to an article on PCMag, while Google Translate makes translating sentences into over 100 languages easy, regular users acknowledge that there’s still room for improvement.
In theory, large language models (LLMs) such as ChatGPT are expected to bring about a new era in language translation. These models consume vast amounts of text-based training data and real-time feedback from users worldwide, enabling them to quickly learn to generate coherent, human-like sentences in a wide range of languages.
However, despite the anticipation that ChatGPT would revolutionize translation, past experience has shown that such expectations are often overblown, and translation accuracy remains a challenge. To put these claims to the test, PCMag conducted a blind test, asking fluent speakers of eight non-English languages to evaluate the translation results from various AI services.
The test compared ChatGPT (both the free and paid versions) to Google Translate, as well as to other competing chatbots such as Microsoft Copilot and Google Gemini. The evaluation involved comparing the translation quality for two test paragraphs across different languages, including Polish, French, Korean, Spanish, Arabic, Tagalog, and Amharic.
In the first test conducted in June 2023, participants consistently favored AI chatbots over Google Translate. ChatGPT, Google Bard (now Gemini), and Microsoft Bing outperformed Google Translate, with ChatGPT receiving the highest praise. ChatGPT demonstrated superior performance in converting colloquialisms, while Google Translate often provided literal translations that lacked cultural nuance.
For instance, ChatGPT accurately translated colloquial expressions like “blow off steam,” whereas Google Translate produced more literal translations that failed to resonate across cultures. Participants appreciated ChatGPT’s ability to maintain consistent levels of formality and its consideration of gender options in translations.
The success of AI chatbots like ChatGPT can be attributed to reinforcement learning with human feedback (RLHF), which allows these models to learn from human preferences and produce culturally appropriate translations, particularly for non-native speakers. However, it’s essential to note that while AI chatbots outperformed Google Translate, they still had limitations and occasional inaccuracies.
In a subsequent test, PCMag evaluated different versions of ChatGPT, including the free and paid versions, as well as language-specific AI agents from OpenAI’s GPTStore. The paid version of ChatGPT, known as ChatGPT Plus, consistently delivered the best translations across various languages. However, Google Translate also showed improvement, performing surprisingly well compared to previous tests.
Overall, while ChatGPT Plus emerged as the preferred choice for translation, Google Translate demonstrated notable improvement, challenging the notion that AI chatbots are always superior to traditional translation tools.
Source: https://www.pcmag.com/articles/google-translate-vs-chatgpt-which-is-the-best-language-translator