SEO

Google Clarifies the “Google-Extended” Crawler Documentation

Google recently updated the documentation of its Google-Extended web crawler user agent, reflecting changes in product naming and clarifying the impact on search, which may be a concern for those who choose to block the crawler. The updated documentation offers clearer guidance on controlling content access for use in AI model training.

Google-Extended User Agent

Introduced on September 28, 2023, Google-Extended offers web publishers a user agent that can be used to control how their sites are crawled. Publishers can allow or disallow the Google-Extended user agent using the Robots Exclusion Protocol, giving them a way to opt out of having their content scraped and included in AI training datasets.

Google describes Google-Extended as a “standalone product token” but that’s non-standard terminology for how publishers understand the concept of User Agents.

The original announcement described the new user agent:

“Today we’re announcing Google-Extended, a new control that web publishers can use to manage whether their sites help improve Bard and Vertex AI generative APIs, including future generations of models that power those products.

By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time.”

Blocking Google-Extended is done with the “Google-Extended” User Agent:

User-agent: Google-Extended
Disallow: /
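
To sanity-check a rule like the one above before publishing it, a minimal sketch using Python's standard-library robots.txt parser is shown here. The helper name `is_allowed` and the example.com URL are illustrative assumptions, and Python's parser is not Google's parser, so treat the result as an approximation of how the rule will be read.

```python
from urllib.robotparser import RobotFileParser

# The rule from the article: block only the Google-Extended token.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for illustration; it relies on Python's own parser,
    which may differ from Google's in edge cases.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Google-Extended is blocked, while Googlebot has no matching rule.
blocked = not is_allowed(ROBOTS_TXT, "Google-Extended", "https://example.com/article")
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/article")
```

With only this rule in place, the sketch reports Google-Extended as blocked and Googlebot as allowed, which matches the point that the opt-out targets AI training use rather than Search crawling.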

Google Changelog

Google keeps a changelog of important updates made to guidance and communication with web publishers and the search marketing community. The changelog of Google’s developer pages announced a change to the Google-Extended documentation.

The revision follows the renaming of Bard to Gemini Apps and specifies that Google-Extended now applies to Gemini Apps and Vertex AI generative APIs. The new wording also reassures publishers that opting out of Google-Extended AI data collection does not affect Google Search.

What Changed?

Google’s changelog clarifies that Google-Extended affects Gemini Apps and has no impact on Google Search.

The Changelog advises:

“Updated the description of the Google-Extended product token
What: With the name change of Bard to Gemini Apps, we clarified that Gemini Apps is affected by Google-Extended, and, based on publisher feedback, we specified that Google-Extended doesn’t affect Google Search.”

The updated guidance no longer uses the Bard brand name, replacing it with Gemini. The following sentence was also added:

“Google-Extended does not impact a site’s inclusion or ranking in Google Search.”

Read Google’s updated crawler overview:

Overview of Google crawlers and fetchers (user agents)

Featured Image by Shutterstock/Ribkhan


Technical SEO Checklist for 2024: A Comprehensive Guide

With Google rolling out a whopping total of six algorithmic updates and four core updates in 2023, you can bet the search landscape is more complicated (and competitive) to navigate nowadays.

To succeed in SEO this year, you will need to figure out what items to check and optimize to ensure your website stays visible. And if your goal is to not just make your website searchable, but have it rank at the top of search engine results, this technical SEO checklist for 2024 is essential.

Webmaster’s Note: This is part one of our three-part SEO checklist for 2024. I also have a longer guide on advanced technical SEO, which covers best practices and how to troubleshoot and solve common technical issues with your websites.

Technical SEO Essentials for 2024

Technical SEO refers to optimizations that are primarily focused on helping search engines access, crawl, interpret, and index your website without any issues. It lays the foundation for your site to be properly understood and served up by search engines to users.

1. Website Speed Optimization

A site’s loading speed is a significant ranking factor for search engines like Google, which prioritize user experience. Faster websites generally provide a more pleasant user experience, leading to increased engagement and improved conversion rates.

Server Optimization

Often, your website loads slowly because of the server it’s hosted on. It’s important to choose a high-quality server that ensures quick loading times from the get-go so you skip the headache that is server optimization.

Google recommends keeping your server response time under 200ms. To check your server’s response time, you need to know your website’s IP address. Once you have that, use your command prompt.

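In the window that appears, type ping, followed by your website’s IP address. Press enter and the window should show the round-trip time to your server. Keep in mind that ping measures network latency rather than how long the server takes to generate a page, so treat the result as a rough indicator.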
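
As a complement to the manual check described above, the following sketch times how long a URL takes to return its first byte and compares it with the 200ms figure cited in the text. The function names and the example URL are assumptions made for illustration. The measurement includes DNS lookup and connection setup, so it is only an approximation of server response time.

```python
import time
import urllib.request

RECOMMENDED_MS = 200.0  # threshold cited in the article


def time_to_first_byte_ms(url, timeout=10.0):
    """Return milliseconds from sending the request until the first body byte arrives.

    Hypothetical helper: the timing includes DNS, TCP, and TLS setup, so it
    overstates pure server processing time.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)  # wait only for the first byte of the body
    return (time.perf_counter() - start) * 1000.0


def is_within_budget(elapsed_ms, budget_ms=RECOMMENDED_MS):
    """Return True when a measured time meets the response-time budget."""
    return elapsed_ms <= budget_ms
```

To use it, call `time_to_first_byte_ms("https://example.com/")` a few times and pass the results to `is_within_budget`; averaging several runs smooths out network noise.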

If you find that your server goes above the recommended 200ms response time, here’s what you need to check:

  1. Collect the data from your server and identify what is causing your response time to increase. 
  2. Based on what is causing the problem, you will need to implement server-side optimizations. This guide on how to reduce initial server response times can help you here.
  3. Measure your server response times after optimization to use as a benchmark. 
  4. Monitor any regressions after optimization.

If you work with a hosting service, then you should contact them when you need to improve server response times. A good hosting provider should have the right infrastructure, network connections, server hardware, and support services to accommodate these optimizations. They may also offer hosting options if your website needs more server resources to run smoothly.

Website Optimization

Aside from your server, there are a few other reasons that your website might be loading slowly. 

Here are some practices you can follow:

  1. Compressing images to decrease file sizes without sacrificing quality.
  2. Minifying the code by eliminating unnecessary spaces, comments, and indentation.
  3. Using caching to store some data locally in a user’s browser to allow for quicker loading on subsequent visits.
  4. Implementing Content Delivery Networks (CDNs) to distribute the load, speeding up access for users situated far from the server.
  5. Lazy loading your web pages so that the objects or resources your users need are loaded first.
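
To make the minification step above concrete, here is a deliberately naive sketch that strips comments and extra whitespace from a CSS string. The function `minify_css` is a hypothetical illustration, not a production minifier: it does not protect quoted strings or other edge cases, so a dedicated build tool is the safer choice for real sites.

```python
import re


def minify_css(css):
    """Naively minify CSS by removing comments and redundant whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # drop /* ... */ comments
    css = re.sub(r"\s+", " ", css)                         # collapse runs of whitespace
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)           # trim around punctuation
    return css.strip()


sample = """
/* header */
body {
    color: red;
    margin: 0 auto;
}
"""
minified = minify_css(sample)
```

For the sample above, the result is the single line `body{color: red;margin: 0 auto;}`, which is shorter than the input while describing the same styles.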

Common tools to evaluate your website speed are Google’s PageSpeed Insights and Google Lighthouse. Both tools can analyze the content of your website and then generate suggestions to improve its overall loading speed, all for free. There are also some third-party tools, like GTMetrix, that you could use as well.

Here’s an example of one of our websites’ speed scores before optimization. It’s one of the worst I’ve seen, and it was affecting our SEO.

[Image: slow site speed score from GTMetrix]

So we followed our technical SEO checklist. After working on the images, removing render-blocking page elements, and minifying code, the score greatly improved, and we saw near-immediate improvements in our page rankings.

[Image: site speed optimization results from GTMetrix]

That said, playing around with your server settings, coding, and other parts of your website’s backend can mess it up if you don’t know what you’re doing. I suggest backing up all your files and your database before you start working on your website speed for that reason. 

2. Mobile-First Indexing

Mobile-first Indexing is a method used by Google that primarily uses the mobile version of the content for indexing and ranking. 

It’s no secret that Google places a priority on the mobile user experience, as its use of mobile-first indexing shows. Beyond that, optimizing your website for mobile just makes sense, given that a majority of people now use their phones to search online.

This change signifies that a fundamental shift in your approach to website development and design is needed, and the following checks should be part of your technical SEO checklist:

  1. Ensure the mobile version of your site contains the same high-quality, rich content as the desktop version.
  2. Make sure metadata is present on both versions of your site.
  3. Verify that structured data is present on both versions of your site.
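
The metadata and structured data checks above can be roughly automated. The sketch below compares the title, meta description, and number of JSON-LD blocks between two HTML strings using only the standard library. The class and function names are hypothetical, and the comparison covers only these three signals, so it is a starting point rather than a full parity audit.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the title, meta description, and JSON-LD block count."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.json_ld_blocks = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")
        elif tag == "script" and attributes.get("type") == "application/ld+json":
            self.json_ld_blocks += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_signals(html):
    """Parse an HTML string and return the three signals as a dict."""
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "json_ld_blocks": parser.json_ld_blocks,
    }


def parity_gaps(desktop_html, mobile_html):
    """Return the names of signals that differ between the two versions."""
    desktop = extract_signals(desktop_html)
    mobile = extract_signals(mobile_html)
    return [key for key in desktop if desktop[key] != mobile[key]]
```

Feeding it the desktop and mobile HTML of the same page returns an empty list when the three signals match, and the names of the mismatched signals otherwise.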

Tools like Google’s mobile-friendly test can help you measure how effectively your mobile site is performing compared to your desktop versions, and to other websites as well.

3. Crawlability & Indexing Check

Always remember that crawlability and indexing are the cornerstones of SEO. Crawlability refers to a search engine’s ability to access and crawl through a website’s content. Indexing is how search engines organize information after a crawl and before presenting results. You can support both by:

  1. Utilizing a well-structured robots.txt file to communicate with web crawlers about which of your pages should not be processed or scanned.
  2. Using XML sitemaps to guide search engines through your site’s content and ensure that all valuable content is found and indexed. There are several CMS plugins you can use to generate your sitemap.
  3. Ensuring that your website has a logical structure with a clear hierarchy, which helps both users and bots navigate to your most important pages easily.
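
If a CMS plugin is not available, a basic XML sitemap like the one mentioned above can be generated with a few lines of code. This sketch builds a minimal sitemap containing only `loc` entries. The function name `build_sitemap` and the example.com URLs are assumptions for illustration, and optional fields such as `lastmod` are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap string listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = loc
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/blog/",
])
```

The resulting string can be saved as sitemap.xml (with an XML declaration added) and submitted through Google Search Console.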

Google Search Console is the tool you need to use to ensure your pages are crawled and indexed by Google. It also provides reports that identify any problems that prevent crawlers from indexing your pages. 

4. Structured Data Markup

Structured data markup is a standardized format that communicates website information to search engines in a more organized and richer way. This plays a strategic role in the way search engines interpret and display your content, enabling enhanced search results through “rich snippets” such as stars for reviews, prices for products, or images for recipes.

Adding it allows search engines to understand your content and display extra information directly in the search results.
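
As an illustration of what such markup can look like, the sketch below generates a JSON-LD snippet for a recipe with review stars, using the schema.org `Recipe` and `AggregateRating` types. The function name and the sample values are hypothetical, and the snippet omits many properties that search engines may require for rich results, so check the current guidelines before relying on it.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def recipe_json_ld(name, rating_value, rating_count):
    """Return a JSON-LD script tag describing a recipe and its rating.

    Illustrative only: values are not escaped for embedding untrusted text.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "ratingCount": rating_count,
        },
    }
    return SCRIPT_OPEN + json.dumps(data) + SCRIPT_CLOSE


snippet = recipe_json_ld("Cream of Chicken Soup", 4.8, 120)
```

The returned tag would be placed in the page’s HTML, on both the mobile and desktop versions.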

Key Takeaway

With all the algorithm changes made in 2023, websites need to stay adaptable and strategic to stay at the top of the search results page. Luckily for you, this technical SEO checklist for 2024 can help you do just that. Use this as a guide to site speed optimization, indexing, and ensuring the best experience for mobile and desktop users.


Why Google Seems To Favor Big Brands & Low-Quality Content

Many people are convinced that Google shows a preference for big brands and for ranking low quality content, something that many feel has become progressively worse. This may not be a matter of perception: something is going on, and nearly everyone has an anecdote of poor quality search results. The possible reasons for it are actually quite surprising.

Google Has Shown Favoritism In The Past

This isn’t the first time that Google’s search engine results pages (SERPs) have shown a bias that favored big brand websites. During the early years of Google’s algorithm it was obvious that sites with a lot of PageRank ranked for virtually anything they wanted.

For example, I remember a web design company that built a lot of websites, creating a network of backlinks, raising their PageRank to a remarkable level normally seen only in big corporate sites like IBM. As a consequence they ranked for the two-word keyword phrase, Web Design and virtually every other variant like Web Design + [any state in the USA].

Everyone knew that websites with a PageRank of 10, the highest level shown on Google’s toolbar, practically had a free pass in the SERPs, resulting in big brand sites outranking more relevant webpages. It didn’t go unnoticed when Google eventually adjusted their algorithm to fix this issue.

The point of this anecdote is to show an instance where Google’s algorithm unintentionally created a bias that favored big brands.

Here are other algorithm biases that publishers exploited:

  • Top 10 posts
  • Longtail “how-to” articles
  • Misspellings
  • Free Widgets in footer that contained links (always free to universities!)

Big Brands And Low Quality Content

There are two things that have been a constant for all of Google’s history:

  • Low quality content
  • Big brands crowding out small independent publishers

Anyone that’s ever searched for a recipe knows that the more general the recipe the lower the quality of recipe that gets ranked. Search for something like cream of chicken soup and the main ingredient for nearly every recipe is two cans of chicken soup.

A search for Authentic Mexican Tacos results in recipes with these ingredients:

  • Soy sauce
  • Ground beef
  • “Cooked chicken”
  • Taco shells (from the store!)
  • Beer

Not all recipe SERPs are bad. But some of the more general recipes Google ranks are so basic that a hobo can cook them on a hotplate.

Robin Donovan (Instagram), a cookbook author and online recipe blogger, observed:

“I think the problem with google search rankings for recipes these days (post HCU) are much bigger than them being too simple.

The biggest problem is that you get a bunch of Reddit threads or sites with untested user-generated recipes, or scraper sites that are stealing recipes from hardworking bloggers.

In other words, content that is anything but “helpful” if what you want is a tested and well written recipe that you can use to make something delicious.”

Explanations For Why Google’s SERPs Are Broken

It’s hard to get away from the perception that Google’s rankings for a variety of topics always seem to default to big brand websites and low quality webpages.

Small sites grow to become big brands that dominate the SERPs. It happens. But that’s the thing: even when a small site gets big, it’s now another big brand dominating the SERPs.

Typical explanations for poor SERPs:

  • It’s a conspiracy to increase ad clicks
  • Content itself these days is low quality across the board
  • Google doesn’t have anything else to rank
  • It’s the fault of SEOs
  • Affiliates
  • Google promotes big brands because [insert your conspiracy]

So what’s going on?

People Love Big Brands & Garbage Content

The recent Google antitrust lawsuit exposed the importance of the Navboost algorithm signals as a major ranking factor. Navboost is an algorithm that interprets user engagement signals to understand what topics a webpage is relevant for, among other things.

The idea of using engagement signals as an indicator of what users expect to see makes sense. After all, Google is user-centric and who better to decide what’s best for users than the users themselves, right?

Well, consider that arguably the biggest and most important song of 1991, Smells Like Teen Spirit by Nirvana, didn’t make the Billboard top 100 for that year. Michael Bolton and Rod Stewart made the list twice, with Rod Stewart top ranked for a song called “The Motown Song” (anyone remember that one?)

Nirvana didn’t make the charts until the next year…

My opinion, given that we know that user interactions are a strong ranking signal, is that Google’s search rankings follow a similar pattern related to users’ biases.

People tend to choose what they know. It’s called a Familiarity Bias.

Consumers have a habit of choosing things that are familiar over those that are unfamiliar. This preference shows up in product choices that prefer brands, for example.

Behavioral scientist Jason Hreha defines Familiarity Bias like this:

“The familiarity bias is a phenomenon in which people tend to prefer familiar options over unfamiliar ones, even when the unfamiliar options may be better. This bias is often explained in terms of cognitive ease, which is the feeling of fluency or ease that people experience when they are processing familiar information. When people encounter familiar options, they are more likely to experience cognitive ease, which can make those options seem more appealing.”

Except for certain queries (like those related to health), I don’t think Google makes an editorial decision to favor certain kinds of websites, like brands.

Google uses many signals for ranking. But Google is strongly user focused.

I believe it’s possible that strong user preferences can carry a more substantial weight than Reviews System signals. How else to explain why big brand websites with fake reviews seemingly rank better than honest independent review sites?

It’s not like Google’s algorithms haven’t created poor search results in the past.

  • Google’s Panda algorithm was designed to get rid of a bias for cookie cutter content.
  • The Reviews System is a patch to fix Google’s bias for content that’s about reviews but isn’t necessarily a review.

If Google has systems for catching low quality sites that their core algorithm would otherwise rank, why do big brands and poor quality content still rank?

I believe the answer is that users prefer to see those sites, as indicated by user interaction signals.

The big question to ask is whether Google will continue to rank whatever triggers user satisfaction signals, even when those signals stem from users’ biases and inexperience. Will Google continue serving the sugar-frosted bon-bons that users crave?

Should Google make the choice to rank quality content at the risk that users find it too hard to understand?

Or should publishers give up and focus on creating for the lowest common denominator like the biggest popstars do?




Google Announces Gemma: Laptop-Friendly Open Source AI

Google released an open source large language model based on the technology used to create Gemini that is powerful yet lightweight, optimized to be used in environments with limited resources like on a laptop or cloud infrastructure.

Gemma can be used to create a chatbot, content generation tool and pretty much anything else that a language model can do. This is the tool that SEOs have been waiting for.

It is released in two versions, one with two billion parameters (2B) and another one with seven billion parameters (7B). The number of parameters indicates the model’s complexity and potential capability. Models with more parameters can achieve a better understanding of language and generate more sophisticated responses, but they also require more resources to train and run.

The purpose of releasing Gemma is to democratize access to state of the art Artificial Intelligence that is trained to be safe and responsible out of the box, with a toolkit to further optimize it for safety.

Gemma By DeepMind

The model is developed to be lightweight and efficient which makes it ideal for getting it into the hands of more end users.

Google’s official announcement noted the following key points:

  • “We’re releasing model weights in two sizes: Gemma 2B and Gemma 7B. Each size is released with pre-trained and instruction-tuned variants.
  • A new Responsible Generative AI Toolkit provides guidance and essential tools for creating safer AI applications with Gemma.
  • We’re providing toolchains for inference and supervised fine-tuning (SFT) across all major frameworks: JAX, PyTorch, and TensorFlow through native Keras 3.0.
  • Ready-to-use Colab and Kaggle notebooks, alongside integration with popular tools such as Hugging Face, MaxText, NVIDIA NeMo and TensorRT-LLM, make it easy to get started with Gemma.
  • Pre-trained and instruction-tuned Gemma models can run on your laptop, workstation, or Google Cloud with easy deployment on Vertex AI and Google Kubernetes Engine (GKE).
  • Optimization across multiple AI hardware platforms ensures industry-leading performance, including NVIDIA GPUs and Google Cloud TPUs.
  • Terms of use permit responsible commercial usage and distribution for all organizations, regardless of size.”

Analysis Of Gemma

According to an analysis by Awni Hannun, a machine learning research scientist at Apple, Gemma is optimized to be highly efficient in a way that makes it suitable for use in low-resource environments.

Hannun observed that Gemma has a vocabulary of 250,000 (250k) tokens versus 32k for comparable models. The importance of that is that Gemma can recognize and process a wider variety of words, allowing it to handle tasks with complex language. His analysis suggests that this extensive vocabulary enhances the model’s versatility across different types of content. He also believes that it may help with math, code and other modalities.

It was also noted that the “embedding weights” are massive (750 million). The embedding weights are a reference to the parameters that help in mapping words to representations of their meanings and relationships.

An important feature he called out is that the embedding weights, which encode detailed information about word meanings and relationships, are used not just in processing the input but also in generating the model’s output. This sharing improves the efficiency of the model by allowing it to better leverage its understanding of language when producing text.

For end users, this means more accurate, relevant, and contextually appropriate responses (content) from the model, which improves its use in content generation as well as for chatbots and translations.

He tweeted:

“The vocab is massive compared to other open source models: 250K vs 32k for Mistral 7B

Maybe helps a lot with math / code / other modalities with a heavy tail of symbols.

Also the embedding weights are big (~750M params), so they get shared with the output head.”
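
To make the idea of sharing embedding weights with the output head more concrete, here is a toy sketch in plain Python. The embedding width is only what the two figures in the tweet imply when divided, not an official specification, and the three-token table is a made-up example rather than anything from Gemma itself.

```python
# Figures quoted in the tweet above.
VOCAB_SIZE = 250_000             # "250K" tokens
EMBEDDING_PARAMS = 750_000_000   # "~750M params"

# Implied embedding width: an inference from the two figures, not a published spec.
implied_width = EMBEDDING_PARAMS // VOCAB_SIZE


def embed(token_id, table):
    """Input side: look up the embedding row for a token."""
    return table[token_id]


def output_logits(hidden, table):
    """Output side: score every token by dot product with the SAME table."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in table]


# Toy table with 3 tokens and width 2 (hypothetical values for illustration).
toy_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = embed(2, toy_table)
logits = output_logits(hidden, toy_table)
```

Because one table serves both roles, the model stores those parameters once instead of keeping a separate output matrix of the same size, which matters when the table is as large as the tweet describes.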

In a follow-up tweet he also noted an optimization in training that translates into potentially more accurate and refined model responses, as it enables the model to learn and adapt more effectively during the training phase.

He tweeted:

“The RMS norm weight has a unit offset.

Instead of “x * weight” they do “x * (1 + weight)”.

I assume this is a training optimization. Usually the weight is initialized to 1 but likely they initialize close to 0. Similar to every other parameter.”
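
The unit offset described in the tweet can be sketched in a few lines. The function below is a simplified, hypothetical RMS normalization written in plain Python, not Gemma’s actual implementation. It shows that a weight of 0 with the offset gives the same result as a weight of 1 without it.

```python
import math


def rms_norm(x, weight, eps=1e-6, unit_offset=False):
    """Scale x by its root mean square, then by weight (or 1 + weight)."""
    mean_square = sum(value * value for value in x) / len(x)
    inverse_rms = 1.0 / math.sqrt(mean_square + eps)
    return [
        value * inverse_rms * ((1.0 + w) if unit_offset else w)
        for value, w in zip(x, weight)
    ]


activations = [1.0, 2.0, 3.0]

# Conventional form: "x * weight" with the weight initialized to 1.
standard = rms_norm(activations, [1.0, 1.0, 1.0])

# Offset form: "x * (1 + weight)" with the weight initialized to 0.
offset = rms_norm(activations, [0.0, 0.0, 0.0], unit_offset=True)
```

Both calls produce the same values, which is consistent with the tweet’s guess that the offset lets the weight start near zero like other parameters.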

He followed up that there are more optimizations in data and training but that those two factors are what especially stood out.

Designed To Be Safe And Responsible

An important feature is that it is designed from the ground up to be safe, which makes it ideal for deployment. Training data was filtered to remove personal and sensitive information. Google also used reinforcement learning from human feedback (RLHF) to train the model for responsible behavior.

It was further debugged with manual red-teaming and automated testing, and checked for capabilities for unwanted and dangerous activities.

Google also released a toolkit for helping end-users further improve safety:

“We’re also releasing a new Responsible Generative AI Toolkit together with Gemma to help developers and researchers prioritize building safe and responsible AI applications. The toolkit includes:

  • Safety classification: We provide a novel methodology for building robust safety classifiers with minimal examples.
  • Debugging: A model debugging tool helps you investigate Gemma’s behavior and address potential issues.
  • Guidance: You can access best practices for model builders based on Google’s experience in developing and deploying large language models.”

Read Google’s official announcement:

Gemma: Introducing new state-of-the-art open models

Featured Image by Shutterstock/Photo For Everything


