SEO
Build Your Own SEO AnswerBox With GPT-3 Codex & Streamlit
Google has integrated a new method of querying data in GA4 whereby you simply type a question or a phrase, and it creates the dashboard.
Imagine if you could do the same with your own SEO data – what would that do for your productivity?
In this article, you’ll learn how to configure your own dashboards using phrases and questions, creating an AnswerBox of your own with GPT-3 Codex and Streamlit.
Google’s Answer Box
In Google Analytics v4, you’ve probably noticed a clever search bar that allows you to get:
- Instant answers.
- Reports.
- Answers on configuration.
- Help.
Instant answers are particularly useful. By asking questions about your data, you get answers and – most importantly – ready-to-use reports.
There’s nothing magical about it. This technology relies on natural language processing (NLP), so you have to be precise about metrics, dimensions, and timing when asking for an answer.
For example, you can search for [conversions last week from the United States] and see the results in the search panel that appears on the right.
This new way of using a data visualization tool is incredibly powerful and will surely be integrated into all solutions of this type.
The time savings for the user are impressive, as users don’t have to search through all the views of the tool and no longer have to configure the view.
Everything is done automatically based on the instructions provided in the search box.
Can we easily do the same thing? And which data should we use?
Smart SEO Dashboard
Before requesting a report, you need to think about the important data to be taken into account.
I suggest you look into the concept of the Smart SEO Dashboard.
- The first requirement is to keep the graphs simple and specific. Less is always more.
- Next, the axes (abscissa and ordinate) must refer to measurable data. Otherwise, it is impossible to see how the data evolves.
- In addition, graphs must focus on meaningful parameters. It is useless to monitor parameters that will have no influence on your activity. Weather is an excellent example: it plays a crucial role on some sites and none on others.
- Dashboards should always include relevant summaries in order to be quickly read and understood. Generally speaking, if it takes more than three seconds to understand a dashboard, it can definitely be improved.
- Finally, the most important data is time. It is imperative to track time data by comparing each day, month, year, etc.
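As a minimal sketch of the time comparison described in the last point, the following standard-library Python sums daily clicks per month and computes the change against the previous month in the data. The function names and the sample figures are made up for illustration; they are not part of any tool mentioned in this article.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_clicks):
    """Sum a {date: clicks} mapping into {(year, month): clicks}."""
    totals = defaultdict(int)
    for day, clicks in daily_clicks.items():
        totals[(day.year, day.month)] += clicks
    return dict(totals)


def month_over_month(totals):
    """Return {(year, month): percent change vs. the previous month present in the data}."""
    changes = {}
    months = sorted(totals)
    for previous, current in zip(months, months[1:]):
        if totals[previous]:  # skip months with zero clicks to avoid dividing by zero
            changes[current] = 100.0 * (totals[current] - totals[previous]) / totals[previous]
    return changes


# Made-up sample data, for illustration only.
sample = {
    date(2022, 1, 10): 100,
    date(2022, 1, 20): 100,
    date(2022, 2, 5): 150,
    date(2022, 2, 25): 100,
}
totals = monthly_totals(sample)
changes = month_over_month(totals)
```

With this sample, January totals 200 clicks and February 250, so February shows a 25% increase. The same pattern works for days or years by changing the grouping key.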
Now, you need to identify the best technology to generate this type of dashboard.
GPT-3 Codex
GPT-3 Codex is a computer code generator that was created in August 2021.
Access to GPT-3 Codex was given much faster than access to GPT-3.
Not surprisingly, GPT-3 Codex was trained on millions of source files publicly available on GitHub, drawn from more than 54 million repositories.
Like GPT-3, it is a sophisticated neural network that is capable of self-learning.
GPT-3 Codex does not only work in Python. You can also generate code in Go, JavaScript, Perl, and PHP.
On the other hand, GPT-3 Codex has only 12 billion parameters, unlike its big brother GPT-3 Davinci, which has 175 billion.
Let’s take a closer look at this size-versus-cost ratio.
OpenAI’s experiments show that the size-versus-performance ratio of Codex follows a logarithmic scale.
This means that the performance gains gradually decrease as the size of the model increases.
Therefore, the additional costs of collecting data, training, and running a larger model are not at all worth the slight increase in performance.
All of these reasons explain why the model has only 12 billion parameters for its first version.
We’ve found an AI to generate the code.
Now let’s look for the best framework available at the moment to execute it all in a friendly interface with clicks and drag-and-drop.
Streamlit
Streamlit is an open source technology that allows you to quickly build very advanced user interfaces.
Streamlit also includes many very useful components to have even more interactions like:
- Session management.
- Password management.
- User management.
The community is particularly active and shares many handy custom modules for SEO.
To start, we’ll use GPT-3 Codex to generate graphs with Streamlit, and then attempt to produce a Streamlit app that generates the code and runs it automatically.
Two Excellent Examples For Querying The Data
First, we need to generate an app for Streamlit and run it.
1. With OpenAI (Semi-Automatic)
The first thing to generate is a Streamlit web app that retrieves all the logs from the month of May 1995 from NASA and displays the number of URLs crawled per day.
First of all, we need to retrieve the CSV file by specifying the name of the columns and the format if necessary.
For our example, it is important that the date is in UTC format.
Then you can ask OpenAI to display the graph of your choice, once it has understood your data.
From these instructions, you will have working code.
Remember that we don’t want to copy and paste code, but drive everything through English instructions with a no-code approach.
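For reference, here is a hedged sketch of the kind of code such instructions could produce. It is not the code generated in the original experiment. It assumes a CSV with a `date` column holding ISO 8601 timestamps and a `url` column; both column names are assumptions. The counting logic uses only the standard library, and Streamlit and pandas are imported only inside `render_app()`.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def urls_per_day(csv_text, date_column="date"):
    """Count log rows per UTC calendar day from CSV text with ISO 8601 timestamps."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        stamp = datetime.fromisoformat(row[date_column])
        if stamp.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            stamp = stamp.replace(tzinfo=timezone.utc)
        counts[stamp.astimezone(timezone.utc).date().isoformat()] += 1
    return dict(sorted(counts.items()))


def render_app():
    """Streamlit interface: upload a CSV and chart the number of rows per day."""
    # Imported lazily so urls_per_day() works without Streamlit installed.
    import pandas as pd
    import streamlit as st

    st.title("URLs crawled per day")
    uploaded = st.file_uploader("Upload a log CSV", type="csv")
    if uploaded is not None:
        counts = urls_per_day(uploaded.getvalue().decode("utf-8"))
        frame = pd.DataFrame({"urls": list(counts.values())}, index=list(counts.keys()))
        st.bar_chart(frame)
```

To try the interface, add a `render_app()` call at the end of the file and launch it with `streamlit run`. Normalizing every timestamp to UTC before grouping is what keeps the per-day counts consistent when the log mixes time zones.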
2. With Streamlit (Full Automatic)
Here is an open source example based on one of Streamlit’s applications.
It is an app directly connected to GPT-3 Codex that generates computer code and allows you to execute it.
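The generate-then-execute loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the code of the app itself: `build_prompt` and `run_generated_code` are hypothetical helper names, no API call is made, and the `generated` string stands in for whatever the model would return.

```python
def build_prompt(columns, instruction):
    """Turn a column description and an English instruction into a code-generation prompt."""
    schema = ", ".join(f"{name} ({dtype})" for name, dtype in columns.items())
    return (
        "# Python 3\n"
        f"# `rows` is a list of dicts with the columns: {schema}\n"
        f"# Task: {instruction}\n"
        "# Store the answer in a variable named `result`.\n"
    )


def run_generated_code(code, rows):
    """Execute generated code with `rows` in scope and return its `result` variable."""
    namespace = {"rows": rows}
    # exec() runs arbitrary code: never do this with untrusted output outside a sandbox.
    exec(code, namespace)
    return namespace.get("result")


# Stand-in for model output; a real app would send `prompt` to the code model instead.
generated = "result = sum(row['clicks'] for row in rows)"
rows = [{"query": "seo", "clicks": 3}, {"query": "codex", "clicks": 5}]
prompt = build_prompt({"query": "str", "clicks": "int"}, "Sum the clicks column.")
total = run_generated_code(generated, rows)
```

Describing the columns and their types in the prompt is what lets the model write code against data it has never seen.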
With Charly Wargnier, we did the same thing but for SEO use cases in an app called “Codex for SEO”.
In one click, you can import your data.
Then, you can describe the content of the imported file: What are the columns? What are the data types?
Then you specify your instructions.
In our example, we ask it to group the queries together and to sum the clicks and impressions.
We’ll tell it to keep only the columns we’re interested in (the Queries and Clicks columns), and then click the Execute button.
No line of code is needed to get the results, and everything is generated by the OpenAI Codex and executed by Streamlit.
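For readers who want to see what those instructions amount to, here is a minimal standard-library sketch of the same aggregation. It is not the code produced by Codex in the app; the column names `query`, `clicks`, and `impressions` and the sample rows are assumptions.

```python
import csv
import io
from collections import defaultdict


def group_queries(csv_text):
    """Group rows by query and sum clicks and impressions."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["query"]]["clicks"] += int(row["clicks"])
        totals[row["query"]]["impressions"] += int(row["impressions"])
    return dict(totals)


def keep_columns(totals, columns=("clicks",)):
    """Return (query, value, ...) tuples containing only the requested metric columns."""
    return [(query, *(metrics[c] for c in columns)) for query, metrics in totals.items()]


# Made-up sample export, for illustration only.
sample = (
    "query,page,clicks,impressions\n"
    "seo tools,/a,10,100\n"
    "seo tools,/b,5,50\n"
    "codex,/c,2,40\n"
)
totals = group_queries(sample)
table = keep_columns(totals)
```

Here the two "seo tools" rows collapse into a single row with 15 clicks and 150 impressions, and `table` keeps only the query and its clicks.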
Our proof of concept is therefore validated with many different use cases.
Moreover, if you need assistance, everything you need for this is accessible via a training program with 150 minutes of video.
For educational and transparency reasons, we have provided the generated code as well as the results.
And with that, the SEO AnswerBox is now available for everyone to create!
Featured Image: NicoElNino/Shutterstock
Google Revamps Entire Crawler Documentation
Google has launched a major revamp of its crawler documentation, shrinking the main overview page and splitting content into three new, more focused pages. Although the changelog downplays the changes, there is an entirely new section and essentially a rewrite of the entire crawler overview page. The additional pages allow Google to increase the information density of all the crawler pages and improve topical coverage.
What Changed?
Google’s documentation changelog notes two changes, but there are actually many more.
Here are some of the changes:
- Added an updated user agent string for the GoogleProducer crawler
- Added content encoding information
- Added a new section about technical properties
The technical properties section contains entirely new information that didn’t previously exist. There are no changes to the crawler behavior, but by creating three topically specific pages Google is able to add more information to the crawler overview page while simultaneously making it smaller.
This is the new information about content encoding (compression):
“Google’s crawlers and fetchers support the following content encodings (compressions): gzip, deflate, and Brotli (br). The content encodings supported by each Google user agent is advertised in the Accept-Encoding header of each request they make. For example, Accept-Encoding: gzip, deflate, br.”
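To illustrate how a server might act on that header, here is a hedged Python sketch that parses `Accept-Encoding` and compresses a response body accordingly. The function names are hypothetical, q-values are ignored for brevity, and Brotli is left out because Python's standard library does not include it.

```python
import gzip
import zlib


def accepted_encodings(header):
    """Parse an Accept-Encoding header into lowercase encoding names (q-values ignored)."""
    names = []
    for part in header.split(","):
        name = part.split(";")[0].strip().lower()
        if name:
            names.append(name)
    return names


def compress_for(header, body, server_preference=("gzip", "deflate")):
    """Compress `body` with the first server-preferred encoding the client accepts."""
    accepted = accepted_encodings(header)
    for encoding in server_preference:
        if encoding in accepted:
            if encoding == "gzip":
                return "gzip", gzip.compress(body)
            if encoding == "deflate":
                # HTTP "deflate" is the zlib container format, which zlib.compress emits.
                return "deflate", zlib.compress(body)
    return "identity", body  # no shared encoding: send the body uncompressed


encoding, payload = compress_for("gzip, deflate, br", b"<html>hello</html>")
```

The returned encoding name is what the server would send back in the `Content-Encoding` response header.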
There is additional information about crawling over HTTP/1.1 and HTTP/2, plus a statement about their goal being to crawl as many pages as possible without impacting the website server.
What Is The Goal Of The Revamp?
The documentation changed because the overview page had become large, and additional crawler information would have made it even larger. Google decided to break the page into three subtopics so that the crawler-specific content could continue to grow while making room for more general information on the overview page. Spinning off subtopics into their own pages is a brilliant solution to the problem of how best to serve users.
This is how the documentation changelog explains the change:
“The documentation grew very long which limited our ability to extend the content about our crawlers and user-triggered fetchers.
…Reorganized the documentation for Google’s crawlers and user-triggered fetchers. We also added explicit notes about what product each crawler affects, and added a robots.txt snippet for each crawler to demonstrate how to use the user agent tokens. There were no meaningful changes to the content otherwise.”
The changelog downplays the changes by describing them as a reorganization, when in fact the crawler overview is substantially rewritten and three brand-new pages were created.
While the content remains substantially the same, the division of it into sub-topics makes it easier for Google to add more content to the new pages without continuing to grow the original page. The original page, called Overview of Google crawlers and fetchers (user agents), is now truly an overview with more granular content moved to standalone pages.
Google published three new pages:
- Common crawlers
- Special-case crawlers
- User-triggered fetchers
1. Common Crawlers
As the title says, these are common crawlers, some of which are associated with Googlebot, including the Google-InspectionTool, which uses the Googlebot user agent. All of the bots listed on this page obey robots.txt rules.
These are the documented Google crawlers:
- Googlebot
- Googlebot Image
- Googlebot Video
- Googlebot News
- Google StoreBot
- Google-InspectionTool
- GoogleOther
- GoogleOther-Image
- GoogleOther-Video
- Google-CloudVertexBot
- Google-Extended
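Since these crawlers obey robots.txt, their user agent tokens can be used to write rules per crawler. The sketch below tests a made-up robots.txt with Python's standard `urllib.robotparser`. The rules and URLs are examples only, and Python's matching logic is not guaranteed to be identical to Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Example rules, for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blog_for_googlebot = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/post")
private_for_googlebot = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/report")
blog_for_extended = is_allowed(ROBOTS_TXT, "Google-Extended", "https://example.com/blog/post")
```

With these rules, Googlebot may fetch the blog post but not the private report, while Google-Extended is blocked from the whole site.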
2. Special-Case Crawlers
These crawlers are associated with specific products, crawl by agreement with users of those products, and operate from IP addresses that are distinct from the Googlebot crawler IP addresses.
List of Special-Case Crawlers:
- AdSense
  User Agent for Robots.txt: Mediapartners-Google
- AdsBot
  User Agent for Robots.txt: AdsBot-Google
- AdsBot Mobile Web
  User Agent for Robots.txt: AdsBot-Google-Mobile
- APIs-Google
  User Agent for Robots.txt: APIs-Google
- Google-Safety
  User Agent for Robots.txt: Google-Safety
3. User-Triggered Fetchers
The User-triggered Fetchers page covers bots that are activated by user request, explained like this:
“User-triggered fetchers are initiated by users to perform a fetching function within a Google product. For example, Google Site Verifier acts on a user’s request, or a site hosted on Google Cloud (GCP) has a feature that allows the site’s users to retrieve an external RSS feed. Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules. The general technical properties of Google’s crawlers also apply to the user-triggered fetchers.”
The documentation covers the following bots:
- Feedfetcher
- Google Publisher Center
- Google Read Aloud
- Google Site Verifier
Takeaway:
Google’s crawler overview page became overly comprehensive and possibly less useful, because people don’t always need a comprehensive page; they’re often just interested in specific information. The overview page is less specific but also easier to understand. It now serves as an entry point where users can drill down to more specific subtopics related to the three kinds of crawlers.
This change offers insights into how to freshen up a page that might be underperforming because it has become too comprehensive. Breaking out a comprehensive page into standalone pages allows the subtopics to address specific user needs and possibly makes them more useful should they rank in the search results.
I would not say that the change reflects anything in Google’s algorithm; it only reflects how Google updated its documentation to make it more useful and set it up for adding even more information.
Read Google’s New Documentation
Overview of Google crawlers and fetchers (user agents)
List of Google’s common crawlers
List of Google’s special-case crawlers
List of Google user-triggered fetchers
Featured Image by Shutterstock/Cast Of Thousands
Client-Side Vs. Server-Side Rendering
Faster webpage loading times play a big part in user experience and SEO, with page load speed being one of the factors that Google’s algorithm takes into account.
A front-end web developer must decide the best way to render a website so it delivers a fast experience and dynamic content.
Two popular rendering methods include client-side rendering (CSR) and server-side rendering (SSR).
All websites have different requirements, so understanding the difference between client-side and server-side rendering can help you render your website to match your business goals.
Google & JavaScript
Google has extensive documentation on how it handles JavaScript, and Googlers offer insights and answer JavaScript questions regularly through various formats – both official and unofficial.
For example, in a Search Off The Record podcast, it was discussed that Google renders all pages for Search, including JavaScript-heavy ones.
This sparked a substantial conversation on LinkedIn, and another couple of takeaways from both the podcast and the discussions that followed are that:
- Google doesn’t track how expensive it is to render specific pages.
- Google renders all pages to see content – regardless of whether they use JavaScript or not.
The conversation as a whole has helped to dispel many myths and misconceptions about how Google might have approached JavaScript and allocated resources.
Martin Splitt’s full comment on LinkedIn covering this was:
“We don’t keep track of “how expensive was this page for us?” or something. We know that a substantial part of the web uses JavaScript to add, remove, change content on web pages. We just have to render, to see it all. It doesn’t really matter if a page does or does not use JavaScript, because we can only be reasonably sure to see all content once it’s rendered.”
Martin also confirmed that there is a queue, and a potential delay, between crawling and indexing, but not simply because a page does or does not use JavaScript. Nor is it an “opaque” issue in which the presence of JavaScript is the root cause of URLs not being indexed.
General JavaScript Best Practices
Before we get into the client-side versus server-side debate, it’s important that we also follow general best practices for either of these approaches to work:
- Don’t block JavaScript resources through Robots.txt or server rules.
- Avoid render blocking.
- Avoid injecting JavaScript in the DOM.
What Is Client-Side Rendering, And How Does It Work?
Client-side rendering is a relatively new approach to rendering websites.
It became popular when JavaScript libraries started integrating it, with Angular and React.js being some of the best examples of libraries used in this type of rendering.
It works by rendering a website’s JavaScript in your browser rather than on the server.
Instead of delivering all the content in the HTML document, the server responds with a bare-bones HTML document that references the JS files.
While the initial load time is a bit slow, subsequent page loads will be rapid, as they aren’t reliant on a different HTML page per route.
From managing logic to retrieving data from an API, client-rendered sites do everything “independently.” The page is available after the code is executed because every page the user visits and its corresponding URL are created dynamically.
The CSR process is as follows:
- The user enters the URL they wish to visit in the address bar.
- A data request is sent to the server at the specified URL.
- On the client’s first request for the site, the server delivers the static files (CSS and HTML) to the client’s browser.
- The client browser will download the HTML content first, followed by JavaScript. The HTML references the JavaScript and starts the loading process by displaying the loading symbols that the developer has defined. At this stage, the website’s content is still not visible to the user.
- After the JavaScript is downloaded, content is dynamically generated on the client’s browser.
- The web content becomes visible as the client navigates and interacts with the website.
What Is Server-Side Rendering, And How Does It Work?
Server-side rendering is the more common technique for displaying information on a screen.
The web browser submits a request to the server, which fetches user-specific data to populate the page and sends a fully rendered HTML page to the client.
Every time the user visits a new page on the site, the server will repeat the entire process.
Here’s how the SSR process goes step-by-step:
- The user enters the URL they wish to visit in the address bar.
- The server serves a ready-to-be-rendered HTML response to the browser.
- The browser renders the page (now viewable) and downloads JavaScript.
- The browser executes the JavaScript (React, in this example), thus making the page interactive.
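The difference between the two processes shows up in the first HTML response. The following simplified Python sketch builds both kinds of response as strings; the function names, the `/app.js` path, and the sample product names are assumptions for illustration, not a real framework's API.

```python
from html import escape


def render_ssr(title, items):
    """Server-side rendering: the content is already in the HTML the server sends."""
    rows = "".join(f"<li>{escape(item)}</li>" for item in items)
    return (
        f"<html><head><title>{escape(title)}</title></head>"
        f"<body><h1>{escape(title)}</h1><ul>{rows}</ul>"
        '<script src="/app.js"></script></body></html>'
    )


def render_csr_shell(title):
    """Client-side rendering: a bare-bones shell; /app.js builds the content in the browser."""
    return (
        f"<html><head><title>{escape(title)}</title></head>"
        '<body><div id="root">Loading...</div>'
        '<script src="/app.js"></script></body></html>'
    )


ssr_html = render_ssr("Products", ["Red shoes", "Blue hat"])
csr_html = render_csr_shell("Products")
```

In the SSR response, the product names are present before any JavaScript runs. In the CSR shell, they only exist once the browser has downloaded and executed the script.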
What Are The Differences Between Client-Side And Server-Side Rendering?
The main difference between these two rendering approaches is where the page is rendered. CSR shows an empty page before loading, while SSR displays a fully-rendered HTML page on the first load.
This gives server-side rendering a speed advantage over client-side rendering, as the browser doesn’t need to process large JavaScript files. Content is often visible within a couple of milliseconds.
Search engines can also crawl and index server-rendered pages more easily, which is better for SEO, because the text they read in the HTML is exactly what appears in the browser.
However, client-side rendering is a cheaper option for website owners.
It relieves the load on your servers, passing the responsibility of rendering to the client (the bot or user trying to view your page). It also offers rich site interactions by providing fast website interaction after the initial load.
Fewer HTTP requests are made to the server with CSR, unlike in SSR, where each page is rendered from scratch, resulting in a slower transition between pages.
SSR can also buckle under a high server load if the server receives many simultaneous requests from different users.
The drawback of CSR is the longer initial loading time. This can impact SEO; crawlers might not wait for the content to load and exit the site.
Because the HTML of a page is crawled and indexed first and its JavaScript is rendered later, this two-phased approach raises the possibility that JavaScript content is missed and your page is seen as empty. Remember that, in most cases, CSR requires an external library.
When To Use Server-Side Rendering
If you want to improve your Google visibility and rank high in the search engine results pages (SERPs), server-side rendering is the number one choice.
E-learning websites, online marketplaces, and applications with a straightforward user interface with fewer pages, features, and dynamic data all benefit from this type of rendering.
When To Use Client-Side Rendering
Client-side rendering is usually paired with dynamic web apps like social networks or online messengers. This is because these apps’ information constantly changes and must deal with large and dynamic data to perform fast updates to meet user demand.
The focus here is on a rich site with many users, prioritizing the user experience over SEO.
Which Is Better: Server-Side Or Client-Side Rendering?
When determining which approach is best, you need to not only take into consideration your SEO needs but also how the website works for users and delivers value.
Think about your project and how your chosen rendering will impact your position in the SERPs and your website’s user experience.
Generally, CSR is better for dynamic websites, while SSR is best suited for static websites.
Content Refresh Frequency
Websites that feature highly dynamic information, such as gambling or FOREX websites, update their content every second, meaning you’d likely choose CSR over SSR in this scenario – or choose to use CSR for specific landing pages and not all pages, depending on your user acquisition strategy.
SSR is more effective if your site’s content doesn’t require much user interaction. It positively influences accessibility, page load times, SEO, and social media support.
On the other hand, CSR is excellent for providing cost-effective rendering for web applications, and it’s easier to build and maintain; it’s better for First Input Delay (FID).
Another CSR consideration is that meta tags (description, title), canonical URLs, and Hreflang tags should be rendered server-side or presented in the initial HTML response for the crawlers to identify them as soon as possible, and not only appear in the rendered HTML.
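One way to check this is to inspect the raw HTML response before any JavaScript runs. The sketch below uses Python's standard `html.parser` to report which of these tags are present; the class and function names are hypothetical, and it only checks that the tags exist, not that their values are correct.

```python
from html.parser import HTMLParser


class HeadTagAudit(HTMLParser):
    """Record which SEO-critical tags appear in raw (unrendered) HTML."""

    def __init__(self):
        super().__init__()
        self.found = {"title": False, "description": False, "canonical": False, "hreflang": False}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.found["title"] = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.found["description"] = True
        elif tag == "link":
            rel = (attributes.get("rel") or "").lower()
            if rel == "canonical":
                self.found["canonical"] = True
            elif rel == "alternate" and attributes.get("hreflang"):
                self.found["hreflang"] = True


def audit_initial_html(html):
    """Return a dict of tag name -> whether it is present in the initial HTML response."""
    audit = HeadTagAudit()
    audit.feed(html)
    return audit.found
```

Running `audit_initial_html` on the response body fetched without JavaScript execution shows which tags crawlers can see immediately. Any entry that is `False` is only being added at render time, if at all.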
Platform Considerations
CSR technology tends to be more expensive to maintain because the hourly rate for developers skilled in React.js or Node.js is generally higher than that for PHP or WordPress developers.
Additionally, there are fewer ready-made plugins or out-of-the-box solutions available for CSR frameworks compared to the larger plugin ecosystem that WordPress users have access to.
For those considering a headless WordPress setup, such as using Frontity, it’s important to note that you’ll need to hire both React.js developers and PHP developers.
This is because headless WordPress relies on React.js for the front end while still requiring PHP for the back end.
It’s important to remember that not all WordPress plugins are compatible with headless setups, which could limit functionality or require additional custom development.
Website Functionality & Purpose
Sometimes, you don’t have to choose between the two as hybrid solutions are available. Both SSR and CSR can be implemented within a single website or webpage.
For example, in an online marketplace, pages with product descriptions can be rendered on the server, as they are static and need to be easily indexed by search engines.
Staying with ecommerce, if you have high levels of personalization for users on a number of pages, you won’t be able to server-side render the personalized content for bots, so you will need to define some form of default content for Googlebot, which crawls cookieless and stateless.
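A minimal sketch of that fallback might look like the following. The `session_id` cookie name, the lookup table, and the default items are all hypothetical; the point is only that a request without state gets sensible default content.

```python
# Hypothetical default content shown when nothing is known about the visitor.
DEFAULT_RECOMMENDATIONS = ["Best sellers", "New arrivals"]


def recommendations_for(cookies, personalized_lookup):
    """Return personalized items when the session cookie is known, else the default set."""
    session_id = cookies.get("session_id")  # hypothetical cookie name
    if session_id and session_id in personalized_lookup:
        return personalized_lookup[session_id]
    # Cookieless, stateless requests (such as Googlebot's) land here.
    return DEFAULT_RECOMMENDATIONS


known_sessions = {"abc": ["Running shoes"]}
for_bot = recommendations_for({}, known_sessions)
for_user = recommendations_for({"session_id": "abc"}, known_sessions)
```

Because the decision depends on the absence of state rather than on the user agent, bots and first-time visitors receive the same default content.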
Pages like user accounts don’t need to be ranked in the search engine results pages (SERPs), so a CSR approach might be better for UX.
Both CSR and SSR are popular approaches to rendering websites. You and your team need to make this decision at the initial stage of product development.
Featured Image: TippaPatt/Shutterstock
HubSpot Rolls Out AI-Powered Marketing Tools
HubSpot announced a push into AI this week at its annual Inbound marketing conference, launching “Breeze.”
Breeze is an artificial intelligence layer integrated across the company’s marketing, sales, and customer service software.
According to HubSpot, the goal is to provide marketers with easier, faster, and more unified solutions as digital channels become oversaturated.
Karen Ng, VP of Product at HubSpot, tells Search Engine Journal in an interview:
“We’re trying to create really powerful tools for marketers to rise above the noise that’s happening now with a lot of this AI-generated content. We might help you generate titles or a blog content…but we do expect kind of a human there to be a co-assist in that.”
Breeze AI Covers Copilot, Workflow Agents, Data Enrichment
The Breeze layer includes three main components.
Breeze Copilot
An AI assistant that provides personalized recommendations and suggestions based on data in HubSpot’s CRM.
Ng explained:
“It’s a chat-based AI companion that assists with tasks everywhere – in HubSpot, the browser, and mobile.”
Breeze Agents
A set of four agents that can automate entire workflows like content generation, social media campaigns, prospecting, and customer support without human input.
Ng added the following context:
“Agents allow you to automate a lot of those workflows. But it’s still, you know, we might generate for you a content backlog. But taking a look at that content backlog, and knowing what you publish is still a really important key of it right now.”
Breeze Intelligence
Combines HubSpot customer data with third-party sources to build richer profiles.
Ng stated:
“It’s really important that we’re bringing together data that can be trusted. We know your AI is really only as good as the data that it’s actually trained on.”
Addressing AI Content Quality
While prioritizing AI-driven productivity, Ng acknowledged the need for human oversight of AI content:
“We really do need eyes on it still…We think of that content generation as still human-assisted.”
Marketing Hub Updates
Beyond Breeze, HubSpot is updating Marketing Hub with tools like:
- Content Remix to repurpose videos into clips, audio, blogs, and more.
- AI video creation via integration with HeyGen
- YouTube and Instagram Reels publishing
- Improved marketing analytics and attribution
The announcements signal HubSpot’s AI-driven vision for unifying customer data.
But as Ng tells us, “We definitely think a lot about the data sources…and then also understand your business.”
HubSpot’s updates are rolling out now, with some in public beta.
Featured Image: Poetra.RH/Shutterstock