SEO
How To Use IndexNow API With Python For Bulk Indexing
IndexNow is a protocol developed by Microsoft Bing and adopted by Yandex that enables webmasters and SEO pros to notify search engines via an API when a webpage has been updated.
And today, Microsoft announced that it is making the protocol easier to implement by ensuring that submitted URLs are shared between search engines.
Given its positive implications and the promise of a faster indexing experience for publishers, the IndexNow API should be on every SEO professional’s radar.
Using Python to automate URL submission to the IndexNow API, including bulk URL submissions in a single request, can make managing IndexNow more efficient.
In this tutorial, you’ll learn how to do just that, with step-by-step instructions for using the IndexNow API to submit URLs to Microsoft Bing in bulk with Python.
Note: The IndexNow API is similar to Google’s Indexing API, with one key difference: the Google Indexing API only supports pages with job postings or livestream videos (BroadcastEvent structured data embedded in a VideoObject).
Google announced that it would test the IndexNow protocol but hasn’t shared an update since.
Bulk Indexing Using IndexNow API with Python: Getting Started
Below are the Python packages and libraries used in this IndexNow API tutorial.
- Advertools (must).
- Pandas (must).
- Requests (must).
- Time (optional).
- JSON (optional).
Before getting started, reading the basics can help you better understand this IndexNow API and Python tutorial. We will use an API key and a .txt file for authentication, along with specific HTTP headers.
1. Import The Python Libraries
To use the necessary Python libraries, we will use the “import” statement.
- Advertools will be used for sitemap URL extraction.
- Requests will be used for making the GET and POST requests.
- Pandas will be used for taking the URLs in the sitemap into a list object.
- The “time” module helps prevent a “Too Many Requests” (HTTP 429) error with its “sleep()” function.
- JSON is for building or modifying the POST request’s JSON object if needed.
Below, you will find all of the necessary import lines for the IndexNow API tutorial.
import advertools as adv
import pandas as pd
import requests
import json
import time
2. Extracting The Sitemap URLs With Python
To extract the URLs from a sitemap file, different web scraping methods and libraries can be used such as Requests or Scrapy.
But to keep things simple and efficient, I will use my favorite Python SEO package – Advertools.
With only a single line of code, all of the URLs within a sitemap can be extracted.
sitemap_urls = adv.sitemap_to_df("https://www.example.com/sitemap_index.xml")
The “sitemap_to_df” method of the Advertools can extract all the URLs and other sitemap-related tags such as “lastmod” or “priority.”
Below, you can see the output of the “adv.sitemap_to_df” command.
All of the URLs and dates are specified within the “sitemap_urls” variable.
Since sitemaps are useful sources for search engines and SEOs, Advertools’ sitemap_to_df method can be used for many other tasks, including sitemap audits with Python.
But that’s a topic for another time.
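If you would rather not add the Advertools dependency, the same extraction can be sketched with Python’s standard library. This is a minimal sketch, assuming you already have the XML of a single <urlset> sitemap as a string; sitemap index files such as sitemap_index.xml only list child sitemaps, which you would fetch and parse one by one. The helper name parse_urlset is hypothetical, and its optional since filter uses the lastmod tag to keep only new or recently changed pages.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import List, Optional, Tuple

# Namespace defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_urlset(xml_text: str,
                 since: Optional[date] = None) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs from a single <urlset> sitemap.

    If `since` is given, keep only URLs whose lastmod date is on or after it,
    so that only new or changed pages are sent to IndexNow.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if since is not None:
            # lastmod uses W3C Datetime; its first 10 characters are YYYY-MM-DD
            if not lastmod or date.fromisoformat(lastmod[:10]) < since:
                continue
        rows.append((loc, lastmod or None))
    return rows
```

To work on a live sitemap, you could pass requests.get(sitemap_url).text to the function, where sitemap_url points to a single urlset file.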
3. Take The URLs Into A List Object With “to_list()”
Python’s Pandas library has a method, to_list(), for converting a DataFrame column (a Series) into a list object.
Below is an example usage:
sitemap_urls["loc"].to_list()
Below, you can see the result:
All URLs within the sitemap are in a Python list object.
4. Understand The URL Syntax Of IndexNow API Of Microsoft Bing
Let’s take a look at the URL syntax of the IndexNow API.
Here’s an example:
https://<searchengine>/indexnow?url=url-changed&key=your-key
The URL follows the RFC 3986 URI syntax, and each part has a specific role:
- <searchengine> is the host of the search engine you are submitting to (for example, www.bing.com).
- The “?url=” parameter is the URL that you submit to the search engine via the IndexNow API.
- The “&key=” parameter is your IndexNow API key.
- The “&keyLocation=” parameter is the location of the key’s .txt file on your website, which proves that you own the site. It is optional when the key file sits in the site root.
The “&keyLocation” will bring us to the API Key and its “.txt” version.
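Query-string values need percent-encoding, especially when a submitted URL contains its own “?” or “&” characters. Below is a minimal sketch of building the GET URL from the template above with the standard library’s urllib.parse.urlencode. The helper name build_indexnow_get_url is hypothetical, and the sketch assumes keyLocation can be omitted when the key file sits in the site root.

```python
from typing import Optional
from urllib.parse import urlencode


def build_indexnow_get_url(search_engine: str, page_url: str, key: str,
                           key_location: Optional[str] = None) -> str:
    """Build https://<searchengine>/indexnow?url=...&key=...[&keyLocation=...]."""
    params = {"url": page_url, "key": key}
    if key_location:
        # Only needed when the key file is not at https://<your-host>/<key>.txt
        params["keyLocation"] = key_location
    # urlencode percent-encodes every value, so URLs with their own ? and & stay intact
    return f"https://{search_engine}/indexnow?{urlencode(params)}"
```

For example, build_indexnow_get_url("www.bing.com", "https://www.example.com/technical-seo/hreflang/", key, location) returns a ready-to-request endpoint string.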
5. Gather The API Key For IndexNow And Upload It To The Root
You’ll need a valid key to use the IndexNow API.
Use this link to generate the Microsoft Bing IndexNow API Key.
Clicking the “Generate” button creates an IndexNow API Key.
When you click on the download button, it will download the “.txt” version of the IndexNow API Key.
The API key is used both as the TXT file’s name and as its only content.
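If you prefer to generate the key locally instead of with Bing’s generator, here is a minimal sketch. It assumes the key rules in the IndexNow documentation (8 to 128 characters made of letters, digits, and dashes) and uses uuid4().hex, which yields a 32-character hex key like the one used in this tutorial. The helper names create_key_file and is_valid_key are hypothetical.

```python
import re
import uuid
from pathlib import Path
from typing import Optional

# Assumed key format: 8-128 characters from a-z, A-Z, 0-9 and "-"
KEY_PATTERN = re.compile(r"[a-zA-Z0-9-]{8,128}")


def is_valid_key(key: str) -> bool:
    """Check a key against the IndexNow character and length rules."""
    return KEY_PATTERN.fullmatch(key) is not None


def create_key_file(directory: str, key: Optional[str] = None) -> Path:
    """Write <key>.txt containing only the key; generate a random key if none is given."""
    if key is None:
        key = uuid.uuid4().hex  # 32 lowercase hex characters
    if not is_valid_key(key):
        raise ValueError(f"Invalid IndexNow key: {key!r}")
    path = Path(directory) / f"{key}.txt"
    path.write_text(key, encoding="utf-8")
    return path
```

The resulting file is the one you would upload to the root of your web server.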
The next step is uploading this TXT file to the root of the website’s server.
Since I use FileZilla for FTP, I uploaded it to my web server’s root.
The next step is writing a simple for loop to submit all of the URLs within the sitemap.
6. Submit The URLs Within The Sitemap With Python To IndexNow API
To submit a single URL to IndexNow, a single “requests.get()” call is enough. But to make it more useful, we will use a for loop.
To submit URLs in bulk to the IndexNow API with Python, follow the steps below:
- Create a key variable with the IndexNow API Key value.
- Replace the <searchengine> section with the search engine that you want to submit URLs to (Microsoft Bing or Yandex, for now).
- Assign all of the URLs from the sitemap, as a list, to a variable.
- Assign the URL of the “txt” key file in the web server’s root to a location variable.
- Place the URL, key, and key location URL into the request’s query string.
- Start your for loop, and call “requests.get()” for every URL within the sitemap.
Below, you can see the implementation:
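key = "22bc7c564b334f38b0b1ed90eec8f2c5"
location = f"https://www.example.com/{key}.txt"  # URL of the key file in the site root
endpoint = "https://www.bing.com/indexnow"
url = sitemap_urls["loc"].to_list()

for i in url:
    # params= lets requests percent-encode the URL, key and key location
    response = requests.get(endpoint, params={"url": i, "key": key, "keyLocation": location})
    print(i)
    print(response.url)  # full request URL (after any redirects)
    print(response.status_code, response.content)
    # time.sleep(5)  # uncomment to wait 5 seconds between requests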
If you’re concerned about sending too many requests to the IndexNow API, you can use the Python time module to make the script wait between every request.
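A slightly more defensive version of that loop can be sketched as follows. It is a sketch under stated assumptions: send stands for any function that submits one URL and returns the HTTP status code, and doubling the wait after each 429 Too Many Requests response is a common back-off choice, not something the IndexNow documentation prescribes. The function name submit_one_by_one is hypothetical.

```python
import time
from typing import Callable, Dict, Iterable


def submit_one_by_one(urls: Iterable[str], send: Callable[[str], int],
                      pause_seconds: float = 5.0, max_retries: int = 3) -> Dict[str, int]:
    """Submit URLs one at a time, pausing between requests.

    On HTTP 429 (Too Many Requests) the same URL is retried after a wait
    that doubles each time, up to max_retries extra attempts.
    Returns the last status code seen for each URL.
    """
    results: Dict[str, int] = {}
    for url in urls:
        wait = pause_seconds
        for attempt in range(max_retries + 1):
            status = send(url)
            if status != 429 or attempt == max_retries:
                break
            time.sleep(wait)  # back off before retrying this URL
            wait *= 2
        results[url] = status
        time.sleep(pause_seconds)  # pause before the next URL
    return results
```

For example, send could be lambda u: requests.get("https://www.bing.com/indexnow", params={"url": u, "key": key, "keyLocation": location}).status_code, reusing the key and location variables from the loop above.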
Here you can see the output of the script:
The 200 Status Code means that the request was successful.
With the for loop, I submitted 194 URLs to Microsoft Bing.
According to the IndexNow Documentation, the HTTP 200 Response Code signals that the search engine is aware of the change in the content or the new content. But it doesn’t necessarily guarantee indexing.
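When a loop prints many status codes, a small lookup table can make the output easier to read. The meanings below are paraphrased from the response-code table in the IndexNow documentation at the time of writing, so check the current documentation before relying on them; INDEXNOW_RESPONSES and explain_status are hypothetical names.

```python
# Meanings paraphrased from the IndexNow documentation's response-code table
INDEXNOW_RESPONSES = {
    200: "OK: URL submitted successfully (not a guarantee of indexing)",
    202: "Accepted: URL received, key validation pending",
    400: "Bad request: invalid format",
    403: "Forbidden: key not valid (e.g. key file missing or content mismatch)",
    422: "Unprocessable Entity: URLs don't belong to the host or key doesn't match the schema",
    429: "Too Many Requests: potential spam, slow down",
}


def explain_status(status_code: int) -> str:
    """Return a human-readable meaning for an IndexNow HTTP status code."""
    return INDEXNOW_RESPONSES.get(status_code, f"Unexpected status code {status_code}")
```

Inside the loop, print(response.status_code, explain_status(response.status_code)) would label each response.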
For instance, I have used the same script for another website. After 120 seconds, Microsoft Bing says that 31 results are found. And conveniently, it shows four pages.
The only problem is that the first page shows only two results, and it says the URLs are blocked by robots.txt even though the block was removed before submission.
This can happen if robots.txt was changed to unblock some URLs shortly before using the IndexNow API, because Bing appears not to re-check robots.txt right away.
Thus, if you previously blocked those URLs, Bing may try to index them while still using the cached version of your robots.txt file.
On the second page, there is only one result:
On the third page, there are no results, and Bing only offers to translate the query in the search bar.
When I checked Google Analytics, it showed that Bing still hadn’t crawled or indexed the website. I confirmed this by checking the log files as well.
Below, you will see the Bing Webmaster Tools report for the example website:
It says that I submitted 38 URLs.
The next step will involve the bulk request with the POST Method and a JSON object.
7. Perform An HTTP POST Request To The IndexNow API
To perform an HTTP POST request to the IndexNow API for a set of URLs, a JSON object with specific properties should be used.
- “host” is the host name of your website (the site the submitted URLs belong to), not the search engine.
- “key” is the API key.
- “keyLocation” is the location of the API key’s txt file on the web server.
- “urlList” is the set of URLs that will be submitted to the IndexNow API.
- The headers are the POST request headers; the JSON body is sent with “Content-Type: application/json; charset=utf-8”.
Since this is a POST request, “requests.post()” will be used instead of “requests.get()”, and the request goes to the search engine’s /indexnow endpoint.
Below, you will find an example of a set of URLs submitted to Microsoft Bing’s IndexNow API.
data = {
    "host": "www.example.com",  # host of the submitted URLs (your site), not the search engine
    "key": "22bc7c564b334f38b0b1ed90eec8f2c5",
    "keyLocation": "https://www.example.com/22bc7c564b334f38b0b1ed90eec8f2c5.txt",
    "urlList": [
        'https://www.example.com/technical-seo/http-header/',
        'https://www.example.com/python-seo/nltk/lemmatize',
        'https://www.example.com/pagespeed/broser-hints/preload',
        'https://www.example.com/python-seo/nltk/stemming',
        'https://www.example.com/python-seo/categorize-queries/',
        'https://www.example.com/python-seo/nltk/tokenization',
        'https://www.example.com/review/oncrawl/',
        'https://www.example.com/technical-seo/hreflang/',
        'https://www.example.com/technical-seo/multilingual-seo/'
    ]
}
headers = {"Content-Type": "application/json; charset=utf-8"}
# json= serializes the dictionary into a JSON request body
r = requests.post("https://www.bing.com/indexnow", json=data, headers=headers)
r.status_code, r.content
In the example above, we performed a POST request to submit a set of URLs for indexing.
We passed the “data” dictionary to the “json” parameter of requests.post, which serializes it into a JSON body, and the headers object to the “headers” parameter.
Since we POST a JSON object, the request should carry the “Content-Type: application/json; charset=utf-8” header.
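A single POST request accepts at most 10,000 URLs, so larger sites need to split their URL list. The sketch below builds one JSON body per batch; build_payloads is a hypothetical helper, and it assumes every URL belongs to the same host, since the “host” field describes your site rather than the search engine.

```python
from typing import Dict, List
from urllib.parse import urlparse

MAX_URLS_PER_REQUEST = 10_000  # IndexNow limit per POST request


def build_payloads(urls: List[str], key: str, key_location: str,
                   batch_size: int = MAX_URLS_PER_REQUEST) -> List[Dict]:
    """Split URLs into IndexNow JSON bodies of at most batch_size URLs each."""
    if not urls:
        return []
    hosts = {urlparse(u).netloc for u in urls}
    if len(hosts) != 1:
        raise ValueError(f"All URLs must share one host, got: {sorted(hosts)}")
    host = hosts.pop()  # your site's host, not the search engine's
    return [
        {"host": host, "key": key, "keyLocation": key_location,
         "urlList": urls[start:start + batch_size]}
        for start in range(0, len(urls), batch_size)
    ]
```

Each payload could then be sent with requests.post("https://www.bing.com/indexnow", json=payload, headers=headers).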
135 seconds after I made the POST request, my live log-file analysis dashboard started to show hits from Bingbot.
8. Create Custom Functions For The IndexNow API To Save Time
Creating custom functions for the IndexNow API reduces the time spent preparing code for each submission.
Thus, I have created two custom Python functions for the IndexNow API: one for bulk requests and one for individual requests.
Below, you will find an example for only the bulk requests to the IndexNow API.
The custom function for bulk requests is called “submit_url_set.”
You only need to fill in the parameters to use it.
import requests
from urllib.parse import urlparse

def submit_url_set(set_: list, key, location, host="https://www.bing.com/indexnow",
                   headers={"Content-Type": "application/json; charset=utf-8"}):
    data = {
        "host": urlparse(set_[0]).netloc,  # your site's host, taken from the first URL
        "key": key,
        "keyLocation": location,
        "urlList": set_,
    }
    # host is the search engine's IndexNow endpoint; json= sends data as a JSON body
    r = requests.post(host, json=data, headers=headers)
    return r.status_code
An explanation of this custom function:
- The “set_” parameter provides a list of URLs.
- The “key” parameter provides an IndexNow API key.
- The “location” parameter provides the location of the IndexNow API key’s txt file on the web server.
- “host” provides the search engine’s IndexNow endpoint.
- “headers” provides the headers that the IndexNow API requires.
I have defined some of the parameters with default values, such as “host” for Microsoft Bing. If you want to use it for Yandex, pass Yandex’s endpoint (for example, host="https://yandex.com/indexnow") when calling the function.
Below is an example usage:
submit_url_set(set_=sitemap_urls["loc"].to_list(), key="22bc7c564b334f38b0b1ed90eec8f2c5", location="https://www.example.com/22bc7c564b334f38b0b1ed90eec8f2c5.txt")
If you want to extract sitemap URLs with a different method, or if you want to use the IndexNow API for a different URL set, you will need to change “set_” parameter value.
Below, you will see an example of the Custom Python function for the IndexNow API for only individual requests.
def submit_url(urls, location, key="22bc7c564b334f38b0b1ed90eec8f2c5"):
    for i in urls:
        # params= lets requests percent-encode the URL, key and key location
        response = requests.get("https://www.bing.com/indexnow",
                                params={"url": i, "key": key, "keyLocation": location})
        print(i)
        print(response.url)
        print(response.status_code, response.content)
        # time.sleep(5)  # uncomment to wait 5 seconds between requests
Since this function uses a for loop, it submits the URLs one by one. The search engine may prioritize these requests differently.
Bulk requests often include less important URLs, so individual requests might be seen as more deliberate.
If you want the functions to extract the sitemap URLs themselves, call Advertools’ sitemap_to_df inside them.
Tips For Using The IndexNow API With Python
An Overview of How The IndexNow API Works, Capabilities & Uses
- The IndexNow API doesn’t guarantee that your website or the URLs that you submitted will be indexed.
- You should only submit URLs that are new or for which the content has changed.
- The IndexNow API impacts the crawl budget.
- Microsoft Bing applies a threshold for URL content quality and calculates the crawl need for each URL. If a submitted URL is not good enough, Bing may not crawl it.
- You can submit up to 10,000 URLs per POST request.
- The IndexNow documentation recommends submitting URLs even if the website is small.
- Submitting the same pages many times within a day can lead the search engine to stop crawling the redundant URLs or even the whole source.
- The IndexNow API is useful for sites where the content changes frequently, like every 10 minutes.
- IndexNow API is useful for pages that are gone and are returning a 404 response code. It lets the search engine know that the URLs are gone.
- IndexNow API can be used for notifying of new 301 or 302 redirects.
- The 200 Status Response Code means that the search engine is aware of the submitted URL.
- The 429 Status Code means that you made too many requests to the IndexNow API.
- If you put a “txt” file that contains the IndexNow API Key into a subfolder, the IndexNow API can be used only for that subfolder.
- If you have two different CMSs, you can use two different IndexNow API keys for two different site sections.
- Subdomains need to use a different IndexNow API key.
- Even if you already use a sitemap, using IndexNow API is useful because it efficiently tells the search engines of website changes and reduces unnecessary bot crawling.
- All search engines that adopt the IndexNow API (Microsoft Bing and Yandex) share the URLs that are submitted between each other.
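The subfolder and subdomain rules above come down to a URL prefix check: a key file covers the URLs that start with the folder containing it. A minimal sketch, assuming a plain string prefix comparison as described in the IndexNow documentation’s key-location example; key_scope_prefix and in_key_scope are hypothetical helper names.

```python
def key_scope_prefix(key_location: str) -> str:
    """URL prefix covered by a key file: everything in the folder that holds it."""
    return key_location.rsplit("/", 1)[0] + "/"


def in_key_scope(page_url: str, key_location: str) -> bool:
    """True if page_url can be submitted with the key stored at key_location.

    A different host or subdomain never matches, so it needs its own key.
    """
    return page_url.startswith(key_scope_prefix(key_location))
```

Filtering a URL list with [u for u in urls if in_key_scope(u, location)] before submission avoids sending URLs that the key does not cover.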
In this IndexNow API tutorial and guideline with Python, we have examined a new search engine technology.
Instead of waiting to be crawled, publishers can notify the search engines to crawl when there is a need.
IndexNow reduces the use of search engine data center resources, and now you know how to use Python to make the process more efficient, too.
More resources:
An Introduction To Python & Machine Learning For Technical SEO
How to Use Python to Monitor & Measure Website Performance
Advanced Technical SEO: A Complete Guide
Featured Image: metamorworks/Shutterstock
Google Declares It The “Gemini Era” As Revenue Grows 15%
Alphabet Inc., Google’s parent company, announced its first quarter 2024 financial results today.
While Google reported double-digit growth in key revenue areas, the focus was on its AI developments, dubbed the “Gemini era” by CEO Sundar Pichai.
The Numbers: 15% Revenue Growth, Operating Margins Expand
Alphabet reported Q1 revenues of $80.5 billion, a 15% increase year-over-year, exceeding Wall Street’s projections.
Net income was $23.7 billion, with diluted earnings per share of $1.89. Operating margins expanded to 32%, up from 25% in the prior year.
Ruth Porat, Alphabet’s President and CFO, stated:
“Our strong financial results reflect revenue strength across the company and ongoing efforts to durably reengineer our cost base.”
Google’s core advertising units, such as Search and YouTube, drove growth. Google advertising revenues hit $61.7 billion for the quarter.
The Cloud division also maintained momentum, with revenues of $9.6 billion, up 28% year-over-year.
Pichai highlighted that YouTube and Cloud are expected to exit 2024 at a combined $100 billion annual revenue run rate.
Generative AI Integration in Search
Google experimented with AI-powered features in Search Labs before recently introducing AI overviews into the main search results page.
Regarding the gradual rollout, Pichai states:
“We are being measured in how we do this, focusing on areas where gen AI can improve the Search experience, while also prioritizing traffic to websites and merchants.”
Pichai reports that Google’s generative AI features have already answered billions of queries:
“We’ve already served billions of queries with our generative AI features. It’s enabling people to access new information, to ask questions in new ways, and to ask more complex questions.”
Google reports increased Search usage and user satisfaction among those interacting with the new AI overview results.
The company also highlighted its “Circle to Search” feature on Android, which allows users to circle objects on their screen or in videos to get instant AI-powered answers via Google Lens.
Reorganizing For The “Gemini Era”
As part of the AI roadmap, Alphabet is consolidating all teams building AI models under the Google DeepMind umbrella.
Pichai revealed that, through hardware and software improvements, the company has reduced machine costs associated with its generative AI search results by 80% over the past year.
He states:
“Our data centers are some of the most high-performing, secure, reliable and efficient in the world. We’ve developed new AI models and algorithms that are more than one hundred times more efficient than they were 18 months ago.”
How Will Google Make Money With AI?
Alphabet sees opportunities to monetize AI through its advertising products, Cloud offerings, and subscription services.
Google is integrating Gemini into ad products like Performance Max. The company’s Cloud division is bringing “the best of Google AI” to enterprise customers worldwide.
Google One, the company’s subscription service, surpassed 100 million paid subscribers in Q1 and introduced a new premium plan featuring advanced generative AI capabilities powered by Gemini models.
Future Outlook
Pichai outlined six key advantages positioning Alphabet to lead the “next wave of AI innovation”:
- Research leadership in AI breakthroughs like the multimodal Gemini model
- Robust AI infrastructure and custom TPU chips
- Integrating generative AI into Search to enhance the user experience
- A global product footprint reaching billions
- Streamlined teams and improved execution velocity
- Multiple revenue streams to monetize AI through advertising and cloud
With upcoming events like Google I/O and Google Marketing Live, the company is expected to share further updates on its AI initiatives and product roadmap.
Featured Image: Sergei Elagin/Shutterstock
brightonSEO Live Blog
Hello everyone. It’s April again, so I’m back in Brighton for another two days of sun, sea, and SEO! Being the introvert I am, my idea of fun isn’t hanging around our booth all day explaining we’ve run out of t-shirts (seriously, you need to be fast if you want swag!). So I decided to do something useful and live-blog the event instead.
Follow below for talk takeaways and (very) mildly humorous commentary.
Google Further Postpones Third-Party Cookie Deprecation In Chrome
Google has again delayed its plan to phase out third-party cookies in the Chrome web browser. The latest postponement comes after ongoing challenges in reconciling feedback from industry stakeholders and regulators.
The announcement was made in the joint quarterly report from Google and the UK’s Competition and Markets Authority (CMA) on the Privacy Sandbox initiative, scheduled for release on April 26.
Chrome’s Third-Party Cookie Phaseout Pushed To 2025
Google states it “will not complete third-party cookie deprecation during the second half of Q4” this year as planned.
Instead, the tech giant aims to begin deprecating third-party cookies in Chrome “starting early next year,” assuming an agreement can be reached with the CMA and the UK’s Information Commissioner’s Office (ICO).
The statement reads:
“We recognize that there are ongoing challenges related to reconciling divergent feedback from the industry, regulators and developers, and will continue to engage closely with the entire ecosystem. It’s also critical that the CMA has sufficient time to review all evidence, including results from industry tests, which the CMA has asked market participants to provide by the end of June.”
Continued Engagement With Regulators
Google reiterated its commitment to “engaging closely with the CMA and ICO” throughout the process and hopes to conclude discussions this year.
This marks the third delay to Google’s plan to deprecate third-party cookies, initially aiming for a Q3 2023 phaseout before pushing it back to late 2024.
The postponements reflect the challenges in transitioning away from cross-site user tracking while balancing privacy and advertiser interests.
Transition Period & Impact
In January, Chrome began restricting third-party cookie access for 1% of users globally. This percentage was expected to gradually increase until 100% of users were covered by Q3 2024.
However, the latest delay gives websites and services more time to migrate away from third-party cookie dependencies through Google’s limited “deprecation trials” program.
The trials offer temporary cookie access extensions until December 27, 2024, for non-advertising use cases that can demonstrate direct user impact and functional breakage.
While easing the transition, the trials have strict eligibility rules. Advertising-related services are ineligible, and origins matching known ad-related domains are rejected.
Google states the program aims to address functional issues rather than relieve general data collection inconveniences.
Publisher & Advertiser Implications
The repeated delays highlight the potential disruption for digital publishers and advertisers relying on third-party cookie tracking.
Industry groups have raised concerns that restricting cross-site tracking could push websites toward more opaque privacy-invasive practices.
However, privacy advocates view the phaseout as crucial in preventing covert user profiling across the web.
With the latest postponement, all parties have more time to prepare for the eventual loss of third-party cookies and adopt Google’s proposed Privacy Sandbox APIs as replacements.
Featured Image: Novikov Aleksey/Shutterstock