

What Is Latent Semantic Indexing and Why It Doesn’t Matter for SEO via @martinibuster



Many claims are made for Latent Semantic Indexing (LSI) and “LSI Keywords” for SEO.

Some even say that Google relies on “LSI keywords” for understanding webpages.

This has been discussed for nearly twenty years and the evidence-based facts have been there the entire time.

This Is Latent Semantic Indexing

Latent semantic indexing (also referred to as Latent Semantic Analysis) is a method of analyzing a set of documents to discover statistical co-occurrences of words, which in turn give insight into the topics of those words and documents.

Two of the problems (among several) that LSI sets out to solve are the issues of synonymy and polysemy.

Synonymy refers to the way different words can describe the same thing.

A search for “flapjack recipes” is equivalent to a search for “pancake recipes” (outside of the UK) because flapjacks and pancakes are synonymous.




Polysemy refers to words and phrases that have more than one meaning. The word jaguar can mean an animal, automobile, or an American football team.

LSI is able to predict which meaning a word represents by statistically analyzing the words that co-occur with it in a document.

If the word “jaguar” is accompanied in a document by the word “Jacksonville,” it is statistically probable that the word “jaguar” is a reference to an American football team.

By understanding how words occur together, a computer is better able to answer a query by correctly associating the right keywords to the search query.
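The co-occurrence idea can be illustrated with a minimal Python sketch. This is not LSI itself, which derives word associations from unlabeled documents using singular value decomposition. It is a simplified, hand-labeled stand-in that shows only the intuition. The snippets, sense labels, and function names below are all hypothetical.

```python
from collections import Counter

# Hypothetical snippets, each labeled with the sense of "jaguar" it uses.
LABELED_SNIPPETS = [
    ("team", "jaguar jacksonville football quarterback season"),
    ("team", "jacksonville jaguar touchdown football stadium"),
    ("animal", "jaguar rainforest predator cat prey"),
    ("animal", "jaguar jungle cat habitat prey"),
    ("car", "jaguar sedan engine dealership luxury"),
    ("car", "jaguar engine convertible luxury mileage"),
]


def build_cooccurrence(snippets, target="jaguar"):
    """Count, per sense, the words that appear alongside the target word."""
    counts = {}
    for sense, text in snippets:
        words = [w for w in text.split() if w != target]
        counts.setdefault(sense, Counter()).update(words)
    return counts


def guess_sense(document, counts):
    """Pick the sense whose co-occurring words overlap most with the document."""
    words = document.lower().split()
    scores = {sense: sum(c[w] for w in words) for sense, c in counts.items()}
    return max(scores, key=scores.get)


counts = build_cooccurrence(LABELED_SNIPPETS)
print(guess_sense("the jaguar offense struggled in jacksonville", counts))  # prints "team"
```

The word “Jacksonville” appears only alongside the football sense in the sample data, so a document containing it is scored as being about the team.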

The patent for LSI was filed on September 15, 1988. It’s an old technology that came years before the internet as we know it existed.

LSI is not new nor is it cutting edge.

It is important to understand that in 1988, LSI was advancing the state of the art of simple text matching.

LSI preceded the internet and was created during a time when Apple computers looked like this:


image of an Apple Macintosh SE computer from 1988


LSI was created when a popular business computer (IBM AS/400) looked like this:

Image of an IBM AS400 computer from 1988


LSI is a technology that goes way back.



Just like computers from 1988, the state of the art in Information Retrieval has come a long way over the past 30+ years.

LSI is Not Practical for the Web

A major shortcoming of using Latent Semantic Indexing for the entire web is that the statistical analysis has to be recalculated every time a new webpage is published and indexed.


This shortcoming is mentioned in a 2003 (non-Google) research paper about using LSI for detecting email spam (Using Latent Semantic Indexing to Filter Spam PDF).

The research paper notes:


“One issue with LSI is that it does not support the ad-hoc addition of new documents once the semantic set has been generated. Any update to any cell value will change the coefficient in every other word vector, as SVD uses all linear relations in its assigned dimensionality to induce vectors that will predict every text samples in which the word occurs…”

I asked Bill Slawski about the unsuitability of LSI for search engine information retrieval and he agreed, saying:

“LSI is an older indexing approach developed for smaller static databases. There are similarities with newer technologies such as the use of word vectors or word2Vec.

One of the limitations of LSI is that if new content is added to a corpus that indexing for the entire corpus is required, which makes it of limited usefulness for a quickly changing corpus such as the Web.”
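The recalculation problem described above can be seen in a small Python sketch. It builds a term-document matrix for a hypothetical three-document corpus and computes only the dominant term vector (the top left singular vector), using power iteration in place of a full SVD. The corpus, vocabulary, and function names are assumptions made for illustration; a real LSI system would keep many dimensions and work on a far larger matrix.

```python
import math


def term_doc_matrix(docs, vocab):
    """Rows are terms, columns are documents; cells are raw term counts."""
    return [[doc.split().count(term) for doc in docs] for term in vocab]


def top_term_vector(matrix, iterations=200):
    """Dominant left singular vector, via power iteration on A * A^T."""
    n = len(matrix)
    gram = [[sum(x * y for x, y in zip(matrix[i], matrix[j])) for j in range(n)]
            for i in range(n)]
    vec = [1.0] * n
    for _ in range(iterations):
        vec = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


vocab = ["jaguar", "jacksonville", "football", "car", "engine"]
corpus = [
    "jaguar jacksonville football",
    "jaguar car engine",
    "football jacksonville",
]
new_doc = "jaguar engine car"  # hypothetical newly published page

before = dict(zip(vocab, top_term_vector(term_doc_matrix(corpus, vocab))))
after = dict(zip(vocab, top_term_vector(term_doc_matrix(corpus + [new_doc], vocab))))

for term in vocab:
    print(f"{term:13s} before={before[term]:.3f} after={after[term]:.3f}")
```

The new document never mentions “football,” yet the coefficient for “football” still shifts once the document is added, because the decomposition is computed over the whole matrix at once. Scaled to the web, that means reprocessing the corpus whenever a page is published.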

Is There a Google LSI Keywords Research Paper?

Some in the search community believe Google uses “LSI Keywords” in their search algorithm as if LSI is still a cutting-edge technology.

To prove it, some refer to a 2016 research paper called, Improving Semantic Topic Clustering for Search Queries with Word Co-occurrence and Bigraph Co-clustering (PDF).

That research paper is absolutely not an example of Latent Semantic Indexing. It’s a completely different technology.

In fact, that research paper is so not about LSI (a.k.a. Latent Semantic Analysis) that it cites a 1999 LSI research paper ([5] T. Hofmann. Probabilistic latent semantic indexing. …1999) as part of an explanation of why LSI is not useful for the problem the authors are trying to solve.




Here’s what it says:

“Latent dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) are widely used techniques to unveil latent themes in text data. …These models learn the hidden topics by implicitly taking advantage of document level word co-occurrence patterns.

Short texts however – such as search queries, tweets or instant messages – suffer from data sparsity, which causes problems for traditional topic modeling techniques.”

It’s a mistake to use the above research paper as proof that Google uses LSI as an important ranking factor. The paper is not about LSI and it’s not even about analyzing webpages.

It’s an interesting research paper from 2016 about data mining short search queries in order to understand what they mean.

That research paper aside, we know that Google uses BERT and neural matching technologies to understand search queries in the real world.

Long story short: the use of that research paper to make a definitive statement about Google’s ranking algorithm is sketchy all around.




Does Google Use LSI Keywords?

In search marketing, there are two kinds of trustworthy and authoritative data:

  1. Factual ideas that are based on public documents like research papers and patents.
  2. SEO ideas that are based on what Googlers have revealed.

Everything else is mere opinion.

It’s important to know the difference.

Google’s John Mueller has been straightforward about debunking the concept of LSI Keywords.

Noted search patent expert Bill Slawski has also been outspoken about the notion of Latent Semantic Indexing and SEO.

Bill’s statements on LSI are based on a deep knowledge of Google’s algorithms, which he has shared in fact-based articles.




Bill Slawski Tweets His Informed Opinion on Latent Semantic Indexing

Why Google Is Associated with Latent Semantic Analysis

Despite there not being any proof in terms of patents and research papers that LSI/LSA are important ranking-related factors, Google is still associated with Latent Semantic Indexing.

One reason for this is Google’s 2003 acquisition of a company called Applied Semantics.

Applied Semantics had created a technology called Circa. Circa was a semantic analysis algorithm that was used in AdSense and also in Google AdWords.




According to Google’s press release:

“Applied Semantics is a proven innovator in semantic text processing and online advertising,” said Sergey Brin, Google’s co-founder and president of Technology. “This acquisition will enable Google to create new technologies that make online advertising more useful to users, publishers, and advertisers alike.

Applied Semantics’ products are based on its patented CIRCA technology, which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval. A key application of the CIRCA technology is Applied Semantics’ AdSense product that enables web publishers to understand the key themes on web pages to deliver highly relevant and targeted advertisements.”

Semantic Analysis & SEO

The phrase “Semantic Analysis” was a hot buzzword in the early 2000s, perhaps partially driven by Ask Jeeves’ semantic search technology.

Google’s purchase of Applied Semantics accelerated the trend of associating Google with Latent Semantic Indexing, despite there being no credible evidence.



Thus, by 2005 the search marketing community was making unsubstantiated statements such as this:

“For several months I’ve noticed changes in website rankings on Google and it was clear something had changed in their algorithm.

One of the most important changes is the likelihood that Google is now giving more weight to Latent Semantic Indexing (LSI).


This should come as no surprise considering Google purchased Applied Semantics in April 2003 and has reportedly been serving up their AdSense ads using latent semantic indexing.”

The SEO myth that Google uses LSI Keywords quite possibly originated from the popularity of phrases like “Semantic Analysis,” “Semantic Indexing” and “Semantic Search” having become SEO buzzwords, given life by Ask Jeeves’ semantic search technology and Google’s purchase of semantic analysis company Applied Semantics.

The Facts About Latent Semantic Indexing

LSI is a very old method of understanding what a document is about.

It was patented in 1988, well before the internet as we know it existed.



The nature of LSI makes it unsuitable for applying across the entire internet for purposes of information retrieval.

There are no research papers that explicitly show that latent semantic indexing is an important feature of Google search ranking.


The facts presented in this article show that this has been the case since the early 2000s.

Rumors of Google’s use of LSI and LSA surfaced in 2003 after Google acquired Applied Semantics, the company that produced the contextual advertising product AdSense.

Yet Googlers have affirmed multiple times that Google uses no such thing as LSI Keywords.

Let me say it again louder for those at the back: There is no such thing as LSI Keywords.

Considering the overwhelming amount of evidence, it is reasonable to assert that it is a fact that the concept of LSI Keywords is false.

The facts also indicate that LSI is not an important part of Google’s ranking algorithms.

Regarded in the light of recent advancements in AI, natural language processing, and BERT, the idea that Google would prominently use LSI as a ranking feature is literally beyond belief and ridiculous.




Featured image by the author


The Definitive Guide To Podcast Intros




Podcast intros are an important quality of a successful podcast.

The right intro sets the podcast on a path to success.

These seven tips will help your podcast build an audience and retain it:

  1. Hook the listeners fast.
  2. Make every second of the podcast intro count.
  3. A good podcast intro builds audience retention.
  4. Test podcast intros for audience retention.
  5. Three things a podcast intro must communicate.
  6. Podcast intro builds loyalty.
  7. Where to get music for a podcast.

Let’s dig into each one and see how you can put it to work for your podcast.

1. Hook The Listeners Fast

Erin Sparks of Edge of the Web Radio podcast says that there is a subtle but important value in the podcast intro when it comes to what he calls, “click browsing.”

Erin suggests that the intro functions like a hook – to grab the listener’s attention and immediately intrigue them.

He shares this insight:

“The audio ‘hook’ is important to podcast click browsing. Walking through a podcast app, people will click and listen to 7-10 seconds to hear if they ‘feel’ the show.

Much different than any other medium.”


Chris Brogan of Making the Brand podcast agrees that a podcast intro should be short.

He shares these insights on the qualities of a useful podcast intro:

“I’m a huge fan of brief. Once you hear it more than twice, it’s boring to everyone.

An intro should set the mental stage for what’s coming up.

Choose music and words that emulate the show.”

2. Make Every Second Of The Podcast Intro Count

Jorge Hermida, Program Director at WMR.FM and Cannabis Radio Podcasts, observes that it’s important to give listeners a reason to stick around for the podcast but to do it in the shortest amount of time possible.

He says there is absolutely no time to waste within your podcast intro so it’s super important to literally make every second count.

He shares:

“Podcast listeners, just like anybody else, have a short attention span.

You have to give listeners a reason to listen to your content within the first 30 seconds.


Whether you create a cold opener or you run down what you’re going to be talking about on the program, you need to satisfy that listener immediately.

Create the intro as if every listener has a short attention span because in my professional experience, they will either stay and listen to your show, or they’ll drop off and find another show to listen to.”

3. Podcast Intro Builds Audience Retention

Azeem Ahmad of the Azeem Digital SEO podcast shares that a good podcast intro will help maintain audience retention, as well as encourage engagement and loyalty.

This is an element of conversion theory, where even seemingly trivial elements can encourage or discourage the action we are looking for.

A classic example is a PPC arbitrage marketer who maximizes the number of sales for every click.

Affiliate PPC marketers succeed or go out of business fast depending on how well they convert every visitor.

This person discovered that detecting the mobile device and adding an “iPhone friendly” or “Android friendly” badge measurably increased their conversion rates.

The follow-up insight Azeem suggests is similar.


He said that a podcast intro has the same effect of encouraging a user to click and stay for the podcast or to leave.

And for that reason, it’s important to view the intro as a configurable asset that can be used to improve audience retention.

Azeem shares how a podcast intro is important for retention rates and engagement:

“People will get bored with repetition, and regardless of your podcast format – the idea is to engage the listener.

If you lose them within the first 30 seconds, you will very likely see a drop in retention rate and engaged listeners.”

4. Test Podcast Intros For Audience Retention

Azeem next shares that a way to improve retention and engagement is to experiment with new intros and outros.

He shares this tip:

“As a host you should change this up sometimes.

Customizing the intro every time is basically an option to test for what works the best.

For example, you could test asking people to subscribe in the intro vs. the outro for a few episodes and see which drives more growth.”


5. Three Things A Podcast Intro Must Communicate

Sparks offers useful information about what should be communicated in a podcast introduction.

He shares how the introduction should communicate the “What’s in it for me?” proposition to the listener.

Answering the question “What’s in it for me?” is a tried-and-true principle, and it is a great way to think about how to create a podcast intro that is useful for the listener.

Applying that approach reminds listeners of why they are there, which could be to become better at what they do, to catch up on industry news, to be entertained, etc.

Here is what Sparks shares:

“A good intro provides:

  1. A promise to the listener in the first five to seven seconds (a transaction of knowledge communicating what they are going to get).
  2. Sonic branding.
  3. Credibility, contextual reference to subject matter expertise.”

6. Podcast Intro Builds Loyalty

Jim Hedger, the co-host of the popular Webcology SEO podcast, suggests that the podcast intro helps to build a sense of familiarity and ownership of a space.

I’ve noticed that people tend to feel a sense of ownership in a website they enjoy, perhaps because the site might be a part of their self-identity as a baker, sportsperson, or whatever the topic is.

Ever walk into a favorite restaurant and immediately receive a feeling of comfort or anticipation?

It’s a sense of ownership of an experience, that this experience is yours and it’s yours yet again.


Hedger says that a podcast intro can have a similar effect, to bring a sense of comfort and anticipation that one feels in physical spaces that one feels loyal and connected to.

He observes:

“I once read that people aren’t loyal to restaurants as much as they are loyal to spaces they feel comfortable being in.

The same can be said for podcasts.

Like radio, podcasts are a theater of the mind. Your intro is the breath that first forms the space you, your guests, and the audience will create together.

Podcasts are incredibly intimate. I think you need to feel love for your audience and deeply respect the topic and your introduction is your first chance to establish that.

A host’s job is to help the audience develop a zone in which they and the host are virtually in the same place.”

7. Where To Get Music For A Podcast Intro

Something to keep in mind is that any music used should be licensed.

There is an idea that it’s okay to use just a little bit of someone else’s music, but that might not be the case.


And if that’s the direction you are moving in, then it may be prudent to check with an attorney first.

The podcasting professionals consulted for this article all agree that it’s important to purchase a license for the right to any music used within a podcast.

Everyone agrees that it’s best to license royalty-free podcast intro music because this safeguards against copyright infringement claims.

Hermida shares:

“Our music is licensed, and most other podcasts most likely use some kind of licensed music from other licensed music providers for some original music that’s not prone to any copyright issues.

It doesn’t really matter where the music comes from, except that I would always recommend to make sure you use music that you are allowed to use and that license to use the music is documented and can be proven.”

Sparks also recommends paying for a license to use music:

“We have a number of music licenses that we have used over the years.

We highly recommend reviewing different sound repositories and utilizing them to create that sonic brand.

Places to license music are Envato Elements, Epidemic Sound, and the like.


We also have a continual license with our deep voice announcer, our voice over talent.

That should also be something to consider when you’re developing a long-term show.”

Brogan recommends:

“Epidemic Sound works fine. Buy a license.”

Always read the license when choosing a digital music asset in order to be aware of what you can and can’t do with the music and for how long you are entitled to use it.

  • Epidemic Sound – Several of the podcasters mentioned Epidemic Sound as a good place to purchase a license for music.
  • Envato Elements is a source for high-quality licensed, royalty-free music suitable for a podcast intro.
  • Shutterstock Music – Shutterstock is known for its stock photography library, but they also offer royalty-free music specifically for podcasts. A license that’s appropriate for use in a podcast costs $49.
  • Music Bakery offers royalty-free music where you pay for it once and can use it anywhere, but be sure to read the license agreement to know exactly what you are paying for.
  • InstantMusicNow offers digital downloads starting at $4.95.
  • Adobe Stock Music Library – Adobe offers royalty-free music that can be used in multiple projects.

Podcast Intros Are Important

At this point, it should be clear that a seemingly trivial thing like a podcast intro is actually part of the foundation of a successful podcast.

Clearly, the content of the podcast is the most important quality of a podcast.

Yet, as important as the content is, it’s the podcast intro that sets the stage and makes listeners feel they have arrived at their happy place, while also communicating what is in it for the listener, which encourages them to stick around for the content.


Featured Image: Alex from the Rock/Shutterstock


