

How To Build A Recommender System With TF-IDF And NMF (Python)




Topic clusters and recommender systems can help SEO experts to build a scalable internal linking architecture.

And as we know, internal linking can impact both user experience and search rankings. It’s an area we want to get right.

In this article, we will use Wikipedia data to build topic clusters and recommender systems with Python and the Pandas data analysis tool.

To achieve this, we will use the Scikit-learn library, a free software machine learning library for Python, with two main algorithms:

  • TF-IDF: Term frequency-inverse document frequency.
  • NMF: Non-negative matrix factorization, a group of algorithms from multivariate analysis and linear algebra that can be used to analyze high-dimensional data.

Specifically, we will:

  1. Extract all of the links from a Wikipedia article.
  2. Read text from Wikipedia articles.
  3. Create a TF-IDF map.
  4. Split queries into clusters.
  5. Build a recommender system.

Here is an example of topic clusters that you will be able to build: 

Screenshot from Pandas, February 2022

Moreover, here’s the overview of the recommender system that you can recreate.

Example of a recommender system in pandas. Screenshot from Pandas, February 2022

Ready? Let’s get a few definitions and concepts you’ll want to know out of the way first.

The Difference Between Topic Clusters & Recommender Systems

Topic clusters and recommender systems can be built in different ways.

In this case, the former is grouped by IDF weights and the latter by cosine similarity. 

In simple SEO terms:

  • Topic clusters can help to create an architecture where all articles are linked to.
  • Recommender systems can help to create an architecture where the most relevant pages are linked to.

What Is TF-IDF?

TF-IDF, or term frequency-inverse document frequency, is a figure that expresses how important a given word is to a document, relative to the document collection as a whole.

TF-IDF is calculated by multiplying term frequency and inverse document frequency.

  • TF: Number of times a word appears in a document/number of words in the document.
  • IDF: log(Number of documents / Number of documents that contain the word).

To illustrate this, let’s consider this situation with Machine Learning as a target word:

  • Document A contains the target word 10 times out of 100 words.
  • In the entire corpus, 30 documents out of 200 documents also contain the target word.

Then, the formula would be:

TF-IDF = (10/100) * log(200/30)
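As a quick check of the arithmetic, the formula above can be evaluated in a few lines. This is only a sketch: it assumes the natural logarithm (the base is not stated here, but the IDF examples later in the article are consistent with it), and the variable names are illustrative.

```python
import math

# Figures from the example above
term_count = 10        # occurrences of the target word in Document A
doc_length = 100       # total number of words in Document A
n_docs = 200           # documents in the corpus
docs_with_term = 30    # documents that contain the target word

tf = term_count / doc_length
idf = math.log(n_docs / docs_with_term)  # natural log (assumed)
tf_idf = tf * idf

print(round(tf, 2), round(idf, 3), round(tf_idf, 3))  # 0.1 1.897 0.19
```

Under that assumption, the target word gets a TF-IDF score of roughly 0.19 in Document A.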

What TF-IDF Is Not

TF-IDF is not something new. It’s not something that you need to optimize for. 

According to John Mueller, it’s an old information retrieval concept that isn’t worth focusing on for SEO.

There is nothing in it that will help you outperform your competitors.

Still, TF-IDF can be useful to SEOs.

Learning how TF-IDF works gives insight into how a computer can interpret human language.

Consequently, one can leverage that understanding to improve the relevancy of the content using similar techniques.

What Is Non-negative Matrix Factorization (NMF)?

Non-negative matrix factorization, or NMF, is a dimension reduction technique often used in unsupervised learning. It approximates a non-negative matrix as the product of two smaller non-negative matrices, so that many original features are summarized by a few combined ones.

In this article, NMF will be used to define the number of topics we want all the articles to be grouped under.
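To make the idea more concrete, here is a minimal sketch of NMF on a toy matrix. The numbers are made up for illustration, and the init, random_state and max_iter settings are assumptions chosen to keep the run reproducible; they are not taken from the tutorial.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative matrix: 4 documents x 5 terms (made-up values)
X = np.array([
    [1.0, 0.8, 0.0, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0, 0.0],
    [0.0, 0.1, 1.0, 0.9, 0.0],
    [0.0, 0.0, 0.8, 1.0, 0.2],
])

# Ask for 2 topics; X is approximated by the product W @ H
model = NMF(n_components=2, init='nndsvda', random_state=0, max_iter=1000)
W = model.fit_transform(X)   # documents x topics
H = model.components_        # topics x terms

print(W.shape, H.shape)      # (4, 2) (2, 5)
print(np.round(W @ H, 1))    # approximate reconstruction of X
```

The number of columns stays the same in H, while the four documents are now described by two topic weights each in W. The same thing happens later in the article, only with 201 articles and 25 topics.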

Definition Of Topic Clusters

Topic clusters are groupings of related terms that can help you create an architecture where all articles are interlinked or on the receiving end of internal links.

Definition Of Recommender Systems

Recommender systems can help to create an architecture where the most relevant pages are linked to.

Building A Topic Cluster

Topic clusters and recommender systems can be built in different ways.

In this case, topic clusters are grouped by IDF weights and recommender systems by cosine similarity.

Extract All The Links From A Specific Wikipedia Article

Extracting links on a Wikipedia page is done in two steps.

First, select a specific subject. In this case, we use the Wikipedia article on machine learning.

Second, use the Wikipedia API to find all the internal links on the article.

Here is how to query the Wikipedia API using the Python requests library.

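import requests

main_subject = 'Machine learning'

# NOTE: the endpoint and the 'generator'/'gpllimit' parameters did not survive
# in this listing; the values below are assumptions based on the public
# MediaWiki Action API.
url = 'https://en.wikipedia.org/w/api.php'
params = {
        'action': 'query',
        'format': 'json',
        'titles': main_subject,
        'generator': 'links',   # assumed: return the pages linked from the title
        'gpllimit': 'max',      # assumed: as many links as one request allows
        }

# Query the API and parse the JSON response
r = requests.get(url, params=params)
r_json = r.json()
linked_pages = r_json['query']['pages']

# Keep only the titles of the linked pages
page_titles = [p['title'] for p in linked_pages.values()]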

The result is a list of the pages linked from the initial article.

All the pages linked. Screenshot from Pandas, February 2022

These links represent each of the entities used for the topic clusters.

Select A Subset Of Articles

For performance purposes, we will select only the first 200 linked articles, plus the main article on machine learning.

# select first X articles
num_articles = 200
pages = page_titles[:num_articles] 

# make sure to keep the main subject on the list
pages += [main_subject] 

# make sure there are no duplicates on the list
pages = list(set(pages))
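One side effect worth knowing: converting to a set does not preserve order, so the article at index 0 can differ from one run to the next. If a stable order matters to you, a possible alternative (a sketch with made-up sample titles, not part of the original code) is to de-duplicate with dict.fromkeys, which keeps the first occurrence of each title in place.

```python
# Made-up sample data standing in for the API result
page_titles = ["Algorithm", "Statistics", "Machine learning", "Algorithm"]
main_subject = "Machine learning"
num_articles = 3

# Select the first X articles and make sure the main subject is included
pages = page_titles[:num_articles] + [main_subject]

# Remove duplicates while keeping the original order
pages = list(dict.fromkeys(pages))

print(pages)  # ['Algorithm', 'Statistics', 'Machine learning']
```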

Read Text From The Wikipedia Articles

Now, we need to extract the content of each article to perform the calculations for the  TF-IDF analysis.

To do so, we will query the API again for each of the pages stored in the pages variable.

From each response, we will store the text from the page and add it to a list called text_db.

Note that you may need to install tqdm and lxml packages to use them.

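import requests
from lxml import html
from tqdm.notebook import tqdm

text_db = []
for page in tqdm(pages):
    # NOTE: the endpoint is an assumption (public MediaWiki Action API);
    # it did not survive in this listing.
    response = requests.get(
            'https://en.wikipedia.org/w/api.php',
            params={
                'action': 'parse',
                'page': page,
                'format': 'json',
            }
        ).json()

    # Extract the text of every paragraph from the rendered HTML
    raw_html = response['parse']['text']['*']
    document = html.document_fromstring(raw_html)
    text = ''
    for p in document.xpath('//p'):
        text += p.text_content()
    text_db.append(text)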

This loop builds a list in which each element represents the text of the corresponding Wikipedia page.

# Print number of articles
print('Number of articles extracted: ', len(text_db))


Number of articles extracted:  201

As we can see, there are 201 articles.

This is because we added the article on “Machine learning” on top of the top 200 links from that page.

Furthermore, we can select the first article (index 0) and read the first 300 characters to gain a better understanding.

# read first 300 characters of 1st article
text_db[0][:300]


'\nBiology is the scientific study of life.[1][2][3] It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field.[1][2][3] For instance, all organisms are made up of cells that process hereditary information encoded in genes, which can '

Create A TF-IDF Map

In this section, we will rely on pandas and TfidfVectorizer to create a Dataframe that contains the bi-grams (two consecutive words) of each article.

Here, we are using TfidfVectorizer.

This is the equivalent of using CountVectorizer followed by TfidfTransformer, which you may see in other tutorials.
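If you want to convince yourself of that equivalence, the sketch below runs both routes on three made-up sentences with default settings and compares the resulting matrices. The sample documents are purely illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer)

# Made-up sample documents
docs = [
    "machine learning builds models from data",
    "deep learning is a subset of machine learning",
    "statistics describes data",
]

# Route 1: a single step
one_step = TfidfVectorizer().fit_transform(docs)

# Route 2: raw counts first, then TF-IDF weighting
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

# Both routes should produce the same weights
print(np.allclose(one_step.toarray(), two_step.toarray()))  # True
```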

In addition, we need to remove the “noise”. In the field of Natural Language Processing, words like “the”, “a”, “I”, “we” are called “stopwords”.

In the English language, stopwords have low relevancy for SEOs and are overrepresented in documents.

Hence, using nltk, we will add a list of English stopwords to the TfidfVectorizer class.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.corpus import stopwords

# Create a list of English stopwords
# (run nltk.download('stopwords') once if the corpus is not installed)
stop_words = stopwords.words('english')

# Instantiate the class
vec = TfidfVectorizer(
    stop_words=stop_words,
    ngram_range=(2,2), # bigrams
    use_idf=True
    )

# Train the model and transform the data
tf_idf = vec.fit_transform(text_db)

# Create a pandas DataFrame
# (get_feature_names_out() replaces get_feature_names() in scikit-learn 1.0+)
df = pd.DataFrame(
    tf_idf.toarray(),
    columns=vec.get_feature_names_out(),
    index=pages
    )

# Show the first lines of the DataFrame
df.head()

TF-IDF pandas result. Screenshot from Pandas, February 2022

In the DataFrame above:

  • Rows are the documents.
  • Columns are the bi-grams (two consecutive words).
  • The values are the TF-IDF weights of each bi-gram in each document.

Word frequencies. Screenshot from Pandas, February 2022

Sort The IDF Vectors

Below, we are sorting the Inverse document frequency vectors by relevance.

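# NOTE: the arguments of this call were cut off in this listing; the completion
# and the sort below are assumptions (vec.idf_ holds the learned IDF weights).
idf_df = pd.DataFrame(
    vec.idf_, index=vec.get_feature_names_out(), columns=['idf_weights'])
idf_df.sort_values(by='idf_weights').head(10)

IDF weights. Screenshot from Pandas, February 2022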

Specifically, the IDF vectors are calculated from the log of the number of articles divided by the number of articles containing each word.

The greater the IDF, the rarer the bi-gram is across the corpus, and the more distinctive it is for the articles that contain it.

The lower the IDF, the more common it is across all articles.

  • 1 mention out of 1 article = log(1/1) = 0.0
  • 1 mention out of 2 articles = log(2/1) = 0.69
  • 1 mention out of 10 articles = log(10/1) = 2.30
  • 1 mention out of 100 articles = log(100/1) = 4.61
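The values in the list above can be reproduced with the natural logarithm, as sketched below. For comparison, the sketch also computes the smoothed variant ln((1 + N) / (1 + df)) + 1, which, to the best of my knowledge, is what TfidfVectorizer applies by default (smooth_idf=True). If so, the weights stored in vec.idf_ will not match the textbook figures exactly. The function names are illustrative.

```python
import math

def textbook_idf(n_docs, docs_with_term):
    """Plain IDF as described above: ln(N / df)."""
    return math.log(n_docs / docs_with_term)

def smoothed_idf(n_docs, docs_with_term):
    """Smoothed variant: ln((1 + N) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + docs_with_term)) + 1

# One article mentions the term, out of N articles in total
for n in (1, 2, 10, 100):
    print(n, round(textbook_idf(n, 1), 2), round(smoothed_idf(n, 1), 2))
```

In both variants the ordering is the same: the fewer articles contain a term, the higher its weight.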

Split Queries Into Clusters Using NMF

Using the tf_idf matrix, we will split queries into topical clusters.

Each cluster will contain closely related bi-grams.

Firstly, we will use NMF to reduce the dimensionality of the matrix into topics.

Simply put, we will group 201 articles into 25 topics.

from sklearn.decomposition import NMF
from sklearn.preprocessing import normalize

# (optional) Disable FutureWarning of Scikit-learn
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)

# select number of topic clusters
n_topics = 25

# Create an NMF instance
nmf = NMF(n_components=n_topics)

# Fit the model to the tf_idf
nmf_features = nmf.fit_transform(tf_idf)

# normalize the features
norm_features = normalize(nmf_features)

We can see that the number of bigrams stays the same, but articles are grouped into topics.

# Compare processed VS unprocessed dataframes
print('Original df: ', df.shape)
print('NMF Processed df: ', nmf.components_.shape)

Secondly, for each of the 25 clusters, we will provide query recommendations.

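# Create a DataFrame from the NMF components (topics x bigrams)
# NOTE: the arguments were cut off in this listing; this completion is an
# assumption, chosen so that k[0] in the loop below yields the bigram.
components = pd.DataFrame(
    nmf.components_,
    columns=[df.columns]
    )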
clusters = {}

# Show top 25 queries for each cluster
for i in range(len(components)):
    clusters[i] = []
    loop = dict(components.loc[i,:].nlargest(25)).items()
    for k,v in loop:
        clusters[i].append({'q':k[0],'sim_score': v})

Thirdly, we will create a data frame that shows the recommendations.

# Create dataframe using the clustered dictionary
grouping = pd.DataFrame(clusters).T
grouping['topic'] = grouping[0].apply(lambda x: x['q'])
grouping.drop(0, axis=1, inplace=True)
grouping.set_index('topic', inplace=True)

def show_queries(df):
    for col in df.columns:
        df[col] = df[col].apply(lambda x: x['q'])
    return df

# Only display the query in the dataframe
clustered_queries = show_queries(grouping)

Finally, the result is a DataFrame showing 25 topics along with the top 25 bigrams for each topic.

Example of a topic cluster in pandas. Screenshot from Pandas, February 2022

Building A Recommender System

Instead of building topic clusters, we will now build a recommender system using the same normalized features from the previous step.

The normalized features are stored in the norm_features variable.

# compute cosine similarities of each cluster
data = {}
# create dataframe
norm_df = pd.DataFrame(norm_features, index=pages)
for page in pages:
    # select page recommendations
    recommendations = norm_df.loc[page,:]

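    # Compute cosine similarity: rows are unit-length, so the dot product
    # with the selected row gives the cosine (the right-hand side was cut
    # off in this listing; this completion is an assumption)
    similarities = norm_df.dot(recommendations)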

    data[page] = []
    loop = dict(similarities.nlargest(20)).items()
    for k, v in loop:
        if k != page:
            data[page].append({'q':k,'sim_score': v})

What the code above does is:

  • Loops through each of the pages selected at the start.
  • Selects the corresponding row in the normalized dataframe.
  • Computes the cosine similarity between that page and every page in the dataframe.
  • Selects the top 20 pages sorted by similarity score, excluding the page itself.

After the execution, we are left with a dictionary of pages containing lists of recommendations sorted by similarity score.
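The reason normalized features are convenient here is that, for unit-length vectors, the dot product and the cosine similarity are the same number. The sketch below illustrates this with three made-up feature vectors; the page names and helper functions are hypothetical and only serve the example.

```python
import math

def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up topic features for three pages
toy_features = {
    "page_a": [3.0, 4.0, 0.0],
    "page_b": [6.0, 8.0, 0.0],   # same direction as page_a
    "page_c": [0.0, 0.0, 5.0],   # unrelated to page_a
}
unit = {name: normalize(v) for name, v in toy_features.items()}

for name, vector in toy_features.items():
    via_dot = dot(unit["page_a"], unit[name])
    via_cosine = cosine_similarity(toy_features["page_a"], vector)
    print(name, round(via_dot, 3), round(via_cosine, 3))
```

Both columns agree: page_b scores 1.0 against page_a because it points in the same direction, while page_c scores 0.0.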

Similarity score. Screenshot from Pandas, February 2022

The next step is to convert that dictionary into a DataFrame.

# convert dictionary to dataframe
recommender = pd.DataFrame(data).T

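def show_queries(df):
    for col in df.columns:
        df[col] = df[col].apply(lambda x: x['q'])
    return df

# Only display the page title in each cell
# (this call is missing from the listing; added here as an assumption)
show_queries(recommender).head()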


The resulting DataFrame shows the parent query along with sorted recommended topics in each column.

Example of a recommender system in pandas. Screenshot from Pandas, February 2022


We are done building our own recommender system and topic cluster.

Interesting Contributions From The SEO Community

I am a big fan of Daniel Heredia, who has also played around with TF-IDF by finding relevant words with TF IDF, textblob, and Python.

Python tutorials can be daunting.

A single article may not be enough.

If that is the case, I encourage you to read Koray Tuğberk GÜBÜR’s tutorial, which presents a similar way to use TF-IDF.

Billy Bonaros also came up with a creative application of TF-IDF in Python and showed how to create a TF-IDF keyword research tool.


In the end, I hope you have learned an approach that can be adapted to any website.

Understanding how topic clusters and recommender systems can help improve a website’s architecture is a valuable skill for any SEO pro wishing to scale their work.

Using Python and Scikit-learn, you have learned how to build your own – and have learned the basics of TF-IDF and of non-negative matrix factorization in the process.

Featured Image: Kateryna Reka/Shutterstock



Top YouTube Videos, Shorts, And Ads of 2022




Examining YouTube’s list of the top trending videos and top Shorts of 2022, as well as the YouTube Ads Leaderboard: 2022 year-end-wrap-up can teach content marketers, content creators, and digital advertisers some important lessons that they can apply in 2023.

But, it helps if you have a secret decoder ring to decipher why there are three lists – and why each one uses a different methodology to come up with the rankings.

YouTube unveiled its first list of the 10 most-watched YouTube videos back in December 2010. Unfortunately, that list taught many marketers that “view count” was the only metric that mattered.

But, I got my secret decoder ring back in October 2012, when YouTube started adjusting the ranking of videos in YouTube search results to reward engaging videos that kept viewers watching.

In other words, YouTube replaced “view count” with “watch time.”

This was a significant shift, because “watch time” gives you a sense of what content viewers actually watch, as opposed to videos that they click on and then abandon.

In December 2012, YouTube shifted from unveiling its 10 “most-watched” videos of the year to unveiling its “top trending videos,” based on time spent watching, sharing, commenting, liking, and other factors.

In other words, “watch time” and “engagements” were now the metrics that mattered.

Today, YouTube’s algorithm rewards “viewer satisfaction.”

In other words, YouTube doesn’t pay attention to videos; it pays attention to viewers.

So, rather than trying to make videos that’ll make an algorithm happy, focus on making videos that make your viewers happy.

This brings us to YouTube’s lists of “trending videos” and “top Shorts” for 2022.

To learn important lessons that can be applied in 2023, we need to realize that YouTube’s discovery system uses both absolute and relative watch time as signals when deciding audience engagement.

Ultimately, YouTube wants both short and long videos to succeed, so relative watch time is more important for short videos, and absolute watch time is more important for longer videos.

Top 7 Trending Videos Of 2022

1. “So Long Nerds” By Technoblade (6:32 long, 88.3 million Views, and 10.2 million engagements)

In this moving tribute, the father of beloved Minecraft creator Technoblade reads a farewell letter from his son.

The gamer lost his battle with cancer in June, but his legacy remains on YouTube.

2. “Watch The Uncensored Moment Will Smith Smacks Chris Rock On Stage At The Oscars, Drops F-bomb” By Guardian News (1:24 long, 104 million Views, and 1.8 million engagements)

It was the smack heard ‘round the world: Academy Award winner Will Smith went off-script and slapped Chris Rock, live on-stage, at the film industry’s most prestigious event.

3. “Hi, I’m Dream” By Dream (5:42 long, 48.5 million Views, and 4.7 million engagements)

Dream’s ingenuity within Minecraft has led him to become a top creator with a devoted fanbase.

But no one knew what he looked like IRL, until now.

4. “Dr. Dre, Snoop Dogg, Eminem, Mary J. Blige, Kendrick Lamar & 50 Cent Full Pepsi SB LVI Halftime Show” By NFL (14:41 long, 146 million Views, and 3.5 million engagements)

Lose yourself in this epic Super Bowl halftime show packed with some of the biggest artists in hip-hop history: Dr. Dre, Snoop Dogg, Eminem, Mary J. Blige, Kendrick Lamar, and 50 Cent.

5. “I Built Willy Wonka’s Chocolate Factory!” By MrBeast (17:01 long, 132 million Views, and 5.1 million engagements)

In a “Willy Wonka” inspired warehouse, MrBeast challenges contestants to traverse a chocolate river, climb a candy wall, compete in confection-themed games, and indulge in their sweetest fantasies.

6. “Pranks Destroy Scam Callers- Glitterbomb Payback” By Mark Rober (26:41 long, 55.9 million Views, and 2.2 million engagements)

Engineer Mark Rober exacts dazzling revenge on a scam call center in the latest version of his glitterbomb series.

7. “Being Not Straight” By Jaiden Animations (15:22 long, 17.8 million Views, and 1.7 million engagements)

In this coming-out video, Jaiden Animations depicts a personal journey from adolescence to adulthood, sharing how they discovered their sexual identity along the way.

Top 7 Shorts Of 2022

1. “Diver Cracks Egg At 45 Ft Deep #Shorts” By Shangerdanger (0:56 long, 251 million Views, and 12.3 million engagements)

The ocean floor is a mysterious place. It’s full of unknown sea creatures, strange plants, and…chicken eggs?!

Join Shangerdanger as he cracks up the internet and dives egg-first into the blue depths.

2. “Sarah Trust Challenges” By Hingaflips (0:31 long, 142 million Views, and 6.5 million engagements)

Better than parkour? This is Trampwall: an epic sport where acrobats defy gravity and leap off a wall, onto a trampoline, to pull off mind-blowing aerial stunts.

3. “Come With Me To Shave My Fluffy Dog! #Doggrooming #Grooming #Goldendoodle” By Brodie That Dood (0:52 long, 108 million Views, and 6.8 million engagements)

For years, his long fluffy fur has made Brodie one of the most iconic dogs on YouTube. So, the heartbreak was real when it was decided that he needed a close trim.

4. “Dave and Busters Bet Me 1000 Tickets I Couldn’t Do This…” By Chris Ivan (0:59 long, 83.6 million Views, and 6.3 million engagements)

No one does trick shots like creator Chris Ivan. In this Short, he attempts to land a plunger on a Dave & Buster’s sign.

The prize? 1,000 tickets … if he can pull it off.

5. “That Gap Between Your Car Seat and Center Console” By Jay & Sharon (0:58 long, 182 million Views, and 6.4 million engagements)

We’ve all lost something in the dreaded gap between the car seat and the center console.

In this comedic sketch, creators Jay & Sharon show us what’s really going on down there.

6. “Welcome To The Stomach #Shorts” By Adrian Bliss (0:34 long, 118 million Views, and 7.0 million engagements)

In this bite-sized skit, witty creator Adrian Bliss brings to life all the characters trying to gain entrance – and party in – his space-limited stomach.

7. “This Magic Trick Explained (America’s Got Talent)” By Zack D. Films (0:34 long, 97.4 million Views, and 5.6 million engagements)

How did he do it? The judges of “America’s Got Talent” were confounded by this magic trick.

But not internet-sleuth Zack D., who unveils its clever secret.

Top 7 YouTube Ads Of 2022

Meanwhile, YouTube uses an entirely different methodology to determine the top YouTube ads for its 2022 year-end wrap-up Leaderboard. This makes sense.

The top ads are generally the ones with the biggest budgets, which drive up view counts, but not always engagements.

1. “Amazon’s Big Game Commercial: Mind Reader” By Amazon (1:31 long, 69.7 million Views, and 25,700 engagements)

The creative agency for this ad was Lucky Generals and the media agency was IPG – Rufus.

The ad’s description asks, “Is Alexa reading minds a good idea? No. No, it is not.”

2. “Welcome To Clan Capital! Clash Of Clans New Update!” By Clash Of Clans (1:20 long, 52.9 million Views, and 212,000 engagements)

The creative agency was Psyop, and the media agency was in-house.

The ad’s description says,

“Welcome to the ultimate clan destination! A place where you and your clan can BUILD and BATTLE together! A place called CLAN CAPITAL!”

3. “Goal Of The Century X BTS | Yet To Come (Hyundai Ver.) Official Music Video” By Hyundaiworldwide (4:08 long, 40.5 million Views, and 886,000 engagements)

The ad’s description says,

“Our ‘Goal of the Century’ can’t be achieved by one individual alone, but we can achieve it if we all join forces and unite.

Just like football players come together as a team to score goals, we aim to use the power of football to go forward together in pursuit of the greatest goal – ‘A united world for sustainability.’”

4. “Harry Potter 20th Anniversary: Return To Hogwarts | Official Trailer | HBO Max” By HBO Max (1:58 long, 27.3 million Views, and 739,000 engagements)

The creative agency was in-house, and the media agency was Hearts & Science.

The ad’s description says,

“Harry Potter 20th Anniversary: Return to Hogwarts invites fans on a magical first-person journey through one of the most beloved film franchises of all time as it reunites Daniel Radcliffe, Rupert Grint, Emma Watson, and other esteemed cast members and filmmakers across all eight Harry Potter films for the first time to celebrate the anniversary of the franchise’s first film, Harry Potter and the Sorcerer’s Stone.”

5. “Introducing iPhone 14 Pro | Apple” by Apple (4:20 long, 23.8 million views, and 571,000 engagements)

The ad’s description asks, “What lies beyond a traditional smartphone? Let’s find out. This is iPhone 14 Pro.”

6. “All of Us Are Dead | Official Trailer | Netflix” by Netflix (2:35 long, 22.6 million views, and 518,000 engagements)

The creative agency was The Refinery, and the media agency was in-house. The ad’s description says,

“‘All of us will die. There is no hope.’ The school turned into a bloody battleground and our friends into worst enemies. Who will make it out alive?”

7. “Sally’s Seashells (Extended) | Big Game Commercial 2022” by Squarespace (1:07 long, 21.6 million views, and 67,600 engagements)

The media agency was in-house. The ad’s description says,

“See everything that Sally sells in this extended cut of our 2022 Big Game commercial. Starring Zendaya as Sally and narrated by André 3000.”

Most Important Lesson That Marketers Can Apply In 2023

Looking back at YouTube’s lists of top trending videos, top Shorts, and top ads for 2022, there is a meta-lesson that marketers can learn: one size does not fit all.

Different metrics matter when measuring different types of video, and different types of ads are better for different marketing objectives.

Or, as the British say, “There are horses for courses.”

Now, that’s a lesson that all of us can apply in 2023, and beyond.


Featured Image: /Shutterstock



Meta Reinstates Trump to Facebook & Instagram




Meta announced that the suspension of former president Trump from Facebook and Instagram will be lifted within a few weeks, with “guardrails” in place to discourage repeat offenses.

2021 Suspension of Trump

Meta indefinitely suspended the account of then-president Trump after he praised the people who engaged in anti-government violence that ended in several deaths.

The suspension was reviewed by the Meta Oversight Board, which concluded that the indefinite suspension was inconsistent with the rules in place for dealing with policy violations.

The Oversight Board wrote:

“…it was not appropriate for Facebook to impose the indeterminate and standardless penalty of indefinite suspension. Facebook’s normal penalties include removing the violating content, imposing a time-bound period of suspension, or permanently disabling the page and account.

The Board insists that Facebook review this matter to determine and justify a proportionate response that is consistent with the rules that are applied to other users of its platform.”

Facebook responded to the board that the suspension would last for two years beginning on January 7, 2021, after which it would be reconsidered.

The suspension remained in place until the announcement that it would be lifted in the weeks following January 25, 2023, just over two years after the suspension began.

Why the Suspension of Trump was Lifted

The review of the suspension was timed for two years after the imposition of the original suspension on January 7, 2021. This was by agreement with the Oversight Board.

Meta undertook a review of whether Trump continued to pose a risk to public safety and decided that enough had changed to lower the risk.

The explanation of the decision indicated that multiple factors were considered:

“To assess whether the serious risk to public safety that existed in January 2021 has sufficiently receded, we have evaluated the current environment according to our Crisis Policy Protocol, which included looking at the conduct of the US 2022 midterm elections, and expert assessments on the current security environment.

Our determination is that the risk has sufficiently receded, and that we should therefore adhere to the two-year timeline we set out.

As such, we will be reinstating Mr. Trump’s Facebook and Instagram accounts in the coming weeks. However, we are doing so with new guardrails in place to deter repeat offenses.”

Facebook Public Figure Penalty Guardrails

Meta published updated policies, Restricting accounts by public figures during civil unrest, that describe the new protocols for dealing with public figures who violate Meta guidelines.

The updated rules apply to both Facebook and Instagram.

The new policies outline tiered penalties increasing in severity depending on the content violations.

Meta explained that the goal of the penalties was to deter violations of its policies.

The penalties range from one to thirty days, and up to two years for especially egregious violations.

Three factors will be considered to determine the severity of the penalty:

  1. “The severity of the violation and the public figure’s history on Facebook or Instagram, including current and past violations.
  2. The public figure’s potential influence over, and relationship to, the individuals engaged in violence.
  3. The severity of the violence and any related physical harm.”


Heightened Penalties

Public figures who return after a suspension will face heightened penalties, including disabling the account of any public figure that fails to respond to repeated warnings.

Meta’s rules target QAnon content and outline specific measures that will limit the reach of penalized public figures.

That means followers of a restricted public figure’s account will not see its posts in their feeds, and reshare buttons may be removed from those posts.

“Our updated protocol also addresses content that does not violate our Community Standards but that contributes to the sort of risk that materialized on January 6, such as content that delegitimizes an upcoming election or is related to QAnon.

We may limit the distribution of such posts, and for repeated instances, may temporarily restrict access to our advertising tools.

This step would mean that content would remain visible on Mr. Trump’s account but would not be distributed in people’s Feeds, even if they follow Mr. Trump.

We may also remove the reshare button from such posts, and may stop them being recommended or run as ads.”

Response to Reinstatement of Trump

Facebook’s announcement stated that they expect to be criticized but that the decision was guided by guidelines set down by the Oversight Board.

The response on social media was predictably passionate, with congressman Adam Schiff characterizing the reinstatement as Facebook having “caved.”

Others accused Facebook of having no rules or procedures even though Meta’s decision was based on rules and procedures.

Many of the top tweets about the Trump reinstatement claimed that Facebook’s decision was based on greed, while others lamented the lack of consequences for Trump’s actions, even though he was punished with a two-year suspension.

Read Meta’s Announcement:

Ending Suspension of Trump’s Accounts With New Guardrails to Deter Repeat Offenses





Wayback Machine: 5 Alternatives To Try




Much of the web is ephemeral.

Web pages exist until they don’t. The content on them exists until it’s updated – and then it’s gone.

Unless you go digging in an archive.

Archiving the web is important for cultural and anthropological research. It’s also helpful for business reasons, like competitive analysis. It can even help document or monitor political processes.

Your particular reason for seeking archived content might determine which service works best.

The Wayback Machine is the most commonly known archive.

Screenshot, January 2023.

The Internet Archive is a nonprofit organization, and the Wayback Machine is the web version of its archive, containing an absolutely massive amount of data.

You can request that it save a webpage in its current state, as well as make use of tools, like an API.

As huge as the Wayback Machine archive is, it’s likely not 100% complete. If you’re having trouble finding something specific or wondering if there are alternatives with more features, these alternatives might help.

I won’t be going over paid SaaS subscriptions, as I don’t consider a paid service a true alternative to a free one provided by a nonprofit.

Let’s go!

1. The Memento Project

Memento is an exceptional alternative to the Wayback Machine because it aggregates several different sources, including the Wayback Machine itself.

On the website, you can access archives from several sources by using the Time Travel tool.

Screenshot, January 2023.

This is the first distinction that makes Memento so cool, and it includes some of the other archives on this list, too. That means it’s a customizable experience and likely one of the most complete.

Memento’s other distinct feature is the Chrome extension that allows you to select the date on which you’d like to view your current page. This brings the tool to where you’re browsing instead of making you put a URL into a form.

You can also create a snapshot of a page and generate a link to it that won’t break. This is particularly useful for citation.

If you’re concerned a page might disappear, or the content might get updated, but you want to use the information, creating one of these links ensures that people will be able to see your original source.

2. is another “snapshot” tool. It allows you to save a link to a page as it currently exists.

Following the link will send users to an unalterable version of the page.

Screenshot, January 2023.

It also features some relatively advanced search queries you can perform on domains and URLs to find snapshots that have been saved with the tool.

This tool also features a Chrome extension as well as an Android app.

Searches on Memento can include results from

3. WebCite

WebCite has powerful applications for authors, journalists, academics, and publishers.

It offers a variety of ways to build and present the archived pages and the URLs.

Screenshot, January 2023.

Unfortunately, at the time of publishing, it doesn’t appear to be taking new requests. But you can still access already archived pages. When and if it starts accepting requests again, it’s a very useful tool for that.

Its most powerful feature for authors and publishers is the ability to upload a manuscript directly to the website.

The tool will scan every link in an uploaded manuscript and automatically create archives of each of the pages linked to as they currently exist. This saves a lot of time if you’ve used a lot of website citations.

If you’ve created content that you want people to be able to create snapshots of, then you can add a specific WebCite link to your page that users can click on. This embeds archive functionality into your page, saving users time if they decide to use your work as a citation.

4. GitHub

GitHub is a development and collaboration platform that also prioritizes public projects and open-source code.

It documents and archives open-source code and programs, and is searchable by other archives such as the Wayback Machine.

Screenshot, January 2023.

But, if you’re looking for something related to code or software development, it might be easier to go straight to GitHub instead of using another archive service.

While it does have paid business plans, GitHub is free for the average user. It even offers 15GB of storage and some computing power in its cloud developer environment for free for your personal use.

5. Country-Specific Web Archives

Several countries run their own web archives.

These can be particularly helpful alternatives to the Wayback Machine if you’re looking for a website highly relevant to a specific location, or the culture of a country.

More focused archives might have more complete information if you’re having trouble finding it elsewhere, although again, I want to mention that the first alternative in this list, Memento, pulls from several different country-specific archives.

I should also note that many archives specific to a country, region, educational institution, or individual library are partnered with Archive-it, a service provider built by The Internet Archive (makers of the Wayback Machine).

They curate specific collections based on relevance, but all Archive-it partners leverage the same source: The Internet Archive.

These are a few of the country-specific web archives:


When you’re looking for alternatives to the Wayback Machine, you might not realize that a great many of them, in part or in whole, are powered by the same archive.

But there are other services out there you can use. Some have more helpful features, depending on what your goals are.

This isn’t an exhaustive list of alternative tools, but it’s most of the easily accessible tools for the average user.

Others require monthly payments, and some are free to academic and legal institutions, but not to individual users.

I chose to focus on the best of the tools that you could go and use right now with no fuss.

Featured Image: Studio Romantic/Shutterstock
