SEO
How to Block Scrapers, Hackers and Spammers with Wordfence
Wordfence is a popular WordPress security plugin. Among its features are a scanner that monitors for hacked files and a firewall with regularly updated rules that proactively block malicious bots.
There’s also a useful feature tucked away in the tool: user-configurable firewall rules that can supercharge your ability to block hackers, scrapers and spammers.
Scrapers are especially troublesome because they copy your content and publish it elsewhere.
Using a tool like Wordfence can help reduce the amount of content that scrapers can plagiarize.
There are many highly recommended WordPress security plugins and SaaS solutions to choose from, including Sucuri Security and Cloudflare. Wordfence is one of many security solutions available, and it’s up to you to decide which fits most comfortably into your workflow.
Wordfence and other solutions work fine as set-it-and-forget-it tools.
However, in my experience, the user-configurable firewall in Wordfence gives you an opportunity to dial up the bot-hammering power and really stick it to the hackers and scrapers.
But before you dial up the firewall, it’s important to know how far these firewall rules can be taken, and we’ll take a look at that, too.
Wordfence WordPress Security
Wordfence is trusted by over 4 million users for protecting their WordPress sites.
The default firewall behavior is to block bots that request too many pages too quickly, as well as bots and humans whose activity signals an intent to hack the site.
The firewall will block the IP address of the rogue bot for a set period of time, after which Wordfence drops the block.
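The temporary-block behavior described above (count requests per IP, block an IP that goes over a threshold, then drop the block after a set period) can be illustrated with a small sliding-window rate limiter. This is a minimal sketch for intuition only: the class name `TemporaryBlocker`, the thresholds and the block duration are hypothetical, and Wordfence’s actual rate-limiting logic and defaults may differ.

```python
from collections import defaultdict, deque


class TemporaryBlocker:
    """Toy rate limiter (hypothetical): block an IP for `block_seconds` once it
    makes more than `max_requests` requests within `window_seconds`."""

    def __init__(self, max_requests=60, window_seconds=60.0, block_seconds=600.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked_until = {}           # ip -> time when its block expires

    def allow(self, ip, now):
        """Return True if a request from `ip` at time `now` should be served."""
        expiry = self.blocked_until.get(ip)
        if expiry is not None:
            if now < expiry:
                return False
            # The block has expired: drop it and start counting afresh.
            del self.blocked_until[ip]
            self.recent[ip].clear()

        timestamps = self.recent[ip]
        timestamps.append(now)
        # Forget requests that fell outside the sliding window.
        while timestamps and now - timestamps[0] > self.window_seconds:
            timestamps.popleft()

        if len(timestamps) > self.max_requests:
            self.blocked_until[ip] = now + self.block_seconds
            return False
        return True
```

Because the history is keyed by IP address, a bot that switches to a fresh IP starts with an empty history, which is exactly the weakness that IP rotation exploits.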
The default firewall settings work well.
But some bots still get through, scraping a site or probing it for vulnerabilities by crawling it slowly.
A common hacker approach is to have a bot hit the site quickly and, when it gets blocked, rotate to other IP addresses and user agents, which forces the firewall to start the detection process all over again.
But these bots aren’t always well programmed, which makes it easy to block them more efficiently than the default Wordfence settings do.
Background Information About Wordfence Firewall Rules
It’s possible to accomplish efficient bot blocking with server level tools, multiple plugins and even by the use of an .htaccess file.
But editing an .htaccess file can be tricky because there are strict rules to follow and a mistake in the .htaccess file can cause the entire site to fail.
Using firewall rules is simply an easier way to block bots.
What Can You Block With Wordfence?
Wordfence allows you to create rules to block according to each of the following reasons:
- IP Address Range
- Hostname
- Browser User Agent
- Referrer
IP Address Range
IP address means the IP address of the server or ISP that the bot or human is coming from.
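To make “range” concrete: an IP range is simply every address between a start address and an end address. The sketch below checks membership with Python’s standard `ipaddress` module. The block list uses documentation-only example networks, and the `(start, end)` representation is an assumption for illustration, not how Wordfence stores its ranges.

```python
import ipaddress


def in_range(ip, start, end):
    """Return True if `ip` lies between `start` and `end`, inclusive."""
    addr = ipaddress.ip_address(ip)
    low = ipaddress.ip_address(start)
    high = ipaddress.ip_address(end)
    # IPv4 and IPv6 addresses cannot be compared, so they never share a range.
    if not (addr.version == low.version == high.version):
        return False
    return low <= addr <= high


# Hypothetical block list built from documentation-only address blocks.
BLOCKED_RANGES = [
    ("203.0.113.0", "203.0.113.255"),
    ("198.51.100.10", "198.51.100.20"),
]


def is_blocked(ip):
    return any(in_range(ip, start, end) for start, end in BLOCKED_RANGES)


print(is_blocked("203.0.113.77"))   # True
print(is_blocked("198.51.100.21"))  # False
```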
Hostname
Hostname means the name of the host. The host isn’t always available; sometimes the bot or human visitor shows only an IP address.
Browser User Agent
Every site visitor generally tells the server what browser it is using. Browser User Agent means the browser that the visitor claims to be using. Bots can claim to be virtually any browser, and they sometimes do so to evade detection.
Referrer
The referrer is the page on which a bot or human supposedly clicked a link to reach your site.
Wordfence Custom Pattern Blocking
The way to block bad bots using any of the above four variables is by adding a custom rule in the Custom Pattern Blocking tool.
Here’s how to reach it.
Step 1
Click the Firewall link in the left-side WordPress admin menu.
Step 2
Choose the tab labeled Blocking.
Step 3
Choose the “Custom Pattern” tab and create a firewall rule in the appropriate field. One of the fields is labeled “Block Reason.” Use that field to add a descriptive label such as Hostname or User Agent. This makes it easier to review your rules later, because you can sort them by the kind of block.
Step 4
Make your rule by clicking the “Block Visitors Matching This Pattern” button and you’re done.
Wordfence rules can use the asterisk (*) as a wildcard.
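Wordfence’s own documentation is the authority on exactly how its patterns are evaluated. As a minimal sketch, assuming that * matches any run of characters (including none), that the whole value must match, and that matching ignores case, a wildcard pattern can be translated into a regular expression like this. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern where '*' matches any run of characters into a
    compiled regex. Every other character is matched literally."""
    parts = (re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(".*".join(parts), re.IGNORECASE)


def matches(pattern, value):
    """Return True if the whole of `value` matches the wildcard `pattern`."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None


ua = "Mozilla/5.0 (X11; Linux x86_64) Chrome/90.0.4430.93 Safari/537.36"
print(matches("*.ru", "crawler.example.ru"))  # True
print(matches("*.ru", "example.com"))         # False
print(matches("*Chrome/90.*", ua))            # True
```

Escaping everything except the asterisk matters: without it, the dots in a pattern such as *Chrome/90.* would act as regex “any character” wildcards.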
Should You Block IP Addresses with Wordfence?
Wordfence makes it easy for a publisher to set up firewall rules that efficiently block bots.
That’s a blessing, but it can also be a curse. For example, permanently blocking thousands of IP addresses with the Wordfence firewall is not efficient and probably not a proper use of Wordfence.
Temporarily blocking IP addresses is fine. Permanently blocking them is probably not, because, as I understand it (going from memory), this can bloat or slow down your WordPress installation.
In general, permanently blocking thousands or even millions of IP addresses is best accomplished with an .htaccess file.
Hostname Blocking with Wordfence
Blocking a hostname with Wordfence can be a way to block hackers, spammers and scrapers. By clicking Wordfence > Tools, you can view the Wordfence Live Traffic log.
That shows you bot and human visitors, including bots that were blocked automatically by Wordfence.
Not all site visitors display a hostname. However, some do, and that makes it easy to block an entire web host.
For example, one of my sites, for whatever reason, attracts DDoS levels of bot traffic from a single host. None of my other sites attract that much attention from this host; just this one site does.
Between March 2020 and December 2021 that one site received over 250,000 attacks and every single one of them was blocked by Wordfence.
Clearly, blocking bots by hostname can be useful if you want to block a cloud host that sends nothing but hackers and scrapers.
However, some hosts, like Amazon Web Services (AWS), send both bad bots and good bots, so blocking AWS servers can inadvertently block good bots.
So it’s important to monitor your traffic and be absolutely certain that blocking a hostname will not backfire.
On the other hand, if you have no use for traffic from Russia or China, then it’s easy to block hackers, scrapers and spammers from those two countries by creating a firewall rule using the hostname field.
All you have to do is create a rule that blocks all hostnames ending in .ru and .cn. That will block Russian and Chinese visitors whose hostnames end in those country-code domains.
This is what you enter into the Hostname field:
*.ru
*.cn
This is not meant to encourage anyone to use Wordfence to block Russian and Chinese bots via the hostname. It’s just an example to show how it’s done.
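To see what those two hostname patterns would and would not catch, here is a minimal sketch under the same assumed matching semantics (* matches any run of characters, the whole hostname must match, case is ignored). The function names are hypothetical. Note that a visitor with no resolved hostname has nothing to match against, so hostname rules never apply to it.

```python
import re


def matches(pattern, value):
    """Assumed semantics: '*' matches any run of characters; whole value must match."""
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE) is not None


HOSTNAME_PATTERNS = ["*.ru", "*.cn"]


def hostname_blocked(hostname):
    """Return True if a visitor's hostname matches a blocked pattern.
    `hostname` is None when the visitor shows only an IP address."""
    if not hostname:
        return False  # nothing to match, so hostname rules cannot apply
    return any(matches(p, hostname) for p in HOSTNAME_PATTERNS)


print(hostname_blocked("crawler.example.ru"))      # True
print(hostname_blocked("example.ru.example.com"))  # False: .ru is not the ending
print(hostname_blocked(None))                      # False
```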
Block Hackers and Scrapers By User Agent
Many rogue bots use old and out of date browser user agents.
After Russia invaded Ukraine, I noticed an increase in hacking bots using the Chrome 90 user agent (UA) from the same group of web hosts. Normally, bot traffic differs from one website to another, so it stood out when the traffic looked the same across all of my sites.
Whenever Wordfence automatically blocked these bots for hitting my sites too fast, the bots would switch IP addresses and begin hitting the sites over and over again.
So I decided to block these bots by their Browser User Agent (often referred to as simply, UA).
First, I checked the StatCounter website to determine how many users around the world are using Chrome 90. According to StatCounter, Chrome 90’s browser share in the USA stood at 0.09% as of January 2022.
At the time of this writing, Chrome is at version 100. Because Chrome automatically updates for the vast majority of users, it’s not surprising that Chrome 90 usage is virtually nil, so it’s very unlikely that blocking all visitors with a Chrome 90 user agent will block an actual, legitimate person visiting your site.
So I determined that it’s safe to block anything that shows up to my site with the Chrome 90 user agent.
However, there are online tools, like GTMetrix and a security server header checker, that use the Chrome 90 user agent.
So if I blocked all versions of Chrome 90 (by using this rule: *Chrome/90.*), I would also block those two online tools.
Another way to do it is to look at the specific Chrome 90 variants used by the hackers and by the online tools.
GTMetrix and the other tool use this Chrome UA:
Chrome/90.0.4430.212
Hackers and scrapers use these Chrome UAs:
Chrome/90.0.4400.8
Chrome/90.0.4427.0
Chrome/90.0.4430.72
Chrome/90.0.4430.85
Chrome/90.0.4430.86
Chrome/90.0.4430.93
So, if you want to allow the online tools to still scan your site but also block the bad bots, this is an example of how to do it:
*Chrome/90.0.4400.8*
*Chrome/90.0.4427.0*
*Chrome/90.0.4430.72*
*Chrome/90.0.4430.85*
*Chrome/90.0.4430.86*
*Chrome/90.0.4430.93*
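Before saving rules like these, it helps to confirm that they catch the bot builds but not the build the online tools use. The sketch below does that check, assuming * matches any run of characters and the whole UA must match. The `chrome_ua` helper is hypothetical; it just builds a typical desktop Chrome UA string for testing.

```python
import re


def matches(pattern, value):
    """Assumed semantics: '*' matches any run of characters; whole value must match."""
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE) is not None


BLOCKED_UA_PATTERNS = [
    "*Chrome/90.0.4400.8*",
    "*Chrome/90.0.4427.0*",
    "*Chrome/90.0.4430.72*",
    "*Chrome/90.0.4430.85*",
    "*Chrome/90.0.4430.86*",
    "*Chrome/90.0.4430.93*",
]


def ua_blocked(user_agent):
    return any(matches(p, user_agent) for p in BLOCKED_UA_PATTERNS)


def chrome_ua(version):
    """Hypothetical helper: a typical desktop Chrome UA string for testing."""
    return ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            f"(KHTML, like Gecko) Chrome/{version} Safari/537.36")


print(ua_blocked(chrome_ua("90.0.4430.93")))   # True: a bot build
print(ua_blocked(chrome_ua("90.0.4430.212")))  # False: the build the tools use
```

Because of the trailing *, a pattern like *Chrome/90.0.4400.8* would also match any build whose number merely starts with 4400.8, so it is still worth checking the traffic log before relying on it.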
Caveat About Blocking User Agents
Before blocking Chrome 90, I kept checking the Wordfence traffic log (accessible at Wordfence > Tools) to be sure that no legitimate bots, like GTMetrix, were using that user agent.
For example, you might not want to block Chrome 96 because some of Google’s tools use Chrome 96 as a user agent.
Always research whether legitimate bots are using a particular user agent or hostname.
An easy way to research that is by using the Wordfence traffic log.
Wordfence Traffic Log
The Wordfence traffic log shows you at a glance all user agents accessing your site in near real-time. The traffic log shows information such as user agent, indicates whether the visitor is a bot or a human, provides the IP address, hostname, the page being accessed and other information that helps determine if a visitor is legit or not.
The way to access the traffic log is by clicking Wordfence > Tools.
Blocking old browser versions is an easy way to block a lot of bad bots. Chrome versions from the 80, 70, 60, 50, 40 and 30 series are particularly numerous on some sites.
Here’s an example of how to block old Chrome UAs that are used by bad bots:
*Chrome/8*.*
*Chrome/7*.*
*Chrome/6*.*
*Chrome/5.0*
*Chrome/95.*
*Chrome/5*.*
*Chrome/3*.*
*Chrome/4*.*
Again, the above is not an encouragement to block the above bots.
I would use *Chrome/6*.* because a single rule blocks the entire Chrome 60 series of user agents (Chrome 60, 61, 63 and so on) without having to write out all ten.
Do not block the ten-and-up series with a rule like *Chrome/1*.*, because that will also block the current version of Chrome, Chrome 100.
The above is an example of how to block bad bots using the described Chrome user agents.
Bad bots also use old and retired Firefox browser user agents and some even display python-requests/ as a user agent.
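The series rules and the warning about *Chrome/1*.* can be checked the same way. This sketch assumes the same matching semantics as the earlier examples (* matches any run of characters, whole-value match, case ignored), covers only the series patterns plus a *python-requests/* pattern for the user agent just mentioned, and uses a hypothetical `chrome_ua` helper to build test strings.

```python
import re


def matches(pattern, value):
    """Assumed semantics: '*' matches any run of characters; whole value must match."""
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE) is not None


# One rule per retired Chrome series, plus the python-requests library's UA.
OLD_BOT_PATTERNS = [
    "*Chrome/8*.*", "*Chrome/7*.*", "*Chrome/6*.*",
    "*Chrome/5*.*", "*Chrome/4*.*", "*Chrome/3*.*",
    "*python-requests/*",
]


def old_ua_blocked(user_agent):
    return any(matches(p, user_agent) for p in OLD_BOT_PATTERNS)


def chrome_ua(version):
    """Hypothetical helper: a typical desktop Chrome UA string for testing."""
    return ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            f"(KHTML, like Gecko) Chrome/{version} Safari/537.36")


print(old_ua_blocked(chrome_ua("61.0.3163.100")))  # True: 60 series
print(old_ua_blocked(chrome_ua("100.0.4896.75")))  # False
# The rule to avoid: it would also catch Chrome 100.
print(matches("*Chrome/1*.*", chrome_ua("100.0.4896.75")))  # True
```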
Be Careful When Creating Firewall Rules
Always do your research first to determine what bad bots are using on your own sites and make sure that no legitimate bots or site visitors are using those old and retired browser user agents.
The way to do your research is by inspecting your traffic log files or the Wordfence traffic logs to determine which user agents (or hostnames) are from malicious traffic that you don’t want.
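If you have raw server access logs, a quick tally of Chrome major versions can reveal which old versions are suspiciously common on your site. This sketch assumes the Apache/Nginx “combined” log format, where the user agent is the last double-quoted field on each line. A version that is nearly extinct in the wild but frequent in your logs is only a candidate for further checking, not proof of a bad bot.

```python
import re
from collections import Counter

# In the "combined" log format the user agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')
CHROME_MAJOR = re.compile(r"Chrome/(\d+)\.")


def chrome_versions(log_lines):
    """Count requests per Chrome major version in combined-format log lines."""
    counts = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # not a combined-format line; skip it
        version = CHROME_MAJOR.search(match.group(1))
        if version:
            counts[int(version.group(1))] += 1
    return counts
```

Calling `chrome_versions(open("access.log"))` (the file name is a placeholder) and then `.most_common(10)` on the result lists the ten most frequent Chrome versions.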
Stop Overcomplicating Things. Entity SEO is Just SEO
“Entity SEO”.
Sounds scary, doesn’t it? Not only does the word “entity” sound foreign, it feels like yet another thing to add to your never-ending SEO to-do list. You’re barely afloat when it comes to SEO, but ohgawd, here comes one more new thing to dedicate your scarce resources to.
I have good news for you though: You don’t have to do entity SEO.
Why? Because you’re probably already doing it.
Let’s start from the beginning.
In 2012, Google announced the Knowledge Graph. The Knowledge Graph is a knowledge base of entities and the relationships between them.
An entity is any object or concept that can be distinctly identified. This includes tangibles like people, places, and organizations, and intangibles like colors, concepts, and feelings.
For example, the footballer Federico Chiesa is an entity.
So is the famous British-Indian restaurant Dishoom.
Entities are connected by edges, which describe the relationships between them.
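To make “entities connected by edges” concrete, here is a toy model that stores each edge as a (subject, relation, object) triple. The relation names are made up, and this is only an illustration of the idea, not how Google’s Knowledge Graph is actually built or queried.

```python
# Toy knowledge graph: each edge is a (subject, relation, object) triple.
# Relation names are hypothetical.
EDGES = [
    ("Han Solo", "portrayed_by", "Harrison Ford"),
    ("Han Solo", "character_in", "Star Wars"),
    ("Harrison Ford", "occupation", "Actor"),
]


def objects(subject, relation):
    """Follow edges of one relation type outward from `subject`."""
    return [o for s, r, o in EDGES if s == subject and r == relation]


def neighbors(entity):
    """All entities directly connected to `entity`, in either direction."""
    found = set()
    for s, _, o in EDGES:
        if s == entity:
            found.add(o)
        elif o == entity:
            found.add(s)
    return found


print(objects("Han Solo", "portrayed_by"))  # ['Harrison Ford']
print(sorted(neighbors("Harrison Ford")))   # ['Actor', 'Han Solo']
```

Following the `portrayed_by` edge is what lets a query about a character resolve to the actor, even when the actor’s name never appears in the query.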
Introducing the Knowledge Graph helped improve Google’s search results because:
- Google could better understand search intent — People search for the same thing but describe it in different ways. Google can now understand this and serve the same results.
- It reduced reliance on keyword matching — Matching the number of keywords on a page doesn’t guarantee relevance; also it prevents crafty SEOs from keyword stuffing.
- It reduced Google’s computational load — The Internet is virtually infinite and Google simply cannot understand the meaning of every word, paragraph, webpage, and website. Entities provide a structure where Google can improve understanding while minimizing load.
For example, even though we didn’t mention the actor’s name, Google understands we’re looking for Harrison Ford and therefore shows his filmography.
That’s because Han Solo and Harrison Ford are closely connected entities in the Knowledge Graph. Google shows Knowledge Graph data in SERP features like Knowledge Panels and Knowledge Cards.
With this knowledge, we can then define entity SEO as optimizing your website or webpages for such entities.
If Google has moved to entity-oriented search, then entity SEO is just SEO. As my colleague Patrick Stox says, “The entity identification part is more on Google’s end than on our end.”
I mean, if you look at the ‘entity SEO’ tactics you find in blog posts, you’ll discover that they’re mostly just SEO tactics:
- Earn a Wikipedia page
- Create a Google Business Profile
- Add internal links
- Create all digital assets Google is representing on the page (e.g., videos, images, Twitter)
- Develop topical authority
- Include semantically related words on a page
- Add schema markup
Let’s be honest. If you’re serious about SEO and are investing in it, then it’s likely you’re already doing most of the above.
Regardless of entities, wouldn’t you want a Wikipedia page? After all, it confers benefits beyond “entity SEO”. Brand recognition, backlinks from one of the world’s most authoritative sites (albeit nofollow)—any company would want that.
If you’re a local business, you’ve probably created a Google Business Profile. Adding internal links is just SEO 101.
And billions of blistering barnacles, creating all digital assets Google wants to see, like images and videos, is practically marketing 101. If you’re a Korean recipe site and want to be associated with the kimchi jjigae entity, wouldn’t you already know you need to make a video and have photos of the cooking process?
When I started my breakdance site years ago, I knew nothing about SEO and content marketing but I still knew I needed to make YouTube videos. Because guess what? It’s hard to learn breakdancing from words. I don’t think I needed an entity SEO to tell me that.
Topical authority is an SEO concept where a website aims to become the go-to authority on one or more topics. Call me crazy, but it feels like blogging 101. Read most guides on how to start a blog and I’m sure you’ll find a subheading called “niche down”. And once you niche down, it’s inevitable you’ll create content surrounding that one topic.
If I start a breakdance site, what are the chances I’ll write about contemporary dance or pop art? Pretty low.
In fact, topical authority is similar to the Wiki Strategy, which Nat Eliason wrote about in 2017. There wasn’t a single mention of entities. It was just the right way to make content for the Internet.
I think the biggest problem here isn’t entities versus keywords or that topical authority is a brand-new strategy. It’s simply that many SEOs are driven by short-sightedness or the wrong incentives.
You can target a whole bunch of unrelated keywords that have high search volume, gain incredible amounts of search traffic, and brag about how successful you are as an SEO.
Some of the pages sending HubSpot the most search traffic have barely anything to do with its core product. A page on how to type the shrug emoji? The most famous quotes?
This is not to single out HubSpot—I’m sure they have their reasons, as explored by Ryan here—but to illustrate that many companies do the exact same thing. And when Google stops rewarding this behavior, all of a sudden companies realise they do need to write about their core competencies. They need to “build topical authority”.
I don’t want to throw the baby out with the bathwater because I do see value in the last two ‘entity SEO tactics’. But again, if you’re doing something similar to the Wiki Strategy for your site, chances are you would have naturally included entities or semantically relevant words without thinking too much about it. It’s difficult to create content about kimchi jjigae without mentioning kimchi, pork, or gochujang.
However, to counter the curse of knowledge or simply to avoid blind spots, it’s useful to check for important subtopics you might have missed. At Ahrefs, we run a page-level content gap analysis and look out for subtopics.
For example, if we ran a content gap analysis on “inbound marketing” for the top three ranking pages, we see that we might need to include these subtopics:
- What is inbound marketing
- Inbound marketing strategy
- Inbound marketing examples
- Inbound marketing tools
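The idea behind a page-level content gap check can be sketched as simple set logic: find subtopics that several top-ranking pages cover but your page doesn’t. This is a minimal illustration, not how the Ahrefs tool works. It assumes the subtopics have already been extracted (for example, from each page’s headings), and the `min_pages` threshold is a hypothetical choice.

```python
from collections import Counter


def content_gap(my_subtopics, competitor_subtopics, min_pages=2):
    """Return subtopics covered by at least `min_pages` competitor pages
    but missing from `my_subtopics`, sorted alphabetically."""
    counts = Counter()
    for page in competitor_subtopics:
        counts.update(set(page))  # count each subtopic once per page
    mine = set(my_subtopics)
    return sorted(t for t, n in counts.items() if n >= min_pages and t not in mine)


mine = ["what is inbound marketing"]
top_three = [
    ["what is inbound marketing", "inbound marketing strategy", "inbound marketing tools"],
    ["inbound marketing strategy", "inbound marketing examples"],
    ["inbound marketing strategy", "inbound marketing tools", "history of marketing"],
]
print(content_gap(mine, top_three))
# ['inbound marketing strategy', 'inbound marketing tools']
```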
Finally, adding schema markup makes the most sense because it’s how Google recognizes entities and better understands the content of web pages. But if it’s just one new tactic—which I believe is already part of ‘standard’ SEO and you might already be doing it—then there’s no need to create a category to define the “new era” (voice SEO, where art thou?)
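Schema markup is commonly added to a page as a JSON-LD script block. As a minimal sketch, the helper below builds one for a schema.org `Organization` using its `name`, `url` and `sameAs` properties. The organization name and URLs are placeholders, and which properties Google actually uses is worth confirming in its structured data documentation.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build a JSON-LD <script> block describing an organization."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # Profiles elsewhere that refer to the same entity.
        "sameAs": list(same_as),
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


print(organization_jsonld(
    "Example Breakdance Co",                  # placeholder name
    "https://www.example.com/",               # placeholder URL
    ["https://social.example/examplebreakdance"],
))
```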
Final thoughts
Two years ago, someone on Reddit asked for an SEO workflow that utilized super advanced SEO methodologies:
The top answer: None of the above.
When our Chief Marketing Officer Tim Soulo tweeted about this Reddit thread, he got similar replies too:
And even though I don’t know him, this is a person after my own heart:
You don’t have to worry about entity SEO. If you have passion for a topic and are creating high-quality content that fulfills what people are looking for, then you’re likely already doing “entity SEO”.
Just follow this meme: Make stuff people like.
Assigning The Right Conversion Values To Make Value-Based Bidding Work For Lead Gen
Last week, we tackled setting your data strategy for value-based bidding.
The next key is to assign the right values for the conversion actions that are important to your business.
We know this step is often seen as trickier for lead gen-focused businesses than, say, ecommerce businesses.
How much is a whitepaper download, newsletter signup, or online quote request worth to your business? While you may not have exact figures, that’s OK. What you do know is they aren’t all valued equally.
Check out the quick 2-minute video in our series below, and then keep reading as we dive deeper into assigning conversion values to optimize your value-based bidding strategy.
Understanding Conversion Values
First, let’s get on the same page about what “conversion value” means.
A conversion refers to a desired action taken by a user, such as filling out a lead form, making a purchase, or signing up for a newsletter.
Conversion value is simply a numerical representation of how much each of these conversions is worth to your business.
Estimating The Value Of Each Conversion
Ideally, you’d have a precise understanding of how much revenue each conversion generates.
However, we understand that this is not always feasible.
In such cases, it’s perfectly acceptable to use “proxy values” – estimations that align with your business priorities.
The important thing is to ensure that these proxy values reflect the relative importance of different conversions to your business.
For example, a whitepaper download may indicate less “value” than a product demo registration based on what you understand about your past customer acquisition efforts.
Establishing Proxy Values
Let’s explore some scenarios to illustrate how you might establish proxy values.
Take the event florist example mentioned in the video. You’ve seen that clients who provide larger guest counts or budgets in their online quote requests tend to result in more lucrative events.
Knowing this, you can assign higher proxy values to these leads compared to those with smaller guest counts or budgets.
Similarly, if you’re an auto insurance advertiser, you might leverage your existing lead scoring system as a basis for proxy values. Leads with higher scores, indicating a greater likelihood of a sale, would naturally be assigned higher values.
You don’t need to have exact value figures to make value-based bidding effective. Work with your sales and finance teams to help identify the key factors that influence lead quality and value.
This will help you understand which conversion actions indicate a higher likelihood of becoming a customer – and even which actions indicate the likelihood of becoming a higher-value customer for your business.
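The florist scenario can be expressed as a small scoring function. Every number here is a made-up proxy value chosen only to show the principle that the *relative* values matter: a whitepaper download is worth less than a demo registration, and a quote request for a big event is worth more than one for a small event. Work out your real thresholds with your sales and finance teams.

```python
# Hypothetical relative proxy values: the ordering between actions matters
# more than the absolute dollar figures.
BASE_VALUES = {
    "newsletter_signup": 2.0,
    "whitepaper_download": 5.0,
    "product_demo_registration": 40.0,
    "quote_request": 50.0,
}


def quote_request_value(guest_count, budget):
    """Scale the florist's quote-request value by guest count and budget.
    Thresholds and multipliers are made up for illustration."""
    value = BASE_VALUES["quote_request"]
    if guest_count >= 150:
        value *= 2.0
    elif guest_count >= 75:
        value *= 1.5
    if budget >= 10_000:
        value *= 1.5
    return value


print(quote_request_value(50, 2_000))    # 50.0
print(quote_request_value(100, 2_000))   # 75.0
print(quote_request_value(200, 20_000))  # 150.0
```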
Sharing Conversion Values With Google Ads
Once you’ve determined the proxy values for your conversion actions, you’ll need to share that information with Google Ads. This enables the system to prioritize actions that drive the most value for your business.
To do this, go to the Summary tab on the Conversions page (under the Goals icon) in your account. From there, you can edit your conversion action settings to input the value for each action.
As I noted in the last episode, strive for daily uploads of your conversion data if possible, connecting your sources via Google Ads Data Manager or the Google Ads API, so that Google Ads has the most up-to-date information.
Fine-Tuning With Conversion Value Rules
To add another layer of precision, you can utilize conversion value rules.
Conversion value rules allow you to adjust the value assigned to a conversion based on specific attributes or conditions that aren’t already indicated in your account. For example, you may have different margins for different types of customers.
Instead of every lead form submission having the same static value you’ve assigned, you can tell Google Ads which leads are more valuable to your business based on three factors:
- Location: You might adjust conversion values based on the user’s geographical location, for example, if users in a particular region tend to convert at a higher rate or generate more revenue.
- Audience: You can tailor conversion values based on specific audience segments, such as first-party data or Google audience lists.
- Device: Consider adjusting conversion values based on the device the user is using. Perhaps users on mobile devices convert at a higher rate – you could increase their conversion value to reflect that.
When implementing these rules, your value-based bidding strategies (maximize conversion value with an optional target ROAS) will take them into account and optimize accordingly.
Conversion value rules can be set at the account or campaign levels. They are supported in Search, Shopping, Display, and Performance Max campaigns.
Google Ads will prioritize showing your ads to users predicted to be more likely to generate those leads you value more.
Conversion Value Rules And Reporting
These rules also impact how you report conversion value in your account.
For example, you may value a lead at $5 but know that leads from Californian users are typically worth twice as much. With conversion value rules, you could specify this, and Google Ads would multiply the values for users from California by two and report the result in the conversion value column in your account.
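The California example is simple arithmetic, sketched below. The helper and its field names are hypothetical (they are not Google Ads API fields), but they loosely mirror the reporting idea that follows: an original value, an adjusted value, and the adjustment between them.

```python
def apply_location_rule(original_value, user_location,
                        rule_location="California", multiplier=2.0):
    """Apply one hypothetical 'multiply by' location rule to a conversion."""
    if user_location == rule_location:
        adjusted = original_value * multiplier
    else:
        adjusted = original_value
    return {
        "original_value": original_value,
        "adjusted_value": adjusted,
        "value_adjustment": adjusted - original_value,
    }


print(apply_location_rule(5.0, "California"))
# {'original_value': 5.0, 'adjusted_value': 10.0, 'value_adjustment': 5.0}
print(apply_location_rule(5.0, "Texas"))
# {'original_value': 5.0, 'adjusted_value': 5.0, 'value_adjustment': 0.0}
```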
Additionally, you can segment your conversion value rules in Campaigns reporting to see the impact by selecting Conversions, then Value rule adjustment.
There are three segment options:
- Original value (rule applied): Total original value of conversions, which then had a value rule applied.
- Original value (no rule applied): Total recorded value of conversions that did not have a value rule applied.
- Audience, Location, Device, or No Condition: The net adjustment when value rules were applied.
You can add the conversion value rules column to your reporting as well. These columns are called “All value adjustment” and “Value adjustment.”
Also note that reporting for conversion value rules applies to all conversions, not just the ones in the ‘conversions’ column.
Conversion Value Rule Considerations
You can also create more complex rules by combining conditions.
For example, if you observe that users from Texas who have also subscribed to your newsletter are exceptionally valuable, you could create a rule that increases their conversion value even further.
When using conversion value rules, keep in mind:
- Start Simple: Begin by implementing a few basic conversion value rules based on your most critical lead attributes.
- Additive Nature of Rules: Conversion value rules are additive. If multiple rules apply to the same user, their effects will be combined.
- Impact on Reporting: The same adjusted value that’s determined at bidding time is also used for reporting.
- Regular Review for Adjustment: As your business evolves and you gather more data, revisit your conversion values and rules to ensure they remain aligned with your goals.
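The “additive” point can be illustrated with the Texas-plus-newsletter example. This sketch rests on an assumption that should be checked against Google’s own documentation: that each matching rule’s adjustment is computed against the *original* value and the adjustments are then summed. The rule format (a condition, a kind of `multiply` or `add`, and an amount) is hypothetical.

```python
def combined_value(original_value, rules, user):
    """Sum the adjustments of every matching rule (assumed additive model).
    Each rule is (condition, kind, amount); kind is 'multiply' or 'add'."""
    total_adjustment = 0.0
    for condition, kind, amount in rules:
        if not condition(user):
            continue
        if kind == "multiply":
            # Adjustment measured against the original value, not a running total.
            total_adjustment += original_value * (amount - 1.0)
        elif kind == "add":
            total_adjustment += amount
        else:
            raise ValueError(f"unknown rule kind: {kind}")
    return original_value + total_adjustment


rules = [
    (lambda u: u["region"] == "Texas", "multiply", 2.0),
    (lambda u: u["newsletter"], "add", 3.0),
]
print(combined_value(5.0, rules, {"region": "Texas", "newsletter": True}))   # 13.0
print(combined_value(5.0, rules, {"region": "Texas", "newsletter": False}))  # 10.0
```

Under this model, the Texas newsletter subscriber’s $5 lead gains $5 from the location rule and $3 from the audience rule. A model that applied the rules one after another would give a different total, which is why it pays to start simple and check the reporting columns.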
Putting The Pieces Together
Assigning the right values to your conversions is a crucial step in maximizing the effectiveness of your value-based bidding strategies.
By providing Google Ads with accurate and nuanced conversion data, you empower the system to make smarter decisions, optimize your bids, and ultimately drive more valuable outcomes for your business.
Up next, we’ll talk about determining which bid strategy is right for you. Stay tuned!
Featured Image: BestForBest/Shutterstock
Expert Embedding Techniques for SEO Success
AI Overviews are here, and they’re making a big impact in the world of SEO. Are you up to speed on how to maximize their impact?
Watch on-demand as we dive into the fascinating world of Google AI Overviews and their functionality, exploring the concept of embeddings and demystifying the complex processes behind them.
We covered which measures play a crucial role in how Google AI assesses the relevance of different pieces of content, helping to rank and select the most pertinent information for AI-generated responses.
You’ll see:
- An understanding of the technical side of embeddings & how they work, enabling efficient information retrieval and comparison.
- Insights into AI content curation, including the criteria and algorithms used to rank and choose the most relevant snippets for AI-generated overviews.
- A visualization of the step-by-step process of how AI overviews are constructed, with a clear perspective on the decision-making process behind AI-generated content.
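For readers new to embeddings, the core comparison step can be sketched in a few lines: represent a query and each snippet as vectors, then rank the snippets by cosine similarity to the query. The two-dimensional vectors below are toy values (real embeddings come from a model and have hundreds of dimensions), and cosine similarity is one common relevance measure, not necessarily the one Google uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippets):
    """Sort (name, vector) pairs from most to least similar to the query."""
    return sorted(snippets,
                  key=lambda item: cosine_similarity(query_vec, item[1]),
                  reverse=True)


query = [1.0, 0.0]
snippets = [("a", [0.0, 1.0]), ("b", [1.0, 1.0]), ("c", [2.0, 0.0])]
print([name for name, _ in rank_snippets(query, snippets)])  # ['c', 'b', 'a']
```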
With Scott Stouffer from Market Brew, we explored their AI Overviews Visualizer, a tool that deconstructs AI Overviews and provides an inside look at how Snippets and AI Overviews are curated.
If you’re looking to clarify misconceptions around AI, or looking to face the challenge of optimizing your own content for the AI Overview revolution, then be sure to watch this webinar.
View the slides below, or check out the full presentation for all the details.
Join Us For Our Next Webinar!
[Expert Panel] How Agencies Leverage AI Tools To Drive ROI
Join us as we discuss the importance of AI to your performance as an agency or small business, and how you can use it successfully.