Google's 2020 Webspam Report

Every year, Google releases their end-of-year recap summarizing what they have done to fight spam.

In this year’s report, they highlighted their penchant for using AI to fight spam.

They also tout its effectiveness at catching both known and never-before-seen trends in search spam.

One of their statistics says that in 2020 they reduced spam by more than 80% compared to just a couple of years ago.

How Has Google Fought Spam ‘Smarter’?

Google explains their process for fighting spam, providing information about what happens behind-the-scenes:

Before we deliver a set of search results on Google, there’s a lot that happens behind the scenes. Every day, we’re discovering, crawling, and indexing billions of web pages. Among those pages is a lot of spam — every day, we discover 40 billion spammy pages. Here’s how we work to keep that spam from getting in the way of your search for helpful, useful information.

Diagram conceptualizing how Google fights spam

This diagram conceptualizes how we defend against spam.

First, we have systems that can detect spam when we crawl pages or other content. Crawling is when our automatic systems visit content and consider it for inclusion in the index we use to provide search results. Some content detected as spam isn’t added to the index.

These systems also work for content we discover through sitemaps and Search Console. For example, Search Console has a Request Indexing feature so creators can let us know about new pages that should be added quickly. We observed spammers hacking into vulnerable sites, pretending to be the owners of these sites, verifying themselves in the Search Console and using the tool to ask Google to crawl and index the many spammy pages they created. Using AI, we were able to pinpoint suspicious verifications and prevented spam URLs from getting into our index this way.

Next, we have systems that analyze the content that is included in our index. When you issue a search, they work to double-check if the content that matches might be spam. If so, that content won’t appear in the top search results. We also use this information to better improve our systems to prevent such spam from being included in the index at all.

The result is that very little spam actually makes it into the top results anyone sees for a search, thanks to our automated systems that are aided by AI. We estimated that these automated systems help keep more than 99% of visits from Search completely spam-free. As for the tiny percentage left, our teams take manual action and use the learnings from that to further improve our automated systems.

They also mention that they discover more than 40 billion spammy pages. That’s a ton of spam!

Among the types of spam that Google continuously protects against is hacked spam, or spam from sites that have been compromised by hackers.

One of the most common attacks they observed includes spammers hacking into sites that are vulnerable. By pretending to be the site’s owners, they verify themselves in Google Search Console, and bait the tool into crawling and indexing the spammy pages they made.

It’s not like this is the first time they have been using AI. In fact, they have been using AI for more than several years at least, especially with the introduction

How are Sites Compromised by Hackers?

Google explains that the following are the most common reasons that sites are hacked and compromised:

stolen and/or compromised passwords;
the owner does not install the security updates;
themes and plugins that are not secure;
social engineering;
holes in the site’s security policies; and
leaks of crucial data

This is why they note that:

“It’s essential to periodically check for software updates for your site in order to patch vulnerabilities. Better yet, set up automatic updates for your software where possible and sign up for security announcement lists for any of your active running software.”

When it comes to social engineering, it appears it’s not entirely ineffective:

“According to a study from Google about social engineering, some of the most effective phishing campaigns have a 45% success rate!”

They also note that poor security policies are the main reason why hackers are able to gain access to a website for malicious attacks.

If You See Something, Say Something

If you are aware of or have seen spam in Google’s search results, you may want to file a spam report.

Taking note of the precise query you’re using, along with any other identifying information about the spam you witness in the SERPs will be helpful.

You can find their spam report form here.

How Has Google Fought Spam ‘Smarter’?

How are Sites Compromised by Hackers?

If You See Something, Say Something

Brian Harnish

Recent Articles

Google’s 2020 Webspam Report

How Has Google Fought Spam ‘Smarter’?

How are Sites Compromised by Hackers?

If You See Something, Say Something

Brian Harnish

Recent Articles