An SEO professional was concerned about a malware hack their site suffered back in December. They made sure there are no security issues flagged in Google Search Console.
However, the unwanted pages that were indexed on Google as a result of the malware hack are still showing in the search results.
They double-checked, and they do have a proper 404 set up. They were just wondering: what else can they do?
They can't simply pick the pages out of the search results one by one, and the temporary removal tool isn't practical either, because hundreds of thousands of URLs have been showing in the search results.
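Before anything else, the "proper 404" claim can be verified in bulk. The sketch below (Python, standard library only; the `hacked_urls.txt` file name is an assumption for illustration) requests each URL and reports which ones still serve content:

```python
# Sketch: bulk-check that the hacked URLs really return 404 (or 410).
# Assumes a file "hacked_urls.txt" with one URL per line -- that file
# name is illustrative, not something from the video.
import urllib.error
import urllib.request

def fetch_status(url, timeout=10):
    """Return the HTTP status code for url (follows redirects)."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 404/410 are raised as HTTPError

def summarize(results):
    """Split (url, status) pairs into properly-gone vs still-serving."""
    gone = [(u, c) for u, c in results if c in (404, 410)]
    still_up = [(u, c) for u, c in results if c not in (404, 410)]
    return gone, still_up

def report(path="hacked_urls.txt"):
    """Print which of the listed URLs still resolve with content."""
    urls = [line.strip() for line in open(path) if line.strip()]
    gone, still_up = summarize([(u, fetch_status(u)) for u in urls])
    print(f"{len(gone)} gone, {len(still_up)} still serving content:")
    for url, code in still_up:
        print(f"  {code}  {url}")
```

Any URL that still answers 200 here needs to be fixed on the server before the index cleanup can make progress.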
John answered that he would double-check that these pages are actually removed, because some types of website hacks are done in a way that makes the pages look removed when you check manually, while for Google all the pages are still there. He would also check some of those pages with the URL Inspection tool, just to verify: is the site really entirely cleaned up? Or could there be something left over that is trying to hide?
That’s the basis for everything else.
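One crude way to probe for the "looks removed but isn't" case is to compare what the server returns to a browser versus a Googlebot user agent, since hacked pages often cloak their spam. This is only a heuristic (sophisticated hacks also verify Googlebot's IP range), so the URL Inspection tool in Search Console remains the definitive check; the user-agent strings and threshold below are illustrative assumptions:

```python
# Sketch: rough client-side check for user-agent cloaking -- hacked
# pages sometimes serve clean HTML to browsers but spam to Googlebot.
# A noticeable size difference between the two responses is a red flag.
import urllib.request

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

def fetch_body(url, user_agent, timeout=10):
    """Fetch url with a given User-Agent and return the body bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()

def diff_ratio(a, b):
    """Relative difference in length between two response bodies."""
    longer = max(len(a), len(b))
    return abs(len(a) - len(b)) / longer if longer else 0.0

def looks_cloaked(url, threshold=0.10):
    """True if the Googlebot response differs noticeably in size."""
    normal = fetch_body(url, BROWSER_UA)
    as_bot = fetch_body(url, GOOGLEBOT_UA)
    return diff_ratio(normal, as_bot) > threshold

# Usage (hypothetical URL):
#   if looks_cloaked("https://example.com/some-hacked-page/"):
#       print("Googlebot sees something different -- inspect manually")
```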
Beyond that, there are other things you can do as well. First, make sure that the more visible pages are manually removed.
This means searching for your company name and your website name, searching for your primary products, those types of things, and reviewing the pages that show up in the search results.
Then make sure that anything you don’t want to have shown is not there.
Usually this surfaces up to about 100 URLs where you’re saying, “Oh, these were hacked; I want them removed as quickly as possible.” For these, use the removal tool.
This is the fastest way to clean things up: the removal tool takes these URLs out within about a day.
This helps take care of things, especially for the pages that are visible to users within the search results.
As for any remaining URLs, they will be re-crawled over time. But when it comes to lots of URLs on a website, that usually takes a couple of months.
On one hand, you could just leave these and say, “Well, they’re not visible to people unless someone explicitly searches for the hacked keywords or content, or runs a site: query on the site.” They will drop out over time.
Just leave them be for half a year.
Then, you can check again afterwards to see if they’re actually entirely cleaned up.
Alternatively, if you want to resolve this as quickly as possible, you can use the removal tool with its prefix setting to take out whole groups of hacked URLs at once.
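To use the prefix setting effectively, you first need to find the prefixes the hacked URLs share (an injected folder or file-name pattern). A minimal sketch, grouping URLs by their first path segment; the sample URLs are invented for illustration:

```python
# Sketch: find candidate prefixes for the removal tool's
# "Remove all URLs with this prefix" option by counting how many
# hacked URLs fall under each scheme://host/first-segment/ prefix.
from collections import Counter
from urllib.parse import urlsplit

def prefix_counts(urls):
    """Count URLs per scheme://host/first-path-segment/ prefix."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        first = segments[0] if segments else ""
        counts[f"{parts.scheme}://{parts.netloc}/{first}/"] += 1
    return counts

# Invented example data:
hacked = [
    "https://example.com/cheap-watches/page-1.html",
    "https://example.com/cheap-watches/page-2.html",
    "https://example.com/cheap-watches/page-3.html",
    "https://example.com/blog/real-post.html",
]
for prefix, n in prefix_counts(hacked).most_common():
    print(f"{n:>5}  {prefix}")
```

A prefix that covers many hacked URLs but none of your legitimate ones is a good candidate for a single prefix-based removal request.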
This happens at approximately the 24:07 mark in the video.
John Mueller Hangout Transcript
SEO Professional 6 24:07
Hey, John. So we have a website, which had a malware attack back in December- last December- and we have cleaned it up. And we made sure there is no security issue in Google Search Console. But you know, the indexed pages, you know, the unwanted pages, which were indexed in the result of malware are still showing in the search results.
And I double-checked, we have a proper 404 set up. And I was just wondering, what else can we do? We can’t just clean it up from the search results because there’s a lot, and we can’t actually use the temporary removal tool, because there have been hundreds of thousands of URLs which were being shown in the search results, so yep, that was the situation.
John 25:05
Okay. So I think first of all, I would double-check that these pages are actually removed. Because some types of website hacks are done in a way that if you check manually, then it looks like the pages are removed. But actually, for Google, it’s still there. So I would check with the Inspect URL tool, some of those pages just to double-check, is it really completely cleaned up? Or is there something leftover that is trying to hide?
And I think that’s kind of the basis of everything else. Then for the rest, there are two kinds of approaches that I recommend. On the one hand, I think the best approach is to make sure that the more visible pages are manually removed. That means searching for your company name, for your, for your website name, searching for your primary products, those kinds of things, and seeing the pages that show up in the search results.
And making sure that anything that you don’t want to have shown is not shown. And usually that results in I don’t know, maybe like up to 100 URLs, where you’re saying, Oh, these are hacked, and I want them removed as quickly as possible. And for those use the removal tool. That’s essentially the fastest way to clean things up. The removal tool takes those URLs out within about a day.
So that is, especially for things that would be visible to your users, that kind of helps take care of that. The other part is the URLs that are remaining, they will be re-crawled over time. But usually, when it comes to lots of URLs on a website, that’s something that takes a couple of months. So on the one hand, you could just leave those be and kind of like say, well, they’re not visible to people, unless you explicitly search for the keyword or they had content or do a site query of your website.
And they will drop out over time. And just kind of like leaving them be for half a year. And then double-check afterwards to see if they’re actually completely cleaned up. If you really want to try to resolve that as quickly as possible, you can also use the removal tool with the prefix setting. And essentially try to find common prefixes for these hacked pages, which might be a folder name or file name or something that’s in the beginning.
And kind of filter those out. The removal tool doesn’t take them out of our index. So it doesn’t change anything for the ranking, but it doesn’t show them in the results anymore. So that’s one way you could kind of go past just the more visible pages to try to clean the rest up. Personally, I don’t think you need to clean up all of those pages.
Because if users don’t see them, then it’s like, well, technically they exist in the search results. But if no one sees them, it doesn’t really change anything for your website. So from that point of view, I would focus on the visible part, clean that up. And when that’s done, just let the rest kind of work itself out.
SEO Professional 6 28:22
Yeah, that happens actually, for a few websites, it just cleaned up real quick. But for some websites, you know, it takes ages. I’ll follow the instructions. I have another question if you have time.
So lots of common cases where we have websites with a valid URL to the quality content. And they’re just following the guidelines, which are being mentioned in the Google Search Console search central guidelines. They’re following that, you know, they’re avoiding duplication, they have quality content, and they have I mean, not you know, you know, duplicating or I mean, doing nasty stuff, but they’re valid pages.
But sometimes, you know, it took ages to index those URLs, and you know, people just come and say, Okay, look, this URL is valid. And we’ve been requesting this for a long time, we have internal links set up. And the overall website is, I mean, quite old, and they have, you know, a good reputation across the website.
So they’re just following the basics. So I wish we have a tool or something that we can, you know, use it to, you know, help people to index them faster.
John 29:45
Now, yeah, it’s interesting. You bring both those sides on the one hand, like how do I get stuff out of search and how do I get stuff into search? Maybe you should be able to trade between those two sides. That would be an interesting idea for a tool I guess. So I think, overall there is the Submit to Indexing tool or functionality in Search Console.
That’s kind of what we recommend for these things. But at the same time, we just don’t index it. And it can very well happen that you have something that is a valid page. But we just don’t index it. I think one of the reasons that kind of goes in that direction is, nowadays, almost all pages are valid pages.
And it’s really hard to set up a CMS where you produce pages that are invalid. If you use WordPress, or any of the common systems, it just produces valid pages by default. And from a technical point of view, we can’t index everything on the web. So we have to draw the line somewhere. And it’s completely normal for websites to have parts of their content indexed and parts of their content not indexed.
Usually, over time, as we understand that this is a really good website. And if it has a reasonable internal structure, then we can pick up more and more, but it’s not a guarantee that we’ll index everything on the web. So that’s, I think, always something to kind of keep in mind. And especially in Search Console, it’s easy to look at the reports and say, Oh, these pages are not indexed, therefore, I’m doing something wrong. But from our point of view, it’s normal that not everything is indexed. It’s just a lot more visible nowadays.
SEO Professional 6 31:32
Yeah, that’s what we’ve been telling to those people that came up with the problem. You know, normally, if you just talk about WordPress, for example, yeah, they do have common pages. So, you know, we definitely, you know, respond to them, “Okay, these pages are identical.”
So the content is indexed already, the main content, and, you know, the same thing you were suggesting. But, you know, sometimes we may just, you know, it’s nail biting, you know, to see some URLs and decide, I mean, you know, they don’t want to try to trick Google, you know, they just have valid pages.
And we will check everything, you know, the sitemap, they’re requested as well. You know, it seems like everything, then we wish, you know, there should be something, which we just press and, you know, to help these people, because those are important pages for those websites.
John 32:33
Yeah. Yeah. I think it’s tricky, because everyone wants, like, all of their pages indexed and everything. Just want to click something to get it done. Yeah. It’s hard sometimes.