During John Mueller’s hangout on 09/17/2021, one webmaster asked about removing old news from a site whose content was ten years old or more.
Their first question was about implementing a content pruning process to remove older pages; they wanted to know whether doing so would have any effect.
The webmaster also asked how Google can crawl a brand-new website with absolutely no links pointing to it.
John explained that there is not a lot of value in removing old news from a site, even one with three million pages. Moving old content into an archive could be a good thing for users, but it would have almost zero impact on rankings.
He also explained that verifying a site in Google Search Console triggers the initial crawl: it signals to Google that you want the site in Search, so Google takes a look to learn more about it.
This discussion happens at about the 49:36 mark in the video.
John Mueller 09/17/2021 Hangout Transcript
Webmaster 6 49:36

Hi, John. I’ve just two questions, real quick. One I submitted on YouTube. Is it worth looking at old news and removing it, noindexing it, disallowing it? It’s a site with, like, content from ten plus years ago. Does it do something to the quality of the site overall, or crawl budget? It’s a site with, like, three million pages on it. Does it do something to that?
John 50:12
I don’t think you would get a lot of value out of removing just old news. So it’s also not something I would recommend to news websites, because sometimes all the information is still useful. So from that point of view, I wouldn’t do this for SEO reasons. If there are reasons why you want to remove content or put it into an archive section on your website for usability reasons or for maintenance or whatever, that’s something that you can definitely do. But I wouldn’t just blindly remove old content because it’s old.
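For reference, the “noindexing” and “disallowing” the webmaster mentions refer to the robots meta tag and robots.txt, respectively. A minimal illustration, with a hypothetical archive section (the path is an assumption, not from the hangout):

```html
<!-- Hypothetical example: keep an archived article out of Google's index.
     Googlebot can still crawl the page and will see this directive. -->
<meta name="robots" content="noindex">
```

```
# Hypothetical robots.txt: block crawling of an /archive/ section entirely
User-agent: *
Disallow: /archive/
```

Note that the two behave differently: `noindex` keeps a page out of search results but still allows crawling, while `Disallow` blocks crawling altogether, which also means Googlebot can never see a `noindex` tag on a disallowed page.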
Webmaster 6 50:49
Okay. Clear. Thanks. And another question is, I was just really curious about Googlebot. I was working on my personal site a few months ago. And the very first thing I did as an SEO is verify Google Search Console and Google Analytics and all that stuff. So I was developing for a few months. And when I was ready to go live and did a crawl request for the home page in Search Console, I saw in the crawl stats that Googlebot had already tried to crawl pages. And I was wondering, how does that happen? And why does it happen? It’s a new domain. There are no links to it. So how does Googlebot still find this domain and try to crawl it?
John 51:32
I think it’s something that we do when you verify your website in Search Console, where we kind of say, “Well, it looks like you’re trying to get this website into search. We don’t have any information on it. We will take a look.” And I don’t think it’s something that we do for policy reasons or anything. It’s basically just, “Oh, it looks like you’re trying to go into search, we will help you so that you don’t have to do every step yourself.”