On the Search Off The Record podcast, Googlers John Mueller, Gary Illyes, and Lizzi Sassman discussed all things sitemaps.
They dove right into how sitemaps can help Google discover the content on your site, and how you can take advantage of them for better crawling and indexing.
That includes how John Mueller got started at Google, how building a sitemap generator led to that, and a number of other great tidbits.
Without further ado, let’s get into it and learn more about sitemaps!
Sitemaps and the Future of Crawling
Mueller said that sitemaps made it easier for search engines to crawl and find all of the content on a site. He also said that, at the time, regardless of whether a sitemap file actually helped with visibility, creating one forced site owners to look at their own website and think about all of the URLs Google could find. Why isn't it finding this part? What's up with all of these parameters, and uppercase versus lowercase URLs? When you crawl your own website, it can suddenly look like an infinite mess.
But when you see that for the first time, you realize it's something you can control, something a site owner can work on to make the site easier for search engines to crawl. The hosts then asked whether he made any changes to his own website based on that learning exercise, and if so, what kind of changes.
URL Parameters and Crawling
Mueller said he doesn't remember the details of what he changed on his website, but URL parameters were a big part of it. At the time, it was super common for sites to attach random URL parameters like session IDs: a really long number tacked onto every URL, different for every user. In the early days, you would look at a website like that and say, well, it is how it is; I'm not supposed to understand all of these things. But when you crawl it, you realize it makes the website pretty much impossible to crawl properly, unless a search engine can figure that out. And if you can figure it out for the search engine, you make its job a little bit easier.
He noticed this on his own website, but also on other people's sites once the generator was public. People would contact him and say, "Well, I ran your tool on my website, and it's not stopping." That forces you to look at other people's websites and try them out as well.
Then you notice that these kinds of crawling issues are everywhere. Mueller believes a lot of this has gotten significantly better, because people use more common CMS systems that no longer generate this kind of messy URL structure. But back in the early days, it was super common to have a website that was pretty much impossible to crawl.
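To make the session-ID problem concrete, here is a minimal sketch in Python of the kind of URL cleanup a crawler, or a site owner generating a sitemap, might apply. The parameter names treated as session IDs here are assumptions for illustration; real session parameters vary by platform.

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    # Hypothetical set of session-ID parameter names; real names vary by platform.
    SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

    def canonicalize(url: str) -> str:
        # Drop session-ID-style query parameters and lowercase the host,
        # so every user's variant of a URL collapses to one crawlable URL.
        scheme, netloc, path, query, _fragment = urlsplit(url)
        kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
                if k.lower() not in SESSION_PARAMS]
        return urlunsplit((scheme, netloc.lower(), path, urlencode(kept), ""))

    print(canonicalize("https://Example.com/shop?item=42&sid=9f8a7b6c5d"))
    # -> https://example.com/shop?item=42

Every user's session-specific URL now maps to the same address, which is exactly the kind of cleanup that makes a site crawlable again.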
Using Sitemaps to Make a Difference on Your Site
They said that if you want to make a difference for your website and make it easier for Google to crawl, sitemap files are definitely something you should look into. Asked about the common mistakes people make with sitemap files, they said the most common is probably not including all of the pages on the website: the site has a sitemap file, but it only lists the 10 or 20 most important pages, while the website has thousands.
All of those other pages end up ignored, and that can really hurt, because a page that isn't in your sitemap file, and isn't well linked internally, may take Google much longer to find, if it's found at all. So if you have a page that you want people to be able to find, make sure it's in your sitemap file. Another common mistake is not updating the sitemap file when the website changes.
Common Sitemap Mistakes
So if you add a page, remove a page, or change a page's URL, you need to update your sitemap file as well. Otherwise, Google will keep trying to crawl the old URL and getting a 404 error because the page doesn't exist anymore, and it may take longer to discover the new URL because nothing points to it. Those are two really common mistakes people make with sitemap files.
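To show how little work a complete, current sitemap takes, here is a short Python sketch that writes a sitemap.xml covering every page. The page list and output path are placeholders; in practice you would enumerate the URLs from your CMS or database.

    import xml.etree.ElementTree as ET
    from datetime import date

    # Placeholder page list; in practice, pull every indexable URL from your CMS.
    pages = [
        ("https://example.com/", date(2024, 1, 15)),
        ("https://example.com/about", date(2023, 11, 2)),
    ]

    NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = modified.isoformat()

    # Regenerate and overwrite this file whenever pages are added, removed, or moved.
    ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)

Regenerating the file automatically on every publish or deletion is the simplest way to avoid both mistakes at once.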
They added that a small business owner wouldn't need to go through this whole exercise, because for small sites it simply isn't necessary. Crawl budget, they said, wasn't really a thing before around 2013; it only became one as the web grew and search engines had to prioritize what to crawl.
The idea behind the priority field is understandable, but if you're generating these files for any larger website, you have to fill in the values automatically, and you don't necessarily know the relative priority of some random blog post. At some point you either declare that everything is important, or you invent an artificial priority structure for your site.
Either way, you can't really determine priority yourself, and at that point the data isn't very useful. The last modification date, on the other hand, is an absolute value that you can actually supply. With change frequency, you don't really know in advance how often a page will change; it's more that search engines can track over time how often a page changes on average and use that to decide how often to recrawl it. So why would a site owner specify it directly? It's far too tempting to say a page could change every day, even when it doesn't.
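In practical terms, that argues for populating lastmod from a value you actually know, such as your CMS's updated-at timestamp, and leaving priority and changefreq out. A small sketch of that approach, with hypothetical names:

    import xml.etree.ElementTree as ET

    NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

    def url_entry(loc: str, lastmod_iso: str) -> ET.Element:
        # Only <loc> and <lastmod> are emitted. <priority> and <changefreq>
        # are omitted on purpose: they are guesses a site owner can't really
        # make, while lastmod is the one value you can supply accurately.
        url = ET.Element("url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod_iso  # e.g. the CMS updated-at field
        return url

    urlset = ET.Element("urlset", xmlns=NS)
    urlset.append(url_entry("https://example.com/blog/post-42", "2024-03-08"))
    print(ET.tostring(urlset, encoding="unicode"))

The only caveat is that lastmod has to be honest: a date that changes on every request is no more useful than an inflated change frequency.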