In John Mueller’s hangout, a webmaster was concerned about seeing two hits from Googlebot in their server logs at essentially the same time, only a few milliseconds apart.
They were wondering whether there is an explanation for this, such as raw HTML vs. rendered HTML or something else.
John explained that the two hits could come from different Googlebot user agents, such as desktop vs. mobile.
Otherwise, it could be a glitch.
If the logs show the same URL being fetched by the same user agent within seconds, it may be a bug on Google’s side that needs to be fixed.
Either way, it’s inconsequential.
The exchange occurs at the 5:01 mark in the video.
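A quick way to settle that kind of bet is to check whether the paired hits share the same URL and the same user agent in the raw logs. The sketch below is a minimal, hypothetical example of that check against a combined-format access log; the log file name, the one-second window, and the regex are assumptions for illustration, not anything from the hangout.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches a combined-format access log line (IP, timestamp, request, status,
# size, referrer, user agent). Adjust to whatever format your server writes.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_hits(path):
    """Yield (timestamp, url, user_agent) for every request whose user agent
    mentions Googlebot."""
    with open(path) as fh:
        for line in fh:
            m = LOG_LINE.match(line)
            if not m or "Googlebot" not in m.group("ua"):
                continue
            ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            yield ts, m.group("url"), m.group("ua")

def near_duplicates(path, window_seconds=1.0):
    """Print pairs of Googlebot hits to the same URL that land within the
    window, and say whether the two user agents match."""
    by_url = defaultdict(list)
    for ts, url, ua in googlebot_hits(path):
        by_url[url].append((ts, ua))
    for url, hits in by_url.items():
        hits.sort()
        for (t1, ua1), (t2, ua2) in zip(hits, hits[1:]):
            gap = (t2 - t1).total_seconds()
            if gap <= window_seconds:
                tag = "same user agent" if ua1 == ua2 else "different user agents"
                print(f"{url}: two hits {gap:.3f}s apart, {tag}")

if __name__ == "__main__":
    near_duplicates("access.log")  # hypothetical log file name
```

If the duplicates turn out to use different user agents, that points to ordinary desktop vs. mobile crawling; if the URL and user agent are identical within seconds, that matches the case John describes below as a likely issue on Google’s side.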
John Mueller Hangout Transcript
Webmaster 3 5:01

Cool. Hi, John. So when we look at our server logs, we generally see two hits from Googlebot at the same time or, you know, give or take, a few milliseconds apart, right? My team has been trying to figure out an explanation for this.
And we even have a bit of a pool going, right? One believes that it’s hitting our server from different physical locations. Right? Another, you know, thinks that it could be desktop versus mobile. My money is on raw HTML versus rendered HTML. Did I back the right horse?
John 5:35
I don’t know, I mean, what are you seeing? Is it the same URL? Is it the same user agent as well? Everything the same?
That seems like something that shouldn’t be happening. Because, I mean, if it were desktop and mobile, those would be different user agents. And if it’s the same Googlebot user agent, then we should be caching that a little bit.
It shouldn’t be that, like, within seconds, we would refresh the same page again. Even if we were to render the page, we would use kind of the HTML version that we have, and then we render the page, pulling in all of the extra content. We wouldn’t render the page by fetching the HTML again, because that would be kind of inefficient.
So that almost feels more like a bug on our side, if you’re seeing that. Or maybe you’re seeing something in the logs that is simplified or tracking it slightly differently, I don’t know. But if it’s the same URL, same user agent, like within seconds, that seems like an issue on our side.
Different physical locations wouldn’t be the case because we usually crawl sites from the same location. So it’s not that we would crawl it, like, once from this data center, once from a different one. We essentially pick one location, and we crawl the whole site from that location.
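As an editorial aside on the “different physical locations” theory: if there is any doubt that both hits are genuine Googlebot rather than a crawler spoofing the user agent, the logged IPs can be checked with the reverse-and-forward DNS verification that Google documents. This is a minimal sketch using Python’s standard socket module, not something discussed in the hangout; the sample IP is just a placeholder.

```python
import socket

def is_verified_googlebot(ip):
    """Reverse-DNS the IP, check that the hostname belongs to Google, then
    confirm the forward lookup resolves back to the same IP (the two steps
    Google documents for verifying Googlebot)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

# Placeholder IP; substitute the addresses seen in your own logs.
print(is_verified_googlebot("66.249.66.1"))
```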
Webmaster 3 7:12
Okay, so that’s interesting because, I mean, for our consumer experience, we use, you know, the location to serve up more relevant content. So that could mean that we could be limiting the amount of content accessible to Googlebot if you’re only crawling it from one location, yes?
John 7:33
Almost certainly. Yes. Oh, another thing that might be happening there is, I’ve seen that sometimes with CDNs or security setups, where essentially, we request the URL once and it does a kind of a redirect back to the same URL or something like that, as an—I don’t know, an anti-bot measure or something like that, where maybe we’re following a redirect like that, and kind of indirectly picking up the content.
And that would be something where, I think, if that’s actually happening, if you can tell, for example, with the inspect URL tool, when you do live requests, that this kind of bouncing through your servers is happening, that seems like something that would be inefficient on your side, where you’d probably be able to get more crawls in if you didn’t have that kind of indirection in between.
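If you want a rough preliminary check for that kind of self-redirect before running a live test in the URL Inspection tool, a sketch like the one below can help. It assumes the requests library, a made-up example URL, and an approximate Googlebot user-agent string; because many CDNs key their bot handling to verified Googlebot IPs rather than the user agent alone, the live test in Search Console remains the more reliable signal.

```python
import requests

# Rough approximation of Googlebot's desktop user-agent string. A CDN or
# anti-bot layer that verifies crawlers by IP (not just user agent) may treat
# this request differently than a real Googlebot fetch, so treat the result
# as a hint only.
GOOGLEBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) Chrome/120.0.0.0 "
    "Safari/537.36"
)

def check_self_redirect(url):
    """Fetch the URL once without following redirects and report whether the
    response bounces back toward the same URL."""
    resp = requests.get(
        url,
        headers={"User-Agent": GOOGLEBOT_UA},
        allow_redirects=False,
        timeout=10,
    )
    if resp.status_code in (301, 302, 303, 307, 308):
        location = resp.headers.get("Location", "")
        note = " (back to the same URL)" if location.rstrip("/") == url.rstrip("/") else ""
        print(f"{resp.status_code} redirect to {location!r}{note}")
    else:
        print(f"{resp.status_code}, served directly with no redirect")

check_self_redirect("https://example.com/some-page")  # hypothetical URL
```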