An SEO professional was concerned about crawling and whether Googlebot always identifies itself when it crawls.
Their log files showed Googlebot crawling specific URLs that were set up purely for advertising: namely, UTM-parameter URLs, Google Display URLs, and Universal App campaign URLs.
Even so, they checked and could not find any links pointing to those URLs from anywhere.
So, this behavior caught them by surprise.
John explained that while he doesn’t know offhand, he believes Google crawls pages submitted in a Merchant Center feed as Googlebot in order to ensure that they can be picked up for the Merchant Center.
If there are tagged URLs like UTM URLs, it’s possible that regular Googlebot will crawl them and they will show up in Google Search Console.
It’s also possible that someone acting on their behalf is submitting the URLs.
The other possibility is that, if Google finds links to these pages somewhere, it will try to crawl them.
If there are tagged internal links anywhere within a website, Google will attempt to pick them up and crawl them.
For example, if you have something set up in JavaScript that generates URLs with these parameters somewhere, then Google could potentially crawl them.
He also explained that Google doesn’t crawl URLs at a large scale without identifying itself as Googlebot, specifically.
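Later in the conversation, John draws a distinction between IP addresses that map back to Google generally (for example, services running on Google Cloud) and those that map back to Googlebot itself. That distinction can be checked with the documented two-step DNS test: reverse-resolve the logged IP, confirm the hostname sits under googlebot.com or google.com, then forward-resolve that hostname and confirm it returns the original IP. A minimal sketch in Python (the function names here are our own, not an official API):

```python
import socket

# Per Google's documentation, legitimate Googlebot reverse-DNS names
# end in googlebot.com or google.com.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def hostname_is_google(hostname: str) -> bool:
    """True if a reverse-DNS hostname sits under a Google crawler domain."""
    return hostname.rstrip(".").endswith(GOOGLE_DOMAINS)


def verify_googlebot(ip: str) -> bool:
    """Two-step check: reverse DNS, then forward DNS back to the same IP.

    Requires network access; returns False on any lookup failure.
    """
    # Step 1: reverse DNS lookup on the IP taken from the access log.
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not hostname_is_google(hostname):
        return False
    # Step 2: forward DNS on that hostname must include the original IP;
    # otherwise the PTR record could be spoofed.
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```

The forward-confirmation step matters: anyone can configure a PTR record claiming to be `crawl-x-x-x-x.googlebot.com`, but only Google controls the forward DNS for that name.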
This happens at approximately the 54:22 mark in the video.
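In the transcript, the questioner estimates that these tagged URLs account for roughly half of Googlebot’s crawling of the site. That share can be measured directly from access logs. A minimal sketch, assuming combined-log-format lines (the sample entries below are invented for illustration):

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical sample log lines in combined log format; real logs vary.
SAMPLE_LOG = """\
66.249.66.1 - - [10/Feb/2022:10:00:00 +0000] "GET /product?utm_source=display HTTP/1.1" 200 512 "-" "Googlebot/2.1"
66.249.66.1 - - [10/Feb/2022:10:00:01 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Googlebot/2.1"
"""


def tagged_share(log_text: str) -> float:
    """Fraction of Googlebot requests whose URL carries utm_* parameters."""
    total = tagged = 0
    for line in log_text.splitlines():
        m = re.search(r'"GET (\S+) HTTP', line)
        if not m or "Googlebot" not in line:
            continue
        total += 1
        params = parse_qs(urlparse(m.group(1)).query)
        if any(key.startswith("utm_") for key in params):
            tagged += 1
    return tagged / total if total else 0.0
```

On the two sample lines above, one of two Googlebot requests is tagged, so the function returns 0.5. In practice you would also want to filter by verified Googlebot IPs, as shown earlier in the article.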
John Mueller Hangout Transcript
SEO Professional 9 54:22
Hi, John. I also have a question about crawling. We see in our log files, and have also verified via IP, that it’s Googlebot. A lot of crawling from the organic bot to UTM parameter URLs, Google Display, and Universal App campaigns. But we have checked and we don’t see any links coming from anywhere to those URLs. So we are a bit surprised by this behavior. Do you have any idea of where or why this might be happening?
John 55:07
And what kind of tagged URLs are they?
SEO Professional 9 55:11
Google Display. So UTM, Google Display, and universal app campaign. They’re both basically display campaigns.
John 55:24
Now, I don’t know offhand. The one place where, with Googlebot, we also crawl pages that you list in an ads campaign, I think, is for product search. So if you have a product search feed or Merchant Center feed, I’m not sure what it’s called, set up, then we would also crawl those pages as Googlebot to make sure that we can pick them up for the Merchant Center.
And if you have tagged URLs in there, then that can happen. We will keep those tagged URLs and reprocess them. I don’t know offhand how it works with Merchant Center; it might also be that other people are able to submit these kinds of products, kind of like from your site, in a separate feed. So it might not necessarily be you who’s submitting them, but maybe someone who’s working on your behalf or has the permission to do that as well.
But that seems like the most likely reason for something like this. I mean, the other aspect is, if we find links to these pages somewhere, we will try to crawl them. So if you have tagged internal links within a website, we will still try to pick that up and crawl that. Or if you have things set up in JavaScript, maybe tracking URLs with these parameters set up somewhere, and when we process the JavaScript it looks like a link to those tracking URLs, we could also process that.
But it sounds to me like it’s not individual cases that are happening for your site, but rather a large number of these URLs. And that feels a lot like the Merchant Center side of things.
SEO Professional 9 57:18
Okay, yeah, because it’s quite large. Those crawled URLs are basically 50 percent of our crawl budget, of our organic part. But the thing is, I’ve also been looking into Search Console, and in Search Console that’s not reported. We don’t see those parameter URLs in Search Console. Might that be an indication of something?
John 57:45
It depends on where you look in Search Console. So in the crawl stats report, if they’re being crawled through the Googlebot infrastructure, then they should also be listed there. On the index coverage and search performance side, I don’t think we would list them.
SEO Professional 9 58:00
Yeah, no, it wasn’t in the crawling report.
John 58:04
So if they’re not visible in the crawling report, then that almost feels like something weird is happening in the sense that maybe it’s not an official Googlebot that is crawling you like that? I don’t know, it’s hard to say. Because we do have different systems that use the Googlebot infrastructure, and they allow for that reverse IP lookup. And all of those should be listed in the crawl stats report in Search Console.
SEO Professional 9 58:38
Okay, because we checked the IP and also the DNS lookup manually for a couple of them. And it does seem to be Googlebot.
John 58:47
Does it map back to Google or to Googlebot directly? Because within Google Cloud, you can also run services and those IP addresses go back to Google. But they don’t go back to Googlebot.
SEO Professional 9 59:05
No, it does. Checking in the terminal, it is Googlebot.
John 59:09
Okay. Yeah. Then they should definitely be in the crawl stats report in Search Console.
SEO Professional 9 59:17
Okay, so it’s a mystery. And I have another question, if I may. Does Google ever crawl websites without being Googlebot?
John 59:36
Without being Googlebot?
SEO Professional 9 59:37
So, for example, in cases where you’re trying to check, what would I call it, like showing two different versions to users and to the bot, or something like this. I don’t know if there are any systems like that.
John 59:51
I don’t think we would crawl like that. We might check individual URLs, and there are lots of other systems that kind of go into that as well, which could be something like, I don’t know, Google Translate, for example. That might use, I’m pretty sure it uses, a normal browser user agent. But it still goes through kind of like Google’s infrastructure. But we wouldn’t crawl through Google Translate; it would just be individual requests. So a large number of URLs, from my point of view, I don’t think we would process like that. Other than, like I mentioned, the Merchant Center side of things, or the ads landing page checks. Those are also a large number of URLs, but they’re based on what you submit to those systems appropriately.
SEO Professional 9 1:00:46
In any case, those would be crawled via AdsBot, typically, right?
John 1:00:52
Yeah, I think the Merchant Center wouldn’t; I think the Merchant Center would be normal Googlebot.
SEO Professional 9 1:00:59
Okay, so basically, no Google crawling at large scale without showing that it’s somehow Google?
John 1:01:06
Yeah, I don’t think we do that, for policy reasons anyway. But, kind of like, theoretically, it might be that someone has tried that at some point. But I don’t think that’s something that we would do on purpose.