One SEO professional was concerned about how JavaScript strings on their site were being handled by Googlebot: specifically, strings starting with a slash were being interpreted as URLs and then followed by Googlebot.
Sometimes a URL discovered this way isn't valid, and as a result they see various crawl errors in Search Console.
Their main question is: what is the official recommendation for how to nofollow such URLs? Their workaround has been to split the strings into two or more parts, as sketched below.
Does having millions of pages with these types of strings negatively affect crawl budget?
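For context, the splitting workaround mentioned in the question generally looks something like the sketch below. The path is hypothetical, and this is only an illustration of the asker's approach, not an official recommendation:

```javascript
// Hypothetical path, used purely for illustration.
// Before: a slash-prefixed string literal that Googlebot may pick up and crawl as a relative URL.
const endpoint = "/private/endpoint";

// After: the same value assembled at runtime, so no string literal other than
// the bare "/" starts with a slash and looks like a crawlable path.
const endpointSplit = "/" + "private" + "/" + "endpoint";
```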
John answered that they don’t have to worry about crawl budget.
When it comes to crawling, Google prioritizes things in different ways. Random URL discoveries, where a URL is merely mentioned in text or in a JavaScript file somewhere, tend to be fairly low on that priority list.
Anything important that Google recognizes on a website, such as new pages that are linked to or new content that has been created, gets prioritized first.
Then, if Google has time, it will also go through all of the other random URL mentions it has discovered.
From a crawl budget perspective, this is usually a non-issue. If the site owner is concerned that Google is crawling too much of the website overall, they can reduce the amount of crawling in Search Console with the crawl rate setting.
Even then, Google will still prioritize: if the setting is fairly low, Google focuses on the important things first, and if it can cover those, it tries to go through the rest.
From that perspective, if Google is hitting the server too hard, the site owner can adjust the crawl rate; after a day or two crawling should settle down at the new rate, and Google should be able to keep on crawling.
Regarding nofollowing these URLs, it's not really possible to do that within the JavaScript files themselves, because Google tries to recognize URLs in JavaScript; sometimes URLs are only mentioned in JavaScript.
What is possible, however, is to put these URLs into a JavaScript file that’s then blocked by robots.txt.
If that JavaScript file's URL is blocked by robots.txt, Google won't be able to fetch the file and therefore won't see the URLs inside it.
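As a rough illustration, a robots.txt rule for this approach might look like the following. The script path is hypothetical and would need to match wherever the URL-bearing file actually lives:

```
# Hypothetical script location; adjust to the real path of the file containing the strings.
User-agent: *
Disallow: /assets/js/url-strings.js
```

As John notes in the transcript below, the page should still render normally, and remain mobile friendly, with that file blocked.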
This happens at approximately the 30:30 mark in the video.
John Mueller Hangout Transcript
John (Submitted Question) 30:30
We see that every JavaScript string starting with a slash is interpreted as a URL and is followed by Googlebot. Sometimes a URL is not valid, and we see different crawl errors in Search Console. Is there an official recommendation on how to nofollow such URLs? We used to split the strings into two or more parts. Having millions of pages with such strings may negatively impact the crawl budget.
John (30:58)
So I think, just kind of like the last question, or the last part of the question there with regards to crawl budget, that’s one thing you definitely don’t have to worry about. Because when it comes to crawling, we prioritize things in different ways. And all of these kind of like random URL discoveries that we come across, where your URL is mentioned in a text or in a JavaScript file somewhere, those tend to be fairly low on the list.
So if we have anything important that we recognize on your website, any new pages that you link to, any new content that you’ve created, we’ll prioritize that first. And then if we have time, we’ll also go through all of these random other URL mentions that we’ve discovered. So from a crawl budget point of view, this is usually a non issue. If you’re seeing that overall, we’re crawling too much of your website, then you can adjust the amount of crawling in Search Console with the crawl rate setting.
And again, here, we still prioritize things. So if you set the setting to be fairly low, then we’ll still try to focus on the important things first. And if we can cover the important things, then we’ll try to kind of go through the rest. So from that point of view, if you’re really seeing that we’re hitting your server too hard, you can just adjust that after a day or two, it should kind of settle down at that new rate.
And we should be able to, kind of like, keep on crawling. With regards to no-following these URLs, you can’t really do that in the JavaScript files. Because we try to recognize URLs in JavaScript, because sometimes URLs are only mentioned in JavaScript. What you can do, however, is put these URLs into a JavaScript file that is blocked by robots.txt. And if the URL is blocked by robots.txt, then we won’t be able to see the JavaScript file and we won’t see those URLs.
So if it’s really a critical thing that you’re thinking, Googlebot is getting totally lost on my website, then you could use robots.txt to block that JavaScript file. The important part there is to keep in mind that your site should still render normally, with that file kind of blocked. So in Chrome, you can, I believe, you can just block that individual URL and test it out. But especially the mobile friendliness of a page should still be guaranteed, we should still be able to see kind of the layout of the page properly with that JavaScript file blocked.
So if it’s only some kind of interactive functionality that is being blocked by that, then usually that’s less of an issue. If it blocks all of the JavaScript and your page doesn’t work at all anymore, then that’s something where I’d say maybe you need to find a different approach to handle that.
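One way to approximate the Chrome test John describes is to load the page with that single script blocked and inspect the rendered result. Below is a minimal sketch using Puppeteer; the URL and script path are placeholders, and Chrome DevTools' built-in network request blocking achieves the same thing interactively:

```javascript
// Minimal sketch: render a page with one JavaScript file blocked and take a
// screenshot to check that the layout still looks normal.
// Assumes Puppeteer is installed; the URL and script path are placeholders.
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Intercept requests and abort only the script that would be disallowed in robots.txt.
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (request.url().endsWith("/assets/js/url-strings.js")) {
      request.abort(); // simulate the file being blocked
    } else {
      request.continue();
    }
  });

  await page.goto("https://www.example.com/", { waitUntil: "networkidle0" });
  await page.screenshot({ path: "render-without-blocked-js.png", fullPage: true });

  await browser.close();
})();
```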