With these crawls, some of the URLs aren't valid. They also see a variety of crawl errors in Search Console.
Their main question is: what is the official recommendation for how to nofollow such URLs? They are used to splitting the strings into two or more parts so that crawlers can't reconstruct them (sketched after the question below).
Does having millions of pages with these types of strings negatively affect crawl budget?
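For context, the string splitting the question mentions typically looks something like the sketch below. This is a minimal, hypothetical illustration rather than anything shown in the video: the example.com URL and variable names are invented. The idea is that a complete URL literal in a JS file can be picked up by Google's random URL discovery, while concatenated fragments generally are not.

```javascript
// A full URL literal in a JS file is easy for URL discovery to lift
// straight out of the source.
const fullUrl = "https://example.com/api/track?id=";

// Split form: the fragments no longer look like a URL in the source,
// so they are far less likely to be treated as a crawlable URL.
const origin = "https://example.com";
const path = "/api/track?id=";

// At runtime the page still assembles and requests the same URL.
const endpoint = origin + path;
fetch(endpoint + encodeURIComponent("abc123"));
```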
John answered that they don’t have to worry about crawl budget.
When it comes to crawling, Google prioritizes things in different ways. These strings are all essentially random URL discoveries that Google comes across, where a URL is mentioned in text or in a JS file somewhere, and those sit lower on the list.
If they recognize anything important on a website, such as new pages that are linked or new content that has been created, they will prioritize the new content first.
Then, if they have time, they will also go through all of the random other URL mentions that they have discovered.
From a crawl budget perspective, this is usually a non-issue. If the SEO pro is concerned about this overall, and Google is crawling too much of the website, then they can adjust the amount of crawling in Search Console with the crawl rate setting.
Again, Google will still prioritize things. If you set the crawl rate to be low, they will focus on the important things first, and if they can cover the important things, they will try to go through the rest.
From that perspective, if you're seeing that Google is hitting the server too hard, you can adjust the crawl rate, and after a day or two it should settle down at the new rate. Then they should be able to keep on crawling.
This happens at approximately the 30:30 mark in the video.