One SEO professional asked John Mueller during a hangout about keeping Google from spending crawl budget on their site's API requests.
Their question was: they run a live stream shopping platform, and the site currently spends about 20 percent of its crawl budget on the API subdomain and another 20 percent on video image thumbnails.
Neither subdomain hosts content that is part of their SEO strategy. Should they disallow Google from crawling these subdomains any further?
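For context, the kind of blanket disallow the question describes would be a robots.txt file served from the root of each subdomain. This is a minimal sketch; the subdomain name is hypothetical:

```
# https://api.example.com/robots.txt (hypothetical subdomain)
User-agent: *
Disallow: /
```

Note that a robots.txt file only applies to the host it is served from, so each subdomain needs its own.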
Mueller's answer started with how the API endpoints are discovered and used in the first place.
If your pages access an API on the site, Google will try to load the content from that API and use it when rendering those pages.
Rendering is usually where these API URLs are discovered. You can help by making sure the API results are cacheable.
In particular, don't inject timestamps into the API URLs, since a unique URL per request defeats caching.
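To illustrate why timestamps in URLs hurt caching, here is a small sketch with a cache keyed by the full request URL (the URLs and cache are hypothetical, not a real Google mechanism):

```python
import time
from urllib.parse import urlencode

cache = {}  # simple cache keyed by the exact request URL


def fetch(url):
    """Return (response, hit): hit is True if the URL was already cached."""
    if url in cache:
        return cache[url], True    # cache hit: no new API request needed
    cache[url] = f"response for {url}"
    return cache[url], False       # cache miss: triggers a fresh API request


# Stable URL: every render after the first reuses the cached result.
stable = "https://api.example.com/products?page=1"
fetch(stable)
_, hit = fetch(stable)
print(hit)  # True

# Timestamp in the URL: each render builds a slightly different URL,
# so every render is a guaranteed cache miss and a new API request.
busted = "https://api.example.com/products?" + urlencode(
    {"page": 1, "ts": int(time.time() * 1000)}
)
_, hit = fetch(busted)
print(hit)  # False
```

With the timestamp parameter, every page render generates a URL the cache has never seen, so the API gets hit every single time.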
Disallowing the API subdomain in robots.txt will block all of those API requests. So first of all, you need to figure out whether these API results feed the primary content, or any critical content, that you want indexed by Google.
If so, then you probably should not block crawling. But if the API only generates content that is secondary, and not critical to the pages themselves, then it might be worthwhile to double-check what the pages look like when those requests are blocked.
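One way to sanity-check a proposed rule set before deploying it is Python's standard-library robots.txt parser. A minimal sketch, assuming a hypothetical API subdomain that blocks everything:

```python
from urllib.robotparser import RobotFileParser

# Proposed robots.txt for the hypothetical api. subdomain
rules = """
User-agent: *
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot matches the "User-agent: *" group, so every path is blocked.
print(rp.can_fetch("Googlebot", "https://api.example.com/v1/streams"))  # False
print(rp.can_fetch("Googlebot", "https://api.example.com/"))            # False
```

Running the same check against the main domain's robots.txt confirms that only the subdomains you intend to block are affected.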