On Twitter, Google’s John Mueller announced an update to Google’s Search documentation on how Googlebot handles large HTML files.
The updated documentation explains that Googlebot reads only the first 15 MB of an HTML file.
This is what Google said about this limitation:
This limit only applies to the bytes (content) received for the initial request Googlebot makes, not the referenced resources within the page. For example, when you open https://example.com/puppies.html, your browser will initially download the bytes of the HTML file, and based on those bytes it might make further requests for external JavaScript, images, or whatever else is referenced with a URL in the HTML. Googlebot does the same thing.
What does this 15 MB limit mean to me?
Most likely nothing. There are very few pages on the internet that are bigger in size. You, dear reader, are unlikely to be the owner of one, since the median size of a HTML file is about 500 times smaller: 30 kilobytes (kB). However, if you are the owner of an HTML page that’s over 15 MB, perhaps you could at least move some inline scripts and CSS dust to external files, pretty please.
What happens to the content after 15 MB?
The content after the first 15 MB is dropped by Googlebot, and only the first 15 MB gets forwarded to indexing.
What content types does the 15 MB limit apply to?
The 15 MB limit applies to fetches made by Googlebot (Googlebot Smartphone and Googlebot Desktop) when fetching file types supported by Google Search.
Does this mean Googlebot doesn’t see my image or video?
No. Googlebot fetches videos and images that are referenced in the HTML with a URL (for example, in the src attribute of an img tag) separately with consecutive fetches.
Do data URIs add to the HTML file size?
Yes. Using data URIs will contribute to the HTML file size since they are in the HTML file.
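If you want to see how close one of your own pages comes to the limit, and how much of its weight is inline data URIs, a rough check could look like the sketch below. It is an illustration only: the function name html_size_report is made up for this example, the regular expression is a simple approximation of a data URI, and reading "15 MB" as 15 * 1024 * 1024 bytes is an assumption, since Google's wording only says "15 MB".

```python
import re

# Assumption: "15 MB" is read as 15 * 1024 * 1024 bytes.
# The quoted documentation only says "15 MB".
LIMIT_BYTES = 15 * 1024 * 1024

# Approximate match for a data URI: "data:", a media type, then everything
# up to the next quote, whitespace, or closing parenthesis.
DATA_URI_RE = re.compile(rb"data:[a-zA-Z0-9.+-]+/[a-zA-Z0-9.+-]+[^\"'\s)]*")


def html_size_report(html: bytes) -> dict:
    """Summarize the size of an HTML document and its inline data URIs.

    Hypothetical helper for illustration; it only inspects the HTML bytes
    themselves, not externally referenced images, scripts, or stylesheets.
    """
    data_uri_bytes = sum(len(m.group(0)) for m in DATA_URI_RE.finditer(html))
    return {
        "total_bytes": len(html),
        "data_uri_bytes": data_uri_bytes,
        "over_limit": len(html) > LIMIT_BYTES,
    }


if __name__ == "__main__":
    sample = (
        b'<img src="data:image/png;base64,AAAA">'
        b'<img src="https://example.com/a.png">'
    )
    print(html_size_report(sample))
```

Only the inline data URI counts toward the HTML size here. The second image is referenced by URL, so its bytes would be fetched separately and do not appear in the total.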
This is not a new thing, it's just newly written down. If you haven't seen issues from this so far, you'll continue not to see them. While I trust that you can make HTML files that are larger, it's a *lot of work* and almost nobody does that.
— johnmu.csv (personal) weighs more than 15MB (@JohnMu) June 28, 2022
And, if you care about SEO, would you put the only mention of a topic that you'd like to rank for after 15MB of content? Imagine you have a HTML page that contains a few books … like Pride and Prejudice by Jane Austen, Alice's Adventures in Wonderland by Lewis Carroll, and …
— johnmu.csv (personal) weighs more than 15MB (@JohnMu) June 28, 2022
The Yellow Wallpaper by Charlotte Perkins Gilman, Noli Me Tangere by José Rizal, A Tale of Two Cities by Charles Dickens, The Great Gatsby by F. Scott Fitzgerald, Japanese Girls and Women by Alice Mabel Bacon, and, for Googlers, Great Expectations by Charles Dickens, but also …
— johnmu.csv (personal) weighs more than 15MB (@JohnMu) June 28, 2022
Unfortunately, Googlebot is now into 15MB HTML for this single page, so the additional content that you add here (after the top 16 books from Project Gutenberg, all on the same page) won't be taken into account.
— johnmu.csv (personal) weighs more than 15MB (@JohnMu) June 28, 2022
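Mueller's point about where content sits in the file can be tested directly: truncate the HTML at the limit and see whether the phrase you care about is still there. The sketch below is a simplified illustration, not a description of how Googlebot works internally. The function name phrase_within_limit is hypothetical, the 15 * 1024 * 1024 byte value is an assumed reading of "15 MB", and the match is a byte-exact UTF-8 comparison.

```python
# Assumption: "15 MB" is read as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def phrase_within_limit(html: bytes, phrase: str, limit: int = LIMIT_BYTES) -> bool:
    """Return True if `phrase` appears entirely within the first `limit` bytes.

    Hypothetical helper: it mimics "everything after the limit is dropped"
    by slicing the document, then looks for the UTF-8 encoded phrase.
    A phrase that straddles the cut-off counts as missing.
    """
    kept = html[:limit]
    return phrase.encode("utf-8") in kept


if __name__ == "__main__":
    # Small limit used here only to keep the demonstration cheap.
    page = b"x" * 100 + b"Great Expectations by Charles Dickens"
    print(phrase_within_limit(page, "Great Expectations", limit=50))
    print(phrase_within_limit(page, "Great Expectations", limit=200))
```

The limit parameter exists only so the idea can be tried on small inputs. For a real page you would leave it at the default.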