Googlebot Crawls & Indexes Only the First 15 MB of HTML Content
A recent update to Googlebot's help documentation confirms that it crawls the first 15 MB of a web page and that anything beyond that point is cut off and not taken into account when calculating rankings. Google specifies in the help document that any resources referenced in the HTML, such as videos, images, and JavaScript, are fetched separately. After the first 15 MB of the file, Googlebot stops crawling and considers only that first 15 MB when indexing the file.
For some time, the SEO community was left wondering whether this meant Googlebot would completely disregard text that fell below images on a page. In fact, the limit is specific to the HTML file itself; the resources it references are fetched in their own requests.
What does this mean for SEO?
You need to ensure that critical content appears near the top of your web pages so that Googlebot gives it weight. That means structuring your code so the SEO-relevant information sits within the first 15 MB of the HTML or supported text file. It also means images and videos should be compressed and, where feasible, not encoded directly into the HTML. SEO best practice recommends keeping HTML pages at 200 kB or less, so most sites will be unaffected by this change.
Page size can be checked with tools such as Google PageSpeed Insights. In theory, it might sound worrying that content on a page could go unused for indexing, but in practice 15 MB is a huge amount of HTML. Google also states that resources such as images and videos are fetched separately, so by Google's own wording this 15 MB cutoff applies only to the HTML.
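If you want to verify this programmatically, the following is a minimal sketch in Python, assuming the requests library is installed; the URL and the "critical phrase" are hypothetical placeholders. It reports how far into the raw HTML a given piece of content starts, which tells you whether it falls within the first 15 MB.

import requests

LIMIT_BYTES = 15 * 1024 * 1024  # Googlebot's 15 MB cutoff for HTML

url = "https://example.com/"          # hypothetical page to check
phrase = b"Main product description"  # hypothetical critical content

html = requests.get(url, timeout=10).content  # raw HTML bytes, already decompressed
offset = html.find(phrase)

if offset == -1:
    print("Critical phrase not found in the HTML at all.")
elif offset < LIMIT_BYTES:
    print(f"Critical phrase starts at byte {offset:,}, within the first 15 MB.")
else:
    print(f"Critical phrase starts at byte {offset:,}, beyond the 15 MB cutoff.")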
What exactly does the 15 MB restriction mean?
For most sites, nothing. Very few websites have pages this large: the average HTML file is only around 30 kilobytes (kB), so you, the reader, are unlikely to own one. Even if your HTML page somehow contains more than 15 MB of inline scripts and CSS, you can usually move much of it into external files.
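As a rough way to see how much of your page weight comes from inline scripts and styles, here is a small sketch, assuming the requests and beautifulsoup4 packages and a hypothetical URL; it simply sums the bytes inside inline <script> and <style> tags, which are the usual candidates for moving into external files.

import requests
from bs4 import BeautifulSoup

url = "https://example.com/"  # hypothetical page to inspect
html = requests.get(url, timeout=10).text

soup = BeautifulSoup(html, "html.parser")
inline_bytes = sum(
    len(tag.string.encode("utf-8"))
    for tag in soup.find_all(["script", "style"])
    if tag.string and not tag.get("src")  # skip external <script src="..."> tags
)

print(f"Total HTML:        {len(html.encode('utf-8')):,} bytes")
print(f"Inline JS and CSS: {inline_bytes:,} bytes")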
What happens when the content exceeds 15 MB?
Googlebot stops fetching after the first 15 MB, so only that portion of a page's content is passed on to indexing; the rest is dropped.
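A minimal sketch of what that amounts to, using a hypothetical saved copy of a page: everything past the 15 MB mark is simply discarded before indexing.

from pathlib import Path

LIMIT_BYTES = 15 * 1024 * 1024  # 15 MB

html = Path("page.html").read_bytes()  # hypothetical saved HTML file
indexed_part = html[:LIMIT_BYTES]      # the portion Googlebot keeps
dropped_part = html[LIMIT_BYTES:]      # the portion that is cut off

print(f"Kept for indexing: {len(indexed_part):,} bytes")
print(f"Dropped:           {len(dropped_part):,} bytes")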
What sorts of content does the 15 MB limit apply to?
The limit applies to the file formats supported by Google Search that Googlebot (Googlebot Smartphone and Googlebot Desktop) fetches, and it is enforced per fetch: each file is limited to its first 15 MB.
Does this mean Googlebot won't be able to see my images or videos?
This isn't the case. Googlebot fetches videos and images that are referenced by URL in the HTML (for example, <img src="https://example.com/images/puppy.jpg" alt="cute puppy looking very disappointed" />) individually, in successive fetches.
Do data URIs make HTML files bigger?
Yes. Data URIs count toward the HTML file size because the encoded resource is contained within the HTML page itself.
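To make the difference concrete, here is a small sketch, assuming a hypothetical local image file: a URL reference costs only a short tag, while a base64 data URI adds roughly four thirds of the image's raw size to the HTML itself.

import base64
from pathlib import Path

image_bytes = Path("puppy.jpg").read_bytes()  # hypothetical local image
data_uri = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()

url_reference = '<img src="https://example.com/images/puppy.jpg">'
inline_reference = f'<img src="{data_uri}">'

print(f"Raw image:          {len(image_bytes):,} bytes (fetched separately by Googlebot)")
print(f"URL reference:      {len(url_reference):,} bytes of HTML")
print(f"Data URI reference: {len(inline_reference):,} bytes of HTML")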
How do I find out how big a page is?
The simplest method is generally to use the Developer Tools built into your browser. Open the page in your browser, switch to the Network tab in the Developer Tools, and reload the page to see all the requests the browser made to render it. The request you're looking for is the top one, and the page's byte count appears in the Size column.
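If you prefer something scriptable to the Network tab, the following is a minimal sketch assuming the requests library and a hypothetical URL; it fetches the page and prints the uncompressed size of the HTML body.

import requests

url = "https://example.com/"  # hypothetical page to measure
size_bytes = len(requests.get(url, timeout=10).content)  # decompressed body size

print(f"{url}: {size_bytes:,} bytes "
      f"({size_bytes / 1024:.1f} kB, {size_bytes / (1024 * 1024):.2f} MB)")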
It would be challenging to go over that limit with HTML unless you are publishing books' worth of text on a single page. If you do have a page beyond 15 MB, you likely have an underlying issue that needs fixing anyway.