An SEO professional was concerned about duplicate content in the form of HTML pages and PDFs.
They raised the question with John Mueller in a hangout. They have a PDF file featuring a case study article.
Now, they want to present it in the form of an HTML blog article.
They wanted to know whether publishing the same content in both formats would have a negative impact because of duplicate content.
John answered that Google would not see it as duplicate content, because it is different content.
Even if the primary piece of content is the same, everything around it is different (one is an HTML page, and one is a PDF), so at that level the two are not treated as duplicates.
John thinks that, at most, the difficulty is that both versions could show up in the search results at the same time.
Whether you want that to happen is more of a strategic question for the SEO professional.
From Google’s perspective, it is not a negative when it comes to SEO.
But the site owner may have strategic reasons to make either the PDF or the HTML page more visible than the other.
John also said that, for the most part, PDFs will likely be less visible in the search results because they are less tied in with the rest of the site.
Internal links usually point to web pages rather than to the PDF, and one of those web pages then links to the PDF.
So internal linking de-emphasizes the PDF somewhat. However, both versions could still appear in the same search results and end up competing with each other there.
This happens at approximately the 17:09 mark in the video.
John Mueller Hangout Transcript
SEO Professional 6 17:09
Yeah. Hi, John. I have a question regarding internal duplicate content. So I have the content of a PDF file of a case study I submitted to my website. Now I want to present it as well in an HTML blog article. Does this have any negative impact from my side, because of duplicate content?
John 17:28
So we wouldn’t see it as duplicate content, because it’s different content. It’s like, one is an HTML page, one is a PDF, even if the primary piece of content on there is the same, the whole thing around it is different.
So kind of from that level, we wouldn’t see it as duplicate content. I think at most, the difficulty might be that in the search results, it can happen that both of these show up at the same time. And whether or not you want that to happen, that’s more almost like a strategic question on your side.
So from my point of view, I wouldn’t see it as a negative, when it comes to SEO. But maybe you have strategic reasons to kind of have either the PDF or the HTML page more visible.
SEO Professional 6 18:18
So good, but then they’re competing against each other?
John 18:22
They could, sure. Yeah, I think for the most part, PDFs will probably be less visible just because they’re less tied in with the rest of your website.
In that, in your internal linking, you will link to the web pages. And then from one of those web pages, you’ll link to the PDF. So there’ll be a little bit of de-emphasis there from internal linking. But they could appear in the same search results, and they could kind of compete with each other there.
SEO Professional 6 18:52
That will be bad, right?
John 18:54
I mean, it depends on what you want, because users will see that it’s a PDF in the search results.
And if they want a PDF, then maybe that’s the best choice for them. But that’s ultimately up to you.
SEO Professional 6 19:08
So if I want the HTML to be indexed, I would need to set a canonical on the PDF?
John 19:14
Yes, you can. I believe you can set the canonical on a PDF with HTTP headers. You can definitely also use noindex in the headers for PDF files.
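A PDF has no HTML head section, so both signals John mentions have to travel as HTTP response headers: a Link header with rel="canonical" pointing at the HTML version, or an X-Robots-Tag header set to noindex. Below is a minimal Python sketch of what those headers could look like. The function name pdf_seo_headers and the example.com URL are hypothetical and used only for illustration. How the headers actually get attached depends on your web server or framework, which the hangout does not cover.

```python
def pdf_seo_headers(canonical_url=None, noindex=False):
    """Build the HTTP response headers to send along with a PDF file.

    Hypothetical helper for illustration; adapt it to however your
    server or framework sets response headers.
    """
    headers = {"Content-Type": "application/pdf"}
    if canonical_url:
        # Points search engines at the HTML version as the preferred URL.
        headers["Link"] = f'<{canonical_url}>; rel="canonical"'
    if noindex:
        # Asks search engines not to index the PDF at all.
        headers["X-Robots-Tag"] = "noindex"
    return headers


# Example: prefer the HTML blog article over the PDF (URL is made up).
headers = pdf_seo_headers(canonical_url="https://www.example.com/blog/case-study")
for name, value in headers.items():
    print(f"{name}: {value}")
```

In practice you would likely choose one of the two: the canonical header if the PDF should stay available but the HTML article should be the version shown in search, or noindex if the PDF should not appear in search results at all.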