An SEO professional was concerned about duplicate content in the form of HTML pages and PDFs.
So they asked Google's John Mueller about it in a hangout. They have a case study article published as a PDF file.
Now, they want to present it in the form of an HTML blog article.
They were curious whether the duplicate content would have any negative impact on Google's side.
John answered that Google would not see it as duplicate content, because it is different content.
Even if the primary piece of content in both is the same, the entire way it’s presented is different (one in HTML markup, and one in PDF).
On that level, John explains, Google wouldn’t treat the two as duplicates.
John thinks that, at most, the difficulty may be that both versions show up in the search results at the same time.
Whether you want that to happen is more of a strategic question on the SEO professional’s side.
From Google’s perspective, it would not be seen as a negative when it comes to SEO.
But the site owner may have strategic reasons to make either the PDF or the HTML page more visible than the other.
John also went on to say that he believes, for the most part, that PDFs will likely be less visible in the search results because they are less tied in with the rest of your site.
When it comes to internal linking, you will usually link to web pages rather than directly to the PDF. Then, from one of these web pages, you will link to the PDF.
So there’s a bit of a de-emphasis on PDFs when it comes to internal linking in that situation. Even so, both versions could appear in the same search results and end up competing with each other there.
This exchange happens at approximately the 17:09 mark in the video.
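
The hangout answer leaves the choice of which version to emphasize up to the site owner and does not say how to act on it. As one hedged illustration, assuming you prefer the HTML article to be the visible version and you can set response headers for the PDF, Google's documentation describes a rel="canonical" HTTP header that can be sent with non-HTML files such as PDFs. The sketch below only builds that header; the function name and the example URL are hypothetical, and a canonical is a hint to search engines rather than a guarantee.

```python
from typing import Tuple


def canonical_link_header(html_url: str) -> Tuple[str, str]:
    """Build the (name, value) pair of a rel="canonical" HTTP Link header.

    html_url is the preferred HTML version of the content. The header is
    meant to be sent with the PDF response, not with the HTML page.
    """
    # Require an absolute URL so the target is unambiguous.
    if not html_url.startswith(("http://", "https://")):
        raise ValueError("canonical target should be an absolute URL")
    return ("Link", f'<{html_url}>; rel="canonical"')


# Hypothetical URL, used only for illustration.
name, value = canonical_link_header("https://www.example.com/blog/case-study/")
print(f"{name}: {value}")
```

How the header is attached to the PDF response depends on your web server or framework, which is outside what this sketch covers.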