One webmaster in John Mueller’s 09/10/2021 hangout asked about pages with similar content and how they affect site quality. Most of the pages in question are generated for specific postal codes. The webmaster acknowledged that the content on these pages may not be especially interesting for users.
The webmaster is facing indexing issues. Around 90 percent of their pages are excluded from the index. They wondered where they went wrong and why the pages they created are not being indexed by Google.
John explained that sites that analyze a lot of data and generate pages from it are not always useful for users, especially in niches that are already highly competitive and where similar sites already exist. You have to work on the quality of the pages and make sure that they are genuinely useful for people. It’s not enough to recompile data that is already available elsewhere.
Add your own unique insights and create content that supplements the data rather than repeating it. Anything that goes beyond a re-compilation can improve your chances of getting indexed.
The main issue is quality: make sure your pages are high-quality, unique, and useful enough for users that Google recognizes the site as valuable and indexes more of it over time.
This occurs at approximately the 18:18 point in the video:
John Mueller 09/10/2021 Hangout Transcript
Webmaster 4 18:18
So we are a very young startup that just started here in Germany. And, basically, what we do is that we crawl leading real estate platforms and we try and assess which realtor is actually achieving higher prices, which is achieving fast sales duration. And so we have for each postal code in Germany, we have one specific page for that postal code, which includes like, which calculates up to 28,000 pages, which is not that huge for other websites. But we’re facing challenges in indexing those sites.
So our goal is that someone typing in “realtor Munich,” that our page with the ranking of the best agents in that region will pop up. But currently, we see that 90% of our pages are excluded from the index. And I’ve read a lot about it. And you are also talking about how the content might be similar or that the content is not super interesting for the users.
And my question to you is how to solve this. So how to get ranked when someone gets or types in, “Hey, I want to search for a realtor in that location”? Because it’s super crucial for him, but the bot might not understand that. So yeah, super excited to hear what you have to say.
John 19:49
Yeah, I think there are a few things that come into play. On the one hand, from, from the quality side, that’s something that I think you almost need to figure out before you push too far, in the sense that it’s, it’s sometimes easy to create these, these backends that analyze a lot of data, and spit out some, some metrics for individual locations and maybe make some lists based on existing things.
But you really need to make sure that these are also useful pages for people. So it’s not just like a re-compilation of data that’s already out there. So I don’t know, for example, if you take a city and they have ten realtors, and it’s essentially the same ten that are in the phone book, then it’s hard to say that your compilation is providing something of unique value.
So really kind of making sure that the content that you’re publishing is something that is really high quality and useful, I think that’s almost like the first step because that kind of helps to grow your website over time.
Webmaster 4 20:58
But how would you get that? Sorry to interrupt you. I mean, how do you… because we are pretty sure that this is pretty unique content all over Germany, might, might be all over the world. And that’s why it’s, it’s very analytical. I also posted an example of a page in the link below. But I, we have a hard time knowing how to really, yeah.
John 21:23
Yeah, I think that’s something where you almost need to do something like user studies to figure out what is the best UX for these kinds of pages? What kind of information do people need? What ways can you provide kind of assurance that the content is correct, that you’re not skewed by, I don’t know, payment or whatever, that it’s, it’s really trustworthy content, essentially? But that’s something which is, from, from our point of view, almost like a requirement for the next steps, because it is possible to get a lot of pages indexed in like using different ways.
But if we recognize that you have a large website, and we think that the content overall is low quality, then you almost have a bigger problem, kind of in telling Google, kind of like, hey we improve the quality of our pages. Whereas if you start off with really high-quality content, then it’s more a matter of the challenge of, well, how do I get more of my pages indexed over time? So that’s kind of the first thing I would do. And with regards to indexing more pages over time, that’s usually something that happens automatically, as we recognize that your website is really valuable and really important. And that’s something that just takes time.
And one of the strategies I try to, I don’t know, encourage people or tell people to think about is, on the one hand, you can decide what, what kind of pages you think are important within your website, and which ones you want Google to focus on through internal linking, and kind of working to make sure that, like if, if you say 90% are currently not indexed, and 10% is indexed, that you make sure that at least the 10% that are indexed, are really good pages, and really important pages for you. So that, like you, you get a mass of people already going to these pages and saying, “This is fantastic content.” Maybe they recommend it to other people, maybe there’s a link to it from other places. But at least that 10% that you start with is something that you can kind of grow with. And then over time, what you’ll see is Google crawls more and more of your website, as it recognizes that it’s really good and important.
And that can then result in, well, on the one hand, crawling more frequently, the existing pages, on the other hand, crawling deeper within your website. So kind of like more layers within the internal linking and digging in a little bit deeper. So that’s something that essentially kind of happens over time algorithmically. And it’s, it’s something where it’s sometimes tricky, especially with a website, like, like, I’m guessing yours is to create an internal linking structure in the beginning that focuses on things that you find important. Because it’s very easy to say, Well, I will just list all my postal codes in numerical order.
And Google will crawl and index all of these pages, but maybe the first 10% of those pages that you have linked there are irrelevant to your business or a very low value for your website. So it’s, it’s almost like you have to create a more funneled web structure in the beginning. And then over time, as you see that Google is doing more and more on your website, then to expand that and expand that step by step until you’ve reached kind of your end situation where Google is actually actively indexing all of the content on your website.
Webmaster 4 25:11
Okay, great. So summary, I tried to nail down my content on the pieces of pages that Google already indexed. And from there, try to then grow my website by doing internal linking properly. And showing Google that this content is unique. This content is valuable.
John 25:29
Yeah, I think, first of all, I would really, really focus on the quality and especially if it’s in kind of an almost machine-generated website, you really need to make sure that everything is such that it’s not only the correct numbers, kind of like for, I don’t know, analytical people who look at the pages, but it’s also something that the average user thinks is trustworthy and useful. And also, I would not focus on the pages that Google currently has indexed, but rather look at your site overall, and say, it’s like 10% is what I have as a budget, and which, which of my 10% of the pages do I want to have indexed? And then almost like, twist it around in that direction.
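As a rough illustration of the budgeting idea John describes, the sketch below picks the top share of a site's pages by a value score, so that those pages can be featured most prominently in the internal linking. It is only a sketch under assumptions: the function name pick_priority_pages, the /plz/... URL paths, and the scores are hypothetical and do not come from the hangout. How you score a page (search demand, data coverage, business value) is your own decision, and nothing here reflects how Google actually chooses what to index.

```python
def pick_priority_pages(page_scores, budget_fraction=0.10):
    """Return the URLs of the highest-scoring pages that fit the budget.

    page_scores: dict mapping a URL path to a numeric value score
                 (hypothetical; you decide how to compute it).
    budget_fraction: share of all pages to prioritize, e.g. 0.10 for 10%.
    """
    if not 0 < budget_fraction <= 1:
        raise ValueError("budget_fraction must be greater than 0 and at most 1")
    if not page_scores:
        return []

    # Always keep at least one page, even on very small sites.
    budget = max(1, int(len(page_scores) * budget_fraction))

    # Highest score first; ties are broken alphabetically by URL
    # so the result is deterministic.
    ranked = sorted(page_scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:budget]]


if __name__ == "__main__":
    # Hypothetical postal-code pages with made-up scores.
    example_scores = {
        "/plz/80331": 9.5,
        "/plz/10115": 8.7,
        "/plz/20095": 7.9,
        "/plz/01067": 3.2,
        "/plz/99084": 2.1,
    }
    print(pick_priority_pages(example_scores, budget_fraction=0.4))
```

The returned list could then feed a hub page or navigation block that links to these pages first, instead of listing every postal code in numerical order. As Google crawls more of the site, the fraction can be raised step by step.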