One SEO professional was worried about their site, which has approximately 20,000 to 30,000 pages. It's mostly a directory service with unique content, but they were also worried that much of that content consists of numbers rather than text.
They had added noindex to approximately two-thirds of those pages.
They did this because they wanted to concentrate on what Google looks at.
A month or so after they implemented this, less than 1 percent of these pages were indexed.
For almost all of their pages, Google Search Console reports a “discovered, not indexed” status.
This issue has their team scratching their heads.
They have made a lot of tweaks, and their internal linking appears to be fine.
They may be restricting themselves somewhat with their canonical settings, but those settings don't look wrong.
Ultimately, after going through John's videos, they suspect it's a site quality issue: competitors with similar content are already indexed, and their own unique content is data rather than text.
This happens at approximately the 22:59 mark in the video.
John Mueller Hangout Transcript
SEO Professional 6 22:59
Hi, I'm new, so bear with me if my question isn't clear enough. We have a website with about, say, 20,000 to 30,000 pages. It's like a directory service, but we have unique content. We put noindex on probably about two-thirds of those pages, because we wanted to kind of concentrate on what Google looks at.
But after a month or so, less than 1% of those pages are indexed, and we're getting "discovered, not indexed" for almost all the pages. So we're kind of scratching our heads about what we can do. We made a lot of tweaks, and it looks like internal linking is all right. We might be restricting ourselves a little bit with the canonical setting, but it doesn't look wrong.
So in the end, we kind of figured, you know, going through your videos and stuff, it might be a site quality issue, because there are predecessors with kind of similar content that are already indexed. And yet we do have very unique content. The problem may be that that unique content is not text, it’s data.
And it looks like a lot of the quality is assessed by things that are readable by human beings. But our service is really focused on a number that is very valuable. So we're wondering what we can do. Should we try to squeeze in text, which isn't actually central to what we do? Or are there things we can do to tell Google that the numbers are actually valuable to people who visit our site?
Okay, I think in general, having numbers on a page is fine, and it's something that we do see as part of the unique content on a page. So just because a lot of the content is numerical wouldn't necessarily mean that we would see the whole page as duplicate. It's hard for me to estimate what kind of site it is, beyond what you mentioned: it's a directory-type site.
From a quality point of view, I do think that matters quite a bit in terms of how much we actually index from your website, so that's something I would probably still focus on. If the site is around 25,000 pages, it feels like something that we should be able to index, if it's something that we would consider to be a reasonable, quality website.
And when it comes to the quality assessment, it is something where we do look at the content of the page, and we try to make sure that it’s not duplicate. But we also try to understand the bigger picture of that page within the context of the rest of the web. So that sometimes makes it a little bit trickier.
SEO Professional 6 26:11
Right, just as a follow-up: we're getting "discovered, not indexed" rather than "crawled, not indexed" for those 99%. Should we differentiate between those two? Our site is not that big, so in my view this can't be a crawl budget issue. In that case, are those two designations pretty much the same, in that it's just a quality issue?
Again, I don't know your website, so it's hard for me to say. But if you're seeing the clean URLs listed in the discovered, not indexed report, essentially the URLs that you do want to have indexed, then to me that sounds like it's much less a matter of Google being unable to crawl that many URLs.
With around 25,000 pages, most reasonably sized servers can easily handle that kind of crawling on a regular basis. So it's probably more a matter of our understanding of the overall website quality. With larger websites, or if you see lots of different variations of URLs in the discovered, not indexed report, like URLs with parameters or with different upper and lowercase spellings, that can be a sign that the internal linking is messy and that we're having trouble finding the right URLs to crawl.
But if we’re showing the right URLs in the discovered and not indexed report, and it’s a reasonably small website, then to me that kind of points more in the direction of the overall site quality.
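Mueller's point about URL variations can be checked mechanically. The sketch below is a minimal example, assuming the URLs from the discovered, not indexed report have been exported into a plain Python list; it groups them by a normalized key (lowercased host and path, trailing slash, query string, and fragment dropped) and reports any page that appears under more than one spelling. The function names and the example.com URLs are hypothetical. Lowercasing the path is only a heuristic, since URL paths are case-sensitive, so treat a match as something to inspect rather than proof of a duplicate.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Hypothetical helper: collapse a URL to a comparison key with a
    lowercased host and path, no query string, no fragment, and no
    trailing slash on the path (heuristic, since paths are case-sensitive)."""
    parts = urlsplit(url)
    path = parts.path.lower().rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"


def find_variants(urls: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: group URLs by normalized key and keep only
    keys that occur under more than one distinct spelling."""
    groups: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        groups[normalize(url)].add(url)
    return {key: sorted(seen) for key, seen in groups.items() if len(seen) > 1}


if __name__ == "__main__":
    # Hypothetical export of a "discovered, not indexed" report.
    sample = [
        "https://example.com/companies/acme",
        "https://example.com/Companies/Acme",
        "https://example.com/companies/acme?sort=price",
        "https://example.com/companies/globex",
    ]
    for key, variants in find_variants(sample).items():
        print(key)
        for variant in variants:
            print("   ", variant)
```

If most of the report collapses into groups like the "acme" one, that points toward the messy internal linking Mueller describes; if nearly every entry is already a single clean URL, his answer suggests looking at overall site quality instead.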
SEO Professional 6 27:54
Right. So do you think we should try to add text in there? What we show is really a directory of companies, and we show what a stock price means in terms of the future growth of that company. So it's a number, but there's not a whole lot of readable text that goes with it. It's just a number.
Whether this $20 stock means 10% growth or 15% growth is what's valuable. We could add text; we do have a description, but it's common to all those companies. We'd have to figure out how to squeeze in unique text for each company. But should we head in that direction, do you think?
I don't think the text will affect how we index the pages. So from that point of view, if you see the text affecting how users look at your pages and are able to interact with them, then sure. But that's more a matter of figuring out what users are actually looking for and where you can provide unique value to them.
Just adding text to pages, I don't think, would affect how we crawl and index those pages. If you're providing numbers like, I don't know, the stock numbers there, I would also try to figure out what you can do to make sure that what you're providing is unique and valuable to users. Maybe do something along the lines of a user study to figure out what you can do to make your website something users recommend to other people, so that it builds up something like trust from the user's point of view.
A lot of times those are not purely technical changes, like changing a design or converting some of the numbers into text. It's really a matter of the overall setup of the website.