Googler John Mueller, a Search Advocate working out of Google’s Switzerland Office, holds regular video office hours on most Fridays.
We are here to recap the latest hangout, from August 6, 2021, and share John Mueller's SEO insights on the following topics:
- Expired domains ranking in Google,
- How to deal with people who steal your content,
- Whether minor language differences matter for SEO,
- If offline branding affects SEO,
- and much more.
As always, John is full of great SEO insights, and this hangout is no exception. Be sure to read our SEO insights below, and watch the video.
John Mueller SEO Insight #1: Expired Domains are Ranking on Google Without Issue—What’s Up?
A webmaster was concerned about expired domains outranking their own site. The domains in question previously belonged to government institutions; after they expired, new owners turned them into blogs. They were not writing about health topics before, but now they are, and the domains carry all sorts of good links from sources like Wikipedia, government sites, and so on.
John explained that Google does have systems in place designed to catch these expired domains and figure out how to handle them better, instead of just ranking them as they are. He also recommends filling out the spam report form.
John Mueller SEO Insight #2: What’s the Best Way to Deal With Website Copycats?
Another webmaster was concerned about others copying their website verbatim and ranking better than them in the SERPs. Their worry was that Google treats everyone else as the source of the content instead of them, and that has been exactly the issue as of late: everyone else is ranking above them despite them being the original source of the content. Their other issue was that they accidentally deindexed the site a few months ago, and they suspect that Google now believes they are the copycat site.
John recommends the DMCA approach. DMCA stands for the Digital Millennium Copyright Act, a body of law that protects content owners against their content being stolen. Under this approach, the content owner contacts the webmasters of the offending sites, files a notice that they are in violation of copyright, and demands that the offending content be removed. If the webmaster doesn't respond, the typical next step is to take the DMCA request to the offending site's web host. The drawback of this approach is that when the offending sites have copied a lot of content, it can be labor-intensive.
Regarding the webmaster’s second issue, John explained that this sometimes happens when sites go offline or get deindexed. But if it happens, it should fix itself automatically.
John Mueller SEO Insight #3: Do Minor Language Differences Matter for SEO?
One webmaster was concerned about using British English vs. American English and whether this matters for SEO.
John explained that this is entirely the webmaster's choice and that it doesn't matter for SEO. He suggested focusing on what their users want to read rather than on the SEO aspect. Google doesn't really have a classifier that tries to determine whether a website is written in British or American English. Instead, the phrasing and spelling used on the site are a matter of presenting the content in a way that delights the user. Whether it's in British or American English is entirely up to the webmaster.
John Mueller SEO Insight #4: Does Offline Branding Affect SEO?
One webmaster was wondering how John felt regarding offline branding and its impact on SEO because they are in a space where many large brands are competing against each other.
John explained that he doesn't believe offline branding affects SEO directly. That said, branded searches are interesting in their own right: when people search for your brand, your site is by far the most relevant result, almost like a navigational query, so you effectively have no competition for those terms.
That individuality works in your favor when it comes to brand recognition and people typing your brand name into the search box. There is real value in building up brand awareness among your user base and getting people to search for your site: once that awareness exists, the branded traffic it drives tends to stay stable even as algorithms change.
John Mueller SEO Insight #5: How Does Google Determine the Date for an Article?
Another webmaster asked how Google determines the date of an article. The snippet in the SERPs was showing the wrong date for their page. They updated the article metadata with the correct date, they updated the date in the Schema markup, and nothing changed; no matter how it was updated, the snippet still showed the incorrect date.
John explained that Google uses multiple factors to figure out how a page should be dated; it's not just about the metadata. Google also looks at whether there has been a significant change on the page. For example, if you only change the date on the page, Google still needs to see significant content changes before it will show the updated date to users. Updating the date alone is not enough.
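To make this concrete, here is a minimal sketch of the kind of structured data involved, using schema.org Article markup; the headline and dates below are placeholder values, not taken from the hangout.

```html
<!-- Illustrative example: schema.org Article markup declaring publish and modified dates -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Article Headline",
  "datePublished": "2021-07-01T08:00:00+00:00",
  "dateModified": "2021-08-05T09:30:00+00:00"
}
</script>
```

Per John's answer, markup like this is only one signal among several: Google also wants to see a visible date on the page and a genuinely significant content change before it updates the date shown in the snippet.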
John Mueller SEO Insight #6: Different Core Updates Can Affect Sites Differently
One webmaster was concerned about core updates and was wondering if two different updates could be responsible for uplifts in traffic. They were also wondering if it could be related to speed.
John said that if you see a change during one core update, you won't necessarily see a change during another core update as well. He also explained that core updates are not related to the page experience update, which is an entirely different type of update.
John Mueller SEO Insight #7: How Can a Webmaster Trigger a Sitewide Re-evaluation?
Another webmaster asked—what are some things that a webmaster can do to trigger a sitewide re-evaluation of a site after changes have been implemented?
John answered that there isn’t anything technical that can be done to trigger a re-evaluation. It’s not needed because Google is re-evaluating all the time. It examines the content that it finds for a site, and eventually, if the content changes, Google takes these changes into account. There’s nothing you can do to manually trigger such a re-evaluation.
Watch the Hangout!
You can watch John’s 08/06/2021 hangout here:
Tired of watching videos? You can read the transcript of the session below:
John Mueller Google Search Central Office Hours Hangout Transcript from 08/06/2021
All right, welcome, everyone, to today's Google Search Central SEO office hours hangout. My name is John Mueller. I'm a Search Advocate at Google here in Switzerland. And part of what we do are these office hour hangouts where people can join in and ask their questions around search, and we'll try to find answers. A bunch of the questions were submitted on YouTube already, so we can go through some of those, but maybe, like always, if any of you want to get started with a question, we can look at those first. It looks like some of you already have your hands raised, so we can go through that. Let's see, Praveen.
Webmaster 1
So my question is about expired domains. So from the past few weeks, or say months, what I'm seeing is that there are some sites that are ranking prominently in search. When I look at their websites, they don't really look like quality websites. And when I looked at the history, like through the web archive, they usually belonged to Indian government agencies, some institutions and government institutions. So what the government institutions did is they created new websites. And since they are government, they don't care about migrations, or 301 redirects, and all that stuff. So they created the new websites and, you know, abandoned these old ones, which got expired. Someone bought those domains and created a new blog kind of thing. So initially, they started writing about government jobs, and then they started writing about education. Now they're writing about health topics, like on vaccines. And they are ranking prominently for things like the best vaccine to take and all this stuff. So what can we do about such domains where... I think they are just ranking based on their history, like the domain history and links, because they got links from sites like Wikipedia, or some government official website. I can share a few links if you want, like if you want to have a look at the kind of content they produce.
John
Sure, I mean, it is hard to say without looking at the specifics, but we do have systems in place that try to catch these expired domains and to figure out how to handle them better. But if you have some examples, I'm happy to pass them on to the team. It also sounds like something where you're not sure if it's spam or not. And for those kinds of things, I would just also use the spam report form and kind of let us know from there what you've been seeing. And if it's not spam, then the team will recognize that as well. But that's, I think, always a useful approach too.
Webmaster 1
Yeah, I think what Google recommends for health topics is that they should be written by someone who is a specialist, someone who's, you know, a health specialist or doctor. I just shared the link; if you just go through these links, there's no way you can figure out what this information is based on, on what conclusion they're saying that this is the best, or this is what you should do, for vaccinations. I think it's mainly because they have the domain history. So that's what is boosting the rankings.
John
It’s hard to say because there are always so many factors that are involved with regards to kind of showing up in search. But I, I don’t know, it’s hard to say after…
Webmaster 1
I just shared the examples. If you get time, please go through…
John
Thank you. All right, Evan.
Webmaster 2
Right. Hi, thank you for doing this. We have a thing we noticed back in June. You know, we have a website that's been around for nine years with a huge copyrighted database of 200,000 profiles that we've manually written over nine years with a team of 40 writers. And there have always been websites that copied our entire database. Sometimes they literally copy our Terms of Use page and our team page, they're that blatant about it. But they copy our website, and we've noticed that they're showing up in search, sometimes above us. Sometimes they might copy, you know, maybe if we have 700 words on the page, 250 of those words, and then add additional machine text that is pages longer. Other times they'll just copy it directly. And again, it's never been an issue because we ranked ahead of them. But lately, they've been ranking ahead of us with our copyrighted content. And I think the most concerning thing we noticed is that when we do a quoted search for two of our sentences in Google, the five or six copycat sites show up and then our site doesn't show up. And it says, in order to show you the most relevant results, we've omitted some entries. So it's almost as if it is reversed, and now Google thinks we're the copycat even though we're the original source. So we've gone down the DMCA path, but it's hard to do that at a full domain level. And we accidentally deindexed our site two months ago when we launched a different language. So we're wondering if maybe, once we reindexed it, Google thought that these copycat sites were the original source. We got a site removed through a DMCA request to the web host, but then they just moved to another host. So, um, you know, we are the source of the copyrighted database, but other sites are ranking ahead of us with our direct content. So we're kind of running in circles.
John
Now, I think, in general, the DMCA approach is probably the correct approach to take there, because that's kind of the most direct process for that. And it is something that is on a per-URL basis, primarily. So if a site has a lot of copies of your content, then it's kind of a lot of work to get that done. But that's kind of the ideal approach, because that's, I don't know, the official path where you say, well, this is my content, and it's not their content, and I can prove it to you. With regards to kind of being out of the index for a brief time, I don't think that should play a role there. These kinds of things happen on a technical level every now and then. And sometimes legitimate sites do something wrong, and they go offline for a day or so, and that fixes itself automatically. So it's not something that would be held against the site.
Webmaster 2
And how do you DMCA them when new domains keep popping up? So once you get one blocked, they move it to another domain. It's kind of like a never-ending game. So how do you let Google ultimately know that you're the initial source? Some of these other sites might have zero authority, like random domains, and we have high authority and have been around for nine years. So how do we, I don't know, fix the error, if you will, of Google not recognizing that we're the original source? You know, because it keeps happening on new domains once we stop them.
John
Um, I don't think there is like this one trick that you can do to say, well, this is always our content. So that's one thing to keep in mind. The other thing I usually try to recommend is to focus on sites that are actually a problem for your site, in terms of, like, are they actually ranking above your site or not? Because it's very easy to go and search for an exact copy, and you'll find tons of copies on the web. But just because those copies exist doesn't mean they're a problem for your website.
Webmaster 2
Last question on that: how about the omitted results, when we search two of our sentences in quotes and they show the copycats, but ours is actually the one that was omitted? That was like the most glaring concern, because Google's almost saying this is a supplemental site we're not going to show, but the copycats are showing. So again, ours is at the bottom, and the copycats that are just blatantly copying are being shown. So how do you know…
John
Now, I think that's more of a technical thing with regards to how you search. So that's something that, depending on the type of query that you do, it's possible to trigger that, but it doesn't mean that Google thinks your site is a copy. So that's, I don't know, there are a few of those things around search where, if you search in a very specific way, then it comes out as, like, well, Google is showing this, and it probably means that. But it's just, well, Google's systems showing you something doesn't mean that it has any additional meaning past that.
Webmaster 2
Okay. And then for the DMCA, since it's so hard to do it URL by URL if you're talking about 200,000 unique pieces of content. You know, I know there's an email you can submit to Google, like a more formal request on the entire URL database, but it's hard to do it one by one, obviously. So, you know, how would you suggest scaling that if there are, like, five sites that are a problem, with 200,000 URLs per site? You know, that's a lot of clerical work, you know.
John
I don't know. I don't have any experience with that side. So I mean, that's something where I don't know if we have a kind of contact address for that; it might be that it's primarily through the form. My understanding was that if we recognize that sites submit a lot of DMCA complaints, then we'll reach out and see if there are more scalable ways to do that. But I don't know what the actual process is there.
Webmaster 3
I've got one question, which is in the comments, and I apologize if this has already been covered in a previous hangout, but it's about a website called unhealthy principles, a .co.uk ccTLD. Should we be writing in British English or American English? It sort of switched from British English to American English because about 60 to 70% of the traffic is from the US. So what's your advice?
John
I don't think it matters. It's totally up to you, and it's something where I would focus more on what your users want to read, and less on the SEO aspect there. I don't think, for example, that we have any classifier that tries to figure out, like, is this written in British English or not. Rather, we try to understand the English content the way that we find it and kind of figure out how we should show that relevantly. And the kind of exact phrasing and spelling that you use on your site is more a matter of you presenting your content in a way that your users want to accept and can work with. So from that point of view, it's totally up to you.
Webmaster 3
Hi, John. I just wanted to ask about how you feel offline branding impacts SEO, organic rankings. To put it in a bit of context, we're working in a highly competitive transactional market, and we are, let's say, getting close to being in the H1 and stuff like this. But we are clearly at a disadvantage in terms of brand awareness; there are some very large brands in our market. And I wanted to know a little bit of how you feel offline, especially offline brand advertising, can affect that.
John
I don't think it affects SEO directly. But it is something where, if people search for your brand because they recognize your brand, then essentially you have no competition, right? Because your site is by far going to be the most relevant one. If people are searching for your company, and you are that company, then it's almost like what we call a navigational query, where people want to go to a specific page, and they're just entering something into Google to get to that page. So that's something where I kind of think building up some kind of brand awareness amongst your users, and having a brand that is easily findable, can help in the long run, in that all of this branded traffic that you get is traffic that's not going to fluctuate as our algorithms change. Because if people are searching for your site, they will always want to find your site, and if our algorithms show something else, that's almost like a problem in our algorithms. Whereas if people are searching for a generic kind of product that you have, whether we show your site or another one, the search quality engineers on our side can argue about that, and there is no correct answer for that. But if you build up that brand awareness and have a certain level of branded traffic to your site, then at least that traffic is something that will be fairly stable. And I wouldn't see this as something where, if you have branded traffic, then suddenly you rank better for non-branded traffic; it's more like, well, they're different sources of traffic.
Webmaster 3
Okay, and how do you think Google deals with non-branded queries where brands are basically expected by the user to be on top? I mean, from our position, if I understand it from just a pure theoretical perspective, if the top five positions are kind of expected to be some brands, how can we get into those places? Because I guess we will always be at least at a disadvantage in the propensity to click on our search result. Do you think that plays any role at all?
John
I don't think so. I don't think it's the case that, for most kinds of queries, we would have any system in place that would say, for this non-branded query, we should always show these other brands. There's only one situation where I've seen something similar happen, but that's something where essentially our systems recognize the brand as almost being a synonym for the non-branded term, which is something that, from our point of view, I would call a bug on our side. But it is sometimes something where people just purely associate a specific brand with a specific kind of item. And that's something where it's less a matter of, well, you have to do some SEO tricks to get around that; you kind of really have to build up the awareness that there are multiple brands active in this space, and people should be searching for other stuff as well. Okay. I don't know if that made a lot of sense. But yeah, I think, for the most part, I wouldn't see it as a given that we always show brands for non-branded queries. And it's definitely not the case that these would be kind of a fixed set of brands that we would always show.
Webmaster 3
Okay, no, but I think just in general, like, the algorithms kind of try to give the best result for a user's query, and if it happens to be, you know, that the users want to see those brands, can it, in the long term, kind of self-teach that some of those brands need to be in the top positions?
John
Yeah, I don't think our algorithms would learn it in the sense that we always need to show those brands for those queries. I think that would be like an extreme exception, that we would maybe pick one, or something like that, to see as a synonym. But it is something where, over time, if it's a very competitive area, then these sites have built up kind of a really strong and stable foundation, and breaking into that is hard. And that is something where, for smaller or newer sites, you almost have to think about, well, is it worthwhile to take on these brands for these queries? Or should we find something smaller where we can be kind of the king of our niche first and grow from there? And picking that is always more of a business decision than an SEO decision.
Webmaster 4
Hi, hello. My question is regarding the updated article date in result snippets, like where it shows "seven days ago" or "two days ago." How does Google decide that it was updated two days ago? We submitted it through the article modified metadata, and through the Schema also. But still, if we updated the page yesterday, it is still showing as updated two days ago, whereas for our competitors it shows as updated yesterday. So my question is, how does Google determine when the page was updated?
John
We use multiple factors to try to figure out which dates are relevant for the page. So it's not just the metadata on the page; we try to figure out, like, has there actually been a significant change on the page. And we have a Help Center article, I think, on this topic, so I would check that out. One thing that you should also keep in mind is that it should also be something visible on the page. So if you just change the date on a page, then that's something where we need to really recognize that you've made significant changes there, so that we can show that to users and say, well, actually, something significant happened on this page two days ago, or one day ago, or whatever. And kind of all of that needs to come together. And it's very possible that for some pages, we don't pick that up. So that's something where I would not say that it's a given that even if you do everything right, we will use the date that you give us. Sometimes there are situations where our algorithms just pick something else regardless.
Webmaster 4
Regarding the visible dates: if there are multiple dates on the page, like we have updated our main article, but there is a listing of other news articles below the article, like related news, and those also have dates. So, like, by any chance, can Google pick up that date also?
John
It can happen. But usually, because we try to take so many factors into account, we try to find a date that has some kind of support across multiple factors. So if it's just a random date on a page, then probably we will ignore that, unless the actual date is also positioned like a random date on the page. If you don't present it in a way that makes it clear to us that this is really the date that you want to use, then we might have to guess. And that's something where the metadata comes into play a little bit. Also, things like time zones sometimes play a role, where if you say, well, it changed today on this date, but you're in a time zone that is very far off from where Google is crawling, perhaps, then Google might say, well, you said this date, but we think it's a day later. And that's more because of the different time zones. So for the kind of metadata on the page, also make sure that the time zones are correct. If you have dates and times on a page as well, mentioning your time zone there in the text is also a good idea.
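As an editorial aside, John's time zone point is easiest to see in the metadata itself. Here is a hedged illustration with hypothetical values, where both the visible date and the structured data carry an explicit UTC offset:

```html
<!-- Illustrative example: visible date and metadata both state the time zone explicitly -->
<p>Last updated: August 5, 2021, 9:30 AM IST</p>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example News Article",
  "dateModified": "2021-08-05T09:30:00+05:30"
}
</script>
```

Spelling out the offset (+05:30 in this sketch) removes the ambiguity John describes, where a date without a time zone could be interpreted as a day earlier or later.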
Webmaster 5
Thank you. Sure. I think this is going to be a multi-parter. We have a domain-verified property, and it contains several subdomains; we also have the individual subdomains verified through URL prefixes. And when we are comparing the data, the impressions and clicks, between these two different property setups for one specific domain, we get quite a big delta: the impressions and clicks are off somewhere between 40 percent to 80 percent. So we're wondering if maybe there are any suggestions or theories on why that might be the case? This isn't true of all setups like that; we have some domains that don't show that discrepancy. This one particular one does.
John
Yeah, sometimes that's pretty tricky. It's hard to say exactly what is happening there. What usually plays a role is, on the one hand, we have queries that we filter out, which might be more visible in some properties and less visible in others. And on the other hand, we have a limited amount of data that we keep per day, essentially, for each of these properties. So if you have a domain-level property, and it has a lot of different kinds of data for every day, then it's very possible that we will trim that and kind of keep the top set. And usually what you would see is that the overall sum is still correct. But if you look at the tables, then you might see differences. So my guess is that one of those things is happening there. What I would do in a case like this, where you see these kinds of differences, is use the more specific property and use those metrics instead.
Webmaster 5
Okay, yeah, we did sort of walk through the documentation for the performance reports, and it mentions that some longtail data loss might occur if you are grouping by query or page. You mentioned some of the queries might be filtered out; I understand that some of them are anonymized for privacy reasons. Could you sort of give a definition of what longtail data means, though? Is that the same as the anonymized queries? Or are we just reaching a hard limit on how much data we can store?
John
It's hard to say offhand without looking at the site. But usually, with regards to longtail data, if you're looking at a really large site or a site that has a lot of different kinds of data for every day, then it's more about the limit on the number of specific data points that we have per day. So if you have a lot of different search impression types, or if you have a lot of different queries that lead to a lot of unique pages, then kind of all of that, multiplied, essentially means that we have a lot of data entries per day. And we have to cap those for performance and stability reasons. So that's probably what you're seeing there. And we call it longtail data because it tends to be data that has either low impressions or low clicks, and hence we try to focus on the more important data for the site.
Webmaster 5
Okay. You say data entries; is the best way to think of a large site just the total volume of impressions, the URLs that you have, clicks? Like, is there a specific metric that correlates really well with large, or…
John
Not really, not really. It's more because of the way that the data is collected. Like, if there are a lot of unique URLs or a lot of unique queries that lead to the site, then all of that kind of multiplies out. Or if you have a lot of unique kinds of search features that are being triggered, then all of that also plays a role in that.
Webmaster 5
Okay. So in the documentation, it says that the longtail data loss affects that table; it mentions that for both the queries and the pages. Can we get around that through, like, filtering for specific directories or queries to pull more data in? Or could we use something like the API to pull all that data, would that be better…
John
I would definitely try it with the API. I mean, especially with a larger site, with the API it's easier to get the full set of data, and then to just put together some dashboard on your side to pull out the details that you want. But I would also, especially when you're looking at multiple properties within a really large domain, try to focus on the data for the individual properties. So instead of going to the highest-level view, kind of the domain level, actually go down to the subdomain, or whatever you have there, and look at the data there. Because for each individual property, we tend to track the data a little bit more accurately.
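As an editorial aside, here is a minimal sketch of what pulling the full Search Performance dataset via the API can look like, using the google-api-python-client library. The property URL, date range, and credentials file are placeholders, and it assumes a service account that has been added as a user on the Search Console property.

```python
# Minimal sketch: page through Search Performance rows from the Search Console API.
# Assumes google-api-python-client and google-auth are installed, and that
# service-account.json belongs to an account with access to the property.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SITE_URL = "https://www.example.com/"  # placeholder property
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]

creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
service = build("searchconsole", "v1", credentials=creds)

rows, start_row = [], 0
while True:
    response = service.searchanalytics().query(
        siteUrl=SITE_URL,
        body={
            "startDate": "2021-07-01",
            "endDate": "2021-07-31",
            "dimensions": ["query", "page"],
            "rowLimit": 25000,   # API maximum per request
            "startRow": start_row,
        },
    ).execute()
    batch = response.get("rows", [])
    rows.extend(batch)
    if len(batch) < 25000:  # last page reached
        break
    start_row += 25000

print(f"Fetched {len(rows)} rows")
```

Paging with startRow is what makes it practical to export the longtail rows that the web UI trims away.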
Webmaster 5
Okay. And I guess the last question here: with a subdomain, if you have, like, some canonicalization issues, will that affect how the data is aggregated between the two?
John
It could, yeah, it could as well. So we try to use the canonicals for individual URLs, and we map that to the properties. So if you have something like www and non-www, and it's kind of fluctuating between those different versions, then on a domain level that would still be the same domain, but on a lower, subdomain level, those would be different subdomains, so that would be tracked individually. Sorry, could you elaborate on that last part? So if a page on your site is essentially fluctuating between www and non-www, then if you have those two verified separately in Search Console, you will see the data sometimes in one property and sometimes in the other property, because of the way the canonical changes. Usually, like on a larger site, that canonical doesn't change that much, so it's less something that you would see. But it can happen, especially if it's an older site that was maybe not created in the cleanest way possible, and the internal links are sometimes with HTTP, sometimes with HTTPS, and sometimes with www or not. Then sometimes it happens that we kind of move URLs around. But it's fairly rare for modern sites, I'd say.
Webmaster 5
Okay, and just last question, here. We’re talking about large sites quite a bit. Is there some sort of like a threshold? Or at some point where we can say like, you know, 100 million impressions are…?
John
No, not really. It's more a matter of the unique entries that you'd have. And it can be the case that a site gets a lot of impressions, a lot of clicks, but it's all focused on a smaller set of URLs, or it's mostly focused on something like, I don't know, the top 10,000 or top 100,000 URLs. And in cases like that, despite it having a ton of impressions, it doesn't mean that it has a ton of unique data entries per day.
So that makes it a bit tricky to find a threshold where we say, well, this is where you need to start looking. And that's kind of why we frame it as longtail data in the Help Center, because there is no kind of absolute number where we can apply that. Oh, it looks like a bunch of you still have your hands raised, but I need to go through some of the submitted questions as well. We'll definitely have more time for live questions along the way, but I just want to make sure that people who submitted questions don't get their questions lost completely. Let's see.
I read online that people said that they've seen an uplift after the June core update; they've also seen one for July as well. For us, it wasn't the same. What could have reverted in July? Could it be related to speed? These, as far as I know, were essentially separate and unique updates that we did. We call them both core updates because they affect the core of our ranking systems, but that doesn't mean that they affect the same core parts of the ranking system.
So from that point of view, it's not the case that if you see a change during one of these core updates, you will always see a change during the other one as well. So from there, I wouldn't assume that they have to be related. With regards to whether it could be related to speed, things can definitely be related to speed, because the whole page experience update is something that I think started rolling out in June.
So in July, you might still see some changes there. But the core update itself isn't something that I think is related to the page experience update; that's kind of a separate thing. Does Google look at the amount of customers and reviews a website has to rank it higher in the search results? Why are some pages getting indexed after more than two weeks; is it because the crawlers don't deem these pages strong enough to index them faster than two weeks? So as far as I know, we don't use the number of customers or reviews when it comes to ranking in web search. Sometimes we pull that information out.
And we might show it as kind of a rich result in the search results. It might be that for the Google My Business side of things, maybe that's taken into account more; I don't have much insight there. But with regards to normal web search, we don't take that into account. With regards to getting indexed after more than two weeks, it is really hard to say. Usually, that's a mix of technical issues, where we can't crawl all of the content quickly, and quality issues, where our systems are not that interested in crawling the content so quickly. And kind of figuring out where in between there your site is, that is sometimes a bit tricky. But sometimes it's also easier to figure out, especially if you can determine that, from a technical point of view, your site is actually pretty fast. So that's something where sometimes it makes sense to take a step back and think about the quality more than just the technical aspects.
What are some things that a webmaster can do to trigger a site-wide re-evaluation from a quality point of view, like when you change domains? Or when does Google say, okay, let's try to collect new signals and see whether the site is better from a quality point of view? I don't think there is anything technical that you can do to trigger a re-evaluation. And usually, that's also not necessary, because essentially our systems re-evaluate all the time, and they look at the content that we find for a site. And over time, as we see that change, we will take that into account. So that's not something where you kind of have to do something manually to trigger that.
The one time where we do have to kind of reconsider how the site works is if a site does a serious restructuring of its website, where it changes a lot of the URLs and all of the internal links change, where maybe you move from one CMS to another CMS and everything changes and it looks different. Then, from a quality point of view, or from a technical point of view, we can't just keep the old understanding of the site and of the pages, because everything is different now. So we kind of have to rethink all of that.
But that's also not something that is triggered by anything specific; rather, it's just, well, lots of things have changed on this site, and even to kind of incrementally keep up, we have to do a lot of incremental changes to re-evaluate that. Our website's posts are being indexed after one week, though the site is old and mobile-friendly. When I test my URLs in the inspection tool, it shows they're crawled by Googlebot smartphone.
But in the about section, the indexing crawler is shown as Googlebot desktop. Does this mean the smartphone bot is crawling the URL and the desktop bot is indexing this URL? Is this the reason behind the indexing delay? This would not be a reason for any kind of indexing issues. I don't know why you might see this kind of mix of desktop and mobile crawlers there. It might just be that Search Console is essentially saying, well, by default, with the inspection tool, we'll just use the mobile Googlebot. In practice, we always crawl with both of these crawlers; it's just a matter of how much we crawl and index with each of these crawlers.
So usually, you’ll see something like 80% is coming from mobile, Googlebot and 20%, maybe from desktop Googlebot, and that kind of ratio is essentially normal and doesn’t mean that things are slower than anything else. Are there any consistent reports on AMP tanking a website’s performance? How consistent is the page experience report in regards to its interpretation of AMP pages?
I'm not aware of anything where using AMP would cause problems with a website's performance. But it sounds like you're looking at something very specific. So my recommendation there would be to go to the Help Forum and try to post the details of what exactly you're seeing there. And try to get some input on that. But for the most part, using AMP on a website is a great way to get that speed boost without doing a significant amount of work. So I wouldn't see that as something that would be causing problems with a website unless the AMP integration is somehow broken in a weird way.
Then, the verification one, I think we've talked about that. I've been using Search Console for my employer's website and was wondering what's the best way to track the value of a blog on an e-commerce site. It's driving a majority of the web traffic, with most of those being first-time visitors. However, blog visitors notoriously don't always buy on initial visits to a website, but do return later and are truly the first step in the conversion path. Employers want to see concrete data.
I don't know. So, to me, this sounds like something that would be best looked at with analytics. And there are different approaches that you can take with analytics; I think the overall topic area is attribution modeling, where you try to attribute individual steps that are being taken on your website, to maybe figure out, like, where is this coming from, or what parts of a website are contributing to a conversion, to a sale, those kinds of things. But that's something where I think Search Console can't really help that much. And from a search point of view, we don't have that much insight into what happens within your website, like who comes and goes within your website, from search.
So you'd almost need to look at that maybe with an analytics expert, maybe with the analytics Help Forum, and try to get some input there. But that usually is something that people can figure out. How many affiliate or sponsored links are safe or good to have on a single page? Is there a perfect ratio of links to article length to maintain here? There is no such ratio. So that's something where, from our side, it's not that we're saying that affiliate links are bad or problematic; it's more a matter of, well, you actually need to have some useful content on your page as well. So that's kind of the angle that we take there. The number of affiliate links that you have on a site is totally irrelevant.
The kind of ratio of links to article length is also totally irrelevant. But essentially, what we need to find is a reason to show your site and search for users who are looking for something. And that reason is usually not the affiliate link, but the actual content that you provide on those pages. So from that point of view, kind of trying to optimize the affiliate links or trying to hide the affiliate links, or whatever you’re trying to do there, I think is almost like wasted effort, because that’s not what we care about. We care about the content and kind of why we would show your pages in the first place. And if the content of your page is essentially just a copy of a description from a bigger retailer site, then there’s no reason for us to show your site even if you had no affiliate links.
So you really need to first have that reason to be visible in the search results, and then how you monetize your site or what links you place there, that's essentially irrelevant. The ccTLD one, I think we talked about. There's a site that ranks in top positions for multiple high-search-volume city plus service queries without providing unique content. They just duplicate their pages and score high, even when there are other websites that seem to provide better content. They don't have any high-quality backlinks or internal links on the city pages; is there any reason they score high on Google? I don't know. It's hard to say just based on your description there.
It’s also something where we take into account a lot of different signals when it comes to search. And it can be the case that a site does something really terribly and does other things really well. And in the search results, we might still show that site fairly visibly. So that’s something where just because one site does one thing, that doesn’t mean we will never show up in search. And usually, this is something where a lot of sites kind of profit from because nobody really knows exactly what they should be doing perfectly.
And it's something where everyone gets input on what they should be doing from various people. And sometimes you listen to the wrong people, and you do something kind of stupid along the way. And just because you do some things incorrectly or sub-optimally shouldn't mean, from our side, that we shouldn't show your site at all. So this is something that we see all the time, especially with smaller businesses, in that they listen to something or some blog online that says, oh, you need to put your keywords in hidden text on your pages and search engines will fall for that. And they go off and do that.
And essentially, that’s against our webmaster guidelines. So theoretically, we could go in there and say, like, Oh, you did this one thing wrong, we’re going to remove your site from search. But it’s a lot better for us to go in and say, well, we recognize you did this thing, and we can ignore it. And we’ll focus on the rest of your site instead. And if the rest of your site is okay, then there’s no reason for us not to show it. So it’s not the case that you have to get everything perfect in order to be visible in search.
But rather, it's like lots of things can add up, and you can do some things really well and we will still show you in search, even if you do other things kind of sub-optimally. So I think that's always tricky, when you look at other people's sites and say, well, they're using hidden text or using this other technique that is against the guidelines, therefore they shouldn't be visible ranking above me, because that's almost way too simplistic. We had a speculative question regarding the recent confirmation of deleting redirects after a year, where you, quote, said this time frame allows Google to transfer all signals to new URLs, including re-crawling and reassigning of links on other sites pointing to your old URLs. We understood this to mean that all off-site authority is passed after a year even once deleted. What would happen if the old page that was previously redirected started returning content again? Like, essentially, would authority come back? Would both pages benefit from an off-site link?
Now, so essentially, what happens with regards to redirects is we will forward those signals as best as we can, and with regards to external links going to those pages, we will forward those external links as well. And if that redirect is gone at some point, then essentially we will have kind of two pages, and they will rank individually. So it's not the case that you can just redirect and suddenly you have two times as much value on your website. Rather, it's like, well, we take whatever value goes into the website, we follow the redirect, and we point it at the final URLs there.
And if you remove the redirects, then that kind of chain is broken there. So just taking a site, adding redirects, and then afterwards dropping those redirects and hoping that suddenly you have two times as much value on your site, that's not how it works. So from that point of view, I don't think you would see any value in doing that. The main reason for this is less to prevent abuse around this, but rather that people change their minds all the time.
And it can happen that you move a URL from one location to another and then, at some point later on, you say, well, actually, I want to have the other URL indexed as well, or I want to revert that change and go back to the previous URL. And we need to be able to go back and understand a site again, even after having redirects for a while. Okay, it looks like we're coming close to time, so maybe I'll switch back to live questions from you all. I also have a bit more time afterwards, if any of you want to ask even more questions, so we can go through some of that too. Let me just take one quick question here that I also see, about a site that I looked into briefly, with regards to the coupon site that is mentioned there.
With regards to indexing and being shown in search, one of the things I noticed there was that this is like a really low-quality site. So I don’t know the person who asked this question, I would definitely recommend not focusing on technical things, but really focusing on the quality overall, first, and making sure that the quality of the site itself is actually a lot better. Before you worry about things like how quickly Google is crawling and indexing pages.
Webmaster 6
Let's start with the first one. So basically, Search Console actually reported some static files as affecting the performance. And actually, we minified those files and also added a set of rules to the robots.txt; I'm not sure if that's the correct thing to do. So once we'd done that, actually, what happened was the mobile usability was affected, and then it was flagged in the Mobile Usability report. So I just wanted to know what is the best practice and how we should go about doing that.
John
What kind of files were they? Like CSS? JavaScript? So I think with the robots.txt disallow, essentially, you’re preventing us from looking at those files. And if those files are necessary for the rendering of a mobile page, then we won’t be able to render those mobile pages to recognize that it’s mobile-friendly. So that’s probably why you’re seeing that shift with regards to mobile-friendliness in the report there.
Webmaster 7
Right. And so should we actually even add the disallow rule for those? Or should we just keep them open?
John
No. I think for the most part, we recommend making JavaScript and CSS files crawlable so that we can render the pages, we won’t index them individually. So it’s not that I will say, Oh, this JavaScript file is suddenly a text file we should show in search because that doesn’t make sense. But we need to be able to access them so that we can render the pages and see what they look like.
Webmaster 7
Right. The next question is about using disallow on robots.txt versus using noindex? What’s the difference? And what are the different use cases?
John
We don't support noindex in robots.txt. So that's kind of the main difference there. So essentially, if you're talking about the noindex robots meta tag on a page versus robots.txt: the robots.txt file prevents us from crawling a URL, but it doesn't prevent us from indexing the URL alone, which is kind of a weird situation. But essentially, there are pages that are blocked by robots.txt, but they're still useful for users. We don't know what is on the page, but we might be able to recognize they're actually still useful for users. So it can happen that we index a page purely by the URL.
And based on the anchor text of links pointing to that URL, we can still show that in search. We essentially don't know what is on the page, but we see lots of people recommending it for this one particular topic, so we might go along with that. With regards to the robots meta tag noindex: if we can crawl the page and we see that noindex meta tag, then we won't index that page, and it won't be shown in search.
So in practice, for most normal pages on a website, if a page is blocked by robots.txt, then we won't show it for normal searches, because we usually have more relevant content from your website to show. If you search specifically for that page, then we will still show it; if you do, like, a site: query, then we might still say, well, actually, for this particular URL you're looking for, we have this page. So those are kind of the differences there.
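As an editorial aside, here is roughly what the two mechanisms John contrasts look like in practice; the path and page below are hypothetical examples, not from the hangout.

```
# robots.txt: blocks crawling of matching URLs. The URL itself can still be
# indexed (and shown for site: or very specific queries) based on links to it.
User-agent: *
Disallow: /private-section/
```

```html
<!-- Robots meta tag on a page: the page must remain crawlable so Googlebot can
     see this tag; once it is seen, the page is kept out of the index. -->
<meta name="robots" content="noindex">
```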
Webmaster 7
And what does it mean when Search Console says "Indexed, though blocked by robots.txt"? I think it's related to the same thing that you just said…
John
Yeah, yeah, exactly. That’s where we essentially index the page without knowing what the content is. And it’s not necessarily a bad thing. It’s just kind of like for your information. If you wanted to have the page indexed and rank based on its content, then with that setup there, that wouldn’t be possible. Right?
Webmaster 7
And if you want to change a link on the website, that is, basically rename the URL, how can you do that without affecting the Search Console ratings?
John
Essentially, you would just set up the redirects there. So you would redirect from the old URL to the new one. And that's something that our systems follow, and that should, for the most part, just be fine.
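For reference, a minimal sketch of the kind of redirect John describes, shown here as an nginx rule with hypothetical paths; an equivalent permanent (301) redirect can be configured in any web server or CMS.

```nginx
# Hypothetical example, inside the relevant server block:
# permanently redirect a renamed URL to its new location.
location = /old-article-name/ {
    return 301 /new-article-name/;
}
```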
Webmaster 8
I have a quick question about images, which seem to have been a little bit of a hot topic this week. Okay. So, um, if a site uploaded a number of images over the years without being aware of the importance of file sizes, and it slowed up the site, and then someone like myself pointed out this issue to them, could they go back, optimize the same image, upload it again as a smaller file, and use the same original alt attribute? Or do they have to change the alt text?
John
They can just change it. So it’s, it’s something where, essentially, when we, I mean, what happens in practice is, with images, we tend not to recrawl them that often. So it might be a while for us to actually recognize that the image file has changed. Even if it’s just a matter of re-compressing or kind of tweaking the settings. It might take a while. And we would tend to pick that up based on the crawling of the web page.
So, kind of the landing page, if we recrawl that often, then we'll see, oh, this image is still embedded here, and every now and then we'll double-check the image file to see how that is set up. And if the image file changes, then the image file changes, and we can kind of reprocess that. If the alt text changes on the page, we can also pick that up a little bit faster, because the HTML pages tend to be re-crawled more often.
If you need to have an image changed quicker in search, maybe there’s something in the image that you don’t want to have shown in the search results, then I would recommend using a different URL for that image. So that when we crawl the HTML page, we clearly see Oh, this old image is no longer on this page. But there’s a new image here, we should check out this new image URL.
And this is basically just to make the site faster. Yeah. So yeah. And with regards to speed, it's not something where we need to have the optimized image indexed. That's something where, essentially, for the Core Web Vitals, we use the real user metrics, so we see what people are actually seeing on the page, and that's independent of what we have indexed for that page. Okay, great. Thank you. Sure. Carmen.
Webmaster 9
Hello, John. Good morning. So, I'm currently working on a website; it is a company in the finance sector, specifically mortgages, and I have two questions. So, building the new structure, I have two options. One is to build the hierarchy using pages in the CMS, in WordPress, so that I have different levels for my main products and a kind of categorization by using pages.
Then I have option B, which could be to rebuild our CMS in WordPress and have built-in categorization; that would be my choice. However, yes, I have these two options at the moment. So the two questions are: what will be the main impact of using pages for the categorization versus building new categories in the CMS? And the second question is, if I do rebuild the CMS, what will be the main benefit of following best practices?
John
So I think, first of all, I would not worry about the pages versus posts and categories question. From our point of view, we see all of these as HTML pages, and how they're classified within your server and within your CMS is essentially totally up to you; it's not something that our algorithms would care about. So from that point of view, I would choose whichever setup is easier for you to maintain, easier for you to make sure it's fast and usable, all of the almost non-SEO factors, and decide based on that.
And that's something where sometimes using WordPress makes sense, sometimes using another CMS makes sense, sometimes a kind of mix makes sense. But that's essentially purely on your side, with regards to how you can maintain the content, how you can make sure it's fast and secure, all of those things.
The one place where I think I would keep an eye out on is if you’re migrating from one setup to a different setup, then that’s something where you essentially have to think about all of the usual site migration things. So kind of making sure that you know which the old URLs are and the new ones, making sure you have all of the redirects in place, fixing the internal linking, all of that kind of plays into that as well, as well as everything that is on the page as well, especially if you’re moving from one CMS to another, you will have things like headings are suddenly different, or the images are embedded in different ways.
And suddenly, you have, I don’t know, maybe captions below the image instead of just an alt text for the image. And all of those are essentially on-page SEO things that you can do well, or that you might do kind of badly. And then it’s less a matter of like, which CMS is better, but rather, which setup is the optimal one for you, and how are you migrating between those setups?
Webmaster 10
Hi, John. I asked you before, in the webmaster forum and also in the YouTube comments, about our home page, which isn't in the index anymore. I don't know if you remember the question. And yeah, we couldn't really find the reason why this specific page isn't indexed anymore. I also checked the troubleshooting reports on support.google.com, and I just couldn't find a reason why this one page isn't in the index anymore. So maybe afterwards, you and your teammates could have a deeper look into this and…
John
Maybe you can just post that URL in the chat here. And then I can pick that up. I can’t guarantee that we’ll be able to look into it or that we’ll be able to do something there. But I’m hoping to try to see what we can find there.
John
Okay, um, let me take a break here with regards to the recording. It's been great having all of your questions here. And I hope, if you're watching this on YouTube, you'll find this recording useful, and maybe we'll see you in one of the future hangouts as well. All right, and that is paused.
Watch John Mueller’s Google Search Central Office Hours Hangout Most Fridays at 7:00 a.m. PT
If you want to watch and/or participate in John’s office hours hangouts, you can join in most Fridays at 7:00 a.m. Pacific Time.
Also, if you want to watch all of the past Google Search Central Office Hours Hangouts, you can view them on the Google Search Central YouTube channel as well.
We will be covering John’s hangouts every week they happen with our SEO insights. Stay tuned!