The interesting part of this particular episode of Search Off the Record was how Gary alluded to certain “magic” ranking factors.
During the podcast, Gary referred to Google’s ranking factors as “magical ranking factors” because Google doesn’t want to reveal too much about how it ranks websites (for obvious reasons).
Basically, he said:
There are Hundreds of Signals Used to Rank Results
He detailed how Google uses hundreds of signals to rank results for a specific query, and explained signals like topicality and relevancy, which are based on the query and the content of the page. He also explained that search engines don’t return all the results all the time.
The reason is that there might be billions of results for a given query. Nobody is going to go through all of those results, so Google has to devise a method of choosing which ones are the best to display to the user.
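Gary didn’t share any formulas, but as a rough mental model, topicality can be pictured as overlap between the query’s terms and the page’s content. The Python sketch below is purely a hypothetical illustration of that idea, not Google’s actual scoring:

```python
# Toy illustration only: Google does not publish how topicality is computed.
# Here "topicality" is modeled as simple term overlap between the query and
# a page's text; the function name and logic are hypothetical.

def topicality_score(query: str, page_text: str) -> float:
    """Return the fraction of query terms found on the page (0.0 to 1.0)."""
    query_terms = set(query.lower().split())
    page_terms = set(page_text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & page_terms) / len(query_terms)

print(topicality_score("lemon coconut cookies",
                       "a recipe for lemon coconut cookies with brown butter"))
# 1.0, since every query term appears on the page
```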
A Filter Is Used to Cap the Number of Results
With such a glut of results, working within a fixed limit is more reasonable, something like 1,000 or 10,000 results. The first thing they do, he said, is start ranking the pages, applying signals collected during indexing, such as PageRank, topicality, and other simple quality signals, to create a reverse-ordered list of the results they intend to send to the user. After creating that reverse-ordered list, they make a cut at whatever number of results they want to work with: perhaps 1,000 or so.
After that filter, they have only about 1,000 results to work with. However, there are still tweaks and refinements that happen at this stage; he refers to this layer of ranking as the place where the Google magic happens.
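To make that step concrete, here is a minimal sketch, with an invented signal combination, of scoring candidates with indexing-time signals, sorting them into the reverse-ordered (highest-first) list, and cutting at a fixed limit. How Google actually weights and combines these signals is not public:

```python
# Hypothetical sketch of the cutoff Gary describes. Each candidate carries
# indexing-time signals; we combine them into one score, sort highest-first,
# and cut at a fixed limit. The weighting below is invented for illustration.

RESULT_LIMIT = 1_000  # "1,000 or 10,000, or whatever the number is"

def initial_ranking(candidates: list[dict]) -> list[dict]:
    """candidates: [{'url': ..., 'pagerank': ..., 'topicality': ...}, ...]"""
    for page in candidates:
        page["score"] = page["pagerank"] * page["topicality"]
    ordered = sorted(candidates, key=lambda p: p["score"], reverse=True)
    return ordered[:RESULT_LIMIT]  # everything past the cut is discarded
```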
RankBrain is Part of the “Magic Signals”
He explained that they then have to reorder the documents so that they are actually relevant to a user’s query. This is where RankBrain is applied. Gary referred to these as the magic signals, or magic algorithms; they do the heavy lifting when it comes to ranking that result set.
But at this point, they are still working only with the 1,000 results selected earlier in the process.
Google May Add a Reward Multiplier if a Site is a Better Result for That Query
At the end of the ranking mechanism, Gary explained, they may promote a certain result using a reward multiplier, especially if that result is exceptional for the particular query.
Conversely, There is a Removal Multiplier
Gary also said that a removal multiplier, which is basically zero, is applied when they want to remove a result from the set: multiplying a score by zero zeroes it out, so the result is never shown.
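Based only on Gary’s description (a doubled score as the reward in his example, and a score of zero for removal), a hedged sketch of both multipliers might look like this; the function and the exact factors are illustrative assumptions:

```python
# Hypothetical sketch of the reward and removal multipliers Gary describes.
# A score of zero is never worth showing, so multiplying by zero removes
# the result; the factor values here are only examples.

REWARD_FACTOR = 2.0   # "doubling its score," per Gary's example
REMOVAL_FACTOR = 0.0  # zeroes the score, dropping the result

def apply_multiplier(results: list[dict], url: str, factor: float) -> list[dict]:
    """Multiply one result's score, re-sort, and drop zeroed-out results."""
    for page in results:
        if page["url"] == url:
            page["score"] *= factor
    reranked = sorted(results, key=lambda p: p["score"], reverse=True)
    return [p for p in reranked if p["score"] > 0]
```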
What if More Than One Page Has the Same Number?
John asked the hypothetical question: “What if more than one page has the same number?”
Gary explained that the possibility of that happening is remote. Extremely remote. He explained the role that HTTPS plays as a tie-breaker in these scenarios: if a set of results is tied, HTTPS (or Core Web Vitals) acts as the tie-breaker that decides which result wins.
The signal was designed to work all the time; however, it will not rearrange the final result set unless there is an actual tie. When there is a tie, it boosts one of the results, using the reward multiplier mentioned earlier.
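One way to picture a tiebreaker that runs all the time yet only matters on a tie is a boost far smaller than any real score gap. The sketch below assumes a tiny additive epsilon purely for illustration; the actual form and size of the HTTPS boost are not public:

```python
# Hypothetical sketch of an always-on tiebreaker. The boost is deliberately
# tiny, so it can never overcome a genuine score difference; it only decides
# the order where two results were exactly tied. The epsilon is an assumption.

TIEBREAK_EPSILON = 1e-9

def apply_https_tiebreaker(results: list[dict]) -> list[dict]:
    for page in results:
        if page["url"].startswith("https://"):
            page["score"] += TIEBREAK_EPSILON  # applied to every HTTPS result
    # Only previously tied results can change order after this.
    return sorted(results, key=lambda p: p["score"], reverse=True)
```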
Machine Learning Takes Care of Most Obvious Spam Attempts
Duy Nguyen, a guest on the podcast, explained that machine learning is smart enough to identify most spam attempts and take care of them automatically.
For example, simple spam such as link spam is, in general, already ignored by Google’s algorithms, so in most simple cases you don’t have to disavow those links.
By training their machine learning models to spot the easier attempts at spam, Google can automatically filter out the spam it doesn’t want.
“And as Google, we have all these signals and all this data that we’ve accumulated and analyzed and studied over the years, so, you know, it’s entirely possible to collect that data, study it, and build things like machine learning models to tackle spam. Machine learning models are interesting because they have so many use cases: they recommend music for you, and you trust them enough to drive cars around so you don’t have to drive.
So building machine learning models for spam turned out to be a pretty natural step for us. So yeah, we have so much data around not just search results, but specifically spam. So we were able to build a very effective and comprehensive machine learning model that took care of most of the obvious spam and took over all the heavy lifting, so we can focus on more important work.”
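Google hasn’t shared anything about how its spam models are built, but a minimal sketch of the general idea Duy describes, training a text classifier on labeled examples so obvious spam can be flagged automatically, might look like this (assuming scikit-learn and a toy dataset):

```python
# Minimal sketch, not Google's actual system: a text classifier trained on
# labeled examples, which can then flag obvious spam automatically.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["buy cheap viagra now", "lemon coconut cookie recipe"]  # toy data
labels = [1, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Pages the model flags with high confidence can be filtered out automatically.
print(model.predict(["cheap viagra discount pills"]))  # [1] -> spam
```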
Be sure to check out the podcast itself here, and you can also read the transcript:
Search Off the Record Transcript
Welcome, everyone, to another episode of the Search Off the Record podcast. Our plan for this series is to talk a bit about what’s happening at Google Search, how things work behind the scenes, and, who knows, maybe have some fun along the way. My name is John Mueller. I’m a Search Advocate on the Search Relations team here at Google in Switzerland. And I’m joined for this episode by Martin and Gary, also on the Search Relations team. Our guest today is Duy, from the Search Quality team. Duy, would you like to introduce yourself briefly?
Duy 0:42
Yes. I’m so sorry. Gary told me to do that. Thank you for having me. I’m Duy, I’m from the search quality team based in California.
Martin 0:50
Cool. So maybe it’s just me being the newbie on the team, but what exactly is the Search Quality team?
Duy 0:58
So I mostly focus on the low-quality and spam aspect of the work. But basically, we have a lot of signals, and a lot of new websites and pages we look at every day: how do we rank the high-quality results while demoting low-quality or spammy ones? So that’s basically the Search Quality team.
Martin 1:19
What brought you to a team that looks at the lower quality part of the web?
Duy 1:27
I’m a curious person. Before joining Google, I worked a lot of different roles, but I also worked on marketing and search engine optimization. So I’ve always been interested in how search works, and how you get information and make sure that people can find it. How do you tell search engines, oh hey, I have this piece of information, can you rank it in a way that, you know, people can search for a query, and then my result would come up for the relevant search queries? So I saw this job posting at Google, and it looked interesting: if, you know, people are doing SEO, and some people are doing bad stuff and spamming search engines, how do you fight against that as Google, the search engine of the world? So I was curious, and with my previous web development and SEO knowledge, I thought I could offer some help.
Martin 2:19
That is amazing. I think the first time I came in contact with web spam was… I remember the first thing I built with PHP on my own website was a counter, like a visitor counter. And then the second thing was a guestbook. And oh my god, it was a simple form that was sent back to the server and then saved into a database; I think in the beginning it was even just a file. And within days, I had so many entries in my guestbook, and I was like, whoa, so many people commented in my guestbook, and it was awesome.
John 2:50
I’m sorry for that, Martin. I mean, it wasn’t me. It wasn’t me. Not. Not really. I mean, I don’t know. Gary, was it you?
Martin 3:00
It was probably Gary.
Gary
Yeah. Yeah, it probably was me.
Martin
Thanks, Gary. Thanks for ruining that website for me.
Gary 3:01
You’re welcome. Any time.
John 3:10
So I know… okay. Yeah. So, so what did you end up doing, Martin? Did you add anti-spam measures? Did you add signals?
Martin 3:19
Oh, my god, I wish I had been as smart as Duy back when I wrote that. And I couldn’t figure it out. I basically just built in, like, a very basic version of a CAPTCHA, and that, surprisingly, drove away 95% of the spammers, apparently, because it was no longer a super simple target. But then I still got a lot of bad posts. And those I just decided to…
John 3:42
Oh, my gosh.
Martin 3:43
How would you actually detect that?
Duy 3:45
So for such low-quality or spammy content, it’s relatively easy. If you’re a person and you look at a page full of gibberish, or in this case, guestbooks with spammy posts, you should be able to say, emphatically, yes, this is spam, within, you know, seconds. Even if it’s more complicated, with a trained eye it should take less than a minute to determine if something is spammy or not. And as Google, we have all these signals and all this data that we’ve accumulated and analyzed and studied over the years, so, you know, it’s entirely possible to collect that data, study it, and build things like machine learning models to tackle spam. Machine learning models are interesting because they have so many use cases: they recommend music for you, and you trust them enough to drive cars around so you don’t have to drive. So building machine learning models for spam turned out to be a pretty natural step for us. So yeah, we have so much data around not just search results, but specifically spam. So we were able to build a very effective and comprehensive machine learning model that took care of most of the obvious spam and took over all the heavy lifting, so we can focus on more important work.
John 5:04
So cool. So if that model were to run across Martin’s guestbook with all of these friendly non-people visitors, what would it do? Would it say, like, Martin is a spammer? Or, like, how would you treat that website?
Duy 5:19
So for sites that were specifically built for spam, we basically demote them, so they never show up for relevant queries. For sites that are overrun with spam, you know, like a mix of very good quality content where part of the site is being overrun by spam, we have manual actions to help the webmasters and let them know, oh yeah, this is happening, you should take care of that. Actually, we just published a blog post yesterday about this. So we will send you a notification, and then we would explain that a portion of your site is being overrun by spam. Mostly, this is a guestbook or forum, or the comment section, that you should just go in and clean up; nowadays it’s relatively easy to take care of spam, you have a ton of measures and tools if you use a CMS.
John 6:03
So cool. So basically, if you ran across Martin’s guestbook, you would try to figure out who this person is and send them a notification. I guess finding out who the person is is just based on things like Search Console accounts, like if Martin is verified in Search Console. And if he still had his guestbook running, and it was overrun like this, and we noticed that, we would say, hey, Martin, clean up your site.
Duy 6:30
Yeah, we shouldn’t punish the entire site that Martin’s been building because one of the pages is spam, right? We really care about the good content. So yeah, we would try to ask Martin to clean it up and help him verify that.
Martin 6:45
So it was a smart move that my guestbook content was on a separate page. So you would, like, kill that page, never show it in search results, but all the other fantastic content that I’m sure I had back in the day, like lots of cat images and stuff, probably would have been fantastically well ranked?
Duy 7:03
Yeah. So it’s, you know, even more important that you are verified in Search Console. So if we detect that your site is hacked, or part of it is being abused for spam, we can let you know immediately. We’re pretty good with that.
Martin 7:15
It happened a few times to me as well, actually, in the past with different websites. Yeah, it’s a really nice tool, I think.
John 7:22
What did you do with your website, Martin?
Martin 7:26
I had lots of comments on my blog, I had lots of fantastically well written content on my blog. Basically, I know that it’s fantastically well written and useful because I wrote it for myself, because I forget things. So, like, I learned a thing, I figured out how to do something, and then I wrote a blog post. And, like, six months later, I would Google for the thing, and then my blog would come up. But when I had comments enabled, I had the exact same problem. I really could not be arsed to implement some sort of spam-fighting measures. So I got a manual action at some point, and I was like, ah, okay, that’s annoying. So I disabled them.
John 7:58
Okay, it’s like, wait, you throw everything out just because of a little bit of spam?
Martin 8:03
That’s why we can’t have nice things.
John 8:05
Yeah. So I’m kind of curious, like, how did you recognize it was spam? Did you realize at the time that it was around SEO? Or did it just look like why are these bots posting on my site?
Martin 8:18
I mean, I had a tech blog, and the comments were like, hey, you want to buy cheap Viagra? And I’m like, hmm, maybe they haven’t actually read the article. Maybe they’re, like, not even human. So, as Duy said, you get the feeling, and with a trained eye, I could determine that to be spam.
John 8:38
So cool.
Martin 8:39
Now, I don’t know how cool that was really, like it was not cool. Yeah. Don’t do spam kids!
Gary 8:44
Now, when you say trained eye, do you mean that you trained yourself to spam, and that’s how you recognized others spamming, or…?
Duy 8:55
I would say that if you just look at spam a few times, you immediately know what it is; it’s really not difficult. Oftentimes, it’s just gibberish, or a bunch of keywords stuffed together. Like, do spammers really think that this is 1998, and we’re still building web pages with FrontPage or Dreamweaver, and adding 500 spammy keywords would make a page do well in search engines? In general, I’m very curious to understand why they think that would work. But it should be very obvious.
John 9:23
So do you think a lot of this is just old scripts just being run on autopilot? Or how do you see that?
Duy 9:31
I think it’s just lazy. I think they just, basically, want to rank a piece of content; maybe it’s low quality, maybe it’s spam. But then, instead of, you know, spending the time to write a relevant article for it, building a site with good user experience, with categories, with tagging, with all these helpful user experiences, they simply write a script to generate something very fast and simple and just spread it around, hoping that one of these would rank well in Search and make them some small money.
Gary 9:55
I mean, this is very similar to the phishing and scam attacks on email. Because there, when you read the emails, those scam emails about the Nigerian prince sending you an email about heritage, or whatever the word is, inheritance: when you read it, it’s very obvious to you that maybe I shouldn’t reply, perhaps. But then, I don’t know, one person out of 1,000 will actually reply with data that they can use to make some money. If you have a script that’s running on autopilot and spamming the web, quite literally the whole web, then if you make money out of, I don’t know, one spam comment that you left, out of, I don’t know, millions of comments, there’s your money. And you didn’t invest all that much into it.
Duy 10:55
Yeah, I think that the fact is that there’s a lot of people trying to spam Google. Actually, we publish numbers every year; the most recent figure we had was that every day we discover about 40 billion pages that are spam. That’s actually not a typo. It’s billion with a B. And when users search for stuff on Google, less than 1% of search queries would land on a spammy page. So a lot of work went into detecting and removing those spammy pages. So at the end of the day, I think it’s still possible to prevent this spammy and manipulative content from reaching users. And that’s what we focus on every day; we have enough signals and solutions to deal with these.
John 11:43
Now, when I read that number, the number of spammy pages we discover every day, it really blew my mind. It’s amazing to see how much energy must be put into just creating a giant mass of pages. And it’s really cool to see that the quality of the search results is extremely high. Every now and then I will run into something where I’d say, oh, this is pretty spammy, or, like, this is redirecting, or this is obviously someone’s expired domain and they’re hosting something else on it now. But it’s really rare that I run into these kinds of things on, like, normal searches that I do. So, I don’t know how many gazillion people you have working on this, but it’s been working really well, I think.
Duy 12:32
Yeah, that’s an Asian proverb. I’m not sure how to translate this, but its literal translation is, like, the more you sweep the floor, the more dust you find. So if you really try to find spam, you may come across it. But overall, even in my personal day-to-day use of Google Search, I really don’t come across spam all that much. And that’s what we strive for: less than 1% of whatever you search for should be, you know, surfacing some bad stuff, and more than 90% of your time should go towards, like, high quality and relevant results.
John 13:04
Now, what kinds of things do you still find problematic on the web? Like, where does our impact kind of come to its limits? Or, I don’t know how to frame it.
Duy 13:15
I would say hacked spam is still a problem for the ecosystem. Many sites still run on older versions of CMSes, or, you know, they use outdated plugins or templates. If you think about it, well, personally, I don’t know anyone that still runs Windows Vista. And if you have friends that still run Windows Vista, you’d probably nudge them, right? So can we do that as the web ecosystem? If people still run really outdated CMSes, can we help them get on, you know, a version that is much more secure? A lot of the hacked spam that takes place today is barely any hacking; a lot of the tools and scripts that, you know, people discovered, like, five, six years ago are sometimes still being used today to exploit websites, especially older websites. I think at the very least, we should make it a lot more difficult for the spammers to hack into sites and spread spammy or malware content. Because when users visit your website, like if they visit Martin’s tech blog, they don’t expect to walk away with ransomware or malware. I think we have enough resources and cooperation in the ecosystem to make that happen. I really look forward to that.
John 14:30
Yeah, I think that’s super tricky, too, because I don’t know how other people run their websites, but for me, I would put up a new blog and put some content on it, because I thought, like, oh, I’ll spend a lot of time writing a lot of stuff here. And then I just never get to it. And then it just keeps running and running and running, and if you don’t activate the automatic updates, then suddenly you’re running this old version. You’re like, I don’t really have time to deal with it, and you don’t even realize what is happening there. And I imagine, at mass, like, looking at the bigger part of the web, there are lots of smaller companies that just have their site kind of running like that, where it’s like, oh, people can find my phone number, and that’s good enough. And they don’t realize that they’re potentially causing a problem for the bigger web, just by keeping things running on something that is essentially outdated.
Duy 15:22
I would also say that the very least they can do in those situations is to sign up for Search Console, because then they would have more data, you know, where they would realize that, oh yeah, running this very old version of a CMS really hinders the site’s potential; maybe, you know, it’s just a whole lot slower. You’d have a bunch of improvements that Search Console says you should do, which isn’t extremely difficult. So now, suddenly, they realize there’s a lot more incentive to keep the site up to date. And obviously, if you sign up with Search Console and we find hacks or any problems, we will notify you immediately; we’re pretty fast, and we’re pretty effective at detecting hacks. So yeah, that’s the least you can do. And hopefully, by signing up for Search Console, you find more incentives to keep your site up to date and do all these improvements that, in the end, would benefit users a lot.
John 16:12
Yeah. I don’t know, Martin, were you signed up for Search Console in the beginning, or when did that come up on your radar?
Martin 16:18
It didn’t happen in the beginning; it happened eventually, when I grew the blog a little more. And I noticed that a lot of people were actually using it, and it did show up a bit in search, so I was like, maybe I should actually sign up for Search Console to get more insights into it. And I didn’t even realize that it would help me with things like manual actions and hacked issues. I didn’t really have that many hacked issues, because it is a static site. So that’s nice. But yeah, I think it’s a pretty useful tool. But I doubt that everyone… so I think a lot of the old websites that are around are probably from people who aren’t necessarily primarily concerned about the website. I don’t know, like the bakery around the corner, the cafe in a small town: they want to have a website in case someone needs to discover them or wants to find out something, like what’s on the menu today or something like that. But they don’t necessarily care enough to update, or know enough to update, their CMS. And they probably also aren’t really the main audience for Search Console. So that can be a little tricky. I think with a bunch of CMS communities, we are now working on closing that gap, with things like Site Kit, right, that tries to get the key information into the CMS control panel, so that you don’t have to go to third-party tools.
John 17:38
So I think that’s a pretty cool approach. Yeah. I also moved my whole site to a static site as well, partially also because of all of the hacking things. I mean, I had everything set up to automatically update, and every week or so I’d get an email saying, oh, we installed another update. And at some point, it was like, I get so many emails that my blog is updating automatically, and none of my content has been updated for 10 years, so I might as well just make a static site out of it, and also kind of protect from the hacking angle there. I don’t know… one of the things that came up for me out of all of the hack discussions, and seeing how things are hacked on the web, from Google internally and from the help forums, is that, especially for smaller businesses, it makes sense to kind of offload all of that and just say, like, use some hosted platform instead of trying to host your own website. Do you think that would protect against most of these hacking things? Or what additional things should small businesses do?
Duy 18:37
Yeah, I think that would be a good solution. Actually, we just published a number that in 2020, we sent over 140 million messages to site owners in Search Console. That’s a lot more messages than in previous years, right? And the bulk of that was from sites that were coming on to Search Console for the first time. So a lot of businesses, because of the pandemic or whatnot, realized that they need a better online presence. So suddenly they invest a lot more into, you know, building the website; even simple things like menus were suddenly updated a lot more frequently, or now you can order online to pick up or get delivered. And I noticed they also worked with a lot more hosted platforms. So I think that’s a good solution. If you don’t have a dedicated team to manage your websites or social media presence, you can go with a hosted platform, and that probably takes care of a lot of the overhead.
John 19:31
Cool. So what about your site, Gary? Like when are these recipes going online? And what would happen if someone were to hack your recipes and change the butter for mayonnaise? Oh, no.
Gary 19:43
That would probably work, actually, in many cases. But my recipes are online, actually. I keep them in Apple Notes, and they are kept in the cloud.
John 19:53
Oh, I mean, like, publicly online. Or can we share your password with the podcast listeners?
Gary 20:01
I would prefer not to share my password, for reasons. I have stuff, recipes, that I would prefer not to share. At least not yet, because I have to perfect them. Like the dorayaki that I was eating before we started recording this podcast; that was one of those recipes where I took the original Japanese recipe and I made it, well, different.
John 20:28
Swiss cheese. Yeah, I added Swiss cheese. I’m glad that you’re not obsessed with cheese. Cool, what other things, Duy? What keeps you awake at night when it comes to search quality? Don’t say the cats; that’s the wrong answer.
Duy 20:45
I’ll try not to. The other bit that can keep me up at night is scams. There’s a lot of scams going around, all sorts of scams. For example, customer support scams used to be a popular one: if you’re looking for a Gmail customer support number, a lot of people try to rank for that and would publish a false number to make you call them. And then for some reason, somehow, at the end of the conversation, you’d be sending them money or buying them gift cards. There’s a lot of YouTube videos you can find about how the scam works. So we did a lot of work into preventing that; we were able to protect hundreds of millions of queries since 2018 that go to customer support, we basically demoted most of the scams there. But I think there’s also the awareness part, you know. Similar to very pure spam, very obvious spam: if you know such scams exist, you probably won’t fall for them. A lot of the time people, you know, click on these sites or call these weird numbers maybe because they don’t know there’s a possibility that someone is out to trick them. But apparently that’s a common problem that government departments like the IRS deal with all the time. And yeah, now it may spread to other parts of the web. But we’re doing a lot to prevent scams from reaching users. But users should also research more to protect themselves against scams; it’s relatively easy. Once you know that, and once your family members know that, it probably won’t happen.
John 22:18
I don’t know, for a while they were quite visible in the search results, but it feels like I haven’t seen them for a really long time. So I don’t know what you’ve been doing, but it seems to be working. Pretty cool. One question I always get, where maybe you have some insights or some tips as well, is: what if, like, a competitor of mine is doing something kind of spammy, where maybe they’re just keyword stuffing on their pages, or they’re creating some kind of a doorway page? And I know that this is spammy because I read the webmaster guidelines. And my competitor is getting away with it; like, they’re ranking right above me. What could I do there? Is that something where I can report them to, I don’t know, the spam police, and they’ll take care of it for me? Or what are the options? Or is it even something where I can do anything about it?
Duy 23:09
Yeah, I would say that a lot of times, maybe the competitor is not necessarily ranking well because they do spam; there are so many factors when it comes to ranking. I’m sure Gary will touch on them. But if you’re really concerned about that, you can report them to us; we have a spam report form that we review pretty frequently. So yep, please send us a spam report. You can also seek help in the support forum, the Webmaster Help Forum, and then, yeah, we will also be able to take a look.
John 23:41
Yeah. So cool. Yeah. I don’t know, I always feel a bit sorry for people who are seeing that kind of thing, where they’re almost, like, stuck in a situation where they’re thinking, well, maybe I should be spamming as well, so that I can rank above my competitor who is spamming. But that always feels like a bad idea.
Duy 24:02
Yeah. If everyone was doing that, then where does that leave the users? Will they have, you know, good user experience and good content to consume? I really don’t think that’s a solution. I think everyone should be focusing on doing what’s right and doing what’s best for not just your website, but for your users. If you focus too much on a single metric, or something that you think that would, for some reason propel your sites, most of the time, it would lead to a pretty negative outcome. Yeah,
John 24:35
I think it’s also, like you said, kind of like one of those things where, like, you don’t even know if it will actually help your site and potentially, it’ll just harm your site. And then you’re just digging a bigger hole for yourself rather than working on something positive for your website to improve things for the long run.
Duy 24:55
Yeah, an example that we observed was that webmasters or spammers tend to focus on improving one or two particular metrics that are external, that we absolutely do not use. They, for some reason, think that if they put a lot of time and money into improving such scores, it would perform really well on Google Search. I’ve never seen a case where that actually worked well. And I find it, you know, pretty sad, right? Because if all of that time and money were spent on building up the website, with better user experience, more functionality, writing better quality content, producing high quality images, they’d probably do a lot better on search engines, and obviously it’d be a lot more sustainable for the site itself.
John 25:42
Yeah, the one area where I kind of see people, I don’t know, use that almost in a reasonable way is when it comes to monetizing their site, where they just want some externally visible metric to go to some advertiser and say, like, look, my site is actually pretty reasonably placed, and if you spend some money with me, then I can get your message out to a broader audience. But it feels like sometimes I see people in the forums just saying, like, I want to improve this metric. They don’t really want to focus on the site overall. They’re just like, I just want to change this number from seven to 25. Like, why? It doesn’t change much.
Duy 26:33
Yeah, I love data myself. I think, you know, the more data you have, the better you would be at your role, whatever that may be. As a site owner or an online marketer, I think it’s really great to have a bunch of metrics that you monitor and measure and try to improve, as long as you don’t focus on one thing. As a site owner, I used to look at, you know, bounce rate and time spent on pages all the time, for example, to know which content was really hitting it off with my audience so I could improve more. Or if, for some reason, I find that nobody really discovers our contact or support pages: why is that? Do we have a problem there? If people need to contact us, maybe we should just put it somewhere else, or write better content. So yep, as long as you don’t focus on one single thing, because we have hundreds and hundreds of ranking signals. Focusing on one thing doesn’t mean you will improve across the board and we will rank your site better.
John 27:27
So how many ranking signals? I don’t know. Gary, what do you think? Which is a signal, like, people should just focus on? Like, just pick one?
Gary 27:28
JM Authority.
John 27:29
The JM Authority? I haven’t heard of that one. You haven’t? I’ll have to Google that.
Martin 27:45
I think it involves the meta meta cheese tag, if I remember correctly…
John 27:48
The meta cheese tag. Oh no, I just wish search engines would serve cheese instead of pages.
Martin 27:57
I’d be down for that.
Gary 27:58
That’s a horrible idea. Like, that’s one of the worst ideas that you’ve ever had, actually. I will go that far.
Martin 28:04
That’s something to think about for Steve.
John 28:08
I don’t know, it’s like, I mean, cheese smell is kind of bad. But it can be okay, right?
Martin 28:15
Smells like a big seller for Steve 3.0.
John 28:16
Steve 3.0: serve cheese. Not pages.
Gary 28:21
But if you only serve cheese, then you’re not going to find recipes about how to prepare the cheese, how to prepare food that contains said cheese that you found. See? Something to think about.
John 28:35
Okay, so tell us a bit about serving, Gary. How does it actually work?
Gary 28:40
I was telling you about that for the past four episodes or three episodes or whatever. Come on, you’re not listening. Again. Cheese.
John 28:49
Cheese. Yeah. Cheese!
Gary 28:53
So ranking is one of those topics where we don’t want to say too much, and that’s on purpose. But if you ever went to an information retrieval class, then you heard about ranking there, the public version of it. Because ranking results is actually just math: figuring out, first, basic relevancy, and then some magic. And that’s the part that we are not talking about; basically, the magic part is… it’s card tricks, really, card tricks…
Martin 29:10
Sometimes conflicts.
Gary 29:15
Thanks, Martin. That was very useful. Every search engine has its own kind of magic that they are using, that they were brewing for years, and typically that’s the part that we don’t want to talk about. But the other part, the thing that you can hear about in computer science classes, actually, there’s no reason not to talk about it, because it’s kind of public.
So once you have the query, and you found a bunch of results in your index, then you have to start ranking those results that you found before presenting them to the user. And that is done, I mean, I’m waving my hand here, using hundreds of signals. But some of them are quite obvious. One that we typically refer to as topicality is basically relevancy that is based on the query and the content of the page. So for example, if you are writing about cookies, I don’t know, lemon coconut cookies, then if the query contains those terms, your page will be in the retrieved set.
Now, search engines don’t return all the results all the time, because for some queries, like, for example, cookies, just the term cookies, there might be billions of results, quite literally. And there’s no reason to return all of them, because there’s no person who is ever going to go through all of them. So what we need to do… don’t make that face, because really, there is no person that is going to go through billions of results… so what we need to do is limit the number to something more reasonable, say 1,000 to 10,000, or whatever the number is at the moment. How do you do that? Basically, you start ranking the pages: you apply the things that you could collect during indexing, the, I don’t know, PageRank, for example, again topicality, and other simple quality signals, to create a reverse-ordered list of the results that you intend to send up to the user from the index. And then once you’ve created that list, the reverse-ordered list, then you make a cut at whatever the number of results that you can present is, say, 1,000.
And then from there on, you only have to work with those 1,000 results. But those results are not finished with yet, meaning that you can still tweak those results to make them better. And that’s usually where the magic is. Those are the signals, the magic signals, that we still apply on the result set to make them better for the user’s query. You probably heard of RankBrain, for example, as the listener; I hope you and everyone else heard about RankBrain.
Martin 32:44
Yeah, I know. Something like RankBrain, I have heard that before. Yeah.
Gary 32:45
Okay. Okay. We don’t have to fire you.
Martin 32:46
Fun fact, actually: I’m not allowed on a bunch of the documents that you guys have access to. Just saying. Okay, snap.
Gary 33:02
Anyway. So we still have to reorder the documents to make them more relevant to the user’s query. And that’s where we would apply, for example, RankBrain. And these magic signals, or magic algorithms, they can still make massive changes in the result set, but they are only working with the 1,000 that we already handed to them. Now, ranking is number-based, basically: for each result, we assign a number, and we calculate that number using the signals that we collected during indexing plus the other signals. And then essentially, what you see in the results is a reverse order based on those numbers that we assigned. The magic signals or magic algorithms that we use, like RankBrain, what they do is multiply those numbers that we assigned to each result by a number. Like, for example, if they want to promote a result, because it was determined that it would be a better result for lemon coconut cookie, then let’s say it would multiply the result’s score by two, basically doubling its score, which means that it will jump up in the result set. I have no idea why I’m gesticulating with my hand, because no one can see it.
Martin
It looks great, though. It looks great at this stage.
Gary
Thanks, Martin. Very disturbing. You’re doing well. Anyway, if we wanted to remove a result from the set, for whatever reason, we could multiply its score by zero, because that will turn the score to zero, and with a score of zero, why would you present it?
John 34:45
So just to pause you for a moment: what happens if multiple pages have the same number in the end? Is it like someone’s throwing dice?
Gary 34:57
So that almost never happens. It’s highly unlikely that you would see a result that has the same score as another result. If that happens, then it will just be fluttering, basically one jumping up and down; as you refresh the page, they will just switch positions. But I think it’s highly unlikely that will happen. We have all these algorithms that are actually kind of designed to cut ties, like that HTTPS boost, which is one of those magic algorithms that would cut this tie. Like, if one of the pages or one of the URLs is HTTPS, or starts with HTTPS, then that would get a tiny, tiny boost to propel it a little further up in the result set. I actually recently got asked if the tiebreakers are only applied in those situations, or if they are always applied and just don’t have such a large impact. They are always applied, actually, but they are designed to not have a large impact. I can only speak for the HTTPS ranking boost in this case as a tiebreaker, because, I mean, we created that with a teammate anyway. And we designed it such that it will work all the time, but it will not rearrange the result set unless there is a tie. And then it would boost one of the results, well, all the results actually, but it would be more visible for the results that had a tie. And it actually happened quite often, in some locales more often than others. I remember that in Hindi, in India, as in, like, the country, the language there, it happened in 17% of the cases or something like that. So it was actually pretty active in some locales, and less so in others.
John 37:01
Okay. So I guess these are, like, real numbers, not integers, where it’s like, you have seven and the other one has… ah, I want to say these are floats. Okay, so there’s a lot of room, I guess is what I’m saying, in the search results. It’s not that there’s just, like, one number to jump, and if you can only get that one extra point, then you rank higher.
Gary 37:23
So I was actually looking, I think you sent an email about site: queries. And then I was doing some query debugging, and I was looking at site:johnmu.com results. And the scores were like zero point… like, the top score was like 0.76, or something like that.
John 37:45
That’s like a bad site.
Gary 37:47
No, it’s like, for the results, that was the top score. It doesn’t mean anything; basically, the scores are relative to the result set, not to the whole index. And then the second one would have had, like, 0.74. But then one of these magic algorithms kicked in and boosted it a little, so it migrated one up.
John 38:11
It got the extra HTTPS ranking boost. Of course, it’s boosted up now; it’s like ranking number one. I still don’t know which queries… HTTP, or HTTPS. Okay.
Gary 38:23
I think for your site, we should design an FTP ranking boost.
John 38:27
I should move to FTP.
Gary 38:29
No, that’s the article that you wrote about hacking FTP. And I still remember that.
John 38:34
That’s a long time ago. Sure.
Duy 38:37
So what happens when everyone’s on HTTPS? Do we prefer sites that serve better cheese?
Martin 38:44
The cheese tiebreaker.
John 38:46
Now, it’s like…
Gary 38:48
Unfortunately, Matt is not around anymore, so I’m not sure that we could launch that. But I would be down for that.
John 38:54
I mean, we can launch cheese. That’s easy. Like with a catapult or…
Martin 39:01
Like a catapult or…
Gary 39:02
Yeah, we could go up to I don’t know blacktop or something and just launch cheese down from there.
John 39:08
They’re already shaped like a projectile. So I think that’s possible. We could do a literal moonshot.
Gary 39:17
So this is the worst ideas ever podcast.
John 39:19
Okay. I don’t know. I think Google is missing out by not having me as like a product manager who’s making decisions here.
Gary 39:23
No, I think I think that’s the right decision.
Martin 39:29
I mean, launching cheese has actual impact.
Gary 39:33
That hurt. No more cookies for you.
Gary 39:40
What, my cookies?
John 39:44
You said that person? If you get hit by cheese, it will hurt.
John 39:50
Oh my gosh.
Martin 39:52
Depends on the kind of type anyway…
John 39:54
Can we just stop? Okay, maybe we should take a break here. Too much cheese. Way too much. It seems like people are getting hungry. You have more? Tell me more, Gary. What did we miss?
Gary 39:55
What did we miss? We miss cookies.
John 39:59
Cookies. Yeah. And kale. It’s like by not going to the office. You’re missing out on a lot of kale. Like imagine all of the kale that you didn’t eat because you weren’t in the office.
Gary 40:21
And I’m much healthier because of that.
John 40:22
Wait.
Martin 40:23
Because you’re not eating kale?
Gary 40:25
Yeah, like mentally.
Martin 40:28
Oh, okay.
Gary 40:30
It improves my mental health if I’m not eating kale.
Martin 40:33
Yeah. Hey, I got that. I just you know you do you man.
John 40:38
Martin’s missing the kale. I see.
Martin 40:41
No, I’m absolutely not. I really am not.
John 40:44
Duy, are you missing kale? Or do they have kale there? Is that just like a Swiss thing?
Duy 40:43
They do. And I’m sorry. I’m on Gary’s side for this.
Gary 40:45
Well, that’s, that’s a first.
John 40:50
Okay, cool. Interesting. Then let’s take a break here. Thanks for taking the time to join us here, Duy. If people have any questions around search quality, or comments or ideas for search quality, where should they go?
Duy 41:11
Yeah, thank you for having me. You can file spam reports; we are really happy to see them, and we review them very frequently. If you have problems, or if you think you need help with something, you can always go to the Help Forum, the Webmaster Help Forum, and yeah, that feedback will also reach us. So cool.
John 41:29
Yeah. The help forums are always a good place to go if you don’t know what to do, because the experts there can kind of guide you in the right direction, where they realize that actually you should be filing a spam report, or actually you should be contacting Gary directly on Twitter. Then they can point you in that direction.
Duy 41:49
That’s always a good thing to do. Yeah. No, he’s way too free now. Just launch cheese. No, no, just don’t, don’t send any cheese or kale. I love cheese. I don’t like kale.
John 41:55
Oh my. Cool. Well, it’s been fun doing these podcast episodes, and I hope you, the listener, have been enjoying them as well. At any rate, let us know how you’re liking these. If there are any topics that we should be including in any of the future episodes, drop us a note, send us a comment on Twitter, or comment wherever you comment on podcasts. And of course, don’t forget to like and subscribe. So with that, thank you, and until next time: bye, everyone!
Catch The Search Off the Record Podcast!
You can catch the Search Off the Record podcast here.
This is a new series being done by the Search Quality Team where they dive deep into search—at least as deep as is reasonably possible without revealing all of Google’s magic secrets.
Be sure to catch every episode!