Over on Twitter, one SEO professional asked John Mueller about PageRank. For those who don’t know, PageRank is Google’s algorithm for measuring the “value” of web pages, developed by Google co-founders Larry Page and Sergey Brin and named after Page.
Their question came out of a debate from the previous week that they could not settle: is PageRank (link juice) divided among all dofollow links on a page, or only among unique dofollow links?
They noted that Google provided no specific answers, either.
John suggested checking the documentation on PageRank, specifically the overview over at Wikipedia.
I'd check out the docs about PageRank, https://t.co/yv12xrlkWE has a nice overview. It's a pretty well-documented algorithm.
— 🐝 johnmu.csv (personal) 🐝 (@JohnMu) June 23, 2022
What Does the Documentation About PageRank Say?
If you haven’t read it before, there is a lot to digest in the Wikipedia article. The equation-free portions of the article say the following:
“The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. PageRank can be calculated for collections of documents of any size. It is assumed in several research papers that the distribution is evenly divided among all documents in the collection at the beginning of the computational process. The PageRank computations require several passes, called “iterations,” through the collection to adjust approximate PageRank values to more closely reflect the theoretical true value.
A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is commonly expressed as a “50% chance” of something happening. Hence, a document with a PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to said document.
Assume a small universe of four web pages: A, B, C, and D. Links from a page to itself are ignored. Multiple outbound links from one page to another page are treated as a single link. PageRank is initialized to the same value for all pages. In the original form of PageRank, the sum of PageRank over all pages was the total number of pages on the web at that time, so each page in this example would have an initial value of 1. However, later versions of PageRank, and the remainder of this section, assume a probability distribution between 0 and 1. Hence the initial value for each page in this example is 0.25.
The PageRank transferred from a given page to the targets of its outbound links upon the next iteration is divided equally among all outbound links.
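Stepping outside the quoted text for a moment: the two rules above (duplicate links count once, and the share is split equally) speak directly to the original question. Here is a minimal Python sketch of that splitting step. The function name `distribute_pagerank` and the sample link list are hypothetical, made up for illustration, and this reflects the Wikipedia description rather than anything Google has confirmed about its current systems.

```python
def distribute_pagerank(source, page_rank, outbound_links):
    """Split a page's PageRank equally among its unique outbound targets.

    Follows the simplified model quoted above: self-links are ignored and
    repeated links to the same target are treated as a single link.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique_targets = [t for t in dict.fromkeys(outbound_links) if t != source]
    if not unique_targets:
        return {}
    share = page_rank / len(unique_targets)
    return {target: share for target in unique_targets}


# Hypothetical page A with PageRank 0.25 that links to B twice, C, D, and itself.
shares = distribute_pagerank("A", 0.25, ["B", "C", "B", "D", "A"])
print(shares)
```

Under this model, the second link to B does not earn B a larger share: B, C, and D each receive one third of A's 0.25.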
The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor d. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.
The damping factor is subtracted from 1 (and in some variations of the algorithm, the result is divided by the number of documents (N) in the collection) and this term is then added to the product of the damping factor and the sum of the incoming PageRank scores.
So any page’s PageRank is derived in large part from the PageRanks of other pages. The damping factor adjusts the derived value downward.
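As another aside from the quoted text, the damping step described above can be sketched in a few lines of Python. The function name `updated_pagerank`, its parameters, and the example link structure are assumptions made for illustration. Each incoming contribution is the linking page's PageRank divided by its number of outbound links, as described earlier.

```python
def updated_pagerank(incoming, d=0.85, n_pages=None):
    """Compute one page's new PageRank from its incoming links.

    incoming: list of (pagerank_of_linking_page, outbound_link_count) pairs.
    d: damping factor, generally assumed to be around 0.85.
    n_pages: if given, (1 - d) is divided by the number of documents N,
             which is the variation mentioned in the text.
    """
    link_sum = sum(rank / outbound_count for rank, outbound_count in incoming)
    base = (1 - d) / n_pages if n_pages is not None else (1 - d)
    return base + d * link_sum


# Hypothetical four-page example: A is linked from B (2 outbound links),
# C (1 outbound link), and D (3 outbound links), each starting at 0.25.
incoming_to_a = [(0.25, 2), (0.25, 1), (0.25, 3)]
new_rank_a = updated_pagerank(incoming_to_a, d=0.85, n_pages=4)
print(round(new_rank_a, 4))
```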
The difference between them is that the PageRank values in the first formula sum to one, while in the second formula each PageRank is multiplied by N and the sum becomes N. A statement in Page and Brin’s paper that “the sum of all PageRanks is one” and claims by other Google employees support the first variant of the formula above.
Page and Brin confused the two formulas in their most popular paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” where they mistakenly claimed that the latter formula formed a probability distribution over web pages.
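Because the excerpt skips the equations, the "first" and "second" formulas mentioned above never actually appear in it. Stepping outside the quoted text, the two variants are commonly written as follows. This is a reconstruction consistent with the prose description of the damping term, where d is the damping factor, N is the number of documents, M(p_i) is the set of pages linking to p_i, and L(p_j) is the number of outbound links on p_j.

```latex
% First variant: the PageRank values sum to 1
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}

% Second variant: the PageRank values sum to N
PR(p_i) = (1 - d) + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
```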
Google recalculates PageRank scores each time it crawls the Web and rebuilds its index. As Google increases the number of documents in its collection, the initial approximation of PageRank decreases for all documents.
The formula uses a model of a random surfer who reaches their target site after several clicks, then switches to a random page. The PageRank value of a page reflects the chance that the random surfer will land on that page by clicking on a link. It can be understood as a Markov chain in which the states are pages, and the transitions are the links between pages – all of which are all equally probable.
If a page has no links to other pages, it becomes a sink and therefore terminates the random surfing process. If the random surfer arrives at a sink page, it picks another URL at random and continues surfing again.
When calculating PageRank, pages with no outbound links are assumed to link out to all other pages in the collection. Their PageRank scores are therefore divided evenly among all other pages. In other words, to be fair with pages that are not sinks, these random transitions are added to all nodes in the Web. This residual probability, d, is usually set to 0.85, estimated from the frequency that an average surfer uses his or her browser’s bookmark feature.”
This is all older information and may not reflect exactly what Google does today, but it is still useful reading if you are unfamiliar with PageRank and how it is calculated.
It’s also worth noting that PageRank has not gone away. It has simply become more of an internal signal, so Google doesn’t have to be public about it.
Googler Gary Illyes said as much in 2017:
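To make the calculation described in the quoted passage more concrete, here is a minimal Python sketch of the iterative process, including the rule that sink pages are treated as linking to every other page. The function name `pagerank`, the iteration count, and the sample graph are assumptions for illustration. It follows the simplified textbook model and is not a description of Google's production system.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively approximate PageRank for a small link graph.

    links: dict mapping each page to a list of pages it links to.
           Assumes every link target also appears as a key.
    """
    pages = list(links)
    n = len(pages)
    # Start with the distribution evenly divided among all pages.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the (1 - d) / N term.
        new_ranks = {page: (1 - d) / n for page in pages}
        for page, targets in links.items():
            # Ignore self-links and count duplicate links once.
            unique = {t for t in targets if t != page}
            if not unique:
                # Sink page: assume it links out to all other pages.
                unique = {t for t in pages if t != page}
            if not unique:
                continue  # single-page graph, nothing to distribute
            share = ranks[page] / len(unique)
            for target in unique:
                new_ranks[target] += d * share
        ranks = new_ranks
    return ranks


# Hypothetical graph: D has no outbound links, so it acts as a sink.
graph = {"A": ["B", "C"], "B": ["C", "C"], "C": ["A"], "D": []}
ranks = pagerank(graph)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 4))
```

Because the sink's PageRank is redistributed rather than lost, the values keep summing to one on every pass, which matches the probability-distribution reading in the quoted text.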
DYK that after 18 years we're still using PageRank (and 100s of other signals) in ranking?
Wanna know how it works? https://t.co/CfOlxGauGF pic.twitter.com/3YJeNbXLml
— Gary 鯨理／경리 Illyes (@methode) February 9, 2017