If you’ve done SEO for more than 5 years, you’ve likely heard of PageRank (also known as PR). PageRank used to be a much-discussed topic among Search Engine Optimization (SEO) experts.
Google used to display a PR score for each website, but that was discontinued years ago.
However, the algorithm behind PageRank is still in use.
DYK that after 18 years we’re still using PageRank (and 100s of other signals) in ranking?
Wanna know how it works? http://infolab.stanford.edu/~backrub/googl
What’s more, Google recently updated its patent. Here is a paragraph to show that PageRank isn’t dead.
A popular search engine developed by Google Inc. of Mountain View, Calif. uses PageRank® as a page-quality metric for efficiently guiding the processes of web crawling, index selection, and web page ranking. Generally, the PageRank technique computes and assigns a PageRank score to each web page it encounters on the web, wherein the PageRank score serves as a measure of the relative quality of a given web page with respect to other web pages. PageRank generally ensures that important and high-quality web pages receive high PageRank scores, which enables a search engine to efficiently rank the search results based on their associated PageRank scores.
You can check your site’s page rank if you want to.
What is Google Page Rank?
Google uses a citation-based algorithm that looks at the relationships between individual pages. While referring domains matter, it’s the individual referring pages that matter most.
This diagram in Google’s patent helps us to see that PageRank is calculated from the number of links pointing to a page, as well as the number of external links from that page.
At the heart of Page Rank is a mathematical formula that seems scary to look at but is actually fairly simple to understand.
Web designers and bloggers should take the time to fully understand how PageRank really works. If you don’t, your site’s layout could be seriously hurting your Google listings!
PageRank is one of the methods Google uses to determine a page’s relevance or importance. It is only one part of the story when it comes to the Google listing. The other aspects are discussed elsewhere, but PageRank is interesting enough to deserve a post of its own.
Definitions
PR: | Short for Page Rank: the actual, real, page rank for each page as calculated by Google. This can range from 0.15 to billions. |
Toolbar PR: | The Page Rank displayed in the Google toolbar in a browser. This ranges from 0 to 10. |
PageRank used to be displayed on the toolbar of your browser if you installed the Google toolbar. The Toolbar PageRank seemed to follow something like a base-10 logarithmic scale:
Toolbar PageRank | Real PageRank |
0 | 0 – 10 |
1 | 10 – 100 |
2 | 100 – 1,000 |
3 | 1,000 – 10,000 |
4 | and so on… |
In short, PageRank is a “vote”, by all the other pages on the Web, about how important a page is. A link to a page counts as a vote of support. If there’s no link, there’s no support (but it’s an abstention from voting rather than a vote against the page).
Quoting from the original Google paper, PageRank is defined like this:
We assume page A has pages T1 … Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.
PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.
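To make the formula a little less scary, here is a minimal Python sketch of a single update for one page. The function name `pagerank_of`, the dictionary layout, and the numbers for T1 and T2 are assumptions made up for this illustration; only the formula itself and d = 0.85 come from the quoted paper.

```python
def pagerank_of(inbound, pr, out_count, d=0.85):
    """Apply PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) once.

    inbound   -- the pages T1..Tn that link to page A
    pr        -- current PR estimate of each page
    out_count -- C(T): number of links going out of each page T
    """
    return (1 - d) + d * sum(pr[t] / out_count[t] for t in inbound)


# Hypothetical inputs: T1 has PR 1.0 and 2 outgoing links,
# T2 has PR 0.5 and 1 outgoing link.
pr = {"T1": 1.0, "T2": 0.5}
out_count = {"T1": 2, "T2": 1}

pr_a = pagerank_of(["T1", "T2"], pr, out_count)
print(round(pr_a, 4))  # 0.15 + 0.85 * (0.5 + 0.5) = 1.0
```

Each linking page splits its own PR evenly across its outgoing links, which is why a link from a page with few other links is worth more than one from a page crowded with links.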
A Link-Graph and Seed Pages
In the recently updated Google patent you can see the following:
“FIG. 1 graphically illustrates a link-graph structure 100 of a set of pages on the web in accordance with an embodiment of the present invention. Link-graph 100 comprises a collection of pages which correspond to the nodes of the link-graph, and a collection of directed links between the pages, wherein these directed links correspond to the arcs of the link-graph. Note that each link is a directed connection from a “source” page to a “destination” page.
“As illustrated in FIG. 1, the collection of pages is classified into two subsets of pages: a set of seed pages 102, and a set of non-seed pages 104. Seed pages (hereinafter referred to as “seeds”) 102 form the “root” nodes of link-graph 100, which comprise: seed 106, seed 108, and seed 110. Although for simplicity FIG. 1 is described in the context of three seeds, generally the present invention can use much more than three seeds. Note that seeds 102 are interconnected with link 107 and link 109.
“Non-seed pages 104 include pages 112-130, wherein each page is either directly or indirectly connected to one or more seeds through the links in the link-graph. In one embodiment of the present invention, seeds 102 are specially selected high-quality pages which provide good web connectivity to other non-seed pages.
“More specifically, to ensure that other high-quality pages are easily reachable from seeds 102, seeds in seeds 102 need to be reliable, diverse to cover a wide range of fields of public interests, as well as well-connected with other pages (i.e., having a large number of outgoing links). For example, Google Directory and The New York Times are both good seeds which possess such properties. It is typically assumed that these seeds are also “closer” to other high-quality pages on the web. In addition, seeds with large number of useful outgoing links facilitate identifying other useful and high-quality pages, thereby acting as “hubs” on the web.
“One approach for choosing seeds involves selecting a diverse set of trusted seeds. Choosing a more diverse set of seeds can shorten the paths from the seeds to a given page. Hence, it would be desirable to have a largest possible set of seeds that include as many different types of seeds as possible. However, because selecting the seeds involves a human manually identifying these high-quality pages, the total number of the seeds is typically limited. Moreover, having too many seeds can make the selected seeds vulnerable to manipulation. Consequently, the actual number of the selected set of seeds is limited.
“As illustrated in FIG. 1, a link from a seed to a page is represented by an arrow pointing from the seed to the page. For example, seed 106 links to page 112 and page 114 through links 132 and 134, respectively. Such links assert a “support” from the seed to the linked pages.
“The set of non-seed pages 104 are also interconnected with links. For example, page 112 has three outgoing links 136, 137, and 138, which target at pages 118, 116 and 122, respectively. Furthermore, page 114 has two outgoing links 140 and 142, which connect to pages 118 and 120, respectively. Additionally, page 120 links to page 118 through link 144 as shown.
“Note that pages 118, 120 and 128 form a loop, wherein these pages point to each other in a circular manner though links 144, 146, and 148. Furthermore, page 126 and page 128 also form a loop in which they point to each other through links 150 and 152. Note that even though there is no direct link from seed 106 to page 118, page 118 is reachable from seed 106 via three distinct paths which are highlighted: (1) seed 106 <link 132> page 112 <link 136> page 118; (2) seed 106<link 134> page 114<link 140> page 118; and (3) seed 106<link 134> page 114<link 142> page 120<link 144> page 118. We are interested in determining a “shortest” path from seed 102 to page 118 among all of these possible paths, wherein the shortest path will be subsequently used to determine a ranking score for page 118. Note however that, the illustrated lengths of the links in FIG. 1 are not related to the metric which is used to determine the “lengths” of the links in computing the shortest path. We will discuss how to compute the lengths of the links below.”
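The "shortest path from a seed" idea in the patent text can be sketched with Dijkstra's algorithm. The patent computes link lengths with its own metric, which is not quoted here, so this example assumes every link has length 1.0 purely as a placeholder. Only the links spelled out explicitly in the description of FIG. 1 are included, and the function name `shortest_distances` is my own.

```python
import heapq

# Links named explicitly in the quoted description of FIG. 1.
links = {
    "seed 106": ["page 112", "page 114"],
    "page 112": ["page 118", "page 116", "page 122"],
    "page 114": ["page 118", "page 120"],
    "page 120": ["page 118"],
}


def shortest_distances(links, seed, length=lambda src, dst: 1.0):
    """Dijkstra from one seed. `length` is an assumed placeholder metric."""
    dist = {seed: 0.0}
    queue = [(0.0, seed)]
    while queue:
        d, page = heapq.heappop(queue)
        if d > dist.get(page, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for target in links.get(page, []):
            candidate = d + length(page, target)
            if candidate < dist.get(target, float("inf")):
                dist[target] = candidate
                heapq.heappush(queue, (candidate, target))
    return dist


dist = shortest_distances(links, "seed 106")
print(dist["page 118"])  # 2.0 under the unit-length assumption
```

With unit lengths, paths (1) and (2) from the quote tie at two hops and path (3) is longer. A different length metric could change which path wins.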
Earlier Majestic put out this amazing video that goes through the exact science of how PageRank actually works. This video is extremely detailed.
How is PageRank Calculated?
This is where it gets tricky. The PR of each page depends on the PR of all the pages pointing to it.
But we won’t know what PR those pages have until the pages pointing to them have their PR calculated… And when you consider that page links can form circles it seems impossible to do this calculation!
But actually it’s not that bad. Remember this bit of the Google paper:
PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.
What that means to us is that we can just go ahead and calculate a page’s PR without knowing the final value of the PR of the other pages. Each time we run the calculation we’re getting a closer estimate of the final value. So all we need to do is remember each value we calculate and repeat the calculations lots of times until the numbers stop changing much.
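How many times do we need to repeat the calculation for big networks? That’s a difficult question, because it depends on the link structure and on how precise the result needs to be. It is not millions of passes, though: the original PageRank paper reports convergence in roughly 52 iterations on a database of 322 million links.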
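The "repeat until the numbers stop changing much" idea can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: the function name `pagerank`, the tolerance `tol`, and the three-page circle are all assumptions chosen for the example.

```python
def pagerank(links, d=0.85, tol=1e-6, max_iter=1000):
    """Repeat the PR formula until no value changes by more than `tol`.

    links maps each page to the list of pages it links to.
    Returns the PR estimates and the number of passes that were needed.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 0.0 for p in pages}  # arbitrary starting guess
    for iteration in range(1, max_iter + 1):
        new_pr = {p: 1 - d for p in pages}
        for source, targets in links.items():
            for t in targets:
                # each page shares its PR equally among its outgoing links
                new_pr[t] += d * pr[source] / len(targets)
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr, iteration


# Hypothetical circle of links: A -> B -> C -> A
ranks, rounds = pagerank({"A": ["B"], "B": ["C"], "C": ["A"]})
print({page: round(value, 3) for page, value in sorted(ranks.items())})
print("passes needed:", rounds)
```

Even though the three pages depend on each other in a circle, the estimates settle down after a modest number of passes, and the starting guess does not affect the final values.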
PageRank Examples
Example 1
I’m not going to show the calculations here, but here are the results:
It takes about 20 iterations before the network begins to settle on these values.
Look at Page D – it has a PR of 0.15 even though no other page is voting for it (i.e. it has no incoming links)!
Observation: Every page has at least a PR of 0.15 to share. Therefore you can boost a page even on a brand new blog just with internal links. This is why internal link building works.
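Here is a small Python sketch showing where a PR of 0.15 comes from. The four-page link structure below is a hypothetical stand-in, since the diagram for Example 1 is not reproduced here, and the helper name `pagerank` is my own.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) a fixed number of times."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        nxt = dict.fromkeys(pages, 1 - d)
        for source, targets in links.items():
            for t in targets:
                nxt[t] += d * pr[source] / len(targets)
        pr = nxt
    return pr


# Hypothetical network: A -> B -> C -> A, plus D -> C. Nothing links to D.
pr = pagerank({"A": ["B"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(round(pr["D"], 2))  # 0.15, the (1 - d) term on its own
```

Page D has no inbound links, so the sum in the formula is empty and only the (1 - d) term is left. D still passes a share of that 0.15 on to page C through its own link.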
Example 2
A simple hierarchy with some outgoing links:
The home page has the most PR – after all, it has the most incoming links.
But what’s happened to the average? It’s only 0.378! That doesn’t tie up with what the Google paper said earlier, “the sum of all web pages’ PageRanks will be one” (with the formula as written above, that works out to an average of 1.0 per page). So something is wrong somewhere!
Well no, everything is fine. Take a look at the “external site” pages – what’s happening to their PageRank? They’re not passing it on, they’re not voting for anyone, they’re wasting their PR.
Example 3
Let’s link those external sites back into our home page just so we can see what happens to the average…
Now the average PR goes back to 1.
And look at the PR of our home page! All those incoming links sure make a difference.
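The effect in Examples 2 and 3 can be reproduced with a rough Python sketch. The two small networks below are hypothetical (they are not the diagrams from the examples, so the numbers will differ from 0.378), and the helper names `pagerank` and `average` are my own.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) a fixed number of times."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        nxt = dict.fromkeys(pages, 1 - d)
        for source, targets in links.items():
            for t in targets:
                nxt[t] += d * pr[source] / len(targets)
        pr = nxt
    return pr


def average(pr):
    return sum(pr.values()) / len(pr)


# Hypothetical site: the external pages receive links but have none going out.
dead_ends = {
    "Home": ["About", "External 1", "External 2"],
    "About": ["Home"],
}
# Same site, but the external pages now link back to the home page.
linked_back = dict(dead_ends)
linked_back["External 1"] = ["Home"]
linked_back["External 2"] = ["Home"]

pr_dead_ends = pagerank(dead_ends)
pr_linked_back = pagerank(linked_back)
print(round(average(pr_dead_ends), 3))    # well below 1.0
print(round(average(pr_linked_back), 3))  # 1.0
```

Pages with no outgoing links receive votes but never cast any, so their share drains out of the network on every pass. Once every page links somewhere, nothing is lost and the average returns to 1.0.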
Example 4
Let’s see a simple hierarchy:
Our home page has 2 and a half times as much PR as the child pages!
Observation: A hierarchy concentrates votes and PR into one page.
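A simple hierarchy like this can be checked with a short Python sketch. The diagram is not reproduced here, so I am assuming a home page with three child pages (named About, Product and More after the pages mentioned in Example 5), where the home page links to each child and each child links back. The helper name `pagerank` is my own.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) a fixed number of times."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        nxt = dict.fromkeys(pages, 1 - d)
        for source, targets in links.items():
            for t in targets:
                nxt[t] += d * pr[source] / len(targets)
        pr = nxt
    return pr


# Assumed structure: Home links to three children, each child links back.
hierarchy = {
    "Home": ["About", "Product", "More"],
    "About": ["Home"],
    "Product": ["Home"],
    "More": ["Home"],
}
pr = pagerank(hierarchy)
print(round(pr["Home"], 2))   # 1.92
print(round(pr["About"], 2))  # 0.69
```

Under this assumed layout the home page lands on the 1.92 quoted in the next example, while the four pages still average 1.0 between them.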
Example 5
Hierarchical – but with a link in and one out.
In example 4 the home page only had a PR of 1.92 but now it is 3.31!
Excellent! Not only has site A contributed 0.85 PR to us, but the raised PR in the “About”, “Product” and “More” pages has had a lovely “feedback” effect, pushing up the home page’s PR even further!
Principle: A well structured site will amplify the effect of any contributed PR.
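The feedback effect can be sketched in Python too, with some loud caveats. The diagram is not shown here, so the link structure below is my guess: Site A links to the home page, the home page links to its three children, the children link back, and the "More" page also links out to an external Site B. I also hold Site A at a PR of 1.0, which is an assumption that matches the 0.85 contribution mentioned above. The `fixed` parameter is my own addition and not part of the original formula.

```python
def pagerank(links, fixed=None, d=0.85, iterations=100):
    """Iterate the PR formula; pages listed in `fixed` keep the given PR."""
    fixed = fixed or {}
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: fixed.get(p, 1.0) for p in pages}
    for _ in range(iterations):
        nxt = {p: 1 - d for p in pages}
        for source, targets in links.items():
            for t in targets:
                nxt[t] += d * pr[source] / len(targets)
        nxt.update(fixed)  # assumption: external PR is supplied, not computed
        pr = nxt
    return pr


# Guessed structure: one link in (from Site A), one link out (to Site B).
site = {
    "Site A": ["Home"],
    "Home": ["About", "Product", "More"],
    "About": ["Home"],
    "Product": ["Home"],
    "More": ["Home", "Site B"],
}
pr = pagerank(site, fixed={"Site A": 1.0})
print(round(pr["Home"], 2))  # 3.31 under these assumptions
```

The single inbound link adds 0.85 directly, and the rest of the increase comes from that extra PR circulating between the home page and its children.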
Example 6
Extensive Interlinking – or Fully Meshed
All the pages have the same number of incoming links, all pages are of equal importance to each other, all pages get the same PR of 1.0.
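A fully meshed site is easy to check in Python. The four page names are hypothetical, and the helper name `pagerank` is my own.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) a fixed number of times."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        nxt = dict.fromkeys(pages, 1 - d)
        for source, targets in links.items():
            for t in targets:
                nxt[t] += d * pr[source] / len(targets)
        pr = nxt
    return pr


# Hypothetical four-page site where every page links to every other page.
names = ["Home", "About", "Product", "More"]
mesh = {page: [other for other in names if other != page] for page in names}

pr = pagerank(mesh)
print({page: round(value, 2) for page, value in sorted(pr.items())})
```

Every page gives away exactly as much as it receives, so nothing gets concentrated anywhere and each page stays at 1.0.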
Example 7
Getting high PR the wrong way.
Just as an experiment, let’s see if we can get 1,000 pages pointing to our home page, but only have one link leaving it…
Those spam pages are pretty worthless but they sure add up!
Observation: it doesn’t matter how many pages you have in your site, your average PR will always be 1.0 at best. But a hierarchical layout can strongly concentrate votes, and therefore the PR, into the home page!
This is a technique used by some disreputable sites (mostly adult content sites). But I can’t advise this – if Google’s robots decide you’re doing this there’s a good chance you’ll be banned from Google!
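To get a feel for how strongly this concentrates PR, here is a rough Python sketch. The exact layout is not shown above, so this assumes one possible arrangement: the home page's single link goes to a "Page A", Page A links to all 1,000 spam pages, and each spam page links back to the home page. All names are hypothetical.

```python
def pagerank(links, d=0.85, iterations=300):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) a fixed number of times."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        nxt = dict.fromkeys(pages, 1 - d)
        for source, targets in links.items():
            for t in targets:
                nxt[t] += d * pr[source] / len(targets)
        pr = nxt
    return pr


# Assumed layout: Home -> Page A -> 1,000 spam pages -> Home
spam = [f"Spam {i}" for i in range(1, 1001)]
links = {"Home": ["Page A"], "Page A": spam}
links.update({page: ["Home"] for page in spam})

pr = pagerank(links)
average_pr = sum(pr.values()) / len(pr)
print(round(pr["Home"]), round(pr["Spam 1"], 2), round(average_pr, 2))
```

The average across all 1,002 pages is still 1.0, but almost all of the total ends up on the home page and Page A, while each spam page keeps only a fraction of a point.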
On the other hand there are at least two right ways to do this:
1. Be a Mega-site
Mega-sites like http://news.bbc.co.uk have tens or hundreds of editors writing new content (i.e. new pages) all day long! Each one of those pages has rich, worthwhile content of its own and a link back to its parent or the home page! That’s why the home page of these sites has a Toolbar PR of 9/10…
If you have an online course with multiple pages, you could point them all to one page to increase its PageRank.
Principle: Content Is King! There really is no substitute for lots of good content…
2. Give away something useful and link back
Sites like phpBB or WordPress have a high PR but have no big money or marketing behind them. How can this be?
What they have done is write a very useful system that is very popular on many websites. And at the bottom of every page, in every installation, is a snippet of HTML:
Powered by <a href="http://www.phpbb.com/" target="_blank">phpBB</a>
The administrator of each installation can remove that link, but most don’t because they want to return the favor…
Can you imagine all those millions of pages giving a fraction of a vote? Wow!
Principle: Make it worthwhile for other people to use your content or tools. If your give-away is good enough, other site admins will gladly give you a link back.
Principle: Getting lots (perhaps thousands) of links from sites with small PR is just as good as getting a few links from high PR pages.
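A bit of arithmetic shows why this works. In the formula, each linking page T contributes d × PR(T)/C(T), so many small contributions can add up to the same total as a few large ones. The PR values and link counts below are invented purely for illustration, and `contribution` is my own helper name.

```python
def contribution(pr, out_links, d=0.85):
    """PR passed along one link from a page with the given PR and link count."""
    return d * pr / out_links


# Hypothetical: 1,000 small pages (PR 0.5, 10 outgoing links each)...
many_small = 1000 * contribution(pr=0.5, out_links=10)
# ...versus 5 strong pages (PR 100, 10 outgoing links each).
few_big = 5 * contribution(pr=100.0, out_links=10)

print(round(many_small, 2), round(few_big, 2))  # 42.5 42.5
```

The formula only cares about the total of the contributions, not about how many links they arrive through.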
Summary
- The average Actual PR of all pages in the index is 1.0
- If you add pages to a site you’re building, the site’s total PR goes up with each page you add (but only if you link the pages together so the equation can work), while the average stays the same.
- If you want to concentrate the PR into one, or a few, pages then hierarchical linking will do that. If you want to average out the PR amongst the pages then “fully meshing” the site (lots of evenly distributed links) will do that.
- Getting inbound links to your site is the only way to increase your site’s average PR. How that PR is distributed amongst the pages on your site depends on the details of your internal linking and which of your pages are linked to.
- Given that the average of every page is 1.0 we can see that for every site that has an actual ranking in the millions (and there are some!) there must be lots and lots of sites whose Actual PR is below 1.0.
PageRank is, in fact, very simple (apart from one scary looking formula). But when a simple calculation is applied hundreds (or billions) of times over, the results can seem complicated.
PageRank is also only part of the story about what results get displayed high up in a Google listing.
PageRank is still part of the listings story though, so it’s worth your while as a good blogger to make sure you understand it correctly.