Playing with Google

The Matrix H

With the way H is set up as of now in the Lab, each hyperlink is given equal opportunity of being followed. However, Google uses access logs to find actual surfer tendencies. For example, the webmaster of page 1 sees that users are 2 times more likely to go to page 2 than page 3. Then, instead of row 1 of H being [ 0 1/2 1/2 0 0 0], Google could make the row [0 2/3 1/3 0 0 0].

Google also uses the location of the hyperlinks on the page, the length of the anchor text, and the context similarity between the connected documents to change the probabilities in the rows of H For more on this subject, see the SVD method module here.

The Scaling Parameter α

As mentioned previously, α controls the dependency on the hyperlink structure of the web. Thus, as α tends toward 1, we rely less on the "teleportation" and more on the hyperlink structure (and thus, more on a person just clicking links).

The "Teleportation" Matrix E

In the Google matrix formula, E = eeT (an nxn matrix of all ones) is used to multiply by 1/n in order to take into account the possibility that a surfer may type an address into the URL bar at any point while surfing. However, Google has a "personalization vector" that can be used instead. That is, G = (S + (1-α)(1/n)evT, where 0<vT<1. This makes it so that G is still primitive. With vT, teleportations are no longer evenly distributed. When a surfer teleports to another page, he/she is more likely to go to the page with the highest probability in vT. Therefore, a different vT implies a different PageRank.

For example, suppose you like to read about news; then Google can bias your personalization vector so that entries corresponding to news pages are large while other pages are nearly zero. However, in making such a personalization vector, we are making the PageRanks query-dependent and thus needing more calculations. Furthermore, while the personalization vector sounds great in theory, it is impossible to create one for each user. Some have been made based on groups of users, but a person's searching wants change from day to day. There are some speculations that Google uses vT to control spamming like link farms.

If you have a G-mail account, you can have your search personalized by using this vector.

Activity 00309-07: Changing Input

Purpose: To explore the ways in which changing the variables of the Google matrix change the PageRanks.

Materials: Matlab

Estimated Time: 15 minutes

Instructions:

1. We will be calling the function rank_play.m by typing (do not type it at this point):
[pi,time,numiter]=rank_play(A, pi0, alpha, epsilon, personal)

The variables in the brackets are the outputs. The variable "pi" is (as stated before) the stationary PageRank vector, "time" is the amount of time it took for the entire program to run, and "numiter" is the number of iterations it took for pi(k) to converge to pi.

The input variables are in the parentheses. You may use the same "A" as Subject 1's or use your own (we will keep this the same for the entire activity), "pi0" is beginning vector, "alpha" is the scaling parameter, "epsilon" is the amount of accuracy you wish to have (that is, you want pi(k) and pi to be within epsilon decimals of accuracy, and "personal" is the personalization vector.

In the activities before, we have (for an nxn matrix A) let:
pi0=(1/n)*ones(1,n)       {ones(1,n) is a matrix of ones with one row and n columns - i.e. a row vector of ones}
alpha = 0.85
epsilon = 1*10^-8
personal = ones(1,n)       {or just e
T}

2. Clear the Command Window. Enter A as Subject 1's (or your) matrix from Activity 00309-01. Enter the other input variables into Matlab's command window as they are shown above (using n=5 since we have a 5x5 matrix)

3. Let pi0, epsilon, and personal be as they are given above. We have shown that because G is a Markov chain, the starting pi (pi0) does not affect the stationary PageRank vector (pi).
Try starting pi0 at different pages. For example, pi0=[1 0 0 0 0] (for a 5x5 matrix). Try changing pi0 to a few different eigenvectors (with a sum of 1).

Question 1:

What happens to the rankings when you change pi0?
 

4. Now, re-enter pi0 as given under Instruction #3. Next, we will change the alpha variable. Try letting alpha get closer and closer to 1 (alpha = 0.9; 0.95; 0.99; 0.9999). Try letting alpha get closer and closer to 0 (alpha = 0.7; 0.6; 0.4; 0.2; 0.1; 0.0001).

Question 2:

What happens to the rankings as alpha tends to 1? As it tends to 0?
 

5. Now re-enter alpha = 0.85. Next, let epsilon be more and more accurate (that is, smaller and smaller exponents of 10). Then let epsilon be less and less accurate (that is, exponents getting closer and closer to 0).

Question 3:

What happens as epsilon is more accurate? Less accurate?
 

6. Re-enter epsilon=1*10^-8. Now change the personalization vector in any way you so choose. Look at the pages your matrix represents. For example, suppose Subject 1 wants to push the Internet Movie Database page to the top. Then she would make the element of "personal" corresponding to Internet Movie Database close to 1 and the rest nearly 0. Remember that you do not want to make any of the elements of the personal vector exactly 0 and the elements of the personal vector must sum to 1. For example, to push site D (which was last) to the top, Subject 1 could let personal = [1/24 1/24 1/24 11/12 1/24].

Question 4:

What happens when you change the "personal" vector?
 

 

You have now completed the training session of our Lab 00309! We would now like for you to take a comprehensive test on what you have learned. Once you are finished, compare your answers to the questions in the welcoming letter to those you have now given. Upon completion, please mail all parts of Lab 00309 to CRL, Inc. Be sure to include your subject identification number.

[Letter] [Activity 00309-01] [Internet History]
[Wikiracing] [PageRank] [Mathematics] [Back To Top]

[Glossary] [Concept Answers] [Alphabetical Glossary] [Contact Author]

This material is based upon work supported by National Science Foundation under Grant No. 0546622.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the
author and do not necessarily reflect the views of the National Science Foundation.