With the way H is set up as of now in the Lab, each hyperlink is given equal opportunity of being followed. However, Google uses access logs to find actual surfer tendencies. For example, the webmaster of page 1 sees that users are 2 times more likely to go to page 2 than page 3. Then, instead of row 1 of H being [ 0 1/2 1/2 0 0 0], Google could make the row [0 2/3 1/3 0 0 0].
Google also uses the location of the hyperlinks on the page, the length of the anchor text, and the context similarity between the connected documents to change the probabilities in the rows of H For more on this subject, see the SVD method module here.
As mentioned previously, α controls the dependency on
the hyperlink structure of the web. Thus, as α tends toward 1, we rely less on
the "teleportation" and more on the hyperlink structure (and thus,
more on a person just clicking links).
In the Google matrix formula, E = eeT
(an nxn matrix of all ones) is used to multiply by 1/n in order to take into
account the possibility that a surfer may type an address into the URL bar at
any point while surfing. However, Google has a "personalization
vector" that can be used instead. That is, G = (S +
(1-α)(1/n)evT, where
0<vT<1. This makes it so that G is still primitive. With vT, teleportations are no longer evenly
distributed. When a surfer teleports to another page, he/she is more likely to
go to the page with the highest probability in vT. Therefore, a different vT implies a different PageRank.
For example, suppose you like to read about
news; then Google can bias your personalization vector so that entries
corresponding to news pages are large while other pages are nearly zero.
However, in making such a personalization vector, we are making the PageRanks
query-dependent and thus needing more calculations. Furthermore, while the
personalization vector sounds great in theory, it is impossible to create one
for each user. Some have been made based on groups of users, but a person's
searching wants change from day to day. There are some speculations that Google
uses vT to control
spamming like link
farms.
If you have a G-mail account, you can have
your search personalized by using this vector.
Purpose:
To explore the ways in which changing the variables of the Google matrix change
the PageRanks.
Materials: Matlab
Estimated Time: 15 minutes
Instructions:
1. We will be calling the function
rank_play.m by typing (do not type it at this point):
[pi,time,numiter]=rank_play(A,
pi0, alpha, epsilon, personal)
The variables in the brackets are the
outputs. The variable "pi" is (as stated before) the stationary PageRank
vector, "time" is the amount of time it took for the entire program
to run, and "numiter" is the number of iterations it took for pi(k)
to converge to pi.
The input variables are in the parentheses.
You may use the same "A" as Subject 1's or use your own (we will keep
this the same for the entire activity), "pi0" is beginning vector,
"alpha" is the scaling parameter, "epsilon" is the amount
of accuracy you wish to have (that is, you want pi(k) and pi to be within epsilon
decimals of accuracy, and "personal" is the personalization vector.
In the activities before, we have (for an nxn
matrix A) let:
pi0=(1/n)*ones(1,n) {ones(1,n) is a matrix of ones with
one row and n columns - i.e. a row vector of ones}
alpha = 0.85
epsilon = 1*10^-8
personal = ones(1,n) {or just eT}
2. Clear the Command Window. Enter A as Subject 1's (or your) matrix from Activity
00309-01. Enter the other input variables into Matlab's command window as
they are shown above (using n=5 since we have a 5x5 matrix)
3. Let pi0, epsilon, and personal be as they
are given above. We have shown that because G is a Markov chain, the starting pi (pi0) does not
affect the stationary PageRank vector (pi).
Try starting pi0 at different pages. For example, pi0=[1 0 0 0 0] (for a 5x5
matrix). Try changing pi0 to a few different eigenvectors (with a sum of 1).
What happens to the rankings when you change pi0?
4. Now, re-enter pi0 as given under
Instruction #3. Next, we will change the alpha variable. Try letting alpha get
closer and closer to 1 (alpha = 0.9; 0.95; 0.99; 0.9999). Try letting alpha get
closer and closer to 0 (alpha = 0.7; 0.6; 0.4; 0.2; 0.1; 0.0001).
What happens to the rankings as alpha tends to 1? As
it tends to 0?
5. Now re-enter alpha = 0.85. Next, let
epsilon be more and more accurate (that is, smaller and smaller exponents of
10). Then let epsilon be less and less accurate (that is, exponents getting
closer and closer to 0).
What happens as epsilon is more accurate? Less
accurate?
6. Re-enter epsilon=1*10^-8. Now change the
personalization vector in any way you so choose. Look at the pages your matrix
represents. For example, suppose Subject 1 wants to push the Internet Movie
Database page to the top. Then she would make the element of
"personal" corresponding to Internet Movie Database close to 1 and
the rest nearly 0. Remember that you do not want to make any of the elements of
the personal vector exactly 0 and the elements of the personal vector must sum
to 1. For example, to push site D (which was last) to the top, Subject 1 could
let personal = [1/24 1/24 1/24 11/12 1/24].
What happens when you change the "personal"
vector?
You have now completed the training session
of our Lab 00309! We would now like for you to take a comprehensive test on
what you have learned. Once you are finished, compare your answers to the
questions in the welcoming letter to those you have now given. Upon completion,
please mail all parts of Lab 00309 to CRL, Inc. Be sure to include your subject
identification number.
This material is based upon work supported by National Science Foundation under Grant No. 0546622.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the
[Wikiracing]
[PageRank]
[Mathematics]
[Back To Top]
author and do not necessarily reflect the views of the National Science Foundation.