# Alphabetical Glossary

Aperiodic

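Not periodic: the possible return times to a state share no common divisor greater than one, so the chain is not locked into a fixed-length cycle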

Bibliometrics

The study of citations to and from papers.

"Academics build their papers on a carefully constructed foundation of citation: Each paper reaches a conclusion by citing previously published papers as proof points that advance the author's argument. Papers are judged not only on their original thinking, but also on the number of papers they cite, the number of papers that subsequently cite them back, and the perceived importance of each citation."
www.wired.com

Convergence

"The property or manner of approaching a limit, such as a point, line, function, or value"

"A series is said to converge if it approaches some limit. Formally, an infinite series is convergent if the sequence of partial sums is convergent."
www.mathworld.com
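
As an illustration of the partial-sum definition above, the sketch below accumulates the partial sums of the geometric series 1/2 + 1/4 + 1/8 + ..., whose limit is 1. The helper name `partial_sums` and the choice of series are assumptions made for this example, not part of the quoted source.

```python
def partial_sums(terms):
    """Return the running totals s_1, s_2, ... of an iterable of terms."""
    total = 0.0
    sums = []
    for t in terms:
        total += t
        sums.append(total)
    return sums

# First 30 terms of 1/2 + 1/4 + 1/8 + ...
sums = partial_sums(0.5 ** k for k in range(1, 31))

print(sums[:4])           # [0.5, 0.75, 0.875, 0.9375]
print(abs(1 - sums[-1]))  # distance to the limit after 30 terms
```

The gap to the limit halves with each added term, which is what "approaches some limit" means for this series.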

Crawler

"Computer robots are simply programs that automate repetitive tasks at speeds impossible for humans to reproduce. The term bot on the internet is usually used to describe anything that interfaces with the user or that collects data. Search engines use "spiders" which search (or spider) the web for information. They are software programs that request pages much like regular browsers do. In addition to reading the contents of pages for indexing, spiders also record links."
www.search-marketing.info

Dangling Nodes

Graphically, they are nodes with no out-links. In a matrix, they are the rows with row sums of 0.
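
A minimal sketch of the matrix view: scan the rows of a link matrix and report those whose sum is 0. The 4-page matrix `links` and the function name `dangling_nodes` are hypothetical, chosen only for illustration.

```python
def dangling_nodes(adjacency):
    """Return the indices of rows whose sum is 0 (nodes with no out-links)."""
    return [i for i, row in enumerate(adjacency) if sum(row) == 0]

# Hypothetical 4-page web: entry [i][j] is 1 when page i links to page j.
links = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],  # page 2 has no out-links
    [1, 0, 1, 0],
]

print(dangling_nodes(links))  # [2]
```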

Domain

"A group of networked computers that share a common communications address."

Domain Name

"A series of alphanumeric strings separated by periods, such as www.hmco.com, that is an address of a computer network connection and that identifies the owner of the address."

Eigenvector

"Eigenvectors are a special set of vectors associated with a linear system of equations (i.e., a matrix equation) that are sometimes also known as characteristic vectors, proper vectors, or latent vectors."

A right eigenvector is a column vector, v, satisfying Av = λv.
A left eigenvector is a row vector, vT, satisfying vTA = λvT.

www.mathworld.com
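
The two defining equations, Av = λv for a right eigenvector and vTA = λvT for a left eigenvector, can be checked directly. The sketch below does so for a small row-stochastic matrix with λ = 1. The matrix and the helper names `mat_vec` and `vec_mat` are assumptions made for this example; exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

def mat_vec(A, v):
    """Right product A v, treating v as a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def vec_mat(v, A):
    """Left product v^T A, treating v as a row vector."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

# Hypothetical 2x2 row-stochastic matrix.
A = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]

right = [F(1), F(1)]        # candidate right eigenvector for lambda = 1
left = [F(1, 3), F(2, 3)]   # candidate left eigenvector for lambda = 1

print(mat_vec(A, right) == right)  # True: A v = 1 * v
print(vec_mat(left, A) == left)    # True: v^T A = 1 * v^T
```

For a matrix that is not symmetric, as here, the left and right eigenvectors for the same eigenvalue generally differ.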

Host

A computer containing data or programs that another computer can access by means of a network or modem.

Internet

"A worldwide system of interconnected computer networks...Today, the Internet connects millions of computers around the world in a nonhierarchical manner unprecedented in the history of communications. The Internet is a product of the convergence of media, computers, and telecommunications. It is not merely a technological development but the product of social and political processes, involving both the academic world and the government (the Department of Defense). From its origins in a non-industrial, non-corporate environment and in a purely scientific culture, it has quickly diffused into the world of commerce."

Irreducible

Every node is reachable (either directly or through a series of nodes) from every other node.
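
One way to test this property on a small link graph is a breadth-first search from every node. This is a sketch only: the function names and the two 3-node graphs are hypothetical, and a start node is counted as reachable from itself.

```python
from collections import deque

def reachable_from(adjacency, start):
    """Breadth-first search: the set of nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, linked in enumerate(adjacency[i]):
            if linked and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def is_irreducible(adjacency):
    """True when every node can reach every other node."""
    n = len(adjacency)
    return all(len(reachable_from(adjacency, s)) == n for s in range(n))

cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]      # 0 -> 1 -> 2 -> 0
with_sink = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # node 2 is a dead end

print(is_irreducible(cycle))      # True
print(is_irreducible(with_sink))  # False
```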

Linear Stationary Process

"A linear function f(x) is one which satisfies the following two properties:
1. Additive property: f(x+y) = f(x) + f(y)
2. Homogeneity property: f(ax) = af(x), where a is a constant."

www.wikipedia.org
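
The two properties can be spot-checked numerically. The sketch below tests them on a handful of integer samples; the name `looks_linear` and the sample values are assumptions for illustration. A finite check like this can show that a function is not linear, but it cannot prove that one is.

```python
def looks_linear(f, samples, scalars):
    """Check additivity and homogeneity of f on the given sample points."""
    additive = all(f(x + y) == f(x) + f(y) for x in samples for y in samples)
    homogeneous = all(f(a * x) == a * f(x) for a in scalars for x in samples)
    return additive and homogeneous

samples = [-2, 0, 1, 5]
scalars = [-1, 0, 3]

print(looks_linear(lambda x: 3 * x, samples, scalars))      # True
print(looks_linear(lambda x: 3 * x + 1, samples, scalars))  # False
```

The second function fails because of its constant term: f(0 + 0) = 1, while f(0) + f(0) = 2.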

In general, linear equations are far easier to solve than non-linear equations.

"A stationary process (or strict(ly) stationary process) is a stochastic process whose probability distribution at a fixed time or position is the same for all times or positions. As a result, parameters such as the mean and variance, if they exist, also do not change over time."

"As an example, white noise is stationary. However, the sound of a cymbal crashing is not stationary because the acoustic power of the crash (and hence its variance) diminishes over time."
wikipedia.org

Link Analysis Models

"Models exploiting the Web's hyperlink structure."
Taken from the book, Google's PageRank and Beyond by Amy Langville and Carl Meyer

Markov Chain

Power Point on Markov Chains

Formal definition: www.mathworld.com

Primitive

Irreducible and aperiodic
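
An equivalent test, for a nonnegative square matrix, is that some power of the matrix has every entry positive; by Wielandt's bound it is enough to look at powers up to (n-1)^2 + 1. The sketch below applies that test to the zero/nonzero pattern of a matrix. The function names and the two example matrices are hypothetical.

```python
def bool_multiply(A, B):
    """Boolean matrix product: entry (i, j) is True if some k links i -> k -> j."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_primitive(adjacency):
    """True when some power A^k, 1 <= k <= (n-1)^2 + 1, has no zero entry."""
    n = len(adjacency)
    A = [[bool(x) for x in row] for row in adjacency]
    power = A
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        power = bool_multiply(power, A)
    return False

two_cycle = [[0, 1], [1, 0]]       # irreducible but periodic (period 2)
with_self_loop = [[1, 1], [1, 0]]  # the self-loop breaks the periodicity

print(is_primitive(two_cycle))       # False
print(is_primitive(with_self_loop))  # True
```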

Query-Independence

"A ranking is called query-independent if the popularity score for each page is determined off-line, and remains constant (until the next update) regardless of the query."
Taken from the book, Google's PageRank and Beyond by Amy Langville and Carl Meyer

Rank-one Update

The rank of a matrix is the number of linearly independent rows (or columns) in the matrix.
www.mathworld.com

The matrix (1/n)aeT has rank one whenever at least one dangling node exists: each dangling-node row has 1/n in every entry, and every other row is all zeros. Every row is therefore a scalar multiple (1 or 0) of the single row (1/n)eT, so there is only one linearly independent row. Hence, the rank is one.
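
The claim can be checked on a small case. The sketch below builds (1/n)aeT for a hypothetical 4-node graph in which nodes 2 and 3 are dangling, then computes the rank by Gaussian elimination over exact fractions. It assumes a is the 0/1 vector marking dangling nodes and e is the vector of all ones; the helper names `outer` and `rank` are made up for this example.

```python
from fractions import Fraction

def outer(a, e):
    """Outer product a e^T as a list of rows."""
    return [[x * y for y in e] for x in a]

def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

n = 4
a = [0, 0, 1, 1]  # assumption: nodes 2 and 3 are dangling
e = [1] * n
update = [[Fraction(x, n) for x in row] for row in outer(a, e)]

print(rank(update))  # 1
```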

Row-stochastic Matrix

A matrix with nonnegative entries in which every row sums to one, so that each entry can be read as a probability.

A row-stochastic matrix can be obtained from a nonnegative matrix by dividing each row by its row sum, provided no row sum is zero.

For example, suppose the first row of a matrix is: [2 5 1 0 8].
Then the row-sum is 16, so the row-stochastic matrix would have its first row as: [2/16 5/16 1/16 0/16 8/16]
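
The same normalization in code, using the example row above. The function name `normalize_rows` is hypothetical, and leaving an all-zero row unchanged is an assumption of this sketch, not something the definition prescribes.

```python
from fractions import Fraction

def normalize_rows(matrix):
    """Divide each row by its row sum; all-zero rows are left as zeros."""
    result = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            result.append([Fraction(0)] * len(row))
        else:
            result.append([Fraction(x, total) for x in row])
    return result

first_row = normalize_rows([[2, 5, 1, 0, 8]])[0]

print([str(x) for x in first_row])  # ['1/8', '5/16', '1/16', '0', '1/2']
print(sum(first_row))               # 1
```

The fractions print in lowest terms, so 2/16 appears as 1/8 and 8/16 as 1/2.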

Spamming

Unsolicited e-mail, often of a commercial nature, sent indiscriminately to multiple mailing lists, individuals, or newsgroups; junk e-mail.

In the case of web pages, this could mean white-texting: hiding words in the page (for example, white text on a white background) so that a context crawler "reads" those words even though the page may have nothing to do with the hidden subject.

Transition Probability Matrix

A matrix whose entry in row i and column j describes the move from state (or node) i to state (or node) j

In Activity 00309-01, we called this matrix the voting matrix A

A transition probability matrix is a row-stochastic (or column-stochastic) transition matrix
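
A minimal sketch of how a row-stochastic transition probability matrix is used: multiplying a probability distribution (as a row vector) by the matrix gives the distribution one step later. The 3-state matrix `P` and the function name `step` are hypothetical.

```python
def step(distribution, P):
    """One step of the chain: the row vector distribution times P."""
    n = len(P)
    return [sum(distribution[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical row-stochastic matrix: P[i][j] is the probability of moving i -> j.
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]

start = [1.0, 0.0, 0.0]         # begin in state 0 with certainty
after_one = step(start, P)
after_two = step(after_one, P)

print(after_one)  # [0.0, 0.5, 0.5]
print(after_two)  # [0.75, 0.25, 0.0]
```

Because each row of `P` sums to one, every output is again a probability distribution.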

Web Page

"A document on the World Wide Web, consisting of an HTML file and any related files for scripts and graphics, and often hyperlinked to other documents on the Web."

Web Site

"A set of interconnected web pages, usually including a homepage, generally located on the same server, and prepared and maintained as a collection of information by a person, group, or organization."

World Wide Web

"The complete set of documents residing on all Internet servers that use the HTTP protocol, accessible to users via a simple point-and-click system...A part of the Internet that contains linked text, image, sound, and video documents. Before the World Wide Web (WWW), information retrieval on the Internet was text-based and required that users know basic UNIX commands. The World Wide Web has gained popularity largely because of its ease of use (point-and-click graphical interface) and multimedia capabilities, as well as its convenient access to other types of Internet services (such as e-mail, Telnet, and Usenet)."