# Truncated SVD and its Applications

## What is a truncated SVD?

On the previous page, we learned that singular value decomposition breaks any matrix A down so that A = U*S*V'. Let's take a closer look at the matrix S.

Remember S is a matrix of the block form

```
S = | D  0 |
    | 0  0 |
```

where D is a diagonal matrix containing the singular values. D, as one might guess, looks like this:

```
D = diag(σ1, σ2, ..., σr),   σ1 >= σ2 >= ... >= σr > 0
```

where σ1, ..., σr are the singular values of the matrix A with rank r. A full rank decomposition of A is usually written A = U*S*V'.

We can find a reduced-rank approximation (or truncated SVD) Ak of A by setting all but the k largest singular values equal to zero and keeping only the first k columns of U and V, so that Ak = Uk*Sk*Vk'.
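The activities below use MATLAB, but the truncation itself can be sketched in NumPy as well; the matrix A here is made-up example data, used only to illustrate the construction:

```python
import numpy as np

# A small made-up example matrix (illustration only).
A = np.array([[4., 0., 2.],
              [1., 3., 0.],
              [0., 1., 5.],
              [2., 2., 1.]])

# Economy-size SVD: A = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # keep only the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # the rank-k truncated SVD

print(np.linalg.matrix_rank(A_k))  # the approximation has rank k
```

Note that A_k has the same shape as A; only its rank (and hence the storage needed for its factors) is reduced.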

## The Optimality Property

Why are we using this particular decomposition for our rank k approximations? According to the optimality property, singular value decomposition gives the closest rank k approximation of a matrix. What do I mean by close?

A theorem proven by Eckart and Young shows that the error in approximating a matrix A by Ak can be written:

`||A - Ak||_F <= ||A - B||_F`

where B is any matrix with rank k. The notation on the left means that we are taking the Frobenius norm of the matrix A - Ak. You may remember finding the length or magnitude of a vector as a way of measuring its size; the Frobenius norm does much the same thing for an entire matrix (it is the square root of the sum of the squares of all the entries). So this formula tells us that the difference between A and Ak is no larger than the difference between A and any other rank k matrix B. We cannot find or even make up another matrix of rank k that will be closer to A.
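This optimality claim is easy to check numerically. In the NumPy sketch below (random example data), the Frobenius error of the truncated SVD equals the root-sum-of-squares of the discarded singular values, and an arbitrary rank-k competitor B never does better:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))  # random example matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The truncation error equals sqrt(s_{k+1}^2 + ... + s_r^2).
err_svd = np.linalg.norm(A - A_k, 'fro')

# One arbitrary rank-k competitor: a random product of 6x2 and 2x5 factors.
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
err_B = np.linalg.norm(A - B, 'fro')

print(err_svd, err_B)  # err_svd is never larger than err_B
```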

## Choosing a k

So what k should we choose? If we choose a higher k, we get a closer approximation to A. On the other hand, choosing a smaller k will save us more work. Basically, we must choose between time and accuracy. Which do you think is more important? How can we find a good balance between sacrificing time and sacrificing data?

### Activity 4

One trick is to look at and compare all the singular values of your matrix. Create a random matrix A in MATLAB using the `randintr` command we used earlier. Now calculate only the singular values of your matrix with the command
`svd(A)`.
Now plot these values by entering
`plot(svd(A))`.
Note how the singular values decrease. When we multiply U, S, and V back together, the larger singular values contribute more to the magnitude of A than the smaller ones. Are any of the singular values of your matrix A significantly larger than the others?
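If MATLAB is not at hand, the same experiment can be sketched in NumPy; passing `compute_uv=False` plays the role of `svd(A)` returning only the singular values. The random integer matrix below is just a stand-in for the one `randintr` would produce:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for a random integer matrix from randintr.
A = rng.integers(-9, 10, size=(10, 10)).astype(float)

# Only the singular values, analogous to MATLAB's svd(A).
sv = np.linalg.svd(A, compute_uv=False)

print(sv)  # returned nonnegative, in decreasing order
# plt.plot(sv) with matplotlib would reproduce plot(svd(A))
```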

Now create the following matrix N in MATLAB:

*[matrix N: image not reproduced]*

Again calculate the singular values (rather than the whole decomposition) and plot the values. Are any of the singular values for this matrix significantly higher than the others? What do you think we should choose as our k for this matrix?
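Since the matrix N itself is not reproduced here, the contrast the activity is after can be illustrated with any matrix that is nearly low rank. The hypothetical example below (not the N from the activity) is two large rank-one pieces plus small noise, so its first two singular values stand well clear of the rest:

```python
import numpy as np

rng = np.random.default_rng(2)

# A made-up nearly-rank-2 matrix: two strong rank-one pieces plus small noise.
N = (10 * np.outer(rng.standard_normal(8), rng.standard_normal(8))
     + 8 * np.outer(rng.standard_normal(8), rng.standard_normal(8))
     + 0.1 * rng.standard_normal((8, 8)))

sv = np.linalg.svd(N, compute_uv=False)

# The gap after the second singular value suggests choosing k = 2.
print(sv[:2], sv[2:])
```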

## Comparing Rank-k Approximations

We can also look at the approximations themselves to help us decide which k to use. Sometimes, we can get a good idea of how close an approximation is by looking at the matrices themselves, but this would only be practical for very small matrices. In most cases, it helps to use a graph to represent the matrix.

### Activity 5

- `cityplot.m`
- `displaySVD2.m`

Do you still have your term-by-employee matrix stored? If not, re-enter it into MATLAB and store it as matrix A (or another variable you prefer).

Drag the files into the current directory and enter the command
`displaySVD2(A)`

A window should pop up showing two "city plot" graphs: one of the rank 1 approximation of your matrix and one of the matrix itself.

*[example city plot graphs: image not reproduced]*

To find the value of each entry, match the colors of the tops of each "building" with the values on the color bar to the right of each graph.

Look at each graph from different angles by clicking on the "rotate" icon at the top of the window, then clicking and dragging the graph around.

When you're done looking at these two graphs, hit the space bar and the rank 2 approximation will appear. Continue to examine and compare the graphs until you have seen the approximations for every k up to the rank of your matrix.

If you had to choose a k based on these graphs, which would you choose? Why? What do your co-workers think?

Hopefully this exercise gave you a better idea of what the rank k approximations look like as k gets closer and closer to r, the rank of the original matrix.

SVD has been useful in many other fields outside of information retrieval. One example is in digital imaging. It turns out that when computers store images, they store them as giant matrices! Each numerical entry of a matrix represents a color...sort of like a giant paint-by-number. How could SVD be useful here?

### Activity 6

Make sure the ATLAST commands are in your current directory again, then enter the commands:
```
load gatlin2
image(X), colormap(map)
```
This loads a 176x260 matrix X representing the image you see after entering the second command.

Now enter the ATLAST command `svdimage(X,1,map)` and a new window containing two images should appear. The one on the left is the original image, and the one on the right is a rank 1 approximation of that image. Hit any key and the rank increases by one.

What would be a good k for your image? How much storage space would the computer save by storing this approximation in decomposed form (m*n numbers for the original matrix vs. m*k + k + n*k numbers for the factors of the rank k approximation)?
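To put numbers on this, here is the count for the 176x260 gatlin2 image under the storage convention stated above, for a few illustrative choices of k:

```python
# Storage comparison for a 176 x 260 image matrix.
m, n = 176, 260

def dense_count(m, n):
    # Full matrix: one number per entry.
    return m * n

def truncated_count(m, n, k):
    # U_k (m*k numbers) + k singular values + V_k (n*k numbers).
    return m * k + k + n * k

for k in (10, 20, 40):
    print(k, dense_count(m, n), truncated_count(m, n, k))
```

For example, at k = 20 the factors take 8,740 numbers versus 45,760 for the full matrix, roughly a five-fold saving.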

For completely dense matrices, like images, we can obviously save a lot on storage. Would we save a lot on storage for a sparser matrix like our term-by-document matrices?

Computers store sparse matrices in a different way; they only store the non-zero entries of the matrix. Storage savings is probably not going to be our motivation for using rank-k approximations. So why do we use it?

## Calculating a Query with SVD

As you may recall, the equation for calculating the cosine of the angle between a query vector q and the jth document vector (the jth column A*ej of the term-by-document matrix A) is:

`cos(theta_j) = (A*ej)'*q / ( ||A*ej||_2 * ||q||_2 )`
Replacing A by its rank k approximation Ak, we can rewrite the formula:

`cos(theta_j) = (Ak*ej)'*q / ( ||Ak*ej||_2 * ||q||_2 )`

where the vector Ak*ej is simply the jth column of the rank-k matrix Ak. Now if we write Ak in terms of its singular value decomposition, Ak = Uk*Sk*Vk', we have:

`cos(theta_j) = (Sk*Vk'*ej)' * (Uk'*q) / ( ||Sk*Vk'*ej||_2 * ||q||_2 )`

(Uk drops out of the norm in the denominator because its columns are orthonormal.)

If we let sj = Sk*Vk'*ej, then we can simplify the formula to:

`cos(theta_j) = sj'*(Uk'*q) / ( ||sj||_2 * ||q||_2 )`

for j = 1, 2, 3, ..., n. The vectors sj and the norms ||sj||_2 only need to be computed once; they can then be stored and retrieved when a query is being processed. So all that has to be computed at query time is ||q||_2 and Uk'*q (Berry & Brown, 1999). This is still a bit of an expensive calculation, but not nearly as expensive as processing a query without SVD.
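This precompute-then-query pattern can be sketched in NumPy on made-up data (the matrix sizes and variable names below are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((8, 5))  # hypothetical term-by-document matrix (8 terms, 5 docs)
q = rng.random(8)       # hypothetical query vector
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Offline: precompute the vectors s_j = Sk*Vk'*ej (columns of Sk @ Vtk)
# and their 2-norms, once for the whole collection.
S_cols = Sk @ Vtk                       # k x n; column j is s_j
s_norms = np.linalg.norm(S_cols, axis=0)

# Query time: only Uk'*q and ||q||_2 remain to be computed.
Ukq = Uk.T @ q
cos = (S_cols.T @ Ukq) / (s_norms * np.linalg.norm(q))

# Sanity check against the direct formula using Ak's columns.
Ak = Uk @ Sk @ Vtk
direct = (Ak.T @ q) / (np.linalg.norm(Ak, axis=0) * np.linalg.norm(q))
print(np.allclose(cos, direct))
```

The two formulas agree because Uk has orthonormal columns, so multiplying by Uk changes neither the inner product with q (after the transpose is moved across) nor the column norms.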

## Where Else is This Being Used?

Not many search engines still use the vector space model and SVD to process queries. Is Company Enterprises the only company still taking advantage of this method? Not entirely. Some companies still use Latent Semantic Indexing (LSI), a method built on SVD. According to Langville, 2005,

> the LSI method manipulates the matrix to eradicate dependencies and thus consider only the independent, smaller part of this large term-by-document matrix. In particular, the mathematical tool used to achieve the reduction is the truncated singular value decomposition (SVD) of the matrix.
Here's an example of how a company is using LSI to its advantage.

Congratulations! You have now learned all you need to know about the vector space model and singular value decomposition and so have completed your training for Company Enterprises. Continue on to the next page for your first assignment as an official employee of Company Enterprises.