
For a long time (especially during the Netflix contest), I have kept coming across blog posts and leaderboard forum threads where people mention that applying a simple SVD step to their data helped reduce sparsity or, more generally, improved the performance of their algorithm. I have thought about this for a long time, but I cannot work out why that is. The data I get is usually very noisy (which is also the fun part of big data), and I do know some basic feature scaling techniques such as log transformation and mean normalization. But how does something like SVD help? Let's say I have a huge matrix of users' movie ratings, and on this matrix I implement some version of a recommendation system (say, collaborative filtering):

1) Without SVD

2) With SVD

How does it help? Thanks.

by (33.1k points)

Singular value decomposition (SVD) is a matrix decomposition method. It is not used to normalize the data but to get rid of redundant data; in other words, it is used for dimensionality reduction.

For example, suppose your dataset has two features, a humidity index and the probability of rain, and the two are strongly correlated. If the second one gives no additional information that is useful for a classification or regression task, SVD captures both in essentially one direction, and the leftover direction can be dropped. The singular values in SVD tell you which directions (linear combinations of the variables) are most informative and which ones you do not need.
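To make the humidity/rain example concrete, here is a minimal sketch using NumPy. The data is synthetic and the function name `singular_value_shares` is made up for illustration; the assumption is that rain probability is almost a scaled copy of humidity plus a little noise.

```python
import numpy as np

def singular_value_shares(X):
    """Share of the total squared singular values carried by each direction.

    X is centered column-wise first, so the shares correspond to the
    fraction of variance along each singular direction.
    """
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    return s**2 / np.sum(s**2)

# Synthetic, made-up data: two strongly correlated features.
rng = np.random.default_rng(0)
humidity = rng.uniform(0.0, 1.0, size=200)
rain_prob = 0.9 * humidity + rng.normal(0.0, 0.01, size=200)
X = np.column_stack([humidity, rain_prob])

shares = singular_value_shares(X)
print(shares)  # the first share dominates; the second is close to zero
```

With features this redundant, nearly all of the variance sits in the first direction, which is why the second one can be discarded with little loss.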

The way it works is simple. You perform SVD on your training data (a matrix), which gives you the factors U, S and V. Then you set all values of S below some chosen threshold (e.g. 0.1) to zero, which gives a new matrix S'. The corresponding components are now zero and can be removed, sometimes without any performance penalty. Keeping only the k largest singular values in this way is called k-truncated SVD.
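The truncation step described above can be sketched with NumPy as follows. This is only an illustration: the helper name `truncated_svd` and the small ratings-like matrix are made up, and here the cut is made by keeping the top k singular values rather than by a fixed threshold.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k approximation of A plus the original and truncated singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_k = s.copy()
    s_k[k:] = 0.0              # zero out everything except the k largest values (this is S')
    A_k = (U * s_k) @ Vt       # rebuild the matrix as U @ diag(S') @ V^T
    return A_k, s, s_k

# Made-up example: a 6x5 "ratings" matrix that is rank 1 plus a little noise.
rng = np.random.default_rng(0)
user_taste = rng.uniform(1.0, 5.0, size=(6, 1))
movie_appeal = rng.uniform(0.5, 1.0, size=(1, 5))
A = user_taste @ movie_appeal + rng.normal(0.0, 0.05, size=(6, 5))

k = 1
A_k, s, s_k = truncated_svd(A, k)
error = np.linalg.norm(A - A_k)  # Frobenius norm of what was thrown away
print(s)
print(error)
```

The reconstruction error equals the square root of the sum of the squared discarded singular values, so small discarded values mean little information is lost.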

Note that SVD does not always help with sparsity; it only helps when features are redundant. Two features can both be sparse and still informative (relevant) for a prediction task, so you shouldn't remove either one.

Using SVD, you can go from n features to k features, where each new feature is a linear combination of the original n. It is a dimensionality reduction step, and strictly speaking a form of feature extraction rather than feature selection, since it builds new features instead of picking a subset of the original ones. When some redundant features are present, a feature selection algorithm may lead to better classification performance than SVD, depending on your data set.
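Going from n features to k features can be sketched like this with NumPy. The helper `reduce_features` and the random data are hypothetical and only meant to show the shapes involved; the data is centered first, which is a common but not mandatory choice.

```python
import numpy as np

def reduce_features(X, k):
    """Project X (samples x n features) onto its top-k right singular vectors."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # k rows, each a weight vector over the n features
    Z = Xc @ components.T                        # each new feature is a linear combination of the old ones
    return Z, components

# Made-up data: 50 samples with n = 6 features, reduced to k = 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))
Z, components = reduce_features(X, k=2)
print(Z.shape)           # (50, 2)
print(components.shape)  # (2, 6)
```

Each row of `components` holds the weights that mix the original n features into one new feature, which is what distinguishes this from simply selecting a subset of the original columns.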