in Machine Learning by (19k points)

I'm doing an assignment in which I am trying to build a collaborative filtering model for the Netflix Prize data. The data is in a CSV file, which I easily imported into a data frame. Now I need to create a sparse matrix with users as the rows and movies as the columns, where each cell holds the corresponding rating. To fill in the values I currently loop over every row of the data frame, which takes a long time in R. Can anyone suggest a better approach? Here is the sample code and data:

buildUserMovieMatrix <- function(trainingData)
{
  UIMatrix <- Matrix(0, nrow = max(trainingData$UserID), ncol = max(trainingData$MovieID), sparse = T);
  for(i in 1:nrow(trainingData))
  {
    UIMatrix[trainingData$UserID[i], trainingData$MovieID[i]] = trainingData$Rating[i];
  }
  return(UIMatrix);
}

Sample of data in the dataframe from which the sparse matrix is being created:

    MovieID UserID  Rating
1       1      2       3
2       2      3       3
3       2      4       4
4       2      6       3
5       2      7       3

In the end I want something like this, where the columns are the movie IDs and the rows are the user IDs:

    1   2   3   4   5   6   7
1   0   0   0   0   0   0   0
2   3   0   0   0   0   0   0
3   0   3   0   0   0   0   0
4   0   4   0   0   0   0   0
5   0   0   0   0   0   0   0
6   0   3   0   0   0   0   0
7   0   3   0   0   0   0   0

The interpretation is: user 2 rated movie 1 with 3 stars, user 3 rated movie 2 with 3 stars, and so on for the other users and movies. There are about 8,500,000 rows in my data frame, and my code takes about 30-45 minutes to create this user-item matrix. I would appreciate any suggestions.

1 Answer

by (33.1k points)

The Matrix package has a constructor, sparseMatrix(), made especially for this type of data. You pass it the row indices, the column indices and the values, and it builds the whole matrix in one call with no loop:

library(Matrix)

UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating)
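
As a quick check, here is a minimal sketch that applies the constructor to the five sample rows from the question. The dims argument is an addition that is not in the answer above: it pins the shape to 7 x 7 to match the desired output, because without it sparseMatrix() sizes the matrix from the largest indices present (7 x 2 for this sample).

```r
library(Matrix)

# The five sample rows from the question.
trainingData <- data.frame(
  MovieID = c(1, 2, 2, 2, 2),
  UserID  = c(2, 3, 4, 6, 7),
  Rating  = c(3, 3, 4, 3, 3)
)

# dims is optional; it is set here only to reproduce the 7 x 7 layout
# shown in the question (users as rows, movies as columns).
UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating,
                         dims = c(7, 7))

print(UIMatrix)   # unrated cells are printed as "."
UIMatrix[4, 2]    # rating that user 4 gave to movie 2
```

One thing to be aware of: if the same (UserID, MovieID) pair appears more than once, sparseMatrix() adds the values together by default instead of overwriting, so it is worth checking the data frame for duplicate pairs first.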

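Alternatively, you can keep your function and replace the loop with matrix indexing, which assigns all the cells at once from a two-column matrix of (row, column) positions: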

buildUserMovieMatrix <- function(trainingData) {
  UIMatrix <- Matrix(0, nrow = max(trainingData$UserID),
                     ncol = max(trainingData$MovieID), sparse = TRUE);
  UIMatrix[cbind(trainingData$UserID,
                 trainingData$MovieID)] <- trainingData$Rating;
  return(UIMatrix);
}
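
To see that the two approaches agree, the sketch below runs both on the sample rows from the question and compares the results. The variable names viaConstructor, viaIndexing and sameResult are made up for this illustration, and the comparison assumes there are no duplicate (UserID, MovieID) pairs, since the two approaches treat duplicates differently.

```r
library(Matrix)

# The five sample rows from the question.
trainingData <- data.frame(
  MovieID = c(1, 2, 2, 2, 2),
  UserID  = c(2, 3, 4, 6, 7),
  Rating  = c(3, 3, 4, 3, 3)
)

# Approach 1: build directly from (row, column, value) triplets.
viaConstructor <- sparseMatrix(i = trainingData$UserID,
                               j = trainingData$MovieID,
                               x = trainingData$Rating)

# Approach 2: start from an all-zero sparse matrix and fill every
# rated cell in a single vectorised assignment.
viaIndexing <- Matrix(0, nrow = max(trainingData$UserID),
                      ncol = max(trainingData$MovieID), sparse = TRUE)
viaIndexing[cbind(trainingData$UserID,
                  trainingData$MovieID)] <- trainingData$Rating

# Compare as ordinary dense matrices (fine for a tiny sample only).
sameResult <- isTRUE(all.equal(as.matrix(viaConstructor),
                               as.matrix(viaIndexing)))
print(sameResult)
```

Converting to a dense matrix is only reasonable for a small sample like this one; on the full 8,500,000-row data set the matrices should stay sparse.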

