in R Programming by (50.2k points)
I have a very large folder of images (train_dir) and a CSV file with the class label for each of those images (train_df). Because the data is huge, I'd like to take only a sample of the images (say 25%) along with their labels from train_df. How can I do this in R?

1 Answer

by (108k points)

First, draw a random sample of row numbers to serve as an index into train_df.

Then subset train_df with that index to get the sampled PNG filenames. Since the column "id" is a factor, convert it to character.

Finally, apply a PNG-reading function to each filename. I have used png::readPNG here, but other readers can be used in the same way.

The code for the above steps is:

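perc <- 0.25
n <- nrow(train_df)
# Row indices of a random 25% sample; floor() keeps the sample size an integer
i <- sample(n, floor(n * perc))
# "id" is a factor, so convert the sampled values to character
png_filenames <- as.character(train_df[i, "id"])
# Read each sampled image; assumes "id" holds paths readable from the working directory
png_files <- lapply(png_filenames, function(x) {
  png::readPNG(x, native = TRUE)
})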
