
I am new to Data Science and am taking a self-paced online course. So far I have only practiced on small datasets, but when I downloaded a dataset from Kaggle it turned out to be huge. I am not able to process and clean the data because it has thousands of features.

1 Answer


In machine learning, statistics, and information theory, dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.

Models with a very large number of features are generally not preferred for training, since the extra features can reduce accuracy and make training more costly.
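The answer does not name a specific technique, so the following is only a minimal sketch of one common option, principal component analysis (PCA), using base R's prcomp. The data is synthetic and generated purely for illustration, and the 95% variance threshold is an arbitrary assumption rather than a recommendation.

```r
# Synthetic data for illustration only: 200 rows, 50 numeric features
# that are driven by 3 hidden factors plus a little noise.
set.seed(42)
n <- 200
latent   <- matrix(rnorm(n * 3), nrow = n)
loadings <- matrix(rnorm(3 * 50), nrow = 3)
features <- latent %*% loadings + matrix(rnorm(n * 50, sd = 0.1), nrow = n)

# PCA on centered and scaled features
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Cumulative share of variance explained by the leading components
explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Keep the smallest number of components that reaches 95% of the variance
# (the 95% cutoff is an assumption chosen for this sketch)
k <- which(explained >= 0.95)[1]
reduced <- pca$x[, seq_len(k), drop = FALSE]

dim(reduced)  # rows are unchanged, columns shrink from 50 to k
```

PCA only applies to numeric columns, so categorical features would need to be encoded or handled separately before a step like this.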

The first step is to pre-process the dataset, which includes removing rows with missing values.

The following R code removes the missing values, assuming they are recorded as the string " ?" (with a leading space):

# Mark the " ?" placeholder as NA
data[data == " ?"] <- NA

# Drop every row that contains at least one NA
data <- na.omit(data)
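To make the cleaning step concrete, here is a small worked sketch on a hypothetical toy data frame. The column names and values are invented for illustration; only the " ?" placeholder and the two cleaning calls come from the answer.

```r
# Hypothetical toy data; " ?" marks a missing value, as in the code above
data <- data.frame(
  age       = c(39, 50, 38, 53),
  workclass = c(" Private", " ?", " Private", " State-gov"),
  education = c(" Bachelors", " HS-grad", " ?", " Masters"),
  stringsAsFactors = FALSE
)

# Recode the placeholder as NA
data[data == " ?"] <- NA

# Count missing values per column before dropping anything
missing_per_column <- colSums(is.na(data))
print(missing_per_column)

# Drop every row that has at least one NA
clean <- na.omit(data)
print(nrow(clean))
```

Checking the per-column counts first is worthwhile on a wide dataset: if a few columns account for most of the missing values, dropping those columns may lose less data than dropping every affected row. When loading a file, the na.strings argument of read.csv can also mark such placeholders as NA at read time.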

Since you mentioned that you are new to Data Science, learning Data Science with R will help you solve problems like this one.

