I'm looking for some general tips on how data should be pre-processed before feeding it into a machine learning algorithm. I'm trying to deepen my understanding of why we make different decisions at preprocessing time. If someone could walk through the different things we need to consider when cleaning up data, removing superfluous data, and so on, I would find it very informative; I have searched the web a lot for canonical answers or rules of thumb, and there don't seem to be any.
I have a set of data in a .tsv file available here. The training set amounts to 7,000 rows, the test set to 3,000. What strategies should I use for handling badly-formed data if 100 rows in each set are unreadable? 500? 1,000? Any guidelines to help me reason about this would be very much appreciated.
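To make the question concrete, here is a minimal sketch of what I mean by "unreadable" rows; it assumes Python with just the standard library, and the sample data and field count are placeholders standing in for my real file:

```python
import csv
import io

# Hypothetical sample standing in for the real .tsv file: rows with the
# wrong number of fields are what I'm calling "badly formed".
SAMPLE_TSV = (
    "id\tfeature\tlabel\n"
    "1\t0.5\tA\n"
    "2\t0.7\n"                  # malformed: missing a field
    "3\t0.1\tB\n"
    "bad line without tabs\n"   # malformed: not tab-separated at all
)

def load_rows(text, n_fields=3):
    """Return (header, good_rows, n_bad), skipping rows with the wrong field count."""
    good, bad = [], 0
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    for row in reader:
        if len(row) == n_fields:
            good.append(row)
        else:
            bad += 1
    return header, good, bad

header, rows, n_bad = load_rows(SAMPLE_TSV)
bad_fraction = n_bad / (len(rows) + n_bad)
print(f"kept {len(rows)} rows, skipped {n_bad} ({bad_fraction:.0%} bad)")
```

My instinct is to just skip bad rows like this and track the fraction skipped, but I don't know at what fraction that stops being reasonable, which is the heart of my question.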
Sample code would be great to see, but it's not necessary if you don't feel like it; I just want to understand what I should be doing! :)