Modeling the data
To predict something useful from the datasets, we need to apply machine learning algorithms.
There are many types of algorithms, such as SVM, Bayes, and regression.
We will be using four algorithms:
- Dimensionality Reduction
It is an important technique because it is unsupervised, i.e. it works directly on raw, unlabeled data. It reduces the number of random variables under consideration, which can improve accuracy, either by selecting a subset of the original variables or by combining them into a smaller set of new ones.
It is also useful for converting data from a higher dimension to a lower dimension.
Once the data is in a lower dimension, it is much easier to visualize.
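As an illustration of converting higher-dimensional data to a lower dimension, here is a minimal sketch of principal component analysis (PCA), one common dimensionality reduction technique. The text above does not name a specific method, so PCA is an assumption, as are the sample points and the function names `pca_first_component` and `project`. The sketch uses only the standard library and finds the main direction with power iteration; a real project would more likely use a numerical library.

```python
import math


def pca_first_component(rows, iterations=100):
    """Return (mean, direction) for the first principal component of rows."""
    n = len(rows)
    dim = len(rows[0])
    # Center the data so that each column has zero mean.
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # normalize. This assumes the start vector is not orthogonal to the
    # dominant direction, which holds for the illustrative data below.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Map each row to a single coordinate along the given direction."""
    return [
        sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
        for r in rows
    ]


# Illustrative 3-dimensional points (made up for this sketch).
points = [(1.0, 2.0, 3.0), (2.0, 4.0, 6.1), (3.0, 6.0, 8.9), (4.0, 8.0, 12.0)]
mean, direction = pca_first_component(points)
reduced = project(points, mean, direction)  # one number per point
print(reduced)
```

Each three-dimensional point is replaced by one coordinate, which can then be plotted on a single axis.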
- Clustering
This algorithm is used to group the items in a dataset into clusters of similar items. Clustering is useful in data visualization as well. It is particularly useful when a dataset contains distinct groups of values. It is very popular today because it is an unsupervised algorithm and is well suited to raw, unlabeled datasets.
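The grouping idea can be sketched with k-means, one widely used clustering algorithm. The text above does not say which clustering method is meant, so k-means is an assumption, and the sample points and the helper names `squared_distance` and `k_means` are made up for illustration. To keep the sketch deterministic, the first `k` points are used as starting centroids, which is a simplification of the usual random initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (labels, centroids)."""
    # Simplified initialization: start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clustering is stable
        centroids = new_centroids
    return labels, centroids


# Two visually separate groups of 2-dimensional points (illustrative data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

No labels are supplied to the algorithm; the groups emerge from the distances between the points alone.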
- Linear Regression
It is considered both a statistical approach and a machine learning algorithm.
However, it is not the most popular choice for prediction. It fits a statistical model of the relationship between a dependent variable and one or more independent variables, and it gives its most accurate results when that relationship is close to linear.
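To make the relationship between a dependent and an independent variable concrete, here is a minimal sketch of simple linear regression with one independent variable, fitted by ordinary least squares. The function name `fit_line` and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(xs, ys)

# Use the fitted model to predict the dependent variable for a new x.
prediction = slope * 6.0 + intercept
print(slope, intercept, prediction)
```

Because the sample data is perfectly linear, the fit recovers the underlying line; with noisy data the same formulas give the line that minimizes the squared error.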
- Logistic Regression
In this case the dependent variable is categorical. The outcome is coded as ‘0’ or ‘1’, indicating failure or success. The model estimates the probability of a binary outcome from one or more predictor variables.
It is used to analyze risk in cases such as fraud detection.
It is a supervised learning algorithm. It is often used for sentiment analysis (finding polarity) and for classifying objects.
For example, when an email is received, this algorithm helps to classify it as spam or not spam.
It is also used to group objects into categories, for example placing the names of people who live in the same area in the same category.
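The spam example can be sketched as a one-feature logistic regression trained with plain gradient descent. Everything specific here is an assumption made for illustration: the single feature (a count of suspicious words per email), the toy data, the learning rate, the number of epochs, and the function names `sigmoid`, `fit_logistic` and `predict_probability`.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict_probability(x, w, b):
    """Estimated probability that the outcome is 1 (here: spam)."""
    return sigmoid(w * x + b)


# Toy data: suspicious-word count per email, and label 1 = spam, 0 = not spam.
xs = [0.0, 1.0, 2.0, 3.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)

p_low = predict_probability(1.0, w, b)   # few suspicious words
p_high = predict_probability(8.0, w, b)  # many suspicious words
print(p_low, p_high)
```

An email is classified as spam when the estimated probability exceeds a chosen threshold, commonly 0.5.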