Introduction

MLlib is Apache Spark's machine learning library. It provides learning algorithms and utilities that help programmers practice and apply machine learning easily. To work with machine learning, you first need to know the basic concepts and algorithms.
This MLlib cheat sheet covers the basic concepts and libraries you need to know. It is useful for beginners and experienced practitioners alike as a quick guide to what MLlib is and what it contains.

MLlib: Apache Spark's scalable machine learning library, consisting of popular algorithms and utilities
Observations: The items or data points used for learning and evaluation
Features: The characteristics or attributes of an observation
Labels: The values assigned to an observation
Training or test data: The observations used to train a learning algorithm or to test it
Data Source: MLlib can read data from HDFS and HBase, which lets it plug into Hadoop workflows
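
The terms above can be made concrete with a small plain-Python sketch. It does not use Spark: the sample values and the helper name train_test_split are made up for illustration. In Spark itself, the randomSplit method of a DataFrame or RDD plays the role of this helper.

```python
import random

# Each observation is a (features, label) pair; the numbers are made up.
observations = [
    ([5.1, 3.5], 0.0),
    ([4.9, 3.0], 0.0),
    ([6.2, 2.9], 1.0),
    ([5.9, 3.2], 1.0),
    ([6.7, 3.1], 1.0),
]


def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hold out 40% of the observations as test data.
train, test = train_test_split(observations, test_fraction=0.4, seed=42)
print(len(train), "training observations,", len(test), "test observations")
```

A fixed seed keeps the split reproducible, which matters when comparing models on the same test data.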

MLlib Packages

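MLlib contains two packages:

  • mllib: The original API, built on top of RDDs
  • ml: The higher-level API, built on top of DataFrames and used for constructing ML pipelines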

To use MLlib, import the classes you need:

    • In Scala:
import org.apache.spark.mllib.linalg.{Vector, Vectors}
    • In Java:
import org.apache.spark.mllib.linalg.Vector;
    • In Python:
from pyspark.mllib.linalg import SparseVector
from pyspark.mllib.regression import LabeledPoint
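
To show what the two Python classes above hold, here is a plain-Python sketch that runs without Spark. SparseVec and Labeled are hypothetical stand-ins, not the pyspark classes: they only mirror the idea that a sparse vector stores a size plus the indices and values of its non-zero entries, and that a labeled point pairs a label with a feature vector. With pyspark installed, the rough equivalents would be SparseVector(4, [0, 3], [1.5, 2.0]) and LabeledPoint(1.0, vector).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseVec:
    """Hypothetical stand-in for a sparse vector: only non-zeros are stored."""
    size: int
    indices: List[int]
    values: List[float]

    def to_dense(self) -> List[float]:
        """Expand to a full list, filling the unstored positions with 0.0."""
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


@dataclass
class Labeled:
    """Hypothetical stand-in for a labeled point: a label plus its features."""
    label: float
    features: SparseVec


point = Labeled(label=1.0, features=SparseVec(4, [0, 3], [1.5, 2.0]))
dense = point.features.to_dense()
print(point.label, dense)
```

The sparse form pays off when most features are zero, as is common with text data.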

Spark MLlib Tools

  • ML algorithms: Common learning algorithms such as classification, clustering, regression, and collaborative filtering. These algorithms form the core of MLlib
  • Featurization: Feature extraction, transformation, dimensionality reduction, and selection
  • Pipelines: Tools for constructing, evaluating, and tuning ML pipelines
  • Persistence: Saving and loading algorithms, models, and pipelines
  • Utilities: Utilities for linear algebra, statistics, and data handling

MLlib algorithms

These include the popular algorithms and utilities:

  • Basic statistics: It includes the most basic of the machine learning techniques, such as:
    • Summary statistics
    • Correlation
    • Stratified sampling
    • Hypothesis testing
  • Regression: It is a statistical approach to estimating the relationship among variables. It is widely used for prediction and forecasting.
  • Classification: It is used to identify which of a set of categories a new observation belongs to.
  • K-means: It is a clustering algorithm available in MLlib, including from Java. It assigns every observation or vector to one of k clusters.
  • Recommendation system: It is a subclass of information filtering systems that seeks to predict the preference or rating a person would give to an item. This can be done in two ways:
    • Collaborative filtering: It builds a model from a user's past behavior as well as similar decisions made by other users. The model is then used to predict the items in which the user might have an interest.
    • Content-based filtering: It uses a series of discrete characteristics of an item to recommend more items with similar properties.
  • Clustering: It is the task of grouping a set of objects so that the objects in the same group are more similar to each other than to the objects in other groups.
  • Dimensionality reduction: It is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into two types:
    • Feature selection: It finds a subset of the original variables (also called attributes).
    • Feature extraction: It transforms the data from a high-dimensional space to a space of fewer dimensions.
  • Feature extraction: It starts from an initial set of measured data and builds derived values.
  • Optimization: It is the selection of the best element from a set of available alternatives.
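
As an illustration of the clustering entries above, here is a minimal plain-Python sketch of the K-means idea (Lloyd's algorithm). It is not MLlib's implementation and does not use Spark. The function names, the sample points, and the choice of initial centers are assumptions made for the example.

```python
import math


def closest(point, centers):
    """Index of the center nearest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points, centers, max_iter=20):
    """Alternate between assigning points and recomputing centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[closest(p, centers)].append(p)
        # A center whose group is empty keeps its previous position.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, [closest(p, centers) for p in points]


# Two visibly separate groups of made-up 2-D points.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
          [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
centers, assignments = kmeans(points, centers=[points[0], points[3]])
print(assignments)
```

Each observation ends up with the index of its cluster, which is the assignment step the K-means entry describes.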

Main concepts in Pipeline

MLlib standardizes the APIs of its machine learning algorithms so that multiple algorithms can easily be combined into a single pipeline, or workflow.

  • DataFrame: The ML API uses the DataFrame from Spark SQL as its dataset, which can hold a variety of data types
  • Transformer: An algorithm that transforms one DataFrame into another DataFrame. Examples are:
    • HashingTF (hashing term frequency): Maps a sequence of terms to a fixed-length vector of term frequencies, that is, counts of how often each term occurs
    • LogisticRegressionModel: The model that results from training logistic regression on a dataset
    • Binarizer: Converts numeric values to 1 or 0 according to a given threshold
  • Estimator: An algorithm that can be fit on a DataFrame to produce a Transformer. Examples are:
    • LogisticRegression: Processes the DataFrame to determine the weights of the resulting LogisticRegressionModel
    • StandardScaler: Computes the mean and standard deviation of each feature, which the resulting model uses to standardize the features
    • Pipeline: Calling fit on a Pipeline produces a PipelineModel, and the PipelineModel contains only Transformers, not Estimators
  • Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify the ML workflow
  • Parameters: Transformers and Estimators share a common API for specifying parameters
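
The relationship between these concepts can be sketched in plain Python. The classes below are simplified stand-ins that borrow Spark's names for readability. They are not the Spark API: they work on lists of numbers rather than DataFrames, and they run without Spark. The point is the pattern: an Estimator's fit returns a Transformer, and fitting a pipeline yields a model made of Transformers only.

```python
import statistics


class Binarizer:
    """Transformer: 1.0 if a value exceeds the threshold, else 0.0."""

    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, values):
        return [1.0 if v > self.threshold else 0.0 for v in values]


class StandardScalerModel:
    """Transformer produced by fitting the StandardScaler stand-in."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


class StandardScaler:
    """Estimator: fit() learns the mean and std, then returns a Transformer."""

    def fit(self, values):
        return StandardScalerModel(statistics.mean(values),
                                   statistics.stdev(values))


class PipelineModel:
    """Fitted pipeline: holds Transformers only and applies them in order."""

    def __init__(self, transformers):
        self.transformers = transformers

    def transform(self, values):
        for t in self.transformers:
            values = t.transform(values)
        return values


class Pipeline:
    """Chain of stages; each stage is an Estimator or a Transformer."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, values):
        fitted = []
        for stage in self.stages:
            # Estimators are fit and replaced by the Transformer they return;
            # Transformers are kept as they are.
            model = stage.fit(values) if hasattr(stage, "fit") else stage
            values = model.transform(values)
            fitted.append(model)
        return PipelineModel(fitted)


data = [2.0, 4.0, 6.0, 8.0]
model = Pipeline([StandardScaler(), Binarizer(threshold=0.0)]).fit(data)
result = model.transform(data)
print(result)
```

Here the scaler is fit once on the training data, and the resulting model can then be reused to transform new data in the same way.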

With this, we come to the end of the MLlib cheat sheet.