
Spark Master Course

Master Program

Intellipaat’s Spark Master’s Training is designed by industry experts. In this certification program, you will cover significant concepts and modules such as Python, Python libraries and frameworks, Spark, PySpark and Scala, and become proficient in them, attaining all the skills necessary to become a professional in this domain. Throughout the training, we offer 24-hour online assistance so you can easily clear all your queries, and you will have lifetime access to the complete training resources. Once you complete the course, you will work on exercises and projects that give you experience in solving real-world problems. At Intellipaat, we aim to make you job-ready by conducting a number of mock interviews and helping you create your resume. Finally, you will receive the Spark Master’s certification from Intellipaat, offered in collaboration with Microsoft and IBM.

In collaboration with IBM and Microsoft
  • 8+

    Courses

  • 53+

    Projects

  • 261

    Hours

  • Online Classroom training

    • Java
    • Apache Spark and Scala
    • Python
    • PySpark
    • Databricks Spark
    • Spark Using Java
  • Self Paced Training

    • MongoDB
    • Linux

Key Features

261 Hrs Instructor Led Training
230 Hrs Self-paced Videos
522 Hrs Project work & Exercises
Certification and Job Assistance
Flexible Schedule
Lifetime Free Upgrade
24 x 7 Lifetime Support & Access

Course Fees

Self Paced Training

  • 230 Hrs e-learning videos
  • Lifetime Free Upgrade
  • 24 x 7 Lifetime Support & Access
$439

Online Classroom preferred

  • Everything in self-paced, plus
  • 261 Hrs of instructor-led training
  • 1:1 doubt resolution sessions
  • Attend as many batches as you want, for a lifetime
  • Flexible Schedule
  • 06 Jun
  • SAT - SUN
  • 08:00 PM TO 11:00 PM IST (GMT +5:30)
  • 13 Jun
  • SAT - SUN
  • 08:00 PM TO 11:00 PM IST (GMT +5:30)
  • 20 Jun
  • SAT - SUN
  • 08:00 PM TO 11:00 PM IST (GMT +5:30)
  • 27 Jun
  • SAT - SUN
  • 08:00 PM TO 11:00 PM IST (GMT +5:30)
$790 (10% off)

Corporate Training

  • Customized Learning
  • Enterprise grade learning management system (LMS)
  • 24x7 support
  • Strong Reporting

About Course

This is a comprehensive Spark master’s course with an easy-to-learn approach that covers the execution of Scala code, the concept of classes in Scala, mutable vs. immutable collections, Spark applications, and more. After completing this course, you will be proficient in the core concepts of Spark and ready to apply for corresponding job roles.

What will you learn in this Spark course?

In this course, you will cover the following concepts:

  • Pattern matching
  • Scala collections
  • RDDs in Spark
  • Data aggregation with pair RDDs
  • MLlib and RDD persistence in Spark
  • Python implementation
  • Apache Spark and Big Data
  • RDDs and frameworks of Apache Spark
  • Data frames and PySpark SQL
  • Flume and Apache Kafka
  • NumPy, SciPy, and Matplotlib
  • Python web scraping

Professionals who should sign up for this course include:

  • Big Data Analysts
  • Data Scientists
  • Analysts
  • Researchers
  • IT Developers
  • ETL Developers
  • Data Engineers
  • Software Engineers who are keen to upgrade their skills in Big Data
  • BI and Reporting Professionals
  • Students and graduates who wish to get into the field of Spark and become professionals in this domain

To take up this course, you need a good knowledge of Python or another programming language, as well as a good understanding of SQL or another database query language. Experience working with UNIX or Linux systems is also beneficial.


Spark Master Reviews

Ashwin Singhania

Hadoop Architect at Infosys

The Intellipaat team provided a great deal of support. They gave me a platform where I could ask all my doubts about Spark Master, and the trainers as well as the other individuals who had taken up this course helped me clear those doubts. Further, their placement team conducted several mock interviews and helped me prepare my resume, which really helped me land a high-salary job.

Swetha Pandit

Big Data Developer at Accenture

The trainer of the course explained the topics and concepts extremely well. It was very easy to understand all the topics with hardly any trouble. They took time to explain each and every topic which really helped me grasp the concepts easily. Also, they used a lot of examples and associated them with the topics in hand which made it easier to learn and remember the concepts.

Abhimanyu Balgopal

Product Engineer (BigData)

The online course material and resources provided in this training were extremely comprehensive. Whenever I had a doubt or had trouble in any of their assignments, I could access the course material and revise those concepts. I also worked on a couple of real-time projects from industries that really enhanced my skills and gave me experience in the corporate world.

Course Content

Java Fundamentals

Introduction to Java basics, the various components of Java language, data types, operations, compilation process, class files, loops, conditions, benefits of Java over other programming languages.

Object oriented programming

What is object-oriented programming, the concept of abstraction, attributes, methods, constructors, inheritance, encapsulation, and polymorphism.

Java Collections

Writing codes in Java, using wrapper classes, applet program UI programs, using io.lang package and deep dive into Java Collections including Vector, ArrayList, TreeSet, HashMap.

Java Packages

What is a Java package, Java interfaces, the various access specifiers, scope specifiers, exception handling in Java, introduction to multi-threading in Java, extending the thread class, synchronizing the thread.

Introduction to XML

The fundamentals of Extensible Markup Language, its uses in storing and transferring data, writing an XML file, making sense of the XML file using DOM and SAX parsers in Java.

Java Database Connectivity

Introduction to Java Database Connectivity, fundamentals of SQL like connect, select, insert, update, the various drivers of JDBC, writing a program to communicate with database using JDBC, the architecture of JDBC, how to do a batch processing transaction.

Java Servlets

What is a Java Servlet, extending the capability of the web server, dynamic Java web content technology, the HTTP and generic Servlets, session tracking and filter, forward and include Servlet request dispatchers.

Java Server Page

Fundamentals of Java Server Page, writing a code using JSP, the architecture of JSP, declarative, expression and scripts tags, JSP and JDBC.

Spring and Hibernate Frameworks

Database interaction with Hibernate, various operations in databases like insert, delete, update, collections and inheritance, HQL, Hibernate caching, creating code with the Spring framework, auto wiring and dependency injection, Spring bean scope and post processor, integration of Spring with Hibernate framework.

Advanced Spring and AJAX

Spring framework Aspect Oriented Programming, database commit and rollback operations, AJAX framework for interacting with server, design patterns in Java Enterprise Edition.

Service Oriented Architecture

The fundamentals of Service Oriented Architecture, importance of SOA, how SOA is independent of vendor, technology and product, deploying SOA with web services, XML, Simple Object Access Protocol (SOAP), Universal Description, Discovery, and Integration (UDDI) and Web Services Description Language (WSDL).

Scala Course Content

Introduction to Scala

Introducing Scala, deployment of Scala for Big Data applications and Apache Spark analytics, Scala REPL, Lazy Values, Control Structures in Scala, Directed Acyclic Graph (DAG), First Spark Application Using SBT/Eclipse, Spark Web UI and Spark in Hadoop Ecosystem.

Pattern Matching

The importance of Scala, the concept of REPL (Read Evaluate Print Loop), deep dive into Scala pattern matching, type interface, higher-order function, currying, traits, application space and Scala for data analysis

Executing the Scala Code

Learning about the Scala Interpreter, static object timer in Scala and testing string equality in Scala, implicit classes in Scala, the concept of currying in Scala and various classes in Scala

Classes Concept in Scala

Learning about the Classes concept, understanding the constructor overloading, various abstract classes, the hierarchy types in Scala, the concept of object equality and the val and var methods in Scala

Case Classes and Pattern Matching

Understanding sealed traits, wild, constructor, tuple, variable pattern and constant pattern

Concepts of Traits with Example

Understanding traits in Scala, the advantages of traits, linearization of traits, the Java equivalent and avoiding of boilerplate code

Scala–Java Interoperability

Implementation of traits in Scala and Java and handling of multiple traits extending

Scala Collections

Introduction to Scala collections, classification of collections, the difference between Iterator and Iterable in Scala and example of list sequence in Scala

Mutable Collections Vs. Immutable Collections

The two types of collections in Scala, Mutable and Immutable collections, understanding lists and arrays in Scala, the list buffer and array buffer, queue in Scala and double-ended queue Deque, Stacks, Sets, Maps and Tuples in Scala

Use Case Bobsrockets Package

Introduction to Scala packages and imports, the selective imports, the Scala test classes, introduction to JUnit test class, JUnit interface via JUnit 3 suite for Scala test, packaging of Scala applications in Directory Structure and examples of Spark Split and Spark Scala

Spark Course Content

Introduction to Spark

Introduction to Spark, how Spark overcomes the drawbacks of working on MapReduce, understanding in-memory MapReduce, interactive operations on MapReduce, the Spark stack, fine- vs. coarse-grained updates, Spark with Hadoop YARN, HDFS revision, YARN revision, an overview of Spark and how it is better than Hadoop, deploying Spark without Hadoop, the Spark history server and Cloudera distribution

Spark Basics

Spark installation guide, Spark configuration, memory management, executor memory vs. driver memory, working with Spark Shell, the concept of resilient distributed datasets (RDD), learning to do functional programming in Spark and the architecture of Spark

Working with RDDs in Spark

Spark RDD, creating RDDs, RDD partitioning, operations and transformation in RDD, deep dive into Spark RDDs, the RDD general operations, a read-only partitioned collection of records, using the concept of RDD for faster and efficient data processing, RDD action for collect, count, collects map, save-as-text-files and pair RDD functions
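
The transformation/action split described above can be sketched in plain Python with no Spark installation: generators mimic the lazy lineage of an RDD, and materializing the result plays the role of an action such as collect(). This is only an illustration of the semantics, not PySpark code.

```python
# Plain-Python sketch of RDD-style transformation/action semantics.
# Transformations build a lazy pipeline; an "action" realizes it.
data = [1, 2, 3, 4, 5]

# Lazy transformations: generators defer work, like RDD lineage
squared = (x * x for x in data)             # cf. rdd.map(lambda x: x * x)
evens = (x for x in squared if x % 2 == 0)  # cf. .filter(lambda x: x % 2 == 0)

# Action: forcing evaluation, like collect()
result = list(evens)
print(result)  # [4, 16]
```

Nothing is computed until the final list() call, which is exactly why chained RDD transformations are cheap to declare and only cost time when an action runs.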

Aggregating Data with Pair RDDs

Understanding the concept of Key–Value pair in RDDs, learning how Spark makes MapReduce operations faster, various operations of RDD, MapReduce interactive operations, fine and coarse-grained update and Spark stack
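
The key–value aggregation above follows the reduceByKey pattern: group values by key, then fold each group with a combining function. A minimal dependency-free Python sketch of that logic (the data here is illustrative):

```python
# Sketch of the pair-RDD reduceByKey pattern in plain Python.
pairs = [("spark", 1), ("scala", 1), ("spark", 1), ("kafka", 1), ("spark", 1)]

def reduce_by_key(pairs, fn):
    """Combine all values sharing a key with fn, like rdd.reduceByKey(fn)."""
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc

counts = reduce_by_key(pairs, lambda a, b: a + b)
print(counts)  # {'spark': 3, 'scala': 1, 'kafka': 1}
```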

Writing and Deploying Spark Applications

Comparing the Spark applications with Spark Shell, creating a Spark application using Scala or Java, deploying a Spark application, Scala built application, creation of mutable list, set and set operations, list, tuple, concatenating list, creating application using SBT, deploying application using Maven, the web user interface of Spark application, a real-world example of Spark and configuring of Spark

Parallel Processing

Learning about Spark parallel processing, deploying on a cluster, introduction to Spark partitions, file-based partitioning of RDDs, understanding of HDFS and data locality, mastering the technique of parallel operations, comparing repartition and coalesce and RDD actions

Spark RDD Persistence

The execution flow in Spark, understanding the RDD persistence overview, Spark execution flow and Spark terminology, distribution shared memory vs. RDD, RDD limitations, Spark shell arguments, distributed persistence, RDD lineage, Key–Value pair for sorting implicit conversions like CountByKey, ReduceByKey, SortByKey and AggregateByKey

Spark MLlib

Introduction to Machine Learning, types of Machine Learning, introduction to MLlib, various ML algorithms supported by MLlib, Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-means clustering techniques and building a Recommendation Engine

Hands-on Exercise:  Building a Recommendation Engine
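
To make the Linear Regression item above concrete, here is the underlying idea in a tiny NumPy sketch (assuming NumPy is installed; the toy data y = 2x + 1 is illustrative, and MLlib's own API is not used here):

```python
import numpy as np

# Ordinary least squares on a toy dataset, the math behind linear regression.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # exactly y = 2x + 1

# Add a bias column and solve min ||Xb - y|| via least squares
Xb = np.hstack([X, np.ones((4, 1))])
coef, _, _, _ = np.linalg.lstsq(Xb, y, rcond=None)

slope, intercept = coef
print(round(slope, 2), round(intercept, 2))  # 2.0 1.0
```

MLlib scales this same fit across a cluster; the estimated coefficients are what a trained LinearRegression model stores.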

Integrating Apache Flume and Apache Kafka

Why Kafka, what is Kafka, Kafka architecture, Kafka workflow, configuring Kafka cluster, basic operations, Kafka monitoring tools and integrating Apache Flume and Apache Kafka

Hands-on Exercise: Configuring Single Node Single Broker Cluster, Configuring Single Node Multi Broker Cluster, Producing and consuming messages and integrating Apache Flume and Apache Kafka

Spark Streaming

Introduction to Spark Streaming, features of Spark Streaming, Spark Streaming workflow, initializing StreamingContext, Discretized Streams (DStreams), Input DStreams and Receivers, transformations on DStreams, Output Operations on DStreams, Windowed Operators and why it is useful, important Windowed Operators and Stateful Operators

Hands-on Exercise:  Twitter Sentiment Analysis, streaming using netcat server, Kafka–Spark Streaming and Spark–Flume Streaming

Improving Spark Performance

Introduction to various variables in Spark like shared variables and broadcast variables, learning about accumulators, the common performance issues and troubleshooting the performance problems

Spark SQL and Data Frames

Learning about Spark SQL, the context of SQL in Spark for providing structured data processing, JSON support in Spark SQL, working with XML data, parquet files, creating Hive context, writing Data Frame to Hive, reading JDBC files, understanding the Data Frames in Spark, creating Data Frames, manual inferring of schema, working with CSV files, reading JDBC tables, Data Frame to JDBC, user-defined functions in Spark SQL, shared variables and accumulators, learning to query and transform data in Data Frames, how Data Frame provides the benefit of both Spark RDD and Spark SQL and deploying Hive on Spark as the execution engine
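
The structured-query idea above can be previewed without Spark using the stdlib sqlite3 module: the same create/insert/filter SQL that Spark SQL runs over Data Frames works on a local table (table and data here are illustrative, not from the course):

```python
import sqlite3

# The query logic Spark SQL applies to Data Frames, on a local SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("Asha", 31), ("Ravi", 24), ("Meena", 29)])

rows = conn.execute(
    "SELECT name FROM users WHERE age > 25 ORDER BY name").fetchall()
print(rows)  # [('Asha',), ('Meena',)]
conn.close()
```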

Scheduling/Partitioning

Learning about the scheduling and partitioning in Spark, hash partition, range partition, scheduling within and around applications, static partitioning, dynamic sharing, fair scheduling, Map partition with index, the Zip, GroupByKey, Spark master high availability, standby masters with ZooKeeper, Single-node Recovery with Local File System and High Order Functions
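
Hash partitioning, mentioned above, assigns each key to partition hash(key) % numPartitions, so identical keys always land in the same partition — which is what makes per-key aggregation local. A minimal sketch (plain Python, illustrative keys):

```python
# Sketch of hash partitioning: identical keys map to the same partition.
def hash_partition(keys, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[hash(key) % num_partitions].append(key)
    return partitions

parts = hash_partition(["a", "b", "a", "c", "a"], 3)
# Every copy of "a" is in one partition, so counting "a" needs no shuffle
bucket_of_a = [p for p in parts if "a" in p][0]
print(bucket_of_a.count("a"))  # 3
```

Range partitioning instead assigns keys by sorted ranges, which keeps ordered keys adjacent at the cost of needing to sample the key distribution first.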

What projects will I be working on in this Spark–Scala training?

Project 1: Movie Recommendation

Topics: This is a project wherein you will gain hands-on experience in deploying Apache Spark for movie recommendation. You will be introduced to the Spark Machine Learning Library (MLlib) and a guide to its algorithms and coding. You will understand how to deploy collaborative filtering, clustering, regression and dimensionality reduction in MLlib. Upon completion of the project, you will have gained experience in working with streaming data, sampling, testing and statistics.

Project 2: Twitter API Integration for Tweet Analysis

Topics: With this project, you will learn to integrate Twitter API for analyzing tweets. You will write codes on the server side using any of the scripting languages, like PHP, Ruby or Python, for requesting the Twitter API and get the results in JSON format. You will then read the results and perform various operations like aggregation, filtering and parsing as per the need to come up with tweet analysis.

Project 3: Data Exploration Using Spark SQL – Wikipedia Dataset

Topics: This project lets you work with Spark SQL. You will gain experience in working with Spark SQL for combining it with ETL applications, real-time analysis of data, performing batch analysis, deploying Machine Learning, creating visualizations and processing of graphs.

Module 01 - Python Environment Setup and Essentials

1.1 Introduction to Python Language
1.2 Features, the advantages of Python over other programming languages
1.3 Python installation – the Anaconda Python distribution for Windows, Mac and Linux
1.4 Deploying Python IDE
1.5 Basic Python commands, data types, variables, keywords and more

Hands-on Exercise – Installing Anaconda Python on Windows, Linux and Mac.

Module 02 - Python language Basic Constructs

2.1 Built-in data types in Python
2.2 Learning classes, modules, str (string), the Ellipsis object, the None (null) object and debugging
2.3 Basic operators, comparison, arithmetic, slicing and slice operator, logical, bitwise
2.4 Loop and control statements: while, for, if, break, else, continue.
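
A minimal sketch tying the constructs in this module together — data types, operators, slicing and loop/control statements (the values are illustrative):

```python
# Loops, conditionals and slicing in one small example.
numbers = [3, 1, 4, 1, 5, 9, 2, 6]

evens = []
for n in numbers:
    if n % 2 == 0:
        evens.append(n)
    else:
        continue  # control statement: skip odd values

print(evens)         # [4, 2, 6]
print(numbers[1:4])  # slice operator: [1, 4, 1]
```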

Hands-on Exercise –
1. Write your first Python program
2. Write a Python Function (with and without parameters)
3. Use Lambda expression
4. Write a class
5. Create a member function and a variable
6. Create an object
7. Write a for loop

Module 03 - OOP concepts in Python

3.1 How to write OOP concepts program in Python
3.2 Connecting to a database
3.3 Classes and objects in Python
3.4 OOPs paradigm, important concepts in OOP like polymorphism, inheritance, encapsulation
3.5 Python functions, return types and parameters
3.6 Lambda expressions

Hands-on Exercise –
1. Creating an application which helps to check balance, deposit money and withdraw the money using the concepts of OOPS.
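
The hands-on exercise above can be sketched as a small class using the OOP basics from this module (class names and amounts are illustrative):

```python
# Minimal bank-account sketch: class, constructor, member functions,
# and encapsulated state for balance check, deposit and withdrawal.
class Account:
    def __init__(self, balance=0):
        self._balance = balance  # leading underscore: internal state

    def deposit(self, amount):
        self._balance += amount

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    def balance(self):
        return self._balance

acct = Account(100)
acct.deposit(50)
acct.withdraw(30)
print(acct.balance())  # 120
```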

Module 04 - Database connection

4.1 Understanding the Database, need of database
4.2 Installing MySQL on Windows
4.3 Understanding Database connection using Python.

Hands-on Exercise – Demo on database connection using Python and pulling the data.

Module 05 - NumPy for mathematical computing

5.1 Introduction to arrays and matrices
5.2 Broadcasting of array math, indexing of array
5.3 Standard deviation, conditional probability, correlation and covariance.

Hands-on Exercise –
1. How to import NumPy module
2. Creating an array using ndarray
3. Calculating standard deviation on array of numbers
4. Calculating correlation between two variables.
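
The exercise above, sketched with NumPy (assuming NumPy is installed; the arrays are toy values):

```python
import numpy as np

# Standard deviation of an array and correlation between two variables.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = 2 * a  # perfectly correlated with a

print(round(a.std(), 4))                   # 1.118 (population std dev)
print(round(np.corrcoef(a, b)[0, 1], 6))   # 1.0 (correlation coefficient)
```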

Module 06 - SciPy for scientific computing

6.1 Introduction to SciPy
6.2 Functions building on top of NumPy, cluster, linalg, signal, optimize, integrate, subpackages, SciPy with Bayes Theorem.

Hands-on Exercise –
1. Importing of SciPy
2. Applying the Bayes theorem on the given dataset.
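
A worked Bayes-theorem example, the calculation behind 6.2, in plain Python (the test sensitivity, specificity and prevalence numbers are an illustrative toy scenario, not course data):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
# Toy scenario: a test with 99% sensitivity, 5% false-positive rate,
# for a condition affecting 1% of the population.
p_a = 0.01                  # prior P(A): prevalence
p_b_given_a = 0.99          # sensitivity P(B|A)
p_b_given_not_a = 0.05      # false-positive rate

# Total probability of a positive test
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.1667
```

Even with a 99%-sensitive test, a positive result here means only about a 17% chance of having the condition, because the prior is so low.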

Module 07 - Matplotlib for data visualization

7.1 How to plot graph and chart with Python
7.2 Various aspects of line, scatter, bar, histogram, 3D, the API of MatPlotLib, subplots.

Hands-on Exercise –
1. Deploying MatPlotLib for creating Pie, Scatter, Line, Histogram.

Module 08 - Pandas for data analysis and machine learning

8.1 Introduction to Python dataframes
8.2 Importing data from JSON, CSV, Excel, SQL database, NumPy array to dataframe
8.3 Various data operations like selecting, filtering, sorting, viewing, joining, combining

Hands-on Exercise –
1. Working on importing data from JSON files
2. Selecting record by a group
3. Applying filter on top, viewing records
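
The selecting, filtering and sorting operations above, in a short pandas sketch (assuming pandas is installed; the dataframe contents are illustrative):

```python
import pandas as pd

# Build a dataframe, filter rows, sort, and select a column.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena"],
                   "score": [88, 72, 95]})

top = df[df["score"] > 80].sort_values("score", ascending=False)
print(list(top["name"]))  # ['Meena', 'Asha']
```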

Module 09 - Exception Handling

9.1 Introduction to Exception Handling
9.2 Scenarios in Exception Handling with its execution
9.3 Arithmetic exception
9.4 RAISE of Exception
9.5 What is Random List, running a Random list on Jupyter Notebook
9.6 Value Error in Exception Handling.

Hands-on Exercise –
1. Demo on Exception Handling with an Industry-based Use Case.
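
A compact sketch of the scenarios in this module — catching an arithmetic exception, and the ValueError case from 9.6 (the function name is illustrative):

```python
# Raising and catching exceptions.
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:  # arithmetic exception
        return None

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # None

try:
    int("not a number")
except ValueError as err:  # 9.6: ValueError in exception handling
    print("caught:", type(err).__name__)  # caught: ValueError
```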

Module 10 - Multi Threading & Race Condition

10.1 Introduction to Thread, need of threads
10.2 What are thread functions
10.3 Performing various operations on thread like joining a thread, starting a thread, enumeration in a thread
10.4 Creating a Multithread, finishing the multithreads.
10.5 Understanding Race Condition, lock and Synchronization.

Hands-on Exercise –
1. Demo on Starting a Thread and a Multithread and then perform multiple operations on them.
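
The module above can be sketched with the stdlib threading module: several threads update a shared counter, and a Lock provides the synchronization that prevents the race condition (thread count and iterations are illustrative):

```python
import threading

# Multiple threads incrementing a shared counter, guarded by a Lock.
counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:  # synchronize access to shared state
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # joining threads, as in 10.3

print(counter)  # 40000
```

Without the lock, the read-modify-write on counter can interleave across threads and lose updates; with it, the final value is deterministic.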

Module 11 - Packages and Functions

11.1 Intro to modules in Python, need of modules
11.2 How to import modules in python
11.3 Locating a module, namespace and scoping
11.4 Arithmetic operations on Modules using a function
11.5 Intro to Search path, Global and local functions, filter functions
11.6 Python Packages, import in packages, various ways of accessing the packages
11.7 Decorators, Pointer assignments, and Xldr.

Hands-on Exercise –
1. Demo on Importing the modules and performing various operation on them using arithmetic functions
2. Importing various packages and accessing them and then performing different operations on them.

Module 12 - Web scraping with Python

12.1 Introduction to web scraping in Python
12.2 Installing BeautifulSoup
12.3 Installing the Python parser lxml
12.4 Various web scraping libraries: BeautifulSoup and Scrapy Python packages
12.5 Creating soup object with input HTML
12.6 Searching of tree, full or partial parsing, output print

Hands-on Exercise –
1. Installation of BeautifulSoup and the lxml Python parser
2. Making a soup object with an input HTML file
3. Navigating using Python objects in the soup tree.
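
BeautifulSoup is the library this module installs; as a dependency-free preview of the same tree-walking idea, the stdlib HTMLParser can extract link targets from HTML (the HTML snippet and class name here are illustrative):

```python
from html.parser import HTMLParser

# Walk HTML tags and collect every <a href="..."> target.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

html = '<html><body><a href="/docs">Docs</a><a href="/blog">Blog</a></body></html>'
parser = LinkCollector()
parser.feed(html)
print(parser.links)  # ['/docs', '/blog']
```

BeautifulSoup layers a much friendlier search API (find, select, CSS classes) over this same parsed tree.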

What projects will I be working on in this Python certification course?

Project 01 : Analyzing the Naming Pattern Using Python

Industry : General

Problem Statement : How to analyze the trends and the most popular baby names

Topics : In this Python project, you will work with data from the United States Social Security Administration (SSA), which has made the frequency of baby names from 1880 to 2016 available. The project requires analyzing the data using different methods. You will visualize the most frequent names, determine the naming trends and come up with the most popular names for a certain year.

Highlights :

  • Analyzing data using Pandas Library
  • Deploying Data Frame Manipulation
  • Bar and box plots with Matplotlib

Project 02 : Python Web Scraping for Data Science

In this project, you will be introduced to the process of web scraping using Python. It involves installation of Beautiful Soup, web scraping libraries, working on common data and page format on the web, learning the important kinds of objects, Navigable String, deploying the searching tree, navigation options, parser, search tree, searching by CSS class, list, function and keyword argument.

Project 03 : Predicting Customer Churn in Telecom Company

Industry – Telecommunications

Problem Statement – How to increase the profitability of a telecom major by reducing the churn rate

Topics : In this project, you will work with a telecom company’s customer dataset, which includes subscribing telephone customers’ details. Each column has data on the phone number, call minutes during various times of the day, the charges incurred, lifetime account duration and whether the customer has churned by unsubscribing from services. The goal is to predict whether a customer will eventually churn.

Highlights :

  • Deploy Scikit-Learn ML library
  • Develop code with Jupyter Notebook
  • Build a model using performance metrics

Introduction to the Basics of Python

  • Explaining Python and Highlighting Its Importance
  • Setting up Python Environment and Discussing Flow Control
  • Running Python Scripts and Exploring Python Editors and IDEs

Sequence and File Operations

  • Defining Reserve Keywords and Command Line Arguments
  • Describing Flow Control and Sequencing
  • Indexing and Slicing
  • Learning the xrange() Function
  • Working Around Dictionaries and Sets
  • Working with Files

Functions, Sorting, Errors and Exception, Regular Expressions, and Packages

  • Explaining Functions and Various Forms of Function Arguments
  • Learning Variable Scope, Function Parameters, and Lambda Functions
  • Sorting Using Python
  • Exception Handling
  • Package Installation
  • Regular Expressions
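
Two of the items above, sketched briefly — sorting with a lambda key function, and a regular expression extracting numbers from a string (the sample data is illustrative):

```python
import re

# Sorting with a lambda key, and regex extraction.
words = ["spark", "py", "scala"]
print(sorted(words, key=lambda w: len(w)))  # ['py', 'spark', 'scala']

log = "batch 12 finished in 340 ms"
print(re.findall(r"\d+", log))  # ['12', '340']
```

Python's sort is stable, which is why "spark" stays ahead of the equal-length "scala".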

Python: An OOP Implementation

  • Using Class, Objects, and Attributes
  • Developing Applications Based on OOP
  • Learning About Classes, Objects and How They Function Together
  • Explaining OOPs Concepts Including Inheritance, Encapsulation, and Polymorphism, Among Others

Debugging and Databases

  • Debugging Python Scripts Using pdb and IDE
  • Classifying Errors and Developing Test Units
  • Implementing Databases Using SQLite
  • Performing CRUD Operations
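
The CRUD operations above, sketched with the stdlib sqlite3 module (table name and contents are illustrative):

```python
import sqlite3

# Create, Read, Update, Delete against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

conn.execute("INSERT INTO notes (body) VALUES (?)", ("first",))       # Create
conn.execute("UPDATE notes SET body = ? WHERE id = 1", ("edited",))   # Update
row = conn.execute("SELECT body FROM notes WHERE id = 1").fetchone()  # Read
conn.execute("DELETE FROM notes WHERE id = 1")                        # Delete
remaining = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
conn.close()

print(row[0], remaining)  # edited 0
```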

Introduction to Big Data and Apache Spark

  • What is Big Data?
  • 5 V’s of Big Data
  • Problems related to Big Data: Use Case
  • What tools available for handling Big Data?
  • What is Hadoop?
  • Why do we need Hadoop?
  • Key Characteristics of Hadoop
  • Important Hadoop ecosystem concepts
  • MapReduce and HDFS
  • Introduction to Apache Spark
  • What is Apache Spark?
  • Why do we need Apache Spark?
  • Who uses Spark in the industry?
  • Apache Spark architecture
  • Spark Vs. Hadoop
  • Various Big data applications using Apache Spark

Python for Spark

  • Introduction to PySpark
  • Who uses PySpark?
  • Why Python for Spark?
  • Values, Types, Variables
  • Operands and Expressions
  • Conditional Statements
  • Loops
  • Numbers
  • Python files I/O Functions
  • Strings and associated operations
  • Sets and associated operations
  • Lists and associated operations
  • Tuples and associated operations
  • Dictionaries and associated operations

Hands-On:

  • Demonstrating Loops and Conditional Statements
  • Tuple – related operations, properties, list, etc.
  • List – operations, related properties
  • Set – properties, associated operations
  • Dictionary – operations, related properties

Python for Spark: Functional and Object-Oriented Model

  • Functions
  • Lambda Functions
  • Global Variables, its Scope, and Returning Values
  • Standard Libraries
  • Object-Oriented Concepts
  • Modules Used in Python
  • The Import Statements
  • Module Search Path
  • Package Installation Ways

Hands-On:

  • Lambda – Features, Options, Syntax, Compared with the Functions
  • Functions – Syntax, Return Values, Arguments, and Keyword Arguments
  • Errors and Exceptions – Issue Types, Remediation
  • Packages and Modules – Import Options, Modules, sys Path

Apache Spark Framework and RDDs

  • Spark Components & its Architecture
  • Spark Deployment Modes
  • Spark Web UI
  • Introduction to PySpark Shell
  • Submitting PySpark Job
  • Writing your first PySpark Job Using Jupyter Notebook
  • What are Spark RDDs?
  • Stopgaps in existing computing methodologies
  • How do RDDs solve the problem?
  • What are the ways to create RDDs in PySpark?
  • RDD persistence and caching
  • General operations: Transformation, Actions, and Functions
  • Concept of Key-Value pair in RDDs
  • Other pair, two pair RDDs
  • RDD Lineage
  • RDD Persistence
  • WordCount Program Using RDD Concepts
  • RDD Partitioning & How it Helps Achieve Parallelization
  • Passing Functions to Spark

Hands-On:

  • Building and Running Spark Application
  • Spark Application Web UI
  • Loading data in RDDs
  • Saving data through RDDs
  • RDD Transformations
  • RDD Actions and Functions
  • RDD Partitions
  • WordCount program using RDD’s in Python
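
The WordCount logic from the list above can be previewed without a cluster: the RDD version flatMaps lines to words, maps each word to (word, 1) and reduces by key; collections.Counter performs the same grouped count in plain Python (the input lines are illustrative):

```python
from collections import Counter

# WordCount semantics in plain Python.
lines = ["spark makes big data simple", "big data with spark"]
words = [w for line in lines for w in line.split()]  # cf. flatMap
counts = Counter(words)                              # cf. map + reduceByKey

print(counts["spark"], counts["big"], counts["data"])  # 2 2 2
```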

PySpark SQL and Data Frames

  • Need for Spark SQL
  • What is Spark SQL
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • User-Defined Functions
  • Data Frames
  • Interoperating with RDDs
  • Loading Data through Different Sources
  • Performance Tuning
  • Spark-Hive Integration

Hands-On:

  • Spark SQL – Creating data frames
  • Loading and transforming data through different sources
  • Spark-Hive Integration

Apache Kafka and Flume

  • Why Kafka
  • What is Kafka?
  • Kafka Workflow
  • Kafka Architecture
  • Kafka Cluster Configuring
  • Kafka Monitoring tools
  • Basic operations
  • What is Apache Flume?
  • Integrating Apache Flume and Apache Kafka

Hands-On:

  • Single Broker Kafka Cluster
  • Multi-Broker Kafka Cluster
  • Topic Operations
  • Integrating Apache Flume and Apache Kafka

PySpark Streaming

  • Introduction to Spark Streaming
  • Features of Spark Streaming
  • Spark Streaming Workflow
  • StreamingContext Initializing
  • Discretized Streams (DStreams)
  • Input DStreams, Receivers
  • Transformations on DStreams
  • DStreams Output Operations
  • Describe Windowed Operators and Why it is Useful
  • Stateful Operators
  • Vital Windowed Operators
  • Twitter Sentiment Analysis
  • Streaming using Netcat server
  • WordCount program using Kafka-Spark Streaming

Hands-On:

  • Twitter Sentiment Analysis
  • Streaming using Netcat server
  • WordCount program using Kafka-Spark Streaming
  • Spark-flume Integration

Introduction to PySpark Machine Learning

  • Introduction to Machine Learning- What, Why and Where?
  • Use Case
  • Types of Machine Learning Techniques
  • Why use Machine Learning for Spark?
  • Applications of Machine Learning (general)
  • Applications of Machine Learning with Spark
  • Introduction to MLlib
  • Features of MLlib and MLlib Tools
  • Various ML algorithms supported by MLlib
  • Supervised Learning Algorithms
  • Unsupervised Learning Algorithms
  • ML workflow utilities

Hands-On:

  • K- Means Clustering
  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • Random Forest

Databricks Spark

  • Why Databricks?
  • Databricks with Microsoft Azure
  • Spark Databricks Analytics in Azure
  • Provisioning Databricks workspace in Azure portal
  • Developing Spark ML application
  • Developing Spark Streaming applications (real-time Twitter data)
  • Optimizing Spark Performance

Spark using Java

  • Running Apache Spark on Java
  • Executing a Spark example program in a Java environment

Introduction to NoSQL and MongoDB

RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples

MongoDB Installation

Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) installation and MongoDB data types

Hands-on Exercise: Install MongoDB and install MongoChef (MongoGUI)

Importance of NoSQL

The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP Theorem, Base property, learning about JSON/BSON, database collection and documentation, MongoDB uses, MongoDB write concern—acknowledged, replica acknowledged, unacknowledged, journaled—and Fsync

Hands-on Exercise: Write a JSON document
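
The exercise above can be sketched with Python's stdlib json module — JSON being the format MongoDB's BSON documents are based on (the document fields here are illustrative):

```python
import json

# Write a JSON document, then parse it back.
doc = {"name": "Asha", "skills": ["Spark", "MongoDB"], "active": True}

text = json.dumps(doc)       # serialize to a JSON string
restored = json.loads(text)  # parse it back into a dict

print(restored["skills"][1])  # MongoDB
```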

CRUD Operations

Understanding CRUD and its functionality, CRUD concepts, MongoDB query and syntax and read and write queries and query optimization

Hands-on Exercise: Use insert query to create a data entry, use find query to read data, use update and replace queries to update and use delete query operations on a DB file

Data Modeling and Schema Design

Concepts of data modelling, difference between MongoDB and RDBMS modelling, model tree structure, operational strategies, monitoring and backup

Hands-on Exercise: Write a data model tree structure for a family hierarchy

Data Management and Administration

In this module, you will learn MongoDB® Administration activities such as health check, backup, recovery, database sharding and profiling, data import/export, performance tuning, etc.

Hands-on Exercise: Use shard key and hashed shard keys, perform backup and recovery of a dummy dataset, import data from a CSV file and export data to a CSV file

Data Indexing and Aggregation

Concepts of data aggregation and types and data indexing concepts, properties and variations

Hands-on Exercise: Do aggregation using pipeline, sort, skip and limit and create index on data using single key and using multi-key

MongoDB Security

Understanding database security risks, MongoDB's security concepts and approach, and MongoDB integration with Java and Robomongo

Hands-on Exercise: MongoDB integration with Java and Robomongo
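
A typical first step in MongoDB's security approach is role-based access control. The sketch below creates a database user and then enables authorization; the username, password and database name are placeholders only.

```shell
mongosh admin --quiet <<'EOF'
db.createUser({
  user: "appUser",
  pwd:  "changeMe",
  roles: [ { role: "readWrite", db: "training" } ]
});
EOF

# Then enable authentication in /etc/mongod.conf and restart the server:
#   security:
#     authorization: enabled
sudo systemctl restart mongod
```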

Working with Unstructured Data

Implementing techniques to work with a variety of unstructured data such as images, videos and log data, and understanding GridFS, MongoDB's file system for storing such data

Hands-on Exercise: Work with a variety of unstructured data such as images, videos and log data
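
GridFS is usually exercised through a driver, but the bundled `mongofiles` tool gives a quick command-line view of it. A running mongod is assumed; the database and file names are illustrative.

```shell
# Store a large binary file in GridFS (it is split into chunks server-side)
mongofiles --db media put lecture.mp4

# List the files recorded in the fs.files bucket
mongofiles --db media list

# Retrieve the file back to the local disk
mongofiles --db media get lecture.mp4
```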

What projects will I be working on in this MongoDB training?

Project: Working with the MongoDB Java Driver

Industry: General

Problem Statement: How to create a collection for video insertion using Java

Topics: In this project, you will work with the MongoDB Java Driver and become proficient in creating a collection for inserting videos using Java programming. You will work with collections and documents and understand the read and write basics of the MongoDB database and the Java Virtual Machine libraries.

Highlights:

  • Setting up the MongoDB Java Driver
  • Connecting to the database
  • Java virtual machine libraries

Linux Installation and Initialization (Automated)

Multiple Linux installations, automated installation using Kickstart, deploying it using a web server, and installation media and Kickstart configuration files.

Package management & Process monitoring

A Linux package is a pre-built set of programs; installing packages along with their libraries and dependencies, understanding the low-level and high-level tools needed, configuring a Linux Yum server, and understanding the Yum repository server-client system.
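
The high-level vs low-level distinction can be sketched with yum and rpm (root is required, and the `httpd` package and the repository hostname below are illustrative):

```shell
# High-level: yum resolves and installs dependencies automatically
sudo yum install -y httpd

# Low-level: rpm queries a single installed package
rpm -qi httpd              # package metadata
rpm -ql httpd | head       # files the package installed

sudo yum remove -y httpd

# A Yum client points at a repository through a .repo file, e.g.:
#   /etc/yum.repos.d/local.repo
#   [local]
#   name=Local repository
#   baseurl=http://yumserver.example.com/repo/
#   enabled=1
#   gpgcheck=0
```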

Services, Utilities, Important Files and Directories

A deep dive into Linux services and the important system files, utilities and directories.

Understanding systemd

Learning about systemd, the Linux-specific system and service manager; understanding which programs run when Linux loads; and familiarizing yourself with the systemctl commands.
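
The everyday systemctl commands covered here can be sketched as follows; the service names (`sshd`, `nginx`) are illustrative and depend on what is installed.

```shell
systemctl status sshd                                # state of one unit
systemctl list-units --type=service --state=running  # what is running now
sudo systemctl enable --now nginx                    # start now and at boot
systemctl list-dependencies multi-user.target        # what loads with the system
journalctl -u sshd --since today                     # logs for one unit
```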

Linux User Administration

Linux user management; groups, attributes and file permissions; granting permissions based on the tasks to be performed; various advanced user administration features; and setting user and group disk space quotas with Linux file system quotas.
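
A small sketch of task-based permissions and quotas (requires root; the user, group and paths are illustrative):

```shell
# Create a group and a user belonging to it
sudo groupadd analysts
sudo useradd -m -G analysts ravi
sudo passwd ravi

# A shared directory writable only by the group; the setgid bit (2) makes
# new files inherit the "analysts" group
sudo mkdir -p /srv/reports
sudo chown root:analysts /srv/reports
sudo chmod 2770 /srv/reports

# Disk quota for one user, soft/hard limits in 1 KB blocks (the filesystem
# must be mounted with the usrquota option)
sudo setquota -u ravi 500000 600000 0 0 /home
```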

File System Management (Generic & LVM)

Managing the Linux file system; understanding Logical Volume Management (LVM); allocating disks; striping, mirroring and resizing logical volumes; and deploying LVM to set up hard disks as physical volumes.
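
The LVM workflow (physical volume, volume group, logical volume) can be sketched like this; the device names and sizes are illustrative and the commands require root.

```shell
# Mark two disks as physical volumes and pool them into a volume group
sudo pvcreate /dev/sdb /dev/sdc
sudo vgcreate data_vg /dev/sdb /dev/sdc

# Carve out a 20 GB logical volume, put a filesystem on it, and mount it
sudo lvcreate -n data_lv -L 20G data_vg
sudo mkfs.ext4 /dev/data_vg/data_lv
sudo mount /dev/data_vg/data_lv /mnt/data

# Later, grow the volume and (-r) resize the filesystem with it, online
sudo lvextend -L +10G -r /dev/data_vg/data_lv
```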

Advanced File System Management (Software RAID)

Understanding the concept of RAID data storage virtualization and the RAID software management tool, learning about the Linux kernel with RAID support, and implementing software RAID in Linux.
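
Software RAID in Linux is managed with mdadm; a mirrored (RAID 1) array can be sketched as below. Device names are illustrative, root is required, and the config-file path varies by distribution.

```shell
# Create a RAID 1 array from two disks and watch it sync
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
cat /proc/mdstat

# Put a filesystem on the array and inspect its health
sudo mkfs.ext4 /dev/md0
sudo mdadm --detail /dev/md0

# Persist the array definition (path is /etc/mdadm.conf on some distros)
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
```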

Server-Client Configurations (FTP / SFTP / HTTP)

Learning about server configuration in Linux, and configuring FTP/SFTP and HTTP server/client systems.

Configuring Samba and SMTP

Understanding the Samba open-source tool and the Simple Mail Transfer Protocol, and configuring the Samba server and SMTP with Linux.

Firewall & IP Tables

Understanding the basics of firewalls in Linux, deploying iptables, a rule-based firewall system for Linux, and testing the firewall rules.
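
A minimal rule-based iptables policy, allowing SSH and HTTP inbound and dropping everything else, can be sketched as follows (requires root; run from console access, since a mistake can lock out remote sessions):

```shell
# Keep replies to outbound traffic and loopback traffic flowing
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -i lo -j ACCEPT

# Allow SSH (22) and HTTP (80)
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT

# Default policy: drop all other inbound traffic
sudo iptables -P INPUT DROP

# Test: list the rules with packet counters
sudo iptables -L -n -v
```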

Database Configuration (MySQL / Mariadb)

Understanding how to configure databases in Linux, and working with the MySQL and MariaDB databases to configure them with Linux.
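
A basic MariaDB setup on an RPM-based distribution can be sketched like this; the database, user name and password are placeholders.

```shell
# Install, start and harden the server
sudo yum install -y mariadb-server
sudo systemctl enable --now mariadb
sudo mysql_secure_installation     # set root password, drop test database

# Create a database and an application user
sudo mysql -e "CREATE DATABASE appdb;
  CREATE USER 'app'@'localhost' IDENTIFIED BY 'changeMe';
  GRANT ALL PRIVILEGES ON appdb.* TO 'app'@'localhost';"
```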

Using Control Panels to Manage Linux Servers (Webmin)

Learning about the various control panels in Linux and their uses in various services, and deploying these control panels to manage Linux servers.

Certification

Intellipaat’s comprehensive Spark Master certification program is led by industry experts from India and the United States. The video sessions in this training will help you grasp all the significant concepts of Spark and acquire the necessary skill sets. You will also have an online platform on which you can raise and clear any of your doubts at any time throughout the course. Besides, you will get acquainted with like-minded individuals in the same training program who are looking into the same career field.

After the course, you will gain hands-on experience by working in various industry-based projects that will substantiate your learning.

Once you execute all the projects successfully, you will be awarded Intellipaat’s Spark Master certification, which is offered in collaboration with Microsoft and IBM. Our main aim is to prepare you for job interviews through mock interviews and resume creation and to help you find a lucrative job in a reputed organization.

Our alumni work at 3000+ top companies

Course Advisor

Suresh Paritala

Solutions Architect at Microsoft, USA

A Senior Software Architect at NextGen Healthcare who has previously worked with IBM Corporation, Suresh Paritala has worked on Big Data, Data Science, Advanced Analytics, Internet of Things and Azure, along with AI domains like Machine Learning and Deep Learning. He has successfully implemented high-impact projects.

David Callaghan

Big Data Solutions Architect, USA

An experienced Blockchain Professional who has been bringing integrated Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud, Big Data and Pentaho projects that have had major impact on revenues of marquee brands around the world.

Samanth Reddy

Data Team Lead at Sony, USA

A renowned Data Scientist who has worked with Google and is currently working at ASCAP, Samanth Reddy has a proven ability to develop Data Science strategies that have a high impact on the revenues of various organizations. He comes with strong Data Science expertise and has created decisive Data Science strategies for Fortune 500 corporations.

Frequently Asked Questions on Spark Master

Why should I sign up for this online Spark Master’s Course from Intellipaat?

Intellipaat offers one of the best online Master’s courses for Spark. This course aims to help you master all the significant concepts of Spark, Python, Scala, and PySpark. Further, this online training will assist you in acquiring all the necessary skills required to become a Spark professional. Besides, throughout the duration of the course, we will provide 24-hour support.

You will have complete access to the course material and video lectures at no additional cost. After the course, you will work on assignments and real-time projects on certain modules that will give you an idea of your grasp of the concepts learned. You will also receive Spark certification not only from Intellipaat but also from IBM and Microsoft. Additionally, we will provide job assistance via mock interviews, along with resume preparation.

At Intellipaat, you can enroll either in the instructor-led online training or in self-paced training. Apart from this, Intellipaat also offers corporate training for organizations looking to upskill their workforce. All trainers at Intellipaat have 12+ years of relevant industry experience, and they have been actively working as consultants in the same domain, which makes them subject matter experts. Go through the sample videos to check the quality of our trainers.

Intellipaat offers 24/7 query resolution, and you can raise a ticket with the dedicated support team at any time. You can avail yourself of email support for all your queries. If your query does not get resolved through email, we can also arrange one-on-one sessions with the trainers. You can contact Intellipaat support even after the completion of the training. We also do not put a limit on the number of tickets you can raise for query resolution and doubt clearance.

Intellipaat offers self-paced training to those who want to learn at their own pace. This training also gives you the benefits of query resolution through email, one-on-one sessions with the trainers, round-the-clock support, and lifetime access to the learning modules on our LMS. You also get the latest version of the course material at no added cost. Intellipaat's self-paced training is priced 75% lower than our online instructor-led training. If you face any problems while learning, we can always arrange a virtual live class with the trainers.

Intellipaat offers the most updated, relevant and high-value real-world projects as part of the training program. This way, you can implement what you have learned in a real-world industry setup. All training comes with multiple projects that thoroughly test your skills, learning and practical knowledge, making you completely industry-ready. You will work on highly exciting projects in the domains of high technology, e-commerce, marketing, sales, networking, banking, insurance, etc. Upon successful completion of the projects, your skills will be considered equal to six months of rigorous industry experience.

Intellipaat actively provides placement assistance to all learners who have successfully completed the training. For this, we are exclusively tied up with over 80 top MNCs from around the world. This way, you can be placed in outstanding organizations such as Sony, Ericsson, TCS, Mu Sigma, Standard Chartered, Cognizant and Cisco, among other equally great enterprises. We also help you with job interview and resume preparation.

You can definitely make the switch from self-paced training to online instructor-led training by simply paying the extra amount and joining the next batch, of which you will be notified.

Once you complete the Intellipaat training program along with all the real-world projects, quizzes and assignments, and score at least 60% in the qualifying exam, you will be awarded the Intellipaat verified certification. This certificate is well recognized in Intellipaat-affiliated organizations, which include over 80 top MNCs from around the world, many of which are also part of the Fortune 500 list.

No. Our job assistance program is aimed at helping you land your dream job. It offers a potential opportunity for you to explore various competitive openings in the corporate world and assists you in finding a well-paid job matching your profile. The final decision on your hiring will always be based on your performance in the interview and the requirements of the recruiter.