
Big Data Architect Master's Course

Master Program

Our Big Data Architect master's course lets you gain proficiency in Big Data. You will work on real-world projects in Hadoop Development, Hadoop Administration, Hadoop Analysis, Hadoop Testing, Spark, Python, Splunk Developer and Admin, Apache Storm, NoSQL databases and more. In this program, you will complete 12 courses and work on 31 industry-based projects. As a part of this online classroom training, you will receive four additional self-paced courses co-created with IBM, namely, Spark Fundamentals I and II, Spark MLlib and Python for Data Science.

In Collaboration with IBM
  • 12+ Courses
  • 31+ Projects
  • 277 Hours

  • Online Classroom Training

    • Big Data Hadoop & Spark
    • Apache Spark & Scala
    • Splunk Developer & Admin
    • Python for Data Science
    • Pyspark
    • MongoDB
  • Self Paced Training

    • Hadoop Testing 
    • Apache Storm 
    • Apache Kafka 
    • Apache Cassandra 
    • Java 
    • Linux 

Key Features

173 Hrs Instructor Led Training
277 Hrs Self-paced Videos
384 Hrs Project work & Exercises
Certification and Job Assistance
Flexible Schedule
Lifetime Free Upgrade
24 x 7 Lifetime Support & Access

Course Fees

Self Paced Training

  • 277 Hrs e-learning videos
  • Lifetime Free Upgrade
  • 24 x 7 Lifetime Support & Access
  • Flexi-scheduling
$527

Online Classroom preferred

  • Everything in self-paced, plus
  • 173 Hrs of instructor-led training
  • 1:1 doubt resolution sessions
  • Attend as many batches as you want for a lifetime
  • Flexible Schedule
  • 02 Jun, TUE - FRI, 07:00 AM to 09:00 AM IST (GMT +5:30)
  • 06 Jun, SAT - SUN, 08:00 PM to 11:00 PM IST (GMT +5:30)
  • 13 Jun, SAT - SUN, 08:00 PM to 11:00 PM IST (GMT +5:30)
  • 20 Jun, SAT - SUN, 08:00 PM to 11:00 PM IST (GMT +5:30)
$3,033 $843 (10% off)

Corporate Training

  • Customized Learning
  • Enterprise grade learning management system (LMS)
  • 24x7 support
  • Strong Reporting

About Big Data Architect Course

Intellipaat’s Big Data Architect master’s course will provide you with in-depth knowledge of Big Data platforms like Hadoop, Spark and NoSQL databases, along with detailed, hands-on exposure to analytics and ETL tools. This program is specially designed by industry experts, and you will get 12 courses with 31 industry-based projects.

List of Courses Included:

Online Instructor-led Courses:

  • Big Data Hadoop and Spark
  • Apache Spark and Scala
  • Splunk Developer and Admin
  • Python for Data Science
  • Pyspark Training
  • MongoDB

Self-paced Courses:

  • Hadoop Testing
  • Apache Storm
  • Apache Kafka
  • Apache Cassandra
  • Java
  • Linux
What you will learn in this master's program:

  • Introduction to the Hadoop ecosystem
  • Working with HDFS and MapReduce
  • Real-time analytics with Apache Spark
  • ETL in the Business Intelligence domain
  • Working on large amounts of data with NoSQL databases
  • Real-time message brokering systems
  • Hadoop analysis and testing

Who should take up this course?

  • Data Science and Big Data professionals and Software Developers
  • Business Intelligence professionals, Information Architects and Project Managers
  • Those who aspire to be a Big Data Architect

There are no prerequisites for taking up this training program.

  • Global Hadoop market to reach $84.6 billion in 2 years – Allied Market Research
  • The number of jobs for all US-based data professionals will increase to 2.7 million per year – IBM
  • A Hadoop Administrator in the US can get a salary of $123,000 – Indeed

Big Data is the fastest growing and the most promising technology that aids profiles like Big Data Engineer and Big Data Solutions Architect that are in huge demand. This Big Data Architect master’s course will help you grab the best jobs in this domain.

This Intellipaat training program has been specifically created to let you master the Hadoop architecture, along with helping you gain proficiency in Business Intelligence domain. Upon the completion of the training, you will be well-versed in extracting valuable business insights from raw data. This way, you can apply for top jobs in the Big Data ecosystem.


Testimonials


Kunal Sharma

Senior Big Data Analyst at Accenture

Dear all, Intellipaat's course is nicely split in small parts making it very well suitable for learning, even with short time slot available. I appreciate the availability of video and transcript for each training sessions.

Ruchita Vijay

Software Engineer at Accenture

Videos are very informative and to the point, and they also highlight the applications and alternate scenarios very well. This course was an excellent revision of core concepts for me.

Praveen Chaudhary

Senior Consultant at Atos Syntel

The course is good on hands-on activity. I met with an expert in this field, who is good in subject matter.

Aalap Raj

Hadoop Developer at Cognizant

Excellent videos and interactive mode of online teaching. Instructors are very informed and clear while communicating.

Pravesh Bisht

Hadoop Developer at Infosys

The course is a great opportunity to get some quick practical experience with Hadoop and its related sub-projects. Also, the provided vm image could be used for future projects and deepening your understanding of Hadoop framework.

GITIKA KAPOOR

Project Lead at Wipro Technologies

Intellipaat is probably the best in the market today. The trainers are extremely good as they sound confident in whatever they teach. Plus, the support team is prompt and engaged with me even after course completion.

Course Content

Hadoop Installation and Setup

The architecture of Hadoop cluster, what is High Availability and Federation, how to setup a production cluster, various shell commands in Hadoop, understanding configuration files in Hadoop, installing single node cluster with Cloudera Manager and understanding Spark, Scala, Sqoop, Pig and Flume

Introduction to Big Data Hadoop and Understanding HDFS and MapReduce

Introducing Big Data and Hadoop, what is Big Data and where does Hadoop fit in, two important Hadoop ecosystem components, namely, MapReduce and HDFS, in-depth Hadoop Distributed File System – Replications, Block Size, Secondary Name node, High Availability and in-depth YARN – resource manager and node manager

Hands-on Exercise: HDFS working mechanism, data replication process, how to determine the size of the block, understanding a data node and name node

Deep Dive in MapReduce

Learning the working mechanism of MapReduce, understanding the mapping and reducing stages in MR, various terminologies in MR like Input Format, Output Format, Partitioners, Combiners, Shuffle and Sort

Hands-on Exercise: How to write a Word Count program in MapReduce, how to write a Custom Partitioner, what is a MapReduce Combiner, how to run a job in a local job runner, deploying unit test, what is a map side join and reduce side join, what is a tool runner, how to use counters, dataset joining with map side and reduce side joins
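To make the word-count exercise above concrete, here is a minimal Hadoop Streaming sketch in Python, shown as two separate scripts; the HDFS paths and the streaming-jar location mentioned afterwards are placeholders and vary by distribution.

# mapper.py – emits one "word<TAB>1" pair per word read from standard input
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")

# reducer.py – Hadoop sorts the mapper output by key, so equal words arrive together
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")

A typical submission would look like: hadoop jar /path/to/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/demo/books -output /user/demo/wordcount (all paths here are illustrative).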

Introduction to Hive

Introducing Hadoop Hive, detailed architecture of Hive, comparing Hive with Pig and RDBMS, working with Hive Query Language, creation of database, table, Group by and other clauses, various types of Hive tables, HCatalog, storing the Hive Results, Hive partitioning and Buckets

Hands-on Exercise: Database creation in Hive, dropping a database, Hive table creation, how to change the database, data loading, Hive table creation, dropping and altering table, pulling data by writing Hive queries with filter conditions, table partitioning in Hive and what is a Group by clause
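One way to sketch the database creation, partitioning and Group by steps above is to run the HiveQL through PySpark with Hive support enabled; the database, table and column names below are invented for illustration, and a Hive metastore (or Spark's embedded local one) is assumed.

from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("hive-demo")
         .enableHiveSupport().getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS retail")
spark.sql("""
    CREATE TABLE IF NOT EXISTS retail.orders (
        order_id INT, amount DOUBLE
    ) PARTITIONED BY (order_date STRING)
""")

# Load one partition, then query it with a filter and a GROUP BY clause
spark.sql("""
    INSERT INTO retail.orders PARTITION (order_date='2020-06-02')
    VALUES (1, 120.5), (2, 80.0)
""")
spark.sql("""
    SELECT order_date, COUNT(*) AS orders, SUM(amount) AS total
    FROM retail.orders
    WHERE amount > 50
    GROUP BY order_date
""").show()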

Advanced Hive and Impala

Indexing in Hive, the Map Side Join in Hive, working with complex data types, the Hive User-defined Functions, Introduction to Impala, comparing Hive with Impala, the detailed architecture of Impala

Hands-on Exercise: How to work with Hive queries, the process of joining table and writing indexes, external table and sequence table deployment and data storage in a different table

Introduction to Pig

Apache Pig introduction, its various features, various data types and schema in Pig, the available functions in Pig, and Pig Bags, Tuples and Fields

Hands-on Exercise: Working with Pig in MapReduce and local mode, loading of data, limiting data to 4 rows, storing the data into files and working with Group By, Filter By, Distinct, Cross and Split in Pig

Flume, Sqoop and HBase

Apache Sqoop introduction, overview, importing and exporting data, performance improvement with Sqoop, Sqoop limitations, introduction to Flume and understanding the architecture of Flume and what is HBase and the CAP theorem

Hands-on Exercise: Working with Flume to generate a sequence number and consume it, using the Flume agent to consume Twitter data, using AVRO to create a Hive table, using AVRO with Pig, creating a table in HBase and deploying the Disable, Scan and Enable table operations

Writing Spark Applications Using Scala

Using Scala for writing Apache Spark applications, detailed study of Scala, the need for Scala, the concept of object oriented programming, executing the Scala code, various classes in Scala like Getters, Setters, Constructors, Abstract, Extending Objects, Overriding Methods, the Java and Scala interoperability, the concept of functional programming and anonymous functions, Bobsrockets package and comparing the mutable and immutable collections, Scala REPL, Lazy Values, Control Structures in Scala, Directed Acyclic Graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, Spark in Hadoop ecosystem.

Hands-on Exercise: Writing Spark application using Scala, understanding the robustness of Scala for Spark real-time analytics operation

Spark framework

Detailed Apache Spark, its various features, comparing with Hadoop, various Spark components, combining HDFS with Spark, Scalding, introduction to Scala and importance of Scala and RDD
Hands-on Exercise: The Resilient Distributed Dataset in Spark and how it helps to speed up Big Data processing

RDD in Spark

Understanding the Spark RDD operations, comparison of Spark with MapReduce, what is a Spark transformation, loading data in Spark, types of RDD operations viz. transformation and action and what is a Key/Value pair
Hands-on Exercise: How to deploy RDD with HDFS, using the in-memory dataset, using file for RDD, how to define the base RDD from external file, deploying RDD via transformation, using the Map and Reduce functions and working on word count and count log severity
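A minimal PySpark sketch of the RDD exercise above; the HDFS input path is a placeholder.

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-demo")

# Define a base RDD from an external file (placeholder path), then transform it
lines = sc.textFile("hdfs:///user/demo/input.txt")
counts = (lines.flatMap(lambda line: line.split())   # transformation
               .map(lambda word: (word, 1))          # key/value pairs
               .reduceByKey(lambda a, b: a + b))     # transformation
print(counts.take(10))                               # action triggers execution
sc.stop()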

Data Frames and Spark SQL

The detailed Spark SQL, the significance of SQL in Spark for working with structured data processing, Spark SQL JSON support, working with XML data and parquet files, creating Hive Context, writing Data Frame to Hive, how to read a JDBC file, significance of a Spark Data Frame, how to create a Data Frame, what is schema manual inferring, how to work with CSV files, JDBC table reading, data conversion from Data Frame to JDBC, Spark SQL user-defined functions, shared variable and accumulators, how to query and transform data in Data Frames, how Data Frame provides the benefits of both Spark RDD and Spark SQL and deploying Hive on Spark as the execution engine

Hands-on Exercise: Data querying and transformation using Data Frames and finding out the benefits of Data Frames over Spark SQL and Spark RDD
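A short sketch of the Data Frame and Spark SQL workflow above; the file name and column names are assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-demo").getOrCreate()

# Read structured data; Spark infers the schema from the JSON records
df = spark.read.json("people.json")      # placeholder file
df.printSchema()

# Query the same data through SQL by registering a temporary view
df.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30").show()

# The equivalent DataFrame-API transformation
df.filter(df.age > 30).select("name", "age").show()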

Machine Learning Using Spark (MLlib)

Introduction to Spark MLlib, understanding various algorithms, what is Spark iterative algorithm, Spark graph processing analysis, introducing Machine Learning, K-Means clustering, Spark variables like shared and broadcast variables and what are accumulators, various ML algorithms supported by MLlib, Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-means clustering techniques, building a Recommendation Engine
Hands-on Exercise:  Building a Recommendation Engine
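One way to sketch the recommendation-engine exercise is with the DataFrame-based ALS API in pyspark.ml; the tiny in-line ratings table below stands in for a real dataset.

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recommender").getOrCreate()

# Toy (user, movie, rating) rows standing in for a real ratings dataset
ratings = spark.createDataFrame(
    [(0, 10, 4.0), (0, 11, 1.0), (1, 10, 5.0), (1, 12, 2.0), (2, 11, 3.0)],
    ["userId", "movieId", "rating"])

als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=5, maxIter=5, coldStartStrategy="drop")
model = als.fit(ratings)

# Top-2 movie recommendations per user
model.recommendForAllUsers(2).show(truncate=False)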

Integrating Apache Flume and Apache Kafka

Why Kafka, what is Kafka, Kafka architecture, Kafka workflow, configuring Kafka cluster, basic operations, Kafka monitoring tools, integrating Apache Flume and Apache Kafka
Hands-on Exercise:  Configuring Single Node Single Broker Cluster, Configuring Single Node Multi Broker Cluster, Producing and consuming messages, Integrating Apache Flume and Apache Kafka.

Spark Streaming

Introduction to Spark streaming, the architecture of Spark streaming, working with the Spark streaming program, processing data using Spark streaming, requesting count and DStream, multi-batch and sliding window operations and working with advanced data sources, Introduction to Spark Streaming, features of Spark Streaming, Spark Streaming workflow, initializing StreamingContext, Discretized Streams (DStreams), Input DStreams and Receivers, transformations on DStreams, Output Operations on DStreams, Windowed Operators and why it is useful, important Windowed Operators, Stateful Operators.
Hands-on Exercise: Twitter Sentiment Analysis, streaming using netcat server, Kafka-Spark Streaming and Spark-Flume Streaming
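A minimal DStream sketch for the netcat exercise above; run nc -lk 9999 in another terminal, and note that the host and port are assumptions.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "network-wordcount")
ssc = StreamingContext(sc, batchDuration=5)   # 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()                               # output operation on the DStream

ssc.start()
ssc.awaitTermination()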

Hadoop Administration – Multi-node Cluster Setup Using Amazon EC2

Create a 4-node Hadoop cluster setup, running the MapReduce Jobs on the Hadoop cluster, successfully running the MapReduce code and working with the Cloudera Manager setup
Hands-on Exercise: The method to build a multi-node Hadoop cluster using an Amazon EC2 instance and working with the Cloudera Manager

Hadoop Administration – Cluster Configuration

The overview of Hadoop configuration, the importance of Hadoop configuration file, the various parameters and values of configuration, the HDFS parameters and MapReduce parameters, setting up the Hadoop environment, the Include and Exclude configuration files, the administration and maintenance of name node, data node directory structures and files, what is a File system image and understanding Edit log.
Hands-on Exercise: The process of performance tuning in MapReduce

Hadoop Administration – Maintenance, Monitoring and Troubleshooting

Introduction to the checkpoint procedure, name node failure and how to ensure the recovery procedure, Safe Mode, Metadata and Data backup, various potential problems and solutions, what to look for and how to add and remove nodes

Hands-on Exercise: How to go about ensuring the MapReduce File System Recovery for different scenarios, JMX monitoring of the Hadoop cluster, how to use the logs and stack traces for monitoring and troubleshooting, using the Job Scheduler for scheduling jobs in the same cluster, getting the MapReduce job submission flow, FIFO schedule and getting to know the Fair Scheduler and its configuration

ETL Connectivity with Hadoop Ecosystem (Self-Paced)

How ETL tools work in Big Data industry, introduction to ETL and data warehousing, working with prominent use cases of Big Data in ETL industry and end-to-end ETL PoC showing Big Data integration with ETL tool
Hands-on Exercise: Connecting to HDFS from ETL tool and moving data from Local system to HDFS, moving data from DBMS to HDFS, working with Hive with ETL Tool and creating MapReduce job in ETL tool

Project Solution Discussion and Cloudera Certification Tips and Tricks

Working towards the solution of the Hadoop project solution, its problem statements and the possible solution outcomes, preparing for the Cloudera certifications, points to focus for scoring the highest marks and tips for cracking Hadoop interview questions

Hands-on Exercise: The project of a real-world high value Big Data Hadoop application and getting the right solution based on the criteria set by the Intellipaat team

The following topics will be available only in self-paced mode:

Hadoop Application Testing

Why testing is important, Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end-to-end tests, Functional testing, Release certification testing, Security testing, Scalability testing, Commissioning and Decommissioning of data nodes testing, Reliability testing and Release testing

Roles and Responsibilities of Hadoop Testing Professional

Understanding the Requirement, preparation of the Testing Estimation, Test Cases, Test Data, Test Bed Creation, Test Execution, Defect Reporting, Defect Retest, Daily Status report delivery, Test completion, ETL testing at every stage (HDFS, Hive and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, which includes, but is not limited to, data verification, Reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges, etc.), reporting defects to the development team or manager and driving them to closure, consolidating all the defects and creating defect reports, and validating new features and issues in Core Hadoop

Framework Called MRUnit for Testing of MapReduce Programs

Reporting defects to the development team or manager and driving them to closure, consolidating all the defects and creating defect reports, and using the MRUnit framework for testing MapReduce programs

Unit Testing

Automation testing using Oozie and data validation using the QuerySurge tool

Test Execution

Test plan for HDFS upgrade, test automation and result

Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

How to test, install and configure

What Hadoop Projects Will You Be Working on?

Project 1: Working with MapReduce, Hive and Sqoop

Industry: General

Problem Statement: How to successfully import data using Sqoop into HDFS for data analysis

Topics: As part of this project, you will work on various Hadoop components like MapReduce, Apache Hive and Apache Sqoop. You will work with Sqoop to import data from a relational database management system like MySQL into HDFS. You will deploy Hive for summarizing data, querying and analysis, and write HiveQL queries that run MapReduce on the transferred data. You will gain considerable proficiency in Hive and Sqoop after the completion of this project.

Highlights:

  • Sqoop data transfer from RDBMS to Hadoop
  • Coding in Hive Query Language
  • Data querying and analysis

Project 2: Work on MovieLens data for finding the top movies

Industry: Media and Entertainment

Problem Statement: How to create the top-ten-movies list using the MovieLens data

Topics: In this project, you will work exclusively on data collected through the available MovieLens rating datasets. The project involves writing a MapReduce program to analyze the MovieLens data and create the list of the top ten movies. You will also work with Apache Pig and Apache Hive for working with distributed datasets and analyzing them.

Highlights:

  • MapReduce program for working on the data file
  • Apache Pig for analyzing data
  • Apache Hive data warehousing and querying

Project 3:  Hadoop YARN Project; End-to-end PoC

Industry: Banking

Problem Statement: How to bring the daily data (incremental data) into the Hadoop Distributed File System

Topics: In this project, we have transaction data that is recorded/stored daily in the RDBMS. This data is transferred every day into HDFS for further Big Data analytics. You will work on a live Hadoop YARN cluster. YARN is the part of the Hadoop ecosystem that lets Hadoop decouple from MapReduce and support a wider array of processing applications. You will work on the YARN central resource manager.

Highlights:

  • Using Sqoop commands to bring the data into HDFS
  • End-to-end flow of transaction data
  • Working with the data from HDFS

Project 4: Table Partitioning in Hive

Industry: Banking

Problem Statement:  How to improve the query speed using Hive data partitioning

Topics: This project involves working with Hive table data partitioning. Ensuring the right partitioning helps to read the data, deploy it on the HDFS and run the MapReduce jobs at a much faster rate. Hive lets you partition data in multiple ways. This will give you hands-on experience in partitioning of Hive tables manually, deploying single SQL execution in dynamic partitioning and bucketing of data so as to break it into manageable chunks.

Highlights:

  • Manual Partitioning
  • Dynamic Partitioning
  • Bucketing

Project 5: Connecting Pentaho with Hadoop Ecosystem

Industry: Social Network

Problem Statement:  How to deploy ETL for data analysis activities

Topics: This project lets you connect Pentaho with the Hadoop ecosystem. Pentaho works well with HDFS, HBase, Oozie and ZooKeeper. You will connect the Hadoop cluster with Pentaho data integration, analytics, Pentaho server and report designer. This project will give you complete working knowledge on the Pentaho ETL tool.

Highlights:

  • Working knowledge of ETL and Business Intelligence
  • Configuring Pentaho to work with Hadoop distribution
  • Loading, transforming and extracting data into Hadoop cluster

Project 6: Multi-node Cluster Setup

Industry: General

Problem Statement: How to set up a real-time Hadoop cluster on Amazon EC2

Topics: This project gives you the opportunity to work on a real-world, multi-node Hadoop cluster setup in a distributed environment. You will get a complete demonstration of working with the various master and slave nodes of a Hadoop cluster, installing Java as a prerequisite for running Hadoop, installing Hadoop and mapping the nodes in the Hadoop cluster.

Highlights:

  • Hadoop installation and configuration
  • Running a Hadoop multi-node using a 4-node cluster on Amazon EC2
  • Deploying of MapReduce job on the Hadoop cluster

Project 7: Hadoop Testing Using MRUnit

Industry: General

Problem Statement:  How to test MapReduce applications

Topics: In this project, you will gain proficiency in Hadoop MapReduce code testing using MRUnit. You will learn about real-world scenarios of deploying MRUnit, Mockito and PowerMock. This will give you hands-on experience in various testing tools for Hadoop MapReduce. After completion of this project you will be well-versed in test-driven development and will be able to write light-weight test units that work specifically on the Hadoop architecture.

Highlights:

  • Writing JUnit tests using MRUnit for MapReduce applications
  • Doing mock static methods using PowerMock and Mockito
  • MapReduce Driver for testing the map and reduce pair

Project 8: Hadoop Web Log Analytics

Industry: Internet Services

Problem Statement: How to derive insights from web log data

Topics: This project involves making sense of web log data in order to derive valuable insights from it. You will load the server data onto a Hadoop cluster using various techniques. The web log data can include the URLs visited, cookie data, user demographics, location, date and time of web service access, etc. In this project, you will transport the data using Apache Flume or Kafka and handle workflow and data cleansing using MapReduce, Pig or Spark. The insights thus derived can be used for analyzing customer behavior and predicting buying patterns.

Highlights:

  • Aggregation of log data
  • Apache Flume for data transportation
  • Processing of data and generating analytics

Project 9: Hadoop Maintenance

Industry: General

Problem Statement:  How to administer a Hadoop cluster

Topics: This project involves working on the Hadoop cluster to maintain and manage it. You will work on a number of important tasks, including recovering data, recovering from failure, adding and removing machines from the Hadoop cluster and onboarding users on Hadoop.

Highlights:

  • Working with name node directory structure
  • Audit logging, data node block scanner and balancer
  • Failover, fencing, DISTCP and Hadoop file formats

Project 10: Twitter Sentiment Analysis

Industry: Social Media

Problem Statement: Find out the reaction of people to the demonetization move by the Indian government by analyzing their tweets

Topics: This project involves analyzing people's tweets about the demonetization decision taken by the Indian government. You will look for key phrases and words and analyze them using a sentiment dictionary, based on the value attributed to each word and the sentiment it conveys.

Highlights:

  • Download the tweets and load into Pig storage
  • Divide tweets into words to calculate sentiment
  • Rating the words from +5 to −5 using the AFINN dictionary (a small scoring sketch in Python follows this list)
  • Filtering the tweets and analyzing sentiment
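Although the project itself scores the tweets in Pig, the scoring idea can be sketched in a few lines of Python; the AFINN-111.txt path and the sample tweet are assumptions.

# Load the AFINN word list: each line is "word<TAB>score" with scores from -5 to +5
afinn = {}
with open("AFINN-111.txt", encoding="utf-8") as f:
    for line in f:
        word, score = line.rsplit("\t", 1)
        afinn[word] = int(score)

def tweet_sentiment(tweet):
    """Sum the scores of the words that appear in the dictionary."""
    return sum(afinn.get(w, 0) for w in tweet.lower().split())

print(tweet_sentiment("great decision, very happy with the move"))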

Project 11: Analyzing IPL T20 Cricket

Industry:  Sports and Entertainment

Problem Statement: Analyze the entire cricket match and get answers to any question regarding the details of the match

Topics:  This project involves working with the IPL dataset that has information regarding batting, bowling, runs scored, wickets taken and more. This dataset is taken as input, and then it is processed so that the entire match can be analyzed based on the user queries or needs.

Highlights:

  • Load the data into HDFS
  • Analyze the data using Apache Pig or Hive
  • Based on user queries give the right output

Apache Spark Projects

Project 1: Movie Recommendation

Industry: Entertainment

Problem Statement: How to recommend the most appropriate movie to a user based on their taste

Topics: This is a hands-on Apache Spark project deployed for the real-world application of movie recommendations. This project helps you gain essential knowledge in Spark MLlib which is a Machine Learning library; you will know how to create collaborative filtering, regression, clustering and dimensionality reduction using Spark MLlib. Upon finishing the project, you will have first-hand experience in the Apache Spark streaming data analysis, sampling, testing and statistics, among other vital skills.

Highlights:

  • Apache Spark MLlib component
  • Statistical analysis
  • Regression and clustering

Project 2: Twitter API Integration for Tweet Analysis

Industry: Social Media

Problem Statement:  Analyzing the user sentiment based on the tweet

Topics: This is a hands-on project that uses the Twitter API for analyzing tweets. You will integrate the Twitter API and write the essential server-side code in Python or PHP. Finally, you will be able to read the results of various operations by filtering, parsing and aggregating the data depending on the tweet analysis requirement.

Highlights:

  • Making requests to Twitter API
  • Building the server-side codes
  • Filtering, parsing and aggregating data

Project 3: Data Exploration Using Spark SQL – Wikipedia Data Set

Industry: Internet

Problem Statement:  Making sense of Wikipedia data using Spark SQL

Topics: In this project you will be using the Spark SQL tool for analyzing the Wikipedia data. You will gain hands-on experience in integrating Spark SQL for various applications like batch analysis, Machine Learning, visualizing and processing of data and ETL processes, along with real-time analysis of data.

Highlights:

  • Machine Learning using Spark
  • Deploying data visualization
  • Spark SQL integration

Scala Course Content

Introduction to Scala

Introducing Scala, deployment of Scala for Big Data applications and Apache Spark analytics, Scala REPL, Lazy Values, Control Structures in Scala, Directed Acyclic Graph (DAG), First Spark Application Using SBT/Eclipse, Spark Web UI and Spark in Hadoop Ecosystem.

Pattern Matching

The importance of Scala, the concept of REPL (Read Evaluate Print Loop), deep dive into Scala pattern matching, type interface, higher-order function, currying, traits, application space and Scala for data analysis

Executing the Scala Code

Learning about the Scala Interpreter, static object timer in Scala and testing string equality in Scala, implicit classes in Scala, the concept of currying in Scala and various classes in Scala

Classes Concept in Scala

Learning about the Classes concept, understanding the constructor overloading, various abstract classes, the hierarchy types in Scala, the concept of object equality and the val and var methods in Scala

Case Classes and Pattern Matching

Understanding sealed traits, wild, constructor, tuple, variable pattern and constant pattern

Concepts of Traits with Example

Understanding traits in Scala, the advantages of traits, linearization of traits, the Java equivalent and avoiding of boilerplate code

Scala–Java Interoperability

Implementation of traits in Scala and Java and handling of multiple traits extending

Scala Collections

Introduction to Scala collections, classification of collections, the difference between Iterator and Iterable in Scala and example of list sequence in Scala

Mutable Collections Vs. Immutable Collections

The two types of collections in Scala, Mutable and Immutable collections, understanding lists and arrays in Scala, the list buffer and array buffer, queue in Scala and double-ended queue Deque, Stacks, Sets, Maps and Tuples in Scala

Use Case Bobsrockets Package

Introduction to Scala packages and imports, the selective imports, the Scala test classes, introduction to JUnit test class, JUnit interface via JUnit 3 suite for Scala test, packaging of Scala applications in Directory Structure and examples of Spark Split and Spark Scala

Spark Course Content

Introduction to Spark

Introduction to Spark, how Spark overcomes the drawbacks of working on MapReduce, understanding in-memory MapReduce, interactive operations on MapReduce, Spark stack, fine vs. coarse-grained update, Spark Hadoop YARN, HDFS revision, YARN revision, the overview of Spark and how it is better than Hadoop, deploying Spark without Hadoop, Spark history server and Cloudera distribution

Spark Basics

Spark installation guide, Spark configuration, memory management, executor memory vs. driver memory, working with Spark Shell, the concept of resilient distributed datasets (RDD), learning to do functional programming in Spark and the architecture of Spark

Working with RDDs in Spark

Spark RDD, creating RDDs, RDD partitioning, operations and transformation in RDD, deep dive into Spark RDDs, the RDD general operations, a read-only partitioned collection of records, using the concept of RDD for faster and efficient data processing, RDD action for collect, count, collects map, save-as-text-files and pair RDD functions

Aggregating Data with Pair RDDs

Understanding the concept of Key–Value pair in RDDs, learning how Spark makes MapReduce operations faster, various operations of RDD, MapReduce interactive operations, fine and coarse-grained update and Spark stack

Writing and Deploying Spark Applications

Comparing the Spark applications with Spark Shell, creating a Spark application using Scala or Java, deploying a Spark application, Scala built application, creation of mutable list, set and set operations, list, tuple, concatenating list, creating application using SBT, deploying application using Maven, the web user interface of Spark application, a real-world example of Spark and configuring of Spark

Parallel Processing

Learning about Spark parallel processing, deploying on a cluster, introduction to Spark partitions, file-based partitioning of RDDs, understanding of HDFS and data locality, mastering the technique of parallel operations, comparing repartition and coalesce and RDD actions

Spark RDD Persistence

The execution flow in Spark, understanding the RDD persistence overview, Spark execution flow and Spark terminology, distribution shared memory vs. RDD, RDD limitations, Spark shell arguments, distributed persistence, RDD lineage, Key–Value pair for sorting implicit conversions like CountByKey, ReduceByKey, SortByKey and AggregateByKey

Spark MLlib

Introduction to Machine Learning, types of Machine Learning, introduction to MLlib, various ML algorithms supported by MLlib, Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-means clustering techniques and building a Recommendation Engine

Hands-on Exercise:  Building a Recommendation Engine

Integrating Apache Flume and Apache Kafka

Why Kafka, what is Kafka, Kafka architecture, Kafka workflow, configuring Kafka cluster, basic operations, Kafka monitoring tools and integrating Apache Flume and Apache Kafka

Hands-on Exercise: Configuring Single Node Single Broker Cluster, Configuring Single Node Multi Broker Cluster, Producing and consuming messages and integrating Apache Flume and Apache Kafka

Spark Streaming

Introduction to Spark Streaming, features of Spark Streaming, Spark Streaming workflow, initializing StreamingContext, Discretized Streams (DStreams), Input DStreams and Receivers, transformations on DStreams, Output Operations on DStreams, Windowed Operators and why it is useful, important Windowed Operators and Stateful Operators

Hands-on Exercise:  Twitter Sentiment Analysis, streaming using netcat server, Kafka–Spark Streaming and Spark–Flume Streaming

Improving Spark Performance

Introduction to various variables in Spark like shared variables and broadcast variables, learning about accumulators, the common performance issues and troubleshooting the performance problems

Spark SQL and Data Frames

Learning about Spark SQL, the context of SQL in Spark for providing structured data processing, JSON support in Spark SQL, working with XML data, parquet files, creating Hive context, writing Data Frame to Hive, reading JDBC files, understanding the Data Frames in Spark, creating Data Frames, manual inferring of schema, working with CSV files, reading JDBC tables, Data Frame to JDBC, user-defined functions in Spark SQL, shared variables and accumulators, learning to query and transform data in Data Frames, how Data Frame provides the benefit of both Spark RDD and Spark SQL and deploying Hive on Spark as the execution engine

Scheduling/Partitioning

Learning about the scheduling and partitioning in Spark, hash partition, range partition, scheduling within and around applications, static partitioning, dynamic sharing, fair scheduling, Map partition with index, the Zip, GroupByKey, Spark master high availability, standby masters with ZooKeeper, Single-node Recovery with Local File System and High Order Functions

What projects will I be working on in this Spark–Scala training?

Project 1: Movie Recommendation

Topics: This is a project wherein you will gain hands-on experience in deploying Apache Spark for movie recommendation. You will be introduced to the Spark Machine Learning Library (MLlib) and a guide to its algorithms and coding. You will understand how to deploy collaborative filtering, clustering, regression and dimensionality reduction in MLlib. Upon the completion of the project, you will gain experience in working with streaming data, sampling, testing and statistics.

Project 2: Twitter API Integration for Tweet Analysis

Topics: With this project, you will learn to integrate the Twitter API for analyzing tweets. You will write code on the server side using any of the scripting languages, like PHP, Ruby or Python, to request the Twitter API and get the results in JSON format. You will then read the results and perform various operations like aggregation, filtering and parsing as needed for the tweet analysis.

Project 3: Data Exploration Using Spark SQL – Wikipedia Dataset

Topics: This project lets you work with Spark SQL. You will gain experience in working with Spark SQL for combining it with ETL applications, real-time analysis of data, performing batch analysis, deploying Machine Learning, creating visualizations and processing of graphs.

Splunk Development Concepts

Introduction to Splunk and Splunk developer roles and responsibilities

Basic Searching

Writing Splunk query for search, auto-complete to build a search, time range, refine search, working with events, identifying the contents of search and controlling a search job

Hands-on Exercise – Write a basic search query

Using Fields in Searches

What is a Field, how to use Fields in search, deploying Fields Sidebar and Field Extractor for REGEX field extraction and delimiting Field Extraction using FX

Hands-on Exercise – Use Fields in Search, use Fields Sidebar, use Field Extractor (FX) and delimit field Extraction using FX

Saving and Scheduling Searches

Writing Splunk query for search, sharing, saving, scheduling and exporting search results

Hands-on Exercise – Schedule a search, save a search result and share and export a search result

Creating Alerts

How to create alerts, understanding alerts and viewing fired alerts.

Hands-on Exercise – Create an alert in Splunk and view the fired alerts

Scheduled Reports

Describe and configure scheduled reports

Tags and Event Types

Introduction to Tags in Splunk, deploying Tags for Splunk search, understanding event types and utility and generating and implementing event types in search

Hands-on Exercise – Deploy tags for Splunk search and generate and implement event types in search

Creating and Using Macros

What is a Macro and what are variables and arguments in Macros

Hands-on Exercise – First, you define a Macro with arguments and then use variables within it

Workflow

Creating get, post and search workflow actions

Hands-on Exercise – Create get, post and search workflow actions

Splunk Search Commands

Studying the search command, the general search practices, what is a search pipeline, how to specify indexes in search, highlighting the syntax and deploying the various search commands like fields, tables, sort, rename, rex and erex

Hands-on Exercise – Steps to create a search pipeline, search index specification, how to highlight syntax, using the auto-complete feature and deploying the various search commands like sort, fields, tables, rename, rex and erex

Transforming Commands

Using top, rare and stats commands

Hands-on Exercise – Use top, rare and stats commands

Reporting Commands

Using the following commands and their functions: addcoltotals, addtotals, top, rare and stats

Hands-on Exercise – Create reports using the following commands and their functions: addcoltotals and addtotals

Mapping and Single Value Commands

iplocation, geostats, geom and addtotals commands

Hands-on Exercise – Track IP using iplocation and get geo data using geostats

Splunk Reports and Visualizations

Explore the available visualizations, create charts and time charts, omit null values and format results

Hands-on Exercise – Create time charts, omit null values and format results

Analyzing, Calculating and Formatting Results

Calculating and analyzing results, value conversion, roundoff and format values, using the eval command, conditional statements and filtering calculated search results

Hands-on Exercise – Calculate and analyze results, perform conversion on a data value, roundoff numbers, use the eval command, write conditional statements and apply filters on calculated search results

Correlating Events

How to search the transactions, creating report on transactions, grouping events using time and fields and comparing transactions with stats

Hands-on Exercise – Generate report on transactions and group events using fields and time

Enriching Data with Lookups

Learning data lookups, examples and lookup tables, defining and configuring automatic lookups and deploying lookups in reports and searches

Hands-on Exercise – Define and configure automatic lookups and deploy lookups in reports and searches

Creating Reports and Dashboards

Creating search charts, reports and dashboards, editing reports and dashboards and adding reports to dashboards

Hands-on Exercise – Create search charts, reports and dashboards, edit reports and dashboards and add reports to dashboards

Getting Started with Parsing

Working with raw data for data extraction, transformation, parsing and preview

Hands-on Exercise – Extract useful data from raw data, perform transformation and parse different values and preview

Using Pivot

Describe pivot, the relationship between data models and pivot, select a data model object, create a pivot report, create an instant pivot from a search and add a pivot report to a dashboard

Hands-on Exercise – Select a data model object, create a pivot report, create instant pivot from a search and add a pivot report to dashboard

Common Information Model (CIM) Add-On

What is a Splunk CIM and using the CIM Add-On to normalize data

Hands-on Exercise – Use the CIM Add-On to normalize data

Splunk Administration Topics

Overview of Splunk

Introduction to the architecture of Splunk, various server settings, how to set up alerts, various types of licenses, important features of Splunk tool, the requirements of hardware and conditions needed for installation of Splunk

Splunk Installation

How to install and configure Splunk, the creation of index, standalone server’s input configuration, the preferences for search, Linux environment Splunk installation and the administering and architecting of Splunk

Splunk Installation in Linux

How to install Splunk in the Linux environment, the conditions needed for Splunk and configuring Splunk in the Linux environment

Distributed Management Console

Introducing the Splunk distributed management console, indexing of clusters, how to deploy distributed search in the Splunk environment, forwarder management, user authentication and access control

Introduction to Splunk App

Introduction to the Splunk app, how to develop Splunk apps, Splunk app management, Splunk app add-ons, using Splunk-base for installation and deletion of apps, different app permissions and implementation and how to use the Splunk app and apps on forwarder

Splunk Indexes and Users

Details of the index time configuration file and the search time configuration file

Splunk Configuration Files

Understanding of index time and search time configuration files in Splunk, forwarder installation, input and output configuration, Universal Forwarder management and Splunk Universal Forwarder highlights

Splunk Deployment Management

Implementing the Splunk tool, deploying it on the server, Splunk environment setup and Splunk client group deployment

Splunk Indexes

Understanding the Splunk Indexes, the default Splunk Indexes, segregating the Splunk Indexes, learning Splunk Buckets and Bucket Classification, estimating Index storage and creating new Index

User Roles and Authentication

Understanding the concept of role inheritance, Splunk authentications, native authentications and LDAP authentications

Splunk Administration Environment

Splunk installation, configuration, data inputs, app management, Splunk important concepts, parsing machine-generated data, search indexer and forwarder

Basic Production Environment

Introduction to Splunk Configuration Files, Universal Forwarder, Forwarder Management, data management, troubleshooting and monitoring

Splunk Search Engine

Converting machine-generated data into operational intelligence, setting up the dashboard, reports and charts and integrating Search Head Clustering and Indexer Clustering

Various Splunk Input Methods

Understanding the input methods, deploying scripted, Windows and network and agentless input types and fine-tuning them all

Splunk User and Index Management

Splunk user authentication and job role assignment and learning to manage, monitor and optimize Splunk Indexes

Machine Data Parsing

Understanding parsing of machine-generated data, manipulation of raw data, previewing and parsing, data field extraction and comparing single-line and multi-line events

Search Scaling and Monitoring

Distributed search concepts, improving search performance, large-scale deployment and overcoming execution hurdles and working with Splunk Distributed Management Console for monitoring the entire operation

Splunk Cluster Implementation

Cluster indexing, configuring individual nodes, configuring the cluster behavior, index and search behavior, setting node type to handle different aspects of cluster like master node, peer node and search head

What projects will I be working on in this Splunk Developer and Admin training?

Project 1 : Creating an Employee Database of a Company

Industry : General

Problem Statement : How to build a Splunk dashboard where employee details are readily available

Topics : In this project, you will create a text file of employee data with details like full name, salary, designation, ID and so on. You will index the data based on various parameters, use various Splunk commands for evaluating and extracting the information. Finally, you will create a dashboard and add various reports to it.

Highlights :

  • Splunk search and index commands
  • Extracting field in search and saving results
  • Editing event types and adding tags

Project 2 : Building an Organizational Dashboard with Splunk

Industry :  E-commerce

Problem Statement : How to analyze website traffic and gather insights

Topics : In this project, you will build an analytics dashboard for a website and create alerts for various conditions. You will capture the access logs of the web server and upload the sample logs. You will then analyze the top ten users, the average time spent, the peak response time of the website, the top ten errors and the error code descriptions. You will also create a Splunk dashboard for reporting and analysis.

Highlights :

  • Creating bar and line charts
  • Sending alerts for various conditions
  • Providing admin rights for dashboard

Project 3 : Field Extraction in Splunk

Industry : General

Problem Statement : How to extract the fields from event data in Splunk

Topics : In this project, you will learn to extract fields from events using the Splunk field extraction technique. You will gain knowledge of the basics of field extractions, understand the use of the field extractor, the field extraction page in Splunk Web and field extraction configuration in files. You will learn the regular expression and delimiter methods of field extraction. Upon the completion of the project, you will gain expertise in building a Splunk dashboard and using the extracted field data in it to create rich visualizations in an enterprise setup.

Highlight :

  • Field extraction using delimiter method
  • Delimit field extracts using FX
  • Extracting fields with the search command

Module 01 - Introduction to Data Science using Python

1.1 What is Data Science, what does a data scientist do
1.2 Various examples of Data Science in the industries
1.3 How Python is deployed for Data Science applications
1.4 Various steps in Data Science process like data wrangling, data exploration and selecting the model.
1.5 Introduction to Python programming language
1.6 Important Python features and how Python is different from other programming languages
1.7 Python installation, Anaconda Python distribution for Windows, Linux and Mac
1.8 How to run a sample Python script, Python IDE working mechanism
1.9 Running some Python basic commands
1.10 Python variables, data types and keywords.

Hands-on Exercise – Installing Python Anaconda for the Windows, Linux and Mac

Module 02 - Python basic constructs

2.1 Introduction to a basic construct in Python
2.2 Understanding indentation like tabs and spaces
2.3 Python built-in data types
2.4 Basic operators in Python
2.5 Loop and control statements like break, if, for, continue, else, range() and more.

Hands-on Exercise –
1.Write your first Python program
2. Write a Python function (with and without parameters)
3. Use Lambda expression
4. Write a class
5. Create a member function and a variable
6. Create an object and write a for loop to print all odd numbers
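A compact sketch covering the hands-on steps above (a function, a lambda expression, a class with a member function and variable, and a loop over odd numbers); all names are invented for illustration.

def greet(name="world"):               # function with a default parameter
    return f"Hello, {name}!"

square = lambda x: x * x               # lambda expression

class Counter:                         # class with a member variable and method
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

c = Counter()                          # create an object
c.increment()
print(greet("Python"), square(4), c.value)

for n in range(1, 11):                 # print all odd numbers from 1 to 10
    if n % 2 == 1:
        print(n)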

Module 03 - Maths for DS-Statistics & Probability

3.1 Central Tendency
3.2 Variability
3.3 Hypothesis Testing
3.4 Anova
3.5 Correlation
3.6 Regression
3.7 Probability Definitions and Notation
3.8 Joint Probabilities
3.9 The Sum Rule, Conditional Probability, and the Product Rule
3.10 Bayes’ Theorem

Hands-on Exercise –
1. We will analyze both categorical data and quantitative data
2. Focusing on specific case studies to help solidify the week’s statistical concepts

Module 04 - OOPs in Python

4.1 Understanding the OOP paradigm like encapsulation, inheritance, polymorphism and abstraction
4.2 What are access modifiers, instances, class members
4.3 Classes and objects
4.4 Function parameter and return type functions
4.5 Lambda expressions.

Hands-on Exercise –
1. Writing a Python program and incorporating the OOP concepts

Module 05 - NumPy for mathematical computing

5.1 Introduction to mathematical computing in Python
5.2 What are arrays and matrices, array indexing, array math, Inspecting a numpy array, Numpy array manipulation

Hands-on Exercise –
1. How to import numpy module
2. Creating array using ND-array
3. Calculating standard deviation on array of numbers and calculating correlation between two variables.
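A small NumPy sketch matching the steps above; the arrays are made-up sample data.

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])          # create an ndarray
print(a.shape, a[0, 1])                        # inspection and indexing

b = np.arange(10)
print(b.std())                                 # standard deviation

x = np.random.rand(100)
y = 2 * x + 0.1 * np.random.rand(100)
print(np.corrcoef(x, y)[0, 1])                 # correlation between two variables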

Module 06 - Scipy for scientific computing

6.1 Introduction to scipy, building on top of numpy
6.2 What are the characteristics of scipy
6.3 Various subpackages for scipy like Signal, Integrate, Fftpack, Cluster, Optimize, Stats and more, Bayes Theorem with scipy.

Hands-on Exercise:
1. Importing of scipy
2. Applying the Bayes theorem on the given dataset.

Module 07 - Data manipulation

7.1 What is data manipulation, using the Pandas library
7.2 Numpy dependency of Pandas library
7.3 Series object in pandas
7.4 Dataframe in Pandas
7.5 Loading and handling data with Pandas
7.6 How to merge data objects
7.7 Concatenation and various types of joins on data objects, exploring dataset

Hands-on Exercise –
1. Doing data manipulation with Pandas by handling tabular datasets that include variable types like float, integer, double and others.
2. Cleaning dataset, Manipulating dataset, Visualizing dataset
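A minimal Pandas sketch of Series/DataFrame handling, merging and concatenation; the toy data and column names are assumptions.

import pandas as pd

s = pd.Series([10, 20, 30], name="sales")                       # Series object

employees = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
salaries = pd.DataFrame({"id": [2, 3, 4], "salary": [50.0, 60.0, 70.0]})

merged = employees.merge(salaries, on="id", how="inner")        # join on a key
stacked = pd.concat([employees, employees], ignore_index=True)  # concatenation

print(s.mean())
print(merged)
print(stacked.describe())                                       # explore the dataset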

Module 08 - Data visualization with Matplotlib

8.1 Introduction to Matplotlib
8.2 Using Matplotlib for plotting graphs and charts like Scatter, Bar, Pie, Line, Histogram and more
8.3 Matplotlib API

Hands-on Exercise –
1. Deploying Matplotlib for creating pie, scatter, line and histogram.
2. Subplots and Pandas built-in data visualization.
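A short Matplotlib sketch for the plots mentioned above, using random sample data.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))    # two subplots side by side

axes[0].plot(x, np.sin(x))                        # line chart
axes[0].set_title("Line")

axes[1].hist(np.random.randn(500), bins=20)       # histogram
axes[1].set_title("Histogram")

plt.tight_layout()
plt.show()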

Module 09 - Machine Learning using Python

9.1 Revision of topics in Python (Pandas, Matplotlib, numpy, scikit-Learn)
9.2 Introduction to machine learning
9.3 Need of Machine learning
9.4 Types of machine learning and workflow of Machine Learning
9.5 Use cases in Machine Learning and its various algorithms
9.6 What is supervised learning
9.7 What is Unsupervised Learning

Hands-on Exercise –
1. Demo on ML algorithms

Module 10 - Supervised learning

10.1 What is linear regression
10.2 Step by step calculation of Linear Regression
10.3 Linear regression in Python
10.4 Logistic Regression
10.5 What is classification
10.6 Decision Tree, Confusion Matrix, Random Forest, Naïve Bayes classifier (Self paced), Support Vector Machine(self paced), xgboost(self paced)

Hands-on Exercise – Using the Python library Scikit-Learn to implement supervised learning with the Random Forest algorithm.
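A minimal sketch of this exercise, training a Random Forest with scikit-learn; the bundled Iris dataset is used here only as a stand-in for the course data.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))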

Module 11 - Unsupervised Learning

11.1 Introduction to unsupervised learning
11.2 Use cases of unsupervised learning
11.3 What is clustering
11.4 Types of clustering (self-paced): Exclusive clustering, Overlapping clustering and Hierarchical clustering
11.5 What is K-means clustering
11.6 Step by step calculation of k-means algorithm
11.7 Association Rule Mining (self-paced), Market Basket Analysis (self-paced), measures in association rule mining (self-paced): support, confidence and lift
11.8 Apriori Algorithm

Hands-on Exercise –
1. Setting up the Jupyter notebook environment
2. Loading of a dataset in Jupyter
3. Algorithms in the Scikit-Learn package for performing Machine Learning techniques and training a model using grid search.
4. Practice on k-means using Scikit
5. Practice on Apriori
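A minimal k-means sketch with scikit-learn, using synthetic blobs in place of a real dataset.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)   # toy data

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

print(km.cluster_centers_)      # learned cluster centroids
print(labels[:10])              # cluster assignment of the first ten points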

Module 12 - Python integration with Spark (self-paced)

12.1 Introduction to PySpark
12.2 Who uses PySpark and the need for Spark with Python
12.3 PySpark installation
12.4 PySpark fundamentals
12.5 Advantages of PySpark over MapReduce
12.6 PySpark use cases and demo

Hands-on Exercise:
1. Demonstrating Loops and Conditional Statements
2. Tuple – related operations, properties, list, etc.
3. List – operations, related properties
4. Set – properties, associated operations, dictionary – operations, related properties.

Module 13 - Dimensionality Reduction

13.1 Introduction to Dimensionality
13.2 Why Dimensionality Reduction
13.3 PCA
13.4 Factor Analysis
13.5 LDA

Hands-on Exercise –
Practice dimensionality reduction techniques: PCA, Factor Analysis, t-SNE, Random Forest, and forward and backward feature selection

Module 14 - Time Series Forecasting

14.1 White Noise
14.2 AR model
14.3 MA model
14.4 ARMA model
14.5 ARIMA model
14.6 Stationarity
14.7 ACF & PACF

Hands-on Exercise –
1. Create AR model
2. Create MA model
3. Create ARMA model
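A small sketch of the AR/MA/ARIMA workflow using statsmodels on a synthetic series; the generated series and the (1, 0, 1) order are arbitrary choices for illustration.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)

# Synthetic AR(1)-like series standing in for real time-series data
values = [0.0]
for _ in range(199):
    values.append(0.7 * values[-1] + rng.normal())
series = pd.Series(values)

model = ARIMA(series, order=(1, 0, 1))    # ARMA(1, 1) is ARIMA with d=0
fit = model.fit()
print(fit.params)                         # estimated AR/MA coefficients
print(fit.forecast(steps=5))              # five-step-ahead forecast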

What projects will I be working on in this Python for Data Science course?

Project 01: Analyzing the trends of COVID-19 with Python

Industry: Analytics

Problem Statement: Understanding the trend of the COVID-19 spread and whether the restrictions imposed by governments around the world have helped curb COVID-19 cases, and to what degree

Topics: In this project, we will use Data Science and Python to perform visualizations that help us better understand the data we currently have on COVID-19, and use Time Series Analysis to make a prediction about future cases if the current trend continues.

Highlights:

  • Using pandas to accumulate data from multiple data files
  • Using plotly (visualization library) to create interactive visualizations
  • Using Facebook's Prophet library to build time-series models
  • Visualizing the prediction by combining these technologies

Project 02: Analyzing the naming trends using Python

Problem Statement: The dataset is in zipped format; we have to extract the dataset in the program, visualize the number of male and female babies born in a particular year and find out popular baby names.

Topics: Algorithms, Python programming

Highlights:

  • Understanding the applications of data manipulation
  • Understanding how to extract only the files that have useful data
  • To understand the concepts of data visualization
  • To analyze baby names by sorting out top 100 birth counts

Project 03: Performing Analysis on Customer Churn Dataset

Problem Statement: Analysis of Employment reliability of employees in the telecom industry

Topics: Algorithms, Manipulation, Data Visualization, Python Language

Highlights:

  • Performing real time analysis of data by making use of multiple labels
  • Performing data visualization to understand the factor of reliability
  • Performing visual analysis of various columns to verify
  • Plotting charts to substantiate the findings in total

Project 04: Netflix-Recommendation system

Problem Statement: Analysis of movies dataset and recommendation of movies with respect to ratings.

Topics: Algorithms, Python, Recommendation engine

Highlights:

  • Working with the combined data of the movies and ratings datasets
  • Performing data analysis on various labels in the data
  • Finding the distribution of different ratings in the dataset
  • Training an SVD model for rating prediction

Project 05: Python Web Scraping for Data Science

In this project, you will be introduced to the process of web scraping using Python. It involves installing Beautiful Soup and other web-scraping libraries, working with common data and page formats on the web, learning the important kinds of objects (including NavigableString), deploying the search tree, navigation options, the parser, searching by CSS class, list, function and keyword argument.
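A minimal Beautiful Soup sketch of the scraping flow described above; the URL and the CSS class name are placeholders, and any HTML page with links would do.

import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com")            # placeholder URL
soup = BeautifulSoup(resp.text, "html.parser")        # build the parse tree

# Navigate/search the tree: all anchor tags and their link targets
for a in soup.find_all("a"):
    print(a.get_text(strip=True), a.get("href"))

# Searching by CSS class is a keyword-argument search
print(soup.find_all("p", class_="intro"))             # 'intro' is a made-up class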

Case Study 01: OOPS in Python

Problem Statement: Create multiple methods using OOPS concept

Topics: Parameterization, OOPS, Classes

Highlights:

  • A method ‘check_balance’ to check the remaining balance in the account
  • A method ‘withdraw’ to withdraw money from the bank account (a minimal class sketch follows this list)
  • Over-ride the ‘withdraw’ method to check if minimum balance is maintained
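A minimal class sketch for this case study; the starting balance and the minimum-balance value of 500 are invented for illustration.

class BankAccount:
    def __init__(self, balance=0):
        self.balance = balance

    def check_balance(self):
        return self.balance

    def withdraw(self, amount):
        self.balance -= amount
        return self.balance


class MinBalanceAccount(BankAccount):
    MIN_BALANCE = 500                      # assumed minimum balance

    def withdraw(self, amount):            # override to enforce the rule
        if self.balance - amount < self.MIN_BALANCE:
            raise ValueError("withdrawal would break the minimum balance")
        return super().withdraw(amount)


acct = MinBalanceAccount(1000)
acct.withdraw(300)
print(acct.check_balance())                # 700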

Case Study 02: Working with NumPy

Problem Statement: Working with NumPy library to solve various problems in Python

Topics: NumPy

Highlights: 

  • Create 2D arrays
  • Initialize a numpy array of 5*5 dimensions
  • Perform simple arithmetic operations on these two numpy arrays

Case Study 03: Visualizing and Analyzing the Customer Churn dataset using R.

Problem Statement: Analyzing the data by building some aesthetic graphs to make better sense of the data.

Topics: Plots, ggplot2, Python Language

Highlights:

  • Understanding the working of ggplot2 package.
  • Understanding the applications of bar plots
  • Analyzing the data with the help of histogram graphs.
  • Observing some outliers in box-plots

Case Study 04: Building models with the help of Machine Learning Algorithms

Problem Statement: Designing tree-based models on the ‘Heart’ dataset

Topics: ML Algorithms, Python Language

Highlights:

  • Performing real-time data manipulation on the Heart dataset
  • Performing data visualization for multiple columnar data
  • Understanding and building tree-based models on top of the dataset
  • Designing a probabilistic classification model on the dataset (see the sketch below)
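
A minimal sketch, assuming pandas and scikit-learn (an assumption) and a placeholder heart.csv whose 'target' column marks the presence of heart disease:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB   # a probabilistic classifier

    # heart.csv and the 'target' column are placeholders for the real schema
    heart = pd.read_csv("heart.csv")
    X, y = heart.drop(columns=["target"]), heart["target"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Tree-based model
    tree = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
    print("Decision tree accuracy:", tree.score(X_test, y_test))

    # Probabilistic classification model
    nb = GaussianNB().fit(X_train, y_train)
    print("Naive Bayes accuracy:", nb.score(X_test, y_test))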

Introduction to the Basics of Python

  • Explaining Python and Highlighting Its Importance
  • Setting up Python Environment and Discussing Flow Control
  • Running Python Scripts and Exploring Python Editors and IDEs

Sequence and File Operations

  • Defining Reserve Keywords and Command Line Arguments
  • Describing Flow Control and Sequencing
  • Indexing and Slicing
  • Learning the xrange() Function
  • Working Around Dictionaries and Sets
  • Working with Files

Functions, Sorting, Errors and Exception, Regular Expressions, and Packages

  • Explaining Functions and Various Forms of Function Arguments
  • Learning Variable Scope, Function Parameters, and Lambda Functions
  • Sorting Using Python
  • Exception Handling
  • Package Installation
  • Regular Expressions

Python: An OOP Implementation

  • Using Class, Objects, and Attributes
  • Developing Applications Based on OOP
  • Learning About Classes, Objects and How They Function Together
  • Explaining OOPs Concepts Including Inheritance, Encapsulation, and Polymorphism, Among Others

Debugging and Databases

  • Debugging Python Scripts Using pdb and IDE
  • Classifying Errors and Developing Test Units
  • Implementing Databases Using SQLite
  • Performing CRUD Operations
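
A minimal sketch of the SQLite CRUD operations covered in this module (table and column names are illustrative):

    import sqlite3

    conn = sqlite3.connect("course.db")
    cur = conn.cursor()

    # Create
    cur.execute("CREATE TABLE IF NOT EXISTS students (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("INSERT INTO students (name) VALUES (?)", ("Asha",))

    # Read
    print(cur.execute("SELECT * FROM students").fetchall())

    # Update
    cur.execute("UPDATE students SET name = ? WHERE id = ?", ("Asha K", 1))

    # Delete
    cur.execute("DELETE FROM students WHERE id = ?", (1,))

    conn.commit()
    conn.close()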

Introduction to Big Data and Apache Spark

  • What is Big Data?
  • 5 V’s of Big Data
  • Problems related to Big Data: Use Case
  • What tools are available for handling Big Data?
  • What is Hadoop?
  • Why do we need Hadoop?
  • Key Characteristics of Hadoop
  • Important Hadoop ecosystem concepts
  • MapReduce and HDFS
  • Introduction to Apache Spark
  • What is Apache Spark?
  • Why do we need Apache Spark?
  • Who uses Spark in the industry?
  • Apache Spark architecture
  • Spark Vs. Hadoop
  • Various Big data applications using Apache Spark

Python for Spark

  • Introduction to PySpark
  • Who uses PySpark?
  • Why Python for Spark?
  • Values, Types, Variables
  • Operands and Expressions
  • Conditional Statements
  • Loops
  • Numbers
  • Python files I/O Functions
  • Strings and associated operations
  • Sets and associated operations
  • Lists and associated operations
  • Tuples and associated operations
  • Dictionaries and associated operations

Hands-On:

  • Demonstrating Loops and Conditional Statements
  • Tuple – related operations, properties, list, etc.
  • List – operations, related properties
  • Set – properties, associated operations
  • Dictionary – operations, related properties

Python for Spark: Functional and Object-Oriented Model

  • Functions
  • Lambda Functions
  • Global Variables, its Scope, and Returning Values
  • Standard Libraries
  • Object-Oriented Concepts
  • Modules Used in Python
  • The Import Statements
  • Module Search Path
  • Package Installation Ways

Hands-On:

  • Lambda – Features, Options, Syntax, Compared with the Functions
  • Functions – Syntax, Return Values, Arguments, and Keyword Arguments
  • Errors and Exceptions – Issue Types, Remediation
  • Packages and Modules – Import Options, Modules, sys Path

Apache Spark Framework and RDDs

  • Spark Components & its Architecture
  • Spark Deployment Modes
  • Spark Web UI
  • Introduction to PySpark Shell
  • Submitting PySpark Job
  • Writing your first PySpark Job Using Jupyter Notebook
  • What are Spark RDDs?
  • Stopgaps in existing computing methodologies
  • How do RDDs solve the problem?
  • What are the ways to create RDD in PySpark?
  • RDD persistence and caching
  • General operations: Transformation, Actions, and Functions
  • Concept of Key-Value pair in RDDs
  • Other pair, two pair RDDs
  • RDD Lineage
  • RDD Persistence
  • WordCount Program Using RDD Concepts
  • RDD Partitioning & How it Helps Achieve Parallelization
  • Passing Functions to Spark

Hands-On:

  • Building and Running Spark Application
  • Spark Application Web UI
  • Loading data in RDDs
  • Saving data through RDDs
  • RDD Transformations
  • RDD Actions and Functions
  • RDD Partitions
  • WordCount program using RDDs in Python (see the sketch below)
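
A minimal sketch of the WordCount program using RDDs in PySpark (the input path is a placeholder):

    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")

    # Load data into an RDD (input.txt is a placeholder path)
    lines = sc.textFile("input.txt")

    # Transformations: split into words, pair each word with 1, sum the counts
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    # Action: collect and print the results
    for word, count in counts.collect():
        print(word, count)

    sc.stop()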

PySpark SQL and Data Frames

  • Need for Spark SQL
  • What is Spark SQL
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • User-Defined Functions
  • Data Frames
  • Interoperating with RDDs
  • Loading Data through Different Sources
  • Performance Tuning
  • Spark-Hive Integration

Hands-On:

  • Spark SQL – Creating data frames
  • Loading and transforming data through different sources
  • Spark-Hive Integration
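
A minimal sketch of creating a DataFrame and querying it with Spark SQL (the file name and schema are placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SparkSQLDemo").getOrCreate()

    # Load data from a source into a DataFrame (people.json is a placeholder)
    df = spark.read.json("people.json")

    # Register a temporary view and run SQL on it
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name, age FROM people WHERE age > 30").show()

    # Interoperating with RDDs: a DataFrame exposes its underlying RDD
    rdd = df.rdd
    df_again = rdd.toDF()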

Apache Kafka and Flume

  • Why Kafka
  • What is Kafka?
  • Kafka Workflow
  • Kafka Architecture
  • Kafka Cluster Configuring
  • Kafka Monitoring tools
  • Basic operations
  • What is Apache Flume?
  • Integrating Apache Flume and Apache Kafka

Hands-On:

  • Single Broker Kafka Cluster
  • Multi-Broker Kafka Cluster
  • Topic Operations
  • Integrating Apache Flume and Apache Kafka

PySpark Streaming

  • Introduction to Spark Streaming
  • Features of Spark Streaming
  • Spark Streaming Workflow
  • StreamingContext Initializing
  • Discretized Streams (DStreams)
  • Input DStreams, Receivers
  • Transformations on DStreams
  • DStreams Output Operations
  • Describe Windowed Operators and Why it is Useful
  • Stateful Operators
  • Vital Windowed Operators
  • Twitter Sentiment Analysis
  • Streaming using Netcat server
  • WordCount program using Kafka-Spark Streaming

Hands-On:

  • Twitter Sentiment Analysis
  • Streaming using Netcat server
  • WordCount program using Kafka-Spark Streaming
  • Spark-flume Integration
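
A minimal sketch of the Netcat WordCount exercise, assuming a Netcat server started locally with nc -lk 9999:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="StreamingWordCount")
    ssc = StreamingContext(sc, 1)   # 1-second micro-batches

    # Input DStream reading from the Netcat server
    lines = ssc.socketTextStream("localhost", 9999)

    # Transformations on the DStream
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    # Output operation: print each batch's counts
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()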

Introduction to PySpark Machine Learning

  • Introduction to Machine Learning- What, Why and Where?
  • Use Case
  • Types of Machine Learning Techniques
  • Why use Machine Learning for Spark?
  • Applications of Machine Learning (general)
  • Applications of Machine Learning with Spark
  • Introduction to MLlib
  • Features of MLlib and MLlib Tools
  • Various ML algorithms supported by MLlib
  • Supervised Learning Algorithms
  • Unsupervised Learning Algorithms
  • ML workflow utilities

Hands-On:

  • K- Means Clustering
  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • Random Forest
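
A minimal sketch of one of these exercises, K-Means clustering, using Spark MLlib's DataFrame-based API with a small in-memory dataset (column names are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("KMeansDemo").getOrCreate()

    # Toy dataset with two numeric features
    data = spark.createDataFrame(
        [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)], ["x", "y"])

    # Assemble the feature columns into a single vector column
    features = VectorAssembler(inputCols=["x", "y"],
                               outputCol="features").transform(data)

    # Fit a K-Means model with two clusters and inspect the assignments
    model = KMeans(k=2, seed=1).fit(features)
    model.transform(features).select("x", "y", "prediction").show()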

Introduction to NoSQL and MongoDB

RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples

MongoDB Installation

Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) installation and MongoDB data types

Hands-on Exercise: Install MongoDB and install MongoChef (MongoGUI)

Importance of NoSQL

The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP Theorem, Base property, learning about JSON/BSON, database collection and documentation, MongoDB uses, MongoDB write concern—acknowledged, replica acknowledged, unacknowledged, journaled—and Fsync

Hands-on Exercise: Write a JSON document

CRUD Operations

Understanding CRUD and its functionality, CRUD concepts, MongoDB query and syntax and read and write queries and query optimization

Hands-on Exercise: Use insert query to create a data entry, use find query to read data, use update and replace queries to update and use delete query operations on a DB file
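
A minimal sketch of the same CRUD operations run through the PyMongo driver (database, collection and field names are illustrative); the equivalent commands can also be issued from the mongo shell:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")
    students = client["training"]["students"]

    # Create: insert a data entry
    students.insert_one({"name": "Asha", "score": 85})

    # Read: find matching documents
    print(list(students.find({"score": {"$gte": 80}})))

    # Update and replace
    students.update_one({"name": "Asha"}, {"$set": {"score": 90}})
    students.replace_one({"name": "Asha"}, {"name": "Asha", "score": 95})

    # Delete
    students.delete_one({"name": "Asha"})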

Data Modeling and Schema Design

Concepts of data modelling, difference between MongoDB and RDBMS modelling, model tree structure, operational strategies, monitoring and backup

Hands-on Exercise: Write a data model tree structure for a family hierarchy

Data Management and Administration

In this module, you will learn MongoDB® Administration activities such as health check, backup, recovery, database sharding and profiling, data import/export, performance tuning, etc.

Hands-on Exercise: Use shard key and hashed shard keys, perform backup and recovery of a dummy dataset, import data from a CSV file and export data to a CSV file

Data Indexing and Aggregation

Concepts of data aggregation and types and data indexing concepts, properties and variations

Hands-on Exercise: Do aggregation using pipeline, sort, skip and limit and create index on data using single key and using multi-key
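
A minimal sketch of the aggregation and indexing exercise with PyMongo (collection and field names are illustrative):

    from pymongo import MongoClient, ASCENDING

    orders = MongoClient()["training"]["orders"]

    # Aggregation using a pipeline with sort, skip and limit stages
    pipeline = [
        {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
        {"$sort": {"total": -1}},
        {"$skip": 5},
        {"$limit": 10},
    ]
    for doc in orders.aggregate(pipeline):
        print(doc)

    # Index on a single key, and an index spanning multiple keys
    orders.create_index([("customer", ASCENDING)])
    orders.create_index([("customer", ASCENDING), ("amount", ASCENDING)])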

MongoDB Security

Understanding database security risks, MongoDB security concept and security approach and MongoDB integration with Java and Robomongo

Hands-on Exercise: MongoDB integration with Java and Robomongo

Working with Unstructured Data

Implementing techniques to work with a variety of unstructured data like images, videos, log data and others, and understanding the GridFS MongoDB file system for storing data

Hands-on Exercise: Work with a variety of unstructured data like images, videos, log data and others

What projects will I be working on in this MongoDB training?

Project: Working with the MongoDB Java Driver

Industry: General

Problem Statement: How to create a table for video insertion using Java

Topics: In this project, you will work with MongoDB Java Driver and become proficient in creating a table for inserting video using Java programming. You will work with collections and documents and understand the read and write basics of MongoDB database and the Java virtual machine libraries.

Highlights:

  • Setting up MongoDB JDBC Driver
  • Connecting to the database
  • Java virtual machine libraries

Introduction to Hadoop and Its Ecosystem, MapReduce and HDFS

Introduction to Hadoop and its constituent ecosystem, understanding MapReduce and HDFS, Big Data, factors constituting Big Data, Hadoop and Hadoop Ecosystem, MapReduce: concepts of Map, Reduce, ordering, concurrency, shuffle and reducing, Hadoop Distributed File System (HDFS) concepts and its importance, deep dive into MapReduce, execution framework, partitioner, combiner, data types, key pairs, HDFS deep dive: architecture, data replication, name node, data node, dataflow, parallel copying with DISTCP and Hadoop archives

Hands-on Exercises:

Installing Hadoop in pseudo-distributed mode, understanding important configuration files, their properties and daemon threads, accessing HDFS from the command line, MapReduce: basic exercises, understanding the Hadoop ecosystem, introduction to Sqoop, use cases and installation, introduction to Hive, use cases and installation, introduction to Pig, use cases and installation, introduction to Oozie, use cases and installation, introduction to Flume, use cases and installation and introduction to YARN

Mini Project: Importing MySQL data using Sqoop and querying it using Hive

MapReduce

How to develop a MapReduce application, writing unit test, the best practices for developing and writing and debugging MapReduce applications

Introduction to Pig and Its Features

What is Pig, Pig’s features, Pig use cases, interacting with Pig, basic data analysis with Pig, Pig Latin Syntax, loading data, simple data types, field definitions, data output, viewing the schema, filtering and sorting data and commonly-used functions

Hands-on Exercise: Using Pig for ETL processing

Introduction to Hive

What is Hive, Hive schema and data storage, comparing Hive to traditional databases, Hive vs. Pig, Hive use cases, interacting with Hive, relational data analysis with Hive, Hive databases and tables, Basic HiveQL Syntax, data types, joining data sets and common built-in functions

Hands-on Exercise: Running Hive queries on the Shell, Scripts and Hue

Hadoop Stack Integration Testing

Why Hadoop testing is important, unit testing, integration testing, performance testing, diagnostics, nightly QA test, benchmark and end-to-end tests, functional testing, release certification testing, security testing, scalability testing, commissioning and decommissioning of data nodes testing, reliability testing and release testing

Roles and Responsibilities of Hadoop Testing

Understanding the requirement, preparation of the testing estimation, test cases, test data, test bed creation, test execution, defect reporting, defect retest, daily status report delivery, test completion, ETL testing at every stage (HDFS, Hive and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, which includes but is not limited to data verification and reconciliation, user authorization and authentication testing (groups, users, privileges, etc.), reporting defects to the development team or manager and driving them to closure, consolidating all the defects and creating defect reports, and validating new features and issues in core Hadoop

Framework Called MRUnit for Testing of MapReduce Programs

Reporting defects to the development team or manager and driving them to closure, consolidating all the defects and creating defect reports, validating new features and issues in core Hadoop, and creating a testing framework called MRUnit for testing MapReduce programs

Unit Testing

Automation testing using Oozie and data validation using the Query Surge tool

Test Execution of Hadoop: Customized

Test plan for HDFS upgrade and test automation and result

Test Plan Strategy and Test Cases of Hadoop Testing

How to test the installation and configuration

What projects will I be working on in this Hadoop Testing training?

Project Works

Project 1: Working with MapReduce, Hive and Sqoop

Problem Statement: Import MySQL data using Sqoop, query it using Hive, and run a word count MapReduce job.

Project 2: Testing Hadoop Using MRUnit

Industry: General

Problem Statement: How to test the Hadoop application using MRUnit testing

Topics: This project involves working with MRUnit for testing the Hadoop application without spinning up a cluster. You will learn how to test the map and reduce stages of an application.

Highlights:

  • Hadoop testing in isolation using MRUnit
  • Craft the test input and push through mapper and reducer
  • Deploy MapReduce driver

Understanding the Architecture of Storm

Big Data characteristics, understanding Hadoop distributed computing, the Bayesian Law, deploying Storm for real-time analytics, Apache Storm features, comparing Storm with Hadoop, Storm execution and learning about Tuple, Spout and Bolt.

Installation of Apache Storm

Installing Apache Storm and various types of run modes of Storm.

Introduction to Apache Storm

Understanding Apache Storm and the data model.

Apache Kafka Installation

Installation of Apache Kafka and its configuration.

Apache Storm Advanced

Understanding advanced Storm topics like Spouts, Bolts, Stream Groupings and Topology and its life cycle and learning about guaranteed message processing

Storm Topology

Various grouping types in Storm, reliable and unreliable messages, Bolt structure and life cycle, understanding Trident topology for failure handling, process and call log analysis topology for analyzing call logs for calls made from one number to another.

Overview of Trident

Understanding of Trident spouts and its different types, various Trident spout interface and components, familiarizing with Trident filter, aggregator and functions and a practical and hands-on use case on solving call log problem using Storm Trident

Storm Components and Classes

Various components, classes and interfaces in Storm such as the BaseRichBolt class, the IRichBolt interface, the IRichSpout interface and the BaseRichSpout class, and various methodologies of working with them.

Cassandra Introduction

Understanding Cassandra, its core concepts, its strengths and deployment.

Bootstrapping

Twitter bootstrapping, detailed understanding of bootstrapping, concepts of Storm, Storm development environment.

What projects will I be working on in this Apache Storm training?

Project 1: Call Log Analysis Using Trident

Topics: In this project, you will work on call logs to decipher the data and gather valuable insights using Apache Storm Trident. You will extensively work with data about calls made from one number to another. The aim of this project is to resolve the call log issues with Trident stream processing and low-latency distributed querying. You will gain hands-on experience in working with Spouts and Bolts, along with various Trident functions, filters, aggregation, joins and grouping.

Project 2: Twitter Data Analysis Using Trident

Topics: This is a project that involves working with Twitter data and processing it to extract patterns out of it. The Apache Storm Trident is the perfect framework for the real-time analysis of tweets. While working with Trident, you will be able to simplify the task of live Twitter feed analysis. In this project, you will gain real-world experience of working with Spouts, Bolts and Trident filters, joins, aggregation, functions and grouping.

Project 3: The US Presidential Election Results Analysis Using Trident DRPC Query

Topics: This is a project that lets you work on the US presidential election results and predict who is leading and trailing on a real-time basis. For this, you exclusively work with Trident distributed remote procedure call server. After the completion of the project, you will learn how to access data residing in a remote computer or network and deploy it for real-time processing, analysis and prediction.

What is Kafka – An Introduction

Understanding what Apache Kafka is, the various components and use cases of Kafka, and implementing Kafka on a single node.

Multi Broker Kafka Implementation

Learning about the Kafka terminology, deploying single node Kafka with independent Zookeeper, adding replication in Kafka, working with Partitioning and Brokers, understanding Kafka consumers, the Kafka Writes terminology, various failure handling scenarios in Kafka.

Multi Node Cluster Setup

Introduction to multi-node cluster setup in Kafka, the various administration commands, leadership balancing and partition rebalancing, graceful shutdown of Kafka brokers and tasks, working with the Partition Reassignment Tool, cluster expansion, assigning custom partitions, removing a broker and improving the replication factor of partitions.

Integrate Flume with Kafka

Understanding the need for Kafka Integration, successfully integrating it with Apache Flume, steps in integration of Flume with Kafka as a Source.

Kafka API

Detailed understanding of the Kafka and Flume Integration, deploying Kafka as a Sink and as a Channel, introduction to PyKafka API and setting up the PyKafka Environment.

Producers & Consumers

Connecting Kafka using PyKafka, writing your own Kafka Producers and Consumers, writing a random JSON Producer, writing a Consumer to read the messages from a topic, writing and working with a File Reader Producer, writing a Consumer to store topics data into a file.
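
A minimal sketch of a JSON Producer and a Consumer written with PyKafka (the broker address and topic name are placeholders):

    import json
    from pykafka import KafkaClient

    client = KafkaClient(hosts="127.0.0.1:9092")
    topic = client.topics[b"test.events"]

    # A simple random/JSON producer
    with topic.get_sync_producer() as producer:
        for i in range(5):
            record = json.dumps({"id": i, "value": i * i}).encode("utf-8")
            producer.produce(record)

    # A consumer that reads the messages back from the topic
    consumer = topic.get_simple_consumer(consumer_timeout_ms=1000)
    for message in consumer:
        if message is not None:
            print(message.offset, message.value.decode("utf-8"))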

What projects will I be working on in this Kafka training?

Type: Multi-Broker Kafka Implementation

Topics: In this project, you will learn about Apache Kafka, which is a platform for handling real-time data feeds. You will exclusively work with Kafka brokers, understand partitioning, Kafka consumers, the terminology used for Kafka writes and failure handling in Kafka, and understand how to deploy a single-node Kafka setup with independent ZooKeeper. Upon completion of the project, you will have gained considerable experience of working in a real-world scenario for processing streaming data within an enterprise infrastructure.

Advantages and Usage of Cassandra

Introduction to Cassandra, its strengths and deployment areas

CAP Theorem and NoSQL Database

Significance of NoSQL, RDBMS Replication, Key Challenges, types of NoSQL, benefits and drawbacks, salient features of NoSQL database. CAP Theorem, Consistency.

Cassandra fundamentals, Data model, Installation and setup

Installation and introduction to Cassandra, key concepts and deployment of a non-relational, column-oriented database, data model – column, column family

Cassandra Configuration

Token calculation, Configuration overview, Node tool, Validators, Comparators, Expiring column, QA

Summarization, node tool commands, cluster, Indexes, Cassandra & MapReduce, Installing Ops-center

How Cassandra modelling varies from Relational database modelling, Cassandra modelling steps, introduction to Time Series modelling, comparing Column family Vs. Super Column family, Counter column family, Partitioners, Partitioners strategies, Replication, Gossip protocols, Read operation, Consistency, Comparison

Multi Cluster setup

Creation of multi node cluster, node settings, Key and Row cache, System Key space, understanding of Read Operation, Cassandra Commands overview, VNodes, Column family

Thrift/Avro/Json/Hector Client

JSON, Hector client, AVRO, Thrift, JAVA code writing method, Hector tag

DataStax Installation and Secondary Index

Cassandra management, commands of node tool, MapReduce and Cassandra, Secondary index, Datastax Installation

Advance Modelling

Rules of Cassandra data modelling, increasing data writes, duplication, and reducing data reads, modelling data around queries, creating table for data queries

Deploying the IDE for Cassandra applications

Understanding the Java application creation methodology, learning key drivers, deploying the IDE for Cassandra applications, cluster connection and data query implementation

Cassandra Administration

Learning about Node Tool Utility, cluster management using Command Line Interface, Cassandra management and monitoring via DataStax Ops Center.

Cassandra API and Summarization and Thrift

Cassandra client connectivity, connection pool internals, API, important features and concepts of Hector client, Thrift, JAVA code, Summarization.

What projects will I be working on in this Cassandra training?

Type: Deploying the IDE for Cassandra applications

Topics: This project gives you hands-on experience in installing and working with Apache Cassandra, which is a high-performance and extremely scalable database for distributed data with no single point of failure. You will deploy the Java Integrated Development Environment for running Cassandra, learn about the key drivers, work with Cassandra applications in a cluster setup and implement data querying techniques.

Core Java Concepts

Introduction to Java Programming, Defining Java, Need for Java, Platform Independence in Java, Defining JRE, JVM and JDK, Important Features and Evolution of Java

Writing Java Programs using Java Principles

Overview of Coding basics, Setting up the required environment, Knowing the available IDEs, Writing a Basic-level Java Program, Define Package, What are Java Comments?, Understanding the concept of Reserved Words, Introduction to Java Statements, What are Blocks in Java, Explain a Class, Different Methods

Language Conceptuals

Overview of the Language, Defining Identifiers, What are Constraints and Variables, What is an Encoding Set?, Concept of Separators, Define Primitives, How to make Primitive Conversions?, Various Operators in Java

Operating with Java Statements

Module Overview, Learn how to write If Statement, Understanding While Statement, Working with Do-while Statement, How to use For Statement?, Using Break Statement, What is Continue Statement, Working of Switch Statement

Concept of Objects and Classes

General Review of the Module, Defining Objects and Classes in Java, What are Encapsulation, Static Members and Access Control?, Use and importance of ‘this’ Keyword, Defining Method Overloading with an example, ‘By Value’ vs. ‘By Reference’, Loading, Defining Initialization and Linking, How to Compare Objects in Java?, What is Garbage Collector?

Introduction to Core Classes

General Review, Concept of Object in Java, Define Core Class, What is System?, Explain String Classes, How do Arrays work?, Concept of Boxing & Unboxing, Use of ‘varargs’, ‘format’ and ‘printf’ Methods

Inheritance in Java

Introduction, Define Inheritance with an example, Accessibility concept, Method Overriding, Learning how to call a Superclass’ Constructor, What is Type Casting?, Familiarity with ’instanceof’ Keyword

Exception Handling in Detail

Getting started with exception Handling, Defining an Exception, How to use Constructs to deal with exceptions?, Classification of exceptions, Throw Exceptions, How to create an exception class?, stack Trace analysis

Getting started with Interfaces and Abstract Classes

General Review, Defining an Interface, How to Use and Create an Interface, Concept of Extending Interfaces, How to implement multiple interfaces?, What are abstract classes?, How to create and use abstract classes?, Comparison between interfaces and abstract classes, Concept of Nested Classes, What are Nested Classes?, Nested Classes Types, Working of an Inner Class, What is a Local Inner Class?, Anonymous Classes in Java, What is a Static Nested Class

Overview of Nested Classes

What are Nested Classes?, Types of Nested Classes, What is an Inner Class?, Understanding local inner class, Anonymous Inner Class, Nested Class – Static

Getting started with Java Threads

What is a Thread?, How to create and start a Thread?, States of a Thread, Blocking the Execution of a Thread, Concept of Sleep Thread, Understanding the priorities in a thread, Synchronisation in Java Threads, Interaction between threads

Overview of Java Collections

Introduction to Collection Framework, Preeminent Interfaces, What are Comparable and Comparator?, Working with Lists, Working with Maps, Working with Sets, Working with Queues

Understanding JDBC

Define JDBC, Different types of Drivers, How to access the drivers?, What is Connection in Java?, What is a Statement?, Explaining CRUD Operations with examples, Prepared Statement and Callable Statement

Java Generics

Overview of important topics included, Important and Frequently-Used Features, Defining Generic List, What is Generic Map in Java?, Java Generic Classes & Methods, For Loop Generic, What is Generic Wild Card?

Input/Output in Java

Brief Introduction, Learning about Input and Output Streams in Java, Concept of Byte-Oriented Streams, Defining Character-Oriented Streams, Explaining Object Serialisation, Input and Output Based on Channel

Getting started with Java Annotations

Introduction and Definition of Annotations, How are they useful for Java programmers?, Placements in Annotations, What are Built-in Java Annotations?, Defining Custom Annotations

Reflection and its Usage

Getting started, Define Java Reflection?, What is a Class Object?, Concept of Constructors, Using Fields, Applying Methods, Implementing Annotations in Your Java Program

What projects will I be working on in this Java training?

Project – Library Management System

Problem Statement – Create a library management system project that includes the following functionalities:

Add Book, Add Member, Issue Book, Return Book, Available Books, etc.

Introduction to Linux

Introduction to Linux, Basics of Shell, Basics of Kernel, CentOS 8 installation and VBox additions, Basic Linux Commands, ECHO and EXPR command, Set and unset a variable, Header of a shell script (#!).

Hands-on Exercise – Executing basic Linux commands, Installing CentOS 8 on VirtualBox and adding guest additions to the installed OS.

File Management

Text editors and file creation; Users, Groups and Processes; Root and Linux file hierarchy, Understanding file hierarchy, Understanding file permissions, chmod and chown commands, the LS command, Metacharacters, Editing a file using VIM, Displaying contents of a file, Copy, Move and Remove files.

Hands-on Exercise – Using VIM, Creating users and groups, Creating files and directories, Assigning file permissions and ownership using chmod and chown, Editing files in VIM.

Files and Processes

Everything is a file in UNIX/Linux (files, directories, executables, processes), Process control commands (ps and kill), other process control tools (top, nice, renice).

Hands-on Exercise – Executing ps and kill commands on running services, Monitoring the OS using top.

Introduction to Shell Scripting

What is shell scripting, Types of shell, Creating and writing a shell script, Changing the permission of the shell script, Executing the script, Environment variables, Defining a local and a global variable, User input in a shell script.

Hands-on Exercise – Creating a shell script, Writing and executing the shell script, creating a local and a global variable, taking input from the user in a shell script.

Conditional, Looping statements and Functions

What are Conditional statements, Using IF, IF-ELSE, Nested IF statements, What are Looping statements, Using WHILE, UNTIL and FOR statements, Using the case…esac statement, What is a Function, Creating a function in Linux, Calling functions.

Hands-on Exercise – Executing IF, IF-ELSE, Nested IF statements, Executing WHILE, UNTIL and FOR statements, Executing the case…..esac statement, creating a function in multiple ways, calling a function in a file, calling a function from another file.

Text Processing

Using GREP command, Using SED command, Using AWK command, Mounting a file to the virtual box, Creating a shared folder (mounting a folder), Using SORT command and Using pipes to combine multiple Commands.

Hands-on Exercise – Executing commands using GREP, Executing commands using SED, Executing commands using AWK, Mounting a folder in the Windows OS to the Linux OS, Installing VirtualBox guest additions on CentOS 8, Extracting zipped files.

Scheduling Tasks

What are Daemons, Introduction to Task scheduling in Linux, Scheduling a job in Linux, What is Cron and Crontab, How to use cron, Using the AT command.

Hands-on Exercise – Starting, Stopping and Restarting Daemon processes, Scheduling jobs using cron and crontab, Scheduling a one time task using AT, Managing scheduled tasks using ATQ and ATRM.

Advanced Shell Scripting

Why monitoring, Introduction to process monitoring, Top vs HTop, What does PGREP do, Introduction to file and folder monitoring, Monitoring tool inotifywait, inotifywait options for folder monitoring, Events of a folder in inotify, the FREE command.

Hands-on Exercise – Using Top to monitor the OS, Installing Htop, Using Htop to monitor the OS, Filtering and sorting using Htop, Installing inotify tools, monitoring a folder using inotifywait, monitoring a folder only for certain events, using the FREE command.

Database Connectivity

Installing and configuring MySQL, Securing MySQL, Running Queries from terminal, Running Queries from a shell script.

Hands-on Exercise – Downloading and installing MySQL, Connecting to MySQL from terminal, Querying directly from the terminal, Pushing the query result inside a file, CRUD operations from a shell script.

Linux Networking

What is networking in Linux, Why do we need networking, Using networking commands – IFCONFIG, PING, Wget and cURL, SSH, SCP and FTP, Learning Firewall tools – iptables and firewalld, DNS and Resolving IP address, /etc/hosts and /etc/hostname, nslookup and dig.

Hands-on Exercise – Executing all the networking commands, Using iptables and firewalld, Adding and removing ports, Resolving IP addresses in /etc/hosts, looking up a website's IP and nameservers using nslookup and dig.

What projects will I be working on in this Linux Admin training?

Project: Installing WordPress on CentOS 7

Industry: Internet related

Problem Statement: How to install the LAMP stack on CentOS 7 and create a database for WordPress

Topics: In this project, you will work on creating your account on WordPress (with a database), flushing privileges using Flush Privileges and installing a PHP module. We can get that package directly from CentOS's default repositories using yum, and we will also install and update WordPress with the latest templates and formats.

Highlights:

  • CentOS server installation
  • Creating a MySQL database
  • WordPress installation & configuration

Free Career Counselling

Certification

This is a comprehensive course that is designed to clear multiple certifications such as:

  • CCA Spark and Hadoop Developer (CCA175)
  • Splunk Certified Power User Certification
  • Splunk Certified Admin Certification
  • Apache Cassandra DataStax Certification
  • Linux Foundation Linux Certification
  • Java SE Programmer Certification

Furthermore, you will also be rewarded as a “Big Data Professional” for completing the following learning path that is co-created with IBM:

  • Spark Fundamentals I
  • Spark MLlib
  • Spark Fundamentals II
  • Python for Data Science

The complete course is created and delivered in association with IBM to help you get top jobs in the world’s best organizations. The entire training includes real-world projects and case studies that are highly valuable.

As part of this training, you will be working on real-time projects and assignments that have immense implications in the real-world industry scenario, thus helping you fast-track your career effortlessly.

At the end of this training program, there will be quizzes that perfectly reflect the type of questions asked in the respective certification exams and help you score better marks.

Intellipaat Course Completion Certificate will be awarded upon the completion of project work (after expert review) and upon scoring at least 60% marks in the quiz. Intellipaat certification is well recognized in top 80+ MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.

Our alumni work at 3000+ top companies


Course Advisor

Suresh Paritala

Solutions Architect at Microsoft, USA

A Senior Software Architect at NextGen Healthcare who has previously worked with IBM Corporation, Suresh Paritala has worked on Big Data, Data Science, Advanced Analytics, Internet of Things and Azure, along with AI domains like Machine Learning and Deep Learning. He has successfully implemented high-impact projects in these domains.

David Callaghan

Big Data Solutions Architect, USA

An experienced Blockchain Professional who has been bringing integrated Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud, Big Data and Pentaho projects that have had major impact on revenues of marquee brands around the world.

Samanth Reddy

Data Team Lead at Sony, USA

A renowned Data Scientist who has worked with Google and is currently working at ASCAP, Samanth Reddy has a proven ability to develop Data Science strategies that have a high impact on the revenues of various organizations. He comes with strong Data Science expertise and has created decisive Data Science strategies for Fortune 500 corporations.

Frequently Asked Questions

What Is Intellipaat’s Master’s Course and How Is It Different from Individual Courses?

Intellipaat’s Master’s Course is a structured learning path specially designed by industry experts, which ensures that you transform into a Big Data expert. Individual courses at Intellipaat focus on one or two specializations; however, if you want to master Big Data, then this program is for you.

Intellipaat is the pioneer in Big Data Architect training, and we provide:

  • Project Work & Assignments – You will work on 28 industry-based projects which will give you hands-on experience with the technology
  • 24*7 Support – Our team works 24*7 to clear all your doubts
  • Free Course Upgrade – Keep yourself updated with the latest version; hence, it is a lifetime investment at one go
  • Flexible Schedule – You can attend as many batches as you want, or if you are busy, you can postpone your classes to our next available batches without any extra charges
  • Resume Preparation & Job Assistance – We will help you prepare your resume and market your profile for jobs. We have more than 80 clients across the globe (India, US, UK, etc.) and we circulate our learners’ profiles to them

Intellipaat offers both self-paced training and online instructor-led training.

Hadoop Developer, Hadoop Admin, Hadoop Analyst, Hadoop Testing, Spark & Scala, Python, Splunk Developer & Admin, Java, Apache Storm, Apache Cassandra and Apache Kafka are available as self-paced courses.

If you have any queries you can contact our 24/7 dedicated support to raise a ticket. We provide you email support and solution to your queries. If the query is not resolved by email we can arrange for a one-on-one session with our trainers. The best part is that you can contact Intellipaat even after completion of training to get support and assistance. There is also no limit on the number of queries you can raise when it comes to doubt clearance and query resolution.

We provide you with the opportunity to work on 28 real-world projects wherein you can apply the knowledge and skills that you acquired through our training, making you perfectly industry-ready.

Yes, Intellipaat does provide you with placement assistance. We have tie-ups with 80+ organizations, including Ericsson, Cisco, Cognizant and TCS, among others, that are looking for Hadoop professionals, and we would be happy to assist you with the process of preparing yourself for the interview and the job.

Upon successful completion of the training, you have to take a set of quizzes and complete the projects, and upon review and on scoring over 60% marks in the qualifying quiz, the official Intellipaat verified certificate is awarded. The Intellipaat Certification is a seal of approval and is highly recognized in 80+ corporations around the world, including many in the Fortune 500 list of companies.

Preferably 8 GB RAM (Windows or Mac) with a good internet connection

All the instructors are from the industry with 18+ years of experience. They are subject matter experts and each of them has gone through a rigorous selection process.

At Intellipaat you can enroll either for the instructor-led online training or self-paced training. Apart from this Intellipaat also offers corporate training for organizations to upskill their workforce. All trainers at Intellipaat have 12+ years of relevant industry experience and they have been actively working as consultants in the same domain making them subject matter experts. Go through the sample videos to check the quality of the trainers.
Intellipaat is offering the 24/7 query resolution and you can raise a ticket with the dedicated support team anytime. You can avail the email support for all your queries. In the event of your query not getting resolved through email we can also arrange one-to-one sessions with the trainers. You would be glad to know that you can contact Intellipaat support even after completion of the training. We also do not put a limit on the number of tickets you can raise when it comes to query resolution and doubt clearance.
Intellipaat offers the self-paced training to those who want to learn at their own pace. This training also affords you the benefit of query resolution through email, one-on-one sessions with trainers, round-the-clock support and lifetime access to the learning modules or LMS. You also get the latest version of the course material at no added cost. The Intellipaat self-paced training is priced 75% lower than the online instructor-led training. If you face any problems while learning, we can always arrange a virtual live class with the trainers.
Intellipaat is offering you the most updated, relevant and high value real-world projects as part of the training program. This way you can implement the learning that you have acquired in a real-world industry setup. All training comes with multiple projects that thoroughly test your skills, learning and practical knowledge thus making you completely industry-ready. You will work on highly exciting projects in the domains of high technology, ecommerce, marketing, sales, networking, banking, insurance, etc. Upon successful completion of the projects your skills will be considered equal to six months of rigorous industry experience.
Intellipaat actively provides placement assistance to all learners who have successfully completed the training. For this we are exclusively tied-up with over 80 top MNCs from around the world. This way you can be placed in outstanding organizations like Sony, Ericsson, TCS, Mu Sigma, Standard Chartered, Cognizant, Cisco, among other equally great enterprises. We also help you with the job interview and résumé preparation part as well.
You can definitely make the switch from self-paced to online instructor-led training by simply paying the extra amount and joining the next batch of the training which shall be notified to you specifically.
Once you complete the Intellipaat training program along with all the real-world projects, quizzes and assignments and upon scoring at least 60% marks in the qualifying exam; you will be awarded the Intellipaat verified certification. This certificate is very well recognized in Intellipaat affiliate organizations which include over 80 top MNCs from around the world which are also part of the Fortune 500 list of companies.
No. Our Job Assistance program is aimed at helping you land your dream job. It offers a potential opportunity for you to explore various competitive openings in the corporate world and assists you in finding a well-paid job matching your profile. The final decision on your hiring will always be based on your performance in the interview and the requirements of the recruiter.
