This is an all-inclusive Big Data and Data Science course that includes in-depth study of Hadoop and its ecosystem, various programming languages, and NoSQL database training, along with business intelligence, statistics and probability. Taking this all-in-one combination of 16 courses will equip you with all the skills needed to be a Data Scientist.
Anybody can take this training course, regardless of their prior skills.
Big Data professionals are in huge demand, and there are not enough qualified professionals to meet industry requirements. Taking this all-in-one combo course can prepare you to meet the challenges of the Data Scientist role. This will help you grow your career and command significant salary hikes too.
Topics – Introduction to Hadoop, Problems with data growth, Solving Data Problems, Hadoop Overview, Understanding MapReduce, Setting the stage for big data problem solving with MapReduce, Parallel Copying with Hadoop distcp, Hadoop fs, Hadoop Archives
Topics – Introduction to Distributed File Systems, What is the Hadoop Distributed File System (HDFS), HDFS Design Principles & Failure, HDFS Architecture in High Availability Mode and Federated Mode, Overall Architecture of HDFS, HDFS Daemons, Basic HDFS Commands, Understanding MapReduce, Hadoop Architecture, Difference between MR1 and MR2, What is YARN, YARN jobs, Resource Management.
Topics – Hadoop 2.x Cluster Architecture, Federation and High Availability, A Typical Production Hadoop Cluster, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Cloudera Single Node Cluster
Topics – What is Hadoop MapReduce (with examples), Conceptual Understanding of Map and Reduce, Anatomy of a YARN Application Run, YARN MR Application Execution Flow, YARN Workflow, Writing a MapReduce Program using the Hadoop Framework
Topics – What is Functional Programming, Difference between Functional and Imperative Programming, What is a Mapper, What is a Reducer, Phases of Map and Reduce, Combiner, Partitioner, Shuffle & Sort Phase, MapReduce Job Submission Flow, MapReduce Types – Input and Output Formats, Custom Formats, Hadoop APIs, Exercise on Input and Output Formats, Task Execution, Hadoop Commands, MapReduce Features: Counters, Sorting, Reduce Joins, Side Data Distribution, MapReduce Library Classes, Hadoop Streaming, Aggregating Data, Example of calculating the time a user has spent on an activity.
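The Map, Shuffle & Sort, and Reduce phases listed above can be sketched as a local word-count simulation in plain Python. This is not the Hadoop API itself, just a minimal illustration of how data flows through the three phases:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input split
    for word in line.lower().split():
        yield (word, 1)

def shuffle_sort(pairs):
    # Shuffle & Sort phase: group the intermediate pairs by key
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reducer(word, pairs):
    # Reduce phase: sum the counts emitted for each word
    return (word, sum(count for _, count in pairs))

lines = ["hadoop map reduce", "map reduce on hadoop"]
intermediate = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(word, group) for word, group in shuffle_sort(intermediate))
print(result)  # {'hadoop': 2, 'map': 2, 'on': 1, 'reduce': 2}
```

In real Hadoop the mapper and reducer run as distributed tasks and the framework performs the shuffle across the network; a combiner would apply the same reducer logic locally on each mapper's output to cut shuffle traffic.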
Topics – MapReduce Problem Statement, Hadoop Mapper, Mapper Problems, How to Handle Multiple Mappers, Multiple Inputs, Working with Multiple Input Formats
Topics – What is a Graph, Graph Representation, Breadth-First Search Algorithm, Graph Representation in MapReduce, How to Implement Graph Algorithms, Example of a Graph MapReduce Job
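The breadth-first search algorithm named above can be shown compactly on an adjacency-list graph representation. This is a single-machine sketch; the MapReduce version of BFS expands the frontier one hop per job iteration, but the underlying logic is the same:

```python
from collections import deque

def bfs_distances(graph, start):
    # Breadth-first search: expand the frontier one hop at a time,
    # recording the shortest hop count to each reachable node
    dist = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                frontier.append(neighbour)
    return dist

# Adjacency-list representation of a small directed graph
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_distances(graph, "A"))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```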
Topics – What Is Pig?, Pig’s Features, Pig Use Cases, Interacting with Pig
Topics – Pig Latin Syntax, Loading Data, Simple Data Types, Field Definitions, Data Output, Viewing the Schema, Filtering and Sorting Data, Commonly-Used Functions, Hands-On Exercise: Using Pig for ETL Processing
Topics – Complex/Nested Data Types, Grouping, Iterating Grouped Data, Hands-On Exercise: Analyzing Data with Pig
Topics – Techniques for Combining Data Sets, Joining Data Sets in Pig, Set Operations, Splitting Data Sets, Hands-On Exercise
Topics – Macros and Imports, UDFs, Using Other Languages to Process Data with Pig, Hands-On Exercise: Extending Pig with Streaming and UDFs
Topics – What Is Hive?, Hive Schema and Data Storage, Comparing Hive to Traditional Databases, Hive vs. Pig, Hive Use Cases, Interacting with Hive
Topics – Hive Databases and Tables, Basic HiveQL Syntax, Data Types, Joining Data Sets, Common Built-in Functions, Hands-on Exercise: Running Hive Queries on the Shell, Scripts, and Hue
Topics – Hive Data Formats, Creating Databases, Modeling in Hive and Hive-Managed Tables, Loading Data into Hive, Altering Databases and Tables, Self-Managed Tables, Simplifying Queries with Views, Storing Query Results, Controlling Access to Data, Hands-On Exercise: Data Management with Hive, Thrift Server, Metastore in Hive
Topics – Understanding Query Performance, Partitioning, Bucketing, Indexing Data
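Bucketing, one of the performance techniques listed above, can be pictured as hashing the bucketed column into a fixed number of buckets so that equal values always land in the same bucket file. A minimal Python sketch of the idea (the column name and values are illustrative, not a Hive API):

```python
def bucket_for(value, num_buckets):
    # Bucketing assigns each row to a bucket by hashing the bucketed
    # column and taking it modulo the number of buckets, so equal
    # values always land in the same bucket (and the same file)
    return hash(value) % num_buckets

NUM_BUCKETS = 4
user_ids = ["u101", "u202", "u101", "u303"]  # hypothetical bucketed column
buckets = {}
for uid in user_ids:
    buckets.setdefault(bucket_for(uid, NUM_BUCKETS), []).append(uid)
```

Because co-bucketed tables agree on where each key lives, joins on the bucketed column can proceed bucket-by-bucket, which is what makes bucketing a query-performance feature rather than just a storage layout.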
Topics – User-Defined Functions in Hive
Topics – What is Impala?, How Impala Differs from Hive and Pig, How Impala Differs from Relational Databases, Limitations and Future Directions, Using the Impala Shell
Topics – Data Storage Overview, Creating Databases and Tables, Loading Data into Tables, HCatalog, Impala Metadata Caching
Topics – Partitioning Overview, Partitioning in Impala and Hive
Topics – Selecting a File Format, Hadoop Tool Support for File Formats, Avro Schemas, Using Avro with Hive and Sqoop, Avro Schema Evolution, Compression
Topics – What is HBase, Where does it fit, What is NoSQL
Topics – What is Spark, Comparison with Hadoop, Components of Spark
Topics – Apache Spark Introduction, Consistency, Availability, Partitioning, the Unified Spark Stack, Spark Components, Comparison with Hadoop – Scalding example, Mahout, Storm, graph processing
Topics – A Python example explained, Installing Spark, the Driver Program, SparkContext with examples, Weakly Typed Variables, Combining Scala and Java Seamlessly, Concurrency and Distribution, What is a Trait, Higher-Order Functions with examples, the OFI Scheduler, Advantages of Spark, Example of a Lambda using Spark, MapReduce explained with an example
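The lambda and higher-order-function style listed above can be previewed on plain Python collections. In real Spark the same lambdas would be passed to RDD operations through a SparkContext; this sketch uses only Python built-ins, not the Spark API:

```python
from functools import reduce

# Spark-style transformations previewed on a plain Python list.
# In Spark these lambdas would go to rdd.map / rdd.filter / rdd.reduce.
numbers = [1, 2, 3, 4, 5, 6]

squared = list(map(lambda x: x * x, numbers))        # map transformation
evens = list(filter(lambda x: x % 2 == 0, squared))  # filter transformation
total = reduce(lambda acc, x: acc + x, evens)        # reduce action

print(squared, evens, total)  # [1, 4, 9, 16, 25, 36] [4, 16, 36] 56
```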
Topics – Hadoop Multi Node Cluster Setup using Amazon ec2 – Creating 4 node cluster setup, Running Map Reduce Jobs on Cluster
Topics – Putting it all together and Connecting Dots, Working with Large data sets, Steps involved in analyzing large data
Topics – How ETL tools work in the Big Data industry, Connecting to HDFS from an ETL tool and moving data from a local system to HDFS, Moving Data from a DBMS to HDFS, Working with Hive from an ETL Tool, Creating a MapReduce job in an ETL tool, End-to-End ETL PoC showing Hadoop integration with an ETL tool.
Topics – Hadoop configuration overview and important configuration files, Configuration parameters and values, HDFS parameters, MapReduce parameters, Hadoop environment setup, ‘Include’ and ‘Exclude’ configuration files, Lab: MapReduce Performance Tuning
Topics – Namenode/Datanode directory structures and files, File system image and Edit log, The Checkpoint Procedure, Namenode failure and recovery procedure, Safe Mode, Metadata and Data backup, Potential problems and solutions / what to look for, Adding and removing nodes, Lab: MapReduce File system Recovery
Topics – Best practices of monitoring a Hadoop cluster, Using logs and stack traces for monitoring and troubleshooting, Using open-source tools to monitor Hadoop cluster
Topics – How to schedule Hadoop Jobs on the same cluster, Default Hadoop FIFO Schedule, Fair Scheduler and its configuration
Topics – Hadoop Multi Node Cluster Setup using Amazon ec2 – Creating 4 node cluster setup, Running Map Reduce Jobs on Cluster
Topics – ZOOKEEPER Introduction, ZOOKEEPER use cases, ZOOKEEPER Services, ZOOKEEPER data Model, Znodes and its types, Znodes operations, Znodes watches, Znodes reads and writes, Consistency Guarantees, Cluster management, Leader Election, Distributed Exclusive Lock, Important points
Topics – Why Oozie?, Installing Oozie, Running an example, Oozie workflow engine, Example M/R action, Word count example, Workflow application, Workflow submission, Workflow state transitions, Oozie job processing, Oozie Hadoop security, Why Oozie security?, Job submission to Hadoop, Multi-tenancy and scalability, Timeline of an Oozie job, Coordinator, Bundle, Layers of abstraction, Architecture, Use Case 1: time triggers, Use Case 2: data and time triggers, Use Case 3: rolling window
Topics – Overview of Apache Flume, Flume for Hadoop, Physically distributed Data sources, Changing structure of Data, Closer look, Anatomy of Flume, Core concepts, Event, Clients, Agents, Source, Channels, Sinks, Interceptors, Channel selector, Sink processor, Data ingest, Agent pipeline, Transactional data exchange, Routing and replicating, Why channels?, Use case- Log aggregation, Adding flume agent, Handling a server farm, Data volume per agent, Example describing a single node flume deployment
Topics – HUE introduction, the HUE ecosystem, What is HUE?, HUE real-world view, Advantages of HUE, How to upload data in the File Browser, Viewing the content, Integrating users, Integrating HDFS, Fundamentals of the HUE frontend
Topics – IMPALA Overview: Goals, User view of Impala: Overview, User view of Impala: SQL, User view of Impala: Apache HBase, Impala architecture, Impala state store, Impala catalogue service, Query execution phases, Comparing Impala to Hive
Topics – Why Hadoop testing is important, Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end to end tests, Functional testing, Release certification testing, Security testing, Scalability Testing, Commissioning and Decommissioning of Data Nodes Testing, Reliability testing, Release testing
Topics – Understanding the Requirement, Preparation of the Testing Estimation, Test Cases, Test Data, Test Bed Creation, Test Execution, Defect Reporting, Defect Retesting, Daily Status Report Delivery, Test Completion, ETL testing at every stage (HDFS, Hive, HBase) while loading the input (logs/files/records, etc.) using Sqoop/Flume, which includes but is not limited to data verification, Reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges, etc.), Reporting defects to the development team or manager and driving them to closure, Consolidating all the defects and creating defect reports, Validating new features and issues in Core Hadoop.
Topics – Reporting defects to the development team or manager and driving them to closure, Consolidating all the defects and creating defect reports, Validating new features and issues in Core Hadoop, Creating a testing framework with MRUnit for testing MapReduce programs.
Topics – Automation testing using Oozie, Data validation using the QuerySurge tool.
Topics – Test plan for HDFS upgrade, Test automation and result
Topics – How to test installation and configuration
Topics – Major Project on Big Data and Hadoop, Hadoop Development, Cloudera Certification Tips and Guidance, Mock Interview Preparation, Practical Development Tips and Techniques, Certification Preparation
Topics: Understanding R statistical computing and graphics, the statistical packages, familiarity with different data types and functions, learning to deploy them in various scenarios, using SQL to apply the ‘join’ function.
Topics: R functions, bundling compiled code and data in a well-defined format called R packages, learning about R package structure, package metadata and testing, CRAN (Comprehensive R Archive Network), vector creation and assigning values to variables.
Topics: R functionality, Rep Function, generating Repeats, Sorting and generating Factor Levels, Transpose and Stack Function.
Topics: Understanding various functions like Merge and Strsplit, understanding matrices and matrix manipulation, Row Sums
Topics: Deploying R for plotting graphs, pie charts, bar plots, histogram and understanding components of Pie Chart.
Topics: One Way Analysis of Variance, Two Way Analysis of Variance
Topics: Understanding K-Means Clustering and the workings of the clustering algorithm, association rule mining and affinity analysis for data mining, and learning co-occurrence relationships.
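The workings of the K-Means algorithm mentioned above fit in a few lines: assign each point to the nearest centroid, then move each centroid to the mean of its cluster, and repeat. A minimal sketch on 1-D data (the module itself uses R; this Python version only illustrates the algorithm):

```python
def kmeans_1d(points, centroids, iterations=10):
    # Plain k-means on 1-D data: alternate between assigning points
    # to the nearest centroid and recomputing each centroid as the
    # mean of its assigned cluster
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centroids)

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
print(kmeans_1d(points, centroids=[1.0, 10.0]))  # [1.5, 11.0]
```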
Topics: Learn about dependent and independent variables, linear regression and scatter plots
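The dependent/independent variable relationship above is exactly what a linear regression fit recovers. A short sketch of ordinary least squares for one predictor (illustrative data; in the course itself this would be R's `lm`):

```python
def linear_fit(xs, ys):
    # Ordinary least squares for y = a + b*x:
    # b = cov(x, y) / var(x),  a = mean(y) - b * mean(x)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Independent variable x, dependent variable y (here y = 2x + 1 exactly)
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = linear_fit(xs, ys)
print(a, b)  # 1.0 2.0
```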
Topics: The concepts of Logistic Regression, deploying Logistic Regression in R, set of examples and implementation.
Topics: What is the Area under the ROC Curve?, Sensitivity & Specificity in R, R Open Database Connectivity, deploying ODBC tables for reading data, application of the Confusion Matrix for performance visualization.
Topics: Creating an integrated environment for deploying R on Hadoop platform, working with RHadoop, RMR package and R Hadoop Integrated Programming Environment, R programming for MapReduce jobs and Hadoop execution.
Topics: Classification and Recommendation, Clustering in Mahout, Pattern Mining, Understanding machine Learning, Using Model diagram to decide the approach, Data flow, Supervised and Unsupervised learning
Topics: Concept of Recommendation, Recommendations by E-commerce site, Comparison between User Recommendations and Item recommendation, Define recommenders and Classifiers, Process of Collaborative Filtering, Explaining Pearson coefficient algorithm, Euclidean distance measure, Implementing a recommender using map reduce
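The Pearson coefficient and Euclidean distance measures named above are the similarity building blocks of collaborative filtering. A small sketch comparing two hypothetical users' rating vectors:

```python
from math import sqrt

def pearson(a, b):
    # Pearson correlation between two users' rating vectors
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def euclidean_similarity(a, b):
    # Euclidean distance turned into a similarity score in (0, 1]
    return 1 / (1 + sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))

alice = [5, 3, 4, 4]  # hypothetical ratings for the same four items
bob = [3, 1, 2, 3]
print(pearson(alice, bob), euclidean_similarity(alice, bob))
```

Collaborative filtering then recommends to a user the items liked by their most similar neighbours; the MapReduce recommender in this module distributes exactly these pairwise similarity computations.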
Topics: Defining Clustering, User-to-user similarity, Clustering Illustration, Euclidean distance measure, Distance measure vector, Understanding the process of Clustering, Vectorizing documents-Unstructured data
Topics: Document clustering, Sequence-to-sparse Utility, K-Mean Clustering
Topics: Terminology, Predictor and Target Variables, Classifiable Data, Key Challenges in Classification Algorithms, Vectorizing Continuous Data, Classification Examples, Logistic Regression and its examples
Topics: Clustering, Clustering Process, Transaction Clustering, Different techniques of Vectorization, Distance measure, Clustering algorithm-K-MEAN, Clustering Application-1, Clustering Application-2, Sentiment Analyzer
Topics: Pearson Coefficient, Collaborative Filtering Process, Collaborative Filtering, Similarity Algorithms, Pearson Correlation, Euclidean Distance Measure, Frequent Patterns & Association Rules, Frequent Pattern Growth
Topics: Introduction to Data Science, importance of Data Science, statistical and analytical methods, deploying Data Science for Business Intelligence, transforming data, machine learning and introduction to Recommender systems.
Topics: How Data Science solves real world problems, Data Science Project Life Cycle, principles of Data Science, introduction to various BI and Analytical tools, data collection, introduction to statistical packages, data visualization tools, R Programming, predictive modelling, machine learning, artificial intelligence and statistical analysis.
Topics: Boxplot in R programming, understanding distribution and percentile, identifying outliers, Rstudio Tool, various types of distribution like Normal, Uniform and Skewed.
Topics: Deploying machine learning for data analysis, solving business problems, using algorithms for searching patterns in data, relationship between variables, multivariate analysis, interpreting correlation, negative correlation.
Topics: Key phases of Data Transformation – Data Mapping and Code Generation, data processing operations, data patterns, data sampling, sampling distribution, normal and continuous variables, data extrapolation, regression, the linear regression model.
Topics: Data analysis, hypothesis testing, simple linear regression, Chi-square for assessing compatibility between theoretical and observed data, implementing data testing on data warehouse, validating data, checking for accuracy, data operational monitoring capabilities.
Topics: Various techniques of data modelling and generating algorithms, methods of business prediction, prediction approaches, data sampling, disproportionate sampling, data modelling rules, data iteration, and deploying data for mission-critical applications.
Topics: Working with large datasets in data warehouses, data clustering, grouping, horizontal & vertical slicing, data sharding in partitioning, clustering algorithms, K-means Clustering for analysing and data mining, exclusive clustering, hierarchy clustering, Mahout Clustering algorithm and Probabilistic Clustering, nearest neighbour search, pattern recognition, and statistical classification.
Topics: Introduction to R statistical computing and graphics, concepts, features and advantages of R, Big Data Hadoop familiarity, integrating R and Hadoop, basic architecture, framework, installing RImpala packages.
Topics: What is statistics?, How is this useful, What is this course for
Topics: Converting data into useful information, Collecting the data, Understand the data, Finding useful information in the data, Interpreting the data, Visualizing the data
Topics: Descriptive statistics, Let us understand some terms in statistics, Variable
Topics: Dot Plots, Histogram, Stemplots, Box and whisker plots, Outlier detection from box plots and Box and whisker plots
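Outlier detection from box plots, listed above, follows the 1.5 × IQR rule: anything beyond the whisker fences is flagged. A short sketch (the quartile convention used here is simple linear interpolation, one of several in common use):

```python
def iqr_outliers(values):
    # Box-plot rule: a point is an outlier if it falls more than
    # 1.5 * IQR below the first quartile or above the third quartile
    s = sorted(values)
    def quartile(q):
        # linear-interpolation quantile (one of several conventions)
        pos = q * (len(s) - 1)
        lo = int(pos)
        frac = pos - lo
        return s[lo] + (s[min(lo + 1, len(s) - 1)] - s[lo]) * frac
    q1, q3 = quartile(0.25), quartile(0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo_fence or v > hi_fence]

data = [10, 12, 11, 13, 12, 11, 40]
print(iqr_outliers(data))  # [40]
```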
Topics: What is probability?, Set & rules of probability, Bayes Theorem
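Bayes' Theorem, listed above, can be made concrete with the classic diagnostic-test calculation, P(disease | positive) = P(positive | disease) · P(disease) / P(positive). The numbers below are illustrative, not from the course:

```python
def bayes(prior, sensitivity, false_positive_rate):
    # P(disease | positive) = P(pos | disease) * P(disease) / P(pos),
    # where P(pos) is expanded by the law of total probability
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

# A test with 99% sensitivity and a 5% false-positive rate applied
# to a condition with 1% prevalence (illustrative numbers):
posterior = bayes(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.167
```

Even with an accurate test, the low prior drags the posterior down to about 17%, which is exactly the kind of counter-intuitive result Bayes' Theorem is taught to expose.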
Topics: Probability Distributions, a few examples, Student's t-Distribution, Sampling Distribution, Poisson Distribution
Topics: Stratified Sampling, Proportionate Sampling, Systematic Sampling, P-Value
Topics: Cross Tables, Bivariate Analysis, Multivariate Analysis, Dependence and Independence Tests (Chi-Square), Analysis of Variance, Correlation between Nominal Variables
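The Chi-Square independence test above reduces to one formula over a cross table: sum over cells of (observed − expected)² / expected, with expected = row total × column total / grand total. A minimal sketch on an illustrative 2×2 table:

```python
def chi_square(table):
    # Chi-square statistic for a contingency table:
    # sum over cells of (observed - expected)^2 / expected,
    # where expected = row_total * col_total / grand_total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# 2x2 cross table: rows and columns are two nominal variables
# (illustrative counts)
table = [[20, 30], [30, 20]]
print(chi_square(table))  # 4.0
```

The statistic is then compared against the chi-square distribution with (rows − 1) × (cols − 1) degrees of freedom to decide whether the two variables are independent.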
Topics : Search Engine Basics, Lucene Overview & Features, Indexing Basics, Architecture, Inverted Indexing Technique, Lucene Schema (Documents & Fields), Analyzers, Query Types, Use Cases of Search Engines, Writing & Searching an Index
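The inverted indexing technique above is the core data structure behind Lucene: instead of mapping documents to their terms, the index maps each term to the documents containing it. A toy sketch (not the Lucene API, just the structure):

```python
def build_inverted_index(docs):
    # Inverted index: map each term to the set of document ids
    # that contain it, so lookups go term -> documents
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

docs = {1: "Lucene search engine", 2: "search index basics"}
index = build_inverted_index(docs)
print(sorted(index["search"]))  # [1, 2]
```

A query then becomes set operations over these posting sets, e.g. intersecting `index["search"]` with `index["engine"]` for an AND query.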
Topics : Analyzers, Querying, Scoring, Boosting, Highlighting, Faceting, Grouping, Joins, Spatial Search, Configuring Lucene with Java, Demonstrating Writing (Indexing) & Searching with various methods, Apache Tika
Topics : About Solr, Installing and running Solr, Introduction to Solr cores, Data types available in Solr, Adding content to Solr, Reading a Solr XML response, Changing parameters in the URL, Using the browse interface
Topics : Introduction to the Solr client, Configuring the Solr client, Adding your own content to Solr, Deleting data from Solr, Building a bookstore search, Adding book data, Exploring the book data, De-duplication update processor
Topics : Sorting results, Query parsers, More queries, Hardwiring request parameters, Adding fields to default search, Faceting, Result grouping
Topics : Adding fields to the schema, Analyzing text
Topics : Field weighting, Phrase queries, Function queries, Fuzzier search, Sounds-like
Topics : More-like-this, Geospatial, Spell checking, Suggestions, Highlighting, Pseudo-fields, Pseudo-joins, Multi-language, Faceting, Query Re-Ranking, Pagination, Grouping, Clustering, Spatial Search, Collapsing & Expanding, Exporting Results, Real-Time Search & Get, Client APIs.
Topics : Adding more kinds of data, Joining between cores.
Topics : Introduction, How SolrCloud works, Commit strategies, Introduction to ZooKeeper, Managing Solr config files, Managing solrconfig.xml, Managing solr.xml, Managing Multiple Cores, Plugins, JVM Settings, Running on Tomcat / Jetty, Logging & SSL, Sharding, Replication.
Topics: Introduction to Splunk, Splunk developer roles and responsibilities
Topics: Writing Splunk query for search, sharing, saving, scheduling and exporting search results
Topics: Creation of alert, explaining alerts and viewing fired alerts
Topics: Introduction to Tags in Splunk, deploying Tags for Splunk search, understanding event types and utility, generating and implementing event types in Search
Topics: Search command study, general search practices, detailed understanding of search, search field performance with different commands like table, multikv, rename, rex & erex
Topics: Using the following commands and their functions: addcoltotals, addtotals, top, rare, stats
Topics: Explore the available visualizations, create charts and timecharts, omit null values and format results
Topics: Calculating and analyzing results, value conversion, round and format values, using eval command, conditional statements, filtering calculated search results
Topics: Understanding Search Transactions
Topics: Learn about data lookups, example, lookup table, defining and configuring automatic lookup, deploying lookup in reports and searches
Topics: Creating search charts, reports and dashboards
Topics: Working with raw data for data extraction, transformation, parsing and preview
Topics: Splunk installation, configuration, data inputs, app management, Splunk important concepts, parsing machine-generated data, search indexer and forwarder.
Topics: Introduction to Splunk Configuration Files, Universal Forwarder, Forwarder Management, data management, troubleshooting and monitoring.
Topics: Converting machine-generated data into operational intelligence, setting up Dashboard, Reports and Charts, integrating Search Head Clustering & Indexer Clustering.
Topics: Understanding the input methods, deploying scripted, Windows, network and agentless input types, fine-tuning it all.
Topics: Splunk User authentication and Job Role assignment, learning to manage, monitor and optimize Splunk Indexes.
Topics: Understanding parsing of machine-generated data, manipulation of raw data, previewing and parsing, data field extraction.
Topics: Distributed search concepts, improving search performance, large scale deployment and overcoming execution hurdles, working with Splunk Distributed Management Console for monitoring the entire operation.
Topics: What is Spark, What is in-memory MapReduce, From Hadoop MapReduce to Spark, Spark on Hadoop YARN, HDFS Revision, YARN Revision, Spark Overview, How Spark improves on Hadoop, and how Spark without Hadoop is used in industry
Topics: How to install spark, using the Spark Shell, RDDs (Resilient Distributed Datasets), Functional Programming in Spark, Spark Architecture
Topics: Creating RDDs, Other General RDD Operations
Topics: Key-Value Pair RDDs, Spark MapReduce, Other Pair RDD Operations
Topics: Spark Applications vs. the Spark Shell, Creating the SparkContext, Building a Spark Application (Scala and Java), Running a Spark Application, the Spark Application Web UI, Hands-On Exercise: Write and Run a Spark Application, Configuring Spark Properties, Logging
Topics: Review: Spark on a Cluster, RDD Partitions, Partitioning of File-based RDDs, HDFS and Data Locality, Executing Parallel Operations, Stages and Tasks
Topics: RDD Lineage, RDD Persistence Overview, Distributed Persistence
Topics: Spark Streaming Overview, Example: Streaming Request Count, DStreams, Developing Spark Streaming Applications, Spark Stream processing
Topics: Multi-Batch Operations, State Operations, Sliding Window Operations, Advanced Data Sources
Topics: Common Spark Use Cases, Iterative Algorithms in Spark, Spark Graph Processing and Analysis, Machine Learning spark example k-means
Topics: Shared Variables: Broadcast Variables, Shared Variables: Accumulators, Common Performance Issues, Diagnosing Performance Problems
Topics: Spark SQL and the SQL Context, Creating DataFrames, Transforming and Querying Data Frames, Saving DataFrames, DataFrames and RDDs, Comparing Spark SQL, Impala and Hive-on-Spark
Topics: Task Scheduling/Distribution, Scheduling Across Applications, Static Partitioning, Dynamic Sharing, Scheduling Within an Application, Fair Scheduling, High Availability of the Spark Master, Standby Masters with ZooKeeper, Single-Node Recovery with the Local File System, Higher-Order Functions
Topics: Practicals: Creating Maps, Transformations, Capacity Planning in Spark, Concurrency in Java, Concurrency in Scala
Topics: Array Buffers, CompactBuffer, Protocol Buffers, Log Analysis with Spark, A First Log Analyzer in Spark.
Topics: Scala Overview and Scala for big data and Apache Spark analytics
Topics: Playing with Scala, Advantages of Scala, the REPL (Read-Evaluate-Print Loop), Language Features, Type Inference, Higher-Order Functions, Option, Pattern Matching, Collections, Currying, Traits, the Application Space and Scala for Data Analysis
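Higher-order functions and currying, both listed above, are language-agnostic ideas; the sketch below illustrates them in Python, since the behaviour carries over directly to Scala's multiple-parameter-list syntax such as `def add(a: Int)(b: Int)`:

```python
def curried_add(a):
    # Currying: a two-argument function rewritten as a chain of
    # one-argument functions, so it can be partially applied
    def inner(b):
        return a + b
    return inner

def apply_twice(f, x):
    # A higher-order function: it takes another function as an argument
    return f(f(x))

add5 = curried_add(5)        # partially applied: still waiting for `b`
print(add5(3))               # 8
print(apply_twice(add5, 0))  # 10
```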
Topics: Uses of the Scala interpreter, Example of a static object timer in Scala, Testing string equality in Scala, Implicit classes in Scala with examples, Recursion in Scala, foreach in Scala, Currying in Scala with examples, Classes in Scala
Topics: Constructor, Constructor overloading, Properties, Abstract classes, Type hierarchy in Scala, Object equality, Val and var methods
Topics: Sealed traits, Case classes, Constant pattern in case classes, Wild card pattern, Variable pattern, Constructor pattern, Tuple pattern
Topics: Java equivalents, Advantages of traits, avoiding boilerplate code, Linearization of traits, modeling a real world example
Topics: How traits are implemented in Scala and java, How extending multiple traits is handled
Topics: Classification of Scala collections, Iterable, Iterator and iterable, List sequence example in Scala
Topics: Array in Scala, List in Scala, Difference between list and list buffer, Array buffer, Queue in Scala, Dequeue in Scala, Mutable queue in Scala, Stacks in Scala, Sets and maps in Scala, Tuples
Topics: Different import types, Selective imports, Testing-Assertions, Scala test case- Scala test fun. Suite, Junit test in Scala, Interface for Junit via Junit 3 suite in Scala test, SBT, Directory structure for packaging Scala application, Scala Split and Spark Scala example.
Topics: Big Data characteristics, understanding Hadoop distributed computing, the Bayesian Law, deploying Storm for real time analytics, the Apache Storm features, comparing Storm with Hadoop, Storm execution, learning about Tuple, Spout, Bolt.
Topics: Installing the Apache Storm, various types of run modes of Storm.
Topics: Understanding Apache Storm and the data model.
Topics: Installation of Apache Kafka and its configuration.
Topics: Understanding of advanced Storm topics like Spouts, Bolts, Stream Groupings, Topology and its Lifecycle, learning about Guaranteed Message Processing.
Topics: Various Grouping types in Storm, reliable and unreliable messages, Bolt structure and lifecycle, understanding Trident topology for failure handling, process, CallLogAnalysis Topology for analyzing call logs for calls made from one number to another.
Topics: Understanding of Trident Spouts and its different types, the various Trident Spout interface and components, familiarizing with Trident Filter, Aggregator and Functions, a practical and hands-on use case on solving call log problem using Storm Trident.
Topics: Various components, classes and interfaces in storm like – BaseRichBolt Class, iRichBolt Interface, iRichSpout Interface, BaseRichSpout class and the various methodology of working with them.
Topics: Understanding Cassandra, its core concepts, its strengths and deployment.
Topics: Twitter Bootstrapping, detailed understanding of Bootstrapping, concepts of Storm, the Storm Development Environment.
Topics : Introduction to Cassandra, its strengths and deployment areas
Topics : Significance of NoSQL, RDBMS Replication, Key Challenges, Types of NoSQL, Benefits and Drawbacks, Salient Features of NoSQL Databases, the CAP Theorem, Consistency.
Topics : Installation, introduction to Cassandra, key concepts and deployment of a non-relational database, column-oriented database, Data Model – column, column family
Topics : Token calculation, Configuration overview, Node tool, Validators, Comparators, Expiring column, QA
Topics : How Cassandra modelling varies from Relational database modelling, Cassandra modelling steps, introduction to Time Series modelling, comparing Column family Vs. Super Column family, Counter column family, Partitioners, Partitioners strategies, Replication, Gossip protocols, Read operation, Consistency, Comparison
Topics : Creation of a multi-node cluster, node settings, Key and Row cache, System Keyspace, understanding the Read Operation, Cassandra Commands overview, VNodes, Column family
Topics : JSON, Hector client, AVRO, Thrift, JAVA code writing method, Hector tag
Topics : Cassandra management, node tool commands, MapReduce and Cassandra, Secondary indexes, DataStax Installation
Topics : Rules of Cassandra data modelling, increasing data writes, duplication, and reducing data reads, modelling data around queries, creating table for data queries
Topics : Understanding the Java application creation methodology, learning key drivers, deploying the IDE for Cassandra applications, cluster connection and data query implementation
Topics : Learning about Node Tool Utility, cluster management using Command Line Interface, Cassandra management and monitoring via DataStax Ops Center.
Topics : Cassandra client connectivity, connection pool internals, API, important features and concepts of Hector client, Thrift, JAVA code, Summarization.
Topics: RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, Introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples.
Topics: Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) Installation, MongoDB Data types.
Topics: The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP Theorem, Base property, learning about JSON/BSON, database collection & document, MongoDB uses, MongoDB Write Concern – Acknowledged, Replica Acknowledged, Unacknowledged, Journaled, Fsync.
Topics: Understanding CRUD and its functionality, CRUD concepts, MongoDB Query & Syntax, read and write queries and query optimization.
Data Modeling & Schema Design
Topics: Concepts of data modeling, difference between MongoDB and RDBMS modeling, Model tree structure, operational strategies, monitoring and backup.
Topics: In this module you will learn MongoDB® administration activities such as Health Check, Backup, Recovery, database sharding and profiling, Data Import/Export, Performance Tuning, etc.
Topics: Concepts of data aggregation and types, data indexing concepts, properties and variations.
Topics: Understanding database security risks, MongoDB security concept and security approach, MongoDB integration with Java and Robomongo.
Topics: Implementing techniques to work with variety of unstructured data like images, videos, log data, and others, understanding GridFS MongoDB file system for storing data.
Project 1 – Working with MapReduce, Hive, Sqoop
Topics : This project involves working with various Hadoop components such as MapReduce, Apache Hive and Apache Sqoop. Work with Sqoop to import data from a relational database management system like MySQL into HDFS. Deploy Hive for summarizing data, querying and analysis. Convert SQL queries into HiveQL to deploy MapReduce on the transferred data. You will gain considerable proficiency in Hive and Sqoop after completion of this project.
Project 2 – Work on MovieLens data for finding top records
Data – MovieLens dataset
Topics : In this project you will work exclusively on data collected through MovieLens available rating data sets. The project involves the following important components:
Project 3 – Hadoop YARN Project – End to End PoC
Topics : In this project you will work on a live Hadoop YARN project. YARN is part of the Hadoop 2.0 ecosystem that lets Hadoop decouple from MapReduce and deploy more competitive processing and a wider array of applications. You will work on the YARN central Resource Manager. The salient features of this project include:
Project 4 – Partitioning Tables in Hive
Topics : This project involves working with Hive table data partitioning. Ensuring the right partitioning helps to read the data, deploy it on the HDFS, and run the MapReduce jobs at a much faster rate. Hive lets you partition data in multiple ways like:
This will give you hands-on experience in partitioning of Hive tables manually, deploying single SQL execution in dynamic partitioning, bucketing of data so as to break it into manageable chunks.
Project 5 – Connecting Pentaho with Hadoop Ecosystem
Topics : This project lets you connect Pentaho with the Hadoop ecosystem. Pentaho works well with HDFS, HBase, Oozie and Zookeeper. You will connect the Hadoop cluster with Pentaho data integration, analytics, Pentaho server and report designer. Some of the components of this project include the following:
Project 6 – Multi-node cluster setup
Topics : This is a project that gives you opportunity to work on real world Hadoop multi-node cluster setup in a distributed environment. The major components of this project involve:
You will get a complete demonstration of working with the master and slave nodes of a Hadoop cluster, installing Java as a prerequisite for running Hadoop, installing Hadoop itself and mapping the nodes in the cluster.
Project 7 – Hadoop Testing using MR
Topics : In this project you will gain proficiency in Hadoop MapReduce code testing using MRUnit. You will learn about real world scenarios of deploying MRUnit, Mockito, and PowerMock. Some of the important aspects of this project include:
After completing this project you will be well-versed in test-driven development and will be able to write lightweight test units that work specifically on the Hadoop architecture.
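MRUnit itself is a Java library, but its core pattern can be sketched in Python: feed a mapper or reducer one known input and assert the exact expected output. The word-count mapper and reducer below are illustrative stand-ins, not course code:

```python
# Python analogue of the MRUnit "drive one input, assert one output" pattern,
# applied to a toy word-count mapper and reducer.
def mapper(line):
    """Emit (word, 1) for every word in the input line."""
    return [(word, 1) for word in line.split()]

def reducer(key, values):
    """Sum all counts emitted for a given word."""
    return (key, sum(values))

# Driver-style tests: known input in, exact expected output asserted.
assert mapper("big data big") == [("big", 1), ("data", 1), ("big", 1)]
assert reducer("big", [1, 1]) == ("big", 2)
print("mapper/reducer unit tests passed")
```

The value of this style is that the map and reduce logic is verified in isolation, without spinning up a cluster, which is exactly what MRUnit provides on the Java side.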
Project 8 – Hadoop Weblog Analytics
Data – Weblogs
Topics : This project involves making sense of web log data in order to derive valuable insights from it. You will work with loading the server data onto a Hadoop cluster using various techniques. The various modules of this project include:
The web log data can include the URLs visited, cookie data, user demographics, location, date and time of web service access, and so on. In this project you will transport the data using Apache Flume or Kafka, and handle workflow and data cleansing using MapReduce, Pig or Spark. The insights thus derived can be used to analyze customer behavior and predict buying patterns.
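A first step in any web log analytics pipeline is parsing raw log lines into fields. The sketch below parses one Apache combined-format line with a Python regular expression; the sample line is fabricated:

```python
import re

# Sketch: parsing one Apache-style web log line into named fields.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) \S+'
)

line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /cart HTTP/1.1" 200 2326'
event = LOG_RE.match(line).groupdict()
print(event["ip"], event["url"], event["status"])
```

Once every line is a dictionary of fields like this, the downstream MapReduce, Pig or Spark jobs can group by URL, status code or time window.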
Project 9 – Hadoop Maintenance
Topics : This project involves maintaining and managing a Hadoop cluster. You will work on a number of important tasks, such as:
Project Title – Restaurant Revenue Prediction
Dataset – Sales
Project Description – This project involves predicting the sales of a restaurant on the basis of certain objective measurements. It will give you real-world industry experience in handling multiple use cases and deriving solutions, along with insights into feature engineering and selection.
Project 1 – Understanding Cold Start Problem in Data Science
Topics: This project involves understanding the cold-start problem associated with recommender systems. You will gain hands-on experience in information filtering and in working with systems that have zero historical data to refer to, as when launching a new product. You will gain proficiency in working with personalized applications such as movie, book, song and news recommendations. This project includes the following:
Project 2 – Recommendation for Movie, Summary
Topics: This is a real-world project that gives you hands-on experience in working with a movie recommender system. Depending on which movies a particular user likes, you will be in a position to provide data-driven recommendations. This project involves understanding recommender systems and information filtering, predicting ratings, learning about user preferences and so on. You will work exclusively on data covering user details, movie details and more. The main components of the project include the following:
Project – Data Analysis Project
Data – Sales
Problem Statement – It includes the following actions:
Topics: Understand the business solutions, Discussion with the warehouse team, Data Collection & Storage, Data Cleaning, Build a Hypothesis Tree around the business problem, Produce the final result.
Project – Running Function Queries on Apache Solr
Topics : In this project you will learn about Function Queries and apply them to search results in Apache Solr. You will understand exactly how Function Queries are used to modify search results based on certain conditions. The project involves working on an index store holding the dimensions of boxes under arbitrary field names, sorting all the boxes through search, and then modifying the search results with Function Queries based on new parameters. The query parsers used include DisMax, Extended DisMax and the standard parser.
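As a concrete illustration, the HTTP parameters for such a query might look like the Python sketch below. The box-dimension field names `width`, `height` and `depth` are assumptions, and `product()` is a standard Solr function that multiplies field values, so sorting on it orders boxes by volume:

```python
from urllib.parse import urlencode

# Sketch of the HTTP query parameters for a Solr function-query sort.
# Field names (width, height, depth) are assumed box dimensions in the index.
params = {
    "q": "*:*",                                  # match all boxes
    "defType": "edismax",                        # Extended DisMax parser
    "sort": "product(width,height,depth) desc",  # sort by computed volume
    "fl": "id,width,height,depth",               # fields to return
}
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the collection's `/select` endpoint; swapping the `sort` expression is how Function Queries reshape the same result set under different conditions.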
Topics : This project gives you hands-on experience in working with the Splunk tool. Starting from a data set of employee details in a text file, you will create a dashboard and a report. You will then deploy various Splunk commands to perform row operations, extract certain data fields, edit events, add tags, search by tag name for an event and save the tag search. On completing this project you will know how to create a searchable repository from data that is captured, correlated and indexed in real time, and ultimately visualize it using dashboards, reports and alerts.
Type – Field Extraction
Topics : In this project you will learn to extract fields from events using the Splunk field extraction technique. You will learn the basics of field extraction, the use of the field extractor, the field extraction page in Splunk Web and field extraction configuration in files, as well as the regular-expression and delimiter methods of field extraction. On completing the project you will be able to build a Splunk dashboard and use the extracted field data in it to create rich visualizations in an enterprise setup.
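The regular-expression method of field extraction can be illustrated in plain Python: the sketch below pulls key=value fields out of a single event, the way a Splunk extraction rule would. The event format and field names are made up:

```python
import re

# Python stand-in for Splunk's regular-expression field extraction;
# the event text and field names are illustrative assumptions.
event = "2023-10-10 13:55:36 user=alice action=login status=success"

# Capture every key=value pair as a (field, value) tuple, then build a dict.
fields = dict(re.findall(r"(\w+)=(\w+)", event))
print(fields)   # {'user': 'alice', 'action': 'login', 'status': 'success'}
```

In Splunk the same idea is expressed as a regex with named capture groups in the field extractor or in a configuration file, after which the fields become searchable and chartable.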
Project 1. Movie Recommendation
Topics – In this project you will gain hands-on experience in deploying Apache Spark for movie recommendation. You will be introduced to Spark MLlib, the Spark machine learning library, and its algorithms and coding guide. You will understand how to deploy collaborative filtering, clustering, regression and dimensionality reduction in MLlib. On completing the project you will have gained experience in working with streaming data, sampling, testing and statistics.
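In the project the recommendations come from Spark MLlib, but the underlying collaborative-filtering idea can be sketched in plain Python: find the user most similar to the target user and recommend what they liked but the target has not seen. The ratings below are fabricated:

```python
from math import sqrt

# Conceptual sketch of user-based collaborative filtering; the course project
# does this at scale with Spark MLlib. All ratings here are made up.
ratings = {
    "alice": {"Matrix": 5, "Titanic": 1, "Inception": 4},
    "bob":   {"Matrix": 4, "Titanic": 2, "Inception": 5, "Up": 4},
    "carol": {"Matrix": 1, "Titanic": 5},
}

def cosine(u, v):
    """Cosine similarity over the movies both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    nu = sqrt(sum(u[m] ** 2 for m in common))
    nv = sqrt(sum(v[m] ** 2 for m in common))
    return dot / (nu * nv)

# Recommend to alice the unseen movies of her most similar user.
sims = {name: cosine(ratings["alice"], r)
        for name, r in ratings.items() if name != "alice"}
nearest = max(sims, key=sims.get)
unseen = set(ratings[nearest]) - set(ratings["alice"])
print(nearest, unseen)   # bob {'Up'}
```

MLlib's collaborative-filtering implementation (ALS) learns latent factors instead of comparing users pairwise, but the goal is the same: predict the ratings a user would give to movies they have not rated.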
Project 2. Twitter API Integration for tweet Analysis
Topics – In this project you will learn to integrate the Twitter API for analyzing tweets. You will write server-side code in a scripting language such as PHP, Ruby or Python to query the Twitter API and receive the results in JSON format. You will then read the results and perform operations such as aggregation, filtering and parsing as needed to produce the tweet analysis.
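Once the JSON results are back from the API, the aggregation step might look like the Python sketch below, which counts hashtags in a fabricated payload; real Twitter API responses have a different, richer structure:

```python
import json
from collections import Counter

# Sketch: aggregating hashtags from a Twitter-API-style JSON response.
# The payload below is a fabricated example, not real API output.
payload = json.loads("""
[{"text": "learning #hadoop today"},
 {"text": "#spark beats plain mapreduce #hadoop"},
 {"text": "no tags here"}]
""")

tags = Counter(word.lower()
               for tweet in payload
               for word in tweet["text"].split()
               if word.startswith("#"))
print(tags.most_common(2))   # [('#hadoop', 2), ('#spark', 1)]
```

Filtering (dropping tweets without tags) and parsing (splitting text into tokens) happen in the same pass, which is typical of this kind of tweet analysis.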
Project 3. Data Exploration Using Spark SQL – Wikipedia dataset
Topics – This project lets you work with Spark SQL. You will gain experience in combining it with ETL applications, performing real-time analysis of data, running batch analysis, deploying machine learning, creating visualizations and processing graphs.
Project 1. Call Log Analysis using Trident
Topics : In this project you will work on call logs to decipher the data and gather valuable insights using Apache Storm Trident. You will work extensively with data about calls made from one number to another. The aim of this project is to resolve call log issues with Trident stream processing and low-latency distributed querying. You will gain hands-on experience in working with Spouts and Bolts, along with various Trident functions, filters, aggregations, joins and groupings.
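The kind of grouping and aggregation Trident applies to the call stream can be sketched in plain Python; in the project this logic runs inside Trident functions and aggregators over a live stream, and the phone numbers below are made up:

```python
from collections import Counter

# Conceptual sketch of Trident-style groupBy + count on a call log.
# (caller, callee) pairs with fabricated numbers.
calls = [
    ("555-0101", "555-0202"),
    ("555-0101", "555-0303"),
    ("555-0404", "555-0202"),
]

calls_per_caller = Counter(caller for caller, _ in calls)
print(calls_per_caller.most_common(1))   # [('555-0101', 2)]
```

In Trident the equivalent is a `groupBy` on the caller field followed by a count aggregator, with the state queryable at low latency via DRPC.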
Project 2. Twitter Data Analysis using Trident
Topics : This project involves working with Twitter data and processing it to extract patterns. Apache Storm Trident is the perfect framework for real-time analysis of tweets, and working with it you will simplify the task of live Twitter feed analysis. In this project you will gain real-world experience of working with Spouts, Bolts, and Trident filters, joins, aggregations, functions and groupings.
Project 3. US Presidential Election Result analysis using Trident DRPC Query
Topics : This project lets you work on US presidential election results and predict who is leading and who is trailing on a real-time basis. For this you will work exclusively with the Trident Distributed Remote Procedure Call (DRPC) server. On completing the project you will know how to access data residing on a remote computer or network and deploy it for real-time processing, analysis and prediction.
Type : Deploying the IDE for Cassandra applications
Topics : This project gives you hands-on experience in installing and working with Apache Cassandra, a high-performance, extremely scalable distributed database with no single point of failure. You will deploy a Java Integrated Development Environment for running Cassandra, learn about the key drivers, work with Cassandra applications in a cluster setup and implement data-querying techniques.
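The querying techniques mentioned above revolve around CQL statements that the application passes to the driver's session. The statements below are an illustrative sketch; the keyspace name, table schema and replication settings are assumptions, not course values:

```python
# Sketch of the CQL a Cassandra application issues; keyspace, table and
# replication settings are illustrative assumptions.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"
)
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS demo.users ("
    "user_id uuid PRIMARY KEY, name text, email text);"
)
INSERT_ROW = "INSERT INTO demo.users (user_id, name, email) VALUES (?, ?, ?);"

# With a Cassandra driver these strings would be passed to session.execute();
# here we only print them to show their shape.
for stmt in (CREATE_KEYSPACE, CREATE_TABLE, INSERT_ROW):
    print(stmt)
```

The `replication_factor` of 3 is what gives the cluster no single point of failure: every row lives on three nodes, so any one node can drop out without data loss.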
Java is one of the most popular programming languages for working with MongoDB. This project shows you how to work with the MongoDB Java Driver and use MongoDB as a Java developer. You will become proficient in creating a collection and inserting video data into it using Java programming. Some of the tasks and steps involved are as follows:
In the Intellipaat self-paced training program you will receive recorded sessions, course material, quizzes, related software and assignments. The courses are designed to give you real-world exposure and are focused on clearing the relevant certification exam. After completing the training you can take the quizzes, which let you check your knowledge and help you clear the relevant certification with higher marks; you will also be able to work with the technology independently.
In self-paced courses a trainer is not available, whereas in online training a trainer is available to answer queries in real time. For self-paced courses we provide email support to clear doubts or any query related to the training, and if you face an unexpected challenge we will arrange a live class with a trainer.
All courses are highly interactive to provide good exposure. You can learn at your own pace and in your own time. The price of self-paced training is 75% lower than that of online training. You will have lifetime access, so you can refer to the material anytime during your project work or job.
Yes, you can see sample videos at the top of the course details page.
As soon as you enroll in the course, your LMS (Learning Management System) access will be activated. You will immediately get access to our course content in the form of a complete set of previous class recordings, PPTs, PDFs and assignments, along with access to our 24×7 support team, so you can start learning right away.
You get 24/7 access to video tutorials and email support, along with online interactive sessions with a trainer for issue resolution.
Yes, you can pay the difference between the online training and self-paced course fees and be enrolled in the next online training batch.
Yes, we will provide you with links from where you can download the required software.
Please send us an email. You can also chat with us to get an instant solution.
Intellipaat verified certificates are awarded on successful completion of the course projects. There is a set of quizzes after each course module that you need to go through. After successful submission, an official Intellipaat verified certificate will be given to you.
Towards the end of the course, you will work on a training project. This will help you understand how the different components of the course relate to each other.
Classes are conducted via live video streaming, where you can interact with the instructor by speaking, chatting and sharing your screen. You will always have access to the videos and PPTs. This gives you a clear idea of how the classes are conducted, the quality of the instructors and the level of interaction in class.
Yes, we keep launching multiple offers; please see the offers page.
We will help you with issues and doubts regarding the course, and you can attempt the quiz again.
This is a comprehensive course designed to help you clear multiple certifications, viz.
The entire training course content is in line with the respective certification programs and helps you clear the requisite certification exams with ease and get the best jobs in top MNCs.
As part of this training you will work on real-time projects and assignments with immense relevance to real-world industry scenarios, helping you fast-track your career effortlessly.
At the end of this training program there will be quizzes that reflect the type of questions asked in the respective certification exams and help you score better marks.
The Intellipaat Course Completion certificate is awarded on completion of the project work (after expert review) and on scoring at least 60% in the quiz. Intellipaat certification is well recognized in 80+ top MNCs such as Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact and Hexaware.
You will get lifetime access to high-quality interactive tutorials along with the complete course material, plus 24/7 access to video tutorials with email support. If you get stuck on an unexpected problem, we will provide online interactive sessions with a trainer to resolve it.
We provide 24×7 email support to clear issues or doubts for self-paced training.
In online instructor-led training, the trainer is available to help you with your queries regarding the course. If required, the support team can also provide live support by accessing your machine remotely. This ensures that all doubts and problems faced during labs and project work are clarified round the clock.
This course is designed to prepare you for the CCA Spark and Hadoop Developer, Cloudera Certified Administrator for Apache Hadoop (CCAH), R certification, Mahout certification, Cloudera CCP:DS, Apache Storm (CCB-400), Apache Cassandra Professional and Apache Spark certification examinations.
At the end of the course there will be a quiz and project assignments; once you complete them, you will be awarded the Intellipaat Course Completion certificate.
"PMI®", "PMP®" and "PMI-ACP®" are registered marks of the Project Management Institute, Inc.
The Open Group®, TOGAF® are trademarks of The Open Group.
The Swirl logoTM is a trade mark of AXELOS Limited.
ITIL® is a registered trade mark of AXELOS Limited.
PRINCE2® is a Registered Trade Mark of AXELOS Limited.
Certified ScrumMaster® (CSM) and Certified Scrum Trainer® (CST) are registered trademarks of SCRUM ALLIANCE®
Professional Scrum Master is a registered trademark of Scrum.org