This is an all-inclusive Big Data and Data Science course that includes in-depth study of Hadoop and its ecosystem, the various programming languages, and NoSQL databases, along with business intelligence, statistics and probability. Taking this all-in-one bundle of 16 courses will equip you with the skills needed to be a Data Scientist.
Topics – Introduction of Hadoop, Problems with data growth, Solving Data Problems, Hadoop Overview, Understanding Mapreduce, Setting the stage for big data problem solving with MapReduce, Parallel Copying with Hadoop distcp, Hadoop fs, Hadoop Archives
Topics – Introduction to Distributed File System, What is Hadoop Distributed File System (HDFS), HDFS Design Principle & Failure, HDFS Architecture High Availability Mode and Federated Mode, Overall Architecture of HDFS, HDFS Daemons, Basic HDFS Commands, Understanding Map Reduce, Hadoop Architecture, Difference between MR1 and MR2, What is YARN, YARN jobs, Resource Management.
Topics – Hadoop 2.x Cluster Architecture, Federation and High Availability, A Typical Production Hadoop Cluster, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Cloudera Single node cluster
Topics – What is Hadoop Map Reduce and examples, Conceptual Understanding between Map and Reduce, Anatomy of a YARN Application Run, YARN MR Application Execution Flow, YARN Workflow, Write a Map Reduce Program using Hadoop Framework
Topics – What is Functional Programming, Difference between Functional and Imperative Programming, What is Mapping, What is Reducer, Phases of Map and Reduce, Combiner, Partitioner, Shuffle & Sort Phase, Map Reduce job submission flow, Map Reduce Types: Input and Output Formats, Custom Formats, Hadoop APIs, exercise on Input and Output Format, Task Execution, Hadoop commands, Map Reduce Features: Counters, Sorting, Reduce Joins, Side Data Distribution, Map Reduce Library Classes, Hadoop Streaming, Aggregate Data, Example of calculating time a user has spent on an Activity.
Topics – Map Reduce Problem Statement, Hadoop Mapper, Mapper Problem, How to Handle Multiple Mapper, Multiple Inputs, Working with Multiple Input Formats
Topics – What is Graph, Graph Representation, Breadth first Search Algorithm, Graph Representation of Map Reduce, How to do the Graph Algorithm, Example of Graph Map Reduce
Topics – What Is Pig?, Pig’s Features, Pig Use Cases, Interacting with Pig
Topics – Pig Latin Syntax, Loading Data, Simple Data Types, Field Definitions, Data Output, Viewing the Schema, Filtering and Sorting Data, Commonly-Used Functions, Hands-On Exercise: Using Pig for ETL Processing
Topics – Complex/Nested Data Types, Grouping, Iterating Grouped Data, Hands-On Exercise: Analyzing Data with Pig
Topics – Techniques for Combining Data Sets, Joining Data Sets in Pig, Set Operations, Splitting Data Sets, Hands-On Exercise
Topics – Macros and Imports, UDFs, Using Other Languages to Process Data with Pig, Hands-On Exercise: Extending Pig with Streaming and UDFs
Topics – What Is Hive?, Hive Schema and Data Storage, Comparing Hive to Traditional Databases, Hive vs. Pig, Hive Use Cases, Interacting with Hive
Topics – Hive Databases and Tables, Basic Hive QL Syntax, Data Types, Joining Data Sets, Common Built-in Functions, Hands-on Exercise: Running Hive Queries on the Shell, Scripts, and Hue
Topics – Hive Data Formats, Creating Databases, Modeling in Hive and Hive-Managed Tables, Loading Data into Hive, Altering Databases and Tables, Self-Managed Tables, Simplifying Queries with Views, Storing Query Results, Controlling Access to Data, Hands-On Exercise: Data Management with Hive, Thrift server, Meta store in Hive
Topics – Understanding Query Performance, Partitioning, Bucketing, Indexing Data
Topics – User-Defined Functions in Hive
Topics – What is Impala?, How Impala Differs from Hive and Pig, How Impala Differs from Relational Databases, Limitations and Future Directions, Using the Impala Shell
Topics – Data Storage Overview, Creating Databases and Tables, Loading Data into Tables, HCatalog, Impala Metadata Caching
Topics – Partitioning Overview, Partitioning in Impala and Hive
Topics – Selecting a File Format, Hadoop Tool Support for File Formats, Avro Schema, Using Avro with Hive and Sqoop, Avro Schema Evolution, Compression
Topics – What is HBase, Where does it fit, What is NoSQL
Topics – What is Spark, Comparison with Hadoop, Components of Spark
Topics – Apache Spark- Introduction, Consistency, Availability, Partition, Unified Stack Spark, Spark Components, Comparison with Hadoop – Scalding example, mahout, storm, graph
Topics – Explain python example, Show installing a spark, Explain driver program, Explaining spark context with example, Define weakly typed variable, Combine scala and java seamlessly, Explain concurrency and distribution, Explain what is trait, Explain higher order function with example, Define OFI scheduler, Advantages of Spark, Example of Lambda using spark, Explain Mapreduce with example
Topics – Hadoop Multi Node Cluster Setup using Amazon ec2 – Creating 4 node cluster setup, Running Map Reduce Jobs on Cluster
Topics – Putting it all together and Connecting Dots, Working with Large data sets, Steps involved in analyzing large data
Topics – How ETL tools work in Big data Industry, Connecting to HDFS from ETL tool and moving data from Local system to HDFS, Moving Data from DBMS to HDFS, Working with Hive with ETL Tool, Creating Map Reduce job in ETL tool, End to End ETL PoC showing Hadoop integration with ETL tool.
Topics – Hadoop configuration overview and important configuration file, Configuration parameters and values, HDFS parameters, MapReduce parameters, Hadoop environment setup, ‘Include’ and ‘Exclude’ configuration files, Lab: MapReduce Performance Tuning
Topics – Namenode/Datanode directory structures and files, File system image and Edit log, The Checkpoint Procedure, Namenode failure and recovery procedure, Safe Mode, Metadata and Data backup, Potential problems and solutions / what to look for, Adding and removing nodes, Lab: MapReduce File system Recovery
Topics – Best practices of monitoring a Hadoop cluster, Using logs and stack traces for monitoring and troubleshooting, Using open-source tools to monitor Hadoop cluster
Topics – How to schedule Hadoop Jobs on the same cluster, Default Hadoop FIFO Schedule, Fair Scheduler and its configuration
Topics – Hadoop Multi Node Cluster Setup using Amazon ec2 – Creating 4 node cluster setup, Running Map Reduce Jobs on Cluster
Topics – ZOOKEEPER Introduction, ZOOKEEPER use cases, ZOOKEEPER Services, ZOOKEEPER data Model, Znodes and its types, Znodes operations, Znodes watches, Znodes reads and writes, Consistency Guarantees, Cluster management, Leader Election, Distributed Exclusive Lock, Important points
Topics – Why Oozie?, Installing Oozie, Running an example, Oozie- workflow engine, Example M/R action, Word count example, Workflow application, Workflow submission, Workflow state transitions, Oozie job processing, Oozie Hadoop security, Why Oozie security?, Job submission to hadoop, Multi tenancy and scalability, Time line of Oozie job, Coordinator, Bundle, Layers of abstraction, Architecture, Use Case 1: time triggers, Use Case 2: data and time triggers, Use Case 3: rolling window
Topics – Overview of Apache Flume, Flume for Hadoop, Physically distributed Data sources, Changing structure of Data, Closer look, Anatomy of Flume, Core concepts, Event, Clients, Agents, Source, Channels, Sinks, Interceptors, Channel selector, Sink processor, Data ingest, Agent pipeline, Transactional data exchange, Routing and replicating, Why channels?, Use case- Log aggregation, Adding flume agent, Handling a server farm, Data volume per agent, Example describing a single node flume deployment
Topics – HUE introduction, HUE ecosystem, What is HUE?, HUE real world view, Advantages of HUE, How to upload data in File Browser?, View the content, Integrating users, Integrating HDFS, Fundamentals of HUE FRONT END
Topics – IMPALA Overview: Goals, User view of Impala: Overview, User view of Impala: SQL, User view of Impala: Apache HBase, Impala architecture, Impala state store, Impala catalog service, Query execution phases, Comparing Impala to Hive
Topics – Why Hadoop testing is important, Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end to end tests, Functional testing, Release certification testing, Security testing, Scalability Testing, Commissioning and Decommissioning of Data Nodes Testing, Reliability testing, Release testing
Topics – Understanding the Requirement, preparation of the Testing Estimation, Test Cases, Test Data, Test bed creation, Test Execution, Defect Reporting, Defect Retest, Daily Status report delivery, Test completion, ETL testing at every stage (HDFS, HIVE, HBASE) while loading the input (logs/files/records etc) using sqoop/flume which includes but not limited to data verification, Reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges etc), Report defects to the development team or manager and driving them to closure, Consolidate all the defects and create defect reports, Validating new feature and issues in Core Hadoop.
Topics – Report defects to the development team or manager and driving them to closure, Consolidate all the defects and create defect reports, Validating new feature and issues in Core Hadoop, Responsible for creating a testing Framework called MR Unit for testing of Map-Reduce programs.
Topics – Automation testing using the OOZIE, Data validation using the query surge tool.
Topics – Test plan for HDFS upgrade, Test automation and result
Topics – How to test install and configure
Topics – Major Project on Big Data and Hadoop, Hadoop Development, Cloudera Certification Tips and Guidance and Mock Interview Preparation, Practical Development Tips and Techniques, certification preparation
Topics: Understanding of R statistical computing and graphics, the statistical packages, familiarity with different datatypes and functions, learning to deploy them in various scenarios, use SQL to apply ‘join’ function.
Topics: R Functions, code compilation and data in well-defined format called R-Packages, learn about R-Package structure, Package metadata and testing, CRAN (Comprehensive R Archive Network), Vector creation and variables values assignment.
Topics: R functionality, Rep Function, generating Repeats, Sorting and generating Factor Levels, Transpose and Stack Function.
Topics: Understanding various functions like Merge, Strsplit, understanding Matrices and Manipulation of Matrix, Row Sums
Topics: Deploying R for plotting graphs, pie charts, bar plots, histogram and understanding components of Pie Chart.
Topics: One Way Analysis of Variance, Two Way Analysis of Variance
Topics: Understanding K-Means Clustering, and the workings of Cluster Algorithm, the association rule mining affinity analysis for data mining and analysis and learning co-occurrence relationships.
Topics: Learn about dependent and independent variables, linear regression and scatter plots
Topics: The concepts of Logistic Regression, deploying Logistic Regression in R, set of examples and implementation.
Topics: What is Area under ROC Curve? R –Sensitivity & Specificity, R Open Database Connectivity, deploying ODBC Tables for reading data, application of Confusion Matrix for performance visualization.
Topics: Creating an integrated environment for deploying R on Hadoop platform, working with R Hadoop, RMR package and R Hadoop Integrated Programming Environment, R programming for MapReduce jobs and Hadoop execution.
Topics: Classification and Recommendation, Clustering in Mahout, Pattern Mining, Understanding machine Learning, Using Model diagram to decide the approach, Data flow, Supervised and Unsupervised learning
Topics: Concept of Recommendation, Recommendations by E-commerce site, Comparison between User Recommendations and Item recommendation, Define recommender and Classifiers, Process of Collaborative Filtering, Explaining Pearson coefficient algorithm, Euclidean distance measure, Implementing a recommender using map reduce
Topics: Defining Clustering, User-to-user similarity, Clustering Illustration, Euclidean distance measure, Distance measure vector, Understanding the process of Clustering, Vectorizing documents-Unstructured data
Topics: Document clustering, Sequence-to-sparse Utility, K-Means Clustering
Topics: Terminology, Predictor and Target variable, Classifiable Data, Key Challenges in Classification algorithm, Vectorizing Continuous data, Classification Examples, Logistic Regression and its examples
Topics: Clustering, Clustering Process, Transaction Clustering, Different techniques of Vectorization, Distance measure, Clustering algorithm: K-Means, Clustering Application-1, Clustering Application-2, Sentiment Analyzer
Topics: Pearson Coefficient, Collaborative Filtering Process, Collaborative Filtering, Similarity Algorithms, Pearson Correlation, Euclidean Distance Measure -Frequent Pattern & Association rules, Frequent Pattern Growth
Topics: Introduction to Data Science, importance of Data Science, statistical and analytical methods, deploying Data Science for Business Intelligence, transforming data, machine learning and introduction to Recommender systems.
Topics: How Data Science solves real world problems, Data Science Project Life Cycle, principles of Data Science, introduction to various BI and Analytical tools, data collection, introduction to statistical packages, data visualization tools, R Programming, predictive modelling, machine learning, artificial intelligence and statistical analysis.
Topics: Boxplot in R programming, understanding distribution and percentile, identifying outliers, Rstudio Tool, various types of distribution like Normal, Uniform and Skewed.
Topics: Deploying machine learning for data analysis, solving business problems, using algorithms for searching patterns in data, relationship between variables, multivariate analysis, interpreting correlation, negative correlation.
Topics: Data Transformation key phases Data Mapping and Code Generation, Data Processing operation, data patterns, data sampling, sampling distribution, normal and continuous variable, data extrapolation, regression, linear regression model.
Topics: Data analysis, hypothesis testing, simple linear regression, Chi-square for assessing compatibility between theoretical and observed data, implementing data testing on data warehouse, validating data, checking for accuracy, data operational monitoring capabilities.
Topics: Various techniques of data modelling and generating algorithms, methods of business prediction, prediction approaches, data sampling, disproportionate sampling, data modelling rules, data iteration, and deploying data for mission-critical applications.
Topics: Working with large datasets in data warehouses, data clustering, grouping, horizontal & vertical slicing, data sharding in partitioning, clustering algorithms, K-means Clustering for analysing and data mining, exclusive clustering, hierarchy clustering, Mahout Clustering algorithm and Probabilistic Clustering, nearest neighbour search, pattern recognition, and statistical classification.
Topics: Introduction to R statistical computing and graphics, concepts, features and advantages of R, Big Data Hadoop familiarity, integrating R and Hadoop, basic architecture, framework, installing RImpala packages.
Topics: What is statistics?, How is this useful, What is this course for
Topics: Converting data into useful information, Collecting the data, Understand the data, Finding useful information in the data, Interpreting the data, Visualizing the data
Topics: Descriptive statistics, Let us understand some terms in statistics, Variable
Topics: Dot Plots, Histogram, Stemplots, Box and whisker plots, Outlier detection from box and whisker plots
Topics: What is probability?, Set & rules of probability, Bayes Theorem
Topics: Probability Distributions, Few Examples, Student t-Distribution, Sampling Distribution, Poisson distribution
Topics: Stratified Sampling, Proportionate Sampling, Systematic Sampling, P-Value
Topics: Cross Tables, Bi-variate Analysis, Multi variate Analysis, Dependence and Independence tests ( Chi-Square ), Analysis of Variance, Correlation between Nominal variables
Topics : Search Engine Basics, Lucene Overview & Features, Indexing Basics, Architecture, Inverted Indexing Technique, Lucene Schema (Documents & Fields), Analyzers, Query Types, use cases of search engine, Writing & Searching Index
Topics : Analyzers, Querying, Scoring, Boosting, Highlighting, Faceting, Grouping, Joins, Spatial Search, Configure Lucene with Java, Demonstrate Writing (Indexing) & Searching with Various methods, Apache Tika
Topics : About Solr, Installing and running Solr, Introduction to Solr cores, Data types available in Solr, Adding content to Solr, Reading a Solr XML response, Changing parameters in the URL, Using the browse interface
Topics : Introduction to Solr client, Configure Solr Client, Adding your own content to Solr, Deleting data from Solr, Building a bookstore search, Adding book data, Exploring the book data, Dedupe update processor
Topics : Sorting results, Query parsers, More queries, Hardwiring request parameters, Adding fields to default search, Faceting, Result grouping
Topics : Adding fields to the schema, Analyzing text
Topics : Field weighting, Phrase queries, Function queries, Fuzzier search, Sounds-like
Topics : More-like-this, Geospatial, Spell checking, Suggestions, Highlighting, Pseudo-fields, Pseudo-joins, Multi language, Faceting, Query Re-Ranking, Pagination, Grouping, Clustering, Spatial Search, Collapsing & Expanding, Exporting Results, Real-Time Search & Get, Client APIs.
Topics : Adding more kinds of data, Joining between cores.
Topics : Introduction, How SolrCloud works, Commit strategies, Introduction to ZooKeeper, Managing Solr config files, Managing solrconfig.xml, Managing solr.xml, Managing Multiple Cores, Plugins, JVM Settings, Running On Tomcat / Jetty, Logging & SSL, Sharding, replication.
Topics: Introduction to Splunk, Splunk developer roles and responsibilities
Topics: Writing Splunk query for search, sharing, saving, scheduling and exporting search results
Topics: Creation of alert, explaining alerts and viewing fired alerts
Topics: Introduction to Tags in Splunk, deploying Tags for Splunk search, understanding event types and utility, generating and implementing event types in Search
Topics: Search Command study, search practices in general, detailed understanding of search, search field performance with different commands like table, multikv, rename, rex & erex
Topics: Using the following commands and their functions: addcoltotals, addtotals, top, rare, stats
Topics: Explore the available visualizations, create charts and timecharts, omit null values and format results
Topics: Calculating and analyzing results, value conversion, round and format values, using eval command, conditional statements, filtering calculated search results
Topics: Understanding Search Transactions
Topics: Learn about data lookups, example, lookup table, defining and configuring automatic lookup, deploying lookup in reports and searches
Topics: Creating search charts, reports and dashboards
Topics: Working with raw data for data extraction, transformation, parsing and preview
Topics: Splunk installation, configuration, data inputs, app management, Splunk important concepts, parsing machine-generated data, search indexer and forwarder.
Topics: Introduction to Splunk Configuration Files, Universal Forwarder, Forwarder Management, data management, troubleshooting and monitoring.
Topics: Converting machine-generated data into operational intelligence, setting up Dashboard, Reports and Charts, integrating Search Head Clustering & Indexer Clustering.
Topics: Understanding the input methods, deploying scripted, Windows, network and agentless input types, fine-tuning it all.
Topics: Splunk User authentication and Job Role assignment, learning to manage, monitor and optimize Splunk Indexes.
Topics: Understanding parsing of machine-generated data, manipulation of raw data, previewing and parsing, data field extraction.
Topics: Distributed search concepts, improving search performance, large scale deployment and overcoming execution hurdles, working with Splunk Distributed Management Console for monitoring the entire operation.
Topics: Introducing Scala and deployment of Scala for Big Data applications and Apache Spark analytics.
Topics: The importance of Scala, the concept of REPL (Read Evaluate Print Loop), deep dive into Scala pattern matching, type inference, higher order function, currying, traits, application space and Scala for data analysis.
Topics: Learning about the Scala Interpreter, static object timer in Scala, testing String equality in Scala, Implicit classes in Scala, the concept of currying in Scala, various classes in Scala.
Topics: Learning about the Classes concept, understanding the constructor overloading, the various abstract classes, the hierarchy types in Scala, the concept of object equality, the val and var methods in Scala.
Topics: Understanding Sealed traits, wild, constructor, tuple, variable pattern, and constant pattern.
Topics: Understanding traits in Scala, the advantages of traits, linearization of traits, the Java equivalent and avoiding of boilerplate code.
Topics: Implementation of traits in Scala and Java, handling of multiple traits extending.
Topics: Introduction to Scala collections, classification of collections, the difference between Iterator, and Iterable in Scala, example of list sequence in Scala.
Topics: The two types of collections in Scala, Mutable and Immutable collections, understanding lists and arrays in Scala, the list buffer and array buffer, Queue in Scala, double-ended queue Deque, Stacks, Sets, Maps, Tuples in Scala.
Topics: Introduction to Scala packages and imports, the selective imports, the Scala test classes, introduction to JUnit test class, JUnit interface via JUnit 3 suite for Scala test, packaging of Scala applications in Directory Structure, example of Spark Split and Spark Scala.
Topics: Introduction to Spark, how Spark overcomes the drawbacks of working MapReduce, understanding in-memory MapReduce.
Topics: Spark installation guide, working with Spark Shell, the concept of Resilient Distributed Datasets (RDD), learning to do functional programming in Spark, the architecture of Spark.
Topics: Deep dive into Spark RDDs, the RDD general operations, a read-only partitioned collection of records, using the concept of RDD for faster and efficient data processing.
Topics: Understanding the concept of Key-Value pair in RDDs, learning how Spark makes MapReduce operations faster, various operations of RDD.
Topics: Comparing the Spark applications with Spark Shell, creating a Spark application using Scala or Java, deploying a Spark application, the web user interface of Spark application, a real world example of Spark and configuring of Spark.
Topics: Learning about Spark parallel processing, deploying on a cluster, introduction to Spark partitions, file-based partitioning of RDDs, understanding of HDFS and data locality, mastering the technique of parallel operations.
Topics: Understanding the RDD persistence overview, distributed persistence, RDD lineage
Topics: Understanding the Spark streaming, creating a Spark stream application, processing of Spark stream, streaming request count and DStreams.
Topics: Learning about the Spark common use cases, the concept of iterative algorithm in Spark
Topics: Introduction to various variables in Spark like shared variables, broadcast variables, learning about accumulators, the common performance issues and troubleshooting the performance problems.
Topics: Learning about Spark SQL, the context of SQL in Spark for providing structured data processing, understanding the DataFrames in Spark, learning to query and transform data in DataFrames, how DataFrame provides the benefit of both Spark RDD and Spark SQL, deploying Hive on Spark as the execution engine.
Topics: Learning about the scheduling and partitioning in Spark, scheduling within and around applications, static partitioning, dynamic sharing, fair scheduling, Spark master high availability.
Topics: Understanding how to design capacity planning in Spark, Understanding about log analysis with Spark, first log analyzers in Spark.
Topics: Big Data characteristics, understanding Hadoop distributed computing, the Bayesian Law, deploying Storm for real time analytics, the Apache Storm features, comparing Storm with Hadoop, Storm execution, learning about Tuple, Spout, Bolt.
Topics: Installing the Apache Storm, various types of run modes of Storm.
Topics: Understanding Apache Storm and the data model.
Topics: Installation of Apache Kafka and its configuration.
Topics: Understanding of advanced Storm topics like Spouts, Bolts, Stream Groupings, Topology and its Lifecycle, learning about Guaranteed Message Processing.
Topics: Various Grouping types in Storm, reliable and unreliable messages, Bolt structure and lifecycle, understanding Trident topology for failure handling, process, CallLogAnalysis Topology for analyzing call logs for calls made from one number to another.
Topics: Understanding of Trident Spouts and its different types, the various Trident Spout interface and components, familiarizing with Trident Filter, Aggregator and Functions, a practical and hands-on use case on solving call log problem using Storm Trident.
Topics: Various components, classes and interfaces in Storm like BaseRichBolt class, IRichBolt interface, IRichSpout interface, BaseRichSpout class and the various methodologies of working with them.
Topics: Understanding Cassandra, its core concepts, its strengths and deployment.
Topics: Twitter Bootstrapping, detailed understanding of Bootstrapping, concepts of Storm, Storm Development Environment.
Topics : Introduction to Cassandra, its strengths and deployment areas
Topics : Significance of NoSQL, RDBMS Replication, Key Challenges, types of NoSQL, benefits and drawbacks, salient features of NoSQL database. CAP Theorem, Consistency.
Topics : Installation, introduction to Cassandra, key concepts and deployment of non relational database, column-oriented database, Data Model: column, column family
Topics : Token calculation, Configuration overview, Node tool, Validators, Comparators, Expiring column, QA
Topics : How Cassandra modelling varies from Relational database modelling, Cassandra modelling steps, introduction to Time Series modelling, comparing Column family Vs. Super Column family, Counter column family, Partitioners, Partitioners strategies, Replication, Gossip protocols, Read operation, Consistency, Comparison
Topics : Creation of multi-node cluster, node settings, Key and Row cache, System Keyspace, understanding of Read Operation, Cassandra Commands overview, VNodes, Column family
Topics : JSON, Hector client, AVRO, Thrift, JAVA code writing method, Hector tag
Topics : Cassandra management, commands of node tool, MapReduce and Cassandra, Secondary index, DataStax Installation
Topics : Rules of Cassandra data modelling, increasing data writes, duplication, and reducing data reads, modelling data around queries, creating table for data queries
Topics : Understanding the Java application creation methodology, learning key drivers, deploying the IDE for Cassandra applications, cluster connection and data query implementation
Topics : Learning about Node Tool Utility, cluster management using Command Line Interface, Cassandra management and monitoring via DataStax Ops Center.
Topics : Cassandra client connectivity, connection pool internals, API, important features and concepts of Hector client, Thrift, JAVA code, Summarization.
Topics: RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, Introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples.
Topics: Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) Installation, MongoDB Data types.
Topics: The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP Theorem, Base property, learning about JSON/BSON, database collection & document, MongoDB uses, MongoDB Write Concern – Acknowledged, Replica Acknowledged, Unacknowledged, Journaled, Fsync.
Topics: Understanding CRUD and its functionality, CRUD concepts, MongoDB Query & Syntax, read and write queries and query optimization.
5. Data Modeling & Schema Design
Topics: Concepts of data modeling, difference between MongoDB and RDBMS modeling, Model tree structure, operational strategies, monitoring and backup.
Topics: In this module you will learn MongoDB® Administration activities such as Health Check, Backup, Recovery, database sharding and profiling, Data Import/Export, Performance tuning etc.
Topics: Concepts of data aggregation and types, data indexing concepts, properties and variations.
Topics: Understanding database security risks, MongoDB security concept and security approach, MongoDB integration with Java and Robomongo.
Topics: Implementing techniques to work with variety of unstructured data like images, videos, log data, and others, understanding GridFS MongoDB file system for storing data.
Project 1 – Working with MapReduce, Hive, Sqoop
Topics : This project involves working with Hadoop components such as MapReduce, Apache Hive and Apache Sqoop. Use Sqoop to import data from a relational database management system such as MySQL into HDFS. Deploy Hive for summarizing, querying and analyzing the data. Write the SQL queries in HiveQL so that they run as MapReduce jobs on the transferred data. You will gain considerable proficiency in Hive and Sqoop after completing this project.
Project 2 – Work on MovieLens data for finding top records
Data – MovieLens dataset
Topics : In this project you will work exclusively on data collected through MovieLens available rating data sets. The project involves the following important components:
Project 3 – Hadoop YARN Project – End to End PoC
Topics : In this project you will work on a live Hadoop YARN project. YARN is the part of the Hadoop 2.0 ecosystem that lets Hadoop decouple from MapReduce and deploy more competitive processing and a wider array of applications. You will work on the YARN central Resource Manager. The salient features of this project include:
Project 4 – Partitioning Tables in Hive
Topics : This project involves working with Hive table data partitioning. Ensuring the right partitioning helps to read the data, deploy it on the HDFS, and run the MapReduce jobs at a much faster rate. Hive lets you partition data in multiple ways like:
This will give you hands-on experience in partitioning Hive tables manually, using a single SQL statement for dynamic partitioning, and bucketing data so as to break it into manageable chunks.
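As a rough illustration of the two ideas above, the following Python sketch mimics how a partition column maps each row to a `column=value` directory and how bucketing assigns rows to a fixed number of buckets by hashing a column. It is not Hive code: the sample rows, the `partition_path`, `bucket_for` and `layout` helpers, and the use of CRC32 as the hash are all assumptions chosen for illustration (Hive applies its own hash function).

```python
import zlib
from collections import defaultdict

# Hypothetical sample rows; the table and column names are assumptions.
ROWS = [
    {"user_id": "u1", "country": "US", "amount": 10},
    {"user_id": "u2", "country": "IN", "amount": 25},
    {"user_id": "u3", "country": "US", "amount": 7},
    {"user_id": "u1", "country": "IN", "amount": 3},
]


def partition_path(table, row, partition_col):
    """Return the directory a row would land in, using the column=value layout."""
    return f"{table}/{partition_col}={row[partition_col]}"


def bucket_for(value, num_buckets):
    """Assign a value to a bucket by hashing it (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def layout(table, rows, partition_col, bucket_col, num_buckets):
    """Group rows first by partition directory, then by bucket number."""
    result = defaultdict(lambda: defaultdict(list))
    for row in rows:
        path = partition_path(table, row, partition_col)
        bucket = bucket_for(row[bucket_col], num_buckets)
        result[path][bucket].append(row)
    return result


if __name__ == "__main__":
    table_layout = layout("sales", ROWS, "country", "user_id", num_buckets=4)
    for path in sorted(table_layout):
        for bucket in sorted(table_layout[path]):
            count = len(table_layout[path][bucket])
            print(path, "bucket", bucket, count, "row(s)")
```

A query that filters on the partition column only needs to read the matching directory, which is why a well-chosen partition key speeds up reads.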
Project 5 – Connecting Pentaho with Hadoop Ecosystem
Topics : This project lets you connect Pentaho with the Hadoop ecosystem. Pentaho works well with HDFS, HBase, Oozie and Zookeeper. You will connect the Hadoop cluster with Pentaho data integration, analytics, Pentaho server and report designer. Some of the components of this project include the following:
Project 6 – Multi-node cluster setup
Topics : This project gives you the opportunity to work on a real-world Hadoop multi-node cluster setup in a distributed environment. The major components of this project involve:
You will get a complete demonstration of working with various Hadoop cluster master and slave nodes, installing Java as a prerequisite for running Hadoop, installation of Hadoop and mapping the nodes in the Hadoop cluster.
Project 7 – Hadoop Testing using MR
Topics : In this project you will gain proficiency in Hadoop MapReduce code testing using MRUnit. You will learn about real world scenarios of deploying MRUnit, Mockito, and PowerMock. Some of the important aspects of this project include:
After completion of this project you will be well-versed in test driven development and will be able to write light-weight test units that work specifically on the Hadoop architecture.
Project 8 – Hadoop Weblog Analytics
Data – Weblogs
Topics : This project involves making sense of web log data in order to derive valuable insights from it. You will load the server data onto a Hadoop cluster using various techniques. The various modules of this project include:
The web log data can include the URLs visited, cookie data, user demographics, location, date and time of web service access, and so on. In this project you will transport the data using Apache Flume or Kafka, and handle workflow and data cleansing using MapReduce, Pig or Spark. The insights derived can be used for analyzing customer behavior and predicting buying patterns.
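As a rough illustration of the cleansing and summarizing step, the sketch below parses a few log lines in plain Python rather than MapReduce, Pig or Spark. It assumes the logs follow the Common Log Format; the sample lines, the `LOG_PATTERN` expression and the helper names are made up for this example.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format (ip, identity, user, [time], "request", status, size).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def top_urls(lines, n=3):
    """Count successful (2xx) requests per URL, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # data cleansing: drop lines that do not parse
        if record["status"].startswith("2"):
            counts[record["url"]] += 1
    return counts.most_common(n)


# Hypothetical sample data.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0000] "GET /cart HTTP/1.1" 200 512',
    'corrupted line without fields',
    '10.0.0.1 - - [10/Oct/2023:13:57:12 +0000] "GET /home HTTP/1.1" 200 2326',
    '10.0.0.3 - - [10/Oct/2023:13:58:40 +0000] "GET /missing HTTP/1.1" 404 0',
]

print(top_urls(SAMPLE))
```

On a cluster the same two steps (parse and filter, then count per key) map naturally onto a map phase and a reduce phase.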
Project 9 – Hadoop Maintenance
Topics : This project involves maintaining and managing a Hadoop cluster. You will work on a number of important tasks like:
Project Title – Restaurant Revenue Prediction
Dataset – Sales
Project Description – This project involves predicting the sales of a restaurant on the basis of certain objective measurements. It gives you industry experience in handling multiple use cases and deriving solutions, along with insight into feature engineering and feature selection.
Project 1 – Understanding Cold Start Problem in Data Science
Topics: This project involves understanding the cold start problem associated with recommender systems. You will gain hands-on experience in information filtering and in working on systems that have no historical data to refer to, as in the case of launching a new product. You will gain proficiency in working with personalized applications such as recommendations for movies, books, songs and news. This project includes the following:
Project 2 – Recommendation for Movie, Summary
Topics: This is a real-world project that gives you hands-on experience in working with a movie recommender system. Based on the movies a particular user likes, you will be able to provide data-driven recommendations. This project involves understanding recommender systems, information filtering, predicting ratings, learning about user preferences and so on. You will work exclusively on data related to user details, movie details and others. The main components of the project include the following:
Project – Data Analysis Project
Data – Sales
Problem Statement – It includes the following actions:
Topics: Understand the business solutions, Discussion with the warehouse team, Data Collection & Storage, Data Cleaning, Build a Hypothesis Tree around the business problem, Produce the final result.
Project – Running Function Queries on Apache Solr
Topics : In this project you will learn about Function Queries and apply them to the search results returned by Apache Solr. You will understand how Function Queries are used to modify the search results based on certain conditions. The project involves working on an index that stores the dimensions of boxes with arbitrary names, sorting all the boxes through search, and then modifying the search results using Function Queries based on new parameters. Some of the query parsers used are DisMax, Extended DisMax and standard.
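The effect of a function query on the box index can be imitated in a few lines of Python. This is a conceptual sketch only and does not call Solr: the box names, the field names `x`, `y` and `z`, and the helper functions are hypothetical. In Solr itself a computed value of this kind would be expressed with a function such as `product` over the stored fields.

```python
# Hypothetical index content: boxes with arbitrary names and three dimensions.
boxes = [
    {"name": "alpha", "x": 2, "y": 3, "z": 4},
    {"name": "beta", "x": 1, "y": 1, "z": 10},
    {"name": "gamma", "x": 5, "y": 5, "z": 2},
]


def volume(box):
    """Value computed per document, playing the role of a function query."""
    return box["x"] * box["y"] * box["z"]


def rank_by_volume(docs, min_volume=0):
    """Filter on the computed value, then sort the results by it, largest first."""
    matching = [doc for doc in docs if volume(doc) >= min_volume]
    return sorted(matching, key=volume, reverse=True)


ranked = rank_by_volume(boxes)
large_only = rank_by_volume(boxes, min_volume=20)

print([box["name"] for box in ranked])
print([box["name"] for box in large_only])
```

The point of the sketch is that the ranking depends on a value derived from several fields at query time, not on anything stored directly in the index.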
Topics : This project gives you hands-on experience in working with the Splunk tool. You will be given a data set of employee details in a text file, from which you will create a dashboard and a report. You will then use various Splunk commands to perform row operations, extract certain data fields, edit events, add tags, search for events by tag name and save the tag search. Upon completion of this project you will know how to create a searchable repository from data that is captured, correlated and indexed in real time, and how to visualize it using dashboards, reports and alerts.
Type – Field Extraction
Topics : In this project you will learn to extract fields from events using the Splunk field extraction technique. You will learn the basics of field extraction and understand the use of the field extractor, the field extraction page in Splunk Web and field extraction configuration in files. You will also learn about the regular expression and delimiter methods of field extraction. Upon completion of the project you will gain expertise in building Splunk dashboards and using the extracted field data in them to create rich visualizations in an enterprise setup.
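The two extraction methods mentioned above can be compared with a small Python sketch. It does not use Splunk; the sample events, field names and helper functions are assumptions chosen for illustration.

```python
import re


def extract_by_regex(event, pattern):
    """Regular expression method: named groups become field names."""
    match = re.search(pattern, event)
    return match.groupdict() if match else {}


def extract_by_delimiter(event, field_names, delimiter=","):
    """Delimiter method: split the event and pair each value with a field name."""
    values = [value.strip() for value in event.split(delimiter)]
    return dict(zip(field_names, values))


# Hypothetical key=value style event, suited to the regex method.
regex_event = "2023-10-10 13:55:36 user=alice action=login status=success"
regex_fields = extract_by_regex(
    regex_event,
    r"user=(?P<user>\w+) action=(?P<action>\w+) status=(?P<status>\w+)",
)

# Hypothetical comma-separated employee record, suited to the delimiter method.
csv_event = "101, Alice, Engineering, Bangalore"
csv_fields = extract_by_delimiter(
    csv_event, ["emp_id", "name", "department", "city"]
)

print(regex_fields)
print(csv_fields)
```

The regex method suits free-form events where fields are embedded in text, while the delimiter method suits structured records where every event has the same columns in the same order.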
Project 1: Movie Recommendation
Topics – In this project you will gain hands-on experience in deploying Apache Spark for movie recommendation. You will be introduced to MLlib, the Spark machine learning library, with a guide to its algorithms and how to code with them. You will understand how to deploy collaborative filtering, clustering, regression and dimensionality reduction in MLlib. Upon completion of the project you will gain experience in working with streaming data, sampling, testing and statistics.
Project 2: Twitter API Integration for tweet Analysis
Topics – In this project you will learn to integrate the Twitter API for analyzing tweets. You will write server-side code in a scripting language such as PHP, Ruby or Python to send requests to the Twitter API and get the results in JSON format. You will then read the results and perform operations such as aggregation, filtering and parsing as needed to produce the tweet analysis.
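The parsing, filtering and aggregation steps can be sketched as follows. To stay self-contained the example does not call the Twitter API; it works on a hard-coded JSON string whose structure and field names (`user`, `lang`, `text`) are simplified assumptions and not the actual API response format.

```python
import json
from collections import Counter

# Hypothetical, simplified payload standing in for an API response.
SAMPLE_RESPONSE = """
[
  {"user": "alice", "lang": "en", "text": "Learning #Hadoop and #Spark today"},
  {"user": "bob", "lang": "en", "text": "#Spark streaming is fun"},
  {"user": "carol", "lang": "fr", "text": "Bonjour #Hadoop"}
]
"""


def hashtag_counts(raw_json, lang="en"):
    """Parse the JSON, keep tweets in one language, and count their hashtags."""
    tweets = json.loads(raw_json)                              # parsing
    selected = [t for t in tweets if t.get("lang") == lang]    # filtering
    counts = Counter()
    for tweet in selected:                                     # aggregation
        for word in tweet["text"].split():
            if word.startswith("#"):
                counts[word.lower()] += 1
    return counts


counts = hashtag_counts(SAMPLE_RESPONSE)
print(counts.most_common())
```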
Project 3: Data Exploration Using Spark SQL – Wikipedia dataset
Topics – This project lets you work with Spark SQL. You will gain experience in working with Spark SQL for combining it with ETL applications, real time analysis of data, performing batch analysis, deploying machine learning, creating visualizations and processing of graphs.
Project 1. Call Log Analysis using Trident
Topics : In this project you will work on call logs to decipher the data and gather valuable insights using Apache Storm Trident. You will work extensively with data about calls made from one number to another. The aim of this project is to resolve call log issues with Trident stream processing and low-latency distributed querying. You will gain hands-on experience in working with Spouts and Bolts along with various Trident functions, filters, aggregation, joins and grouping.
Project 2. Twitter Data Analysis using Trident
Topics : This is a project that involves working with Twitter data and processing it to extract patterns out of it. The Apache Storm Trident is the perfect framework for real-time analysis of tweets. Working with Trident you will be able to simplify the task of live Twitter feed analysis. In this project you will gain real world experience of working with Spouts, Bolts, and Trident filters, joins, aggregation, functions and grouping.
Project 3. US Presidential Election Result analysis using Trident DRPC Query
Topics : This project lets you work on the US presidential election results and predict who is leading and trailing on a real-time basis. For this you will work exclusively with the Trident Distributed Remote Procedure Call (DRPC) server. After completing the project you will know how to access data residing on a remote computer or network and use it for real-time processing, analysis and prediction.
Type : Deploying the IDE for Cassandra applications
Topics : This project gives you hands-on experience in installing and working with Apache Cassandra, a high-performance and extremely scalable database for distributed data with no single point of failure. You will deploy the Java Integrated Development Environment for running Cassandra, learn about the key drivers, work with Cassandra applications in a cluster setup and implement data querying techniques.
Java is one of the most popular programming languages for working with MongoDB. This project shows you how to work with the MongoDB Java Driver and how to use MongoDB as a Java developer. You will become proficient in creating a table for inserting video using Java programming. Some of the tasks and steps involved are as follows:
Intellipaat is a pioneer of Big Data and Data Science training. Data Scientist is one of the most sought-after professional roles in the corporate world today. This Intellipaat all-in-one combo course trains you to become a top-notch Big Data and Data Science professional. You will gain hands-on experience in Big Data and Data Science concepts from beginner to advanced level, including a comprehensive study of the Big Data Hadoop ecosystem, programming languages, Apache Solr, NoSQL databases such as MongoDB, Cassandra and HBase, Splunk, the machine learning library Apache Mahout, and advanced statistics and probability.
The entire training course content is aligned with the following Big Data and Data Science certification exams: CCA Spark and Hadoop Developer (CCA175), Cloudera Certified Administrator for Apache Hadoop (CCAH), Cloudera Data Scientist certification (CCP:DS), Apache HBase certification exam CCB-400, C100DEV: MongoDB Certified Developer Associate, Apache Cassandra DataStax, Splunk Certified Power User & Admin.
This is a completely career-oriented training and it is designed by industry experts. Your training program includes real time Big Data & Data Science projects, step-by-step assignments to evaluate your progress and specially designed quizzes for clearing the requisite certification exams.
Intellipaat also offers lifetime access to videos, course materials, 24/7 Support, and course material upgrades to latest version at no extra fees. For Hadoop and Spark training you get the Intellipaat Proprietary Virtual Machine for Lifetime and free cloud access for 6 months for performing training exercises. All-in-one it is a one-time investment to become a successful Data Scientist and grab the best jobs at the best salaries in top MNCs around the world.
Intellipaat offers self-paced training and online instructor-led training. Apart from that, we also provide corporate training for enterprises. All our trainers have over 12 years of industry experience in the relevant technologies and are subject matter experts working as consultants. You can check the quality of our trainers in the sample videos provided.
If you have any queries you can contact our 24/7 dedicated support to raise a ticket. We provide you email support and solution to your queries. If the query is not resolved by email we can arrange for a one-on-one session with our trainers. The best part is that you can contact Intellipaat even after completion of training to get support and assistance. There is also no limit on the number of queries you can raise when it comes to doubt clearance and query resolution.
The Intellipaat self-paced training is for people who want to learn at their own pace. As part of this program we provide you with one-on-one sessions, doubt clearance over email, 24/7 live support, lifetime LMS access and upgrades to the latest version at no extra cost. The price of self-paced training can be as much as 75% lower than that of online training. Should you face any unexpected challenges while studying, we will arrange a virtual live session with the trainer.
We provide you with the opportunity to work on real-world projects in which you can apply the knowledge and skills you acquired through our training. We have multiple projects that thoroughly test your skills and knowledge of various aspects and components, making you industry-ready. These projects could be in exciting and challenging fields like banking, insurance, retail, social networking, ecommerce, marketing, sales, high technology and so on. The Intellipaat projects are equivalent to six months of relevant experience in the corporate world.
Yes, Intellipaat does provide you with placement assistance. We have tie-ups with 80+ organizations including Ericsson, Cisco, Cognizant, TCS, among others that are looking for skilled & quality professionals and we would be happy to assist you with the process of preparing yourself for the interview and the job.
Yes, if you would want to upgrade from the self-paced training to instructor-led training then you can easily do so by paying the difference of the fees amount and joining the next batch of classes which shall be separately notified to you.
Upon successful completion of training you have to take a set of quizzes and complete the projects. After the projects are reviewed and you score over 60% marks in the qualifying quiz, the official Intellipaat verified certificate is awarded. The Intellipaat Certification is a seal of approval and is highly recognized in 80+ corporations around the world, including many in the Fortune 500 list of companies.
This is a comprehensive course that is designed to clear multiple certifications viz.
The entire training course content is in line with the respective certification programs and helps you clear the requisite certification exams with ease and get the best jobs in the top MNCs.
As part of this training you will be working on real time projects and assignments that have immense implications in the real world industry scenario thus helping you fast track your career effortlessly.
At the end of this training program there will be quizzes that reflect the type of questions asked in the respective certification exams and help you score better marks in those exams.
Intellipaat Course Completion certificate will be awarded on the completion of Project work (on expert review) and upon scoring of at least 60% marks in the quiz. Intellipaat certification is well recognized in top 80+ MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.
You will get lifetime access to high-quality interactive tutorials along with lifetime access to the complete course material. There will be 24/7 access to video tutorials with email support. If you get stuck on any unexpected problem, we will provide online interactive sessions with a trainer to resolve the issue.
We provide 24/7 support by email for issues or doubt clearance for self-paced training.
In online instructor-led training, the trainer will be available to help you with your queries regarding the course. If required, the support team can also provide you live support by accessing your machine remotely. This ensures that all your doubts and problems faced during labs and project work are clarified round the clock.
This course is designed for clearing the CCA Spark and Hadoop Developer, Cloudera Certified Administrator for Apache Hadoop (CCAH), R certification, Mahout certification, Cloudera Data Scientist (CCP:DS), Apache HBase (CCB-400), Apache Cassandra Professional and Apache Spark certification exams.
At the end of the course there will be a quiz and project assignments. Once you complete them, you will be awarded the Intellipaat Course Completion certificate.