
Big Data Online Training

Great news for all professionals looking for expert Big Data training in India!

Intellipaat is renowned for providing some of the best online Big Data Hadoop training in India, and we conduct successful batches every weekend.

We are a team of Big Data, Business Intelligence and Cloud Computing architects with deep finance domain knowledge and top-class implementation experience across a range of BI tools and Big Data technologies. We are dedicated to the highest quality and client satisfaction.

Big Data Hadoop Online Training

Intellipaat works in a distributed environment. We deliver online Big Data Hadoop training that enables candidates across the globe to learn at a place and time convenient to them.

Intellipaat also provides corporate training at esteemed organizations across India on various BI and Big Data technologies.

 

Hadoop DW Expertise

Fee: $400

Introduction

The Hadoop DW course is a blend of administration and writing code using Hadoop ecosystem components to show how to work with Big Data. Topics covered in this course include Hive, Pig, ZooKeeper, Sqoop and multi-node setup of a Hadoop cluster on Amazon EC2 with CDH4. The course focuses on training participants in setting up Hadoop infrastructure, writing MapReduce programs and Hive and Pig scripts, and working with HDFS, ZooKeeper, Sqoop, Flume and Oozie.

Total Duration

30-35 hrs

Who should attend?

Java Developers/Architects, Data Warehouse Developers/SaaS Professionals/Architects, Big Data Professionals

Prerequisites for attending the Training

Basic knowledge of Unix and SQL scripting

OBJECTIVE of the Training

  • Understanding distributed, parallel and cloud computing, and NoSQL concepts
  • Setting up Hadoop infrastructure with single- and multi-node clusters on Amazon EC2 (CDH4)
  • Understanding the concepts of Map and Reduce and functional programming
  • Writing Map and Reduce programs and working with HDFS
  • Writing Hive and Pig scripts and working with ZooKeeper and Sqoop
  • Ability to design and develop applications involving large data using the Hadoop ecosystem

Course Outline

Introduction to Hadoop

  • Distributed computing
  • Parallel computing
  • Concurrency
  • Cloud Computing
  • Data Past, Present and Future
  • Computing Past, Present and Future
  • Hadoop
  • NoSQL
  • Hadoop Streaming
  • Distributing Debug Scripts
  • Getting Started With Eclipse

Hadoop Stack

  • CAP Theorem
  • Databases: Key Value, Document, Graph
  • Hive and Pig
  • HDFS

Lab 1: Hadoop Hands-on

  • Installing a Hadoop single-node cluster (CDH4)
  • Understanding Hadoop configuration files

HDFS Introduction

  • Architecture
  • File System
  • Data replication
  • Name Node
  • Data Node

Hive Introduction

Installation and Configuration

  • Running Hive
  • Configuration management overview
  • Runtime configuration
  • Hive, Map-Reduce and Local-Mode

DDL Operations

  • Metadata Store

DML Operations

SQL Operations

  • Queries (see the sketch after this list)
  • SELECTS and FILTERS
  • GROUP BY
  • JOIN
  • MULTITABLE INSERT
  • STREAMING
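
A minimal sketch of running the operations above from Java over JDBC. It assumes a HiveServer2 endpoint on localhost:10000 (HiveServer2 ships with CDH4) and a hypothetical table ratings(movie_id INT, rating INT); names and the query are illustrative only.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Runs a HiveQL SELECT with a filter and GROUP BY over JDBC.
    // Assumes HiveServer2 at localhost:10000 and a hypothetical
    // 'ratings' table with columns movie_id and rating.
    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection con = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "", "");
            Statement stmt = con.createStatement();

            // SELECT + FILTER + GROUP BY: average rating per movie
            ResultSet rs = stmt.executeQuery(
                    "SELECT movie_id, AVG(rating) FROM ratings " +
                    "WHERE rating > 0 GROUP BY movie_id");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
            con.close();
        }
    }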

Exercise

  • MovieLens
  • Apache log

Hive Architecture

  • Data Store
  • Metastore
  • Architecture
  • Interface
  • HQL
  • Compiler
  • Optimizer

Pig Introduction

  • Pig and Dataflow
  • Pig Philosophy
  • Pig and Hadoop
  • Pig vs Hive
  • Why Pig

Installing and Configuring Pig

  • Download and Install from Apache
  • Running Pig
  • Local
  • Cluster
  • Cloud
  • Command Line Options

Grunt

  • Understanding Grunt
  • Entering Pig Latin scripts in Grunt
  • HDFS Commands in Grunt
  • Controlling Pig from Grunt

Pig Data Model

  • Problem Statement and Data Model
  • Input and Output
  • Load
  • Store
  • Dump
  • Relational Operators (see the sketch after this list)
  • Foreach
  • Filter
  • Group
  • OrderBy
  • Distinct
  • Join
  • Limit
  • Sample
  • Parallel
  • User Defined Functions
  • Registering UDF
  • Defining UDF
  • Calling Static Java Functions
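
For illustration, a minimal sketch of the Load, Filter, Group, Foreach, Dump and Store operators above, driven from Java with Pig's PigServer in local mode. The input file movies.tsv (tab-separated name and rating fields) is hypothetical.

    import java.util.Iterator;

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.apache.pig.data.Tuple;

    // Registers a small Pig Latin pipeline and reads back the result.
    public class PigExample {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.LOCAL);

            // Load, Filter, Group, Foreach: count well-rated movies per name
            pig.registerQuery("raw = LOAD 'movies.tsv' AS (name:chararray, rating:int);");
            pig.registerQuery("good = FILTER raw BY rating >= 4;");
            pig.registerQuery("grouped = GROUP good BY name;");
            pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(good);");

            // Dump-style iteration over the result, then Store to a directory
            Iterator<Tuple> it = pig.openIterator("counts");
            while (it.hasNext()) {
                System.out.println(it.next());
            }
            pig.store("counts", "top_movies_out");
        }
    }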

Flume

  • What is Flume?
  • How does it work?
  • An example

Sqoop

  • What is Sqoop?
  • How does it work?
  • An example

Oozie

  • What is Oozie?
  • How does it work?
  • An example

Load and Store Functions

Overview of Built-in Functions

Introduction to ZooKeeper

Cluster Planning and Cloudera Manager Set-up

Hadoop Multi-Node Cluster Setup

  • Installation and Configuration
  • Running MapReduce jobs on a multi-node cluster

Working with Large data sets

  • Steps involved in analyzing large data
  • Lab walk through

POC for Hadoop Connectivity with an ETL Tool

High Availability, Federation, YARN and Security

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

Hadoop Developers

Fee: $400

Introduction

The Hadoop Developer course is a blend of administration and writing code using Hadoop ecosystem components to show how to work with Big Data. Topics covered in this course include MapReduce, Hive, Pig, ZooKeeper, Sqoop and multi-node setup of a Hadoop cluster on Amazon EC2 with CDH4. The course focuses on training participants in setting up Hadoop infrastructure, writing MapReduce programs and Hive and Pig scripts, and working with HDFS, ZooKeeper, Sqoop, Flume and Oozie.

Total Duration

30-35 hrs

Who should attend?

Java Developers/Architects, Data Warehouse Developers/Architects, Big Data Professionals

Prerequisites for attending Hadoop Developer Training

Basic knowledge of Unix, Java and SQL scripting

OBJECTIVE of the Training:

1. Understanding distributed, parallel and cloud computing, and NoSQL concepts

2. Setting up Hadoop infrastructure with single- and multi-node clusters on Amazon EC2 (CDH4)

3. Understanding the concepts of Map and Reduce and functional programming

4. Writing Map and Reduce programs and working with HDFS

5. Writing Hive and Pig scripts and working with ZooKeeper and Sqoop

6. Ability to design and develop applications involving large data using the Hadoop ecosystem

Course Outline

Introduction to Hadoop

  • Distributed computing
  • Parallel computing
  • Concurrency
  • Cloud Computing
  • Data Past, Present and Future
  • Computing Past, Present and Future
  • Hadoop
  • NoSQL
  • Hadoop Streaming
  • Distributing Debug Scripts
  • Getting Started With Eclipse

Hadoop Stack

  • MapReduce
  • NoSQL
  • CAP Theorem
  • Databases: Key Value, Document, Graph
  • Hive and Pig
  • HDFS

Lab 1: Hadoop Hands-on

  • Installing a Hadoop single-node cluster (CDH4)
  • Understanding Hadoop configuration files

MapReduce Introduction

  • Functional – Concept of Map
  • Functional – Concept of Reduce
  • Functional – Ordering, Concurrency, No Locks
  • Functional – Shuffling
  • Functional – Reducing, Key, Concurrency
  • MapReduce Execution framework
  • MapReduce Partitioners and Combiners
  • MapReduce and role of distributed filesystem
  • Role of Key/Value Pairs
  • Hadoop Data Types

Lab 2: MapReduce Exercises

  • Understanding Sample MapReduce code (see the sketch below)
  • Executing MapReduce code
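
The classic word count shows the full shape of a sample MapReduce program like the one examined in this lab: the Mapper emits (word, 1) pairs, the shuffle groups them by key, and the Reducer sums the counts. A minimal sketch using the org.apache.hadoop.mapreduce API:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE); // emit (word, 1)
                    }
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum)); // total per word
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class); // combiner runs map-side
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }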

HDFS Introduction

  • Architecture
  • File System (see the sketch after this list)
  • Data replication
  • Name Node
  • Data Node
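
A minimal sketch of touching these components programmatically through Hadoop's Java FileSystem API: a write (the NameNode records the metadata, the blocks land on DataNodes and are replicated), a read, and a directory listing. The paths are hypothetical; the cluster location comes from the configuration files on the classpath.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Write a small file to HDFS
            Path file = new Path("/user/demo/hello.txt");
            FSDataOutputStream out = fs.create(file);
            out.writeBytes("hello hdfs\n");
            out.close();

            // Read it back
            BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)));
            System.out.println(in.readLine());
            in.close();

            // List the directory
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                System.out.println(status.getPath() + " " + status.getLen());
            }
        }
    }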

Hive

  • Architecture
  • Data Model
  • Physical Layout
  • DDL, DML and SQL Operations

Lab 3: Hive Hands-on

  • Installation
  • Setup
  • Exercises

Pig

  • Rationale
  • Pig Latin
  • Input, Output and Relational Operators
  • User Defined Functions
  • Analyzing and designing using Pig Latin

Lab 4: Pig Hands-on

  • Installation
  • Setup
  • Executing Pig Latin scripts on File system
  • Executing Pig Latin scripts on HDFS
  • Writing custom User Defined Functions

Flume

  • What is Flume?
  • How does it work?
  • An example

Sqoop

  • What is Sqoop?
  • How does it work?
  • An example

Oozie

  • What is Oozie?
  • How does it work?
  • An example

Introduction to ZooKeeper

Cluster Planning and Cloudera Manager Set-up

Hadoop Multi-Node Cluster Setup

  • Installation and Configuration
  • Running MapReduce jobs on a multi-node cluster

Working with Large data sets

  • Steps involved in analyzing large data
  • Lab walk through

High Availability, Federation, YARN and Security

 

 

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

Learning Java with MapReduce

Fee: $240

Introduction

The Java with MapReduce course covers the Java fundamentals needed to write code using Hadoop ecosystem components, showing how to work with Big Data. Topics covered in this course focus on MapReduce.

Total duration

12 hrs

Who should attend?

Architects, Data Warehouse Developers/SaaS Professionals/Architects, Big Data Professionals

Prerequisites

Working knowledge of any programming language

OBJECTIVE of the Training:

Writing Map and Reduce Programs

TOPICS

  • Basic OO (see the sketch after this list)
  • Language basics
  • Classes and objects
  • Interfaces and inheritance
  • Numbers and strings
  • Exceptions
  • Basic IO
  • Regular expressions
  • Collections
  • Generics
  • Deployment
  • Java code management, including packages and JAR files
  • Introduction to the Eclipse editor for Java and its basic features
  • Understanding Sample MapReduce code
  • Executing MapReduce code
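
A small illustrative sketch tying several of the topics above together: an interface, a class implementing it, generics over a collection, and basic exception handling. All names are hypothetical.

    import java.util.ArrayList;
    import java.util.List;

    public class JavaBasics {

        interface Greeter {                  // interfaces and inheritance
            String greet(String name);
        }

        static class EnglishGreeter implements Greeter {
            public String greet(String name) {
                return "Hello, " + name;     // strings
            }
        }

        public static void main(String[] args) {
            List<String> names = new ArrayList<String>();  // collections + generics
            names.add("Alice");
            names.add("Bob");

            Greeter greeter = new EnglishGreeter();
            for (String name : names) {
                System.out.println(greeter.greet(name));
            }

            try {                              // exceptions
                Integer.parseInt("not a number");
            } catch (NumberFormatException e) {
                System.out.println("Caught: " + e.getMessage());
            }
        }
    }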

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

Hadoop Administration

Fee: $300

Introduction

Hadoop Administration training for System Administrators is designed for technical operations personnel whose job is to install and maintain production Hadoop clusters in the real world. We cover Hadoop architecture and its components, the installation process, monitoring, and troubleshooting of complex Hadoop issues. The training is focused on practical hands-on exercises and encourages open discussion of how people are using Hadoop in enterprises dealing with large data sets.

Total Duration

20 hrs

Who should attend?

System Administrators and Support Engineers who will maintain and troubleshoot Hadoop clusters in production or development environments.

Prerequisites for attending Hadoop Administration Training

Basic knowledge of Unix and system administration. Prior knowledge of Hadoop is not required.

OBJECTIVE of the Training:

1. Understand Hadoop's main components and architecture

2. Be comfortable working with Hadoop Distributed File System

3. Understand MapReduce abstraction and how it works

4. Plan your Hadoop cluster

5. Deploy and administer a Hadoop cluster

6. Optimize Hadoop cluster for the best performance based on specific job requirements

7. Monitor a Hadoop cluster and execute routine administration procedures

8. Deal with Hadoop component failures and recoveries

9. Get familiar with related Hadoop projects: HBase, Hive and Pig

10. Know best practices for using Hadoop in the enterprise world

Outline

Introduction to Hadoop

  • The amount of data processed in today's world
  • What Hadoop is and why it is important
  • Hadoop comparison with traditional systems
  • Hadoop history
  • Hadoop main components and architecture

Hadoop Distributed File System (HDFS)

  • HDFS overview and design
  • HDFS architecture
  • HDFS file storage
  • Component failures and recoveries
  • Block placement
  • Balancing the Hadoop cluster

Planning your Hadoop cluster

  • Planning a Hadoop cluster and its capacity
  • Hadoop software and hardware configuration
  • HDFS Block replication and rack awareness
  • Network topology for Hadoop cluster

Hadoop Deployment

  • Different Hadoop deployment types
  • Hadoop distribution options
  • Hadoop competitors
  • Hadoop installation procedure
  • Distributed cluster architecture
  • Lab: Hadoop Installation

Working with HDFS

  • Ways of accessing data in HDFS
  • Common HDFS operations and commands
  • Different HDFS commands
  • Internals of a file read in HDFS
  • Data copying with ‘distcp’
  • Lab: Working with HDFS

Map-Reduce Abstraction

  • What MapReduce is and why it is popular
  • The Big Picture of MapReduce
  • MapReduce process and terminology
  • MapReduce components failures and recoveries
  • Working with MapReduce

Hadoop Cluster Configuration

  • Hadoop configuration overview and important configuration files
  • Configuration parameters and values
  • HDFS parameters and MapReduce parameters
  • Hadoop environment setup
  • ‘Include’ and ‘Exclude’ configuration files
  • Lab: MapReduce Performance Tuning

Hadoop Administration and Maintenance

  • Namenode/Datanode directory structures and files
  • File system image and Edit log
  • The Checkpoint Procedure
  • Namenode failure and recovery procedure
  • Safe Mode
  • Metadata and Data backup
  • Potential problems and solutions / what to look for
  • Adding and removing nodes
  • Lab: MapReduce File system Recovery

Hadoop Monitoring and Troubleshooting

  • Best practices of monitoring a Hadoop cluster
  • Using logs and stack traces for monitoring and troubleshooting
  • Using open-source tools to monitor Hadoop cluster

Job Scheduling

  • How to schedule Hadoop Jobs on the same cluster
  • Default Hadoop FIFO Scheduler
  • Fair Scheduler and its configuration

High Availability, Federation, YARN and Security

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

Hadoop QA

Total Duration

24 hrs

COURSE CONTENT

1. Software

  • Creation of an Amazon Elastic MapReduce instance.
  • Cloudera VM.

2. Architecture

  • Complete overview of system architecture, including data flow and components theory.
  • Overview of HDFS, including basic commands to move data into and from HDFS.
  • Overview of Mappers and Reducers.

3. HDFS

  • Locating the blocks of data in HDFS.
  • Learning to browse the HDFS file system using the UI.

4. Map Reduce Jobs

  • Running MapReduce code written in Java.
  • Looking at the logs generated by the job.
  • Interpreting the output messages printed by a job.
  • Monitoring the progress of Hadoop jobs in the UI.
  • Interpreting the output of the job.

5. Hive

  • Overview of Hive architecture
  • Hive Query Language
  • Writing and running Hive queries on Hadoop

6. PIG

  • Pig Architecture
  • Pig Latin Language
  • Writing and running Pig Latin scripts
  • Difference between Pig and Hive.

7. POC and Lab Exercise

  • End-to-end demonstration of a POC
  • Instructions for implementing an exercise

Contact Us

Email: sales@intellipaat.com

Mob: +91-9019368913

Cassandra

Fee: $400

Introduction

Apache Cassandra is a second-generation distributed database originally open-sourced by Facebook. Its write-optimized shared-nothing architecture results in excellent performance and scalability.

Cassandra moves away from the master-slave model and instead uses a peer-to-peer model. This means there is no single master: all nodes are potential masters. This makes writes and reads extremely scalable and allows the cluster to keep functioning even in the face of network partitions.

Total Duration

15 hrs

Course Content

Introducing Cassandra

  • Distributed and Decentralized
  • Elastic Scalability
  • High Availability and Fault Tolerance
  • Tunable Consistency
  • Brewer’s CAP Theorem
  • Row-Oriented
  • Schema-Free
  • High Performance

Use Cases for Cassandra

  • Large Deployments
  • Lots of Writes, Statistics, and Analysis
  • Geographical Distribution
  • Evolving Applications

Installing Cassandra

  • Running Cassandra
  • Starting the Server
  • Running the Command-Line Client Interface
  • Basic CLI Commands
  • Hands-on Session

The Cassandra Data Model

  • The Relational Data Model
  • A Simple Introduction
  • Clusters
  • Keyspaces
  • Hands-on Session

Column Families

  • Column Family Options
  • Columns
  • Wide Rows, Skinny Rows
  • Column Sorting
  • Super Columns
  • Composite Keys

The Cassandra Architecture

  • System Keyspace
  • Peer-to-Peer
  • Gossip and Failure Detection
  • Anti-Entropy and Read Repair
  • Memtables, SSTables, and Commit Logs
  • Hinted Handoff
  • Compaction
  • Bloom Filters
  • Tombstones
  • Hands-on Session

Clients

  • Basic Client API
  • Thrift
  • Avro
  • Hands-on Session

Reading and Writing Data

  • Setup and Inserting Data
  • Using a Simple Get
  • Seeding Some Values
  • Hands-on Session

CONTACT US

Email: sales@intellipaat.com

Phone: +91-9019368913

MongoDB

Fee: $300

Total Duration

15 hrs

Course Content

Overview

  • NoSQL
  • What is MongoDB?
  • JSON primer
  • When / why should you use MongoDB?

Installation and Administration

  • Installing MongoDB
  • Starting and stopping MongoDB servers
  • The JavaScript console

MongoDB Basics

  • Servers
  • Databases
  • Collections
  • Documents / Objects
  • CRUD (see the sketch after this list)
  • Indexes
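
A minimal CRUD-and-index sketch using the legacy MongoDB Java driver, assuming a mongod on localhost:27017; the database, collection and field names are hypothetical.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.MongoClient;

    public class MongoCrudExample {
        public static void main(String[] args) throws Exception {
            MongoClient client = new MongoClient("localhost", 27017);
            DB db = client.getDB("training");
            DBCollection people = db.getCollection("people");

            // Create
            people.insert(new BasicDBObject("name", "Ada").append("age", 36));

            // Read
            DBCursor cursor = people.find(new BasicDBObject("name", "Ada"));
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }

            // Update ($set modifies a single field)
            people.update(new BasicDBObject("name", "Ada"),
                    new BasicDBObject("$set", new BasicDBObject("age", 37)));

            // Delete
            people.remove(new BasicDBObject("name", "Ada"));

            // Index on the 'name' field
            people.createIndex(new BasicDBObject("name", 1));

            client.close();
        }
    }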

Clients and drivers

  • Overview and integration

Building applications with MongoDB

  • Overview
  • Getting started
  • Examples and labs

Advanced querying

  • Projections
  • Conditional operators
  • Limit and skip
  • Aggregation and grouping
  • Map / reduce

Security and Authentication

  • Overview
  • Best-practices

Performance and scaling

  • Master / slave
  • Sharding
  • Profiler
  • Import / Export and backup strategies

GridFS

  • Overview

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

 

HBase

Fee: $300

Total Duration

12 hrs

Who should attend?

Java, Big Data and Database Professionals, and professionals planning to switch to Hadoop/Big Data technologies

Prerequisites for attending the Training

Core Java knowledge

Course Content

1. Introduction to HBase

  • CAP Theorem and Eventual consistency
  • NoSQL
  • Creating Table – Shell and Programming
  • Column Families
  • Column Value & Key Pair
  • Index & Query
  • Learning Scan
  • MapReduce and HBase
  • Importing into HBase

2. HBase Programming – Learning the API

  • CRUD Operations (see the sketch after this list)
  • Get
  • Put
  • Delete
  • Scan Operations
  • Result Scanner
  • Batch and Caching
  • Filters
  • Counters
  • Co-Processors
  • Pooling
  • Admin Operations
  • Create Table
  • Column Families
  • Table Properties
  • Table Operations
  • Schema Operations
  • Cluster Operations
  • Status Operations
  • MapReduce Integration
  • Static vs Dynamic provisioning
  • Data Sink
  • Data Source
  • Custom Processing
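
A minimal sketch of the Get, Put, Delete and Scan operations above, written against the classic CDH4-era HTable client API. The table 'users' and column family 'info' are hypothetical and assumed to exist already.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrudExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "users");

            // Put: write one cell
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            table.put(put);

            // Get: read it back
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));

            // Scan: iterate over rows with a ResultScanner
            ResultScanner scanner = table.getScanner(new Scan());
            for (Result row : scanner) {
                System.out.println(row);
            }
            scanner.close();

            // Delete: remove the row
            table.delete(new Delete(Bytes.toBytes("row1")));
            table.close();
        }
    }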

3. HBase Architecture

  • Seek vs Transfer
  • Storage
  • Write Path
  • File Type
  • Root level
  • Table level
  • Region level
  • Region split
  • Compaction
  • HFile Format
  • Key Value Format
  • Write Ahead Log
  • Storage
  • Write Path
  • File Types
  • Read Path
  • Regions
  • Region life cycle
  • Replication
  • ZooKeeper

4. HBase DB Design

  • Designing Keys
  • Schema Design
  • Handling Index
  • Integration for search
  • Transaction

5. Implementing an HBase CRUD Framework – Case Study

  • Building a CRUD framework
  • Building HBase Admin core
  • Building HBase master data manager
  • Building Table administration
  • Building filter framework
  • Building configuration framework

Contact Us

Email: sales@intellipaat.com

Mob: +91-9019368913

 

HBase, Impala and Flume

Flume Introduction

Architecture

Data flow model

  • Complex flows
  • Reliability
  • Recoverability

Setup

  • Setting up an agent
  • Configuring individual components
  • Wiring the pieces together

Starting an agent

Data ingestion

  • RPC
  • Executing commands
  • Network streams
  • Setting multi-agent flow
  • Consolidation

Syslog Sources

  • Syslog TCP Source
  • Syslog UDP Source
  • Legacy Sources
  • Avro Legacy Source
  • Thrift Legacy Source
  • Custom Source

Flume Sinks

  • HDFS Sink
  • Logger Sink
  • Avro Sink
  • IRC Sink
  • File Roll Sink

HBasics

Backdrop

Concepts

Implementation

Installation

Test Drive

Clients

  • Java
  • Avro, REST, and Thrift

Example

Schemas

Loading Data

Web Queries

HBase Versus RDBMS

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

Scala

Introduction

This training is intended to enable developers and development managers who are experienced programmers in Java or other production languages such as C++, C# or Ruby to confidently start programming in Scala.

Course objectives

  • be a competent user of Scala constructs in application code
  • know and be able to apply the functional programming style in Scala
  • know how to use the fundamental Scala tools
  • be confident to start using Scala in production applications

Target Group & Prerequisites:

For this training no previous knowledge of Scala is assumed. Although intense, the training ensures you will have a solid understanding of the fundamentals of the language, the tooling and the development process as well as a good appreciation of the more advanced features.

If you already have Scala programming experience, then this course could be a useful refresher.

COURSE

Why Scala?

  • Short history
  • Overview of Scala’s core characteristics

Setting up the development environment

  • Installing the Scala distribution
  • Installing Eclipse and the Scala plugin
  • Installing sbt

First steps

  • Interactive programming in the REPL
  • Variables and methods
  • Expressions and type inference
  • First glance at functions

Basic OO features

  • Classes, class parameters, constructors
  • Fields, class parameters as fields
  • Methods
  • Operators
  • Named and default arguments
  • Packages and imports
  • Access modifiers
  • Singleton objects, companions, Predef
  • Case classes

Testing in Scala

  • sbt managed dependencies
  • Test-driven development with triggered test (~test)
  • Testing with specs2

Collections and functional programming

  • Collection hierarchy, creating instances
  • Type parameters
  • Tuples
  • Immutability versus mutability
  • Some important collection methods
  • Higher-order functions and function literals
  • Function values, function types, short notation
  • Important higher order functions: map, flatMap and filter

For-expressions and -loops

  • Generators
  • Filters
  • Definitions
  • Translation of for-expressions and -loops

Inheritance and traits

  • Extending classes
  • Final and sealed classes
  • Enumerations
  • Overriding members
  • Abstract classes
  • Implementing abstract members
  • Standard type hierarchy
  • Traits and mix-in composition

Pattern Matching

  • Match-expressions
  • Pattern types
  • Pattern guards
  • Patterns outside of match expressions

Dealing with optional values

  • (Higher order) functions on Option
  • Handling exceptions optionally

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

Mahout

Total duration

12 hrs

Course

  • Introduction to Mahout
  • Recommendations using Mahout (see the sketch after this list)
  • User-based recommendation
  • Item-based recommendation
  • Implementing a recommender using MapReduce
  • Clustering
  • Clustering algorithms
  • Implementing clustering in Hadoop
  • Classification
  • Training and evaluating a classifier
  • Developing a classifier
  • Exercise and assignment
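
A minimal sketch of a user-based recommender built with Mahout's Taste API. The input file ratings.csv (userID,itemID,preference lines) and the user ID are hypothetical.

    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class UserRecommenderExample {
        public static void main(String[] args) throws Exception {
            DataModel model = new FileDataModel(new File("ratings.csv"));

            // Similarity between users, and a neighborhood of the 10 nearest
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            UserNeighborhood neighborhood =
                    new NearestNUserNeighborhood(10, similarity, model);

            Recommender recommender =
                    new GenericUserBasedRecommender(model, neighborhood, similarity);

            // Top 3 recommendations for user 1
            List<RecommendedItem> items = recommender.recommend(1, 3);
            for (RecommendedItem item : items) {
                System.out.println(item.getItemID() + " : " + item.getValue());
            }
        }
    }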

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

Hadoop Workshop

Introduction

The Hadoop Developer course is a blend of administration and writing code using Hadoop ecosystem components to show how to work with Big Data. Topics covered in this course include MapReduce, Hive, Pig, ZooKeeper, Sqoop and multi-node setup of a Hadoop cluster on Amazon EC2. The course focuses on training participants in setting up Hadoop infrastructure, writing MapReduce programs and Hive and Pig scripts, and working with HDFS, ZooKeeper and Sqoop.

Workshop Dates:

14th, 15th, 16th June 2013

Who should attend?

Java Developers/Architects, Data Warehouse Developers/Architects, Big Data Professionals, Database Developers, Techno Managers, CTOs

Prerequisites for attending Hadoop Developer Training

Basic knowledge of Unix, Java and SQL scripting

OBJECTIVE of the Training:

1. Understanding distributed, parallel and cloud computing, and NoSQL concepts

2. Setting up Hadoop infrastructure with single- and multi-node clusters on Amazon EC2

3. Understanding the concepts of Map and Reduce and functional programming

4. Writing Map and Reduce programs and working with HDFS

5. Writing Hive and Pig scripts and working with ZooKeeper and Sqoop

6. Ability to design and develop applications involving large data using the Hadoop ecosystem

Course Outline

Introduction to Hadoop

  • Distributed computing
  • Parallel computing
  • Concurrency
  • Cloud Computing
  • Data Past, Present and Future
  • Computing Past, Present and Future
  • Hadoop
  • NoSQL

Hadoop Stack

  • MapReduce
  • NoSQL
  • CAP Theorem
  • Databases: Key Value, Document, Graph
  • Hive and Pig
  • HDFS

Lab 1: Hadoop Hands-on

  • Installing a Hadoop single-node cluster
  • Understanding Hadoop configuration files

MapReduce Introduction

  • Functional – Concept of Map
  • Functional – Concept of Reduce
  • Functional – Ordering, Concurrency, No Locks
  • Functional – Shuffling
  • Functional – Reducing, Key, Concurrency
  • MapReduce Execution framework
  • MapReduce Partitioners and Combiners
  • MapReduce and role of distributed filesystem
  • Role of Key/Value Pairs
  • Hadoop Data Types

Lab 2: MapReduce Exercises

  • Understanding Sample MapReduce code
  • Executing MapReduce code

HDFS Introduction

  • Architecture
  • File System
  • Data replication
  • Name Node
  • Data Node

Flume

  • What is Flume?
  • How it works ?
  • An example

Sqoop

  • What is Sqoop?
  • How it works ?
  • An example

Oozie

  • What is Oozie?
  • How it works?
  • An example

Hive

  • Architecture
  • Data Model
  • Physical Layout
  • DDL, DML and SQL Operations

Lab 3: Hive Hands-on

  • Installation
  • Setup
  • Exercises

Pig

  • Rationale
  • Pig Latin
  • Input, Output and Relational Operators
  • User Defined Functions
  • Analyzing and designing using Pig Latin

Lab 4: Pig Hands-on

  • Installation
  • Setup
  • Executing Pig Latin scripts on File system
  • Executing Pig Latin scripts on HDFS
  • Writing custom User Defined Functions

Introduction to ZooKeeper

Introduction to Sqoop

Hadoop Multi-Node Cluster Setup

  • Installation and Configuration
  • Running MapReduce jobs on a multi-node cluster

Working with Large data sets

  • Steps involved in analyzing large data
  • Lab walk through

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913

Hadoop on Windows Azure

Introduction

The Hadoop Developer course is a blend of administration and writing code using Hadoop ecosystem components to show how to work with Big Data. Topics covered in this course include MapReduce, Hive, Pig, ZooKeeper, Sqoop and multi-node setup of a Hadoop cluster on Amazon EC2. The course focuses on training participants in setting up Hadoop infrastructure, writing MapReduce programs and Hive and Pig scripts, and working with HDFS, ZooKeeper and Sqoop.

Total Duration

40 hrs

Who should attend?

Programmers, Architects, Data Warehouse Developers/Architects, Techno Managers, Data Scientists, CTOs. No prior Hadoop knowledge is required.

Prerequisites for attending the Training

Basic knowledge of a programming language and SQL scripting. No prior Hadoop knowledge is required.

 

OBJECTIVE of the Training:

At the completion of the course students will be able to:

  • Understanding distributed, parallel and cloud computing, and NoSQL concepts
  • Setting up Hadoop infrastructure with single- and multi-node clusters on Amazon EC2
  • Understanding the concepts of Map and Reduce and functional programming
  • Recognize use cases for HDP for Windows and Big Data
  • Architecture of MapReduce; writing Map and Reduce programs and working with HDFS
  • Use of Hadoop Streaming
  • Recognize use cases for Hive
  • Recognize use cases for Pig
  • Writing HiveQL scripts and Pig scripts
  • Use Pig to apply structure to unstructured Big Data
  • Understanding use cases for Sqoop
  • Data movement from Hadoop to SQL Server using Sqoop
  • Explain the uses and purpose of HCatalog
  • Working with ZooKeeper
  • Working with Flume
  • Ability to design and develop applications involving large data using the Hadoop ecosystem
  • Use the Microsoft .NET API for Hadoop to write a C# MapReduce job

 

Lab Exercises

Students will work through the following lab exercises on Windows:

  • Hadoop Installation on single Node and Multi Node Cluster Setup
  • Access HDFS using HDFS commands
  • Import SQL Server data into HDFS using Sqoop
  • Export data from HDFS into SQL Server using Sqoop
  • Run a MapReduce Job
  • Monitor a MapReduce Job
  • Explore data using Pig
  • Split and join datasets using Pig
  • Analyze Big Data with Hive
  • Understanding MapReduce with Hive
  • Joining datasets with Hive
  • Use HCatalog with Pig
  • Define an Oozie Workflow
  • Lab exercises with Flume

Working with Large data sets

  • Steps involved in analyzing large data
  • Lab walk through

Working ETL POC

 

Contact Us

E-mail: sales@intellipaat.com

Mob: +91-9019368913