Why do we need ZooKeeper in the Hadoop stack?

Question

2 Answers

Shivangi · Answer 1 · 2019-07-02T08:16:01+0000

Apache ZooKeeper is a coordination service for distributed applications that enables synchronization across a cluster. For Hadoop, ZooKeeper acts as a centralized repository where distributed applications can store data and read it back. So it is not just about writing data, as you said, but also about reading data from it.

ZooKeeper provides the following properties:

Synchronization
Serialization
Coordination

Hadoop is a distributed application, so it is more error-prone and harder to coordinate because of the large number of machines attached to the network. This leads to problems such as:

1. Race condition

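It occurs when two or more machines try to perform an operation on the same shared data at the same time, so the result depends on which one gets there first. ZooKeeper addresses this through its serialization property: updates are applied one at a time, in a single order.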
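The serialization idea can be sketched with a toy, in-memory stand-in for a single ZooKeeper node. This is only an illustration, not the ZooKeeper client API: the class `VersionedNode` and its `set_data` method are hypothetical names. It models the version check that a conditional write relies on, so that a write based on stale data is rejected instead of silently overwriting a newer one.

```python
class VersionedNode:
    """Toy in-memory stand-in for one node (hypothetical, not the real client API)."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def set_data(self, data, expected_version):
        # Reject the write if another client has updated the node since it was read.
        if expected_version != self.version:
            return False
        self.data = data
        self.version += 1
        return True


node = VersionedNode("owner=none")

# Two clients read the same version, then both try to write.
seen_by_a = node.version
seen_by_b = node.version

a_ok = node.set_data("owner=A", seen_by_a)  # applied first, version becomes 1
b_ok = node.set_data("owner=B", seen_by_b)  # based on a stale version, rejected
```

Client B would have to re-read the node and retry, so the two updates end up applied in a definite order rather than racing.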

2. Deadlocks

A deadlock occurs when two or more machines each hold a resource and wait for a resource held by another, so the system locks up because none of them releases what it holds. Synchronization in ZooKeeper helps to avoid deadlocks.
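One way to picture how ordered access avoids this kind of circular waiting is a lock queue in which every request gets a sequence number and the lowest number holds the lock. The sketch below is a toy model under that assumption; `LockQueue` and its methods are hypothetical names, not part of any ZooKeeper library.

```python
class LockQueue:
    """Toy model of a lock built from sequentially numbered requests."""

    def __init__(self):
        self._next_seq = 0
        self._waiting = {}  # sequence number -> client name

    def request(self, client):
        # Each request receives the next sequence number, fixing one global order.
        seq = self._next_seq
        self._next_seq += 1
        self._waiting[seq] = client
        return seq

    def holder(self):
        # The lowest outstanding sequence number owns the lock.
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        # Releasing the lock removes the entry, letting the next one in line proceed.
        self._waiting.pop(seq, None)


queue = LockQueue()
seq_a = queue.request("machine-A")
seq_b = queue.request("machine-B")

first = queue.holder()    # machine-A asked first, so it holds the lock
queue.release(seq_a)
second = queue.holder()   # machine-B is next in line
```

Because every machine waits only on the one ahead of it in the same queue, two machines can never end up waiting on each other.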

3. Partial failure of process

This can lead to inconsistent data. ZooKeeper handles this through atomicity: an update either completes in full or leaves nothing behind after a failure.
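The all-or-nothing behaviour can be illustrated with a small sketch. It is a single-process toy, not ZooKeeper itself: the function `apply_atomically` is a hypothetical helper, and a value of `None` is used here only to simulate an operation that fails partway through.

```python
def apply_atomically(store, operations):
    """Apply every (key, value) update, or none of them.

    A value of None stands in for an operation that fails (an assumption
    made purely for this illustration).
    """
    staged = dict(store)  # work on a copy so a failure leaves the original untouched
    for key, value in operations:
        if value is None:
            return False  # abort: nothing from this batch is kept
        staged[key] = value
    store.clear()
    store.update(staged)
    return True


config = {"replicas": "3"}

ok = apply_atomically(config, [("replicas", "5"), ("leader", "node-1")])
failed = apply_atomically(config, [("replicas", "7"), ("leader", None)])
```

After the second call the store still holds the values from the first batch, because the failed batch was discarded as a whole.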

Thus, ZooKeeper is an important part of Hadoop that takes care of these small but important issues so that developers can focus more on the functionality of the application.

Amit Rawat · Answer 2 · 2019-09-18T12:40:15+0000

ZooKeeper solves the problem of reliable distributed coordination. Hadoop is a distributed system designed around an Active NameNode and a Standby NameNode for failover, and at any point in time there must not be two masters (two active NameNodes). ZooKeeper helps enforce this during failover.
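The "only one active master" rule can be sketched as a marker that only one candidate can hold at a time and that disappears when its owner's session ends. The code below is a toy model under that assumption; `ElectionRegistry`, `try_become_active`, and `session_lost` are hypothetical names and do not come from Hadoop or the ZooKeeper client.

```python
class ElectionRegistry:
    """Toy model of an 'active' marker that only one NameNode can hold."""

    def __init__(self):
        self.active = None

    def try_become_active(self, candidate):
        # Claiming the marker succeeds only when nobody holds it.
        if self.active is None:
            self.active = candidate
            return True
        return False

    def session_lost(self, candidate):
        # The marker vanishes when the session of its owner ends.
        if self.active == candidate:
            self.active = None


registry = ElectionRegistry()
nn1_active = registry.try_become_active("namenode-1")  # becomes active
nn2_active = registry.try_become_active("namenode-2")  # stays standby

registry.session_lost("namenode-1")                    # the active node fails
nn2_after_failover = registry.try_become_active("namenode-2")
```

The standby can take over only after the marker of the failed node is gone, which is what prevents two active NameNodes from existing at once.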
