+1 vote
in Big Data Hadoop & Spark by (920 points)

I am new to Hadoop and ZooKeeper. I cannot understand the purpose of using ZooKeeper with Hadoop. Does ZooKeeper write data into Hadoop? If not, why do we use ZooKeeper with Hadoop?

2 Answers

0 votes
by (13.2k points)

Apache ZooKeeper is a coordination service for distributed applications that enables synchronization across a cluster. For Hadoop, ZooKeeper acts as a centralized repository where distributed applications can store and retrieve small pieces of shared state, such as configuration and status information. So it is not writing your data into Hadoop; it holds the coordination data that the cluster's services both write and read.

ZooKeeper provides the following properties:

  1. Synchronization

  2. Serialization

  3. Coordination

Hadoop is a distributed application, so it is more error-prone and harder to coordinate because of the large number of machines attached to the network. This leads to problems such as:

1. Race condition

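It occurs when two or more machines try to operate on the same shared resource at the same time, so the result depends on the order in which the operations happen to run. The serialization property of ZooKeeper solves this by applying updates in a single, well-defined order.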

2. Deadlocks

A deadlock occurs when two or more machines each hold a shared resource and wait for one another to release it, so none of them can make progress. Synchronization in ZooKeeper helps to avoid deadlocks.

3. Partial failure of process

This can lead to inconsistent data. ZooKeeper handles this through atomic updates: an update either completes in full or fails, and nothing partial persists after a failure.
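To make the race condition case more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the ZooKeeper client API: the names `ToyZNode`, `BadVersionError`, `increment`, and `run_workers` are made up for illustration. It imitates the idea that each znode carries a version number and that a write can be made conditional on the version the writer last read, so two writers cannot silently overwrite each other.

```python
import threading


class BadVersionError(Exception):
    """Raised when the caller's expected version is stale."""


class ToyZNode:
    """In-memory stand-in for one znode (hypothetical, not the ZooKeeper API)."""

    def __init__(self, data):
        self._lock = threading.Lock()
        self.data = data
        self.version = 0

    def get(self):
        # Return the data together with the version it was read at.
        with self._lock:
            return self.data, self.version

    def set(self, data, expected_version):
        # Apply the write only if nobody has updated the node since we read it.
        with self._lock:
            if expected_version != self.version:
                raise BadVersionError(
                    f"expected {expected_version}, found {self.version}"
                )
            self.data = data
            self.version += 1
            return self.version


def increment(node):
    """Read-modify-write with retry, so concurrent increments are never lost."""
    while True:
        value, version = node.get()
        try:
            node.set(value + 1, version)
            return
        except BadVersionError:
            continue  # another writer got in first; re-read and retry


def run_workers(node, workers=8, per_worker=100):
    """Run several threads that all increment the same node."""
    def work():
        for _ in range(per_worker):
            increment(node)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return node.get()
```

With this model, a writer holding a stale version gets an error instead of clobbering a newer value, and the retry loop means every increment is eventually applied exactly once. A real deployment would use a ZooKeeper client library rather than this in-process stand-in.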

Thus, ZooKeeper is an important part of Hadoop that takes care of these small but important issues so that developers can focus more on the functionality of the application.

0 votes
by (32.3k points)
ZooKeeper solves the problem of reliable distributed coordination. Hadoop is a distributed system designed around an Active NameNode and a Standby NameNode for the failover process, and at any point in time you should not have two masters (two Active NameNodes). When automatic failover is enabled, ZooKeeper is what the NameNodes use to agree on which one of them is currently active.
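The "only one active master" rule can be sketched with a toy model of an ephemeral lock. This is an in-memory illustration in Python, not the ZooKeeper API or Hadoop's actual failover code: `ToyCoordinator`, `elect`, and the names `nn1` and `nn2` are assumptions made for the example. The idea it imitates is that whoever creates the lock first becomes active, and the lock disappears by itself when its owner's session ends, so a standby can take over.

```python
class ToyCoordinator:
    """In-memory stand-in for an ephemeral lock znode (hypothetical, not the ZooKeeper API)."""

    def __init__(self):
        self._owner = None  # session that currently holds the lock, if any

    def try_acquire(self, session_id):
        # Creating the lock succeeds only when nobody holds it.
        if self._owner is None:
            self._owner = session_id
            return True
        # A session that already holds the lock keeps it.
        return self._owner == session_id

    def expire_session(self, session_id):
        # An ephemeral lock vanishes when its owner's session ends.
        if self._owner == session_id:
            self._owner = None

    def active(self):
        return self._owner


def elect(coordinator, candidates):
    """Let every candidate try for the lock; return the single active one."""
    for name in candidates:
        coordinator.try_acquire(name)
    return coordinator.active()
```

For example, with `zk = ToyCoordinator()`, calling `elect(zk, ["nn1", "nn2"])` makes `nn1` active because it asks first, and `nn2` stays standby. After `zk.expire_session("nn1")` (the active NameNode crashing), `elect(zk, ["nn2"])` promotes `nn2`. At no point do two candidates hold the lock together.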
