Apache ZooKeeper is a coordination service for distributed applications that enables synchronization across a cluster. For Hadoop, ZooKeeper acts as a centralized repository where distributed applications can store data and read it back. So it is not just about writing data, as you said, but also about reading data from it.
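To make the "put data and get data out" idea concrete, here is a minimal in-memory sketch in Python. It is not the ZooKeeper client API: the class `TinyRegistry`, its methods, and the path `/config/namenode` are hypothetical names chosen for illustration. It only assumes what ZooKeeper is known to do, which is keep small pieces of data in nodes addressed by paths and bump a version number on each write.

```python
class TinyRegistry:
    """In-memory stand-in for a tree of named data nodes (hypothetical, not ZooKeeper itself)."""

    def __init__(self):
        # Maps a path such as "/config/namenode" to a (data, version) pair.
        self._nodes = {}

    def create(self, path, data):
        """Store data under a new path; fail if the path is already taken."""
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = (data, 0)

    def get(self, path):
        """Return the (data, version) pair stored at the path."""
        return self._nodes[path]

    def set(self, path, data):
        """Overwrite the data at the path and return the new version number."""
        _, version = self._nodes[path]
        self._nodes[path] = (data, version + 1)
        return version + 1


registry = TinyRegistry()

# One application writes a piece of shared configuration...
registry.create("/config/namenode", b"host-a:8020")

# ...and any other application can read it back from the same place.
data, version = registry.get("/config/namenode")

# A later write replaces the data and increments the version.
new_version = registry.set("/config/namenode", b"host-b:8020")
```

In a real cluster the registry would live on the ZooKeeper servers rather than in one process, so every machine sees the same data.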
ZooKeeper has the following properties -
Hadoop is a distributed application, so it is more error-prone and harder to coordinate and work with because of the large number of machines attached to the network. This leads to many problems, such as -
1. Race condition
It occurs when a machine tries to perform two or more operations at the same time. This problem is solved by the serialization property of ZooKeeper, which applies operations in a single, well-defined order.
2. Deadlock
It occurs when two or more machines try to access the same shared resource at the same time, or try to access each other's resources. The system then locks up, because no machine releases its own resource while each waits for the other to release theirs. Synchronization in ZooKeeper helps to resolve deadlocks.
3. Partial failure of process
This can lead to inconsistent data. ZooKeeper handles this through atomicity: it makes sure that either the whole process finishes or nothing persists after a failure.
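The points above can be sketched with a small in-memory model in Python. This is an illustration under stated assumptions, not ZooKeeper's implementation or client API: `TinyStore`, `BadVersion`, `set`, and `multi` are hypothetical names. The sketch assumes writes are applied one at a time (a single internal lock stands in for serialization), that a write carrying a stale version number is rejected, and that a batch of writes is applied either completely or not at all.

```python
import threading


class BadVersion(Exception):
    """Raised when a write is based on an out-of-date version of a node."""


class TinyStore:
    """In-memory model of serialized, versioned, all-or-nothing writes (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()  # only one write is applied at a time
        self._nodes = {}               # path -> (data, version)

    def create(self, path, data):
        with self._lock:
            self._nodes[path] = (data, 0)

    def get(self, path):
        with self._lock:
            return self._nodes[path]

    def set(self, path, data, expected_version):
        """Write only if nobody else has changed the node since it was read."""
        with self._lock:
            _, version = self._nodes[path]
            if version != expected_version:
                raise BadVersion(path)
            self._nodes[path] = (data, version + 1)

    def multi(self, updates):
        """Apply every (path, data, expected_version) update, or none of them."""
        with self._lock:
            # Check everything first, so a failure leaves nothing half-written.
            for path, _, expected_version in updates:
                if self._nodes[path][1] != expected_version:
                    raise BadVersion(path)
            for path, data, _ in updates:
                _, version = self._nodes[path]
                self._nodes[path] = (data, version + 1)


store = TinyStore()
store.create("/a", b"1")
store.create("/b", b"1")

# Race condition: two writers both read version 0, but only the first write wins.
_, seen_version = store.get("/a")
store.set("/a", b"2", seen_version)
try:
    store.set("/a", b"stale", seen_version)
    stale_rejected = False
except BadVersion:
    stale_rejected = True

# Partial failure: the second update in the batch is invalid, so neither is applied.
try:
    store.multi([("/a", b"3", 1), ("/b", b"3", 99)])
    batch_rejected = False
except BadVersion:
    batch_rejected = True
```

After this runs, `/a` still holds the first writer's value and `/b` is unchanged, which is the "whole process or nothing" behaviour described in point 3.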
Thus, ZooKeeper is an important part of Hadoop that takes care of these small but important issues so that developers can focus more on the functionality of the application.