In Hadoop, data stored in HDFS is not always distributed uniformly across the DataNodes. A common cause is the addition of new DataNodes to an existing cluster: the blocks already stored are not automatically redistributed to the new nodes.
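The effect of adding an empty DataNode can be made concrete with a little arithmetic. The following is a minimal sketch, not HDFS code: the node names, the capacities, and the `deviations` helper are all hypothetical. It compares each node's disk utilization with the cluster-wide utilization, which is also the kind of comparison the HDFS balancer makes against its threshold.

```python
def deviations(nodes):
    """Return each node's utilization minus the cluster-wide utilization,
    in percentage points. `nodes` maps name -> (used, capacity)."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity
    return {
        name: 100.0 * used / capacity - cluster_pct
        for name, (used, capacity) in nodes.items()
    }


# Hypothetical cluster: (used TB, capacity TB) per DataNode.
cluster = {"dn1": (80, 100), "dn2": (80, 100), "dn3": (80, 100)}
before = deviations(cluster)  # every node matches the cluster average

cluster["dn4"] = (0, 100)     # a new, empty DataNode joins the cluster
after = deviations(cluster)   # existing data has not moved
```

In this made-up example the three original nodes end up 20 percentage points above the cluster average and the new node 60 points below it, even though no data was written or deleted.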
After these DataNodes are added, the NameNode takes several considerations into account when deciding which DataNodes will receive new blocks of data:
- A policy to keep one replica of a data block on the same node that is writing the block.
- A policy to spread the different replicas of a block across racks so that the cluster can survive the loss of an entire rack.
- A preference for placing one replica on the same rack as the node writing the data, so that I/O between racks is reduced.
- The need to spread data uniformly across the DataNodes in the cluster.
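The considerations above can be sketched as a toy target-selection function. This is an illustrative simplification under stated assumptions, not the NameNode's actual placement logic: `choose_targets`, the topology, and the utilization figures are hypothetical, and the sketch assumes three replicas, at least two racks, and at least two nodes on the writer's rack.

```python
def choose_targets(writer, topology, used_pct):
    """Pick three DataNodes for the replicas of a new block.

    topology maps node name -> rack name.
    used_pct maps node name -> percentage of disk used.
    """
    def least_used(candidates):
        # Favor emptier nodes so data spreads more uniformly.
        return min(candidates, key=lambda node: used_pct[node])

    local_rack = topology[writer]

    # Replica 1: the node that is writing the block.
    targets = [writer]

    # Replica 2: a node on a different rack, to survive a rack failure.
    remote = [n for n in topology if topology[n] != local_rack]
    targets.append(least_used(remote))

    # Replica 3: another node on the writer's rack, to limit cross-rack I/O.
    same_rack = [n for n in topology
                 if topology[n] == local_rack and n != writer]
    targets.append(least_used(same_rack))

    return targets


# Hypothetical four-node, two-rack cluster; dn4 was added recently.
topology = {"dn1": "rackA", "dn2": "rackA", "dn3": "rackB", "dn4": "rackB"}
used_pct = {"dn1": 80, "dn2": 70, "dn3": 75, "dn4": 5}

targets = choose_targets("dn1", topology, used_pct)
```

Note that the new node `dn4` only receives replicas of newly written blocks here. Nothing in this selection step moves the blocks that were already stored on the older nodes, which is why the imbalance persists after a node is added.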