Introduction To Hadoop Distributed File System

By Abhijit | Last updated on November 18, 2024

HDFS and its Architecture

Hadoop can store petabytes of data using the Hadoop Distributed File System (HDFS). HDFS connects commodity hardware, even ordinary personal computers, into a cluster; each machine is known as a node in Hadoop parlance. Data files are split into blocks and stored across these nodes in a distributed manner, and HDFS makes the whole cluster accessible for data storage and processing. HDFS is designed for streaming access to data: files are read sequentially in large batches rather than through low-latency random access, typically by a processing framework such as MapReduce.
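
To make the idea of distributed storage more concrete, here is a minimal Python sketch. It is not HDFS code. It assumes a block size of 128 MB and a replication factor of 3 (the defaults in recent Hadoop releases), and it uses simple round-robin placement as a stand-in for the rack-aware policy HDFS really applies. The function names `split_into_blocks` and `place_replicas` and the node names are hypothetical, chosen only for illustration.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
REPLICATION = 3                 # assumed replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a simplified stand-in for HDFS's rack-aware placement.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + i) % len(nodes)] for i in range(replication)
        ]
    return placement


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]   # hypothetical cluster
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    for block_id, holders in place_replicas(len(blocks), nodes).items():
        print(block_id, blocks[block_id], holders)
```

Under these assumptions a 300 MB file becomes three blocks (two of 128 MB and one of 44 MB), and every block lives on three different nodes, so losing any single node leaves two copies of each block it held.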

Key features of HDFS:

HDFS is highly resilient: data is replicated across nodes, so when a node fails, work continues on another node that holds a copy
It delivers high throughput even for very large data sets
Unlike most general-purpose distributed file systems, it is based on a write-once-read-many model
This model keeps data coherent, avoids concurrency control issues and speeds up data access
HDFS moves computation to the place where the data exists instead of the other way around. Moving applications closer to the data is much cheaper and faster than moving the data itself, and it improves overall throughput.
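
The last feature, together with the resilience described in the first, can be illustrated with a small Python sketch. It is a toy model, not how Hadoop schedules work in practice (real clusters also consider racks and available resources). The function name `schedule`, the block-to-node map and the node names are all hypothetical. Each block's task is assigned to a live node that already stores a replica of that block, so no data has to travel, and a failed node is simply skipped.

```python
from collections import Counter


def schedule(block_locations, live_nodes):
    """Assign each block's task to a live node that stores a replica of it."""
    load = Counter()  # number of tasks already given to each node
    assignment = {}
    for block_id, holders in sorted(block_locations.items()):
        candidates = [n for n in holders if n in live_nodes]
        if not candidates:
            raise RuntimeError(f"no live replica of block {block_id}")
        # Among nodes that already hold the data, pick the least loaded one.
        chosen = min(candidates, key=lambda n: load[n])
        assignment[block_id] = chosen
        load[chosen] += 1
    return assignment


if __name__ == "__main__":
    # Hypothetical layout: three blocks, each replicated on three nodes.
    locations = {
        0: ["node1", "node2", "node3"],
        1: ["node2", "node3", "node4"],
        2: ["node3", "node4", "node1"],
    }
    print(schedule(locations, {"node1", "node2", "node3", "node4"}))
    # Simulate a failure of node1: its tasks move to other replica holders.
    print(schedule(locations, {"node2", "node3", "node4"}))
```

In both runs every task lands on a node that already holds the block it needs. When `node1` is removed from the set of live nodes, block 0 is processed on `node2` instead, which is the behaviour the resilience claim relies on.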

The reasons why HDFS works so well with Big Data:

HDFS pairs with MapReduce, which processes data in parallel on the nodes where it is stored, making access to large data sets very fast
It follows a data coherency model that is simple yet highly robust and scalable
It runs on commodity hardware and is portable across operating systems that support Java
It achieves economy by distributing data and processing across clusters of nodes working in parallel
Data is protected against node failure because each block is automatically replicated to multiple nodes
It provides a Java API and a C language wrapper on top of it
It can be accessed over HTTP, including from a web browser, which makes it convenient to use.
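
As an illustration of HTTP access, the sketch below builds URLs for WebHDFS, the REST interface that HDFS exposes under the `/webhdfs/v1` path. It only constructs the URL and sends nothing over the network. It assumes that WebHDFS is enabled on the cluster and that the NameNode web port is 9870 (the default in Hadoop 3; older releases used 50070). The helper name `webhdfs_url`, the host name and the file path are hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=9870, params=None):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **(params or {})})
    # quote() escapes characters such as spaces but keeps the slashes.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    host = "namenode.example.com"  # hypothetical NameNode address
    # List a directory.
    print(webhdfs_url(host, "/user/data", "LISTSTATUS"))
    # Read the first 1024 bytes of a file.
    print(webhdfs_url(host, "/user/data/file.txt", "OPEN",
                      params={"offset": 0, "length": 1024}))
```

A URL produced this way for a read operation such as `LISTSTATUS` can be pasted into a browser or fetched with any HTTP client, provided the cluster allows it.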

About the Author

Abhijit

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big Data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.