Introduction to the Hadoop Distributed File System
Hadoop stores petabytes of data using the Hadoop Distributed File System (HDFS). HDFS links commodity hardware, including ordinary personal computers, into a cluster; each machine is called a node in Hadoop parlance. Data files are split up and stored across these nodes in a distributed manner, so the whole cluster can be used for both data storage and processing. HDFS is designed for streaming access to data: files are typically read sequentially in large batches, for example by MapReduce jobs, rather than through random reads.
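To make the idea of distributed storage concrete, here is a minimal sketch of splitting a file into fixed-size blocks and spreading them over nodes. It is a toy model, not how HDFS is implemented: the node names, the tiny block size and the round-robin policy are all assumptions chosen for illustration.

```python
# Toy model only: real HDFS tracks blocks in a NameNode and uses a
# more careful placement policy than plain round-robin.
BLOCK_SIZE = 4  # bytes; tiny on purpose (recent Hadoop versions default to 128 MB)
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(blocks, nodes=NODES):
    """Map each block index to a node, round-robin (an illustrative policy)."""
    return {index: nodes[index % len(nodes)] for index in range(len(blocks))}


blocks = split_into_blocks(b"hello hadoop world")
placement = assign_blocks(blocks)
```

Joining the blocks back together in index order yields the original file, which is essentially what a reader of a distributed file has to do.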
Key features of HDFS:
- HDFS is highly resilient: data blocks are replicated across nodes, so when a node fails the work can continue on another node that holds a copy
- It provides high throughput even for very large data sets
- Unlike many other distributed file systems, it is based on a write-once-read-many model
- This model simplifies data coherency, avoids most concurrency control issues and speeds up data access
- HDFS moves computation to the place where the data resides instead of moving the data to the computation
- Running applications close to their data is cheaper and faster than shipping large data sets over the network, and improves overall throughput.
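The "move computation to the data" feature can be sketched as a small scheduling decision. This is a simplified illustration under stated assumptions: the block-to-node map and the `pick_node` helper are hypothetical, and a real Hadoop scheduler weighs more factors than this.

```python
# Hypothetical block-to-node map; in real HDFS the NameNode tracks this.
BLOCK_LOCATIONS = {
    "block-0": ["node1", "node2"],
    "block-1": ["node2", "node3"],
    "block-2": ["node3", "node1"],
}


def pick_node(block_id, busy_nodes, locations=BLOCK_LOCATIONS):
    """Return (node, is_local), preferring an idle node that already stores the block."""
    for node in locations[block_id]:
        if node not in busy_nodes:
            return node, True
    # Every node holding the block is busy: fall back to any idle node,
    # which then has to read the block over the network.
    all_nodes = sorted({n for holders in locations.values() for n in holders})
    for node in all_nodes:
        if node not in busy_nodes:
            return node, False
    raise RuntimeError("no idle node available")


local_choice = pick_node("block-0", busy_nodes=set())
remote_choice = pick_node("block-0", busy_nodes={"node1", "node2"})
```

When a node holding the block is idle, the task runs there and no data crosses the network; only when all holders are busy does the sketch accept a remote read.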
The reasons why HDFS works so well with Big Data:
- MapReduce jobs read data from HDFS in parallel across nodes, which makes access to large data sets fast
- It follows a data coherency model that is simple yet highly robust and scalable
- It runs on commodity hardware and, being written in Java, is portable across operating systems
- Achieves economy by distributing data and processing on clusters with parallel nodes
- Data is protected against node failures because each block is automatically replicated to multiple nodes (three copies by default)
- It provides a Java API and a C language wrapper (libhdfs) on top of it
- Its contents can be browsed with a web browser, which makes it convenient to inspect.
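The point about data being saved in multiple locations can be illustrated with a rough sketch of replication and recovery after a node failure. The function names, node names and the rotating placement rule are assumptions made for this example; the real HDFS placement policy is more involved, for instance in how it spreads copies across racks.

```python
REPLICATION = 3  # HDFS keeps three copies of each block by default


def place_replicas(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, rotating the start node.

    Assumes replication <= len(nodes), otherwise copies would not be distinct.
    """
    placement = {}
    for i, block_id in enumerate(block_ids):
        placement[block_id] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def handle_failure(placement, failed, live_nodes, replication=REPLICATION):
    """Drop the failed node and copy under-replicated blocks to other live nodes."""
    repaired = {}
    for block_id, holders in placement.items():
        survivors = [n for n in holders if n != failed]
        candidates = [n for n in live_nodes if n not in survivors]
        while len(survivors) < replication and candidates:
            survivors.append(candidates.pop(0))
        repaired[block_id] = survivors
    return repaired


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["b0", "b1"], nodes)
repaired = handle_failure(placement, failed="node2",
                          live_nodes=[n for n in nodes if n != "node2"])
```

After `node2` fails, every block still has a live copy to read from, and the sketch restores the replica count by copying each affected block to a node that did not yet hold it.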