HDFS and its Architecture

Hadoop stores petabytes of data using HDFS, the Hadoop Distributed File System. HDFS connects commodity hardware or personal computers, known as nodes in Hadoop parlance, into a cluster on which data files are stored in a distributed manner. Through HDFS, the whole cluster and its nodes can be accessed for data storage and processing. Data is accessed in a streaming fashion, which suits batch-processing frameworks such as MapReduce.

Key features of HDFS:

- HDFS is highly resilient: upon node failure, the workload is immediately transferred to another node holding a replica of the data.
- It provides very high throughput even for gigantic data sets.
- Unlike many other distributed file systems, it is based on a write-once-read-many model.
- This model ensures high data coherence, removes concurrency-control issues and speeds up data access.
- HDFS moves computation to where the data resides instead of the other way around.

Moving applications closer to the data is much cheaper and faster than moving the data to the applications, and it improves overall throughput.

Why HDFS works so well with Big Data:

- Its streaming access pattern pairs naturally with MapReduce, making large sequential reads very fast.
- It follows a data-coherency model that is simple yet robust and scalable.
- It runs on any commodity hardware and on common operating systems.
- It achieves economy by distributing data and processing across clusters of parallel nodes.
- Data is kept safe by automatically replicating it to multiple nodes.
- It provides a Java API, with a C language wrapper on top.
- It is accessible through a web browser, which makes it highly practical.
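The block replication and failover behaviour described above can be sketched with a toy model. This is plain Python with no Hadoop dependency; the block size, replication factor, node names, and round-robin placement policy are illustrative assumptions, not the real HDFS implementation (real HDFS defaults to 128 MB blocks and a replication factor of 3, with a rack-aware placement policy):

```python
# Toy sketch of HDFS-style block splitting, replication, and failover.
# All sizes and names are illustrative, not real HDFS defaults.

BLOCK_SIZE = 4    # toy block size in bytes (HDFS default: 128 MB)
REPLICATION = 3   # copies of each block (HDFS default: 3)

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks, as HDFS does."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes.
    Round-robin here; real HDFS uses a rack-aware policy."""
    placement = {}
    for i in range(len(blocks)):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

def readable_after_failure(placement, failed_node):
    """A file stays readable if every block has a replica off the failed node."""
    return all(any(n != failed_node for n in replicas)
               for replicas in placement.values())

data = b"hello hdfs world"
blocks = split_into_blocks(data)
placement = place_replicas(blocks, nodes=["node1", "node2", "node3", "node4"])

print(len(blocks))                                 # 4 blocks of up to 4 bytes
print(readable_after_failure(placement, "node1"))  # True: replicas survive the loss
```

Because every block lives on several nodes, losing any single node leaves at least one replica of each block reachable, which is the mechanism behind the resilience claim above.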
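The browser accessibility mentioned above is backed by WebHDFS, an HTTP REST interface exposed by the NameNode. As a minimal sketch, the snippet below only constructs a WebHDFS URL; no cluster is contacted. The hostname is a made-up placeholder, and the port is an assumption: 9870 is the NameNode web default in Hadoop 3.x (50070 in Hadoop 2.x), so adjust for your cluster:

```python
# Minimal sketch of building a WebHDFS v1 REST URL — the interface that
# makes HDFS reachable from a browser or any HTTP client.
# Hostname is a placeholder; port 9870 assumes Hadoop 3.x defaults.
from urllib.parse import quote

def webhdfs_url(host, hdfs_path, op, port=9870):
    """Build a URL of the form http://<host>:<port>/webhdfs/v1/<path>?op=<OP>."""
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?op={op}"

# op=OPEN reads a file; op=LISTSTATUS lists a directory.
print(webhdfs_url("namenode.example.com", "/user/data/log.txt", "OPEN"))
# http://namenode.example.com:9870/webhdfs/v1/user/data/log.txt?op=OPEN
```

Pasting such a URL into a browser (against a running cluster) streams the file back over HTTP, which is what makes HDFS usable without any Hadoop client software installed.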