Hadoop is an open source distributed framework for processing large datasets. It comprises of mainly two layers
It is a File System that provides us with storage capability for a huge amount of data and that too in a fault-tolerant manner with no risk of data loss as it creates replications of each set of data in different blocks across the cluster. But since it is a file system, it lacks behind in, directly accessing data randomly from any place in the file. Here, HBase comes into the actions, as in Hadoop only batch processing is performed and data will be allowed to access only in a sequential manner.
HDFS(Hadoop Distributed File System) is a component of Hadoop Architecture. It is meant for storing purpose. It basically takes a huge amount of data and divides it into small blocks of data. This helps in fast processing of data when MapReduce comes into play. It is developed from the GFS(Google File System).
Above image shows how HDFS divides the data, stores them into blocks and replicates them.
After dividing the data into smaller blocks Parallel processing is done by MapReduce, this makes the system fault-tolerant.
During this process, the file system replicates the data present in blocks as a file into different individual nodes across the different clusters. So that whenever any node crashes, the data is not lost because it can be fetched back from another node present in another server rack.
While HBase is an open source distributed column-oriented database system. It is a horizontal process to fetch data. HBase is a part of Hadoop ecosystem which deals with random read-write method and is a NoSql database.
The main difference between them is that Hadoop stores data in a flat file system manner while the HBase store data as a column fashion with a key-value pair. This column fashion of storing gives an upper hand to HBase as it gets this random read-write capability to access any data from any place in the file which is not possible in HDFS. Therefore for accessing any part of data in HDFS we have to traverse from top to bottom in a row-wise fashion, which increases its time complexity and makes it a time-consuming method to access any data from the middle.
Refer to the following video if you want more information regarding Hadoop: