in Big Data Hadoop & Spark by (11.9k points)
This is kind of a naive question, but I am new to the NoSQL paradigm and don't know much about it. Could somebody help me clearly understand the difference between HBase and Hadoop, or give some pointers that might help me understand it?

So far I have done some research, and my understanding is this: Hadoop provides a framework for working with raw chunks of data (files) in HDFS, and HBase is a database engine on top of Hadoop that works with structured data instead of raw data chunks. HBase provides a logical layer over HDFS, just as SQL does. Is that correct?

Please feel free to correct me.


1 Answer

by (31.6k points)
Hadoop uses a distributed file system, HDFS, for storing big data. But HDFS has certain limitations, and NoSQL databases such as HBase, Cassandra and MongoDB are used to overcome them.

Hadoop (MapReduce) can perform only batch processing, and data is accessed only sequentially. That means one has to scan the entire dataset even for the simplest of jobs. A huge dataset, when processed, results in another huge dataset, which must also be processed sequentially. At this point, a new solution is needed that can reach any single piece of data directly, without reading everything before it (random access).
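To make the sequential vs. random access distinction concrete, here is a toy Python sketch (not Hadoop or HBase code; the data and the function names `sequential_lookup` and `random_lookup` are hypothetical). A sequential lookup has to read every record before the one it wants, while keeping keys sorted lets a lookup jump almost straight to the record with a binary search, which is roughly the idea behind row-key lookups in a store like HBase.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical dataset: 100,000 (row_key, value) records stored in key order.
# Zero-padded keys ("user00000" ... "user99999") sort the same way as the numbers.
records: List[Tuple[str, int]] = [(f"user{i:05d}", i * 10) for i in range(100_000)]
keys: List[str] = [k for k, _ in records]


def sequential_lookup(data: List[Tuple[str, int]], key: str) -> Tuple[Optional[int], int]:
    """Batch-style access: read records one by one until the key turns up.

    Returns the value (or None) and how many records had to be read.
    """
    scanned = 0
    for k, v in data:
        scanned += 1
        if k == key:
            return v, scanned
    return None, scanned


def random_lookup(data: List[Tuple[str, int]], sorted_keys: List[str], key: str) -> Optional[int]:
    """Random access: binary-search the sorted keys (O(log n) comparisons)."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return data[i][1]
    return None


print(sequential_lookup(records, "user99999"))    # (999990, 100000)
print(random_lookup(records, keys, "user99999"))  # 999990
```

The sequential version touches all 100,000 records to find the last key; the binary search needs only about 17 comparisons for the same answer.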

Like other file systems, HDFS provides storage, but in a fault-tolerant manner with high throughput and a lower risk of data loss (because of replication). However, HDFS is designed for large sequential reads and writes: it is not built for low-latency random reads, and files cannot be updated in place (they are write-once, append-only). This is where HBase comes into the picture. It's a distributed, scalable big data store that runs on top of HDFS, modelled after Google's Bigtable. Cassandra is somewhat similar to HBase.
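How can HBase offer random writes when HDFS files cannot be modified in place? The toy Python sketch below illustrates the general idea under simplifying assumptions; `ToyBigtableStore` and its methods are hypothetical and are not the HBase client API. Writes land in an in-memory buffer (HBase's equivalent is the MemStore), which is periodically flushed to an immutable file sorted by row key (in HBase these are HFiles stored in HDFS). Reads check the buffer first, then the sorted files from newest to oldest, so an update never rewrites an existing file.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, str]  # (row_key, "family:qualifier"), as in the Bigtable data model


class ToyBigtableStore:
    """Hypothetical toy store illustrating a Bigtable/HBase-style write path.

    Not the HBase API: a simplified model for explanation only.
    """

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        self.memstore: Dict[Cell, Tuple[int, str]] = {}  # cell -> (timestamp, value)
        self.files: List[List[Tuple[Cell, int, str]]] = []  # immutable sorted runs, oldest first
        self._clock = 0  # logical timestamp so newer versions win

    def put(self, row: str, column: str, value: str) -> None:
        self._clock += 1
        self.memstore[(row, column)] = (self._clock, value)
        if len(self.memstore) >= self.flush_threshold:
            self._flush()

    def _flush(self) -> None:
        # Write the buffer out once, sorted by (row, column); it is never edited again.
        run = sorted((cell, ts, val) for cell, (ts, val) in self.memstore.items())
        self.files.append(run)
        self.memstore = {}

    def get(self, row: str, column: str) -> Optional[str]:
        cell = (row, column)
        if cell in self.memstore:
            return self.memstore[cell][1]
        for run in reversed(self.files):  # newest file first
            i = bisect.bisect_left(run, (cell,))  # binary search on the sorted run
            if i < len(run) and run[i][0] == cell:
                return run[i][2]
        return None

    def scan(self, start_row: str, stop_row: str) -> List[Tuple[str, str, str]]:
        """Return (row, column, value) for start_row <= row < stop_row, in key order."""
        latest: Dict[Cell, Tuple[int, str]] = {}

        def keep(cell: Cell, ts: int, val: str) -> None:
            if cell not in latest or ts > latest[cell][0]:
                latest[cell] = (ts, val)

        for cell, (ts, val) in self.memstore.items():
            if start_row <= cell[0] < stop_row:
                keep(cell, ts, val)
        for run in self.files:
            i = bisect.bisect_left(run, ((start_row, ""),))  # jump to the first row in range
            while i < len(run) and run[i][0][0] < stop_row:
                keep(*run[i])
                i += 1
        return [(row, col, val) for (row, col), (_, val) in sorted(latest.items())]


store = ToyBigtableStore(flush_threshold=2)
store.put("user1", "info:name", "Ann")
store.put("user2", "info:name", "Bob")   # buffer full: flushed to an immutable file
store.put("user1", "info:name", "Anna")  # update goes to the buffer; the file is untouched
print(store.get("user1", "info:name"))   # Anna
print(store.get("user2", "info:name"))   # Bob
print(store.scan("user1", "user3"))      # [('user1', 'info:name', 'Anna'), ('user2', 'info:name', 'Bob')]
```

A real HBase region server also writes a write-ahead log for durability and periodically compacts the immutable files into fewer, larger ones; both are left out of this sketch.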