+18 votes
3 views
in Big Data Hadoop & Spark by (10.5k points)

Can someone tell me what the basic difference between HBase and Hadoop is?

I have done my own research, but I still cannot quite understand what it is that separates them. I am new to Hadoop and NoSQL, so please explain in layman's terms if you can.

3 Answers

+12 votes
by (13.2k points)

Hadoop
Hadoop is an open source project of the Apache Software Foundation. It is a framework written in Java, originally developed by Doug Cutting in 2005 to support distribution for Nutch, a web search engine. Hadoop uses Google's MapReduce and Google File System technologies as its foundation. Some of the major features of Hadoop are given below:

  1. Hadoop is easily scalable: new nodes can easily be added to an existing cluster as the data grows.
  2. Hadoop is fault tolerant: data is stored in HDFS, where it is automatically replicated to other nodes.
  3. Hadoop is good at fast data processing thanks to its ability to process data in parallel. It can run batch processes around ten times faster than a single-threaded server or a mainframe.

Hadoop delivers cost benefits by bringing parallel computing to commodity servers. This leads to a considerable reduction in the cost per terabyte of storage, which in turn makes it affordable to model all of your data.

Hadoop ecosystem

The following are components of the Hadoop ecosystem:

HDFS: The Hadoop Distributed File System. It simply stores data files as close to their original form as possible.

HBase: Hadoop's database. It supports structured data storage for large tables.

Hive: It enables analysis of large datasets using a language very similar to standard ANSI SQL, which means that anyone familiar with SQL should be able to access data on a Hadoop cluster.

Pig: An easy-to-understand data flow language. It helps with the analysis of large datasets, which is the norm with Hadoop.

0 votes
by (32.3k points)

Hadoop is an open source distributed framework for processing large datasets. It consists mainly of two layers:

  • HDFS (storage layer)

  • MapReduce (processing layer)

HDFS is a file system that provides storage for huge amounts of data in a fault-tolerant manner. The risk of data loss is low because it replicates each block of data on different nodes across the cluster. But since it is a file system, it falls short when you need to access data randomly from an arbitrary place in a file. In Hadoop on its own, only batch processing is performed and data is accessed sequentially. This is where HBase comes in.

HDFS (Hadoop Distributed File System) is a component of the Hadoop architecture and is meant for storage. It takes a huge amount of data and divides it into smaller blocks. This helps process the data quickly when MapReduce comes into play. Its design is based on GFS (Google File System).

[Image: how HDFS divides the data, stores it in blocks, and replicates the blocks]

After the data is divided into smaller blocks, MapReduce processes the blocks in parallel. Fault tolerance comes from replicating the blocks and from rerunning tasks that fail.
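The split, map, and reduce idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern and does not use the Hadoop API: the function names (`split_into_blocks`, `map_block`, `shuffle`, `reduce_groups`, `word_count`) are made up for this example, and a thread pool stands in for the cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Split a list of lines into fixed-size blocks (a stand-in for HDFS blocks)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Shuffle phase: group all emitted counts by word."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_groups(groups):
    """Reduce phase: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines, block_size=2):
    """Count words by mapping each block independently, then reducing."""
    blocks = split_into_blocks(lines, block_size)
    # Blocks do not depend on each other, so they can be mapped in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    data = ["hadoop stores data", "hbase stores rows", "hadoop runs jobs"]
    print(word_count(data))
```

Because each block is mapped without looking at any other block, a failed map task could simply be rerun on the same block, which is roughly how a real cluster recovers from task failures.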

HDFS also replicates each block of a file onto different nodes across the cluster. If a node crashes, the data is not lost, because it can be fetched from another node, possibly one on a different server rack.
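A toy simulation can show why replication protects against node failure. This is a simplified sketch, not how HDFS actually places blocks: real placement is rack-aware, whereas the hypothetical `place_blocks` helper below just assigns replicas round-robin. The node names and both function names are assumptions for illustration.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified on purpose: real HDFS placement also considers racks.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the set of blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed for node in holders)
    }


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(4, nodes, replication=3)
    print(placement)
    # One node down: every block is still readable from another replica.
    print(readable_blocks(placement, ["n1"]))
    # Three nodes down: a block whose replicas all sat on those nodes is lost.
    print(readable_blocks(placement, ["n1", "n2", "n3"]))
```

With a replication factor of 3, a block only becomes unreadable when all three nodes holding it fail at the same time.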

HBase, by contrast, is an open source, distributed, column-oriented database. It scales horizontally, is part of the Hadoop ecosystem, supports random reads and writes, and is a NoSQL database.

The main difference between them is that Hadoop (HDFS) stores data as flat files, while HBase stores data in a column-oriented fashion as key-value pairs. Because HBase looks data up by row key, it can read or write an individual record directly, which HDFS on its own is not designed for. To find a particular record in a plain HDFS file, you typically have to scan the file from top to bottom, which makes fetching data from the middle of a large file slow.
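The gap between scanning a flat file and looking a record up by key can be sketched as follows. This is a toy comparison under simple assumptions, not HBase code: `scan_lookup` and `SortedKeyStore` are hypothetical names, and the store simply keeps its keys sorted in memory so that binary search can find them.

```python
import bisect


def scan_lookup(records, key):
    """Sequential scan: check records one by one. Returns (value, steps taken)."""
    for steps, (k, v) in enumerate(records, start=1):
        if k == key:
            return v, steps
    return None, len(records)


class SortedKeyStore:
    """Toy store that keeps row keys sorted so a key can be found by binary search."""

    def __init__(self):
        self._keys = []    # sorted list of row keys
        self._values = {}  # row key -> latest value

    def put(self, key, value):
        """Insert a new row or overwrite an existing one in place."""
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Random read: binary search for the key instead of scanning."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[key]
        return None

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    records = [(f"row{i:03d}", i) for i in range(100)]
    print(scan_lookup(records, "row050"))  # has to walk past 50 other records

    store = SortedKeyStore()
    for k, v in records:
        store.put(k, v)
    print(store.get("row050"))             # jumps to the key directly
    print(store.scan("row010", "row013"))
```

The scan cost grows with the size of the file, while the keyed lookup needs only a handful of comparisons even for very large tables.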


0 votes
by (11.4k points)

 

| HDFS | HBase |
| --- | --- |
| HDFS is a Java-based file system used for storing large data sets. | HBase is a Java-based "Not Only SQL" (NoSQL) database. |
| HDFS has a rigid architecture that does not allow changes. It doesn't facilitate dynamic storage. | HBase allows for dynamic changes and can be used for standalone applications. |
| HDFS is ideally suited for write-once, read-many-times use cases. | HBase is ideally suited for random writes and reads of data that is stored in HDFS. |
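To make the "dynamic changes" and "random write and read" rows more concrete, here is a rough sketch of an HBase-style data model: each row key maps to columns named `family:qualifier`, and each cell keeps its older versions. `ToyTable` and its methods are invented for this illustration and are not the HBase client API. The sketch also uses a simple counter for timestamps, where a real system would normally use wall-clock time.

```python
import itertools


class ToyTable:
    """Toy table: row key -> "family:qualifier" column -> list of (timestamp, value)."""

    def __init__(self, families):
        self.families = set(families)     # column families are fixed up front
        self._rows = {}
        self._clock = itertools.count(1)  # logical timestamps (an assumption)

    def put(self, row_key, column, value):
        """Write one cell. New qualifiers can appear at any time within a family."""
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self._rows.setdefault(row_key, {}).setdefault(column, [])
        versions.append((next(self._clock), value))

    def get(self, row_key, column):
        """Read the newest version of one cell, or None if it does not exist."""
        versions = self._rows.get(row_key, {}).get(column)
        return versions[-1][1] if versions else None

    def history(self, row_key, column):
        """Return every stored value of one cell, oldest first."""
        return [value for _, value in self._rows.get(row_key, {}).get(column, [])]


if __name__ == "__main__":
    table = ToyTable(["info"])
    table.put("user1", "info:email", "a@example.com")
    table.put("user1", "info:email", "b@example.com")  # update in place
    table.put("user2", "info:age", 30)                 # different columns per row
    print(table.get("user1", "info:email"))
    print(table.history("user1", "info:email"))
```

Rows do not need to share the same columns, and a single cell can be updated without rewriting the rest of the data, which is the contrast with the write-once style of plain HDFS files.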

 

Reference: https://www.kdnuggets.com/2017/05/hdfs-hbase-need-know.html
