HBase

HBase: The Hadoop Database

HBase is an open-source, horizontally scalable, distributed, column-oriented database built on top of the Hadoop Distributed File System (HDFS). It is a non-relational (NoSQL) database and a faithful open-source implementation modeled on Google’s Bigtable.

Column-oriented databases store table data by column rather than by row. HBase is a distributed, persistent, strictly consistent storage system with near-optimal write performance (in terms of I/O channel saturation) and excellent read performance. It makes efficient use of disk space by supporting pluggable compression algorithms, which can be chosen based on the nature of the data in a particular set of column families.
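To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is a toy illustration only, not how HBase lays out its files; the sample records and the names `row_layout` and `column_layout` are assumptions made up for this example.

```python
# Hypothetical sample data: three records with the columns id, name, age.
COLUMNS = ("id", "name", "age")
records = [(1, "ann", 31), (2, "bob", 45), (3, "cy", 27)]


def row_layout(records):
    """Row-oriented: all values of one record sit next to each other."""
    return [value for record in records for value in record]


def column_layout(records):
    """Column-oriented: all values of one column sit next to each other."""
    return {
        name: [record[i] for record in records]
        for i, name in enumerate(COLUMNS)
    }


by_row = row_layout(records)
by_column = column_layout(records)

# Reading a single column touches one contiguous list in the column
# layout, instead of skipping through every record in the row layout.
ages = by_column["age"]
```

Grouping values of the same column together is also what lets a compression algorithm work on similar data, which is the point the paragraph above makes about choosing compression per set of column families.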

HBase handles shifting load and failures gracefully and transparently to clients. Scalability is built in: clusters can be grown or shrunk while the system is in production. Changing the cluster does not involve any complicated rebalancing or resharding procedure; the process is fully automated.

Why do we need HBase?

A traditional RDBMS has a number of limitations:

Not well suited to unstructured data.
Works well only for a limited number of records.
Does not store denormalized data.
Requires a fixed, predefined schema.

Features of HBase

The features of HBase are as follows:

Easy-to-use Java API for client access.
Integrates with Hadoop, both as a source and as a destination.
It is schema-less in the sense that it has no fixed column schema; only column families are defined up front.
Suitable for both semi-structured and structured data.
Automatic failover support.
Provides data replication across clusters.
It is linearly scalable.
Provides fast lookups on large tables.
Provides low-latency random access to single rows among billions of records.
Keeps data sorted by row key and stores it in indexed HDFS files, which allows fast random-access lookups.
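Several of the features above (column families defined up front, free-form columns inside them, single-row lookups, and range reads over sorted row keys) can be sketched with a small in-memory model. This is a toy stand-in written for illustration, not the HBase client API; the class `ToyTable`, its methods, and the sample row keys are all hypothetical.

```python
import bisect


class ToyTable:
    """Toy in-memory stand-in for an HBase-style table (not the HBase API)."""

    def __init__(self, column_families):
        # Only column families are fixed; qualifiers inside them are free-form.
        self.column_families = set(column_families)
        self._sorted_keys = []  # row keys kept in sorted order
        self._rows = {}         # row key -> {(family, qualifier): value}

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][(family, qualifier)] = value

    def get(self, row_key):
        """Random access to a single row by its key."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        """Rows with start_key <= key < stop_key, in sorted key order."""
        lo = bisect.bisect_left(self._sorted_keys, start_key)
        hi = bisect.bisect_left(self._sorted_keys, stop_key)
        return [(k, dict(self._rows[k])) for k in self._sorted_keys[lo:hi]]


table = ToyTable(["info"])
table.put("user#002", "info", "name", "bob")
table.put("user#001", "info", "name", "ann")
table.put("user#001", "info", "email", "ann@example.com")

one_row = table.get("user#001")
key_range = [key for key, _ in table.scan("user#001", "user#003")]
```

Note that the two rows carry different columns inside the same `info` family, which is what "defines only column families" means in practice. A real cluster persists this data in HDFS rather than in a Python dictionary.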

Architecture of HBase Cluster

An HBase cluster contains the following components:

ZooKeeper – A centralized service that maintains configuration information for HBase.
Catalog Tables – Keep track of the locations of regions and region servers.
Master – Monitors all the region server instances in the cluster.
Region Servers – Responsible for serving and managing regions.
Region – A contiguous range of a table’s rows; each region holds the subset of the table’s rows that falls within its row-key partition.
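The way these components cooperate to route a request can be sketched roughly as follows. This is a simplified model under stated assumptions, not HBase's actual lookup code: the catalog is represented as a hypothetical sorted list of region start keys, and the server names `rs1` to `rs3` and the function `locate` are invented for the example.

```python
import bisect

# Hypothetical catalog: (region start key, region server), sorted by start key.
# Each region covers the keys from its start key up to the next start key.
CATALOG = [("", "rs1"), ("g", "rs2"), ("p", "rs3")]
START_KEYS = [start for start, _ in CATALOG]


def locate(row_key):
    """Return the region server whose region contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    return CATALOG[index][1]


server_for_apple = locate("apple")
server_for_grape = locate("grape")
server_for_zebra = locate("zebra")
```

Because each region is a contiguous key range, finding the right server is a search over sorted start keys, and splitting a region only adds one entry to the catalog instead of reshuffling every row.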

About the Author

Abhijit

Technical Research Analyst - Big Data Engineering

Abhijit is a Technical Research Analyst specialising in Big Data and Azure Data Engineering. He has 4+ years of experience in the Big data domain and provides consultancy services to several Fortune 500 companies. His expertise includes breaking down highly technical concepts into easy-to-understand content.