HBase: The Hadoop Database
HBase is an open source, horizontally scalable, distributed, column-oriented database built on top of the Hadoop file system (HDFS). It is a non-relational (NoSQL) database system and a faithful, open source implementation of Google’s Bigtable.
Column-oriented databases store data tables as sections of columns of data rather than as rows of data. HBase is a distributed, persistent, strictly consistent storage system with near-optimal write performance in terms of I/O channel saturation and excellent read performance. It makes efficient use of disk space by supporting pluggable compression algorithms that can be chosen based on the nature of the data in particular column families.
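The difference between the two layouts can be sketched with a toy example. This is only an illustration in plain Python, not how HBase stores data on disk; the table, its column names, and the helper functions are all made up for the example.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The records and column names below are hypothetical.
rows = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Cy", "city": "Oslo"},
]

def row_layout(records):
    """Keep the values of each record next to each other (row store)."""
    return [tuple(r.values()) for r in records]

def column_layout(records):
    """Keep the values of each column next to each other (column store)."""
    columns = {}
    for r in records:
        for key, value in r.items():
            columns.setdefault(key, []).append(value)
    return columns

by_row = row_layout(rows)
by_column = column_layout(rows)

# Reading one column touches a single list in the column layout,
# while the row layout requires visiting every record.
print(by_column["city"])                 # ['Oslo', 'Rome', 'Oslo']
print([record[2] for record in by_row])  # same values, gathered row by row
```

Because similar values end up next to each other in the column layout, a compression algorithm suited to that kind of data can be applied to each group separately.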
HBase handles shifting load and failures gracefully and transparently to the clients. Scalability is built in, and clusters can be grown or shrunk while the system is in production. Changing the cluster does not involve any difficult rebalancing or resharding procedure; the process is fully automated.
HBase = HDFS + DB Engine
Why do we need HBase?
An RDBMS has a number of limitations:
- It is not good for unstructured data.
- It works well only for a limited number of records.
- It does not store denormalized data.
- It is a schema-oriented database.
HBase is used to overcome these problems.
Features of HBase
The features of HBase are:
- It has an easy-to-use Java API for clients.
- It integrates with Hadoop, both as a source and a destination.
- It is schema-less: it has no concept of a fixed column schema, and a table defines only its column families.
- It is good for semi-structured as well as structured data.
- It supports automatic failover.
- It provides data replication across clusters.
- It is linearly scalable.
- It provides fast lookups for large tables.
- It provides low-latency access to single rows from billions of records (random access).
- Internally, it uses hash tables to provide random access, and it stores the data in indexed HDFS files for faster lookups.
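Two of the features above, the schema that fixes only column families and the random access to single rows, can be illustrated with a small in-memory model. This is a minimal sketch and not the HBase client API: the class `ToyTable`, its methods, and the sample family and column names are hypothetical, and a Python dict (a hash table) stands in for the real storage.

```python
class ToyTable:
    """Toy in-memory model of a table whose only schema is its column families.

    Hypothetical illustration; this is not the HBase client API.
    """

    def __init__(self, families):
        self.families = set(families)  # declared up front, like column families
        self.rows = {}                 # row key -> {family -> {column -> value}}

    def put(self, row_key, family, column, value):
        # Columns are free-form, but the family must have been declared.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get(self, row_key):
        """Random access to a single row by its key."""
        return self.rows.get(row_key, {})


table = ToyTable(families=["info", "stats"])
table.put("user1", "info", "name", "Ann")
table.put("user1", "info", "email", "ann@example.com")
table.put("user2", "info", "name", "Bob")
table.put("user2", "stats", "logins", 7)  # a column that user1 does not have

print(table.get("user2"))  # {'info': {'name': 'Bob'}, 'stats': {'logins': 7}}
```

Each row can carry a different set of columns, yet a write to an undeclared family such as `table.put("user1", "other", "x", 1)` is rejected with a `KeyError`.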
Architecture of HBase Cluster
An HBase cluster contains the following components:
- ZooKeeper – A centralized service used to maintain configuration information for HBase.
- Catalog Tables – Keep track of the locations of region servers and regions.
- Master – Monitors all the region server instances in the cluster.
- Region Servers – Responsible for serving and managing regions.
- Region – A set of rows belonging to a table. It holds a contiguous subset of the table’s rows, like a partition.
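How the components fit together on a lookup can be sketched as follows: a region covers a range of row keys, and a catalog records which region server serves each region. The sketch below assumes a tiny hard-coded catalog; the start keys, the server names, and the function `locate_region` are hypothetical and only illustrate the idea of finding the region for a row key.

```python
import bisect

# Hypothetical catalog: each region begins at a start key and is served by
# one region server. An empty start key means "from the start of the table".
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]

def locate_region(row_key):
    """Return (region index, server name) for the region that holds row_key."""
    # The owning region is the last one whose start key is <= row_key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return index, REGION_SERVERS[index]

print(locate_region("alice"))  # (0, 'rs1')
print(locate_region("mike"))   # (1, 'rs2')
print(locate_region("zoe"))    # (3, 'rs3')
```

Keeping the start keys sorted allows a binary search, so locating the region stays cheap even when a table is split into many regions.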