
Client API: The Basics

Working with HBase Client API

5.1 CRUD Operations
The initial set of basic operations is often referred to as CRUD, which stands for create, read, update, and delete.

5.1.1 Put Method
This group of operations can be split into separate types: those that work on single rows and those that work on lists of rows.

  • Single Puts

void put(Put put) throws IOException
This call expects a single Put object (the list-based variant is covered further down), which is created with one of these constructors:

Put(byte[] row)
Put(byte[] row, RowLock rowLock)
Put(byte[] row, long ts)
Put(byte[] row, long ts, RowLock rowLock)

You need to supply a row to create a Put instance. A row in HBase is identified by a unique row key and—as is the case with most values in HBase—this is a Java byte[] array.
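Because row keys are plain byte arrays, client code converts to and from bytes constantly. The following is a minimal plain-Java sketch of that round trip. RowKeyBytes is a hypothetical helper written for illustration, not an HBase class; it assumes UTF-8 encoding, which is what the Bytes.toBytes(String) and Bytes.toString(byte[]) helpers used in the examples on this page apply.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the String helpers of HBase's Bytes class.
final class RowKeyBytes {
    // Encode a String row key as UTF-8 bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = toBytes("row1");
        System.out.println(Arrays.toString(row)); // [114, 111, 119, 49]
        System.out.println(toString(row));        // row1
        // Arrays compare by reference with ==, so use Arrays.equals for content.
        System.out.println(Arrays.equals(row, toBytes("row1"))); // true
    }
}
```

Note that two byte[] instances with the same content are not equal under == or equals(); compare them with Arrays.equals.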
Once you have created the Put instance you can add data to it. This is done using these methods:

Put add(byte[] family, byte[] qualifier, byte[] value)
Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
Put add(KeyValue kv) throws IOException

Each call to add() specifies exactly one column, or, in combination with an optional timestamp, one single cell. Note that if you do not specify the timestamp with the add() call, the Put instance uses the optional timestamp parameter from the constructor (also called ts). If you set neither, the region server assigns the timestamp, which is usually what you want.
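The precedence of the timestamp sources can be sketched in a few lines of plain Java. This is a model of the rule described above, not HBase code: TimestampResolution and resolve() are hypothetical names, and the UNSET sentinel mirrors the role that HConstants.LATEST_TIMESTAMP (Long.MAX_VALUE) plays in HBase.

```java
// Hypothetical model of how a cell's timestamp is chosen.
final class TimestampResolution {
    // Sentinel meaning "no timestamp given" (assumption: same value as
    // HBase's HConstants.LATEST_TIMESTAMP).
    static final long UNSET = Long.MAX_VALUE;

    // addTs: timestamp passed to add(); putTs: timestamp passed to the
    // Put constructor; serverTime: the region server's clock at write time.
    static long resolve(long addTs, long putTs, long serverTime) {
        if (addTs != UNSET) {
            return addTs;      // explicit per-cell timestamp wins
        }
        if (putTs != UNSET) {
            return putTs;      // otherwise fall back to the Put's timestamp
        }
        return serverTime;     // otherwise the server assigns one
    }

    public static void main(String[] args) {
        System.out.println(resolve(5L, 9L, 100L));       // 5
        System.out.println(resolve(UNSET, 9L, 100L));    // 9
        System.out.println(resolve(UNSET, UNSET, 100L)); // 100
    }
}
```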

  •  The KeyValue class

From your code you may have to deal with KeyValue instances directly. These instances contain the data as well as the coordinates of one specific cell. The coordinates are the row key, name of the column family, column qualifier, and timestamp. The class provides a plethora of constructors that allow you to combine all of these in many variations. The fully specified constructor looks like this:

KeyValue(byte[] row, int roffset, int rlength,
byte[] family, int foffset, int flength, byte[] qualifier, int qoffset,
int qlength, long timestamp, Type type, byte[] value, int voffset, int vlength)

 

  • List of Puts

The client API can insert single Put instances, but it also has the more advanced ability to batch operations together. On the client side, the batched puts are sorted and grouped by region server before they are sent. Batching comes in the form of the following call:

void put(List<Put> puts) throws IOException
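To make the idea of sorting and grouping a batch by region server concrete, here is a small plain-Java model. It is only a sketch under stated assumptions: PutGrouping, the region start keys, and the server names are all hypothetical, row keys are modeled as Strings rather than byte arrays, and the real client resolves region locations through its own lookup rather than a local map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: route each row key to the server hosting its region.
final class PutGrouping {
    // regionStarts maps a region's start key to its server. It must contain
    // the empty string as the start key of the first region.
    static Map<String, List<String>> groupByServer(
            TreeMap<String, String> regionStarts, List<String> rowKeys) {
        List<String> sorted = new ArrayList<>(rowKeys);
        Collections.sort(sorted);
        Map<String, List<String>> batches = new TreeMap<>();
        for (String row : sorted) {
            // The owning region is the one with the greatest start key <= row.
            String server = regionStarts.floorEntry(row).getValue();
            batches.computeIfAbsent(server, s -> new ArrayList<>()).add(row);
        }
        return batches;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "server-a");   // rows before "m"
        regions.put("m", "server-b");  // rows from "m" onward
        List<String> rows = new ArrayList<>();
        Collections.addAll(rows, "row2", "apple", "row1", "kiwi");
        // Prints {server-a=[apple, kiwi], server-b=[row1, row2]}
        System.out.println(groupByServer(regions, rows));
    }
}
```

Each resulting list corresponds to one request per server, which is why a batched call needs fewer round trips than the same number of single puts.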

 

  • Atomic compare-and-set

There is a special variation of the put calls that warrants its own section: check and put. The method signature is:

boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
byte[] value, Put put) throws IOException

This call allows you to issue atomic, server-side mutations that are guarded by an accompanying check. If the check passes successfully, the put operation is executed; otherwise, it aborts the operation completely. It can be used to update data based on current, possibly related, values.
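The guarded-write behavior can be modeled in plain Java as a compare step and a write step executed under one lock. The class below is a hypothetical in-memory stand-in, not the HBase implementation; it assumes, as the HBase call does, that passing null as the expected value means "only write if the column does not exist yet".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of check-and-put semantics for one row.
final class CheckAndPutModel {
    private final Map<String, byte[]> cells = new HashMap<>();

    // The check and the write happen inside one synchronized method, so no
    // other writer can slip in between them.
    synchronized boolean checkAndPut(String column, byte[] expected, byte[] newValue) {
        byte[] current = cells.get(column);
        boolean matches = (expected == null)
                ? current == null                   // column must be absent
                : Arrays.equals(expected, current); // content must match
        if (!matches) {
            return false;                           // nothing is written
        }
        cells.put(column, newValue);
        return true;
    }

    synchronized byte[] get(String column) {
        return cells.get(column);
    }

    public static void main(String[] args) {
        CheckAndPutModel row = new CheckAndPutModel();
        byte[] v1 = {1};
        byte[] v2 = {2};
        System.out.println(row.checkAndPut("colfam1:qual1", null, v1)); // true
        System.out.println(row.checkAndPut("colfam1:qual1", null, v2)); // false
        System.out.println(row.checkAndPut("colfam1:qual1", v1, v2));   // true
    }
}
```

The second call fails because the column already exists, which is the pattern to use for "insert only if absent" logic.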

Example – Application inserting data into HBase

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
public class PutExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();          // 1.
    HTable table = new HTable(conf, "testtable");              // 2.
    Put put = new Put(Bytes.toBytes("row1"));                  // 3.
    put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
        Bytes.toBytes("val1"));                                // 4.
    put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
        Bytes.toBytes("val2"));                                // 5.
    table.put(put);                                            // 6.
    table.close();  // Release the client's resources.
  }
}
  1. Create the required configuration.
  2. Instantiate a new client.
  3. Create Put with specific row.
  4. Add a column, whose name is “colfam1:qual1”, to the Put.
  5. Add another column, whose name is “colfam1:qual2”, to the Put.
  6. Store the row with the column into the HBase table.


5.1.2 Get Method
The next step in a client API is to retrieve what was just saved. For that, HTable provides the get() call and matching classes. The operations are split into those that operate on a single row and those that retrieve multiple rows in one call.

  • Single Gets

First, the method that is used to retrieve specific values from an HBase table:

Result get(Get get) throws IOException

Similar to the Put class for the put() call, there is a matching Get class used by the aforementioned get() function. As another similarity, you will have to provide a row key when creating an instance of Get, using one of these constructors:

Get(byte[] row)
Get(byte[] row, RowLock rowLock)

 

  • The Result class

When you retrieve data using the get() calls, you receive an instance of the Result class that contains all the matching cells. It provides you with the means to access everything that was returned from the server for the given row and matching the specified query, such as column family, column qualifier, timestamp, and so on.

  • List of Gets

Another similarity to the put() calls is that you can ask for more than one row using a single request. This allows you to quickly and efficiently retrieve related—but also completely random, if required—data from the remote servers.
The method provided by the API has the following signature:

Result[] get(List<Get> gets) throws IOException

Example – Application retrieving data from HBase

Configuration conf = HBaseConfiguration.create();                   // 1.
HTable table = new HTable(conf, "testtable");                       // 2.
Get get = new Get(Bytes.toBytes("row1"));                           // 3.
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));    // 4.
Result result = table.get(get);                                     // 5.
byte[] val = result.getValue(Bytes.toBytes("colfam1"),
    Bytes.toBytes("qual1"));                                        // 6.
System.out.println("Value: " + Bytes.toString(val));                // 7.
  1. Create the configuration.
  2. Instantiate a new table reference.
  3. Create a Get with a specific row.
  4. Add a column to the Get.
  5. Retrieve a row with selected columns from HBase.
  6. Get a specific value for the given column.
  7. Print out the value while converting it back.


5.1.3 Delete Method
This method is used to delete data from HBase tables.

  • Single Deletes

The variant of the delete() call that takes a single Delete instance is:

void delete(Delete delete) throws IOException

Just as with the get() and put() calls you saw already, you will have to create a Delete instance and then add details about the data you want to remove. The constructors are:

Delete(byte[] row)
Delete(byte[] row, long timestamp, RowLock rowLock)

You need to provide the row you want to modify. Optionally, you can also provide a rowLock, an instance of RowLock that specifies your own lock details, in case you want to modify the same row more than once in succession.
 

  •  List of Deletes

 The list-based delete() call works very similarly to the list-based put(). You need to create a list of Delete instances, configure them, and call the following method:

void delete(List<Delete> deletes) throws IOException

 

  • Atomic compare-and-delete

There is an equivalent call for deletes that gives you access to server-side, read-and-modify functionality:

boolean checkAndDelete(byte[] row, byte[] family, byte[] qualifier,
byte[] value, Delete delete) throws IOException

You need to specify the row key, column family, qualifier, and value to check before the actual delete operation is performed. Should the test fail, nothing is deleted and the call returns a false. If the check is successful, the delete is applied and true is returned.

Example – Application deleting data from HBase

// Assumes `table` is an open HTable, created as in the earlier examples.
Delete delete = new Delete(Bytes.toBytes("row1"));                             // 1.
delete.setTimestamp(1);                                                        // 2.
delete.deleteColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), 1);      // 3.
delete.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual1"));        // 4.
delete.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), 15);    // 5.
delete.deleteFamily(Bytes.toBytes("colfam3"));                                 // 6.
delete.deleteFamily(Bytes.toBytes("colfam3"), 3);                              // 7.
table.delete(delete);                                                          // 8.
table.close();
  1. Create a Delete with a specific row.
  2. Set a timestamp for row deletes.
  3. Delete a specific version in one column.
  4. Delete all versions in one column.
  5. Delete the given and all older versions in one column.
  6. Delete the entire family, all columns and versions.
  7. Delete the given and all older versions in the entire column family, that is, from all columns therein.
  8. Delete the data from the HBase table.
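The difference between deleting one version, deleting a version and everything older, and deleting all versions is easy to mix up. The following plain-Java model of a single column's version history illustrates it. ColumnVersions and its method names are hypothetical, and the model removes entries immediately, whereas HBase records delete markers that take effect at read and compaction time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model: one column's versions, keyed by timestamp.
final class ColumnVersions {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Like deleteColumn(family, qualifier, ts): exactly one version.
    void deleteVersion(long ts) {
        versions.remove(ts);
    }

    // Like deleteColumns(family, qualifier, ts): the given and all older versions.
    void deleteUpTo(long ts) {
        versions.headMap(ts, true).clear();
    }

    // Like deleteColumns(family, qualifier): every version.
    void deleteAll() {
        versions.clear();
    }

    // Remaining timestamps in ascending order.
    List<Long> timestamps() {
        return new ArrayList<>(versions.keySet());
    }

    public static void main(String[] args) {
        ColumnVersions col = new ColumnVersions();
        for (long ts : new long[] {1, 2, 3, 15, 20}) {
            col.put(ts, "val-" + ts);
        }
        col.deleteVersion(2);
        System.out.println(col.timestamps()); // [1, 3, 15, 20]
        col.deleteUpTo(15);
        System.out.println(col.timestamps()); // [20]
    }
}
```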

5.1.4 Row Locks
Mutating operations—like put(), delete(), checkAndPut(), and so on—are executed exclusively, which means in a serial fashion, for each row, to guarantee row-level atomicity. The region servers provide a row lock feature ensuring that only a client holding the matching lock can modify a row. In practice, though, most client applications do not provide an explicit lock, but rather rely on the mechanism in place that guards each operation separately. When you send, for example, a put() call to the server with an instance of Put, created with the following constructor:

Put(byte[] row)

which does not take a RowLock instance parameter, the servers create a lock on your behalf, just for the duration of the call. In fact, the client API does not even let you retrieve this short-lived, server-side lock instance.
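The effect of such implicit per-row locking can be sketched with standard Java concurrency utilities. This is a conceptual model only: RowLockModel is a hypothetical class, and the region server's actual locking code is not shown in this article. Each mutation takes the lock for its row, performs a read-modify-write, and releases the lock when the call ends.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: mutations on the same row run one at a time.
final class RowLockModel {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<String, Long> counters = new ConcurrentHashMap<>();

    // Lock the row only for the duration of this single call.
    void mutate(String row) {
        ReentrantLock lock = locks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();
        try {
            long current = counters.getOrDefault(row, 0L);
            counters.put(row, current + 1); // read-modify-write, now safe
        } finally {
            lock.unlock();
        }
    }

    long value(String row) {
        return counters.getOrDefault(row, 0L);
    }

    // Run several writers against one row and return the final counter.
    static long runDemo(int threads, int perThread) {
        RowLockModel model = new RowLockModel();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    model.mutate("row1");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return model.value("row1");
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // 4000
    }
}
```

Without the lock, concurrent read-modify-write cycles could overwrite each other and the final count would fall short. Writers to different rows use different locks and do not block one another.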


About the Author

Data Engineer

As a skilled Data Engineer, Sahil excels in SQL, NoSQL databases, Business Intelligence, and database management. He has contributed immensely to projects at companies like Bajaj and Tata. With a strong expertise in data engineering, he has architected numerous solutions for data pipelines, analytics, and software integration, driving insights and innovation.