Working with the HBase Client API
5.1 CRUD Operations
The initial set of basic operations is often referred to as CRUD, which stands for create, read, update, and delete.
5.1.1 Put Method
The put operations can be split into two types: those that work on single rows and those that work on lists of rows. The single-row variant is:
void put(Put put) throws IOException
This call expects a single Put object; a variant that takes a list of Put objects is described later. Each Put is created with one of these constructors:
Put(byte[] row)
Put(byte[] row, RowLock rowLock)
Put(byte[] row, long ts)
Put(byte[] row, long ts, RowLock rowLock)
You need to supply a row to create a Put instance. A row in HBase is identified by a unique row key and—as is the case with most values in HBase—this is a Java byte[] array.
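Because a row key is just a byte[], two Java details are worth keeping in mind. First, arrays compare by identity, not by content. Second, HBase orders row keys lexicographically by unsigned byte value (HBase's Bytes class offers Bytes.equals() and Bytes.compareTo() for this). The following JDK-only sketch illustrates both points. RowKeys is a hypothetical helper, and its toBytes() assumes UTF-8 string keys, which is what Bytes.toBytes(String) produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, JDK only, illustrating how byte[] row keys behave.
class RowKeys {
    // Same encoding Bytes.toBytes(String) uses: UTF-8.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison with each byte treated as unsigned (Java 9+).
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] k1 = toBytes("row1");
        byte[] k2 = toBytes("row1");
        // byte[] inherits identity equality from Object, so compare contents explicitly.
        System.out.println(k1.equals(k2));          // false
        System.out.println(Arrays.equals(k1, k2));  // true
        // Byte-wise ordering: "row10" sorts before "row2".
        System.out.println(compare(toBytes("row10"), toBytes("row2")) < 0); // true
    }
}
```

The last line shows why numeric parts of row keys are commonly zero-padded (row02 rather than row2) when they should sort in numeric order.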
Once you have created the Put instance you can add data to it. This is done using these methods:
Put add(byte[] family, byte[] qualifier, byte[] value)
Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
Put add(KeyValue kv) throws IOException
Each call to add() specifies exactly one column or, in combination with an optional timestamp, one single cell. If you do not specify a timestamp in the add() call, the Put instance uses the optional timestamp parameter from the constructor (also called ts). If that is not set either, the region server assigns the timestamp when it stores the data, which is usually what you want.
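The timestamp fallback described above can be sketched as a small in-memory model. PutModel and storeAt() are hypothetical names used only for illustration. The one detail taken from HBase is that an unset timestamp is represented by the placeholder HConstants.LATEST_TIMESTAMP (Long.MAX_VALUE), which the region server replaces with its current time when it stores the cell.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT HBase's Put class) of how a cell timestamp is chosen:
// the ts passed to add(), else the constructor ts, else a placeholder
// that the "server" replaces with its own clock when it stores the cell.
class PutModel {
    // Mirrors HBase's HConstants.LATEST_TIMESTAMP, which is Long.MAX_VALUE.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    record Cell(String family, String qualifier, long ts, String value) {}

    private final String row;
    private final long defaultTs;
    private final List<Cell> cells = new ArrayList<>();

    PutModel(String row) { this(row, LATEST_TIMESTAMP); }

    PutModel(String row, long ts) {
        this.row = row;
        this.defaultTs = ts;
    }

    String row() { return row; }

    // add() without a ts falls back to the constructor's timestamp.
    PutModel add(String family, String qualifier, String value) {
        return add(family, qualifier, defaultTs, value);
    }

    PutModel add(String family, String qualifier, long ts, String value) {
        cells.add(new Cell(family, qualifier, ts, value));
        return this;
    }

    // Simulated server side: cells still carrying the placeholder receive
    // the server's current time; explicit timestamps are kept unchanged.
    List<Cell> storeAt(long serverNow) {
        List<Cell> stored = new ArrayList<>();
        for (Cell c : cells) {
            long ts = (c.ts() == LATEST_TIMESTAMP) ? serverNow : c.ts();
            stored.add(new Cell(c.family(), c.qualifier(), ts, c.value()));
        }
        return stored;
    }
}
```

For example, new PutModel("row1", 100L).add("colfam1", "qual1", "val1") stores the cell at timestamp 100, while new PutModel("row1").add(...) leaves the choice to whatever time storeAt() is given.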
From your code you may have to deal with KeyValue instances directly. These instances contain the data as well as the coordinates of one specific cell. The coordinates are the row key, name of the column family, column qualifier, and timestamp. The class provides a plethora of constructors that allow you to combine all of these in many variations. The fully specified constructor looks like this:
KeyValue(byte[] row, int roffset, int rlength,
byte[] family, int foffset, int flength, byte[] qualifier, int qoffset,
int qlength, long timestamp, Type type, byte[] value, int voffset, int vlength)
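The coordinates listed above also define how cells are ordered. The sketch below models a KeyValue as a Java record and sorts cells by row, family, and qualifier ascending and by timestamp descending, which is why the newest version of a column comes first. CellOrder is a hypothetical class. Plain Strings replace the byte[]/offset/length triples of the real constructor; those triples exist so that a KeyValue can point into a slice of a larger, shared byte array.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of a KeyValue's coordinates plus its value. Strings stand in
// for byte[] slices; for ASCII keys, String order matches byte order.
class CellOrder {
    record Cell(String row, String family, String qualifier, long timestamp, String value) {}

    // Row, family and qualifier ascending, then timestamp descending, so the
    // newest version of a column sorts first. HBase's own comparator also
    // breaks ties on the key type, which this sketch omits.
    static final Comparator<Cell> ORDER = Comparator
        .comparing(Cell::row)
        .thenComparing(Cell::family)
        .thenComparing(Cell::qualifier)
        .thenComparing(Comparator.comparingLong(Cell::timestamp).reversed());

    static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(ORDER);
        return copy;
    }
}
```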
The client API has the ability to insert single Put instances, but it also has the advanced feature of batching operations together. This comes in the form of the following call:
void put(List<Put> puts) throws IOException
There is a special variation of the put calls that warrants its own section: check and put. The method signature is:
boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
byte[] value, Put put) throws IOException
This call allows you to issue atomic, server-side mutations that are guarded by an accompanying check. If the check passes successfully, the put operation is executed; otherwise, it aborts the operation completely. It can be used to update data based on current, possibly related, values.
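What matters for checkAndPut() is that the check and the put run as one atomic step on the server. The following single-process model illustrates those semantics, using a synchronized method in place of the row lock. CheckAndPutModel is hypothetical. As in HBase, a null expected value checks that the column does not exist, and the put is applied to the same row that was checked (HBase requires the Put to target that row).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy, in-process model of check-and-put; NOT HBase code. A "table" is
// row -> (column -> value), with columns written as "family:qualifier".
class CheckAndPutModel {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // 'synchronized' plays the role of the server-side row lock: the check
    // and the write happen as one indivisible step. A null expected value
    // means "the checked column must not exist".
    synchronized boolean checkAndPut(String row, String column, String expected,
                                     String putColumn, String putValue) {
        String current = rows.getOrDefault(row, Map.of()).get(column);
        if (!Objects.equals(current, expected)) {
            return false; // check failed: the put is not applied at all
        }
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(putColumn, putValue);
        return true;
    }

    synchronized String get(String row, String column) {
        return rows.getOrDefault(row, Map.of()).get(column);
    }
}
```

A typical use is "create if absent": the first call with a null expected value succeeds, and a repeated call returns false because the column now exists.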
Example: Application inserting data into HBase
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class PutExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();     // 1
        HTable table = new HTable(conf, "testtable");          // 2
        try {
            Put put = new Put(Bytes.toBytes("row1"));          // 3
            put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
                Bytes.toBytes("val1"));                        // 4
            put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
                Bytes.toBytes("val2"));                        // 5
            table.put(put);                                    // 6
        } finally {
            // Release client resources; this also flushes any buffered puts.
            table.close();
        }
    }
}
1. Create the required configuration.
2. Instantiate a new client.
3. Create a Put with a specific row.
4. Add a column, whose name is "colfam1:qual1", to the Put.
5. Add another column, whose name is "colfam1:qual2", to the Put.
6. Store the row with its columns in the HBase table.
5.1.2 Get Method
The next step in a client API is to retrieve what was just saved. For that, HTable provides the get() call and matching classes. The operations are split into those that operate on a single row and those that retrieve multiple rows in one call.
First, the method that is used to retrieve specific values from an HBase table:
Result get(Get get) throws IOException
Similar to the Put class for the put() call, there is a matching Get class used by the aforementioned get() function. As another similarity, you will have to provide a row key when creating an instance of Get, using one of these constructors:
Get(byte[] row)
Get(byte[] row, RowLock rowLock)
When you retrieve data using the get() calls, you receive an instance of the Result class that contains all the matching cells. It provides you with the means to access everything that was returned from the server for the given row and matching the specified query, such as column family, column qualifier, timestamp, and so on.
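To picture what a Result holds for one row, the sketch below stores cells in nested sorted maps keyed by family, qualifier, and timestamp (newest first). This is the same nesting that Result.getMap() returns. ResultModel is a hypothetical class that uses Strings instead of byte[]. Its getValue() mirrors Result.getValue(), which returns the newest version of the requested column.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one row's data: family -> qualifier -> timestamp -> value,
// with timestamps sorted newest first. Not HBase's Result class.
class ResultModel {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> map =
        new TreeMap<>();

    void addCell(String family, String qualifier, long ts, String value) {
        map.computeIfAbsent(family, f -> new TreeMap<>())
           .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Collections.reverseOrder()))
           .put(ts, value);
    }

    // Newest version of the column, or null if the column is absent.
    String getValue(String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> fam = map.get(family);
        if (fam == null) {
            return null;
        }
        NavigableMap<Long, String> versions = fam.get(qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    boolean isEmpty() {
        return map.isEmpty();
    }
}
```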
Another similarity to the put() calls is that you can ask for more than one row using a single request. This allows you to quickly and efficiently retrieve related—but also completely random, if required—data from the remote servers.
The method provided by the API has the following signature:
Result[] get(List<Get> gets) throws IOException
Example: Application retrieving data from HBase
// Same imports as the put example, plus org.apache.hadoop.hbase.client.Get
// and org.apache.hadoop.hbase.client.Result.
Configuration conf = HBaseConfiguration.create();                 // 1
HTable table = new HTable(conf, "testtable");                      // 2
Get get = new Get(Bytes.toBytes("row1"));                          // 3
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));   // 4
Result result = table.get(get);                                    // 5
byte[] val = result.getValue(Bytes.toBytes("colfam1"),
    Bytes.toBytes("qual1"));                                       // 6
System.out.println("Value: " + Bytes.toString(val));               // 7
table.close();
1. Create the configuration.
2. Instantiate a new table reference.
3. Create a Get with a specific row.
4. Add a column to the Get.
5. Retrieve a row with selected columns from HBase.
6. Get a specific value for the given column.
7. Print the value, converting it back to a String.
5.1.3 Delete Method
The delete() method removes data from HBase tables.
The variant of the delete() call that takes a single Delete instance is:
void delete(Delete delete) throws IOException
Just as with the get() and put() calls you saw already, you will have to create a Delete instance and then add details about the data you want to remove. The constructors are:
Delete(byte[] row)
Delete(byte[] row, long timestamp, RowLock rowLock)
You need to provide the row you want to modify and, optionally, a timestamp and a RowLock instance. The RowLock lets you specify your own lock details if you want to modify the same row several times in succession.
The list-based delete() call works very similarly to the list-based put(). You need to create a list of Delete instances, configure them, and call the following method:
void delete(List<Delete> deletes) throws IOException
Atomic compare-and-delete
There is an equivalent call for deletes that gives you access to server-side, read-and-modify functionality:
boolean checkAndDelete(byte[] row, byte[] family, byte[] qualifier,
byte[] value, Delete delete) throws IOException
You need to specify the row key, column family, qualifier, and value to check before the actual delete operation is performed. If the check fails, nothing is deleted and the call returns false. If the check succeeds, the delete is applied and the call returns true.
Example: Application deleting data from HBase
// Assumes 'table' is an HTable opened as in the previous examples and that
// org.apache.hadoop.hbase.client.Delete is imported.
Delete delete = new Delete(Bytes.toBytes("row1"));                            // 1
delete.setTimestamp(1);                                                       // 2
delete.deleteColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), 1);     // 3
delete.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual1"));       // 4
delete.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), 15);   // 5
delete.deleteFamily(Bytes.toBytes("colfam3"));                                // 6
delete.deleteFamily(Bytes.toBytes("colfam3"), 3);                             // 7
table.delete(delete);                                                         // 8
table.close();
1. Create a Delete with a specific row.
2. Set a timestamp for row deletes.
3. Delete a specific version in one column.
4. Delete all versions in one column.
5. Delete the given and all older versions in one column.
6. Delete the entire family, all columns and versions.
7. Delete the given and all older versions in the entire column family, that is, from all columns therein.
8. Delete the data from the HBase table.
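The version-scoped deletes in this example can be illustrated on a single column's versions. The model below is hypothetical and deliberately simplified: it removes entries directly, whereas HBase writes delete markers (tombstones) that hide the matching cells until a major compaction physically removes them.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one column's versions (timestamp -> value) and the three
// column-level delete variants; NOT how HBase stores or deletes data.
class ColumnVersions {
    final NavigableMap<Long, String> versions = new TreeMap<>();

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Like deleteColumn(family, qualifier, ts): exactly one version.
    void deleteVersion(long ts) {
        versions.remove(ts);
    }

    // Like deleteColumns(family, qualifier, ts): the given and all older versions.
    void deleteUpTo(long ts) {
        versions.headMap(ts, true).clear(); // view of keys <= ts, cleared in place
    }

    // Like deleteColumns(family, qualifier): every version.
    void deleteAll() {
        versions.clear();
    }
}
```

The family-level variants behave the same way, only applied to every column in the family.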
5.1.4 Row Locks
Mutating operations, such as put(), delete(), and checkAndPut(), are executed exclusively, that is, serially for each row, to guarantee row-level atomicity. The region servers provide a row lock feature ensuring that only a client holding the matching lock can modify a row. In practice, though, most client applications do not provide an explicit lock, but instead rely on the built-in mechanism that guards each operation separately. Suppose, for example, that you send a put() call to the server with an instance of Put created with the following constructor:
Put(byte[] row)
Because this constructor does not take a RowLock parameter, the server creates a lock on your behalf, just for the duration of the call. In fact, the client API does not even let you retrieve this short-lived, server-side lock instance.
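This implicit, per-call locking can be modeled with one lock per row. RowLockModel is a hypothetical sketch, not region server code: each mutate() call takes the row's lock and releases it when the call returns. The update function runs entirely inside the lock, which is what makes it atomic. A client that instead issues a get() followed by a put() takes the lock twice, so another client can write in between; checkAndPut() exists to close that gap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

// Toy model of implicit row locks: updates to one row are serialized,
// while different rows use different locks and can proceed in parallel.
class RowLockModel {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<String, Long> counters = new ConcurrentHashMap<>();

    void mutate(String row, UnaryOperator<Long> update) {
        ReentrantLock lock = locks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock(); // taken on the caller's behalf...
        try {
            counters.put(row, update.apply(counters.getOrDefault(row, 0L)));
        } finally {
            lock.unlock(); // ...and released as soon as the call completes
        }
    }

    long get(String row) {
        return counters.getOrDefault(row, 0L);
    }

    public static void main(String[] args) throws InterruptedException {
        RowLockModel model = new RowLockModel();
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                model.mutate("row1", v -> v + 1);
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(model.get("row1")); // 20000: no update is lost
    }
}
```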