Storm-HBase integrates Storm with HBase. This library lets you use HBase as a spout within a Storm topology.
HBaseSpout continuously reads stream data from an HBase cluster within the time range [start_timestamp, stop_timestamp]:
- If start_timestamp is set to 0, HBaseSpout reads data starting from 3 minutes ago by default; otherwise it reads data starting from the specified start_timestamp.
- If stop_timestamp is set to 0, HBaseSpout reads data up to the current time and, by default, keeps reading as time goes on; otherwise it reads data until the specified stop_timestamp.
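
The defaulting rules above can be sketched as follows. This is only an illustration, not the code in HBaseSpout.java: the class `ScanWindow` and its method names are hypothetical, and it assumes both timestamps are expressed in UNIX seconds, which this README states only for the rowkey timestamp.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper illustrating the README's defaulting rules; not part of Storm-HBase. */
final class ScanWindow {
    // Look-back applied when start_timestamp is 0: 3 minutes, as described above.
    static final long DEFAULT_LOOKBACK_SECONDS = TimeUnit.MINUTES.toSeconds(3);

    /** Effective start of the read range, in UNIX seconds (assumed unit). */
    static long resolveStart(long startTimestamp, long nowSeconds) {
        return startTimestamp == 0 ? nowSeconds - DEFAULT_LOOKBACK_SECONDS : startTimestamp;
    }

    /** True when stop_timestamp is 0, i.e. the spout keeps reading as time goes on. */
    static boolean isUnbounded(long stopTimestamp) {
        return stopTimestamp == 0;
    }

    /** Upper bound for the current read pass: "now" when unbounded, else the fixed stop. */
    static long resolveStop(long stopTimestamp, long nowSeconds) {
        return isUnbounded(stopTimestamp) ? nowSeconds : stopTimestamp;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        System.out.println("start=" + resolveStart(0, now)
                + " stop=" + resolveStop(0, now)
                + " unbounded=" + isUnbounded(0));
    }
}
```

In the unbounded case, a caller would re-evaluate `resolveStop` with a fresh "now" on each pass, so the upper bound moves forward with the clock.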
All configuration options can be found in the src/main/resources/storm.properties and src/main/resources/hbase.properties files. You can customize all or some of them if necessary.
HBaseSpout is based on the following assumptions:
- The rowkey of the HBase table consists of [shardingkey, timestamp, ...].
- The shardingkey takes up the 1st byte of the rowkey. It identifies the data partition of the HBase table and is usually a short number.
- The timestamp takes up the 2nd to 5th bytes of the rowkey and is a UNIX timestamp in seconds.
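
To make the assumed rowkey layout concrete, here is a minimal sketch that builds and parses the 5-byte prefix. The class `RowKeyLayout` and its methods are hypothetical and are not taken from HBaseSpout.java. The sketch also assumes the timestamp is stored big-endian, which the list above does not state.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of the [shardingkey, timestamp, ...] rowkey prefix. */
final class RowKeyLayout {
    static final int SHARDING_KEY_OFFSET = 0; // 1st byte of the rowkey
    static final int TIMESTAMP_OFFSET = 1;    // 2nd to 5th bytes of the rowkey
    static final int PREFIX_LENGTH = 5;

    /** Builds the 5-byte prefix: 1 byte of sharding key, then a 4-byte UNIX timestamp in seconds. */
    static byte[] prefix(byte shardingKey, int timestampSeconds) {
        return ByteBuffer.allocate(PREFIX_LENGTH) // ByteBuffer writes big-endian by default (assumed layout)
                .put(shardingKey)
                .putInt(timestampSeconds)
                .array();
    }

    /** Reads the sharding key from the 1st byte. */
    static byte shardingKey(byte[] rowKey) {
        return rowKey[SHARDING_KEY_OFFSET];
    }

    /** Reads the UNIX timestamp (seconds) from the 2nd to 5th bytes. */
    static int timestampSeconds(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getInt(TIMESTAMP_OFFSET);
    }

    public static void main(String[] args) {
        byte[] key = prefix((byte) 3, 1_700_000_000);
        System.out.println("shard=" + shardingKey(key) + " ts=" + timestampSeconds(key));
    }
}
```

With a big-endian encoding, non-negative timestamps sort in byte order the same way they sort numerically, so within one sharding key a time range corresponds to a contiguous range of rowkeys.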
The code implementation of spout: HBaseSpout.java.
The test case of spout: HBaseSpoutTest.java.
For examples of how to use HBaseSpout, refer to DumpToHBaseTopology.java or OutputTopology.java.
- Yuan Panfeng (@ypf412)