
HBASE


LECTURE NOTES

ON
BIG DATA
VI Semester (KCS-061)

Mr. Manish Gupta, Assistant Professor


UNIT V

HBase

What is HBase?
HBase is a distributed, column-oriented database built on top of the Hadoop Distributed
File System (HDFS). It is an open-source project and is horizontally scalable.
Its data model is similar to Google's Bigtable and is designed to provide quick random
access to huge amounts of structured data. It leverages the fault tolerance provided by
HDFS.
It is part of the Hadoop ecosystem and provides random, real-time read/write access to
data stored in HDFS.
Data can be stored in HDFS either directly or through HBase. Data consumers read and
access the data in HDFS randomly through HBase, which sits on top of HDFS and provides
read and write access.

HBase and HDFS


HDFS | HBase
-----|------
HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of HDFS.
HDFS does not support fast individual record lookups. | HBase provides fast lookups for large tables.
It provides high-latency batch processing. | It provides low-latency access to single rows from billions of records (random access).
It provides only sequential access to data. | HBase keeps rows sorted by row key and stores the data in indexed HDFS files (HFiles), which allows random access and faster lookups.

Storage Mechanism in HBase


HBase is a column-oriented database, and the tables in it are sorted by row key. The table
schema defines only column families, which hold key-value pairs. A table can have multiple
column families, and each column family can have any number of columns. Subsequent
column values are stored contiguously on disk. Each cell value of the table has a
timestamp. In short, in HBase:

• A table is a collection of rows.
• A row is a collection of column families.
• A column family is a collection of columns.
• A column is a collection of key-value pairs.
Given below is an example schema of a table in HBase.

Rowid | Column Family    | Column Family    | Column Family    | Column Family
      | col1  col2  col3 | col1  col2  col3 | col1  col2  col3 | col1  col2
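
The nesting described above (table, row, column family, column, timestamped value) can be pictured as sorted maps inside sorted maps. The following is only a minimal sketch of that logical model in plain Java; the class name DataModelSketch and its methods are hypothetical and are not part of the HBase API.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the logical HBase data model; not the HBase API.
final class DataModelSketch {

    // row key -> column family -> column qualifier -> timestamp -> value
    private final NavigableMap<String,
            NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    // Stores one version of a cell under the given timestamp.
    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families =
                rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(qualifier);
        if (versions == null || versions.isEmpty()) return null;
        return versions.lastEntry().getValue();
    }

    // Rows are kept sorted by row key, so the first key is the smallest one.
    String firstRowKey() {
        return rows.firstKey();
    }

    public static void main(String[] args) {
        DataModelSketch table = new DataModelSketch();
        table.put("row2", "personal", "name", 1L, "ravi");
        table.put("row1", "personal", "name", 1L, "raju");
        table.put("row1", "personal", "name", 2L, "rajesh");
        System.out.println(table.get("row1", "personal", "name")); // newest version of the cell
        System.out.println(table.firstRowKey());                   // smallest row key
    }
}
```

Sorted maps are used here because the notes state that tables are sorted by row; the timestamp level shows why one cell can hold several versions.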

Column Oriented and Row Oriented


Column-oriented databases store data tables as sections of columns of data,
rather than as rows of data. In short, they have column families.

Row-Oriented Database | Column-Oriented Database
----------------------|-------------------------
It is suitable for Online Transaction Processing (OLTP). | It is suitable for Online Analytical Processing (OLAP).
Such databases are designed for a small number of rows and columns. | Column-oriented databases are designed for huge tables.
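
The difference between the two layouts can be sketched in a few lines of plain Java. This is an illustration only: the Employee record and the sample values are made up for the example, and neither method reflects how any particular database stores data on disk.

```java
// Hypothetical sketch contrasting the two storage layouts; not HBase code.
final class LayoutSketch {

    // Row-oriented: every record keeps all of its fields together.
    static final class Employee {
        final String name;
        final String city;
        final int salary;

        Employee(String name, String city, int salary) {
            this.name = name;
            this.city = city;
            this.salary = salary;
        }
    }

    // Summing one field still walks over whole records.
    static long totalSalaryRowStore(Employee[] rows) {
        long total = 0;
        for (Employee row : rows) {
            total += row.salary;
        }
        return total;
    }

    // Column-oriented: the values of one column sit next to each other,
    // so an aggregate touches only that column.
    static long totalSalaryColumnStore(int[] salaryColumn) {
        long total = 0;
        for (int salary : salaryColumn) {
            total += salary;
        }
        return total;
    }

    public static void main(String[] args) {
        Employee[] rows = {
            new Employee("raju", "hyderabad", 50000),
            new Employee("ravi", "chennai", 30000)
        };
        int[] salaryColumn = {50000, 30000};
        System.out.println(totalSalaryRowStore(rows));
        System.out.println(totalSalaryColumnStore(salaryColumn));
    }
}
```

Both methods return the same total; the column layout simply reads less unrelated data, which is why it suits analytical workloads over huge tables.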

[Figure: column families in a column-oriented database]

HBase and RDBMS


HBase | RDBMS
------|------
HBase is schema-less: it has no concept of a fixed column schema and defines only column families. | An RDBMS is governed by its schema, which describes the whole structure of tables.
It is built for wide tables. HBase is horizontally scalable. | It is thin and built for small tables. It is hard to scale.
There are no transactions in HBase. | An RDBMS is transactional.
It has de-normalized data. | It has normalized data.
It is good for semi-structured as well as structured data. | It is good for structured data.
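
To illustrate what "defines only column families" means in practice, here is a minimal sketch in plain Java. The class FamilySchemaSketch is hypothetical and is not part of HBase: it fixes the set of column families when the table is created, accepts any column name inside a known family, and rejects writes to an unknown family.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: only column families are fixed up front; not HBase code.
final class FamilySchemaSketch {

    private final Set<String> families;
    private final Map<String, String> cells = new TreeMap<>();

    FamilySchemaSketch(String... families) {
        this.families = new HashSet<>(Arrays.asList(families));
    }

    // Any qualifier is accepted, but the family must have been declared.
    void put(String row, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        cells.put(cellKey(row, family, qualifier), value);
    }

    // Returns the stored value, or null if the cell was never written.
    String get(String row, String family, String qualifier) {
        return cells.get(cellKey(row, family, qualifier));
    }

    private static String cellKey(String row, String family, String qualifier) {
        return row + "/" + family + ":" + qualifier;
    }

    public static void main(String[] args) {
        FamilySchemaSketch emp = new FamilySchemaSketch("personal", "professional");
        emp.put("1", "personal", "name", "raju");
        emp.put("1", "personal", "nickname", "raj"); // a new column needs no schema change
        System.out.println(emp.get("1", "personal", "nickname"));
    }
}
```

An RDBMS would require altering the table before a new column could be written; in this model only adding a new family would be a schema change.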

Features of HBase
• HBase is linearly scalable.
• It has automatic failover support.
• It provides consistent reads and writes.
• It integrates with Hadoop, both as a source and a destination.
• It has an easy-to-use Java API for clients.
• It provides data replication across clusters.

Where to Use HBase

• Apache HBase is used when random, real-time read/write access to Big Data is needed.
• It hosts very large tables on top of clusters of commodity hardware.
• Apache HBase is a non-relational database modeled after Google's Bigtable. Just as
Bigtable runs on top of the Google File System, Apache HBase works on top of Hadoop and
HDFS.

Applications of HBase
• It is used whenever there is a need for write-heavy applications.
• HBase is used whenever we need to provide fast random access to available data.
• Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.

HBase History
Year      | Event
----------|------
Nov 2006  | Google released the paper on Bigtable.
Feb 2007  | Initial HBase prototype was created as a Hadoop contribution.
Oct 2007  | The first usable HBase, along with Hadoop 0.15.0, was released.
Jan 2008  | HBase became a sub-project of Hadoop.
Oct 2008  | HBase 0.18.1 was released.
Jan 2009  | HBase 0.19.0 was released.
Sept 2009 | HBase 0.20.0 was released.
May 2010  | HBase became an Apache top-level project.

In HBase, tables are split into regions, which are served by the region servers. Regions are
vertically divided by column families into "Stores". Stores are saved as files in HDFS.
[Figure: HBase architecture]
Note: The term 'store' is used for regions to explain the storage structure.

HBase has three major components: the client library, a master server, and region servers.
Region servers can be added or removed as per requirement.

Master Server
The master server:
• Assigns regions to the region servers, with the help of Apache ZooKeeper.
• Handles load balancing of the regions across region servers. It unloads busy
servers and shifts regions to less occupied servers.
• Maintains the state of the cluster by negotiating the load balancing.
• Is responsible for schema changes and other metadata operations, such as the creation
of tables and column families.
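
The load-balancing duty described above can be sketched as moving regions from the busiest server to the least occupied one until the counts are even. This is a simplified illustration in plain Java, assuming region count is the only measure of load; the class BalancerSketch and the server and region names are hypothetical, and a real balancer may weigh other factors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of count-based region balancing; not the HBase balancer.
final class BalancerSketch {

    // Moves one region at a time from the busiest server to the least occupied
    // one until the region counts differ by at most one.
    static void balance(Map<String, List<String>> assignment) {
        if (assignment.isEmpty()) return;
        while (true) {
            String busiest = null;
            String idlest = null;
            for (String server : assignment.keySet()) {
                int size = assignment.get(server).size();
                if (busiest == null || size > assignment.get(busiest).size()) busiest = server;
                if (idlest == null || size < assignment.get(idlest).size()) idlest = server;
            }
            List<String> from = assignment.get(busiest);
            List<String> to = assignment.get(idlest);
            if (from.size() - to.size() <= 1) return;
            to.add(from.remove(from.size() - 1));
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignment = new TreeMap<>();
        assignment.put("rs1", new ArrayList<>(Arrays.asList("r1", "r2", "r3", "r4", "r5")));
        assignment.put("rs2", new ArrayList<>(Arrays.asList("r6")));
        assignment.put("rs3", new ArrayList<>());
        balance(assignment);
        System.out.println(assignment);
    }
}
```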

Regions
Regions are portions of a table, split up by row-key range and spread across the region servers.
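
Because each region covers a contiguous range of row keys, finding the server for a row amounts to finding the region with the greatest start key that is not larger than the row key. The sketch below shows that lookup in plain Java with a sorted map; the class RegionLookupSketch, the start keys, and the server names are hypothetical and do not represent how an HBase client is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of routing a row key to the region that holds it.
final class RegionLookupSketch {

    // region start key -> name of the region server hosting that region
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // The region holding a row is the one with the greatest start key that is
    // less than or equal to the row key.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("No region covers row key " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "rs1");  // first region starts at the empty key
        table.addRegion("m", "rs2"); // second region starts at "m"
        System.out.println(table.serverFor("apple"));
        System.out.println(table.serverFor("zebra"));
    }
}
```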

Region Server
The region servers host regions and:

• Communicate with the client and handle data-related operations.
• Handle read and write requests for all the regions under them.
• Decide the size of each region by following the region size thresholds.
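
The last point, keeping regions within a size threshold, can be sketched as splitting a region in two once it grows past a limit. In this plain-Java illustration the number of rows stands in for the region size, and the class RegionSplitSketch and the parameter maxRows are hypothetical names rather than HBase settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a threshold-driven region split; not HBase code.
final class RegionSplitSketch {

    // Returns the region unchanged if it is within the limit, otherwise two
    // daughter regions split at the middle row key.
    static List<TreeMap<String, String>> splitIfTooLarge(TreeMap<String, String> region,
                                                         int maxRows) {
        List<TreeMap<String, String>> result = new ArrayList<>();
        if (region.size() <= maxRows || region.size() < 2) {
            result.add(region);
            return result;
        }
        // Walk to the middle key; the rows are already sorted by key.
        int middle = region.size() / 2;
        String splitKey = null;
        int index = 0;
        for (String key : region.keySet()) {
            if (index++ == middle) {
                splitKey = key;
                break;
            }
        }
        result.add(new TreeMap<String, String>(region.headMap(splitKey, false)));
        result.add(new TreeMap<String, String>(region.tailMap(splitKey, true)));
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        region.put("a", "1");
        region.put("b", "2");
        region.put("c", "3");
        region.put("d", "4");
        System.out.println(splitIfTooLarge(region, 3));
    }
}
```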
On a closer look, a region server contains regions, and each region contains stores.
[Figure: regions and stores inside a region server]

The store contains a memstore and HFiles. The memstore works like a cache:
anything written to HBase is stored here first. Later, the data is transferred
and saved in HFiles as blocks, and the memstore is flushed.
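
The write path just described (write to the memstore first, flush to a file later) can be sketched in plain Java as follows. This is a simplified model under stated assumptions: the class StoreSketch is hypothetical, in-memory sorted maps stand in for HFiles, and an entry count stands in for the size threshold that triggers a flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a store with a memstore and flushed files; not HBase code.
final class StoreSketch {

    private final int flushThreshold;
    private TreeMap<String, String> memstore = new TreeMap<>();
    // Flushed snapshots, oldest first; each one stands in for an HFile.
    private final List<TreeMap<String, String>> hfiles = new ArrayList<>();

    StoreSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Every write lands in the memstore first.
    void put(String rowKey, String value) {
        memstore.put(rowKey, value);
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Saves the memstore contents as a new file and starts an empty memstore.
    void flush() {
        if (memstore.isEmpty()) return;
        hfiles.add(memstore);
        memstore = new TreeMap<>();
    }

    // Reads check the memstore first, then the files from newest to oldest.
    String get(String rowKey) {
        String value = memstore.get(rowKey);
        if (value != null) return value;
        for (int i = hfiles.size() - 1; i >= 0; i--) {
            value = hfiles.get(i).get(rowKey);
            if (value != null) return value;
        }
        return null;
    }

    int hfileCount() {
        return hfiles.size();
    }

    int memstoreSize() {
        return memstore.size();
    }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch(2);
        store.put("row1", "v1");
        store.put("row2", "v2"); // reaches the threshold and triggers a flush
        store.put("row1", "v3"); // newer value stays in the memstore
        System.out.println(store.get("row1"));
        System.out.println(store.hfileCount());
    }
}
```

Checking the memstore before the flushed files is what lets a read see the most recent write even when older values for the same row already sit in a file.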

ZooKeeper
• ZooKeeper is an open-source project that provides services such as maintaining
configuration information, naming, and distributed synchronization.
• ZooKeeper has ephemeral nodes representing the different region servers. Master
servers use these nodes to discover available servers.
• In addition to availability, the nodes are also used to track server failures or network
partitions.
• Clients use ZooKeeper to locate the region servers they communicate with.
• In pseudo-distributed and standalone modes, HBase itself manages ZooKeeper.
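
The role of ephemeral nodes can be sketched without ZooKeeper itself: a node exists only while the session that created it is alive, so the set of remaining nodes is the set of live servers. The plain-Java class below, EphemeralRegistrySketch, is a hypothetical in-memory stand-in, and the node paths and session identifiers are illustrative assumptions; it does not use the ZooKeeper client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for ephemeral nodes; not the ZooKeeper API.
final class EphemeralRegistrySketch {

    // node path -> id of the session that created the node
    private final Map<String, String> ownerByPath = new TreeMap<>();

    // A region server registers itself by creating a node tied to its session.
    void createEphemeral(String path, String sessionId) {
        ownerByPath.put(path, sessionId);
    }

    // When a session ends (server crash or network partition), all of the
    // nodes it created disappear.
    void expireSession(String sessionId) {
        ownerByPath.values().removeIf(sessionId::equals);
    }

    // The master discovers available servers by listing the remaining nodes.
    List<String> liveServers() {
        return new ArrayList<>(ownerByPath.keySet());
    }

    public static void main(String[] args) {
        EphemeralRegistrySketch registry = new EphemeralRegistrySketch();
        registry.createEphemeral("/servers/rs1", "session-1");
        registry.createEphemeral("/servers/rs2", "session-2");
        registry.expireSession("session-1"); // rs1 fails
        System.out.println(registry.liveServers());
    }
}
```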

General Commands in HBase
The general commands in HBase are status, version, table_help, and whoami. This
section explains these commands.

status
This command returns the status of the system, including details of the servers
running on the system. Its syntax is as follows:
hbase(main):009:0> status
If you execute this command, it returns the following output.
hbase(main):009:0> status
3 servers, 0 dead, 1.3333 average load

version
This command returns the version of HBase used in your system. Its syntax is as
follows:
hbase(main):010:0> version
If you execute this command, it returns the following output.
hbase(main):009:0> version
0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri Nov 14 18:26:29 PST 2014

table_help
This command explains what the table-reference commands are and how to use them. Given
below is the syntax of this command.
hbase(main):002:0> table_help
When you use this command, it shows help topics for table-related commands. Given
below is the partial output of this command.
hbase(main):002:0> table_help
Help for table-reference commands.
You can either create a table via 'create' and then manipulate the table
via commands like 'put', 'get', etc.
See the standard help information for how to use each of these commands.
However, as of 0.96, you can also get a reference to a table, on which
you can invoke commands.
For instance, you can get create a table and keep around a reference to
it via:
hbase> t = create 't', 'cf'…...

whoami
This command returns the user details of HBase. If you execute this command, it
returns the current HBase user as shown below.
hbase(main):008:0> whoami
hadoop (auth:SIMPLE)
groups: hadoop

Creating a Table using HBase Shell


You can create a table using the create command. Here you must specify the table name
and the column family name. The syntax to create a table in HBase shell is shown below.
create '<table name>','<column family>'

Example
Given below is a sample schema of a table named emp. It has two column families:
“personal data” and “professional data”.

Row key | personal data | professional data

You can create this table in HBase shell as shown below.
hbase(main):002:0> create 'emp', 'personal data', 'professional data'
And it will give you the following output.
0 row(s) in 1.1300 seconds
=> Hbase::Table - emp

Verification
You can verify whether the table is created using the list command as shown below. Here
you can observe the created emp table.
hbase(main):002:0> list
TABLE
emp
2 row(s) in 0.0340 seconds

Creating a Table Using Java API


You can create a table in HBase using the createTable() method of the HBaseAdmin class.
This class belongs to the org.apache.hadoop.hbase.client package. Given below are the
steps to create a table in HBase using the Java API.

Step 1: Instantiate HBaseAdmin


This class requires a Configuration object as a parameter, so first instantiate the
Configuration class and pass the instance to HBaseAdmin.
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);

Step 2: Create TableDescriptor


HTableDescriptor is a class that belongs to the org.apache.hadoop.hbase package. This
class is like a container of table names and column families.
// creating table descriptor
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("table_name"));

// creating column family descriptor
HColumnDescriptor family = new HColumnDescriptor("column_family");

// adding column family to the table descriptor
table.addFamily(family);

Step 3: Execute through Admin


Using the createTable() method of the HBaseAdmin class, you can create the table
described by the table descriptor.
admin.createTable(table);
Given below is the complete program to create a table via admin.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTable {

    public static void main(String[] args) throws IOException {

        // Instantiating configuration class
        Configuration con = HBaseConfiguration.create();

        // Instantiating HBaseAdmin class
        HBaseAdmin admin = new HBaseAdmin(con);

        // Instantiating table descriptor class
        HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("emp"));

        // Adding column families to table descriptor
        tableDescriptor.addFamily(new HColumnDescriptor("personal"));
        tableDescriptor.addFamily(new HColumnDescriptor("professional"));

        // Creating the table through admin
        admin.createTable(tableDescriptor);
        System.out.println(" Table created ");
    }
}

Compile and execute the above program as shown below (the HBase client libraries must be
on the classpath).


$ javac CreateTable.java
$ java CreateTable
The following should be the output:
Table created

Listing a Table using HBase Shell


list is the command used to list all the tables in HBase. Given below is the syntax of
the list command.
hbase(main):001:0> list
When you type this command and execute it at the HBase prompt, it displays the list of all the
tables in HBase, as shown below.
hbase(main):001:0> list
TABLE
emp
Here you can observe a table named emp.
