LECTURE NOTES

ON
BIG DATA
VI Semester (KCS-061)

Mr. Manish Gupta, Assistant Professor

UNIT V

HBase

What is HBase?
HBase is a distributed, column-oriented database built on top of the Hadoop Distributed File
System (HDFS). It is an open-source project and is horizontally scalable.
HBase has a data model similar to Google's Bigtable, designed to provide quick random
access to huge amounts of structured data. It leverages the fault tolerance provided by HDFS.
It is a part of the Hadoop ecosystem that provides random, real-time read/write access to
data stored in HDFS.
Data can be stored in HDFS either directly or through HBase. Data consumers read and
access the data in HDFS randomly using HBase, which sits on top of HDFS and provides
read and write access.

HBase and HDFS

HDFS | HBase
HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of HDFS.
HDFS does not support fast individual record lookups. | HBase provides fast lookups for large tables.
It provides high-latency batch processing. | It provides low-latency access to single rows from billions of records (random access).
It provides only sequential access to data. | HBase provides random access and stores the data in sorted, indexed HDFS files for faster lookups.

Storage Mechanism in HBase

HBase is a column-oriented database, and the rows of its tables are sorted by row key. The
table schema defines only column families; the data inside them is stored as key-value pairs.
A table can have multiple column families, and each column family can have any number of
columns. Column values that belong to the same column family are stored together on disk.
Each cell value of the table has a timestamp. In short, in HBase:

• A table is a collection of rows.
• A row is a collection of column families.
• A column family is a collection of columns.
• A column is a collection of key-value pairs.
Given below is an example schema of a table in HBase.

Rowid | Column Family    | Column Family    | Column Family    | Column Family
      | col1  col2  col3 | col1  col2  col3 | col1  col2  col3 | col1  col2  col3
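
The storage model described above can be pictured as a set of nested sorted maps. The sketch below is only an illustration in plain Java and does not use the HBase API; the class name TableModelSketch, its methods, and the sample rows are assumptions made up for this example.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory picture of one HBase table (hypothetical class, not the HBase API). */
final class TableModelSketch {
    // row key -> column family -> column -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    /** Stores one cell version; rows, families and columns are created on demand. */
    void put(String row, String family, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest value of a cell, or null if the cell does not exist. */
    String get(String row, String family, String column) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    /** Rows are kept sorted by row key, so the first key is the smallest one. */
    String firstRowKey() {
        return rows.isEmpty() ? null : rows.firstKey();
    }

    public static void main(String[] args) {
        TableModelSketch emp = new TableModelSketch();
        emp.put("row2", "personal", "name", 1L, "Asha");   // sample data, made up
        emp.put("row1", "personal", "city", 1L, "Delhi");
        emp.put("row1", "personal", "city", 2L, "Pune");   // newer version of the same cell
        System.out.println(emp.get("row1", "personal", "city")); // prints Pune
        System.out.println(emp.firstRowKey());                   // prints row1
    }
}
```

Each level of nesting matches one line of the summary: a table holds rows, a row holds column families, a family holds columns, and a column holds timestamped values.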

Column Oriented and Row Oriented

Column-oriented databases store data tables as sections of columns of data rather than as
rows of data. In short, they have column families.

Row-Oriented Database | Column-Oriented Database
It is suitable for Online Transaction Processing (OLTP). | It is suitable for Online Analytical Processing (OLAP).
Such databases are designed for a small number of rows and columns. | Column-oriented databases are designed for huge tables.

The following image shows column families in a column-oriented database:
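
The difference between the two layouts can be sketched in a few lines of plain Java. This is only an illustration under simple assumptions: the class name LayoutSketch and the three sample records are invented for the example, and real databases store these layouts on disk rather than in arrays.

```java
import java.util.Arrays;

/** Same three records stored row-wise and column-wise (hypothetical example data). */
final class LayoutSketch {
    // Row-oriented: each record keeps all of its fields together.
    static final Object[][] ROWS = {
        {1, "Asha", 52000},
        {2, "Ravi", 48000},
        {3, "Meena", 61000},
    };

    // Column-oriented: each column is stored as its own contiguous sequence.
    static final int[] IDS = {1, 2, 3};
    static final String[] NAMES = {"Asha", "Ravi", "Meena"};
    static final int[] SALARIES = {52000, 48000, 61000};

    /** Row layout: every whole record is visited just to read one field. */
    static int totalSalaryRowWise() {
        int total = 0;
        for (Object[] row : ROWS) {
            total += (Integer) row[2];
        }
        return total;
    }

    /** Column layout: only the salary column is read. */
    static int totalSalaryColumnWise() {
        return Arrays.stream(SALARIES).sum();
    }

    public static void main(String[] args) {
        System.out.println(totalSalaryRowWise());    // prints 161000
        System.out.println(totalSalaryColumnWise()); // prints 161000
    }
}
```

Both methods give the same answer, but the column-wise version touches only the one column the query needs, which is why column-oriented storage suits analytical queries over huge tables.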

HBase and RDBMS

HBase | RDBMS
HBase is schema-less; it has no concept of a fixed-column schema and defines only column families. | An RDBMS is governed by its schema, which describes the whole structure of its tables.
It is built for wide tables and is horizontally scalable. | It is thin and built for small tables; it is hard to scale.
There are no transactions in HBase. | An RDBMS is transactional.
It has denormalized data. | It has normalized data.
It is good for semi-structured as well as structured data. | It is good for structured data.

Features of HBase
• HBase is linearly scalable.
• It has automatic failover support.
• It provides consistent reads and writes.
• It integrates with Hadoop, both as a source and a destination.
• It has an easy-to-use Java API for clients.
• It provides data replication across clusters.

Where to Use HBase

• Apache HBase is used when random, real-time read/write access to Big Data is needed.
• It hosts very large tables on top of clusters of commodity hardware.
• Apache HBase is a non-relational database modeled after Google's Bigtable. Just as
Bigtable runs on top of the Google File System, Apache HBase works on top of Hadoop and
HDFS.

Applications of HBase
• It is used whenever there is a need for write-heavy applications.
• HBase is used whenever we need to provide fast random access to available data.
• Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.

HBase History
Year | Event
Nov 2006 | Google released the paper on Bigtable.
Feb 2007 | The initial HBase prototype was created as a Hadoop contribution.
Oct 2007 | The first usable HBase was released along with Hadoop 0.15.0.
Jan 2008 | HBase became a subproject of Hadoop.
Oct 2008 | HBase 0.18.1 was released.
Jan 2009 | HBase 0.19.0 was released.
Sept 2009 | HBase 0.20.0 was released.
May 2010 | HBase became an Apache top-level project.

HBase Architecture
In HBase, tables are split into regions, which are served by the region servers. Regions are
vertically divided by column family into "Stores". Stores are saved as files in HDFS. Shown
below is the architecture of HBase.
Note: The term 'store' is used for regions to explain the storage structure.

HBase has three major components: the client library, a master server, and region servers.
Region servers can be added or removed as per requirement.

Master Server
The master server:
• Assigns regions to the region servers, with the help of Apache ZooKeeper.
• Handles load balancing of the regions across region servers. It unloads busy servers
and shifts regions to less occupied servers.
• Maintains the state of the cluster by negotiating the load balancing.
• Is responsible for schema changes and other metadata operations, such as the creation
of tables and column families.
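
The load-balancing duty listed above (shifting regions from busy servers to less occupied ones) can be sketched in plain Java. This is a deliberately naive strategy written for illustration, not the balancer that HBase actually ships; the class name BalancerSketch and the server and region names are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

/** Naive region balancer (hypothetical; not HBase's real balancing algorithm). */
final class BalancerSketch {

    /** Moves one region at a time from the busiest to the least busy server. */
    static void balance(Map<String, Deque<String>> assignment) {
        if (assignment.size() < 2) return;
        while (true) {
            String busiest = null;
            String idlest = null;
            for (Map.Entry<String, Deque<String>> e : assignment.entrySet()) {
                int n = e.getValue().size();
                if (busiest == null || n > assignment.get(busiest).size()) busiest = e.getKey();
                if (idlest == null || n < assignment.get(idlest).size()) idlest = e.getKey();
            }
            // Stop once no server holds more than one region above any other.
            if (assignment.get(busiest).size() - assignment.get(idlest).size() <= 1) return;
            assignment.get(idlest).addLast(assignment.get(busiest).removeLast());
        }
    }

    /** Number of regions per server, sorted by server name. */
    static Map<String, Integer> regionCounts(Map<String, Deque<String>> assignment) {
        Map<String, Integer> counts = new TreeMap<>();
        assignment.forEach((server, regions) -> counts.put(server, regions.size()));
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> cluster = new TreeMap<>();
        cluster.put("rs1", new ArrayDeque<>(Arrays.asList("r1", "r2", "r3", "r4", "r5")));
        cluster.put("rs2", new ArrayDeque<>(Arrays.asList("r6")));
        cluster.put("rs3", new ArrayDeque<>());
        balance(cluster);
        System.out.println(regionCounts(cluster)); // prints {rs1=2, rs2=2, rs3=2}
    }
}
```

The sketch counts only the number of regions per server; a real balancer would also have to consider factors such as request load and data locality.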

Regions
Regions are the pieces into which a table is split. Each region holds a contiguous range of
row keys, and the regions are spread across the region servers.
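
Because a table's rows are sorted by row key, finding the region that holds a given row amounts to finding the last region whose start key is not greater than the row key. The plain-Java sketch below shows this idea; the class RegionLookupSketch, the start keys, and the server names are assumptions made up for the example and are not part of the HBase API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Finds the region server for a row key (hypothetical class, not the HBase API). */
final class RegionLookupSketch {
    // region start key (inclusive) -> name of the region server hosting that region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    /** Returns the server whose region covers rowKey. */
    String serverFor(String rowKey) {
        // floorEntry gives the greatest start key that is <= rowKey.
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "rs1");   // first region starts at the empty key
        table.addRegion("g", "rs2");
        table.addRegion("p", "rs3");
        System.out.println(table.serverFor("emp042")); // prints rs1
        System.out.println(table.serverFor("kumar"));  // prints rs2
        System.out.println(table.serverFor("zed"));    // prints rs3
    }
}
```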

Region Server
The region servers host regions and:

• Communicate with the client and handle data-related operations.
• Handle read and write requests for all the regions under them.
• Decide the size of a region by following the region size thresholds.
When we take a deeper look into a region server, it contains regions and stores, as shown
below:

The store contains a memstore and HFiles. The memstore works like a cache memory:
anything that is written to HBase is stored there first. Later, the data is transferred
and saved in HFiles as blocks, and the memstore is flushed.
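
The write path described above can be sketched in plain Java. The sketch makes several simplifying assumptions: the class StoreSketch is invented for this example, the flush threshold counts entries instead of bytes, the "HFiles" are in-memory maps rather than files on HDFS, and details such as the write-ahead log are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy store with one memstore and a list of flushed files (hypothetical class). */
final class StoreSketch {
    private final int flushThreshold; // entries held in memory before a flush (assumed unit)
    private NavigableMap<String, String> memstore = new TreeMap<>();
    private final List<NavigableMap<String, String>> hfiles = new ArrayList<>(); // oldest first

    StoreSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Every write lands in the memstore first. */
    void put(String key, String value) {
        memstore.put(key, value);
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the sorted memstore contents as a new read-only file and starts a fresh memstore. */
    private void flush() {
        hfiles.add(Collections.unmodifiableNavigableMap(memstore));
        memstore = new TreeMap<>();
    }

    /** Reads check the memstore first, then the files from newest to oldest. */
    String get(String key) {
        String value = memstore.get(key);
        if (value != null) return value;
        for (int i = hfiles.size() - 1; i >= 0; i--) {
            value = hfiles.get(i).get(key);
            if (value != null) return value;
        }
        return null;
    }

    int flushedFileCount() {
        return hfiles.size();
    }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch(2);
        store.put("a", "1");
        store.put("b", "2");  // threshold reached: memstore is flushed to a file
        store.put("a", "3");  // newer value stays in the memstore
        System.out.println(store.get("a"));            // prints 3
        System.out.println(store.get("b"));            // prints 2
        System.out.println(store.flushedFileCount());  // prints 1
    }
}
```

Reading the memstore before the flushed files is what lets the newest value win even though older values of the same key still sit in earlier files.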

ZooKeeper
• ZooKeeper is an open-source project that provides services such as maintaining
configuration information, naming, and distributed synchronization.
• ZooKeeper has ephemeral nodes representing the different region servers. Master
servers use these nodes to discover available servers.
• In addition to availability, the nodes are also used to track server failures and network
partitions.
• Clients use ZooKeeper to locate the region servers they communicate with.
• In pseudo-distributed and standalone modes, HBase itself takes care of ZooKeeper.

General Commands in HBase
The general commands in HBase are status, version, table_help, and whoami. This
section explains these commands.

status
This command returns the status of the system, including the details of the servers
running on the system. Its syntax is as follows:
hbase(main):009:0> status
If you execute this command, it returns the following output.
hbase(main):009:0> status
3 servers, 0 dead, 1.3333 average load

version
This command returns the version of HBase used in your system. Its syntax is as
follows:
hbase(main):010:0> version
If you execute this command, it returns the following output.
hbase(main):009:0> version
0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri Nov 14 18:26:29 PST 2014

table_help
This command explains what the table-referenced commands are and how to use them. Given
below is the syntax to use this command.
hbase(main):02:0> table_help
When you use this command, it shows help topics for table-related commands. Given
below is the partial output of this command.
hbase(main):002:0> table_help
Help for table-reference commands.
You can either create a table via 'create' and then manipulate the table
via commands like 'put', 'get', etc.
See the standard help information for how to use each of these commands.
However, as of 0.96, you can also get a reference to a table, on which
you can invoke commands.
For instance, you can get create a table and keep around a reference to
it via:
hbase> t = create 't', 'cf'…...

whoami
This command returns the user details of HBase. If you execute this command, it
returns the current HBase user as shown below.
hbase(main):008:0> whoami
hadoop (auth:SIMPLE)
groups: hadoop
Creating a Table using HBase Shell

You can create a table using the create command. You must specify the table name
and the column family name. The syntax to create a table in the HBase shell is shown below.
create '<table name>','<column family>'

Example
Given below is a sample schema of a table named emp. It has two column families:
"personal data" and "professional data".

Row key | personal data | professional data

You can create this table in HBase shell as shown below.
hbase(main):002:0> create 'emp', 'personal data', 'professional data'
And it will give you the following output.
0 row(s) in 1.1300 seconds
=> Hbase::Table - emp

Verification
You can verify whether the table is created using the list command as shown below. Here
you can observe the created emp table.
hbase(main):002:0> list
TABLE
emp
2 row(s) in 0.0340 seconds

Creating a Table Using Java API

You can create a table in HBase using the createTable() method of the HBaseAdmin class.
This class belongs to the org.apache.hadoop.hbase.client package. Given below are the
steps to create a table in HBase using the Java API.

Step 1: Instantiate HBaseAdmin

This class requires a Configuration object as a parameter, so first instantiate the
Configuration class and pass this instance to HBaseAdmin. (This constructor matches the
HBase 0.98 API used in these notes; from HBase 1.0 onward it is deprecated in favor of
Connection.getAdmin().)
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);

Step 2: Create TableDescriptor

HTableDescriptor is a class that belongs to the org.apache.hadoop.hbase package. This
class is like a container of table names and column families.
// creating the table descriptor
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("Table name"));

// creating the column family descriptor
HColumnDescriptor family = new HColumnDescriptor("column family");

// adding the column family to the table descriptor
table.addFamily(family);

Step 3: Execute through Admin

Using the createTable() method of the HBaseAdmin class, you can create the table
described by the table descriptor.
admin.createTable(table);
Given below is the complete program to create a table via admin.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTable {

    public static void main(String[] args) throws IOException {

        // Instantiating the configuration class
        Configuration con = HBaseConfiguration.create();

        // Instantiating the HBaseAdmin class
        HBaseAdmin admin = new HBaseAdmin(con);

        // Instantiating the table descriptor class
        HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("emp"));

        // Adding column families to the table descriptor
        tableDescriptor.addFamily(new HColumnDescriptor("personal"));
        tableDescriptor.addFamily(new HColumnDescriptor("professional"));

        // Creating the table through admin
        admin.createTable(tableDescriptor);
        System.out.println(" Table created ");

        // Releasing the resources held by admin
        admin.close();
    }
}

Compile and execute the above program as shown below.

$javac CreateTable.java
$java CreateTable
The following should be the output:
Table created

Listing a Table using HBase Shell

list is the command used to list all the tables in HBase. Given below is the syntax of
the list command.
hbase(main):001:0> list
When you type this command and execute it at the HBase prompt, it displays the list of all
the tables in HBase, as shown below.
hbase(main):001:0> list
TABLE
emp
Here you can observe a table named emp.