HBase Architecture
Prasanth Kothuri, CERN
Why HBase?
- Hadoop without HBase
  - Distributed, fault-tolerant, throughput-optimized data storage (HDFS)
  - Distributed, batch-oriented data processing frameworks like MapReduce
  - Distributed in-memory processing of the entire dataset (Spark), SQL executed over HDFS data as MapReduce jobs (Hive) or with an MPP model (Impala)
- What's missing
  - No low-latency random access to your big data
  - No way to update or delete existing rows
- HBase does all this, and much more
What is HBase?
- Distributed column-oriented key-value data store
- Designed to handle petabytes of data and billions of rows
- Fast random reads and writes
- Schemaless data model («NoSQL»)
- Not an RDBMS: no SQL, no joins and no secondary indexes
- Self-managed data partitions, aka auto sharding
- Random access to your planet-sized data
Building Blocks
- The most basic unit in HBase is a column
- Each column can have multiple versions, with each distinct value stored in a separate cell
- One or more columns form a row, which is uniquely addressed by a row key
- A table is a collection of rows
- All rows are stored in sorted order by row key
hbase(main):025:0> scan 'blogposts'
ROW COLUMN+CELL
post1 column=post:author, timestamp=1440601435122, value=Prasanth Kothuri
post1 column=post:body, timestamp=1440601483940, value=This is a test blog entry
post1 column=post:title, timestamp=1440601427153, value=Hello World
post2 column=post:author, timestamp=1440601550401, value=Prasanth Kothuri
post2 column=post:body, timestamp=1440601839670, value=HDFS is a distributed file system that is ...
post2 column=post:title, timestamp=1440601528247, value=Introduction to HDFS
2 row(s) in 0.0060 seconds
Building Blocks
- Column families
  - Columns are grouped into column families
  - Defined when the table is created
  - Should not be changed too often
  - The number of column families should be kept small (a handful at most)
- Referencing columns
  - The column name is called the qualifier
  - A column is referenced as family:qualifier (see the shell sketch below)
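A minimal shell sketch of families vs. qualifiers (hypothetical 'users' table): the families 'info' and 'contact' are fixed at creation time, while qualifiers such as 'name' or 'email' can be introduced freely with each put.

create 'users', 'info', 'contact'
put 'users', 'user1', 'info:name', 'Alice'
put 'users', 'user1', 'contact:email', 'alice@example.com'
get 'users', 'user1', 'info:name'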
Building Blocks
- A note on the NULL value
  - In an RDBMS, NULLs occupy space
  - In HBase, NULL columns are simply not stored
- Cells
  - Every column value, or cell, is timestamped
  - This can be used to store multiple versions of a value
  - Versions are stored in decreasing timestamp order (newest first)
  - Cell versions can be constrained by predicate deletions, e.g. keep only the values written during the last month
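A small shell sketch of versioning (hypothetical 'test' table; VERSIONS and TTL are per-family settings, TTL in seconds).

create 'test', {NAME => 'cf', VERSIONS => 3, TTL => 2592000}   # keep up to 3 versions, expire cells after 30 days
put 'test', 'row1', 'cf:col', 'v1'
put 'test', 'row1', 'cf:col', 'v2'
get 'test', 'row1', {COLUMN => 'cf:col', VERSIONS => 3}        # returns both versions, newest first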
Tables, Rows, Columns and Cells
- Access to data
  - (Table, RowKey, Family, Column, Timestamp) -> Value
  - SortedMap<RowKey, List<SortedMap<Column, List<Value, Timestamp>>>>
- Which means:
  - The first SortedMap is the table, mapping each row key to a list of column families
  - Each family contains another SortedMap, mapping columns to a list of (value, timestamp) tuples
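In shell terms, supplying all of these coordinates pins down exactly one cell; the example below uses the 'blogposts' table and the timestamp shown in the earlier scan output.

get 'blogposts', 'post1', {COLUMN => 'post:title', TIMESTAMP => 1440601427153}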
Rows and Columns in HBase
Time-oriented and spreadsheet view
[Figure: the same data shown in a time-oriented view and in a spreadsheet view]
Building Blocks – scan of blogposts
hbase(main):044:0* scan 'blogposts'
ROW COLUMN+CELL
guestpost1 column=image:bodyimage, timestamp=1440698372345, value=image3.jpg
guestpost1 column=image:header, timestamp=1440698372323, value=image3
guestpost1 column=post:author, timestamp=1440698372251, value=Barack Obama
guestpost1 column=post:body, timestamp=1440698372298, value=blah bla blah...
guestpost1 column=post:title, timestamp=1440698372276, value=How to play Golf
post1 column=image:bodyimage, timestamp=1440698351420, value=image1.jpg
post1 column=image:header, timestamp=1440698351395, value=image1
post1 column=post:author, timestamp=1440698351284, value=Prasanth Kothuri
post1 column=post:body, timestamp=1440698351372, value=This is a test blog post
post1 column=post:title, timestamp=1440698351346, value=Hello World
post2 column=image:bodyimage, timestamp=1440698372220, value=image2.jpg
post2 column=image:header, timestamp=1440698372182, value=image2
post2 column=post:author, timestamp=1440698372095, value=Prasanth Kothuri
post2 column=post:body, timestamp=1440698372148, value=HDFS is a distributed file system that is...
post2 column=post:title, timestamp=1440698372123, value=Introduction to HDFS
post4 column=post:author, timestamp=1440698372385, value=Prasanth Kothuri
post4 column=post:body, timestamp=1440698372422, value=Distributed sorted key value store
post4 column=post:title, timestamp=1440698372403, value=HBase Architecture
4 row(s) in 0.0770 seconds
How does it scale?
- Region
  - The basic unit of scalability and load balancing
  - Regions are contiguous ranges of rows stored together
  - Regions are split by the system when they become too large
  - Regions can also be merged to reduce the number of files
- How it works
  - Initially, there is a single region
  - The system monitors region size: when a threshold is reached, SPLIT
  - The region is split in two at the middle key
  - This creates two roughly equal-sized regions (see the shell example below)
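Splits can also be requested manually from the shell, and a table can be pre-split at creation time; the table name and split keys below are only illustrative.

create 'mytable', 'cf', SPLITS => ['g', 'n', 't']   # pre-split into 4 regions
split 'blogposts'                                   # ask HBase to split the table's region(s)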
Table Regions
Region Servers
- Region Servers
  - Each region is served by exactly one region server
  - A region server can serve multiple regions
    - from the same table as well as from other tables
- Failures
  - Regions allow for fast recovery upon failure
  - Fine-grained load balancing is also achieved using regions, as they can easily be moved between servers (see below)
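Region movement is normally handled by the master's balancer, but it can also be driven from the shell; the arguments to move are placeholders here.

balancer                                    # ask the master to run the load balancer now
move 'ENCODED_REGION_NAME', 'SERVER_NAME'   # manually move one region to a given region server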
Sharding in HBase
Anatomy of a Region Server
[Diagram: a RegionServer with its WAL and BlockCache, containing multiple HRegions; each HRegion holds one HStore per column family, and each HStore consists of a MemStore plus StoreFiles (HFiles) persisted on HDFS]
Legend:
- A RegionServer contains a single WAL, single BlockCache, and multiple Regions
- A Region contains multiple Stores, one for each Column Family
- A Store consists of multiple StoreFiles and a MemStore
- A StoreFile corresponds to a single HFile
- HFiles and the WAL are persisted on HDFS
HBase Cluster View
Legend:
- Each RegionServer is co-located with an HDFS DataNode
- Clients communicate directly with RegionServers to send and receive data
- Master manages region assignment and DDLs
- Online configuration state is maintained in ZooKeeper
HBase Architecture
- HBase Master
  - Assigns regions to region servers using ZooKeeper
  - Handles load balancing
  - Not part of the data path
  - Holds metadata and schema
- Region Server
  - Handles READs and WRITEs
  - Manages the WAL and HFiles
  - Handles region splitting
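A quick way to see this division of labour from the shell is the status command; the exact output format varies with the HBase version, but it reports the live and dead region servers and the average load known to the master.

status              # summary: number of live/dead region servers and average regions per server
status 'detailed'   # per-region-server breakdown, including the regions each one serves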
HBase Storage
- HFiles
  - A block-indexed file format for storing sorted key-value pairs
  - [Figure: HFile layout]
- Where are HBase files stored?
  - HFiles are divided into blocks and stored in HDFS
  - HBase has a root directory in HDFS, set to /hbase by default
  - There is a subdirectory under /hbase for each HBase table
HBase Storage
- HBase Files in HDFS
HFiles
[hdfs@itrac925 ~]$ hdfs dfs -ls -R /hbase
drwxr-xr-x - hbase hbase 0 2015-08-26 17:03 /hbase/data/default/blogposts
drwxr-xr-x - hbase hbase 0 2015-08-26 17:03 /hbase/data/default/blogposts/.tabledesc
-rw-r--r-- 3 hbase hbase 535 2015-08-26 17:03 /hbase/data/default/blogposts/.tabledesc/.tableinfo.0000000001
drwxr-xr-x - hbase hbase 0 2015-08-26 17:03 /hbase/data/default/blogposts/.tmp
drwxr-xr-x - hbase hbase 0 2015-08-26 18:03 /hbase/data/default/blogposts/26467f89a1aaa52a7d48493a9f88549e
-rw-r--r-- 3 hbase hbase 44 2015-08-26 17:03 /hbase/data/default/blogposts/26467f89a1aaa52a7d48493a9f88549e/.regioninfo
drwxr-xr-x - hbase hbase 0 2015-08-26 18:03 /hbase/data/default/blogposts/26467f89a1aaa52a7d48493a9f88549e/.tmp
drwxr-xr-x - hbase hbase 0 2015-08-26 17:03 /hbase/data/default/blogposts/26467f89a1aaa52a7d48493a9f88549e/image
drwxr-xr-x - hbase hbase 0 2015-08-26 18:03 /hbase/data/default/blogposts/26467f89a1aaa52a7d48493a9f88549e/post
-rw-r--r-- 3 hbase hbase 1313 2015-08-26 18:03 /hbase/data/default/blogposts/26467f89a1aaa52a7d48493a9f88549e/post/a7331f6dd5544d109ce9ce1b1a4696dd
drwxr-xr-x - hbase hbase 0 2015-08-26 17:03 /hbase/data/default/blogposts/26467f89a1aaa52a7d48493a9f88549e/recovered.edits
-rw-r--r-- 3 hbase hbase 0 2015-08-26 17:03 /hbase/data/default/blogposts/26467f89a1aaa52a7d48493a9f88549e/recovered.edits/2.seqid
WALs
-rw-r--r-- 3 hbase hbase 87291610 2015-08-27 08:32 /hbase/WALs/itrac901.cern.ch,60020,1440455544980/itrac901.cern.ch%2C60020%2C1440455544980.null0.1440653566417
-rw-r--r-- 3 hbase hbase 63860405 2015-08-27 08:32 /hbase/WALs/itrac902.cern.ch,60020,1440455544763/itrac902.cern.ch%2C60020%2C1440455544763.null0.1440653563334
-rw-r--r-- 3 hbase hbase 16737365 2015-08-27 08:32 /hbase/WALs/itrac903.cern.ch,60020,1440455544535/itrac903.cern.ch%2C60020%2C1440455544535.null0.1440653562699
-rw-r--r-- 3 hbase hbase 153680074 2015-08-27 08:32 /hbase/WALs/itrac904.cern.ch,60020,1440455545250/itrac904.cern.ch%2C60020%2C1440455545250.null0.1440653567897
HBase Physical Architecture
ZooKeeper
- ZooKeeper is a high-performance coordination service for distributed applications such as HBase
- ZooKeeper maintains region server membership and health in the HBase cluster
- ZooKeeper also holds the location of the META catalog table region (see the shell example below)
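From the shell, zk_dump prints the state HBase keeps in ZooKeeper, including the active master, the list of live region servers, and the region server currently holding the META table.

zk_dump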
HBase Read operations
- The client contacts ZooKeeper to find the location of the META table
- The client scans META to find the region server hosting the required region
- Store files that cannot contain the key are quickly excluded using Bloom filters and timestamps
- The MemStore and the remaining store files are then scanned to find the matching key
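The read path is exercised by the same shell operations used later in the hands-on: a point lookup by row key, or a range scan over the sorted row keys (the stop row is exclusive).

get 'blogposts', 'post1'                                      # point lookup by row key
scan 'blogposts', {STARTROW => 'post1', STOPROW => 'post3'}   # range scan over sorted row keys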
HBase Write operations
- First, the data is written to the WAL (Write-Ahead Log)
- Then the data is added to an in-memory structure called the MemStore
- When the MemStore reaches its size threshold, its contents are flushed to a new HFile
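A write therefore becomes durable via the WAL immediately, while the flush to an HFile happens later; the flush can also be forced from the shell (the row key below is only an example).

put 'blogposts', 'post5', 'post:title', 'Write path demo'   # goes to the WAL and the MemStore
flush 'blogposts'                                           # force a MemStore flush to a new HFile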
HBase Write operations – contd.
- Deletes are written as new tombstone markers
- Updates are written as separate KeyValue instances, possibly spread across multiple store files
- Minor compactions merge several HFiles into a smaller number of files
  - Relatively low-cost operation
- Major compactions merge the HFiles of a store into a single HFile, removing any deleted items
  - Costly operation
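Compactions normally run automatically in the background, but both kinds can be requested from the shell.

compact 'blogposts'         # request a minor compaction
major_compact 'blogposts'   # request a major compaction (rewrites each store into one HFile, dropping deleted cells)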
Client API
- Java API
  - HTable class in the org.apache.hadoop.hbase.client package
- Python
  - HappyBase, starbase
- REST interface
  - Stargate
- Thrift interface
- HBase shell
- No SQL
  - But there are gateways from Hive, Impala and other components
HBase use cases / workloads
- Good for
  - Large datasets
  - Sparse datasets
  - Denormalized data records
  - Workloads that need high-volume random reads (and writes)
- Avoid if you have
  - Small datasets
  - Relational data
  - Transactions
- The choice of row key is crucial to avoid region server 'hotspotting' (see the sketch below)
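One common mitigation, sketched here with a hypothetical 'events' table, is to avoid monotonically increasing row keys by prefixing them with a salt or hash and pre-splitting the table accordingly, so that writes spread across region servers.

create 'events', 'd', SPLITS => ['1', '2', '3']          # pre-split so salted keys land in different regions
put 'events', '2-20150827-sensor42', 'd:value', '17.3'   # salt prefix ('2-') derived from a hash of the natural key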
Hands On – 1
CRUD operations using HBase shell
Step 1) Start the HBase shell
hbase shell
Step 2) Create a table called blogposts with post and image column families
create 'blogposts', 'post', 'image'
Hands On – 1 contd
Step 3) Insert some data into the table
put 'blogposts', 'post1', 'post:author', 'Prasanth Kothuri'
put 'blogposts', 'post1', 'post:title', 'Hello World'
put 'blogposts', 'post1', 'post:body', 'This is a test blog post'
put 'blogposts', 'post1', 'image:header', 'image1'
put 'blogposts', 'post1', 'image:bodyimage', 'image1.jpg'
put 'blogposts', 'post2', 'post:author', 'Prasanth Kothuri'
put 'blogposts', 'post2', 'post:title', 'Introduction to HDFS'
put 'blogposts', 'post2', 'post:body', 'HDFS is a distributed file system that is...'
put 'blogposts', 'post2', 'image:header', 'image2'
put 'blogposts', 'post2', 'image:bodyimage', 'image2.jpg'
put 'blogposts', 'guestpost1', 'post:author', 'Barack Obama'
put 'blogposts', 'guestpost1', 'post:title', 'How to play Golf'
put 'blogposts', 'guestpost1', 'post:body', 'blah bla blah...'
put 'blogposts', 'guestpost1', 'image:header', 'image3'
put 'blogposts', 'guestpost1', 'image:bodyimage', 'image3.jpg'
put 'blogposts', 'post4', 'post:author', 'Prasanth Kothuri'
put 'blogposts', 'post4', 'post:title', 'HBase Architecture'
put 'blogposts', 'post4', 'post:body', 'Distributed sorted key value store'
Hands On – 1 contd
Step 4) Scan the full table
scan 'blogposts'
Step 5) Look up a specific key
get 'blogposts', 'post1'
Step 6) Update a row
put 'blogposts', 'guestpost1', 'post:title', 'How to save the world'
Step 7) Delete a row
deleteall 'blogposts', 'guestpost1'
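Optional check: count the remaining rows and re-read the deleted key (it should return no columns after the deleteall).

count 'blogposts'
get 'blogposts', 'guestpost1'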
Conclusion
- Logical data model
  - Tables, rows, column families, columns and cells
- Logical HBase architecture
  - Regions, region servers, HBase Master and ZooKeeper
- Physical HBase architecture
  - WAL, HFiles, HBase on HDFS
- Data semantics supported by HBase
  - GET, SCAN, PUT, CREATE and DELETE
- Use cases suited for HBase
Q&A
E-mail: [email protected]
Blog: http://prasanthkothuri.wordpress.com
See also: https://db-blog.web.cern.ch/