HBase Architecture PDF

HBase's architecture is based on a sorted nested map that stores data by row key and column in HFiles across multiple Region Servers. Each Region Server handles reads and writes for a range of row keys partitioned into regions, and uses a Write-Ahead Log and a Memstore to buffer writes before flushing data to HFiles. A Master manages region and server assignments with Zookeeper, which lets clients efficiently look up the correct Region Server for a given row key.

Uploaded by mihirhota

HBase's Architecture

is inspired by

Recap

HBase vs RDBMS

This is how data is stored in traditional databases:

id  type                   for user  from user  timestamp
1   Friend request status  Ryan      Jessica    146710201
2   Comment                Chaz      Daniel     146711200
3   Comment                Rick      Brendan    1467112205
4   Like                   Rick      Brendan    1467112213

Column oriented storage

Data is stored in a map:

Key = <Row id, Col id>
Value = <data>

For example, the cell in row 2, column for_user is stored as:

Key = <2, for_user>
Value = Chaz

The same data as a map from (row id, column) to value:

row id  column     value
1       type       Friend request status
1       for user   Ryan
1       from user  Jessica
1       timestamp  146710201
2       type       Comment
2       for user   Chaz
2       from user  Daniel
2       timestamp  146711200
3       type       Comment
3       for user   Rick
3       from user  Brendan
3       timestamp  1467112205
4       type       Like
4       for user   Rick
4       from user  Brendan
4       timestamp  1467112213

An HBase table is in fact a sorted map: the (row id, column) pairs are the keys and the cell contents are the values.
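As a rough illustration, the example table can be flattened into such a map in Python. The snake_case column names and the variable names are choices made for this sketch, not HBase identifiers, and a plain dictionary plus `sorted` only approximates a map that is kept sorted at all times.

```python
# Example rows from the slides; column names are written in snake_case here.
rows = [
    {"id": 1, "type": "Friend request status", "for_user": "Ryan",
     "from_user": "Jessica", "timestamp": 146710201},
    {"id": 2, "type": "Comment", "for_user": "Chaz",
     "from_user": "Daniel", "timestamp": 146711200},
    {"id": 3, "type": "Comment", "for_user": "Rick",
     "from_user": "Brendan", "timestamp": 1467112205},
    {"id": 4, "type": "Like", "for_user": "Rick",
     "from_user": "Brendan", "timestamp": 1467112213},
]

# Key = (row id, column), Value = the cell's data.
cells = {
    (row["id"], column): value
    for row in rows
    for column, value in row.items()
    if column != "id"
}

# Tuples sort by row id first, then by column name.
sorted_keys = sorted(cells)
```

Looking up `cells[(2, "for_user")]` returns the single cell for row 2, column for_user, without touching the rest of the row.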

A sorted nested map

<Row id, ColumnFamily, <Column, <Timestamp, Value>>>
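A minimal Python sketch of this nesting, using plain dictionaries. The column family name `activity`, the helper names `put`, `get_latest` and `scan`, and the second version written for row 2 are all assumptions made for illustration; real HBase stores bytes and keeps the keys sorted in storage rather than sorting on read.

```python
# table[row_id][column_family][column][timestamp] = value
# Toy in-memory model; "activity" is a hypothetical column family name.

def put(table, row_id, family, column, timestamp, value):
    cells = table.setdefault(row_id, {}).setdefault(family, {}).setdefault(column, {})
    cells[timestamp] = value

def get_latest(table, row_id, family, column):
    """Return the value with the highest timestamp, or None if the cell is absent."""
    versions = table.get(row_id, {}).get(family, {}).get(column, {})
    if not versions:
        return None
    return versions[max(versions)]

def scan(table):
    """Yield (row_id, family, column, latest value) in sorted key order."""
    for row_id in sorted(table):
        for family in sorted(table[row_id]):
            for column in sorted(table[row_id][family]):
                yield row_id, family, column, get_latest(table, row_id, family, column)

table = {}
put(table, 2, "activity", "for_user", 146711200, "Chaz")
put(table, 1, "activity", "for_user", 146710201, "Ryan")
# Hypothetical later version of the same cell, to show the timestamp level.
put(table, 2, "activity", "for_user", 146711300, "Charles")
```

The innermost level is what lets one cell hold several versions: a read picks the newest timestamp, while older values stay addressable.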

When you read data from HBase, it performs a lookup for the specified row id.

When you write data to HBase, it needs to insert the row id in the right place, so that the rows stay sorted.
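The read and write behaviour described here can be sketched with Python's `bisect` module. The class name `SortedRows` and its methods are hypothetical, and a sorted list in memory is only a stand-in for how HBase actually organises its data.

```python
import bisect

class SortedRows:
    """Toy sorted map: row ids are kept in order so reads can binary-search."""

    def __init__(self):
        self._keys = []   # row ids, always sorted
        self._rows = {}   # row id -> row data

    def write(self, row_id, data):
        if row_id not in self._rows:
            bisect.insort(self._keys, row_id)  # insert at the sorted position
        self._rows[row_id] = data

    def read(self, row_id):
        i = bisect.bisect_left(self._keys, row_id)
        if i < len(self._keys) and self._keys[i] == row_id:
            return self._rows[row_id]
        return None

    def row_ids(self):
        return list(self._keys)

rows = SortedRows()
# Rows arrive out of order but end up sorted by row id.
for row_id, row_type in [(3, "Comment"), (1, "Friend request status"),
                         (4, "Like"), (2, "Comment")]:
    rows.write(row_id, {"type": row_type})
```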

HBase does this using Region Servers.

Region Servers

Row ids in a table are divided into ranges called regions:

Region    row ids
Region 1  1 to 3
Region 2  4 to 8
Region 3  9 to 12

Each region is handled by a Region Server:

Region Server 1: Region 1, Region 3
Region Server 2: Region 2
Regions serve as an index to perform a fast lookup of where a row key belongs.
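One way to picture this index is a sorted list of region start keys searched with `bisect`. The sketch below uses the region boundaries and server assignment from the slides' example; the function names are hypothetical, and treating the last region as open-ended is an assumption of the sketch.

```python
import bisect

# First row id of each region, in sorted order (taken from the example).
REGION_STARTS = [1, 4, 9]
REGION_NAMES = ["Region 1", "Region 2", "Region 3"]
REGION_TO_SERVER = {
    "Region 1": "Region Server 1",
    "Region 2": "Region Server 2",
    "Region 3": "Region Server 1",
}

def find_region(row_id):
    """Binary-search the start keys for the region whose range covers row_id."""
    i = bisect.bisect_right(REGION_STARTS, row_id) - 1
    if i < 0:
        raise KeyError(f"row id {row_id} is below the first region")
    return REGION_NAMES[i]

def find_server(row_id):
    return REGION_TO_SERVER[find_region(row_id)]
```

For example, `find_server(5)` resolves to Region Server 2 through Region 2, in a number of steps that grows with the logarithm of the region count rather than with the number of rows.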

A region server handles all read and write operations for the regions that are allotted to it.

Initially, all writes are stored in memory, in the Memstore.

Whenever there is a new change, the data is updated in the Memstore and a change log is written to disk, in the WriteAheadLog.

The WriteAheadLog is created for recovery in case the Region Server crashes.

Periodically the Memstore gets full, and the data in the Memstore is flushed to disk as an HFile.

The data for a row key is either in the Memstore or in an HFile.
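The write and read path just described can be sketched as a toy class. Everything here is a simplification made for illustration: the class and method names are hypothetical, the flush triggers on an entry count rather than on memory size, lists and dictionaries stand in for files on disk, and recovery replays the whole log instead of only the unflushed part.

```python
class ToyRegionServer:
    def __init__(self, flush_threshold=3):
        self.wal = []        # stands in for the on-disk WriteAheadLog
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # flushed, sorted snapshots (newest last)
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # log the change for recovery
        self.memstore[row_key] = value     # then update the in-memory copy
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the Memstore out as a sorted, HFile-like snapshot and clear it."""
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, row_key):
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):  # newest snapshot first
            if row_key in hfile:
                return hfile[row_key]
        return None

    @classmethod
    def recover(cls, wal, flush_threshold=3):
        """Rebuild a server's state by replaying a log, as after a crash."""
        server = cls(flush_threshold)
        for row_key, value in wal:
            server.put(row_key, value)
        return server

server = ToyRegionServer(flush_threshold=3)
for key, value in [(2, "Chaz"), (1, "Ryan"), (3, "Rick")]:
    server.put(key, value)   # the third put fills the Memstore and triggers a flush
server.put(4, "Rick")        # lands in the fresh Memstore
```

After these calls, rows 1 to 3 are served from the flushed snapshot and row 4 from the Memstore, which is the "either in the Memstore or in an HFile" situation from the slide.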

HFiles are stored in HDFS.

HDFS will break up the HFile into blocks and store them on different nodes.

To minimize disk seeks, the region server keeps an in-memory index from row key to HFile block.

It therefore performs only 1 disk seek to find a row key.
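A small sketch of why an in-memory block index keeps a lookup to a single block read. The `ToyHFile` class, its block size of two entries and the `disk_seeks` counter are assumptions made for illustration; they model the idea, not the real HFile format.

```python
import bisect

class ToyHFile:
    """Sorted (row_key, value) pairs split into fixed-size blocks."""

    def __init__(self, items, block_size=2):
        items = sorted(items)
        self.blocks = [items[i:i + block_size]
                       for i in range(0, len(items), block_size)]
        # In-memory index: the first row key of each block.
        self.index = [block[0][0] for block in self.blocks]
        self.disk_seeks = 0

    def _read_block(self, block_no):
        self.disk_seeks += 1  # each block read stands in for one disk seek
        return self.blocks[block_no]

    def get(self, row_key):
        # The index is searched in memory, so this step costs no disk access.
        block_no = bisect.bisect_right(self.index, row_key) - 1
        if block_no < 0:
            return None  # smaller than every key in the file
        for key, value in self._read_block(block_no):
            if key == row_key:
                return value
        return None

hfile = ToyHFile([(1, "Ryan"), (2, "Chaz"), (3, "Rick"), (4, "Rick")])
value = hfile.get(3)
```

Without the index, a reader would have to fetch blocks one after another until it found the key; with it, only the one block that can contain the key is read.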

When you try to read or insert data:

1. The region server containing the row key is identified
2. The region server looks up the Memstore or the HFile and performs the requested operation

Clients interact directly with the Region Server handling the relevant row keys.

They need to know which region server is handling their row key.

HBase uses a Master server to manage Regions and Region Servers.

The Master assigns regions to region servers, manages load balancing, etc.
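To make the assignment idea concrete, here is a deliberately naive sketch that gives each region to whichever server currently holds the fewest. The function `assign_regions` is hypothetical and is not HBase's balancer, which weighs more than region counts; it only illustrates what "assigning regions while balancing load" can mean.

```python
def assign_regions(regions, servers):
    """Give each region to the server that currently holds the fewest regions."""
    assignment = {server: [] for server in servers}
    for region in regions:
        # On ties, min() keeps the first server in the list.
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment

assignment = assign_regions(
    ["Region 1", "Region 2", "Region 3"],
    ["Region Server 1", "Region Server 2"],
)
```

With three regions and two servers, this happens to reproduce the layout from the earlier example: Regions 1 and 3 on Region Server 1, Region 2 on Region Server 2.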

The Master uses Apache Zookeeper to help assign regions to region servers.

Zookeeper helps clients look up the relevant region server for a specific row id.
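The client side of this flow can be sketched end to end: ask a lookup service where a row id lives, then talk to that server directly. `ToyRegistry` and `ToyClient` are hypothetical names, the registry is only a stand-in for the Zookeeper-assisted lookup, plain dictionaries stand in for the region servers, and the value written for row 5 is made up for the example.

```python
import bisect

class ToyRegistry:
    """Stand-in for the lookup service: maps region start row ids to servers."""

    def __init__(self, region_starts, servers):
        self.region_starts = region_starts  # sorted first row id of each region
        self.servers = servers              # server name per region, same order

    def locate(self, row_id):
        i = bisect.bisect_right(self.region_starts, row_id) - 1
        if i < 0:
            raise KeyError(f"no region covers row id {row_id}")
        return self.servers[i]

class ToyClient:
    def __init__(self, registry, region_servers):
        self.registry = registry
        self.region_servers = region_servers  # server name -> dict acting as its store

    def put(self, row_id, value):
        server = self.registry.locate(row_id)      # step 1: find the server
        self.region_servers[server][row_id] = value  # step 2: write to it directly
        return server

    def get(self, row_id):
        server = self.registry.locate(row_id)
        return self.region_servers[server].get(row_id)

registry = ToyRegistry([1, 4, 9],
                       ["Region Server 1", "Region Server 2", "Region Server 1"])
stores = {"Region Server 1": {}, "Region Server 2": {}}
client = ToyClient(registry, stores)
client.put(2, "Chaz")
client.put(5, "Daniel")  # hypothetical row, used only to hit Region 2
```

Note that the data itself never passes through the registry or the Master in this sketch: they only answer the "where" question.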

[Diagram: HBase consists of Region Servers (each with a WAL, a Memstore and HFiles), the Master and Zookeeper, with storage on HDFS.]
