
HBase (Unit 4)

HBase is a distributed column-oriented database built on top of HDFS that provides Bigtable-like capabilities for the Hadoop ecosystem, with data stored in tables containing rows, columns, and versions. It uses a master-slave architecture with a single master and multiple region servers that host regions, and allows for fast random reads and writes through its data model of keys, column families, and columns.

Uploaded by

The piano guy


HBase: Overview

• HBase is a distributed, column-oriented data store built on top of HDFS
• HBase is an Apache open-source project whose goal is to provide storage for distributed computing on Hadoop
• Data is logically organized into tables, rows, and columns
HBase: Part of Hadoop’s Ecosystem

• HBase is built on top of HDFS
• HBase files are internally stored in HDFS
HBase vs. HDFS

• Both are distributed systems that scale to hundreds or thousands of nodes
• HDFS is good for batch processing (scans over big files)
  • Not good for record lookup
  • Not good for incremental addition of small batches
  • Not good for updates
HBase vs. HDFS (Cont’d)

• HBase is designed to efficiently address the above points
  • Fast record lookup
  • Support for record-level insertion
  • Support for updates (not in place)
• HBase updates are done by creating new versions of values
HBase vs. HDFS (Cont’d)

If the application needs neither random reads nor random writes → stick to HDFS
HBase Data Model

• HBase is based on Google’s Bigtable model
• Data is stored as key-value pairs
  • Each key combines a row key, a column family (with a column name), and a timestamp
  • The key maps to a value
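The key-value view above can be pictured as a sorted, multi-level map. The following toy, in-memory Java sketch models that logical structure under simplifying assumptions: `ToyTable` and its methods are hypothetical names (not the HBase client API), and keys and values are Strings rather than the byte arrays HBase actually stores.

```java
import java.util.*;

// Hypothetical toy model of the HBase logical data model (not the HBase API):
// row key -> column family -> column qualifier -> timestamp -> value.
class ToyTable {
    // TreeMap keeps row keys sorted, as HBase does
    private final NavigableMap<String, Map<String, Map<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    // Store one cell version; each cell's versions are kept newest-first
    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Full coordinates (row, family, qualifier, timestamp) address exactly one value
    String get(String row, String family, String qualifier, long timestamp) {
        Map<String, Map<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null || !families.containsKey(family)) return null;
        NavigableMap<Long, String> versions = families.get(family).get(qualifier);
        return versions == null ? null : versions.get(timestamp);
    }
}
```

For example, `put("com.cnn.www", "anchor", "cnnsi.com", 9L, "CNN")` stores one value, and the family `contents:` from the example table would be modeled with an empty qualifier `""`.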
HBase Logical View

HBase: Keys and Column Families

• Each record is divided into column families
• Each row has a key
• Each column family consists of one or more columns
Example table with two column families, “contents” and “anchor” (e.g., “anchor:apache.com” is the column named “apache.com” in the “anchor” family):

Row key            Time stamp   Column “contents:”   Column “anchor:”
“com.apache.www”   t12          “<html>…”
                   t11          “<html>…”
                   t10                               “anchor:apache.com” = “APACHE”
“com.cnn.www”      t15                               “anchor:cnnsi.com” = “CNN”
                   t13                               “anchor:my.look.ca” = “CNN.com”
                   t6           “<html>…”
                   t5           “<html>…”
                   t3           “<html>…”

• Key
  • Byte array
  • Serves as the primary key for the table
  • Indexed for fast lookup
• Column family
  • Has a name (string)
  • Contains one or more related columns
• Column
  • Belongs to one column family
  • Included inside the row
  • Named familyName:columnName
Version number for each cell

• Version number (time stamp)
  • Unique within each key
  • By default, the system’s timestamp
  • Data type is Long
• Value (cell)
  • Byte array
Notes on Data Model

• An HBase schema consists of several tables
• Each table consists of a set of column families
• Columns are not part of the schema
• HBase has dynamic columns
  • Because column names are encoded inside the cells
  • Different rows can have different columns
• Figure: the “Roles” column family has different columns in different cells
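To illustrate why columns are not part of the schema, here is a hypothetical Java sketch (not the HBase API) in which the schema is only the set of column-family names fixed at table creation, while each row may use whatever column qualifiers it likes. The family `roles` and the row and column names are made-up sample data.

```java
import java.util.*;

// Hypothetical sketch: the schema lists column families only; qualifiers are free-form
class DynamicColumnsTable {
    private final Set<String> families;                                      // fixed when the table is created
    private final Map<String, Map<String, String>> cells = new TreeMap<>();  // row -> "family:qualifier" -> value

    DynamicColumnsTable(String... families) {
        this.families = new HashSet<>(Arrays.asList(families));
    }

    void put(String row, String family, String qualifier, String value) {
        // Writing to an undeclared family is rejected, as HBase also does
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        // Any new qualifier is accepted: the column name is stored with the cell itself
        cells.computeIfAbsent(row, r -> new TreeMap<>()).put(family + ":" + qualifier, value);
    }

    // Columns actually present in a given row
    Set<String> columnsOf(String row) {
        return cells.getOrDefault(row, Collections.emptyMap()).keySet();
    }
}
```

With this design, adding a new column never requires a schema change; only adding a new column family would.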
Notes on Data Model (Cont’d)

• The version number can be user-supplied
  • It does not even have to be inserted in increasing order
  • Version numbers are unique within each key
• A table can be very sparse
  • Many cells are empty
  • In the example, the “anchor” family has only two columns [cnnsi.com & my.look.ca]
• Keys are indexed as the primary key
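The sketch below, a hypothetical Java toy rather than HBase code, shows the versioning points above for a single cell: user-supplied versions inserted in arbitrary order still end up sorted newest-first, and reusing a version number replaces that version instead of adding a new one.

```java
import java.util.*;

// Hypothetical sketch of one cell's version history (not the HBase API)
class VersionedCell {
    // Sorted newest-first regardless of insertion order
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.reverseOrder());

    void put(long version, String value) {
        // Versions are unique per cell: reusing a version number overwrites that version
        versions.put(version, value);
    }

    // Version numbers in read order (newest first)
    List<Long> versionOrder() {
        return new ArrayList<>(versions.keySet());
    }

    String newest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }
}
```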

HBase Physical Model

• Each column family is stored in its own separate files (HFiles)
• Row keys and version numbers are replicated in each column family’s files
• Empty cells are not stored
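As a rough illustration of the physical layout, this hypothetical Java sketch flattens one logical row into per-family lists of key-value entries. It is not HBase’s HFile format; it only shows that the row key and timestamp are repeated in every stored entry, that each family’s entries are kept apart, and that empty cells (represented here as `null`) produce no entry at all.

```java
import java.util.*;

// Hypothetical stored entry: every entry carries its full coordinates
final class KeyValueEntry {
    final String row, family, qualifier, value;
    final long timestamp;

    KeyValueEntry(String row, String family, String qualifier, long timestamp, String value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }
}

class PhysicalLayout {
    // One list per column family, standing in for that family's separate files
    static Map<String, List<KeyValueEntry>> flatten(String row, long timestamp,
                                                    Map<String, Map<String, String>> logicalRow) {
        Map<String, List<KeyValueEntry>> byFamily = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> fam : logicalRow.entrySet()) {
            for (Map.Entry<String, String> col : fam.getValue().entrySet()) {
                if (col.getValue() == null) continue;   // empty cell: nothing is stored
                byFamily.computeIfAbsent(fam.getKey(), f -> new ArrayList<>())
                        .add(new KeyValueEntry(row, fam.getKey(), col.getKey(), timestamp, col.getValue()));
            }
        }
        return byFamily;
    }
}
```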
Example

Column Families
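HBase Regions

• Each table is partitioned horizontally into regions, each covering a contiguous range of row keys
• Regions are the counterpart of HDFS blocks: the unit in which data is distributed across the cluster
• Figure: each horizontal slice of the table becomes one region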
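Because regions split a table into contiguous row-key ranges, finding the region for a row amounts to finding the region with the greatest start key that is less than or equal to the row key. The Java sketch below models that lookup with a `TreeMap`; the class, region names, and split points are hypothetical, and real clients locate regions through HBase’s `hbase:meta` table rather than a local map.

```java
import java.util.*;

// Hypothetical region directory: start row key -> region name (sample data only)
class RegionDirectory {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionDirectory() {
        regionsByStartKey.put("", "region-1");   // the first region starts at the empty key
    }

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // A row belongs to the region with the greatest start key <= the row key
    String regionFor(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }
}
```

Since the empty start key sorts before every row key, `floorEntry` always finds a region in this sketch.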
HBase Architecture

Three Major Components

• The HBaseMaster
  • One master
• The HRegionServer
  • Many region servers
• The HBase client
HBase Components

• Region
  • A subset of a table’s rows, like horizontal range partitioning
  • Partitioning is done automatically
• RegionServer (many slaves)
  • Manages data regions
  • Serves data for reads and writes (writes are recorded in a log)
• Master
  • Responsible for coordinating the slaves
  • Assigns regions, detects failures
  • Admin functions
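The note that a RegionServer serves writes using a log refers to write-ahead logging: each write is recorded in a log before it is applied in memory, so in-memory data lost in a crash can be rebuilt by replaying the log. A minimal, hypothetical Java sketch of that idea follows; it is not HBase’s actual WAL or MemStore implementation.

```java
import java.util.*;

// Hypothetical sketch of write-ahead logging on a region server
class LoggedStore {
    private final List<String[]> log = new ArrayList<>();   // stands in for the durable log
    private Map<String, String> memory = new TreeMap<>();   // stands in for the in-memory store

    void put(String key, String value) {
        log.add(new String[] {key, value});   // 1. append to the log first
        memory.put(key, value);               // 2. then apply the write in memory
    }

    String get(String key) {
        return memory.get(key);
    }

    // Simulate a crash: in-memory state is lost, the log survives
    void crash() {
        memory = new TreeMap<>();
    }

    // Recovery: replay the log in order to rebuild the in-memory state
    void recover() {
        for (String[] entry : log) {
            memory.put(entry[0], entry[1]);
        }
    }
}
```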
Big Picture

ZooKeeper

• HBase depends on ZooKeeper
• By default, HBase manages the ZooKeeper instance
  • E.g., starts and stops ZooKeeper
• HMaster and HRegionServers register themselves with ZooKeeper
Creating a Table
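// Sketch assuming the HBase 2.x client API; the older HBaseAdmin/HColumnDescriptor style is deprecated
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

// Reads hbase-site.xml (e.g., the ZooKeeper quorum) from the classpath
Configuration config = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(config);
     Admin admin = connection.getAdmin()) {
    // Family names must not contain ':' (the colon separates family and column name)
    TableDescriptor desc = TableDescriptorBuilder.newBuilder(TableName.valueOf("MyTable"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("columnFamily1"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("columnFamily2"))
            .build();
    admin.createTable(desc); // throws IOException, e.g., if the table already exists
}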
Operations On Regions: Get()

• Given a key → return the corresponding record
• For each value, return the highest (most recent) version
• Can control the number of versions you want

Operations On Regions: Scan()
Get(): select value from table where key = ‘com.apache.www’ AND label = ‘anchor:apache.com’

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” = “CNN”
                   t8           “anchor:my.look.ca” = “CNN.com”
                   t6
                   t3

Result: “APACHE”
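The Get() semantics described above (look up one row key, return the newest version of the requested column unless more versions are asked for) can be sketched in Java as follows. `ToyGetTable`, its `get` overloads, and the `maxVersions` parameter are hypothetical names for illustration; the real client builds `Get` objects and passes them to `Table.get`. The sample data mirrors the “com.apache.www” row.

```java
import java.util.*;

// Hypothetical toy table for illustrating Get() semantics (not the HBase client API)
class ToyGetTable {
    // row -> "family:qualifier" -> version -> value, versions sorted newest-first
    private final Map<String, Map<String, NavigableMap<Long, String>>> data = new TreeMap<>();

    void put(String row, String column, long version, String value) {
        data.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(version, value);
    }

    // Return up to maxVersions values of one column, newest first
    List<String> get(String row, String column, int maxVersions) {
        List<String> result = new ArrayList<>();
        Map<String, NavigableMap<Long, String>> columns = data.get(row);
        if (columns == null || !columns.containsKey(column)) return result;
        for (String value : columns.get(column).values()) {
            if (result.size() == maxVersions) break;
            result.add(value);
        }
        return result;
    }

    // Default behaviour: only the highest (most recent) version
    String get(String row, String column) {
        List<String> newest = get(row, column, 1);
        return newest.isEmpty() ? null : newest.get(0);
    }
}
```

Usage: after `put("com.apache.www", "anchor:apache.com", 10L, "APACHE")`, the call `get("com.apache.www", "anchor:apache.com")` returns “APACHE”, matching the query above.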
Scan(): select value from table where anchor = ‘cnnsi.com’

Row key            Time stamp   Column “anchor:”
“com.apache.www”   t12
                   t11
                   t10          “anchor:apache.com” = “APACHE”
“com.cnn.www”      t9           “anchor:cnnsi.com” = “CNN”
                   t8           “anchor:my.look.ca” = “CNN.com”
                   t6
                   t3

Result: “CNN” (row “com.cnn.www”)
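Unlike Get(), a Scan() walks through rows in sorted row-key order. The hypothetical Java sketch below scans every row of a sorted map and keeps the latest written value of one column wherever that column is present, which reproduces the example query on `anchor:cnnsi.com`. Real scans are configured with `Scan` objects and run through `Table.getScanner`; the class and method names here are illustrative only.

```java
import java.util.*;

// Hypothetical sketch of a full-table scan over rows sorted by row key (not the HBase client API)
class ToyScanTable {
    // row key -> column ("family:qualifier") -> latest written value
    private final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Visit every row in key order; collect row -> value for rows that contain the column
    Map<String, String> scan(String column) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) {
                result.put(row.getKey(), value);   // rows without the column are skipped
            }
        }
        return result;
    }
}
```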
Operations On Regions: Put()

• Insert a new record (with a new key), or
• Insert a record for an existing key
  • With an implicit version number (timestamp)
  • With an explicit version number
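The difference between an implicit and an explicit version number can be sketched as two overloads of a hypothetical `put` method: one stamps the value with the current time, the other uses the caller’s version. This is a toy Java model of one column, not the HBase `Put` class (which offers a comparable choice through its `addColumn` overloads with and without a timestamp).

```java
import java.util.*;

// Hypothetical toy model of Put() for a single column (not the HBase client API)
class ToyPutColumn {
    // row key -> (version -> value), versions sorted newest-first
    private final Map<String, NavigableMap<Long, String>> cells = new TreeMap<>();

    // Implicit version: the system's current time in milliseconds
    long put(String row, String value) {
        long now = System.currentTimeMillis();
        put(row, now, value);
        return now;
    }

    // Explicit version supplied by the caller; works for new and existing row keys alike
    void put(String row, long version, String value) {
        cells.computeIfAbsent(row, r -> new TreeMap<Long, String>(Comparator.reverseOrder()))
             .put(version, value);
    }

    // Newest value for a row, or null if the row does not exist
    String latest(String row) {
        NavigableMap<Long, String> versions = cells.get(row);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    int versionCount(String row) {
        NavigableMap<Long, String> versions = cells.get(row);
        return versions == null ? 0 : versions.size();
    }
}
```

Note that writing to an existing key adds a version rather than overwriting in place, consistent with the earlier point that updates create new versions.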
Operations On Regions: Delete()

• Marking table cells as deleted
• Multiple levels
  • Can mark an entire column family as deleted
  • Can mark all column families of a given row as deleted
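Marking cells as deleted, rather than removing them immediately, means writing delete markers (“tombstones”) that hide existing data until it is physically purged later (in HBase, during major compactions). The hypothetical Java sketch below models the three levels listed above (a single cell, a whole column family, a whole row) as markers consulted on every read; it is not the HBase `Delete` API. Timestamps are left out for brevity, so a marker here hides a cell permanently, whereas in HBase a marker only masks versions up to its own timestamp.

```java
import java.util.*;

// Hypothetical sketch of delete markers ("tombstones"); not the HBase Delete API
class TombstoneTable {
    private final Map<String, String> cells = new HashMap<>();    // "row/family/qualifier" -> value
    private final Set<String> deletedCells = new HashSet<>();     // "row/family/qualifier"
    private final Set<String> deletedFamilies = new HashSet<>();  // "row/family"
    private final Set<String> deletedRows = new HashSet<>();      // "row"

    void put(String row, String family, String qualifier, String value) {
        cells.put(row + "/" + family + "/" + qualifier, value);
    }

    void deleteCell(String row, String family, String qualifier) {
        deletedCells.add(row + "/" + family + "/" + qualifier);
    }

    void deleteFamily(String row, String family) {
        deletedFamilies.add(row + "/" + family);
    }

    void deleteRow(String row) {
        deletedRows.add(row);
    }

    // Data stays in `cells`; reads simply hide anything covered by a marker
    String get(String row, String family, String qualifier) {
        if (deletedRows.contains(row)
                || deletedFamilies.contains(row + "/" + family)
                || deletedCells.contains(row + "/" + family + "/" + qualifier)) {
            return null;
        }
        return cells.get(row + "/" + family + "/" + qualifier);
    }

    // Stored values, including hidden ones, until a later cleanup purges them
    int storedCellCount() {
        return cells.size();
    }
}
```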
HBase: Joins

• HBase does not support joins
• Can be done in the application layer
  • Using scan() and get() operations
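Since HBase has no join operator, an application can scan one table and issue a get against the other for each row it reads, i.e., an index-nested-loop style join done in client code. The Java sketch below uses plain maps as stand-ins for two hypothetical tables, `orders` and `customers`; in a real client the loop would iterate a `ResultScanner` and call `Table.get` for each lookup.

```java
import java.util.*;

// Hypothetical application-side join: scan one table, get() from another
class ClientSideJoin {
    // orders: order id -> customer id; customers: customer id -> customer name
    static Map<String, String> joinOrdersWithCustomers(NavigableMap<String, String> orders,
                                                       Map<String, String> customers) {
        Map<String, String> orderToCustomerName = new LinkedHashMap<>();
        // "scan()": iterate the orders table in row-key order
        for (Map.Entry<String, String> order : orders.entrySet()) {
            // "get()": point lookup by row key in the customers table
            String name = customers.get(order.getValue());
            if (name != null) {   // inner join: drop orders with no matching customer
                orderToCustomerName.put(order.getKey(), name);
            }
        }
        return orderToCustomerName;
    }
}
```

Each scanned row costs one extra get, so this approach suits joins where the scanned side is small or selective.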
Altering a Table

Logging Operations

HBase Deployment

Figure: one master node and several slave nodes

HBase vs. HDFS

HBase vs. RDBMS
When to use HBase
