Slide #7 - Cassandra Write Path

The document explains the write path in Apache Cassandra, detailing how data is written to the storage engine through components like Memtables, CommitLog, SSTables, and Compaction. It highlights the efficiency of Cassandra's log-structured storage engine and the importance of idempotency in write operations. Additionally, it covers data directory configurations and the file structures resulting from Memtable flushes and compactions.


Working with the Cassandra Write Path

Apache Cassandra: Core Concepts, Skills, and Tools

Leo Schuman, Joe Chu

Oct 20, 2014

©2014 DataStax Training. Use only with permission. Slide 1
Learning Objectives

• Understand how data is written to the storage engine

• Understand the data directories


How does Cassandra write so fast?

• Cassandra is a log-structured storage engine

• Data is sequentially appended, not placed in pre-set locations

[Figure: RDBMS vs. Cassandra. An RDBMS seeks and writes values to various pre-set locations; Cassandra continuously appends to a log.]
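The contrast between seeking to pre-set locations and appending to a log can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class names `InPlaceStore` and `AppendOnlyLog` are illustrative, the "disk" is an in-memory structure, and nothing here is Cassandra code.

```python
class InPlaceStore:
    """Toy RDBMS-style store: each key owns a fixed slot that is overwritten."""

    def __init__(self):
        self.slots = {}   # key -> value, stands in for pre-set disk locations
        self.seeks = 0    # counts simulated random seeks

    def write(self, key, value):
        self.seeks += 1           # the slot must be located before writing
        self.slots[key] = value


class AppendOnlyLog:
    """Toy log-structured store: every write is appended at the tail."""

    def __init__(self):
        self.records = []  # sequential log, never rewritten in place

    def write(self, key, value):
        self.records.append((key, value))   # no seek, just append

    def latest(self, key):
        # The newest record for a key wins, so scan from the tail.
        for k, v in reversed(self.records):
            if k == key:
                return v
        return None


log = AppendOnlyLog()
log.write("key1", "Oscar")
log.write("key1", "Oskar")   # an update is just another appended record
```

The log keeps both records; resolving which one is current is deferred to read time and, later, to compaction.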


What are the key components of the write path?

• Each node implements four key components to handle its writes

• Memtables – in-memory tables corresponding to CQL tables, with indexes

• CommitLog – append-only log, replayed to restore downed node's Memtables

• SSTables – Memtable snapshots periodically flushed to disk, clearing heap

• Compaction – periodic process to merge and streamline SSTables

• When any node receives a write request

1. The record appends to the CommitLog, and

2. The record appends to the Memtable for this record's target CQL table

3. Periodically, Memtables flush to SSTables, clearing JVM heap and CommitLog

4. Periodically, Compaction runs to merge and streamline SSTables
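The first three steps above can be sketched as a toy Python model. The class `ToyNode` and its method names are hypothetical, chosen only for illustration; compaction (step 4) is left out, and real Cassandra tracks flushed CommitLog positions per segment rather than filtering a list.

```python
class ToyNode:
    """Illustrative model of one node's write path (not Cassandra code)."""

    def __init__(self):
        self.commit_log = []   # append-only list of (table, key, columns)
        self.memtables = {}    # table -> {partition key -> {column: value}}
        self.sstables = {}     # table -> list of immutable flushed snapshots

    def write(self, table, key, columns):
        # Step 1: append the record to the CommitLog.
        self.commit_log.append((table, key, dict(columns)))
        # Step 2: apply the record to the Memtable of the target table.
        partition = self.memtables.setdefault(table, {}).setdefault(key, {})
        partition.update(columns)

    def flush(self, table):
        # Step 3: write the Memtable out as a sorted, immutable snapshot,
        # then drop the Memtable and the CommitLog entries it covered.
        memtable = self.memtables.pop(table, {})
        if memtable:
            snapshot = tuple((k, dict(memtable[k])) for k in sorted(memtable))
            self.sstables.setdefault(table, []).append(snapshot)
        self.commit_log = [r for r in self.commit_log if r[0] != table]

    def replay(self):
        # Rebuild Memtables from the CommitLog, as after a node restart.
        self.memtables = {}
        for table, key, columns in self.commit_log:
            partition = self.memtables.setdefault(table, {}).setdefault(key, {})
            partition.update(columns)


node = ToyNode()
node.write("performer", "key1", {"first": "Oscar", "last": "Orange"})
node.write("performer", "key1", {"level": 42})
```

Calling `node.replay()` after clearing `node.memtables` restores the same partition, which is the role the CommitLog plays for a downed node.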


How does the write path flow on a node?

[Figure: write path on a node. The client sends each write request to the coordinator. In node memory, the Memtable (corresponding to a CQL table) holds partitions such as key1 (first:Oscar, last:Orange, level:42), key2 (first:Ricky, last:Red), and key3 (first:Betty, last:Blue, level:63). On the node file system, each write request is appended to the append-only CommitLog. Periodically, the Memtable's current state is flushed to an SSTable, and Compaction merges related SSTables.]

What is the CommitLog and how is it configured?

• An append-only log used to automatically rebuild Memtables on restart of a downed node, configured in conf/cassandra.yaml

• Memtables flush to disk when CommitLog size reaches total allowed space

• commitlog_total_space_in_mb – size at which oldest Memtable log segment will be flushed to disk (default: 1024 for 64bit JVMs)

• commitlog_segment_size_in_mb – max size of individual log segments (default: 32)

• Entries are marked as flushed, as corresponding Memtable entries flush to disk as an SSTable

• Flushed CommitLog segments are periodically recycled

• Best practice is to locate CommitLog on its own disk to minimize write head movement, or on SSD

• commitlog_directory – default is /var/lib/cassandra/commitlog (package install) or install_location/data/commitlog (binary tarball)
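As a quick worked example of how the two size settings relate, the following Python lines use only the defaults quoted above; the variable `max_segments` is introduced here for illustration.

```python
commitlog_total_space_in_mb = 1024   # default quoted above for 64-bit JVMs
commitlog_segment_size_in_mb = 32    # default quoted above

# Number of full segments that fit before the space limit forces a flush.
max_segments = commitlog_total_space_in_mb // commitlog_segment_size_in_mb
print(max_segments)  # 32
```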

What is the CommitLog and how is it configured?

• Entries accrue in memory, and are synced to disk in either a batch or periodic manner

• commitlog_sync – either periodic or batch (default: periodic)

• batch – writes are not acknowledged until the log syncs to disk

• commitlog_sync_batch_window_in_ms – how long to wait for more writes before fsync (default: 50)

• periodic – writes are acknowledged immediately, while sync happens periodically

• commitlog_sync_period_in_ms – how long to wait between fsync of log to disk (default: 10000)
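The difference between the two sync modes comes down to when a write is acknowledged relative to the fsync. The sketch below models that on a single node; `ToyCommitLog` and its methods are hypothetical names, and timing (the batch window and sync period) is replaced by explicit `sync()` calls to keep the example deterministic.

```python
class ToyCommitLog:
    """Toy model of commitlog_sync behaviour on one node (not Cassandra code)."""

    def __init__(self, mode="periodic"):
        if mode not in ("periodic", "batch"):
            raise ValueError("mode must be 'periodic' or 'batch'")
        self.mode = mode
        self.buffer = []    # entries accrued in memory, not yet on disk
        self.on_disk = []   # entries that have been synced

    def append(self, entry):
        """Return True if the write can be acknowledged right away."""
        self.buffer.append(entry)
        # periodic: acknowledge immediately; batch: wait for the next sync.
        return self.mode == "periodic"

    def sync(self):
        """Simulate an fsync; return the entries made durable by it."""
        synced, self.buffer = self.buffer, []
        self.on_disk.extend(synced)
        return synced

    def crash(self):
        """Simulate a sudden stop: return the unsynced entries that are lost."""
        lost, self.buffer = self.buffer, []
        return lost


periodic = ToyCommitLog("periodic")
acked = periodic.append("w1")   # acknowledged before any sync
lost = periodic.crash()         # the acknowledged entry never reached disk
```

In this model, periodic mode can lose an acknowledged entry from the local log if the node stops before the next sync, while batch mode never acknowledges an entry that is not yet on disk.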


What are Memtables and how are they flushed to disk?

[Figure: a Memtable holding partitions key1 (first:Oscar, last:Orange, level:42), key2 (first:Ricky, last:Red), and key3 (first:Betty, last:Blue, level:63).]

• Memtables are in-memory representations of a CQL table

• Each node has a Memtable for each CQL table in the keyspace

• Each Memtable accrues writes and provides reads for data not yet flushed

• Updates to Memtables mutate the in-memory partition

• When a Memtable flushes to disk

1. Current Memtable data is written to a new immutable SSTable on disk

2. JVM heap space is reclaimed from the flushed data

3. Corresponding CommitLog entries are marked as flushed


What are Memtables and how are they flushed to disk?


• The Memtable data covered by the oldest CommitLog segments flushes to a new corresponding SSTable on disk when any of the following occurs

• memtable_total_space_in_mb is reached (default: 25% of JVM heap)

• commitlog_total_space_in_mb is reached

• nodetool flush command is issued

• The nodetool flush command force-flushes designated Memtables

./nodetool flush [keyspace] [table(s)]
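The three flush triggers can be written as a small decision function. This is a sketch under assumptions: the function name `flush_reason` and its parameters are hypothetical, sizes are passed in as plain numbers of megabytes, and the thresholds use the defaults quoted in these slides.

```python
def flush_reason(memtable_mb, commitlog_mb, heap_mb,
                 commitlog_total_space_in_mb=1024, manual=False):
    """Return the name of the trigger that fires, or None if none does."""
    # Default quoted above: 25% of the JVM heap.
    memtable_total_space_in_mb = heap_mb * 0.25
    if manual:
        return "nodetool flush"
    if memtable_mb >= memtable_total_space_in_mb:
        return "memtable_total_space_in_mb"
    if commitlog_mb >= commitlog_total_space_in_mb:
        return "commitlog_total_space_in_mb"
    return None


# With an 8192 MB heap, the Memtable threshold is 2048 MB.
print(flush_reason(memtable_mb=2048, commitlog_mb=10, heap_mb=8192))
```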


What is an SSTable and what are its characteristics?

• An SSTable ("sorted string table") is

• an immutable file of sorted partitions

• written to disk through fast, sequential i/o

• contains the state of a Memtable when flushed

• The current data state of a CQL table is comprised of

• its corresponding Memtable plus

• all current SSTables flushed from that Memtable

• SSTables are periodically compacted from many to one

What is an SSTable and what are its characteristics?

• For each SSTable, two structures are created

• Partition index – list of its primary keys and row start positions

• Partition summary – in-memory sample of its partition index (default: 1 partition key of 128)

[Figure: a Memtable (corresponding to a CQL table) above several SSTables on disk, each with its own Index and Summary.]
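How a sampled summary narrows a lookup in the full index can be sketched as follows. The function names, the fixed `row_size`, and sorting by plain key string are all simplifying assumptions made for this example; only the 1-in-128 sampling ratio comes from the slide.

```python
import bisect


def build_index(sorted_keys, row_size=100):
    """Partition index: every key mapped to its (toy) start offset."""
    return [(key, i * row_size) for i, key in enumerate(sorted_keys)]


def build_summary(index, interval=128):
    """Partition summary: every interval-th index entry, kept in memory."""
    return [(key, pos) for pos, (key, _) in enumerate(index)
            if pos % interval == 0]


def lookup(key, index, summary, interval=128):
    """Use the summary to pick a slice of the index, then scan that slice."""
    summary_keys = [k for k, _ in summary]
    slot = bisect.bisect_right(summary_keys, key) - 1
    if slot < 0:
        return None   # key sorts before the first key in this SSTable
    start = summary[slot][1]
    for k, offset in index[start:start + interval]:
        if k == key:
            return offset
    return None


keys = [f"key{i:04d}" for i in range(300)]
index = build_index(keys)
summary = build_summary(index)            # entries for positions 0, 128, 256
print(lookup("key0200", index, summary))  # 20000
```

The summary holds only 3 of the 300 entries, yet any lookup scans at most 128 index entries instead of the whole index.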

What is compaction?

• Updates do mutate Memtable partitions, but a Memtable's SSTables are immutable

• no SSTable seeks/overwrites

• SSTables just accrue new timestamped updates

• So, SSTables must be periodically compacted

• related SSTables are merged

• most recent version of each column is compiled to one partition in one new SSTable

• partitions marked for deletion are evicted

• old SSTables are deleted

Note, Compaction and the Read Path are discussed in further detail later in this course.
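The merge at the heart of compaction can be sketched as below. This is a simplified model, not the actual compaction code: SSTables are plain dictionaries, `TOMBSTONE` is a hypothetical deletion marker applied per column rather than per partition, and the grace period that real Cassandra applies before evicting deleted data is ignored.

```python
TOMBSTONE = object()  # hypothetical marker for deleted data


def compact(sstables):
    """Merge SSTables into one, keeping the newest version of each column.

    Each SSTable maps partition key -> {column: (value, timestamp)}.
    """
    merged = {}
    for sstable in sstables:
        for key, columns in sstable.items():
            partition = merged.setdefault(key, {})
            for column, (value, ts) in columns.items():
                if column not in partition or ts > partition[column][1]:
                    partition[column] = (value, ts)

    # Evict deleted data and drop partitions that end up empty.
    result = {}
    for key in sorted(merged):
        live = {c: vt for c, vt in merged[key].items()
                if vt[0] is not TOMBSTONE}
        if live:
            result[key] = live
    return result


old = {"key2": {"first": ("Ricky", 500)},
       "key3": {"first": ("Betty", 541), "level": (60, 541)}}
new = {"key2": {"first": (TOMBSTONE, 600)},
       "key3": {"level": (63, 583)}}
compacted = compact([old, new])
```

Here `compacted` holds a single partition, key3, with `level` taken from the newer SSTable and `first` from the older one; key2 is gone because its only column was deleted.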

What is the significance of idempotency?

[Figure: the coordinator and a Memtable (corresponding to a CQL table) holding partitions key1, key2, and key3. Partition key3 (first:Betty, last:Blue, level:63) is shown twice, once with every column at timestamp 541 and once with every column at timestamp 583.]

• Due to the high per-operation overhead, Cassandra does not support transactional rollback ("two phase commit")

• As a result, a Cassandra client could receive an exception from a successful insert/update operation (e.g., TimedOutException due to network latency)

• Idempotent operation – produces the same result no matter how many times it is applied

• Insert/updates are effectively idempotent when run with identical values

• Operations involving COUNTER columns are not idempotent

• Each column of any write is time-stamped, and only the most recent version of each column is read and kept by compaction
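Why a retried insert is safe while a retried counter update is not can be shown with two small functions. Both names are hypothetical and the partition is a plain dictionary; the sketch only illustrates the last-write-wins rule described above.

```python
def apply_write(partition, column, value, timestamp):
    """Last-write-wins: keep the value with the newest timestamp."""
    current = partition.get(column)
    if current is None or timestamp > current[1]:
        partition[column] = (value, timestamp)


def apply_counter_increment(partition, column, delta):
    """Counter updates add to the stored value, so a retry changes the result."""
    partition[column] = partition.get(column, 0) + delta


row = {}
apply_write(row, "level", 63, 541)
apply_write(row, "level", 63, 583)   # retry after a timeout, same value
# row["level"][0] is still 63: the retry did not change the visible value.

counter = {}
apply_counter_increment(counter, "plays", 1)
apply_counter_increment(counter, "plays", 1)   # retry after a timeout
# counter["plays"] is now 2, although the client intended a single increment.
```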

Exercise 1: Insert data and observe the write path flow



Where are the data directories located?

• The SSTable and CommitLog directory locations are configured in conf/cassandra.yaml

• data_file_directories – if multiple locations, distribution is balanced

• commitlog_directory – best practice to place on separate disk

• By default, the files are all placed in /var/lib/cassandra or in install_location/data

Demo 2: Show data directory configuration in the cassandra.yaml file

How are data directories created for a keyspace?

• Data directories are created by keyspace and table name / id

…/data/keyspace/tablename-tableid

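-- Single-node keyspace; the replication settings suit a training setup only.
CREATE KEYSPACE musicdb
WITH replication = {
  'class' : 'SimpleStrategy',
  'replication_factor' : 1
};

-- Qualify the table with its keyspace so the statement runs without a prior USE.
CREATE TABLE musicdb.performer (
  name VARCHAR,
  type VARCHAR,
  country VARCHAR,
  style VARCHAR,
  founded INT,
  born INT,
  died INT,
  PRIMARY KEY (name)
);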

What files result from Memtable flush or compaction?

• Data files are created by keyspace name, table name, plus

• Version – SSTable format version (e.g., 'ka' is Cassandra 2.1)

• Generation – incremented each time SSTables flush from a Memtable

• Component – describes the type of file content

• <keyspace>-<table>-<version>-<generation>-<component>
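The naming pattern above can be taken apart with a short Python helper. The function name `parse_sstable_filename` and the sample file name are illustrative, and the sketch assumes that neither the keyspace nor the table name contains a hyphen.

```python
def parse_sstable_filename(filename):
    """Split '<keyspace>-<table>-<version>-<generation>-<component>'."""
    # Split on the first four hyphens only, so the component stays intact.
    keyspace, table, version, generation, component = filename.split("-", 4)
    return {
        "keyspace": keyspace,
        "table": table,
        "version": version,
        "generation": int(generation),
        "component": component,
    }


parts = parse_sstable_filename("musicdb-performer-ka-1-Data.db")
print(parts["generation"], parts["component"])  # 1 Data.db
```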

What files result from Memtable flush or compaction?

• -CompressionInfo.db – metadata for Data file compression

• -Data.db – base SSTable data including

• row key, data size, columns index, row level tombstone info, column count, and column list in sorted order by name

• -Filter.db – SSTable partition keys Bloom filter, to optimize reads

• -Index.db – index for this SSTable, used to optimize reads

• sorted row keys mapped to offsets in Data file; newer versions also include column index, tombstone, and bloom filter info

• -Statistics.db – statistics for this SSTable

• row size and column count estimate, generation numbers of files from which this SSTable was compacted, more

• -Summary.db – sampling from Index file, used to optimize reads

• sample size determined by index_interval (default: 1 of each 128)

• -TOC.txt – component list for this SSTable

What is sstable2json?

• tools/bin/sstable2json is a utility which exports an SSTable in JSON format, for testing and debugging

• -k – display only the partitions for the specified set of keys (limit: 500)

• -x – exclude a specified set of keys (limit: 500)

• -e – enumerate keys only

./sstable2json [full_path_to_SSTable_Data_file] | more

Exercise 3: Insert data and observe SSTables created

Summary

• Cassandra writes fast because it sequentially appends to a log, without seeking

• A Memtable is an in-memory structure corresponding to a CQL table and its indexes

• The CommitLog is an append-only log, replayed to restore a downed node's Memtables

• SSTables are Memtable snapshots periodically flushed to disk

• Compaction is a periodic process to merge and optimize SSTables

• CommitLog accrues in memory and syncs to disk in a batch or periodic manner

• When Memtables flush to SSTables, heap memory is reclaimed and the corresponding CommitLog entries are marked as flushed

• A flush happens when memtable_total_space_in_mb or commitlog_total_space_in_mb is reached, or when nodetool flush is issued

Summary

• Total table data is the current state of a Memtable plus its SSTables

• Writes should be idempotent, because they persist even if acknowledgment fails

• Each column in any write is time-stamped; only the most current are read and compacted

• data_file_directories and commitlog_directory are set in cassandra.yaml

• Each CQL table in a keyspace has a corresponding keyspace name/table name-table id folder

• Data file names are composed of keyspace-table-version-generation-component

• Data file components are: Data, Index, Summary, Filter, CompressionInfo, Statistics, TOC

• sstable2json converts an SSTable to JSON for debug/testing

Review Questions

• What happens when a Memtable is flushed?

• What causes a Memtable to flush?

• What is the relationship of a CQL table to Memtables and SSTables?

• Do disk seeks happen during writes?

• How are data files organized?
