
Slide #7 - Cassandra Write Path

The document explains the write path in Apache Cassandra, detailing how data is written to the storage engine through components like Memtables, CommitLog, SSTables, and Compaction. It highlights the efficiency of Cassandra's log-structured storage engine and the importance of idempotency in write operations. Additionally, it covers data directory configurations and the file structures resulting from Memtable flushes and compactions.

Working with the Cassandra Write Path


Apache Cassandra:
Core Concepts, Skills, and Tools















Leo Schuman, Joe Chu







Oct 20, 2014

©2014 DataStax Training. Use only with permission. Slide 1
Learning Objectives

• Understand how data is written to the storage engine



• Understand the data directories



How does Cassandra write so fast?

• Cassandra uses a log-structured storage engine



• Data is sequentially appended, not placed in pre-set locations

RDBMS: seeks and writes values to various pre-set locations
CASSANDRA: continuously appends to a log



What are the key components of the write path?

• Each node implements four key components to handle its writes



• Memtables – in-memory tables corresponding to CQL tables, with indexes

• CommitLog – append-only log, replayed to restore a downed node's Memtables

• SSTables – Memtable snapshots periodically flushed to disk, clearing heap

• Compaction – periodic process to merge and streamline SSTables



• When a node receives a write request

1. The record is appended to the CommitLog, and

2. The record is applied to the Memtable for this record's target CQL table

3. Periodically, Memtables flush to SSTables, freeing JVM heap and letting the corresponding CommitLog segments be recycled

4. Periodically, Compaction runs to merge and streamline SSTables
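
The first three steps above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not Cassandra's implementation: the class name ToyNode, its fields, and the sample timestamps are hypothetical, and "disk" is just a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of one node's write path (illustrative only)."""
    commit_log: list = field(default_factory=list)  # append-only record of writes
    memtable: dict = field(default_factory=dict)    # key -> {column: (value, timestamp)}
    sstables: list = field(default_factory=list)    # immutable snapshots, oldest first

    def write(self, key, columns, timestamp):
        # Step 1: append the record to the commit log (sequential, no seek).
        self.commit_log.append((key, columns, timestamp))
        # Step 2: apply the record to the in-memory partition.
        partition = self.memtable.setdefault(key, {})
        for name, value in columns.items():
            current = partition.get(name)
            if current is None or timestamp >= current[1]:
                partition[name] = (value, timestamp)

    def flush(self):
        # Step 3: write the memtable as a new immutable, key-sorted snapshot,
        # then release the memory and the log entries it covered.
        if not self.memtable:
            return
        rows = sorted(self.memtable.items(), key=lambda item: item[0])
        self.sstables.append(tuple((key, dict(cols)) for key, cols in rows))
        self.memtable = {}
        self.commit_log.clear()


node = ToyNode()
node.write("key2", {"first": "Ricky", "last": "Red"}, timestamp=1)
node.write("key1", {"first": "Oscar", "last": "Orange", "level": 42}, timestamp=2)
node.flush()
```

After the flush, the toy node holds one key-sorted snapshot and an empty Memtable and CommitLog. Step 4, compaction, is left out of this sketch.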



How does the write path flow on a node?

[Diagram: write path on a node, from Client to Coordinator to node]

Each write request …
• is appended to the CommitLog (node file system, append only)
• is applied to the Memtable (node memory; corresponds to a CQL table)

Memtable example contents:
partition key1 – first:Oscar, last:Orange, level:42
partition key2 – first:Ricky, last:Red
partition key3 – first:Betty, last:Blue, level:63

Periodically …
• the Memtable's current state is flushed to an SSTable (node file system)
• Compaction merges related SSTables

What is the CommitLog and how is it configured?

• An append-only log used to automatically rebuild Memtables on
restart of a downed node, configured in conf/cassandra.yaml

• Memtables flush to disk when the CommitLog size reaches its total allowed space

• commitlog_total_space_in_mb – total CommitLog size at which the Memtables behind the oldest log segment are flushed to disk (default: 1024 for 64-bit JVMs)

• commitlog_segment_size_in_mb – maximum size of an individual log segment (default: 32)

• CommitLog entries are marked as flushed once the corresponding Memtable entries flush to disk as an SSTable

• Flushed CommitLog segments are periodically recycled



• Best practice is to locate CommitLog on its own disk to minimize
write head movement, or on SSD

• commitlog_directory – default is /var/lib/cassandra/commitlog (package install)
or install_location/data/commitlog (binary tarball)
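
The interplay of segment size, total space, and recycling can be sketched in Python. This is a rough model under stated assumptions: ToyCommitLog and its methods are hypothetical names, sizes are counted in entries rather than megabytes, and a flush is treated as covering every logged entry of that table.

```python
class ToyCommitLog:
    """Toy segmented commit log; sizes count entries, not megabytes."""

    def __init__(self, segment_size=3, total_space=6):
        self.segment_size = segment_size  # stands in for commitlog_segment_size_in_mb
        self.total_space = total_space    # stands in for commitlog_total_space_in_mb
        self.segments = [[]]              # oldest segment first; entries are table names

    def append(self, table):
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])      # current segment is full: start a new one
        self.segments[-1].append(table)

    def size(self):
        return sum(len(segment) for segment in self.segments)

    def tables_to_flush(self):
        # Once the log reaches its total allowed space, the tables with entries
        # in the oldest segment must flush so that segment can be recycled.
        if self.size() < self.total_space:
            return set()
        return set(self.segments[0])

    def mark_flushed(self, table):
        # Drop entries now covered by an SSTable, then recycle emptied segments.
        # The newest segment is always kept so appends can continue.
        kept = [[t for t in seg if t != table] for seg in self.segments]
        self.segments = [seg for seg in kept[:-1] if seg] + [kept[-1]]


log = ToyCommitLog()
for table_name in ["performer", "performer", "album", "album", "album", "album"]:
    log.append(table_name)
pending = log.tables_to_flush()
```

Here the log fills two segments, reaches its total space, and reports that both tables in the oldest segment need flushing before that segment can be reused.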

What is the CommitLog and how is it configured?

• Entries accrue in memory, and are synced to disk in either a batch or periodic manner

• commitlog_sync – either periodic or batch (default: periodic)

• batch – writes are not acknowledged until the log syncs to disk

• commitlog_sync_batch_window_in_ms – how long to wait for more writes before fsync (default: 50)

• periodic – writes are acknowledged immediately, while sync happens periodically

• commitlog_sync_period_in_ms – how long to wait between fsync of log to disk (default: 10000)
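
The durability difference between the two modes can be illustrated with a small Python sketch. It is a simplification, not Cassandra's scheduler: the function name writes_lost_on_crash is hypothetical, periodic syncs are assumed to land at exact multiples of the sync period, and only this one node's log is considered (replicas on other nodes may still hold the write).

```python
def writes_lost_on_crash(mode, ack_times_ms, crash_ms, sync_period_ms=10000):
    """Return acknowledged writes that were not yet on disk at crash time.

    Simplification: in periodic mode the log is fsynced at exact multiples of
    sync_period_ms; in batch mode a write is acknowledged only after its fsync.
    """
    acked = [t for t in ack_times_ms if t <= crash_ms]
    if mode == "batch":
        return []  # every acknowledged write is already synced
    if mode == "periodic":
        last_sync = (crash_ms // sync_period_ms) * sync_period_ms
        return [t for t in acked if t > last_sync]
    raise ValueError(f"unknown commitlog_sync mode: {mode!r}")


# Hypothetical acknowledgment times in milliseconds, with a crash at 19900 ms.
acks = [500, 9000, 12000, 19500]
lost_periodic = writes_lost_on_crash("periodic", acks, crash_ms=19900)
lost_batch = writes_lost_on_crash("batch", acks, crash_ms=19900)
```

With the default 10000 ms period, the two writes acknowledged after the last sync are at risk in periodic mode, while batch mode trades latency for having none at risk.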



What are Memtables and how are they flushed to disk?

Memtable

partition key1
first:Oscar
last:Orange
level:42

partition key2
first:Ricky
last:Red

partition key3
first:Betty
last:Blue
level:63

• A Memtable is an in-memory representation of a CQL table



• Each node has a Memtable for each CQL table in the keyspace

• Each Memtable accrues writes and provides reads for data not yet flushed

• Updates to Memtables mutate the in-memory partition

• When a Memtable flushes to disk

1. Current Memtable data is written to a new immutable SSTable on disk

2. JVM heap space is reclaimed from the flushed data

3. Corresponding CommitLog entries are marked as flushed



What are Memtables and how are they flushed to disk?


• A Memtable flushes to a new corresponding SSTable on disk, which lets the oldest CommitLog segments be recycled, when

• memtable_total_space_in_mb is reached (default: 25% of JVM heap)

• commitlog_total_space_in_mb is reached

• nodetool flush command is issued

• The nodetool flush command force-flushes designated Memtables

./nodetool flush [keyspace] [table(s)]
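
The three flush triggers can be written as one decision function. This Python sketch only mirrors the defaults quoted on the slides; the function name flush_reason and its parameters are hypothetical, and real nodes track these sizes internally.

```python
def flush_reason(memtable_mb, commitlog_mb, heap_mb,
                 commitlog_total_space_in_mb=1024,
                 memtable_total_space_in_mb=None,
                 manual_flush=False):
    """Return which trigger (if any) would cause a Memtable flush."""
    if memtable_total_space_in_mb is None:
        # Default quoted on the slide: 25% of the JVM heap.
        memtable_total_space_in_mb = heap_mb / 4
    if manual_flush:
        return "nodetool flush"
    if memtable_mb >= memtable_total_space_in_mb:
        return "memtable_total_space_in_mb"
    if commitlog_mb >= commitlog_total_space_in_mb:
        return "commitlog_total_space_in_mb"
    return None


# Hypothetical node with an 8192 MB heap, so the Memtable limit is 2048 MB.
reason_idle = flush_reason(memtable_mb=100, commitlog_mb=10, heap_mb=8192)
reason_heap = flush_reason(memtable_mb=2048, commitlog_mb=10, heap_mb=8192)
reason_log = flush_reason(memtable_mb=100, commitlog_mb=1024, heap_mb=8192)
```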



What is an SSTable and what are its characteristics?

• An SSTable ("sorted string table") is

• an immutable file of sorted partitions

• written to disk through fast, sequential I/O

• a snapshot of the state of a Memtable at the time it was flushed

• The current data state of a CQL table consists of

• its corresponding Memtable plus

• all current SSTables flushed from that Memtable

• SSTables are periodically compacted from many to one
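
Because a table's current state is spread across its Memtable and its SSTables, reading one partition means merging them and keeping the newest value of each column. The Python sketch below shows that idea only; current_row and the sample data (including the value "Brown" and timestamp 600) are hypothetical, and the real read path is more involved.

```python
def current_row(key, memtable, sstables):
    """Merge one partition from every SSTable and the Memtable; newest cell wins."""
    merged = {}
    for source in list(sstables) + [memtable]:
        for column, (value, timestamp) in source.get(key, {}).items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return {column: value for column, (value, _) in merged.items()}


# Each store maps partition key -> {column: (value, timestamp)}.
older_sstable = {"key3": {"first": ("Betty", 541), "last": ("Blue", 541), "level": (63, 541)}}
newer_sstable = {"key3": {"level": (64, 583)}}
memtable = {"key3": {"last": ("Brown", 600)}}

row = current_row("key3", memtable, [older_sstable, newer_sstable])
```

The merged row takes "first" from the older SSTable, "level" from the newer one, and "last" from the Memtable.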

What is an SSTable and what are its characteristics?

• For each SSTable, two structures are created

• Partition index – list of its primary keys and row start positions

• Partition summary – in-memory sample of its partition index (default: 1 partition key of 128)

[Diagram: a Memtable flushes to SSTables on disk, each with its own Index and Summary]
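
How a sampled summary narrows a lookup in the full index can be sketched in Python. This is an illustration under assumptions: the function names and the fixed row size are hypothetical, and keys are compared as plain strings here, whereas Cassandra orders partitions by token.

```python
import bisect


def build_index(sorted_keys, row_size=100):
    # Partition index: every key with the offset where its row starts.
    return [(key, i * row_size) for i, key in enumerate(sorted_keys)]


def build_summary(index, interval=128):
    # Partition summary: every `interval`-th index entry and its index position.
    return [(key, pos) for pos, (key, _) in enumerate(index) if pos % interval == 0]


def lookup(key, index, summary, interval=128):
    # Find the closest sampled key at or before `key`, then scan at most
    # `interval` index entries starting from that position.
    sampled_keys = [sampled for sampled, _ in summary]
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None
    start = summary[slot][1]
    for candidate, offset in index[start:start + interval]:
        if candidate == key:
            return offset
    return None


keys = [f"key{i:05d}" for i in range(1000)]
index = build_index(keys)
summary = build_summary(index)
offset = lookup("key00300", index, summary)
```

With 1000 keys and a sampling interval of 128, the summary holds 8 entries, and each lookup scans at most 128 index entries instead of all 1000.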

What is compaction?

• Updates do mutate Memtable partitions, but SSTables are immutable

• no SSTable seeks/overwrites

• SSTables just accrue new timestamped updates

• So, SSTables must be periodically compacted

• related SSTables are merged

• most recent version of each column is compiled to one partition in one new SSTable

• partitions marked for deletion are evicted

• old SSTables are deleted

Note: Compaction and the Read Path are discussed in further detail later in this course.
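
The merge described above can be sketched in Python as follows. It is a simplified illustration: the Deleted marker and compact function are hypothetical, and this sketch drops deletion markers immediately, whereas real compaction applies further rules about when deleted data may be evicted.

```python
from typing import NamedTuple


class Deleted(NamedTuple):
    """Hypothetical marker: the whole partition was deleted at this timestamp."""
    timestamp: int


def compact(sstables):
    """Merge SSTables into one, keeping the newest cell per column."""
    cells = {}       # key -> {column: (value, timestamp)}
    deleted_at = {}  # key -> timestamp of the newest partition deletion
    for sstable in sstables:
        for key, partition in sstable.items():
            if isinstance(partition, Deleted):
                previous = deleted_at.get(key, partition.timestamp)
                deleted_at[key] = max(previous, partition.timestamp)
                continue
            target = cells.setdefault(key, {})
            for column, (value, ts) in partition.items():
                if column not in target or ts > target[column][1]:
                    target[column] = (value, ts)
    result = {}
    for key in sorted(cells):
        cutoff = deleted_at.get(key)
        # Keep only cells written after the newest deletion of this partition.
        live = {c: cell for c, cell in cells[key].items()
                if cutoff is None or cell[1] > cutoff}
        if live:
            result[key] = live
    return result


old = {"key1": {"level": (42, 100)}, "key2": {"first": ("Ricky", 100)}}
new = {"key1": {"level": (43, 200)}, "key2": Deleted(150)}
compacted = compact([old, new])
```

The result keeps only the newest "level" for key1 and evicts key2, whose data predates its deletion.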

What is the significance of idempotency?

[Diagram: the Coordinator sends the same write for partition key3 (first:Betty, last:Blue, level:63) twice. The Memtable (corresponds to a CQL table) holds partition key1 (first:Oscar, last:Orange, level:42), partition key2 (first:Ricky, last:Red), and both copies of partition key3: every column of the first copy carries timestamp 541, and every column of the second copy carries timestamp 583.]

• Due to the high per-operation overhead, Cassandra does not support transactional rollback ("two phase commit")

• As a result, a Cassandra client could receive an exception from a successful insert/update operation (e.g., TimedOutException due to network latency)

• Idempotent operation – produces the same result no matter how many times it is applied

• Inserts/updates are effectively idempotent when rerun with identical values

• Operations involving COUNTER columns are not idempotent

• Each column of any write is time-stamped, and only the most recent version of each column is read and kept by compaction
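
Why a retried insert is safe while a retried counter update is not can be shown with a short Python sketch. The helper names (apply_write, visible, increment_counter) and the counter key are hypothetical; the timestamps 541 and 583 follow the slide's diagram.

```python
def apply_write(table, key, columns, timestamp):
    """Apply a timestamped write; the newest timestamp wins per column."""
    partition = table.setdefault(key, {})
    for name, value in columns.items():
        if name not in partition or timestamp >= partition[name][1]:
            partition[name] = (value, timestamp)


def visible(table, key):
    """Return the values a reader would see, without timestamps."""
    return {name: value for name, (value, _) in table[key].items()}


def increment_counter(counters, key, delta):
    counters[key] = counters.get(key, 0) + delta


table = {}
betty = {"first": "Betty", "last": "Blue", "level": 63}
apply_write(table, "key3", betty, timestamp=541)
after_first = visible(table, "key3")
apply_write(table, "key3", betty, timestamp=583)  # client retry after a timeout
after_retry = visible(table, "key3")

counters = {}
increment_counter(counters, "plays", 1)
increment_counter(counters, "plays", 1)  # the same retry now double-counts
```

The retried insert leaves the visible row unchanged, while the retried increment leaves the counter at 2 instead of 1.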

Exercise 1: Insert data and observe the write path flow



Learning Objectives

• Understand how data is written to the storage engine



• Understand the data directories



Where are the data directories located?

• The SSTable and CommitLog directory locations are configured in conf/cassandra.yaml

• data_file_directories – if multiple locations are listed, data is balanced across them

• commitlog_directory – best practice is to place it on a separate disk

• By default, the files are all placed under /var/lib/cassandra (package install) or install_location/data (binary tarball)



Demo 2: Show data directory configuration in the
cassandra.yaml file



How are data directories created for a keyspace?

• Data directories are created by keyspace and table name / id

…/data/keyspace/tablename-tableid

CREATE KEYSPACE musicdb
WITH replication = {
  'class' : 'SimpleStrategy',
  'replication_factor' : 1
};

-- switch to the new keyspace so the table is created inside it
USE musicdb;

CREATE TABLE performer (
  name VARCHAR,
  type VARCHAR,
  country VARCHAR,
  style VARCHAR,
  founded INT,
  born INT,
  died INT,
  PRIMARY KEY (name)
);
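
For the musicdb.performer table above, the resulting directory name can be sketched in Python. This assumes the default package-install data location and assumes the table id suffix is the table's UUID written as 32 hex digits; the function name table_directory and the UUID value below are made up for illustration.

```python
import uuid
from pathlib import PurePosixPath


def table_directory(data_dir, keyspace, table, table_id):
    """Build <data_dir>/<keyspace>/<table>-<tableid> for one CQL table."""
    # Assumption: the id suffix is the table's UUID as 32 hex digits, no dashes.
    return PurePosixPath(data_dir) / keyspace / f"{table}-{table_id.hex}"


# Hypothetical table id; a real cluster assigns its own.
performer_id = uuid.UUID("01234567-89ab-cdef-0123-456789abcdef")
path = table_directory("/var/lib/cassandra/data", "musicdb", "performer", performer_id)
```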



What files result from Memtable flush or compaction?

• Data file names are built from the keyspace name and table name, plus

• Version – SSTable format version (e.g., 'ka' is Cassandra 2.1)

• Generation – incremented each time a new SSTable is written, whether by a Memtable flush or by compaction

• Component – describes the type of file content

• <keyspace>-<table>-<version>-<generation>-<component>
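
A file name in this format can be split back into its parts with a few lines of Python. This sketch assumes the keyspace and table names contain no hyphens; SSTableFile, parse_sstable_filename, and the sample file name are illustrative.

```python
from typing import NamedTuple


class SSTableFile(NamedTuple):
    keyspace: str
    table: str
    version: str
    generation: int
    component: str


def parse_sstable_filename(filename):
    """Split <keyspace>-<table>-<version>-<generation>-<component>."""
    # Split on the first four hyphens only, so the component stays whole.
    keyspace, table, version, generation, component = filename.split("-", 4)
    return SSTableFile(keyspace, table, version, int(generation), component)


parsed = parse_sstable_filename("musicdb-performer-ka-1-Data.db")
```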



What files result from Memtable flush or compaction?

• -CompressionInfo.db – metadata for Data file compression



• -Data.db – base SSTable data including

• row key, data size, columns index, row level tombstone info, column count, and
column list in sorted order by name

• -Filter.db – SSTable partition keys Bloom filter, to optimize reads

• -Index.db – index for this SSTable, used to optimize reads

• sorted row keys mapped to offsets in Data file; newer versions also include
column index, tombstone, and bloom filter info

• -Statistics.db – statistics for this SSTable

• row size and column count estimate, generation numbers of files from which
this SSTable was compacted, more

• -Summary.db – sampling from Index file, used to optimize reads

• sample size determined by index_interval (default: 1 of each 128)

• -TOC.txt – component list for this SSTable

What is sstable2json?

• tools/bin/sstable2json is a utility which exports an SSTable in JSON format, for testing and debugging

• -k – display only the partitions for the specified set of keys (limit: 500)

• -x – exclude a specified set of keys (limit: 500)

• -e – enumerate keys only

./sstable2json [full_path_to_SSTable_Data_file] | more
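
When scripting the utility, the argument list can be assembled in Python before handing it to something like subprocess.run. The sketch below only builds the list and does not invoke the tool. The helper name, the Data file path, and the choice to put the file before the options and to repeat -k or -x once per key are assumptions to verify against the tool's usage output.

```python
MAX_KEYS = 500  # limit quoted on the slide for -k and -x


def sstable2json_command(data_file, keys=(), exclude=(), enumerate_only=False,
                         tool="./sstable2json"):
    """Build an argument list for sstable2json without running it."""
    if len(keys) > MAX_KEYS or len(exclude) > MAX_KEYS:
        raise ValueError(f"at most {MAX_KEYS} keys may be given to -k or -x")
    command = [tool, data_file]
    if enumerate_only:
        command.append("-e")       # enumerate keys only
    for key in keys:
        command += ["-k", key]     # display only these partitions
    for key in exclude:
        command += ["-x", key]     # leave these partitions out
    return command


# Hypothetical Data file path, for illustration only.
command = sstable2json_command("musicdb-performer-ka-1-Data.db", keys=["Oscar", "Betty"])
```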



Exercise 3: Insert data and observe SSTables created



Summary

• Cassandra writes fast because it sequentially appends to a log, without seeking

• A Memtable is an in-memory structure corresponding to a CQL
table and its indexes

• The CommitLog is an append-only log, replayed to restore a downed
node's Memtables

• SSTables are Memtable snapshots periodically flushed to disk

• Compaction is a periodic process to merge and optimize SSTables

• CommitLog accrues in memory and syncs to disk in a batch or
periodic manner

• When Memtables flush to SSTables, heap memory is cleared and the
CommitLog is truncated

• Flush happens at memtable_total_space_in_mb, commitlog_total_space_in_mb, or nodetool flush

Summary

• Total table data is the current state of a Memtable plus its SSTables

• Writes should be idempotent, because a write may persist even if its acknowledgment fails

• Each column in any write is time-stamped; only the most current
are read and compacted

• data_file_directories and commitlog_directory are set in cassandra.yaml

• Each CQL table in a keyspace has a corresponding keyspace name/
table name/table id folder

• Data file names are composed of keyspace-table-version-generation-component, where the component carries the file extension (e.g., Data.db, TOC.txt)

• Data file components are: Data, Index, Summary, Filter,
CompressionInfo, Statistics, TOC

• sstable2json converts an SSTable to JSON for debug/testing



Review Questions

• What happens when a Memtable is flushed?



• What causes a Memtable to flush?

• What is the relationship of a CQL table to Memtables and SSTables?

• Do disk seeks happen during writes?

• How are data files organized?


