https://youtu.be/SCmN2Sr7fqE
The design of a time-series database for metrics aggregation focuses on
efficient storage and retrieval of time-series data. Here are the key
components and design considerations:
1. Data Model - we organize data into time series, which are uniquely
identified by a metric name and a set of key-value pairs called
labels. Each time series represents a stream of timestamped
numerical data points.
2. Storage Format - we persist data points on disk using a combination
of memory-mapped files and an append-only write-ahead log. The data
is compressed and organized into chunks, making it space-efficient
while still allowing fast access.
3. Chunking and Block Storage - we divide time series data into
fixed-size units called blocks. Each block represents a time range
and contains multiple time series. Within a block, data is stored in
chunks, which are further divided into smaller segments for
efficient compression and retrieval. Blocks are stored on disk and
can be selectively loaded into memory based on the time range of
the queried data.
4. Indexing - we maintain indexes to facilitate efficient querying. We
use an inverted bitmap index, which maps each label name and value
pair to the set of time series that carry it. This allows for fast
filtering and selection of time series based on labels and label
values.
5. Data Compaction - we optimize storage and query performance by
periodically merging and compacting chunks, which eliminates
redundant data points and reduces the overall storage size.
6. Retention and Data Expiration - we define a data retention policy
to determine how long the data should be retained in the
time-series database. The retention period can be configured based
on time or storage constraints. Expired data is purged from the
database during the compaction process.
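
To make the components above more concrete, here is a minimal in-memory
sketch of the data model, block layout, label index, and retention. It is
an illustration under stated assumptions, not the actual implementation:
the names TinyTSDB, SeriesKey, Block, block_span, and retention are
hypothetical, the reserved label "__name__" for the metric name is a
choice made for this sketch, plain Python sets stand in for bitmaps, and
nothing is compressed or written to disk.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one time series: metric name plus its labels."""
    metric: str
    labels: frozenset  # frozenset of (label_name, label_value) pairs


@dataclass
class Block:
    """All data points whose timestamps fall in [start, end)."""
    start: int
    end: int
    # SeriesKey -> list of (timestamp, value) points
    series: dict = field(default_factory=lambda: defaultdict(list))


class TinyTSDB:
    """Hypothetical in-memory model; sets stand in for bitmaps."""

    def __init__(self, block_span, retention):
        self.block_span = block_span  # time range covered by one block
        self.retention = retention    # how long data is kept
        self.blocks = {}              # block start time -> Block
        # Inverted index: (label_name, label_value) -> set of SeriesKey
        self.index = defaultdict(set)

    def append(self, metric, labels, ts, value):
        key = SeriesKey(metric, frozenset(labels.items()))
        start = ts - ts % self.block_span
        block = self.blocks.setdefault(
            start, Block(start, start + self.block_span))
        block.series[key].append((ts, value))
        # Index the metric name under an assumed reserved label.
        self.index[("__name__", metric)].add(key)
        for pair in labels.items():
            self.index[pair].add(key)

    def select(self, matchers, t_min, t_max):
        """Points in [t_min, t_max] for series matching every matcher."""
        sets = [self.index.get(pair, set()) for pair in matchers.items()]
        keys = set.intersection(*sets) if sets else set()
        out = {}
        for start in sorted(self.blocks):
            block = self.blocks[start]
            # Skip blocks that do not overlap the queried time range.
            if block.end <= t_min or block.start > t_max:
                continue
            for key in keys:
                pts = [(t, v) for t, v in block.series.get(key, [])
                       if t_min <= t <= t_max]
                if pts:
                    out.setdefault(key, []).extend(pts)
        return out

    def purge(self, now):
        """Drop blocks that lie entirely before the retention cutoff.

        Index entries are left in place in this sketch; series with no
        remaining points simply return nothing from select().
        """
        cutoff = now - self.retention
        expired = [s for s, b in self.blocks.items() if b.end <= cutoff]
        for s in expired:
            del self.blocks[s]
        return len(expired)
```

A query such as select({"__name__": "http_requests", "code": "200"}, 0, 200)
first intersects the index entries for each matcher, then reads only the
blocks whose time range overlaps the query. Dropping whole blocks in
purge() mirrors the idea that expired data is removed at block
granularity rather than point by point.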