Tags: datastax/cassandra
CNDB-15623: Only use write path for CDC tables in CassandraStreamReceiver if CDC is enabled on the node (#2043) Repairs use the local write path for streams on CDC-enabled tables, based on table schema. This interacts poorly with the separation of CNDB services. This commit fixes the issue by only using the CDC write path for a stream if CDC is enabled in the node's configuration (as well as in the schema). This avoids attempting to use the local write path if commitlog-based CDC is not enabled.
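The condition described in this commit can be sketched as a simple conjunction. This is a minimal illustration, not the code in `CassandraStreamReceiver`: the class name, method name, and parameters are hypothetical, and the real receiver may take other factors into account when choosing the write path.

```java
// Hypothetical sketch; these names do not come from the Cassandra code base.
public final class CdcStreamWritePath {
    private CdcStreamWritePath() {}

    /**
     * Decide whether a streamed table should go through the local write path for CDC.
     *
     * @param cdcEnabledInSchema whether CDC is enabled on the table in the schema
     * @param cdcEnabledOnNode   whether CDC is enabled in this node's configuration
     * @return true only when both the schema and the node enable CDC
     */
    public static boolean useWritePathForCdc(boolean cdcEnabledInSchema, boolean cdcEnabledOnNode) {
        // Before the fix, only the schema flag was consulted.
        return cdcEnabledInSchema && cdcEnabledOnNode;
    }

    public static void main(String[] args) {
        System.out.println(useWritePathForCdc(true, false)); // prints false
        System.out.println(useWritePathForCdc(true, true));  // prints true
    }
}
```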
CNDB-14861: Fix usage of PrimaryKeyWithSource in SAI The PrimaryKeyWithSource class has been present for two years in the code base as an optimization for hybrid vector workloads, which have to materialize many primary keys in the search-then-sort query path. However, the logic is invalid for version aa (because we have the bug where compacted sstables write per row, not per partition) and it is also invalid for static columns. This commit avoids creation of PrimaryKeyWithSource in those cases. (cherry picked from commit e942cae)
CNDB-15485: Fix ResultRetriever key comparison to prevent dupes in result set (#2023)

### What is the issue

riptano/cndb#15485

### What does this PR fix and why was it fixed

This PR fixes a bug introduced to this branch via #1884. The bug only impacts SAI file format `aa` when the index file was produced via compaction, which is why the modified test simply adds coverage to compact the table and hit the bug.

The bug happens when an iterator produces the same partition across two different batch fetches from storage. These keys were not collapsed in the `key.equals(lastKey)` logic because compacted indexes use a row id per row instead of per partition, and the logic in `PrimaryKeyWithSource` considers rows with different row ids to be distinct. However, when we went to materialize a batch from storage, we hit this code:

```java
ClusteringIndexFilter clusteringIndexFilter = command.clusteringIndexFilter(firstKey.partitionKey());
if (cfs.metadata().comparator.size() == 0 || firstKey.hasEmptyClustering()) {
    return clusteringIndexFilter;
} else {
    nextClusterings.clear();
    for (PrimaryKey key : keys)
        nextClusterings.add(key.clustering());
    return new ClusteringIndexNamesFilter(nextClusterings, clusteringIndexFilter.isReversed());
}
```

which returned `clusteringIndexFilter` for `aa` because those indexes do not have the clustering information. Therefore, each batch fetched the whole partition (which was subsequently filtered to the proper results), and produced a multiplier effect where we saw `batch` many duplicates.

This fix works by comparing partition keys and clustering keys directly, which is a return to the old comparison logic from before #1884. There was actually a discussion about this in the PR to `main`, but unfortunately, we missed this case: #1883 (comment). A more proper long-term fix might be to remove the logic of creating a `PrimaryKeyWithSource` for AA indexes. However, I preferred this approach because it is essentially a `revert` rather than a fix-forward solution.
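To illustrate why comparing partition and clustering keys directly removes the duplicates, here is a minimal sketch. The `Key` class is a hypothetical stand-in, not the real `PrimaryKey` or `PrimaryKeyWithSource`, and `collapse` only mimics the idea of skipping a key equal to the last one seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these types are simplified stand-ins for the SAI key classes.
public final class KeyDedupSketch {
    /** A key that carries an index row id next to its partition key and clustering. */
    public static final class Key {
        final String partitionKey;
        final String clustering; // empty when the index stores no clustering information
        final long rowId;        // differs per row in this sketch, as in compacted `aa` indexes

        public Key(String partitionKey, String clustering, long rowId) {
            this.partitionKey = partitionKey;
            this.clustering = clustering;
            this.rowId = rowId;
        }

        /** Compares partition key and clustering only; the row id is ignored. */
        public boolean samePosition(Key other) {
            return other != null
                   && partitionKey.equals(other.partitionKey)
                   && clustering.equals(other.clustering);
        }
    }

    /** Drops consecutive keys that refer to the same partition key and clustering. */
    public static List<Key> collapse(List<Key> sortedKeys) {
        List<Key> result = new ArrayList<>();
        Key last = null;
        for (Key key : sortedKeys) {
            if (!key.samePosition(last)) {
                result.add(key);
                last = key;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two rows of partition p1 with different row ids and no clustering information.
        List<Key> keys = Arrays.asList(new Key("p1", "", 0), new Key("p1", "", 1), new Key("p2", "", 2));
        System.out.println(collapse(keys).size()); // prints 2
    }
}
```

A row-id-sensitive comparison would keep all three keys in this example, so the partition `p1` would be fetched twice.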
CNDB-15452: Split SAI metrics query types into disjoint categories (#2015) It's simpler to understand SAI query metrics when they are split into granular, non-overlapping categories. Because the categories do not overlap, any combination of them is meaningful, and they can also be visualized in stacked charts. Additionally, a bug was fixed that prevented proper updates of the SortThenFilterQueriesCompleted and FilterThenSortQueriesCompleted metrics for non-ANN TopK queries and for some non-hybrid queries. Now those metrics are incremented by all hybrid TopK queries, and only by those.
CNDB-14577: Compact all SSTables of a level shard if their number reaches a limit (#1873) (#1925)

CNDB-14577: [UCS by default does not compact many small non-overlapping sstables with very few rows](riptano/cndb#14577)

This PR limits the number of SSTables for a given compaction level shard by executing a major compaction of the shard instead of the regular compaction of overlapping SSTables if the number of SSTables reaches a threshold. The threshold is controlled by the `max_sstables_per_shard_factor` setting:

```md
`max_sstables_per_shard_factor` Limits the number of SSTables per shard. If the number of sstables in a shard
exceeds this factor times the shard compaction threshold, a major compaction of the shard will be triggered.
Some conditions like slow writes can lead to SSTables being very small, and never overlap with enough other
SSTables to be compacted. So this setting is useful to prevent the number of SSTables in a shard from growing
too large, which can cause problems due to the per-sstable overhead. Also these small SSTables may still have
overlaps even if under the compaction threshold (eg. due to write replicas) and never compacting them wastes
storage space.
The default value is 10.
```

---------

### What is the issue

...

### What does this PR fix and why was it fixed

...

Co-authored-by: Christophe Bornet <[email protected]>
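The trigger described for `max_sstables_per_shard_factor` can be sketched as a single comparison. This is an illustration under stated assumptions, not the UCS implementation: the class and method names are hypothetical, the factor is modeled as a `double`, the threshold value of 4 is only an example, and the strict "exceeds" comparison follows the setting's documentation quoted above.

```java
// Hypothetical sketch; these names do not come from the UCS code base.
public final class ShardCompactionSketch {
    private ShardCompactionSketch() {}

    /**
     * @param sstablesInShard           number of SSTables currently in the shard
     * @param compactionThreshold       the shard compaction threshold
     * @param maxSSTablesPerShardFactor the `max_sstables_per_shard_factor` setting (documented default: 10)
     * @return true when the shard holds more SSTables than factor times threshold
     */
    public static boolean shouldCompactWholeShard(int sstablesInShard,
                                                  int compactionThreshold,
                                                  double maxSSTablesPerShardFactor) {
        return sstablesInShard > maxSSTablesPerShardFactor * compactionThreshold;
    }

    public static void main(String[] args) {
        int threshold = 4;    // example value, assumed for illustration
        double factor = 10.0; // documented default
        System.out.println(shouldCompactWholeShard(40, threshold, factor)); // prints false
        System.out.println(shouldCompactWholeShard(41, threshold, factor)); // prints true
    }
}
```

With these example values the limit is 40 SSTables, so the 41st small SSTable in the shard would trigger a major compaction of that shard even if no regular overlap-based compaction is pending.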
A CC version based on https://github.com/datastax/cassandra/releases/tag/cndb-main-release-202505-HF3 that contains HCD 1.2.3-specific patches.
CNDB-11666: Batch clusterings into single SAI partition post-filtering reads (#1897) Port of CASSANDRA-19497. Co-authored-by: Caleb Rackliffe <[email protected]> Co-authored-by: Michael Marshall <[email protected]> Co-authored-by: Andrés de la Peña <[email protected]>
Hcd-130 incremental repair failure during compaction (#1743)

### What is the issue

Concurrent and incremental repairs would spin fail or deadlock.

### What does this PR fix and why was it fixed

Concurrent and incremental repairs would spin fail. This patch:

- Removes an optimization failing to observe max parallelism
- Provides an improved algorithm to enforce max parallelism
- Closes transactions on some exceptions failing to be caught
- Removes a deadlock between cfs and the compaction strategy for long running sequential operations
CNDB-14602: Fix bytes-based paging for partition deletions (#1836) Only preserve the original data size or rows in case of purging. Fixes DBPE-16935. Cherry picked from riptano/cndb#14602, which was merged to main as c5e2e64