
Raft: Support database replication via Raft. #10

Draft: wants to merge 60 commits into base: master

lnikon (Owner) commented Dec 28, 2024

Summary by CodeRabbit

  • New Features

    • Added Raft consensus module for distributed replication, including gRPC-based cluster communication and leader election.
    • Introduced flexible Write-Ahead Log (WAL) system supporting in-memory and persistent storage backends.
    • Enhanced configuration schema with explicit sections for WAL, LSM tree, and server node/peer settings.
    • Added Prometheus metrics example and Raft/replica launch scripts for cluster deployment.
  • Improvements

    • Refactored database and LSM tree architecture for modularity, move semantics, and builder patterns.
    • Updated configuration and manifest handling for better reliability and extensibility.
    • Expanded CMake and Conan configurations to support new dependencies and modular builds.
    • Improved code formatting, style consistency, and project organization.
  • Bug Fixes

    • Fixed segment overlap detection in LSM tree merging.
    • Added assertion checks for manifest updates in compaction and LSM operations.
  • Documentation

    • Added initial design document and updated JSON schema documentation.
  • Chores

    • Updated .gitignore, code style, and static analysis settings.
    • Added Python requirements and updated build/test profiles.

coderabbitai bot commented Dec 28, 2024

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update introduces a comprehensive Raft consensus implementation with gRPC communication and persistent state, integrates a modular Write-Ahead Log (WAL) system with pluggable storage backends, and refactors the database, LSM tree, and manifest layers for explicit dependency injection and move semantics. The build system, configuration schema, and test infrastructure are correspondingly expanded and modernized.

Changes

File(s) / Path(s) Change Summary
Raft Consensus & gRPC
lib/raft/raft.cpp, lib/raft/raft.h, lib/raft/raft_test.cpp, lib/raft/persistence.h, lib/proto/Raft.proto, lib/proto/CMakeLists.txt, examples/raft/main.cpp, examples/raft/CMakeLists.txt
Implements a full Raft consensus module with gRPC-based peer communication, persistent state, leader election, log replication, and unit tests. Adds new protobuf definitions and build targets for Raft.
WAL System & Storage Abstractions
lib/wal/common.h, lib/wal/common.cpp, lib/wal/config.h, lib/wal/wal.h, lib/wal/CMakeLists.txt, lib/wal/log/log.h, lib/wal/log/concepts.h, lib/wal/log/storage/builder.h, lib/wal/log/storage/in_memory_log_storage.h, lib/wal/log/storage/persistent_log_storage.h, lib/wal/log/storage/backend/backend.h, lib/wal/log/storage/backend/append_only_file_storage_backend.h, lib/wal/log/storage/log_storage_crtp.h
Introduces a modular Write-Ahead Log system with in-memory and persistent (file) backends, builder patterns, concepts for type safety, and unified log storage wrappers.
Database, LSM Tree, Manifest Refactor
lib/db/db.h, lib/db/db.cpp, lib/db/db_config.h, lib/db/db_config.cpp, lib/db/manifest/manifest.h, lib/db/manifest/manifest.cpp, lib/structures/lsmtree/lsmtree.h, lib/structures/lsmtree/lsmtree.cpp, lib/structures/lsmtree/levels/level.h, lib/structures/lsmtree/levels/level.cpp, lib/structures/lsmtree/levels/levels.h, lib/structures/lsmtree/levels/levels.cpp
Refactors to explicitly inject dependencies (WAL, manifest, LSM tree), adds move semantics, status codes, and builder classes. Manifest and LSM tree now rely on WAL for persistence.
Filesystem Layer
lib/fs/append_only_file.h, lib/fs/append_only_file.cpp, lib/fs/random_access_file.h, lib/fs/random_access_file.cpp, lib/fs/random_access_file_test.cpp, lib/fs/common.h, lib/fs/common.cpp
Adds async file I/O via io_uring, append-only and random access file abstractions, error handling, and a full test suite.
POSIX Wrapper
lib/posix_wrapper/open_flag.h, lib/posix_wrapper/CMakeLists.txt, lib/posix_wrapper/posix_wrapper_test.cpp
Introduces strongly-typed POSIX open flags and tests.
Concurrency Utilities
lib/concurrency/thread_safe_queue.h, lib/concurrency/helpers.h
Adds move semantics, dual mutex guards, and helper functions for thread-safe containers.
Configuration & Schema
assets/database_config_schema.json, lib/config/config.h, lib/config/config.cpp, lib/server/server_config.h, src/main.cpp, assets/tkvpp_config_1.json, assets/tkvpp_config_2.json, assets/tkvpp_config_3.json, assets/tkvpp_config_standalone.json, assets/tkvpp_config_replicated_single_node.json
Expands configuration schema for WAL and Raft, updates config loading, and adds new example configs for different deployment modes.
Build System & Project Structure
CMakeLists.txt, conanfile.txt, src/CMakeLists.txt, lib/CMakeLists.txt, lib/db/CMakeLists.txt, lib/config/CMakeLists.txt, lib/concurrency/CMakeLists.txt, lib/fs/CMakeLists.txt, lib/server/CMakeLists.txt, lib/structures/CMakeLists.txt, lib/structures/hashindex/CMakeLists.txt, lib/structures/lsmtree/CMakeLists.txt, lib/structures/memtable/CMakeLists.txt, lib/structures/skiplist/CMakeLists.txt, lib/structures/sorted_vector/CMakeLists.txt, bench/CMakeLists.txt, bench/db/CMakeLists.txt, bench/fs/CMakeLists.txt, examples/CMakeLists.txt, examples/absl/CMakeLists.txt, examples/absl/concurrency/CMakeLists.txt, examples/embedded/CMakeLists.txt, examples/prometheus/CMakeLists.txt, examples/raft/CMakeLists.txt
Renames project to "frankie", adds Raft, WAL, and POSIX modules, updates dependencies, and expands build/test targets.
Testing & Examples
lib/raft/raft_test.cpp, lib/fs/random_access_file_test.cpp, lib/posix_wrapper/posix_wrapper_test.cpp, examples/prometheus/main.cpp, examples/prometheus/CMakeLists.txt, examples/iouring/libiouring/main.cpp, examples/iouring/raw_uring/main.cpp
Adds new test suites and example programs for Raft, filesystem, POSIX, Prometheus metrics, and io_uring.
Miscellaneous
.gitignore, .clang-format, .clang-tidy, .vscode/launch.json, .vscode/settings.json, requirements.txt, DESIGN.md, main_run_replicas.sh, raft_run_replicas.sh
Simplifies .gitignore, updates formatting and linter configs, adds debug launch configs, requirements, design doc, and helper scripts for running clusters.
Removed Legacy WAL
lib/db/wal/wal.cpp, lib/db/wal/wal.h, assets/tkvpp_config.json
Removes old WAL implementation and outdated config.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RaftNode (Leader)
    participant RaftNode (Follower)
    participant WAL
    participant LSMTree
    participant Manifest

    Client->>RaftNode (Leader): Put(key, value)
    RaftNode (Leader)->>WAL: Add log entry
    RaftNode (Leader)->>RaftNode (Follower): AppendEntries RPC
    RaftNode (Follower)->>WAL: Add log entry
    RaftNode (Follower)-->>RaftNode (Leader): AppendEntriesResponse
    RaftNode (Leader)->>LSMTree: Commit entry
    LSMTree->>Manifest: Update manifest
    Note right of RaftNode (Leader): On majority, commit and apply

Possibly related PRs

  • lnikon/tinykvpp#9: Adds a single debug configuration for Main in .vscode/launch.json, which is related to the broader debug and Raft integration in this PR.

Poem

A rabbit hopped to Raft’s new tune,
With logs that write and files that swoon.
Consensus reached across the land,
With WAL and LSM hand in hand.
Frankie’s code now leaps and bounds,
In clusters, tests, and disk I/O rounds.
🐇✨ The future’s bright—let’s dig new ground!

coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (7)
examples/raft/thread_safety.h (2)

52-96: Commented-out Mutex class

The annotated Mutex class is currently commented out. Either remove it to reduce clutter or, if it is still needed, enable it with a working implementation.


229-279: Deprecated lock-style annotations

The lock-style annotations at the end of this file are marked as deprecated. If the project no longer relies on them, removing the dead code can improve readability. Otherwise, keep them in a dedicated legacy-compatibility file or mark them with a deprecation notice to guide users to the new annotation approach.

examples/raft/main.cpp (1)

58-58: Consider passing 'ip' by const reference

cppcheck suggests passing 'ip' by const reference to avoid an unnecessary string copy when constructing NodeClient.

- NodeClient(const ID id, const IP ip)
+ NodeClient(const ID id, const IP& ip)
🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 58-58: Function parameter 'ip' should be passed by const reference.

(passedByValue)
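Because NodeClient keeps its own copy of the address (the channel is later created from m_ip), another common idiom is a sink parameter: take `ip` by value and move it into the member, so a temporary argument costs a move rather than a copy. A minimal sketch, assuming `ID` is an unsigned integer and `IP` an alias for `std::string`; `NodeClientSketch` is hypothetical and leaves out the gRPC channel and stub:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Assumed aliases; the definitions in examples/raft/main.cpp may differ.
using ID = std::uint32_t;
using IP = std::string;

// Hypothetical stand-in for NodeClient without the gRPC channel and stub.
class NodeClientSketch
{
  public:
    // Sink parameter: an lvalue argument costs one copy plus one move,
    // an rvalue argument only moves.
    NodeClientSketch(const ID id, IP ip)
        : m_id{id},
          m_ip{std::move(ip)}
    {
        assert(!m_ip.empty() && "peer address must not be empty");
    }

    [[nodiscard]] ID getId() const noexcept
    {
        return m_id;
    }

    [[nodiscard]] const IP& getIp() const noexcept
    {
        return m_ip;
    }

  private:
    ID m_id;
    IP m_ip;
};
```

If this idiom were adopted in the real class, `m_ip` would still have to be declared before `m_channel`, since members are initialized in declaration order and the channel's initializer reads `m_ip`.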

lib/proto/Raft.proto (1)

45-45: Inconsistent field naming: “lastLogterm”

For consistency with other fields like term, lastLogIndex, and typical naming conventions, consider renaming lastLogterm to lastLogTerm.

- uint32 lastLogterm = 4;
+ uint32 lastLogTerm = 4;
lib/proto/CMakeLists.txt (3)

40-41: Remove duplicate variable definitions.

These variables are already defined above (lines 11-12) and don't need to be redefined since they haven't changed.

-set(PROTO_BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}")
-set(PROTO_IMPORT_DIRS "${CMAKE_CURRENT_LIST_DIR}")

34-62: Consider reducing code duplication in proto generation setup.

The Raft proto setup duplicates the same pattern as TinyKVPP. Consider creating a function to reduce duplication and improve maintainability.

Example approach:

function(setup_proto_library TARGET_NAME PROTO_FILE)
    if(NOT EXISTS "${PROTO_FILE}")
        message(FATAL_ERROR "Proto file not found: ${PROTO_FILE}")
    endif()

    add_library(${TARGET_NAME} OBJECT "${PROTO_FILE}")
    target_include_directories(${TARGET_NAME} PUBLIC 
        "$<BUILD_INTERFACE:${PROTO_BINARY_DIR}>" 
        ${protobuf_INCLUDE_DIR})
    target_link_libraries(${TARGET_NAME} PUBLIC 
        protobuf::libprotobuf 
        gRPC::grpc++)

    protobuf_generate(
        TARGET ${TARGET_NAME}
        IMPORT_DIRS ${PROTO_IMPORT_DIRS}
        PROTOC_OUT_DIR "${PROTO_BINARY_DIR}"
    )

    protobuf_generate(
        TARGET ${TARGET_NAME}
        LANGUAGE grpc
        PLUGIN protoc-gen-grpc=$<TARGET_FILE:gRPC::grpc_cpp_plugin>
        PLUGIN_OPTIONS generate_mock_code=true
        GENERATE_EXTENSIONS .grpc.pb.h .grpc.pb.cc)
endfunction()

# Usage:
setup_proto_library(TKVProtoObjects "${TINYKVPP_PROTO_FILE}")
setup_proto_library(RaftProtoObjects "${RAFT_PROTO_FILE}")

1-3: Add version requirements for protobuf and gRPC.

Since this project relies heavily on protobuf and gRPC for the Raft implementation, consider adding minimum version requirements to ensure compatibility.

 cmake_minimum_required(VERSION 3.25)
 project(zkv)
+
+find_package(Protobuf 3.21 REQUIRED)
+find_package(gRPC 1.50 REQUIRED)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 928ff53 and 10396a7.

📒 Files selected for processing (10)
  • .vscode/launch.json (2 hunks)
  • CMakeLists.txt (1 hunk)
  • examples/CMakeLists.txt (1 hunk)
  • examples/raft/CMakeLists.txt (1 hunk)
  • examples/raft/main.cpp (1 hunk)
  • examples/raft/thread_safety.h (1 hunk)
  • lib/proto/CMakeLists.txt (2 hunks)
  • lib/proto/Raft.proto (1 hunk)
  • lib/structures/lsmtree/levels/level.cpp (1 hunk)
  • run_replicas.sh (1 hunk)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/main.cpp

[performance] 58-58: Function parameter 'ip' should be passed by const reference.

(passedByValue)

🔇 Additional comments (6)
examples/raft/thread_safety.h (2)

6-10: Clang-specific annotations and portability concern

These conditional macros work well for Clang's thread safety analysis. With a compiler that does not support Clang's thread-safety attributes, the macros expand to nothing: the header still compiles, but no locking rules are checked. Make sure consumers of this header know that the annotations are only enforced under Clang.


108-124: Effective RAII-based locking

The MutexLocker class employs a clean RAII pattern to ensure mutex acquisition and release. This pattern mitigates risks of forgetting to unlock in exceptional code paths. The approach is sound and safe.
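For reference, the three pieces discussed above (compiler-guarded attribute macros, an annotated mutex, and an RAII locker) fit together roughly as below. This is a sketch in the style of Clang's Thread Safety Analysis documentation, not the contents of thread_safety.h; all macro and class names are illustrative. Under Clang with `-Wthread-safety`, touching `m_value` without holding `m_mutex` is reported at compile time; other compilers build the same code without the check.

```cpp
#include <cassert>
#include <mutex>

// Clang thread-safety attributes; expand to nothing on other compilers.
#if defined(__clang__)
#define TS_ATTRIBUTE(x) __attribute__((x))
#else
#define TS_ATTRIBUTE(x)
#endif

#define TS_CAPABILITY(x) TS_ATTRIBUTE(capability(x))
#define TS_SCOPED_CAPABILITY TS_ATTRIBUTE(scoped_lockable)
#define TS_GUARDED_BY(x) TS_ATTRIBUTE(guarded_by(x))
#define TS_ACQUIRE(...) TS_ATTRIBUTE(acquire_capability(__VA_ARGS__))
#define TS_RELEASE(...) TS_ATTRIBUTE(release_capability(__VA_ARGS__))
#define TS_NO_ANALYSIS TS_ATTRIBUTE(no_thread_safety_analysis)

// Annotated wrapper so the analysis can track lock/unlock. Its own bodies are
// excluded from analysis because std::mutex is not annotated in every stdlib.
class TS_CAPABILITY("mutex") AnnotatedMutex
{
  public:
    void lock() TS_ACQUIRE() TS_NO_ANALYSIS { m_mutex.lock(); }
    void unlock() TS_RELEASE() TS_NO_ANALYSIS { m_mutex.unlock(); }

  private:
    std::mutex m_mutex;
};

// RAII locker: acquires in the constructor and releases in the destructor,
// so early returns and exceptions cannot leave the mutex locked.
class TS_SCOPED_CAPABILITY AnnotatedLocker
{
  public:
    explicit AnnotatedLocker(AnnotatedMutex* mu) TS_ACQUIRE(mu) : m_mu{mu} { mu->lock(); }
    ~AnnotatedLocker() TS_RELEASE() { m_mu->unlock(); }

    AnnotatedLocker(const AnnotatedLocker&) = delete;
    AnnotatedLocker& operator=(const AnnotatedLocker&) = delete;

  private:
    AnnotatedMutex* m_mu;
};

// Example of state guarded by the annotated mutex.
class Counter
{
  public:
    void increment()
    {
        AnnotatedLocker lock(&m_mutex);
        ++m_value;
    }

    int get()
    {
        AnnotatedLocker lock(&m_mutex);
        return m_value;
    }

  private:
    AnnotatedMutex m_mutex;
    int m_value TS_GUARDED_BY(m_mutex) = 0;
};
```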

examples/raft/main.cpp (1)

84-85: Review election timeout usage

Using generateRandomTimeout() directly as the RPC deadline conflates two different timeouts: the randomized election timeout decides when a follower starts an election, while an RPC deadline bounds a single call. If the drawn timeout is short, requests may fail prematurely and disrupt the election. Validate that this matches the intended design.
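The distinction can be sketched as two separate settings: a randomized election timeout that only decides when a follower starts an election, and a shorter, fixed deadline for each RPC. The Raft paper's timing requirement (broadcast time much smaller than the election timeout) suggests keeping the per-RPC budget well below the minimum election timeout. The 150-300 ms window mirrors the example's hardcoded bounds; the 50 ms RPC budget and all names are assumptions:

```cpp
#include <cassert>
#include <chrono>
#include <random>

// 150-300 ms mirrors the example's bounds; the 50 ms RPC budget is an assumption.
constexpr std::chrono::milliseconds kMinElectionTimeout{150};
constexpr std::chrono::milliseconds kMaxElectionTimeout{300};
constexpr std::chrono::milliseconds kRpcTimeout{50};

static_assert(kRpcTimeout < kMinElectionTimeout,
              "an RPC should be able to fail and be retried within one election timeout");

// Randomized election timeout, re-drawn every time the election timer is reset.
std::chrono::milliseconds randomElectionTimeout(std::mt19937& rng)
{
    std::uniform_int_distribution<long long> dist(kMinElectionTimeout.count(),
                                                  kMaxElectionTimeout.count());
    return std::chrono::milliseconds{dist(rng)};
}

// Deadline for a single RPC, independent of the election timer.
std::chrono::system_clock::time_point rpcDeadline(std::chrono::system_clock::time_point now)
{
    return now + kRpcTimeout;
}
```

In the gRPC client, the value of `rpcDeadline(std::chrono::system_clock::now())` would be passed to `grpc::ClientContext::set_deadline`, while `randomElectionTimeout` would only reset the election timer.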

examples/CMakeLists.txt (1)

6-6: New subdirectory addition

Adding the raft subdirectory properly includes the new Raft module. Ensure that all required libraries and dependencies (like grpc, spdlog, and the proto files) are consistently linked in the raft subdirectory’s CMakeLists.txt.

run_replicas.sh (1)

3-5: Verify the deployment strategy for production

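The script gives every node a 0.0.0.0 address (the wildcard address, not a specific host). Peers can reach each other through it only while all replicas run on the same machine, which is fine for development but would need changing for a production deployment.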

Please clarify:

  1. Is this script intended for development only?
  2. For production deployment, will there be:
    • Configuration for actual host addresses?
    • Container orchestration (e.g., Kubernetes)?
    • Load balancer configuration?
.vscode/launch.json (1)

15-16: Verify external console requirement.

This configuration enables an external console, unlike the other configurations. Verify whether the Raft setup requires it (e.g., to view each node's logs in a separate window).

Comment on lines 1 to 6
cmake_minimum_required(VERSION 3.25)
project(zkv)

add_executable(RaftMain "${CMAKE_CURRENT_SOURCE_DIR}/main.cpp")
set_target_properties(RaftMain PROPERTIES CXX_STANDARD 23)
target_link_libraries(RaftMain PRIVATE DB RaftProtoObjects)

🛠️ Refactor suggestion

Add test configuration for Raft implementation

The Raft consensus is a critical component that requires thorough testing. Consider adding test configuration to ensure reliability.

Add the following after the existing code:

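# Add test executable (assumes Catch2 v3, e.g. provided through Conan)
find_package(Catch2 3 REQUIRED)

add_executable(RaftTests "${CMAKE_CURRENT_SOURCE_DIR}/tests/raft_tests.cpp")
set_target_properties(RaftTests PROPERTIES CXX_STANDARD 23)
target_link_libraries(RaftTests PRIVATE
    DB
    RaftProtoObjects
    Catch2::Catch2WithMain
)

# Register with CTest; enable_testing() (or include(CTest)) must also be
# called in the top-level CMakeLists.txt for ctest to discover this test.
add_test(NAME raft_tests COMMAND RaftTests)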

This will help ensure:

  • Leader election correctness
  • Log replication consistency
  • Network partition handling
  • State persistence reliability

run_replicas.sh (outdated)
Comment on lines 1 to 5
#!/bin/bash

./build/RaftMain --id 1 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_1.txt &
./build/RaftMain --id 2 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_2.txt &
./build/RaftMain --id 3 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_3.txt &

⚠️ Potential issue

Enhance the replica deployment script for production readiness

The current script has several limitations that could cause issues in production:

  1. No cleanup of background processes
  2. No health checks for node startup
  3. No error handling for port availability
  4. Hardcoded configuration
  5. No graceful shutdown mechanism

Replace the script with this more robust version:

-#!/bin/bash
-
-./build/RaftMain --id 1 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_1.txt &
-./build/RaftMain --id 2 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_2.txt &
-./build/RaftMain --id 3 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_3.txt &
+#!/bin/bash
+set -euo pipefail
+
+# Configuration
+readonly BASE_PORT=8080
+readonly NUM_REPLICAS=3
+readonly BINARY="./build/RaftMain"
+readonly LOG_DIR="logs"
+
+# Create log directory
+mkdir -p "${LOG_DIR}"
+
+# Cleanup function
+cleanup() {
+    echo "Shutting down Raft nodes..."
+    kill $(jobs -p) 2>/dev/null || true
+    wait
+}
+
+# Register cleanup on script exit
+trap cleanup EXIT
+
+# Build node list
+NODES=""
+for ((i=1; i<=NUM_REPLICAS; i++)); do
+    port=$((BASE_PORT + i - 1))
+    if [ ! -z "$NODES" ]; then
+        NODES="${NODES},"
+    fi
+    NODES="${NODES}0.0.0.0:${port}"
+done
+
+# Start nodes
+for ((i=1; i<=NUM_REPLICAS; i++)); do
+    port=$((BASE_PORT + i - 1))
+    
+    # Check if port is available
+    if nc -z localhost $port 2>/dev/null; then
+        echo "Error: Port $port is already in use"
+        exit 1
+    fi
+    
+    # Start node
+    echo "Starting Raft node $i on port $port..."
+    "$BINARY" --id "$i" --nodes "$NODES" &> "${LOG_DIR}/log_${i}.txt" &
+    
+    # Wait briefly to ensure proper startup order
+    sleep 1
+    
+    # Check if process is running
+    if ! kill -0 $! 2>/dev/null; then
+        echo "Error: Failed to start node $i"
+        exit 1
+    fi
+done
+
+echo "All Raft nodes started. Logs available in ${LOG_DIR}/"
+echo "Press Ctrl+C to shutdown all nodes"
+
+# Wait for Ctrl+C
+wait

This enhanced version includes:

  • Process cleanup on exit
  • Port availability checking
  • Configurable number of replicas
  • Organized logging
  • Startup health checks
  • Proper error handling

coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (5)
examples/raft/main.cpp (5)

32-33: Consider making timeout values configurable.

The hardcoded timeout values could be moved to configuration to allow tuning based on network conditions and deployment requirements.

-    const int minTimeout{150};
-    const int maxTimeout{300};
+    static constexpr const char* CONFIG_MIN_TIMEOUT = "RAFT_MIN_TIMEOUT";
+    static constexpr const char* CONFIG_MAX_TIMEOUT = "RAFT_MAX_TIMEOUT";
+    const int minTimeout = std::getenv(CONFIG_MIN_TIMEOUT) ? std::atoi(std::getenv(CONFIG_MIN_TIMEOUT)) : 150;
+    const int maxTimeout = std::getenv(CONFIG_MAX_TIMEOUT) ? std::atoi(std::getenv(CONFIG_MAX_TIMEOUT)) : 300;
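One caveat with this suggestion: `std::atoi` returns 0 when the text is not a number, so `RAFT_MIN_TIMEOUT=abc` would silently produce a 0 ms timeout. A sketch of a stricter reader that calls `std::getenv` once and validates with `std::from_chars` (the function name is hypothetical; the variable names follow the suggestion above):

```cpp
#include <cassert>
#include <charconv>
#include <cstdlib>
#include <string_view>
#include <system_error>

// Reads a positive millisecond value from an environment variable.
// Falls back to `fallback` when the variable is unset, non-numeric,
// has trailing characters, or is not positive.
int readTimeoutMs(const char* name, int fallback)
{
    const char* raw = std::getenv(name);  // read once instead of twice
    if (raw == nullptr)
    {
        return fallback;
    }

    const std::string_view text{raw};
    int value = 0;
    const auto [ptr, ec] = std::from_chars(text.data(), text.data() + text.size(), value);
    if (ec != std::errc{} || ptr != text.data() + text.size() || value <= 0)
    {
        return fallback;
    }
    return value;
}
```

Usage would be `const int minTimeout = readTimeoutMs("RAFT_MIN_TIMEOUT", 150);` and likewise for the maximum; the caller should still check that the minimum is below the maximum before drawing random timeouts.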

53-60: Optimize parameter passing in constructor.

The ip parameter should be passed by const reference for better performance with strings.

-    NodeClient(const ID id, const IP ip)
+    NodeClient(const ID id, const IP& ip)
🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 53-53: Function parameter 'ip' should be passed by const reference.

(passedByValue)


103-103: Remove commented-out code.

The commented-out completion queue declaration should be removed if it's not needed.

-    /*grpc::CompletionQueue                   m_cq;*/

309-309: Implement persistent state initialization.

The TODO indicates that persistent state is not implemented yet. Raft requires currentTerm, votedFor, and the log to survive restarts; without them, a restarted node could vote twice in the same term.

Would you like me to help implement the persistent state initialization using a simple file-based storage solution?
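For context, the Raft paper lists currentTerm, votedFor, and the log as persistent state that must reach stable storage before a server responds to RPCs. A minimal file-based sketch covering only currentTerm and votedFor, assuming a plain-text format and a write-to-temporary-then-rename strategy (the struct, functions, and file layout are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <system_error>

// Hypothetical persistent fields; a real node must persist its log entries too.
struct PersistentState
{
    std::uint32_t currentTerm{0};
    std::uint32_t votedFor{0};  // 0 means "no vote"; assumes node IDs start at 1
};

// Writes to "<path>.tmp" and renames it over `path`, so a reader sees either the
// old or the new contents, never a partial write. std::ofstream cannot fsync,
// so this alone does not guarantee durability across a power loss.
bool saveState(const std::filesystem::path& path, const PersistentState& state)
{
    std::filesystem::path tmp = path;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        out << state.currentTerm << ' ' << state.votedFor << '\n';
        out.close();
        if (!out)
        {
            return false;
        }
    }
    std::error_code ec;
    std::filesystem::rename(tmp, path, ec);
    return !ec;
}

// Returns std::nullopt if the file is missing or malformed (e.g. on first start).
std::optional<PersistentState> loadState(const std::filesystem::path& path)
{
    std::ifstream in(path);
    PersistentState state;
    if (!(in >> state.currentTerm >> state.votedFor))
    {
        return std::nullopt;
    }
    return state;
}
```

A production version would additionally call POSIX `fsync` on the file and its directory before replying to any RPC; the rename only protects readers from half-written files.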


591-592: Improve error message formatting.

The error message should follow consistent formatting and include the actual value for better debugging.

-        spdlog::error("ID of the node should be positve integer");
+        spdlog::error("Invalid node ID: {}. Node ID must be a positive integer.", nodeId);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10396a7 and b9b5995.

📒 Files selected for processing (1)
  • examples/raft/main.cpp (1 hunk)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/main.cpp

[performance] 53-53: Function parameter 'ip' should be passed by const reference.

(passedByValue)

Comment on lines 504 to 507
void decrementNextIndex(ID id)
{
(void)id;
}

⚠️ Potential issue

Implement decrementNextIndex method.

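The method is currently a no-op, but log replication depends on it: when a follower rejects AppendEntries because its log does not match, the leader must decrement that follower's nextIndex and retry. As a no-op, the leader never backs up to the point where the two logs agree.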

     void decrementNextIndex(ID id)
     {
-        (void)id;
+        if (m_nextIndex[id - 1] > 0) {
+            m_nextIndex[id - 1]--;
+        }
     }

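For reference, a Raft leader keeps a nextIndex per follower, initialized to its last log index + 1 after winning an election; on a rejected AppendEntries it decrements that follower's nextIndex and retries. Because Raft log indices start at 1, nextIndex should not go below 1, and indexing `m_nextIndex[id - 1]` assumes node IDs are contiguous from 1. A sketch that keys the bookkeeping by ID instead (the class, its members, and the index types are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using ID = std::uint32_t;        // assumed node ID type
using LogIndex = std::uint64_t;  // assumed; Raft log indices start at 1

// Hypothetical per-follower replication bookkeeping kept by the leader.
class ReplicationProgress
{
  public:
    // On winning an election: nextIndex = last log index + 1, matchIndex = 0.
    void resetForNewTerm(const std::vector<ID>& peers, LogIndex lastLogIndex)
    {
        m_nextIndex.clear();
        m_matchIndex.clear();
        for (const ID peer : peers)
        {
            m_nextIndex[peer] = lastLogIndex + 1;
            m_matchIndex[peer] = 0;
        }
    }

    // Follower rejected AppendEntries (log mismatch): back up one entry,
    // never below 1. Unknown peers are ignored.
    void decrementNextIndex(ID id)
    {
        const auto it = m_nextIndex.find(id);
        if (it != m_nextIndex.end() && it->second > 1)
        {
            --it->second;
        }
    }

    // Follower accepted entries up to and including `lastReplicated`.
    void onAppendSuccess(ID id, LogIndex lastReplicated)
    {
        m_matchIndex[id] = lastReplicated;
        m_nextIndex[id] = lastReplicated + 1;
    }

    [[nodiscard]] LogIndex nextIndex(ID id) const { return m_nextIndex.at(id); }
    [[nodiscard]] LogIndex matchIndex(ID id) const { return m_matchIndex.at(id); }

  private:
    std::unordered_map<ID, LogIndex> m_nextIndex;
    std::unordered_map<ID, LogIndex> m_matchIndex;
};
```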
Comment on lines 226 to 240
while (!m_stopElectionTimer)
{
if (token.stop_requested())
{
spdlog::info("Stopping election timer thread");
return;
}

{
absl::MutexLock locker(&m_electionMutex);
if (m_state == NodeState::LEADER)
{
continue;
}
}

⚠️ Potential issue

Avoid busy-waiting in the election timer loop.

The continue statement inside the leader check makes the thread spin continuously while the node is the leader. Consider waiting on a condition variable instead. absl::CondVar::WaitWithTimeout must be passed the mutex that is currently held (m_electionMutex here); m_timerCV is a new member this suggestion assumes.

                     if (m_state == NodeState::LEADER)
                     {
-                        continue;
+                        m_timerCV.WaitWithTimeout(&m_electionMutex, absl::Milliseconds(100));
+                        continue;
                     }

Committable suggestion skipped: line range outside the PR's diff.
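The non-spinning shape of this loop can be sketched with the standard library instead of Abseil. The waiting thread always waits on the mutex it holds, a leader blocks until its state changes instead of re-checking in a loop, and a follower wakes either on a heartbeat or when its election timeout expires. `ElectionTimerSketch` is hypothetical and only counts timeouts where a real node would start an election:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

enum class NodeState { FOLLOWER, CANDIDATE, LEADER };  // mirrors the example's states

class ElectionTimerSketch
{
  public:
    void setState(NodeState state)
    {
        {
            std::lock_guard lock(m_mutex);
            m_state = state;
        }
        m_cv.notify_all();
    }

    // Called when a valid AppendEntries/heartbeat arrives.
    void onHeartbeat()
    {
        {
            std::lock_guard lock(m_mutex);
            m_heartbeatReceived = true;
        }
        m_cv.notify_all();
    }

    void stop()
    {
        {
            std::lock_guard lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
    }

    // Blocks until stop(); returns how many election timeouts fired.
    int run(std::chrono::milliseconds electionTimeout)
    {
        int timeouts = 0;
        std::unique_lock lock(m_mutex);
        while (!m_stop)
        {
            if (m_state == NodeState::LEADER)
            {
                // Leaders run no election timer: sleep until demoted or stopped.
                m_cv.wait(lock, [this] { return m_stop || m_state != NodeState::LEADER; });
                continue;
            }

            const bool woken = m_cv.wait_for(lock, electionTimeout, [this] {
                return m_stop || m_heartbeatReceived || m_state == NodeState::LEADER;
            });
            if (!woken)
            {
                ++timeouts;  // a real node would become a candidate here
            }
            m_heartbeatReceived = false;
        }
        return timeouts;
    }

  private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    NodeState m_state{NodeState::FOLLOWER};
    bool m_heartbeatReceived{false};
    bool m_stop{false};
};
```

`run()` is meant to execute on its own thread, for example `std::thread t([&] { timer.run(std::chrono::milliseconds(200)); });`, with `stop()` followed by `t.join()` on shutdown.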

Comment on lines 449 to 502
std::thread(
[this, &client, heartbeatInterval]()
{
while (m_state == NodeState::LEADER)
{
AppendEntriesRequest request;
{
absl::ReaderMutexLock locker(&m_electionMutex);

request.set_term(m_currentTerm);
request.set_prevlogterm(getLastLogTerm());
request.set_prevlogindex(getLastLogIndex());
request.set_leadercommit(m_commitIndex);
request.set_senderid(m_id);
}

{
AppendEntriesResponse response;
if (!client.appendEntries(request, &response))
{
spdlog::error("AppendEntriesRequest failed during heartbeat");
return;
}

auto responseTerm = response.term();
auto success = response.success();

spdlog::info(
"Received AppendEntriesResponse in requester thread peerTerm={} success={} responderId={}",
responseTerm,
success,
response.responderid());

{
absl::WriterMutexLock locker(&m_electionMutex);

if (responseTerm > m_currentTerm)
{
becomeFollower(responseTerm);
return;
}

if (!success)
{
decrementNextIndex(client.getId());
}
}
}

std::this_thread::sleep_for(heartbeatInterval);
}
})
.detach();
}

⚠️ Potential issue

Potential dangling reference in heartbeat thread.

The detached thread captures client by reference and can outlive it, which is undefined behavior; a detached thread also cannot be joined during shutdown. Consider a managed thread pool or keeping the thread handles so their lifetime is controlled.

-    std::thread(
+    auto heartbeatThread = std::make_shared<std::jthread>(
         [this, &client, heartbeatInterval]()
         {
             // ... existing code ...
-        })
-        .detach();
+        });
+    m_heartbeatThreads.push_back(heartbeatThread);

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
examples/raft/main.cpp (2)

30-40: Consider making timeout constants configurable and more descriptive.

The hardcoded timeout values could be made configurable to allow tuning based on network conditions and deployment requirements.

-    const int minTimeout{150};
-    const int maxTimeout{300};
+    static constexpr int MIN_ELECTION_TIMEOUT_MS{150};
+    static constexpr int MAX_ELECTION_TIMEOUT_MS{300};
+    // TODO: Consider making these configurable via command line arguments

53-60: Pass 'ip' parameter by const reference for better performance.

Passing string parameters by value creates unnecessary copies.

-    NodeClient(const ID id, const IP ip)
+    NodeClient(const ID id, const IP& ip)
🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 53-53: Function parameter 'ip' should be passed by const reference.

(passedByValue)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9b5995 and 614c2bc.

📒 Files selected for processing (1)
  • examples/raft/main.cpp (1 hunk)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/main.cpp

[performance] 53-53: Function parameter 'ip' should be passed by const reference.

(passedByValue)

🔇 Additional comments (3)
examples/raft/main.cpp (3)

491-494: ⚠️ Potential issue

Implement decrementNextIndex method.

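The method is currently a no-op, but log replication depends on it: when a follower rejects AppendEntries because its log does not match, the leader must decrement that follower's nextIndex and retry. As a no-op, the leader never backs up to the point where the two logs agree.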

     void decrementNextIndex(ID id)
     {
-        (void)id;
+        if (m_nextIndex[id - 1] > 0) {
+            m_nextIndex[id - 1]--;
+        }
     }

Likely invalid or redundant comment.


436-489: ⚠️ Potential issue

Replace detached thread with managed thread for heartbeat.

Using detached threads can lead to resource leaks and undefined behavior if the referenced objects are destroyed. Consider using a managed thread pool or at least storing the thread handle.

-        std::thread(
+        auto heartbeatThread = std::make_shared<std::jthread>(
             [this, &client, heartbeatInterval]()
             {
                 // ... existing code ...
-            })
-            .detach();
+            });
+        m_heartbeatThreads.push_back(heartbeatThread);

Likely invalid or redundant comment.


226-240: ⚠️ Potential issue

Fix potential deadlock in election timer loop.

The current implementation might deadlock if the node becomes a leader while holding the election mutex. Consider using a condition variable with a timeout instead of a busy wait.

-                    if (m_state == NodeState::LEADER)
-                    {
-                        continue;
-                    }
+                    if (m_state == NodeState::LEADER)
+                    {
+                        m_timerCV.WaitWithTimeout(&m_timerMutex, absl::Milliseconds(100));
+                        continue;
+                    }

Likely invalid or redundant comment.
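The diff uses `absl::CondVar::WaitWithTimeout`. The sketch below shows the same pattern with `std::condition_variable` so that it stands alone. Blocking on the condition variable releases the mutex, which lets RPC handlers acquire it and reset the timer instead of contending with a spinning loop. The predicate form of `wait_for` also handles spurious wakeups. `ElectionTimer` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical election timer: a reset that arrives before or during a wait
// is remembered and consumed by the next wait, so it is never lost.
class ElectionTimer {
public:
    // Returns true if the timeout elapsed with no reset (i.e. start an election).
    bool waitForTimeout(std::chrono::milliseconds timeout) {
        std::unique_lock lock(m_mutex);
        const bool wasReset = m_cv.wait_for(lock, timeout, [this] { return m_resetPending; });
        m_resetPending = false; // consume the reset so the next wait starts fresh
        return !wasReset;
    }

    // Called from AppendEntries / RequestVote handlers.
    void reset() {
        {
            std::lock_guard lock(m_mutex);
            m_resetPending = true;
        }
        m_cv.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_resetPending{false};
};
```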

Comment on lines 56 to 57
m_channel(grpc::CreateChannel(m_ip, grpc::InsecureChannelCredentials())),
m_stub(RaftService::NewStub(m_channel))

⚠️ Potential issue

Security concern: Using insecure channel credentials.

The use of grpc::InsecureChannelCredentials() is not suitable for production environments as it exposes the communication to potential man-in-the-middle attacks.

Consider using TLS certificates for secure communication:

auto creds = grpc::SslCredentials(grpc::SslCredentialsOptions());
m_channel = grpc::CreateChannel(m_ip, creds);

Comment on lines 595 to 597
cm.start();

cm.stop();

🛠️ Refactor suggestion

Implement graceful shutdown handling.

The server should handle SIGTERM/SIGINT signals for graceful shutdown instead of immediate stop.

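#include <csignal>

// Only async-signal-safe work belongs in a handler (spdlog logging is not), so just record the signal.
volatile std::sig_atomic_t gShutdownSignal = 0;
extern "C" void signal_handler(int signal) { gShutdownSignal = signal; }

// In main():
std::signal(SIGTERM, signal_handler);
std::signal(SIGINT, signal_handler);
cm.start();
while (gShutdownSignal == 0) { std::this_thread::sleep_for(std::chrono::milliseconds(100)); }
spdlog::info("Received signal {}. Initiating graceful shutdown...", static_cast<int>(gShutdownSignal));
cm.stop();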

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
examples/raft/raft.h (1)

22-38: Consider passing ip by const reference in the NodeClient constructor.

Static analysis flagged this constructor parameter as a performance concern. Although this might be minor for small strings, passing by const reference avoids unnecessary copies, especially if IP can hold large strings or if this constructor is called frequently.

- NodeClient(const ID id, const IP ip);
+ NodeClient(const ID id, const IP& ip);
examples/raft/raft.cpp (1)

22-22: Pass ip by const reference to optimize.

As noted by static analysis, passing ip by const reference can avoid copying strings if they are large or if the constructor is called often.

- NodeClient::NodeClient(const ID id, const IP ip)
+ NodeClient::NodeClient(const ID id, const IP& ip)
🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 22-22: Function parameter 'ip' should be passed by const reference.

(passedByValue)

examples/raft/main.cpp (1)

39-42: Avoid copying IP strings in the for loop.

In the loop iterating over nodeIps, consider capturing the iterated string by const reference to avoid unnecessary copies. For large or numerous IP strings, this minor refactor can improve performance:

- for (auto ip : nodeIps)
+ for (const auto& ip : nodeIps)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 614c2bc and 2e6374a.

📒 Files selected for processing (4)
  • examples/raft/CMakeLists.txt (1 hunks)
  • examples/raft/main.cpp (1 hunks)
  • examples/raft/raft.cpp (1 hunks)
  • examples/raft/raft.h (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/raft.cpp

[performance] 22-22: Function parameter 'ip' should be passed by const reference.

(passedByValue)

examples/raft/main.cpp

[performance] 25-25: Function parameter 'ip' should be passed by const reference.

(passedByValue)

🔇 Additional comments (4)
examples/raft/raft.h (1)

40-125: enable_shared_from_this usage check.

The class ConsensusModule inherits from std::enable_shared_from_this<ConsensusModule> but does not appear to call shared_from_this(). Verify whether this inheritance is intentional. If not, consider removing it to simplify the class.
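If the inheritance is kept, its usual purpose is to let asynchronous work keep the object alive. The hypothetical sketch below shows this, with `Module` standing in for `ConsensusModule`. `shared_from_this()` is only valid once the object is already owned by a `std::shared_ptr`; since C++17, calling it otherwise throws `std::bad_weak_ptr`.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for ConsensusModule.
class Module : public std::enable_shared_from_this<Module> {
public:
    // The returned callback owns a shared_ptr, so *this outlives the caller's handle.
    std::function<int()> makeCallback() {
        auto self = shared_from_this();
        return [self] { return self->m_value; };
    }

private:
    int m_value{42};
};
```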

examples/raft/raft.cpp (2)

193-198: Election timer loop concurrency risk
This comment reaffirms a previously raised concern about potential continuous spinning or premature exit when the node becomes a leader, preventing election timeout monitoring if leadership changes again.


459-463: decrementNextIndex remains unimplemented
This method is critical for the Raft log replication process but is currently a no-op. Consider implementing the logic to decrement m_nextIndex[id - 1], as recommended in a prior review.

examples/raft/CMakeLists.txt (1)

4-6: Add specialized test target for Raft.

A test configuration for the Raft consensus ensures reliability of leader election, log replication, and fault tolerance. See previous suggestion for details on adding a dedicated test executable.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (2)
examples/raft/raft.cpp (2)

31-43: Add deadline to appendEntries RPC call.

For consistency with requestVote, consider adding a deadline to prevent hanging on network issues:

 auto NodeClient::appendEntries(const AppendEntriesRequest &request, AppendEntriesResponse *response) -> bool
 {
     grpc::ClientContext context;
+    context.set_deadline(std::chrono::system_clock::now() + std::chrono::milliseconds(generateRandomTimeout()));
 
     grpc::Status status = m_stub->AppendEntries(&context, request, response);

279-286: Implement persistent state storage.

The TODO comment indicates missing persistence implementation:

Would you like me to help implement the persistent state storage using RocksDB or a similar key-value store? This would ensure durability of critical Raft state across restarts.
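Raft requires `currentTerm`, `votedFor`, and the log to be on stable storage before a node responds to RPCs. Below is a minimal file-based sketch for the first two fields. It uses hypothetical names and a plain-text format. The pattern is to write a temporary file and then rename it over the old one; on POSIX the rename replaces the target atomically, so a crash leaves either the old state or the new state. Real durability also needs `fsync` on the file and its directory, which `std::ofstream` cannot do portably.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical on-disk record (the log itself would need its own storage).
struct PersistentState {
    std::uint64_t currentTerm{0};
    std::uint32_t votedFor{0};
};

bool saveState(const std::filesystem::path& path, const PersistentState& s) {
    const auto tmp = path.string() + ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        out << s.currentTerm << ' ' << s.votedFor << '\n';
        if (!out.flush()) {
            return false;
        }
    }
    std::error_code ec;
    std::filesystem::rename(tmp, path, ec); // replace the old file in one step
    return !ec;
}

std::optional<PersistentState> loadState(const std::filesystem::path& path) {
    std::ifstream in(path);
    PersistentState s;
    if (!(in >> s.currentTerm >> s.votedFor)) {
        return std::nullopt; // missing or corrupt file
    }
    return s;
}
```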

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e6374a and e0a354b.

📒 Files selected for processing (2)
  • examples/raft/raft.cpp (1 hunks)
  • lib/structures/lsmtree/levels/level.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/structures/lsmtree/levels/level.cpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/raft.cpp

[performance] 22-22: Function parameter 'ip' should be passed by const reference.

(passedByValue)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
examples/raft/raft.cpp (2)

31-43: Enhance error handling in appendEntries.

The error handling could be more detailed and include retry logic for transient failures.

Consider adding:

 auto NodeClient::appendEntries(const AppendEntriesRequest &request, AppendEntriesResponse *response) -> bool
 {
+    constexpr int maxRetries = 3;
+    int attempts = 0;
+    while (attempts++ < maxRetries) {
     grpc::ClientContext context;
+    context.set_deadline(std::chrono::system_clock::now() + std::chrono::seconds(1));
 
     grpc::Status status = m_stub->AppendEntries(&context, request, response);
     if (!status.ok())
     {
-        spdlog::error("AppendEntries RPC call failed");
+        spdlog::error("AppendEntries RPC call failed. Error code={}, message={}, attempt={}/{}",
+                      static_cast<int>(status.error_code()),
+                      status.error_message(),
+                      attempts,
+                      maxRetries);
+        if (attempts < maxRetries) {
+            std::this_thread::sleep_for(std::chrono::milliseconds(100 * attempts));
+            continue;
+        }
         return false;
     }
+    break;
+    }
 
     return true;
 }
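The retry loop can also be factored into a generic helper, as in the hypothetical sketch below. It uses the same linear backoff as the diff (`base * attempt`). The diff already declares `grpc::ClientContext` inside the loop, and that placement matters: a gRPC client context must not be reused across calls, so the operation passed in should build a fresh context on every attempt.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: call `op` until it returns true or attempts run out,
// sleeping base * attempt between tries (no sleep after the final failure).
template <typename Op>
bool retryWithBackoff(int maxAttempts, std::chrono::milliseconds base, Op&& op) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (op()) {
            return true;
        }
        if (attempt < maxAttempts) {
            std::this_thread::sleep_for(base * attempt);
        }
    }
    return false;
}
```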

293-300: Implement persistent state initialization.

The TODO comment indicates missing persistence implementation, which is crucial for maintaining consistency across node restarts.

Would you like me to help implement the persistent state initialization using a simple file-based storage system?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0a354b and 2fa9fca.

📒 Files selected for processing (1)
  • examples/raft/raft.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/raft.cpp

[performance] 22-22: Function parameter 'ip' should be passed by const reference.

(passedByValue)

🔇 Additional comments (5)
examples/raft/raft.cpp (5)

1-21: LGTM! Headers are well-organized and complete.

The includes are properly categorized and contain all necessary dependencies for gRPC, abseil synchronization primitives, and logging.


401-405: Fix incorrect majority calculation.

The majority calculation is incorrect, as noted in the previous review. Apply the previously suggested fix:

 auto ConsensusModule::hasMajority(const uint32_t votes) const -> bool
 {
-    constexpr const double HALF_OF_THE_REPLICAS = 2.0;
-    return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS;
+    // Total nodes = replicas + self
+    const size_t totalNodes = m_replicas.size() + 1;
+    return votes > totalNodes / 2;
 }
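The rule can be checked on small cluster sizes, as in the sketch below. `clusterSize` must count every voting member, including the local node. The constructor takes its own address from `replicas[m_id - 1]`, so confirm whether `m_replicas` already includes this node before adding 1.

```cpp
#include <cassert>
#include <cstddef>

// Strict majority of all voters; integer division makes this exact for odd and even sizes.
constexpr bool hasMajority(std::size_t votes, std::size_t clusterSize) {
    return votes > clusterSize / 2;
}

static_assert(hasMajority(2, 3));
static_assert(!hasMajority(2, 4)); // 2 of 4 is a tie, not a majority
static_assert(hasMajority(3, 4));
static_assert(hasMajority(3, 5));
```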

429-488: 🛠️ Refactor suggestion

Enhance heartbeat thread reliability.

The heartbeat mechanism needs improvements:

  1. Add retry logic for failed heartbeats
  2. Implement backoff strategy
  3. Add thread cleanup on errors
 void ConsensusModule::sendHeartbeat(NodeClient &client)
 {
-    constexpr const auto heartbeatInterval{std::chrono::milliseconds(100)};
+    constexpr const auto baseHeartbeatInterval{std::chrono::milliseconds(100)};
+    constexpr const int maxRetries = 3;
+    constexpr const auto maxBackoff{std::chrono::milliseconds(1000)};
 
     m_heartbeatThreads.emplace_back(
-        [this, &client, heartbeatInterval](std::stop_token token)
+        [this, &client](std::stop_token token)
         {
+            auto heartbeatInterval = baseHeartbeatInterval;
+            int consecutiveFailures = 0;

Likely invalid or redundant comment.
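The backoff step could look like the hypothetical helper below, which doubles the interval per consecutive failure up to a cap. One caution applies: followers start an election if they hear nothing within their election timeout (150 to 300 ms in `main.cpp`). A cap above that range, such as the 1000 ms in the diff, can therefore provoke needless elections once a peer recovers. The test keeps the cap below the minimum timeout.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper: base * 2^failures, never exceeding `cap`.
std::chrono::milliseconds backoffInterval(std::chrono::milliseconds base,
                                          int consecutiveFailures,
                                          std::chrono::milliseconds cap) {
    auto interval = base;
    for (int i = 0; i < consecutiveFailures && interval < cap; ++i) {
        interval *= 2;
    }
    return std::min(interval, cap);
}
```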


22-29: 🛠️ Refactor suggestion

Enhance constructor robustness and efficiency.

The constructor should validate inputs and handle connection failures:

-NodeClient::NodeClient(const ID id, const IP ip)
+NodeClient::NodeClient(const ID id, const IP& ip)
     : m_id{id},
       m_ip{ip},
       m_channel(grpc::CreateChannel(m_ip, grpc::InsecureChannelCredentials())),
       m_stub(RaftService::NewStub(m_channel))
 {
     assert(m_id > 0);
+    assert(!m_ip.empty());
+    
+    // Verify channel is valid
+    if (!m_channel || !m_stub) {
+        throw std::runtime_error("Failed to establish gRPC channel to " + m_ip);
+    }
 }

Likely invalid or redundant comment.

🧰 Tools
🪛 cppcheck (2.10-2)

[performance] 22-22: Function parameter 'ip' should be passed by const reference.

(passedByValue)
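One caveat applies to the suggestion above: `grpc::CreateChannel` does not connect when it is called, because the connection is made lazily on the first RPC. The `!m_channel || !m_stub` check therefore will not detect an unreachable peer. If an eager check is wanted, the channel's `WaitForConnected` method can block until a deadline. What can be checked cheaply up front is the target string itself. The hypothetical pre-flight check below accepts `host:port`, including `[v6]:port`, and rejects URI-scheme targets such as `unix:` paths.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical validation of "host:port" before it reaches grpc::CreateChannel.
bool isValidHostPort(std::string_view target) {
    const auto colon = target.rfind(':'); // last ':' so "[::1]:50051" works
    if (colon == std::string_view::npos || colon == 0 || colon + 1 == target.size()) {
        return false; // no separator, empty host, or empty port
    }
    const auto portText = target.substr(colon + 1);
    unsigned port = 0;
    const auto [end, ec] = std::from_chars(portText.data(), portText.data() + portText.size(), port);
    return ec == std::errc{} && end == portText.data() + portText.size() && port >= 1 && port <= 65535;
}
```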


66-96: 🛠️ Refactor suggestion

Enhance constructor's thread safety and error handling.

The constructor needs better initialization and validation:

  1. Initialize atomic variables explicitly
  2. Add error handling for server startup
  3. Validate replica IDs
 ConsensusModule::ConsensusModule(const ID id, std::vector<IP> replicas)
     : m_id{id},
       m_currentTerm{0},
       m_votedFor{0},
-      m_state{NodeState::FOLLOWER}
+      m_state{NodeState::FOLLOWER},
+      m_stopElectionTimer{false},
+      m_electionInProgress{false},
+      m_leaderHeartbeatReceived{false},
+      m_voteCount{0}
 {
     assert(m_id > 0);
     assert(replicas.size() > 0);
     assert(m_id <= replicas.size());
 
     m_ip = replicas[m_id - 1];
+    if (m_ip.empty()) {
+        throw std::runtime_error("Invalid IP address for node " + std::to_string(m_id));
+    }
 
     grpc::ServerBuilder builder;
     builder.AddListeningPort(m_ip, grpc::InsecureServerCredentials());
     builder.RegisterService(this);
 
     m_server = builder.BuildAndStart();
+    if (!m_server) {
+        throw std::runtime_error("Failed to start gRPC server on " + m_ip);
+    }

Likely invalid or redundant comment.
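The constructor's `assert` calls disappear when `NDEBUG` is defined. The hypothetical helper below performs the same checks in every build type and throws with a readable message. The `ID` and `IP` aliases are assumptions that mirror the PR's names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using ID = std::uint32_t; // assumption: mirrors the PR's ID alias
using IP = std::string;   // assumption: mirrors the PR's IP alias

// Hypothetical replacement for the constructor's asserts.
IP selfAddressOrThrow(ID id, const std::vector<IP>& replicas) {
    if (replicas.empty()) {
        throw std::invalid_argument("replica list is empty");
    }
    if (id == 0 || id > replicas.size()) {
        throw std::out_of_range("node id " + std::to_string(id) + " is outside 1.." +
                                std::to_string(replicas.size()));
    }
    const IP& self = replicas[id - 1];
    if (self.empty()) {
        throw std::invalid_argument("empty address for node " + std::to_string(id));
    }
    return self;
}
```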

Comment on lines 172 to 179
/*if (pRequest->lastlogterm() < getLastLogTerm() ||*/
/* (pRequest->lastlogterm() == getLastLogTerm() && pRequest->lastlogindex() < getLastLogIndex()))*/
/*{*/
/* pResponse->set_votegranted(0);*/
/* return grpc::Status::OK;*/
/*}*/
/**/
return grpc::Status::OK;

⚠️ Potential issue

Uncomment and fix the log comparison logic in RequestVote.

The commented-out log comparison logic is essential for the Raft protocol's safety properties. It should be uncommented and fixed.

-    /*if (pRequest->lastlogterm() < getLastLogTerm() ||*/
-    /*    (pRequest->lastlogterm() == getLastLogTerm() && pRequest->lastlogindex() < getLastLogIndex()))*/
-    /*{*/
-    /*    pResponse->set_votegranted(0);*/
-    /*    return grpc::Status::OK;*/
-    /*}*/
-    /**/
+    // Rule: Candidate's log must be at least as up-to-date as receiver's log
+    if (pRequest->lastlogterm() < getLastLogTerm() ||
+        (pRequest->lastlogterm() == getLastLogTerm() && pRequest->lastlogindex() < getLastLogIndex()))
+    {
+        pResponse->set_term(m_currentTerm);
+        pResponse->set_votegranted(0);
+        pResponse->set_responderid(m_id);
+        return grpc::Status::OK;
+    }
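The condition above is the negation of Raft's election restriction. Last log terms are compared first, and the longer log wins only when those terms are equal. Isolating the rule in a pure function, as in this hypothetical sketch, makes it easy to unit-test.

```cpp
#include <cassert>
#include <cstdint>

// True if the candidate's log is at least as up-to-date as this voter's log.
constexpr bool candidateLogIsUpToDate(std::uint64_t candidateLastTerm, std::uint64_t candidateLastIndex,
                                      std::uint64_t myLastTerm, std::uint64_t myLastIndex) {
    if (candidateLastTerm != myLastTerm) {
        return candidateLastTerm > myLastTerm; // newer last term wins regardless of length
    }
    return candidateLastIndex >= myLastIndex; // same term: longer (or equal) log wins
}
```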

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
examples/raft/raft.cpp (1)

422-426: ⚠️ Potential issue

Fix incorrect majority calculation.
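The current check divides by m_replicas.size() alone. If m_replicas excludes this node, a candidate in a four-node cluster wins with 2 votes (2 > 3 / 2.0), which is not a majority and can let two leaders be elected in the same term. Use the total node count (m_replicas.size() + 1) instead.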

-constexpr const double HALF_OF_THE_REPLICAS = 2.0;
-return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS;
+const size_t totalNodes = m_replicas.size() + 1; // includes self
+return votes > totalNodes / 2;
🧹 Nitpick comments (7)
examples/raft/raft.cpp (5)

Add more IP validation or logging context.
Currently, the constructor validates non-empty IP and logs channel creation issues. Consider adding more context or checks (e.g., IP format) to proactively catch configuration errors.


45-57: Add trace-level logs for debug support.
The appendEntries RPC call is crucial to Raft. Adding trace logs (e.g., printing the request’s term) can help debug replication issues.


59-73: Implement consistent RPC deadlines.
You set a deadline for RequestVote, whereas appendEntries has no per-RPC deadline. Consider aligning both to ensure consistent timeout behavior.


80-114: Consider checking the validity of term or replica size mismatch.
In the constructor, you assume m_id <= replicas.size(). A violation only trips the assertion in debug builds and goes unchecked when NDEBUG is defined; consider converting it into a runtime error with a user-friendly message for production builds.


288-312: Ensure graceful handling of any outstanding RPC calls on stop.
Consider waiting for in-flight RPCs to complete or adding a short deadline prior to forcibly shutting down the gRPC server.

examples/raft/raft.h (2)

20-20: Consider adding randomness seeding or injection for reproducibility.
generateRandomTimeout() likely depends on an internal PRNG. For test determinism, seed it explicitly or allow a configurable random source.


72-75: Document the lock invariants.
It is helpful to clarify in comments that becomeLeader and sendHeartbeat require the m_stateMutex to be locked. This ensures future maintainers understand the concurrency design.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa9fca and 0832944.

📒 Files selected for processing (2)
  • examples/raft/raft.cpp (1 hunks)
  • examples/raft/raft.h (1 hunks)
🔇 Additional comments (2)
examples/raft/raft.cpp (1)

154-196: Ensure you handle a second vote request from the same candidate within the same term.
When m_votedFor != 0 and is not the candidate’s ID, you immediately continue, but you might want to short-circuit with a negative response.
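The hypothetical sketch below grants at most one vote per term. A repeated request from the same candidate gets the same answer, which matters because RPCs may be retried. `0` means "no vote cast", matching the PR's `m_votedFor{0}`, and `logOk` is the result of the up-to-date log comparison. In a full implementation, seeing a higher term would also make the node step down to follower, and `votedFor` would be persisted before the node replies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-term vote bookkeeping.
struct VoteState {
    std::uint64_t currentTerm{0};
    std::uint32_t votedFor{0};
};

bool handleVoteRequest(VoteState& s, std::uint64_t term, std::uint32_t candidateId, bool logOk) {
    if (term < s.currentTerm) {
        return false; // stale candidate
    }
    if (term > s.currentTerm) {
        s.currentTerm = term; // newer term: forget any earlier vote
        s.votedFor = 0;
    }
    if ((s.votedFor == 0 || s.votedFor == candidateId) && logOk) {
        s.votedFor = candidateId;
        return true;
    }
    return false; // already voted for a different candidate this term
}
```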

examples/raft/raft.h (1)

40-40: Revisit the multiple inheritance approach.
Inheriting from both RaftService::Service and std::enable_shared_from_this is valid but can complicate object lifetime. Confirm that shared pointers are consistently managed to avoid double free or dangling references.

Comment on lines 116 to 152
auto ConsensusModule::AppendEntries(grpc::ServerContext *pContext,
const AppendEntriesRequest *pRequest,
AppendEntriesResponse *pResponse) -> grpc::Status
{
(void)pContext;
(void)pRequest;
(void)pResponse;

spdlog::info("Recevied AppendEntries RPC from leader={} during term={}", pRequest->senderid(), pRequest->term());

absl::MutexLock locker(&m_stateMutex);
absl::MutexLock timerLocker(&m_timerMutex);

if (pRequest->term() < m_currentTerm)
{
pResponse->set_term(m_currentTerm);
pResponse->set_success(false);
pResponse->set_responderid(m_id);
return grpc::Status::OK;
}

if (pRequest->term() > m_currentTerm)
{
becomeFollower(pRequest->term());
}

pResponse->set_term(m_currentTerm);
pResponse->set_success(true);
pResponse->set_responderid(m_id);

spdlog::info("Node={} is resetting election timeout at term={}", m_id, m_currentTerm);
resetElectionTimer();

m_leaderHeartbeatReceived.store(true);

return grpc::Status::OK;
}

⚠️ Potential issue

Validate the log index in AppendEntries.
The method always sets success=true without verifying log continuity. Per the Raft spec, a follower must reply false if its log has no entry at prevLogIndex whose term matches prevLogTerm; accepting such a request anyway can lead to divergent logs.
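The consistency check can be sketched as below. The sketch assumes a hypothetical follower log that stores only entry terms, with 1-based Raft index `i` held at `terms[i - 1]`. When the check fails, the handler should reply `success=false` so that the leader decrements `nextIndex` and retries.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Accept only if prevLogIndex is 0 (empty prefix) or the entry there has term prevLogTerm.
bool logMatches(const std::vector<std::uint64_t>& terms,
                std::uint64_t prevLogIndex, std::uint64_t prevLogTerm) {
    if (prevLogIndex == 0) {
        return true;
    }
    if (prevLogIndex > terms.size()) {
        return false; // follower's log is too short
    }
    return terms[prevLogIndex - 1] == prevLogTerm;
}
```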

Comment on lines 450 to 495
void ConsensusModule::sendHeartbeat(NodeClient &client)
{
constexpr const auto heartbeatInterval{std::chrono::milliseconds(10)};
constexpr const int maxRetries{3};

m_heartbeatThreads.emplace_back(
[this, maxRetries, &client, heartbeatInterval](std::stop_token token)
{
spdlog::info("Node={} is starting a heartbeat thread for client={}", m_id, client.getId());

int consecutiveFailures = 0;
while (!token.stop_requested())
{
AppendEntriesRequest request;
{
absl::ReaderMutexLock locker(&m_stateMutex);
if (m_state != NodeState::LEADER)
{
spdlog::info("Node={} is no longer a leader. Stopping the heartbeat thread");
break;
}

request.set_term(m_currentTerm);
request.set_prevlogterm(getLastLogTerm());
request.set_prevlogindex(getLastLogIndex());
request.set_leadercommit(m_commitIndex);
request.set_senderid(m_id);
}

{
AppendEntriesResponse response;
if (!client.appendEntries(request, &response))
{
consecutiveFailures++;

consecutiveFailures = 0;
spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
consecutiveFailures,
maxRetries);
if (consecutiveFailures >= maxRetries)
{
return;
}

continue;
}

⚠️ Potential issue

Fix the consecutive failures logic.
Within the heartbeat loop, consecutiveFailures is reset to zero immediately after it is incremented, so failures never accumulate and the retry limit is never reached. Reset the counter after a successful RPC instead.

 if (!client.appendEntries(request, &response)) {
     consecutiveFailures++;
-    consecutiveFailures = 0;  // This negates the increment
     spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
                   consecutiveFailures,
                   maxRetries);
     ...
 }
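The intended counter behavior is shown in the hypothetical sketch below: failures accumulate until the limit, and any successful RPC clears the streak. Separately, the `spdlog::info("Node={} is no longer a leader. Stopping the heartbeat thread")` call in the same loop has a `{}` placeholder but no argument (presumably `m_id` was intended). fmt treats that as an error, either at compile time or at run time depending on how format strings are checked.

```cpp
#include <cassert>

// Hypothetical failure-streak tracker for one heartbeat worker.
class FailureTracker {
public:
    explicit FailureTracker(int maxRetries) : m_maxRetries{maxRetries} {}

    // Returns true when the caller should give up.
    bool recordFailure() { return ++m_consecutive >= m_maxRetries; }
    void recordSuccess() { m_consecutive = 0; }
    int consecutive() const { return m_consecutive; }

private:
    int m_maxRetries;
    int m_consecutive{0};
};
```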

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.vscode/launch.json (1)

7-33: 🛠️ Refactor suggestion

Enhance RaftMain debug configuration for Raft development

The configuration needs improvements for effectively debugging Raft nodes:

  1. Add preLaunchTask to ensure RaftMain is compiled before debugging
  2. Add command line arguments for Raft configuration (node ID, peer list)
  3. Make the debug port configurable using VS Code input variables

Apply this diff to enhance the configuration:

         {
             "name": "rr - RaftMain",
             "type": "cppdbg",
             "request": "launch",
             "program": "${workspaceFolder}/build/RaftMain",
-            "miDebuggerServerAddress": "localhost:50505",
+            "miDebuggerServerAddress": "localhost:${input:debugPort}",
+            "args": [
+                "--node-id", "${input:nodeId}",
+                "--config", "${workspaceFolder}/assets/raft_config.json"
+            ],
             "stopAtEntry": false,
             "cwd": "${workspaceFolder}",
             "environment": [],
             "externalConsole": true,
+            "preLaunchTask": "build",
             "linux": {

Add these input configurations at the root level:

    "inputs": [
        {
            "id": "debugPort",
            "type": "promptString",
            "description": "Debug port for remote debugging",
            "default": "50505"
        },
        {
            "id": "nodeId",
            "type": "promptString",
            "description": "Raft node ID",
            "default": "1"
        }
    ]
🧹 Nitpick comments (3)
lib/proto/Raft.proto (1)

8-17: Unify naming conventions in enums and fields
Consider using a consistent naming style (e.g., lastLogTerm) in the request messages or switching to snake_case to match common protobuf conventions.

examples/raft/main.cpp (1)

69-69: Consider implementing signal-based graceful shutdown
Currently, the Raft node stops only after the main logic completes. Adding handling of SIGTERM/SIGINT to initiate a controlled shutdown can prevent data inconsistencies or partial writes.

.vscode/launch.json (1)

34-64: Align Debug - RaftMain configuration with other debug configurations

The configuration should be consistent with other debug configurations in the file:

  1. Add command line arguments for Raft configuration
  2. Add launchCompleteCommand for consistent behavior
  3. Add missing field descriptions

Apply this diff:

         {
             "name": "Debug - RaftMain",
             "type": "cppdbg",
             "request": "attach",
             "program": "${workspaceFolder}/build/RaftMain",
             "processId": "${command:pickProcess}",
             "MIMode": "gdb",
+            "args": [ // Arguments for Raft configuration
+                "--node-id", "1",
+                "--config", "${workspaceFolder}/assets/raft_config.json"
+            ],
             "setupCommands": [
                 {
                     "description": "Enable pretty-printing for gdb",
                     "text": "-enable-pretty-printing",
                     "ignoreFailures": true
                 }
             ],
             "preLaunchTask": "build",
             "miDebuggerPath": "/usr/bin/gdb",
             "logging": {
                 "trace": true,
                 "traceResponse": true,
                 "engineLogging": false
             },
             "targetArchitecture": "x86_64",
             "pipeTransport": {
                 "pipeCwd": "",
                 "pipeProgram": "/bin/bash",
                 "pipeArgs": [
                     "-c"
                 ],
                 "debuggerPath": "/usr/bin/gdb"
             },
+            "launchCompleteCommand": "exec-run" // Consistent with other configurations
         },
🧰 Tools
🪛 Biome (1.9.4)

[error] 64-64: Expected a property but instead found '}'.

Expected a property here.

(parse)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0832944 and 2b7359e.

📒 Files selected for processing (6)
  • .vscode/launch.json (2 hunks)
  • examples/raft/CMakeLists.txt (1 hunks)
  • examples/raft/main.cpp (1 hunks)
  • examples/raft/raft.cpp (1 hunks)
  • examples/raft/raft.h (1 hunks)
  • lib/proto/Raft.proto (1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
.vscode/launch.json

[error] 64-64: Expected a property but instead found '}'.

Expected a property here.

(parse)

🔇 Additional comments (9)
examples/raft/raft.cpp (5)

28-28: Use atomic or mutex to protect gFirstElection
gFirstElection is a global integer that is written from multiple threads (e.g., in the election timer and upon heartbeat reception). For thread safety, consider making this variable atomic or protecting it with a mutex.
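The sketch below assumes `gFirstElection` is used as a one-shot flag; its exact semantics in the PR are not shown. Under that assumption, `std::atomic<bool>` makes concurrent reads and writes well-defined, and `exchange(false)` lets exactly one thread claim the flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical replacement for the global int flag.
std::atomic<bool> gFirstElection{true};

// Returns true for exactly one caller, even under contention.
bool claimFirstElection() {
    return gFirstElection.exchange(false);
}
```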


33-34: Security concern: Using insecure channel credentials
This code uses grpc::InsecureChannelCredentials(), which is typically insecure for production. Consider TLS or an equivalent secure channel.


509-513: Fix incorrect majority calculation
This logic uses votes > m_replicas.size() / 2.0 and lacks the node’s own vote in the total. Consider using (m_replicas.size() + 1) to account for the node itself.


569-581: Fix the consecutive failures logic
consecutiveFailures++ is immediately reset to zero on line 571, preventing retries from incrementing. This makes the retry loop unusable as intended.


133-134: Unused or uninitialized server thread
While m_serverThread is declared and stopped in stop(), it never appears to be started. This might cause confusion or unexpected deadlocks. Verify if it’s needed and ensure proper initialization if so.

Also applies to: 421-423

examples/raft/CMakeLists.txt (1)

4-6: Add test configuration for Raft implementation
Comprehensive testing of Raft is crucial. Adding a test binary and registering tests ensures reliability of leader election, log replication, and fault tolerance.

examples/raft/main.cpp (1)

25-25: Environment variable usage may have portability issues
Using setenv("ASAN_OPTIONS", ...) is helpful for debugging, but consider limiting this to debug builds to avoid altering production behavior.
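A minimal sketch of limiting this to debug builds follows. The `detect_leaks=1` value is a placeholder, since the PR's actual options are elided above. Passing `0` as the third argument keeps any value the user already exported. `setenv` is POSIX rather than standard C++. ASan generally parses `ASAN_OPTIONS` while the process starts up, so setting it from `main` mainly affects child processes; for compiled-in defaults, ASan also provides an `__asan_default_options()` hook (see the sanitizer documentation).

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: only touch ASAN_OPTIONS in debug builds, never overriding the user's value.
void configureSanitizerEnv() {
#ifndef NDEBUG
    setenv("ASAN_OPTIONS", "detect_leaks=1", /*overwrite=*/0);
#endif
}
```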

examples/raft/raft.h (2)

74-75: Unused method initializeVolatileState
Although declared, initializeVolatileState() has no definition or usage in the codebase. If it's part of your design, ensure it is implemented. Otherwise, remove it to avoid confusion.


82-83: Confirm concurrency invariants for sendAppendEntriesRPC
sendAppendEntriesRPC is not marked with the same lock requirements as sendHeartbeat. Verify that you consistently acquire the necessary locks in both methods so that log mutations are always thread-safe.

],
"debuggerPath": "/usr/bin/gdb"
},
},

⚠️ Potential issue

Fix JSON syntax error

Remove the trailing comma after the pipeTransport object.

Apply this diff:

                 ],
                 "debuggerPath": "/usr/bin/gdb"
-            },
+            }
         },
🧰 Tools
🪛 Biome (1.9.4)

[error] 64-64: Expected a property but instead found '}'.

Expected a property here.

(parse)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

♻️ Duplicate comments (3)
examples/raft/raft.cpp (3)

141-224: ⚠️ Potential issue

Fix log replication safety issues.

The AppendEntries implementation has several critical issues:

  1. Missing validation of request fields
  2. Potential off-by-one errors in log indexing
  3. Inconsistent state updates
+    // Validate request fields
+    if (!pRequest) {
+        return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT, "Null request");
+    }

     // 3. Append new entries and remove conflicting ones
     auto newEntryStart = pRequest->prevlogindex() + 1;
+    // Check for integer overflow
+    if (newEntryStart < pRequest->prevlogindex()) {
+        return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT, "Index overflow");
+    }

-    m_log.insert(m_log.end(), pRequest->entries().begin(), pRequest->entries().end());
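+    // Append new entries, truncating only on a real conflict (same index, different term).
+    // Entries from earlier terms are legitimate; only the request's own term is checked (step 1).
+    // Assumes 1-based Raft indices stored at m_log[index - 1].
+    auto index = newEntryStart;
+    for (const auto& entry : pRequest->entries()) {
+        if (index <= m_log.size() && m_log[index - 1].term() != entry.term()) {
+            m_log.resize(index - 1);
+        }
+        if (index > m_log.size()) {
+            m_log.push_back(entry);
+        }
+        ++index;
+    }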

481-584: ⚠️ Potential issue

Fix election safety issues.

The election implementation has critical safety issues:

  1. Race condition in vote counting
  2. Incorrect majority calculation (as noted in past reviews)
  3. Missing timeout handling for concurrent elections
-    constexpr const double HALF_OF_THE_REPLICAS = 2.0;
-    return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS;
+    // Total nodes = replicas + self
+    const size_t totalNodes = m_replicas.size() + 1;
+    return votes > totalNodes / 2;

     void startElection() {
+        const auto electionTimeout = std::chrono::seconds(generateRandomTimeout());
+        const auto deadline = std::chrono::steady_clock::now() + electionTimeout;
+
         RequestVoteRequest request;
         {
             absl::WriterMutexLock locker(&m_stateMutex);
+            if (std::chrono::steady_clock::now() >= deadline) {
+                spdlog::warn("Election timed out before starting");
+                return;
+            }

586-660: ⚠️ Potential issue

Fix heartbeat reliability issues.

The heartbeat implementation has several critical issues:

  1. Incorrect retry logic (consecutiveFailures reset too early)
  2. Potential thread leaks in error paths
  3. Missing timeout handling
     if (!client.appendEntries(request, &response)) {
         consecutiveFailures++;
-        consecutiveFailures = 0;  // This negates the increment
         spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
                       consecutiveFailures,
                       maxRetries);
         if (consecutiveFailures >= maxRetries) {
+            absl::WriterMutexLock locker(&m_stateMutex);
+            becomeFollower(m_currentTerm);  // Step down after repeated failures to reach this peer (a policy choice; Raft does not require it)
             return;
         }
         continue;
     }
🧹 Nitpick comments (2)
examples/raft/raft.h (1)

1-24: Add include guards and improve documentation.

While #pragma once works for most modern compilers, consider adding traditional include guards for maximum compatibility. Also add documentation for type aliases and threading model.

 #pragma once
+#ifndef RAFT_H_
+#define RAFT_H_

+// Type aliases for improved readability and maintainability
 using IP = std::string;         // Represents an IP address
 using ID = uint32_t;           // Unique identifier for nodes
 using Clock = std::chrono::high_resolution_clock;
 using TimePoint = std::chrono::high_resolution_clock::time_point;

 // Valid IDs start from 1
 constexpr const ID invalidId = 0;

+#endif  // RAFT_H_
examples/raft/raft.cpp (1)

1-796: Improve logging for better observability.

The logging implementation needs improvements:

  1. Add structured logging with consistent fields
  2. Use appropriate log levels
  3. Add logging for all error paths
+    // Add logging configuration (%s and %# are only filled in by the SPDLOG_* macros, not by spdlog::debug())
+    spdlog::set_pattern("[%Y-%m-%d %H:%M:%S.%e] [%n] [%^%l%$] [%s:%#] %v");

-    spdlog::debug("Node={} starts election. New term={}", m_id, m_currentTerm);
+    spdlog::debug("Starting election node_id={} current_term={} state={}", 
+                  m_id, m_currentTerm, static_cast<int>(m_state));

-    spdlog::error("AppendEntries RPC call failed");
+    spdlog::error("RPC failed operation={} node_id={} peer_id={} error_code={} error_message={}", 
+                  "AppendEntries", m_id, client.getId(), 
+                  static_cast<int>(status.error_code()), status.error_message());
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b7359e and f53b501.

📒 Files selected for processing (2)
  • examples/raft/raft.cpp (1 hunks)
  • examples/raft/raft.h (1 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (7)
examples/raft/raft.cpp (6)

56-67: ⚠️ Potential issue

Correct the error messages in exceptions.

In the NodeClient constructor, the error messages incorrectly use m_id twice when formatting the error string. The second m_id should be m_ip to display the correct IP address.

Apply this diff to fix the error messages:

-throw std::runtime_error(fmt::format("Failed to establish a gRPC channel for node={} ip={}", m_id, m_id));
+throw std::runtime_error(fmt::format("Failed to establish a gRPC channel for node={} ip={}", m_id, m_ip));

Similarly, update the other error messages:

-throw std::runtime_error(fmt::format("Failed to create a stub for node={} ip={}", m_id, m_id));
+throw std::runtime_error(fmt::format("Failed to create a stub for node={} ip={}", m_id, m_ip));

-throw std::runtime_error(fmt::format("Failed to create a KV stub for node={} ip={}", m_id, m_id));
+throw std::runtime_error(fmt::format("Failed to create a KV stub for node={} ip={}", m_id, m_ip));

137-138: ⚠️ Potential issue

Avoid unsafe dynamic_cast and check for nullptr.

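In the ConsensusModule constructor, the result of dynamic_cast is used without a nullptr check. Because ConsensusModule derives publicly from both service classes, these upcasts cannot actually fail today, so the check below is defensive: it only matters if the inheritance ever changes.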

Consider modifying the code to check if the cast is successful before using the result:

-auto raftService = dynamic_cast<RaftService::Service*>(this);
-auto kvService = dynamic_cast<TinyKVPPService::Service*>(this);
+auto* raftService = dynamic_cast<RaftService::Service*>(this);
+auto* kvService = dynamic_cast<TinyKVPPService::Service*>(this);
+if (!raftService || !kvService) {
+    throw std::runtime_error("Invalid service cast in ConsensusModule constructor");
+}

 builder.AddListeningPort(m_ip, grpc::InsecureServerCredentials());
-builder.RegisterService(dynamic_cast<RaftService::Service *>(this));
-builder.RegisterService(dynamic_cast<TinyKVPPService::Service *>(this));
+builder.RegisterService(raftService);
+builder.RegisterService(kvService);

647-649: ⚠️ Potential issue

Fix the logic error with consecutiveFailures.

Inside the heartbeat loop in sendHeartbeat, consecutiveFailures is incremented and then immediately reset to zero. This negates the increment and prevents retry logic from working correctly.

Apply this diff to correct the logic:

 if (!client.appendEntries(request, &response))
 {
     consecutiveFailures++;
-    consecutiveFailures = 0;
     spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
                   consecutiveFailures,
                   maxRetries);
     if (consecutiveFailures >= maxRetries)
     {
         return;
     }
     continue;
 }

471-491: ⚠️ Potential issue

Ensure proper shutdown and cleanup in stop() method.

In the ConsensusModule::stop() method, there are potential issues with thread management and server shutdown:

  • The m_serverThread is being requested to stop and joined, but it's not clear where this thread was started.
  • The m_raftServer->Shutdown() should be called before stopping the election and heartbeat threads to prevent new RPCs from being accepted during shutdown.

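Consider reordering the shutdown sequence and ensuring all threads are properly managed. Note also that stop() holds m_stateMutex for its whole body: grpc::Server::Shutdown() waits for in-flight RPC handlers and join() waits for the election and heartbeat threads, so if any of them needs m_stateMutex the node deadlocks. Narrowing the lock to the flag updates, and releasing it before shutting down and joining, avoids that: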

 void ConsensusModule::stop()
 {
     absl::WriterMutexLock locker{&m_stateMutex};

+    if (m_raftServer)
+    {
+        m_raftServer->Shutdown();
+    }

     m_stopElection = true;

     m_electionThread.request_stop();
     m_electionThread.join();

     for (auto &heartbeatThread : m_heartbeatThreads)
     {
         heartbeatThread.request_stop();
         heartbeatThread.join();
     }
     m_heartbeatThreads.clear();

-    if (m_raftServer)
-    {
-        m_raftServer->Shutdown();
-    }

-    m_serverThread.request_stop();
-    m_serverThread.join();
 }

128-131: ⚠️ Potential issue

Initialize m_matchIndex and m_nextIndex for all replicas.

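The m_matchIndex and m_nextIndex maps are not initialized in the constructor, so later lookups find either no entry or a default 0 instead of meaningful values. Raft (Figure 2 of the paper) also requires a leader to reinitialize both maps after every election it wins, so the initialization below should be repeated when the node becomes leader.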

Add initialization for these maps in the constructor:

 for (auto [id, ip] : std::ranges::views::enumerate(replicas))
 {
     if (id + 1 == m_id)
     {
         continue;
     }

     m_replicas.emplace(id + 1, NodeClient(id + 1, ip));
+    // Initialize matchIndex and nextIndex for each replica
+    m_matchIndex[id + 1] = 0;
+    m_nextIndex[id + 1] = getLastLogIndex() + 1;
 }

169-240: ⚠️ Potential issue

Ensure compliance with Raft protocol in AppendEntries.

The AppendEntries method seems to have some incomplete implementation aspects regarding the Raft protocol, such as:

  • Not updating m_currentTerm correctly.
  • Missing reset of m_votedFor.
  • Incomplete log replication logic.

Review the method to ensure it fully complies with the Raft specification, especially Sections 5.3 and 5.4.
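A minimal, self-contained sketch of the follower-side log handling from Sections 5.3 and 5.4 (steps 2-4 of AppendEntries in Figure 2 of the Raft paper) may help when reviewing this method. It assumes 1-based log indices stored at log[index - 1]; Entry and appendEntries are hypothetical stand-ins for the protobuf LogEntry and the real handler, and the term and commit-index bookkeeping (steps 1 and 5) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the protobuf LogEntry; only `term` matters for the checks.
struct Entry {
    std::uint32_t term;
    int payload;
};

// Follower-side AppendEntries steps 2-4 (Raft paper, Figure 2), applied to `followerLog`.
// Assumes 1-based Raft indices with index i stored at followerLog[i - 1].
// Returns false when the prevLogIndex/prevLogTerm consistency check fails.
bool appendEntries(std::vector<Entry>& followerLog,
                   std::uint64_t prevLogIndex,
                   std::uint32_t prevLogTerm,
                   const std::vector<Entry>& entries) {
    // Step 2: the log must contain an entry at prevLogIndex whose term is prevLogTerm.
    if (prevLogIndex > followerLog.size()) {
        return false;
    }
    if (prevLogIndex > 0 && followerLog[prevLogIndex - 1].term != prevLogTerm) {
        return false;
    }

    std::uint64_t index = prevLogIndex + 1;
    for (const Entry& entry : entries) {
        // Step 3: same index, different term -> drop that entry and everything after it.
        if (index <= followerLog.size() && followerLog[index - 1].term != entry.term) {
            followerLog.resize(index - 1);
        }
        // Step 4: append only entries not already present, so a duplicate or delayed
        // RPC never truncates entries the follower already agrees on.
        if (index > followerLog.size()) {
            followerLog.push_back(entry);
        }
        ++index;
    }
    return true;
}
```

The "truncate only on conflict" rule is what keeps a reordered or retransmitted RPC from erasing entries that a newer RPC already appended.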

examples/raft/raft.h (1)

117-121: ⚠️ Potential issue

Add thread safety annotations for member variables.

Member variables that are accessed concurrently should have appropriate thread safety annotations to prevent data races and ensure correctness.

Add ABSL_GUARDED_BY annotations:

 uint32_t m_currentTerm      ABSL_GUARDED_BY(m_stateMutex);
 uint32_t m_votedFor         ABSL_GUARDED_BY(m_stateMutex);
 NodeState m_state           ABSL_GUARDED_BY(m_stateMutex);
 std::vector<LogEntry> m_log ABSL_GUARDED_BY(m_stateMutex);
-/*db::wal::wal_t m_wal        ABSL_GUARDED_BY(m_stateMutex);*/
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f53b501 and 497b88d.

📒 Files selected for processing (2)
  • examples/raft/raft.cpp (1 hunks)
  • examples/raft/raft.h (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-gcc
  • GitHub Check: build-clang
🔇 Additional comments (4)
examples/raft/raft.cpp (1)

379-389: Handle failure of persistent state initialization in init().

In the init() method, if initializePersistentState() fails, the node logs only a warning and returns false, leaving recovery entirely to the caller. More robust error handling or a recovery strategy may be needed.

Please confirm that callers treat a false return as fatal, or decide whether it is acceptable for the node to proceed without persistent state. Raft's safety argument relies on currentTerm, votedFor and the log surviving restarts, so if proceeding is not acceptable, consider terminating the node or implementing a retry mechanism.

examples/raft/raft.h (3)

30-37: Add virtual destructor to NodeClient.

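Holding gRPC stubs does not make NodeClient polymorphic by itself; a virtual destructor is only needed if NodeClient is meant to be a base class that is deleted through a NodeClient pointer (for example, a mock in tests). Keep in mind that declaring any destructor also suppresses the implicitly generated move operations.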

Add a virtual destructor:

 class NodeClient
 {
   public:
     NodeClient(ID nodeId, IP nodeIp);
+    virtual ~NodeClient() = default;

     auto appendEntries(const AppendEntriesRequest &request, AppendEntriesResponse *response) -> bool;
     auto requestVote(const RequestVoteRequest &request, RequestVoteResponse *response) -> bool;
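The sketch below, using a hypothetical Stub type in place of the gRPC stub, shows the consequence of adding a destructor to a class that owns std::unique_ptr members (as NewStub returns): a user-declared destructor suppresses the implicit move constructor and move assignment, and the unique_ptr member makes copying impossible, so the move operations have to be defaulted explicitly for code such as m_replicas.emplace(id + 1, NodeClient(id + 1, ip)) to compile.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a gRPC stub; owned through a move-only unique_ptr.
struct Stub {};

class NodeClient {
  public:
    NodeClient(int id, std::string ip)
        : m_id{id}, m_ip{std::move(ip)}, m_stub{std::make_unique<Stub>()} {}

    // Declaring a destructor suppresses the implicit move operations...
    virtual ~NodeClient() = default;

    // ...so default them explicitly; copying stays disabled because of the unique_ptr.
    NodeClient(NodeClient&&) noexcept = default;
    NodeClient& operator=(NodeClient&&) noexcept = default;
    NodeClient(const NodeClient&) = delete;
    NodeClient& operator=(const NodeClient&) = delete;

    [[nodiscard]] int id() const { return m_id; }

  private:
    int                   m_id;
    std::string           m_ip;
    std::unique_ptr<Stub> m_stub;
};

static_assert(std::is_nothrow_move_constructible_v<NodeClient>);
static_assert(!std::is_copy_constructible_v<NodeClient>);
```

Without the two defaulted move lines, the emplace call would fall back to the deleted copy constructor and fail to compile.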

48-75: Improve documentation and thread safety annotations in ConsensusModule.

The ConsensusModule class lacks sufficient documentation and thread safety annotations, which are important for maintainability and correctness.

Consider adding method documentation and ensuring that thread safety annotations are consistently applied:

 class ConsensusModule : public RaftService::Service,
                         public TinyKVPPService::Service,
                         std::enable_shared_from_this<ConsensusModule>
 {
   public:
     // Constructs a ConsensusModule with the given node ID and list of replica IPs.
     ConsensusModule(ID nodeId, std::vector<IP> replicas);

     // Raft RPCs
     auto AppendEntries(grpc::ServerContext        *pContext,
                        const AppendEntriesRequest *pRequest,
                        AppendEntriesResponse      *pResponse) -> grpc::Status override;

     // Other methods...
+    // Initializes the persistent state. Must be called before start().
     auto init() -> bool;

+    // Starts the consensus module's main loop and begins participating in the cluster.
     void start();

+    // Stops the consensus module and cleans up resources.
     void stop();

134-138: Use std::atomic_flag for better synchronization.

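std::atomic_flag is the only atomic type guaranteed to be lock-free, and since C++20 it also offers test(), so it can stand in for std::atomic<bool> as a simple flag. In practice std::atomic<bool> is lock-free on every mainstream platform, so the gain is mostly about expressing intent. Every call site must change as well, because std::atomic_flag has no operator= or implicit conversion: assignments and loads become test_and_set(), clear() and test().

Modify the declarations (since C++20 a default-constructed std::atomic_flag starts clear, so ATOMIC_FLAG_INIT is no longer needed):

-std::atomic<bool>     m_leaderHeartbeatReceived{false};
-std::atomic<bool>     m_stopElection{false};
+std::atomic_flag      m_leaderHeartbeatReceived;
+std::atomic_flag      m_stopElection;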

@lnikon lnikon self-assigned this Jan 9, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
examples/raft/raft.cpp (3)

666-676: ⚠️ Potential issue

Fix the consecutiveFailures logic in sendHeartbeat.

Currently, consecutiveFailures is reset to zero right after being incremented, so it never climbs past 1 and the maxRetries limit is never reached: the heartbeat thread keeps retrying an unreachable peer indefinitely instead of giving up after maxRetries attempts.

Apply this diff to correct the logic:

        consecutiveFailures++;
        spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
                      consecutiveFailures,
                      maxRetries);
        if (consecutiveFailures >= maxRetries)
        {
            return;
        }
-        consecutiveFailures = 0;
        continue;

710-784: 🛠️ Refactor suggestion

Avoid using detached threads in sendAppendEntriesRPC.

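A detached thread captures this and a reference to client, so if the ConsensusModule (or the NodeClient) is destroyed while the thread is still running, the thread touches freed memory, which is undefined behavior. It's safer to manage thread lifetimes explicitly.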

Apply this diff to manage threads properly:

-        std::thread(
+        m_replicationThreads.emplace_back(
             [this](NodeClient &client, std::vector<LogEntry> logEntries)
             {
                 // ... existing code ...
             },
             std::ref(client),
-            logEntries)
-            .detach();
+            logEntries);

Declare m_replicationThreads as a member (e.g., std::vector<std::jthread>), guard it with a mutex if sendAppendEntriesRPC can run concurrently, and request stop on and join its threads during shutdown to prevent resource leaks.


590-603: ⚠️ Potential issue

Ensure thread safety in becomeFollower.

The method becomeFollower modifies shared state but may not have the necessary mutex protection in all calling contexts.

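Ensure that becomeFollower is always called with m_stateMutex held, or add mutex locking within the method. absl::Mutex is not reentrant, so locking inside is only safe if no caller already holds m_stateMutex when it calls becomeFollower; otherwise the thread deadlocks on itself: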

+    absl::MutexLock locker(&m_stateMutex);
    m_currentTerm = newTerm;
    m_state = NodeState::FOLLOWER;
    updatePersistentState(std::nullopt, 0);

    // ... existing code ...

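Alternatively, if the method is intended to be called with the mutex already held, add the thread annotation to its declaration inside the class in raft.h, where callers can see it (a qualified redeclaration ending in ';' outside the class is not valid C++):

void becomeFollower(uint32_t newTerm) ABSL_EXCLUSIVE_LOCKS_REQUIRED(m_stateMutex);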
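A common way to satisfy both kinds of caller is to split the method: a public entry point that takes the lock and a private *Locked helper that assumes it is held. The sketch uses std::mutex and hypothetical names (Node, onRemoteTerm, becomeFollowerLocked); with Abseil, the helper is where ABSL_EXCLUSIVE_LOCKS_REQUIRED(m_stateMutex) would go. It also resets votedFor only when the term actually increases, since a vote is tied to a term.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

enum class NodeState { FOLLOWER, CANDIDATE, LEADER };

// Hypothetical sketch of the "public method locks, *Locked helper assumes the lock" split.
class Node {
  public:
    // For callers that do not hold m_stateMutex.
    void becomeFollower(std::uint32_t newTerm) {
        std::lock_guard<std::mutex> lock(m_stateMutex);
        becomeFollowerLocked(newTerm);
    }

    // Example caller that already holds the lock, like an RPC handler seeing a higher term.
    void onRemoteTerm(std::uint32_t remoteTerm) {
        std::lock_guard<std::mutex> lock(m_stateMutex);
        if (remoteTerm > m_currentTerm) {
            becomeFollowerLocked(remoteTerm);  // becomeFollower() here would self-deadlock
        }
    }

    std::uint32_t term() {
        std::lock_guard<std::mutex> lock(m_stateMutex);
        return m_currentTerm;
    }

    NodeState state() {
        std::lock_guard<std::mutex> lock(m_stateMutex);
        return m_state;
    }

  private:
    // Precondition: the caller holds m_stateMutex.
    void becomeFollowerLocked(std::uint32_t newTerm) {
        if (newTerm > m_currentTerm) {
            m_currentTerm = newTerm;
            m_votedFor = 0;  // a new term starts with no vote; 0 plays the role of invalidId
        }
        m_state = NodeState::FOLLOWER;
    }

    std::mutex    m_stateMutex;
    std::uint32_t m_currentTerm{0};
    std::uint32_t m_votedFor{0};
    NodeState     m_state{NodeState::CANDIDATE};
};
```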
🧹 Nitpick comments (3)
conan/profiles/release-gcc (1)

10-11: Review profile consistency across compilers

The GCC profile retains the compiler.libcxx setting while it's removed from Clang profiles. Consider:

  1. Standardizing the approach across all profiles
  2. Documenting the reason for different handling if intended
examples/raft/raft.cpp (1)

642-646: Prevent tight loop on failed heartbeats.

When the heartbeat fails, the loop immediately continues without a delay, potentially causing high CPU usage due to rapid retries.

Consider adding a short sleep before retrying to prevent a tight loop:

+       // Add a short delay before the next attempt; placed after `continue` it would never run
+       std::this_thread::sleep_for(std::chrono::milliseconds(50));
        continue;
    }
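Putting the counter fix and the delay together, a sketch of the retry loop might look like this. heartbeatLoop and its std::function parameters are hypothetical stand-ins for client.appendEntries and the thread's stop condition; the doubling backoff capped at maxBackoff is an assumption, not something the current code does.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: sends heartbeats until shouldStop() is true or maxRetries
// consecutive attempts fail. Returns the number of attempts made.
int heartbeatLoop(const std::function<bool()>& send,
                  const std::function<bool()>& shouldStop,
                  int maxRetries,
                  std::chrono::milliseconds interval,
                  std::chrono::milliseconds maxBackoff) {
    int consecutiveFailures = 0;
    int attempts = 0;
    auto backoff = interval;
    while (!shouldStop()) {
        ++attempts;
        if (!send()) {
            ++consecutiveFailures;
            if (consecutiveFailures >= maxRetries) {
                return attempts;  // give up; the caller decides what happens next
            }
            // Sleep before retrying, doubling the delay up to maxBackoff.
            std::this_thread::sleep_for(backoff);
            backoff = std::min(backoff * 2, maxBackoff);
            continue;
        }
        // Success: reset both the failure counter and the backoff.
        consecutiveFailures = 0;
        backoff = interval;
        std::this_thread::sleep_for(interval);
    }
    return attempts;
}
```

Because the counter is reset only on success, two failures followed by a success and then three failures ends the loop after six attempts with maxRetries = 3.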
examples/raft/raft.h (1)

41-41: Add noexcept to destructor for NodeClient.

Destructors are already implicitly noexcept since C++11 (unless a base or member destructor can throw), so spelling it out only documents the guarantee; it does not change behavior.

Ensure the destructor is declared as:

virtual ~NodeClient() noexcept = default;
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 497b88d and ee68090.

📒 Files selected for processing (9)
  • CMakeLists.txt (2 hunks)
  • conan/profiles/debug-clang (1 hunks)
  • conan/profiles/debug-clang-tsan (0 hunks)
  • conan/profiles/debug-gcc (1 hunks)
  • conan/profiles/release-clang (1 hunks)
  • conan/profiles/release-gcc (1 hunks)
  • examples/raft/main.cpp (1 hunks)
  • examples/raft/raft.cpp (1 hunks)
  • examples/raft/raft.h (1 hunks)
💤 Files with no reviewable changes (1)
  • conan/profiles/debug-clang-tsan
✅ Files skipped from review due to trivial changes (1)
  • conan/profiles/debug-gcc
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/raft/main.cpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-gcc
  • GitHub Check: build-clang
🔇 Additional comments (6)
conan/profiles/debug-clang (1)

10-10: Verify Clang 19 availability in the build environment

The upgrade to Clang 19 is significant. Please ensure:

  1. Clang 19 is available in the build environment
  2. All team members have access to this version
  3. CI/CD pipelines are updated accordingly

Run this script to check Clang availability:

✅ Verification successful

Clang 19 upgrade is well-supported by the project infrastructure

The compiler version upgrade is consistent with the project's requirements:

  • Both debug and release Clang profiles are aligned to version 19
  • Project requires C++23 features, which Clang 19 largely supports
  • Build environment is controlled via Docker images
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if Clang 19 is available and its features
clang_path=$(which clang)
if [ -x "$clang_path" ]; then
    $clang_path --version
    # Check C++23 support
    echo '#include <concepts>' | $clang_path -x c++ -std=c++23 -
fi

Length of output: 140


Script:

#!/bin/bash
# Search for compiler configurations and CI files
echo "=== Compiler Profiles ==="
fd ".*" conan/profiles --type f --exec cat {}

echo -e "\n=== CI Configurations ==="
fd "^\.gitlab-ci\.yml$|^\.github/workflows/.*\.yml$" --type f --exec cat {}

echo -e "\n=== Build System Files ==="
fd "CMakeLists.txt|conanfile\.txt|conanfile\.py" --type f --exec cat {}

Length of output: 14546

conan/profiles/release-clang (1)

10-10: LGTM! Changes are consistent with debug profile

The changes maintain consistency with the debug-clang profile, which is good practice.

conan/profiles/release-gcc (1)

9-9: Consider implications of removing GNU extensions

Changing from gnu23 to 23 removes GNU extensions. Please verify that:

  1. The Raft implementation doesn't rely on any GNU-specific features
  2. All existing code compiles without GNU extensions

Run this script to check for GNU extension usage:

✅ Verification successful

GNU extensions removal is safe

The codebase doesn't rely on GNU-specific features. The only GNU attribute usage is in thread safety annotations that are already properly guarded for Clang-only usage, and in isolated third-party GSL utility files. The core Raft implementation will compile correctly with standard C++23.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for common GNU extension usage
rg -l "(__GNUC__|__extension__|__attribute__|__asm__|typeof|__typeof__|__builtin)" 

# Check if any files use GNU-specific headers
rg -l "#include.*<(gnu/|bits/)"

Length of output: 328


Script:

#!/bin/bash
# Check thread_safety.h content
echo "=== thread_safety.h content ==="
cat examples/raft/thread_safety.h

# Look for GNU extensions specifically in Raft implementation
echo -e "\n=== GNU extensions in Raft code ==="
fd -e cpp -e h -e hpp . examples/raft --exec rg -l "(__GNUC__|__extension__|__attribute__|__asm__|typeof|__typeof__|__builtin)" {}

Length of output: 9927

examples/raft/raft.cpp (2)

142-147: Use static_cast instead of dynamic_cast for base class casting.

Since ConsensusModule publicly inherits from RaftService::Service, static_cast (or even a plain implicit conversion) is sufficient. For an upcast to an unambiguous public base, dynamic_cast is already resolved at compile time, so the gain is clarity rather than speed; more importantly, such an upcast cannot yield nullptr, so the runtime check can be dropped.

Apply this diff to replace dynamic_cast with static_cast:

-auto *raftService = dynamic_cast<RaftService::Service *>(this);
+auto *raftService = static_cast<RaftService::Service *>(this);
-if (raftService == nullptr)
-{
-    throw std::runtime_error(fmt::format("Failed to dynamic_cast ConsensusModule to RaftService"));
-}

Similarly, update the casting for tkvppService:

-auto *tkvppService = dynamic_cast<TinyKVPPService::Service *>(this);
+auto *tkvppService = static_cast<TinyKVPPService::Service *>(this);
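For reference, a small sketch with two hypothetical service bases (mirroring RaftService::Service and TinyKVPPService::Service) showing that the implicit, static and dynamic upcasts all produce the same adjusted pointer, so none of them needs a nullptr check:

```cpp
#include <cassert>

// Hypothetical bases mirroring the two generated gRPC service classes.
struct RaftServiceBase { virtual ~RaftServiceBase() = default; int raftTag = 1; };
struct KvServiceBase   { virtual ~KvServiceBase() = default; int kvTag = 2; };

// Public (struct default) inheritance from both, like ConsensusModule.
struct Module : RaftServiceBase, KvServiceBase {};

// Upcasting to a public, unambiguous base is an implicit conversion: it is checked at
// compile time, adjusts the pointer to the right subobject, and never yields nullptr
// for a non-null source.
inline KvServiceBase* asKvService(Module* module) { return module; }
```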

448-458: Verify election timer reset logic.

The logic within AwaitWithTimeout may not properly reset the election timer, potentially causing unintended election timeouts.

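Please verify that the condition passed to AwaitWithTimeout correctly captures both the receipt of a heartbeat and the passage of the timeout duration. Unary + only converts captureless lambdas to function pointers, so a capturing lambda has to be named and passed to absl::Condition by pointer:

auto heartbeatOrDeadline = [this, &timeToWaitDeadlineMs]() {
    return m_leaderHeartbeatReceived.load() || currentTimeMs() >= timeToWaitDeadlineMs;
};
bool conditionMet = m_timerMutex.AwaitWithTimeout(
    absl::Condition(&heartbeatOrDeadline),
    absl::Milliseconds(timeToWaitMs));

Because the condition also becomes true once the deadline passes, the return value alone does not tell a heartbeat apart from a timeout; read m_leaderHeartbeatReceived afterwards. Abseil re-evaluates the condition when m_timerMutex is released by another thread, so the heartbeat handler should set the flag while holding m_timerMutex; otherwise the waiter may only notice the heartbeat at the timeout. Calling currentTimeMs() inside the lambda, as above, keeps the time comparison fresh.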
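For comparison, the same wait written with the standard library instead of Abseil (a sketch; ElectionTimer and its members are hypothetical). The flag is written under the mutex the waiter uses and notify_one() is called, so a heartbeat wakes the waiter immediately, and the predicate overload of wait_until returns the predicate's final value, which separates a heartbeat from a timeout.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical standard-library analogue of the election-timer wait.
class ElectionTimer {
  public:
    void onHeartbeat() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_heartbeatReceived = true;
        }
        m_cv.notify_one();
    }

    // Returns true if a heartbeat arrived before `timeout`, false on timeout.
    // The deadline is computed once, so spurious wakeups cannot extend the wait.
    bool waitForHeartbeat(std::chrono::milliseconds timeout) {
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        std::unique_lock<std::mutex> lock(m_mutex);
        const bool received =
            m_cv.wait_until(lock, deadline, [this] { return m_heartbeatReceived; });
        m_heartbeatReceived = false;  // reset for the next round while still holding the lock
        return received;
    }

  private:
    std::mutex              m_mutex;
    std::condition_variable m_cv;
    bool                    m_heartbeatReceived{false};
};
```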

CMakeLists.txt (1)

16-22: Conditionally include -Wthread-safety for Clang compiler.

The -Wthread-safety flag is specific to Clang and can cause errors with other compilers like GCC.

The added conditional correctly handles compiler-specific flags, ensuring compatibility across different compilers.

Comment on lines 151 to 152
std::atomic<bool> m_leaderHeartbeatReceived{false};
std::atomic<bool> m_stopElection{false};
💡 Codebase verification

Atomic variables: consider explicit memory ordering

The verification shows that the atomic operations never pass a memory order, so they all use the default std::memory_order_seq_cst, the strongest ordering. This is already correct; explicit orders would mainly document intent and can be cheaper on weakly ordered hardware:

  • m_leaderHeartbeatReceived: load/store calls could state their memory ordering explicitly
  • m_stopElection: direct boolean access (assignment and implicit conversion) is already a seq_cst atomic store/load; explicit load/store calls would only make that visible

Recommended changes:

  • For m_leaderHeartbeatReceived.store(), use memory_order_release
  • For m_leaderHeartbeatReceived.load(), use memory_order_acquire
  • Replace direct access to m_stopElection with explicit load()/store() calls for readability
  • For m_stopElection, keeping memory_order_seq_cst (the default) is fine, since it only signals thread termination
🔗 Analysis chain

Ensure atomic variables are used with an appropriate memory order.

Operations on std::atomic cannot data-race by themselves; the chosen memory order only controls how m_leaderHeartbeatReceived and m_stopElection order the surrounding non-atomic reads and writes.

Please verify that these atomic variables are accessed with consistent memory order, or consider protecting them with mutexes if they represent complex state changes.

For instance, use:

std::atomic<bool> m_leaderHeartbeatReceived{false};

And when accessing:

if (m_leaderHeartbeatReceived.load(std::memory_order_acquire)) {
    // ...
}
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for usages of m_leaderHeartbeatReceived
ast-grep --pattern 'm_leaderHeartbeatReceived'

# Search for usages of m_stopElection
ast-grep --pattern 'm_stopElection'

# Backup search using ripgrep for more context
rg "m_leaderHeartbeatReceived|m_stopElection" -A 2 -B 2

Length of output: 3647

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (5)
examples/raft/raft.cpp (2)

605-609: ⚠️ Potential issue

Correct majority calculation in hasMajority function

The current majority calculation may lead to incorrect results, especially in clusters with an even number of nodes. The calculation should consider the total number of nodes, including the current node.

Apply this diff to fix the calculation:

 auto ConsensusModule::hasMajority(uint32_t votes) const -> bool
 {
-    constexpr const double HALF_OF_THE_REPLICAS = 2.0;
-    return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS;
+    const size_t totalNodes = m_replicas.size() + 1; // Including self
+    return votes > totalNodes / 2;
 }

666-678: ⚠️ Potential issue

Fix consecutiveFailures reset logic in heartbeat

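The consecutiveFailures counter is reset to zero immediately after incrementing, which prevents the failure count from accumulating: the maxRetries limit is never reached and the heartbeat loop keeps retrying an unreachable peer indefinitely. The counter should be reset only after a successful response.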

Apply this diff to correct the failure counter logic:

 if (!client.appendEntries(request, &response))
 {
     consecutiveFailures++;
     spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
                   consecutiveFailures,
                   maxRetries);
     if (consecutiveFailures >= maxRetries)
     {
         return;
     }
-    consecutiveFailures = 0;
     continue;
 }
+else
+{
+    consecutiveFailures = 0; // Reset on successful response
+}
examples/raft/raft.h (3)

67-67: ⚠️ Potential issue

Inherit enable_shared_from_this publicly

ConsensusModule should inherit from std::enable_shared_from_this publicly. With the default private inheritance of a class, the std::shared_ptr constructors cannot see the base, so the internal weak pointer is never set and shared_from_this() throws std::bad_weak_ptr (C++17 and later).

Apply this diff to change the inheritance:

 class ConsensusModule : public RaftService::Service,
                         public TinyKVPPService::Service,
-                        std::enable_shared_from_this<ConsensusModule>
+                        public std::enable_shared_from_this<ConsensusModule>
{
    // Class definition
};
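A small sketch (Module is a hypothetical stand-in for ConsensusModule) of the behavior with public inheritance. It also shows the other precondition: the object must already be owned by a std::shared_ptr, for example created with std::make_shared, before shared_from_this() is called.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for ConsensusModule. With *public* inheritance the
// shared_ptr constructors (including make_shared) set the internal weak pointer.
class Module : public std::enable_shared_from_this<Module> {
  public:
    std::shared_ptr<Module> self() { return shared_from_this(); }
};
```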

151-152: 🛠️ Refactor suggestion

Specify memory order for atomic variables

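The std::atomic<bool> variables m_leaderHeartbeatReceived and m_stopElection are accessed without specifying a memory order, so every access uses the std::memory_order_seq_cst default. That is already correct; the release/acquire pair below is a weaker and potentially cheaper ordering that still makes writes made before the store visible to a thread whose load observes it.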

Apply this diff to specify memory ordering:

 std::atomic<bool> m_leaderHeartbeatReceived{false};
 std::atomic<bool> m_stopElection{false};

 // When storing:
-m_leaderHeartbeatReceived.store(true);
+m_leaderHeartbeatReceived.store(true, std::memory_order_release);

 // When loading:
-if (m_leaderHeartbeatReceived.load())
+if (m_leaderHeartbeatReceived.load(std::memory_order_acquire))

Repeat similar changes for m_stopElection. Alternatively, protect accesses with mutexes if appropriate.
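To illustrate what the explicit orders provide, here is a sketch of the release/acquire message-passing pattern with a hypothetical Heartbeat struct: the acquire load that observes true is guaranteed to also see the plain write made before the release store. Using the default seq_cst on both sides would be equally correct, just potentially more expensive on weakly ordered hardware.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical struct: a plain field published through an atomic flag.
struct Heartbeat {
    std::uint32_t     leaderTerm{0};  // non-atomic data published by the flag
    std::atomic<bool> received{false};

    void publish(std::uint32_t term) {
        leaderTerm = term;                                // (1) plain write
        received.store(true, std::memory_order_release);  // (2) publishes (1)
    }

    // Spins until the flag is set, then reads the published term.
    std::uint32_t awaitTerm() const {
        while (!received.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        return leaderTerm;  // safe: this read happens after (1)
    }
};
```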


145-148: ⚠️ Potential issue

Initialize m_matchIndex and m_nextIndex properly

Ensure that m_matchIndex and m_nextIndex are initialized for all replica nodes in the constructor to prevent uninitialized access.

Confirm that in the constructor, after initializing m_replicas, you properly initialize m_matchIndex and m_nextIndex. If this is already done, ensure that the logic correctly covers all replica nodes.

🧹 Nitpick comments (1)
examples/raft/raft.cpp (1)

925-943: Handle malformed log entries during restore

In restorePersistentState(), a malformed log line is not rejected: the failed extraction leaves key or value empty or term at 0, and the bogus entry is silently appended to m_log. Add checks to ensure that each line is correctly parsed.

Apply this diff to add error handling:

 while (std::getline(ifs, logLine))
 {
     std::stringstream sst(logLine);

     std::string   key;
     std::string   value;
     std::uint32_t term = 0;

     sst >> key >> value >> term;

+    if (sst.fail())
+    {
+        spdlog::error("Malformed log entry in {}: {}", path.c_str(), logLine);
+        continue; // Caution: skipping shifts every later log index; failing the restore may be safer
+    }

     LogEntry logEntry;
     logEntry.set_key(key);
     logEntry.set_value(value);
     logEntry.set_term(term);
     m_log.emplace_back(logEntry);

     spdlog::info("Node={} restored logEntry=[key={}, value={}, term={}]", m_id, key, value, term);
 }
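A slightly stricter parser, sketched as a free function over a hypothetical ParsedEntry struct, also rejects lines with trailing fields. Note that any whitespace-separated format, including this one, cannot represent keys or values that contain spaces.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for the protobuf LogEntry.
struct ParsedEntry {
    std::string   key;
    std::string   value;
    std::uint32_t term{0};
};

// Parses one "key value term" line. Rejects lines with missing fields,
// a non-numeric term, or extra trailing fields.
std::optional<ParsedEntry> parseLogLine(const std::string& line) {
    std::istringstream sst(line);
    ParsedEntry entry;
    if (!(sst >> entry.key >> entry.value >> entry.term)) {
        return std::nullopt;
    }
    std::string extra;
    if (sst >> extra) {
        return std::nullopt;  // more than three fields
    }
    return entry;
}
```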
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ee68090 and f68e8d1.

📒 Files selected for processing (2)
  • examples/raft/raft.cpp (1 hunks)
  • examples/raft/raft.h (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-gcc
  • GitHub Check: build-clang
🔇 Additional comments (4)
examples/raft/raft.cpp (2)

44-46: Pass nodeIp by const reference in NodeClient constructor

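Passing nodeIp by const reference does not actually save a copy here, because m_ip{nodeIp} still copies the string. The existing by-value parameter plus std::move is the idiomatic sink-parameter pattern (one move for rvalue arguments, one copy and one move for lvalues), so this change is performance-neutral at best and is only worth making for consistency with the rest of the codebase.

If you still prefer it, the diff is: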

-NodeClient::NodeClient(ID nodeId, IP nodeIp)
+NodeClient::NodeClient(ID nodeId, const IP& nodeIp)
     : m_id{nodeId},
-      m_ip{std::move(nodeIp)},
+      m_ip{nodeIp},
       m_channel(grpc::CreateChannel(m_ip, grpc::InsecureChannelCredentials())),
       m_stub(RaftService::NewStub(m_channel)),
       m_kvStub(TinyKVPPService::NewStub(m_channel))
{
    assert(m_id > 0);
    assert(!m_ip.empty());
    // ...
}

Remember to update the declaration in raft.h as well.


658-659: Correct previous log term and index in heartbeat

In the sendHeartbeat method, request.set_prevlogterm(getLastLogTerm()); and request.set_prevlogindex(getLastLogIndex()); may not correctly represent the leader's state when there are no new entries. Ensure that the correct previous log term and index are sent.

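Please verify that getLastLogTerm() and getLastLogIndex() return the appropriate values in the context of heartbeats. In Raft (Figure 2 of the paper), prevLogIndex for a given follower is nextIndex[follower] - 1 and prevLogTerm is the term of the entry at that index; the leader's last index and term are correct only for a follower that is already fully caught up.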

Run the following script to check the implementations of getLastLogTerm() and getLastLogIndex():

Ensure that these functions return zero when the log is empty and the correct term/index otherwise.
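A minimal sketch of the per-follower computation, assuming 1-based indices stored at log[index - 1]; Entry, PrevLogInfo and prevLogFor are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the protobuf LogEntry.
struct Entry { std::uint32_t term; };

struct PrevLogInfo {
    std::uint64_t index;
    std::uint32_t term;
};

// prevLogIndex = nextIndex[follower] - 1; prevLogTerm = term of that entry,
// or 0 when prevLogIndex is 0. Assumes 1 <= nextIndex <= log.size() + 1.
PrevLogInfo prevLogFor(const std::vector<Entry>& log, std::uint64_t nextIndex) {
    const std::uint64_t prevIndex = nextIndex - 1;
    const std::uint32_t prevTerm = prevIndex == 0 ? 0 : log[prevIndex - 1].term;
    return {prevIndex, prevTerm};
}
```

For a caught-up follower (nextIndex = lastLogIndex + 1) this reduces to the leader's last index and term, which is why the current code only looks correct in the common case.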

examples/raft/raft.h (2)

32-48: Add virtual destructor and documentation to NodeClient

To ensure proper resource cleanup and facilitate inheritance, add a virtual destructor to NodeClient. Additionally, provide method documentation for clarity.

Apply this diff:

 class NodeClient
 {
   public:
+    /**
+     * Virtual destructor for safe polymorphic destruction.
+     */
+    virtual ~NodeClient() noexcept = default;

     // Existing constructor and methods...

     /**
      * Sends AppendEntries RPC to the target node.
      */
     auto appendEntries(const AppendEntriesRequest &request, AppendEntriesResponse *response) -> bool;

     // Other methods...
 };

65-93: Improve documentation and inheritance in ConsensusModule

Add method documentation and ensure consistent use of override specifiers. Publicly inherit from enable_shared_from_this to avoid issues with shared_from_this().

Apply this diff:

 class ConsensusModule : public RaftService::Service,
                         public TinyKVPPService::Service,
-                        std::enable_shared_from_this<ConsensusModule>
+                        public std::enable_shared_from_this<ConsensusModule>
 {
   public:
     /**
      * Constructs a consensus module for a node in the Raft cluster.
      * @param nodeId Unique identifier for this node
      * @param replicas Vector of IP addresses for all nodes in cluster
      * @throws std::runtime_error if server creation fails
      */
     ConsensusModule(ID nodeId, std::vector<IP> replicas);

+    ~ConsensusModule() noexcept override = default;

     /**
      * Handles AppendEntries RPC calls from other nodes.
      */
     auto AppendEntries(grpc::ServerContext *pContext,
                        const AppendEntriesRequest *pRequest,
                        AppendEntriesResponse *pResponse) -> grpc::Status override;

     // Other methods with documentation...
 };
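A small sketch of why the inheritance must be public, using a hypothetical ServiceBase in place of the generated gRPC bases. Because the enable_shared_from_this base is public, std::make_shared can find it and set up the internal weak pointer, so shared_from_this() returns a pointer that shares ownership with the original.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a generated gRPC service base class.
struct ServiceBase
{
    virtual ~ServiceBase() = default;
};

// Public inheritance lets std::make_shared detect the enable_shared_from_this
// base and initialise its internal weak pointer.
class Module : public ServiceBase, public std::enable_shared_from_this<Module>
{
  public:
    auto self() -> std::shared_ptr<Module> { return shared_from_this(); }
};
```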

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
examples/raft/raft.cpp (3)

616-620: ⚠️ Potential issue

Fix incorrect majority calculation in hasMajority.

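The majority calculation is incorrect. Assuming m_replicas excludes the local node and votes includes the node's own vote, the current check votes > m_replicas.size() / 2.0 accepts exactly half of an even-sized cluster: with 3 replicas (4 nodes), 2 votes pass, so two candidates could both win the same term. The function should compute the threshold from the total number of nodes (replicas plus self).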

Apply this diff to correct the calculation:

 auto ConsensusModule::hasMajority(uint32_t votes) const -> bool
 {
-    constexpr const double HALF_OF_THE_REPLICAS = 2.0;
-    return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS;
+    // Total nodes = replicas + self
+    const size_t totalNodes = m_replicas.size() + 1;
+    return votes >= (totalNodes / 2) + 1;
 }

This ensures that a majority requires more than half of the total nodes, which aligns with the Raft consensus requirements.
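A standalone version of the corrected check, assuming (as the diff does) that replicaCount excludes the local node and votes includes its own vote. The test cases show where the old and new formulas differ: in a 4-node cluster, 2 votes are only half.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumes `replicaCount` excludes the local node and `votes` includes its own vote.
constexpr auto hasMajority(uint32_t votes, std::size_t replicaCount) -> bool
{
    const std::size_t totalNodes = replicaCount + 1;
    return votes >= totalNodes / 2 + 1;
}
```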


678-687: ⚠️ Potential issue

Correct the consecutiveFailures logic in heartbeat.

The consecutiveFailures counter is reset to zero immediately after being incremented, which prevents retries from accumulating correctly. This negates the purpose of tracking consecutive failures and may cause the heartbeat thread to continue indefinitely without proper handling of failures.

Apply this diff to fix the logic:

 if (!client.appendEntries(request, &response))
 {
     consecutiveFailures++;

     spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
                   consecutiveFailures,
                   maxRetries);
     if (consecutiveFailures >= maxRetries)
     {
         return;
     }
-    consecutiveFailures = 0;
     continue;
 }
+
+// Reached only after a successful AppendEntries RPC: the failure branch
+// above always returns or continues, so no else branch is needed
+consecutiveFailures = 0;

This change ensures that consecutiveFailures is only reset upon a successful RPC call, allowing the retry mechanism to function correctly.
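A self-contained model of the corrected loop, where a hypothetical rpc callable stands in for client.appendEntries() and rounds bounds the loop so it can be tested. It returns the number of attempts, which shows that only consecutive failures count toward maxRetries.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of the heartbeat loop: `rpc` stands in for
// client.appendEntries(), and `rounds` bounds the loop for testing.
// Returns the number of RPC attempts made before the loop stopped.
auto runHeartbeat(const std::function<bool()> &rpc, int maxRetries, int rounds) -> int
{
    int consecutiveFailures = 0;
    int attempts = 0;
    for (int round = 0; round < rounds; ++round)
    {
        ++attempts;
        if (!rpc())
        {
            ++consecutiveFailures;
            if (consecutiveFailures >= maxRetries)
            {
                return attempts; // give up after maxRetries failures in a row
            }
            continue;
        }
        // Reached only when the RPC succeeded.
        consecutiveFailures = 0;
    }
    return attempts;
}
```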


721-797: 🛠️ Refactor suggestion

Avoid detaching threads in sendAppendEntriesRPC.

A detached thread that captures this and a reference to client can outlive the ConsensusModule, turning every later access into a use-after-free, and there is no way to wait for it at shutdown. Store the thread handles (for example as std::jthread) instead of detaching them, and join them during shutdown.

Apply this diff to properly manage the threads:

 void ConsensusModule::sendAppendEntriesRPC(NodeClient &client, std::vector<LogEntry> logEntries)
 {
-    std::thread(
+    m_replicationThreads.emplace_back(
         [this](NodeClient &client, std::vector<LogEntry> logEntries)
         {
             // ... existing code ...
         },
-        std::ref(client),
-        logEntries)
-        .detach();
+        std::ref(client),
+        std::move(logEntries)
+    );
 }

+// Declare m_replicationThreads in your class to store these threads
+std::vector<std::jthread> m_replicationThreads;

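Ensure you join these threads in the stop() method to prevent resource leaks. If sendAppendEntriesRPC can be called concurrently, guard m_replicationThreads with a mutex, because std::vector is not safe for concurrent modification. Also note that request_stop() only has an effect if the thread function takes a std::stop_token and checks it; otherwise join() simply waits for the lambda to finish: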

 void ConsensusModule::stop()
 {
     // ... existing code ...

+    for (auto &replicationThread : m_replicationThreads)
+    {
+        replicationThread.request_stop();
+        replicationThread.join();
+    }
+    m_replicationThreads.clear();
 }
🧹 Nitpick comments (3)
examples/raft/raft.cpp (3)

72-75: Use consistent and fixed RPC timeout values instead of random timeouts.

Using random timeouts for RPC calls in appendEntries, requestVote, and put makes RPC behavior unpredictable and complicates debugging. Raft randomizes election timeouts to avoid split votes; RPC deadlines do not need randomization, so a consistent, reasonable value is easier to reason about and troubleshoot.

Apply this diff to use a fixed timeout:

 // In NodeClient::appendEntries (lines 72-75)
-    const auto RPC_TIMEOUT = std::chrono::seconds(generateRandomTimeout());
+    constexpr auto RPC_TIMEOUT = std::chrono::seconds(5); // Use a fixed 5-second timeout

 // In NodeClient::requestVote (lines 88-91)
-    const auto RPC_TIMEOUT = std::chrono::seconds(generateRandomTimeout());
+    constexpr auto RPC_TIMEOUT = std::chrono::seconds(5); // Use a fixed 5-second timeout

 // In NodeClient::put (lines 106-108)
-    context.set_deadline(std::chrono::system_clock::now() + std::chrono::seconds(generateRandomTimeout()));
+    context.set_deadline(std::chrono::system_clock::now() + std::chrono::seconds(5)); // Use a fixed 5-second timeout

Also applies to: 88-91, 106-108


143-155: Use static_cast instead of dynamic_cast for service registration.

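Since ConsensusModule publicly inherits from RaftService::Service and TinyKVPPService::Service, these are derived-to-base conversions. For an upcast to an unambiguous public base, dynamic_cast is resolved at compile time exactly like static_cast, so the change removes no run-time check; it mainly makes the intent clearer. A plain implicit conversion, such as RaftService::Service *raftService = this;, works as well.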

Apply this diff:

 // Replace dynamic_cast with static_cast for RaftService
-    auto *raftService = dynamic_cast<RaftService::Service *>(this);
+    auto *raftService = static_cast<RaftService::Service *>(this);

 // Replace dynamic_cast with static_cast for TinyKVPPService
-    auto *tkvppService = dynamic_cast<TinyKVPPService::Service *>(this);
+    auto *tkvppService = static_cast<TinyKVPPService::Service *>(this);
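A minimal illustration with hypothetical stand-ins for the two service bases: both conversions compile implicitly, and the compiler adjusts the pointer to the correct base subobject, the same result static_cast and dynamic_cast produce for an upcast.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two generated gRPC service bases.
struct RaftServiceBase
{
    virtual ~RaftServiceBase() = default;
    int raftTag = 1;
};

struct KvServiceBase
{
    virtual ~KvServiceBase() = default;
    int kvTag = 2;
};

struct Module : RaftServiceBase, KvServiceBase
{
    // Derived-to-base conversions are implicit and resolved at compile time;
    // the compiler adjusts `this` to point at the right base subobject.
    auto asRaft() -> RaftServiceBase * { return this; }
    auto asKv() -> KvServiceBase * { return this; }
};
```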

508-512: Resolve the commented-out joinable() check on m_serverThread.

The commented-out line is the joinable() guard for stopping m_serverThread, and the block below it still calls request_stop() and join(), so m_serverThread is in use. With the guard commented out, join() runs unconditionally, and calling join() on a std::jthread that was never started (or was already joined) throws std::system_error. Restoring the check keeps shutdown safe in both cases.

Apply this diff:

-    /*if (m_serverThread.joinable())*/
+    if (m_serverThread.joinable())
     {
         m_serverThread.request_stop();
         m_serverThread.join();
     }

If m_serverThread is intended to be used for managing the server thread, ensure it is properly initialized and managed throughout the code.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f68e8d1 and 3372412.

📒 Files selected for processing (2)
  • examples/raft/main.cpp (1 hunks)
  • examples/raft/raft.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/raft/main.cpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-gcc
  • GitHub Check: build-clang

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
lib/raft/raft.h (1)

185-185: ⚠️ Potential issue

Protect m_shutdown against concurrent read-write access.
Accessing m_shutdown from multiple threads without synchronization can cause a data race. Either mark it as std::atomic<bool> or always guard it with m_stateMutex. This issue was raised in a past review and still appears to be unresolved.

- bool m_shutdown{false};
+ std::atomic<bool> m_shutdown{false};
lib/raft/raft.cpp (2)

458-458: ⚠️ Potential issue

Synchronize or make m_shutdown atomic when used in loops.
The loop condition !m_shutdown is read outside of a lock. This leads to a data race since m_shutdown may be set to true in another thread. Mark m_shutdown as std::atomic<bool> or guard accesses with the mutex.
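A minimal sketch of a polling loop with std::atomic<bool>, using a hypothetical Worker class. The atomic makes the concurrent load and store well defined; a thread blocked on a condition variable would additionally need a notify after the flag is set.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker loop polling a shutdown flag set by another thread.
class Worker
{
  public:
    void run()
    {
        while (!m_shutdown.load())
        {
            m_ticks.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    void shutdown() { m_shutdown.store(true); }
    auto ticks() const -> int { return m_ticks.load(); }

  private:
    std::atomic<bool> m_shutdown{false};
    std::atomic<int> m_ticks{0};
};
```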


846-862: 🛠️ Refactor suggestion

Consider using fsync for improved durability.
Reliability of Raft depends on durable writes. Simply flushing the stream might not guarantee disk synchronization. Use fsync or equivalent on supported platforms to ensure data is fully persisted, especially after writing the log.

🧹 Nitpick comments (3)
lib/raft/raft.h (1)

117-117: Consider passing nodeConfig by const reference.
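A small performance optimization can be achieved by passing the nodeConfig parameter by const reference instead of by value, as flagged by the static analysis hints. This avoids copying the entire struct when constructing the consensus_module_t. If the constructor stores the config in a member, keeping the by-value parameter and moving it (m_config{std::move(nodeConfig)}) is equally cheap and also silences the warning.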

- consensus_module_t(node_config_t nodeConfig, std::vector<raft_node_grpc_client_t> replicas) noexcept;
+ consensus_module_t(const node_config_t &nodeConfig, std::vector<raft_node_grpc_client_t> replicas) noexcept;
lib/raft/raft.cpp (2)

180-196: Correct spelling of “received.”
There are multiple instances of “recevied” in the log messages. Consider fixing them for clarity and consistency.

- spdlog::debug("Node={} recevied AppendEntries RPC from leader={} during term={}",
+ spdlog::debug("Node={} received AppendEntries RPC from leader={} during term={}",

247-248: Implement or remove the TODO for state machine updates.
Currently, committed log entries are not applied to an actual state machine, which undermines Raft’s core functionality. Consider adding or wiring a proper state machine interface to complete the replication pipeline.

Would you like me to draft a snippet for state machine integration and open an issue?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0eb35f7 and f7f23db.

📒 Files selected for processing (2)
  • lib/raft/raft.cpp (1 hunks)
  • lib/raft/raft.h (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/raft/raft.h (1)
lib/raft/raft.cpp (4)
  • raft_node_grpc_client_t (55-61)
  • consensus_module_t (117-137)
  • votedFor (407-411)
  • votedFor (407-407)
🪛 Cppcheck (2.10-2)
lib/raft/raft.cpp

[performance] 117-117: Function parameter 'nodeConfig' should be passed by const reference.

(passedByValue)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-gcc
  • GitHub Check: build-clang
🔇 Additional comments (1)
lib/raft/raft.cpp (1)

884-884: Re-check logic overwriting m_votedFor after file read.
The code zeroes out m_votedFor immediately after reading a stored value, making the restored value ineffective. Verify that this behavior is truly intended.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 18

♻️ Duplicate comments (15)
lib/raft/CMakeLists.txt (1)

8-11: Proper test setup with no duplicate compilation.

The RaftTest executable is correctly set up to link against RaftLib rather than compiling raft.cpp twice. This addresses the issue mentioned in the previous review.

lib/fs/append_only_file.cpp (1)

67-71: Good use of std::expected for error handling.

The implementation has addressed previous concerns about error handling in methods like flush and reset by using std::expected to properly propagate errors from the underlying random_access_file_t. This is a significant improvement over the previous implementation.

Also applies to: 73-77

assets/database_config_schema.json (1)

53-74: ⚠️ Potential issue

maximumLevels is required but still undefined

The "lsm" object lists "maximumLevels" in its required array (lines 68‑73) yet never declares it under "properties". This causes every config to fail validation.

         "flushThreshold": {
           "type": "integer",
           "description": "The threshold of bytes at which the memtable should be flushed",
           "minimum": 1
         },
+        "maximumLevels": {
+          "type": "integer",
+          "description": "Maximum number of levels in the LSM tree",
+          "minimum": 1
+        },
lib/fs/append_only_file.h (1)

43-45: Returning std::stringstream by value may be heavy

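std::stringstream is move-only, so returning it by value moves it (or elides the move) rather than copying it. The real cost is that every call copies the entire log into a new stream buffer, which is expensive for large logs. Consider returning a view over the data or providing an API that streams directly to the caller.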

lib/raft/raft.h (1)

213-215: ⚠️ Potential issue

m_shutdown is non‑atomic and unguarded → data‑race risk

m_shutdown is read/written from multiple threads (heartbeat, election, gRPC handlers) but is neither std::atomic nor protected by m_stateMutex, reopening the race condition highlighted in earlier review cycles.

-    bool m_shutdown{false};
+    std::atomic<bool> m_shutdown{false};

or guard every access with m_stateMutex.

lib/wal/log/persistent_log_storage.h (5)

158-173: Fix buffer size handling in read_impl.

The method doesn't resize the buffer according to the actual number of bytes read, which could lead to extraneous data when fewer bytes are read than requested.

 [[nodiscard]] auto read_impl(std::size_t offset, std::size_t size)
     -> std::string
 {
     std::string buffer;
     buffer.resize(size);
-    if (const auto res = m_file.read(offset, buffer.data(), size);
-        !res.has_value())
+    const auto res = m_file.read(offset, buffer.data(), size);
+    if (!res.has_value())
     {
         spdlog::error(
             "Failed to read from file storage. Offset={}, size={}",
             offset,
             size);
         return {};
     }
+    // Adjust buffer size to actual bytes read
+    buffer.resize(static_cast<size_t>(res.value()));
     return buffer;
 }
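A POSIX sketch of the same idea, assuming the backend ultimately reads with pread(); readAt() is a hypothetical helper, not the project's random_access_file_t API. The buffer is shrunk to the byte count pread() returns, so a short read near end-of-file yields only real data.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>
#include <string>
#include <unistd.h>

// Reads up to `size` bytes at `offset` and shrinks the buffer to the number of
// bytes actually read, so a short read leaves no trailing '\0' padding.
// Returns an empty string on error.
auto readAt(int fd, std::size_t offset, std::size_t size) -> std::string
{
    std::string buffer(size, '\0');
    const ssize_t bytesRead = ::pread(fd, buffer.data(), size, static_cast<off_t>(offset));
    if (bytesRead < 0)
    {
        return {};
    }
    buffer.resize(static_cast<std::size_t>(bytesRead));
    return buffer;
}
```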

292-306: Add newline separators to entries for line-by-line reading.

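The append method doesn't add newlines between entries, but the constructor reads the file line by line, so all entries appended in one session are merged into a single line when the log is reloaded. Line-based framing also requires that entries never contain '\n' themselves; if they hold serialized binary data, a length prefix is safer.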

 [[nodiscard]] auto append(std::string entry) -> bool
 {
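+    // Add newline to separate entries when reading line by line
+    const std::string entry_with_newline = entry + '\n';
-    if (!m_backendStorage.write(static_cast<const char *>(entry.data()),
-                                m_backendStorage.size(),
-                                entry.size()))
+    if (!m_backendStorage.write(entry_with_newline.data(),
+                                m_backendStorage.size(),
+                                entry_with_newline.size()))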
     {
         spdlog::error(
             "Persistent log storage write failed. Entry={}, size={}\n",
             entry,
             entry.size());
         return false;
     }
     m_inMemoryLog.emplace_back(std::move(entry));
     return true;
 }
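A self-contained model of the framing, using a hypothetical NewlineFramedLog with an in-memory string standing in for the backend file. append() writes each entry plus '\n', and load() splits the contents the way the constructor does; entries that themselves contain '\n' are rejected because they would not round-trip.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the newline-framed backend file.
class NewlineFramedLog
{
  public:
    auto append(const std::string &entry) -> bool
    {
        if (entry.find('\n') != std::string::npos)
        {
            return false; // would be split into two entries on reload
        }
        m_file += entry;
        m_file += '\n';
        return true;
    }

    [[nodiscard]] auto load() const -> std::vector<std::string>
    {
        std::vector<std::string> entries;
        std::istringstream stream(m_file);
        for (std::string line; std::getline(stream, line);)
        {
            entries.push_back(std::move(line));
        }
        return entries;
    }

  private:
    std::string m_file;
};
```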

294-296: Fix offset parameter in write operation.

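The current implementation passes m_backendStorage.size() as the write offset. Whether that is correct depends on the backend: if it writes with pwrite() on a file opened with O_APPEND, Linux ignores the offset and always appends, so the value does not matter; otherwise the offset must equal the current end of the file. Verify which case applies before changing it, and reconcile this with the append() fix above, which touches the same lines.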

-    if (!m_backendStorage.write(static_cast<const char *>(entry.data()),
-                                m_backendStorage.size(),
-                                entry.size()))
+    if (!m_backendStorage.write(static_cast<const char *>(entry.data()),
+                                0, // Use 0 for offset since append_only_file will handle appending
+                                entry.size()))

331-334: Inconsistency between size() and read() methods.

The size() method returns m_backendStorage.size() but read() uses m_inMemoryLog.size(). This inconsistency could lead to subtle bugs.

 [[nodiscard]] auto size() const -> std::size_t
 {
-    return m_backendStorage.size();
+    return m_inMemoryLog.size();
 }

254-268: Optimize file reading in constructor and add error handling.

The persistent_log_storage_t constructor reads the entire file into one string before splitting it, so loading needs memory for the raw contents and the parsed entries at the same time. Reading in chunks avoids the large temporary, but a line that straddles a chunk boundary must be carried over into the next chunk rather than emitted as two entries. There is also no error handling if the read fails.

 explicit persistent_log_storage_t(TBackendStorage &&backendStorage)
     : m_backendStorage(std::move(backendStorage))
 {
-    const std::string raw =
-        m_backendStorage.read(0, m_backendStorage.size());
-    std::istringstream stream(raw);
-    for (std::string line; std::getline(stream, line);)
-    {
-        if (absl::StripAsciiWhitespace(line).empty())
-        {
-            continue;
-        }
-        m_inMemoryLog.emplace_back(std::move(line));
+    const std::size_t total_size = m_backendStorage.size();
+    constexpr std::size_t chunk_size = 4096;
+    std::size_t offset = 0;
+    std::string pending; // bytes of a line not yet terminated by '\n'
+
+    const auto emit = [this](std::string_view line) {
+        if (!absl::StripAsciiWhitespace(line).empty()) {
+            m_inMemoryLog.emplace_back(line);
+        }
+    };
+
+    try {
+        while (offset < total_size) {
+            const std::size_t to_read = std::min(chunk_size, total_size - offset);
+            const std::string chunk = m_backendStorage.read(offset, to_read);
+            if (chunk.empty()) {
+                spdlog::warn("Read returned empty chunk at offset {}", offset);
+                break;
+            }
+            offset += chunk.size();
+            pending += chunk;
+
+            // Emit every complete line; keep the unterminated tail for the next chunk
+            std::size_t start = 0;
+            for (std::size_t nl = pending.find('\n'); nl != std::string::npos;
+                 nl = pending.find('\n', start)) {
+                emit(std::string_view(pending).substr(start, nl - start));
+                start = nl + 1;
+            }
+            pending.erase(0, start);
+        }
+        emit(pending); // the last line may lack a trailing '\n'
+    } catch (const std::exception &e) {
+        spdlog::error("Error loading log entries: {}", e.what());
     }
 }
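A standalone version of the chunked loader, assuming a hypothetical readChunk(offset, size) callable in place of m_backendStorage.read() and treating whitespace-only lines as blank. The test uses a 4-byte chunk size so that lines straddle chunk boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Splits data delivered in fixed-size chunks into lines, carrying any
// unterminated tail over to the next chunk. Blank lines are skipped.
auto loadLines(std::size_t totalSize, std::size_t chunkSize,
               const std::function<std::string(std::size_t, std::size_t)> &readChunk)
    -> std::vector<std::string>
{
    std::vector<std::string> lines;
    auto emit = [&lines](std::string_view line)
    {
        if (line.find_first_not_of(" \t\r\v\f") != std::string_view::npos)
        {
            lines.emplace_back(line);
        }
    };

    std::string pending;
    std::size_t offset = 0;
    while (offset < totalSize)
    {
        const std::string chunk = readChunk(offset, std::min(chunkSize, totalSize - offset));
        if (chunk.empty())
        {
            break; // short read: stop instead of looping forever
        }
        offset += chunk.size();
        pending += chunk;

        std::size_t start = 0;
        for (std::size_t nl = pending.find('\n'); nl != std::string::npos;
             nl = pending.find('\n', start))
        {
            emit(std::string_view(pending).substr(start, nl - start));
            start = nl + 1;
        }
        pending.erase(0, start); // keep only the incomplete last line
    }
    emit(pending); // final line may lack a trailing '\n'
    return lines;
}
```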
lib/wal/wal.h (2)

78-88: Double-check copy/move semantics in a concurrent environment.

Allowing both copy and move semantics for WAL could lead to resource duplication or unexpected log states if concurrency is involved. Consider marking copy operations as deleted or making them explicit if single ownership is desired.
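If single ownership is the intended design, a hypothetical Wal class shows the shape: copy operations are deleted, moves are defaulted and noexcept, and static_assert documents both properties. Whether the real WAL should be single-owner is a design decision for the author.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical WAL wrapper with single ownership of its log.
class Wal
{
  public:
    Wal() = default;
    Wal(const Wal &) = delete;                       // no two owners of one log
    auto operator=(const Wal &) -> Wal & = delete;
    Wal(Wal &&) noexcept = default;                  // ownership can be transferred
    auto operator=(Wal &&) noexcept -> Wal & = default;
    ~Wal() = default;

    void add(std::string entry) { m_entries.push_back(std::move(entry)); }
    [[nodiscard]] auto size() const -> std::size_t { return m_entries.size(); }

  private:
    std::vector<std::string> m_entries;
};

static_assert(!std::is_copy_constructible_v<Wal>);
static_assert(std::is_nothrow_move_constructible_v<Wal>);
```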


111-112: Mismatch between documentation and implementation in reset().

The docstring says a std::runtime_error is thrown if reopening the log fails, but the implementation just returns a bool. Ensure the implementation either throws on failure or update the documentation to reflect the actual behavior.

lib/raft/raft.cpp (3)

272-273: Implement state machine updates.

There are TODO comments indicating that the state machine updates are missing. This is crucial for maintaining consistency in a Raft implementation, as committed log entries need to be applied to the state machine.

 while (m_lastApplied < m_commitIndex)
 {
     ++m_lastApplied;
-    // TODO(lnikon): Update the state machine!
+    // Apply the log entry to the state machine
+    if (m_lastApplied > 0 && m_lastApplied <= m_log.size()) {
+        const auto& entry = m_log[m_lastApplied - 1];
+        applyToStateMachine(entry);
+    }
 }

You would need to add a method to apply log entries to the state machine:

void consensus_module_t::applyToStateMachine(const LogEntry& entry) {
    // Update key-value store or other state based on the log entry
    spdlog::info("Applying log entry: key={}, value={}", entry.key(), entry.value());
    // Actual state machine update would go here
}
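A self-contained sketch of the apply loop, assuming 1-based indices and a std::map as a hypothetical key-value state machine; the Entry struct stands in for the protobuf LogEntry. The bound on log.size() keeps a commitIndex that runs ahead of the local log from indexing past the end.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical log entry and key-value state machine.
struct Entry
{
    std::string key;
    std::string value;
};

struct KvStateMachine
{
    std::map<std::string, std::string> data;
    void apply(const Entry &entry) { data[entry.key] = entry.value; }
};

// Applies entries (lastApplied, commitIndex] in order; entry i lives at log[i - 1].
// Returns the new lastApplied.
auto applyCommitted(const std::vector<Entry> &log, uint32_t lastApplied, uint32_t commitIndex,
                    KvStateMachine &stateMachine) -> uint32_t
{
    while (lastApplied < commitIndex && lastApplied < log.size())
    {
        ++lastApplied;
        stateMachine.apply(log[lastApplied - 1]);
    }
    return lastApplied;
}
```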

571-627: Avoid spawning a new thread for every append.

Each sendAppendEntriesRPC operation potentially spawns fresh threads, which can severely degrade performance under high load. Consider using a thread pool or a limited worker model to manage concurrency more efficiently.

Consider implementing a thread pool:

#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Fixed-size pool: workers pull tasks from a shared queue until the pool is destroyed.
class ThreadPool {
private:
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> tasks;
    std::mutex queue_mutex;
    std::condition_variable condition;
    bool stop{false}; // guarded by queue_mutex

public:
    explicit ThreadPool(std::size_t num_threads) {
        for (std::size_t i = 0; i < num_threads; ++i) {
            workers.emplace_back([this] {
                while (true) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(queue_mutex);
                        condition.wait(lock, [this] { return stop || !tasks.empty(); });
                        // On shutdown, drain the remaining tasks before exiting
                        if (stop && tasks.empty()) return;
                        task = std::move(tasks.front());
                        tasks.pop();
                    }
                    // Run outside the lock; an exception escaping here terminates the program
                    task();
                }
            });
        }
    }

    template<class F>
    void enqueue(F&& f) {
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            tasks.emplace(std::forward<F>(f));
        }
        condition.notify_one();
    }

    ~ThreadPool() {
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            stop = true;
        }
        condition.notify_all();
        for (std::thread &worker : workers) {
            worker.join();
        }
    }
};

Then update the sendAppendEntriesRPC method to use this thread pool instead of spawning new threads for each operation.


893-963: Add fsync for log reliability.

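While the log is flushed with flush(), that only hands the data to the operating system; it does not force it onto stable storage. Raft requires persistent state to reach stable storage before a node responds to RPCs, so an explicit fsync (or an equivalent call) is needed for the log to survive crashes. Note that fileno() and _fileno() take a FILE*, so the diff below only compiles if fsa is a C stdio stream. A std::fstream does not expose its file descriptor; in that case the file must be written through a POSIX descriptor instead.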

 fsa.flush();
+
+// Ensure data is durably stored on disk
+#ifdef _WIN32
+    _commit(_fileno(fsa));
+#else
+    if (fsync(fileno(fsa)) != 0) {
+        spdlog::error("Node={} fsync failed for file {}: {}", 
+                      m_config.m_id, 
+                      path.c_str(),
+                      strerror(errno));
+        return false;
+    }
+#endif

You would need to add the appropriate includes:

#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#include <errno.h>
#include <string.h>
#endif
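A POSIX sketch of a durable write, assuming the file can be written through a descriptor instead of a C++ stream; writeDurably() is a hypothetical helper. It loops over partial writes, calls fsync() before close(), and returns an error message instead of logging so it stays self-contained.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <stdlib.h>
#include <string>
#include <string_view>
#include <unistd.h>

// Writes `contents` to `path` and forces it to stable storage before returning.
// Returns an empty string on success, else an error message.
auto writeDurably(const std::string &path, std::string_view contents) -> std::string
{
    const int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
    {
        return std::string("open: ") + std::strerror(errno);
    }

    std::size_t written = 0;
    while (written < contents.size())
    {
        const ssize_t n = ::write(fd, contents.data() + written, contents.size() - written);
        if (n < 0)
        {
            if (errno == EINTR)
            {
                continue; // interrupted by a signal before writing anything
            }
            const std::string error = std::string("write: ") + std::strerror(errno);
            ::close(fd);
            return error;
        }
        written += static_cast<std::size_t>(n);
    }

    if (::fsync(fd) != 0) // flush file data and metadata to the device
    {
        const std::string error = std::string("fsync: ") + std::strerror(errno);
        ::close(fd);
        return error;
    }
    return ::close(fd) == 0 ? std::string{} : std::string("close: ") + std::strerror(errno);
}
```

For crash-safe replacement of a whole file, the usual pattern is to write a temporary file this way, rename() it over the old one, and fsync() the parent directory.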
🧹 Nitpick comments (26)
.gitignore (1)

34-48: Suggestion: Expand .gitignore to cover OS, IDE, CMake & Conan artifacts
The current minimal ignore list omits several common files generated by the new CMake-based build, Conan, and developer IDEs—e.g., VSCode settings, macOS .DS_Store, CMake cache/folders, and Conan info files. Consider adding these patterns to prevent accidental commits of generated artifacts.

Apply this diff to extend the ignore rules:

@@ -48,0 +49,21 @@
+# Editor directories
+.vscode/
+
+# macOS & Windows system files
+.DS_Store
+Thumbs.db
+
+# CMake build artifacts
+CMakeCache.txt
+CMakeFiles/
+cmake_install.cmake
+Makefile
+
+# Conan generated files
+conanbuildinfo.cmake
+conanbuildinfo.txt
+conaninfo.txt
+
+# Coverage reports
+*.gcno
+*.gcda
lib/posix_wrapper/posix_wrapper_test.cpp (2)

7-18: Comments are helpful but could be more consistent

The comment provides useful documentation on the enum-to-flag mapping, which helps developers understand the test context. Consider structuring it as a more formal comment block with consistent formatting.

-// Mapping between enum and POSIX open flags
-// kReadOnly = O_RDONLY,
-// kWriteOnly = O_WRONLY,
-// kReadWrite = O_RDWR,
-// kAppend = O_APPEND,
-// kCreate = O_CREAT,
-// kTruncate = O_TRUNC,
-// kExclusive = O_EXCL,
-// kNonBlock = O_NONBLOCK,
-// kSync = O_SYNC,
-// kDirect = O_DIRECT,
+/**
+ * Mapping between enum and POSIX open flags:
+ * - kReadOnly  → O_RDONLY
+ * - kWriteOnly → O_WRONLY
+ * - kReadWrite → O_RDWR
+ * - kAppend    → O_APPEND
+ * - kCreate    → O_CREAT
+ * - kTruncate  → O_TRUNC
+ * - kExclusive → O_EXCL
+ * - kNonBlock  → O_NONBLOCK
+ * - kSync      → O_SYNC
+ * - kDirect    → O_DIRECT
+ */

19-31: Test implementation is comprehensive and well-structured

The test case properly verifies that each enum value is correctly mapped to its corresponding POSIX flag. All important flag types are covered, providing thorough validation of the to_native function.

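Consider adding tests for combined flags to ensure they work correctly when used together, which is common in real-world scenarios. The test below assumes open_flag_k provides a bitwise operator|; an enum class has none by default.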

TEST(PosixWrapperTest, CombinedOpenFlagsToNative)
{
  // Test common combinations
  EXPECT_EQ(pw::to_native(pw::open_flag_k::kCreate | pw::open_flag_k::kReadWrite), 
            O_CREAT | O_RDWR);
  EXPECT_EQ(pw::to_native(pw::open_flag_k::kCreate | pw::open_flag_k::kTruncate | pw::open_flag_k::kWriteOnly), 
            O_CREAT | O_TRUNC | O_WRONLY);
  // More combinations as needed
}
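A sketch of what the combined-flag test relies on, using a hypothetical open_flag_k whose enumerators hold the native POSIX values directly. If the real to_native() maps enumerators through a switch instead, combined values will need a bit-by-bit translation rather than a plain cast.

```cpp
#include <cassert>
#include <fcntl.h>
#include <type_traits>

// Hypothetical flag enum mirroring pw::open_flag_k; each enumerator stores
// the native value, so OR-ing enumerators and converting back is lossless.
enum class open_flag_k : int
{
    kReadOnly = O_RDONLY,
    kWriteOnly = O_WRONLY,
    kReadWrite = O_RDWR,
    kAppend = O_APPEND,
    kCreate = O_CREAT,
    kTruncate = O_TRUNC,
};

// An enum class has no built-in operator|, so combined flags need one.
constexpr auto operator|(open_flag_k lhs, open_flag_k rhs) -> open_flag_k
{
    using underlying = std::underlying_type_t<open_flag_k>;
    return static_cast<open_flag_k>(static_cast<underlying>(lhs) | static_cast<underlying>(rhs));
}

constexpr auto to_native(open_flag_k flags) -> int
{
    return static_cast<int>(flags);
}
```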
lib/fs/common.h (1)

11-21: Consider using consistent naming conventions.

The constants use kPascalCase format while the error types use snake_case_t/k format. Consider adopting a consistent naming convention throughout the codebase for better readability and maintainability.

-static constexpr std::size_t kBufferSize{4096ULL};
-static constexpr int kDefaultFilePermissions = 0644;
-static constexpr int kIOUringQueueEntries = 128;
+static constexpr std::size_t buffer_size_k{4096ULL};
+static constexpr int default_file_permissions_k = 0644;
+static constexpr int io_uring_queue_entries_k = 128;

Or alternatively:

-enum class file_error_code_k : int8_t
-struct file_error_t
+enum class FileErrorCode : int8_t
+struct FileError
lib/raft/CMakeLists.txt (1)

2-2: Remove redundant project() call in subdirectory.

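Calling project() in a subdirectory is allowed, but it redefines PROJECT_NAME, PROJECT_SOURCE_DIR and PROJECT_BINARY_DIR for that directory and its children, which can surprise code that expects the top-level values. Unless the subdirectory is meant to be built as a standalone project, the call can be dropped.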

-project(frankie)
lib/fs/common.cpp (4)

8-11: Remove redundant namespace qualification

If these definitions sit inside a namespace fs { ... } block, the fs:: prefix on file_error_t is redundant. If they are written at global scope, the prefix is required and should stay.

-[[nodiscard]] auto fs::file_error_t::has_error() const noexcept -> bool
+[[nodiscard]] auto file_error_t::has_error() const noexcept -> bool
{
    return code != file_error_code_k::none;
}

13-16: Remove redundant namespace qualification in from_errno method

Similar to the previous comment, fs:: is redundant here only if the definition is already inside namespace fs.

-auto fs::file_error_t::from_errno(file_error_code_k code, int err, const char *context) noexcept -> file_error_t
+auto file_error_t::from_errno(file_error_code_k code, int err, const char *context) noexcept -> file_error_t
{
    return {.code = code, .system_errno = err, .message = std::string(context) + ": " + strerror(err)};
}

17-20: Remove redundant namespace qualification in success method

The fs:: prefix is likewise redundant only when the definition is inside namespace fs.

-auto fs::file_error_t::success() noexcept -> file_error_t
+auto file_error_t::success() noexcept -> file_error_t
{
    return file_error_t{};
}

15-15: Consider using string formatting for more readable error messages

The string concatenation for error messages could be improved by using a string formatting library (like fmt) for better readability.

-    return {.code = code, .system_errno = err, .message = std::string(context) + ": " + strerror(err)};
+    // Using fmt library (requires header inclusion)
+    return {.code = code, .system_errno = err, .message = fmt::format("{}: {}", context, strerror(err))};
lib/db/db_config.cpp (1)

19-37: Consider a lookup table for the from_string function

C++ cannot switch on strings, so a switch statement is not available here, and the separate if statements are already correct.

Rewriting them as an if/else-if chain is a purely stylistic change:

auto db::from_string(const std::string_view mode) noexcept -> db_mode_t
{
-    if (mode == db::EMBEDDED_STR_VIEW)
-    {
-        return db_mode_t::kEmbedded;
-    }
-
-    if (mode == db::STANDALONE_STR_VIEW)
-    {
-        return db_mode_t::kStandalone;
-    }
-
-    if (mode == db::REPLICATED_STR_VIEW)
-    {
-        return db_mode_t::kReplicated;
-    }
+    if (mode == db::EMBEDDED_STR_VIEW) {
+        return db_mode_t::kEmbedded;
+    } else if (mode == db::STANDALONE_STR_VIEW) {
+        return db_mode_t::kStandalone;
+    } else if (mode == db::REPLICATED_STR_VIEW) {
+        return db_mode_t::kReplicated;
+    }

    PANIC("provided database mode string is not supported", mode);
}

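Or, if more modes are expected, a static map keeps the string/enum pairs in one place (with only three entries, the linear comparisons are likely just as fast as hashing):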

auto db::from_string(const std::string_view mode) noexcept -> db_mode_t
{
    static const std::unordered_map<std::string_view, db_mode_t> mode_map = {
        {EMBEDDED_STR_VIEW, db_mode_t::kEmbedded},
        {STANDALONE_STR_VIEW, db_mode_t::kStandalone},
        {REPLICATED_STR_VIEW, db_mode_t::kReplicated}
    };
    
    auto it = mode_map.find(mode);
    if (it != mode_map.end()) {
        return it->second;
    }
    
    PANIC("provided database mode string is not supported", mode);
}
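A compile-time alternative to the unordered_map, assuming a db_mode_t enum like the one in db_config.h; modeFromString() is hypothetical and returns std::nullopt instead of panicking so the caller can decide. The table is a constexpr std::array, so lookups also work in constant expressions.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

enum class db_mode_t { kEmbedded, kStandalone, kReplicated };

// All string/enum pairs live in one constexpr table.
inline constexpr std::array<std::pair<std::string_view, db_mode_t>, 3> kModeTable{{
    {"embedded", db_mode_t::kEmbedded},
    {"standalone", db_mode_t::kStandalone},
    {"replicated", db_mode_t::kReplicated},
}};

constexpr auto modeFromString(std::string_view mode) -> std::optional<db_mode_t>
{
    for (const auto &entry : kModeTable)
    {
        if (entry.first == mode)
        {
            return entry.second;
        }
    }
    return std::nullopt; // the caller decides whether to PANIC
}
```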
lib/db/db_config.h (1)

13-15: Consider using a string literal type for string constants

constexpr already implies const for variables, so the extra const is redundant. Brace initialization, = initialization, and the sv literal all construct the string_view at compile time; choosing between them is a matter of style.

-static constexpr const std::string_view EMBEDDED_STR_VIEW{"embedded"};
-static constexpr const std::string_view STANDALONE_STR_VIEW{"standalone"};
-static constexpr const std::string_view REPLICATED_STR_VIEW{"replicated"};
+static constexpr std::string_view EMBEDDED_STR_VIEW = "embedded";
+static constexpr std::string_view STANDALONE_STR_VIEW = "standalone";
+static constexpr std::string_view REPLICATED_STR_VIEW = "replicated";

Or with string_view literals:

using namespace std::string_view_literals;
static constexpr auto EMBEDDED_STR_VIEW = "embedded"sv;
static constexpr auto STANDALONE_STR_VIEW = "standalone"sv;
static constexpr auto REPLICATED_STR_VIEW = "replicated"sv;
lib/fs/random_access_file_test.cpp (2)

125-127: Consider testing with larger and more diverse offsets.

The current test uses a limited set of offsets (0, 7, 4096, 8192). Consider adding tests with larger offsets and perhaps some that aren't aligned to typical page boundaries to ensure robustness across a wider range of scenarios.


6-15: Consider adding tests for concurrent operations and error handling.

While the test suite is comprehensive for basic functionality, consider adding:

  1. Tests for concurrent I/O operations, which would be relevant for an io_uring-based implementation
  2. Tests that simulate io_uring operation failures
  3. Tests with direct I/O enabled, which has stricter alignment requirements

These would help ensure the implementation is robust under various conditions.

lib/raft/persistence.h (1)

21-25: Consider using std::expected for error handling.

The methods currently return bool, which provides limited information about failure reasons. Given that other parts of the codebase use std::expected for error handling (e.g., in the filesystem code), consider adopting the same pattern here:

[[nodiscard]] auto initialize() -> std::expected<void, persistence_error_t>;
[[nodiscard]] auto update(...) -> std::expected<void, persistence_error_t>;
// etc.

This would provide more detailed error information.

lib/raft/replicated_log.h (2)

25-30: Incomplete state cleanup in move constructor.

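The move constructor explicitly resets the consensus module pointer in the source object but not the log variant. The reset is redundant, because moving a std::shared_ptr already leaves the source empty, while the moved-from log variant is left in a valid but unspecified state. If the moved-from object may be used again, reset both members explicitly; otherwise the manual reset can be dropped (or the constructor defaulted).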

replicated_log_t(replicated_log_t &&other) noexcept
    : m_log{std::move(other.m_log)},
      m_pConsensusModule{std::move(other.m_pConsensusModule)}
{
    other.m_pConsensusModule = nullptr;
+    // Also consider resetting or clearing the moved-from log variant
+    // depending on its implementation details
}
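A quick check of the claim that a moved-from std::shared_ptr is empty, using a hypothetical Holder with a defaulted move constructor.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical class whose defaulted move constructor moves a shared_ptr member;
// no manual reset of the source is needed.
struct Holder
{
    std::shared_ptr<int> module;
    Holder() = default;
    explicit Holder(std::shared_ptr<int> ptr) : module(std::move(ptr)) {}
    Holder(Holder &&) noexcept = default;
};
```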

94-94: Redundant initialization of member variable.

The member variable m_pConsensusModule is initialized to nullptr in line 94, but this initialization is redundant given that the constructor initializes it with the passed parameter. Consider removing the explicit initialization:

private:
    wal::log::log_variant_t                   m_log;
-    std::shared_ptr<raft::consensus_module_t> m_pConsensusModule{nullptr};
+    std::shared_ptr<raft::consensus_module_t> m_pConsensusModule;
lib/fs/append_only_file.cpp (1)

85-104: Update documentation to match implementation.

The class documentation mentions io_uring initialization, but the implementation delegates all I/O to random_access_file_t and doesn't directly interact with io_uring. Update the documentation to accurately reflect the current implementation.

lib/fs/random_access_file.h (1)

57-58: Prefer passing path by const reference to avoid an unnecessary copy

fs::path_t is typically a relatively heavy object (internally holds a std::string).
Because build() never modifies the argument, the signature can be changed to const fs::path_t& to save one move/copy and a heap allocation in hot paths.

-  auto build(fs::path_t path, posix_wrapper::open_flag_k openFlags)
+  auto build(const fs::path_t& path, posix_wrapper::open_flag_k openFlags)
lib/wal/log/in_memory_log_storage.h (2)

19-22: Parameter name shadows the data member – consider renaming to avoid confusion

explicit in_memory_log_storage_t(std::vector<std::string> m_log)
    : m_log(std::move(m_log)) {}

m_log (parameter) hides m_log (member) so the reader has to parse twice to realise the left‑hand side is the member and the right‑hand side is the parameter.
A small rename improves readability and prevents accidental self‑assignment errors.

-explicit in_memory_log_storage_t(std::vector<std::string> m_log)
-    : m_log(std::move(m_log))
+explicit in_memory_log_storage_t(std::vector<std::string> log)
+    : m_log(std::move(log))

12-13: Replace resize() idea with reserve() to avoid length side‑effects

The TODO suggests pre‑allocating the vector in the ctor.
resize() will create size() empty strings and change size().
reserve() keeps size()==0 but prepares capacity, which is what you want for an append‑only log.

m_log.reserve(default_capacity);

A reasonable default could be derived from expected entry count or configured via the builder.
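The difference between the two calls for an append‑only log is shown in this short sketch. The kDefaultCapacity value is a hypothetical placeholder for whatever the builder or configuration would supply.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical default; the real value would come from configuration or the builder.
constexpr std::size_t kDefaultCapacity = 1024;

// reserve(): size() stays 0 and only capacity grows, so the first
// push_back lands at index 0, as an append-only log expects.
auto make_log_with_reserve() -> std::vector<std::string>
{
    std::vector<std::string> log;
    log.reserve(kDefaultCapacity);
    return log;
}

// resize(): inserts kDefaultCapacity empty strings, so the first real entry
// would land at index kDefaultCapacity.
auto make_log_with_resize() -> std::vector<std::string>
{
    std::vector<std::string> log;
    log.resize(kDefaultCapacity);
    return log;
}
```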

src/main.cpp (2)

439-447: Incorrect error message for missing peers field

The exception message still mentions "id" instead of "peers", which will mislead operators.

-        throw std::runtime_error("\"id\" is not specified in the config");
+        throw std::runtime_error("\"peers\" is not specified in the config");

642-644: Minor typo in log message

positve → positive.

- spdlog::error("ID of the node should be positve integer");
+ spdlog::error("ID of the node should be positive integer");
lib/fs/append_only_file.h (1)

15-16: Avoid using namespace in public headers

using namespace random_access_file; leaks the entire sub‑namespace into every translation unit that includes this header, increasing the risk of symbol collisions and ODR issues. Prefer a qualified alias or direct qualification.

-using namespace random_access_file;
+// avoid namespace pollution
+namespace raf = random_access_file;
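Here is a minimal sketch of the alias approach. It assumes the header's declarations live in a nested fs::random_access_file namespace. The default_block_size() function and its value are hypothetical. The alias adds a single name (fs::raf) instead of pulling every name from the sub‑namespace into each includer's scope.

```cpp
#include <cassert>

// Hypothetical stand-in for the contents of the random_access_file header.
namespace fs::random_access_file
{
inline auto default_block_size() -> int { return 4096; }
} // namespace fs::random_access_file

namespace fs
{
// Unlike `using namespace random_access_file;`, this introduces only the
// name fs::raf; the sub-namespace's members stay qualified.
namespace raf = random_access_file;

inline auto block_size() -> int { return raf::default_block_size(); }
} // namespace fs
```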
lib/wal/wal.h (1)

223-223: Replace placeholder in debug log.

The debug log contains a placeholder "FILL_ME" rather than actual content about the WAL entry.

-    spdlog::debug("Added new WAL entry {}", "FILL_ME");
+    spdlog::debug("Added new WAL entry: op={}, key={}, value={}",
+                  op_view, rec.kv.m_key.m_key, rec.kv.m_value.m_value);
lib/raft/raft.cpp (2)

126-140: Pass nodeConfig by const reference for improved performance.

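The nodeConfig parameter is passed by value and only moved into a member variable, which Cppcheck reports as passedByValue. Note that if the parameter becomes const node_config_t&, the std::move(nodeConfig) in the member initializer binds to the copy constructor, so the member is always copied. Either drop the std::move together with this change, or keep pass‑by‑value, which costs callers passing an rvalue no copy at all.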

consensus_module_t::consensus_module_t(
-   node_config_t                        nodeConfig,
+   const node_config_t&                 nodeConfig,
    std::vector<raft_node_grpc_client_t> replicas) noexcept
    : m_config{std::move(nodeConfig)},
🧰 Tools
🪛 Cppcheck (2.10-2)

[performance] 126-126: Function parameter 'nodeConfig' should be passed by const reference.

(passedByValue)
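To make the trade‑off concrete, the sketch below uses a hypothetical counted_config type as a stand‑in for node_config_t. It counts copies under both signatures when the caller passes a temporary.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for node_config_t that counts copy and move constructions.
struct counted_config
{
    static inline int copies = 0;
    static inline int moves  = 0;

    std::string address{"127.0.0.1:8080"};

    counted_config() = default;
    counted_config(const counted_config &other) : address{other.address} { ++copies; }
    counted_config(counted_config &&other) noexcept : address{std::move(other.address)} { ++moves; }
};

// Sink by value: a temporary argument initialises the parameter directly
// (guaranteed elision since C++17) and is then moved into the member.
struct by_value_owner
{
    explicit by_value_owner(counted_config cfg) : m_config{std::move(cfg)} {}
    counted_config m_config;
};

// Sink by const reference: std::move(cfg) yields const counted_config&&,
// which cannot bind to the move constructor, so the member is copied.
struct by_cref_owner
{
    explicit by_cref_owner(const counted_config &cfg) : m_config{std::move(cfg)} {}
    counted_config m_config;
};
```

For lvalue arguments both signatures cost one copy. The by‑value form only wins when callers pass temporaries or std::move their config.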


635-635: Remove debug logging placeholder.

The string "VAGAG" appears to be a placeholder or leftover debugging text that should be replaced with a more descriptive message or removed.

-    spdlog::debug("[VAGAG] response.success");
+    spdlog::debug("Successfully received AppendEntries response from peer={}", client.id());
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8f8194 and 6c344eb.

📒 Files selected for processing (39)
  • .clang-format (1 hunks)
  • .gitignore (1 hunks)
  • CMakeLists.txt (3 hunks)
  • assets/database_config_schema.json (1 hunks)
  • assets/tkvpp_config_1.json (2 hunks)
  • assets/tkvpp_config_2.json (1 hunks)
  • assets/tkvpp_config_3.json (1 hunks)
  • conanfile.txt (1 hunks)
  • lib/CMakeLists.txt (2 hunks)
  • lib/config/CMakeLists.txt (1 hunks)
  • lib/db/CMakeLists.txt (2 hunks)
  • lib/db/db_config.cpp (1 hunks)
  • lib/db/db_config.h (1 hunks)
  • lib/fs/CMakeLists.txt (1 hunks)
  • lib/fs/append_only_file.cpp (1 hunks)
  • lib/fs/append_only_file.h (1 hunks)
  • lib/fs/common.cpp (1 hunks)
  • lib/fs/common.h (1 hunks)
  • lib/fs/random_access_file.cpp (1 hunks)
  • lib/fs/random_access_file.h (1 hunks)
  • lib/fs/random_access_file_test.cpp (1 hunks)
  • lib/posix_wrapper/CMakeLists.txt (1 hunks)
  • lib/posix_wrapper/open_flag.h (1 hunks)
  • lib/posix_wrapper/posix_wrapper_test.cpp (1 hunks)
  • lib/raft/CMakeLists.txt (1 hunks)
  • lib/raft/persistence.h (1 hunks)
  • lib/raft/raft.cpp (1 hunks)
  • lib/raft/raft.h (1 hunks)
  • lib/raft/replicated_log.h (1 hunks)
  • lib/structures/lsmtree/CMakeLists.txt (1 hunks)
  • lib/wal/CMakeLists.txt (1 hunks)
  • lib/wal/common.h (1 hunks)
  • lib/wal/log/common.h (1 hunks)
  • lib/wal/log/concepts.h (1 hunks)
  • lib/wal/log/in_memory_log_storage.h (1 hunks)
  • lib/wal/log/log.h (1 hunks)
  • lib/wal/log/persistent_log_storage.h (1 hunks)
  • lib/wal/wal.h (1 hunks)
  • src/main.cpp (13 hunks)
✅ Files skipped from review due to trivial changes (2)
  • lib/posix_wrapper/CMakeLists.txt
  • lib/posix_wrapper/open_flag.h
🚧 Files skipped from review as they are similar to previous changes (12)
  • .clang-format
  • lib/config/CMakeLists.txt
  • lib/CMakeLists.txt
  • lib/db/CMakeLists.txt
  • CMakeLists.txt
  • assets/tkvpp_config_1.json
  • assets/tkvpp_config_3.json
  • lib/wal/log/concepts.h
  • lib/wal/CMakeLists.txt
  • lib/fs/CMakeLists.txt
  • lib/wal/common.h
  • assets/tkvpp_config_2.json
🧰 Additional context used
🧬 Code Graph Analysis (7)
lib/posix_wrapper/posix_wrapper_test.cpp (1)
lib/posix_wrapper/open_flag.h (1)
  • to_native (57-57)
lib/raft/persistence.h (4)
lib/fs/common.cpp (1)
  • nodiscard (8-11)
lib/raft/replicated_log.h (6)
  • nodiscard (52-55)
  • nodiscard (57-75)
  • nodiscard (77-80)
  • nodiscard (82-85)
  • nodiscard (87-90)
  • nodiscard (103-109)
lib/raft/raft.h (1)
  • commitIndex (166-167)
lib/raft/raft.cpp (2)
  • votedFor (451-455)
  • votedFor (451-451)
lib/raft/replicated_log.h (4)
lib/wal/log/log.h (9)
  • nodiscard (58-61)
  • nodiscard (63-68)
  • nodiscard (70-74)
  • nodiscard (76-79)
  • nodiscard (81-84)
  • nodiscard (141-166)
  • entry (58-58)
  • command (63-65)
  • index (70-71)
lib/wal/log/in_memory_log_storage.h (7)
  • nodiscard (48-52)
  • nodiscard (54-59)
  • nodiscard (61-68)
  • nodiscard (70-74)
  • entry (48-48)
  • command (55-55)
  • index (61-61)
lib/wal/log/persistent_log_storage.h (3)
  • entry (292-292)
  • command (309-309)
  • index (314-315)
lib/raft/raft.h (2)
  • logEntry (115-115)
  • index (153-153)
lib/fs/random_access_file.h (3)
lib/fs/random_access_file.cpp (5)
  • random_access_file_t (16-20)
  • random_access_file_t (22-35)
  • random_access_file_t (54-65)
  • size (120-133)
  • size (120-121)
lib/fs/common.cpp (1)
  • nodiscard (8-11)
lib/posix_wrapper/open_flag.h (5)
  • nodiscard (30-35)
  • nodiscard (37-42)
  • nodiscard (44-49)
  • nodiscard (51-55)
  • nodiscard (57-60)
lib/wal/log/in_memory_log_storage.h (2)
lib/wal/log/log.h (11)
  • other (46-51)
  • other (46-46)
  • nodiscard (58-61)
  • nodiscard (63-68)
  • nodiscard (70-74)
  • nodiscard (76-79)
  • nodiscard (81-84)
  • nodiscard (141-166)
  • entry (58-58)
  • command (63-65)
  • index (70-71)
lib/wal/log/persistent_log_storage.h (17)
  • other (130-138)
  • other (130-131)
  • other (279-285)
  • other (279-280)
  • nodiscard (43-47)
  • nodiscard (49-52)
  • nodiscard (54-57)
  • nodiscard (59-62)
  • nodiscard (84-92)
  • nodiscard (94-97)
  • nodiscard (150-156)
  • nodiscard (158-173)
  • nodiscard (175-178)
  • nodiscard (180-183)
  • entry (292-292)
  • command (309-309)
  • index (314-315)
lib/wal/log/log.h (3)
lib/wal/log/in_memory_log_storage.h (11)
  • other (29-40)
  • other (29-30)
  • nodiscard (48-52)
  • nodiscard (54-59)
  • nodiscard (61-68)
  • nodiscard (70-74)
  • nodiscard (76-79)
  • nodiscard (96-99)
  • entry (48-48)
  • command (55-55)
  • index (61-61)
lib/wal/wal.h (7)
  • other (86-86)
  • other (87-87)
  • other (193-201)
  • other (193-193)
  • other (204-212)
  • other (204-204)
  • is_same_v (316-349)
lib/wal/log/persistent_log_storage.h (17)
  • other (130-138)
  • other (130-131)
  • other (279-285)
  • other (279-280)
  • nodiscard (43-47)
  • nodiscard (49-52)
  • nodiscard (54-57)
  • nodiscard (59-62)
  • nodiscard (84-92)
  • nodiscard (94-97)
  • nodiscard (150-156)
  • nodiscard (158-173)
  • nodiscard (175-178)
  • nodiscard (180-183)
  • entry (292-292)
  • command (309-309)
  • index (314-315)
lib/raft/raft.h (3)
lib/raft/raft.cpp (4)
  • raft_node_grpc_client_t (57-64)
  • consensus_module_t (125-147)
  • votedFor (451-455)
  • votedFor (451-451)
lib/raft/replicated_log.h (7)
  • nodiscard (52-55)
  • nodiscard (57-75)
  • nodiscard (77-80)
  • nodiscard (82-85)
  • nodiscard (87-90)
  • nodiscard (103-109)
  • index (77-77)
lib/raft/persistence.h (1)
  • commitIndex (22-23)
🪛 Cppcheck (2.10-2)
lib/raft/raft.cpp

[performance] 126-126: Function parameter 'nodeConfig' should be passed by const reference.

(passedByValue)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-clang
  • GitHub Check: build-gcc
🔇 Additional comments (20)
lib/posix_wrapper/posix_wrapper_test.cpp (3)

1-4: Includes and header order look good

The includes are properly ordered with external dependencies (gtest) first, followed by project headers. This follows standard C++ best practices.


5-6: Good use of namespace alias

Using a namespace alias improves readability by shortening the code while still making it clear which namespace the symbols belong to.


1-32:

✅ Verification successful

Consider adding test for error handling

The current test set only verifies valid enum values. Depending on the implementation of to_native, it might be valuable to test behavior with invalid values (if the enum allows it) or handling of potentially undefined combinations.


🏁 Script executed:

#!/bin/bash
# Check if open_flag.h defines any handling for invalid values or combinations
fd open_flag.h --exec cat {} | grep -A 15 "to_native"

Length of output: 214


No validation in to_native; invalid‑value tests are redundant
to_native is a constexpr static_cast of the enum’s underlying type and contains no branching or error paths to exercise. Adding tests for “invalid” enum values would only verify that the cast returns the raw integer.

If you intend to support combining flags, consider:

  • Implementing bitwise operators on open_flag_k (e.g. operator|)
  • Adding a test such as
    EXPECT_EQ(
      pw::to_native(pw::open_flag_k::kReadOnly | pw::open_flag_k::kNonBlock),
      O_RDONLY | O_NONBLOCK
    );

Otherwise, the existing tests fully cover the behavior of to_native.
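If flag combination is wanted, one way to support it is sketched below. The enum is a hypothetical mirror of open_flag_k. kReadOnly and kNonBlock follow the review text, while kWriteOnly and kCreate are assumptions, and the real header may differ. Because the enum has a fixed underlying type, a combined value that matches no named enumerator is still a valid open_flag_k.

```cpp
#include <cassert>
#include <fcntl.h>
#include <type_traits>

namespace posix_wrapper_sketch
{
// Hypothetical mirror of posix_wrapper::open_flag_k.
enum class open_flag_k : int
{
    kReadOnly  = O_RDONLY,
    kWriteOnly = O_WRONLY,
    kNonBlock  = O_NONBLOCK,
    kCreate    = O_CREAT,
};

[[nodiscard]] constexpr auto to_native(open_flag_k flag) noexcept -> int
{
    return static_cast<std::underlying_type_t<open_flag_k>>(flag);
}

// OR the underlying values and cast back; valid for any bit pattern that
// fits the fixed underlying type.
[[nodiscard]] constexpr auto operator|(open_flag_k lhs, open_flag_k rhs) noexcept -> open_flag_k
{
    return static_cast<open_flag_k>(to_native(lhs) | to_native(rhs));
}
} // namespace posix_wrapper_sketch
```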

lib/fs/common.h (2)

1-51: Good foundation for file operations error handling.

This file provides a well-structured foundation for file operations with io_uring. The constants, error codes, and error handling structure are well thought out.


43-48:

❌ Incorrect review comment

Methods are declaration-only; ensure proper implementation.

These utility methods are declared but not implemented in this header. Ensure they're implemented in a corresponding .cpp file or as inline functions in the header, and that they handle all error cases robustly.


🏁 Script executed:

#!/bin/bash
# Search for the implementation of file_error_t methods
fd -e cpp -e h "file_error_t::" lib/fs/

Length of output: 39


I wasn’t able to find any definitions for these methods under lib/fs. Let’s search the entire repo for their implementations:

Please let me know what these commands return.


🏁 Script executed:

#!/bin/bash
# Search for method definitions across all source files
rg -n "file_error_t::" .
rg -n "has_error\s*\(" .
rg -n "success\s*\(" .
rg -n "from_errno\s*\(" .

Length of output: 1815


Implementations already provided for file_error_t methods

I located definitions for all three methods in lib/fs/common.cpp, so no additional implementations are needed:

  • fs::file_error_t::has_error() at lib/fs/common.cpp:8
  • fs::file_error_t::from_errno(...) at lib/fs/common.cpp:13
  • fs::file_error_t::success() at lib/fs/common.cpp:17

Likely an incorrect or invalid review comment.

lib/structures/lsmtree/CMakeLists.txt (1)

18-18: LGTM - Dependencies aligned with project-wide additions.

The addition of libassert::assert and magic_enum::magic_enum as public dependencies aligns with the project-wide integration of these libraries.

conanfile.txt (1)

14-17:

✅ Verification successful

LGTM - Dependencies align with new Raft implementation needs.

The new dependencies support different aspects of the Raft implementation:

  • gtest for unit testing the Raft protocol
  • prometheus-cpp for monitoring the distributed system
  • libassert and magic_enum for better assertions and enum handling

🏁 Script executed:

#!/bin/bash
# Check if the newly added dependencies are used in the project
echo "Checking for gtest usage:"
rg -l "gtest" --type cpp

echo -e "\nChecking for prometheus usage:"
rg -l "prometheus" --type cpp

echo -e "\nChecking for libassert usage:"
rg -l "assert\(" --type cpp

echo -e "\nChecking for magic_enum usage:"
rg -l "magic_enum" --type cpp

Length of output: 995


✅ Dependencies Verified and Approved

All four newly added dependencies are actively used in the codebase:

  • gtest/1.15.0: referenced in
    • lib/posix_wrapper/posix_wrapper_test.cpp
    • lib/fs/random_access_file_test.cpp
    • lib/raft/raft_test.cpp
  • prometheus-cpp/1.3.0: referenced in
    • examples/raft/main.cpp
    • examples/prometheus/main.cpp
  • libassert/2.1.4: assertion calls found in multiple headers/sources under lib/
  • magic_enum/0.9.7: used in lib/wal/wal.h

These dependencies align with the Raft implementation’s testing, monitoring, assertion, and enum‑reflection needs. No further action required.

lib/raft/CMakeLists.txt (1)

4-6: LGTM - Well-structured library definition.

The RaftLib target is well-defined with appropriate dependencies for a Raft consensus implementation, including Write-Ahead Logging (WAL) and Protocol Buffers objects.

lib/wal/log/common.h (1)

11-13: Well-designed variant type for storage abstraction

The variant approach for log storage is a good design choice. It allows for a unified interface to different storage implementations while maintaining type safety.
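Here is a minimal sketch of the idea. Two hypothetical backends stand in for in_memory_log_storage_t and persistent_log_storage_t. std::visit routes one call site to whichever alternative is active. Because the visitor is instantiated for every alternative, a backend missing append() or size() fails to compile instead of failing at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical storage backends.
struct in_memory_backend
{
    std::vector<std::string> entries;
    auto append(std::string entry) -> bool
    {
        entries.push_back(std::move(entry));
        return true;
    }
    [[nodiscard]] auto size() const -> std::size_t { return entries.size(); }
};

struct counting_backend
{
    std::size_t count{0};
    auto append(std::string /*entry*/) -> bool
    {
        ++count;
        return true;
    }
    [[nodiscard]] auto size() const -> std::size_t { return count; }
};

using storage_variant = std::variant<in_memory_backend, counting_backend>;

// One call site serves every alternative.
inline auto storage_append(storage_variant &storage, std::string entry) -> bool
{
    return std::visit([&entry](auto &backend) { return backend.append(std::move(entry)); }, storage);
}

inline auto storage_size(const storage_variant &storage) -> std::size_t
{
    return std::visit([](const auto &backend) { return backend.size(); }, storage);
}
```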

lib/db/db_config.cpp (1)

4-17: Good implementation of to_string function

The function implementation is correct and handles all enum cases with a proper default case that asserts on unexpected values.

lib/db/db_config.h (2)

17-22: Good enum definition with explicit values

The db_mode_t enum is well-defined with explicit values and a clear naming convention.


24-26: Good use of [[nodiscard]] and noexcept

The function declarations properly use [[nodiscard]] to ensure return values are not ignored and noexcept to indicate no exceptions will be thrown.
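A round‑trip sketch of the same pattern follows. kStandalone appears elsewhere in this review. kReplicated and the string spellings are assumptions. The switch has no default label, so GCC/Clang's -Wswitch (enabled by -Wall) flags any enumerator added later. from_string returns std::optional so that it can stay noexcept and still reject an unknown "mode" value from the config.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical mirror of db::db_mode_t.
enum class db_mode_sketch_t : int
{
    kStandalone = 0,
    kReplicated = 1,
};

[[nodiscard]] constexpr auto to_string(db_mode_sketch_t mode) noexcept -> std::string_view
{
    switch (mode)
    {
    case db_mode_sketch_t::kStandalone:
        return "standalone";
    case db_mode_sketch_t::kReplicated:
        return "replicated";
    }
    // Reachable only for values outside the named enumerators.
    return "unknown";
}

[[nodiscard]] constexpr auto from_string(std::string_view text) noexcept -> std::optional<db_mode_sketch_t>
{
    if (text == "standalone")
    {
        return db_mode_sketch_t::kStandalone;
    }
    if (text == "replicated")
    {
        return db_mode_sketch_t::kReplicated;
    }
    return std::nullopt;
}
```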

lib/fs/random_access_file_test.cpp (4)

1-5: Well-structured and comprehensive test suite.

The test suite is well-organized with good separation of concerns between helper utilities, fixtures, and test cases. The comment at the top also provides clear compilation instructions.


24-36: Good implementation of temporary file creation.

The make_temp_file_path() function properly creates a unique temporary file using mkstemp() and correctly closes the file descriptor after creation, which prevents resource leaks.
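For readers unfamiliar with the pattern, here is a sketch of such a helper. It is not the test file's exact code, and the file‑name prefix is an assumption. mkstemp() atomically creates a uniquely named file and rewrites the XXXXXX suffix in place. The descriptor is closed immediately, so only the path reaches the code under test.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <unistd.h>

// Returns the path of a freshly created, empty temporary file, or an empty
// string if creation failed.
auto make_temp_file_path() -> std::string
{
    std::string path = (std::filesystem::temp_directory_path() / "raf_test_XXXXXX").string();
    const int   fd   = ::mkstemp(path.data()); // replaces XXXXXX in place
    if (fd == -1)
    {
        return {};
    }
    ::close(fd); // avoid leaking the descriptor; the file itself remains
    return path;
}
```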


57-79: Well-designed test fixture.

The test fixture appropriately manages resources by creating a temporary file before each test and cleaning it up afterward. The explicit destruction of the file object in TearDown() before unlinking the file ensures proper resource cleanup.


195-214: Good negative test for moved-from objects.

Testing that operations on moved-from objects fail gracefully rather than causing undefined behavior is excellent practice.

lib/raft/persistence.h (1)

8-20: Clarify the implementation strategy for this class.

This class cannot be instantiated: its default constructor is deleted and no other constructor is declared. Additionally, the methods aren't virtual, suggesting this isn't meant to be a base class for derived implementations.

Consider either:

  1. Making this a pure abstract interface with virtual methods if different persistence implementations are intended
  2. Providing a constructor and implementing the methods if only one implementation is needed
  3. Making this a namespace with free functions if no state needs to be maintained

Without this clarification, it's unclear how to use this class in the Raft implementation.
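Option 1 might look like the sketch below. The method set mirrors persistence.h as written, and the in‑memory backend and its flush/restore semantics are hypothetical. For comparison, the Raft paper lists currentTerm, votedFor and log[] as the state every server must persist. commitIndex is volatile state there.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Pure abstract interface; concrete backends (file, in-memory, ...) derive from it.
class persistence_interface_t
{
  public:
    virtual ~persistence_interface_t() = default;

    [[nodiscard]] virtual auto initialize() -> bool = 0;
    [[nodiscard]] virtual auto update(std::optional<std::uint32_t> commitIndex,
                                      std::optional<std::uint32_t> votedFor) -> bool = 0;
    [[nodiscard]] virtual auto flush() -> bool = 0;
    [[nodiscard]] virtual auto restore() -> bool = 0;
};

// Hypothetical in-memory backend: flush() publishes pending values to a
// "durable" copy and restore() reloads them.
class in_memory_persistence_t final : public persistence_interface_t
{
  public:
    [[nodiscard]] auto initialize() -> bool override { return true; }

    [[nodiscard]] auto update(std::optional<std::uint32_t> commitIndex,
                              std::optional<std::uint32_t> votedFor) -> bool override
    {
        // Only overwrite the fields the caller actually supplied.
        if (commitIndex) { m_pending.commitIndex = *commitIndex; }
        if (votedFor) { m_pending.votedFor = *votedFor; }
        return true;
    }

    [[nodiscard]] auto flush() -> bool override
    {
        m_durable = m_pending;
        return true;
    }

    [[nodiscard]] auto restore() -> bool override
    {
        m_pending = m_durable;
        return true;
    }

    [[nodiscard]] auto commit_index() const -> std::uint32_t { return m_pending.commitIndex; }
    [[nodiscard]] auto voted_for() const -> std::uint32_t { return m_pending.votedFor; }

  private:
    struct state_t
    {
        std::uint32_t commitIndex{0};
        std::uint32_t votedFor{0};
    };
    state_t m_pending{};
    state_t m_durable{};
};
```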

lib/raft/replicated_log.h (1)

97-98: Good use of static_assert to verify concept compliance.

Using static_assert to verify that the class satisfies the TLogStorageConcept is excellent practice, ensuring interface compliance at compile time.

lib/fs/append_only_file.cpp (1)

105-122: Well-implemented builder pattern.

The builder pattern implementation is clean and properly handles both direct I/O and error propagation. Great job on using the posix_wrapper flags for clarity and type safety.

src/main.cpp (1)

548-560: std::filesystem::exists used without including <filesystem>

exists(walPath) is unqualified and there is no #include <filesystem> in the header list.
Include it or qualify with std::filesystem::exists to avoid ADL surprises.

Comment on lines 18 to 20
[[nodiscard]] auto build() noexcept -> log_storage_variant_t
{
}

⚠️ Potential issue

Implementation for build() method is missing

The build() method of log_storage_builder_t is declared to return a log_storage_variant_t but the method body is empty. This will result in undefined behavior as it doesn't return any value.

Add an implementation to the method body that constructs and returns a valid variant instance, e.g.:

[[nodiscard]] auto build() noexcept -> log_storage_variant_t
{
+    // Default to in-memory storage or based on some configuration
+    return wal::log::in_memory_log_storage_t{};
}

Comment on lines 15 to 21
class log_storage_builder_t final
{
  public:
    [[nodiscard]] auto build() noexcept -> log_storage_variant_t
    {
    }
};

🛠️ Refactor suggestion

Builder class lacks configuration options

The log_storage_builder_t class doesn't have any constructor parameters, member variables, or configuration methods, making it impossible to specify which type of storage should be built.

Consider adding configuration methods to specify the desired storage type:

class log_storage_builder_t final
{
  private:
+    bool m_use_persistent_storage{false};
+    std::string m_file_path{};

  public:
+    auto use_persistent_storage(std::string file_path) noexcept -> log_storage_builder_t&
+    {
+        m_use_persistent_storage = true;
+        m_file_path = std::move(file_path);
+        return *this;
+    }
+
+    auto use_in_memory_storage() noexcept -> log_storage_builder_t&
+    {
+        m_use_persistent_storage = false;
+        return *this;
+    }

    [[nodiscard]] auto build() noexcept -> log_storage_variant_t
    {
+        if (m_use_persistent_storage) {
+            return wal::log::persistent_log_storage_t<wal::log::file_storage_backend_t>{m_file_path};
+        }
+        return wal::log::in_memory_log_storage_t{};
    }
};

struct db_config_t
{
    fs::path_t  DatabasePath{"."};
    std::string WalFilename{"wal"};
    std::string ManifestFilenamePrefix{"manifest_"};
    db_mode_t   mode;

🛠️ Refactor suggestion

Add default value for mode field

The mode field in db_config_t has no default member initializer, so a default‑initialized config (e.g. db_config_t cfg;) leaves mode indeterminate. Consider adding a default value.

-    db_mode_t   mode;
+    db_mode_t   mode{db_mode_t::kStandalone};

Comment on lines 8 to 26
class persistence_t
{
  public:
    persistence_t() = delete;

    persistence_t(const persistence_t &) = delete;
    auto operator=(const persistence_t &) -> persistence_t & = delete;

    persistence_t(persistence_t &&) = delete;
    auto operator=(const persistence_t &&) -> persistence_t & = delete;

    ~persistence_t() = default;

    [[nodiscard]] auto initialize() -> bool;
    [[nodiscard]] auto update(std::optional<std::uint32_t> commitIndex,
                              std::optional<std::uint32_t> votedFor) -> bool;
    [[nodiscard]] auto flush() -> bool;
    [[nodiscard]] auto restore() -> bool;
};

🛠️ Refactor suggestion

Add documentation explaining persistence behavior and implementation strategy.

This class lacks documentation explaining:

  1. What Raft state is being persisted
  2. The expected behavior of each method
  3. What persistence mechanism should be used
  4. Error conditions that would cause the methods to return false

Add Doxygen-style comments to clarify these aspects, making it easier for implementers to understand the contract.

auto operator=(const persistence_t &) -> persistence_t & = delete;

persistence_t(persistence_t &&) = delete;
auto operator=(const persistence_t &&) -> persistence_t & = delete;

⚠️ Potential issue

Fix the move assignment operator declaration.

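The move assignment operator takes a const rvalue reference. The standard still counts operator=(const T&&) as a move assignment operator, and since it is deleted the behavior does not change. However, a const rvalue cannot be moved from, so the conventional spelling is persistence_t&&: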

-  auto operator=(const persistence_t &&) -> persistence_t & = delete;
+  auto operator=(persistence_t &&) -> persistence_t & = delete;

Comment on lines 37 to 52
auto random_access_file_t::operator=(random_access_file_t &&other) noexcept
    -> random_access_file_t &
{
    if (this == &other)
    {
        return *this;
    }

    m_fd = other.m_fd;
    m_ring = other.m_ring;

    other.m_fd = -1;
    other.m_ring = io_uring{};

    return *this;
}

⚠️ Potential issue

Resource leak in move‑assignment operator

Before overwriting m_fd/m_ring, the current object may still own a valid file
descriptor and io_uring instance. Overwriting them without closing
m_fd and calling io_uring_queue_exit() leaks kernel resources.

  if (this == &other)
  {
    return *this;
  }

+ // release current resources
+ if (m_fd != -1)
+ {
+   io_uring_queue_exit(&m_ring);
+   close(m_fd);
+ }
+
  m_fd   = other.m_fd;
  m_ring = other.m_ring;
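The release‑then‑steal order can be checked in isolation with a hypothetical RAII owner of a single file descriptor. io_uring is left out so the sketch runs anywhere. In random_access_file_t, the same reset step would also call io_uring_queue_exit(&m_ring).

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Hypothetical RAII owner of one file descriptor.
class fd_owner_t
{
  public:
    explicit fd_owner_t(int fd) noexcept : m_fd{fd} {}

    fd_owner_t(const fd_owner_t &) = delete;
    auto operator=(const fd_owner_t &) -> fd_owner_t & = delete;

    fd_owner_t(fd_owner_t &&other) noexcept : m_fd{std::exchange(other.m_fd, -1)} {}

    auto operator=(fd_owner_t &&other) noexcept -> fd_owner_t &
    {
        if (this != &other)
        {
            reset();                              // release what *this owns first
            m_fd = std::exchange(other.m_fd, -1); // then take ownership
        }
        return *this;
    }

    ~fd_owner_t() { reset(); }

    [[nodiscard]] auto get() const noexcept -> int { return m_fd; }

  private:
    void reset() noexcept
    {
        if (m_fd != -1)
        {
            ::close(m_fd);
            m_fd = -1;
        }
    }

    int m_fd{-1};
};
```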

Comment on lines 46 to 51
auto operator=(log_t &&other) noexcept -> log_t &
{
    using std::swap;
    swap(*this, other);
    return *this;
}

⚠️ Potential issue

Fix infinite recursion in move‑assignment operator

operator= invokes std::swap(*this, other), and the fallback implementation of std::swap internally uses move‑assignment again, causing unbounded recursive calls (or a hard compiler error on some STL implementations).

-    auto operator=(log_t &&other) noexcept -> log_t &
-    {
-        using std::swap;
-        swap(*this, other);
-        return *this;
-    }
+    auto operator=(log_t &&other) noexcept -> log_t &
+    {
+        if (this != &other)
+        {
+            m_storage = std::move(other.m_storage);
+        }
+        return *this;
+    }

Alternatively, define an explicit swap(log_t&, log_t&) overload and call it, but avoid calling std::swap on *this inside the assignment operator.
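Here is a sketch that combines both suggestions on a hypothetical, simplified log type, with a single vector member standing in for the real storage variant. Move assignment moves members directly. A hidden‑friend swap exchanges members. As a result, `using std::swap; swap(a, b);` finds the friend overload through ADL and never falls back to the generic std::swap, which would route through operator= again.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for log_t.
class log_sketch_t
{
  public:
    log_sketch_t() = default;
    explicit log_sketch_t(std::vector<std::string> storage) : m_storage{std::move(storage)} {}

    log_sketch_t(log_sketch_t &&) noexcept = default;

    // Member-wise move; does not call swap on *this.
    auto operator=(log_sketch_t &&other) noexcept -> log_sketch_t &
    {
        if (this != &other)
        {
            m_storage = std::move(other.m_storage);
        }
        return *this;
    }

    // Hidden friend, preferred over the std::swap template by overload resolution.
    friend void swap(log_sketch_t &lhs, log_sketch_t &rhs) noexcept
    {
        using std::swap;
        swap(lhs.m_storage, rhs.m_storage);
    }

    [[nodiscard]] auto size() const noexcept -> std::size_t { return m_storage.size(); }

  private:
    std::vector<std::string> m_storage;
};
```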


lib/wal/wal.h Outdated
Comment on lines 147 to 150
auto add(record_t rec) noexcept -> void
{
    std::visit([&](auto &wal) { return wal.add(std::move(rec)); }, m_wal);
}

🛠️ Refactor suggestion

Add thread safety for concurrent operations.

The add method processes records through a std::visit without any synchronization. If multiple threads call this method simultaneously on the same WAL instance, data races or partial writes might occur. Consider adding thread safety mechanisms if concurrent usage is intended.

 auto add(record_t rec) noexcept -> void
 {
+    std::lock_guard<std::mutex> lock(m_mutex);
     std::visit([&](auto &wal) { return wal.add(std::move(rec)); }, m_wal);
 }

You would need to add a mutex member to the class:

private:
    wal_variant_t m_wal;
    std::mutex m_mutex;  // Protect concurrent access to the WAL
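One caveat on the suggestion: std::mutex is neither copyable nor movable, so a by‑value mutex member removes the class's implicit move operations. The sketch below keeps the mutex behind a std::unique_ptr to preserve movability. It uses a hypothetical single‑backend variant in place of wal_variant_t. A moved‑from instance must not be used afterwards. The original noexcept is also dropped, since both locking and push_back can throw.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for one WAL backend.
struct in_memory_wal_sketch
{
    std::vector<std::string> records;
    void add(std::string rec) { records.push_back(std::move(rec)); }
};
using wal_variant_sketch = std::variant<in_memory_wal_sketch>;

class synchronized_wal_sketch
{
  public:
    void add(std::string rec)
    {
        // Serialise writers; the visited backend is not thread-safe on its own.
        std::lock_guard<std::mutex> lock{*m_mutex};
        std::visit([&rec](auto &wal) { wal.add(std::move(rec)); }, m_wal);
    }

    [[nodiscard]] auto size() const -> std::size_t
    {
        std::lock_guard<std::mutex> lock{*m_mutex};
        return std::visit([](const auto &wal) { return wal.records.size(); }, m_wal);
    }

  private:
    wal_variant_sketch          m_wal{};
    std::unique_ptr<std::mutex> m_mutex{std::make_unique<std::mutex>()};
};
```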

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🔭 Outside diff range comments (2)
lib/structures/memtable/memtable.cpp (2)

91-94: ⚠️ Potential issue

Incorrect implementation of operator>

The current implementation of operator> doesn't correctly implement the "greater than" relationship. Using !(m_key < record.m_key) is equivalent to "greater than or equal to" (>=), not strictly "greater than" (>).

auto memtable_t::record_t::operator>(const memtable_t::record_t &record) const -> bool
{
-    return !(m_key < record.m_key);
+    return record.m_key < m_key;
}
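The following small sketch uses a hypothetical record type with the same key‑only ordering. It shows why the swapped‑operand form is strict. For equal keys, !(a < b) is true, while b < a is false.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal record ordered by key only.
struct record_sketch_t
{
    std::string m_key;

    auto operator<(const record_sketch_t &other) const -> bool { return m_key < other.m_key; }
    // Strict "greater than": operator< with the operands swapped.
    auto operator>(const record_sketch_t &other) const -> bool { return other < *this; }
};
```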

158-173: 🛠️ Refactor suggestion

Inefficient implementation of max() function

The current implementation of max() iterates through the entire container to find the last element, which is inefficient as noted in the TODO.

auto memtable_t::max() const noexcept -> std::optional<memtable_t::record_t::key_t>
{
-    storage_t::const_iterator beforeEnd{m_data.cbegin()};
-    auto                      idx{0};
-    for (auto begin{m_data.cbegin()}; begin != m_data.cend(); ++begin)
-    {
-        if (idx++ == 0)
-        {
-            continue;
-        }
-
-        beforeEnd = begin;
-    }
-    return m_data.size() > 0 ? std::make_optional(beforeEnd->m_key) : std::nullopt;
+    if (m_data.empty()) {
+        return std::nullopt;
+    }
+    
+    // Assuming m_data is a container with bidirectional or random access iterators
+    auto last = std::prev(m_data.cend());
+    return std::make_optional(last->m_key);
}
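Here is a sketch under the diff's own assumption, namely that m_data is an ordered container with bidirectional iterators. std::set is used purely for illustration, since the real memtable container is not shown here. The maximum is the last element, reached with one std::prev from cend() instead of a full walk.

```cpp
#include <cassert>
#include <iterator>
#include <optional>
#include <set>
#include <string>

// Hypothetical: assumes an ordered container such as std::set.
auto max_key(const std::set<std::string> &data) -> std::optional<std::string>
{
    if (data.empty())
    {
        return std::nullopt;
    }
    return *std::prev(data.cend()); // last element == largest key
}
```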
♻️ Duplicate comments (22)
lib/db/manifest/manifest.cpp (1)

61-68: ⚠️ Potential issue

Error handling regression – .value() will throw & open() still returns success

A previous review already pointed out that calling .value() on the builder result masks errors and can throw.
The current revision still:

m_log.emplace(std::move(fs::append_only_file_builder_t{}.build(m_path.c_str(), true).value()));
return true;   // <- unconditional success

This re‑introduces the silent‑failure path we discussed.
Please restore the explicit check and propagate/ log the failure as previously suggested.

lib/raft/raft_test.cpp (1)

132-133: Replace fixed sleep with deterministic synchronisation (duplicate)
The previous review already flagged this; flaky sleeps are still present.
Use condition‑variables or polling with timeout to wait for leader election.
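Below is a minimal sketch of the condition-variable approach. It assumes a hypothetical notify_leader() hook that the test wires into the node (for example, from a mocked RPC callback); this is not an existing API in the codebase. The test then waits with a bounded timeout instead of sleeping for a fixed duration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>

class leader_waiter_t
{
  public:
    // Hypothetical hook: called by whichever node wins an election.
    void notify_leader(int id)
    {
        {
            std::lock_guard lock{m_mutex};
            m_leader = id;
        }
        m_cv.notify_all();
    }

    // Blocks until a leader is reported or the timeout expires; no fixed sleep.
    auto wait_for_leader(std::chrono::milliseconds timeout) -> std::optional<int>
    {
        std::unique_lock lock{m_mutex};
        m_cv.wait_for(lock, timeout, [this] { return m_leader.has_value(); });
        return m_leader;
    }

  private:
    std::mutex              m_mutex;
    std::condition_variable m_cv;
    std::optional<int>      m_leader;
};
```

The predicate overload of wait_for also handles spurious wakeups, and the test fails fast if no leader appears within the timeout.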

src/main.cpp (5)

264-268: Typo breaks JSON parsing of database.mode (duplicate)
",pde" is an invalid key and will throw at runtime.

-            db::from_string(configJson["database"][",pde"].get<std::string>());
+            db::from_string(configJson["database"]["mode"].get<std::string>());

423-430: Wrong error message for missing peers (duplicate)
The exception text still says "id" instead of "peers".


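617-632: Range-for with an init-statement requires C++20 (duplicate)

for (raft::id_t replicaId{1}; const auto &replicaIp : pDbConfig->ServerConfig.peers)

This form has been well-formed since C++20 (range-based for with initializer), so it compiles under the C++20/23 toolchain this PR already relies on (std::ranges::to, for instance, is C++23). Splitting the declaration from the loop is only needed for older standards, or to keep replicaId in scope after the loop: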

-for (raft::id_t replicaId{1}; const auto &replicaIp : pDbConfig->ServerConfig.peers)
+raft::id_t replicaId{1};
+for (const auto& replicaIp : pDbConfig->ServerConfig.peers)

Increment replicaId inside the loop as you already do.


652-653: Check dynamic_cast result before dereference (duplicate)
dynamic_cast<RaftService::Service *>(pConsensusModule.get()) may return nullptr; verify before RegisterService.
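Here is a minimal sketch of the check. consensus_module_t, service_t and raft_consensus_t are hypothetical stand-ins for the consensus module interface and RaftService::Service.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real module implements both interfaces.
struct consensus_module_t
{
    virtual ~consensus_module_t() = default;
};
struct service_t
{
    virtual ~service_t() = default;
};
struct raft_consensus_t : consensus_module_t, service_t
{
};

// Cross-casts to the service interface. dynamic_cast yields nullptr when the
// object does not derive from service_t (or when module is nullptr), so the
// caller must check the result before handing it to RegisterService().
auto as_service(consensus_module_t *module) -> service_t *
{
    return dynamic_cast<service_t *>(module);
}
```

At the call site, a nullptr result should be logged and should abort startup rather than be passed to RegisterService().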


669-670: Busy‑wait burns CPU – add small sleep (duplicate)
Insert std::this_thread::sleep_for(100ms); inside the loop. The 100ms literal needs using namespace std::chrono_literals; and sleep_for needs <thread>.

lib/fs/random_access_file.cpp (2)

36-51: ⚠️ Potential issue

Move‑assignment still leaks the current resources

The assignment operator steals m_fd / m_ring from other but never releases the resources that this might already own when the operator is entered. The same issue was raised in a previous round of review and is still present.

auto random_access_file_t::operator=(random_access_file_t &&other) noexcept -> random_access_file_t &
{
     if (this == &other)
     {
         return *this;
     }

-    m_fd   = other.m_fd;
-    m_ring = other.m_ring;
+    // release currently‑owned resources first
+    if (m_fd != -1)
+    {
+        io_uring_queue_exit(&m_ring);
+        close(m_fd);
+    }
+
+    m_fd   = other.m_fd;
+    m_ring = other.m_ring;
 
     other.m_fd   = -1;
     other.m_ring = io_uring{};
     return *this;
}
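The same release-then-steal ordering is shown below on a hypothetical move-only file-descriptor wrapper. The real random_access_file_t must also tear down its ring with io_uring_queue_exit() before closing the descriptor.

```cpp
#include <cassert>
#include <unistd.h>
#include <utility>

// Hypothetical RAII wrapper: release what *this owns, then take other's fd.
class unique_fd_t
{
  public:
    explicit unique_fd_t(int fd = -1) noexcept : m_fd{fd} {}
    unique_fd_t(unique_fd_t &&other) noexcept : m_fd{std::exchange(other.m_fd, -1)} {}
    unique_fd_t(const unique_fd_t &)                     = delete;
    auto operator=(const unique_fd_t &) -> unique_fd_t & = delete;
    ~unique_fd_t() { reset(); }

    auto operator=(unique_fd_t &&other) noexcept -> unique_fd_t &
    {
        if (this != &other)
        {
            reset();                               // close the descriptor we already own
            m_fd = std::exchange(other.m_fd, -1);  // steal, leaving other empty
        }
        return *this;
    }

    void reset() noexcept
    {
        if (m_fd != -1)
        {
            ::close(m_fd);
            m_fd = -1;
        }
    }

    [[nodiscard]] auto get() const noexcept -> int { return m_fd; }

  private:
    int m_fd;
};
```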

83-88: ⚠️ Potential issue

io_uring_submit / io_uring_wait_cqe return codes are ignored

Both calls can fail (negative return value). If that happens the unchecked cqe pointer is dereferenced on the next line, leading to UB / crash. Handle the error path first and convert it into file_error_t, as already suggested earlier.

lib/wal/log/log.h (2)

46-51: ⚠️ Potential issue

Recursive move‑assignment via std::swap

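If the unqualified swap(*this, other) call resolves to std::swap, it recurses: std::swap move-constructs a temporary and then move-assigns, which calls this operator again. The result is infinite recursion, or a hard compile error if no swap is visible at all, exactly as mentioned in the previous review round.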

Replace the body with an explicit move as shown in the earlier suggestion, or provide a non‑recursive swap(log_t&, log_t&) overload and call that.
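The sketch below combines both options on a hypothetical log_t reduced to a vector of entries. Move-assignment moves members explicitly, and a hidden-friend swap (found by ADL) swaps members without calling operator=.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for log_t.
class log_t
{
  public:
    log_t() = default;
    explicit log_t(std::vector<std::string> entries) : m_entries(std::move(entries)) {}
    log_t(log_t &&other) noexcept = default;

    auto operator=(log_t &&other) noexcept -> log_t &
    {
        if (this != &other)
        {
            m_entries = std::move(other.m_entries);  // explicit member-wise move
        }
        return *this;
    }

    // Non-recursive swap: exchanges members, never calls operator=(log_t &&).
    friend void swap(log_t &lhs, log_t &rhs) noexcept
    {
        using std::swap;
        swap(lhs.m_entries, rhs.m_entries);
    }

    [[nodiscard]] auto size() const noexcept -> std::size_t { return m_entries.size(); }

  private:
    std::vector<std::string> m_entries;
};
```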


128-154: ⚠️ Potential issue

object_backend_tag APIs compile, but build() still static‑asserts

set_url() is enabled for object_backend_tag, yet build() still has no branch for that tag and unconditionally hits the static_assert(always_false<>). Any caller that uses the object backend will therefore fail to compile.

Either (a) implement the object‑store backend in the builder, or (b) remove / disable the set_url() overload until the backend is ready, to avoid a confusing build‑time trap.

lib/raft/raft.h (1)

170-194: ⚠️ Potential issue

Potential data race on m_shutdown

m_shutdown is a plain bool, but the class launches heartbeat / election threads (m_heartbeatThread, m_electionThread) that almost certainly read this flag concurrently. Every access must be protected by a mutex or replaced with std::atomic<bool> – otherwise the program exhibits undefined behaviour.

-    bool m_shutdown{false};
+    std::atomic<bool> m_shutdown{false};

Also ensure all reads & writes use the same atomic object, or are guarded by m_stateMutex.
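Below is a reduced, hypothetical sketch of the background-thread lifecycle with an atomic flag. The worker reads m_shutdown while stop() writes it, with no data race.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

class heartbeat_runner_t
{
  public:
    void start()
    {
        m_thread = std::thread([this] {
            // Concurrent read of the flag; safe because it is std::atomic<bool>.
            while (!m_shutdown.load(std::memory_order_acquire))
            {
                m_ticks.fetch_add(1, std::memory_order_relaxed);
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    void stop()
    {
        m_shutdown.store(true, std::memory_order_release);
        if (m_thread.joinable())
        {
            m_thread.join();
        }
    }

    ~heartbeat_runner_t() { stop(); }

    [[nodiscard]] auto ticks() const -> int { return m_ticks.load(); }

  private:
    std::atomic<bool> m_shutdown{false};
    std::atomic<int>  m_ticks{0};
    std::thread       m_thread;
};
```

If the threads block on a condition variable instead of sleeping, the flag should be set while holding the mutex that guards the wait. Otherwise a wakeup can be missed.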

lib/wal/log/persistent_log_storage.h (5)

148-158: ⚠️ Potential issue

Buffer size doesn't account for read result.

The read_impl function doesn't shrink the buffer to the number of bytes actually read. If fewer bytes are read than requested (for example near the end of the file), the returned string keeps trailing '\0' characters from resize() that never came from the file.

Apply this fix to properly resize the buffer:

[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string
{
    std::string buffer;
    buffer.resize(size);
-   if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value())
+   const auto res = m_file.read(offset, buffer.data(), size);
+   if (!res.has_value())
    {
        spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size);
        return {};
    }
+   // Adjust buffer size to actual bytes read
+   buffer.resize(static_cast<size_t>(res.value()));
    return buffer;
}
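The same pattern is shown below at the system-call level with POSIX pread(). read_at is a hypothetical helper, not the file_storage_backend_t API.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// pread() may return fewer bytes than requested (e.g. near end of file), so
// the buffer is shrunk to the count actually read instead of keeping '\0' padding.
auto read_at(int fd, off_t offset, std::size_t size) -> std::string
{
    std::string buffer(size, '\0');
    const ssize_t res = ::pread(fd, buffer.data(), size, offset);
    if (res < 0)
    {
        std::perror("pread");
        return {};
    }
    buffer.resize(static_cast<std::size_t>(res));
    return buffer;
}
```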

234-247: 🛠️ Refactor suggestion

Gracefully handle empty or corrupt files in constructor.

The constructor reads the entire file content at once, which is inefficient for large files and lacks error handling if the read fails.

Consider reading in chunks and adding proper error handling. An entry can straddle a chunk boundary, so the unterminated tail of each chunk has to be carried into the next read rather than stored as an entry of its own:

explicit persistent_log_storage_t(TBackendStorage &&backendStorage)
    : m_backendStorage(std::move(backendStorage))
{
-   const std::string  raw = m_backendStorage.read(0, m_backendStorage.size());
-   std::istringstream stream(raw);
-   for (std::string line; std::getline(stream, line);)
-   {
-       if (absl::StripAsciiWhitespace(line).empty())
-       {
-           continue;
-       }
-       m_inMemoryLog.emplace_back(std::move(line));
+   const std::size_t totalSize = m_backendStorage.size();
+   if (totalSize == 0) {
+       return;  // Empty file, nothing to load
+   }
+
+   try {
+       // Read in chunks for efficiency with large files
+       constexpr std::size_t kChunkSize = 4096;
+       std::size_t offset = 0;
+       std::string pending;  // incomplete line carried over between chunks
+
+       while (offset < totalSize) {
+           const std::size_t toRead = std::min(kChunkSize, totalSize - offset);
+           const std::string chunk = m_backendStorage.read(offset, toRead);
+           if (chunk.empty()) {
+               spdlog::warn("Read returned empty chunk at offset {}", offset);
+               break;
+           }
+           offset += chunk.size();
+           pending += chunk;
+
+           // Store every complete line; keep the unterminated tail in `pending`
+           std::size_t start = 0;
+           for (auto nl = pending.find('\n'); nl != std::string::npos;
+                nl = pending.find('\n', start)) {
+               const std::string_view line{pending.data() + start, nl - start};
+               if (!absl::StripAsciiWhitespace(line).empty()) {
+                   m_inMemoryLog.emplace_back(std::string{line});
+               }
+               start = nl + 1;
+           }
+           pending.erase(0, start);
+       }
+
+       // The last entry may lack a trailing newline
+       if (!absl::StripAsciiWhitespace(pending).empty()) {
+           m_inMemoryLog.emplace_back(std::move(pending));
+       }
+   } catch (const std::exception& e) {
+       spdlog::error("Error loading log entries: {}", e.what());
    }
}
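When loading in chunks, an entry can straddle a chunk boundary. The hypothetical standalone splitter below isolates the carry-over logic so it can be unit-tested: complete lines are emitted, and the unterminated tail of each chunk is prepended to the next one. The blank-line check approximates absl::StripAsciiWhitespace with a fixed set of ASCII whitespace characters.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

class line_splitter_t
{
  public:
    // Appends a chunk and emits every line that is now complete.
    void feed(std::string_view chunk)
    {
        m_pending.append(chunk);
        std::size_t start = 0;
        for (auto nl = m_pending.find('\n'); nl != std::string::npos; nl = m_pending.find('\n', start))
        {
            emit(std::string_view(m_pending).substr(start, nl - start));
            start = nl + 1;
        }
        m_pending.erase(0, start);  // keep only the unterminated tail
    }

    // Flushes a final entry that has no trailing newline.
    auto finish() -> std::vector<std::string>
    {
        emit(m_pending);
        m_pending.clear();
        return std::move(m_lines);
    }

  private:
    void emit(std::string_view line)
    {
        if (line.find_first_not_of(" \t\r\v\f") != std::string_view::npos)  // skip blank lines
        {
            m_lines.emplace_back(line);
        }
    }

    std::string              m_pending;
    std::vector<std::string> m_lines;
};
```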

270-281: ⚠️ Potential issue

Fix missing newline in append method.

The append method doesn't add newlines between entries, but the constructor reads the file line by line. This inconsistency will cause problems when loading persisted logs.

Apply this fix:

[[nodiscard]] auto append(std::string entry) -> bool
{
+   // Add newline to separate entries when reading line by line
+   entry += '\n';
    if (!m_backendStorage.write(
            static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size()))
    {
        spdlog::error(
            "Persistent log storage write failed. Entry={}, size={}\n", entry, entry.size());
        return false;
    }
-   m_inMemoryLog.emplace_back(std::move(entry));
+   // Store the original entry (without newline) in memory
+   m_inMemoryLog.emplace_back(entry.substr(0, entry.size() - 1));
    return true;
}

272-274: ⚠️ Potential issue

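Verify the offset parameter in the write operation.

The current implementation passes m_backendStorage.size() as the write offset. That offset is only redundant if the backend always appends (e.g. a descriptor opened with O_APPEND, for which Linux pwrite() ignores the offset). If the backend performs positioned writes, size() is exactly the append offset and 0 would overwrite the start of the log. Check the backend's semantics before applying:

If the backend always appends, the offset can be set to 0: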

if (!m_backendStorage.write(
-       static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size()))
+       static_cast<const char *>(entry.data()), 0, entry.size()))  // Use 0 for offset since append_only_file will handle appending

304-307: ⚠️ Potential issue

Inconsistency between size() and read() methods.

The size() method returns m_backendStorage.size() but read() uses m_inMemoryLog.size(). This inconsistency could lead to subtle bugs.

[[nodiscard]] auto size() const -> std::size_t
{
-   return m_backendStorage.size();
+   return m_inMemoryLog.size();
}
lib/raft/raft.cpp (5)

261-262: ⚠️ Potential issue

Implement the state machine updates or remove the TODO.

This TODO comment has been present through multiple reviews. Without updating the state machine, committed log entries have no effect on the actual system state.

Implement state machine updates to maintain consistency:

while (m_lastApplied < m_commitIndex)
{
    ++m_lastApplied;
-   // TODO(lnikon): Update the state machine!
+   const auto& entry = m_log[m_lastApplied - 1];
+   // Apply the entry to your state machine
+   // For example:
+   // m_stateMachine.apply(entry);
}
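The following is a minimal sketch of the apply loop against a toy key-value state machine. log_entry_t, kv_state_machine_t and apply_committed are hypothetical. The sketch assumes 1-based Raft indices, so entry i lives at index i - 1, matching the suggestion above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct log_entry_t
{
    std::string key;
    std::string value;
};

struct kv_state_machine_t
{
    std::map<std::string, std::string> data;
    void apply(const log_entry_t &entry) { data[entry.key] = entry.value; }
};

// Applies every entry in (lastApplied, commitIndex] exactly once, in log order.
void apply_committed(const std::vector<log_entry_t> &entries, std::uint32_t commitIndex,
                     std::uint32_t &lastApplied, kv_state_machine_t &sm)
{
    assert(commitIndex <= entries.size());
    while (lastApplied < commitIndex)
    {
        ++lastApplied;
        sm.apply(entries[lastApplied - 1]);
    }
}
```

Because lastApplied only advances, calling this again after commitIndex grows applies only the newly committed suffix.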

479-485: 🛠️ Refactor suggestion

Consider using thread pool instead of spawning new threads.

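The heartbeat loop below sends AppendEntries RPCs to each peer one after another (the TODO asks to make them async, and the log message still mentions a per-peer heartbeat thread). Making them async by creating a new thread for every AppendEntries RPC can severely degrade performance under high load.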

Consider implementing a thread pool:

- // TODO(lnikon): Make these calls async
- for (auto &[id, client] : m_replicas)
- {
-     spdlog::debug(
-         "Node={} is creating a heartbeat thread for the peer={}", m_config.m_id, id);
-     sendAppendEntriesRPC(client.value(), {});
- }
+ // Use a thread pool to limit concurrency
+ std::vector<std::future<void>> futures;
+ for (auto &[id, client] : m_replicas)
+ {
+     spdlog::debug(
+         "Node={} is queuing heartbeat for peer={}", m_config.m_id, id);
+     futures.push_back(m_threadPool.enqueue([this, &client]() {
+         sendAppendEntriesRPC(client.value(), {});
+     }));
+ }
+ 
+ // Wait for all heartbeats to complete or timeout
+ for (auto& future : futures) {
+     future.wait_for(std::chrono::milliseconds(50));
+ }

This would require adding a thread pool implementation to your codebase.
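m_threadPool and enqueue() in the suggestion are not part of the codebase. The pool below is one hypothetical shape: a fixed number of workers and an enqueue() that returns a std::future<void>, so the wait_for() loop above works unchanged.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class thread_pool_t
{
  public:
    explicit thread_pool_t(std::size_t workers)
    {
        for (std::size_t i = 0; i < workers; ++i)
        {
            m_workers.emplace_back([this] { run(); });
        }
    }

    ~thread_pool_t()
    {
        {
            std::lock_guard lock{m_mutex};
            m_stopping = true;
        }
        m_cv.notify_all();
        for (auto &worker : m_workers)
        {
            worker.join();
        }
    }

    // Queues a task; the returned future becomes ready when the task finishes.
    auto enqueue(std::function<void()> task) -> std::future<void>
    {
        auto packaged = std::make_shared<std::packaged_task<void()>>(std::move(task));
        auto future   = packaged->get_future();
        {
            std::lock_guard lock{m_mutex};
            m_tasks.emplace([packaged] { (*packaged)(); });
        }
        m_cv.notify_one();
        return future;
    }

  private:
    void run()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock lock{m_mutex};
                m_cv.wait(lock, [this] { return m_stopping || !m_tasks.empty(); });
                if (m_stopping && m_tasks.empty())
                {
                    return;  // queued tasks are drained before exiting
                }
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task();  // run outside the lock
        }
    }

    std::mutex                        m_mutex;
    std::condition_variable           m_cv;
    std::queue<std::function<void()>> m_tasks;
    std::vector<std::thread>          m_workers;
    bool                              m_stopping{false};
};
```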


873-878: 🛠️ Refactor suggestion

Consider using fsync for log reliability.

The current implementation flushes the data to the operating system but doesn't ensure it's actually written to persistent storage. For true Raft safety, calling fsync is necessary.

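Add proper fsync functionality. Note that fileno() takes a FILE *, so it cannot be applied to a std::fstream; standard C++ streams do not expose their file descriptor. One option is to close the stream and fsync the file through a POSIX descriptor (path stands for whatever file name the stream was opened with):

fsa.flush();
if (fsa.fail())
{
    return false;
}
+fsa.close();
+
+// Ensure data is actually persisted to disk: reopen the file and fsync it
+const int fd = ::open(path.c_str(), O_WRONLY);
+if (fd == -1)
+{
+    return false;
+}
+const bool synced = ::fsync(fd) == 0;
+::close(fd);
+if (!synced)
+{
+    return false;
+}

This requires <fcntl.h> and <unistd.h>. Since the project already depends on io_uring, a POSIX-only path should be sufficient; a Windows port would need _commit() on a descriptor obtained from _open().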


933-934: ⚠️ Potential issue

Fix unintended override of loaded votedFor value.

The code explicitly sets m_votedFor = 0 immediately after reading it from a file, which defeats the purpose of persisting this value.

ifs >> m_commitIndex >> m_votedFor;
-m_votedFor = 0;

This change ensures that the persisted value for votedFor is properly maintained across restarts, which is critical for the correctness of the Raft protocol.


621-622: ⚠️ Potential issue

Implement state machine updates.

The TODO comment in onSendAppendEntriesRPC indicates missing state machine updates, which are crucial for maintaining consistency.

Implement the missing state machine updates:

while (m_lastApplied < m_commitIndex)
{
    ++m_lastApplied;
-   spdlog::info("TODO(lnikon): Apply to state machine here");
+   const auto& entry = m_log[m_lastApplied - 1];
+   // Apply the entry to your state machine
+   // For example:
+   // m_stateMachine.apply(entry);
+   spdlog::info("Applied entry [key={}, value={}] to state machine", 
+                entry.key(), entry.value());
}
🧹 Nitpick comments (5)
lib/structures/memtable/memtable.cpp (1)

109-115: Redundant size tracking

Both m_size and m_num_of_bytes are incremented by the same value (record.size()), suggesting redundancy. Consider clarifying the distinction between these variables or consolidating them if they track the same information.

lib/db/manifest/manifest.cpp (1)

96-99: Detect partial writes – don’t treat “>= 0” as success

append() returning a non‑negative value means “some bytes written”, not necessarily “all bytes written”.
For manifest durability you should verify the full payload length:

-    return m_log->append({infoSerialized.c_str(), infoSerialized.size()})
-        .transform([](ssize_t res) { return res >= 0; })
+    const auto expected = m_log->append({infoSerialized.c_str(), infoSerialized.size()});
+    return expected.transform([&](ssize_t res) {
+               return static_cast<std::size_t>(res) == infoSerialized.size();
+           })
         .value_or(false);

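This turns a short write into a reported failure instead of silent truncation. Short writes happen, for example, when a signal interrupts write() after some bytes were transferred, or when the disk fills up.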
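If the underlying writer should retry rather than fail, the usual POSIX loop looks like the sketch below. write_all is a hypothetical helper, not part of the manifest code. A short write resumes from the first unwritten byte, and EINTR with nothing transferred is retried rather than treated as an error.

```cpp
#include <cassert>
#include <cerrno>
#include <string_view>
#include <unistd.h>

// Keeps calling write() until the whole payload is written or a real error occurs.
auto write_all(int fd, std::string_view data) -> bool
{
    std::size_t written = 0;
    while (written < data.size())
    {
        const ssize_t res = ::write(fd, data.data() + written, data.size() - written);
        if (res < 0)
        {
            if (errno == EINTR)
            {
                continue;  // interrupted before any byte was written: retry
            }
            return false;
        }
        written += static_cast<std::size_t>(res);  // short write: continue from here
    }
    return true;
}
```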

lib/db/db.h (1)

20-21: Pass heavy objects by (r)value‑reference to avoid an extra move

wal::wal_wrapper_t potentially owns sizeable buffers; taking it by value forces one move into the parameter and another into m_wal.
Prefer an r‑value reference in the ctor and std::move at the call‑site to make the intent explicit and forbid an accidental copy.

-    explicit db_t(config::shared_ptr_t config, wal::wal_wrapper_t wal);
+    explicit db_t(config::shared_ptr_t config, wal::wal_wrapper_t&& wal);
src/main.cpp (1)

605-609: Typo: “positve” → “positive”

-            spdlog::error("ID of the node should be positve integer");
+            spdlog::error("ID of the node should be positive integer");
lib/fs/random_access_file.cpp (1)

81-82: IOSQE_IO_LINK flag set without a linked SQE

IOSQE_IO_LINK makes sense only when at least two SQEs are linked together.
Here there is a single writev SQE, so the flag has no effect and only adds confusion.
Consider dropping the flag (or linking the operation with a follow‑up fsync SQE if that is the intention).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c344eb and 8677d89.

📒 Files selected for processing (65)
  • .clang-format (1 hunks)
  • bench/db/basic_bench.cpp (1 hunks)
  • bench/db/celero_bench.cpp (2 hunks)
  • bench/db/gbench_bench.cpp (2 hunks)
  • bench/fs/lots_of_write.cpp (1 hunks)
  • bench/fs/open_write_streams.cpp (3 hunks)
  • examples/absl/absl_try_mutex.cpp (1 hunks)
  • examples/iouring/libiouring/main.cpp (1 hunks)
  • examples/iouring/raw_uring/main.cpp (1 hunks)
  • examples/prometheus/main.cpp (1 hunks)
  • examples/raft/main.cpp (1 hunks)
  • lib/concurrency/thread_safe_queue.h (1 hunks)
  • lib/db/db.cpp (4 hunks)
  • lib/db/db.h (3 hunks)
  • lib/db/db_config.h (1 hunks)
  • lib/db/manifest/manifest.cpp (3 hunks)
  • lib/db/manifest/manifest.h (5 hunks)
  • lib/fs/append_only_file.cpp (1 hunks)
  • lib/fs/append_only_file.h (1 hunks)
  • lib/fs/common.cpp (1 hunks)
  • lib/fs/common.h (1 hunks)
  • lib/fs/random_access_file.cpp (1 hunks)
  • lib/fs/random_access_file.h (1 hunks)
  • lib/fs/random_access_file_test.cpp (1 hunks)
  • lib/posix_wrapper/open_flag.h (1 hunks)
  • lib/posix_wrapper/posix_wrapper_test.cpp (1 hunks)
  • lib/raft/persistence.h (1 hunks)
  • lib/raft/raft.cpp (1 hunks)
  • lib/raft/raft.h (1 hunks)
  • lib/raft/raft_test.cpp (1 hunks)
  • lib/raft/replicated_log.h (1 hunks)
  • lib/server/grpc_server.cpp (2 hunks)
  • lib/server/grpc_server.h (1 hunks)
  • lib/server/server_factory.h (1 hunks)
  • lib/server/server_kind.cpp (1 hunks)
  • lib/server/server_kind.h (1 hunks)
  • lib/structures/lsmtree/levels/level.cpp (11 hunks)
  • lib/structures/lsmtree/levels/level.h (2 hunks)
  • lib/structures/lsmtree/levels/levels.cpp (4 hunks)
  • lib/structures/lsmtree/levels/levels.h (1 hunks)
  • lib/structures/lsmtree/lsmtree.cpp (9 hunks)
  • lib/structures/lsmtree/lsmtree_test.cpp (4 hunks)
  • lib/structures/lsmtree/segments/helpers.cpp (1 hunks)
  • lib/structures/lsmtree/segments/helpers.h (1 hunks)
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp (2 hunks)
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.h (1 hunks)
  • lib/structures/lsmtree/segments/lsmtree_segment_factory.h (1 hunks)
  • lib/structures/lsmtree/segments/segment_storage.cpp (2 hunks)
  • lib/structures/lsmtree/segments/segment_storage.h (3 hunks)
  • lib/structures/lsmtree/segments/uuid.h (9 hunks)
  • lib/structures/memtable/memtable.cpp (2 hunks)
  • lib/structures/memtable/memtable_test.cpp (1 hunks)
  • lib/structures/skiplist/skiplist.h (1 hunks)
  • lib/structures/skiplist/skiplist_test.cpp (1 hunks)
  • lib/structures/sorted_vector/sorted_vector.h (4 hunks)
  • lib/structures/sorted_vector/sorted_vector_test.cpp (1 hunks)
  • lib/wal/common.h (1 hunks)
  • lib/wal/config.h (1 hunks)
  • lib/wal/log/common.h (1 hunks)
  • lib/wal/log/concepts.h (1 hunks)
  • lib/wal/log/in_memory_log_storage.h (1 hunks)
  • lib/wal/log/log.h (1 hunks)
  • lib/wal/log/persistent_log_storage.h (1 hunks)
  • lib/wal/wal.h (1 hunks)
  • src/main.cpp (9 hunks)
✅ Files skipped from review due to trivial changes (27)
  • lib/structures/memtable/memtable_test.cpp
  • lib/structures/lsmtree/segments/lsmtree_segment_factory.h
  • lib/structures/lsmtree/segments/segment_storage.cpp
  • bench/db/basic_bench.cpp
  • lib/server/server_factory.h
  • lib/structures/lsmtree/segments/helpers.h
  • lib/structures/lsmtree/segments/helpers.cpp
  • lib/server/server_kind.cpp
  • lib/structures/skiplist/skiplist_test.cpp
  • lib/structures/skiplist/skiplist.h
  • lib/server/grpc_server.cpp
  • lib/concurrency/thread_safe_queue.h
  • lib/structures/lsmtree/levels/levels.h
  • lib/structures/lsmtree/levels/level.h
  • bench/fs/lots_of_write.cpp
  • lib/server/grpc_server.h
  • lib/structures/sorted_vector/sorted_vector_test.cpp
  • lib/structures/sorted_vector/sorted_vector.h
  • examples/absl/absl_try_mutex.cpp
  • lib/structures/lsmtree/segments/segment_storage.h
  • lib/server/server_kind.h
  • bench/db/gbench_bench.cpp
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.h
  • bench/fs/open_write_streams.cpp
  • lib/structures/lsmtree/levels/levels.cpp
  • lib/structures/lsmtree/segments/uuid.h
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp
🚧 Files skipped from review as they are similar to previous changes (25)
  • .clang-format
  • lib/structures/lsmtree/lsmtree_test.cpp
  • lib/fs/common.cpp
  • lib/wal/config.h
  • lib/wal/log/common.h
  • examples/prometheus/main.cpp
  • lib/db/db.cpp
  • lib/db/db_config.h
  • examples/raft/main.cpp
  • lib/posix_wrapper/posix_wrapper_test.cpp
  • lib/posix_wrapper/open_flag.h
  • lib/fs/common.h
  • lib/db/manifest/manifest.h
  • lib/wal/log/concepts.h
  • lib/structures/lsmtree/lsmtree.cpp
  • lib/raft/persistence.h
  • lib/raft/replicated_log.h
  • lib/fs/append_only_file.cpp
  • lib/wal/common.h
  • examples/iouring/libiouring/main.cpp
  • lib/fs/random_access_file.h
  • lib/structures/lsmtree/levels/level.cpp
  • lib/wal/log/in_memory_log_storage.h
  • lib/fs/append_only_file.h
  • lib/wal/wal.h
🧰 Additional context used
🧬 Code Graph Analysis (3)
bench/db/celero_bench.cpp (1)
lib/structures/lsmtree/lsmtree_test.cpp (2)
  • generateRandomString (76-90)
  • generateRandomString (76-76)
lib/raft/raft_test.cpp (2)
lib/raft/raft.cpp (7)
  • AppendEntries (179-283)
  • AppendEntries (179-181)
  • id (110-113)
  • id (110-110)
  • raft_node_grpc_client_t (56-63)
  • RequestVote (285-344)
  • RequestVote (285-287)
lib/raft/raft.h (8)
  • request (68-69)
  • request (70-70)
  • request (133-133)
  • raft_node_grpc_client_t (58-59)
  • raft_node_grpc_client_t (60-60)
  • raft_node_grpc_client_t (60-60)
  • raft_node_grpc_client_t (62-62)
  • raft_node_grpc_client_t (65-65)
lib/wal/log/persistent_log_storage.h (4)
lib/wal/log/in_memory_log_storage.h (11)
  • nodiscard (46-50)
  • nodiscard (52-56)
  • nodiscard (58-65)
  • nodiscard (67-71)
  • nodiscard (73-76)
  • nodiscard (93-96)
  • other (29-39)
  • other (29-29)
  • entry (46-46)
  • command (52-52)
  • index (58-58)
lib/wal/log/log.h (11)
  • nodiscard (58-61)
  • nodiscard (63-67)
  • nodiscard (69-72)
  • nodiscard (74-77)
  • nodiscard (79-82)
  • nodiscard (136-156)
  • other (46-51)
  • other (46-46)
  • entry (58-58)
  • command (63-64)
  • index (69-69)
lib/wal/wal.h (9)
  • nodiscard (156-159)
  • nodiscard (227-267)
  • other (86-86)
  • other (87-87)
  • other (190-198)
  • other (190-190)
  • other (200-208)
  • other (200-200)
  • is_same_v (310-340)
lib/fs/append_only_file.h (3)
  • data (30-31)
  • offset (33-34)
  • other (23-23)
🪛 Cppcheck (2.10-2)
lib/raft/raft_test.cpp

[error] 168-168: Reference to rvResponses that may be invalid.

(invalidContainerReference)

lib/raft/raft.cpp

[performance] 123-123: Function parameter 'nodeConfig' should be passed by const reference.

(passedByValue)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-gcc
  • GitHub Check: build-clang
🔇 Additional comments (8)
bench/db/celero_bench.cpp (2)

40-41: Clean formatting of the return type.

The trailing return type has been well formatted for better readability by moving it to a separate line with proper indentation. This change improves code readability without affecting functionality.


68-69: Improved argument formatting for better readability.

Breaking the emplace_back() call into multiple lines with proper indentation makes the code more readable, especially with long arguments like the ones used here. This formatting change is consistent with good C++ styling practices.

lib/structures/memtable/memtable.cpp (2)

51-52: Style improvements look good!

The return type formatting aligns with the project's style guidelines, improving readability by clearly separating the return type from the function signature.


117-118: Style improvements look good!

The return type formatting is consistent with other functions in the codebase and improves readability.

lib/fs/random_access_file_test.cpp (1)

24-35: Looks good – helper utilities are self‑contained and portable

The temporary‑file helper correctly uses mkstemp and closes the returned fd, avoiding races.
Implementation is concise and C++20 compliant.

lib/raft/raft_test.cpp (1)

157-170: Static‑analysis “invalid reference” is a false positive – capacity reserved

Good job calling reserve(clusterSize*2): as long as no more than clusterSize*2 elements are added, rvResponses won't reallocate, so the references captured by SetArgPointee stay valid.
No action needed.


lib/fs/random_access_file.cpp (1)

173-175: Verify kBufferSize visibility

kBufferSize is referenced here but is not defined in this translation unit.
If fs/common.h does not actually provide it, the file will not compile.

Would you double‑check that fs/common.h (or another included header) really defines kBufferSize as constexpr std::size_t?

lib/raft/raft.cpp (1)

123-137: Function parameter 'nodeConfig' should be passed by const reference.

The static analyzer flagged this, but since the parameter is moved into a member variable, taking it by value is actually appropriate. No change is needed.


Comment on lines +253 to +271
while (bytes_remaining)
{
    off_t bytes_to_read = bytes_remaining;
    if (bytes_to_read > BLOCK_SZ)
        bytes_to_read = BLOCK_SZ;

    fi->iovecs[current_block].iov_len = bytes_to_read;

    void *buf;
    if (posix_memalign(&buf, BLOCK_SZ, BLOCK_SZ))
    {
        perror("posix_memalign");
        return 1;
    }
    fi->iovecs[current_block].iov_base = buf;

    current_block++;
    bytes_remaining -= bytes_to_read;
}

🛠️ Refactor suggestion

Leaking every buffer, file_info, and file descriptor

For each block you posix_memalign a buffer and store it in fi->iovecs, but neither the buffers, the enclosing file_info, nor file_fd are freed/closed after the CQ event is processed.
Reading many files (or large files) will quickly exhaust memory and file‑descriptor limits.

Suggested post‑processing in read_from_cq:

@@ while (std::getline(stream, line))
     for (int i = 0; i < blocks; i++)
     {
         output_to_console((char *)fi->iovecs[i].iov_base, fi->iovecs[i].iov_len);
-    }
+        free(fi->iovecs[i].iov_base);
+    }
+    close(fi->fd);                // requires storing fd in file_info (io_uring_cqe has no fd field)
+    free(fi);

You will need to:

  1. Extend file_info with int fd; so it’s reachable at completion time.
  2. Assign it (fi->fd = file_fd;) in submit_to_sq.
  3. Free/close as above once all blocks are printed.

Without this, the example cannot be used in long‑running benchmarks.

Also applies to: 194-211
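As an alternative to freeing each block by hand, the buffers can be owned by RAII handles. In this hypothetical sketch, memory from posix_memalign() must be released with free(), so the unique_ptr uses a free()-calling deleter. file_info and the file descriptor still need the handling described above.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <vector>

// posix_memalign() memory must be released with free(), not delete.
struct free_deleter_t
{
    void operator()(void *p) const noexcept { std::free(p); }
};

using aligned_buffer_t = std::unique_ptr<void, free_deleter_t>;
using block_list_t     = std::vector<aligned_buffer_t>;  // one entry per iovec

// Returns an empty pointer if the allocation fails.
auto make_aligned_buffer(std::size_t alignment, std::size_t size) -> aligned_buffer_t
{
    void *buf = nullptr;
    if (posix_memalign(&buf, alignment, size) != 0)
    {
        return aligned_buffer_t{};
    }
    return aligned_buffer_t{buf};
}
```

If a block_list_t is kept alongside the iovec array, every buffer is freed when the list is cleared after the completion is printed, including on early-return paths.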

Comment on lines +21 to +38
struct app_io_sq_ring
{
    unsigned char *head;
    unsigned char *tail;
    unsigned char *ring_mask;
    unsigned char *ring_entries;
    unsigned char *flags;
    unsigned char *array;
};

struct app_io_cq_ring
{
    unsigned char       *head;
    unsigned char       *tail;
    unsigned char       *ring_mask;
    unsigned char       *ring_entries;
    struct io_uring_cqe *cqes;
};

🛠️ Refactor suggestion

⚠️ Potential issue

Incorrect ring‐buffer field types break atomicity & correctness

The SQ/CQ ring descriptors are mapped over kernel‐owned unsigned int fields, yet the structures declare every pointer as unsigned char *.
When you later dereference them (*sring->head, *cring->tail, etc.) you read or modify a single byte instead of the full 32-bit (__u32) field.
This corrupts the ring state and produces hard‑to‑reproduce I/O failures.

-struct app_io_sq_ring
-{
-    unsigned char *head;
-    unsigned char *tail;
-    unsigned char *ring_mask;
-    unsigned char *ring_entries;
-    unsigned char *flags;
-    unsigned char *array;
-};
+struct app_io_sq_ring
+{
+    unsigned int *head;
+    unsigned int *tail;
+    unsigned int *ring_mask;
+    unsigned int *ring_entries;
+    unsigned int *flags;
+    unsigned int *array;
+};
 
-struct app_io_cq_ring
-{
-    unsigned char       *head;
-    unsigned char       *tail;
-    unsigned char       *ring_mask;
-    unsigned char       *ring_entries;
-    struct io_uring_cqe *cqes;
-};
+struct app_io_cq_ring
+{
+    unsigned int        *head;
+    unsigned int        *tail;
+    unsigned int        *ring_mask;
+    unsigned int        *ring_entries;
+    struct io_uring_cqe *cqes;
+};

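This also fixes the index = tail & *s->sq_ring.ring_mask; expression (currently masking with a single byte). Note that correct types alone do not make the accesses atomic: head/tail values shared with the kernel still need acquire/release ordering (liburing uses io_uring_smp_load_acquire and io_uring_smp_store_release for this).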
Update all subsequent casts/dereferences accordingly.


Comment on lines +159 to +164
cring->head = (unsigned char *)cq_ptr + p.cq_off.head;
cring->tail = (unsigned char *)cq_ptr + p.cq_off.tail;
cring->ring_mask = (unsigned char *)cq_ptr + p.cq_off.ring_mask;
cring->ring_entries = (unsigned char *)cq_ptr + p.cq_off.ring_entries;
cring->cqes = (io_uring_cqe *)cq_ptr + p.cq_off.cqes;


⚠️ Potential issue

Pointer arithmetic bug offsets CQEs incorrectly

p.cq_off.cqes is a byte offset, but adding it after casting to io_uring_cqe * scales it by sizeof(io_uring_cqe), pointing far past the mapped area.

-    cring->cqes = (io_uring_cqe *)cq_ptr + p.cq_off.cqes;
+    cring->cqes = reinterpret_cast<io_uring_cqe *>(
+        reinterpret_cast<unsigned char *>(cq_ptr) + p.cq_off.cqes);

The same mistake may appear in other offset arithmetic, so double-check every + p.*_off.* calculation.
Dereferencing this out-of-range pointer reads memory outside the mapped completion array, which corrupts completion data and segfaults under load.
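The sketch below covers both ring fixes: the 32-bit field type from the previous comment and the byte-offset arithmetic here. It assumes a hypothetical cqe_t that mirrors the 16-byte size of the kernel's struct io_uring_cqe. It only computes addresses and performs no I/O.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CQE layout with the same 16-byte size as struct io_uring_cqe.
struct cqe_t
{
    std::uint64_t user_data;
    std::int32_t  res;
    std::uint32_t flags;
};

// Ring offsets reported by the kernel (p.cq_off.*) are byte offsets: do the
// arithmetic on unsigned char * and cast only the final address. Writing
// (cqe_t *)base + offset instead would scale the offset by sizeof(cqe_t).
inline auto cqes_at(void *ring_base, std::uint32_t byte_offset) -> cqe_t *
{
    return reinterpret_cast<cqe_t *>(static_cast<unsigned char *>(ring_base) + byte_offset);
}

// head, tail, ring_mask, ring_entries are 32-bit fields: access them through
// unsigned int *, never unsigned char *.
inline auto field_at(void *ring_base, std::uint32_t byte_offset) -> unsigned int *
{
    return reinterpret_cast<unsigned int *>(static_cast<unsigned char *>(ring_base) + byte_offset);
}
```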


Comment on lines +88 to +92
{
    return m_raftClients |
           std::views::transform([](auto &&pair) { return std::move(pair.second); }) |
           std::ranges::to<std::vector<raft::raft_node_grpc_client_t>>();
}

⚠️ Potential issue

raftClients() moves elements out of the map – second call UB

std::move(pair.second) leaves the element in m_raftClients in a moved‑from state.
raftClients() is invoked twice (once per test), so the second invocation will operate on invalid objects.
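Return pointers/references, or move only once and document the contract. With the pointer-returning version, the function's return type changes to std::vector<raft::raft_node_grpc_client_t *>.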

-return m_raftClients |
-       std::views::transform([](auto &&pair) { return std::move(pair.second); }) |
-       std::ranges::to<std::vector<raft::raft_node_grpc_client_t>>();
+std::vector<raft::raft_node_grpc_client_t*> vec;
+vec.reserve(m_raftClients.size());
+for (auto& [id, client] : m_raftClients) {
+    vec.push_back(&client);
+}
+return vec;

Committable suggestion skipped: line range outside the PR's diff.
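A self-contained sketch of the non-owning variant follows. client_t is a hypothetical move-only stand-in for raft::raft_node_grpc_client_t. Because the map keeps ownership, repeated calls return pointers to the same live objects.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical move-only stand-in for the gRPC client wrapper.
struct client_t
{
    int id;
    client_t(int i) : id{i} {}
    client_t(client_t &&)      = default;
    client_t(const client_t &) = delete;
};

// Returns non-owning pointers; nothing is moved out of the map.
auto client_views(std::map<int, client_t> &clients) -> std::vector<client_t *>
{
    std::vector<client_t *> out;
    out.reserve(clients.size());
    for (auto &entry : clients)
    {
        out.push_back(&entry.second);
    }
    return out;
}
```

The pointers stay valid as long as the owning map outlives them and the entries are not erased.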

Comment on lines 142 to 149
else if constexpr (std::is_same_v<TStorageTag, storage_tags::file_backend_tag>)
{
    auto &&storage =
        persistent_log_storage_builder_t<file_storage_backend_t>{{.file_path = m_file_path}}
            .build();
    return storage.has_value() ? std::make_optional(log_t{std::move(storage.value())})
                               : std::nullopt;
}

🛠️ Refactor suggestion

Missing validation for m_file_path

When the file backend is selected, an empty m_file_path silently flows into persistent_log_storage_builder_t, which will likely fail at runtime. Guard against empty paths and return std::nullopt (or a descriptive error) instead.

lib/raft/raft.h Outdated
Comment on lines 170 to 179
mutable absl::Mutex m_stateMutex;
uint32_t m_currentTerm ABSL_GUARDED_BY(m_stateMutex);
uint32_t m_votedFor ABSL_GUARDED_BY(m_stateMutex);
std::vector<LogEntry> m_log ABSL_GUARDED_BY(m_stateMutex);

// Volatile state on all servers.
uint32_t m_commitIndex ABSL_GUARDED_BY(m_stateMutex);
uint32_t m_lastApplied ABSL_GUARDED_BY(m_stateMutex);
NodeState m_state ABSL_GUARDED_BY(m_stateMutex);


🛠️ Refactor suggestion

Members appear un‑initialised

m_currentTerm, m_votedFor, m_commitIndex, m_lastApplied, and m_state are declared but not given default initialisers here. Unless the constructor zero‑initialises them in the .cpp, they may hold indeterminate values, breaking the Raft invariants.

Please initialise them either in‑class or in the constructor’s member‑initialiser list.
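This sketch shows in-class default initialisers for these fields. It makes several assumptions:

- The NodeState enumerators are guessed.
- 0 means "no vote". The persisted-state code elsewhere in this PR resets m_votedFor to 0, and node ids start at 1 in src/main.cpp.
- The ABSL_GUARDED_BY annotations are omitted to keep the sketch dependency-free.

Raft nodes start as followers in term 0, which these defaults encode.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical enumerators; the real NodeState lives in lib/raft/raft.h.
enum class NodeState : std::uint8_t
{
    FOLLOWER,
    CANDIDATE,
    LEADER,
};

struct LogEntry
{
    std::uint32_t term{0};
};

// In-class default member initialisers give every field a defined value
// regardless of which constructor runs.
struct raft_state_t
{
    // Persistent state on all servers.
    std::uint32_t         m_currentTerm{0};
    std::uint32_t         m_votedFor{0}; // Assumed: 0 = voted for nobody.
    std::vector<LogEntry> m_log;

    // Volatile state on all servers.
    std::uint32_t m_commitIndex{0};
    std::uint32_t m_lastApplied{0};
    NodeState     m_state{NodeState::FOLLOWER};
};
```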

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
assets/database_config_schema.json (1)

64-85: ⚠️ Potential issue

Missing “maximumLevels” property in the LSM schema.

The required array for lsm includes "maximumLevels", but there’s no corresponding definition under properties. This mismatch will cause schema validation failures.

Apply this diff to add the missing entry:

--- a/assets/database_config_schema.json
+++ b/assets/database_config_schema.json
@@ 64,78
         "properties": {
             "flushThreshold": {
                 "type": "integer",
                 "description": "The threshold of bytes at which the memtable should be flushed",
                 "minimum": 1
             },
+            "maximumLevels": {
+                "type": "integer",
+                "description": "Maximum number of levels in the LSM tree",
+                "minimum": 1
+            },
             "levelZeroCompaction": {
                 "$ref": "#/$defs/compaction"
             },
             "levelNonZeroCompaction": {
                 "$ref": "#/$defs/compaction"
             }
♻️ Duplicate comments (1)
assets/tkvpp_config_1.json (1)

5-10: Remove redundant walFilename from the database section.

The WAL filename now belongs under the top‑level "wal" object. Keeping it here will either be ignored or break validation against the updated schema.

--- a/assets/tkvpp_config_1.json
+++ b/assets/tkvpp_config_1.json
@@ 5,10
     "database": {
         "path": "./var/tkvpp",
-        "walFilename": "wal",
         "manifestFilenamePrefix": "manifest_",
         "mode": "replicated"
     },
🧹 Nitpick comments (2)
assets/database_config_schema.json (2)

29-34: Align mode description with enum values.

The description refers to “server-client mode” whereas the enum uses "standalone". To avoid confusion, update it:

--- a/assets/database_config_schema.json
+++ b/assets/database_config_schema.json
@@ 30,34
-                    "Specifies whether the database will be embedded, run in server-client mode, or replicated"
+                    "Specifies whether the database will be embedded, standalone, or replicated"

108-114: Add format validation for peer addresses.

Currently, "peers" items are untyped strings. Consider enforcing IPv4 syntax with "format": "ipv4":

--- a/assets/database_config_schema.json
+++ b/assets/database_config_schema.json
@@ 108,114
                 "peers": {
                     "type": "array",
                     "description": "Array of IPv4 addresses of peers",
                     "items": {
-                        "type": "string"
+                        "type": "string",
+                        "format": "ipv4"
                     }
                 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8677d89 and f11e4df.

📒 Files selected for processing (4)
  • assets/database_config_schema.json (4 hunks)
  • assets/tkvpp_config_1.json (1 hunks)
  • assets/tkvpp_config_2.json (1 hunks)
  • assets/tkvpp_config_3.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • assets/tkvpp_config_3.json
  • assets/tkvpp_config_2.json
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-clang
  • GitHub Check: build-gcc
🔇 Additional comments (4)
assets/tkvpp_config_1.json (4)

1-4: Logging configuration is valid.

The "logging" section matches the schema and requires no changes.


11-15: WAL configuration is correct.

The "wal" object includes all required fields (enable, filename, storageType) and conforms to the schema.


16-27: LSM settings follow the schema.

The entries for "flushThreshold", "maximumLevels", and compaction strategies are correctly defined.


28-38: Server section aligns with schema requirements.

The "server" block provides transport, host, port, id, and peers as required.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

🔭 Outside diff range comments (6)
lib/structures/lsmtree/segments/lsmtree_segment_index.h (2)

28-28: 💡 Verification agent

🧩 Analysis chain

Address the TODO comment before finalizing the Raft implementation.

The TODO comment indicates that the interface is not yet decided. As you're implementing Raft for replication, it's important to finalize this interface to ensure it properly supports the distributed consensus requirements.


🏁 Script executed:

#!/bin/bash
# Check if there are other similar TODO comments in the LSM tree implementation
rg "TODO: Decide on the interface" --type cpp

Length of output: 141


Finalize LSM Tree Segment Index Interface for Raft Replication

The // TODO: Decide on the interface. comment indicates that the segment index API is still undefined. Before merging Raft-based replication, please design and implement this interface so it fully supports the distributed consensus requirements (e.g., segment lookup, metadata serialization, versioning).

• File: lib/structures/lsmtree/segments/lsmtree_segment_index.h
• Line: 28

Suggested change:

-    // TODO: Decide on the interface.
+    // Define a stable interface here to support Raft replication:
+    // - lookupSegment(id): returns segment metadata
+    // - serializeMetadata(): returns bytes for log entries
+    // - applyMetadata(bytes): updates index from log entry

24-26: 💡 Verification agent

🧩 Analysis chain

Consider implementing the empty key_t struct.

The key_t struct is currently empty. If it's intended to be a placeholder for future implementation, consider adding a comment explaining its purpose or implementing it with the required functionality to support the LSM tree operations.


🏁 Script executed:

#!/bin/bash
# Check if there are other occurrences of this empty struct or if it's used/implemented elsewhere
rg -A 3 "struct key_t" --type cpp

Length of output: 513


🏁 Script executed:

#!/bin/bash
# Display the contents of the lsmtree_segment_index.h file for context
sed -n '1,200p' lib/structures/lsmtree/segments/lsmtree_segment_index.h

Length of output: 1049


Implement key_t for use in unordered_map

The empty key_t struct is used as the key in an std::unordered_map, so you’ll need to:

  • Define the actual key data (e.g. raw bytes, string, numeric ID) inside key_t.
  • Provide an operator== for key_t.
  • Specialize std::hash<structures::lsmtree::lsmtree_segment_index_t::key_t> (or supply a custom hash functor) so the map can be instantiated.

If key_t is merely a placeholder, add a // TODO: comment explaining the intended fields, comparison, and hashing strategy.

Affected location:

  • lib/structures/lsmtree/segments/lsmtree_segment_index.h: lines 24–26
lib/structures/lsmtree/lsmtree_reader_writer.h (2)

8-8: 🛠️ Refactor suggestion

Clean up unprofessional TODO comment.

The TODO comment contains unprofessional language and indicates confusion about the file's purpose. Consider either documenting the actual purpose of this file or refactoring/removing it if it's no longer needed.


46-48: ⚠️ Potential issue

Fix inheritance inconsistency.

lsmtree_segment_reader_t incorrectly inherits from generic_writer_t but is named as a reader. This mismatch between the class name and its inheritance suggests an error.

-struct lsmtree_segment_reader_t : generic_writer_t<lsmtree_segment_reader_t>
+struct lsmtree_segment_reader_t : generic_reader_t<lsmtree_segment_reader_t>
lib/structures/lsmtree/levels/level.h (1)

113-113: 🛠️ Refactor suggestion

Fix non-standard attribute.

The attribute [[__nodiscard__]] uses a non-standard format with double underscores. This should be changed to the standard [[nodiscard]] attribute.

-[[__nodiscard__]] auto bytes_used() const noexcept -> std::size_t;
+[[nodiscard]] auto bytes_used() const noexcept -> std::size_t;
lib/structures/lsmtree/lsmtree.h (1)

60-66: ⚠️ Potential issue

Update documentation for move operations.

The documentation comments still indicate that move constructor and assignment operator are deleted, which contradicts the actual implementation that now supports these operations.

Update the documentation to accurately reflect that move operations are supported with appropriate descriptions of their behavior.

Also applies to: 91-100

♻️ Duplicate comments (14)
lib/wal/CMakeLists.txt (1)

19-24: 🛠️ Refactor suggestion

Uncomment and add the missing WAL unit tests

The WALTest target is still commented out. The WAL is critical to Raft replication, so shipping without tests leaves the core persistence layer unverified.

-# add_executable(WALTest "wal_test.cpp")
-# set_target_properties(WALTest PROPERTIES CXX_STANDARD 23)
-# target_link_libraries(
-#   WALTest
-#   Catch2::Catch2WithMain
-#   spdlog::spdlog fmt::fmt LSMTree MemTable DB Config HashIndex)
-#
-# include(Catch)
-# catch_discover_tests(WALTest)
+add_executable(WALTest wal_test.cpp)
+set_target_properties(WALTest PROPERTIES CXX_STANDARD 23)
+target_link_libraries(
+  WALTest
+  PRIVATE Catch2::Catch2WithMain
+          spdlog::spdlog
+          fmt::fmt
+          WAL
+          Config
+          FS)
+
+include(Catch)
+catch_discover_tests(WALTest)

This restores compilation and automatic discovery of Catch2 cases once wal_test.cpp is added.

lib/fs/random_access_file.cpp (3)

36-51: ⚠️ Potential issue

Move-assignment still leaks existing resources
Previous review already covered this; the issue remains.

Before overwriting m_fd and m_ring, close the currently owned file descriptor and tear down the existing ring (io_uring_queue_exit). Otherwise both leak.
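Here is a reduced sketch of a leak-free move assignment, assuming only a file descriptor is owned. fd_owner_t is hypothetical. The real random_access_file_t would also call io_uring_queue_exit on its current ring before taking over the other object's ring. It would then mark the source as no longer owning a ring.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Hypothetical, reduced owner of a file descriptor (no io_uring here).
class fd_owner_t
{
  public:
    explicit fd_owner_t(int fd) noexcept : m_fd{fd} {}

    fd_owner_t(fd_owner_t &&other) noexcept : m_fd{std::exchange(other.m_fd, -1)} {}

    auto operator=(fd_owner_t &&other) noexcept -> fd_owner_t &
    {
        if (this != &other)
        {
            reset();                              // Release what we currently own...
            m_fd = std::exchange(other.m_fd, -1); // ...then take over other's fd.
        }
        return *this;
    }

    fd_owner_t(const fd_owner_t &) = delete;
    auto operator=(const fd_owner_t &) -> fd_owner_t & = delete;

    ~fd_owner_t() { reset(); }

    [[nodiscard]] auto fd() const noexcept -> int { return m_fd; }

  private:
    void reset() noexcept
    {
        if (m_fd >= 0)
        {
            ::close(m_fd);
            m_fd = -1;
        }
    }

    int m_fd{-1};
};
```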


83-88: ⚠️ Potential issue

io_uring_submit / io_uring_wait_cqe return codes ignored

Negative return values signal a submission or wait failure. The code nevertheless dereferences an uninitialised cqe or treats the request as successful. Handle both calls explicitly:

-io_uring_submit(&m_ring);
-io_uring_cqe *cqe;
-int wait_res = io_uring_wait_cqe(&m_ring, &cqe);
+int submit_res = io_uring_submit(&m_ring);
+if (submit_res < 0) {
+    return std::unexpected(file_error_t{
+        .code = file_error_code_k::write_failed,
+        .system_errno = -submit_res,
+        .message = std::format("io_uring_submit failed. fd={}", m_fd),
+    });
+}
+
+io_uring_cqe *cqe;
+int wait_res = io_uring_wait_cqe(&m_ring, &cqe);
+if (wait_res < 0) {
+    return std::unexpected(file_error_t{
+        .code = file_error_code_k::write_failed,
+        .system_errno = -wait_res,
+        .message = std::format("io_uring_wait_cqe failed. fd={}", m_fd),
+    });
+}

173-174: ⚠️ Potential issue

kBufferSize is still undefined in this TU

Compilation will fail unless kBufferSize is declared. Define it at namespace or file scope:

+static constexpr std::size_t kBufferSize = 8192;

or include a header that provides it.

src/main.cpp (5)

420-427: ⚠️ Potential issue

Fix incorrect error message for missing peers configuration

The error message incorrectly references "id" when it should reference "peers".

    if (configJson.contains("peers"))
    {
        dbConfig->ServerConfig.peers = configJson["peers"].get<std::vector<std::string>>();
    }
    else
    {
-        throw std::runtime_error("\"id\" is not specified in the config");
+        throw std::runtime_error("\"peers\" is not specified in the config");
    }

561-574: ⚠️ Potential issue

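Range-for init-statement requires C++20

The loop uses a range-based for with an init-statement, for (init; declaration : range), which has been valid since C++20. Targets in this PR are built with CXX_STANDARD 23, so the loop compiles as written. Hoisting replicaId out of the loop, as in the diff below, is only needed if this file must also build as C++17 or earlier.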

-            for (raft::id_t replicaId{1}; const auto &replicaIp : pDbConfig->ServerConfig.peers)
+            raft::id_t replicaId{1};
+            for (const auto &replicaIp : pDbConfig->ServerConfig.peers)
             {
                 if (replicaId != pDbConfig->ServerConfig.id)
                 {
                     std::unique_ptr<RaftService::Stub> stub{RaftService::NewStub(
                         grpc::CreateChannel(replicaIp, grpc::InsecureChannelCredentials()))};

                     replicas.emplace_back(raft::node_config_t{.m_id = replicaId, .m_ip = replicaIp},
                                           std::move(stub));
                     spdlog::info("replicaId={} replicaIp={}", replicaId, replicaIp);
                 }

                 ++replicaId;
             }
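Range-based for loops accept an init-statement since C++20. The original loop form therefore works for any target compiled as C++20 or later; WALTest, for example, sets CXX_STANDARD 23. This minimal sketch mirrors the loop in src/main.cpp, using a hypothetical helper name and peer list:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// C++20 range-based for with an init-statement: replicaId is scoped to the
// loop and advances once per peer, skipping this node's own entry.
inline auto collect_other_peers(const std::vector<std::string> &peers, std::uint32_t selfId)
    -> std::vector<std::string>
{
    std::vector<std::string> others;
    for (std::uint32_t replicaId{1}; const auto &replicaIp : peers)
    {
        if (replicaId != selfId)
        {
            others.push_back(replicaIp);
        }
        ++replicaId;
    }
    return others;
}
```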

586-587: ⚠️ Potential issue

Add nullptr check for dynamic_cast result

The dynamic_cast could return nullptr if the cast fails, which would lead to a runtime error.

-            grpcBuilder.RegisterService(
-                dynamic_cast<RaftService::Service *>(pConsensusModule.get()));
+            auto* service = dynamic_cast<RaftService::Service*>(pConsensusModule.get());
+            if (!service) {
+                spdlog::error("Failed to cast consensus module to RaftService");
+                return EXIT_FAILURE;
+            }
+            grpcBuilder.RegisterService(service);

632-635: 🛠️ Refactor suggestion

Add sleep to reduce CPU usage in shutdown loop

The busy-wait loop consumes unnecessary CPU cycles.

        while (!gShutdown)
        {
-            std::this_thread::yield();
+            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
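This sketch of the polling loop assumes gShutdown is a std::atomic<bool> set from a signal handler; its declaration is not shown in this excerpt. A condition variable cannot be notified safely from a signal handler, so a short sleep-and-poll loop is a reasonable compromise. The cost is up to one poll interval of extra shutdown latency. on_signal and wait_for_shutdown are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <thread>

// Lock-free atomics are safe to touch from a signal handler.
static std::atomic<bool> gShutdown{false};
static_assert(std::atomic<bool>::is_always_lock_free);

extern "C" void on_signal(int) { gShutdown.store(true); }

// Polls the flag with a short sleep instead of spinning on yield().
inline void wait_for_shutdown(std::chrono::milliseconds pollInterval)
{
    while (!gShutdown.load())
    {
        std::this_thread::sleep_for(pollInterval);
    }
}
```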

261-265: ⚠️ Potential issue

Fix incorrect JSON key in database mode loading

There's a typo in the JSON key when loading the database mode - ",pde" should be "mode".

    if (configJson["database"].contains("mode"))
    {
        dbConfig->DatabaseConfig.mode =
-            db::from_string(configJson["database"][",pde"].get<std::string>());
+            db::from_string(configJson["database"]["mode"].get<std::string>());
    }
lib/fs/append_only_file.h (1)

42-42: Consider returning a reference or shared stream for efficiency

Returning a std::stringstream by value may lead to unnecessary copying of potentially large data. Consider returning a reference to an internal stream or a shared pointer.

-    [[nodiscard]] auto stream() noexcept -> std::expected<std::stringstream, file_error_t>;
+    [[nodiscard]] auto stream() noexcept -> std::expected<std::reference_wrapper<std::stringstream>, file_error_t>;
lib/raft/raft.cpp (4)

260-262: ⚠️ Potential issue

Implement the state machine updates

The TODO comment indicates that state machine updates aren't implemented, which is essential for the Raft protocol to be useful.

        while (m_lastApplied < m_commitIndex)
        {
            ++m_lastApplied;
-            // TODO(lnikon): Update the state machine!
+            if (m_lastApplied <= m_log.size()) {
+                const auto& entry = m_log[m_lastApplied - 1];
+                applyToStateMachine(entry);
+            }
        }

I recommend implementing a state machine interface:

class StateMachine {
public:
    virtual void apply(const LogEntry& entry) = 0;
    virtual ~StateMachine() = default;
};

Then add it to the consensus module and call it here.
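This minimal sketch of the apply loop is built on the StateMachine interface above. apply_committed and recording_state_machine_t are hypothetical. The loop also stops at the end of the local log, so a commit index that runs ahead of the log can never index out of bounds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct LogEntry
{
    std::uint32_t term{0};
    std::string   command;
};

class StateMachine
{
  public:
    virtual void apply(const LogEntry &entry) = 0;
    virtual ~StateMachine() = default;
};

// Applies every committed-but-unapplied entry in order. Raft log indices are
// 1-based, so entry i lives at log[i - 1].
inline void apply_committed(const std::vector<LogEntry> &log, std::uint32_t commitIndex,
                            std::uint32_t &lastApplied, StateMachine &sm)
{
    while (lastApplied < commitIndex && lastApplied < log.size())
    {
        ++lastApplied;
        sm.apply(log[lastApplied - 1]);
    }
}

// Toy state machine for illustration: records the applied commands.
class recording_state_machine_t final : public StateMachine
{
  public:
    void apply(const LogEntry &entry) override { applied.push_back(entry.command); }
    std::vector<std::string> applied;
};
```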


473-488: 🛠️ Refactor suggestion

Avoid spawning a new thread for every heartbeat

Creating a new thread for every heartbeat can lead to resource exhaustion under high load. Consider using a thread pool or a single thread that handles all heartbeats.

void consensus_module_t::runHeartbeatThread(std::stop_token token)
{
    constexpr const auto heartbeatInterval{std::chrono::milliseconds(100)};

    while (!token.stop_requested() && !m_shutdown)
    {
-        // TODO(lnikon): Make these calls async
        for (auto &[id, client] : m_replicas)
        {
            spdlog::debug(
                "Node={} is creating a heartbeat thread for the peer={}", m_config.m_id, id);
-            sendAppendEntriesRPC(client.value(), {});
+            // Use a thread pool or queue tasks instead of creating a new thread for each heartbeat
+            std::vector<LogEntry> empty;
+            sendAppendEntriesRPC(client.value(), empty);
        }
        std::this_thread::sleep_for(heartbeatInterval);
    }
}

873-876: 🛠️ Refactor suggestion

Consider using fsync for log reliability

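flush() only moves the data from the stream buffer into the OS page cache. It is not durable until it is synced. Raft requires persistent state to reach stable storage before the node responds to RPCs, so follow the flush with fsync. std::fstream does not expose its file descriptor, and fileno() takes a FILE *, not a stream. The descriptor therefore has to be obtained separately, for example by reopening the path (needs <fcntl.h>, <unistd.h> and <cstring>):

        fsa.flush();
        if (fsa.fail())
        {
            return false;
        }
+
+        // std::fstream has no portable fd accessor, so reopen the path to fsync it.
+        const int fd = ::open(path.c_str(), O_WRONLY);
+        if (fd < 0 || ::fsync(fd) != 0) {
+            spdlog::error("Node={} failed to fsync {}: {}",
+                          m_config.m_id, path.c_str(), strerror(errno));
+            if (fd >= 0) ::close(fd);
+            return false;
+        }
+        ::close(fd);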
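This self-contained sketch shows the write-then-fsync sequence using a hypothetical write_durably helper. For a newly created file, the parent directory should also be fsynced so the directory entry itself survives a crash. That step is omitted here.

```cpp
#include <cassert>
#include <fcntl.h>
#include <filesystem>
#include <fstream>
#include <string>
#include <unistd.h>

// Writes text through std::ofstream, then fsyncs the file through a POSIX
// descriptor, because std::ofstream does not expose its own fd.
inline auto write_durably(const std::filesystem::path &path, const std::string &text) -> bool
{
    {
        std::ofstream out{path, std::ios::out | std::ios::trunc};
        out << text;
        out.flush(); // Moves the data from the stream buffer into the kernel.
        if (out.fail())
        {
            return false;
        }
    } // The stream closes here.

    const int fd = ::open(path.c_str(), O_WRONLY);
    if (fd < 0)
    {
        return false;
    }
    const bool synced = ::fsync(fd) == 0; // Forces the file's dirty pages to storage.
    ::close(fd);
    return synced;
}
```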

931-933: ⚠️ Potential issue

Remove unintended override of loaded votedFor value

The code explicitly sets m_votedFor = 0 immediately after reading it from a file, which defeats the purpose of persisting this value.

        ifs >> m_commitIndex >> m_votedFor;
-        m_votedFor = 0;
        spdlog::info("Node={} restored commitIndex={} and votedFor={}",
                     m_config.m_id,
                     m_commitIndex,
                     m_votedFor);
🧹 Nitpick comments (16)
lib/structures/lsmtree/compaction/compaction_trigger.h (1)

1-7: Consider adding documentation or removing this empty file.

The file now contains only an empty namespace declaration after removing the CompactionTrigger class. While this cleanup aligns with the modernization efforts mentioned in the project, an empty header file could be confusing for developers.

If this file is being kept as a placeholder for future implementations or to maintain consistent directory structure, consider adding a comment explaining its purpose. Otherwise, if the CompactionTrigger functionality has been completely moved elsewhere or is no longer needed, it might be cleaner to remove this file entirely.

lib/fs/random_access_file.cpp (1)

79-82: Unnecessary IOSQE_IO_LINK flag

IOSQE_IO_LINK is meant for chaining multiple SQEs; here only a single write is submitted. Dropping the flag avoids superfluous kernel work:

-io_uring_prep_writev(sqe, m_fd, &iov, 1, offset);
-sqe->flags |= IOSQE_IO_LINK;
+io_uring_prep_writev(sqe, m_fd, &iov, 1, offset);
lib/wal/wal.cpp (1)

51-59: Consider streaming approach for large logs.

The current implementation reads the entire log into memory when reconstructing records, which might not be efficient for large logs. Consider a streaming approach that processes records one at a time without loading the entire log content.
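This sketch of the streaming approach assumes newline-delimited records; the actual WAL record encoding may differ. replay_records is hypothetical. It hands each record to a callback as soon as it is read, so memory use is bounded by the largest record rather than by the whole log.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Replays records one at a time and returns how many were processed.
inline auto replay_records(std::istream &in,
                           const std::function<void(const std::string &)> &onRecord)
    -> std::size_t
{
    std::size_t count{0};
    std::string line;
    while (std::getline(in, line))
    {
        onRecord(line);
        ++count;
    }
    return count;
}
```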

lib/concurrency/thread_safe_queue.h (1)

11-11: Update documentation to reflect move semantics.

Consider adding documentation comments for the newly added move constructor and move assignment operator to maintain consistency with the well-documented style of the rest of the class.

lib/structures/lsmtree/levels/levels.h (2)

118-119: Function declaration formatting is inconsistent.

The flush_to_level0 function declaration uses a different formatting style compared to other methods in the file, with the return type on a separate line.

Consider using consistent formatting for all method declarations:

-[[nodiscard]] auto flush_to_level0(memtable::memtable_t memtable) const noexcept
-    -> segments::regular_segment::shared_ptr_t;
+[[nodiscard]] auto flush_to_level0(memtable::memtable_t memtable) const noexcept -> segments::regular_segment::shared_ptr_t;

12-12: Update class documentation to reflect move semantics.

The class has good documentation for other methods, but the newly added move constructor and assignment operator lack documentation, while the old comments for the deleted versions remain misleading.

Consider adding appropriate documentation for the new move operations.

lib/wal/log/storage/in_memory_log_storage.h (3)

13-14: Address the TODO comment before finalizing.

There's a TODO comment about resizing the log in the constructor to a default size. Consider implementing this optimization or removing the comment before finalizing the code.


46-56: Append methods always return true regardless of outcome.

Both append methods always return true without any actual error checking. While this is consistent with the interface, consider documenting that these operations cannot fail in this implementation or add actual error checking in case of allocation failures.


90-93: Builder's build method could be more explicit about success.

The build method returns an optional but always constructs a valid object. Consider either documenting that the optional will always contain a value or simplify the return type to just return the object directly since it cannot fail.

-[[nodiscard]] auto build() -> std::optional<in_memory_log_storage_t>
+[[nodiscard]] auto build() -> in_memory_log_storage_t
 {
-    return in_memory_log_storage_t{};
+    return {};
 }
lib/wal/log/storage/builder.h (1)

55-56: Typo in static-assert message

"sutisfy""satisfy".
While harmless at runtime, typos in diagnostic strings reduce professionalism and hinder searchability.

-              "log_storage_wrapper_t must sutisfy TLogStorageConcept concept");
+              "log_storage_wrapper_t must satisfy TLogStorageConcept concept");
lib/wal/log/storage/backend/backend.h (1)

83-86: config() returns by value — avoid unnecessary copies

Returning the whole storage_backend_config_t by value forces a copy each time.
A cheap const-reference is sufficient and avoids accidental mutations.

-    [[nodiscard]] auto config() const
-    {
-        return m_config;
-    }
+    [[nodiscard]] auto config() const -> const storage_backend_config_t &
+    {
+        return m_config;
+    }
lib/wal/log/storage/persistent_log_storage.h (3)

20-28: Make the static_assert in the unsupported-backend branch dependent

The else branch of create_storage_backend_builder uses static_assert(false, ...). Before C++23 relaxed the rule (P2593R1), a non-dependent static_assert(false) made the template ill-formed even if that branch was never instantiated. Older GCC and Clang releases reject it as soon as the template is parsed. Making the condition depend on TStorageBackend defers the check until an unsupported backend type is actually used:

-    else
-    {
-        static_assert(false, "not supported backend storage type passed");
-    }
+    else
+    {
+        static_assert(!std::is_same_v<TStorageBackend, TStorageBackend>, 
+                      "not supported backend storage type passed");
+    }
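This compilable illustration of the dependent-false idiom uses hypothetical backend tags and a hypothetical backend_name function. always_false_v<T> is a common helper name, not a standard library facility. It is equivalent to the !std::is_same_v<T, T> trick in the diff above.

```cpp
#include <cassert>
#include <string_view>
#include <type_traits>

struct file_storage_backend_t {};   // Hypothetical backend tags.
struct memory_storage_backend_t {};

// A false value that depends on T, so the assertion is only evaluated when
// the unsupported branch is actually instantiated.
template <typename> inline constexpr bool always_false_v = false;

template <typename TStorageBackend>
constexpr auto backend_name() -> std::string_view
{
    if constexpr (std::is_same_v<TStorageBackend, file_storage_backend_t>)
    {
        return "file";
    }
    else if constexpr (std::is_same_v<TStorageBackend, memory_storage_backend_t>)
    {
        return "memory";
    }
    else
    {
        static_assert(always_false_v<TStorageBackend>, "not supported backend storage type passed");
    }
}
```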

74-85: Add overflow protection to append operation

The append method writes directly at m_backendStorage.size() without verifying if there's enough space. While conceptually an append-only file grows as needed, adding a check could prevent potential overflow issues.

 [[nodiscard]] auto append(std::string entry) -> bool
 {
+    // Check if appending would cause overflow
+    const auto current_size = m_backendStorage.size();
+    if (std::numeric_limits<std::size_t>::max() - current_size < entry.size()) {
+        spdlog::error(
+            "Persistent log storage append would overflow. Current size={}, entry size={}",
+            current_size, entry.size());
+        return false;
+    }
+
     if (!m_backendStorage.write(
-            static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size()))
+            static_cast<const char *>(entry.data()), current_size, entry.size()))
     {
         spdlog::error(
             "Persistent log storage write failed. Entry={}, size={}\n", entry, entry.size());
         return false;
     }
     m_inMemoryLog.emplace_back(std::move(entry));
     return true;
 }

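92-100: Simplify read method

std::make_optional is redundant here: returning the element directly converts it to std::optional<std::string>. Both forms copy the string exactly once, so this is a readability change. Avoiding the copy altogether would require returning a non-owning view (for example a const std::string * or std::string_view) whose lifetime is tied to m_inMemoryLog.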

 [[nodiscard]] auto read(const size_t index) const -> std::optional<std::string>
 {
     if (index < m_inMemoryLog.size())
     {
-        return std::make_optional(m_inMemoryLog[index]);
+        return m_inMemoryLog[index];
     }

     return std::nullopt;
 }
lib/fs/append_only_file.h (2)

47-50: Add documentation for the private member variables

The private implementation details would benefit from documentation explaining their roles, particularly how the io_uring instance relates to the file descriptor.

  private:
    append_only_file_t(random_access_file_t &&fd) noexcept;

+    // Underlying random access file for I/O operations
    random_access_file::random_access_file_t m_fd;
+    // io_uring instance for asynchronous I/O operations
    io_uring                                 m_ring{};

53-57: Add documentation for the builder class

The builder lacks documentation to explain its purpose and usage.

+/**
+ * @brief Builder for creating append-only file instances
+ * 
+ * Provides a convenient way to configure and create append_only_file_t instances
+ * with various options such as direct I/O.
+ */
class append_only_file_builder_t
{
  public:
+    /**
+     * @brief Builds an append-only file
+     * @param path Path to the file
+     * @param direct_io Whether to use direct I/O (bypassing OS cache)
+     * @return An expected containing the built file or an error
+     */
    auto build(std::string path, bool direct_io) -> std::expected<append_only_file_t, file_error_t>;
};
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between f11e4df and 50df1b1.

📒 Files selected for processing (47)
  • conanfile.txt (1 hunks)
  • lib/concurrency/CMakeLists.txt (1 hunks)
  • lib/concurrency/helpers.h (1 hunks)
  • lib/concurrency/thread_safe_queue.h (5 hunks)
  • lib/db/CMakeLists.txt (1 hunks)
  • lib/db/db.cpp (5 hunks)
  • lib/db/db.h (3 hunks)
  • lib/fs/append_only_file.cpp (1 hunks)
  • lib/fs/append_only_file.h (1 hunks)
  • lib/fs/random_access_file.cpp (1 hunks)
  • lib/fs/random_access_file.h (1 hunks)
  • lib/raft/raft.cpp (1 hunks)
  • lib/raft/raft.h (1 hunks)
  • lib/raft/replicated_log.h (1 hunks)
  • lib/structures/lsmtree/compaction/compactation.h (1 hunks)
  • lib/structures/lsmtree/compaction/compaction_trigger.h (1 hunks)
  • lib/structures/lsmtree/compaction/level_zero_compactation.h (1 hunks)
  • lib/structures/lsmtree/levels/level.h (3 hunks)
  • lib/structures/lsmtree/levels/levels.cpp (5 hunks)
  • lib/structures/lsmtree/levels/levels.h (2 hunks)
  • lib/structures/lsmtree/lsmtree.cpp (9 hunks)
  • lib/structures/lsmtree/lsmtree.h (5 hunks)
  • lib/structures/lsmtree/lsmtree_compactor.h (1 hunks)
  • lib/structures/lsmtree/lsmtree_config.h (1 hunks)
  • lib/structures/lsmtree/lsmtree_reader_writer.h (1 hunks)
  • lib/structures/lsmtree/lsmtree_types.h (1 hunks)
  • lib/structures/lsmtree/segments/helpers.h (2 hunks)
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp (2 hunks)
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.h (2 hunks)
  • lib/structures/lsmtree/segments/lsmtree_segment_factory.h (1 hunks)
  • lib/structures/lsmtree/segments/lsmtree_segment_index.h (1 hunks)
  • lib/structures/lsmtree/segments/segment_storage.h (2 hunks)
  • lib/structures/lsmtree/segments/types.h (1 hunks)
  • lib/structures/memtable/memtable.h (1 hunks)
  • lib/structures/memtable/memtable_reader_writer.h (1 hunks)
  • lib/structures/sorted_vector/sorted_vector.h (5 hunks)
  • lib/wal/CMakeLists.txt (1 hunks)
  • lib/wal/log/log.h (1 hunks)
  • lib/wal/log/storage/backend/backend.h (1 hunks)
  • lib/wal/log/storage/backend/file_storage_backend.h (1 hunks)
  • lib/wal/log/storage/builder.h (1 hunks)
  • lib/wal/log/storage/in_memory_log_storage.h (1 hunks)
  • lib/wal/log/storage/log_storage_crtp.h (1 hunks)
  • lib/wal/log/storage/persistent_log_storage.h (1 hunks)
  • lib/wal/wal.cpp (1 hunks)
  • lib/wal/wal.h (1 hunks)
  • src/main.cpp (9 hunks)
✅ Files skipped from review due to trivial changes (13)
  • lib/structures/lsmtree/segments/types.h
  • lib/structures/lsmtree/lsmtree_config.h
  • lib/structures/lsmtree/compaction/compactation.h
  • lib/concurrency/CMakeLists.txt
  • lib/structures/lsmtree/segments/helpers.h
  • lib/structures/lsmtree/lsmtree_types.h
  • lib/structures/memtable/memtable_reader_writer.h
  • lib/structures/lsmtree/compaction/level_zero_compactation.h
  • lib/structures/sorted_vector/sorted_vector.h
  • lib/structures/lsmtree/lsmtree_compactor.h
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.h
  • lib/wal/log/log.h
  • lib/wal/log/storage/log_storage_crtp.h
🚧 Files skipped from review as they are similar to previous changes (14)
  • lib/structures/lsmtree/segments/segment_storage.h
  • lib/structures/lsmtree/segments/lsmtree_segment_factory.h
  • conanfile.txt
  • lib/structures/lsmtree/levels/levels.cpp
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp
  • lib/db/db.h
  • lib/db/db.cpp
  • lib/fs/random_access_file.h
  • lib/structures/lsmtree/lsmtree.cpp
  • lib/raft/replicated_log.h
  • lib/fs/append_only_file.cpp
  • lib/raft/raft.h
  • lib/db/CMakeLists.txt
  • lib/wal/wal.h
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/wal/log/storage/in_memory_log_storage.h (4)
lib/raft/replicated_log.h (11)
  • other (31-44)
  • other (31-31)
  • nodiscard (51-54)
  • nodiscard (56-73)
  • nodiscard (75-78)
  • nodiscard (80-83)
  • nodiscard (85-88)
  • nodiscard (100-105)
  • entry (51-51)
  • command (56-57)
  • index (75-75)
lib/wal/log/log.h (9)
  • nodiscard (39-42)
  • nodiscard (44-48)
  • nodiscard (50-53)
  • nodiscard (55-58)
  • nodiscard (60-63)
  • nodiscard (75-78)
  • entry (39-39)
  • command (44-45)
  • index (50-50)
lib/wal/log/storage/builder.h (3)
  • entry (26-26)
  • command (31-31)
  • index (37-37)
lib/wal/log/storage/persistent_log_storage.h (3)
  • entry (74-74)
  • command (87-87)
  • index (92-92)
lib/fs/random_access_file.cpp (4)
lib/fs/random_access_file.h (8)
  • random_access_file_t (17-17)
  • random_access_file_t (19-19)
  • random_access_file_t (22-22)
  • random_access_file_t (25-25)
  • random_access_file_t (44-44)
  • data (27-28)
  • offset (30-31)
  • path (53-54)
lib/fs/append_only_file.cpp (12)
  • read (55-59)
  • read (55-56)
  • size (61-64)
  • size (61-61)
  • flush (66-69)
  • flush (66-66)
  • reset (71-74)
  • reset (71-71)
  • stream (76-79)
  • stream (76-76)
  • build (101-117)
  • build (101-102)
lib/fs/common.cpp (2)
  • from_errno (13-18)
  • from_errno (13-14)
lib/posix_wrapper/open_flag.h (1)
  • to_native (54-54)
🪛 Cppcheck (2.10-2)
lib/raft/raft.cpp

[performance] 123-123: Function parameter 'nodeConfig' should be passed by const reference.

(passedByValue)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-gcc
🔇 Additional comments (33)
lib/structures/lsmtree/segments/lsmtree_segment_index.h (1)

1-1: Using #pragma once improves code maintainability.

The replacement of traditional include guards with #pragma once is a good modernization that reduces boilerplate and prevents potential issues with mismatched or duplicate include guards. This change aligns with modern C++ best practices and is consistent with other header files in the project.

lib/structures/memtable/memtable.h (1)

1-13: Header modernization looks good

#pragma once and the reordered include list improve readability and guard safety. No functional impact detected.

lib/structures/lsmtree/lsmtree_reader_writer.h (2)

1-1: Good practice: Using #pragma once instead of include guards.

The change from traditional include guards to #pragma once is a good modernization that improves readability and reduces the chance of macro naming conflicts.


38-40: Verify recursive template parameter pattern.

The lsmtree_reader_t inherits from generic_reader_t<lsmtree_reader_t>, passing itself as a template parameter. This is the Curiously Recurring Template Pattern (CRTP) and is valid C++. Make sure it is intentional, and that the base only uses Derived's members inside member function bodies, where Derived is already a complete type.
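A minimal sketch of the CRTP shape being discussed, assuming a read-only key/value interface. generic_reader_sketch, map_reader and their members are hypothetical stand-ins, not the project's generic_reader_t. The base refers to Derived only inside a member function body, which is instantiated after Derived is complete:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical CRTP base: forwards read() to Derived::read_impl() at compile
// time, without virtual dispatch.
template <typename Derived>
struct generic_reader_sketch
{
    std::optional<std::string> read(const std::string &key) const
    {
        // Derived is complete here: member bodies are instantiated on first use.
        return static_cast<const Derived *>(this)->read_impl(key);
    }
};

// Passing itself as the template argument is the intended CRTP usage.
struct map_reader : generic_reader_sketch<map_reader>
{
    std::map<std::string, std::string> data;

    std::optional<std::string> read_impl(const std::string &key) const
    {
        const auto it = data.find(key);
        if (it == data.end())
        {
            return std::nullopt;
        }
        return it->second;
    }
};
```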

lib/wal/wal.cpp (2)

6-24: Good implementation of move semantics.

The WAL class properly implements move construction and assignment with the correct move semantics, self-assignment check, and resource transfer. This ensures efficient handling of WAL resources.


44-49: Good logging and helper function.

The lambda function recordToString is a good practice for encapsulating the record serialization logic and enhancing log readability.

lib/concurrency/helpers.h (1)

33-38: Good practice: Explicitly deleting copy and move operations.

Explicitly deleting copy and move operations for the lock guard is good practice since these operations could lead to incorrect lock states.

lib/structures/lsmtree/levels/level.h (2)

23-35: Good practice: Explicit lifecycle management.

Explicitly deleting default constructor, copy and move operations, and defaulting the destructor clarifies the intended usage pattern and prevents accidental misuse of the class.


53-76: Improved code formatting.

Moving return types to a new line after function signatures improves readability, especially for longer function declarations with complex return types.

lib/concurrency/thread_safe_queue.h (5)

21-25: Default constructor and explicit disabling of copy semantics look good.

The class properly disables copy semantics and defines a default constructor, which is appropriate for a thread-safe container.


26-30: Move constructor correctly handles non-movable mutex.

The implementation correctly moves the internal queue while recognizing that the mutex cannot be moved. Using the move_under_optional_lock helper is a good approach for safely moving resources.


31-40: Move assignment operator uses the copy-and-swap idiom with proper synchronization.

The move assignment implementation follows the safe copy-and-swap idiom, correctly handling synchronization with absl_dual_mutex_lock_guard to prevent data races. This is a robust approach for concurrent data structures.


71-71: Simplified pop_all implementation looks good.

The simplified ternary operator approach is more concise than the previous implementation.


98-104: Swap implementation correctly handles non-movable mutex.

The swap implementation correctly avoids swapping the mutex, which is not movable. The comment explaining this is helpful.
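The move-and-swap pattern described for thread_safe_queue can be sketched with standard primitives. This is a minimal illustration, assuming std::mutex and std::scoped_lock as stand-ins for the project's absl::Mutex, absl_dual_mutex_lock_guard and move_under_optional_lock helpers; locked_queue is a hypothetical name:

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

template <typename T>
class locked_queue
{
public:
    locked_queue() = default;
    locked_queue(const locked_queue &) = delete;
    locked_queue &operator=(const locked_queue &) = delete;

    // Move constructor: lock only the source and steal its items;
    // the new object keeps its own freshly constructed mutex.
    locked_queue(locked_queue &&other)
    {
        std::lock_guard<std::mutex> lock(other.m_mutex);
        m_items = std::move(other.m_items);
    }

    // Move-and-swap: drain `other` into a temporary, then swap data with it.
    locked_queue &operator=(locked_queue &&other)
    {
        if (this != &other)
        {
            locked_queue tmp(std::move(other)); // locks `other`
            swap(tmp);                          // locks `this` and `tmp`
        }
        return *this;
    }

    // Swaps contents only; each object keeps its own mutex.
    void swap(locked_queue &other)
    {
        if (this == &other)
        {
            return;
        }
        std::scoped_lock lock(m_mutex, other.m_mutex); // deadlock-free dual lock
        std::swap(m_items, other.m_items);
    }

    void push(T value)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push_back(std::move(value));
    }

    std::optional<T> pop()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.empty())
        {
            return std::nullopt;
        }
        T value = std::move(m_items.front());
        m_items.pop_front();
        return value;
    }

private:
    std::mutex    m_mutex;
    std::deque<T> m_items;
};
```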

lib/structures/lsmtree/levels/levels.h (4)

24-26: Copy semantics explicitly disabled.

The class properly disables copy semantics which is appropriate for this type of resource-managing class.


27-36: Move constructor correctly handles non-movable synchronization primitives.

The implementation properly moves all movable members while leaving the mutex and notification in their default states. The comments are clear about this behavior.


38-47: Move assignment uses the copy-and-swap idiom with proper synchronization.

The implementation safely uses a temporary object and swap approach with proper mutex locking via absl_dual_mutex_lock_guard. This is a thread-safe implementation of the move assignment operator.


124-135: Swap implementation correctly handles non-movable synchronization primitives.

The swap method properly swaps movable members while leaving the mutex and notification as-is. The comments clearly explain this behavior.

lib/wal/log/storage/in_memory_log_storage.h (4)

19-22: Constructor that takes the log as a parameter looks good.

The constructor properly moves the provided log vector, which is efficient.


24-39: Move semantics implementation is correct.

The move constructor and assignment operator are properly implemented. The self-assignment check in the assignment operator is a good practice.


58-65: Read method safely handles out-of-range access.

The read method properly checks bounds and returns an optional, which is a good practice for safe API design.


82-83: Static assertion verifies concept compliance.

The static assertion ensures that the class satisfies the required concept, which is a good practice for compile-time verification.

lib/structures/lsmtree/lsmtree.h (5)

67-79: Move constructor correctly handles non-movable mutex.

The implementation properly moves all movable members and leaves the mutex in its default state. The comments explain this clearly.

However, the documentation comment above this method (lines 60-66) still says the move constructor is deleted, which is now incorrect.


101-110: Move assignment operator uses copy-and-swap idiom with proper synchronization.

The implementation safely uses a temporary object and swap approach with proper mutex locking. This is a thread-safe implementation of the move assignment operator.

However, the documentation comment above this method (lines 91-100) still says the move assignment operator is deleted, which is now incorrect.


8-8: Updated WAL include path reflects architectural changes.

The include path change from the db namespace to a dedicated wal namespace indicates a good architectural separation of concerns.


178-178:

❌ Incorrect review comment

Declare swap method implementation.

The swap method is declared but not defined in this file. Ensure the implementation is provided in the corresponding .cpp file.


🏁 Script executed:

#!/bin/bash
# Check if the swap method is implemented in the corresponding cpp file
grep -r "void structures::lsmtree::lsmtree_t::swap" lib/structures/lsmtree/

Length of output: 77


🏁 Script executed:

#!/bin/bash
# List all files in the lsmtree directory to locate possible implementation files
echo "Files in lib/structures/lsmtree/:"
find lib/structures/lsmtree -maxdepth 1 -type f

# Search for any swap method implementation patterns in C++ source files
echo -e "\nSearching for 'swap(' in C++ source files:"
grep -R "swap(" lib/structures/lsmtree/ --include='*.cpp' --include='*.cc' --include='*.cxx' -n || true

echo -e "\nSearching for '::swap' qualified definitions:"
grep -R "::swap" lib/structures/lsmtree/ --include='*.cpp' --include='*.cc' --include='*.cxx' -n || true

Length of output: 1696


swap method already implemented

  • Definition found in lib/structures/lsmtree/lsmtree.cpp at line 326.
  • No missing implementation; you can safely ignore this comment.

Likely an incorrect or invalid review comment.


181-181:

✅ Verification successful

Config pointer is no longer const.

The configuration pointer member has been changed from const to non-const. If this was intentional, consider documenting why this change was made, as it could potentially allow the configuration to be modified after initialization.


🏁 Script executed:

#!/bin/bash
# Check for modifications to the config pointer in the class methods
grep -r "m_pConfig =" lib/structures/lsmtree/

Length of output: 47


🏁 Script executed:

#!/bin/bash
# List all occurrences of m_pConfig in the LSMTree implementation & header
grep -R "m_pConfig" -n lib/structures/lsmtree/

# Display context around the member declaration in the header
sed -n '160,200p' lib/structures/lsmtree/lsmtree.h

Length of output: 4031


Non-const config pointer supports move semantics

The m_pConfig member was deliberately changed to a non-const config::shared_ptr_t so that lsmtree_t instances can be safely moved or swapped via the public swap() method. In practice, assignments to m_pConfig only occur in constructors and in swap(), and all other accesses remain read-only. No further changes are required.

lib/wal/log/storage/backend/file_storage_backend.h (1)

64-67: Negative file size casts to huge positive number

If append_only_file_t::size returns a signed value (e.g. std::expected<ssize_t, …>), a negative result converted to std::size_t wraps around to a very large unsigned value. value_or(0) only covers the error case, not a negative success value, so clamp negative results to 0 (or propagate the error):

-        return m_file.size().value_or(0);
+        const auto sz = m_file.size().value_or(0);
+        return sz < 0 ? std::size_t{0} : static_cast<std::size_t>(sz);
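The wrap-around itself is easy to reproduce. A minimal sketch, assuming the size arrives as a signed ssize_t (the actual return type of append_only_file_t::size is not confirmed in this thread); naive_size and clamp_size are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/types.h>

// Converting a negative ssize_t straight to std::size_t wraps modulo 2^N.
inline std::size_t naive_size(ssize_t raw)
{
    return static_cast<std::size_t>(raw);
}

// Hypothetical helper: negative sizes (error sentinels) become 0.
inline std::size_t clamp_size(ssize_t raw)
{
    return raw < 0 ? std::size_t{0} : static_cast<std::size_t>(raw);
}
```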
lib/wal/log/storage/backend/backend.h (1)

3-4: <expected> is C++23 — verify toolchain support

std::expected is standardised only since C++23.
If the project still targets C++20, you’ll need either
#include <tl/expected.hpp> (or another shim), or
-std=c++23 in the toolchain.

Please confirm CI flags/support.
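One way to surface the requirement early is a compile-time probe on the library feature-test macro. This is a minimal sketch, assuming the toolchain ships the C++20 <version> header; kHasStdExpected is a hypothetical name. The check uses 202211L, the value that also covers the monadic members such as transform() that this PR calls on std::expected:

```cpp
#include <cassert>

#if defined(__has_include)
#if __has_include(<version>)
#include <version> // defines the library feature-test macros
#endif
#endif

// true only when the standard library provides std::expected with monadic ops.
#if defined(__cpp_lib_expected) && __cpp_lib_expected >= 202211L
inline constexpr bool kHasStdExpected = true;
#else
inline constexpr bool kHasStdExpected = false;
#endif

// In a header that depends on std::expected, fail early with a clear message:
// static_assert(kHasStdExpected, "std::expected needs -std=c++23 (or a shim such as tl::expected)");
```

Enabling the commented static_assert turns a cryptic missing-header error into a single actionable message in CI.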

lib/wal/log/storage/persistent_log_storage.h (3)

87-90: LGTM: Good use of existing append method

The overloaded append method correctly reuses the string-based append method by formatting the input parameters.


102-106: LGTM: Reset implementation correctly clears both memory and storage

The reset method correctly handles both in-memory and backend state.


134-180: LGTM: Well-designed builder pattern implementation

The builder pattern implementation is clean and follows best practices:

  • Proper move semantics
  • Deleted copy operations
  • Clear error handling with std::expected
  • Descriptive error messages
src/main.cpp (1)

469-479: LGTM: Good implementation of signal handlers

The signal handlers for SIGTERM and SIGINT are properly implemented with error checking.

Comment on lines 52 to 62
[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string
{
    std::string buffer;
    buffer.resize(size);
    if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value())
    {
        spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size);
        return {};
    }
    return buffer;
}

⚠️ Potential issue

read_impl returns a string padded with zero bytes

append_only_file_t::read may return fewer bytes than requested, but the code never shrinks buffer, so callers receive the bytes actually read followed by the '\0' padding written by resize(), with no way to tell where the data ends.

-        std::string buffer;
-        buffer.resize(size);
-        if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value())
+        std::string buffer(size, '\0');
+        const auto res = m_file.read(offset, buffer.data(), size);
+        if (!res.has_value())
         {
             spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size);
             return {};
         }
-        return buffer;
+        buffer.resize(res.value());   // keep only the bytes actually read
+        return buffer;
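If the underlying file can return short reads, a loop that retries until EOF and then trims the buffer avoids both the padding and silent truncation. A minimal POSIX sketch using pread(2) directly; read_up_to is a hypothetical free function, not part of append_only_file_t:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reads up to `size` bytes at `offset`, retrying short
// reads and EINTR, and trims the result to the bytes actually read.
// Returns std::nullopt on an I/O error.
std::optional<std::string> read_up_to(int fd, off_t offset, std::size_t size)
{
    std::string buffer(size, '\0');
    std::size_t total = 0;
    while (total < size)
    {
        const ssize_t n = ::pread(fd, buffer.data() + total, size - total,
                                  offset + static_cast<off_t>(total));
        if (n < 0)
        {
            if (errno == EINTR)
            {
                continue; // interrupted before reading anything; retry
            }
            return std::nullopt;
        }
        if (n == 0)
        {
            break; // end of file: fewer bytes than requested are available
        }
        total += static_cast<std::size_t>(n);
    }
    buffer.resize(total); // drop the zero-filled tail that was never read
    return buffer;
}
```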

Comment on lines 44 to 50
[[nodiscard]] auto write_impl(const char *data, std::size_t offset, std::size_t size) -> bool
{
    (void)offset;
    return m_file.append({data, size})
        .transform([](ssize_t res) { return res >= 0; })
        .value_or(false);
}

🛠️ Refactor suggestion

write_impl silently ignores offset (potential data corruption)

offset is discarded even though the public interface promises arbitrary-offset writes.
Callers may assume idempotent overwrite semantics, leading to duplicated or corrupted log segments.

If the backend is truly append-only, document this clearly and reject non-EOF offsets:

+    if (offset != m_file.size().value_or(0)) {
+        spdlog::error("file_storage_backend_t supports only append-at-EOF writes (got offset={})", offset);
+        return false;
+    }
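The append-at-EOF contract can be illustrated without any file I/O. A minimal sketch, assuming an in-memory stand-in; append_only_buffer is a hypothetical class, not file_storage_backend_t:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory stand-in for an append-only backend: a write is
// accepted only when `offset` equals the current end of the log.
class append_only_buffer
{
public:
    bool write(const char *data, std::size_t offset, std::size_t size)
    {
        if (offset != m_bytes.size())
        {
            return false; // reject anything that is not an append at EOF
        }
        m_bytes.append(data, size);
        return true;
    }

    const std::string &bytes() const noexcept
    {
        return m_bytes;
    }

private:
    std::string m_bytes;
};
```

Rejecting the write keeps the contract explicit; a caller that retries an already-applied write gets `false` instead of a duplicated entry.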

Comment on lines 123 to 124
consensus_module_t::consensus_module_t(node_config_t nodeConfig,
                                       std::vector<raft_node_grpc_client_t> replicas) noexcept

🛠️ Refactor suggestion

Pass node configuration by const reference

The nodeConfig parameter is passed by value, which requires an unnecessary copy. Consider passing it by const reference for better performance.

-consensus_module_t::consensus_module_t(node_config_t                        nodeConfig,
+consensus_module_t::consensus_module_t(const node_config_t&                 nodeConfig,
                                       std::vector<raft_node_grpc_client_t> replicas) noexcept
🧰 Tools
🪛 Cppcheck (2.10-2)

[performance] 123-123: Function parameter 'nodeConfig' should be passed by const reference.

(passedByValue)
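A common alternative applies when the constructor stores nodeConfig in a member: take the parameter by value and std::move it into place, so an rvalue argument costs two moves and no copy. Whether consensus_module_t actually stores it is not visible in this excerpt; if it does not, the const reference suggested above is the better choice. A minimal sketch with hypothetical stand-in types (node_config_sketch, consensus_module_sketch), with std::string replacing raft_node_grpc_client_t:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for node_config_t.
struct node_config_sketch
{
    std::string id;
    std::string address;
};

class consensus_module_sketch
{
public:
    // Sink parameters: taken by value, then moved into the members.
    // An rvalue argument costs two moves; an lvalue one copy plus one move.
    consensus_module_sketch(node_config_sketch nodeConfig,
                            std::vector<std::string> replicas) noexcept
        : m_config(std::move(nodeConfig)), m_replicas(std::move(replicas))
    {
    }

    const node_config_sketch &config() const noexcept
    {
        return m_config;
    }

    std::size_t replica_count() const noexcept
    {
        return m_replicas.size();
    }

private:
    node_config_sketch       m_config;
    std::vector<std::string> m_replicas;
};
```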

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
lib/wal/wal.cpp (1)

33-36: Prefer letting spdlog format – avoid eager std::format
spdlog::error(std::format(...)) eagerly builds the full string even when the log level is disabled, wasting the formatting work.
Let spdlog (fmt) handle the placeholders:

-        spdlog::error(std::format("WAL: Failed to append entry: {} {} {}",
-                                  op_view,
-                                  rec.kv.m_key.m_key,
-                                  rec.kv.m_value.m_value));
+        spdlog::error("WAL: Failed to append entry: {} {} {}",
+                      op_view, rec.kv.m_key.m_key, rec.kv.m_value.m_value);
lib/wal/log/storage/backend/append_only_file_storage_backend.h (1)

55-67: Release unused capacity after short reads
std::string buffer(size, '\0') has to be sized for the full request before the read, but resize() alone never releases capacity, so a short read at the file tail keeps the whole allocation alive. After trimming, either ask the string to shrink or copy out exactly the bytes read:

-        buffer.resize(res.value());
+        buffer.resize(static_cast<std::size_t>(res.value()));
+        buffer.shrink_to_fit();   // non-binding request to drop unused capacity

or construct the result as std::string(buffer.data(), n), which typically allocates only what was read.

lib/wal/log/storage/backend/backend.h (1)

83-86: config() returns by value – unnecessary copy
config() hands back an entire storage_backend_config_t object each call. Returning const storage_backend_config_t& avoids copies and clarifies immutability:

-    [[nodiscard]] auto config() const
-    {
-        return m_config;
-    }
+    [[nodiscard]] const storage_backend_config_t& config() const { return m_config; }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 50df1b1 and 720ee8c.

📒 Files selected for processing (18)
  • lib/concurrency/helpers.h (1 hunks)
  • lib/db/db.cpp (4 hunks)
  • lib/db/db.h (3 hunks)
  • lib/fs/append_only_file.cpp (1 hunks)
  • lib/server/grpc_server.cpp (2 hunks)
  • lib/structures/lsmtree/levels/level.cpp (12 hunks)
  • lib/structures/lsmtree/lsmtree.cpp (10 hunks)
  • lib/structures/lsmtree/lsmtree.h (6 hunks)
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.h (2 hunks)
  • lib/structures/lsmtree/segments/segment_storage.h (2 hunks)
  • lib/wal/log/concepts.h (1 hunks)
  • lib/wal/log/storage/backend/append_only_file_storage_backend.h (1 hunks)
  • lib/wal/log/storage/backend/backend.h (1 hunks)
  • lib/wal/log/storage/builder.h (1 hunks)
  • lib/wal/log/storage/in_memory_log_storage.h (1 hunks)
  • lib/wal/log/storage/persistent_log_storage.h (1 hunks)
  • lib/wal/wal.cpp (1 hunks)
  • lib/wal/wal.h (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • lib/wal/log/storage/builder.h
🚧 Files skipped from review as they are similar to previous changes (14)
  • lib/server/grpc_server.cpp
  • lib/db/db.h
  • lib/wal/log/concepts.h
  • lib/wal/log/storage/in_memory_log_storage.h
  • lib/structures/lsmtree/levels/level.cpp
  • lib/db/db.cpp
  • lib/structures/lsmtree/lsmtree.cpp
  • lib/wal/log/storage/persistent_log_storage.h
  • lib/wal/wal.h
  • lib/structures/lsmtree/segments/segment_storage.h
  • lib/concurrency/helpers.h
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.h
  • lib/structures/lsmtree/lsmtree.h
  • lib/fs/append_only_file.cpp
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/wal/log/storage/backend/backend.h (6)
lib/wal/log/storage/builder.h (6)
  • nodiscard (29-32)
  • nodiscard (34-38)
  • nodiscard (40-43)
  • nodiscard (45-48)
  • nodiscard (50-53)
  • nodiscard (70-105)
lib/wal/log/storage/in_memory_log_storage.h (6)
  • nodiscard (46-50)
  • nodiscard (52-56)
  • nodiscard (58-65)
  • nodiscard (67-71)
  • nodiscard (73-76)
  • nodiscard (90-93)
lib/wal/log/storage/persistent_log_storage.h (2)
  • nodiscard (75-86)
  • nodiscard (88-91)
lib/fs/append_only_file.h (2)
  • data (30-31)
  • offset (33-34)
lib/fs/random_access_file.h (2)
  • data (27-28)
  • offset (30-31)
lib/fs/random_access_file.cpp (2)
  • size (116-128)
  • size (116-116)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build-gcc
🔇 Additional comments (2)
lib/wal/log/storage/backend/append_only_file_storage_backend.h (2)

47-53: Offset parameter silently ignored – risk of accidental overwrite
write_impl discards the offset argument and blindly appends. If callers ever rely on offset positioning (e.g. re-writing a corrupted tail), the operation will violate the contract.

At minimum, assert or log when offset != size() to surface misuse:

+        if (offset != static_cast<ssize_t>(m_file.size().value_or(0)))
+        {
+            spdlog::warn("append_only_file_storage_backend_t::write_impl: "
+                         "offset ({}) differs from file size; forcing append",
+                         offset);
+        }

112-116: Enum value kWrongFileFormat is never produced
build_impl maps every failure of append_only_file_builder_t to kUnableToOpenFile. If file-format validation is expected, return kWrongFileFormat when error.code indicates e.g. invalid_magic.

This keeps error reporting actionable.

lib/wal/wal.cpp Outdated
Comment on lines 60 to 68
auto logStream = std::stringstream{};
for (std::size_t idx{0}; idx < m_log.size(); ++idx)
{
    if (auto logLine{m_log.read(idx)}; logLine.has_value())
    {
        logStream << logLine.value();
    }
}


🛠️ Refactor suggestion

Streaming the whole log into memory doubles memory and work – parse on the fly
records() concatenates all entries into a stringstream and then re-reads them line-by-line, so every byte is copied and scanned twice and peak memory holds a second full copy of the log.
Iterating once and parsing each line immediately avoids the extra copy:

std::vector<record_t> result;
for (std::size_t idx = 0; idx < m_log.size(); ++idx)
{
    if (auto line = m_log.read(idx); line && !absl::StripAsciiWhitespace(*line).empty())
    {
        std::istringstream s(*line);
        record_t rec;
        rec.read(s);
        spdlog::debug("WAL: Recovered record: {}", recordToString(rec));
        result.emplace_back(std::move(rec));
    }
}

This keeps memory proportional to the largest record rather than the full log.
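The same one-pass idea can be sketched without Abseil, assuming each entry is a whitespace-separated "OP key value" line. record_sketch and recover_records are hypothetical names, and the real record_t::read parser is not reproduced here:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for record_t.
struct record_sketch
{
    std::string op;
    std::string key;
    std::string value;
};

// Parses each stored line as soon as it is read, so no concatenated
// second copy of the log is ever built.
std::vector<record_sketch> recover_records(const std::vector<std::string> &log)
{
    std::vector<record_sketch> result;
    for (const auto &line : log)
    {
        // Skip blank or whitespace-only entries.
        if (line.find_first_not_of(" \t\r\n") == std::string::npos)
        {
            continue;
        }
        std::istringstream in(line);
        record_sketch      rec;
        // Lines with fewer than three fields are dropped in this sketch.
        if (in >> rec.op >> rec.key >> rec.value)
        {
            result.push_back(std::move(rec));
        }
    }
    return result;
}
```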

lib/wal/wal.cpp Outdated
Comment on lines 28 to 32
const auto op_view{magic_enum::enum_name(rec.op)};
if (!m_log.append(std::string{op_view.data(), op_view.size()},
                  rec.kv.m_key.m_key,
                  rec.kv.m_value.m_value))
{

⚠️ Potential issue

✏️ Delimiter missing – recovered records will concatenate together
m_log.append is invoked with three separate fields but without any line delimiter ('\n').
records() later expects entries to be newline-separated (std::getline(logStream, line)), which is not true for the in-memory implementation (fmt::format("{} {} {}", …)).

Result: every record after the first is parsed as part of the same “line”, breaking recovery.

-    if (!m_log.append(std::string{op_view.data(), op_view.size()},
-                      rec.kv.m_key.m_key,
-                      rec.kv.m_value.m_value))
+    // Terminate the entry with '\n' so all storage back-ends produce the
+    // canonical “one-entry-per-line” format expected by `records()`.
+    // The delimiter goes after the last field, not after the command.
+    if (!m_log.append(std::string{op_view},
+                      rec.kv.m_key.m_key,
+                      fmt::format("{}\n", rec.kv.m_value.m_value)))

or update the storage layer to append the terminating newline.

Without this fix, log recovery will silently corrupt the WAL state.
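The failure mode can be reproduced with std::getline alone. A minimal sketch, assuming entries are formatted like the in-memory storage's "{} {} {}" pattern; format_entry and count_getline_entries are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Formats one WAL entry as "op key value", optionally terminated by '\n'.
std::string format_entry(const std::string &op, const std::string &key,
                         const std::string &value, bool terminate)
{
    std::string entry = op + ' ' + key + ' ' + value;
    if (terminate)
    {
        entry += '\n';
    }
    return entry;
}

// Counts the "lines" std::getline would hand to a records()-style parser.
std::size_t count_getline_entries(const std::string &log)
{
    std::istringstream stream(log);
    std::size_t        count = 0;
    for (std::string line; std::getline(stream, line);)
    {
        ++count;
    }
    return count;
}
```

Without the terminator, two appended entries come back as a single line ("PUT a 1PUT b 2"); with it, getline yields one line per entry.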


Comment on lines +42 to +45
[[nodiscard]] auto read(ssize_t offset, std::size_t size) -> std::string
{
    return static_cast<Derived *>(this)->read_impl(offset, size);
}

🛠️ Refactor suggestion

No error propagation – read() cannot tell success from empty data
The base interface returns a plain std::string; failures are reported as an empty string, which is indistinguishable from a legitimate empty payload. Consider returning std::expected<std::string, storage_backend_error_t> (or std::optional<std::string>) to preserve error semantics.

This will let higher layers react to I/O failures instead of silently continuing with corrupt state.
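A minimal sketch of the std::optional variant, assuming an in-memory string as the storage. read_at is a hypothetical free function, and where C++23 is available std::expected<std::string, storage_backend_error_t> would additionally carry the failure reason:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical in-memory stand-in for a storage backend read.
// std::nullopt means "I/O error" (here: offset past the end), while an
// engaged empty string means "read succeeded, no bytes available".
std::optional<std::string> read_at(const std::string &storage, std::size_t offset,
                                   std::size_t size)
{
    if (offset > storage.size())
    {
        return std::nullopt;
    }
    return storage.substr(offset, size);
}
```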

@lnikon lnikon marked this pull request as draft May 1, 2025 21:00
@lnikon lnikon changed the title Implement Raft protocol to replicate KV pairs Raft: Support database replication via Raft. May 1, 2025
@lnikon lnikon requested a review from Copilot May 13, 2025 15:41
@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces Raft consensus support along with significant refactoring and modernization updates. Key changes include adding Raft-based replication features with gRPC integration and comprehensive unit tests; updating configuration schemas, build files, and project naming; and improving code formatting and static analysis settings.

Reviewed Changes

Copilot reviewed 138 out of 138 changed files in this pull request and generated no comments.

Summary per file:
  • conan/profiles/debug-clang: Update Clang version and compiler settings
  • bench/fs/*.cpp: Reformatting stream initialization for improved readability
  • bench/db/*.cpp: Adjustments to benchmark formatting and multi-line constructor usage
  • bench and main CMakeLists.txt: Update project name and add new dependency/find_package calls
  • assets/*.json: Update and add new configuration schema files and configs
  • .vscode/*.json, .clang-tidy, .clang-format: Update IDE and linting configurations
  • DESIGN.md: New design documentation
Comments suppressed due to low confidence (4)

bench/fs/open_write_streams.cpp:16

  • [nitpick] Ensure that the multi-line std::fstream constructor call is aligned consistently with the project's style guidelines for improved readability.
std::fstream      fs(filename,

bench/fs/lots_of_write.cpp:17

  • [nitpick] Consider aligning the constructor parameters in a consistent format with other similar instances in the codebase to enhance readability.
std::fstream      fs(filename,

bench/db/celero_bench.cpp:68

  • [nitpick] Review the formatting of the multi-line emplace_back call for consistency with similar patterns in the benchmarks.
m_records.emplace_back(bench::generateRandomString(keySize),

bench/db/basic_bench.cpp:32

  • [nitpick] Consider ensuring consistent alignment and line breaks for multi-line constructor calls across different benchmark files to improve overall maintainability.
records.emplace_back(bench::generateRandomString(1024),

- Fixed clang version to 19 in conan profiles, as clang++-20 breaks spdlog
- Optimized WAL records() & added check for an empty log
- Implement replicated_log_storage_t
- Implemented proper initialization order in Main
- Started integrating wal::wal_t with raft::consensus_module_t
- Make wal_t and related classes support TEntry
- Support serialization of TEntry via stream operators (initial version)
@lnikon
Owner Author

lnikon commented Jun 21, 2025

@coderabbitai full review

coderabbitai bot commented Jun 21, 2025

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 23

🔭 Outside diff range comments (2)
lib/structures/lsmtree/lsmtree_reader_writer.h (2)

38-38: Fix incorrect template parameter in inheritance.

The lsmtree_reader_t should inherit from generic_reader_t<lsmtree_t>, not generic_reader_t<lsmtree_reader_t> to avoid circular template instantiation.

Apply this diff to fix the inheritance:

-struct lsmtree_reader_t : generic_reader_t<lsmtree_reader_t>
+struct lsmtree_reader_t : generic_reader_t<lsmtree_t>

46-46: Fix incorrect base class for reader struct.

The lsmtree_segment_reader_t should inherit from generic_reader_t, not generic_writer_t, to maintain logical consistency.

Apply this diff to fix the inheritance:

-struct lsmtree_segment_reader_t : generic_writer_t<lsmtree_segment_reader_t>
+struct lsmtree_segment_reader_t : generic_reader_t<lsmtree_segment_reader_t>
♻️ Duplicate comments (37)
examples/iouring/raw_uring/main.cpp (3)

21-38: Critical: Fix incorrect ring buffer field types.

The SQ/CQ ring descriptors must use unsigned int* instead of unsigned char*. When dereferencing these pointers, you're reading/writing single bytes instead of 4-byte words, causing data corruption.

Apply this fix:

 struct app_io_sq_ring
 {
-    unsigned char *head;
-    unsigned char *tail;
-    unsigned char *ring_mask;
-    unsigned char *ring_entries;
-    unsigned char *flags;
-    unsigned char *array;
+    unsigned int *head;
+    unsigned int *tail;
+    unsigned int *ring_mask;
+    unsigned int *ring_entries;
+    unsigned int *flags;
+    unsigned int *array;
 };
 
 struct app_io_cq_ring
 {
-    unsigned char       *head;
-    unsigned char       *tail;
-    unsigned char       *ring_mask;
-    unsigned char       *ring_entries;
+    unsigned int        *head;
+    unsigned int        *tail;
+    unsigned int        *ring_mask;
+    unsigned int        *ring_entries;
     struct io_uring_cqe *cqes;
 };

159-163: Critical: Fix pointer arithmetic for CQE offset.

The byte offset is incorrectly scaled by sizeof(io_uring_cqe) when added to the typed pointer.

Apply this fix:

-    cring->cqes = (io_uring_cqe *)cq_ptr + p.cq_off.cqes;
+    cring->cqes = reinterpret_cast<io_uring_cqe *>(
+        reinterpret_cast<unsigned char *>(cq_ptr) + p.cq_off.cqes);
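The scaling difference can be shown by comparing addresses only (nothing is dereferenced). A minimal sketch, assuming a 16-byte stand-in struct rather than the real struct io_uring_cqe; cqe_like, cqes_wrong and cqes_right are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte stand-in for a completion-queue entry.
struct cqe_like
{
    std::uint64_t user_data;
    std::int32_t  res;
    std::uint32_t flags;
};

// Buggy form: the byte offset is multiplied by sizeof(cqe_like).
cqe_like *cqes_wrong(void *base, std::uint32_t byte_offset)
{
    return static_cast<cqe_like *>(base) + byte_offset;
}

// Fixed form: add the byte offset to a byte pointer, then convert.
cqe_like *cqes_right(void *base, std::uint32_t byte_offset)
{
    return reinterpret_cast<cqe_like *>(static_cast<unsigned char *>(base) + byte_offset);
}
```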

194-211: Critical: Fix memory and file descriptor leaks.

Every buffer, file_info structure, and file descriptor must be freed/closed after processing.

The file_info structure needs to store the file descriptor:

 struct file_info
 {
+    int          fd;
     off_t        file_sz;
     struct iovec iovecs[];
 };

Then in submit_to_sq, store it:

+    fi->fd = file_fd;
     fi->file_sz = file_sz;

And clean up in read_from_cq after line 208:

         for (int i = 0; i < blocks; i++)
         {
             output_to_console((char *)fi->iovecs[i].iov_base, fi->iovecs[i].iov_len);
+            free(fi->iovecs[i].iov_base);
         }
+        close(fi->fd);
+        free(fi);
examples/raft/CMakeLists.txt (1)

1-6: LGTM! Raft example CMake setup looks good.

The CMake configuration properly sets up the RaftMain executable with:

  • Appropriate C++23 standard
  • Correct library dependencies for Raft consensus functionality
  • Prometheus integration for monitoring

However, given the critical nature of the Raft consensus protocol, comprehensive testing is essential.

The previous review comments about adding test configuration remain valid. Raft consensus requires thorough testing of:

  • Leader election correctness
  • Log replication consistency
  • Network partition handling
  • State persistence reliability

Consider adding a separate test executable configuration to ensure the Raft implementation is robust and reliable.

examples/prometheus/main.cpp (3)

48-48: Use modern C++ random facilities.

The use of std::rand() is problematic as it's not thread-safe and provides poor randomization quality.

This issue was previously identified. Please implement the suggested thread-safe random number generator using std::random_device and std::mt19937.
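A minimal sketch of the suggested replacement for std::rand(), assuming a uniform integer range is what the example needs. random_in_range is a hypothetical helper; each thread gets its own engine, so no generator state is shared:

```cpp
#include <cassert>
#include <random>

// One engine per thread, seeded once from std::random_device.
int random_in_range(int lo, int hi)
{
    thread_local std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<int> dist(lo, hi); // inclusive bounds
    return dist(engine);
}
```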


70-74: Optimize counter implementation for better performance.

The dynamic creation of counters for each HTTP method is inefficient and can lead to high cardinality issues.

This performance concern was previously raised. Consider pre-registering counters at startup as suggested in the past review.


45-45: Add graceful shutdown mechanism.

The infinite loop lacks proper shutdown handling, making it difficult to terminate the application cleanly.

This was previously identified. Consider implementing signal handling for SIGINT and SIGTERM as suggested.
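Here is a minimal sketch of the suggested shutdown path. It assumes the exporter's main loop can poll a flag instead of looping forever. The names gStopRequested, onStopSignal, installStopHandlers, and runUntilStopped are hypothetical. The handler does nothing except store to a lock-free atomic, which is one of the few operations that are safe inside a signal handler.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <thread>

// Set from the signal handler, polled by the main loop. Storing to a lock-free
// atomic is safe in a handler; locking a mutex or notifying a condition
// variable is not.
std::atomic<bool> gStopRequested{false};
static_assert(std::atomic<bool>::is_always_lock_free, "handler needs a lock-free flag");

extern "C" void onStopSignal(int /*sig*/)
{
    gStopRequested.store(true, std::memory_order_relaxed);
}

void installStopHandlers()
{
    std::signal(SIGINT, onStopSignal);
    std::signal(SIGTERM, onStopSignal);
}

// Calls `tick` every `period` until SIGINT or SIGTERM arrives, then returns so
// the caller can run its normal cleanup (stop servers, join threads).
template <typename Tick>
void runUntilStopped(Tick tick, std::chrono::milliseconds period)
{
    while (!gStopRequested.load(std::memory_order_relaxed))
    {
        tick();
        std::this_thread::sleep_for(period);
    }
}
```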

assets/tkvpp_config_1.json (1)

7-7: Remove redundant walFilename property.

The walFilename property in the database section duplicates the filename property in the WAL section (line 13).

This redundancy was previously identified. Please remove the redundant property as suggested in the past review.

examples/raft/main.cpp (4)

18-24: Critical Issue: Unsafe signal handler implementation persists.

This is the same critical issue identified in previous reviews. Using std::condition_variable::notify_all() in a signal handler is unsafe and can cause undefined behavior.

The suggested fix from the previous review is still valid:

-std::condition_variable gCv;
+std::atomic<bool> gShutdown{false};

-static void signalHandler(int sig)
+void signalHandler(int sig)
 {
     if (sig == SIGTERM || sig == SIGINT)
     {
-        gCv.notify_all();
+        gShutdown.store(true);
     }
 }

And in the main function:

-    std::mutex                   mtx;
-    std::unique_lock<std::mutex> lock(mtx);
-    gCv.wait(lock);
+    while (!gShutdown.load())
+    {
+        std::this_thread::sleep_for(std::chrono::milliseconds(100));
+    }

64-65: Security Issue: Insecure channel credentials still present.

The use of grpc::InsecureChannelCredentials() identified in previous reviews remains a security concern for production deployments.

Consider using TLS credentials as suggested in the previous review:

auto creds = grpc::SslCredentials(grpc::SslCredentialsOptions());
// Use creds instead of grpc::InsecureChannelCredentials()

76-80: Implement an initialization timeout and better error reporting.

As suggested in previous reviews, initialization error handling could be improved in two ways: bound how long initialization may take, and report more detailed error information when it fails.
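A minimal sketch of a bounded initialization follows, assuming initialization can be wrapped in a callable that returns a result object. InitResult and initWithTimeout are hypothetical names. The task runs on a detached thread because a std::async future blocks in its destructor, which would defeat the timeout. As a consequence, the callable must not capture anything that dies when initWithTimeout returns.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical result of a bounded initialization attempt.
struct InitResult
{
    bool        ok{false};
    std::string error; // empty on success
};

// Runs `init` on a detached thread and waits at most `timeout` for it.
// On timeout the task keeps running in the background, so `init` must own
// (or safely share) everything it touches.
template <typename Init>
InitResult initWithTimeout(Init init, std::chrono::milliseconds timeout)
{
    std::packaged_task<InitResult()> task{std::move(init)};
    std::future<InitResult>          result = task.get_future();
    std::thread{std::move(task)}.detach();

    if (result.wait_for(timeout) == std::future_status::timeout)
    {
        return {false, "initialization timed out after " + std::to_string(timeout.count()) + " ms"};
    }
    return result.get(); // rethrows if `init` threw
}
```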


85-87: Replace condition variable with atomic flag for main loop.

This is part of the critical signal-handling fix above: the condition-variable wait should be replaced with the atomic-flag loop.

lib/db/db_config.h (1)

32-32: Add default value for mode field

The mode field in the db_config_t struct has no default value, so a default-initialized config leaves it indeterminate. Consider adding a default value.

-    db_mode_t   mode;
+    db_mode_t   mode{db_mode_t::kStandalone};
lib/structures/lsmtree/levels/level.cpp (1)

182-186: Good fix for the overlap checking logic.

The lambda now correctly checks for overlapping segments by comparing the ranges between different segments rather than comparing a segment to itself.
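The corrected check reduces to a pairwise range test over distinct segments. The sketch below assumes each segment exposes inclusive minimum and maximum keys; SegmentRange, overlaps, and anyOverlap are hypothetical names, not the project's segment API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a segment's inclusive key range.
struct SegmentRange
{
    std::string minKey;
    std::string maxKey;
};

// Two closed ranges overlap iff each starts no later than the other ends.
bool overlaps(const SegmentRange &a, const SegmentRange &b)
{
    return a.minKey <= b.maxKey && b.minKey <= a.maxKey;
}

// True if any two *distinct* segments overlap. Comparing a segment with
// itself would always report an overlap, which is what the fix avoids.
bool anyOverlap(const std::vector<SegmentRange> &segments)
{
    for (std::size_t i = 0; i < segments.size(); ++i)
    {
        for (std::size_t j = i + 1; j < segments.size(); ++j)
        {
            if (overlaps(segments[i], segments[j]))
            {
                return true;
            }
        }
    }
    return false;
}
```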

lib/raft/persistence.h (2)

9-27: Consider adding documentation for the persistence interface.

The class would benefit from documentation explaining what Raft state is persisted and the expected behavior of each method.


18-18: Remove incorrect const qualifier from move assignment operator.

The parameter should be persistence_t && not const persistence_t &&.

Apply this diff:

-    auto operator=(const persistence_t &&) -> persistence_t & = delete;
+    auto operator=(persistence_t &&) -> persistence_t & = delete;
lib/raft/raft_test.cpp (3)

87-92: raftClients() moves elements causing undefined behavior on second call.

Moving elements out of the map leaves them in a moved-from (valid but unspecified) state. Since this method is called twice (once per test), the second invocation operates on moved-from objects.
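One way to make the helper safe to call repeatedly is to construct fresh clients on every call instead of moving them out of stored state. This is a minimal sketch, assuming clients are cheap to build from an endpoint string. FakeClient and ClusterFixture are hypothetical; the real test uses gRPC client objects.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical client type standing in for the gRPC client wrapper.
struct FakeClient
{
    explicit FakeClient(std::string ep) : endpoint{std::move(ep)} {}
    std::string endpoint;
};

class ClusterFixture
{
  public:
    explicit ClusterFixture(std::map<int, std::string> endpoints) : m_endpoints{std::move(endpoints)} {}

    // Builds new clients each time and leaves the fixture untouched,
    // so every test gets a complete, usable set.
    [[nodiscard]] auto raftClients() const -> std::vector<FakeClient>
    {
        std::vector<FakeClient> clients;
        clients.reserve(m_endpoints.size());
        for (const auto &entry : m_endpoints)
        {
            clients.emplace_back(entry.second);
        }
        return clients;
    }

  private:
    std::map<int, std::string> m_endpoints;
};
```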


132-132: Replace fixed sleep with deterministic synchronization.

Using fixed sleep durations makes tests flaky and unnecessarily slow. Consider polling with timeout or using condition variables.

Also applies to: 184-184
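A small polling helper is usually enough to replace the fixed sleeps. It returns as soon as the condition holds and fails only after a generous deadline. waitFor below is a hypothetical helper, not part of GoogleTest.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Polls `predicate` until it returns true or `timeout` elapses.
// Returns the predicate's final result so a test can assert on it.
template <typename Predicate>
bool waitFor(Predicate predicate,
             std::chrono::milliseconds timeout,
             std::chrono::milliseconds pollInterval = std::chrono::milliseconds(10))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline)
    {
        if (predicate())
        {
            return true;
        }
        std::this_thread::sleep_for(pollInterval);
    }
    return predicate(); // one last check at the deadline
}
```

In a test, it would replace a fixed sleep with something like ASSERT_TRUE(waitFor([&] { return countLeaders() == 1; }, std::chrono::seconds(2)));, where countLeaders is a hypothetical helper over the cluster under test.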


168-168: Potential invalid reference in mock setup.

Using SetArgPointee with a reference that may become invalid can cause issues. Consider using a safer approach.

examples/iouring/libiouring/main.cpp (2)

94-100: Fix memory allocation mismatch for remainder block.

This is the same issue flagged in previous reviews. Memory is allocated using the initial blocks count, but blocks is incremented afterward if there's a remainder. This leads to potential buffer overrun.


70-79: Handle partial block in completion callback.

This is the same issue from previous reviews - the completion callback doesn't account for partial blocks when file_sz is not a multiple of BLOCK_SZ.
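Both findings come down to the same arithmetic. The block count, including a partial final block, should be computed once before allocating. The callback should then use each block's actual length rather than BLOCK_SZ. The sketch below shows that computation with a hypothetical blockLengths helper. In the example, the lengths would presumably populate each iov_len, and the file_info allocation would be sized as sizeof(struct file_info) + blocks * sizeof(struct iovec).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits `fileSize` bytes into reads of at most
// `blockSize` bytes. The count is known before any allocation, and the
// last block is shorter when fileSize is not a multiple of blockSize.
std::vector<std::size_t> blockLengths(std::size_t fileSize, std::size_t blockSize)
{
    assert(blockSize > 0);
    const std::size_t blocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
    std::vector<std::size_t> lengths(blocks, blockSize);
    if (const std::size_t remainder = fileSize % blockSize; remainder != 0)
    {
        lengths.back() = remainder; // partial final block
    }
    return lengths;
}
```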

lib/fs/append_only_file.cpp (1)

22-29: Ineffective self-assignment check in move constructor.

The move constructor has a self-assignment check after the member has already been moved in the initializer list, rendering it ineffective. Either remove the check or restructure the constructor.
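Here is a minimal sketch of the restructured constructor, using a hypothetical FdOwner class rather than the project's append-only file class. Taking the descriptor with std::exchange leaves the source in a known empty state. Because a constructor always initializes a new, distinct object, a self-assignment check has nothing to protect against. Move assignment is omitted for brevity.

```cpp
#include <cassert>
#include <unistd.h>
#include <utility>

// Hypothetical owner of a POSIX file descriptor.
class FdOwner
{
  public:
    explicit FdOwner(int fd) noexcept : m_fd{fd} {}

    // Steal the descriptor and mark the source as empty (-1).
    FdOwner(FdOwner &&other) noexcept : m_fd{std::exchange(other.m_fd, -1)} {}

    FdOwner(const FdOwner &) = delete;
    FdOwner &operator=(const FdOwner &) = delete;

    ~FdOwner()
    {
        if (m_fd >= 0)
        {
            ::close(m_fd);
        }
    }

    [[nodiscard]] int fd() const noexcept { return m_fd; }

  private:
    int m_fd{-1};
};
```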

.vscode/launch.json (1)

83-83: Fix JSON syntax errors - remove trailing commas.

Remove the trailing commas after the pipeTransport objects to fix JSON syntax errors:

                 "debuggerPath": "/usr/bin/gdb"
-            },
+            }
         },

Also applies to: 119-119, 164-164, 301-301, 348-348

lib/fs/random_access_file.cpp (3)

38-53: Resource leak in move assignment operator.

Before overwriting m_fd/m_ring, the current object may still own a valid file descriptor and io_uring instance. Overwriting them without closing m_fd and calling io_uring_queue_exit() leaks kernel resources.
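A minimal sketch of the fix follows, using a hypothetical OwnedFd class that owns only a descriptor. Move assignment must release what *this* already owns before adopting the other object's resource. In the real class, that release step would be close(m_fd) followed by io_uring_queue_exit(&m_ring).

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Hypothetical descriptor owner illustrating a leak-free move assignment.
class OwnedFd
{
  public:
    explicit OwnedFd(int fd = -1) noexcept : m_fd{fd} {}
    OwnedFd(OwnedFd &&other) noexcept : m_fd{std::exchange(other.m_fd, -1)} {}

    OwnedFd &operator=(OwnedFd &&other) noexcept
    {
        if (this != &other)
        {
            reset(); // release the current descriptor first, otherwise it leaks
            m_fd = std::exchange(other.m_fd, -1);
        }
        return *this;
    }

    OwnedFd(const OwnedFd &) = delete;
    OwnedFd &operator=(const OwnedFd &) = delete;
    ~OwnedFd() { reset(); }

    void reset() noexcept
    {
        if (m_fd >= 0)
        {
            ::close(m_fd);
            m_fd = -1;
        }
    }

    [[nodiscard]] int get() const noexcept { return m_fd; }

  private:
    int m_fd;
};
```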


87-90: Missing error handling for io_uring operations.

The io_uring_submit and io_uring_wait_cqe calls lack error checking. If either returns a negative value, the code may dereference an uninitialized cqe pointer.


191-191: Undefined constant kBufferSize.

kBufferSize is used but not defined in this translation unit, which will cause a compilation error.

lib/fs/append_only_file.h (2)

42-42: Consider returning a reference or pointer for stream usage.

Returning a std::stringstream by value may be inefficient for large files. Consider returning a reference to an internal stringstream, using a string_view, or providing a streaming interface.


28-28: Address pending reference counting implementation.

The TODO comment in the destructor implementation indicates that reference counting is still missing. This is necessary to correctly manage the resource lifecycle when multiple instances might share the same file descriptor.
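If several instances really must share one descriptor, the reference counting that the TODO mentions can be delegated to std::shared_ptr with a custom deleter instead of being hand-rolled. This is a minimal sketch; SharedFd and makeSharedFd are hypothetical names.

```cpp
#include <cassert>
#include <fcntl.h>
#include <memory>
#include <unistd.h>

// The descriptor is closed exactly once, when the last owner releases it.
using SharedFd = std::shared_ptr<const int>;

SharedFd makeSharedFd(int fd)
{
    return SharedFd{new int{fd}, [](const int *p) {
                        if (*p >= 0)
                        {
                            ::close(*p);
                        }
                        delete p;
                    }};
}
```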

assets/database_config_schema.json (1)

66-84: Missing property definition for "maximumLevels" in the LSM schema.

The "maximumLevels" property is listed as required (line 81), but it's not defined in the LSM properties section. This will cause schema validation failures.

Add the missing property definition after line 71:

                 "minimum": 1
             },
+            "maximumLevels": {
+                "type": "integer",
+                "description": "Maximum number of levels in the LSM tree",
+                "minimum": 1
+            },
             "levelZeroCompaction": {
lib/db/db.h (1)

74-84: Return std::optional<shared_ptr_t> instead of std::optional<db_t>.

Returning std::optional<db_t> by value is inefficient as it requires moving the entire database object. Since you already provide make_shared, consider returning std::optional<shared_ptr_t> for consistency and efficiency.

-    [[nodiscard]] auto build(
-        config::shared_ptr_t   config,
-        db_t::wal_ptr_t        pWal,
-        manifest::shared_ptr_t pManifest,
-        db_t::lsmtree_ptr_t    pLSMTree
-    ) -> std::optional<db_t>
-    {
-        return std::make_optional(
-            db_t{std::move(config), std::move(pWal), std::move(pManifest), std::move(pLSMTree)}
-        );
-    }
+    [[nodiscard]] auto build(
+        config::shared_ptr_t   config,
+        db_t::wal_ptr_t        pWal,
+        manifest::shared_ptr_t pManifest,
+        db_t::lsmtree_ptr_t    pLSMTree
+    ) -> std::optional<shared_ptr_t>
+    {
+        return make_shared(
+            std::move(config), std::move(pWal), std::move(pManifest), std::move(pLSMTree)
+        );
+    }
lib/raft/raft.h (2)

229-236: Initialize member variables.

These members are declared but not initialized. Even if they're initialized in the constructor, it's better practice to provide default values here to ensure they're never uninitialized.

-    uint32_t m_currentTerm ABSL_GUARDED_BY(m_stateMutex);
-    uint32_t m_votedFor    ABSL_GUARDED_BY(m_stateMutex);
+    uint32_t m_currentTerm ABSL_GUARDED_BY(m_stateMutex){0};
+    uint32_t m_votedFor    ABSL_GUARDED_BY(m_stateMutex){gInvalidId};
     wal_ptr_t m_log        ABSL_GUARDED_BY(m_stateMutex);

     // Volatile state on all servers.
-    uint32_t m_commitIndex ABSL_GUARDED_BY(m_stateMutex);
-    uint32_t m_lastApplied ABSL_GUARDED_BY(m_stateMutex);
-    NodeState m_state      ABSL_GUARDED_BY(m_stateMutex);
+    uint32_t m_commitIndex ABSL_GUARDED_BY(m_stateMutex){0};
+    uint32_t m_lastApplied ABSL_GUARDED_BY(m_stateMutex){0};
+    NodeState m_state      ABSL_GUARDED_BY(m_stateMutex){NodeState::FOLLOWER};

251-251: Make m_shutdown thread-safe.

The m_shutdown flag is accessed from multiple threads but is neither atomic nor protected by a mutex, which is a data race.

-    bool m_shutdown{false};
+    std::atomic<bool> m_shutdown{false};
lib/raft/raft.cpp (5)

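156-156: Consider passing nodeConfig by const reference.

Note that the by-value parameter is already moved into m_config. Switching to a const reference therefore saves only a move when callers pass an lvalue, and it adds a copy when callers pass a temporary. Keep the current by-value sink if most call sites pass rvalues.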

-consensus_module_t::consensus_module_t(
-    node_config_t                        nodeConfig,
+consensus_module_t::consensus_module_t(
+    const node_config_t&                 nodeConfig,
     std::vector<raft_node_grpc_client_t> replicas,
     wal_ptr_t                            pLog,
     on_commit_cbk_t                      onCommitCbk
 ) noexcept
-    : m_config{std::move(nodeConfig)},
+    : m_config{nodeConfig},

820-820: Critical: Implement state machine updates for committed entries.

The TODO indicates that committed log entries are never applied to the state machine. This is a critical issue that prevents the Raft implementation from having any real effect.

-spdlog::info("TODO(lnikon): Apply to state machine here");
+const auto& entry = m_log->read(m_lastApplied - 1);
+if (entry.has_value()) {
+    m_onCommitCbk(entry.value());
+}
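Raft's rule for all servers is: whenever commitIndex > lastApplied, increment lastApplied and apply log[lastApplied], repeating until the two match. If commitIndex can advance by more than one entry at a time, the apply step therefore needs to loop rather than apply a single entry. The sketch below uses a simplified, hypothetical AppliedState. In the real module, the callback would presumably be m_onCommitCbk and the reads would go through m_log under m_stateMutex.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified state. Raft log indices are 1-based,
// so entry i is stored at log[i - 1].
struct AppliedState
{
    std::vector<std::string> log;
    std::uint32_t            commitIndex{0};
    std::uint32_t            lastApplied{0};
};

// Applies every entry in (lastApplied, commitIndex], in order, exactly once.
void applyCommitted(AppliedState &s, const std::function<void(const std::string &)> &onCommit)
{
    assert(s.commitIndex <= s.log.size());
    while (s.commitIndex > s.lastApplied)
    {
        ++s.lastApplied;
        onCommit(s.log[s.lastApplied - 1]); // 1-based index -> 0-based vector
    }
}
```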

619-619: Fix data race on m_shutdown variable.

The m_shutdown variable is accessed from multiple threads without proper synchronization. It's written under lock in stop() but read without lock in runHeartbeatThread and runElectionThread.

+std::atomic<bool> m_shutdown{false};

Then remove the lock when setting it in stop():

-{
-    absl::WriterMutexLock locker{&m_stateMutex};
-    m_shutdown = true;
-}
+m_shutdown.store(true);

Also applies to: 862-862
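A minimal sketch of the pattern follows, with a hypothetical Worker class standing in for the heartbeat and election threads. The loop reads the flag without a lock, and stop() sets the flag and joins. If those threads sleep in a timed condition-variable wait, stop() may also need to notify them after setting the flag, so that shutdown does not wait out the full timeout.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical background worker: the loop reads m_shutdown without a mutex,
// which is race-free only because the flag is std::atomic.
class Worker
{
  public:
    void start()
    {
        m_thread = std::thread{[this] {
            while (!m_shutdown.load(std::memory_order_acquire))
            {
                m_iterations.fetch_add(1, std::memory_order_relaxed);
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        }};
    }

    void stop()
    {
        m_shutdown.store(true, std::memory_order_release);
        if (m_thread.joinable())
        {
            m_thread.join();
        }
    }

    ~Worker() { stop(); }

    [[nodiscard]] int iterations() const noexcept { return m_iterations.load(); }

  private:
    std::atomic<bool> m_shutdown{false};
    std::atomic<int>  m_iterations{0};
    std::thread       m_thread;
};
```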


1134-1134: Remove the line that overrides the loaded votedFor value.

This line defeats the purpose of persisting votedFor and breaks Raft protocol correctness:

 ifs >> m_commitIndex >> m_votedFor;
-m_votedFor = 0;

1108-1108: Add fsync for true durability.

The file is flushed but not synced to disk. For Raft safety, add fsync:

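-return !fsa.flush().fail();
+if (fsa.flush().fail()) {
+    return false;
+}
+// std::ofstream does not expose its file descriptor, so close the stream and
+// fsync the file through a POSIX descriptor (needs <fcntl.h> and <unistd.h>).
+fsa.close();
+if (fsa.fail()) {
+    return false;
+}
+const int fd = ::open(path.c_str(), O_WRONLY);
+if (fd < 0) {
+    return false;
+}
+const bool synced = ::fsync(fd) == 0;
+::close(fd);
+return synced;

Note: path stands for whatever path the stream was opened with; adjust it to your stream implementation.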
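For comparison, here is a self-contained sketch of a durable write that uses POSIX calls directly, which avoids the missing-descriptor problem entirely. writeDurably is a hypothetical helper, not part of the project. It reports success only after fsync() and close() both succeed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string_view>
#include <unistd.h>

// Hypothetical helper: writes `data` to `path` (truncating it) and returns
// true only once fsync() reports the data has reached stable storage.
bool writeDurably(const char *path, std::string_view data)
{
    const int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
    {
        return false;
    }
    std::size_t written = 0;
    while (written < data.size())
    {
        const ssize_t n = ::write(fd, data.data() + written, data.size() - written);
        if (n < 0)
        {
            if (errno == EINTR)
            {
                continue; // interrupted by a signal, retry
            }
            ::close(fd);
            return false;
        }
        written += static_cast<std::size_t>(n);
    }
    const bool synced = ::fsync(fd) == 0;
    return ::close(fd) == 0 && synced;
}
```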

🧹 Nitpick comments (42)
.gitignore (2)

34-41: Consider including environment and OS-specific files in ignore patterns
This block adds generic backups and logs, but common local files like .env (environment variables) and OS metadata (.DS_Store, Thumbs.db) are missing. Including them prevents sensitive or noise files from being committed.

Proposed diff:

  *.user
  *.vim
  *.Vim
+ # Local environment and OS metadata
+ *.env
+ *.env.local
+ .DS_Store
+ Thumbs.db

42-48: Refine IDE and directory patterns and avoid overly generic names
Ignoring .idea/ and .fleet/ is useful, but also consider VSCode (.vscode/) and Visual Studio (.vs/). Additionally, verify that entries like var/, tmp/, and current/ match folders the build actually generates, so they do not mask legitimate project directories.

Proposed diff:

  .idea/
  .fleet/
  .cache/
+ .vscode/
+ .vs/
  build/
- var
- tmp
- current
+ # Only if generated at the repository root
+ /var/
+ # Ensure this matches the project structure
+ tmp/
+ # Verify necessity
+ current/
lib/structures/lsmtree/lsmtree_reader_writer.h (2)

8-8: Fix typo and use professional language in TODO comment.

The TODO comment contains a typo and unprofessional language that should be addressed.

Apply this diff to fix the comment:

-// TOD(vahag): WTF is this file?
+// TODO(vahag): What is the purpose of this file?

10-48: Consider removing or implementing placeholder code.

This file contains only empty stub implementations with no actual functionality. Consider either implementing the intended functionality or removing unused placeholder code to reduce codebase clutter.

If these structs are intended for future implementation, consider adding clear documentation about their intended purpose and functionality.

DESIGN.md (1)

3-3: Fix missing article for better readability.

-Purpose of this document is to describe design decisions made while developing TinyKVPP and at least try to give some rationale behind them.
+The purpose of this document is to describe design decisions made while developing TinyKVPP and at least try to give some rationale behind them.
lib/server/CMakeLists.txt (1)

1-2: Optional: Specify project languages
Consider declaring languages explicitly (e.g., project(frankie LANGUAGES C CXX)) to avoid unintended default language settings.

examples/embedded/CMakeLists.txt (1)

1-2: Optional: Specify project languages
You might add LANGUAGES C CXX to the project() call to explicitly set the intended build languages.

lib/structures/lsmtree/compaction/compactation.h (1)

1-6: Optional: Correct naming and clean up
The file and struct are spelled compactation instead of compaction, and the compactation_t struct is empty. Consider renaming for clarity and removing or populating the struct to reduce confusion.

.clang-tidy (1)

12-12: Consider specifying a style or integrating clang-format
With FormatStyle: none, clang-tidy applies fixes without reformatting the code around them. The repository already has a .clang-format file, so consider FormatStyle: file to make fixes follow the project's style, or name a built-in style such as google or llvm.

lib/structures/lsmtree/compaction/level_zero_compactation.h (1)

5-8: Optional: Remove or populate empty namespace
The level_zero_compactation namespace is currently empty. Consider removing this header until functionality is added or outlining the intended interface to clarify its purpose.

lib/structures/lsmtree/lsmtree_compactor.h (1)

1-6: Clarify the purpose of this empty compactor header.

The file now contains only an empty namespace after removing all commented code. For a compactor component in an LSM tree implementation, this seems incomplete.

Please clarify:

  • Is the compactor functionality implemented elsewhere?
  • Is this header still needed, or should it be removed?
  • Is this a work-in-progress that needs implementation?
#!/bin/bash
# Description: Check if compactor functionality exists elsewhere and if this header is referenced
# Expected: Find actual compactor implementation or references to this header

echo "Searching for compactor implementations and references..."

# Search for compactor-related code in the LSM tree module
rg -i "compact" lib/structures/lsmtree/ -A 3 -B 1

# Search for includes of this header file
rg "#include.*lsmtree_compactor" --type cpp --type c

# Search for compactor class/struct definitions
ast-grep --pattern 'class $_ {
  $$$
  compact$_($$$) {
    $$$
  }
  $$$
}'

ast-grep --pattern 'struct $_ {
  $$$
  compact$_($$$) {
    $$$
  }
  $$$
}'
lib/raft/CMakeLists.txt (2)

4-13: Specify link scope for Raft library dependencies.

Consider marking each dependency as PRIVATE or PUBLIC to clarify whether they’re part of Raft’s public interface or only needed internally.


15-18: Scope test dependencies explicitly.

For RaftTest, use target_link_libraries(RaftTest PRIVATE ...) to ensure these dependencies don’t propagate beyond the test target.

lib/posix_wrapper/posix_wrapper_test.cpp (1)

7-17: Consider adding test for bitwise operations.

The comments suggest the enum supports bitwise operations, but the current test only verifies individual flag mappings. Consider adding tests for combined flags.

Add test cases for bitwise combinations:

TEST(PosixWrapperTest, OpenFlagCombinations)
{
    EXPECT_EQ(pw::to_native(pw::open_flag_k::kReadWrite | pw::open_flag_k::kCreate), 
              O_RDWR | O_CREAT);
    EXPECT_EQ(pw::to_native(pw::open_flag_k::kWriteOnly | pw::open_flag_k::kAppend), 
              O_WRONLY | O_APPEND);
}
assets/tkvpp_config_standalone.json (4)

2-4: Suggest enhancing log output configuration.
Consider extending the logging section to specify log output destinations (file paths, rotation policies) for better observability in standalone mode.


5-10: Consolidate WAL filename settings.
The walFilename under database duplicates the filename in the wal section. Consider removing the redundant walFilename key and relying solely on the wal section to avoid confusion.


16-27: Validate compaction thresholds for performance tuning.
The flushThreshold and both compaction thresholds are set to 1024 in standalone mode. Consider reviewing these values against production workloads or making them configurable per environment.


28-38: Use explicit peer addresses instead of wildcards.
Binding to 0.0.0.0 is appropriate for listening, but using 0.0.0.0 in the peers list may not be routable by clients. Replace these entries with actual hostnames or IPs (e.g., 127.0.0.1:8081) and ensure the self-address aligns with id: 1.

lib/structures/lsmtree/CMakeLists.txt (2)

3-13: Prefer listing only source files.
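CMake tracks header dependencies automatically during the build, so listing header files (e.g., segments/uuid.h, segments/types.h) in add_library is optional, although some IDE generators use the list to show headers in the project. Consider removing headers from add_library to simplify maintenance.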


19-21: Review linkage of header-only dependencies.
Libraries like magic_enum are header-only, so linking them mainly propagates include paths and compile definitions. Link them PUBLIC only if the library's public headers include them. If only the .cpp files do, PRIVATE is enough and keeps the dependency from leaking to consumers.

.vscode/settings.json (1)

17-17: Avoid hardcoded clangd path.
A fixed clangd.path may not be portable across environments. Consider relying on PATH or using a workspace setting that adapts to different developer machines.

assets/tkvpp_config_3.json (3)

2-4: Consider increasing log verbosity for replication.
In replicated mode, setting loggingLevel to "info" may omit important Raft state transitions. You might want "debug" for deeper insights during development.


16-27: LSM thresholds consistency check.
The LSM thresholds mirror those in standalone mode. If compaction behavior differs under Replicated WAL, consider tuning these values independently.


28-38: Specify reachable peer endpoints.
Using 0.0.0.0 for peers can cause connection issues. Replace with actual host addresses or service names, and ensure that each peer’s id correctly maps to its index in this list.

lib/fs/common.cpp (3)

8-8: Remove redundant namespace qualification.

Inside the fs namespace, you don't need to qualify file_error_t with fs::.

-[[nodiscard]] auto fs::file_error_t::has_error() const noexcept -> bool
+[[nodiscard]] auto file_error_t::has_error() const noexcept -> bool

13-18: Remove redundant namespace qualification.

Same issue as above - remove the fs:: prefix since we're already in the fs namespace.

-auto fs::file_error_t::from_errno(file_error_code_k code, int err, const char *context) noexcept
+auto file_error_t::from_errno(file_error_code_k code, int err, const char *context) noexcept

19-22: Remove redundant namespace qualification.

Apply the same fix here.

-auto fs::file_error_t::success() noexcept -> file_error_t
+auto file_error_t::success() noexcept -> file_error_t
lib/server/grpc_server.cpp (1)

32-44: Simplify status string construction.

The explicit std::string construction is unnecessary. Consider using string literals directly:

-        std::string status;
-        if (!m_database->put(
-                structures::lsmtree::key_t{pRequest->key()},
-                structures::lsmtree::value_t{pRequest->value()},
-                db::db_put_context_k::replicate_k
-            ))
-        {
-            status = std::string("Request failed");
-        }
-        else
-        {
-            status = std::string("OK");
-        }
-
-        pResponse->set_status(status);
+        const bool success = m_database->put(
+            structures::lsmtree::key_t{pRequest->key()},
+            structures::lsmtree::value_t{pRequest->value()},
+            db::db_put_context_k::replicate_k
+        );
+        
+        pResponse->set_status(success ? "OK" : "Request failed");
lib/db/db_config.h (1)

3-6: Verify necessity of both assert libraries

You're including both libassert/assert.hpp and <cassert>. Consider if both are necessary or if you can standardize on one assertion library.

lib/proto/CMakeLists.txt (2)

40-49: Eliminate duplicate variable definitions and checks

The PROTO_BINARY_DIR, PROTO_IMPORT_DIRS, and gRPC plugin check are unnecessarily duplicated. Consider refactoring to avoid repetition.

-set(PROTO_BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}")
-set(PROTO_IMPORT_DIRS "${CMAKE_CURRENT_LIST_DIR}")
-
-add_library(RaftProtoObjects OBJECT "${RAFT_PROTO_FILE}")
-target_include_directories(RaftProtoObjects PUBLIC "$<BUILD_INTERFACE:${PROTO_BINARY_DIR}>" ${protobuf_INCLUDE_DIR})
-target_link_libraries(RaftProtoObjects PUBLIC protobuf::libprotobuf gRPC::grpc++)
-
-if(NOT TARGET gRPC::grpc_cpp_plugin)
-  message(FATAL_ERROR "gRPC C++ plugin not found. Please gRPC is properly installed.")
-endif()
+add_library(RaftProtoObjects OBJECT "${RAFT_PROTO_FILE}")
+target_include_directories(RaftProtoObjects PUBLIC "$<BUILD_INTERFACE:${PROTO_BINARY_DIR}>" ${protobuf_INCLUDE_DIR})
+target_link_libraries(RaftProtoObjects PUBLIC protobuf::libprotobuf gRPC::grpc++)

51-62: Consider creating a function to reduce protobuf generation duplication

The protobuf generation logic is duplicated for both proto files. Consider creating a CMake function to handle this pattern.

Example approach:

function(add_proto_library TARGET_NAME PROTO_FILE)
    add_library(${TARGET_NAME} OBJECT "${PROTO_FILE}")
    target_include_directories(${TARGET_NAME} PUBLIC "$<BUILD_INTERFACE:${PROTO_BINARY_DIR}>" ${protobuf_INCLUDE_DIR})
    target_link_libraries(${TARGET_NAME} PUBLIC protobuf::libprotobuf gRPC::grpc++)
    
    protobuf_generate(TARGET ${TARGET_NAME} IMPORT_DIRS ${PROTO_IMPORT_DIRS} PROTOC_OUT_DIR "${PROTO_BINARY_DIR}")
    protobuf_generate(TARGET ${TARGET_NAME} LANGUAGE grpc PLUGIN protoc-gen-grpc=$<TARGET_FILE:gRPC::grpc_cpp_plugin> PLUGIN_OPTIONS generate_mock_code=true GENERATE_EXTENSIONS .grpc.pb.h .grpc.pb.cc)
endfunction()
lib/proto/Raft.proto (1)

20-24: Consider adding missing field number 3 in LogEntry.

The LogEntry message skips field number 3, jumping from index (2) to payload (4). While not incorrect, it's unusual and may indicate a missing field.

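If field 3 was intentionally omitted, consider adding a comment explaining why, or reserving the number with a reserved 3; statement. Field numbers define the wire format, so renumber the fields consecutively only if no LogEntry messages have yet been serialized or exchanged with the current numbering.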

lib/db/db.cpp (1)

64-64: Missing mutex implementation for thread safety.

The TODO comments indicate missing write and read mutexes, which could lead to data races in concurrent environments.

Would you like me to generate the mutex implementation to ensure thread-safe access to the database?

Also applies to: 184-184
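Here is a minimal sketch of reader/writer locking for the put and get paths. It uses std::shared_mutex and a hypothetical GuardedStore in place of db_t. The Raft code already uses absl::Mutex, and its WriterMutexLock and ReaderMutexLock guards would give the same structure.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the database facade: writers lock exclusively,
// readers share the lock, so concurrent get() calls do not block each other
// and never observe a half-applied put().
class GuardedStore
{
  public:
    void put(const std::string &key, std::string value)
    {
        std::unique_lock lock{m_mutex};
        m_data[key] = std::move(value);
    }

    [[nodiscard]] auto get(const std::string &key) const -> std::optional<std::string>
    {
        std::shared_lock lock{m_mutex};
        if (auto it = m_data.find(key); it != m_data.end())
        {
            return it->second;
        }
        return std::nullopt;
    }

  private:
    mutable std::shared_mutex          m_mutex;
    std::map<std::string, std::string> m_data;
};
```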

lib/fs/random_access_file.cpp (1)

253-253: Use consistent logging throughout the codebase.

Replace std::cerr with spdlog::error for consistency with the rest of the codebase:

-        std::cerr << "errorno: " << strerror(errno) << '\n';
+        spdlog::error("errno: {}", strerror(errno));
lib/structures/lsmtree/lsmtree.cpp (2)

41-43: Clean up misleading comments about thread moveability.

The comments suggest that m_flushing_thread cannot be moved, but std::jthread (added in C++20) is move-constructible and move-assignable. The current implementation correctly handles the move by stopping and recreating the thread, but the comments should be updated or removed to avoid confusion.

-//   m_flushing_thread(std::move(other.m_flushing_thread)),
-//   m_flushing_queue(std::move(other.m_flushing_queue))

99-105: Consider implementing write throttling to reduce latency spikes.

The TODO correctly identifies that flushing can cause periodic latencies. Consider implementing write throttling or a double-buffering scheme where writes go to a new memtable while the old one is being flushed.
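The sketch below illustrates the double-buffering idea. std::map stands in for the memtable, and a caller-supplied sink stands in for segment writes; DoubleBufferedMemtable and its methods are hypothetical. Reads, omitted here, would check the active table first and then the immutable one. When rotate() finds a flush still pending, that is the point where write throttling would apply.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: writes go to the active table; a full table is swapped
// into the immutable slot under a short lock, and the slow flush then runs
// without blocking writers.
class DoubleBufferedMemtable
{
  public:
    using table_t = std::map<std::string, std::string>;

    void put(std::string key, std::string value)
    {
        std::lock_guard lock{m_mutex};
        (*m_active)[std::move(key)] = std::move(value);
    }

    // Installs a fresh active table. Returns false if the previous flush has
    // not finished yet (the point where writes would be throttled).
    bool rotate()
    {
        std::lock_guard lock{m_mutex};
        if (m_immutable)
        {
            return false;
        }
        m_immutable = std::exchange(m_active, std::make_shared<table_t>());
        return true;
    }

    // Hands the immutable table to `sink` without holding the lock.
    template <typename Sink> void flush(Sink sink)
    {
        std::shared_ptr<const table_t> toFlush;
        {
            std::lock_guard lock{m_mutex};
            toFlush = m_immutable;
        }
        if (toFlush)
        {
            sink(*toFlush); // slow I/O happens outside m_mutex
            std::lock_guard lock{m_mutex};
            m_immutable.reset();
        }
    }

  private:
    std::mutex                     m_mutex;
    std::shared_ptr<table_t>       m_active{std::make_shared<table_t>()};
    std::shared_ptr<const table_t> m_immutable;
};
```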

lib/structures/lsmtree/lsmtree.h (1)

81-97: Well-structured builder pattern with one suggestion.

The builder pattern effectively separates construction concerns. Consider marking the builder methods as const noexcept where appropriate for consistency.

 [[nodiscard]] auto
 build_memtable_from_wal(wal::shared_ptr_t<wal::wal_entry_t> pWal) const noexcept
     -> std::optional<memtable::memtable_t>;

 [[nodiscard]] auto build_levels_from_manifest(
     config::shared_ptr_t pConfig, db::manifest::shared_ptr_t pManifest
 ) const noexcept -> std::optional<levels::levels_t>;
lib/raft/raft.cpp (1)

42-46: Consider making timeout values configurable.

The hardcoded timeout values (150-300ms) should be configurable to support different network conditions and deployment scenarios.

-constexpr const int minTimeout{150};
-constexpr const int maxTimeout{300};
+const int minTimeout = m_config.m_minElectionTimeout;
+const int maxTimeout = m_config.m_maxElectionTimeout;
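Here is a minimal sketch of a configurable, randomized election timeout. ElectionTimeoutConfig and nextElectionTimeout are hypothetical names. The defaults keep the current 150-300 ms range, which is also the example range given in the Raft paper. Drawing a fresh value for each election makes it less likely that followers time out together and split the vote.

```cpp
#include <cassert>
#include <chrono>
#include <random>
#include <stdexcept>

// Hypothetical config fields standing in for m_minElectionTimeout and
// m_maxElectionTimeout; defaults mirror the current hardcoded values.
struct ElectionTimeoutConfig
{
    std::chrono::milliseconds minTimeout{150};
    std::chrono::milliseconds maxTimeout{300};
};

// Draws a randomized election timeout in [minTimeout, maxTimeout].
std::chrono::milliseconds nextElectionTimeout(const ElectionTimeoutConfig &cfg, std::mt19937 &rng)
{
    if (cfg.minTimeout <= std::chrono::milliseconds::zero() || cfg.minTimeout > cfg.maxTimeout)
    {
        throw std::invalid_argument("election timeout range must satisfy 0 < min <= max");
    }
    std::uniform_int_distribution<std::chrono::milliseconds::rep> dist{cfg.minTimeout.count(),
                                                                       cfg.maxTimeout.count()};
    return std::chrono::milliseconds{dist(rng)};
}
```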
lib/db/manifest/manifest.h (4)

40-50: Fix typo in error message.

             default:
-                return {"unkown op"};
+                return {"unknown op"};

105-118: Fix typo in error message.

             default:
-                return {"unkown op"};
+                return {"unknown op"};

160-168: Add documentation for public methods.

The refactored constructor and [[nodiscard]] attributes look good. However, the enable() and disable() methods lack documentation.

Consider adding documentation:

+    /**
+     * @brief Enable manifest recording. When enabled, new records will be persisted to WAL.
+     */
     void enable();
+    /**
+     * @brief Disable manifest recording. When disabled, new records will not be persisted.
+     */
     void disable();

234-238: Add documentation for manifest_builder_t.

The builder pattern with optional return type is a good design choice. Consider adding documentation to clarify its purpose.

+/**
+ * @brief Builder for creating manifest instances with proper initialization and error handling.
+ */
 struct manifest_builder_t final
 {
+    /**
+     * @brief Build a manifest instance from the given path and WAL.
+     * @param path The filesystem path for the manifest
+     * @param wal The write-ahead log instance for persistence
+     * @return Optional manifest instance, empty on failure
+     */
     [[nodiscard]] auto build(fs::path_t path, wal::wal_t<manifest_t::record_t> wal)
         -> std::optional<manifest_t>;
 };
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2173a93 and 14de60c.

📒 Files selected for processing (107)
  • .clang-format (2 hunks)
  • .clang-tidy (2 hunks)
  • .gitignore (1 hunk)
  • .vscode/launch.json (3 hunks)
  • .vscode/settings.json (1 hunk)
  • CMakeLists.txt (3 hunks)
  • DESIGN.md (1 hunk)
  • assets/database_config_schema.json (4 hunks)
  • assets/tkvpp_config.json (0 hunks)
  • assets/tkvpp_config_1.json (1 hunk)
  • assets/tkvpp_config_2.json (1 hunk)
  • assets/tkvpp_config_3.json (1 hunk)
  • assets/tkvpp_config_replicated_single_node.json (1 hunk)
  • assets/tkvpp_config_standalone.json (1 hunk)
  • bench/CMakeLists.txt (1 hunk)
  • bench/db/CMakeLists.txt (1 hunk)
  • bench/db/basic_bench.cpp (1 hunk)
  • bench/db/celero_bench.cpp (2 hunks)
  • bench/db/gbench_bench.cpp (2 hunks)
  • bench/fs/CMakeLists.txt (1 hunk)
  • bench/fs/lots_of_write.cpp (1 hunk)
  • bench/fs/open_write_streams.cpp (3 hunks)
  • conan/profiles/debug-clang (1 hunk)
  • conan/profiles/debug-clang-tsan (2 hunks)
  • conan/profiles/debug-gcc (1 hunk)
  • conan/profiles/release-clang (1 hunk)
  • conan/profiles/release-gcc (1 hunk)
  • conanfile.txt (1 hunk)
  • examples/CMakeLists.txt (1 hunk)
  • examples/absl/CMakeLists.txt (1 hunk)
  • examples/absl/absl_try_mutex.cpp (1 hunk)
  • examples/absl/concurrency/CMakeLists.txt (1 hunk)
  • examples/embedded/CMakeLists.txt (1 hunk)
  • examples/iouring/libiouring/main.cpp (1 hunk)
  • examples/iouring/raw_uring/main.cpp (1 hunk)
  • examples/prometheus/CMakeLists.txt (1 hunk)
  • examples/prometheus/main.cpp (1 hunk)
  • examples/raft/CMakeLists.txt (1 hunk)
  • examples/raft/main.cpp (1 hunk)
  • lib/CMakeLists.txt (2 hunks)
  • lib/concurrency/CMakeLists.txt (1 hunk)
  • lib/concurrency/helpers.h (1 hunk)
  • lib/concurrency/thread_safe_queue.h (5 hunks)
  • lib/config/CMakeLists.txt (1 hunk)
  • lib/config/config.cpp (1 hunk)
  • lib/config/config.h (2 hunks)
  • lib/db/CMakeLists.txt (1 hunk)
  • lib/db/db.cpp (3 hunks)
  • lib/db/db.h (1 hunk)
  • lib/db/db_config.cpp (1 hunk)
  • lib/db/db_config.h (1 hunk)
  • lib/db/db_test.cpp (1 hunk)
  • lib/db/manifest/manifest.cpp (1 hunk)
  • lib/db/manifest/manifest.h (6 hunks)
  • lib/db/wal/wal.cpp (0 hunks)
  • lib/db/wal/wal.h (0 hunks)
  • lib/fs/CMakeLists.txt (1 hunk)
  • lib/fs/append_only_file.cpp (1 hunk)
  • lib/fs/append_only_file.h (1 hunk)
  • lib/fs/common.cpp (1 hunk)
  • lib/fs/common.h (1 hunk)
  • lib/fs/random_access_file.cpp (1 hunk)
  • lib/fs/random_access_file.h (1 hunk)
  • lib/fs/random_access_file_test.cpp (1 hunk)
  • lib/posix_wrapper/CMakeLists.txt (1 hunk)
  • lib/posix_wrapper/open_flag.h (1 hunk)
  • lib/posix_wrapper/posix_wrapper_test.cpp (1 hunk)
  • lib/proto/CMakeLists.txt (2 hunks)
  • lib/proto/Raft.proto (1 hunk)
  • lib/raft/CMakeLists.txt (1 hunk)
  • lib/raft/persistence.h (1 hunk)
  • lib/raft/raft.cpp (1 hunk)
  • lib/raft/raft.h (1 hunk)
  • lib/raft/raft_test.cpp (1 hunk)
  • lib/server/CMakeLists.txt (1 hunk)
  • lib/server/grpc_server.cpp (4 hunks)
  • lib/server/grpc_server.h (1 hunk)
  • lib/server/server_config.h (1 hunk)
  • lib/server/server_factory.h (1 hunk)
  • lib/server/server_kind.cpp (1 hunk)
  • lib/server/server_kind.h (1 hunk)
  • lib/structures/CMakeLists.txt (1 hunk)
  • lib/structures/hashindex/CMakeLists.txt (1 hunk)
  • lib/structures/lsmtree/CMakeLists.txt (1 hunk)
  • lib/structures/lsmtree/compaction/compactation.h (1 hunk)
  • lib/structures/lsmtree/compaction/compaction_trigger.h (1 hunk)
  • lib/structures/lsmtree/compaction/level_zero_compactation.h (1 hunk)
  • lib/structures/lsmtree/levels/level.cpp (12 hunks)
  • lib/structures/lsmtree/levels/level.h (3 hunks)
  • lib/structures/lsmtree/levels/levels.cpp (8 hunks)
  • lib/structures/lsmtree/levels/levels.h (2 hunks)
  • lib/structures/lsmtree/lsmtree.cpp (5 hunks)
  • lib/structures/lsmtree/lsmtree.h (1 hunk)
  • lib/structures/lsmtree/lsmtree_compactor.h (1 hunk)
  • lib/structures/lsmtree/lsmtree_config.h (1 hunk)
  • lib/structures/lsmtree/lsmtree_reader_writer.h (1 hunk)
  • lib/structures/lsmtree/lsmtree_test.cpp (4 hunks)
  • lib/structures/lsmtree/lsmtree_types.h (1 hunk)
  • lib/structures/lsmtree/segments/helpers.cpp (1 hunk)
  • lib/structures/lsmtree/segments/helpers.h (2 hunks)
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp (3 hunks)
  • lib/structures/lsmtree/segments/lsmtree_regular_segment.h (2 hunks)
  • lib/structures/lsmtree/segments/lsmtree_segment_factory.h (1 hunk)
  • lib/structures/lsmtree/segments/lsmtree_segment_index.h (1 hunk)
  • lib/structures/lsmtree/segments/segment_storage.cpp (2 hunks)
  • lib/structures/lsmtree/segments/segment_storage.h (2 hunks)
  • lib/structures/lsmtree/segments/types.h (1 hunk)
⛔ Files not processed due to max files limit (33)
  • lib/structures/lsmtree/segments/uuid.h
  • lib/structures/memtable/CMakeLists.txt
  • lib/structures/memtable/memtable.cpp
  • lib/structures/memtable/memtable.h
  • lib/structures/memtable/memtable_reader_writer.h
  • lib/structures/memtable/memtable_test.cpp
  • lib/structures/skiplist/CMakeLists.txt
  • lib/structures/skiplist/skiplist.h
  • lib/structures/skiplist/skiplist_test.cpp
  • lib/structures/sorted_vector/CMakeLists.txt
  • lib/structures/sorted_vector/sorted_vector.h
  • lib/structures/sorted_vector/sorted_vector_test.cpp
  • lib/wal/CMakeLists.txt
  • lib/wal/common.cpp
  • lib/wal/common.h
  • lib/wal/config.h
  • lib/wal/log/concepts.h
  • lib/wal/log/log.h
  • lib/wal/log/storage/backend/append_only_file_storage_backend.h
  • lib/wal/log/storage/backend/backend.h
  • lib/wal/log/storage/builder.h
  • lib/wal/log/storage/in_memory_log_storage.h
  • lib/wal/log/storage/log_storage_crtp.h
  • lib/wal/log/storage/persistent_log_storage.h
  • lib/wal/log/storage/replicated_log_storage.h
  • lib/wal/wal.h
  • main_run_replicas.sh
  • raft_run_replicas.sh
  • requirements.txt
  • src/CMakeLists.txt
  • src/config.cpp
  • src/config.h
  • src/main.cpp
💤 Files with no reviewable changes (3)
  • assets/tkvpp_config.json
  • lib/db/wal/wal.cpp
  • lib/db/wal/wal.h
🧰 Additional context used
🪛 LanguageTool
DESIGN.md

[uncategorized] ~2-~2: Possible missing article found.
Context: # Design Document Purpose of this document is to describe design ...

(AI_HYDRA_LEO_MISSING_THE)

🪛 Buf (1.54.0)
lib/proto/Raft.proto

1-1: Files must have a package defined.

(PACKAGE_DEFINED)

🪛 Biome (1.9.4)
.vscode/launch.json

[error] 83-83: Expected a property but instead found '}'.

Expected a property here.

(parse)


[error] 119-119: Expected a property but instead found '}'.

Expected a property here.

(parse)


[error] 164-164: Expected a property but instead found '}'.

Expected a property here.

(parse)


[error] 301-301: Expected a property but instead found '}'.

Expected a property here.

(parse)


[error] 348-348: Expected a property but instead found '}'.

Expected a property here.

(parse)

🔇 Additional comments (125)
lib/structures/lsmtree/compaction/compaction_trigger.h (1)

1-1: Replace include guards with #pragma once
Switching to #pragma once simplifies the header and avoids potential macro collisions. The change here is spot-on.

lib/structures/lsmtree/lsmtree_reader_writer.h (1)

1-1: LGTM - Good modernization of include guard.

The replacement of traditional header guards with #pragma once is a good practice that simplifies maintenance and improves readability.

bench/db/basic_bench.cpp (1)

32-33: LGTM - Good formatting improvement.

The multi-line formatting of the emplace_back call improves readability without changing functionality.

bench/fs/lots_of_write.cpp (1)

17-18: LGTM - Good formatting improvement.

The multi-line formatting of the std::fstream constructor improves readability of the file open flags without changing functionality.

examples/absl/CMakeLists.txt (1)

2-2: LGTM - Consistent project naming.

The project name update from "zkv" to "frankie" aligns with the project-wide renaming initiative.

lib/structures/hashindex/CMakeLists.txt (1)

2-2: LGTM - Consistent project naming.

The project name update from "zkv" to "frankie" maintains consistency with the project-wide renaming initiative.

lib/structures/CMakeLists.txt (1)

2-2: Verify project-wide consistency of the name change.

The project name change from zkv to frankie looks correct. Please ensure this renaming is consistent across all CMakeLists.txt files and any other configuration files that reference the project name.

#!/bin/bash
# Description: Verify that all CMakeLists.txt files use the consistent project name "frankie"
# and check for any remaining references to the old name "zkv"

echo "=== Checking all CMakeLists.txt files for project name consistency ==="
fd "CMakeLists.txt" --exec grep -H "project(" {}

echo -e "\n=== Searching for any remaining references to 'zkv' ==="
rg -i "zkv" --type cmake
lib/structures/lsmtree/segments/lsmtree_segment_index.h (1)

1-1: Good modernization with #pragma once.

Replacing traditional include guards with #pragma once is a good practice that simplifies header maintenance and reduces the risk of naming conflicts.

lib/structures/lsmtree/segments/helpers.h (2)

1-1: Good modernization with #pragma once.

Replacing traditional include guards with #pragma once is a good practice that simplifies header maintenance.


17-18: Improved function declaration formatting.

The reformatted function declaration with the return type on a separate line improves readability, especially for longer function signatures.

bench/db/gbench_bench.cpp (2)

48-49: Improved readability with multi-line function call.

Breaking the emplace_back call across multiple lines improves readability when dealing with longer argument lists.


59-65: Better formatting for benchmark registration chain.

The multi-line formatting of the BENCHMARK_REGISTER_F chain significantly improves readability and makes it easier to see all the test arguments at a glance.

bench/db/celero_bench.cpp (2)

40-41: Improved formatting for trailing return type.

The multi-line formatting of the trailing return type improves readability for complex return types like std::vector<std::shared_ptr<celero::TestFixture::ExperimentValue>>.


68-69: Better readability with multi-line function call.

Breaking the emplace_back call across multiple lines improves readability and is consistent with similar formatting improvements elsewhere in the codebase.

lib/concurrency/helpers.h (2)

10-48: LGTM! Well-implemented deadlock prevention.

The dual mutex lock guard correctly orders mutexes by address and follows proper RAII patterns. The reverse-order unlocking in the destructor is the correct approach.
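To make the pattern concrete, here is a minimal sketch of an address-ordered dual lock guard. It assumes std::mutex instead of absl::Mutex, and the class name dual_lock_guard is hypothetical, not the project's absl_dual_mutex_lock_guard. It uses std::less because std::less guarantees a total order over pointers, which the built-in < does not guarantee for unrelated objects.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical sketch: locks two mutexes in a global (address) order so that
// two threads locking the same pair in opposite argument order cannot deadlock.
class dual_lock_guard {
public:
    dual_lock_guard(std::mutex &a, std::mutex &b)
        : m_first(std::less<std::mutex *>{}(&a, &b) ? &a : &b),
          m_second(m_first == &a ? &b : &a) {
        m_first->lock();
        if (m_second != m_first) {  // same mutex passed twice: lock it only once
            m_second->lock();
        }
    }

    ~dual_lock_guard() {
        // Release in reverse acquisition order.
        if (m_second != m_first) {
            m_second->unlock();
        }
        m_first->unlock();
    }

    dual_lock_guard(const dual_lock_guard &) = delete;
    dual_lock_guard &operator=(const dual_lock_guard &) = delete;

    // Exposed only so the acquisition order can be inspected.
    [[nodiscard]] std::mutex *first() const noexcept { return m_first; }

private:
    std::mutex *m_first;
    std::mutex *m_second;
};
```

Because every thread acquires the lower-addressed mutex first, dual_lock_guard g1(a, b) on one thread and dual_lock_guard g2(b, a) on another cannot form a lock cycle.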


51-58: LGTM! Useful defensive programming pattern.

The conditional locking based on compile-time flag is a good approach for debug builds without impacting release performance.
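A minimal sketch of compile-time conditional locking, assuming a hypothetical EXAMPLE_ENABLE_DEBUG_LOCKING macro (the project's real flag and helper names are not shown in this review). The disabled specialization is an empty type, so builds with the flag off pay nothing for it.

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Hypothetical build flag; the project's real macro name is not shown in the review.
#ifndef EXAMPLE_ENABLE_DEBUG_LOCKING
#define EXAMPLE_ENABLE_DEBUG_LOCKING 1
#endif

inline constexpr bool kDebugLocking = EXAMPLE_ENABLE_DEBUG_LOCKING != 0;

// Primary template left undefined; only the two specializations below exist.
template <bool Enabled> class conditional_lock_guard;

// Enabled: behaves like std::lock_guard.
template <> class conditional_lock_guard<true> {
public:
    explicit conditional_lock_guard(std::mutex &m) : m_lock(m) {}

private:
    std::lock_guard<std::mutex> m_lock;
};

// Disabled: an empty type whose constructor does nothing.
template <> class conditional_lock_guard<false> {
public:
    explicit conditional_lock_guard(std::mutex & /*unused*/) noexcept {}
};

// Call sites write debug_lock_guard guard(m_mutex); and the flag picks the behavior.
using debug_lock_guard = conditional_lock_guard<kDebugLocking>;
```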

lib/structures/lsmtree/segments/types.h (1)

1-1: LGTM! Standard header modernization.

Replacing include guards with #pragma once is a good modernization practice.

bench/db/CMakeLists.txt (1)

2-2: LGTM! Consistent project naming.

The project rename aligns with the broader refactoring across CMake files.

lib/structures/lsmtree/lsmtree_types.h (1)

1-1: LGTM! Good modernization practice.

Replacing traditional header guards with #pragma once is a standard modernization that improves maintainability and reduces potential macro naming conflicts.

examples/absl/concurrency/CMakeLists.txt (1)

2-2: LGTM! Consistent project renaming.

The project name update from "zkv" to "frankie" aligns with the broader project renaming effort across the codebase.

bench/CMakeLists.txt (1)

2-2: LGTM! Consistent project renaming.

The project name update from "zkv" to "frankie" maintains consistency with the broader project renaming effort.

examples/absl/absl_try_mutex.cpp (1)

23-24: LGTM! Improved code readability.

The reformatting of the m_mutex.Await call improves readability by breaking the long line into two lines without changing functionality.

bench/fs/CMakeLists.txt (1)

2-2: LGTM! Completing the consistent project renaming.

The project name update from "zkv" to "frankie" maintains consistency with the project renaming effort across all build configurations.

lib/server/CMakeLists.txt (1)

2-2: Consistent renaming approved
Project name updated to frankie aligns with the rest of the build configuration.

examples/embedded/CMakeLists.txt (1)

2-2: Renaming consistent
Updating the project declaration to frankie matches the new global naming.

lib/structures/lsmtree/compaction/compactation.h (1)

1-1: Simplified include guard approved
Replacing traditional header guards with #pragma once is a clean modernization.

.clang-tidy (1)

2-2: Abseil checks added
Including abseil-* in static analysis is a good step to enforce Abseil best practices.

lib/structures/lsmtree/compaction/level_zero_compactation.h (1)

1-1: Include guard simplification approved
Switching to #pragma once is appropriate for modern compilers.

conan/profiles/debug-gcc (1)

10-10: LGTM! Configuration reordering for consistency.

The reordering of the compiler.cppstd=gnu23 setting aligns with the broader compiler profile standardization mentioned in the PR summary.

lib/server/server_factory.h (1)

16-17: LGTM! Improved readability of template constraints.

Splitting the requires clause across multiple lines enhances readability without changing the logic.

lib/structures/lsmtree/segments/lsmtree_segment_factory.h (2)

1-1: LGTM! Modernization with pragma once.

Replacing include guards with #pragma once is a modern C++ best practice that reduces boilerplate and potential errors.


9-10: LGTM! Improved function signature formatting.

The reformatted function declaration with the return type on a separate line enhances readability for complex return types.

lib/structures/lsmtree/segments/helpers.cpp (1)

36-37: LGTM! Consistent function signature formatting.

The formatting change aligns with the broader LSM tree refactoring and improves readability by placing the return type on a separate line.

bench/fs/open_write_streams.cpp (3)

16-17: LGTM! Improved constructor readability.

Splitting the std::fstream constructor call across multiple lines enhances readability without changing functionality.


37-38: LGTM! Consistent constructor formatting.

The formatting improvement is consistent with the previous function and enhances code readability.


59-60: LGTM! Maintained formatting consistency.

The formatting change maintains consistency across all benchmark functions and improves readability.

conan/profiles/release-gcc (1)

9-9: Verify no GNU extensions are used in the codebase.

The change from gnu23 to standard 23 removes GNU dialect extensions. This improves portability but could break code that relies on GNU-specific features.

#!/bin/bash
# Description: Search for potential GNU extension usage that might break with standard C++23
# Expected: No critical GNU-specific code that would fail with standard C++23

echo "Searching for potential GNU extension usage..."

# Search for GNU-specific attributes and extensions
rg -i "(__attribute__|__builtin_|__extension__|__typeof__|__alignof__|statement.*expressions|computed.*goto)" --type cpp --type c

# Search for GCC-specific pragmas
rg "#pragma.*GCC" --type cpp --type c

# Search for variable length arrays (VLA) - GNU extension
rg "^\s*\w+\s+\w+\[.*\w+.*\]" --type cpp --type c -A 2

echo "If no critical GNU-specific code is found, the change is safe."
lib/structures/lsmtree/lsmtree_config.h (1)

1-3: LGTM! Good modernization practices.

The changes improve the header file by:

  • Using #pragma once instead of traditional include guards (cleaner and less error-prone)
  • Ordering standard library includes before project-specific includes (follows best practices)
lib/concurrency/CMakeLists.txt (2)

8-8: LGTM! Addition of libassert dependency.

Adding libassert::assert is consistent with the broader project updates and provides better assertion utilities for the concurrency module.


2-2: Verify project renaming consistency across the codebase.

The project name change from "zkv" to "frankie" should be consistent throughout the entire codebase.

#!/bin/bash
# Description: Check for any remaining references to the old "zkv" project name
# Expected: No remaining "zkv" references in CMake files, configs, or documentation

echo "Searching for remaining 'zkv' references..."

# Search in CMake files
rg -i "zkv" --type cmake

# Search in all text files for potential missed references
rg -i "zkv" --type txt --type md --type json --type yaml --type toml

# Search in source files for potential hardcoded references
rg -i "zkv" --type cpp --type c --type h

echo "All 'zkv' references should be updated to 'frankie' for consistency."
lib/config/CMakeLists.txt (2)

2-2: LGTM: Project renaming aligns with PR objectives.

The renaming from zkv to frankie is consistent with the broader project restructuring for Raft implementation.


8-8: LGTM: Appropriate dependencies for enhanced functionality.

The addition of libassert::assert and fmt::fmt as public dependencies is well-suited for:

  • Enhanced debugging and validation capabilities with libassert
  • Modern C++ formatting for logging and string operations with fmt

These dependencies align with the quality and maintainability improvements needed for the Raft consensus implementation.

lib/structures/lsmtree/segments/segment_storage.cpp (2)

82-84: LGTM: Improved readability with multi-line formatting.

The reformatting of the erase call improves code readability without changing functionality.


95-96: LGTM: Consistent trailing return type formatting.

The multi-line trailing return type formatting enhances readability and maintains consistency with modern C++ style practices.

lib/CMakeLists.txt (2)

2-2: LGTM: Consistent project renaming.

The project renaming from zkv to frankie maintains consistency across the codebase restructuring.


11-13: LGTM: Well-structured modular additions for Raft implementation.

The addition of three new subdirectories demonstrates good architectural planning:

  • raft: Core consensus protocol implementation
  • wal: Write-Ahead Log for persistence and durability
  • posix_wrapper: System abstraction layer

This modular approach supports maintainability and testability of the distributed systems components.

lib/server/server_kind.cpp (1)

20-21: LGTM: Consistent formatting for improved readability.

The multi-line trailing return type formatting enhances code readability and maintains consistency with the codebase's style standards.

conan/profiles/debug-clang (1)

2-3: Verify Clang 19 availability and compatibility.

The upgrade from Clang 18 to 19 supports better C++23 compatibility for the Raft implementation. However, ensure that:

  1. Clang 19 is available in your build environments
  2. All dependencies are compatible with Clang 19
  3. The removal of compiler.libcxx=libstdc++11 doesn't cause linking issues
#!/bin/bash
# Description: Verify Clang 19 availability and check for any compatibility issues

# Check if Clang 19 is available
which clang-19 || echo "Clang 19 not found"
which clang++-19 || echo "Clang++ 19 not found"

# Check Clang 19 version if available
if command -v clang-19 &> /dev/null; then
    echo "Clang 19 version info:"
    clang-19 --version
fi

# Look for other Conan profiles to ensure consistency
echo "Checking other Clang profiles for version consistency:"
fd -e "" "debug-clang" conan/profiles/ --exec cat {}
fd -e "" "release-clang" conan/profiles/ --exec cat {}

Also applies to: 10-10

.clang-format (2)

2-2: LGTM: Improved readability with shorter line length.

The reduction from 120 to 100 characters improves code readability, especially on smaller displays.


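13-13: LGTM: BlockIndent provides more consistent formatting.

With AlignAfterOpenBracket set to BlockIndent, clang-format breaks after the opening bracket when the arguments do not fit on one line and places the closing bracket on its own line, which keeps long signatures and calls laid out consistently.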

lib/db/db_test.cpp (1)

14-14: LGTM: Proper namespace refactoring for WAL modularity.

The change from db::wal::make_shared to wal::make_shared correctly reflects the WAL component being moved to its own namespace, improving modularity.

Verify that the new wal namespace is properly defined and accessible:

#!/bin/bash
# Description: Verify the wal namespace and make_shared function exist in the codebase
# Expected: Find the wal namespace definition and make_shared function

# Search for wal namespace definition
ast-grep --pattern 'namespace wal { $$$ }'

# Search for make_shared function in wal namespace
rg -A 3 "make_shared.*wal"
conanfile.txt (1)

14-17: Verify dependency versions are current and secure.

The new dependencies support the Raft implementation well:

  • gtest/1.15.0: Testing framework for expanded test coverage
  • prometheus-cpp/1.3.0: Metrics for distributed system monitoring
  • libassert/2.1.4: Enhanced assertion utilities
  • magic_enum/0.9.7: Enum reflection capabilities

Please verify that these dependency versions are current: check the latest stable releases of gtest, prometheus-cpp, libassert, and magic_enum, and whether any known security vulnerabilities affect the pinned versions.
lib/server/server_config.h (2)

5-5: LGTM: Proper include for new vector field.

The <vector> include is correctly added to support the new peers field.


15-16: LGTM: Essential fields for Raft consensus protocol.

The addition of id and peers fields is crucial for Raft implementation:

  • id: Provides unique node identification in the cluster
  • peers: Enables communication with other cluster members

The field types are appropriate for their intended use.

lib/structures/lsmtree/lsmtree_test.cpp (3)

49-50: Trailing return type formatting is acceptable.

Breaking the trailing return type onto its own line follows modern C++ style and is consistent here.


123-123: Namespace qualifier adjustment looks correct.

Switching from db::wal::make_shared to wal::make_shared aligns with the new WAL library namespace.


134-135: Assertion formatting improvement.

Splitting the REQUIRE call across two lines enhances readability without changing behavior.

examples/prometheus/CMakeLists.txt (1)

1-7: Example CMakeLists is correctly scoped.

The Prometheus example defines its executable and links only against prometheus-cpp::prometheus-cpp as intended.

lib/server/grpc_server.h (2)

20-21: Formatting only: method declaration.

This reflow of the Put signature preserves the API and behavior.


23-24: Formatting only: method declaration.

This reflow of the Get signature preserves the API and behavior.

lib/raft/CMakeLists.txt (1)

1-3: CMake boilerplate is standard.

The minimum required version and project name are correctly declared.

examples/CMakeLists.txt (1)

1-8: Examples subdirectories inclusion is correct.

Adding raft and prometheus under examples integrates both new demos as intended.

lib/posix_wrapper/posix_wrapper_test.cpp (1)

19-31: Well-structured unit test with comprehensive coverage.

The test effectively verifies the enum-to-native flag mapping with clear assertions for all documented enum values. The test structure follows Google Test best practices.

conan/profiles/release-clang (2)

10-10: Compiler version update aligns with modernization effort.

The update from Clang 18 to 19 is consistent with the broader project modernization and matches updates in other profiles.


2-3: Verify Clang 19 availability in target environments.

The explicit paths to clang-19 and clang++-19 assume these specific versions are installed at standard locations.

Please verify that Clang 19 is available in your target build environments:

#!/bin/bash
# Check if clang-19 is available
which clang-19 || echo "clang-19 not found"
which clang++-19 || echo "clang++-19 not found"

# Check clang-19 version
clang-19 --version 2>/dev/null || echo "clang-19 not executable"
assets/tkvpp_config_1.json (1)

16-26: LSM configuration appears well-structured.

The LSM tree configuration with separate strategies for level-zero and non-zero compaction provides good flexibility for performance tuning.

conan/profiles/debug-clang-tsan (2)

2-3: Compiler version update maintains consistency.

The update to Clang 19 is consistent with other profiles and supports the project's modernization effort.


4-6: Verify ThreadSanitizer compatibility with Clang 19.

The ThreadSanitizer flags should be verified to work correctly with the updated Clang version.

Please verify ThreadSanitizer functionality with Clang 19:

#!/bin/bash
# Test ThreadSanitizer compilation with Clang 19
echo 'int main() { return 0; }' | clang++-19 -fsanitize=thread -x c++ - -o /tmp/tsan_test 2>&1
if [ $? -eq 0 ]; then
    echo "ThreadSanitizer compilation successful"
    /tmp/tsan_test && echo "ThreadSanitizer runtime working"
    rm -f /tmp/tsan_test
else
    echo "ThreadSanitizer compilation failed"
fi
assets/tkvpp_config_standalone.json (1)

11-15: Confirm that "persistent" is an allowed walStorageType value by extracting the enum from the schema:

#!/bin/bash
# Extract the walStorageType enum array from the JSON schema
jq -r '.["$defs"].walStorageType.enum' assets/database_config_schema.json
lib/server/server_kind.h (1)

23-24: Formatting change is consistent.
The trailing return type has been moved to a new line, aligning with project style. No further action needed.

lib/structures/lsmtree/CMakeLists.txt (2)

25-37: Test target dependencies look comprehensive.
LSMTreeTest links against WAL and other modules, ensuring coverage of the new WAL and Raft integration. No issues spotted.


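16-18: Confirm include paths cover all headers.
You've added ${CMAKE_CURRENT_SOURCE_DIR} and the segments subdirectory to the include path. Headers under levels/ already resolve as "levels/<name>.h" through ${CMAKE_CURRENT_SOURCE_DIR}; only files outside levels/ that include those headers without the prefix (for example #include "level.h") would require levels/ to be added as well.

rg -n '#include\s+"(levels/)?levels?\.h"' lib/structures/lsmtree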
.vscode/settings.json (2)

14-16: C/C++ standards and clangd update check configured.
Setting C++23/C23 and enabling clangd.checkUpdates aligns the editor with the project's build settings.


13-13: Check whether CMAKE_EXPORT_COMPILE_COMMANDS is enabled in the build:

#!/bin/bash
# Using ripgrep without -R (it’s recursive by default)
rg -n 'CMAKE_EXPORT_COMPILE_COMMANDS' .

# As a fallback, use grep
grep -R 'CMAKE_EXPORT_COMPILE_COMMANDS' -n .
assets/tkvpp_config_3.json (2)

5-10: Configuration matches schema.
The database section correctly specifies the path, manifest prefix, and mode: "replicated". No changes needed.


11-15: Verify how the walStorageType values, in particular "replicated", are handled outside the asset files:

```shell
#!/bin/bash
# Search for references to the walStorageType enum in code (excluding assets)
rg -n 'walStorageType' --glob '!assets/**'

# Search for any explicit handling of "replicated" storage type in code (excluding JSON files)
rg -n 'replicated' --glob '!assets/**' --glob '!*.json'
```

lib/config/config.h (3)

3-7: LGTM! Consistent include style and proper WAL integration.

The changes to use quoted includes and the addition of WAL configuration support are well-integrated and follow consistent patterns.


18-18: LGTM! WAL configuration properly integrated.

The addition of the WALConfig member aligns with the broader WAL integration described in the PR objectives.


20-21: LGTM! Consistent return type usage.

The change from std::filesystem::path to fs::path_t and the addition of the manifest_path() method maintain consistency with the project's type aliases.

assets/tkvpp_config_replicated_single_node.json (1)

1-37: LGTM! Well-structured configuration for replicated single node.

The JSON configuration is properly formatted and contains appropriate settings for a replicated single-node setup. The configuration values are consistent and align with the Raft consensus implementation.

lib/db/db_config.cpp (2)

4-17: LGTM! Complete enum-to-string conversion with proper error handling.

The switch statement covers all enum values and includes appropriate error handling for unexpected cases.


19-37: LGTM! Robust string-to-enum conversion with validation.

The function properly validates all supported string values and provides clear error reporting for invalid inputs.

lib/server/grpc_server.cpp (1)

2-2: LGTM!

The include is necessary for accessing db::db_put_context_k::replicate_k.

lib/config/config.cpp (1)

1-13: LGTM!

Good use of string_view for compile-time constants and appropriate includes for file operations.

lib/structures/lsmtree/levels/levels.cpp (3)

21-21: Excellent thread lifecycle management!

The refactoring improves safety and readability:

  • The joinable() check avoids the std::system_error that join() throws on a thread that is not joinable
  • Cooperative cancellation through std::stop_token follows the C++20 std::jthread idiom
  • Extracting compaction_task improves code organization

Also applies to: 32-43, 202-220


182-182: Good typo fix!


78-82: Confirm where the ASSERT macro is defined and how it is used across the repository:

```shell
#!/bin/bash
# Search for ASSERT macro definitions
rg -n '#define\s*ASSERT'

# Restrict to common header extensions
rg -n '#define\s*ASSERT' -g '*.{h,hpp}'

# Show ASSERT usages with surrounding lines to spot its definition
rg -n 'ASSERT\(' -C 3
```

lib/posix_wrapper/open_flag.h (1)

48-57: Well-designed utility functions!

Good use of constexpr and noexcept for compile-time evaluation and performance.
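A sketch of what such constexpr/noexcept helpers can look like, assuming a hypothetical open_flag_k enum (the real enumerators in open_flag.h may differ) mapped onto POSIX open(2) flags from <fcntl.h>:

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>

// Hypothetical enum; the real open_flag.h enumerators may differ.
enum class open_flag_k : std::uint8_t {
    read_only,
    write_only,
    read_write,
    create,
    append,
    truncate,
};

// Maps one portable flag to its POSIX open(2) value at compile time.
[[nodiscard]] constexpr int to_native(open_flag_k flag) noexcept {
    switch (flag) {
    case open_flag_k::read_only: return O_RDONLY;
    case open_flag_k::write_only: return O_WRONLY;
    case open_flag_k::read_write: return O_RDWR;
    case open_flag_k::create: return O_CREAT;
    case open_flag_k::append: return O_APPEND;
    case open_flag_k::truncate: return O_TRUNC;
    }
    return 0;  // unreachable for valid enumerators
}

// Combines several flags into one native bitmask via a fold expression.
template <typename... Flags>
[[nodiscard]] constexpr int to_native(open_flag_k first, Flags... rest) noexcept {
    return (to_native(first) | ... | to_native(rest));
}
```

Because both overloads are constexpr, a call such as to_native(open_flag_k::write_only, open_flag_k::create) can be checked with static_assert, which is also a convenient way to unit-test the mapping.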

CMakeLists.txt (2)

10-26: Good fix for the compiler compatibility issue!

This implementation correctly addresses the previous concern about -Wthread-safety being Clang-specific. The conditional logic ensures GCC builds won't fail.


45-48: LGTM!

The new dependencies align well with the Raft implementation requirements and enhanced testing/monitoring capabilities.

lib/db/db_config.h (1)

13-25: Well-designed mode abstraction with string conversion utilities

The enum design and string constant approach provides a clean abstraction for database modes. The conversion functions will facilitate configuration parsing and validation.
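A minimal sketch of the enum plus string-constant approach, assuming hypothetical names (db_mode_t, kStandaloneModeStr, kReplicatedModeStr). The mode strings mirror the standalone and replicated configurations referenced in this PR, but the actual spelling in db_config.h may differ. This version returns std::optional and leaves error reporting to the caller, whereas the reviewed db_config.cpp reports invalid input itself.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical names; the real enum and constants in db_config.h may differ.
enum class db_mode_t { standalone_k, replicated_k };

inline constexpr std::string_view kStandaloneModeStr = "standalone";
inline constexpr std::string_view kReplicatedModeStr = "replicated";

[[nodiscard]] constexpr std::string_view to_string(db_mode_t mode) noexcept {
    switch (mode) {
    case db_mode_t::standalone_k: return kStandaloneModeStr;
    case db_mode_t::replicated_k: return kReplicatedModeStr;
    }
    return "unknown";
}

// Returns std::nullopt for unrecognized input so the caller decides how to report it.
[[nodiscard]] constexpr std::optional<db_mode_t> from_string(std::string_view s) noexcept {
    if (s == kStandaloneModeStr) {
        return db_mode_t::standalone_k;
    }
    if (s == kReplicatedModeStr) {
        return db_mode_t::replicated_k;
    }
    return std::nullopt;
}
```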

lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp (2)

19-26: Good improvement with noexcept specification

Adding noexcept to the constructor is appropriate since it only performs move operations and basic initialization without throwing operations.


28-29: Improved readability with consistent formatting

The parameter and return type formatting enhances code readability and maintains consistency across the codebase.

Also applies to: 54-55

lib/db/CMakeLists.txt (2)

4-4: Good modularization with extracted WAL dependency

The addition of db_config.cpp and removal of "wal/wal.cpp" reflects the proper separation of concerns, with WAL now as an external dependency.


8-23: Clean dependency management with new WAL and assertion libraries

The reformatted and extended dependencies properly reflect the new modular architecture. The addition of libassert::assert and WAL libraries aligns with the system design.

lib/fs/common.h (3)

11-21: Well-chosen default constants for filesystem operations

The buffer size (4KB), file permissions (0644), and io_uring queue entries (128) are reasonable defaults for filesystem operations. The buffer size aligns with typical page sizes, and the queue depth provides good concurrency without excessive resource usage.


23-35: Comprehensive error code enumeration

The error codes cover all essential file operation failures including io_uring-specific errors. The enum class design provides type safety and clear error categorization.


37-48: Excellent error handling design with rich context

The file_error_t struct properly combines error codes, system errno values, and descriptive messages. The static factory methods and [[nodiscard]] attributes promote safe error handling practices.
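A minimal sketch of that error type, assuming hypothetical enumerators and member names (file_error_code_k, system_errno, from_errno); the real lib/fs/common.h may differ. The factory copies errno first, before any other call can overwrite it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical enumerators; the real error codes in lib/fs/common.h may differ.
enum class file_error_code_k {
    none,
    open_failed,
    read_failed,
    write_failed,
    io_uring_init_failed,
};

struct file_error_t {
    file_error_code_k code{file_error_code_k::none};
    int system_errno{0};
    std::string message;

    // Captures errno at the failure site and appends its description.
    [[nodiscard]] static file_error_t from_errno(file_error_code_k code, std::string context) {
        const int err = errno;
        return file_error_t{code, err, std::move(context) + ": " + std::strerror(err)};
    }

    [[nodiscard]] static file_error_t success() noexcept { return {}; }

    [[nodiscard]] bool has_error() const noexcept { return code != file_error_code_k::none; }
};
```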

lib/fs/CMakeLists.txt (1)

1-13: LGTM! CMake configuration properly updated.

The changes appropriately rename the project and add necessary dependencies for the enhanced filesystem module with asynchronous I/O support.

lib/structures/lsmtree/levels/level.cpp (2)

62-63: Good addition of assertions for manifest operations.

Wrapping manifest->add() calls with ASSERT ensures these critical operations succeed and helps catch issues early in debug builds.

Also applies to: 266-269, 294-297


30-33: Formatting improvements enhance readability.

The code reformatting makes complex expressions and function signatures more readable.

Also applies to: 73-74, 91-92, 99-102, 115-120, 124-128, 202-205, 210-213, 227-230, 239-242, 246-247, 282-283

lib/concurrency/thread_safe_queue.h (2)

21-42: Well-implemented move semantics with proper thread safety.

The addition of move constructor and move assignment operator correctly handles mutex locking using helper functions. The explicit deletion of copy operations and the private swap method follow best practices.

Also applies to: 98-104


71-71: Clean simplification of pop_all().

The ternary operator makes the implementation more concise while maintaining the same behavior.
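A sketch of a move-aware queue under these assumptions: std::mutex replaces absl::Mutex, the container is std::deque, and all names are illustrative. This version uses std::scoped_lock in the move assignment instead of the private swap helper the review mentions; both approaches avoid lock-order deadlocks.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical sketch; names and container choice are illustrative.
template <typename T> class thread_safe_queue {
public:
    thread_safe_queue() = default;

    // Only the source is shared during construction, so lock just that one.
    // The move happens in the body so other.m_items is read under the lock.
    thread_safe_queue(thread_safe_queue &&other) {
        std::lock_guard lock(other.m_mutex);
        m_items = std::move(other.m_items);
    }

    thread_safe_queue &operator=(thread_safe_queue &&other) {
        if (this != &other) {
            std::scoped_lock lock(m_mutex, other.m_mutex);  // deadlock-free multi-lock
            m_items = std::move(other.m_items);
        }
        return *this;
    }

    thread_safe_queue(const thread_safe_queue &) = delete;
    thread_safe_queue &operator=(const thread_safe_queue &) = delete;

    void push(T value) {
        std::lock_guard lock(m_mutex);
        m_items.push_back(std::move(value));
    }

    // Drains the queue; returns std::nullopt when it is already empty.
    std::optional<std::deque<T>> pop_all() {
        std::lock_guard lock(m_mutex);
        return m_items.empty() ? std::nullopt
                               : std::optional<std::deque<T>>(std::exchange(m_items, {}));
    }

private:
    std::mutex m_mutex;
    std::deque<T> m_items;
};
```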

lib/fs/random_access_file_test.cpp (1)

1-222: Excellent test coverage for random_access_file_t.

The test suite is comprehensive and well-structured:

  • Good use of test fixtures for setup/teardown
  • Helper utilities are appropriately implemented
  • Tests cover both success and failure scenarios
  • Parameterized tests effectively test multiple offsets
  • Edge cases like reading beyond EOF and operations on moved-from objects are covered
assets/tkvpp_config_2.json (1)

1-39: Configuration looks good for Raft node 2.

The configuration correctly sets up node 2 with replicated mode, appropriate port binding, and includes all cluster peers as expected for Raft consensus.

lib/structures/lsmtree/levels/level.h (1)

25-27: Was removing explicit intentional?

The constructor is no longer marked explicit, which allows implicit conversions. This could lead to unexpected object construction. Consider keeping it explicit unless implicit conversion is specifically needed.
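To illustrate the risk, here is a minimal sketch with two hypothetical single-argument types standing in for the level constructor, whose real parameters are not shown here:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins; the real level constructor takes different parameters.
struct implicit_level {
    implicit_level(std::size_t index) : m_index(index) {}  // deliberately not explicit
    std::size_t m_index;
};

struct explicit_level {
    explicit explicit_level(std::size_t index) : m_index(index) {}
    std::size_t m_index;
};

// Without explicit, any std::size_t silently converts, so a call such as
// compact(3) against a compact(implicit_level) overload would compile.
static_assert(std::is_convertible_v<std::size_t, implicit_level>);

// With explicit, construction must be spelled out, e.g. explicit_level{3}.
static_assert(!std::is_convertible_v<std::size_t, explicit_level>);
static_assert(std::is_constructible_v<explicit_level, std::size_t>);
```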

lib/structures/lsmtree/levels/levels.h (2)

27-48: Well-designed move constructor with proper thread lifecycle management.

The move constructor correctly handles the compaction thread by requesting stop and joining before starting a new thread. The comments clearly document the non-movable mutex behavior.


50-59: Thread-safe move assignment with deadlock prevention.

The move assignment operator uses absl_dual_mutex_lock_guard to safely lock both mutexes and prevent deadlocks. The temporary object pattern ensures exception safety.

lib/structures/lsmtree/segments/lsmtree_regular_segment.h (2)

1-1: Good modernization with pragma once.

Replacing include guards with #pragma once simplifies the header and improves readability.


16-24: Excellent explicit special member function management.

The explicit deletion of copy/move operations and defaulted destructor with noexcept specifications follow modern C++ best practices and clearly communicate the intended object semantics.

lib/structures/lsmtree/segments/segment_storage.h (3)

3-10: Improved include organization.

Reordering includes to place standard library headers before project-specific ones follows established conventions and improves readability.


43-46: Good type alias introduction for clarity.

The regular_segment_ptr_t alias improves code readability and the simplified storage_t alias reduces verbosity without losing meaning.


52-60: Excellent explicit special member function control.

The explicit deletion of copy/move operations with proper noexcept specifications clearly communicates the intended object lifetime semantics and prevents accidental copying.

lib/fs/random_access_file.h (3)

1-9: Excellent modern C++ header design.

The use of #pragma once, std::expected for error handling, and proper include organization demonstrates good modern C++ practices.


17-25: Perfect RAII resource management.

The explicit deletion of copy operations and proper move semantics with noexcept specifications ensure safe resource management for the file descriptor and io_uring instance.


50-55: Clean builder pattern implementation.

The builder pattern provides a clean interface for constructing the complex random_access_file_t object, and the use of std::expected allows for proper error propagation during construction.

lib/db/db.h (1)

29-34: Verify consensus module initialization.

The m_pConsensusModule member is declared but not shown in the constructor parameters. Ensure it's properly initialized, either in the constructor or through a separate initialization method.

#!/bin/bash
# Check how m_pConsensusModule is initialized in the implementation
rg -A 10 "db_t::db_t.*config.*pWal.*pManifest.*pLsmtree" lib/db/db.cpp
# Also check if there's a setter method
rg "setConsensusModule|initConsensusModule" lib/db/

Also applies to: 62-62

lib/structures/lsmtree/lsmtree.h (4)

1-1: LGTM!

Good modernization using #pragma once instead of traditional include guards.


19-37: Well-designed status enum for error handling.

The comprehensive set of status codes will improve error reporting and debugging. Good use of enum class with explicit underlying type.
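A sketch of this style with hypothetical enumerator names (the real ones in lsmtree.h are not reproduced here) and an assumed std::uint8_t underlying type, plus a to_string helper that makes the codes readable in logs.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical status codes; the actual enumerators in lsmtree.h may differ.
enum class lsmtree_status_k : std::uint8_t
{
    ok_k = 0,
    not_found_k = 1,
    memtable_full_k = 2,
    flush_failed_k = 3,
};

// The fixed underlying type keeps the status one byte wide.
static_assert(sizeof(lsmtree_status_k) == 1);

// Human-readable names for logging; values outside the enumerators map to "unknown".
constexpr auto to_string(lsmtree_status_k status) noexcept -> std::string_view
{
    switch (status)
    {
    case lsmtree_status_k::ok_k:
        return "ok";
    case lsmtree_status_k::not_found_k:
        return "not_found";
    case lsmtree_status_k::memtable_full_k:
        return "memtable_full";
    case lsmtree_status_k::flush_failed_k:
        return "flush_failed";
    }
    return "unknown";
}
```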


42-47: Good use of pass-by-value for move semantics.

The constructor properly takes memtable and levels by value, allowing callers to efficiently move these objects into the LSM tree. This is a good pattern when combined with the move constructor/assignment operators added below.


54-55: Proper implementation of move semantics.

Good adherence to the Rule of Five with move operations marked noexcept and copy operations deleted. This ensures efficient resource management.
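A reduced sketch of the sink-parameter idiom combined with the Rule of Five, using hypothetical memtable_t and levels_t aliases over std::vector. Unlike the real lsmtree_t, these stand-in members are all movable, so the move operations can simply be defaulted.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the memtable and level containers.
using memtable_t = std::vector<int>;
using levels_t = std::vector<std::vector<int>>;

class lsmtree_like_t
{
  public:
    // Sink parameters: an lvalue argument is copied once into the parameter,
    // an rvalue argument is only moved; either way the member is move-initialized.
    lsmtree_like_t(memtable_t memtable, levels_t levels) noexcept
        : m_memtable(std::move(memtable)),
          m_levels(std::move(levels))
    {
    }

    // Rule of Five: no copies, cheap noexcept moves.
    lsmtree_like_t(const lsmtree_like_t &) = delete;
    auto operator=(const lsmtree_like_t &) -> lsmtree_like_t & = delete;
    lsmtree_like_t(lsmtree_like_t &&) noexcept = default;
    auto operator=(lsmtree_like_t &&) noexcept -> lsmtree_like_t & = default;
    ~lsmtree_like_t() = default;

    [[nodiscard]] auto memtable_size() const noexcept -> std::size_t { return m_memtable.size(); }
    [[nodiscard]] auto level_count() const noexcept -> std::size_t { return m_levels.size(); }

  private:
    memtable_t m_memtable;
    levels_t m_levels;
};

static_assert(!std::is_copy_constructible_v<lsmtree_like_t>);
static_assert(std::is_nothrow_move_constructible_v<lsmtree_like_t>);
```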

lib/db/manifest/manifest.h (4)

3-16: LGTM! Necessary includes for WAL integration.

The new includes support the refactored manifest implementation with WAL integration and improved error handling.


37-37: Good practice: Explicit enum value for serialization consistency.

Making the enum value explicit keeps the serialized tag stable even if enumerators are later reordered or new ones are inserted before it.


157-157: Good design: std::monostate for uninitialized state.

Adding std::monostate as the first variant type provides a proper uninitialized state, which improves error handling and initialization semantics.


183-201: Well-implemented streaming operator with proper error handling.

The operator<< correctly handles all variant types including the uninitialized state with appropriate error logging.
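A small sketch of the pattern with hypothetical, simplified record types: a default-constructed std::variant holds its first alternative, so listing std::monostate first makes "not yet read" an explicit state that the std::visit-based printer has to handle.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical, simplified manifest records.
struct segment_record_t
{
    std::string name;
};

struct level_record_t
{
    int level{0};
};

// std::monostate first: a default-constructed record_t is explicitly "empty".
using record_t = std::variant<std::monostate, segment_record_t, level_record_t>;

auto operator<<(std::ostream &os, const record_t &record) -> std::ostream &
{
    std::visit(
        [&os](const auto &value)
        {
            using T = std::decay_t<decltype(value)>;
            if constexpr (std::is_same_v<T, std::monostate>)
            {
                os << "<uninitialized>"; // the PR logs an error for this case
            }
            else if constexpr (std::is_same_v<T, segment_record_t>)
            {
                os << "segment:" << value.name;
            }
            else
            {
                os << "level:" << value.level;
            }
        },
        record);
    return os;
}
```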

Comment on lines 3 to 6
namespace structures::lsmtree::compaction
{

class CompactionTrigger
{
public:
private:
};

} // namespace structures::lsmtree::compaction

⚠️ Potential issue

❓ Verification inconclusive

Empty header – remove or restore intended API
This file now only declares an empty namespace block. Either delete compaction_trigger.h if it's no longer needed, or reintroduce the CompactionTrigger declaration/implementation that was removed.

Please verify there are no remaining references to CompactionTrigger in the codebase:


🏁 Script executed:

#!/bin/bash
# Search for any lingering uses of the removed `CompactionTrigger`
rg "CompactionTrigger"

Length of output: 22


🤖 Prompt for AI Agents
In lib/structures/lsmtree/compaction/compaction_trigger.h at lines 3 to 6, the
file only contains an empty namespace block with no declarations. You should
either delete this header file if it is no longer used or restore the original
CompactionTrigger declaration and implementation if it was removed by mistake.
Before deciding, run a search for "CompactionTrigger" in the codebase to confirm
if there are any remaining references that require this header.

Comment on lines 375 to 394
    s = (struct submitter *)malloc(sizeof(*s));
    if (!s)
    {
        perror("malloc");
        return 1;
    }
    memset(s, 0, sizeof(*s));

    if (app_setup_uring(s))
    {
        fprintf(stderr, "Unable to setup uring!\n");
        return 1;
    }

    for (int i = 1; i < argc; i++)
    {
        if (submit_to_sq(argv[i], s))
        {
            fprintf(stderr, "Error reading file\n");
            return 0;

⚠️ Potential issue

Fix memory leak and error return code.

The submitter structure is never freed, and the error path returns 0 (success) instead of an error code.

Apply these fixes:

  1. Free the submitter at the end of main:
     }
+    free(s);
     return 0;
  2. Fix the error return code:
         {
             fprintf(stderr, "Error reading file\n");
-            return 0;
+            return 1;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    s = (struct submitter *)malloc(sizeof(*s));
    if (!s)
    {
        perror("malloc");
        return 1;
    }
    memset(s, 0, sizeof(*s));

    if (app_setup_uring(s))
    {
        fprintf(stderr, "Unable to setup uring!\n");
        return 1;
    }

    for (int i = 1; i < argc; i++)
    {
        if (submit_to_sq(argv[i], s))
        {
            fprintf(stderr, "Error reading file\n");
            return 1;
        }
    }
    free(s);
    return 0;
🤖 Prompt for AI Agents
In examples/iouring/raw_uring/main.cpp around lines 375 to 394, the allocated
submitter structure is not freed, causing a memory leak, and the error return
code is incorrectly 0 instead of a failure code. To fix this, ensure you call
free on the submitter pointer before exiting main to release the allocated
memory. Also, change the error return code from 0 to a non-zero value (e.g., 1)
to correctly indicate failure when submit_to_sq fails.
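An alternative that removes the need to remember free(s) on every return path (including the early return after app_setup_uring fails): own the allocation with std::unique_ptr and a free()-calling deleter. A sketch, assuming a hypothetical simplified struct submitter and a submit callback with the same non-zero-on-error convention as submit_to_sq; it does not release the ring set up by app_setup_uring, which would need its own cleanup. calloc replaces the malloc + memset pair.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <memory>

// Hypothetical, simplified stand-in for the example's struct submitter.
struct submitter
{
    int ring_fd;
};

// Releases memory obtained from malloc/calloc.
struct free_deleter
{
    void operator()(void *p) const noexcept { std::free(p); }
};

using submitter_ptr = std::unique_ptr<submitter, free_deleter>;

// Hypothetical driver: `submit` returns non-zero on error, like submit_to_sq.
// The submitter is freed automatically on every return path.
auto run(int argc, char *argv[], int (*submit)(const char *, submitter *)) -> int
{
    // calloc allocates and zero-fills, replacing malloc + memset.
    submitter_ptr s{static_cast<submitter *>(std::calloc(1, sizeof(submitter)))};
    if (!s)
    {
        std::perror("calloc");
        return 1;
    }

    for (int i = 1; i < argc; i++)
    {
        if (submit(argv[i], s.get()) != 0)
        {
            std::fprintf(stderr, "Error reading file\n");
            return 1; // non-zero exit status signals failure
        }
    }
    return 0;
}
```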

Comment on lines +334 to +362
    while (bytes_remaining)
    {
        const off_t bytes_to_read = std::min(bytes_remaining, (long)BLOCK_SZ);

        void *buf = nullptr;
        if (posix_memalign(&buf, BLOCK_SZ, BLOCK_SZ))
        {
            perror("posix_memalign");
            return 1;
        }

        iovecs[current_block].iov_base = buf;
        iovecs[current_block].iov_len = bytes_to_read;
        current_block++;
        bytes_remaining -= bytes_to_read;
    }

    int ret = readv(file_fd, iovecs, blocks);
    if (ret < 0)
    {
        perror("readv");
        return 1;
    }

    for (int i = 0; i < blocks; ++i)
    {
        output_to_console((char *)iovecs[i].iov_base, iovecs[i].iov_len);
    }


⚠️ Potential issue

Fix memory leaks in synchronous read function.

The buffers allocated with posix_memalign and the iovecs array are never freed. The file descriptor is also not closed.

Add cleanup after line 361:

     for (int i = 0; i < blocks; ++i)
     {
         output_to_console((char *)iovecs[i].iov_base, iovecs[i].iov_len);
+        free(iovecs[i].iov_base);
     }
+    free(iovecs);
+    close(file_fd);
📝 Committable suggestion

Suggested change
    while (bytes_remaining)
    {
        const off_t bytes_to_read = std::min(bytes_remaining, (long)BLOCK_SZ);

        void *buf = nullptr;
        if (posix_memalign(&buf, BLOCK_SZ, BLOCK_SZ))
        {
            perror("posix_memalign");
            return 1;
        }

        iovecs[current_block].iov_base = buf;
        iovecs[current_block].iov_len = bytes_to_read;
        current_block++;
        bytes_remaining -= bytes_to_read;
    }

    int ret = readv(file_fd, iovecs, blocks);
    if (ret < 0)
    {
        perror("readv");
        return 1;
    }

    for (int i = 0; i < blocks; ++i)
    {
        output_to_console((char *)iovecs[i].iov_base, iovecs[i].iov_len);
        free(iovecs[i].iov_base);
    }
    free(iovecs);
    close(file_fd);
🤖 Prompt for AI Agents
In examples/iouring/raw_uring/main.cpp around lines 334 to 362, the buffers
allocated with posix_memalign for iovecs and the iovecs array itself are not
freed, and the file descriptor is not closed, causing memory leaks and resource
leaks. To fix this, after line 361, add code to free each buffer pointed to by
iovecs[i].iov_base, then free the iovecs array, and finally close the file
descriptor file_fd to properly release all allocated resources.
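Besides the success-path cleanup in the diff, the early returns after a posix_memalign or readv failure still leak. A sketch that releases everything on every path by owning each aligned buffer in a std::unique_ptr and the descriptor in a small guard class. read_file_blocks, fd_guard_t and kBlockSize are hypothetical names (kBlockSize stands in for BLOCK_SZ), and the function appends the data to a vector instead of calling output_to_console. readv accepts at most IOV_MAX iovecs, so very large files would need several calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <memory>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>
#include <utility>
#include <vector>

// Hypothetical block size standing in for BLOCK_SZ.
constexpr std::size_t kBlockSize = 4096;

// Releases memory obtained from posix_memalign.
struct free_deleter
{
    void operator()(char *p) const noexcept { std::free(p); }
};

// Closes the owned file descriptor when it goes out of scope.
class fd_guard_t
{
  public:
    explicit fd_guard_t(int fd) noexcept : m_fd(fd) {}
    fd_guard_t(const fd_guard_t &) = delete;
    auto operator=(const fd_guard_t &) -> fd_guard_t & = delete;
    ~fd_guard_t()
    {
        if (m_fd >= 0)
        {
            ::close(m_fd);
        }
    }
    [[nodiscard]] auto get() const noexcept -> int { return m_fd; }

  private:
    int m_fd;
};

// Reads up to `file_size` bytes of `path` into block-aligned buffers with one readv
// call and appends them to `out`. Returns the number of bytes read, or -1 on error.
// Buffers and the descriptor are released on every return path.
auto read_file_blocks(const char *path, std::size_t file_size, std::vector<char> &out) -> ssize_t
{
    fd_guard_t fd{::open(path, O_RDONLY)};
    if (fd.get() < 0)
    {
        return -1;
    }

    std::vector<std::unique_ptr<char, free_deleter>> buffers;
    std::vector<iovec> iovecs;
    for (std::size_t remaining = file_size; remaining > 0;)
    {
        const std::size_t len = remaining < kBlockSize ? remaining : kBlockSize;
        void *raw = nullptr;
        if (::posix_memalign(&raw, kBlockSize, kBlockSize) != 0)
        {
            return -1; // buffers allocated so far are freed by `buffers`
        }
        std::unique_ptr<char, free_deleter> buf{static_cast<char *>(raw)};
        iovecs.push_back(iovec{buf.get(), len});
        buffers.push_back(std::move(buf));
        remaining -= len;
    }
    if (iovecs.empty())
    {
        return 0;
    }

    const ssize_t n = ::readv(fd.get(), iovecs.data(), static_cast<int>(iovecs.size()));
    if (n < 0)
    {
        return -1;
    }

    // Copy only the bytes readv actually filled.
    std::size_t left = static_cast<std::size_t>(n);
    for (const iovec &v : iovecs)
    {
        const std::size_t take = left < v.iov_len ? left : v.iov_len;
        const char *p = static_cast<const char *>(v.iov_base);
        out.insert(out.end(), p, p + take);
        left -= take;
    }
    return n;
}
```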

Comment on lines +1 to +12
cmake_minimum_required(VERSION 3.25)
project(frankie)

add_library(POSIXWrapper open_flag.cpp)

set_target_properties(POSIXWrapper PROPERTIES CXX_STANDARD 23)
target_include_directories(POSIXWrapper INTERFACE ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(POSIXWrapper PUBLIC uring spdlog::spdlog)

add_executable(POSIXWrapperTest posix_wrapper_test.cpp)
set_target_properties(POSIXWrapperTest PROPERTIES CXX_STANDARD 23)
target_link_libraries(POSIXWrapperTest gtest::gtest)

🛠️ Refactor suggestion

Fix missing gtest linking and improve test configuration.

The CMake configuration is well-structured, but the test executable links only gtest::gtest: it does not link the library under test (POSIXWrapper) or gtest_main, and it is not registered with CTest.

Apply this diff to fix the test configuration:

 add_executable(POSIXWrapperTest posix_wrapper_test.cpp)
 set_target_properties(POSIXWrapperTest PROPERTIES CXX_STANDARD 23)
-target_link_libraries(POSIXWrapperTest gtest::gtest)
+target_link_libraries(POSIXWrapperTest 
+    PRIVATE 
+        POSIXWrapper 
+        gtest::gtest 
+        gtest::gtest_main
+)
+
+# Enable test discovery
+include(GoogleTest)
+gtest_discover_tests(POSIXWrapperTest)

This ensures:

  • The test links against the library being tested
  • Includes gtest_main for automatic test runner
  • Enables CTest integration through test discovery
📝 Committable suggestion

Suggested change
cmake_minimum_required(VERSION 3.25)
project(frankie)
add_library(POSIXWrapper open_flag.cpp)
set_target_properties(POSIXWrapper PROPERTIES CXX_STANDARD 23)
target_include_directories(POSIXWrapper INTERFACE ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(POSIXWrapper PUBLIC uring spdlog::spdlog)
add_executable(POSIXWrapperTest posix_wrapper_test.cpp)
set_target_properties(POSIXWrapperTest PROPERTIES CXX_STANDARD 23)
target_link_libraries(POSIXWrapperTest
    PRIVATE
        POSIXWrapper
        gtest::gtest
        gtest::gtest_main
)

# Enable test discovery
include(GoogleTest)
gtest_discover_tests(POSIXWrapperTest)
🤖 Prompt for AI Agents
In lib/posix_wrapper/CMakeLists.txt lines 1 to 12, the test executable
POSIXWrapperTest is missing proper linking to the POSIXWrapper library and
gtest_main, and lacks CTest integration. Fix this by linking POSIXWrapperTest
against both POSIXWrapper and gtest_main libraries, and call gtest_discover_tests()
(with enable_testing() in this or a parent directory) so CTest can discover and run the tests.

http_requests_counter.Add({{"method", method}}).Increment();
}

return EXIT_SUCCESS;

⚠️ Potential issue

Include missing header for EXIT_SUCCESS.

The code uses EXIT_SUCCESS but doesn't include the required header.

Add the missing include:

+#include <cstdlib>
📝 Committable suggestion

Suggested change
#include <cstdlib> // add at the top of the file; return EXIT_SUCCESS; stays as-is
🤖 Prompt for AI Agents
In examples/prometheus/main.cpp around line 77, the code uses EXIT_SUCCESS
without including the necessary header. Add the missing include directive for
<cstdlib> at the top of the file to ensure EXIT_SUCCESS is properly defined.

Comment on lines 166 to 168
// should return an error code?
assert(m_levels.flush_to_level0(std::move(memtable)));
}

⚠️ Potential issue

Replace assert with proper error handling in flushing thread.

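Using assert in the flushing thread will crash the entire program on failure, which can lead to data loss and a poor user experience. Worse, when NDEBUG is defined (as in typical release builds) the whole assert expression is compiled out, so flush_to_level0 is never called at all. Evaluate the call unconditionally, log the error, and implement a retry mechanism or graceful degradation.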

-                // TODO: Assert will crash the program, maybe we
-                // should return an error code?
-                assert(m_levels.flush_to_level0(std::move(memtable)));
+                if (!m_levels.flush_to_level0(std::move(memtable))) {
+                    spdlog::error("Failed to flush memtable to level0");
+                    // Consider retry logic or error recovery here
+                }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In lib/structures/lsmtree/lsmtree.cpp around lines 166 to 168, replace the
assert statement used to check the result of flush_to_level0 with proper error
handling. Instead of crashing the program on failure, log the error and
implement a retry mechanism or a way to gracefully handle the failure to avoid
data loss and improve stability in the flushing thread.
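A minimal sketch of a bounded retry, assuming a hypothetical flush_with_retry helper that wraps the flush call in a callable. The call is evaluated outside assert(), so it still runs in NDEBUG builds. One caveat for the real code: flush_to_level0 is called with std::move(memtable), so a retry is only possible if the memtable is still intact after a failed call (for example, if the callee consumes it only on success).

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>

// Hypothetical helper: calls `flush` up to `max_attempts` times, doubling the
// sleep between attempts. The call is a normal statement, not an assert()
// argument, so it is never compiled out.
auto flush_with_retry(const std::function<bool()> &flush,
                      int max_attempts,
                      std::chrono::milliseconds initial_backoff) -> bool
{
    auto backoff = initial_backoff;
    for (int attempt = 1; attempt <= max_attempts; ++attempt)
    {
        if (flush())
        {
            return true;
        }
        // The PR uses spdlog; std::fprintf keeps this sketch dependency-free.
        std::fprintf(stderr, "Failed to flush memtable to level0 (attempt %d/%d)\n", attempt, max_attempts);
        if (attempt < max_attempts)
        {
            std::this_thread::sleep_for(backoff);
            backoff *= 2;
        }
    }
    return false;
}
```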

Comment on lines +26 to +46
template <typename TStream> auto operator<<(TStream &stream, const LogEntry &record) -> TStream &
{
    stream << record.SerializeAsString();

    // TODO(lnikon): Uncomment when protobuf serialization is implemented. Protobuf serializes into
    // binary format by default.
    // record.SerializeToOstream(&stream);

    return stream;
}

template <typename TStream> auto operator>>(TStream &stream, LogEntry &record) -> TStream &
{
    record.ParseFromString(stream.str());

    // TODO(lnikon): Uncomment when protobuf serialization is implemented. Protobuf serializes into
    // binary format by default.
    // record.ParseFromIstream(&stream);

    return stream;
}

🛠️ Refactor suggestion

Move serialization operators into the raft namespace.

These template operators for LogEntry are defined in the global namespace, where they can collide with other generic stream operators (and, if another header defines a conflicting overload, cause ODR violations). Move them into the raft namespace or into a separate header.

+namespace raft
+{
+
 template <typename TStream> auto operator<<(TStream &stream, const LogEntry &record) -> TStream &
 {
     stream << record.SerializeAsString();
     // ... rest of the function
 }

 template <typename TStream> auto operator>>(TStream &stream, LogEntry &record) -> TStream &
 {
     record.ParseFromString(stream.str());
     // ... rest of the function
 }
+
+} // namespace raft
-
-namespace raft
-{

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In lib/raft/raft.h around lines 26 to 46, the template operators for LogEntry
serialization are currently defined in the global namespace, which risks ODR
violations and naming conflicts. Move both operator<< and operator>> template
definitions inside the raft namespace to properly scope them and avoid these
issues.
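A self-contained sketch of the scoping, with a hypothetical demo namespace and a log_entry_t stub standing in for the protobuf-generated LogEntry. Because the operators live in the same namespace as the type, argument-dependent lookup finds them from any scope. If LogEntry is generated into a different namespace (protoc derives it from the .proto package), the operators have to go into that namespace for argument-dependent lookup to find them from outside raft.

```cpp
#include <cassert>
#include <sstream>
#include <string>

namespace demo
{

// Stub exposing the two protobuf message methods used below.
struct log_entry_t
{
    std::string payload;

    [[nodiscard]] auto SerializeAsString() const -> std::string { return payload; }

    auto ParseFromString(const std::string &data) -> bool
    {
        payload = data;
        return true;
    }
};

// Declared next to the type, so argument-dependent lookup finds them.
template <typename TStream> auto operator<<(TStream &stream, const log_entry_t &record) -> TStream &
{
    stream << record.SerializeAsString();
    return stream;
}

template <typename TStream> auto operator>>(TStream &stream, log_entry_t &record) -> TStream &
{
    record.ParseFromString(stream.str());
    return stream;
}

} // namespace demo
```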

auto consensus_module_t::getLastLogIndex() const -> uint32_t
{
    // TODO(lnikon): Add a method into wal::wal_t for: m_log->read(m_log->size() - 1)
    return m_log->empty() ? 0 : m_log->read(m_log->size() - 1)->index();

🛠️ Refactor suggestion

Add consistent bounds checking for log access.

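The code has inconsistent bounds checking when accessing the log. getLogTerm checks bounds, whereas getLastLogIndex and getLastLogTerm only check m_log->empty() and then dereference the result of read() without verifying that the read succeeded.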

 auto consensus_module_t::getLastLogIndex() const -> uint32_t
 {
-    return m_log->empty() ? 0 : m_log->read(m_log->size() - 1)->index();
+    if (m_log->empty()) return 0;
+    auto entry = m_log->read(m_log->size() - 1);
+    return entry.has_value() ? entry->index() : 0;
 }

 auto consensus_module_t::getLastLogTerm() const -> uint32_t
 {
-    return m_log->empty() ? 0 : m_log->read(m_log->size() - 1)->term();
+    if (m_log->empty()) return 0;
+    auto entry = m_log->read(m_log->size() - 1);
+    return entry.has_value() ? entry->term() : 0;
 }

Also applies to: 983-983, 994-999

🤖 Prompt for AI Agents
In lib/raft/raft.cpp at lines 978, 983, and 994-999, the functions
getLastLogIndex and getLastLogTerm access the log without consistent bounds
checking, unlike getLogTerm. To fix this, add explicit checks to ensure the log
is not empty before accessing elements, returning a default value (e.g., 0) if
the log is empty. This will prevent out-of-bounds access and maintain
consistency across these functions.
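A sketch of a single bounds-checked path, in the spirit of the TODO about adding a last-entry read to wal::wal_t. It assumes a hypothetical in_memory_log_t whose read() returns std::optional, plus a back() helper, so that getLastLogIndex- and getLastLogTerm-style accessors share one checked code path. All names and types here are assumptions, not the wal_t API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical entry and log types; the real code stores protobuf LogEntry records in a WAL.
struct log_entry_t
{
    std::uint32_t index{0};
    std::uint32_t term{0};
};

class in_memory_log_t
{
  public:
    void append(log_entry_t entry) { m_entries.push_back(entry); }

    // Out-of-range reads yield std::nullopt instead of undefined behaviour.
    [[nodiscard]] auto read(std::size_t position) const -> std::optional<log_entry_t>
    {
        if (position >= m_entries.size())
        {
            return std::nullopt;
        }
        return m_entries[position];
    }

    // The "read last entry" helper suggested by the TODO.
    [[nodiscard]] auto back() const -> std::optional<log_entry_t>
    {
        if (m_entries.empty())
        {
            return std::nullopt;
        }
        return read(m_entries.size() - 1);
    }

  private:
    std::vector<log_entry_t> m_entries;
};

// Both accessors go through back(), so an empty log consistently reports 0.
auto last_log_index(const in_memory_log_t &log) -> std::uint32_t
{
    const auto entry = log.back();
    return entry ? entry->index : 0;
}

auto last_log_term(const in_memory_log_t &log) -> std::uint32_t
{
    const auto entry = log.back();
    return entry ? entry->term : 0;
}
```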

Comment on lines +915 to +966
    std::vector<std::jthread> requesterThreads;
    requesterThreads.reserve(m_replicas.size());
    for (auto &[id, client] : m_replicas)
    {
        spdlog::debug(
            "Node={} is creating RequestVoteRPC thread for the "
            "peer={} during term={}",
            m_config.m_id,
            id,
            newTerm
        );

        requesterThreads.emplace_back(
            [&client, request, this]()
            {
                RequestVoteResponse response;
                if (!client->requestVote(request, &response))
                {
                    spdlog::error("RequestVote RPC failed in requester thread");
                    return;
                }

                auto responseTerm = response.term();
                auto voteGranted = response.votegranted();

                spdlog::debug(
                    "Received RequestVoteResponse in requester "
                    "thread peerTerm={} "
                    "voteGranted={} peer={}",
                    responseTerm,
                    voteGranted,
                    response.responderid()
                );

                absl::WriterMutexLock locker(&m_stateMutex);
                if (responseTerm > m_currentTerm)
                {
                    becomeFollower(responseTerm);
                    return;
                }

                if (voteGranted != 0 && responseTerm == m_currentTerm)
                {
                    m_voteCount++;
                    if (hasMajority(m_voteCount.load()))
                    {
                        becomeLeader();
                    }
                }
            }
        );
    }

🛠️ Refactor suggestion

Use thread pool instead of spawning threads for each RPC.

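Creating a new thread for each replica during every election is wasteful and can degrade performance under load. Consider a fixed-size thread pool or an asynchronous RPC approach. Note that std::async with std::launch::async, as in the diff below, still starts a new thread per call on common implementations such as libstdc++ and libc++, so it changes the interface more than the thread count.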

-std::vector<std::jthread> requesterThreads;
-requesterThreads.reserve(m_replicas.size());
-for (auto &[id, client] : m_replicas)
-{
-    requesterThreads.emplace_back(
-        [&client, request, this]()
-        {
-            // RPC logic
-        }
-    );
-}
+// Use a thread pool or async futures
+std::vector<std::future<void>> futures;
+for (auto &[id, client] : m_replicas)
+{
+    futures.push_back(std::async(std::launch::async, 
+        [&client, request, this]()
+        {
+            // RPC logic
+        }
+    ));
+}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In lib/raft/raft.cpp between lines 915 and 966, the code currently spawns a new
std::jthread for each replica to send RequestVoteRPCs, which is inefficient.
Refactor this to use a thread pool or an asynchronous task executor instead of
creating a separate thread per RPC. Replace the requesterThreads vector and
emplace_back calls with tasks submitted to a thread pool, ensuring the same
logic runs asynchronously without spawning excessive threads.

Comment on lines +203 to +232
template <typename TStream>
auto operator>>(TStream &stream, manifest_t::record_t &rRecord) -> TStream &
{
    int record_type_int{0};
    stream >> record_type_int;

    switch (static_cast<manifest_t::record_type_k>(record_type_int))
    {
    case manifest_t::record_type_k::segment_k:
    {
        manifest_t::segment_record_t record;
        record.read(stream);
        rRecord.emplace<1>(std::move(record));
        break;
    }
    case manifest_t::record_type_k::level_k:
    {
        manifest_t::level_record_t record;
        record.read(stream);
        rRecord.emplace<2>(record);
        break;
    }
    default:
    {
        spdlog::error("unhandled record_type_int={}. Skipping record.", record_type_int);
        break;
    }
    }
    return stream;
}

🛠️ Refactor suggestion

⚠️ Potential issue

Fix inconsistent move semantics and add error handling.

There are two issues to address:

  1. Inconsistent use of std::move between segment and level records
  2. Missing error handling for stream extraction failures
 auto operator>>(TStream &stream, manifest_t::record_t &rRecord) -> TStream &
 {
     int record_type_int{0};
-    stream >> record_type_int;
+    if (!(stream >> record_type_int))
+    {
+        spdlog::error("Failed to read record type from stream");
+        return stream;
+    }

     switch (static_cast<manifest_t::record_type_k>(record_type_int))
     {
     case manifest_t::record_type_k::segment_k:
     {
         manifest_t::segment_record_t record;
         record.read(stream);
         rRecord.emplace<1>(std::move(record));
         break;
     }
     case manifest_t::record_type_k::level_k:
     {
         manifest_t::level_record_t record;
         record.read(stream);
-        rRecord.emplace<2>(record);
+        rRecord.emplace<2>(std::move(record));
         break;
     }
📝 Committable suggestion

Suggested change
template <typename TStream>
auto operator>>(TStream &stream, manifest_t::record_t &rRecord) -> TStream &
{
    int record_type_int{0};
    if (!(stream >> record_type_int))
    {
        spdlog::error("Failed to read record type from stream");
        return stream;
    }

    switch (static_cast<manifest_t::record_type_k>(record_type_int))
    {
    case manifest_t::record_type_k::segment_k:
    {
        manifest_t::segment_record_t record;
        record.read(stream);
        rRecord.emplace<1>(std::move(record));
        break;
    }
    case manifest_t::record_type_k::level_k:
    {
        manifest_t::level_record_t record;
        record.read(stream);
        rRecord.emplace<2>(std::move(record));
        break;
    }
    default:
    {
        spdlog::error("unhandled record_type_int={}. Skipping record.", record_type_int);
        break;
    }
    }
    return stream;
}
🤖 Prompt for AI Agents
In lib/db/manifest/manifest.h between lines 203 and 232, fix the inconsistent
use of std::move by applying it uniformly to both segment_record_t and
level_record_t when emplacing into rRecord. Additionally, add error handling
after extracting record_type_int from the stream to check if the stream is in a
failed state; if so, log an error and handle the failure appropriately to
prevent processing invalid data.
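A reduced, self-contained sketch of the parsing pattern with hypothetical simplified record types (the real manifest_t records have their own read() methods). Every extraction is checked, explicit enum values pin the serialized tags, and the output variant is only assigned after a record has been read completely. read_record returns a bool instead of the stream so that callers can tell a parsed record from a stop.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <variant>

// Hypothetical tags; explicit values keep the on-disk encoding stable.
enum class record_type_k : int
{
    segment_k = 0,
    level_k = 1,
};

// Hypothetical, simplified records.
struct segment_record_t
{
    std::string name;
};

struct level_record_t
{
    int level{0};
};

using record_t = std::variant<std::monostate, segment_record_t, level_record_t>;

// Returns true only if a complete record was parsed; `out` is left untouched otherwise.
template <typename TStream> auto read_record(TStream &stream, record_t &out) -> bool
{
    int tag{0};
    if (!(stream >> tag))
    {
        return false; // EOF or malformed tag
    }

    switch (static_cast<record_type_k>(tag))
    {
    case record_type_k::segment_k:
    {
        segment_record_t record;
        if (!(stream >> record.name))
        {
            return false;
        }
        out.emplace<segment_record_t>(std::move(record));
        return true;
    }
    case record_type_k::level_k:
    {
        level_record_t record;
        if (!(stream >> record.level))
        {
            return false;
        }
        out.emplace<level_record_t>(std::move(record));
        return true;
    }
    }
    return false; // unknown tag
}
```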

Labels
None yet
Projects
Status: In progress

1 participant