Raft: Support database replication via Raft. #10
base: master
Conversation
Walkthrough
This update introduces a comprehensive refactor and extension to the codebase, focusing on modularizing storage, logging, and consensus components. A new Write-Ahead Log (WAL) system is implemented with pluggable storage backends, including in-memory and file-based options, and is integrated via builder and wrapper patterns. The Raft consensus algorithm is added with gRPC-based communication, persistent state management, and a replicated log abstraction. The database, LSM tree, and manifest components are updated to support the new WAL and consensus modules, with improved error handling and move semantics. Extensive CMake and configuration changes enable modular builds, new dependencies, and flexible deployment. Auxiliary files, tests, scripts, and documentation are added or updated to support the new architecture.
Sequence Diagram(s)
sequenceDiagram
participant Main as Main
participant Config as Config Loader
participant WAL as WAL Builder
participant Raft as Raft Consensus Module
participant DB as Database
participant Server as gRPC Server
Main->>Config: Load and validate config (DB, WAL, LSM, Server)
Config-->>Main: Config object
Main->>WAL: Build WAL (in-memory or file)
WAL-->>Main: WAL instance
Main->>Raft: (If replicated) Init Raft consensus module
Raft-->>Main: Raft module instance
Main->>DB: Initialize database with config and WAL
DB-->>Main: DB instance
Main->>Server: Register services (KV, Raft)
Server-->>Main: Server ready
Main->>Server: Start server in thread
Main->>Raft: Start consensus module (if replicated)
Main-->>Main: Wait for shutdown signal
Main->>Server: Shutdown server
Main->>Raft: Stop consensus module
Actionable comments posted: 6
🧹 Nitpick comments (7)
examples/raft/thread_safety.h (2)
52-96: Commented-out Mutex class
The annotated Mutex class is currently left commented out. Consider removing it to reduce code clutter, or enabling it with an appropriate implementation if it is still under development.
229-279: Deprecated lock-style annotations
The lock-style annotations at the end of this file are marked as deprecated. If the project no longer relies on them, removing the dead code can improve readability. Otherwise, keep them in a dedicated legacy-compatibility file or mark them with a deprecation notice to guide users to the new annotation approach.
examples/raft/main.cpp (1)
58-58: Consider passing 'ip' by const reference
Static analysis suggests passing 'ip' by const reference to avoid potential performance overhead and unnecessary copying when constructing NodeClient.
- NodeClient(const ID id, const IP ip)
+ NodeClient(const ID id, const IP& ip)
🧰 Tools
🪛 cppcheck (2.10-2)
[performance] 58-58: Function parameter 'ip' should be passed by const reference.
(passedByValue)
lib/proto/Raft.proto (1)
45-45: Inconsistent field naming: "lastLogterm"
For consistency with other fields like term and lastLogIndex, and with typical naming conventions, consider renaming lastLogterm to lastLogTerm.
- uint32 lastLogterm = 4;
+ uint32 lastLogTerm = 4;
lib/proto/CMakeLists.txt (3)
40-41: Remove duplicate variable definitions.
These variables are already defined above (lines 11-12) and don't need to be redefined since they haven't changed.
-set(PROTO_BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}")
-set(PROTO_IMPORT_DIRS "${CMAKE_CURRENT_LIST_DIR}")
34-62: Consider reducing code duplication in proto generation setup.
The Raft proto setup duplicates the same pattern as TinyKVPP. Consider creating a function to reduce duplication and improve maintainability.
Example approach:
function(setup_proto_library TARGET_NAME PROTO_FILE)
  if(NOT EXISTS "${PROTO_FILE}")
    message(FATAL_ERROR "Proto file not found: ${PROTO_FILE}")
  endif()

  add_library(${TARGET_NAME} OBJECT "${PROTO_FILE}")
  target_include_directories(${TARGET_NAME} PUBLIC "$<BUILD_INTERFACE:${PROTO_BINARY_DIR}>" ${protobuf_INCLUDE_DIR})
  target_link_libraries(${TARGET_NAME} PUBLIC protobuf::libprotobuf gRPC::grpc++)

  protobuf_generate(
    TARGET ${TARGET_NAME}
    IMPORT_DIRS ${PROTO_IMPORT_DIRS}
    PROTOC_OUT_DIR "${PROTO_BINARY_DIR}")

  protobuf_generate(
    TARGET ${TARGET_NAME}
    LANGUAGE grpc
    PLUGIN protoc-gen-grpc=$<TARGET_FILE:gRPC::grpc_cpp_plugin>
    PLUGIN_OPTIONS generate_mock_code=true
    GENERATE_EXTENSIONS .grpc.pb.h .grpc.pb.cc)
endfunction()

# Usage:
setup_proto_library(TKVProtoObjects "${TINYKVPP_PROTO_FILE}")
setup_proto_library(RaftProtoObjects "${RAFT_PROTO_FILE}")
1-3: Add version requirements for protobuf and gRPC.
Since this project relies heavily on protobuf and gRPC for the Raft implementation, consider adding minimum version requirements to ensure compatibility.
 cmake_minimum_required(VERSION 3.25)
 project(zkv)
+
+find_package(Protobuf 3.21 REQUIRED)
+find_package(gRPC 1.50 REQUIRED)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- .vscode/launch.json (2 hunks)
- CMakeLists.txt (1 hunks)
- examples/CMakeLists.txt (1 hunks)
- examples/raft/CMakeLists.txt (1 hunks)
- examples/raft/main.cpp (1 hunks)
- examples/raft/thread_safety.h (1 hunks)
- lib/proto/CMakeLists.txt (2 hunks)
- lib/proto/Raft.proto (1 hunks)
- lib/structures/lsmtree/levels/level.cpp (1 hunks)
- run_replicas.sh (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/main.cpp
[performance] 58-58: Function parameter 'ip' should be passed by const reference.
(passedByValue)
🔇 Additional comments (6)
examples/raft/thread_safety.h (2)
6-10: Clang-specific annotations and portability concern
These conditional macros work well for Clang's thread safety analysis. However, if this header is consumed in a project that compiles with compilers lacking Clang's __attribute__ support, the macros become no-ops. Ensure that consumers of this header are aware of how these macros behave in non-Clang environments.
108-124: Effective RAII-based locking
The MutexLocker class employs a clean RAII pattern to ensure mutex acquisition and release. This pattern mitigates the risk of forgetting to unlock in exceptional code paths. The approach is sound and safe.
examples/raft/main.cpp (1)
84-85: Review election timeout usage
Using generateRandomTimeout() directly as the RPC deadline is somewhat ad hoc. If the generated timeouts are short, requests might fail prematurely and disrupt the election process. Validate that this approach is within the expected design constraints.
examples/CMakeLists.txt (1)
6-6: New subdirectory addition
Adding the raft subdirectory properly includes the new Raft module. Ensure that all required libraries and dependencies (such as grpc, spdlog, and the proto files) are consistently linked in the raft subdirectory's CMakeLists.txt.
run_replicas.sh (1)
3-5: Verify the deployment strategy for production
The current configuration binds all nodes to the wildcard address (0.0.0.0) on fixed local ports, which is suitable for development but may need adjustment for production deployment.
Please clarify:
- Is this script intended for development only?
- For production deployment, will there be:
- Configuration for actual host addresses?
- Container orchestration (e.g., Kubernetes)?
- Load balancer configuration?
.vscode/launch.json (1)
15-16: Verify external console requirement.
The configuration enables external console which differs from other configurations. Verify if this is required for the Raft implementation (e.g., for viewing distributed node logs separately).
examples/raft/CMakeLists.txt
Outdated
cmake_minimum_required(VERSION 3.25)
project(zkv)

add_executable(RaftMain "${CMAKE_CURRENT_SOURCE_DIR}/main.cpp")
set_target_properties(RaftMain PROPERTIES CXX_STANDARD 23)
target_link_libraries(RaftMain PRIVATE DB RaftProtoObjects)
🛠️ Refactor suggestion
Add test configuration for Raft implementation
The Raft consensus is a critical component that requires thorough testing. Consider adding test configuration to ensure reliability.
Add the following after the existing code:
# Add test executable
add_executable(RaftTests "${CMAKE_CURRENT_SOURCE_DIR}/tests/raft_tests.cpp")
set_target_properties(RaftTests PROPERTIES CXX_STANDARD 23)
target_link_libraries(RaftTests PRIVATE
DB
RaftProtoObjects
Catch2::Catch2WithMain
)
# Register with CTest
add_test(NAME raft_tests COMMAND RaftTests)
This will help ensure (see the test sketch after this list):
- Leader election correctness
- Log replication consistency
- Network partition handling
- State persistence reliability
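As a starting point for that test target, here is a minimal sketch of what a first tests/raft_tests.cpp could look like. It assumes Catch2 v3 (matching the Catch2::Catch2WithMain link target above) and uses a hypothetical free function hasMajority extracted for unit testing; it is illustrative only, not code from this PR.

#include <catch2/catch_test_macros.hpp>

#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring ConsensusModule::hasMajority, extracted as a free
// function so the quorum rule can be unit-tested without spinning up gRPC servers.
static bool hasMajority(std::uint32_t votes, std::size_t totalNodes)
{
    return votes > totalNodes / 2;
}

TEST_CASE("Majority requires strictly more than half of all nodes", "[raft]")
{
    // 3-node cluster: 2 votes win, 1 vote does not.
    REQUIRE(hasMajority(2, 3));
    REQUIRE_FALSE(hasMajority(1, 3));

    // 4-node cluster: 3 votes win, 2 votes are a split.
    REQUIRE(hasMajority(3, 4));
    REQUIRE_FALSE(hasMajority(2, 4));
}

Full leader-election and log-replication tests would additionally need an in-process fake of the RaftService peers.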
run_replicas.sh
Outdated
#!/bin/bash

./build/RaftMain --id 1 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_1.txt &
./build/RaftMain --id 2 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_2.txt &
./build/RaftMain --id 3 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_3.txt &
Enhance the replica deployment script for production readiness
The current script has several limitations that could cause issues in production:
- No cleanup of background processes
- No health checks for node startup
- No error handling for port availability
- Hardcoded configuration
- No graceful shutdown mechanism
Replace the script with this more robust version:
-#!/bin/bash
-
-./build/RaftMain --id 1 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_1.txt &
-./build/RaftMain --id 2 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_2.txt &
-./build/RaftMain --id 3 --nodes 0.0.0.0:8080,0.0.0.0:8081,0.0.0.0:8082 &> log_3.txt &
+#!/bin/bash
+set -euo pipefail
+
+# Configuration
+readonly BASE_PORT=8080
+readonly NUM_REPLICAS=3
+readonly BINARY="./build/RaftMain"
+readonly LOG_DIR="logs"
+
+# Create log directory
+mkdir -p "${LOG_DIR}"
+
+# Cleanup function
+cleanup() {
+ echo "Shutting down Raft nodes..."
+ kill $(jobs -p) 2>/dev/null || true
+ wait
+}
+
+# Register cleanup on script exit
+trap cleanup EXIT
+
+# Build node list
+NODES=""
+for ((i=1; i<=NUM_REPLICAS; i++)); do
+ port=$((BASE_PORT + i - 1))
+ if [ ! -z "$NODES" ]; then
+ NODES="${NODES},"
+ fi
+ NODES="${NODES}0.0.0.0:${port}"
+done
+
+# Start nodes
+for ((i=1; i<=NUM_REPLICAS; i++)); do
+ port=$((BASE_PORT + i - 1))
+
+ # Check if port is available
+ if nc -z localhost $port 2>/dev/null; then
+ echo "Error: Port $port is already in use"
+ exit 1
+ fi
+
+ # Start node
+ echo "Starting Raft node $i on port $port..."
+ $BINARY --id $i --nodes $NODES &> "${LOG_DIR}/log_${i}.txt" &
+
+ # Wait briefly to ensure proper startup order
+ sleep 1
+
+ # Check if process is running
+ if ! kill -0 $! 2>/dev/null; then
+ echo "Error: Failed to start node $i"
+ exit 1
+ fi
+done
+
+echo "All Raft nodes started. Logs available in ${LOG_DIR}/"
+echo "Press Ctrl+C to shutdown all nodes"
+
+# Wait for Ctrl+C
+wait
This enhanced version includes:
- Process cleanup on exit
- Port availability checking
- Configurable number of replicas
- Organized logging
- Startup health checks
- Proper error handling
Actionable comments posted: 4
🧹 Nitpick comments (5)
examples/raft/main.cpp (5)
32-33: Consider making timeout values configurable.
The hardcoded timeout values could be moved to configuration to allow tuning based on network conditions and deployment requirements.
- const int minTimeout{150};
- const int maxTimeout{300};
+ static constexpr const char* CONFIG_MIN_TIMEOUT = "RAFT_MIN_TIMEOUT";
+ static constexpr const char* CONFIG_MAX_TIMEOUT = "RAFT_MAX_TIMEOUT";
+ const int minTimeout = std::getenv(CONFIG_MIN_TIMEOUT) ? std::atoi(std::getenv(CONFIG_MIN_TIMEOUT)) : 150;
+ const int maxTimeout = std::getenv(CONFIG_MAX_TIMEOUT) ? std::atoi(std::getenv(CONFIG_MAX_TIMEOUT)) : 300;
53-60: Optimize parameter passing in constructor.
The ip parameter should be passed by const reference for better performance with strings.
- NodeClient(const ID id, const IP ip)
+ NodeClient(const ID id, const IP& ip)
🧰 Tools
🪛 cppcheck (2.10-2)
[performance] 53-53: Function parameter 'ip' should be passed by const reference.
(passedByValue)
103-103: Remove commented-out code.
The commented-out completion queue declaration should be removed if it's not needed.
- /*grpc::CompletionQueue m_cq;*/
309-309: Implement persistent state initialization.
The TODO comment indicates missing persistence implementation, which is crucial for Raft's durability guarantees.
Would you like me to help implement the persistent state initialization using a simple file-based storage solution?
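For illustration only, a minimal file-based sketch could look like the following. The struct name, file format, and members are assumptions for this example; Raft only requires that currentTerm and votedFor (and the log) reach stable storage before the node responds to RPCs.

#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper that persists the two scalar pieces of Raft persistent state.
struct PersistentState
{
    std::uint32_t currentTerm{0};
    std::uint32_t votedFor{0};

    // Returns std::nullopt when the file does not exist yet or cannot be parsed.
    static auto load(const std::string &path) -> std::optional<PersistentState>
    {
        std::ifstream in(path);
        PersistentState state;
        if (in >> state.currentTerm >> state.votedFor)
        {
            return state;
        }
        return std::nullopt;
    }

    // Caller should write to a temp file and rename (plus fsync) for crash safety.
    auto save(const std::string &path) const -> bool
    {
        std::ofstream out(path, std::ios::trunc);
        out << currentTerm << ' ' << votedFor << '\n';
        out.flush();
        return static_cast<bool>(out);
    }
};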
591-592: Improve error message formatting.
The error message should follow consistent formatting and include the actual value for better debugging.
- spdlog::error("ID of the node should be positve integer");
+ spdlog::error("Invalid node ID: {}. Node ID must be a positive integer.", nodeId);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- examples/raft/main.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/main.cpp
[performance] 53-53: Function parameter 'ip' should be passed by const reference.
(passedByValue)
examples/raft/main.cpp
Outdated
void decrementNextIndex(ID id)
{
    (void)id;
}
Implement decrementNextIndex method.
The method is currently a no-op but is crucial for Raft's log replication mechanism.
void decrementNextIndex(ID id)
{
- (void)id;
+ if (m_nextIndex[id - 1] > 0) {
+ m_nextIndex[id - 1]--;
+ }
}
examples/raft/main.cpp
Outdated
while (!m_stopElectionTimer)
{
    if (token.stop_requested())
    {
        spdlog::info("Stopping election timer thread");
        return;
    }

    {
        absl::MutexLock locker(&m_electionMutex);
        if (m_state == NodeState::LEADER)
        {
            continue;
        }
    }
Fix potential race condition in election timer loop.
The continue statement inside the leader check could cause the thread to spin continuously while the node is the leader. Consider using a condition variable to wait efficiently.
if (m_state == NodeState::LEADER)
{
- continue;
+ m_timerCV.WaitWithTimeout(&m_timerMutex, absl::Milliseconds(100));
+ continue;
}
Committable suggestion skipped: line range outside the PR's diff.
examples/raft/main.cpp
Outdated
std::thread(
    [this, &client, heartbeatInterval]()
    {
        while (m_state == NodeState::LEADER)
        {
            AppendEntriesRequest request;
            {
                absl::ReaderMutexLock locker(&m_electionMutex);

                request.set_term(m_currentTerm);
                request.set_prevlogterm(getLastLogTerm());
                request.set_prevlogindex(getLastLogIndex());
                request.set_leadercommit(m_commitIndex);
                request.set_senderid(m_id);
            }

            {
                AppendEntriesResponse response;
                if (!client.appendEntries(request, &response))
                {
                    spdlog::error("AppendEntriesRequest failed during heartbeat");
                    return;
                }

                auto responseTerm = response.term();
                auto success = response.success();

                spdlog::info(
                    "Received AppendEntriesResponse in requester thread peerTerm={} success={} responderId={}",
                    responseTerm,
                    success,
                    response.responderid());

                {
                    absl::WriterMutexLock locker(&m_electionMutex);

                    if (responseTerm > m_currentTerm)
                    {
                        becomeFollower(responseTerm);
                        return;
                    }

                    if (!success)
                    {
                        decrementNextIndex(client.getId());
                    }
                }
            }

            std::this_thread::sleep_for(heartbeatInterval);
        }
    })
    .detach();
}
Potential resource leak in heartbeat thread.
The detached thread might outlive the NodeClient reference it captures, leading to undefined behavior. Consider using a managed thread pool or ensuring proper lifecycle management.
- std::thread(
+ auto heartbeatThread = std::make_shared<std::jthread>(
[this, &client, heartbeatInterval]()
{
// ... existing code ...
- })
- .detach();
+ });
+ m_heartbeatThreads.push_back(heartbeatThread);
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 2
🧹 Nitpick comments (2)
examples/raft/main.cpp (2)
30-40: Consider making timeout constants configurable and more descriptive.
The hardcoded timeout values could be made configurable to allow tuning based on network conditions and deployment requirements.
- const int minTimeout{150};
- const int maxTimeout{300};
+ static constexpr int MIN_ELECTION_TIMEOUT_MS{150};
+ static constexpr int MAX_ELECTION_TIMEOUT_MS{300};
+ // TODO: Consider making these configurable via command line arguments
53-60: Pass 'ip' parameter by const reference for better performance.
Passing string parameters by value creates unnecessary copies.
- NodeClient(const ID id, const IP ip)
+ NodeClient(const ID id, const IP& ip)
🧰 Tools
🪛 cppcheck (2.10-2)
[performance] 53-53: Function parameter 'ip' should be passed by const reference.
(passedByValue)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- examples/raft/main.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/main.cpp
[performance] 53-53: Function parameter 'ip' should be passed by const reference.
(passedByValue)
🔇 Additional comments (3)
examples/raft/main.cpp (3)
491-494: Implement decrementNextIndex method.
The method is currently a no-op but is crucial for Raft's log replication mechanism.
void decrementNextIndex(ID id)
{
- (void)id;
+ if (m_nextIndex[id - 1] > 0) {
+ m_nextIndex[id - 1]--;
+ }
}
Likely invalid or redundant comment.
436-489: Replace detached thread with managed thread for heartbeat.
Using detached threads can lead to resource leaks and undefined behavior if the referenced objects are destroyed. Consider using a managed thread pool or at least storing the thread handle.
- std::thread(
+ auto heartbeatThread = std::make_shared<std::jthread>(
[this, &client, heartbeatInterval]()
{
// ... existing code ...
- })
- .detach();
+ });
+ m_heartbeatThreads.push_back(heartbeatThread);
Likely invalid or redundant comment.
226-240: Fix potential deadlock in election timer loop.
The current implementation might deadlock if the node becomes a leader while holding the election mutex. Consider using a condition variable with a timeout instead of a busy wait.
- if (m_state == NodeState::LEADER)
- {
- continue;
- }
+ if (m_state == NodeState::LEADER)
+ {
+ m_timerCV.WaitWithTimeout(&m_timerMutex, absl::Milliseconds(100));
+ continue;
+ }
Likely invalid or redundant comment.
examples/raft/main.cpp
Outdated
m_channel(grpc::CreateChannel(m_ip, grpc::InsecureChannelCredentials())),
m_stub(RaftService::NewStub(m_channel))
Security concern: Using insecure channel credentials.
The use of grpc::InsecureChannelCredentials() is not suitable for production environments as it exposes the communication to potential man-in-the-middle attacks.
Consider using TLS certificates for secure communication:
auto creds = grpc::SslCredentials(grpc::SslCredentialsOptions());
m_channel = grpc::CreateChannel(m_ip, creds);
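A slightly fuller sketch of the secure-channel setup is shown below. The certificate paths and the readFile helper are assumptions for illustration; only grpc::SslCredentialsOptions and grpc::SslCredentials are actual gRPC API.

#include <grpcpp/grpcpp.h>

#include <fstream>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical helper to read a PEM file from disk into a string.
static std::string readFile(const std::string &path)
{
    std::ifstream in(path);
    std::stringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

std::shared_ptr<grpc::Channel> makeSecureChannel(const std::string &target)
{
    grpc::SslCredentialsOptions options;
    options.pem_root_certs = readFile("certs/ca.pem");       // CA that signed the peers' certificates
    options.pem_private_key = readFile("certs/client.key");  // optional: client auth for mutual TLS
    options.pem_cert_chain = readFile("certs/client.pem");
    return grpc::CreateChannel(target, grpc::SslCredentials(options));
}

The server side would need the matching grpc::SslServerCredentials in place of grpc::InsecureServerCredentials().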
examples/raft/main.cpp
Outdated
cm.start();

cm.stop();
🛠️ Refactor suggestion
Implement graceful shutdown handling.
The server should handle SIGTERM/SIGINT signals for graceful shutdown instead of immediate stop.
void signal_handler(int signal) {
spdlog::info("Received signal {}. Initiating graceful shutdown...", signal);
// Trigger graceful shutdown
}
// In main():
std::signal(SIGTERM, signal_handler);
std::signal(SIGINT, signal_handler);
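One way to wire the handler into main() is sketched below; the global flag and the polling loop are illustrative assumptions, not code from this PR.

#include <atomic>
#include <chrono>
#include <csignal>
#include <thread>

// Assumed global flag shared between the signal handler and main().
std::atomic<bool> g_shutdownRequested{false};

void signal_handler(int signal)
{
    (void)signal;
    g_shutdownRequested.store(true); // setting a lock-free atomic flag is async-signal-safe
}

// In main(), after registering the services:
//   std::signal(SIGTERM, signal_handler);
//   std::signal(SIGINT, signal_handler);
//   cm.start();
//   while (!g_shutdownRequested.load()) {
//       std::this_thread::sleep_for(std::chrono::milliseconds(100));
//   }
//   cm.stop();  // graceful shutdown once a signal arrives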
Actionable comments posted: 0
🧹 Nitpick comments (3)
examples/raft/raft.h (1)
22-38: Consider passing ip by const reference in the NodeClient constructor.
Static analysis flagged this constructor parameter as a performance concern. Although this might be minor for small strings, passing by const reference avoids unnecessary copies, especially if IP can hold large strings or if this constructor is called frequently.
- NodeClient(const ID id, const IP ip);
+ NodeClient(const ID id, const IP& ip);
examples/raft/raft.cpp (1)
22-22: Pass ip by const reference to optimize.
As noted by static analysis, passing ip by const reference can avoid copying strings if they are large or if the constructor is called often.
- NodeClient::NodeClient(const ID id, const IP ip)
+ NodeClient::NodeClient(const ID id, const IP& ip)
🧰 Tools
🪛 cppcheck (2.10-2)
[performance] 22-22: Function parameter 'ip' should be passed by const reference.
(passedByValue)
examples/raft/main.cpp (1)
39-42: Avoid copying IP strings in the for loop.
In the loop iterating over nodeIps, consider capturing the iterated string by const reference to avoid unnecessary copies. For large or numerous IP strings, this minor refactor can improve performance:
- for (auto ip : nodeIps)
+ for (const auto& ip : nodeIps)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- examples/raft/CMakeLists.txt (1 hunks)
- examples/raft/main.cpp (1 hunks)
- examples/raft/raft.cpp (1 hunks)
- examples/raft/raft.h (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/raft.cpp
[performance] 22-22: Function parameter 'ip' should be passed by const reference.
(passedByValue)
examples/raft/main.cpp
[performance] 25-25: Function parameter 'ip' should be passed by const reference.
(passedByValue)
🔇 Additional comments (4)
examples/raft/raft.h (1)
40-125: enable_shared_from_this usage check.
The class ConsensusModule inherits from std::enable_shared_from_this<ConsensusModule> but does not appear to call shared_from_this(). Verify whether this inheritance is intentional. If not, consider removing it to simplify the class.
examples/raft/raft.cpp (2)
193-198: Election timer loop concurrency risk
This comment reaffirms a previously raised concern about potential continuous spinning or premature exit when the node becomes a leader, preventing election timeout monitoring if leadership changes again.
459-463: decrementNextIndex remains unimplemented
This method is critical for the Raft log replication process but is currently a no-op. Consider implementing the logic to decrement m_nextIndex[id - 1], as recommended in a prior review.
examples/raft/CMakeLists.txt (1)
4-6
: Add specialized test target for Raft.
A test configuration for the Raft consensus ensures reliability of leader election, log replication, and fault tolerance. See previous suggestion for details on adding a dedicated test executable.
Actionable comments posted: 7
🧹 Nitpick comments (2)
examples/raft/raft.cpp (2)
31-43: Add deadline to appendEntries RPC call.
For consistency with requestVote, consider adding a deadline to prevent hanging on network issues:
 auto NodeClient::appendEntries(const AppendEntriesRequest &request, AppendEntriesResponse *response) -> bool
 {
     grpc::ClientContext context;
+    context.set_deadline(std::chrono::system_clock::now() + std::chrono::seconds(generateRandomTimeout()));
     grpc::Status status = m_stub->AppendEntries(&context, request, response);
279-286: Implement persistent state storage.
The TODO comment indicates missing persistence implementation:
Would you like me to help implement the persistent state storage using RocksDB or a similar key-value store? This would ensure durability of critical Raft state across restarts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/raft/raft.cpp (1 hunks)
- lib/structures/lsmtree/levels/level.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/structures/lsmtree/levels/level.cpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/raft.cpp
[performance] 22-22: Function parameter 'ip' should be passed by const reference.
(passedByValue)
Actionable comments posted: 1
🧹 Nitpick comments (2)
examples/raft/raft.cpp (2)
31-43: Enhance error handling in appendEntries.
The error handling could be more detailed and include retry logic for transient failures.
Consider adding:
 auto NodeClient::appendEntries(const AppendEntriesRequest &request, AppendEntriesResponse *response) -> bool
 {
+    constexpr int maxRetries = 3;
+    int attempts = 0;
+    while (attempts++ < maxRetries) {
         grpc::ClientContext context;
+        context.set_deadline(std::chrono::system_clock::now() + std::chrono::seconds(1));
         grpc::Status status = m_stub->AppendEntries(&context, request, response);
         if (!status.ok()) {
-            spdlog::error("AppendEntries RPC call failed");
+            spdlog::error("AppendEntries RPC call failed. Error code={}, message={}, attempt={}/{}",
+                          static_cast<int>(status.error_code()),
+                          status.error_message(),
+                          attempts,
+                          maxRetries);
+            if (attempts < maxRetries) {
+                std::this_thread::sleep_for(std::chrono::milliseconds(100 * attempts));
+                continue;
+            }
             return false;
         }
+        break;
+    }
     return true;
 }
293-300: Implement persistent state initialization.
The TODO comment indicates missing persistence implementation, which is crucial for maintaining consistency across node restarts.
Would you like me to help implement the persistent state initialization using a simple file-based storage system?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- examples/raft/raft.cpp (1 hunks)
🧰 Additional context used
🪛 cppcheck (2.10-2)
examples/raft/raft.cpp
[performance] 22-22: Function parameter 'ip' should be passed by const reference.
(passedByValue)
🔇 Additional comments (5)
examples/raft/raft.cpp (5)
1-21
: LGTM! Headers are well-organized and complete.
The includes are properly categorized and contain all necessary dependencies for gRPC, abseil synchronization primitives, and logging.
401-405
: Fix incorrect majority calculation.
The majority calculation is incorrect as noted in the previous review. Using the existing fix:
auto ConsensusModule::hasMajority(const uint32_t votes) const -> bool
{
- constexpr const double HALF_OF_THE_REPLICAS = 2.0;
- return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS;
+ // Total nodes = replicas + self
+ const size_t totalNodes = m_replicas.size() + 1;
+ return votes > totalNodes / 2;
}
429-488
: 🛠️ Refactor suggestion
Enhance heartbeat thread reliability.
The heartbeat mechanism needs improvements:
- Add retry logic for failed heartbeats
- Implement backoff strategy
- Add thread cleanup on errors
void ConsensusModule::sendHeartbeat(NodeClient &client)
{
- constexpr const auto heartbeatInterval{std::chrono::milliseconds(100)};
+ constexpr const auto baseHeartbeatInterval{std::chrono::milliseconds(100)};
+ constexpr const int maxRetries = 3;
+ constexpr const auto maxBackoff{std::chrono::milliseconds(1000)};
m_heartbeatThreads.emplace_back(
- [this, &client, heartbeatInterval](std::stop_token token)
+ [this, &client](std::stop_token token)
{
+ auto heartbeatInterval = baseHeartbeatInterval;
+ int consecutiveFailures = 0;
Likely invalid or redundant comment.
22-29
: 🛠️ Refactor suggestion
Enhance constructor robustness and efficiency.
The constructor should validate inputs and handle connection failures:
-NodeClient::NodeClient(const ID id, const IP ip)
+NodeClient::NodeClient(const ID id, const IP& ip)
: m_id{id},
m_ip{ip},
m_channel(grpc::CreateChannel(m_ip, grpc::InsecureChannelCredentials())),
m_stub(RaftService::NewStub(m_channel))
{
assert(m_id > 0);
+ assert(!m_ip.empty());
+
+ // Verify channel is valid
+ if (!m_channel || !m_stub) {
+ throw std::runtime_error("Failed to establish gRPC channel to " + m_ip);
+ }
}
Likely invalid or redundant comment.
🧰 Tools
🪛 cppcheck (2.10-2)
[performance] 22-22: Function parameter 'ip' should be passed by const reference.
(passedByValue)
66-96
: 🛠️ Refactor suggestion
Enhance constructor's thread safety and error handling.
The constructor needs better initialization and validation:
- Initialize atomic variables explicitly
- Add error handling for server startup
- Validate replica IDs
ConsensusModule::ConsensusModule(const ID id, std::vector<IP> replicas)
: m_id{id},
m_currentTerm{0},
m_votedFor{0},
- m_state{NodeState::FOLLOWER}
+ m_state{NodeState::FOLLOWER},
+ m_stopElectionTimer{false},
+ m_electionInProgress{false},
+ m_leaderHeartbeatReceived{false},
+ m_voteCount{0}
{
assert(m_id > 0);
assert(replicas.size() > 0);
assert(m_id <= replicas.size());
m_ip = replicas[m_id - 1];
+ if (m_ip.empty()) {
+ throw std::runtime_error("Invalid IP address for node " + std::to_string(m_id));
+ }
grpc::ServerBuilder builder;
builder.AddListeningPort(m_ip, grpc::InsecureServerCredentials());
builder.RegisterService(this);
m_server = builder.BuildAndStart();
+ if (!m_server) {
+ throw std::runtime_error("Failed to start gRPC server on " + m_ip);
+ }
Likely invalid or redundant comment.
examples/raft/raft.cpp
Outdated
/*if (pRequest->lastlogterm() < getLastLogTerm() ||*/
/*    (pRequest->lastlogterm() == getLastLogTerm() && pRequest->lastlogindex() < getLastLogIndex()))*/
/*{*/
/*    pResponse->set_votegranted(0);*/
/*    return grpc::Status::OK;*/
/*}*/
/**/
return grpc::Status::OK;
Uncomment and fix the log comparison logic in RequestVote.
The commented-out log comparison logic is essential for the Raft protocol's safety properties. It should be uncommented and fixed.
- /*if (pRequest->lastlogterm() < getLastLogTerm() ||*/
- /* (pRequest->lastlogterm() == getLastLogTerm() && pRequest->lastlogindex() < getLastLogIndex()))*/
- /*{*/
- /* pResponse->set_votegranted(0);*/
- /* return grpc::Status::OK;*/
- /*}*/
- /**/
+ // Rule: Candidate's log must be at least as up-to-date as receiver's log
+ if (pRequest->lastlogterm() < getLastLogTerm() ||
+ (pRequest->lastlogterm() == getLastLogTerm() && pRequest->lastlogindex() < getLastLogIndex()))
+ {
+ pResponse->set_term(m_currentTerm);
+ pResponse->set_votegranted(0);
+ pResponse->set_responderid(m_id);
+ return grpc::Status::OK;
+ }
Actionable comments posted: 2
♻️ Duplicate comments (1)
examples/raft/raft.cpp (1)
422-426: ⚠️ Potential issue
Fix incorrect majority calculation.
The current logic can lead to split votes. Please use total nodes (replicas.size() + 1) instead of just replicas when determining majority.
-constexpr const double HALF_OF_THE_REPLICAS = 2.0;
-return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS;
+const size_t totalNodes = m_replicas.size() + 1; // includes self
+return votes > totalNodes / 2;
🧹 Nitpick comments (7)
examples/raft/raft.cpp (5)
25-43: Increase inline IP validation or logging information.
Currently, the constructor validates a non-empty IP and logs channel creation issues. Consider adding more context or checks (e.g., IP format) to proactively catch configuration errors.
45-57: Add trace-level logs for debug support.
The appendEntries RPC call is crucial to Raft. Adding trace logs (e.g., printing the request's term) can help debug replication issues.
59-73: Implement consistent RPC deadlines.
You set a deadline for RequestVote, whereas appendEntries has no per-RPC deadline. Consider aligning both to ensure consistent timeout behavior.
80-114: Consider checking the validity of term or replica size mismatch.
In the constructor, you assume m_id <= replicas.size(). If this is violated, you throw in the assertion; consider converting it into a runtime error or a user-friendly message for production builds.
288-312: Ensure graceful handling of any outstanding RPC calls on stop.
Consider waiting for in-flight RPCs to complete or adding a short deadline prior to forcibly shutting down the gRPC server.
examples/raft/raft.h (2)
20-20: Consider adding randomness seeding or injection for reproducibility.
generateRandomTimeout() likely depends on an internal PRNG. For test determinism, you can seed it or allow a configurable random source.
72-75: Document the lock invariants.
It is helpful to clarify in comments that becomeLeader and sendHeartbeat require m_stateMutex to be locked. This ensures future maintainers understand the concurrency design.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/raft/raft.cpp (1 hunks)
- examples/raft/raft.h (1 hunks)
🔇 Additional comments (2)
examples/raft/raft.cpp (1)
154-196: Ensure you handle a second vote request from the same candidate within the same term.
When m_votedFor != 0 and is not the candidate's ID, you immediately continue, but you might want to short-circuit with a negative response.
examples/raft/raft.h (1)
40-40: Revisit the multiple inheritance approach.
Inheriting from both RaftService::Service and std::enable_shared_from_this is valid but can complicate object lifetime. Confirm that shared pointers are consistently managed to avoid double frees or dangling references.
examples/raft/raft.cpp
Outdated
auto ConsensusModule::AppendEntries(grpc::ServerContext *pContext,
                                    const AppendEntriesRequest *pRequest,
                                    AppendEntriesResponse *pResponse) -> grpc::Status
{
    (void)pContext;
    (void)pRequest;
    (void)pResponse;

    spdlog::info("Recevied AppendEntries RPC from leader={} during term={}", pRequest->senderid(), pRequest->term());

    absl::MutexLock locker(&m_stateMutex);
    absl::MutexLock timerLocker(&m_timerMutex);

    if (pRequest->term() < m_currentTerm)
    {
        pResponse->set_term(m_currentTerm);
        pResponse->set_success(false);
        pResponse->set_responderid(m_id);
        return grpc::Status::OK;
    }

    if (pRequest->term() > m_currentTerm)
    {
        becomeFollower(pRequest->term());
    }

    pResponse->set_term(m_currentTerm);
    pResponse->set_success(true);
    pResponse->set_responderid(m_id);

    spdlog::info("Node={} is resetting election timeout at term={}", m_id, m_currentTerm);
    resetElectionTimer();

    m_leaderHeartbeatReceived.store(true);

    return grpc::Status::OK;
}
Validate the log index in AppendEntries.
The method always sets success=true without verifying log continuity (as per the Raft spec). Not rejecting an inconsistent log entry could lead to divergent logs.
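A standalone sketch of the missing check is shown below; the entry type and helper name are assumptions, but the rule is the standard Raft log-matching check that AppendEntries would perform before appending.

#include <cstdint>
#include <vector>

// Minimal stand-in for a replicated log entry; only the term matters here.
struct LogEntryView
{
    std::uint32_t term;
};

// Returns true when the follower's log contains an entry at prevLogIndex whose
// term equals prevLogTerm (indices are 1-based, 0 means "empty prefix").
bool prevLogMatches(std::uint64_t prevLogIndex,
                    std::uint32_t prevLogTerm,
                    const std::vector<LogEntryView> &log)
{
    if (prevLogIndex == 0)
    {
        return true; // nothing to match against
    }
    if (prevLogIndex > log.size())
    {
        return false; // follower is missing entries the leader assumes it has
    }
    return log[prevLogIndex - 1].term == prevLogTerm;
}

When the check fails, the handler should reply with success=false so the leader decrements nextIndex for this follower and retries with an earlier prefix.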
examples/raft/raft.cpp
Outdated
void ConsensusModule::sendHeartbeat(NodeClient &client)
{
    constexpr const auto heartbeatInterval{std::chrono::milliseconds(10)};
    constexpr const int maxRetries{3};

    m_heartbeatThreads.emplace_back(
        [this, maxRetries, &client, heartbeatInterval](std::stop_token token)
        {
            spdlog::info("Node={} is starting a heartbeat thread for client={}", m_id, client.getId());

            int consecutiveFailures = 0;
            while (!token.stop_requested())
            {
                AppendEntriesRequest request;
                {
                    absl::ReaderMutexLock locker(&m_stateMutex);
                    if (m_state != NodeState::LEADER)
                    {
                        spdlog::info("Node={} is no longer a leader. Stopping the heartbeat thread");
                        break;
                    }

                    request.set_term(m_currentTerm);
                    request.set_prevlogterm(getLastLogTerm());
                    request.set_prevlogindex(getLastLogIndex());
                    request.set_leadercommit(m_commitIndex);
                    request.set_senderid(m_id);
                }

                {
                    AppendEntriesResponse response;
                    if (!client.appendEntries(request, &response))
                    {
                        consecutiveFailures++;

                        consecutiveFailures = 0;
                        spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
                                      consecutiveFailures,
                                      maxRetries);
                        if (consecutiveFailures >= maxRetries)
                        {
                            return;
                        }

                        continue;
                    }
Fix the consecutive failures logic.
Within the heartbeat loop, consecutiveFailures++
is immediately reset to zero. This prevents retries from accumulating.
if (!client.appendEntries(request, &response)) {
consecutiveFailures++;
- consecutiveFailures = 0; // This negates the increment
spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
consecutiveFailures,
maxRetries);
...
}
Actionable comments posted: 1
♻️ Duplicate comments (1)
.vscode/launch.json (1)
7-33: 🛠️ Refactor suggestion
Enhance RaftMain debug configuration for Raft development
The configuration needs improvements for effectively debugging Raft nodes:
- Add preLaunchTask to ensure RaftMain is compiled before debugging
- Add command line arguments for Raft configuration (node ID, peer list)
- Make the debug port configurable using VS Code input variables
Apply this diff to enhance the configuration:
 {
     "name": "rr - RaftMain",
     "type": "cppdbg",
     "request": "launch",
     "program": "${workspaceFolder}/build/RaftMain",
-    "miDebuggerServerAddress": "localhost:50505",
+    "miDebuggerServerAddress": "localhost:${input:debugPort}",
+    "args": [
+        "--node-id", "${input:nodeId}",
+        "--config", "${workspaceFolder}/assets/raft_config.json"
+    ],
     "stopAtEntry": false,
     "cwd": "${workspaceFolder}",
     "environment": [],
     "externalConsole": true,
+    "preLaunchTask": "build",
     "linux": {
Add these input configurations at the root level:
"inputs": [
    {
        "id": "debugPort",
        "type": "promptString",
        "description": "Debug port for remote debugging",
        "default": "50505"
    },
    {
        "id": "nodeId",
        "type": "promptString",
        "description": "Raft node ID",
        "default": "1"
    }
]
🧹 Nitpick comments (3)
lib/proto/Raft.proto (1)
8-17: Unify naming conventions in enums and fields
Consider using a consistent naming style (e.g., lastLogTerm) in the request messages, or switching to snake_case to match common protobuf conventions.
examples/raft/main.cpp (1)
69-69: Consider implementing signal-based graceful shutdown
Currently, the Raft node stops only after the main logic completes. Handling SIGTERM/SIGINT to initiate a controlled shutdown can prevent data inconsistencies or partial writes.
.vscode/launch.json (1)
34-64: Align Debug - RaftMain configuration with other debug configurations
The configuration should be consistent with other debug configurations in the file:
- Add command line arguments for Raft configuration
- Add launchCompleteCommand for consistent behavior
- Add missing field descriptions
Apply this diff:
 {
     "name": "Debug - RaftMain",
     "type": "cppdbg",
     "request": "attach",
     "program": "${workspaceFolder}/build/RaftMain",
     "processId": "${command:pickProcess}",
     "MIMode": "gdb",
+    "args": [ // Arguments for Raft configuration
+        "--node-id", "1",
+        "--config", "${workspaceFolder}/assets/raft_config.json"
+    ],
     "setupCommands": [
         {
             "description": "Enable pretty-printing for gdb",
             "text": "-enable-pretty-printing",
             "ignoreFailures": true
         }
     ],
     "preLaunchTask": "build",
     "miDebuggerPath": "/usr/bin/gdb",
     "logging": {
         "trace": true,
         "traceResponse": true,
         "engineLogging": false
     },
     "targetArchitecture": "x86_64",
     "pipeTransport": {
         "pipeCwd": "",
         "pipeProgram": "/bin/bash",
         "pipeArgs": [
             "-c"
         ],
         "debuggerPath": "/usr/bin/gdb"
     },
+    "launchCompleteCommand": "exec-run" // Consistent with other configurations
 },
🧰 Tools
🪛 Biome (1.9.4)
[error] 64-64: Expected a property but instead found '}'.
Expected a property here.
(parse)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .vscode/launch.json (2 hunks)
- examples/raft/CMakeLists.txt (1 hunks)
- examples/raft/main.cpp (1 hunks)
- examples/raft/raft.cpp (1 hunks)
- examples/raft/raft.h (1 hunks)
- lib/proto/Raft.proto (1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
.vscode/launch.json
[error] 64-64: Expected a property but instead found '}'.
Expected a property here.
(parse)
🔇 Additional comments (9)
examples/raft/raft.cpp (5)
28-28: Use atomic or mutex to protect gFirstElection
gFirstElection is a global integer that is written from multiple threads (e.g., in the election timer and upon heartbeat reception). For thread safety, consider making this variable atomic or protecting it with a mutex.
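For example, a declaration along these lines would make the accesses race-free (the initializer shown is a placeholder, not the value from the PR):

#include <atomic>

// Written by the election timer and the AppendEntries handler threads; std::atomic
// makes the concurrent reads and writes well-defined without taking a mutex.
std::atomic<int> gFirstElection{0};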
33-34: Security concern: Using insecure channel credentials
This code uses grpc::InsecureChannelCredentials(), which is typically insecure for production. Consider TLS or an equivalent secure channel.
509-513: Fix incorrect majority calculation
This logic uses votes > m_replicas.size() / 2.0 and lacks the node's own vote in the total. Consider using (m_replicas.size() + 1) to account for the node itself.
569-581: Fix the consecutive failures logic
consecutiveFailures++ is immediately reset to zero on line 571, preventing retries from incrementing. This makes the retry loop unusable as intended.
133-134: Unused or uninitialized server thread
While m_serverThread is declared and stopped in stop(), it never appears to be started. This might cause confusion or unexpected deadlocks. Verify if it's needed and ensure proper initialization if so.
Also applies to: 421-423
examples/raft/CMakeLists.txt (1)
4-6: Add test configuration for Raft implementation
Comprehensive testing of Raft is crucial. Adding a test binary and registering tests ensures reliability of leader election, log replication, and fault tolerance.
examples/raft/main.cpp (1)
25-25: Environment variable usage may have portability issues
Using setenv("ASAN_OPTIONS", ...) is helpful for debugging, but consider limiting this to debug builds to avoid altering production behavior.
examples/raft/raft.h (2)
74-75: Unused method initializeVolatileState
Although declared, initializeVolatileState() has no definition or usage in the codebase. If it's part of your design, ensure it is implemented. Otherwise, remove it to avoid confusion.
82-83: Confirm concurrency invariants for sendAppendEntriesRPC
sendAppendEntriesRPC is not marked with the same lock requirements as sendHeartbeat. Verify that you consistently acquire the necessary locks in both methods so that log mutations are always thread-safe.
    ],
    "debuggerPath": "/usr/bin/gdb"
},
},
Fix JSON syntax error
Remove the trailing comma after the pipeTransport object.
Apply this diff:
],
"debuggerPath": "/usr/bin/gdb"
- },
+ }
},
🧰 Tools
🪛 Biome (1.9.4)
[error] 64-64: Expected a property but instead found '}'.
Expected a property here.
(parse)
Actionable comments posted: 7
♻️ Duplicate comments (3)
examples/raft/raft.cpp (3)
141-224: ⚠️ Potential issue
Fix log replication safety issues.
The AppendEntries implementation has several critical issues:
- Missing validation of request fields
- Potential off-by-one errors in log indexing
- Inconsistent state updates
+    // Validate request fields
+    if (!pRequest) {
+        return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT, "Null request");
+    }
     // 3. Append new entries and remove conflicting ones
     auto newEntryStart = pRequest->prevlogindex() + 1;
+    // Check for integer overflow
+    if (newEntryStart < pRequest->prevlogindex()) {
+        return grpc::Status(grpc::StatusCode::INVALID_ARGUMENT, "Index overflow");
+    }
-    m_log.insert(m_log.end(), pRequest->entries().begin(), pRequest->entries().end());
+    // Validate and append entries
+    for (const auto& entry : pRequest->entries()) {
+        if (entry.term() < m_currentTerm) {
+            return grpc::Status(grpc::StatusCode::FAILED_PRECONDITION, "Entry from previous term");
+        }
+        m_log.push_back(entry);
+    }
481-584: ⚠️ Potential issue
Fix election safety issues.
The election implementation has critical safety issues:
- Race condition in vote counting
- Incorrect majority calculation (as noted in past reviews)
- Missing timeout handling for concurrent elections
-    constexpr const double HALF_OF_THE_REPLICAS = 2.0;
-    return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS;
+    // Total nodes = replicas + self
+    const size_t totalNodes = m_replicas.size() + 1;
+    return votes > totalNodes / 2;

 void startElection()
 {
+    const auto electionTimeout = std::chrono::seconds(generateRandomTimeout());
+    const auto deadline = std::chrono::steady_clock::now() + electionTimeout;
+
     RequestVoteRequest request;
     {
         absl::WriterMutexLock locker(&m_stateMutex);
+        if (std::chrono::steady_clock::now() >= deadline) {
+            spdlog::warn("Election timed out before starting");
+            return;
+        }
586-660: ⚠️ Potential issue
Fix heartbeat reliability issues.
The heartbeat implementation has several critical issues:
- Incorrect retry logic (consecutiveFailures reset too early)
- Potential thread leaks in error paths
- Missing timeout handling
 if (!client.appendEntries(request, &response))
 {
     consecutiveFailures++;
-    consecutiveFailures = 0; // This negates the increment
     spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}",
                   consecutiveFailures,
                   maxRetries);
     if (consecutiveFailures >= maxRetries)
     {
+        absl::WriterMutexLock locker(&m_stateMutex);
+        becomeFollower(m_currentTerm); // Step down if can't reach majority
         return;
     }
     continue;
 }
🧹 Nitpick comments (2)
examples/raft/raft.h (1)
1-24: Add include guards and improve documentation.
While #pragma once works for most modern compilers, consider adding traditional include guards for maximum compatibility. Also add documentation for type aliases and threading model.
 #pragma once
+#ifndef RAFT_H_
+#define RAFT_H_

+// Type aliases for improved readability and maintainability
 using IP = std::string;  // Represents an IP address
 using ID = uint32_t;     // Unique identifier for nodes
 using Clock = std::chrono::high_resolution_clock;
 using TimePoint = std::chrono::high_resolution_clock::time_point;

 // Valid IDs start from 1
 constexpr const ID invalidId = 0;

+#endif // RAFT_H_
examples/raft/raft.cpp (1)
1-796: Improve logging for better observability.
The logging implementation needs improvements:
- Add structured logging with consistent fields
- Use appropriate log levels
- Add logging for all error paths
+ // Add logging configuration
+ spdlog::set_pattern("[%Y-%m-%d %H:%M:%S.%e] [%n] [%^%l%$] [%s:%#] %v");

- spdlog::debug("Node={} starts election. New term={}", m_id, m_currentTerm);
+ spdlog::debug("Starting election node_id={} current_term={} state={}",
+               m_id, m_currentTerm, static_cast<int>(m_state));

- spdlog::error("AppendEntries RPC call failed");
+ spdlog::error("RPC failed operation={} node_id={} peer_id={} error_code={} error_message={}",
+               "AppendEntries", m_id, client.getId(),
+               static_cast<int>(status.error_code()), status.error_message());
Actionable comments posted: 2
♻️ Duplicate comments (7)
examples/raft/raft.cpp (6)
56-67: ⚠️ Potential issue
Correct the error messages in exceptions.
In the NodeClient constructor, the error messages incorrectly use m_id twice when formatting the error string. The second m_id should be m_ip to display the correct IP address.
Apply this diff to fix the error messages:
-throw std::runtime_error(fmt::format("Failed to establish a gRPC channel for node={} ip={}", m_id, m_id));
+throw std::runtime_error(fmt::format("Failed to establish a gRPC channel for node={} ip={}", m_id, m_ip));
Similarly, update the other error messages:
-throw std::runtime_error(fmt::format("Failed to create a stub for node={} ip={}", m_id, m_id));
+throw std::runtime_error(fmt::format("Failed to create a stub for node={} ip={}", m_id, m_ip));
-throw std::runtime_error(fmt::format("Failed to create a KV stub for node={} ip={}", m_id, m_id));
+throw std::runtime_error(fmt::format("Failed to create a KV stub for node={} ip={}", m_id, m_ip));
137-138
:⚠️ Potential issueAvoid unsafe
dynamic_cast
and check fornullptr
.In the
ConsensusModule
constructor, you are usingdynamic_cast
without checking for anullptr
result, which can lead to undefined behavior if the cast fails.Consider modifying the code to check if the cast is successful before using the result:
-auto raftService = dynamic_cast<RaftService::Service*>(this); -auto kvService = dynamic_cast<TinyKVPPService::Service*>(this); +auto* raftService = dynamic_cast<RaftService::Service*>(this); +auto* kvService = dynamic_cast<TinyKVPPService::Service*>(this); +if (!raftService || !kvService) { + throw std::runtime_error("Invalid service cast in ConsensusModule constructor"); +} builder.AddListeningPort(m_ip, grpc::InsecureServerCredentials()); -builder.RegisterService(dynamic_cast<RaftService::Service *>(this)); -builder.RegisterService(dynamic_cast<TinyKVPPService::Service *>(this)); +builder.RegisterService(raftService); +builder.RegisterService(kvService);
647-649
:⚠️ Potential issueFix the logic error with
consecutiveFailures
.Inside the heartbeat loop in
sendHeartbeat
,consecutiveFailures
is incremented and then immediately reset to zero. This negates the increment and prevents retry logic from working correctly.Apply this diff to correct the logic:
if (!client.appendEntries(request, &response)) { consecutiveFailures++; - consecutiveFailures = 0; spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}", consecutiveFailures, maxRetries); if (consecutiveFailures >= maxRetries) { return; } continue; }
471-491
:⚠️ Potential issueEnsure proper shutdown and cleanup in
stop()
method.In the
ConsensusModule::stop()
method, there are potential issues with thread management and server shutdown:
- The
m_serverThread
is being requested to stop and joined, but it's not clear where this thread was started.- The
m_raftServer->Shutdown()
should be called before stopping the election and heartbeat threads to prevent new RPCs from being accepted during shutdown.Consider reordering the shutdown sequence and ensuring all threads are properly managed:
void ConsensusModule::stop() { absl::WriterMutexLock locker{&m_stateMutex}; + if (m_raftServer) + { + m_raftServer->Shutdown(); + } m_stopElection = true; m_electionThread.request_stop(); m_electionThread.join(); for (auto &heartbeatThread : m_heartbeatThreads) { heartbeatThread.request_stop(); heartbeatThread.join(); } m_heartbeatThreads.clear(); - if (m_raftServer) - { - m_raftServer->Shutdown(); - } - m_serverThread.request_stop(); - m_serverThread.join(); }
128-131
:⚠️ Potential issueInitialize
m_matchIndex
andm_nextIndex
for all replicas.The
m_matchIndex
andm_nextIndex
maps are not initialized in the constructor, which can lead to undefined behavior when accessing these values later.Add initialization for these maps in the constructor:
for (auto [id, ip] : std::ranges::views::enumerate(replicas)) { if (id + 1 == m_id) { continue; } m_replicas.emplace(id + 1, NodeClient(id + 1, ip)); + // Initialize matchIndex and nextIndex for each replica + m_matchIndex[id + 1] = 0; + m_nextIndex[id + 1] = getLastLogIndex() + 1; }
169-240
:⚠️ Potential issueEnsure compliance with Raft protocol in
AppendEntries
.The
AppendEntries
method seems to have some incomplete implementation aspects regarding the Raft protocol, such as:
- Not updating
m_currentTerm
correctly.- Missing reset of
m_votedFor
.- Incomplete log replication logic.
Review the method to ensure it fully complies with the Raft specification, especially Sections 5.3 and 5.4.
examples/raft/raft.h (1)
117-121
:⚠️ Potential issueAdd thread safety annotations for member variables.
Member variables that are accessed concurrently should have appropriate thread safety annotations to prevent data races and ensure correctness.
Add
ABSL_GUARDED_BY
annotations:uint32_t m_currentTerm ABSL_GUARDED_BY(m_stateMutex); uint32_t m_votedFor ABSL_GUARDED_BY(m_stateMutex); NodeState m_state ABSL_GUARDED_BY(m_stateMutex); std::vector<LogEntry> m_log ABSL_GUARDED_BY(m_stateMutex); -/*db::wal::wal_t m_wal ABSL_GUARDED_BY(m_stateMutex);*/
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/raft/raft.cpp
(1 hunks)examples/raft/raft.h
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (4)
examples/raft/raft.cpp (1)
379-389
: Handle failure of persistent state initialization ininit()
.In the
init()
method, ifinitializePersistentState()
fails, the node currently logs a warning but still returnsfalse
. There might be a need for more robust error handling or recovery strategies.Please confirm whether it's acceptable for the node to proceed if persistent state initialization fails. If not, consider terminating the node or implementing a retry mechanism.
examples/raft/raft.h (3)
30-37
: Add virtual destructor toNodeClient
.The
NodeClient
class contains virtual methods due to the use of gRPC stubs. It is recommended to have a virtual destructor to ensure proper cleanup of derived classes.Add a virtual destructor:
class NodeClient { public: NodeClient(ID nodeId, IP nodeIp); + virtual ~NodeClient() = default; auto appendEntries(const AppendEntriesRequest &request, AppendEntriesResponse *response) -> bool; auto requestVote(const RequestVoteRequest &request, RequestVoteResponse *response) -> bool;
48-75
: Improve documentation and thread safety annotations inConsensusModule
.The
ConsensusModule
class lacks sufficient documentation and thread safety annotations, which are important for maintainability and correctness.Consider adding method documentation and ensuring that thread safety annotations are consistently applied:
class ConsensusModule : public RaftService::Service, public TinyKVPPService::Service, std::enable_shared_from_this<ConsensusModule> { public: // Constructs a ConsensusModule with the given node ID and list of replica IPs. ConsensusModule(ID nodeId, std::vector<IP> replicas); // Raft RPCs auto AppendEntries(grpc::ServerContext *pContext, const AppendEntriesRequest *pRequest, AppendEntriesResponse *pResponse) -> grpc::Status override; // Other methods... + // Initializes the persistent state. Must be called before start(). auto init() -> bool; + // Starts the consensus module's main loop and begins participating in the cluster. void start(); + // Stops the consensus module and cleans up resources. void stop();
134-138
: Usestd::atomic_flag
for better synchronization.For atomic boolean flags,
std::atomic_flag
provides better guarantees and semantics compared tostd::atomic<bool>
.Modify the declarations:
std::atomic<bool> m_leaderHeartbeatReceived{false}; std::atomic<bool> m_stopElection{false}; +std::atomic_flag m_leaderHeartbeatReceived = ATOMIC_FLAG_INIT; +std::atomic_flag m_stopElection = ATOMIC_FLAG_INIT;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (3)
examples/raft/raft.cpp (3)
666-676
:⚠️ Potential issueFix the
consecutiveFailures
logic insendHeartbeat
.Currently,
consecutiveFailures
is reset to zero immediately after being incremented, which prevents the failure count from accumulating correctly. This negates the retry mechanism and may cause premature termination of the heartbeat thread.Apply this diff to correct the logic:
consecutiveFailures++; spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}", consecutiveFailures, maxRetries); if (consecutiveFailures >= maxRetries) { return; } - consecutiveFailures = 0; continue;
710-784
: 🛠️ Refactor suggestionAvoid using detached threads in
sendAppendEntriesRPC
.Using detached threads can lead to resource management issues and potential undefined behavior. It's safer to manage thread lifetimes explicitly.
Apply this diff to manage threads properly:
- std::thread( + m_replicationThreads.emplace_back( [this](NodeClient &client, std::vector<LogEntry> logEntries) { // ... existing code ... }, std::ref(client), logEntries) - .detach(); + );Ensure that
m_replicationThreads
is stored (e.g., in astd::vector<std::jthread>
) and properly joined or stopped during shutdown to prevent resource leaks.
590-603
:⚠️ Potential issueEnsure thread safety in
becomeFollower
.The method
becomeFollower
modifies shared state but may not have the necessary mutex protection in all calling contexts.Ensure that
becomeFollower
is always called withm_stateMutex
held or add mutex locking within the method:+ absl::MutexLock locker(&m_stateMutex); m_currentTerm = newTerm; m_state = NodeState::FOLLOWER; updatePersistentState(std::nullopt, 0); // ... existing code ...
Alternatively, if the method is intended to be called with the mutex already locked, add a thread annotation to enforce this:
void ConsensusModule::becomeFollower(uint32_t newTerm) ABSL_EXCLUSIVE_LOCKS_REQUIRED(m_stateMutex);
🧹 Nitpick comments (3)
conan/profiles/release-gcc (1)
10-11
: Review profile consistency across compilersThe GCC profile retains the
compiler.libcxx
setting while it's removed from Clang profiles. Consider:
- Standardizing the approach across all profiles
- Documenting the reason for different handling if intended
examples/raft/raft.cpp (1)
642-646
: Prevent tight loop on failed heartbeats.When the heartbeat fails, the loop immediately continues without a delay, potentially causing high CPU usage due to rapid retries.
Consider adding a short sleep before retrying to prevent a tight loop:
continue; + // Add a short delay before the next attempt + std::this_thread::sleep_for(std::chrono::milliseconds(50)); }examples/raft/raft.h (1)
41-41
: Addnoexcept
to destructor forNodeClient
.Consistently marking the destructor as
noexcept
is good practice and conveys exception safety guarantees.Ensure the destructor is declared as:
virtual ~NodeClient() noexcept = default;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CMakeLists.txt
(2 hunks)conan/profiles/debug-clang
(1 hunks)conan/profiles/debug-clang-tsan
(0 hunks)conan/profiles/debug-gcc
(1 hunks)conan/profiles/release-clang
(1 hunks)conan/profiles/release-gcc
(1 hunks)examples/raft/main.cpp
(1 hunks)examples/raft/raft.cpp
(1 hunks)examples/raft/raft.h
(1 hunks)
💤 Files with no reviewable changes (1)
- conan/profiles/debug-clang-tsan
✅ Files skipped from review due to trivial changes (1)
- conan/profiles/debug-gcc
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/raft/main.cpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (6)
conan/profiles/debug-clang (1)
10-10
: Verify Clang 19 availability in the build environmentThe upgrade to Clang 19 is significant. Please ensure:
- Clang 19 is available in the build environment
- All team members have access to this version
- CI/CD pipelines are updated accordingly
Run this script to check Clang availability:
✅ Verification successful
Clang 19 upgrade is well-supported by the project infrastructure
The compiler version upgrade is consistent with the project's requirements:
- Both debug and release Clang profiles are aligned to version 19
- Project requires C++23 features, which Clang 19 fully supports
- Build environment is controlled via Docker images
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if Clang 19 is available and its features clang_path=$(which clang) if [ -x "$clang_path" ]; then $clang_path --version # Check C++23 support echo '#include <concepts>' | $clang_path -x c++ -std=c++23 - fiLength of output: 140
Script:
#!/bin/bash # Search for compiler configurations and CI files echo "=== Compiler Profiles ===" fd ".*" conan/profiles --type f --exec cat {} echo -e "\n=== CI Configurations ===" fd "^\.gitlab-ci\.yml$|^\.github/workflows/.*\.yml$" --type f --exec cat {} echo -e "\n=== Build System Files ===" fd "CMakeLists.txt|conanfile\.txt|conanfile\.py" --type f --exec cat {}Length of output: 14546
conan/profiles/release-clang (1)
10-10
: LGTM! Changes are consistent with debug profileThe changes maintain consistency with the debug-clang profile, which is good practice.
conan/profiles/release-gcc (1)
9-9
: Consider implications of removing GNU extensionsChanging from
gnu23
to23
removes GNU extensions. Please verify that:
- The Raft implementation doesn't rely on any GNU-specific features
- All existing code compiles without GNU extensions
Run this script to check for GNU extension usage:
✅ Verification successful
GNU extensions removal is safe ✅
The codebase doesn't rely on GNU-specific features. The only GNU attribute usage is in thread safety annotations that are already properly guarded for Clang-only usage, and in isolated third-party GSL utility files. The core Raft implementation will compile correctly with standard C++23.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for common GNU extension usage rg -l "(__GNUC__|__extension__|__attribute__|__asm__|typeof|__typeof__|__builtin)" # Check if any files use GNU-specific headers rg -l "#include.*<(gnu/|bits/)"Length of output: 328
Script:
#!/bin/bash # Check thread_safety.h content echo "=== thread_safety.h content ===" cat examples/raft/thread_safety.h # Look for GNU extensions specifically in Raft implementation echo -e "\n=== GNU extensions in Raft code ===" fd -e cpp -e h -e hpp . examples/raft --exec rg -l "(__GNUC__|__extension__|__attribute__|__asm__|typeof|__typeof__|__builtin)" {}Length of output: 9927
examples/raft/raft.cpp (2)
142-147
: Usestatic_cast
instead ofdynamic_cast
for base class casting.Since
ConsensusModule
publicly inherits fromRaftService::Service
,static_cast
is sufficient and more efficient thandynamic_cast
. This change also eliminates the need for runtimenullptr
checks.Apply this diff to replace
dynamic_cast
withstatic_cast
:-auto *raftService = dynamic_cast<RaftService::Service *>(this); +auto *raftService = static_cast<RaftService::Service *>(this); if (raftService == nullptr) { throw std::runtime_error(fmt::format("Failed to dynamic_cast ConsensusModule to RaftService")); }Similarly, update the casting for
tkvppService
:-auto *tkvppService = dynamic_cast<TinyKVPPService::Service *>(this); +auto *tkvppService = static_cast<TinyKVPPService::Service *>(this);
448-458
: Verify election timer reset logic.The logic within
AwaitWithTimeout
may not properly reset the election timer, potentially causing unintended election timeouts.Please verify that the condition in
AwaitWithTimeout
correctly captures both the receipt of a heartbeat and the passage of the timeout duration:bool heartbeatReceived = m_timerMutex.AwaitWithTimeout( absl::Condition( +[this, &timeToWaitDeadlineMs]() { return m_leaderHeartbeatReceived.load() || currentTimeMs() >= timeToWaitDeadlineMs; } ), absl::Milliseconds(timeToWaitMs) );Ensure that
currentTimeMs()
is appropriately calculated within the lambda to prevent stale time comparisons.CMakeLists.txt (1)
16-22
: Conditionally include-Wthread-safety
for Clang compiler.The
-Wthread-safety
flag is specific to Clang and can cause errors with other compilers like GCC.The added conditional correctly handles compiler-specific flags, ensuring compatibility across different compilers.
examples/raft/raft.h
Outdated
std::atomic<bool> m_leaderHeartbeatReceived{false}; | ||
std::atomic<bool> m_stopElection{false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Atomic variables need explicit memory ordering specifications
The verification reveals that while atomic methods are used, memory ordering is not specified. This could lead to synchronization issues:
m_leaderHeartbeatReceived
: All load/store operations should specify memory orderingm_stopElection
: Direct boolean access should be replaced with load/store operations with proper memory ordering
Recommended changes:
- For
m_leaderHeartbeatReceived.store()
, usememory_order_release
- For
m_leaderHeartbeatReceived.load()
, usememory_order_acquire
- Replace direct access to
m_stopElection
with proper atomic operations - For
m_stopElection
, usememory_order_seq_cst
as it's used for thread termination
🔗 Analysis chain
Ensure atomic variables are used safely with appropriate memory order.
The atomic variables m_leaderHeartbeatReceived
and m_stopElection
may require explicit memory order semantics to prevent data races.
Please verify that these atomic variables are accessed with consistent memory order, or consider protecting them with mutexes if they represent complex state changes.
For instance, use:
std::atomic<bool> m_leaderHeartbeatReceived{false};
And when accessing:
if (m_leaderHeartbeatReceived.load(std::memory_order_acquire)) {
// ...
}
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash
# Search for usages of m_leaderHeartbeatReceived
ast-grep --pattern 'm_leaderHeartbeatReceived'
# Search for usages of m_stopElection
ast-grep --pattern 'm_stopElection'
# Backup search using ripgrep for more context
rg "m_leaderHeartbeatReceived|m_stopElection" -A 2 -B 2
Length of output: 3647
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (5)
examples/raft/raft.cpp (2)
605-609
:⚠️ Potential issueCorrect majority calculation in
hasMajority
functionThe current majority calculation may lead to incorrect results, especially in clusters with an even number of nodes. The calculation should consider the total number of nodes, including the current node.
Apply this diff to fix the calculation:
auto ConsensusModule::hasMajority(uint32_t votes) const -> bool { - constexpr const double HALF_OF_THE_REPLICAS = 2.0; - return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS; + const size_t totalNodes = m_replicas.size() + 1; // Including self + return votes > totalNodes / 2; }
666-678
:⚠️ Potential issueFix
consecutiveFailures
reset logic in heartbeatThe
consecutiveFailures
counter is reset to zero immediately after incrementing, which prevents the failure count from accumulating and may cause premature termination of the heartbeat loop.Apply this diff to correct the failure counter logic:
if (!client.appendEntries(request, &response)) { consecutiveFailures++; spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}", consecutiveFailures, maxRetries); if (consecutiveFailures >= maxRetries) { return; } - consecutiveFailures = 0; continue; } +else +{ + consecutiveFailures = 0; // Reset on successful response +}examples/raft/raft.h (3)
67-67
:⚠️ Potential issueInherit
enable_shared_from_this
publicly
ConsensusModule
should inherit fromstd::enable_shared_from_this
publicly to useshared_from_this()
safely. Private inheritance may lead to undefined behavior when callingshared_from_this()
.Apply this diff to change the inheritance:
class ConsensusModule : public RaftService::Service, public TinyKVPPService::Service, - std::enable_shared_from_this<ConsensusModule> + public std::enable_shared_from_this<ConsensusModule> { // Class definition };
151-152
: 🛠️ Refactor suggestionSpecify memory order for atomic variables
The
std::atomic<bool>
variablesm_leaderHeartbeatReceived
andm_stopElection
are accessed without specifying memory order, which may lead to synchronization issues.Apply this diff to specify memory ordering:
std::atomic<bool> m_leaderHeartbeatReceived{false}; std::atomic<bool> m_stopElection{false}; // When storing: -m_leaderHeartbeatReceived.store(true); +m_leaderHeartbeatReceived.store(true, std::memory_order_release); // When loading: -if (m_leaderHeartbeatReceived.load()) +if (m_leaderHeartbeatReceived.load(std::memory_order_acquire))Repeat similar changes for
m_stopElection
. Alternatively, protect accesses with mutexes if appropriate.
145-148
:⚠️ Potential issueInitialize
m_matchIndex
andm_nextIndex
properlyEnsure that
m_matchIndex
andm_nextIndex
are initialized for all replica nodes in the constructor to prevent uninitialized access.Confirm that in the constructor, after initializing
m_replicas
, you properly initializem_matchIndex
andm_nextIndex
. If this is already done, ensure that the logic correctly covers all replica nodes.
🧹 Nitpick comments (1)
examples/raft/raft.cpp (1)
925-943
: Handle malformed log entries during restoreIn
restorePersistentState()
, if a log line is malformed, the code may crash or behave unexpectedly. Add checks to ensure that each line is correctly parsed.Apply this diff to add error handling:
while (std::getline(ifs, logLine)) { std::stringstream sst(logLine); std::string key; std::string value; std::uint32_t term = 0; sst >> key >> value >> term; + if (sst.fail()) + { + spdlog::error("Malformed log entry in {}: {}", path.c_str(), logLine); + continue; // Or handle the error as needed + } LogEntry logEntry; logEntry.set_key(key); logEntry.set_value(value); logEntry.set_term(term); m_log.emplace_back(logEntry); spdlog::info("Node={} restored logEntry=[key={}, value={}, term={}]", m_id, key, value, term); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/raft/raft.cpp
(1 hunks)examples/raft/raft.h
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (4)
examples/raft/raft.cpp (2)
44-46
: PassnodeIp
by const reference inNodeClient
constructorPassing
nodeIp
by const reference avoids unnecessary copying of thestd::string
, improving performance.Apply this diff to modify the constructor:
-NodeClient::NodeClient(ID nodeId, IP nodeIp) +NodeClient::NodeClient(ID nodeId, const IP& nodeIp) : m_id{nodeId}, - m_ip{std::move(nodeIp)}, + m_ip{nodeIp}, m_channel(grpc::CreateChannel(m_ip, grpc::InsecureChannelCredentials())), m_stub(RaftService::NewStub(m_channel)), m_kvStub(TinyKVPPService::NewStub(m_channel)) { assert(m_id > 0); assert(!m_ip.empty()); // ... }Remember to update the declaration in
raft.h
as well.
658-659
: Correct previous log term and index in heartbeatIn the
sendHeartbeat
method,request.set_prevlogterm(getLastLogTerm());
andrequest.set_prevlogindex(getLastLogIndex());
may not correctly represent the leader's state when there are no new entries. Ensure that the correct previous log term and index are sent.Please verify that
getLastLogTerm()
andgetLastLogIndex()
return the appropriate values in the context of heartbeats. If they refer to the latest log entries, consider adjusting them to represent the previous ones when necessary.Run the following script to check the implementations of
getLastLogTerm()
andgetLastLogIndex()
:Ensure that these functions return zero when the log is empty and the correct term/index otherwise.
examples/raft/raft.h (2)
32-48
: Add virtual destructor and documentation toNodeClient
To ensure proper resource cleanup and facilitate inheritance, add a virtual destructor to
NodeClient
. Additionally, provide method documentation for clarity.Apply this diff:
class NodeClient { public: + /** + * Virtual destructor for safe polymorphic destruction. + */ + virtual ~NodeClient() noexcept = default; // Existing constructor and methods... /** * Sends AppendEntries RPC to the target node. */ auto appendEntries(const AppendEntriesRequest &request, AppendEntriesResponse *response) -> bool; // Other methods... };
65-93
: Improve documentation and inheritance inConsensusModule
Add method documentation and ensure consistent use of override specifiers. Publicly inherit from
enable_shared_from_this
to avoid issues withshared_from_this()
.Apply this diff:
class ConsensusModule : public RaftService::Service, public TinyKVPPService::Service, - std::enable_shared_from_this<ConsensusModule> + public std::enable_shared_from_this<ConsensusModule> { public: /** * Constructs a consensus module for a node in the Raft cluster. * @param nodeId Unique identifier for this node * @param replicas Vector of IP addresses for all nodes in cluster * @throws std::runtime_error if server creation fails */ ConsensusModule(ID nodeId, std::vector<IP> replicas); + virtual ~ConsensusModule() noexcept override = default; /** * Handles AppendEntries RPC calls from other nodes. */ auto AppendEntries(grpc::ServerContext *pContext, const AppendEntriesRequest *pRequest, AppendEntriesResponse *pResponse) -> grpc::Status override; // Other methods with documentation... };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
examples/raft/raft.cpp (3)
616-620
:⚠️ Potential issueFix incorrect majority calculation in
hasMajority
.The majority calculation is incorrect and could lead to split votes or incorrect behavior. The function should consider the total number of nodes (replicas plus self) when determining if a majority has been achieved.
Apply this diff to correct the calculation:
auto ConsensusModule::hasMajority(uint32_t votes) const -> bool { - constexpr const double HALF_OF_THE_REPLICAS = 2.0; - return votes > static_cast<double>(m_replicas.size()) / HALF_OF_THE_REPLICAS; + // Total nodes = replicas + self + const size_t totalNodes = m_replicas.size() + 1; + return votes >= (totalNodes / 2) + 1; }This ensures that a majority requires more than half of the total nodes, which aligns with the Raft consensus requirements.
678-687
:⚠️ Potential issueCorrect the
consecutiveFailures
logic in heartbeat.The
consecutiveFailures
counter is reset to zero immediately after being incremented, which prevents retries from accumulating correctly. This negates the purpose of tracking consecutive failures and may cause the heartbeat thread to continue indefinitely without proper handling of failures.Apply this diff to fix the logic:
if (!client.appendEntries(request, &response)) { consecutiveFailures++; spdlog::error("AppendEntriesRequest failed during heartbeat. Attempt {}/{}", consecutiveFailures, maxRetries); if (consecutiveFailures >= maxRetries) { return; } - consecutiveFailures = 0; continue; } +// Reset consecutiveFailures to zero only when AppendEntries RPC call succeeds +} else { + consecutiveFailures = 0; +}This change ensures that
consecutiveFailures
is only reset upon a successful RPC call, allowing the retry mechanism to function correctly.
721-797
: 🛠️ Refactor suggestionAvoid detaching threads in
sendAppendEntriesRPC
.Detaching threads can lead to undefined behavior and resource leaks. Use
std::jthread
or store the thread handles to manage thread lifetimes properly instead of detaching threads. Ensure proper thread management and cleanup.Apply this diff to properly manage the threads:
void ConsensusModule::sendAppendEntriesRPC(NodeClient &client, std::vector<LogEntry> logEntries) { - std::thread( + m_replicationThreads.emplace_back( [this](NodeClient &client, std::vector<LogEntry> logEntries) { // ... existing code ... }, - std::ref(client), - logEntries) - .detach(); + std::ref(client), + std::move(logEntries) + ); } +// Declare m_replicationThreads in your class to store these threads +std::vector<std::jthread> m_replicationThreads;Ensure you join these threads appropriately in the
stop()
method to prevent resource leaks:void ConsensusModule::stop() { // ... existing code ... + for (auto &replicationThread : m_replicationThreads) + { + replicationThread.request_stop(); + replicationThread.join(); + } + m_replicationThreads.clear(); }
🧹 Nitpick comments (3)
examples/raft/raft.cpp (3)
72-75
: Use consistent and fixed RPC timeout values instead of random timeouts.Using random timeouts for RPC calls in
appendEntries
,requestVote
, andput
methods can lead to unpredictable behavior and complicate debugging. It's generally better to use a consistent and reasonable timeout value for RPC calls to ensure reliable communication and easier troubleshooting.Apply this diff to use a fixed timeout:
// In NodeClient::appendEntries (lines 72-75) - const auto RPC_TIMEOUT = std::chrono::seconds(generateRandomTimeout()); + constexpr auto RPC_TIMEOUT = std::chrono::seconds(5); // Use a fixed 5-second timeout // In NodeClient::requestVote (lines 88-91) - const auto RPC_TIMEOUT = std::chrono::seconds(generateRandomTimeout()); + constexpr auto RPC_TIMEOUT = std::chrono::seconds(5); // Use a fixed 5-second timeout // In NodeClient::put (lines 106-108) - context.set_deadline(std::chrono::system_clock::now() + std::chrono::seconds(generateRandomTimeout())); + context.set_deadline(std::chrono::system_clock::now() + std::chrono::seconds(5)); // Use a fixed 5-second timeoutAlso applies to: 88-91, 106-108
143-155
: Usestatic_cast
instead ofdynamic_cast
for service registration.Since
ConsensusModule
inherits fromRaftService::Service
andTinyKVPPService::Service
, you can safely usestatic_cast
instead ofdynamic_cast
. This avoids the overhead of run-time type checking and does not require RTTI support, improving performance and compatibility.Apply this diff:
// Replace dynamic_cast with static_cast for RaftService - auto *raftService = dynamic_cast<RaftService::Service *>(this); + auto *raftService = static_cast<RaftService::Service *>(this); // Replace dynamic_cast with static_cast for TinyKVPPService - auto *tkvppService = dynamic_cast<TinyKVPPService::Service *>(this); + auto *tkvppService = static_cast<TinyKVPPService::Service *>(this);
508-512
: Remove commented and unused code related tom_serverThread
.The code involving
m_serverThread
is commented out, and there is no evidence ofm_serverThread
being initialized or used elsewhere. Keeping unused or commented code can lead to confusion and maintainability issues. Consider removing this code if it's not needed.Apply this diff:
- /*if (m_serverThread.joinable())*/ { m_serverThread.request_stop(); m_serverThread.join(); }
If
m_serverThread
is intended to be used for managing the server thread, ensure it is properly initialized and managed throughout the code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/raft/main.cpp
(1 hunks)examples/raft/raft.cpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/raft/main.cpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (6)
lib/wal/log/persistent_log_storage.h (5)
144-154
:⚠️ Potential issueFix buffer size adjustment after read operation.
The
read_impl
function doesn't resize the buffer according to the actual number of bytes read, which could lead to extraneous data when fewer bytes are read than requested.Apply this fix:
[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string { std::string buffer; buffer.resize(size); - if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value()) + const auto res = m_file.read(offset, buffer.data(), size); + if (!res.has_value()) { spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size); return {}; } + // Adjust buffer size to actual bytes read + buffer.resize(static_cast<size_t>(res.value())); return buffer; }
139-142
:⚠️ Potential issueHandle partial writes or I/O errors robustly.
While
m_file.append
returns negative for errors, there is no explicit handling of partial writes (e.g., if it appends fewer bytes than requested).Improve the error handling by checking if the number of bytes written matches the requested size:
[[nodiscard]] auto write_impl(const char *data, std::size_t offset, std::size_t size) -> bool { - return m_file.append({data, size}).transform([](ssize_t res) { return res >= 0; }).value_or(false); + return m_file.append({data, size}).transform([size](ssize_t res) { + if (res < 0) { + spdlog::error("Write operation failed with error code: {}", res); + return false; + } + if (static_cast<size_t>(res) < size) { + spdlog::warn("Partial write: {} of {} bytes written", res, size); + } + return true; + }).value_or(false); }
227-240
: 🛠️ Refactor suggestionGracefully handle empty or corrupt files in constructor.
The
persistent_log_storage_t
constructor reads the entire file content at once, which could be inefficient for large files. Consider reading in chunks and adding error handling.Improve by implementing chunked reading with proper error handling:
explicit persistent_log_storage_t(TBackendStorage &&backendStorage) : m_backendStorage(std::move(backendStorage)) { - const std::string raw = m_backendStorage.read(0, m_backendStorage.size()); - std::istringstream stream(raw); - for (std::string line; std::getline(stream, line);) - { - if (absl::StripAsciiWhitespace(line).empty()) - { - continue; - } - m_inMemoryLog.emplace_back(std::move(line)); + const size_t total_size = m_backendStorage.size(); + if (total_size == 0) { + return; // Empty file, nothing to load + } + + try { + // Read in chunks for efficiency with large files + constexpr size_t chunk_size = 4096; + size_t offset = 0; + std::string buffer; + std::string line_buffer; + + while (offset < total_size) { + size_t to_read = std::min(chunk_size, total_size - offset); + std::string chunk = m_backendStorage.read(offset, to_read); + offset += chunk.size(); + + if (chunk.empty()) { + spdlog::warn("Read returned empty chunk at offset {}", offset); + break; + } + + buffer += chunk; + + // Process complete lines + std::istringstream stream(buffer); + while (std::getline(stream, line_buffer)) { + if (!absl::StripAsciiWhitespace(line_buffer).empty()) { + m_inMemoryLog.emplace_back(std::move(line_buffer)); + } + line_buffer.clear(); + } + + // Keep any incomplete last line for the next iteration + buffer = stream.eof() ? "" : line_buffer; + } + } catch (const std::exception& e) { + spdlog::error("Error loading log entries: {}", e.what()); } }
261-268
:⚠️ Potential issueFix missing newline in append method.
The append method doesn't add newlines between entries, but the constructor reads the file line by line. This inconsistency will cause problems when loading persisted logs.
Fix by adding newlines between entries:
void append(std::string entry) { + // Add newline to separate entries when reading line by line + entry += '\n'; if (!m_backendStorage.write(static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size())) { spdlog::error("Persistent log storage write failed. Entry={}, size={}\n", entry, entry.size()); } - m_inMemoryLog.emplace_back(std::move(entry)); + // Store the original entry (without newline) in memory + m_inMemoryLog.emplace_back(entry.substr(0, entry.size() - 1)); }
288-291
:⚠️ Potential issueInconsistency between size() and read() methods.
The
size()
method returnsm_backendStorage.size()
butread()
usesm_inMemoryLog.size()
. This inconsistency could lead to subtle bugs.Fix the inconsistency:
[[nodiscard]] auto size() const -> std::size_t { - return m_backendStorage.size(); + return m_inMemoryLog.size(); }lib/fs/append_only_file.cpp (1)
80-111
:⚠️ Potential issueCheck the return value of
io_uring_submit
.Submitting I/O operations without verifying the return code can mask submission failures. If
io_uring_submit(&m_ring)
returns a negative error code, the subsequent wait call might behave unpredictably. For example:io_uring_submit(&m_ring); +int submit_res = io_uring_submit(&m_ring); +if (submit_res < 0) { + return std::unexpected(file_error_t{ + .code = file_error_code_k::write_failed, + .system_errno = -submit_res, + .message = std::format("Failed to submit io_uring write. fd={}", m_fd), + }); +} // Then proceed to wait for the CQE
🧹 Nitpick comments (7)
lib/fs/CMakeLists.txt (1)
10-12
: Commented DBTest Executable Block
There is a commented-out section for the DBTest executable. Consider removing this block if it's no longer needed to reduce clutter, or add context about its future usage if it's intended for temporary debugging.lib/fs/append_only_file.h (1)
43-47
: Good use of builder pattern, but documentation neededThe builder pattern is an excellent choice for handling the complexity of file creation with various options. However, the
direct_io
parameter needs documentation as it has significant performance implications.When
direct_io
is true, it bypasses the kernel page cache which can improve performance for specific workloads but may degrade performance for others. This should be clearly documented to guide users on when to enable it.Consider adding documentation comments to explain:
- The purpose of the
append_only_file_builder_t
class- The implications of the
direct_io
parameter- The expected behavior of the
build
methodlib/wal/log/persistent_log_storage.h (2)
141-141
: Improve error logging in write operation.The current implementation silently returns false when
append
fails but doesn't log any details about the failure. Consider adding more comprehensive error logging.Enhance the error reporting:
- return m_file.append({data, size}).transform([](ssize_t res) { return res >= 0; }).value_or(false); + return m_file.append({data, size}).transform([data, size](ssize_t res) { + if (res < 0) { + spdlog::error("File append failed. Data size: {}, Error code: {}", size, res); + return false; + } + return true; + }).value_or(false);
198-200
: Use specific error handling for different file error codes.The current error handling in
build_impl
uses a default case for all file open errors. Consider implementing specific handling for different error types.Enhance the error handling with specific error types:
switch (error.code) { + case fs::file_error_code_t::kFileOpenError: + spdlog::error("Failed to open file: {}", config().file_path.c_str()); + return std::unexpected(storage_backend_builder_error_t::kUnableToOpenFile); + case fs::file_error_code_t::kFileIOError: + spdlog::error("I/O error on file: {}", config().file_path.c_str()); + return std::unexpected(storage_backend_builder_error_t::kUnableToOpenFile); default: + spdlog::error("Unknown error opening file: {}", static_cast<int>(error.code)); return std::unexpected(storage_backend_builder_error_t::kUnableToOpenFile); }lib/db/manifest/manifest.cpp (2)
75-100
: Log file append errors for better troubleshooting.Currently,
.value_or(false)
returnsfalse
if any error occurs while appending, without providing additional details. Consider logging the error to avoid losing critical debugging information. For example:-auto manifest_t::add(record_t info) -> bool +auto manifest_t::add(record_t info) -> bool { ... m_records.emplace_back(info); const std::string &infoSerialized = std::visit(infoToString, info); - return m_log->append({infoSerialized.c_str(), infoSerialized.size()}) - .transform([](ssize_t res) { return res >= 0; }) - .value_or(false); + auto result = m_log->append({infoSerialized.c_str(), infoSerialized.size()}) + .transform([](ssize_t res) { return res >= 0; }); + if (!result) { + spdlog::error("Failed to append to manifest: {}", result.error().message); + return false; + } + return result.value(); }
106-152
: Add error logging when recovery fails.If
m_log->stream()
or the subsequent.and_then(...)
block encounters an error,.has_value()
simply returnsfalse
silently. Consider logging the error details so that recovery failures are not hidden:-return m_log->stream() - .and_then( - [this](std::stringstream stream) -> std::expected<void, fs::file_error_t> { - ... - }) - .has_value(); +auto result = m_log->stream() + .and_then( + [this](std::stringstream stream) -> std::expected<void, fs::file_error_t> { + ... + return {}; + }); +if (!result) { + spdlog::error("Manifest recovery failed: {}", result.error().message); + return false; +} +return true;lib/fs/append_only_file.cpp (1)
68-78
: Consider addressing the TODO for reference counting.The destructor gracefully handles moved-from objects, but there is a TODO about reference counting. If multiple instances refer to the same descriptor, you risk premature closure.
Shall I propose an approach for reference counting or open a new issue to discuss multi-owner scenarios?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lib/db/manifest/manifest.cpp
(3 hunks)lib/fs/CMakeLists.txt
(1 hunks)lib/fs/append_only_file.cpp
(1 hunks)lib/fs/append_only_file.h
(1 hunks)lib/wal/log/persistent_log_storage.h
(1 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
lib/wal/log/persistent_log_storage.h (3)
lib/wal/log/in_memory_log_storage.h (7)
nodiscard
(50-57)nodiscard
(59-63)nodiscard
(65-68)nodiscard
(85-88)entry
(45-48)entry
(45-45)index
(50-50)lib/wal/log/log.h (7)
nodiscard
(62-65)nodiscard
(67-70)nodiscard
(72-75)nodiscard
(125-143)entry
(57-60)entry
(57-57)index
(62-62)lib/fs/append_only_file.h (2)
data
(27-27)offset
(28-28)
lib/db/manifest/manifest.cpp (3)
lib/fs/append_only_file.h (1)
path
(46-46)lib/db/manifest/manifest.h (1)
info
(198-198)lib/fs/append_only_file.cpp (2)
stream
(173-209)stream
(173-173)
lib/fs/append_only_file.h (1)
lib/fs/append_only_file.cpp (7)
other
(51-65)other
(51-51)append_only_file_t
(31-35)append_only_file_t
(37-49)append_only_file_t
(67-78)size
(129-141)size
(129-129)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (15)
lib/fs/CMakeLists.txt (3)
2-2
: Project Name Update Verification
The project name is now set to "frankie", which aligns with the overall restructuring. Ensure this name change is consistently applied across the entire build configuration.
4-4
: FS Library Target Definition
The FS library is defined using a consolidated list of source and header files. Including header files directly into the target can help IDEs display complete source information, so this is acceptable if intentional.
8-8
: Updated Library Linkage
The FS target now links against both theuring
library andspdlog::spdlog
, which seems aligned with the new dependency requirements. Verify that these libraries are available in the build environment and that their versions are compatible with the rest of the project.lib/fs/append_only_file.h (3)
3-10
: Good modernization with better includes and error handlingThe inclusion of low-level I/O headers (
fcntl.h
,unistd.h
,liburing.h
) andstd::expected
shows a positive shift toward modern C++ practices and more efficient I/O operations.
27-32
: Improved API with better error handling using std::expectedThe switch to using
std::expected
for error handling is an excellent improvement over traditional error codes or exceptions. The[[nodiscard]]
attribute ensures callers don't ignore error results.Regarding the
stream()
method:Consider returning a reference or pointer for stream usage.
Returning astd::stringstream
by value may be convenient, but if users intend to write large data or require advanced usage, consider other approaches—such as returning a reference to an internal stringstream or providing a separate factory method that clarifies usage.
14-26
:✅ Verification successful
Well-designed class with proper rule-of-five implementation
The conversion from a struct to a class with explicit rule-of-five elements is a good practice. I particularly appreciate:
- Explicitly deleted default constructor and copy operations
- Move semantics support
- Friend declaration for the builder class
However, I notice the move assignment operator is declared but its implementation isn't shown in the relevant snippets.
🏁 Script executed:
#!/bin/bash # Check if move assignment operator is implemented in the source file rg -A 10 "operator=" lib/fs/append_only_file.cppLength of output: 281
Approval: Verified move assignment operator implementation
Great work refining the class design. I confirmed that the move assignment operator, although only declared in the header snippet, is indeed implemented in lib/fs/append_only_file.cpp. The changes correctly enforce the rule-of-five with explicitly deleted default and copy operations along with proper move semantics. No further modifications are required.
lib/fs/append_only_file.cpp (9)
31-36
: Constructor initialization looks good.No issues found. The constructor trivially assigns the file descriptor and ring.
37-49
: Move constructor is properly implemented.The move constructor carefully resets the source instance to avoid double-destruction.
51-65
: Move assignment operator is properly implemented.Similar to the move constructor, the operator resets the source to a safe state.
113-127
: Singlepread
implementation is correct for partial reads.Returning on partial reads is expected at the call site or repeated in a loop if needed.
129-141
: File size retrieval logic appears correct.Checking
fstat
return values and returning an error on failure is appropriate.
143-154
: Flush method adequately checks for errors.The
fsync
call’s return value is correctly validated, preventing silent flush failures.
156-171
: Reset method correctly truncates and re-seeks.Both operations now check for errors, avoiding silent failures.
173-209
: Stream reading logic is straightforward and handles end-of-file gracefully.The method properly returns an error on read failures, ensuring no silent data corruption.
229-265
: Builder function is well-structured.The code uses
std::expected
to handle errors, closes the file onio_uring
initialization failure, and returns a meaningful error message as needed.
lib/fs/append_only_file.h
Outdated
private: | ||
path_t m_path; | ||
std::fstream m_out; | ||
}; | ||
|
||
inline append_only_file_t::append_only_file_t(fs::path_t path) | ||
: m_path{std::move(path)} | ||
{ | ||
} | ||
|
||
inline append_only_file_t::~append_only_file_t() noexcept | ||
{ | ||
close(); | ||
} | ||
|
||
inline auto append_only_file_t::open() noexcept -> bool | ||
{ | ||
m_out = std::fstream{m_path, std::fstream::app | std::fstream::ate | std::fstream::out}; | ||
return m_out.is_open(); | ||
} | ||
|
||
inline auto append_only_file_t::open(fs::path_t path) noexcept -> bool | ||
{ | ||
m_path = std::move(path); | ||
return open(); | ||
} | ||
append_only_file_t(int fd, io_uring ring) noexcept; | ||
|
||
inline auto append_only_file_t::is_open() noexcept -> bool | ||
{ | ||
return m_out.is_open(); | ||
} | ||
|
||
inline auto append_only_file_t::close() noexcept -> bool | ||
{ | ||
if (!is_open()) | ||
{ | ||
return true; | ||
} | ||
|
||
m_out.flush(); | ||
m_out.close(); | ||
|
||
// TODO(lnikon): Do we need to recover when we're unable to close the | ||
// stream? | ||
return m_out.bad(); | ||
} | ||
|
||
inline auto append_only_file_t::write(const data_t &data) noexcept -> bool | ||
{ | ||
if (!is_open()) | ||
{ | ||
return false; | ||
} | ||
|
||
m_out << data << '\n'; | ||
m_out.flush(); | ||
|
||
return m_out.bad(); | ||
} | ||
int m_fd{-1}; | ||
io_uring m_ring{}; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Architecture improved with io_uring for async I/O
The implementation now leverages Linux's io_uring for asynchronous I/O operations, which can significantly improve performance for I/O-bound workloads. The file descriptor and io_uring ring are properly initialized and managed.
However, I noticed a TODO comment in the destructor implementation about reference counting. This suggests potential resource management issues that should be addressed.
🏁 Script executed:
#!/bin/bash
# Look for the TODO comment in the destructor implementation
rg -A 5 "TODO.*reference counting" lib/fs/append_only_file.cpp
Length of output: 194
Attention: Address Pending Reference Counting Implementation
The integration of io_uring for asynchronous I/O is well executed—improving performance and proper initialization of the file descriptor and io_uring ring is confirmed. However, the TODO comment in the destructor (in lib/fs/append_only_file.cpp) indicates that a reference counting mechanism is still missing, which is necessary to correctly manage the resource lifecycle. Please implement the reference counting to ensure that the last instance properly closes the resources.
lib/wal/log/persistent_log_storage.h
Outdated
|
||
void append(std::string entry) | ||
{ | ||
if (!m_backendStorage.write(static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix offset parameter in write operation.
The current implementation is using m_backendStorage.size()
as the offset for writing, which may not be appropriate for appending log entries. This parameter should represent where in the file to write, and for an append-only log, it should typically be the end of the file.
Correct the offset parameter:
- if (!m_backendStorage.write(static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size()))
+ if (!m_backendStorage.write(static_cast<const char *>(entry.data()), 0, entry.size())) // Use 0 for offset since append_only_file will handle appending
Alternatively, consider implementing an append-specific method in your backend interface to make the intended behavior clearer.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if (!m_backendStorage.write(static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size())) | |
if (!m_backendStorage.write(static_cast<const char *>(entry.data()), 0, entry.size())) // Use 0 for offset since append_only_file will handle appending |
m_log.emplace(std::move(fs::append_only_file_builder_t{}.build(m_path.c_str(), true).value())); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential error from .value()
before returning success.
Relying on .value()
can trigger an exception if build(...)
fails. As it stands, the function ignores any possible error and always returns true
, which may mask file-opening failures. Consider explicitly checking the result and logging the error or returning false
if the build fails:
-auto manifest_t::open() -> bool
-{
- m_name = latest_manifest_filename(m_config->DatabaseConfig.DatabasePath);
- m_path = m_config->DatabaseConfig.DatabasePath / m_name;
- m_log.emplace(std::move(fs::append_only_file_builder_t{}.build(m_path.c_str(), true).value()));
- return true;
-}
+auto manifest_t::open() -> bool
+{
+ m_name = latest_manifest_filename(m_config->DatabaseConfig.DatabasePath);
+ m_path = m_config->DatabaseConfig.DatabasePath / m_name;
+
+ auto fileExpected = fs::append_only_file_builder_t{}.build(m_path.c_str(), true);
+ if (!fileExpected) {
+ spdlog::error("Failed to build manifest log file: {}", fileExpected.error().message);
+ return false;
+ }
+
+ m_log.emplace(std::move(fileExpected.value()));
+ return true;
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
m_log.emplace(std::move(fs::append_only_file_builder_t{}.build(m_path.c_str(), true).value())); | |
auto manifest_t::open() -> bool | |
{ | |
m_name = latest_manifest_filename(m_config->DatabaseConfig.DatabasePath); | |
m_path = m_config->DatabaseConfig.DatabasePath / m_name; | |
auto fileExpected = fs::append_only_file_builder_t{}.build(m_path.c_str(), true); | |
if (!fileExpected) { | |
spdlog::error("Failed to build manifest log file: {}", fileExpected.error().message); | |
return false; | |
} | |
m_log.emplace(std::move(fileExpected.value())); | |
return true; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (4)
lib/wal/log/persistent_log_storage.h (1)
146-156
:⚠️ Potential issueBuffer size doesn't account for read result.
The read_impl function doesn't resize the buffer according to the actual number of bytes read, which could lead to extraneous data when fewer bytes are read than requested.
[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string { std::string buffer; buffer.resize(size); - if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value()) + const auto res = m_file.read(offset, buffer.data(), size); + if (!res.has_value()) { spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size); return {}; } + // Adjust buffer size to actual bytes read + buffer.resize(static_cast<size_t>(res.value())); return buffer; }lib/fs/append_only_file.cpp (1)
92-96
:⚠️ Potential issueCheck the return value of
io_uring_submit
and reassessIOSQE_IO_LINK
usage.
- The call to
io_uring_submit(&m_ring)
is not checked for error conditions. If submission fails, subsequent operations may silently fail.- The
IOSQE_IO_LINK
flag is usually used to chain multiple operations. Currently, only one write operation is submitted without additional linked SQEs, so it might be unnecessary.Below is an example of how to handle the submission return value:
io_uring_prep_writev(sqe, m_fd, &iov, 1, 0); sqe->flags |= IOSQE_IO_LINK; -int submit_result = io_uring_submit(&m_ring); +int submit_result = io_uring_submit(&m_ring); +if (submit_result < 0) { + return std::unexpected(file_error_t{ + .code = file_error_code_k::write_failed, + .system_errno = -submit_result, + .message = std::format("io_uring_submit failed. fd={}", m_fd), + }); +} io_uring_cqe *cqe;src/main.cpp (1)
427-443
:⚠️ Potential issueFix incorrect error message for missing peers.
When"peers"
is not specified, the thrown error message references"id"
. This misleads users:- throw std::runtime_error("\"id\" is not specified in the config"); + throw std::runtime_error("\"peers\" is not specified in the config");lib/raft/raft.h (1)
108-209
:⚠️ Potential issue
consensus_module_t
concurrency and data members require attention.
bool m_shutdown
is not guarded bym_stateMutex
nor is it atomic, yet it’s used in multiple threads. This can introduce data races.
- Use
std::atomic<bool>
or protect it with the same mutex for consistent access.- Good use of
ABSL_GUARDED_BY(m_stateMutex)
for critical variables likem_currentTerm
,m_votedFor
, etc.- Consider adding comments or invariants specifying how each field is synchronized.
- bool m_shutdown{false}; + std::atomic<bool> m_shutdown{false};
🧹 Nitpick comments (12)
lib/wal/log/persistent_log_storage.h (4)
73-73
: Fix incorrect return type in copy assignment operator.The return type should be a reference to the object, not an rvalue reference.
-auto operator=(const storage_backend_builder_t &) -> storage_backend_builder_t && = delete; +auto operator=(const storage_backend_builder_t &) -> storage_backend_builder_t & = delete;
194-204
: Simplify error handling in build_impl.The switch statement with only a default case is unnecessarily complex and doesn't provide specific error messages based on the actual error code.
auto file = fs::append_only_file_builder_t{}.build(config().file_path.c_str(), true); if (!file) { - const auto &error{file.error()}; - switch (error.code) - { - default: - return std::unexpected(storage_backend_builder_error_t::kUnableToOpenFile); - } + // Log specific error details before returning the generic error + spdlog::error("Failed to open file: {}", config().file_path.c_str()); + return std::unexpected(storage_backend_builder_error_t::kUnableToOpenFile); } return file_storage_backend_t(std::move(file.value()));
72-76
: Use consistent move semantics.Move operations are deleted for
storage_backend_builder_t
but the base class correctly implements move operations. Consider whether move operations should be allowed for consistency.-storage_backend_builder_t(storage_backend_builder_t &&) = delete; -auto operator=(storage_backend_builder_t &&) -> storage_backend_builder_t && = delete; +storage_backend_builder_t(storage_backend_builder_t &&) noexcept = default; +auto operator=(storage_backend_builder_t &&) noexcept -> storage_backend_builder_t & = default;
220-222
: Improve compile-time error message for unsupported backend types.The current static_assert will simply state "not supported backend storage type passed" without specifying which type was passed. Consider including the type in the error message.
- static_assert(false, "not supported backend storage type passed"); + static_assert(false, + "Unsupported backend storage type passed: Use of unimplemented storage backend");lib/fs/append_only_file.cpp (4)
69-69
: Consider addressing theTODO
for reference counting.Line 69 mentions a
TODO
comment about needing reference counting to manage this resource safely across multiple owners. Without it, concurrent or shared ownership scenarios may lead to premature resource closing or descriptor leaks.I can help design and implement a reference-counted wrapper around the file descriptor and ring if you’d like.
112-126
: Consider aligning read operations with the asynchronous design.While the file offers asynchronous appends, it currently uses synchronous
pread
for reads. If the expectation is to handle very large reads or maintain consistent async I/O semantics, refactoring to use io_uring for reads (or providing a clear reason for sync I/O) could improve performance or clarity.
172-209
: Consider large-file streaming implications.Reading the entire file into a
std::stringstream
can consume a lot of memory for very large files. If the file can grow large, consider chunked processing or a streaming callback approach.
228-264
: Review alignment requirements for direct I/O.When
direct_io
is true, some platforms (like Linux) require buffer alignment (usually to the filesystem block size) for successful reads and writes. If callers supply unaligned buffers—like typical std::string data—operations might fail or degrade. Document these constraints or add alignment checks to ensure correct usage.src/main.cpp (2)
35-208
: Extensive JSON schema definition is well-organized.
This large schema covers logging, database, WAL, LSM-tree, and server configuration. Everything seems consistent. If you foresee adding more components, consider modularizing the schema definitions into separate smaller schemas for ease of maintenance.
484-495
: Handling SIGTERM and SIGINT signals is correct.
Installing the handlers for both signals ensures orderly shutdown. If you plan to handle other signals likeSIGHUP
orSIGQUIT
for advanced scenarios, consider adding them here.lib/raft/raft.h (1)
29-74
: Remove large block of commented-out client code or clarify its purpose.
If thetkvpp_node_grpc_client_t
class is no longer needed, consider deleting it instead of leaving it commented out. This reduces clutter and confusion.-// class tkvpp_node_grpc_client_t -// { -// ... -// }; +// (Remove if no longer needed)lib/raft/raft.cpp (1)
336-463
:replicate
API, elections, and node shutdown.
- Forwarding “put requests” to the leader is present but incomplete (lines 368–374). If leader redirection is required, you might return a specialized status or store the leader’s IP for direct re-connection.
- The election thread and heartbeat thread approach is valid but ensure no race conditions with
m_shutdown
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lib/fs/append_only_file.cpp
(1 hunks)lib/raft/raft.cpp
(1 hunks)lib/raft/raft.h
(1 hunks)lib/wal/log/persistent_log_storage.h
(1 hunks)src/main.cpp
(6 hunks)
🧰 Additional context used
🧬 Code Definitions (5)
src/main.cpp (5)
lib/server/server_kind.cpp (4)
from_string
(20-33)from_string
(20-20)to_string
(7-18)to_string
(7-7)lib/server/server_kind.h (2)
from_string
(22-22)to_string
(20-20)lib/wal/common.h (2)
from_string
(25-25)to_string
(24-24)lib/wal/common.cpp (2)
from_string
(25-25)to_string
(10-10)lib/wal/wal.h (2)
to_string
(259-272)to_string
(259-259)
lib/wal/log/persistent_log_storage.h (3)
lib/wal/log/in_memory_log_storage.h (7)
nodiscard
(50-57)nodiscard
(59-63)nodiscard
(65-68)nodiscard
(85-88)entry
(45-48)entry
(45-45)index
(50-50)lib/wal/log/log.h (7)
nodiscard
(62-65)nodiscard
(67-70)nodiscard
(72-75)nodiscard
(125-143)entry
(57-60)entry
(57-57)index
(62-62)lib/fs/append_only_file.h (2)
data
(27-27)offset
(28-28)
lib/fs/append_only_file.cpp (1)
lib/fs/append_only_file.h (9)
append_only_file_t
(17-17)append_only_file_t
(19-19)append_only_file_t
(22-22)append_only_file_t
(25-25)append_only_file_t
(37-37)other
(20-20)data
(27-27)offset
(28-28)path
(46-46)
lib/raft/raft.cpp (2)
lib/raft/raft.h (15)
raft_node_grpc_client_t
(88-88)raft_node_grpc_client_t
(89-89)raft_node_grpc_client_t
(89-89)raft_node_grpc_client_t
(91-91)raft_node_grpc_client_t
(94-94)request
(97-97)request
(98-98)request
(152-152)consensus_module_t
(112-112)pContext
(115-117)pContext
(119-120)client
(149-149)index
(162-162)logIndex
(154-154)commitIndex
(165-166)lib/raft/raft_test.cpp (1)
id
(59-69)
lib/raft/raft.h (2)
lib/raft/raft.cpp (3)
raft_node_grpc_client_t
(92-99)votedFor
(471-475)votedFor
(471-471)lib/wal/log/persistent_log_storage.h (17)
nodiscard
(43-46)nodiscard
(48-51)nodiscard
(53-56)nodiscard
(58-61)nodiscard
(80-87)nodiscard
(89-92)nodiscard
(141-144)nodiscard
(146-156)nodiscard
(158-161)nodiscard
(163-166)nodiscard
(182-190)nodiscard
(192-205)nodiscard
(269-277)nodiscard
(279-283)nodiscard
(285-288)nodiscard
(333-349)index
(269-269)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (16)
lib/wal/log/persistent_log_storage.h (6)
143-143
: Handle partial writes robustly.The current implementation only checks if the result is >= 0, but doesn't verify that all bytes were written.
226-242
: Gracefully handle empty or corrupt files in constructor.The
persistent_log_storage_t
constructor reads the entire file content at once, which could be inefficient for large files. Consider reading in chunks. Additionally, there's no error handling if the read fails.
260-267
: Fix missing newline in append method.The append method doesn't add newlines between entries, but the constructor reads the file line by line. This inconsistency will cause problems when loading persisted logs.
285-288
: Inconsistency between size() and read() methods.The
size()
method returnsm_backendStorage.size()
butread()
usesm_inMemoryLog.size()
. This inconsistency could lead to subtle bugs.
262-262
: Fix offset parameter in write operation.The current implementation is using
m_backendStorage.size()
as the offset for writing, which may not be appropriate for appending log entries. This parameter should represent where in the file to write, and for an append-only log, it should typically be the end of the file.
193-204
:❓ Verification inconclusive
Handle file opening errors more gracefully.
Consider checking for and handling specific error codes from
append_only_file_builder_t
. This would allow for more informative error messages and potentially recovery options.auto file = fs::append_only_file_builder_t{}.build(config().file_path.c_str(), true); if (!file) { const auto &error{file.error()}; + // Map specific file errors to appropriate builder errors + switch (error.code) + { + case fs::file_error_code_t::kFileNotFound: + spdlog::error("File not found: {}", config().file_path.c_str()); + return std::unexpected(storage_backend_builder_error_t::kUnableToOpenFile); + case fs::file_error_code_t::kPermissionDenied: + spdlog::error("Permission denied for file: {}", config().file_path.c_str()); + return std::unexpected(storage_backend_builder_error_t::kUnableToOpenFile); + // Add cases for other relevant error codes default: + spdlog::error("Unknown error opening file: {} (code: {})", + config().file_path.c_str(), static_cast<int>(error.code)); return std::unexpected(storage_backend_builder_error_t::kUnableToOpenFile); } }
🏁 Script executed:
#!/bin/bash # Find error code enum definition to validate error cases rg -A 10 -B 5 "enum class .*file_error" --type cppLength of output: 52
Manual Verification Required: Confirm File Error Code Definitions
The review comment suggests a refactor for handling file opening errors by mapping specific error codes to more informative error messages. However, our preliminary search (using
rg -A 10 -B 5 "enum class .*file_error" --type cpp
) did not return any results for enumerations likefs::file_error_code_t::kFileNotFound
orfs::file_error_code_t::kPermissionDenied
.Before proceeding with the suggested changes, please verify manually whether these error code definitions exist in the codebase (for example, by searching for
fs::file_error_code_t
or specific constants such askFileNotFound
andkPermissionDenied
). If they exist, consider implementing the detailed error handling as proposed; if not, the current generic error handling might remain the best option.lib/fs/append_only_file.cpp (2)
142-153
: Flush method looks good.This flush method properly checks
fsync
return values, avoiding silent failures.
155-170
: Reset method properly handles truncate and seek errors.The error handling for both
ftruncate
andlseek
is correct, with descriptive messages. This looks solid.src/main.cpp (3)
4-5
: Imports for Raft and gRPC server look good.
No concerns here. These headers are necessary for implementing Raft consensus and the gRPC-based server.
13-14
: Signal and thread headers are appropriate.
Including<csignal>
and<thread>
is correct for handling signals and threading in the main application.
210-218
: Global shutdown flag and signal handler are suitable but consider concurrency details.
- Storing
gShutdown
as anstd::atomic<bool>
is good.- The signal handler avoids throwing exceptions or performing nontrivial work, which is correct.
- Consider whether you need memory ordering constraints (
memory_order_*
) onstd::atomic<bool>
for thread visibility, though usage here appears straightforward.Could you confirm if all threads that read
gShutdown
do so in loops where this relaxation is safe?lib/raft/raft.h (2)
1-26
: Header includes and namespaces look correct.
They align with your usage of gRPC, absl, SPDLOG, and standard library features.
75-106
:raft_node_grpc_client_t
: Good separation of client logic.
- The constructor asserts valid ID and IP, preventing meaningless node configs.
- Consider documenting timeouts or gRPC call specifics here.
- The interface is clear and straightforward for Raft RPC calls.
lib/raft/raft.cpp (3)
1-49
: Initialization of random timeouts and file-scoped constants looks fine.
The utility for generating random timeouts is helpful to avoid election collisions.
91-152
:raft_node_grpc_client_t
implementation is logically consistent.
- The random per-call timeout is a good idea for resilience.
- Logging errors on failure helps in diagnosing connectivity or RPC issues.
277-334
:RequestVote
logic looks correct, but test carefully.
- You correctly revert to follower if a higher term request arrives.
- The up-to-date check on candidate logs is handled by comparing lastLogTerm and lastLogIndex.
- If you haven’t done so, consider adding thorough test coverage for edge cases like tie-terms or logs with the same term/different index.
lib/fs/append_only_file.cpp
Outdated
if (res < 0) | ||
{ | ||
return std::unexpected(file_error_t{ | ||
.code = file_error_code_k::write_failed, | ||
.system_errno = -res, | ||
.message = std::format("Write operation failed. fd={}", m_fd), | ||
}); | ||
} | ||
return res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Handle potential partial writes.
If the write operation only partially writes the data, the caller might not retry, potentially losing data. Consider looping until all data is written or returning an explicit partial write result here.
src/main.cpp
Outdated
return EXIT_FAILURE; | ||
} | ||
|
||
std::visit( | ||
[](auto &server) | ||
auto pDatabase = db::make_shared(pDbConfig, std::move(wal.value())); | ||
if (!pDatabase->open()) | ||
{ | ||
spdlog::error("Unable to open the database"); | ||
return EXIT_FAILURE; | ||
} | ||
|
||
if (pDbConfig->ServerConfig.id == 0) | ||
{ | ||
spdlog::error("ID of the node should be positve integer"); | ||
return EXIT_FAILURE; | ||
} | ||
|
||
if (pDbConfig->ServerConfig.peers.empty()) | ||
{ | ||
spdlog::error("List of node IPs can't be empty"); | ||
return EXIT_FAILURE; | ||
} | ||
|
||
// Prepare config for replicas | ||
std::vector<raft::raft_node_grpc_client_t> replicas; | ||
for (raft::id_t replicaId{1}; const auto &replicaIp : pDbConfig->ServerConfig.peers) | ||
{ | ||
if (replicaId != pDbConfig->ServerConfig.id) | ||
{ | ||
using T = std::decay_t<decltype(server)>; | ||
if constexpr (std::is_same_v<T, std::monostate>) | ||
{ | ||
return; | ||
} | ||
else | ||
{ | ||
server.shutdown(); | ||
} | ||
}, | ||
server); | ||
std::unique_ptr<RaftService::Stub> stub{ | ||
RaftService::NewStub(grpc::CreateChannel(replicaIp, grpc::InsecureChannelCredentials()))}; | ||
|
||
replicas.emplace_back(raft::node_config_t{.m_id = replicaId, .m_ip = replicaIp}, std::move(stub)); | ||
spdlog::info("replicaId={} replicaIp={}", replicaId, replicaIp); | ||
} | ||
|
||
++replicaId; | ||
} | ||
|
||
// Create current nodes config | ||
raft::node_config_t nodeConfig{ | ||
.m_id = pDbConfig->ServerConfig.id, | ||
.m_ip = | ||
fmt::format("{}:{}", pDatabase->config()->ServerConfig.host, pDatabase->config()->ServerConfig.port)}; | ||
|
||
// Start building gRPC server. Listen on current nodes host:port | ||
grpc::ServerBuilder grpcBuilder; | ||
grpcBuilder.AddListeningPort(nodeConfig.m_ip, grpc::InsecureServerCredentials()); | ||
|
||
// Create consensus module and add it into gRPC server | ||
auto pConsensusModule = std::make_shared<raft::consensus_module_t>(nodeConfig, std::move(replicas)); | ||
if (!pConsensusModule->init()) | ||
{ | ||
spdlog::error("Failed to initialize the state machine"); | ||
return EXIT_FAILURE; | ||
} | ||
grpcBuilder.RegisterService(dynamic_cast<RaftService::Service *>(pConsensusModule.get())); | ||
|
||
// Create KV service and add it into gRPC server | ||
auto kvService = std::make_unique<server::grpc_communication::tinykvpp_service_impl_t>(pDatabase); | ||
grpcBuilder.RegisterService(kvService.get()); | ||
|
||
// Create gRPC server | ||
std::unique_ptr<grpc::Server> pServer{std::unique_ptr<grpc::Server>(grpcBuilder.BuildAndStart())}; | ||
|
||
// Start consensus module and gRPC server | ||
auto serverThread = std::jthread([&pServer] { pServer->Wait(); }); | ||
pConsensusModule->start(); | ||
|
||
spdlog::info("Running {} server on {}:{}", pDbConfig->ServerConfig.transport, pDbConfig->ServerConfig.host, pDbConfig->ServerConfig.port); | ||
|
||
while (!gShutdown) | ||
{ | ||
std::this_thread::yield(); | ||
} | ||
|
||
spdlog::debug("Node={} is requesting server shutdown", nodeConfig.m_id); | ||
pServer->Shutdown(); | ||
|
||
spdlog::debug("Node={} is joining the server thread", nodeConfig.m_id); | ||
if (serverThread.joinable()) | ||
{ | ||
serverThread.join(); | ||
spdlog::debug("Node={} joined the server thread", nodeConfig.m_id); | ||
} | ||
|
||
pConsensusModule->stop(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Review WAL initialization, DB startup, and server logic.
- Typo: Line 568 says “positve integer” → “positive integer.”
- ID vs. number of peers: You assume node IDs from 1..N, but the loop at lines 580–592 enumerates peers from 1 upwards. This may break if config IDs don’t match 1-based indexing.
- Possible improvement: If you want a more flexible configuration, consider mapping peer addresses to IDs directly rather than incrementing
replicaId
. - Check concurrency: The open process and WAL checks are done once; if the directory doesn’t exist, you return an error. You might want to create the directory if missing, or ensure the docs specify pre-creating it.
Overall, the logic is coherent; just ensure configuration assumptions are documented or validated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
lib/raft/raft.h (1)
200-202
:⚠️ Potential issueGuard or atomically protect
m_shutdown
.
m_shutdown
is a plainbool
that is repeatedly read/written from different threads (e.g., election thread, main thread). To avoid data races, either protect it underm_stateMutex
or convert it tostd::atomic<bool>
.- bool m_shutdown{false}; + std::atomic<bool> m_shutdown{false};
🧹 Nitpick comments (2)
lib/raft/raft.cpp (2)
52-89
: Remove or finalize the commented-outtkvpp_node_grpc_client_t
.The commented-out code for
tkvpp_node_grpc_client_t
is adding clutter and may cause confusion. Delete it if it’s no longer needed, or separate it into its own file if it will be revisited.-/// tkvpp_node_grpc_client_t -// ... -// { -// // ... -// } +// (Either remove this code entirely or move it to a different file if you plan to reuse it.)
694-745
: Potential performance concern with per-replica threads insendRequestVoteRPCs
.Spawning one thread for each replica to send RequestVote RPCs might be excessive for large clusters. Consider reusing a smaller thread pool or using asynchronous gRPC calls to limit overhead while preserving concurrency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/raft/raft.cpp
(1 hunks)lib/raft/raft.h
(1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/raft/raft.cpp (2)
lib/raft/raft.h (21)
raft_node_grpc_client_t
(88-88)raft_node_grpc_client_t
(89-89)raft_node_grpc_client_t
(89-89)raft_node_grpc_client_t
(91-91)raft_node_grpc_client_t
(94-94)request
(97-97)request
(98-98)request
(145-145)consensus_module_t
(112-112)pContext
(115-117)pContext
(119-120)logEntry
(127-127)client
(140-140)client
(147-147)token
(141-141)token
(143-143)newTerm
(138-138)votes
(155-155)index
(156-156)logIndex
(148-148)commitIndex
(159-160)lib/raft/raft_test.cpp (1)
id
(59-69)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (2)
lib/raft/raft.h (1)
79-106
: No immediate concerns forraft_node_grpc_client_t
class.The class design and usage of gRPC stubs here look fine. No obvious concurrency or correctness issues are apparent in the constructor or member functions.
lib/raft/raft.cpp (1)
902-966
: Persist log data safely with an explicit fsync call.The TODO notes "# TODO(lnikon): ::fsync(...)". Omitting an explicit fsync can cause data loss if OS buffers aren't persisted to disk during a crash. A dedicated sync step ensures stronger durability guarantees for the log.
// Example (POSIX): // #include <unistd.h> // // int fd = fileno(fsa.native_handle()); // if (fd != -1) { // ::fsync(fd); // }
// TODO(lnikon): Update the state machine! | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Implement the state machine updates or remove the TODO.
The TODO indicates that the log entries are never applied to any state machine after being committed. This prevents changes from having any real effect in the key-value store (or other replicated state).
Would you like me to draft an implementation snippet and open a new issue to track the state machine integration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
lib/raft/raft.h (1)
47-73
: Remove commented-out tkvpp_node_grpc_client_t code.
This block of commented-out code appears to be an older or alternative client implementation. Since you now haveraft_node_grpc_client_t
, consider removing this commented-out section to reduce clutter and maintain clarity.-// class tkvpp_node_grpc_client_t -// { -// public: -// ... -// };lib/raft/raft.cpp (3)
250-252
: Implement state machine updates for committed entries.
Line 251’s TODO indicates that appended log entries are not applied to a state machine after commit. This prevents replicated data from taking effect.Do you want me to draft a snippet for applying these committed entries to your key-value store (or another user-defined state machine) and open a new issue to track this?
540-550
: Consider asynchronous or batched heartbeats.
Currently, heartbeats are sent in a loop, potentially creating overhead in busy clusters. Using asynchronous calls or a batched approach (e.g., a single dispatch to multiple peers) could reduce overhead and improve throughput.
628-677
: Avoid spawning a new thread for each RequestVote RPC.
Creating multiple jthreads insendRequestVoteRPCs
may cause performance overhead, especially with large clusters. Consider using a shared thread pool or an asynchronous/fan-out pattern for multiple RPCs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/raft/raft.cpp
(1 hunks)lib/raft/raft.h
(1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/raft/raft.h (1)
lib/raft/raft.cpp (3)
raft_node_grpc_client_t
(91-98)votedFor
(472-476)votedFor
(472-472)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (4)
lib/raft/raft.h (2)
79-106
: Client interface looks good.
The design ofraft_node_grpc_client_t
is clean, with clear responsibilities for Raft RPC calls and retrieving node metadata. No immediate issues found.
210-210
: Synchronize or atomically store m_shutdown.
Access tom_shutdown
is not guarded by a mutex or made atomic. Using it unguarded in multi-threaded code can cause data races.lib/raft/raft.cpp (2)
420-424
: Use proper synchronization or atomics for m_shutdown checks.
You are readingm_shutdown
in a loop without holding a lock, and it’s declared as a plain bool. This can introduce a data race or inconsistent memory behavior.
866-867
: Ensure log durability with fsync or an equivalent.
Line 866 and 894 have TODOs regarding callingfsync
(or a similar platform-specific call) after writing. Without syncing, there’s a risk of data loss on crashes.Also applies to: 894-895
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/raft/raft.cpp (2)
117-117
: Pass 'nodeConfig' by const reference to avoid unnecessary copying.Leveraging a const reference avoids creating a copy of the configuration object, which can improve performance slightly and is a common best practice in C++.
-consensus_module_t::consensus_module_t(node_config_t nodeConfig, std::vector<raft_node_grpc_client_t> replicas) noexcept +consensus_module_t::consensus_module_t(const node_config_t& nodeConfig, std::vector<raft_node_grpc_client_t> replicas) noexcept🧰 Tools
🪛 Cppcheck (2.10-2)
[performance] 117-117: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
460-460
: Consider making the heartbeat calls truly asynchronous.The comment “TODO: Make these calls async” indicates an intention to handle RPC calls in a non-blocking fashion, which can help with performance and throughput under high load. Evaluate using a thread pool or event-driven model to accomplish this.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/raft/raft.cpp
(1 hunks)
🧰 Additional context used
🪛 Cppcheck (2.10-2)
lib/raft/raft.cpp
[performance] 117-117: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (4)
lib/raft/raft.cpp (4)
247-248
: Apply the state machine updates or remove the TODO.
458-459
: Prevent data race onm_shutdown
in concurrent loops.Both
runHeartbeatThread
(line 458) andrunElectionThread
(line 586) readm_shutdown
without a lock, while other parts of the code modifym_shutdown
under a mutex. This can lead to inconsistent reads ofm_shutdown
.Also applies to: 586-587
569-570
: Finish the state machine application or remove the TODO.The comment indicates the need to apply committed log entries to the state machine. Without this step, data won’t be reflected in the replicated state.
782-844
: Use fsync (or similar) to ensure log durability.Currently, the code only flushes data to the OS buffers and does not perform an explicit fsync. For strict Raft persistence guarantees, consider adding a mechanism to fully synchronize the data to disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
lib/raft/raft.h (1)
185-185
:⚠️ Potential issueProtect
m_shutdown
against concurrent read-write access.
Accessingm_shutdown
from multiple threads without synchronization can cause a data race. Either mark it asstd::atomic<bool>
or always guard it withm_stateMutex
. This issue was raised in a past review and still appears to be unresolved.- bool m_shutdown{false}; + std::atomic<bool> m_shutdown{false};lib/raft/raft.cpp (2)
458-458
:⚠️ Potential issueSynchronize or make
m_shutdown
atomic when used in loops.
The loop condition!m_shutdown
is read outside of a lock. This leads to a data race sincem_shutdown
may be set to true in another thread. Markm_shutdown
asstd::atomic<bool>
or guard accesses with the mutex.
846-862
: 🛠️ Refactor suggestionConsider using fsync for improved durability.
Reliability of Raft depends on durable writes. Simply flushing the stream might not guarantee disk synchronization. Usefsync
or equivalent on supported platforms to ensure data is fully persisted, especially after writing the log.
🧹 Nitpick comments (3)
lib/raft/raft.h (1)
117-117
: Consider passingnodeConfig
by const reference.
A small performance optimization can be achieved by passing thenodeConfig
parameter by const reference instead of by value, as flagged by the static analysis hints. This avoids copying the entire struct when constructing theconsensus_module_t
.- consensus_module_t(node_config_t nodeConfig, std::vector<raft_node_grpc_client_t> replicas) noexcept; + consensus_module_t(const node_config_t &nodeConfig, std::vector<raft_node_grpc_client_t> replicas) noexcept;lib/raft/raft.cpp (2)
180-196
: Correct spelling of “received.”
There are multiple instances of “recevied” in the log messages. Consider fixing them for clarity and consistency.- spdlog::debug("Node={} recevied AppendEntries RPC from leader={} during term={}", + spdlog::debug("Node={} received AppendEntries RPC from leader={} during term={}",
247-248
: Implement or remove the TODO for state machine updates.
Currently, committed log entries are not applied to an actual state machine, which undermines Raft’s core functionality. Consider adding or wiring a proper state machine interface to complete the replication pipeline.Would you like me to draft a snippet for state machine integration and open an issue?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/raft/raft.cpp
(1 hunks)lib/raft/raft.h
(1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/raft/raft.h (1)
lib/raft/raft.cpp (4)
raft_node_grpc_client_t
(55-61)consensus_module_t
(117-137)votedFor
(407-411)votedFor
(407-407)
🪛 Cppcheck (2.10-2)
lib/raft/raft.cpp
[performance] 117-117: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (1)
lib/raft/raft.cpp (1)
884-884
: Re-check logic overwritingm_votedFor
after file read.
The code zeroes outm_votedFor
immediately after reading a stored value, making the restored value ineffective. Verify that this behavior is truly intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 18
♻️ Duplicate comments (15)
lib/raft/CMakeLists.txt (1)
8-11
: Proper test setup with no duplicate compilation.The RaftTest executable is correctly set up to link against RaftLib rather than compiling raft.cpp twice. This addresses the issue mentioned in the previous review.
lib/fs/append_only_file.cpp (1)
67-71
: Good use of std::expected for error handling.The implementation has addressed previous concerns about error handling in methods like
flush
andreset
by usingstd::expected
to properly propagate errors from the underlyingrandom_access_file_t
. This is a significant improvement over the previous implementation.Also applies to: 73-77
assets/database_config_schema.json (1)
53-74
:⚠️ Potential issue
maximumLevels
is required but still undefinedThe
"lsm"
object lists"maximumLevels"
in itsrequired
array (lines 68‑73) yet never declares it under"properties"
. This causes every config to fail validation."flushThreshold": { "type": "integer", "description": "The threshold of bytes at which the memtable should be flushed", "minimum": 1 }, + "maximumLevels": { + "type": "integer", + "description": "Maximum number of levels in the LSM tree", + "minimum": 1 + },lib/fs/append_only_file.h (1)
43-45
: Returningstd::stringstream
by value may be heavyConstructing and copying a whole
stringstream
every call could be costly for large logs. Consider returning a reference/wrapper or providing an API that streams directly to the caller.lib/raft/raft.h (1)
213-215
:⚠️ Potential issue
m_shutdown
is non‑atomic and unguarded → data‑race risk
m_shutdown
is read/written from multiple threads (heartbeat, election, gRPC handlers) but is neitherstd::atomic
nor protected bym_stateMutex
, reopening the race condition highlighted in earlier review cycles.- bool m_shutdown{false}; + std::atomic<bool> m_shutdown{false};or guard every access with
m_stateMutex
.lib/wal/log/persistent_log_storage.h (5)
158-173
: Fix buffer size handling in read_impl.The method doesn't resize the buffer according to the actual number of bytes read, which could lead to extraneous data when fewer bytes are read than requested.
[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string { std::string buffer; buffer.resize(size); - if (const auto res = m_file.read(offset, buffer.data(), size); - !res.has_value()) + const auto res = m_file.read(offset, buffer.data(), size); + if (!res.has_value()) { spdlog::error( "Failed to read from file storage. Offset={}, size={}", offset, size); return {}; } + // Adjust buffer size to actual bytes read + buffer.resize(static_cast<size_t>(res.value())); return buffer; }
292-306
: Add newline separators to entries for line-by-line reading.The append method doesn't add newlines between entries, but the constructor reads the file line by line. This inconsistency will cause problems when loading persisted logs.
[[nodiscard]] auto append(std::string entry) -> bool { + // Add newline to separate entries when reading line by line + std::string entry_with_newline = entry + '\n'; if (!m_backendStorage.write(static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size())) { spdlog::error( "Persistent log storage write failed. Entry={}, size={}\n", entry, entry.size()); return false; } m_inMemoryLog.emplace_back(std::move(entry)); return true; }
294-296
: Fix offset parameter in write operation.The current implementation is using
m_backendStorage.size()
as the offset for writing, which may not be appropriate for appending log entries. This parameter should represent where in the file to write, and for an append-only log, it should typically be the end of the file.- if (!m_backendStorage.write(static_cast<const char *>(entry.data()), - m_backendStorage.size(), - entry.size())) + if (!m_backendStorage.write(static_cast<const char *>(entry.data()), + 0, // Use 0 for offset since append_only_file will handle appending + entry.size()))
331-334
: Inconsistency between size() and read() methods.The
size()
method returnsm_backendStorage.size()
butread()
usesm_inMemoryLog.size()
. This inconsistency could lead to subtle bugs.[[nodiscard]] auto size() const -> std::size_t { - return m_backendStorage.size(); + return m_inMemoryLog.size(); }
254-268
: Optimize file reading in constructor and add error handling.The
persistent_log_storage_t
constructor reads the entire file content at once, which could be inefficient for large files. Consider reading in chunks. Additionally, there's no error handling if the read fails.explicit persistent_log_storage_t(TBackendStorage &&backendStorage) : m_backendStorage(std::move(backendStorage)) { - const std::string raw = - m_backendStorage.read(0, m_backendStorage.size()); - std::istringstream stream(raw); - for (std::string line; std::getline(stream, line);) - { - if (absl::StripAsciiWhitespace(line).empty()) - { - continue; - } - m_inMemoryLog.emplace_back(std::move(line)); + const size_t total_size = m_backendStorage.size(); + if (total_size == 0) { + return; // Empty file, nothing to load + } + + try { + // Read in chunks for efficiency with large files + constexpr size_t chunk_size = 4096; + size_t offset = 0; + std::string buffer; + std::string line_buffer; + + while (offset < total_size) { + size_t to_read = std::min(chunk_size, total_size - offset); + std::string chunk = m_backendStorage.read(offset, to_read); + offset += chunk.size(); + + if (chunk.empty()) { + spdlog::warn("Read returned empty chunk at offset {}", offset); + break; + } + + buffer += chunk; + + // Process complete lines + std::istringstream stream(buffer); + while (std::getline(stream, line_buffer)) { + if (!absl::StripAsciiWhitespace(line_buffer).empty()) { + m_inMemoryLog.emplace_back(std::move(line_buffer)); + } + line_buffer.clear(); + } + + // Keep any incomplete last line for the next iteration + buffer = stream.eof() ? "" : line_buffer; + } + } catch (const std::exception& e) { + spdlog::error("Error loading log entries: {}", e.what()); } }lib/wal/wal.h (2)
78-88
: Double-check copy/move semantics in a concurrent environment.Allowing both copy and move semantics for WAL could lead to resource duplication or unexpected log states if concurrency is involved. Consider marking copy operations as deleted or making them explicit if single ownership is desired.
111-112
: Mismatch between documentation and implementation in reset().The docstring says a
std::runtime_error
is thrown if reopening the log fails, but the implementation just returns a bool. Ensure the implementation either throws on failure or update the documentation to reflect the actual behavior.lib/raft/raft.cpp (3)
272-273
: Implement state machine updates.There are TODO comments indicating that the state machine updates are missing. This is crucial for maintaining consistency in a Raft implementation, as committed log entries need to be applied to the state machine.
while (m_lastApplied < m_commitIndex) { ++m_lastApplied; - // TODO(lnikon): Update the state machine! + // Apply the log entry to the state machine + if (m_lastApplied > 0 && m_lastApplied <= m_log.size()) { + const auto& entry = m_log[m_lastApplied - 1]; + applyToStateMachine(entry); + } }You would need to add a method to apply log entries to the state machine:
void consensus_module_t::applyToStateMachine(const LogEntry& entry) { // Update key-value store or other state based on the log entry spdlog::info("Applying log entry: key={}, value={}", entry.key(), entry.value()); // Actual state machine update would go here }
571-627
: Avoid spawning a new thread for every append.Each
sendAppendEntriesRPC
operation potentially spawns fresh threads, which can severely degrade performance under high load. Consider using a thread pool or a limited worker model to manage concurrency more efficiently.Consider implementing a thread pool:
class ThreadPool { private: std::vector<std::thread> workers; std::queue<std::function<void()>> tasks; std::mutex queue_mutex; std::condition_variable condition; bool stop; public: ThreadPool(size_t num_threads) : stop(false) { for (size_t i = 0; i < num_threads; ++i) { workers.emplace_back([this] { while (true) { std::function<void()> task; { std::unique_lock<std::mutex> lock(queue_mutex); condition.wait(lock, [this] { return stop || !tasks.empty(); }); if (stop && tasks.empty()) return; task = std::move(tasks.front()); tasks.pop(); } task(); } }); } } template<class F> void enqueue(F&& f) { { std::unique_lock<std::mutex> lock(queue_mutex); tasks.emplace(std::forward<F>(f)); } condition.notify_one(); } ~ThreadPool() { { std::unique_lock<std::mutex> lock(queue_mutex); stop = true; } condition.notify_all(); for (std::thread &worker : workers) { worker.join(); } } };Then update the
sendAppendEntriesRPC
method to use this thread pool instead of spawning new threads for each operation.
893-963
: Add fsync for log reliability.While the log is flushed to disk using
flush()
, it is not explicitly synced withfsync
or an equivalent system call. For true Raft safety, this is essential to ensure durability in case of system crashes.fsa.flush(); + +// Ensure data is durably stored on disk +#ifdef _WIN32 + _commit(_fileno(fsa)); +#else + if (fsync(fileno(fsa)) != 0) { + spdlog::error("Node={} fsync failed for file {}: {}", + m_config.m_id, + path.c_str(), + strerror(errno)); + return false; + } +#endifYou would need to add the appropriate includes:
#ifdef _WIN32 #include <io.h> #else #include <unistd.h> #include <errno.h> #include <string.h> #endif
🧹 Nitpick comments (26)
.gitignore (1)
34-48
: Suggestion: Expand.gitignore
to cover OS, IDE, CMake & Conan artifacts
The current minimal ignore list omits several common files generated by the new CMake-based build, Conan, and developer IDEs—e.g., VSCode settings, macOS.DS_Store
, CMake cache/folders, and Conan info files. Consider adding these patterns to prevent accidental commits of generated artifacts.Apply this diff to extend the ignore rules:
@@ -48,0 +49,17 + # Editor directories + .vscode/ + + # macOS & Windows system files + .DS_Store + Thumbs.db + + # CMake build artifacts + CMakeCache.txt + CMakeFiles/ + cmake_install.cmake + Makefile + + # Conan generated files + conanbuildinfo.cmake + conanbuildinfo.txt + conaninfo.txt + + # Coverage reports + *.gcno + *.gcdalib/posix_wrapper/posix_wrapper_test.cpp (2)
7-18
: Comments are helpful but could be more consistentThe comment provides useful documentation on the enum-to-flag mapping, which helps developers understand the test context. Consider structuring it as a more formal comment block with consistent formatting.
-// Mapping between enum and POSIX open flags -// kReadOnly = O_RDONLY, -// kWriteOnly = O_WRONLY, -// kReadWrite = O_RDWR, -// kAppend = O_APPEND, -// kCreate = O_CREAT, -// kTruncate = O_TRUNC, -// kExclusive = O_EXCL, -// kNonBlock = O_NONBLOCK, -// kSync = O_SYNC, -// kDirect = O_DIRECT, +/** + * Mapping between enum and POSIX open flags: + * - kReadOnly → O_RDONLY + * - kWriteOnly → O_WRONLY + * - kReadWrite → O_RDWR + * - kAppend → O_APPEND + * - kCreate → O_CREAT + * - kTruncate → O_TRUNC + * - kExclusive → O_EXCL + * - kNonBlock → O_NONBLOCK + * - kSync → O_SYNC + * - kDirect → O_DIRECT + */
19-31
: Test implementation is comprehensive and well-structuredThe test case properly verifies that each enum value is correctly mapped to its corresponding POSIX flag. All important flag types are covered, providing thorough validation of the
to_native
function.Consider adding tests for combined flags to ensure they work correctly when used together, which is common in real-world scenarios.
TEST(PosixWrapperTest, CombinedOpenFlagsToNative) { // Test common combinations EXPECT_EQ(pw::to_native(pw::open_flag_k::kCreate | pw::open_flag_k::kReadWrite), O_CREAT | O_RDWR); EXPECT_EQ(pw::to_native(pw::open_flag_k::kCreate | pw::open_flag_k::kTruncate | pw::open_flag_k::kWriteOnly), O_CREAT | O_TRUNC | O_WRONLY); // More combinations as needed }lib/fs/common.h (1)
11-21
: Consider using consistent naming conventions.The constants use
kPascalCase
format while the error types usesnake_case_t/k
format. Consider adopting a consistent naming convention throughout the codebase for better readability and maintainability.-static constexpr std::size_t kBufferSize{4096ULL}; -static constexpr int kDefaultFilePermissions = 0644; -static constexpr int kIOUringQueueEntries = 128; +static constexpr std::size_t buffer_size_k{4096ULL}; +static constexpr int default_file_permissions_k = 0644; +static constexpr int io_uring_queue_entries_k = 128;Or alternatively:
-enum class file_error_code_k : int8_t -struct file_error_t +enum class FileErrorCode : int8_t +struct FileErrorlib/raft/CMakeLists.txt (1)
2-2
: Remove redundant project() call in subdirectory.The
project()
call is typically only needed in the root CMakeLists.txt. In subdirectories, it's usually redundant and can potentially cause issues.-project(frankie)
lib/fs/common.cpp (4)
8-11
: Remove redundant namespace qualificationThe method is already defined within the
fs
namespace, so thefs::
prefix onfile_error_t
is redundant.-[[nodiscard]] auto fs::file_error_t::has_error() const noexcept -> bool +[[nodiscard]] auto file_error_t::has_error() const noexcept -> bool { return code != file_error_code_k::none; }
13-16
: Remove redundant namespace qualification infrom_errno
methodSimilar to the previous comment,
fs::
is redundant in the method definition.-auto fs::file_error_t::from_errno(file_error_code_k code, int err, const char *context) noexcept -> file_error_t +auto file_error_t::from_errno(file_error_code_k code, int err, const char *context) noexcept -> file_error_t { return {.code = code, .system_errno = err, .message = std::string(context) + ": " + strerror(err)}; }
17-20
: Remove redundant namespace qualification insuccess
methodThe
fs::
prefix is redundant in the method definition.-auto fs::file_error_t::success() noexcept -> file_error_t +auto file_error_t::success() noexcept -> file_error_t { return file_error_t{}; }
15-15
: Consider using string formatting for more readable error messagesThe string concatenation for error messages could be improved by using a string formatting library (like
fmt
) for better readability.- return {.code = code, .system_errno = err, .message = std::string(context) + ": " + strerror(err)}; + // Using fmt library (requires header inclusion) + return {.code = code, .system_errno = err, .message = fmt::format("{}: {}", context, strerror(err))};lib/db/db_config.cpp (1)
19-37
: Consider using switch statement forfrom_string
functionThe current implementation using separate
if
statements is functional but less efficient than a switch statement would be, especially for string comparisons.A better approach might be to use a map or unordered_map for lookup, or at minimum, use if-else if structure:
auto db::from_string(const std::string_view mode) noexcept -> db_mode_t { - if (mode == db::EMBEDDED_STR_VIEW) - { - return db_mode_t::kEmbedded; - } - - if (mode == db::STANDALONE_STR_VIEW) - { - return db_mode_t::kStandalone; - } - - if (mode == db::REPLICATED_STR_VIEW) - { - return db_mode_t::kReplicated; - } + if (mode == db::EMBEDDED_STR_VIEW) { + return db_mode_t::kEmbedded; + } else if (mode == db::STANDALONE_STR_VIEW) { + return db_mode_t::kStandalone; + } else if (mode == db::REPLICATED_STR_VIEW) { + return db_mode_t::kReplicated; + } PANIC("provided database mode string is not supported", mode); }Or using a static map (more efficient for repeated lookups):
auto db::from_string(const std::string_view mode) noexcept -> db_mode_t { static const std::unordered_map<std::string_view, db_mode_t> mode_map = { {EMBEDDED_STR_VIEW, db_mode_t::kEmbedded}, {STANDALONE_STR_VIEW, db_mode_t::kStandalone}, {REPLICATED_STR_VIEW, db_mode_t::kReplicated} }; auto it = mode_map.find(mode); if (it != mode_map.end()) { return it->second; } PANIC("provided database mode string is not supported", mode); }lib/db/db_config.h (1)
13-15
: Consider using a string literal type for string constantsUsing
std::string_view
with a string literal creates an implicit conversion. Usingconst char*
or C++17'sstd::string_view_literals
would be more direct.-static constexpr const std::string_view EMBEDDED_STR_VIEW{"embedded"}; -static constexpr const std::string_view STANDALONE_STR_VIEW{"standalone"}; -static constexpr const std::string_view REPLICATED_STR_VIEW{"replicated"}; +static constexpr std::string_view EMBEDDED_STR_VIEW = "embedded"; +static constexpr std::string_view STANDALONE_STR_VIEW = "standalone"; +static constexpr std::string_view REPLICATED_STR_VIEW = "replicated";Or with string_view literals:
using namespace std::string_view_literals; static constexpr auto EMBEDDED_STR_VIEW = "embedded"sv; static constexpr auto STANDALONE_STR_VIEW = "standalone"sv; static constexpr auto REPLICATED_STR_VIEW = "replicated"sv;lib/fs/random_access_file_test.cpp (2)
125-127
: Consider testing with larger and more diverse offsets.The current test uses a limited set of offsets (0, 7, 4096, 8192). Consider adding tests with larger offsets and perhaps some that aren't aligned to typical page boundaries to ensure robustness across a wider range of scenarios.
6-15
: Consider adding tests for concurrent operations and error handling.While the test suite is comprehensive for basic functionality, consider adding:
- Tests for concurrent I/O operations, which would be relevant for an io_uring-based implementation
- Tests that simulate io_uring operation failures
- Tests with direct I/O enabled, which has stricter alignment requirements
These would help ensure the implementation is robust under various conditions.
lib/raft/persistence.h (1)
21-25
: Consider using std::expected for error handling.The methods currently return bool, which provides limited information about failure reasons. Given that other parts of the codebase use std::expected for error handling (e.g., in the filesystem code), consider adopting the same pattern here:
[[nodiscard]] auto initialize() -> std::expected<void, persistence_error_t>; [[nodiscard]] auto update(...) -> std::expected<void, persistence_error_t>; // etc.This would provide more detailed error information.
lib/raft/replicated_log.h (2)
25-30
: Incomplete state cleanup in move constructor.The move constructor clears the consensus module pointer in the source object but not the log variant. This inconsistency could cause issues if the moved-from object is later used.
replicated_log_t(replicated_log_t &&other) noexcept : m_log{std::move(other.m_log)}, m_pConsensusModule{std::move(other.m_pConsensusModule)} { other.m_pConsensusModule = nullptr; + // Also consider resetting or clearing the moved-from log variant + // depending on its implementation details }
94-94
: Redundant initialization of member variable.The member variable
m_pConsensusModule
is initialized tonullptr
in line 94, but this initialization is redundant given that the constructor initializes it with the passed parameter. Consider removing the explicit initialization:private: wal::log::log_variant_t m_log; - std::shared_ptr<raft::consensus_module_t> m_pConsensusModule{nullptr}; + std::shared_ptr<raft::consensus_module_t> m_pConsensusModule;lib/fs/append_only_file.cpp (1)
85-104
: Update documentation to match implementation.The class documentation mentions io_uring initialization, but the implementation delegates all I/O to
random_access_file_t
and doesn't directly interact with io_uring. Update the documentation to accurately reflect the current implementation.lib/fs/random_access_file.h (1)
57-58
: Prefer passingpath
by const reference to avoid an unnecessary copy
fs::path_t
is typically a relatively heavy object (internally holds astd::string
).
Becausebuild()
never modifies the argument, the signature can be changed toconst fs::path_t&
to save one move/copy and a heap allocation in hot paths.- auto build(fs::path_t path, posix_wrapper::open_flag_k openFlags) + auto build(const fs::path_t& path, posix_wrapper::open_flag_k openFlags)lib/wal/log/in_memory_log_storage.h (2)
19-22
: Parameter name shadows the data member – consider renaming to avoid confusionexplicit in_memory_log_storage_t(std::vector<std::string> m_log) : m_log(std::move(m_log)) {}
m_log
(parameter) hidesm_log
(member) so the reader has to parse twice to realise the left‑hand side is the member and the right‑hand side is the parameter.
A small rename improves readability and prevents accidental self‑assignment errors.-explicit in_memory_log_storage_t(std::vector<std::string> m_log) - : m_log(std::move(m_log)) +explicit in_memory_log_storage_t(std::vector<std::string> log) + : m_log(std::move(log))
12-13
: Replaceresize()
idea withreserve()
to avoid length side‑effectsThe TODO suggests pre‑allocating the vector in the ctor.
resize()
will create size() empty strings and changesize()
.
reserve()
keepssize()==0
but prepares capacity, which is what you want for an append‑only log.m_log.reserve(default_capacity);A reasonable default could be derived from expected entry count or configured via the builder.
src/main.cpp (2)
439-447
: Incorrect error message for missingpeers
fieldThe exception message still mentions
"id"
instead of"peers"
, which will mislead operators.- throw std::runtime_error("\"id\" is not specified in the config"); + throw std::runtime_error("\"peers\" is not specified in the config");
642-644
: Minor typo in log message
positve
→positive
.- spdlog::error("ID of the node should be positve integer"); + spdlog::error("ID of the node should be positive integer");lib/fs/append_only_file.h (1)
15-16
: Avoidusing namespace
in public headers
using namespace random_access_file;
leaks the entire sub‑namespace into every translation unit that includes this header, increasing the risk of symbol collisions and ODR issues. Prefer a qualified alias or direct qualification.-using namespace random_access_file; +// avoid namespace pollution +namespace raf = random_access_file;lib/wal/wal.h (1)
223-223
: Replace placeholder in debug log.The debug log contains a placeholder "FILL_ME" rather than actual content about the WAL entry.
- spdlog::debug("Added new WAL entry {}", "FILL_ME"); + spdlog::debug("Added new WAL entry: op={}, key={}, value={}", + op_view, rec.kv.m_key.m_key, rec.kv.m_value.m_value);lib/raft/raft.cpp (2)
126-140
: Pass nodeConfig by const reference for improved performance.The
nodeConfig
parameter is unnecessarily passed by value when it's only moved into a member variable, which causes an extra copy.consensus_module_t::consensus_module_t( - node_config_t nodeConfig, + const node_config_t& nodeConfig, std::vector<raft_node_grpc_client_t> replicas) noexcept : m_config{std::move(nodeConfig)},🧰 Tools
🪛 Cppcheck (2.10-2)
[performance] 126-126: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
635-635
: Remove debug logging placeholder.The string "VAGAG" appears to be a placeholder or leftover debugging text that should be replaced with a more descriptive message or removed.
- spdlog::debug("[VAGAG] response.success"); + spdlog::debug("Successfully received AppendEntries response from peer={}", client.id());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (39)
.clang-format
(1 hunks).gitignore
(1 hunks)CMakeLists.txt
(3 hunks)assets/database_config_schema.json
(1 hunks)assets/tkvpp_config_1.json
(2 hunks)assets/tkvpp_config_2.json
(1 hunks)assets/tkvpp_config_3.json
(1 hunks)conanfile.txt
(1 hunks)lib/CMakeLists.txt
(2 hunks)lib/config/CMakeLists.txt
(1 hunks)lib/db/CMakeLists.txt
(2 hunks)lib/db/db_config.cpp
(1 hunks)lib/db/db_config.h
(1 hunks)lib/fs/CMakeLists.txt
(1 hunks)lib/fs/append_only_file.cpp
(1 hunks)lib/fs/append_only_file.h
(1 hunks)lib/fs/common.cpp
(1 hunks)lib/fs/common.h
(1 hunks)lib/fs/random_access_file.cpp
(1 hunks)lib/fs/random_access_file.h
(1 hunks)lib/fs/random_access_file_test.cpp
(1 hunks)lib/posix_wrapper/CMakeLists.txt
(1 hunks)lib/posix_wrapper/open_flag.h
(1 hunks)lib/posix_wrapper/posix_wrapper_test.cpp
(1 hunks)lib/raft/CMakeLists.txt
(1 hunks)lib/raft/persistence.h
(1 hunks)lib/raft/raft.cpp
(1 hunks)lib/raft/raft.h
(1 hunks)lib/raft/replicated_log.h
(1 hunks)lib/structures/lsmtree/CMakeLists.txt
(1 hunks)lib/wal/CMakeLists.txt
(1 hunks)lib/wal/common.h
(1 hunks)lib/wal/log/common.h
(1 hunks)lib/wal/log/concepts.h
(1 hunks)lib/wal/log/in_memory_log_storage.h
(1 hunks)lib/wal/log/log.h
(1 hunks)lib/wal/log/persistent_log_storage.h
(1 hunks)lib/wal/wal.h
(1 hunks)src/main.cpp
(13 hunks)
✅ Files skipped from review due to trivial changes (2)
- lib/posix_wrapper/CMakeLists.txt
- lib/posix_wrapper/open_flag.h
🚧 Files skipped from review as they are similar to previous changes (12)
- .clang-format
- lib/config/CMakeLists.txt
- lib/CMakeLists.txt
- lib/db/CMakeLists.txt
- CMakeLists.txt
- assets/tkvpp_config_1.json
- assets/tkvpp_config_3.json
- lib/wal/log/concepts.h
- lib/wal/CMakeLists.txt
- lib/fs/CMakeLists.txt
- lib/wal/common.h
- assets/tkvpp_config_2.json
🧰 Additional context used
🧬 Code Graph Analysis (7)
lib/posix_wrapper/posix_wrapper_test.cpp (1)
lib/posix_wrapper/open_flag.h (1)
to_native
(57-57)
lib/raft/persistence.h (4)
lib/fs/common.cpp (1)
nodiscard
(8-11)lib/raft/replicated_log.h (6)
nodiscard
(52-55)nodiscard
(57-75)nodiscard
(77-80)nodiscard
(82-85)nodiscard
(87-90)nodiscard
(103-109)lib/raft/raft.h (1)
commitIndex
(166-167)lib/raft/raft.cpp (2)
votedFor
(451-455)votedFor
(451-451)
lib/raft/replicated_log.h (4)
lib/wal/log/log.h (9)
nodiscard
(58-61)nodiscard
(63-68)nodiscard
(70-74)nodiscard
(76-79)nodiscard
(81-84)nodiscard
(141-166)entry
(58-58)command
(63-65)index
(70-71)lib/wal/log/in_memory_log_storage.h (7)
nodiscard
(48-52)nodiscard
(54-59)nodiscard
(61-68)nodiscard
(70-74)entry
(48-48)command
(55-55)index
(61-61)lib/wal/log/persistent_log_storage.h (3)
entry
(292-292)command
(309-309)index
(314-315)lib/raft/raft.h (2)
logEntry
(115-115)index
(153-153)
lib/fs/random_access_file.h (3)
lib/fs/random_access_file.cpp (5)
random_access_file_t
(16-20)random_access_file_t
(22-35)random_access_file_t
(54-65)size
(120-133)size
(120-121)lib/fs/common.cpp (1)
nodiscard
(8-11)lib/posix_wrapper/open_flag.h (5)
nodiscard
(30-35)nodiscard
(37-42)nodiscard
(44-49)nodiscard
(51-55)nodiscard
(57-60)
lib/wal/log/in_memory_log_storage.h (2)
lib/wal/log/log.h (11)
other
(46-51)other
(46-46)nodiscard
(58-61)nodiscard
(63-68)nodiscard
(70-74)nodiscard
(76-79)nodiscard
(81-84)nodiscard
(141-166)entry
(58-58)command
(63-65)index
(70-71)lib/wal/log/persistent_log_storage.h (17)
other
(130-138)other
(130-131)other
(279-285)other
(279-280)nodiscard
(43-47)nodiscard
(49-52)nodiscard
(54-57)nodiscard
(59-62)nodiscard
(84-92)nodiscard
(94-97)nodiscard
(150-156)nodiscard
(158-173)nodiscard
(175-178)nodiscard
(180-183)entry
(292-292)command
(309-309)index
(314-315)
lib/wal/log/log.h (3)
lib/wal/log/in_memory_log_storage.h (11)
other
(29-40)other
(29-30)nodiscard
(48-52)nodiscard
(54-59)nodiscard
(61-68)nodiscard
(70-74)nodiscard
(76-79)nodiscard
(96-99)entry
(48-48)command
(55-55)index
(61-61)lib/wal/wal.h (7)
other
(86-86)other
(87-87)other
(193-201)other
(193-193)other
(204-212)other
(204-204)is_same_v
(316-349)lib/wal/log/persistent_log_storage.h (17)
other
(130-138)other
(130-131)other
(279-285)other
(279-280)nodiscard
(43-47)nodiscard
(49-52)nodiscard
(54-57)nodiscard
(59-62)nodiscard
(84-92)nodiscard
(94-97)nodiscard
(150-156)nodiscard
(158-173)nodiscard
(175-178)nodiscard
(180-183)entry
(292-292)command
(309-309)index
(314-315)
lib/raft/raft.h (3)
lib/raft/raft.cpp (4)
raft_node_grpc_client_t
(57-64)consensus_module_t
(125-147)votedFor
(451-455)votedFor
(451-451)lib/raft/replicated_log.h (7)
nodiscard
(52-55)nodiscard
(57-75)nodiscard
(77-80)nodiscard
(82-85)nodiscard
(87-90)nodiscard
(103-109)index
(77-77)lib/raft/persistence.h (1)
commitIndex
(22-23)
🪛 Cppcheck (2.10-2)
lib/raft/raft.cpp
[performance] 126-126: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-clang
- GitHub Check: build-gcc
🔇 Additional comments (20)
lib/posix_wrapper/posix_wrapper_test.cpp (3)
1-4
: Includes and header order look good. The includes are properly ordered with external dependencies (gtest) first, followed by project headers. This follows standard C++ best practices.
5-6
: Good use of namespace alias. Using a namespace alias improves readability by shortening the code while still making it clear which namespace the symbols belong to.
1-32
:✅ Verification successful
Consider adding test for error handling
The current test set only verifies valid enum values. Depending on the implementation of
to_native
, it might be valuable to test behavior with invalid values (if the enum allows it) or handling of potentially undefined combinations.
🏁 Script executed:
#!/bin/bash # Check if open_flag.h defines any handling for invalid values or combinations fd open_flag.h --exec cat {} | grep -A 15 "to_native"Length of output: 214
No validation in to_native; invalid-value tests are redundant
to_native is a constexpr static_cast to the enum's underlying type and contains no branching or error paths to exercise. Adding tests for "invalid" enum values would only verify that the cast returns the raw integer.
If you intend to support combining flags, consider:
- Implementing bitwise operators on open_flag_k (e.g. operator|), as sketched below
- Adding a test such as EXPECT_EQ(pw::to_native(pw::open_flag_k::kReadOnly | pw::open_flag_k::kNonBlock), O_RDONLY | O_NONBLOCK);
Otherwise, the existing tests fully cover the behavior of to_native.
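A minimal sketch of what those operators could look like, assuming open_flag_k is a scoped enum whose enumerators map directly onto the POSIX O_* constants. The pw namespace and the kReadOnly/kNonBlock names are taken from the test snippet above; everything else here is illustrative, not the actual header:

#include <fcntl.h>
#include <type_traits>

namespace pw
{
// Hypothetical enum mirroring the POSIX open flags.
enum class open_flag_k : int
{
    kReadOnly = O_RDONLY,
    kNonBlock = O_NONBLOCK,
};

// Combine flags without leaving the enum's type system.
constexpr auto operator|(open_flag_k lhs, open_flag_k rhs) noexcept -> open_flag_k
{
    using underlying = std::underlying_type_t<open_flag_k>;
    return static_cast<open_flag_k>(static_cast<underlying>(lhs) | static_cast<underlying>(rhs));
}

// to_native stays a plain cast, so a combined flag maps directly to the OS value.
constexpr auto to_native(open_flag_k flag) noexcept -> int
{
    return static_cast<int>(flag);
}
} // namespace pw

static_assert(pw::to_native(pw::open_flag_k::kReadOnly | pw::open_flag_k::kNonBlock) ==
              (O_RDONLY | O_NONBLOCK));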
lib/fs/common.h (2)
1-51
: Good foundation for file operations error handling.This file provides a well-structured foundation for file operations with io_uring. The constants, error codes, and error handling structure are well thought out.
43-48
:❌ Incorrect review comment
Methods are declaration-only; ensure proper implementation.
These utility methods are declared but not implemented in this header. Ensure they're implemented in a corresponding .cpp file or as inline functions in the header, and that they handle all error cases robustly.
🏁 Script executed:
#!/bin/bash # Search for the implementation of file_error_t methods fd -e cpp -e h "file_error_t::" lib/fs/Length of output: 39
I wasn’t able to find any definitions for these methods under lib/fs. Let’s search the entire repo for their implementations:
Please let me know what these commands return.
🏁 Script executed:
#!/bin/bash # Search for method definitions across all source files rg -n "file_error_t::" . rg -n "has_error\s*\(" . rg -n "success\s*\(" . rg -n "from_errno\s*\(" .Length of output: 1815
Implementations already provided for
file_error_t
methodsI located definitions for all three methods in
lib/fs/common.cpp
, so no additional implementations are needed:
fs::file_error_t::has_error()
atlib/fs/common.cpp:8
fs::file_error_t::from_errno(...)
atlib/fs/common.cpp:13
fs::file_error_t::success()
atlib/fs/common.cpp:17
Likely an incorrect or invalid review comment.
lib/structures/lsmtree/CMakeLists.txt (1)
18-18
: LGTM - Dependencies aligned with project-wide additions.The addition of
libassert::assert
andmagic_enum::magic_enum
as public dependencies aligns with the project-wide integration of these libraries.conanfile.txt (1)
14-17
:✅ Verification successful
LGTM - Dependencies align with new Raft implementation needs.
The new dependencies support different aspects of the Raft implementation:
- gtest for unit testing the Raft protocol
- prometheus-cpp for monitoring the distributed system
- libassert and magic_enum for better assertions and enum handling
🏁 Script executed:
#!/bin/bash # Check if the newly added dependencies are used in the project echo "Checking for gtest usage:" rg -l "gtest" --type cpp echo -e "\nChecking for prometheus usage:" rg -l "prometheus" --type cpp echo -e "\nChecking for libassert usage:" rg -l "assert\(" --type cpp echo -e "\nChecking for magic_enum usage:" rg -l "magic_enum" --type cppLength of output: 995
✅ Dependencies Verified and Approved
All four newly added dependencies are actively used in the codebase:
- gtest/1.15.0: referenced in
• lib/posix_wrapper/posix_wrapper_test.cpp
• lib/fs/random_access_file_test.cpp
• lib/raft/raft_test.cpp- prometheus-cpp/1.3.0: referenced in
• examples/raft/main.cpp
• examples/prometheus/main.cpp- libassert/2.1.4: assertion calls found in multiple headers/sources under lib/
- magic_enum/0.9.7: used in lib/wal/wal.h
These dependencies align with the Raft implementation’s testing, monitoring, assertion, and enum‑reflection needs. No further action required.
lib/raft/CMakeLists.txt (1)
4-6
: LGTM - Well-structured library definition.The RaftLib target is well-defined with appropriate dependencies for a Raft consensus implementation, including Write-Ahead Logging (WAL) and Protocol Buffers objects.
lib/wal/log/common.h (1)
11-13
: Well-designed variant type for storage abstractionThe variant approach for log storage is a good design choice. It allows for a unified interface to different storage implementations while maintaining type safety.
lib/db/db_config.cpp (1)
4-17
: Good implementation ofto_string
functionThe function implementation is correct and handles all enum cases with a proper default case that asserts on unexpected values.
lib/db/db_config.h (2)
17-22
: Good enum definition with explicit valuesThe
db_mode_t
enum is well-defined with explicit values and a clear naming convention.
24-26
: Good use of[[nodiscard]]
andnoexcept
The function declarations properly use
[[nodiscard]]
to ensure return values are not ignored andnoexcept
to indicate no exceptions will be thrown.lib/fs/random_access_file_test.cpp (4)
1-5
: Well-structured and comprehensive test suite.The test suite is well-organized with good separation of concerns between helper utilities, fixtures, and test cases. The comment at the top also provides clear compilation instructions.
24-36
: Good implementation of temporary file creation.The
make_temp_file_path()
function properly creates a unique temporary file usingmkstemp()
and correctly closes the file descriptor after creation, which prevents resource leaks.
57-79
: Well-designed test fixture.The test fixture appropriately manages resources by creating a temporary file before each test and cleaning it up afterward. The explicit destruction of the file object in
TearDown()
before unlinking the file ensures proper resource cleanup.
195-214
: Good negative test for moved-from objects.Testing that operations on moved-from objects fail gracefully rather than causing undefined behavior is excellent practice.
lib/raft/persistence.h (1)
8-20
: Clarify the implementation strategy for this class.This class cannot be instantiated (due to deleted default constructor) but doesn't define any constructors either. Additionally, the methods aren't virtual, suggesting this isn't meant to be a base class for derived implementations.
Consider either:
- Making this a pure abstract interface with virtual methods if different persistence implementations are intended
- Providing a constructor and implementing the methods if only one implementation is needed
- Making this a namespace with free functions if no state needs to be maintained
Without this clarification, it's unclear how to use this class in the Raft implementation.
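If the first option is preferred, a rough sketch of the shape such an interface could take; the method names mirror the existing declarations, while the virtual/abstract design itself is only a suggestion:

#include <cstdint>
#include <optional>

namespace raft
{
// Hypothetical abstract interface; concrete backends (file-based, in-memory, ...)
// would derive from it and be injected into the consensus module.
class persistence_t
{
public:
    virtual ~persistence_t() = default;

    [[nodiscard]] virtual auto initialize() -> bool = 0;
    [[nodiscard]] virtual auto update(std::optional<std::uint32_t> commitIndex,
                                      std::optional<std::uint32_t> votedFor) -> bool = 0;
    [[nodiscard]] virtual auto flush() -> bool = 0;
    [[nodiscard]] virtual auto restore() -> bool = 0;
};
} // namespace raft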
lib/raft/replicated_log.h (1)
97-98
: Good use of static_assert to verify concept compliance.Using static_assert to verify that the class satisfies the TLogStorageConcept is excellent practice, ensuring interface compliance at compile time.
lib/fs/append_only_file.cpp (1)
105-122
: Well-implemented builder pattern.The builder pattern implementation is clean and properly handles both direct I/O and error propagation. Great job on using the posix_wrapper flags for clarity and type safety.
src/main.cpp (1)
548-560
:std::filesystem::exists
used without including<filesystem>
exists(walPath)
is unqualified and there is no#include <filesystem>
in the header list.
Include it or qualify withstd::filesystem::exists
to avoid ADL surprises.
lib/wal/log/common.h
Outdated
[[nodiscard]] auto build() noexcept -> log_storage_variant_t
{
}
Implementation for build()
method is missing
The build()
method of log_storage_builder_t
is declared to return a log_storage_variant_t
but the method body is empty. This will result in undefined behavior as it doesn't return any value.
Add an implementation to the method body that constructs and returns a valid variant instance, e.g.:
[[nodiscard]] auto build() noexcept -> log_storage_variant_t
{
+ // Default to in-memory storage or based on some configuration
+ return wal::log::in_memory_log_storage_t{};
}
lib/wal/log/common.h
Outdated
class log_storage_builder_t final
{
  public:
    [[nodiscard]] auto build() noexcept -> log_storage_variant_t
    {
    }
};
🛠️ Refactor suggestion
Builder class lacks configuration options
The log_storage_builder_t
class doesn't have any constructor parameters, member variables, or configuration methods, making it impossible to specify which type of storage should be built.
Consider adding configuration methods to specify the desired storage type:
class log_storage_builder_t final
{
private:
+ bool m_use_persistent_storage{false};
+ std::string m_file_path{};
public:
+ auto use_persistent_storage(std::string file_path) noexcept -> log_storage_builder_t&
+ {
+ m_use_persistent_storage = true;
+ m_file_path = std::move(file_path);
+ return *this;
+ }
+
+ auto use_in_memory_storage() noexcept -> log_storage_builder_t&
+ {
+ m_use_persistent_storage = false;
+ return *this;
+ }
[[nodiscard]] auto build() noexcept -> log_storage_variant_t
{
+ if (m_use_persistent_storage) {
+ return wal::log::persistent_log_storage_t<wal::log::file_storage_backend_t>{m_file_path};
+ }
+ return wal::log::in_memory_log_storage_t{};
}
};
struct db_config_t
{
    fs::path_t DatabasePath{"."};
    std::string WalFilename{"wal"};
    std::string ManifestFilenamePrefix{"manifest_"};
    db_mode_t mode;
🛠️ Refactor suggestion
Add default value for mode field
The mode
field in db_config_t
struct doesn't have a default value, which could lead to uninitialized values. Consider adding a default value.
- db_mode_t mode;
+ db_mode_t mode{db_mode_t::kStandalone};
lib/raft/persistence.h
Outdated
class persistence_t
{
  public:
    persistence_t() = delete;

    persistence_t(const persistence_t &) = delete;
    auto operator=(const persistence_t &) -> persistence_t & = delete;

    persistence_t(persistence_t &&) = delete;
    auto operator=(const persistence_t &&) -> persistence_t & = delete;

    ~persistence_t() = default;

    [[nodiscard]] auto initialize() -> bool;
    [[nodiscard]] auto update(std::optional<std::uint32_t> commitIndex,
                              std::optional<std::uint32_t> votedFor) -> bool;
    [[nodiscard]] auto flush() -> bool;
    [[nodiscard]] auto restore() -> bool;
};
🛠️ Refactor suggestion
Add documentation explaining persistence behavior and implementation strategy.
This class lacks documentation explaining:
- What Raft state is being persisted
- The expected behavior of each method
- What persistence mechanism should be used
- Error conditions that would cause the methods to return false
Add Doxygen-style comments to clarify these aspects, making it easier for implementers to understand the contract.
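For example, comments along these lines would make the contract explicit. The described behavior is illustrative only; the implementer still has to pick the actual persistence mechanism and error conditions:

/// Persists the Raft state that must survive a restart (currently the commit
/// index and the candidate this node voted for).
class persistence_t
{
public:
    /// Opens or creates the backing storage. Returns false if it cannot be
    /// opened or its existing contents cannot be parsed.
    [[nodiscard]] auto initialize() -> bool;

    /// Stages new values; std::nullopt leaves the corresponding field untouched.
    /// Returns false if the new state cannot be recorded.
    [[nodiscard]] auto update(std::optional<std::uint32_t> commitIndex,
                              std::optional<std::uint32_t> votedFor) -> bool;

    /// Forces staged state to durable storage (e.g. write followed by fsync).
    /// Returns false on any I/O error.
    [[nodiscard]] auto flush() -> bool;

    /// Reloads the last flushed state. Returns false if nothing can be restored.
    [[nodiscard]] auto restore() -> bool;
};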
lib/raft/persistence.h
Outdated
auto operator=(const persistence_t &) -> persistence_t & = delete;

persistence_t(persistence_t &&) = delete;
auto operator=(const persistence_t &&) -> persistence_t & = delete;
Fix the move assignment operator declaration.
The move assignment operator has an incorrect const
qualifier on the rvalue reference parameter, which contradicts the purpose of move semantics:
- auto operator=(const persistence_t &&) -> persistence_t & = delete;
+ auto operator=(persistence_t &&) -> persistence_t & = delete;
lib/fs/random_access_file.cpp
Outdated
auto random_access_file_t::operator=(random_access_file_t &&other) noexcept
    -> random_access_file_t &
{
    namespace random_access_file
    if (this == &other)
    {
        return *this;
    }

    m_fd = other.m_fd;
    m_ring = other.m_ring;

    other.m_fd = -1;
    other.m_ring = io_uring{};

    return *this;
}
Resource leak in move‑assignment operator
Before overwriting m_fd
/m_ring
, the current object may still own a valid file
descriptor and io_uring
instance. Overwriting them without closing
m_fd
and calling io_uring_queue_exit()
leaks kernel resources.
if (this == &other)
{
return *this;
}
+ // release current resources
+ if (m_fd != -1)
+ {
+ io_uring_queue_exit(&m_ring);
+ close(m_fd);
+ }
+
m_fd = other.m_fd;
m_ring = other.m_ring;
lib/wal/log/log.h
Outdated
auto operator=(log_t &&other) noexcept -> log_t &
{
    using std::swap;
    swap(*this, other);
    return *this;
}
Fix infinite recursion in move‑assignment operator
operator=
invokes std::swap(*this, other)
, and the fallback implementation of std::swap
internally uses move‑assignment again, causing unbounded recursive calls (or a hard compiler error on some STL implementations).
- auto operator=(log_t &&other) noexcept -> log_t &
- {
- using std::swap;
- swap(*this, other);
- return *this;
- }
+ auto operator=(log_t &&other) noexcept -> log_t &
+ {
+ if (this != &other)
+ {
+ m_storage = std::move(other.m_storage);
+ }
+ return *this;
+ }
Alternatively, define an explicit swap(log_t&, log_t&)
overload and call it, but avoid calling std::swap
on *this
inside the assignment operator.
lib/wal/wal.h
Outdated
auto add(record_t rec) noexcept -> void
{
    std::visit([&](auto &wal) { return wal.add(std::move(rec)); }, m_wal);
}
🛠️ Refactor suggestion
Add thread safety for concurrent operations.
The add
method processes records through a std::visit without any synchronization. If multiple threads call this method simultaneously on the same WAL instance, data races or partial writes might occur. Consider adding thread safety mechanisms if concurrent usage is intended.
auto add(record_t rec) noexcept -> void
{
+ std::lock_guard<std::mutex> lock(m_mutex);
std::visit([&](auto &wal) { return wal.add(std::move(rec)); }, m_wal);
}
You would need to add a mutex member to the class:
private:
wal_variant_t m_wal;
std::mutex m_mutex; // Protect concurrent access to the WAL
Actionable comments posted: 8
🔭 Outside diff range comments (2)
lib/structures/memtable/memtable.cpp (2)
91-94
:⚠️ Potential issueIncorrect implementation of
operator>
The current implementation of
operator>
doesn't correctly implement the "greater than" relationship. Using!(m_key < record.m_key)
is equivalent to "greater than or equal to" (>=
), not strictly "greater than" (>
).auto memtable_t::record_t::operator>(const memtable_t::record_t &record) const -> bool { - return !(m_key < record.m_key); + return record.m_key < m_key; }
158-173
: 🛠️ Refactor suggestionInefficient implementation of
max()
functionThe current implementation of
max()
iterates through the entire container to find the last element, which is inefficient as noted in the TODO.auto memtable_t::max() const noexcept -> std::optional<memtable_t::record_t::key_t> { - storage_t::const_iterator beforeEnd{m_data.cbegin()}; - auto idx{0}; - for (auto begin{m_data.cbegin()}; begin != m_data.cend(); ++begin) - { - if (idx++ == 0) - { - continue; - } - - beforeEnd = begin; - } - return m_data.size() > 0 ? std::make_optional(beforeEnd->m_key) : std::nullopt; + if (m_data.empty()) { + return std::nullopt; + } + + // Assuming m_data is a container with bidirectional or random access iterators + auto last = std::prev(m_data.cend()); + return std::make_optional(last->m_key); }
♻️ Duplicate comments (22)
lib/db/manifest/manifest.cpp (1)
61-68
:⚠️ Potential issueError handling regression –
.value()
will throw &open()
still returns successA previous review already pointed out that calling
.value()
on the builder result masks errors and can throw.
The current revision still:m_log.emplace(std::move(fs::append_only_file_builder_t{}.build(m_path.c_str(), true).value())); return true; // <- unconditional successThis re‑introduces the silent‑failure path we discussed.
Please restore the explicit check and propagate/ log the failure as previously suggested.lib/raft/raft_test.cpp (1)
132-133
: Replace fixed sleep with deterministic synchronisation (duplicate)
The previous review already flagged this; flaky sleeps are still present.
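A deterministic wait can be sketched as a small polling helper; leaderElected() below is a hypothetical accessor on the test fixture and the timeout values are arbitrary:

#include <chrono>
#include <functional>
#include <thread>

// Returns true as soon as the predicate holds, false once the deadline passes.
inline auto wait_until(const std::function<bool()> &predicate,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds pollInterval = std::chrono::milliseconds{10}) -> bool
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline)
    {
        if (predicate())
        {
            return true;
        }
        std::this_thread::sleep_for(pollInterval);
    }
    return predicate();
}

// Usage in a test:
// ASSERT_TRUE(wait_until([&] { return leaderElected(); }, std::chrono::seconds{5}));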
Use condition‑variables or polling with timeout to wait for leader election.src/main.cpp (5)
264-268
: Typo breaks JSON parsing ofdatabase.mode
(duplicate)
",pde"
is an invalid key and will throw at runtime.- db::from_string(configJson["database"][",pde"].get<std::string>()); + db::from_string(configJson["database"]["mode"].get<std::string>());
423-430
: Wrong error message for missingpeers
(duplicate)
The exception text still says"id"
instead of"peers"
.
617-632
: Range‑for syntax is ill‑formed – won’t compile (duplicate)for (raft::id_t replicaId{1}; const auto &replicaIp : pDbConfig->ServerConfig.peers)Split the declaration and loop:
-for (raft::id_t replicaId{1}; const auto &replicaIp : pDbConfig->ServerConfig.peers) +raft::id_t replicaId{1}; +for (const auto& replicaIp : pDbConfig->ServerConfig.peers)Increment
replicaId
inside the loop as you already do.
652-653
: Checkdynamic_cast
result before dereference (duplicate)
dynamic_cast<RaftService::Service *>(pConsensusModule.get())
may returnnullptr
; verify beforeRegisterService
.
669-670
: Busy‑wait burns CPU – add small sleep (duplicate)
Insertstd::this_thread::sleep_for(100ms);
inside the loop.lib/fs/random_access_file.cpp (2)
36-51
:⚠️ Potential issueMove‑assignment still leaks the current resources
The assignment operator steals
m_fd
/m_ring
fromother
but never releases the resources that this might already own when the operator is entered. The same issue was raised in a previous round of review and is still present.auto random_access_file_t::operator=(random_access_file_t &&other) noexcept -> random_access_file_t & { if (this == &other) { return *this; } - m_fd = other.m_fd; - m_ring = other.m_ring; + // release currently‑owned resources first + if (m_fd != -1) + { + io_uring_queue_exit(&m_ring); + close(m_fd); + } + + m_fd = other.m_fd; + m_ring = other.m_ring; other.m_fd = -1; other.m_ring = io_uring{}; return *this; }
83-88
:⚠️ Potential issue
io_uring_submit
/io_uring_wait_cqe
return codes are ignoredBoth calls can fail (negative return value). If that happens the unchecked
cqe
pointer is dereferenced on the next line, leading to UB / crash. Handle the error path first and convert it intofile_error_t
, as already suggested earlier.lib/wal/log/log.h (2)
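A sketch of that error path for random_access_file.cpp, assuming fs::file_error_t::from_errno and fs::file_error_t::success exist as noted earlier in this review; the helper name and the exact return type of the surrounding function are assumptions:

#include <liburing.h>

// Submit the already-prepared SQE and wait for its completion, mapping every
// failure onto file_error_t instead of dereferencing cqe blindly.
[[nodiscard]] inline auto submit_and_wait(io_uring &ring) -> fs::file_error_t
{
    if (const int submitted = io_uring_submit(&ring); submitted < 0)
    {
        return fs::file_error_t::from_errno(-submitted);
    }

    io_uring_cqe *cqe{nullptr};
    if (const int rc = io_uring_wait_cqe(&ring, &cqe); rc < 0)
    {
        return fs::file_error_t::from_errno(-rc);
    }

    // The completion itself can carry a negative errno in res.
    if (cqe->res < 0)
    {
        const int err = -cqe->res;
        io_uring_cqe_seen(&ring, cqe);
        return fs::file_error_t::from_errno(err);
    }

    io_uring_cqe_seen(&ring, cqe);
    return fs::file_error_t::success();
}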
46-51
:⚠️ Potential issueRecursive move‑assignment via
std::swap
Using
swap(*this, other)
falls back on ADL tostd::swap
, which in turn calls move‑assignment again – resulting in infinite recursion or a hard error, exactly as mentioned in the previous review round.Replace the body with an explicit move as shown in the earlier suggestion, or provide a non‑recursive
swap(log_t&, log_t&)
overload and call that.
128-154
:⚠️ Potential issue
object_backend_tag
APIs compile, butbuild()
still static‑asserts
set_url()
is enabled forobject_backend_tag
, yetbuild()
still has no branch for that tag and unconditionally hits thestatic_assert(always_false<>)
. Any caller that uses the object backend will therefore fail to compile.Either (a) implement the object‑store backend in the builder, or (b) remove / disable the
set_url()
overload until the backend is ready, to avoid a confusing build‑time trap.lib/raft/raft.h (1)
170-194
:⚠️ Potential issuePotential data race on
m_shutdown
m_shutdown
is a plainbool
, but the class launches heartbeat / election threads (m_heartbeatThread
,m_electionThread
) that almost certainly read this flag concurrently. Every access must be protected by a mutex or replaced withstd::atomic<bool>
– otherwise the program exhibits undefined behaviour.- bool m_shutdown{false}; + std::atomic<bool> m_shutdown{false};Also ensure all reads & writes use the same atomic object, or are guarded by
m_stateMutex
.lib/wal/log/persistent_log_storage.h (5)
148-158
:⚠️ Potential issueBuffer size doesn't account for read result.
The read_impl function doesn't resize the buffer based on the actual bytes read. If fewer bytes are read than requested, the buffer will contain uninitialized or garbage data.
Apply this fix to properly resize the buffer:
[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string { std::string buffer; buffer.resize(size); - if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value()) + const auto res = m_file.read(offset, buffer.data(), size); + if (!res.has_value()) { spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size); return {}; } + // Adjust buffer size to actual bytes read + buffer.resize(static_cast<size_t>(res.value())); return buffer; }
234-247
: 🛠️ Refactor suggestionGracefully handle empty or corrupt files in constructor.
The constructor reads the entire file content at once, which is inefficient for large files and lacks error handling if the read fails.
Consider reading in chunks and adding proper error handling:
explicit persistent_log_storage_t(TBackendStorage &&backendStorage) : m_backendStorage(std::move(backendStorage)) { - const std::string raw = m_backendStorage.read(0, m_backendStorage.size()); - std::istringstream stream(raw); - for (std::string line; std::getline(stream, line);) - { - if (absl::StripAsciiWhitespace(line).empty()) - { - continue; - } - m_inMemoryLog.emplace_back(std::move(line)); + const size_t total_size = m_backendStorage.size(); + if (total_size == 0) { + return; // Empty file, nothing to load + } + + try { + // Read in chunks for efficiency with large files + constexpr size_t chunk_size = 4096; + size_t offset = 0; + std::string buffer; + std::string line_buffer; + + while (offset < total_size) { + size_t to_read = std::min(chunk_size, total_size - offset); + std::string chunk = m_backendStorage.read(offset, to_read); + offset += chunk.size(); + + if (chunk.empty()) { + spdlog::warn("Read returned empty chunk at offset {}", offset); + break; + } + + buffer += chunk; + + // Process complete lines + std::istringstream stream(buffer); + while (std::getline(stream, line_buffer)) { + if (!absl::StripAsciiWhitespace(line_buffer).empty()) { + m_inMemoryLog.emplace_back(std::move(line_buffer)); + } + line_buffer.clear(); + } + + // Keep any incomplete last line for the next iteration + buffer = stream.eof() ? "" : line_buffer; + } + } catch (const std::exception& e) { + spdlog::error("Error loading log entries: {}", e.what()); } }
270-281
:⚠️ Potential issueFix missing newline in append method.
The append method doesn't add newlines between entries, but the constructor reads the file line by line. This inconsistency will cause problems when loading persisted logs.
Apply this fix:
[[nodiscard]] auto append(std::string entry) -> bool { + // Add newline to separate entries when reading line by line + entry += '\n'; if (!m_backendStorage.write( static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size())) { spdlog::error( "Persistent log storage write failed. Entry={}, size={}\n", entry, entry.size()); return false; } - m_inMemoryLog.emplace_back(std::move(entry)); + // Store the original entry (without newline) in memory + m_inMemoryLog.emplace_back(entry.substr(0, entry.size() - 1)); return true; }
272-274
:⚠️ Potential issueFix offset parameter in write operation.
The current implementation uses
m_backendStorage.size()
as the offset for writing, which is incorrect for appending log entries.Correct the offset parameter:
if (!m_backendStorage.write( - static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size())) + static_cast<const char *>(entry.data()), 0, entry.size())) // Use 0 for offset since append_only_file will handle appending
304-307
:⚠️ Potential issueInconsistency between size() and read() methods.
The
size()
method returnsm_backendStorage.size()
butread()
usesm_inMemoryLog.size()
. This inconsistency could lead to subtle bugs.[[nodiscard]] auto size() const -> std::size_t { - return m_backendStorage.size(); + return m_inMemoryLog.size(); }lib/raft/raft.cpp (5)
261-262
:⚠️ Potential issueImplement the state machine updates or remove the TODO.
This TODO comment has been present through multiple reviews. Without updating the state machine, committed log entries have no effect on the actual system state.
Implement state machine updates to maintain consistency:
while (m_lastApplied < m_commitIndex) { ++m_lastApplied; - // TODO(lnikon): Update the state machine! + const auto& entry = m_log[m_lastApplied - 1]; + // Apply the entry to your state machine + // For example: + // m_stateMachine.apply(entry); }
479-485
: 🛠️ Refactor suggestionConsider using thread pool instead of spawning new threads.
Creating a new thread for every append entries RPC can severely degrade performance under high load.
Consider implementing a thread pool:
- // TODO(lnikon): Make these calls async - for (auto &[id, client] : m_replicas) - { - spdlog::debug( - "Node={} is creating a heartbeat thread for the peer={}", m_config.m_id, id); - sendAppendEntriesRPC(client.value(), {}); - } + // Use a thread pool to limit concurrency + std::vector<std::future<void>> futures; + for (auto &[id, client] : m_replicas) + { + spdlog::debug( + "Node={} is queuing heartbeat for peer={}", m_config.m_id, id); + futures.push_back(m_threadPool.enqueue([this, &client]() { + sendAppendEntriesRPC(client.value(), {}); + })); + } + + // Wait for all heartbeats to complete or timeout + for (auto& future : futures) { + future.wait_for(std::chrono::milliseconds(50)); + }This would require adding a thread pool implementation to your codebase.
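A minimal fixed-size pool along the lines the snippet above assumes for m_threadPool.enqueue; this is only a sketch and is not tuned for the Raft hot path:

#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class thread_pool_t
{
public:
    explicit thread_pool_t(std::size_t workers)
    {
        for (std::size_t i = 0; i < workers; ++i)
        {
            m_workers.emplace_back([this] { run(); });
        }
    }

    ~thread_pool_t()
    {
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto &worker : m_workers)
        {
            worker.join();
        }
    }

    // Enqueue a task; the returned future becomes ready once the task has run.
    auto enqueue(std::function<void()> task) -> std::future<void>
    {
        auto packaged = std::make_shared<std::packaged_task<void()>>(std::move(task));
        auto result = packaged->get_future();
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            m_tasks.emplace([packaged] { (*packaged)(); });
        }
        m_cv.notify_one();
        return result;
    }

private:
    void run()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock{m_mutex};
                m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                if (m_stop && m_tasks.empty())
                {
                    return;
                }
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task();
        }
    }

    std::vector<std::thread> m_workers;
    std::queue<std::function<void()>> m_tasks;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop{false};
};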
873-878
: 🛠️ Refactor suggestionConsider using fsync for log reliability.
The current implementation flushes the data to the operating system but doesn't ensure it's actually written to persistent storage. For true Raft safety, calling fsync is necessary.
Add proper fsync functionality:
 fsa.flush();
 if (fsa.fail())
 {
     return false;
 }
+
+ // Ensure data is actually persisted to disk
+ #ifdef _WIN32
+ _commit(_fileno(fsa));
+ #else
+ fsync(fileno(fsa));
+ #endif
This requires adding the appropriate headers (#include <unistd.h> for POSIX systems or <io.h> for Windows).
933-934
:⚠️ Potential issueFix unintended override of loaded votedFor value.
The code explicitly sets
m_votedFor = 0
immediately after reading it from a file, which defeats the purpose of persisting this value.ifs >> m_commitIndex >> m_votedFor; -m_votedFor = 0;
This change ensures that the persisted value for
votedFor
is properly maintained across restarts, which is critical for the correctness of the Raft protocol.
621-622
:⚠️ Potential issueImplement state machine updates.
The TODO comment in onSendAppendEntriesRPC indicates missing state machine updates, which are crucial for maintaining consistency.
Implement the missing state machine updates:
while (m_lastApplied < m_commitIndex) { ++m_lastApplied; - spdlog::info("TODO(lnikon): Apply to state machine here"); + const auto& entry = m_log[m_lastApplied - 1]; + // Apply the entry to your state machine + // For example: + // m_stateMachine.apply(entry); + spdlog::info("Applied entry [key={}, value={}] to state machine", + entry.key(), entry.value()); }
🧹 Nitpick comments (5)
lib/structures/memtable/memtable.cpp (1)
109-115
: Redundant size trackingBoth
m_size
andm_num_of_bytes
are incremented by the same value (record.size()
), suggesting redundancy. Consider clarifying the distinction between these variables or consolidating them if they track the same information.lib/db/manifest/manifest.cpp (1)
96-99
: Detect partial writes – don’t treat “>= 0” as success
append()
returning a non‑negative value means “some bytes written”, not necessarily “all bytes written”.
For manifest durability you should verify the full payload length:- return m_log->append({infoSerialized.c_str(), infoSerialized.size()}) - .transform([](ssize_t res) { return res >= 0; }) + const auto expected = m_log->append({infoSerialized.c_str(), infoSerialized.size()}); + return expected.transform([&](ssize_t res) { + return static_cast<std::size_t>(res) == infoSerialized.size(); + }) .value_or(false);This prevents silent truncation on short writes (e.g.
EINTR
).lib/db/db.h (1)
20-21
: Pass heavy objects by (r)value‑reference to avoid an extra move
wal::wal_wrapper_t
potentially owns sizeable buffers; taking it by value forces one move into the parameter and another intom_wal
.
Prefer an r‑value reference in the ctor andstd::move
at the call‑site to make the intent explicit and forbid an accidental copy.- explicit db_t(config::shared_ptr_t config, wal::wal_wrapper_t wal); + explicit db_t(config::shared_ptr_t config, wal::wal_wrapper_t&& wal);src/main.cpp (1)
605-609
: Typo: “positve” → “positive”- spdlog::error("ID of the node should be positve integer"); + spdlog::error("ID of the node should be positive integer");lib/fs/random_access_file.cpp (1)
81-82
:IOSQE_IO_LINK
flag set without a linked SQE
IOSQE_IO_LINK
makes sense only when at least two SQEs are linked together.
Here there is a singlewritev
SQE, so the flag has no effect and only adds confusion.
Consider dropping the flag (or linking the operation with a follow‑upfsync
SQE if that is the intention).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (65)
.clang-format
(1 hunks)bench/db/basic_bench.cpp
(1 hunks)bench/db/celero_bench.cpp
(2 hunks)bench/db/gbench_bench.cpp
(2 hunks)bench/fs/lots_of_write.cpp
(1 hunks)bench/fs/open_write_streams.cpp
(3 hunks)examples/absl/absl_try_mutex.cpp
(1 hunks)examples/iouring/libiouring/main.cpp
(1 hunks)examples/iouring/raw_uring/main.cpp
(1 hunks)examples/prometheus/main.cpp
(1 hunks)examples/raft/main.cpp
(1 hunks)lib/concurrency/thread_safe_queue.h
(1 hunks)lib/db/db.cpp
(4 hunks)lib/db/db.h
(3 hunks)lib/db/db_config.h
(1 hunks)lib/db/manifest/manifest.cpp
(3 hunks)lib/db/manifest/manifest.h
(5 hunks)lib/fs/append_only_file.cpp
(1 hunks)lib/fs/append_only_file.h
(1 hunks)lib/fs/common.cpp
(1 hunks)lib/fs/common.h
(1 hunks)lib/fs/random_access_file.cpp
(1 hunks)lib/fs/random_access_file.h
(1 hunks)lib/fs/random_access_file_test.cpp
(1 hunks)lib/posix_wrapper/open_flag.h
(1 hunks)lib/posix_wrapper/posix_wrapper_test.cpp
(1 hunks)lib/raft/persistence.h
(1 hunks)lib/raft/raft.cpp
(1 hunks)lib/raft/raft.h
(1 hunks)lib/raft/raft_test.cpp
(1 hunks)lib/raft/replicated_log.h
(1 hunks)lib/server/grpc_server.cpp
(2 hunks)lib/server/grpc_server.h
(1 hunks)lib/server/server_factory.h
(1 hunks)lib/server/server_kind.cpp
(1 hunks)lib/server/server_kind.h
(1 hunks)lib/structures/lsmtree/levels/level.cpp
(11 hunks)lib/structures/lsmtree/levels/level.h
(2 hunks)lib/structures/lsmtree/levels/levels.cpp
(4 hunks)lib/structures/lsmtree/levels/levels.h
(1 hunks)lib/structures/lsmtree/lsmtree.cpp
(9 hunks)lib/structures/lsmtree/lsmtree_test.cpp
(4 hunks)lib/structures/lsmtree/segments/helpers.cpp
(1 hunks)lib/structures/lsmtree/segments/helpers.h
(1 hunks)lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp
(2 hunks)lib/structures/lsmtree/segments/lsmtree_regular_segment.h
(1 hunks)lib/structures/lsmtree/segments/lsmtree_segment_factory.h
(1 hunks)lib/structures/lsmtree/segments/segment_storage.cpp
(2 hunks)lib/structures/lsmtree/segments/segment_storage.h
(3 hunks)lib/structures/lsmtree/segments/uuid.h
(9 hunks)lib/structures/memtable/memtable.cpp
(2 hunks)lib/structures/memtable/memtable_test.cpp
(1 hunks)lib/structures/skiplist/skiplist.h
(1 hunks)lib/structures/skiplist/skiplist_test.cpp
(1 hunks)lib/structures/sorted_vector/sorted_vector.h
(4 hunks)lib/structures/sorted_vector/sorted_vector_test.cpp
(1 hunks)lib/wal/common.h
(1 hunks)lib/wal/config.h
(1 hunks)lib/wal/log/common.h
(1 hunks)lib/wal/log/concepts.h
(1 hunks)lib/wal/log/in_memory_log_storage.h
(1 hunks)lib/wal/log/log.h
(1 hunks)lib/wal/log/persistent_log_storage.h
(1 hunks)lib/wal/wal.h
(1 hunks)src/main.cpp
(9 hunks)
✅ Files skipped from review due to trivial changes (27)
- lib/structures/memtable/memtable_test.cpp
- lib/structures/lsmtree/segments/lsmtree_segment_factory.h
- lib/structures/lsmtree/segments/segment_storage.cpp
- bench/db/basic_bench.cpp
- lib/server/server_factory.h
- lib/structures/lsmtree/segments/helpers.h
- lib/structures/lsmtree/segments/helpers.cpp
- lib/server/server_kind.cpp
- lib/structures/skiplist/skiplist_test.cpp
- lib/structures/skiplist/skiplist.h
- lib/server/grpc_server.cpp
- lib/concurrency/thread_safe_queue.h
- lib/structures/lsmtree/levels/levels.h
- lib/structures/lsmtree/levels/level.h
- bench/fs/lots_of_write.cpp
- lib/server/grpc_server.h
- lib/structures/sorted_vector/sorted_vector_test.cpp
- lib/structures/sorted_vector/sorted_vector.h
- examples/absl/absl_try_mutex.cpp
- lib/structures/lsmtree/segments/segment_storage.h
- lib/server/server_kind.h
- bench/db/gbench_bench.cpp
- lib/structures/lsmtree/segments/lsmtree_regular_segment.h
- bench/fs/open_write_streams.cpp
- lib/structures/lsmtree/levels/levels.cpp
- lib/structures/lsmtree/segments/uuid.h
- lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp
🚧 Files skipped from review as they are similar to previous changes (25)
- .clang-format
- lib/structures/lsmtree/lsmtree_test.cpp
- lib/fs/common.cpp
- lib/wal/config.h
- lib/wal/log/common.h
- examples/prometheus/main.cpp
- lib/db/db.cpp
- lib/db/db_config.h
- examples/raft/main.cpp
- lib/posix_wrapper/posix_wrapper_test.cpp
- lib/posix_wrapper/open_flag.h
- lib/fs/common.h
- lib/db/manifest/manifest.h
- lib/wal/log/concepts.h
- lib/structures/lsmtree/lsmtree.cpp
- lib/raft/persistence.h
- lib/raft/replicated_log.h
- lib/fs/append_only_file.cpp
- lib/wal/common.h
- examples/iouring/libiouring/main.cpp
- lib/fs/random_access_file.h
- lib/structures/lsmtree/levels/level.cpp
- lib/wal/log/in_memory_log_storage.h
- lib/fs/append_only_file.h
- lib/wal/wal.h
🧰 Additional context used
🧬 Code Graph Analysis (3)
bench/db/celero_bench.cpp (1)
lib/structures/lsmtree/lsmtree_test.cpp (2)
generateRandomString
(76-90)generateRandomString
(76-76)
lib/raft/raft_test.cpp (2)
lib/raft/raft.cpp (7)
AppendEntries
(179-283)AppendEntries
(179-181)id
(110-113)id
(110-110)raft_node_grpc_client_t
(56-63)RequestVote
(285-344)RequestVote
(285-287)lib/raft/raft.h (8)
request
(68-69)request
(70-70)request
(133-133)raft_node_grpc_client_t
(58-59)raft_node_grpc_client_t
(60-60)raft_node_grpc_client_t
(60-60)raft_node_grpc_client_t
(62-62)raft_node_grpc_client_t
(65-65)
lib/wal/log/persistent_log_storage.h (4)
lib/wal/log/in_memory_log_storage.h (11)
nodiscard
(46-50)nodiscard
(52-56)nodiscard
(58-65)nodiscard
(67-71)nodiscard
(73-76)nodiscard
(93-96)other
(29-39)other
(29-29)entry
(46-46)command
(52-52)index
(58-58)lib/wal/log/log.h (11)
nodiscard
(58-61)nodiscard
(63-67)nodiscard
(69-72)nodiscard
(74-77)nodiscard
(79-82)nodiscard
(136-156)other
(46-51)other
(46-46)entry
(58-58)command
(63-64)index
(69-69)lib/wal/wal.h (9)
nodiscard
(156-159)nodiscard
(227-267)other
(86-86)other
(87-87)other
(190-198)other
(190-190)other
(200-208)other
(200-200)is_same_v
(310-340)lib/fs/append_only_file.h (3)
data
(30-31)offset
(33-34)other
(23-23)
🪛 Cppcheck (2.10-2)
lib/raft/raft_test.cpp
[error] 168-168: Reference to rvResponses that may be invalid.
(invalidContainerReference)
lib/raft/raft.cpp
[performance] 123-123: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-gcc
- GitHub Check: build-clang
🔇 Additional comments (8)
bench/db/celero_bench.cpp (2)
40-41
: Clean formatting of the return type.The trailing return type has been well formatted for better readability by moving it to a separate line with proper indentation. This change improves code readability without affecting functionality.
68-69
: Improved argument formatting for better readability.Breaking the
emplace_back()
call into multiple lines with proper indentation makes the code more readable, especially with long arguments like the ones used here. This formatting change is consistent with good C++ styling practices.lib/structures/memtable/memtable.cpp (2)
51-52
: Style improvements look good!The return type formatting aligns with the project's style guidelines, improving readability by clearly separating the return type from the function signature.
117-118
: Style improvements look good!The return type formatting is consistent with other functions in the codebase and improves readability.
lib/fs/random_access_file_test.cpp (1)
24-35
: Looks good – helper utilities are self‑contained and portableThe temporary‑file helper correctly uses
mkstemp
and closes the returned fd, avoiding races.
Implementation is concise and C++20 compliant.lib/raft/raft_test.cpp (1)
157-170
: Static‑analysis “invalid reference” is a false positive – capacity reservedGood job calling
reserve(clusterSize*2)
, which guaranteesrvResponses
won’t reallocate, so the references captured bySetArgPointee
stay valid.
No action needed.🧰 Tools
🪛 Cppcheck (2.10-2)
[error] 168-168: Reference to rvResponses that may be invalid.
(invalidContainerReference)
lib/fs/random_access_file.cpp (1)
173-175
: VerifykBufferSize
visibility
kBufferSize
is referenced here but is not defined in this translation unit.
Iffs/common.h
does not actually provide it, the file will not compile.Would you double‑check that
fs/common.h
(or another included header) really defineskBufferSize
asconstexpr std::size_t
?lib/raft/raft.cpp (1)
123-137
: Function parameter 'nodeConfig' should be passed by const reference.The static analyzer flagged this, but since the parameter is moved into a member variable, taking it by value is actually appropriate. No change is needed.
🧰 Tools
🪛 Cppcheck (2.10-2)
[performance] 123-123: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
while (bytes_remaining)
{
    off_t bytes_to_read = bytes_remaining;
    if (bytes_to_read > BLOCK_SZ)
        bytes_to_read = BLOCK_SZ;

    fi->iovecs[current_block].iov_len = bytes_to_read;

    void *buf;
    if (posix_memalign(&buf, BLOCK_SZ, BLOCK_SZ))
    {
        perror("posix_memalign");
        return 1;
    }
    fi->iovecs[current_block].iov_base = buf;

    current_block++;
    bytes_remaining -= bytes_to_read;
}
🛠️ Refactor suggestion
Leaking every buffer, file_info
, and file descriptor
For each block you posix_memalign
a buffer and store it in fi->iovecs
, but neither the buffers, the enclosing file_info
, nor file_fd
are freed/closed after the CQ event is processed.
Reading many files (or large files) will quickly exhaust memory and file‑descriptor limits.
Suggested post‑processing in read_from_cq
:
@@ while (std::getline(stream, line))
for (int i = 0; i < blocks; i++)
{
output_to_console((char *)fi->iovecs[i].iov_base, fi->iovecs[i].iov_len);
- }
+ free(fi->iovecs[i].iov_base);
+ }
+ close(cqe->fd); // requires storing fd in file_info
+ free(fi);
You will need to:
- Extend
file_info
withint fd;
so it’s reachable at completion time. - Assign it (
fi->fd = file_fd;
) insubmit_to_sq
. - Free/close as above once all blocks are printed.
Without this, the example cannot be used in long‑running benchmarks.
Also applies to: 194-211
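A sketch of the ownership bookkeeping described in the bullets above; the file_info layout is assumed from the snippet, and release_file_info is a hypothetical helper called once all blocks of a file have been printed:

#include <cstdlib>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Assumed shape of the per-file bookkeeping, extended with the owning fd.
struct file_info
{
    off_t file_sz;
    int fd;                // newly added so the completion handler can close it
    struct iovec iovecs[]; // one entry per block, each buffer from posix_memalign
};

static void release_file_info(struct file_info *fi, int blocks)
{
    for (int i = 0; i < blocks; i++)
    {
        free(fi->iovecs[i].iov_base);
    }
    close(fi->fd);
    free(fi);
}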
struct app_io_sq_ring
{
    unsigned char *head;
    unsigned char *tail;
    unsigned char *ring_mask;
    unsigned char *ring_entries;
    unsigned char *flags;
    unsigned char *array;
};

struct app_io_cq_ring
{
    unsigned char *head;
    unsigned char *tail;
    unsigned char *ring_mask;
    unsigned char *ring_entries;
    struct io_uring_cqe *cqes;
};
🛠️ Refactor suggestion
Incorrect ring‐buffer field types break atomicity & correctness
The SQ/CQ ring descriptors are mapped over kernel‐owned unsigned int
fields, yet the structures declare every pointer as unsigned char *
.
When you later dereference them (*sring->head
, *cring->tail
, etc.) you read/modify a single byte instead of the required 4‑byte (or 8‑byte) word.
This corrupts the ring state and produces hard‑to‑reproduce I/O failures.
-struct app_io_sq_ring
-{
- unsigned char *head;
- unsigned char *tail;
- unsigned char *ring_mask;
- unsigned char *ring_entries;
- unsigned char *flags;
- unsigned char *array;
-};
+struct app_io_sq_ring
+{
+ unsigned int *head;
+ unsigned int *tail;
+ unsigned int *ring_mask;
+ unsigned int *ring_entries;
+ unsigned int *flags;
+ unsigned int *array;
+};
-struct app_io_cq_ring
-{
- unsigned char *head;
- unsigned char *tail;
- unsigned char *ring_mask;
- unsigned char *ring_entries;
- struct io_uring_cqe *cqes;
-};
+struct app_io_cq_ring
+{
+ unsigned int *head;
+ unsigned int *tail;
+ unsigned int *ring_mask;
+ unsigned int *ring_entries;
+ struct io_uring_cqe *cqes;
+};
Besides preventing silent data races, this also fixes the index = tail & *s->sq_ring.ring_mask;
expression (currently masking with a single byte).
Update all subsequent casts/dereferences accordingly.
cring->head = (unsigned char *)cq_ptr + p.cq_off.head;
cring->tail = (unsigned char *)cq_ptr + p.cq_off.tail;
cring->ring_mask = (unsigned char *)cq_ptr + p.cq_off.ring_mask;
cring->ring_entries = (unsigned char *)cq_ptr + p.cq_off.ring_entries;
cring->cqes = (io_uring_cqe *)cq_ptr + p.cq_off.cqes;
|
Pointer arithmetic bug offsets CQEs incorrectly
p.cq_off.cqes
is a byte offset, but adding it after casting to io_uring_cqe *
scales it by sizeof(io_uring_cqe)
, pointing far past the mapped area.
- cring->cqes = (io_uring_cqe *)cq_ptr + p.cq_off.cqes;
+ cring->cqes = reinterpret_cast<io_uring_cqe *>(
+ reinterpret_cast<unsigned char *>(cq_ptr) + p.cq_off.cqes);
The same pattern appears in any similar offset arithmetic—double‑check all + p.*_off.*
calculations.
Dereferencing a mis‑aligned pointer here corrupts completion data and segfaults under load.
{
    return m_raftClients |
           std::views::transform([](auto &&pair) { return std::move(pair.second); }) |
           std::ranges::to<std::vector<raft::raft_node_grpc_client_t>>();
}
raftClients()
moves elements out of the map – second call UB
std::move(pair.second)
leaves the element in m_raftClients
in a moved‑from state.
raftClients()
is invoked twice (once per test), so the second invocation will operate on invalid objects.
Return pointers/references, or move only once and document the contract.
-return m_raftClients |
- std::views::transform([](auto &&pair) { return std::move(pair.second); }) |
- std::ranges::to<std::vector<raft::raft_node_grpc_client_t>>();
+std::vector<raft::raft_node_grpc_client_t*> vec;
+vec.reserve(m_raftClients.size());
+for (auto& [id, client] : m_raftClients) {
+ vec.push_back(&client);
+}
+return vec;
Committable suggestion skipped: line range outside the PR's diff.
lib/wal/log/log.h
Outdated
else if constexpr (std::is_same_v<TStorageTag, storage_tags::file_backend_tag>)
{
    auto &&storage =
        persistent_log_storage_builder_t<file_storage_backend_t>{{.file_path = m_file_path}}
            .build();
    return storage.has_value() ? std::make_optional(log_t{std::move(storage.value())})
                               : std::nullopt;
}
🛠️ Refactor suggestion
Missing validation for m_file_path
When the file backend is selected, an empty m_file_path
silently flows into persistent_log_storage_builder_t
, which will likely fail at runtime. Guard against empty paths and return std::nullopt
(or a descriptive error) instead.
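A sketch of that guard, reusing the member and builder names from the snippet above (spdlog is assumed to be available here as it is elsewhere in the WAL code):

else if constexpr (std::is_same_v<TStorageTag, storage_tags::file_backend_tag>)
{
    // Reject empty paths up front instead of letting the backend fail later.
    if (m_file_path.empty())
    {
        spdlog::error("File backend selected but no file path was configured");
        return std::nullopt;
    }

    auto &&storage =
        persistent_log_storage_builder_t<file_storage_backend_t>{{.file_path = m_file_path}}
            .build();
    return storage.has_value() ? std::make_optional(log_t{std::move(storage.value())})
                               : std::nullopt;
}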
lib/raft/raft.h
Outdated
mutable absl::Mutex m_stateMutex;
uint32_t m_currentTerm ABSL_GUARDED_BY(m_stateMutex);
uint32_t m_votedFor ABSL_GUARDED_BY(m_stateMutex);
std::vector<LogEntry> m_log ABSL_GUARDED_BY(m_stateMutex);

// Volatile state on all servers.
uint32_t m_commitIndex ABSL_GUARDED_BY(m_stateMutex);
uint32_t m_lastApplied ABSL_GUARDED_BY(m_stateMutex);
NodeState m_state ABSL_GUARDED_BY(m_stateMutex);
|
🛠️ Refactor suggestion
Members appear un‑initialised
m_currentTerm
, m_votedFor
, m_commitIndex
, m_lastApplied
, and m_state
are declared but not given default initialisers here. Unless the constructor zero‑initialises them in the .cpp
, they may hold indeterminate values, breaking the Raft invariants.
Please initialise them either in‑class or in the constructor’s member‑initialiser list.
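For the in-class option, a sketch with the usual Raft initial state (term 0, no vote, follower); the NodeState::FOLLOWER spelling is an assumption:

// Persistent state on all servers.
uint32_t m_currentTerm ABSL_GUARDED_BY(m_stateMutex) = 0;
uint32_t m_votedFor ABSL_GUARDED_BY(m_stateMutex) = 0;
std::vector<LogEntry> m_log ABSL_GUARDED_BY(m_stateMutex);

// Volatile state on all servers.
uint32_t m_commitIndex ABSL_GUARDED_BY(m_stateMutex) = 0;
uint32_t m_lastApplied ABSL_GUARDED_BY(m_stateMutex) = 0;
NodeState m_state ABSL_GUARDED_BY(m_stateMutex) = NodeState::FOLLOWER;

Alternatively, initialise them in the constructor's member-initialiser list.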
Actionable comments posted: 0
🔭 Outside diff range comments (1)
assets/database_config_schema.json (1)
64-85
:⚠️ Potential issueMissing “maximumLevels” property in the LSM schema.
The
required
array forlsm
includes"maximumLevels"
, but there’s no corresponding definition underproperties
. This mismatch will cause schema validation failures.Apply this diff to add the missing entry:
--- a/assets/database_config_schema.json +++ b/assets/database_config_schema.json @@ 64,78 "properties": { "flushThreshold": { "type": "integer", "description": "The threshold of bytes at which the memtable should be flushed", "minimum": 1 }, + "maximumLevels": { + "type": "integer", + "description": "Maximum number of levels in the LSM tree", + "minimum": 1 + }, "levelZeroCompaction": { "$ref": "#/$defs/compaction" }, "levelNonZeroCompaction": { "$ref": "#/$defs/compaction" }
♻️ Duplicate comments (1)
assets/tkvpp_config_1.json (1)
5-10
: Remove redundantwalFilename
from the database section.The WAL filename now belongs under the top‑level
"wal"
object. Keeping it here will either be ignored or break validation against the updated schema.--- a/assets/tkvpp_config_1.json +++ b/assets/tkvpp_config_1.json @@ 5,10 "database": { "path": "./var/tkvpp", - "walFilename": "wal", "manifestFilenamePrefix": "manifest_", "mode": "replicated" },
🧹 Nitpick comments (2)
assets/database_config_schema.json (2)
29-34
: Alignmode
description with enum values.The description refers to “server-client mode” whereas the enum uses
"standalone"
. To avoid confusion, update it:--- a/assets/database_config_schema.json +++ b/assets/database_config_schema.json @@ 30,34 - "Specifies whether the database will be embedded, run in server-client mode, or replicated" + "Specifies whether the database will be embedded, standalone, or replicated"
108-114
: Add format validation for peer addresses.Currently,
"peers"
items are untyped strings. Consider enforcing IPv4 syntax with"format": "ipv4"
:--- a/assets/database_config_schema.json +++ b/assets/database_config_schema.json @@ 108,114 "peers": { "type": "array", "description": "Array of IPv4 addresses of peers", "items": { - "type": "string" + "type": "string", + "format": "ipv4" } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
assets/database_config_schema.json
(4 hunks)assets/tkvpp_config_1.json
(1 hunks)assets/tkvpp_config_2.json
(1 hunks)assets/tkvpp_config_3.json
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- assets/tkvpp_config_3.json
- assets/tkvpp_config_2.json
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build-clang
- GitHub Check: build-gcc
🔇 Additional comments (4)
assets/tkvpp_config_1.json (4)
1-4
: Logging configuration is valid.The
"logging"
section matches the schema and requires no changes.
11-15
: WAL configuration is correct.The
"wal"
object includes all required fields (enable
,filename
,storageType
) and conforms to the schema.
16-27
: LSM settings follow the schema.The entries for
"flushThreshold"
,"maximumLevels"
, and compaction strategies are correctly defined.
28-38
: Server section aligns with schema requirements.The
"server"
block providestransport
,host
,port
,id
, andpeers
as required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🔭 Outside diff range comments (6)
lib/structures/lsmtree/segments/lsmtree_segment_index.h (2)
28-28
: 💡 Verification agent🧩 Analysis chain
Address the TODO comment before finalizing the Raft implementation.
The TODO comment indicates that the interface is not yet decided. As you're implementing Raft for replication, it's important to finalize this interface to ensure it properly supports the distributed consensus requirements.
🏁 Script executed:
#!/bin/bash # Check if there are other similar TODO comments in the LSM tree implementation rg "TODO: Decide on the interface" --type cppLength of output: 141
Finalize LSM Tree Segment Index Interface for Raft Replication
The
// TODO: Decide on the interface.
comment indicates that the segment index API is still undefined. Before merging Raft-based replication, please design and implement this interface so it fully supports the distributed consensus requirements (e.g., segment lookup, metadata serialization, versioning).• File:
lib/structures/lsmtree/segments/lsmtree_segment_index.h
• Line: 28Suggested change:
- // TODO: Decide on the interface. + // Define a stable interface here to support Raft replication: + // - lookupSegment(id): returns segment metadata + // - serializeMetadata(): returns bytes for log entries + // - applyMetadata(bytes): updates index from log entry
24-26
: 💡 Verification agent🧩 Analysis chain
Consider implementing the empty
key_t
struct.The
key_t
struct is currently empty. If it's intended to be a placeholder for future implementation, consider adding a comment explaining its purpose or implementing it with the required functionality to support the LSM tree operations.
🏁 Script executed:
#!/bin/bash # Check if there are other occurrences of this empty struct or if it's used/implemented elsewhere rg -A 3 "struct key_t" --type cppLength of output: 513
🏁 Script executed:
#!/bin/bash # Display the contents of the lsmtree_segment_index.h file for context sed -n '1,200p' lib/structures/lsmtree/segments/lsmtree_segment_index.hLength of output: 1049
Implement
key_t
for use inunordered_map
The empty
key_t
struct is used as the key in anstd::unordered_map
, so you’ll need to:
- Define the actual key data (e.g. raw bytes, string, numeric ID) inside
key_t
.- Provide an
operator==
forkey_t
.- Specialize
std::hash<structures::lsmtree::lsmtree_segment_index_t::key_t>
(or supply a custom hash functor) so the map can be instantiated.If
key_t
is merely a placeholder, add a// TODO:
comment explaining the intended fields, comparison, and hashing strategy.Affected location:
- lib/structures/lsmtree/segments/lsmtree_segment_index.h: lines 24–26
lib/structures/lsmtree/lsmtree_reader_writer.h (2)
8-8
: 🛠️ Refactor suggestionClean up unprofessional TODO comment.
The TODO comment contains unprofessional language and indicates confusion about the file's purpose. Consider either documenting the actual purpose of this file or refactoring/removing it if it's no longer needed.
46-48
:⚠️ Potential issueFix inheritance inconsistency.
lsmtree_segment_reader_t
incorrectly inherits fromgeneric_writer_t
but is named as a reader. This mismatch between the class name and its inheritance suggests an error.-struct lsmtree_segment_reader_t : generic_writer_t<lsmtree_segment_reader_t> +struct lsmtree_segment_reader_t : generic_reader_t<lsmtree_segment_reader_t>lib/structures/lsmtree/levels/level.h (1)
113-113
: 🛠️ Refactor suggestionFix non-standard attribute.
The attribute
[[__nodiscard__]]
uses a non-standard format with double underscores. This should be changed to the standard[[nodiscard]]
attribute.-[[__nodiscard__]] auto bytes_used() const noexcept -> std::size_t; +[[nodiscard]] auto bytes_used() const noexcept -> std::size_t;lib/structures/lsmtree/lsmtree.h (1)
60-66
:⚠️ Potential issueUpdate documentation for move operations.
The documentation comments still indicate that move constructor and assignment operator are deleted, which contradicts the actual implementation that now supports these operations.
Update the documentation to accurately reflect that move operations are supported with appropriate descriptions of their behavior.
Also applies to: 91-100
♻️ Duplicate comments (14)
lib/wal/CMakeLists.txt (1)
19-24
: 🛠️ Refactor suggestionUn-comment and add the missing WAL unit tests
The
WALTest
target is still commented-out. Given the criticality of WAL in Raft replication, shipping without tests leaves the core persistence layer un-verified.-# add_executable(WALTest "wal_test.cpp") -# set_target_properties(WALTest PROPERTIES CXX_STANDARD 23) -# target_link_libraries( -# WALTest -# Catch2::Catch2WithMain -# spdlog::spdlog fmt::fmt LSMTree MemTable DB Config HashIndex) -# -# include(Catch) -# catch_discover_tests(WALTest) +add_executable(WALTest wal_test.cpp) +set_target_properties(WALTest PROPERTIES CXX_STANDARD 23) +target_link_libraries( + WALTest + PRIVATE Catch2::Catch2WithMain + spdlog::spdlog + fmt::fmt + WAL + Config + FS) + +include(Catch) +catch_discover_tests(WALTest)This restores compilation and automatic discovery of Catch2 cases once
wal_test.cpp
is added.lib/fs/random_access_file.cpp (3)
36-51
:⚠️ Potential issueMove-assignment still leaks existing resources
Previous review already covered this; the issue remains.Before overwriting
m_fd
/m_ring
, close the currently-owned file descriptor and tear down the existing queue to avoid FD & ring leaks.
83-88
:⚠️ Potential issue
io_uring_submit
/io_uring_wait_cqe
return codes ignoredNegative return values signal submission / wait failure yet the code dereferences an un-initialised
cqe
or treats the request as successful. Handle both calls explicitly:-io_uring_submit(&m_ring); -io_uring_cqe *cqe; -int wait_res = io_uring_wait_cqe(&m_ring, &cqe); +int submit_res = io_uring_submit(&m_ring); +if (submit_res < 0) { + return std::unexpected(file_error_t{ + .code = file_error_code_k::write_failed, + .system_errno = -submit_res, + .message = std::format("io_uring_submit failed. fd={}", m_fd), + }); +} + +io_uring_cqe *cqe; +int wait_res = io_uring_wait_cqe(&m_ring, &cqe); +if (wait_res < 0) { + return std::unexpected(file_error_t{ + .code = file_error_code_k::write_failed, + .system_errno = -wait_res, + .message = std::format("io_uring_wait_cqe failed. fd={}", m_fd), + }); +}
173-174
:⚠️ Potential issue
kBufferSize
is still undefined in this TUCompilation will fail unless
kBufferSize
is declared. Define it at namespace or file scope:+static constexpr std::size_t kBufferSize = 8192;
or include a header that provides it.
src/main.cpp (5)
420-427
:⚠️ Potential issueFix incorrect error message for missing peers configuration
The error message incorrectly references "id" when it should reference "peers".
if (configJson.contains("peers")) { dbConfig->ServerConfig.peers = configJson["peers"].get<std::vector<std::string>>(); } else { - throw std::runtime_error("\"id\" is not specified in the config"); + throw std::runtime_error("\"peers\" is not specified in the config"); }
561-574
:⚠️ Potential issueFix invalid range-for loop syntax
The loop construction is incorrect. A range-for loop can't have an initializer statement.
- for (raft::id_t replicaId{1}; const auto &replicaIp : pDbConfig->ServerConfig.peers) + raft::id_t replicaId{1}; + for (const auto &replicaIp : pDbConfig->ServerConfig.peers) { if (replicaId != pDbConfig->ServerConfig.id) { std::unique_ptr<RaftService::Stub> stub{RaftService::NewStub( grpc::CreateChannel(replicaIp, grpc::InsecureChannelCredentials()))}; replicas.emplace_back(raft::node_config_t{.m_id = replicaId, .m_ip = replicaIp}, std::move(stub)); spdlog::info("replicaId={} replicaIp={}", replicaId, replicaIp); } ++replicaId; }
586-587
:⚠️ Potential issueAdd nullptr check for dynamic_cast result
The dynamic_cast could return nullptr if the cast fails, which would lead to a runtime error.
- grpcBuilder.RegisterService( - dynamic_cast<RaftService::Service *>(pConsensusModule.get())); + auto* service = dynamic_cast<RaftService::Service*>(pConsensusModule.get()); + if (!service) { + spdlog::error("Failed to cast consensus module to RaftService"); + return EXIT_FAILURE; + } + grpcBuilder.RegisterService(service);
632-635
: 🛠️ Refactor suggestionAdd sleep to reduce CPU usage in shutdown loop
The busy-wait loop consumes unnecessary CPU cycles.
while (!gShutdown) { - std::this_thread::yield(); + std::this_thread::sleep_for(std::chrono::milliseconds(100)); }
261-265
:⚠️ Potential issueFix incorrect JSON key in database mode loading
There's a typo in the JSON key when loading the database mode -
",pde"
should be"mode"
.if (configJson["database"].contains("mode")) { dbConfig->DatabaseConfig.mode = - db::from_string(configJson["database"][",pde"].get<std::string>()); + db::from_string(configJson["database"]["mode"].get<std::string>()); }lib/fs/append_only_file.h (1)
42-42
: Consider returning a reference or shared stream for efficiencyReturning a
std::stringstream
by value may lead to unnecessary copying of potentially large data. Consider returning a reference to an internal stream or a shared pointer.- [[nodiscard]] auto stream() noexcept -> std::expected<std::stringstream, file_error_t>; + [[nodiscard]] auto stream() noexcept -> std::expected<std::reference_wrapper<std::stringstream>, file_error_t>;lib/raft/raft.cpp (4)
260-262
:⚠️ Potential issueImplement the state machine updates
The TODO comment indicates that state machine updates aren't implemented, which is essential for the Raft protocol to be useful.
while (m_lastApplied < m_commitIndex) { ++m_lastApplied; - // TODO(lnikon): Update the state machine! + if (m_lastApplied <= m_log.size()) { + const auto& entry = m_log[m_lastApplied - 1]; + applyToStateMachine(entry); + } }I recommend implementing a state machine interface:
class StateMachine { public: virtual void apply(const LogEntry& entry) = 0; virtual ~StateMachine() = default; };Then add it to the consensus module and call it here.
473-488
: 🛠️ Refactor suggestionAvoid spawning a new thread for every heartbeat
Creating a new thread for every heartbeat can lead to resource exhaustion under high load. Consider using a thread pool or a single thread that handles all heartbeats.
void consensus_module_t::runHeartbeatThread(std::stop_token token) { constexpr const auto heartbeatInterval{std::chrono::milliseconds(100)}; while (!token.stop_requested() && !m_shutdown) { - // TODO(lnikon): Make these calls async for (auto &[id, client] : m_replicas) { spdlog::debug( "Node={} is creating a heartbeat thread for the peer={}", m_config.m_id, id); - sendAppendEntriesRPC(client.value(), {}); + // Use a thread pool or queue tasks instead of creating a new thread for each heartbeat + std::vector<LogEntry> empty; + sendAppendEntriesRPC(client.value(), empty); } std::this_thread::sleep_for(heartbeatInterval); } }
873-876
: 🛠️ Refactor suggestionConsider using fsync for log reliability
The log is flushed to disk but not explicitly synced. For true Raft safety, consider calling fsync to ensure the data is durable on disk.
fsa.flush(); if (fsa.fail()) { return false; } + + // Ensure data is durably written to disk + if (fsync(fileno(fsa)) != 0) { + spdlog::error("Node={} failed to fsync {}: {}", + m_config.m_id, path.c_str(), strerror(errno)); + return false; + }
931-933
:⚠️ Potential issueRemove unintended override of loaded votedFor value
The code explicitly sets
m_votedFor = 0
immediately after reading it from a file, which defeats the purpose of persisting this value.ifs >> m_commitIndex >> m_votedFor; - m_votedFor = 0; spdlog::info("Node={} restored commitIndex={} and votedFor={}", m_config.m_id, m_commitIndex, m_votedFor);
🧹 Nitpick comments (16)
lib/structures/lsmtree/compaction/compaction_trigger.h (1)
1-7
: Consider adding documentation or removing this empty file.The file now contains only an empty namespace declaration after removing the
CompactionTrigger
class. While this cleanup aligns with the modernization efforts mentioned in the project, an empty header file could be confusing for developers.If this file is being kept as a placeholder for future implementations or to maintain consistent directory structure, consider adding a comment explaining its purpose. Otherwise, if the
CompactionTrigger
functionality has been completely moved elsewhere or is no longer needed, it might be cleaner to remove this file entirely.lib/fs/random_access_file.cpp (1)
79-82
: UnnecessaryIOSQE_IO_LINK
flag
IOSQE_IO_LINK
is meant for chaining multiple SQEs; here only a single write is submitted. Dropping the flag avoids superfluous kernel work:-io_uring_prep_writev(sqe, m_fd, &iov, 1, offset); -sqe->flags |= IOSQE_IO_LINK; +io_uring_prep_writev(sqe, m_fd, &iov, 1, offset);lib/wal/wal.cpp (1)
51-59
: Consider streaming approach for large logs.The current implementation reads the entire log into memory when reconstructing records, which might not be efficient for large logs. Consider a streaming approach that processes records one at a time without loading the entire log content.
lib/concurrency/thread_safe_queue.h (1)
11-11
: Update documentation to reflect move semantics.Consider adding documentation comments for the newly added move constructor and move assignment operator to maintain consistency with the well-documented style of the rest of the class.
lib/structures/lsmtree/levels/levels.h (2)
118-119
: Function declaration formatting is inconsistent.The
flush_to_level0
function declaration uses a different formatting style compared to other methods in the file, with the return type on a separate line.Consider using consistent formatting for all method declarations:
-[[nodiscard]] auto flush_to_level0(memtable::memtable_t memtable) const noexcept - -> segments::regular_segment::shared_ptr_t; +[[nodiscard]] auto flush_to_level0(memtable::memtable_t memtable) const noexcept -> segments::regular_segment::shared_ptr_t;
12-12
: Update class documentation to reflect move semantics.The class has good documentation for other methods, but the newly added move constructor and assignment operator lack documentation, while the old comments for the deleted versions remain misleading.
Consider adding appropriate documentation for the new move operations.
lib/wal/log/storage/in_memory_log_storage.h (3)
13-14
: Address the TODO comment before finalizing.There's a TODO comment about resizing the log in the constructor to a default size. Consider implementing this optimization or removing the comment before finalizing the code.
46-56
: Append methods always return true regardless of outcome.Both append methods always return
true
without any actual error checking. While this is consistent with the interface, consider documenting that these operations cannot fail in this implementation or add actual error checking in case of allocation failures.
90-93
: Builder's build method could be more explicit about success.The build method returns an optional but always constructs a valid object. Consider either documenting that the optional will always contain a value or simplify the return type to just return the object directly since it cannot fail.
-[[nodiscard]] auto build() -> std::optional<in_memory_log_storage_t> +[[nodiscard]] auto build() -> in_memory_log_storage_t { - return in_memory_log_storage_t{}; + return {}; }lib/wal/log/storage/builder.h (1)
55-56
: Typo in static-assert message
"sutisfy"
→"satisfy"
.
While harmless at runtime, typos in diagnostic strings reduce professionalism and hinder searchability.- "log_storage_wrapper_t must sutisfy TLogStorageConcept concept"); + "log_storage_wrapper_t must satisfy TLogStorageConcept concept");lib/wal/log/storage/backend/backend.h (1)
83-86
:config()
returns by value — avoid unnecessary copiesReturning the whole
storage_backend_config_t
by value forces a copy each time.
A cheap const-reference is sufficient and avoids accidental mutations.- [[nodiscard]] auto config() const - { - return m_config; - } + [[nodiscard]] auto config() const -> const storage_backend_config_t & + { + return m_config; + }lib/wal/log/storage/persistent_log_storage.h (3)
20-28
: Consider checking for nullptr in static_assert conditionalThe
static_assert(false, ...)
increate_storage_backend_builder
will always fail during compilation if the template is instantiated with an unsupported backend type. Consider usingstd::is_same_v<TStorageBackend, void>
or another technique that allows the compiler to skip the branch during template instantiation.- else - { - static_assert(false, "not supported backend storage type passed"); - } + else + { + static_assert(!std::is_same_v<TStorageBackend, TStorageBackend>, + "not supported backend storage type passed"); + }
74-85
: Add overflow protection to append operationThe append method writes directly at
m_backendStorage.size()
without verifying if there's enough space. While conceptually an append-only file grows as needed, adding a check could prevent potential overflow issues.[[nodiscard]] auto append(std::string entry) -> bool { + // Check if appending would cause overflow + const auto current_size = m_backendStorage.size(); + if (std::numeric_limits<std::size_t>::max() - current_size < entry.size()) { + spdlog::error( + "Persistent log storage append would overflow. Current size={}, entry size={}", + current_size, entry.size()); + return false; + } + if (!m_backendStorage.write( - static_cast<const char *>(entry.data()), m_backendStorage.size(), entry.size())) + static_cast<const char *>(entry.data()), current_size, entry.size())) { spdlog::error( "Persistent log storage write failed. Entry={}, size={}\n", entry, entry.size()); return false; } m_inMemoryLog.emplace_back(std::move(entry)); return true; }
92-100
: Optimize read method to avoid unnecessary copyThe
read
method creates a new optional object for each successful read. Consider using std::optional's in-place constructor to avoid an extra copy.[[nodiscard]] auto read(const size_t index) const -> std::optional<std::string> { if (index < m_inMemoryLog.size()) { - return std::make_optional(m_inMemoryLog[index]); + return m_inMemoryLog[index]; } return std::nullopt; }lib/fs/append_only_file.h (2)
47-50
: Add documentation for the private member variablesThe private implementation details would benefit from documentation explaining their roles, particularly how the io_uring instance relates to the file descriptor.
private: append_only_file_t(random_access_file_t &&fd) noexcept; + // Underlying random access file for I/O operations random_access_file::random_access_file_t m_fd; + // io_uring instance for asynchronous I/O operations io_uring m_ring{};
53-57
: Add documentation for the builder classThe builder lacks documentation to explain its purpose and usage.
+/** + * @brief Builder for creating append-only file instances + * + * Provides a convenient way to configure and create append_only_file_t instances + * with various options such as direct I/O. + */ class append_only_file_builder_t { public: + /** + * @brief Builds an append-only file + * @param path Path to the file + * @param direct_io Whether to use direct I/O (bypassing OS cache) + * @return An expected containing the built file or an error + */ auto build(std::string path, bool direct_io) -> std::expected<append_only_file_t, file_error_t>; };
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
conanfile.txt
(1 hunks)lib/concurrency/CMakeLists.txt
(1 hunks)lib/concurrency/helpers.h
(1 hunks)lib/concurrency/thread_safe_queue.h
(5 hunks)lib/db/CMakeLists.txt
(1 hunks)lib/db/db.cpp
(5 hunks)lib/db/db.h
(3 hunks)lib/fs/append_only_file.cpp
(1 hunks)lib/fs/append_only_file.h
(1 hunks)lib/fs/random_access_file.cpp
(1 hunks)lib/fs/random_access_file.h
(1 hunks)lib/raft/raft.cpp
(1 hunks)lib/raft/raft.h
(1 hunks)lib/raft/replicated_log.h
(1 hunks)lib/structures/lsmtree/compaction/compactation.h
(1 hunks)lib/structures/lsmtree/compaction/compaction_trigger.h
(1 hunks)lib/structures/lsmtree/compaction/level_zero_compactation.h
(1 hunks)lib/structures/lsmtree/levels/level.h
(3 hunks)lib/structures/lsmtree/levels/levels.cpp
(5 hunks)lib/structures/lsmtree/levels/levels.h
(2 hunks)lib/structures/lsmtree/lsmtree.cpp
(9 hunks)lib/structures/lsmtree/lsmtree.h
(5 hunks)lib/structures/lsmtree/lsmtree_compactor.h
(1 hunks)lib/structures/lsmtree/lsmtree_config.h
(1 hunks)lib/structures/lsmtree/lsmtree_reader_writer.h
(1 hunks)lib/structures/lsmtree/lsmtree_types.h
(1 hunks)lib/structures/lsmtree/segments/helpers.h
(2 hunks)lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp
(2 hunks)lib/structures/lsmtree/segments/lsmtree_regular_segment.h
(2 hunks)lib/structures/lsmtree/segments/lsmtree_segment_factory.h
(1 hunks)lib/structures/lsmtree/segments/lsmtree_segment_index.h
(1 hunks)lib/structures/lsmtree/segments/segment_storage.h
(2 hunks)lib/structures/lsmtree/segments/types.h
(1 hunks)lib/structures/memtable/memtable.h
(1 hunks)lib/structures/memtable/memtable_reader_writer.h
(1 hunks)lib/structures/sorted_vector/sorted_vector.h
(5 hunks)lib/wal/CMakeLists.txt
(1 hunks)lib/wal/log/log.h
(1 hunks)lib/wal/log/storage/backend/backend.h
(1 hunks)lib/wal/log/storage/backend/file_storage_backend.h
(1 hunks)lib/wal/log/storage/builder.h
(1 hunks)lib/wal/log/storage/in_memory_log_storage.h
(1 hunks)lib/wal/log/storage/log_storage_crtp.h
(1 hunks)lib/wal/log/storage/persistent_log_storage.h
(1 hunks)lib/wal/wal.cpp
(1 hunks)lib/wal/wal.h
(1 hunks)src/main.cpp
(9 hunks)
✅ Files skipped from review due to trivial changes (13)
- lib/structures/lsmtree/segments/types.h
- lib/structures/lsmtree/lsmtree_config.h
- lib/structures/lsmtree/compaction/compactation.h
- lib/concurrency/CMakeLists.txt
- lib/structures/lsmtree/segments/helpers.h
- lib/structures/lsmtree/lsmtree_types.h
- lib/structures/memtable/memtable_reader_writer.h
- lib/structures/lsmtree/compaction/level_zero_compactation.h
- lib/structures/sorted_vector/sorted_vector.h
- lib/structures/lsmtree/lsmtree_compactor.h
- lib/structures/lsmtree/segments/lsmtree_regular_segment.h
- lib/wal/log/log.h
- lib/wal/log/storage/log_storage_crtp.h
🚧 Files skipped from review as they are similar to previous changes (14)
- lib/structures/lsmtree/segments/segment_storage.h
- lib/structures/lsmtree/segments/lsmtree_segment_factory.h
- conanfile.txt
- lib/structures/lsmtree/levels/levels.cpp
- lib/structures/lsmtree/segments/lsmtree_regular_segment.cpp
- lib/db/db.h
- lib/db/db.cpp
- lib/fs/random_access_file.h
- lib/structures/lsmtree/lsmtree.cpp
- lib/raft/replicated_log.h
- lib/fs/append_only_file.cpp
- lib/raft/raft.h
- lib/db/CMakeLists.txt
- lib/wal/wal.h
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/wal/log/storage/in_memory_log_storage.h (4)
lib/raft/replicated_log.h (11)
other
(31-44)other
(31-31)nodiscard
(51-54)nodiscard
(56-73)nodiscard
(75-78)nodiscard
(80-83)nodiscard
(85-88)nodiscard
(100-105)entry
(51-51)command
(56-57)index
(75-75)lib/wal/log/log.h (9)
nodiscard
(39-42)nodiscard
(44-48)nodiscard
(50-53)nodiscard
(55-58)nodiscard
(60-63)nodiscard
(75-78)entry
(39-39)command
(44-45)index
(50-50)lib/wal/log/storage/builder.h (3)
entry
(26-26)command
(31-31)index
(37-37)lib/wal/log/storage/persistent_log_storage.h (3)
entry
(74-74)command
(87-87)index
(92-92)
lib/fs/random_access_file.cpp (4)
lib/fs/random_access_file.h (8)
random_access_file_t
(17-17)random_access_file_t
(19-19)random_access_file_t
(22-22)random_access_file_t
(25-25)random_access_file_t
(44-44)data
(27-28)offset
(30-31)path
(53-54)lib/fs/append_only_file.cpp (12)
read
(55-59)read
(55-56)size
(61-64)size
(61-61)flush
(66-69)flush
(66-66)reset
(71-74)reset
(71-71)stream
(76-79)stream
(76-76)build
(101-117)build
(101-102)lib/fs/common.cpp (2)
from_errno
(13-18)from_errno
(13-14)lib/posix_wrapper/open_flag.h (1)
to_native
(54-54)
🪛 Cppcheck (2.10-2)
lib/raft/raft.cpp
[performance] 123-123: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build-gcc
🔇 Additional comments (33)
lib/structures/lsmtree/segments/lsmtree_segment_index.h (1)
1-1
: Using#pragma once
improves code maintainability.The replacement of traditional include guards with
#pragma once
is a good modernization that reduces boilerplate and prevents potential issues with mismatched or duplicate include guards. This change aligns with modern C++ best practices and is consistent with other header files in the project.lib/structures/memtable/memtable.h (1)
1-13
: Header modernization looks good
#pragma once
and the reordered include list improve readability and guard safety. No functional impact detected.lib/structures/lsmtree/lsmtree_reader_writer.h (2)
1-1
: Good practice: Using#pragma once
instead of include guards.The change from traditional include guards to
#pragma once
is a good modernization that improves readability and reduces the chance of macro naming conflicts.
38-40
: Verify recursive template parameter pattern.The
lsmtree_reader_t
inherits fromgeneric_reader_t<lsmtree_reader_t>
, passing itself as a template parameter. This is a form of the Curiously Recurring Template Pattern (CRTP), but ensure this is intentional as it can lead to unexpected behavior if not used correctly.lib/wal/wal.cpp (2)
6-24
: Good implementation of move semantics.The WAL class properly implements move construction and assignment with the correct move semantics, self-assignment check, and resource transfer. This ensures efficient handling of WAL resources.
44-49
: Good logging and helper function.The lambda function
recordToString
is a good practice for encapsulating the record serialization logic and enhancing log readability.lib/concurrency/helpers.h (1)
33-38
: Good practice: Explicitly deleting copy and move operations.Explicitly deleting copy and move operations for the lock guard is good practice since these operations could lead to incorrect lock states.
lib/structures/lsmtree/levels/level.h (2)
23-35
: Good practice: Explicit lifecycle management.Explicitly deleting default constructor, copy and move operations, and defaulting the destructor clarifies the intended usage pattern and prevents accidental misuse of the class.
53-76
: Improved code formatting.Moving return types to a new line after function signatures improves readability, especially for longer function declarations with complex return types.
lib/concurrency/thread_safe_queue.h (5)
21-25
: Default constructor and explicit disabling of copy semantics look good.The class properly disables copy semantics and defines a default constructor, which is appropriate for a thread-safe container.
26-30
: Move constructor correctly handles non-movable mutex.The implementation correctly moves the internal queue while recognizing that the mutex cannot be moved. Using the
move_under_optional_lock
helper is a good approach for safely moving resources.
31-40
: Move assignment operator uses the copy-and-swap idiom with proper synchronization.The move assignment implementation follows the safe copy-and-swap idiom, correctly handling synchronization with
absl_dual_mutex_lock_guard
to prevent data races. This is a robust approach for concurrent data structures.
71-71
: Simplified pop_all implementation looks good.The simplified ternary operator approach is more concise than the previous implementation.
98-104
: Swap implementation correctly handles non-movable mutex.The swap implementation correctly avoids swapping the mutex, which is not movable. The comment explaining this is helpful.
lib/structures/lsmtree/levels/levels.h (4)
24-26
: Copy semantics explicitly disabled.The class properly disables copy semantics which is appropriate for this type of resource-managing class.
27-36
: Move constructor correctly handles non-movable synchronization primitives.The implementation properly moves all movable members while leaving the mutex and notification in their default states. The comments are clear about this behavior.
38-47
: Move assignment uses the copy-and-swap idiom with proper synchronization.The implementation safely uses a temporary object and swap approach with proper mutex locking via
absl_dual_mutex_lock_guard
. This is a thread-safe implementation of the move assignment operator.
124-135
: Swap implementation correctly handles non-movable synchronization primitives.The swap method properly swaps movable members while leaving the mutex and notification as-is. The comments clearly explain this behavior.
lib/wal/log/storage/in_memory_log_storage.h (4)
19-22
: Constructor that takes the log as a parameter looks good.The constructor properly moves the provided log vector, which is efficient.
24-39
: Move semantics implementation is correct.The move constructor and assignment operator are properly implemented. The self-assignment check in the assignment operator is a good practice.
58-65
: Read method safely handles out-of-range access.The read method properly checks bounds and returns an optional, which is a good practice for safe API design.
82-83
: Static assertion verifies concept compliance.The static assertion ensures that the class satisfies the required concept, which is a good practice for compile-time verification.
lib/structures/lsmtree/lsmtree.h (5)
67-79
: Move constructor correctly handles non-movable mutex.The implementation properly moves all movable members and leaves the mutex in its default state. The comments explain this clearly.
However, the documentation comment above this method (lines 60-66) still says the move constructor is deleted, which is now incorrect.
101-110
: Move assignment operator uses copy-and-swap idiom with proper synchronization.The implementation safely uses a temporary object and swap approach with proper mutex locking. This is a thread-safe implementation of the move assignment operator.
However, the documentation comment above this method (lines 91-100) still says the move assignment operator is deleted, which is now incorrect.
8-8
: Updated WAL include path reflects architectural changes.The include path change from the db namespace to a dedicated wal namespace indicates a good architectural separation of concerns.
178-178
:❌ Incorrect review comment
Declare swap method implementation.
The swap method is declared but not defined in this file. Ensure the implementation is provided in the corresponding .cpp file.
🏁 Script executed:
#!/bin/bash # Check if the swap method is implemented in the corresponding cpp file grep -r "void structures::lsmtree::lsmtree_t::swap" lib/structures/lsmtree/Length of output: 77
🏁 Script executed:
#!/bin/bash # List all files in the lsmtree directory to locate possible implementation files echo "Files in lib/structures/lsmtree/:" find lib/structures/lsmtree -maxdepth 1 -type f # Search for any swap method implementation patterns in C++ source files echo -e "\nSearching for 'swap(' in C++ source files:" grep -R "swap(" lib/structures/lsmtree/ --include='*.cpp' --include='*.cc' --include='*.cxx' -n || true echo -e "\nSearching for '::swap' qualified definitions:" grep -R "::swap" lib/structures/lsmtree/ --include='*.cpp' --include='*.cc' --include='*.cxx' -n || trueLength of output: 1696
swap method already implemented
- Definition found in
lib/structures/lsmtree/lsmtree.cpp
at line 326.- No missing implementation; you can safely ignore this comment.
Likely an incorrect or invalid review comment.
181-181
:✅ Verification successful
Config pointer is no longer const.
The configuration pointer member has been changed from const to non-const. If this was intentional, consider documenting why this change was made, as it could potentially allow the configuration to be modified after initialization.
🏁 Script executed:
#!/bin/bash # Check for modifications to the config pointer in the class methods grep -r "m_pConfig =" lib/structures/lsmtree/Length of output: 47
🏁 Script executed:
#!/bin/bash # List all occurrences of m_pConfig in the LSMTree implementation & header grep -R "m_pConfig" -n lib/structures/lsmtree/ # Display context around the member declaration in the header sed -n '160,200p' lib/structures/lsmtree/lsmtree.hLength of output: 4031
Non-const config pointer supports move semantics
The
m_pConfig
member was deliberately changed to a non-constconfig::shared_ptr_t
so thatlsmtree_t
instances can be safely moved or swapped via the publicswap()
method. In practice, assignments tom_pConfig
only occur in constructors and inswap()
, and all other accesses remain read-only. No further changes are required.lib/wal/log/storage/backend/file_storage_backend.h (1)
64-67
: Negative file size casts to huge positive number
append_only_file_t::size
probably returnsstd::expected<std::size_t, …>
but on error the code converts a negative sentinel to a large unsigned value.
Return0
instead (or propagate the error).- return m_file.size().value_or(0); + return m_file.size().transform_error([](auto) { return std::size_t{0}; }).value_or(0);lib/wal/log/storage/backend/backend.h (1)
3-4
:<expected>
is C++23 — verify toolchain support
std::expected
is standardised only since C++23.
If the project still targets C++20, you’ll need either
•#include <tl/expected.hpp>
(or another shim), or
•-std=c++23
in the toolchain.Please confirm CI flags/support.
lib/wal/log/storage/persistent_log_storage.h (3)
87-90
: LGTM: Good use of existing append methodThe overloaded
append
method correctly reuses the string-based append method by formatting the input parameters.
102-106
: LGTM: Reset implementation correctly clears both memory and storageThe reset method correctly handles both in-memory and backend state.
134-180
: LGTM: Well-designed builder pattern implementationThe builder pattern implementation is clean and follows best practices:
- Proper move semantics
- Deleted copy operations
- Clear error handling with std::expected
- Descriptive error messages
src/main.cpp (1)
469-479
: LGTM: Good implementation of signal handlersThe signal handlers for SIGTERM and SIGINT are properly implemented with error checking.
[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string | ||
{ | ||
std::string buffer; | ||
buffer.resize(size); | ||
if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value()) | ||
{ | ||
spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size); | ||
return {}; | ||
} | ||
return buffer; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_impl
returns a string with undefined tail bytes
append_only_file_t::read
may return fewer bytes than requested; the code never shrinks buffer
, so uninitialised bytes leak to callers.
- std::string buffer;
- buffer.resize(size);
- if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value())
+ std::string buffer(size, '\0');
+ const auto res = m_file.read(offset, buffer.data(), size);
+ if (!res.has_value())
{
spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size);
return {};
}
- return buffer;
+ buffer.resize(res.value()); // keep only the bytes actually read
+ return buffer;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string | |
{ | |
std::string buffer; | |
buffer.resize(size); | |
if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value()) | |
{ | |
spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size); | |
return {}; | |
} | |
return buffer; | |
} | |
[[nodiscard]] auto read_impl(std::size_t offset, std::size_t size) -> std::string | |
{ | |
- std::string buffer; | |
- buffer.resize(size); | |
- if (const auto res = m_file.read(offset, buffer.data(), size); !res.has_value()) | |
+ std::string buffer(size, '\0'); | |
+ const auto res = m_file.read(offset, buffer.data(), size); | |
+ if (!res.has_value()) | |
{ | |
spdlog::error("Failed to read from file storage. Offset={}, size={}", offset, size); | |
return {}; | |
} | |
- return buffer; | |
+ buffer.resize(res.value()); // keep only the bytes actually read | |
+ return buffer; | |
} |
[[nodiscard]] auto write_impl(const char *data, std::size_t offset, std::size_t size) -> bool | ||
{ | ||
(void)offset; | ||
return m_file.append({data, size}) | ||
.transform([](ssize_t res) { return res >= 0; }) | ||
.value_or(false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
write_impl
silently ignores offset
(potential data corruption)
offset
is discarded even though the public interface promises arbitrary-offset writes.
Callers may assume idempotent overwrite semantics, leading to duplicated or corrupted log segments.
If the backend is truly append-only, document this clearly and reject non-EOF offsets:
+ if (offset != m_file.size().value_or(0)) {
+ spdlog::error("file_storage_backend_t supports only append-at-EOF writes (got offset={})", offset);
+ return false;
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
[[nodiscard]] auto write_impl(const char *data, std::size_t offset, std::size_t size) -> bool | |
{ | |
(void)offset; | |
return m_file.append({data, size}) | |
.transform([](ssize_t res) { return res >= 0; }) | |
.value_or(false); | |
} | |
[[nodiscard]] auto write_impl(const char *data, std::size_t offset, std::size_t size) -> bool | |
{ | |
(void)offset; | |
if (offset != m_file.size().value_or(0)) { | |
spdlog::error("file_storage_backend_t supports only append-at-EOF writes (got offset={})", offset); | |
return false; | |
} | |
return m_file.append({data, size}) | |
.transform([](ssize_t res) { return res >= 0; }) | |
.value_or(false); | |
} |
lib/raft/raft.cpp
Outdated
consensus_module_t::consensus_module_t(node_config_t nodeConfig, | ||
std::vector<raft_node_grpc_client_t> replicas) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Pass node configuration by const reference
The nodeConfig
parameter is passed by value, which requires an unnecessary copy. Consider passing it by const reference for better performance.
-consensus_module_t::consensus_module_t(node_config_t nodeConfig,
+consensus_module_t::consensus_module_t(const node_config_t& nodeConfig,
std::vector<raft_node_grpc_client_t> replicas) noexcept
🧰 Tools
🪛 Cppcheck (2.10-2)
[performance] 123-123: Function parameter 'nodeConfig' should be passed by const reference.
(passedByValue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
lib/wal/wal.cpp (1)
33-36
: Prefer letting spdlog format – avoid eagerstd::format
spdlog::error(std::format(...))
eagerly builds the full string even when the log level is disabled, doubling formatting work.
Letspdlog
(fmt) handle the placeholders:- spdlog::error(std::format("WAL: Failed to append entry: {} {} {}", - op_view, - rec.kv.m_key.m_key, - rec.kv.m_value.m_value)); + spdlog::error("WAL: Failed to append entry: {} {} {}", + op_view, rec.kv.m_key.m_key, rec.kv.m_value.m_value);lib/wal/log/storage/backend/append_only_file_storage_backend.h (1)
55-67
: Avoid allocating the maximumsize
up-front for reads
Allocatingstd::string buffer(size, '\0')
can waste memory when the file tail is shorter than the requestedsize
. Allocate exactly what was read:- std::string buffer(size, '\0'); - const auto res = m_file.read(offset, buffer.data(), size); + std::string buffer(size, '\0'); + const auto res = m_file.read(offset, buffer.data(), size); … - buffer.resize(res.value()); + buffer.assign(buffer.data(), static_cast<size_t>(res.value()));or build the string from the returned byte count to avoid retaining unused capacity.
lib/wal/log/storage/backend/backend.h (1)
83-86
:config()
returns by value – unnecessary copy
config()
hands back an entirestorage_backend_config_t
object each call. Returningconst storage_backend_config_t&
avoids copies and clarifies immutability:- [[nodiscard]] auto config() const - { - return m_config; - } + [[nodiscard]] const storage_backend_config_t& config() const { return m_config; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
lib/concurrency/helpers.h
(1 hunks)lib/db/db.cpp
(4 hunks)lib/db/db.h
(3 hunks)lib/fs/append_only_file.cpp
(1 hunks)lib/server/grpc_server.cpp
(2 hunks)lib/structures/lsmtree/levels/level.cpp
(12 hunks)lib/structures/lsmtree/lsmtree.cpp
(10 hunks)lib/structures/lsmtree/lsmtree.h
(6 hunks)lib/structures/lsmtree/segments/lsmtree_regular_segment.h
(2 hunks)lib/structures/lsmtree/segments/segment_storage.h
(2 hunks)lib/wal/log/concepts.h
(1 hunks)lib/wal/log/storage/backend/append_only_file_storage_backend.h
(1 hunks)lib/wal/log/storage/backend/backend.h
(1 hunks)lib/wal/log/storage/builder.h
(1 hunks)lib/wal/log/storage/in_memory_log_storage.h
(1 hunks)lib/wal/log/storage/persistent_log_storage.h
(1 hunks)lib/wal/wal.cpp
(1 hunks)lib/wal/wal.h
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- lib/wal/log/storage/builder.h
🚧 Files skipped from review as they are similar to previous changes (14)
- lib/server/grpc_server.cpp
- lib/db/db.h
- lib/wal/log/concepts.h
- lib/wal/log/storage/in_memory_log_storage.h
- lib/structures/lsmtree/levels/level.cpp
- lib/db/db.cpp
- lib/structures/lsmtree/lsmtree.cpp
- lib/wal/log/storage/persistent_log_storage.h
- lib/wal/wal.h
- lib/structures/lsmtree/segments/segment_storage.h
- lib/concurrency/helpers.h
- lib/structures/lsmtree/segments/lsmtree_regular_segment.h
- lib/structures/lsmtree/lsmtree.h
- lib/fs/append_only_file.cpp
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/wal/log/storage/backend/backend.h (6)
lib/wal/log/storage/builder.h (6)
nodiscard
(29-32)nodiscard
(34-38)nodiscard
(40-43)nodiscard
(45-48)nodiscard
(50-53)nodiscard
(70-105)lib/wal/log/storage/in_memory_log_storage.h (6)
nodiscard
(46-50)nodiscard
(52-56)nodiscard
(58-65)nodiscard
(67-71)nodiscard
(73-76)nodiscard
(90-93)lib/wal/log/storage/persistent_log_storage.h (2)
nodiscard
(75-86)nodiscard
(88-91)lib/fs/append_only_file.h (2)
data
(30-31)offset
(33-34)lib/fs/random_access_file.h (2)
data
(27-28)offset
(30-31)lib/fs/random_access_file.cpp (2)
size
(116-128)size
(116-116)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build-gcc
🔇 Additional comments (2)
lib/wal/log/storage/backend/append_only_file_storage_backend.h (2)
47-53
: Offset parameter silently ignored – risk of accidental overwrite
write_impl
discards theoffset
argument and blindly appends. If callers ever rely on offset positioning (e.g. re-writing a corrupted tail), the operation will violate the contract.At minimum, assert or log when
offset != size()
to surface misuse:+ if (offset != static_cast<ssize_t>(m_file.size().value_or(0))) + { + spdlog::warn("append_only_file_storage_backend_t::write_impl: " + "offset ({}) differs from file size; forcing append", + offset); + }
112-116
: Enum valuekWrongFileFormat
is never produced
build_impl
maps every failure ofappend_only_file_builder_t
tokUnableToOpenFile
. If file-format validation is expected, returnkWrongFileFormat
whenerror.code
indicates e.g.invalid_magic
.This keeps error reporting actionable.
lib/wal/wal.cpp
Outdated
auto logStream = std::stringstream{}; | ||
for (std::size_t idx{0}; idx < m_log.size(); ++idx) | ||
{ | ||
if (auto logLine{m_log.read(idx)}; logLine.has_value()) | ||
{ | ||
logStream << logLine.value(); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Streaming the whole log into memory is O(N²) – parse on the fly
records()
concatenates all entries into a stringstream
and then re-reads them line-by-line, doubling memory and work.
Iterating once and parsing each line immediately avoids the extra copy:
std::vector<record_t> result;
for (std::size_t idx = 0; idx < m_log.size(); ++idx)
{
if (auto line = m_log.read(idx); line && !absl::StripAsciiWhitespace(*line).empty())
{
std::istringstream s(*line);
record_t rec;
rec.read(s);
spdlog::debug("WAL: Recovered record: {}", recordToString(rec));
result.emplace_back(std::move(rec));
}
}
This keeps memory proportional to the largest record rather than the full log.
const auto op_view{magic_enum::enum_name(rec.op)}; | ||
if (!m_log.append(std::string{op_view.data(), op_view.size()}, | ||
rec.kv.m_key.m_key, | ||
rec.kv.m_value.m_value)) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✏️ Delimiter missing – recovered records will concatenate together
m_log.append
is invoked with three separate fields but without any line delimiter ('\n'
).
records()
later expects entries to be newline-separated (std::getline(logStream, line)
), which is not true for the in-memory implementation (fmt::format("{} {} {}", …)
).
Result: every record after the first is parsed as part of the same “line”, breaking recovery.
- if (!m_log.append(std::string{op_view.data(), op_view.size()},
- rec.kv.m_key.m_key,
- rec.kv.m_value.m_value))
+ // Add the delimiter directly here so all storage back-ends produce a
+ // canonical “one-entry-per-line” format expected by `records()`.
+ auto command = std::string{op_view};
+ if (!m_log.append(fmt::format("{}\n", command), // single-string overload
+ rec.kv.m_key.m_key,
+ rec.kv.m_value.m_value))
or update the storage layer to append the terminating newline.
Without this fix, log recovery will silently corrupt the WAL state.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const auto op_view{magic_enum::enum_name(rec.op)}; | |
if (!m_log.append(std::string{op_view.data(), op_view.size()}, | |
rec.kv.m_key.m_key, | |
rec.kv.m_value.m_value)) | |
{ | |
const auto op_view{magic_enum::enum_name(rec.op)}; | |
// Add the delimiter directly here so all storage back-ends produce a | |
// canonical “one-entry-per-line” format expected by `records()`. | |
auto command = std::string{op_view}; | |
if (!m_log.append(fmt::format("{}\n", command), | |
rec.kv.m_key.m_key, | |
rec.kv.m_value.m_value)) | |
{ | |
// … | |
} |
[[nodiscard]] auto read(ssize_t offset, std::size_t size) -> std::string | ||
{ | ||
return static_cast<Derived *>(this)->read_impl(offset, size); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
No error propagation – read()
cannot tell success from empty data
The base interface returns a plain std::string
; failures are reported as an empty string, which is indistinguishable from a legitimate empty payload. Consider returning std::expected<std::string, storage_backend_error_t>
(or std::optional<std::string>
) to preserve error semantics.
This will let higher layers react to I/O failures instead of silently continuing with corrupt state.
- Fixed clang version to 19 in conan profiles, as clang++-20 borks the spdlog - Optimized WAL records() & added check for a empty log - Implement replicated_log_storage_t - Implemented proper initialization order in Main - Started integrating wal::wal_t with raft::consensus_module_t
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
New Tests
Chores
Documentation