Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Add dedicated logging thread#21547

Draft
stelfrag wants to merge 14 commits intonetdata:masterfrom
stelfrag:async_logging
Draft

Add dedicated logging thread#21547
stelfrag wants to merge 14 commits intonetdata:masterfrom
stelfrag:async_logging

Conversation

@stelfrag
Copy link
Collaborator

@stelfrag stelfrag commented Jan 12, 2026

Summary
  • Implement asynchronous logging

Summary by cubic

Adds asynchronous logging with an event-driven background thread to offload I/O, reduce contention, and safely handle flush/reopen during shutdown, rotation, and after fork.

  • New Features

    • Command-driven event loop (uv_async) with a 4096-slot command pool and opcodes for ENTRY/FLUSH/REOPEN/SHUTDOWN; drop-on-full with queue stats and high-water mark.
    • Routes non-critical FILE/SYSLOG through async; ALERT and above stay synchronous. STDOUT/STDERR are remapped to FILE. Skips async on FILE* fallbacks, captures syslog init at enqueue, and falls back to stderr if needed.
    • REOPEN and FLUSH run inside the logger thread to avoid FILE* races; timed shutdown with atomic signaling prevents hangs; strict null-termination and source bounds checks; disown-after-fork to prevent stale handles in child processes.
    • 512B inline buffer with allocation up to 16KB; uses CLEAN_BUFFER to avoid leaks. Added nd_logger_journal_format and centralized journal direct send; journal writes stay synchronous. CMake includes nd_log-queue.c/.h.
  • Migration

    • Call nd_log_shutdown() late in application shutdown to flush pending logs and stop the thread.
    • No config changes; nd_log_reopen_log_files() reopens via the logger thread and waits until complete. Use nd_log_queue_disown_after_fork() in child processes before logging.

Written for commit 29f3c98. Summary will update on new commits.

@github-actions github-actions bot added the area/build Build system (autotools and cmake). label Jan 12, 2026
@stelfrag
Copy link
Collaborator Author

@cubic-dev-ai review this PR

@cubic-dev-ai
Copy link
Contributor

cubic-dev-ai bot commented Jan 20, 2026

@cubic-dev-ai review this PR

@stelfrag I have started the AI code review. It will take a few minutes to complete.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 9 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/libnetdata/log/nd_log-queue.c">

<violation number="1" location="src/libnetdata/log/nd_log-queue.c:362">
P1: `nd_log_queue_shutdown()` can hang indefinitely when the command queue is full because no shutdown signal is sent, but the thread is still joined. Ensure a shutdown request is issued even when the queue is full to avoid deadlock.</violation>
</file>

<file name="src/libnetdata/log/nd_log.c">

<violation number="1" location="src/libnetdata/log/nd_log.c:312">
P2: Async logging can drop messages when select_output remaps a non-file method (e.g., journald not initialized) to stderr. The async path ignores the fallback FILE* and later writes via nd_log.sources[source].fp, which is NULL for NDLM_JOURNAL/ETW/WEL sources, so the message is lost. Skip async when the FILE* doesn’t match the source’s fp (or persist the fallback fp in the queue entry).</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@stelfrag
Copy link
Collaborator Author

@cubic-dev-ai review this PR

@cubic-dev-ai
Copy link
Contributor

cubic-dev-ai bot commented Jan 20, 2026

@cubic-dev-ai review this PR

@stelfrag I have started the AI code review. It will take a few minutes to complete.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 9 files

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements asynchronous logging with a dedicated background thread to improve performance by offloading I/O operations from hot paths.

Changes:

  • Introduces a ring buffer-based command queue (4096 entries) processed by a "LOGGER" thread
  • Routes non-critical logs (below ALERT priority) through async path while keeping ALERT+ synchronous for crash safety
  • Implements safe log rotation via REOPEN command handled entirely in the logger thread to avoid FILE* races

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/libnetdata/log/nd_log-queue.h Defines queue structures, opcodes, and public API for async logging
src/libnetdata/log/nd_log-queue.c Implements ring buffer, logger thread event loop, and queue management
src/libnetdata/log/nd_log.c Integrates async logging path with sync fallback and routing logic
src/libnetdata/log/nd_log.h Adds nd_log_shutdown() to public API
src/libnetdata/log/nd_log-init.c Initializes async queue on startup and handles log file reopening
src/libnetdata/log/nd_log-internals.h Adds nd_logger_journal_format() for journal formatting
src/libnetdata/log/nd_log-to-systemd-journal.c Refactors journal formatting into reusable function
src/daemon/daemon-shutdown.c Calls nd_log_shutdown() to flush logs before exit
CMakeLists.txt Adds nd_log-queue.c and nd_log-queue.h to build

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@stelfrag
Copy link
Collaborator Author

@cubic-dev-ai review this PR

@cubic-dev-ai
Copy link
Contributor

cubic-dev-ai bot commented Jan 21, 2026

@cubic-dev-ai review this PR

@stelfrag I have started the AI code review. It will take a few minutes to complete.

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 9 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/libnetdata/log/nd_log-queue.h">

<violation number="1" location="src/libnetdata/log/nd_log-queue.h:76">
P3: The comment claims a boolean return value, but nd_log_queue_init returns void. Update the comment (or change the signature) to avoid misleading callers about failure handling.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

- Introduce `nd_log_queue` for handling log entries asynchronously via background threads.
- Implement queue initialization, enqueue, and shutdown functions.
- Add thread-safe mechanisms for processing and flushing log entries.
- Integrate async logging with the existing logging infrastructure, falling back to synchronous if the queue is unavailable.
- Update `CMakeLists.txt` to include new files: `nd_log-queue.c` and `nd_log-queue.h`.
- Introduce a `shutdown_acknowledged` flag with supporting mutex and condition variable for safe thread shutdown.
- Ensure logger thread signals shutdown acknowledgment before exiting.
- Update cleanup process to properly destroy new synchronization primitives.
… async logging

- Store `journal_direct`, `journal_libsystemd`, and `syslog` initialization states during enqueue to ensure consistent behavior during logging.
- Add fallback mechanisms for logging to `stderr` when target destinations are not initialized.
- Introduce a flush operation before log destination reinitialization to avoid losing pending log messages.
- Refactor logging queue thread signature and improve error handling for asynchronous journal and syslog writes.
…lized send logic

- Introduce `nd_logger_journal_format` for consistent journal native protocol (KEY=VALUE\n) formatting.
- Update to use `journal_direct_send` for improved handling and fallback mechanisms.
- Simplify `nd_logger_journal_direct` by separating formatting and sending responsibilities.
- Replace the legacy `nd_log_queue` structure with the `nd_log_event_loop` and `nd_log_cmd_pool` architecture for cleaner and more efficient log entry processing.
- Transition from a queue-based logging system to a command-driven event loop using uv_async callbacks.
- Introduce reusable command pool with support for opcodes (e.g., `ND_LOG_OP_ENTRY`, `ND_LOG_OP_FLUSH`, `ND_LOG_OP_SHUTDOWN`) to enable more flexible event-driven log operations.
- Simplify logging thread to focus on processing commands in an event loop, improving modularity and reducing synchronization complexity.
- Update public API to reflect the new architecture, including changes to queue management, shutdown logic, and entry enqueueing.
- Remove unused initialization fields (`journal_direct`, `journal_libsystemd`) from log queue entries to reduce enqueue-time complexity.
- Replace raw buffer handling with `CLEAN_BUFFER` to streamline memory management and auto-free resources on return.
- Adapt async file logging to look up `FILE*` at write time, enabling proper log rotation handling.
- Deprecate journal async logging support due to sync path optimization.
- Introduce timed shutdown wait with fallback to prevent indefinite hangs during logger thread termination.
- Prevent async logging for fallback `FILE*` mismatches to avoid message loss.
- Add atomic flag signaling for immediate shutdown when the queue is full.
…R` remapping to `NDLM_FILE` during `select_output`.

- Add atomic queue depth decrement during entry processing.
- Prevent memory leaks by freeing allocated buffers when enqueueing is skipped.
…ocated log messages.

- Add source index bounds check to prevent out-of-bounds access during log entry processing.
- Optimize command pool cleanup by iterating between active indices instead of full range.
- Refactor initialization function to ensure thread-safe single execution using `FUNCTION_RUN_ONCE`.
- Enhance shutdown logic to safely handle completion destruction after thread termination.
- Use CAS loop for accurate high-water mark updates in concurrent scenarios.
- Adjust async signaling to prevent race conditions during shutdown.
- Remove rate limit checks from `nd_logger_log_fields_async` to enforce them at the caller level.
- Introduce `nd_log_queue_disown_after_fork` to reset queue state after fork, ensuring no stale handles.
- Enhance comments for `NDLM_FILE` handling and thread safety during log entry processing.
- Improve fallback handling for async logging failures, ensuring proper rate limit checks and sync fallbacks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/build Build system (autotools and cmake). area/daemon

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants