
stelfrag · 2026-03-02T14:54:56Z

Summary

Introduce `fds_max` to track highest seen file descriptor index and optimize related logic

- Added `fds_max` field to `pid_stat` structure to track the highest file descriptor index seen (+1) for improved management.
- Replaced all occurrences of `fds_size` with `fds_max` where appropriate for iteration and logic adjustments.
- Enhanced `cleanup_negative_pid_fds` and other file descriptor handling to respect `fds_max`.
- Ensured consistent updates to `fds_max` during PID file descriptor processing.
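The bookkeeping described in this summary can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: `demo_pid_fd`, `demo_pid_stat`, `demo_fd_seen`, and `demo_count_open` are hypothetical names, and the growth policy (doubling from 8 slots) is an assumption made for the example. Only the field names `fds`, `fds_size`, and `fds_max` come from the PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical, simplified stand-ins for the plugin's structures.
struct demo_pid_fd { int fd; };   // 0 = unused slot, >0 = in use

struct demo_pid_stat {
    struct demo_pid_fd *fds;      // array of fd slots
    uint32_t fds_size;            // number of allocated slots
    uint32_t fds_max;             // exclusive upper bound of slots worth scanning
};

// Record that fd index `idx` is open, growing the array and fds_max as needed.
// Returns 0 on success, -1 on failure.
static int demo_fd_seen(struct demo_pid_stat *p, uint32_t idx, int value) {
    if (idx >= p->fds_size) {
        if (idx >= UINT32_MAX / 2) return -1;   // keep the doubling below from overflowing
        uint32_t new_size = p->fds_size ? p->fds_size : 8;
        while (new_size <= idx) new_size *= 2;
        struct demo_pid_fd *n = realloc(p->fds, new_size * sizeof(*n));
        if (!n) return -1;
        // Zero only the newly added slots so they read as unused.
        memset(n + p->fds_size, 0, (new_size - p->fds_size) * sizeof(*n));
        p->fds = n;
        p->fds_size = new_size;
    }
    p->fds[idx].fd = value;
    if (idx + 1 > p->fds_max) p->fds_max = idx + 1;   // expand the scan bound
    assert(p->fds_max <= p->fds_size);
    return 0;
}

// Count open slots, scanning only [0, fds_max) instead of [0, fds_size).
static uint32_t demo_count_open(const struct demo_pid_stat *p) {
    uint32_t n = 0;
    for (uint32_t i = 0; i < p->fds_max; i++)
        if (p->fds[i].fd > 0) n++;
    return n;
}
```

With a zero-initialized `demo_pid_stat`, recording fd index 3 allocates 8 slots and sets `fds_max` to 4, so later scans touch 4 slots rather than 8.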

Summary by cubic

Optimize FD handling with fds_max to cut scans and speed up aggregation. Also streamline procfile reads to reduce syscalls and handle non-seekable files more efficiently.

Refactors
- Added fds_max to pid_stat; initialized on allocation and maintained in Linux/FreeBSD FD readers (grow during read; recompute/shrink during cleanup).
- Iterate using fds_max for FD aggregation/cleanup; safely release closed FDs.
- Refresh host users/groups once per aggregation pass.
- Detect kernel threads (parent is an aggregator) and skip IO/FDS/limits reads during incremental collection.
- Procfile parser: stop read loop at EOF, introduce NEEDS_SEEK to seek only before the next read, and reopen on non-seekable fds; skip seek after reopen.

Written for commit b99369b. Summary will update on new commits.


cubic-dev-ai

No issues found across 6 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant Aggregator as apps_aggregations.c
    participant Cache as Host User/Group Cache
    participant Reader as OS Reader (Linux/FreeBSD)
    participant Stat as pid_stat (Process Data)
    participant OS as OS Kernel (procfs/MIB)

    Note over Aggregator,Cache: Start Collection Cycle

    Aggregator->>Cache: NEW: update_cached_host_users/groups()
    Note right of Cache: Called once per cycle instead of per-PID

    loop For each PID
        Aggregator->>Reader: apps_os_read_pid_fds()
        
        Reader->>Stat: CHANGED: make_all_pid_fds_negative()
        Note right of Stat: Iterates only up to fds_max
        
        Reader->>OS: Request File Descriptors
        OS-->>Reader: List of active FDs
        
        loop For each FD from OS
            Reader->>Stat: Update FD status
            opt FD index + 1 > current fds_max
                Reader->>Stat: NEW: Expand fds_max
            end
        end

        Reader->>Stat: NEW: cleanup_negative_pid_fds()
        Note right of Stat: 1. Release FDs still negative<br/>2. Recompute/Shrink fds_max
        
        Aggregator->>Stat: CHANGED: aggregate_pid_fds_on_targets()
        Note right of Stat: Iterates only up to fds_max for counters
    end

    Note over Aggregator,Stat: End Collection Cycle

    classDef highlight fill:#172554,stroke:#333,stroke-width:2px;
    class Stat highlight;

… at EOF.

- Introduced `PROCFILE_FLAG_NEEDS_SEEK` to mark file descriptors requiring a seek before the next read.
- Refactored seek logic to skip unnecessary operations for freshly reopened file descriptors.
- Improved handling of non-seekable files by integrating fallback logic directly into the read path.
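One way to picture the deferred-seek behavior in this commit message is the sketch below. It is an assumption-laden illustration, not the procfile parser itself: `demo_procfile`, `demo_open`, `demo_read_all`, and `DEMO_FLAG_NEEDS_SEEK` are hypothetical names modeled on `PROCFILE_FLAG_NEEDS_SEEK`, and only standard POSIX calls (`open`, `read`, `lseek`, `close`) are used. The actual parser may detect EOF differently; here the loop simply ends when `read()` returns 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_FLAG_NEEDS_SEEK 0x1u   // hypothetical flag for this sketch

struct demo_procfile {
    int fd;
    unsigned flags;
    char path[256];
};

static int demo_open(struct demo_procfile *pf, const char *path) {
    int n = snprintf(pf->path, sizeof(pf->path), "%s", path);
    if (n < 0 || (size_t)n >= sizeof(pf->path)) return -1;
    pf->fd = open(path, O_RDONLY);
    pf->flags = 0;   // a fresh fd starts at offset 0, so no seek is pending
    return pf->fd < 0 ? -1 : 0;
}

// Read the file from the start into buf (up to cap bytes).
// Returns the number of bytes read, or -1 on error.
static ssize_t demo_read_all(struct demo_procfile *pf, char *buf, size_t cap) {
    assert(pf != NULL && buf != NULL);
    if (pf->fd < 0) return -1;

    if (pf->flags & DEMO_FLAG_NEEDS_SEEK) {
        if (lseek(pf->fd, 0, SEEK_SET) == (off_t)-1) {
            // Seek failed (for example, a non-seekable fd): reopen instead.
            // The new fd is already at offset 0, so no seek follows.
            close(pf->fd);
            pf->fd = open(pf->path, O_RDONLY);
            if (pf->fd < 0) return -1;
        }
        pf->flags &= ~DEMO_FLAG_NEEDS_SEEK;
    }

    size_t used = 0;
    while (used < cap) {
        ssize_t r = read(pf->fd, buf + used, cap - used);
        if (r < 0) return -1;   // EINTR handling omitted in this sketch
        if (r == 0) break;      // EOF: stop reading
        used += (size_t)r;
    }

    // Defer the rewind: it only happens if another read is requested.
    pf->flags |= DEMO_FLAG_NEEDS_SEEK;
    return (ssize_t)used;
}
```

Deferring the rewind means a file that is read once and then closed never pays for an `lseek()`.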

Copilot

Pull request overview

This PR aims to reduce overhead in apps.plugin process/file-descriptor collection by introducing a tighter upper bound (fds_max) for FD scanning and by avoiding repeated/per-PID expensive work.

Changes:

Added fds_max to struct pid_stat and updated FD iteration/cleanup logic to use it as the scan upper bound.
Updated Linux/FreeBSD FD readers to expand fds_max based on the highest FD seen, and shrink it during cleanup.
Moved user/group cache refresh outside the per-PID aggregation loop; added a kernel-thread fast path to skip I/O/FD/limits reads.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| `src/collectors/apps.plugin/apps_plugin.h` | Adds `fds_max` to `pid_stat` to support bounded FD scans. |
| `src/collectors/apps.plugin/apps_pid_files.c` | Switches aggregation/cleanup logic to iterate up to `fds_max` and recomputes it after cleanup. |
| `src/collectors/apps.plugin/apps_pid.c` | Initializes `fds_max` on PID entry allocation. |
| `src/collectors/apps.plugin/apps_os_linux.c` | Tracks highest FD seen during `/proc/<pid>/fd` scan and expands `p->fds_max`. |
| `src/collectors/apps.plugin/apps_os_freebsd.c` | Tracks highest FD seen during FreeBSD FD scan and expands `p->fds_max`. |
| `src/collectors/apps.plugin/apps_incremental_collection.c` | Adds kernel-thread detection to skip expensive per-PID I/O/FD/limits reads. |
| `src/collectors/apps.plugin/apps_aggregations.c` | Moves cached user/group refresh outside per-PID aggregation and uses `fds_max` when releasing per-PID FD ids. |


Copilot · 2026-03-02T17:14:15Z

src/collectors/apps.plugin/apps_plugin.h

 #endif
    struct pid_fd *fds;             // array of fds it uses
    uint32_t fds_size;              // the size of the fds array
+    uint32_t fds_max;               // the highest fd index ever seen + 1


fds_max is documented as “the highest fd index ever seen + 1”, but the implementation shrinks it in cleanup_negative_pid_fds() to the last currently-active slot. Please update the field comment to reflect its actual meaning (upper bound for iterating active p->fds[] entries).

Suggested change

-    uint32_t fds_max;               // the highest fd index ever seen + 1
+    uint32_t fds_max;               // upper bound (exclusive) for iterating active entries in fds[]
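The shrink behavior this comment refers to might look roughly like the sketch below. It assumes, as the PR's function names suggest, that slots are flipped negative before a read, flipped back to positive when the OS reports them again, and released if still negative afterwards. All `demo_` names are hypothetical and the slot layout is simplified.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical, simplified stand-ins for the plugin's structures.
struct demo_pid_fd { int fd; };   // 0 = unused, >0 = active, <0 = not yet re-seen

struct demo_pid_stat {
    struct demo_pid_fd *fds;
    uint32_t fds_size;
    uint32_t fds_max;             // exclusive upper bound for iterating active slots
};

// Before reading the process' fds: mark every active slot as "not yet seen".
static void demo_make_all_negative(struct demo_pid_stat *p) {
    for (uint32_t i = 0; i < p->fds_max; i++)
        if (p->fds[i].fd > 0) p->fds[i].fd = -p->fds[i].fd;
}

// After reading: release slots that stayed negative and shrink fds_max
// to one past the last slot that is still active.
static void demo_cleanup_negative(struct demo_pid_stat *p) {
    assert(p->fds_max <= p->fds_size);
    uint32_t last_active = 0;
    for (uint32_t i = 0; i < p->fds_max; i++) {
        if (p->fds[i].fd < 0)
            p->fds[i].fd = 0;         // closed since the previous pass
        else if (p->fds[i].fd > 0)
            last_active = i + 1;
    }
    p->fds_max = last_active;
}
```

Because the bound can drop here, "exclusive upper bound for active entries" describes the field more accurately than "highest index ever seen + 1".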

Copilot · 2026-03-02T17:14:16Z

src/collectors/apps.plugin/apps_incremental_collection.c

+    bool is_kernel_thread = false;
+    if(p->ppid) {
+        struct pid_stat *pp = find_pid_entry(p->ppid);
+        if(pp && pp->is_aggregator)
+            is_kernel_thread = true;


The new kernel-thread fast path depends on pp->is_aggregator, but aggregators are identified via exact-match rules (see pid_match_check()), and on Linux the configured aggregator string is kthread (which won’t match kthreadd). This can make is_kernel_thread never become true (missed optimization) or behave differently than intended; consider matching the actual parent comm (kthreadd) or using an explicit OS/kernel-thread detection method.
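The mismatch described here comes down to plain string comparison. The sketch below is only an illustration of exact versus prefix matching with hypothetical helper names; it does not reproduce `pid_match_check()` or the plugin's configuration handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: exact match of a configured name against a process comm.
static bool demo_exact_match(const char *pattern, const char *comm) {
    assert(pattern != NULL && comm != NULL);
    return strcmp(pattern, comm) == 0;
}

// Hypothetical helper: prefix match, shown only for contrast.
static bool demo_prefix_match(const char *pattern, const char *comm) {
    assert(pattern != NULL && comm != NULL);
    return strncmp(pattern, comm, strlen(pattern)) == 0;
}
```

Under these helpers, `demo_exact_match("kthread", "kthreadd")` is false while `demo_prefix_match("kthread", "kthreadd")` is true, which is the gap the review comment points at.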

Copilot · 2026-03-02T17:14:16Z

src/collectors/apps.plugin/apps_incremental_collection.c

+    // --------------------------------------------------------------------
+    // detect kernel threads: their parent is an aggregator
+    // (e.g. kthreadd on Linux, kernel on FreeBSD).
+    // kernel threads have no I/O, no file descriptors, no memory - skip expensive reads


This comment says kernel threads have “no memory” and that memory reads are skipped, but the function still always reads /proc/<pid>/status (which includes memory-related fields). Please adjust the comment to match the actual behavior being skipped (I/O, FDs, limits).

Suggested change

-    // kernel threads have no I/O, no file descriptors, no memory - skip expensive reads
+    // kernel threads have no meaningful I/O, file descriptors, or limits to collect here - skip those expensive reads

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Copilot · 2026-03-02T17:40:15Z

src/collectors/apps.plugin/apps_incremental_collection.c

+    // --------------------------------------------------------------------
+    // detect kernel threads: their parent is an aggregator
+    // (e.g. kthreadd on Linux, kernel on FreeBSD).
+    // kernel threads have no I/O, no file descriptors, no memory - skip expensive reads
+
+    bool is_kernel_thread = false;
+    if(p->ppid) {
+        struct pid_stat *pp = find_pid_entry(p->ppid);
+        if(pp && pp->is_aggregator)
+            is_kernel_thread = true;
+    }


is_kernel_thread detection here only checks whether the parent PID is an aggregator. Elsewhere, kernel processes are also identified when ppid == 0 && pid != INIT_PID (see get_tree_target()), which would include the kernel aggregator process itself (e.g. kthreadd/kernel). Consider aligning this detection logic with the existing kernel/aggregator rule (including the ppid==0 case and/or p->is_aggregator) so incremental collection consistently skips expensive IO/FD reads for all kernel processes.
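A combined check along the lines suggested here could be sketched as follows. This is a hypothetical illustration: the `demo_` types and the linear lookup stand in for `struct pid_stat` and `find_pid_entry()`, and `DEMO_INIT_PID` is assumed to be 1. Whether the PR should adopt all three conditions is the open question of this comment, not something the sketch settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_INIT_PID 1   // assumption: init has PID 1

// Hypothetical, simplified process entry.
struct demo_pid {
    int pid;
    int ppid;
    bool is_aggregator;
};

// Linear lookup standing in for the plugin's PID index.
static const struct demo_pid *demo_find(const struct demo_pid *table, size_t n, int pid) {
    for (size_t i = 0; i < n; i++)
        if (table[i].pid == pid) return &table[i];
    return NULL;
}

// True when p is an aggregator itself, has ppid == 0 without being init,
// or has an aggregator as its parent.
static bool demo_is_kernel_process(const struct demo_pid *table, size_t n,
                                   const struct demo_pid *p) {
    assert(p != NULL);
    if (p->is_aggregator) return true;
    if (p->ppid == 0) return p->pid != DEMO_INIT_PID;
    const struct demo_pid *pp = demo_find(table, n, p->ppid);
    return pp != NULL && pp->is_aggregator;
}
```

With a table holding init (pid 1), an aggregator with pid 2 and ppid 0, a child of pid 2, and a child of init, only the aggregator and its child are classified as kernel processes.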

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


github-actions bot added the area/collectors and collectors/apps labels Mar 2, 2026

cubic-dev-ai bot reviewed Mar 2, 2026


stelfrag added 4 commits March 2, 2026 18:10

Skip expensive reads for kernel threads during incremental collection

751b783

Skip expensive reads for kernel threads during incremental collection

b5c1a1f

Optimize read syscall in procfile parser to avoid unnecessary reads…

4a6c8ff

… at EOF.

ilyam8 requested a review from Copilot March 2, 2026 17:08

Copilot started reviewing on behalf of ilyam8 March 2, 2026 17:09