…ptimize related logic
- Added `fds_max` field to `pid_stat` structure to track the highest file descriptor index seen (+1) for improved management.
- Replaced all occurrences of `fds_size` with `fds_max` where appropriate for iteration and logic adjustments.
- Enhanced `cleanup_negative_pid_fds` and other file descriptor handling to respect `fds_max`.
- Ensured consistent updates to `fds_max` during PID file descriptor processing.
No issues found across 6 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant Aggregator as apps_aggregations.c
participant Cache as Host User/Group Cache
participant Reader as OS Reader (Linux/FreeBSD)
participant Stat as pid_stat (Process Data)
participant OS as OS Kernel (procfs/MIB)
Note over Aggregator,Cache: Start Collection Cycle
Aggregator->>Cache: NEW: update_cached_host_users/groups()
Note right of Cache: Called once per cycle instead of per-PID
loop For each PID
Aggregator->>Reader: apps_os_read_pid_fds()
Reader->>Stat: CHANGED: make_all_pid_fds_negative()
Note right of Stat: Iterates only up to fds_max
Reader->>OS: Request File Descriptors
OS-->>Reader: List of active FDs
loop For each FD from OS
Reader->>Stat: Update FD status
opt FD index + 1 > current fds_max
Reader->>Stat: NEW: Expand fds_max
end
end
Reader->>Stat: NEW: cleanup_negative_pid_fds()
Note right of Stat: 1. Release FDs still negative<br/>2. Recompute/Shrink fds_max
Aggregator->>Stat: CHANGED: aggregate_pid_fds_on_targets()
Note right of Stat: Iterates only up to fds_max for counters
end
Note over Aggregator,Stat: End Collection Cycle
classDef highlight fill:#172554,stroke:#333,stroke-width:2px;
class Stat highlight;
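The cycle in the diagram above can be sketched in C roughly as follows. This is a minimal illustration, not the plugin's code: the struct layout, the sign convention for stale slots, and the helper names `sketch_mark_all_negative`, `sketch_set_fd`, and `sketch_cleanup_negative` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the plugin's structures. */
struct pid_fd {
    int fd;                 /* 0 = free slot, >0 = active file id, <0 = marked stale */
};

struct pid_stat {
    struct pid_fd *fds;     /* slot array indexed by fd number */
    uint32_t fds_size;      /* number of allocated slots */
    uint32_t fds_max;       /* exclusive upper bound for scanning active slots */
};

/* Step 1: mark every tracked slot as stale, scanning only up to fds_max. */
static void sketch_mark_all_negative(struct pid_stat *p) {
    for (uint32_t i = 0; i < p->fds_max; i++)
        p->fds[i].fd = -p->fds[i].fd;
}

/* Step 2: record an fd reported by the OS (file_id must be > 0),
 * growing the array if needed and expanding fds_max. */
static int sketch_set_fd(struct pid_stat *p, uint32_t fdid, int file_id) {
    if (fdid >= p->fds_size) {
        uint32_t new_size = fdid + 1;
        struct pid_fd *n = realloc(p->fds, (size_t)new_size * sizeof(*n));
        if (!n)
            return -1;
        for (uint32_t i = p->fds_size; i < new_size; i++)
            n[i].fd = 0;    /* new slots start out free */
        p->fds = n;
        p->fds_size = new_size;
    }
    p->fds[fdid].fd = file_id;
    if (fdid + 1 > p->fds_max)
        p->fds_max = fdid + 1;
    return 0;
}

/* Step 3: release slots that are still stale and shrink fds_max
 * to one past the last slot that is still active. */
static void sketch_cleanup_negative(struct pid_stat *p) {
    uint32_t last_active = 0;
    for (uint32_t i = 0; i < p->fds_max; i++) {
        if (p->fds[i].fd < 0)
            p->fds[i].fd = 0;           /* the process closed this fd */
        else if (p->fds[i].fd > 0)
            last_active = i + 1;
    }
    assert(last_active <= p->fds_size);
    p->fds_max = last_active;
}
```

In this sketch a process that briefly opened a high-numbered descriptor pays for the wide scan only once: after the next cleanup, `fds_max` falls back to the highest slot still in use, while `fds_size` keeps the allocation.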
- Introduced `PROCFILE_FLAG_NEEDS_SEEK` to mark file descriptors requiring a seek before the next read.
- Refactored seek logic to skip unnecessary operations for freshly reopened file descriptors.
- Improved handling of non-seekable files by integrating fallback logic directly into the read path.
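One plausible reading of this change, sketched in C under stated assumptions: a flag is set after each read, the next read rewinds only when the flag is set, and a failed `lseek()` falls back to reopening the file. The names `SKETCH_FLAG_NEEDS_SEEK`, `struct sketch_file`, `sketch_reopen`, and `sketch_read_all` are hypothetical and do not come from the PR; only the POSIX calls are real.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_FLAG_NEEDS_SEEK (1u << 0)  /* hypothetical flag bit */

struct sketch_file {
    int fd;             /* -1 when not open */
    uint32_t flags;
    const char *path;
};

/* Reopen the file; a fresh descriptor is already at offset 0,
 * so the needs-seek flag is cleared and no lseek() is issued. */
static int sketch_reopen(struct sketch_file *f) {
    if (f->fd >= 0)
        close(f->fd);
    f->fd = open(f->path, O_RDONLY);
    f->flags &= ~SKETCH_FLAG_NEEDS_SEEK;
    return f->fd >= 0 ? 0 : -1;
}

/* Read the file from the start into buf, returning the byte count or -1. */
static ssize_t sketch_read_all(struct sketch_file *f, char *buf, size_t len) {
    assert(f && buf);
    if (f->fd < 0 && sketch_reopen(f) != 0)
        return -1;

    if (f->flags & SKETCH_FLAG_NEEDS_SEEK) {
        /* Rewind only when a previous read moved the offset. */
        if (lseek(f->fd, 0, SEEK_SET) == (off_t)-1) {
            /* Non-seekable file: fall back to reopening it. */
            if (sketch_reopen(f) != 0)
                return -1;
        }
        f->flags &= ~SKETCH_FLAG_NEEDS_SEEK;
    }

    size_t total = 0;
    while (total < len) {
        ssize_t r = read(f->fd, buf + total, len - total);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;      /* end of file */
        total += (size_t)r;
    }

    /* The offset has moved, so the next read must rewind first. */
    f->flags |= SKETCH_FLAG_NEEDS_SEEK;
    return (ssize_t)total;
}
```

The saving in this sketch is one `lseek()` per read on the first pass after every open or reopen, which adds up when many per-PID files are reopened each cycle.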
Pull request overview
This PR aims to reduce overhead in apps.plugin process/file-descriptor collection by introducing a tighter upper bound (fds_max) for FD scanning and by avoiding repeated/per-PID expensive work.
Changes:
- Added `fds_max` to `struct pid_stat` and updated FD iteration/cleanup logic to use it as the scan upper bound.
- Updated Linux/FreeBSD FD readers to expand `fds_max` based on the highest FD seen, and shrink it during cleanup.
- Moved user/group cache refresh outside the per-PID aggregation loop; added a kernel-thread fast path to skip I/O/FD/limits reads.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/collectors/apps.plugin/apps_plugin.h | Adds fds_max to pid_stat to support bounded FD scans. |
| src/collectors/apps.plugin/apps_pid_files.c | Switches aggregation/cleanup logic to iterate up to fds_max and recomputes it after cleanup. |
| src/collectors/apps.plugin/apps_pid.c | Initializes fds_max on PID entry allocation. |
| src/collectors/apps.plugin/apps_os_linux.c | Tracks highest FD seen during /proc/<pid>/fd scan and expands p->fds_max. |
| src/collectors/apps.plugin/apps_os_freebsd.c | Tracks highest FD seen during FreeBSD FD scan and expands p->fds_max. |
| src/collectors/apps.plugin/apps_incremental_collection.c | Adds kernel-thread detection to skip expensive per-PID I/O/FD/limits reads. |
| src/collectors/apps.plugin/apps_aggregations.c | Moves cached user/group refresh outside per-PID aggregation and uses fds_max when releasing per-PID FD ids. |
| #endif | ||
| struct pid_fd *fds; // array of fds it uses | ||
| uint32_t fds_size; // the size of the fds array | ||
| uint32_t fds_max; // the highest fd index ever seen + 1 |
fds_max is documented as “the highest fd index ever seen + 1”, but the implementation shrinks it in cleanup_negative_pid_fds() to the last currently-active slot. Please update the field comment to reflect its actual meaning (upper bound for iterating active p->fds[] entries).
Suggested change:
- uint32_t fds_max; // the highest fd index ever seen + 1
+ uint32_t fds_max; // upper bound (exclusive) for iterating active entries in fds[]
| bool is_kernel_thread = false; | ||
| if(p->ppid) { | ||
| struct pid_stat *pp = find_pid_entry(p->ppid); | ||
| if(pp && pp->is_aggregator) | ||
| is_kernel_thread = true; |
The new kernel-thread fast path depends on pp->is_aggregator, but aggregators are identified via exact-match rules (see pid_match_check()), and on Linux the configured aggregator string is kthread (which won’t match kthreadd). This can make is_kernel_thread never become true (missed optimization) or behave differently than intended; consider matching the actual parent comm (kthreadd) or using an explicit OS/kernel-thread detection method.
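The mismatch described in this comment is easy to reproduce in isolation. The helper below is a hypothetical stand-in for an exact-match rule, not the real `pid_match_check()`, and it assumes (as the comment states) that the configured string is `kthread` while the parent's comm is `kthreadd`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical exact-match rule: the whole comm must equal the pattern. */
static bool sketch_exact_match(const char *comm, const char *pattern) {
    assert(comm && pattern);
    return strcmp(comm, pattern) == 0;
}
```

Under this rule `sketch_exact_match("kthreadd", "kthread")` is false, so a parent named `kthreadd` would never be flagged and the fast path would not trigger.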
| // -------------------------------------------------------------------- | ||
| // detect kernel threads: their parent is an aggregator | ||
| // (e.g. kthreadd on Linux, kernel on FreeBSD). | ||
| // kernel threads have no I/O, no file descriptors, no memory - skip expensive reads |
This comment says kernel threads have “no memory” and that memory reads are skipped, but the function still always reads /proc/<pid>/status (which includes memory-related fields). Please adjust the comment to match the actual behavior being skipped (I/O, FDs, limits).
Suggested change:
- // kernel threads have no I/O, no file descriptors, no memory - skip expensive reads
+ // kernel threads have no meaningful I/O, file descriptors, or limits to collect here - skip those expensive reads
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| // -------------------------------------------------------------------- | ||
| // detect kernel threads: their parent is an aggregator | ||
| // (e.g. kthreadd on Linux, kernel on FreeBSD). | ||
| // kernel threads have no I/O, no file descriptors, no memory - skip expensive reads | ||
| bool is_kernel_thread = false; | ||
| if(p->ppid) { | ||
| struct pid_stat *pp = find_pid_entry(p->ppid); | ||
| if(pp && pp->is_aggregator) | ||
| is_kernel_thread = true; | ||
| } |
is_kernel_thread detection here only checks whether the parent PID is an aggregator. Elsewhere, kernel processes are also identified when ppid == 0 && pid != INIT_PID (see get_tree_target()), which would include the kernel aggregator process itself (e.g. kthreadd/kernel). Consider aligning this detection logic with the existing kernel/aggregator rule (including the ppid==0 case and/or p->is_aggregator) so incremental collection consistently skips expensive IO/FD reads for all kernel processes.
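A combined check along the lines this comment suggests might look like the sketch below. It is an illustration under assumptions: `struct sketch_pid`, `SKETCH_INIT_PID` (taken to be 1), and `sketch_is_kernel_process` are hypothetical names, and the rule simply merges the three conditions mentioned in the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_INIT_PID 1   /* assumption: init runs as PID 1 */

/* Hypothetical, reduced view of a process entry. */
struct sketch_pid {
    pid_t pid;
    pid_t ppid;
    bool is_aggregator;
};

/* Treat a process as a kernel process when it is itself an aggregator,
 * when it has no parent and is not init, or when its parent is an
 * aggregator. `parent` may be NULL if the parent entry was not found. */
static bool sketch_is_kernel_process(const struct sketch_pid *p,
                                     const struct sketch_pid *parent) {
    assert(p != NULL);
    if (p->is_aggregator)
        return true;
    if (p->ppid == 0 && p->pid != SKETCH_INIT_PID)
        return true;
    return parent != NULL && parent->is_aggregator;
}
```

With this rule the aggregator process itself takes the fast path as well, not only its children, while init and ordinary user processes are still collected in full.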
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
Summary
`fds_max` to track highest seen file descriptor index and optimize related logic
- Added `fds_max` field to `pid_stat` structure to track the highest file descriptor index seen (+1) for improved management.
- Replaced all occurrences of `fds_size` with `fds_max` where appropriate for iteration and logic adjustments.
- Enhanced `cleanup_negative_pid_fds` and other file descriptor handling to respect `fds_max`.
- Ensured consistent updates to `fds_max` during PID file descriptor processing.
Summary by cubic
Optimize FD handling with fds_max to cut scans and speed up aggregation. Also streamline procfile reads to reduce syscalls and handle non-seekable files more efficiently.
Written for commit b99369b. Summary will update on new commits.