Description
Overview
Create a new analysis module that tracks off-CPU times for processes and correlates them with cache eviction effects on cycles per instruction (CPI). This analysis will help understand how cache pollution from other processes affects performance when a process is rescheduled.
Background
When a process is not scheduled on a CPU, other threads have the opportunity to evict its data from CPU caches. We hypothesize that:
- Longer off-CPU times lead to more cache eviction
- More CPU work by other threads during off-CPU periods increases cache pollution
- Both factors should correlate with higher CPI when the process resumes
Requirements
Two Types of Off-CPU Time Tracking
- Per-CPU Off-CPU Time: Time since the process was last scheduled on the same CPU ID
  - Zero for consecutive timer events without context switches
  - Tracks per-CPU scheduling history
- Global Off-CPU Time: Time since the process was scheduled on any CPU core
  - Zero if the process is running in parallel on other cores when the event starts
  - Tracks overall process scheduling gaps
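The two definitions above can be sketched as a small per-process state machine. This is only an illustration: `OffCpuState`, `start`, and `stop` are hypothetical names, timestamps are assumed to be monotonic nanoseconds, and returning `None` when the process has no scheduling history yet is an assumption, since the issue does not say how that case should be reported.

```rust
use std::collections::HashMap;

/// Hypothetical per-process scheduling state (names are illustrative).
#[derive(Default)]
struct OffCpuState {
    /// CPU id -> timestamp (ns) at which the process last stopped running there.
    last_off_per_cpu: HashMap<u32, u64>,
    /// Timestamp (ns) at which the process last stopped running on any CPU.
    last_off_global: Option<u64>,
    /// Number of CPUs the process is running on right now.
    running_cpus: u32,
}

impl OffCpuState {
    /// Called when an event for this process starts on `cpu`.
    /// Returns (per-CPU off-CPU time, global off-CPU time) in ns.
    /// `None` means there is no scheduling history to measure against.
    fn start(&mut self, cpu: u32, now_ns: u64) -> (Option<u64>, Option<u64>) {
        let per_cpu = self
            .last_off_per_cpu
            .get(&cpu)
            .map(|&t| now_ns.saturating_sub(t));
        let global = if self.running_cpus > 0 {
            // Already running in parallel on another core: no global gap.
            Some(0)
        } else {
            self.last_off_global.map(|t| now_ns.saturating_sub(t))
        };
        self.running_cpus += 1;
        (per_cpu, global)
    }

    /// Called when the event ends (timer tick or switch-out) on `cpu`.
    fn stop(&mut self, cpu: u32, now_ns: u64) {
        self.last_off_per_cpu.insert(cpu, now_ns);
        self.last_off_global = Some(now_ns);
        self.running_cpus = self.running_cpus.saturating_sub(1);
    }
}
```

In this sketch, a timer event that ends at time t and is immediately followed by another event starting at t on the same CPU yields an off-CPU time of zero, which matches the "consecutive timer events without context switches" rule.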
Additional Metrics
For each event, track:
- Per-CPU off-CPU time
- Global off-CPU time
- Total CPU time consumed by other threads during the per-CPU off-CPU period
- Total CPU time consumed by other threads during the global off-CPU period
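One possible way to obtain the two "other threads" metrics is to snapshot cumulative CPU time counters when the process goes off-CPU and subtract them when it resumes. The sketch below assumes such counters exist; `CpuTimeSnapshot` and `other_cpu_time_ns` are hypothetical names, and counting other threads across all CPUs (rather than only the CPU in question) is one interpretation of the requirement.

```rust
/// Hypothetical snapshot of two cumulative counters, in nanoseconds.
#[derive(Clone, Copy, Debug, PartialEq)]
struct CpuTimeSnapshot {
    /// CPU time consumed so far by all tracked threads.
    all_threads_ns: u64,
    /// CPU time consumed so far by the process being analyzed.
    own_ns: u64,
}

/// CPU time consumed by *other* threads between the two snapshots.
fn other_cpu_time_ns(at_off: CpuTimeSnapshot, at_resume: CpuTimeSnapshot) -> u64 {
    let all = at_resume.all_threads_ns.saturating_sub(at_off.all_threads_ns);
    let own = at_resume.own_ns.saturating_sub(at_off.own_ns);
    // Remove the process's own work so only other threads remain.
    all.saturating_sub(own)
}
```

Subtracting the process's own delta matters for the per-CPU variant, because the process may have run on other cores while it was away from this one. For the global variant the own delta is zero by definition.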
Data Structures
New columns to output from the analysis:
- per_cpu_off_time_ns
- global_off_time_ns
- per_cpu_other_cpu_time_ns
- global_other_cpu_time_ns
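As a rough illustration, the four columns could be carried per event in a plain struct. The struct name `OffCpuMetrics` and its helper methods are hypothetical; only the column names come from the list above, and the real output schema of the analysis may differ.

```rust
/// Hypothetical per-event record holding the four new columns.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct OffCpuMetrics {
    per_cpu_off_time_ns: u64,
    global_off_time_ns: u64,
    per_cpu_other_cpu_time_ns: u64,
    global_other_cpu_time_ns: u64,
}

impl OffCpuMetrics {
    /// Column names, in the same order as `values()`.
    const COLUMNS: [&'static str; 4] = [
        "per_cpu_off_time_ns",
        "global_off_time_ns",
        "per_cpu_other_cpu_time_ns",
        "global_other_cpu_time_ns",
    ];

    /// Column values, in the same order as `COLUMNS`.
    fn values(&self) -> [u64; 4] {
        [
            self.per_cpu_off_time_ns,
            self.global_off_time_ns,
            self.per_cpu_other_cpu_time_ns,
            self.global_other_cpu_time_ns,
        ]
    }
}
```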
Implementation Plan
1. Refactor CPU Time Counter
- Extract CpuTimeCounter from crates/trace-analysis/src/concurrency_analysis.rs
- Move to new module: crates/trace-analysis/src/cpu_time_counter.rs
- Make it reusable for both concurrency and off-CPU analysis
2. Create CPU Time Tracker
- New module: crates/trace-analysis/src/cpu_time_tracker.rs
- Maintains:
  - Per-PID CPU time counters
  - Global CPU time counter for non-kernel threads
  - Per-CPU: which PID is currently scheduled
  - Per-CPU: last update timestamp
- Moves most functionality from existing concurrency analysis
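The state listed above might look roughly like the following. This is a sketch under assumptions, not the existing code: the field and method names are invented, plain `u64` totals stand in for the existing CpuTimeCounter (whose interface is not shown here), and kernel or idle threads are modeled as `None` so they are excluded from both counters.

```rust
use std::collections::HashMap;

/// Hypothetical per-CPU slot: who is scheduled and when we last accounted time.
struct CpuSlot {
    /// `None` stands for a kernel or idle thread (an assumption of this sketch).
    pid: Option<u32>,
    last_update_ns: u64,
}

/// Hypothetical tracker; the real CpuTimeTracker may be structured differently.
#[derive(Default)]
struct CpuTimeTracker {
    per_pid_ns: HashMap<u32, u64>,
    global_ns: u64,
    cpus: HashMap<u32, CpuSlot>,
}

impl CpuTimeTracker {
    /// Charge the time elapsed on `cpu` up to `now_ns` to the PID scheduled there.
    fn advance(&mut self, cpu: u32, now_ns: u64) {
        if let Some(slot) = self.cpus.get_mut(&cpu) {
            let elapsed = now_ns.saturating_sub(slot.last_update_ns);
            slot.last_update_ns = now_ns;
            if let Some(pid) = slot.pid {
                *self.per_pid_ns.entry(pid).or_insert(0) += elapsed;
                self.global_ns += elapsed;
            }
        }
    }

    /// Account time for the outgoing PID, then record the incoming one.
    fn context_switch(&mut self, cpu: u32, now_ns: u64, next_pid: Option<u32>) {
        self.advance(cpu, now_ns);
        self.cpus.insert(cpu, CpuSlot { pid: next_pid, last_update_ns: now_ns });
    }

    /// Cumulative CPU time of `pid`, or 0 if it has never been charged.
    fn cpu_time_ns(&self, pid: u32) -> u64 {
        self.per_pid_ns.get(&pid).copied().unwrap_or(0)
    }
}
```

Calling `advance` on every event, not only on context switches, keeps the counters current so that a snapshot taken at any event timestamp is consistent across CPUs.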
3. Implement Off-CPU Analysis Module
- New module: crates/trace-analysis/src/off_cpu_analysis.rs
- Tracks per-PID last scheduled time on each CPU (module-specific state)
- Uses CpuTimeTracker for CPU time counter functionality
- Calculates off-CPU times and other-thread CPU usage for each event
- Outputs derived metrics and CSV files in the finalize() method
4. Update Concurrency Analysis
- Refactor to use shared CpuTimeCounter and CpuTimeTracker
- Maintain existing functionality while using new shared components
Expected Visualizations
The analysis will generate heat maps similar to concurrency analysis output, with four charts:
- Per-CPU Off-CPU Time vs CPI: X-axis binned by per-CPU off-CPU time values
- Global Off-CPU Time vs CPI: X-axis binned by global off-CPU time values
- Per-CPU Other CPU Time vs CPI: X-axis binned by CPU time consumed by other threads during per-CPU off-CPU periods
- Global Other CPU Time vs CPI: X-axis binned by CPU time consumed by other threads during global off-CPU periods
Chart Specifications
- Heat Maps: Each bin on X-axis normalized to 1, Y-axis shows CPI distribution
- Sample Weighting: Samples weighted by number of instructions
- Per-Process: Each process gets its own set of heat maps
- Bar Charts: Below each heat map, showing instruction count per X-axis bin for statistical significance
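The weighting and normalization rules above can be sketched as follows. The bin indices are assumed to be computed elsewhere (the issue does not specify bin edges), and `Histogram`, `add_sample`, `column_totals`, and `normalize_columns` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// (x_bin, cpi_bin) -> instruction count; bin indices are assumed precomputed.
type Histogram = BTreeMap<(u32, u32), u64>;

/// Add one sample, weighted by the number of instructions it covers.
fn add_sample(hist: &mut Histogram, x_bin: u32, cpi_bin: u32, instructions: u64) {
    *hist.entry((x_bin, cpi_bin)).or_insert(0) += instructions;
}

/// Instructions per X-axis bin: the data behind the bar chart.
fn column_totals(hist: &Histogram) -> BTreeMap<u32, u64> {
    let mut totals = BTreeMap::new();
    for (&(x, _), &n) in hist {
        *totals.entry(x).or_insert(0) += n;
    }
    totals
}

/// Heat map cells, scaled so that each X-axis bin sums to 1.
fn normalize_columns(hist: &Histogram) -> BTreeMap<(u32, u32), f64> {
    let totals = column_totals(hist);
    hist.iter()
        .filter_map(|(key, n)| {
            let total = totals[&key.0];
            if total == 0 {
                None // skip empty columns instead of dividing by zero
            } else {
                Some((*key, *n as f64 / total as f64))
            }
        })
        .collect()
}
```

Normalizing per column makes the CPI distribution comparable across X-axis bins with very different sample counts, which is why the unnormalized totals are kept separately for the bar chart.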
CSV Output Format
Similar to concurrency analysis, output CSV files for each metric with columns:
- Process name
- X-axis bin start and end times
- CPI bin start and end
- Number of instructions for that tuple
CSV files generated in the finalize() method.
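A minimal sketch of writing one such file is shown below. The header names, the `HeatMapRow` struct, and `write_csv` are illustrative assumptions, not the existing concurrency analysis format; process names containing commas would need quoting, which this sketch omits.

```rust
use std::io::{self, Write};

/// Hypothetical row: one (process, X bin, CPI bin) tuple.
struct HeatMapRow {
    process: String,
    x_start_ns: u64,
    x_end_ns: u64,
    cpi_start: f64,
    cpi_end: f64,
    instructions: u64,
}

/// Write a header line followed by one CSV line per row.
fn write_csv<W: Write>(mut out: W, rows: &[HeatMapRow]) -> io::Result<()> {
    writeln!(
        out,
        "process,x_bin_start_ns,x_bin_end_ns,cpi_bin_start,cpi_bin_end,instructions"
    )?;
    for r in rows {
        writeln!(
            out,
            "{},{},{},{},{},{}",
            r.process, r.x_start_ns, r.x_end_ns, r.cpi_start, r.cpi_end, r.instructions
        )?;
    }
    Ok(())
}
```

Taking any `Write` implementation lets the same function target a file from finalize() or an in-memory buffer in tests.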
Expected Correlation
We expect to see:
- Higher CPI when per-CPU off-CPU time is high
- Higher CPI when global off-CPU time is high
- Higher CPI when other threads consumed more CPU time during off-CPU periods
- Strongest correlation when both off-CPU time and other-thread CPU usage are high
Acceptance Criteria
- CpuTimeCounter extracted to separate module
- CpuTimeTracker implemented with CPU time counter maintenance
- Off-CPU analysis module calculates both types of off-CPU time
- Off-CPU analysis tracks total CPU time of other threads during off-CPU periods
- Off-CPU analysis maintains per-PID last scheduled time per CPU
- Concurrency analysis refactored to use shared components
- Analysis outputs four new columns for off-CPU metrics
- CSV files generated in finalize() method for visualization
- Heat map and bar chart visualizations match concurrency analysis format
Files to Modify/Create
- crates/trace-analysis/src/cpu_time_counter.rs (new)
- crates/trace-analysis/src/cpu_time_tracker.rs (new)
- crates/trace-analysis/src/off_cpu_analysis.rs (new)
- crates/trace-analysis/src/concurrency_analysis.rs (refactor)
- crates/trace-analysis/src/lib.rs (add new modules)
Related Issues
This builds on the existing trace collection and concurrency analysis infrastructure to provide deeper insights into cache-related performance degradation.