

Add Off-CPU Time Analysis Module #241

@yonch

Description


Overview

Create a new analysis module that tracks off-CPU times for processes and correlates them with the effect of cache eviction on cycles per instruction (CPI). This analysis will help us understand how cache pollution from other processes affects performance when a process is rescheduled.

Background

When a process is not scheduled on a CPU, other threads have the opportunity to evict its data from CPU caches. We hypothesize that:

  • Longer off-CPU times lead to more cache eviction
  • More CPU work by other threads during off-CPU periods increases cache pollution
  • Both factors should correlate with higher CPI when the process resumes

Requirements

Two Types of Off-CPU Time Tracking

  1. Per-CPU Off-CPU Time: Time since the process was last scheduled on the same CPU ID

    • Zero for consecutive timer events without context switches
    • Tracks per-CPU scheduling history
  2. Global Off-CPU Time: Time since the process was last scheduled on any CPU core

    • Zero if the process is running in parallel on other cores when the event starts
    • Tracks overall process scheduling gaps
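A minimal sketch of how both off-CPU times could be derived from context-switch events. It assumes nanosecond timestamps, that "process" groups several threads (so it can run on more than one CPU at once), and that only events that begin with a switch-in carry a nonzero value; timer events with no intervening context switch would report zero. `PidSchedState`, `switch_in` and `switch_out` are hypothetical names, not existing code in crates/trace-analysis. `None` marks the case where there is no earlier history to measure against.

```rust
use std::collections::HashMap;

/// Hypothetical per-process scheduling state for off-CPU time; timestamps in ns.
#[derive(Default)]
struct PidSchedState {
    /// CPU id -> time the process last stopped running on that CPU.
    last_off_per_cpu: HashMap<u32, u64>,
    /// Time the process last stopped running on every CPU (became fully off-CPU).
    last_off_global: Option<u64>,
    /// Number of CPUs currently running a thread of this process.
    running_on: u32,
}

impl PidSchedState {
    /// A thread of the process is switched out of `cpu` at `ts`.
    fn switch_out(&mut self, cpu: u32, ts: u64) {
        self.last_off_per_cpu.insert(cpu, ts);
        self.running_on = self.running_on.saturating_sub(1);
        if self.running_on == 0 {
            // No CPU is running the process any more: a global gap begins.
            self.last_off_global = Some(ts);
        }
    }

    /// A thread of the process is switched onto `cpu` at `ts`.
    /// Returns (per_cpu_off_time_ns, global_off_time_ns); None means no history.
    fn switch_in(&mut self, cpu: u32, ts: u64) -> (Option<u64>, Option<u64>) {
        let per_cpu = self.last_off_per_cpu.get(&cpu).map(|&off| ts.saturating_sub(off));
        let global = if self.running_on > 0 {
            Some(0) // Already running on another core, so there is no global gap.
        } else {
            self.last_off_global.map(|off| ts.saturating_sub(off))
        };
        self.running_on += 1;
        (per_cpu, global)
    }
}
```

For example, a process that leaves CPU 0 at 200 ns, runs on CPU 1 from 250 ns to 300 ns, and returns to CPU 0 at 1000 ns sees a per-CPU off time of 800 ns but a global off time of only 700 ns.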

Additional Metrics

For each event, track:

  • Per-CPU off-CPU time
  • Global off-CPU time
  • Total CPU time consumed by other threads during the per-CPU off-CPU period
  • Total CPU time consumed by other threads during the global off-CPU period

Data Structures

New columns to output from the analysis:

  • per_cpu_off_time_ns
  • global_off_time_ns
  • per_cpu_other_cpu_time_ns
  • global_other_cpu_time_ns

Implementation Plan

1. Refactor CPU Time Counter

  • Extract CpuTimeCounter from crates/trace-analysis/src/concurrency_analysis.rs
  • Move to new module: crates/trace-analysis/src/cpu_time_counter.rs
  • Make it reusable for both concurrency and off-CPU analysis

2. Create CPU Time Tracker

  • New module: crates/trace-analysis/src/cpu_time_tracker.rs
  • Maintains:
    • Per-PID CPU time counters
    • Global CPU time counter for non-kernel threads
    • Per-CPU: which PID is currently scheduled
    • Per-CPU: last update timestamp
  • Takes over most of the functionality of the existing concurrency analysis
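One possible shape for CpuTimeTracker, assuming accounting is driven by sched_switch events and updated lazily per CPU. Per-PID counters are plain u64 nanosecond totals here rather than the extracted CpuTimeCounter, whose interface this issue does not show. PID 0 (the idle task) is the only thread excluded from the global counter; a real filter for kernel threads would need more than that. All names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical CpuTimeTracker sketch; all timestamps and totals are in ns.
#[derive(Default)]
struct CpuTimeTracker {
    /// PID -> total CPU time consumed so far.
    per_pid_ns: HashMap<u32, u64>,
    /// Total CPU time of non-idle threads (PID 0 is treated as idle here).
    global_ns: u64,
    /// CPU id -> PID currently scheduled there.
    current_pid: HashMap<u32, u32>,
    /// CPU id -> timestamp up to which that CPU has been accounted.
    last_update: HashMap<u32, u64>,
}

impl CpuTimeTracker {
    /// Credit time elapsed on `cpu` since its last update to the PID running there.
    fn advance_cpu(&mut self, cpu: u32, ts: u64) {
        let pid = self.current_pid.get(&cpu).copied();
        let last = self.last_update.get(&cpu).copied();
        if let (Some(pid), Some(last)) = (pid, last) {
            let dt = ts.saturating_sub(last);
            *self.per_pid_ns.entry(pid).or_insert(0) += dt;
            if pid != 0 {
                self.global_ns += dt;
            }
        }
        // Never move a CPU's clock backwards (guards against out-of-order events).
        if last.map_or(true, |l| ts > l) {
            self.last_update.insert(cpu, ts);
        }
    }

    /// Context switch on `cpu` at `ts`: account the outgoing PID, then install `next_pid`.
    fn sched_switch(&mut self, cpu: u32, ts: u64, next_pid: u32) {
        self.advance_cpu(cpu, ts);
        self.current_pid.insert(cpu, next_pid);
    }

    /// Bring every known CPU up to `ts`, so counters include all work before `ts`.
    fn flush(&mut self, ts: u64) {
        let cpus: Vec<u32> = self.last_update.keys().copied().collect();
        for cpu in cpus {
            self.advance_cpu(cpu, ts);
        }
    }

    /// CPU time consumed by `pid` as of each CPU's last update.
    fn pid_ns(&self, pid: u32) -> u64 {
        self.per_pid_ns.get(&pid).copied().unwrap_or(0)
    }
}
```

Lazy per-CPU updates keep each event O(1); `flush` is only needed when a consistent cross-CPU snapshot is required, for example at the start or end of an off-CPU period.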

3. Implement Off-CPU Analysis Module

  • New module: crates/trace-analysis/src/off_cpu_analysis.rs
  • Tracks per-PID last scheduled time on each CPU (module-specific state)
  • Uses CpuTimeTracker for CPU time counter functionality
  • Calculates off-CPU times and other-thread CPU usage for each event
  • Outputs the derived metrics and writes the CSV files in the finalize() method
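The module-specific state could be a map keyed by (PID, CPU) that records when the PID left that CPU, together with a snapshot of the other threads' cumulative CPU time at that moment (for example, the tracker's global counter minus the process's own counter). Since (G1 - P1) - (G0 - P0) = (G1 - G0) - (P1 - P0), both per-CPU metrics then fall out as simple differences at switch-in. This is a sketch with hypothetical names; the global variants would follow the same pattern keyed by PID alone.

```rust
use std::collections::HashMap;

/// Snapshot recorded when a PID leaves a CPU (hypothetical layout).
#[derive(Clone, Copy)]
struct Departure {
    /// When the PID was switched out (ns).
    ts: u64,
    /// Cumulative CPU time of all *other* threads at `ts` (ns),
    /// e.g. the tracker's global counter minus this process's own counter.
    other_cpu_ns: u64,
}

/// Module-specific state: last departure of each PID from each CPU.
#[derive(Default)]
struct OffCpuAnalysis {
    /// (pid, cpu) -> most recent departure.
    last_departure: HashMap<(u32, u32), Departure>,
}

impl OffCpuAnalysis {
    fn on_switch_out(&mut self, pid: u32, cpu: u32, ts: u64, other_cpu_ns: u64) {
        self.last_departure.insert((pid, cpu), Departure { ts, other_cpu_ns });
    }

    /// Returns (per_cpu_off_time_ns, per_cpu_other_cpu_time_ns) when `pid`
    /// returns to `cpu`, or None if it has not run on that CPU before.
    fn on_switch_in(&self, pid: u32, cpu: u32, ts: u64, other_cpu_ns: u64) -> Option<(u64, u64)> {
        self.last_departure.get(&(pid, cpu)).map(|d| {
            (ts.saturating_sub(d.ts), other_cpu_ns.saturating_sub(d.other_cpu_ns))
        })
    }
}
```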

4. Update Concurrency Analysis

  • Refactor to use shared CpuTimeCounter and CpuTimeTracker
  • Maintain existing functionality while using new shared components

Expected Visualizations

The analysis will generate heat maps similar to the concurrency analysis output, with four charts:

  1. Per-CPU Off-CPU Time vs CPI: X-axis binned by per-CPU off-CPU time values
  2. Global Off-CPU Time vs CPI: X-axis binned by global off-CPU time values
  3. Per-CPU Other CPU Time vs CPI: X-axis binned by CPU time consumed by other threads during per-CPU off-CPU periods
  4. Global Other CPU Time vs CPI: X-axis binned by CPU time consumed by other threads during global off-CPU periods
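This issue does not fix a binning scheme for the X axes. Off-CPU gaps can plausibly span several orders of magnitude, so logarithmic bins are one reasonable choice. The sketch below assumes power-of-two nanosecond bins, with 0 ns (no gap) in its own bin; this is not necessarily what the concurrency analysis does, and `log2_bin` is a hypothetical helper.

```rust
/// Hypothetical power-of-two binning for nanosecond values.
/// Returns the [start, end) bounds of the bin containing `ns`; 0 ns gets [0, 1).
fn log2_bin(ns: u64) -> (u64, u64) {
    if ns == 0 {
        return (0, 1);
    }
    let start = 1u64 << ns.ilog2(); // largest power of two <= ns
    // The top bin's end saturates at u64::MAX instead of overflowing.
    (start, start.saturating_mul(2))
}
```

For example, a 1,000 ns gap lands in the [512, 1024) bin, and the two bounds map directly onto the "X-axis bin start and end" CSV columns.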

Chart Specifications

  • Heat Maps: Each X-axis bin is normalized to sum to 1; the Y-axis shows the CPI distribution within that bin
  • Sample Weighting: Samples are weighted by their instruction count
  • Per-Process: Each process gets its own set of heat maps
  • Bar Charts: Below each heat map, a bar chart shows the instruction count per X-axis bin, indicating how much data supports each bin (statistical significance)
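The column normalization and instruction weighting can be sketched as follows. The sketch assumes samples have already been reduced to (x_bin, cpi_bin, instructions) triples for one process and one metric. Each X column's fractions sum to 1, and the column's instruction total is the value the bar chart below the heat map would plot. The function name and tuple layout are hypothetical.

```rust
use std::collections::BTreeMap;

/// Column-normalized, instruction-weighted heat map for one process and metric.
/// Input: (x_bin, cpi_bin, instructions) per sample.
/// Output: x_bin -> (total instructions in the column, cpi_bin -> fraction).
fn normalize_columns(samples: &[(u32, u32, u64)]) -> BTreeMap<u32, (u64, BTreeMap<u32, f64>)> {
    // Sum instructions per (x_bin, cpi_bin) cell.
    let mut cells: BTreeMap<u32, BTreeMap<u32, u64>> = BTreeMap::new();
    for &(x, cpi, instructions) in samples {
        if instructions == 0 {
            continue; // Zero-weight samples contribute nothing and could cause 0/0.
        }
        *cells.entry(x).or_default().entry(cpi).or_insert(0) += instructions;
    }
    // Divide each cell by its column total so every column sums to 1.
    cells
        .into_iter()
        .map(|(x, column)| {
            let total: u64 = column.values().sum();
            let fractions: BTreeMap<u32, f64> = column
                .into_iter()
                .map(|(cpi, n)| (cpi, n as f64 / total as f64))
                .collect();
            (x, (total, fractions))
        })
        .collect()
}
```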

CSV Output Format

As in the concurrency analysis, output one CSV file per metric with columns:

  • Process name
  • X-axis bin start and end times
  • CPI bin start and end
  • Number of instructions in that (process, X-axis bin, CPI bin) cell

CSV files are generated in the finalize() method.
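A sketch of the CSV serialization using only the standard library. The header names and the `CsvRow` struct are assumptions, not an existing format. Process names are quoted per RFC 4180 because command names can contain commas.

```rust
/// One aggregated heat-map cell (hypothetical field names).
struct CsvRow<'a> {
    process: &'a str,
    x_start_ns: u64,
    x_end_ns: u64,
    cpi_start: f64,
    cpi_end: f64,
    instructions: u64,
}

/// Quote a field per RFC 4180 when it contains a comma, quote, or line break.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Serialize rows with a header line; one such file per metric.
fn to_csv(rows: &[CsvRow<'_>]) -> String {
    let mut out = String::from(
        "process_name,x_bin_start_ns,x_bin_end_ns,cpi_bin_start,cpi_bin_end,instructions\n",
    );
    for r in rows {
        out.push_str(&format!(
            "{},{},{},{},{},{}\n",
            csv_field(r.process), r.x_start_ns, r.x_end_ns, r.cpi_start, r.cpi_end, r.instructions
        ));
    }
    out
}
```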

Expected Correlation

We expect to see:

  • Higher CPI when per-CPU off-CPU time is high
  • Higher CPI when global off-CPU time is high
  • Higher CPI when other threads consumed more CPU time during off-CPU periods
  • Strongest correlation when both off-CPU time and other-thread CPU usage are high

Acceptance Criteria

  • CpuTimeCounter extracted to separate module
  • CpuTimeTracker implemented with CPU time counter maintenance
  • Off-CPU analysis module calculates both types of off-CPU time
  • Off-CPU analysis tracks total CPU time of other threads during off-CPU periods
  • Off-CPU analysis maintains per-PID last scheduled time per CPU
  • Concurrency analysis refactored to use shared components
  • Analysis outputs four new columns for off-CPU metrics
  • CSV files generated in the finalize() method for visualization
  • Heat map and bar chart visualizations match concurrency analysis format

Files to Modify/Create

  • crates/trace-analysis/src/cpu_time_counter.rs (new)
  • crates/trace-analysis/src/cpu_time_tracker.rs (new)
  • crates/trace-analysis/src/off_cpu_analysis.rs (new)
  • crates/trace-analysis/src/concurrency_analysis.rs (refactor)
  • crates/trace-analysis/src/lib.rs (add new modules)

Related Issues

This builds on the existing trace collection and concurrency analysis infrastructure to provide deeper insights into cache-related performance degradation.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
