|
| 1 | +.. SPDX-License-Identifier: GPL-2.0 |
| 2 | +.. _vpa-dtl: |
| 3 | + |
| 4 | +=================================== |
| 5 | +DTL (Dispatch Trace Log) |
| 6 | +=================================== |
| 7 | + |
| 8 | +Athira Rajeev, 19 April 2025 |
| 9 | + |
| 10 | +.. contents:: |
| 11 | + :depth: 3 |
| 12 | + |
| 13 | + |
| 14 | +Basic overview |
| 15 | +============== |
| 16 | + |
| 17 | +The pseries Shared Processor Logical Partition(SPLPAR) machines can |
| 18 | +retrieve a log of dispatch and preempt events from the hypervisor |
| 19 | +using data from Disptach Trace Log(DTL) buffer. With this information, |
| 20 | +user can retrieve when and why each dispatch & preempt has occurred. |
| 21 | +The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters |
| 22 | +via perf. |
| 23 | + |
| 24 | +Infrastructure used |
| 25 | +=================== |
| 26 | + |
| 27 | +The VPA DTL PMU counters do not interrupt on overflow or generate any |
| 28 | +PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer |
| 29 | +nterval can be provided by user via sample_period field in nano seconds. |
| 30 | +vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch |
| 31 | +Trace Log) contains information about dispatch/preempt, enqueue time etc. |
| 32 | +We directly copy the DTL buffer data as part of auxiliary buffer and it |
| 33 | +will be processed later. This will avoid time taken to create samples |
| 34 | +in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL) |
| 35 | +entries makes use of AUX support in perf infrastructure. On the tools side, |
| 36 | +this data is made available as PERF_RECORD_AUXTRACE records. |
| 37 | + |
| 38 | +To correlate each DTL entry with other events across CPU's, an auxtrace_queue |
| 39 | +is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers. |
| 40 | +All auxtrace queues is maintained in auxtrace heap. The queues are sorted |
| 41 | +based on timestamp. When the different PERF_RECORD_XX records are processed, |
| 42 | +compare the timestamp of perf record with timestamp of top element in the |
| 43 | +auxtrace heap so that DTL events can be co-related with other events |
| 44 | +Process the auxtrace queue if the timestamp of element from heap is |
| 45 | +lower than timestamp from entry in perf record. Sometimes it could happen that |
| 46 | +one buffer is only partially processed. if the timestamp of occurrence of |
| 47 | +another event is more than currently processed element in the queue, it will |
| 48 | +move on to next perf record. So keep track of position of buffer to continue |
| 49 | +processing next time. Update the timestamp of the auxtrace heap with the timestamp |
| 50 | +of last processed entry from the auxtrace buffer. |
| 51 | + |
| 52 | +This infrastructure ensures dispatch trace log entries can be correlated |
| 53 | +and presented along with other events like sched. |
| 54 | + |
| 55 | +vpa-dtl PMU example usage |
| 56 | +========================= |
| 57 | + |
| 58 | +.. code-block:: sh |
| 59 | +
|
| 60 | + # ls /sys/devices/vpa_dtl/ |
| 61 | + events format perf_event_mux_interval_ms power subsystem type uevent |
| 62 | +
|
| 63 | +
|
| 64 | +To capture the DTL data using perf record: |
| 65 | +.. code-block:: sh |
| 66 | +
|
| 67 | + # ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1 |
| 68 | +
|
| 69 | +The result can be interpreted using perf record. Snippet of perf report -D |
| 70 | + |
| 71 | +.. code-block:: sh |
| 72 | +
|
| 73 | + # ./perf report -D |
| 74 | +
|
| 75 | +There are different PERF_RECORD_XX records. In that records corresponding to |
| 76 | +auxtrace buffers includes: |
| 77 | + |
| 78 | +1. PERF_RECORD_AUX |
| 79 | + Conveys that new data is available in AUX area |
| 80 | + |
| 81 | +2. PERF_RECORD_AUXTRACE_INFO |
| 82 | + Describes offset and size of auxtrace data in the buffers |
| 83 | + |
| 84 | +3. PERF_RECORD_AUXTRACE |
| 85 | + This is the record that defines the auxtrace data which here in case of |
| 86 | + vpa-dtl pmu is dispatch trace log data. |
| 87 | + |
| 88 | +Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump |
| 89 | + |
| 90 | +.. code-block:: sh |
| 91 | +
|
| 92 | +0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0 |
| 93 | +. |
| 94 | +. ... VPA DTL PMU data: size 1680 bytes, entries is 35 |
| 95 | +. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000 |
| 96 | +. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773 |
| 97 | +. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437 |
| 98 | +. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709 |
| 99 | +. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243 |
| 100 | +. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648 |
| 101 | +. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446 |
| 102 | +. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126 |
| 103 | +. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665 |
| 104 | + |
| 105 | +Above is representation of dtl entry of below format: |
| 106 | + |
| 107 | +struct dtl_entry { |
| 108 | + u8 dispatch_reason; |
| 109 | + u8 preempt_reason; |
| 110 | + u16 processor_id; |
| 111 | + u32 enqueue_to_dispatch_time; |
| 112 | + u32 ready_to_enqueue_time; |
| 113 | + u32 waiting_to_ready_time; |
| 114 | + u64 timebase; |
| 115 | + u64 fault_addr; |
| 116 | + u64 srr0; |
| 117 | + u64 srr1; |
| 118 | + |
| 119 | +}; |
| 120 | + |
| 121 | +First two fields represent the dispatch reason and preempt reason. The post |
| 122 | +processing of PERF_RECORD_AUXTRACE records will translate to meaningful data |
| 123 | +for user to consume. |
| 124 | + |
| 125 | +Visualize the dispatch trace log entries with perf report |
| 126 | +========================================================= |
| 127 | + |
| 128 | +.. code-block:: sh |
| 129 | +
|
| 130 | + # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1 |
| 131 | + [ perf record: Woken up 1 times to write data ] |
| 132 | + [ perf record: Captured and wrote 0.300 MB perf.data ] |
| 133 | +
|
| 134 | + # ./perf report |
| 135 | + # Samples: 321 of event 'vpa-dtl' |
| 136 | + # Event count (approx.): 321 |
| 137 | + # |
| 138 | + # Children Self Command Shared Object Symbol |
| 139 | + # ........ ........ ....... ................. .............................. |
| 140 | + # |
| 141 | + 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace |
| 142 | +
|
| 143 | +Visualize the dispatch trace log entries with perf script |
| 144 | +========================================================= |
| 145 | + |
| 146 | +.. code-block:: sh |
| 147 | +
|
| 148 | + # ./perf script |
| 149 | + migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009 |
| 150 | + migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10 |
| 151 | + migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns] |
| 152 | + migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120 |
| 153 | + :256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms]) |
| 154 | + :256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms]) |
| 155 | + perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns] |
| 156 | + perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010 |
0 commit comments