Describe the bug
I am currently evaluating the IPC performance of the Decoupled Front-End (FDP). I have encountered a significant performance degradation when running benchmarks (see the benchmark example below).
Comparing the stats.txt files from the two runs revealed a critical difference: with the Decoupled Front-End enabled, RAS (Return Address Stack) predictions are almost entirely incorrect. The RAS misprediction rate jumps from ~0% (FDP off) to ~99.9% (FDP on).
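For reference, the RAS counters can be compared directly between the two runs (the exact stat names vary across gem5 versions, so I just filter case-insensitively):

```sh
# Compare RAS-related counters between the FDP-on (test3) and FDP-off (test4) runs.
grep -i ras m5out/test3/stats.txt
grep -i ras m5out/test4/stats.txt
```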
To verify this, I disabled the RAS (see gem5 Modifications below) and re-ran both simulations. With the RAS disabled, the IPC results with the Decoupled Front-End enabled and disabled are very similar. This suggests a synchronization or state-update issue with the RAS when it operates within the Decoupled Front-End.
I am currently debugging the BAC (src/cpu/o3/bac.cc) stage, specifically investigating potential sequencing issues regarding RAS updates and squashes. I suspect there might be a race condition or logic error there, but I haven't been able to pinpoint the root cause yet. Any suggestions or insights on this specific logic would be greatly appreciated.
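In case it is useful, this is the kind of trace I have been collecting. The Branch debug flag covers the branch-predictor unit; I am assuming it also logs RAS push/pop/restore activity on this branch:

```sh
# Dump branch-predictor debug output to a separate file; narrow the window
# with --debug-start (ticks) to keep the log manageable.
./build/X86/gem5.opt --debug-flags=Branch --debug-start=1000000000 \
    --debug-file=branch.log -d m5out/test3 \
    configs/example/gem5_library/fdp-hello-stdlib.py --isa X86
```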
Affects version
Branch: release-staging-v25-1-0-0
gem5 Modifications
To disable RAS, I modified src/cpu/pred/BranchPredictor.py:

```python
from m5.params.null_params import NullSimObject
...

class BranchPredictor(SimObject):
    ...
    ras = Param.ReturnAddrStack(
        NullSimObject(), "Return address stack, set to NULL to disable RAS."
    )
    ...
```
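An equivalent approach, if you prefer not to patch BranchPredictor.py, would be to override the param from the config script. This is an untested sketch that assumes the script's processor variable and that the stdlib cores expose the wrapped O3 CPU as .core:

```python
# Hypothetical per-config override: null out the RAS on each core's
# branch predictor instead of changing the Param default globally.
from m5.params.null_params import NullSimObject

for core in processor.get_cores():
    # .core is the wrapped m5 O3 CPU; branchPred is its predictor param.
    core.core.branchPred.ras = NullSimObject()
```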
To Reproduce
Steps to reproduce the behavior. Please assume starting from a clean repository:
- Compile gem5 with:

```sh
scons build/X86/gem5.opt -j16 --linker=mold
```

- Compile the benchmark with:

```sh
g++ -static -O3 ./ksort_test.cc -o ./sort
```

- Execute the simulations (FDP enabled, then FDP disabled):

```sh
./build/X86/gem5.opt -d m5out/test3 configs/example/gem5_library/fdp-hello-stdlib.py --isa X86
./build/X86/gem5.opt -d m5out/test4 configs/example/gem5_library/fdp-hello-stdlib.py --isa X86 --disable-fdp
```

In configs/example/gem5_library/fdp-hello-stdlib.py, I modified the memory capacity and the workload, then ran an approximately two-minute simulation (~20M instructions):
```python
...
memory = SingleChannelDDR3_1600(size="2GiB")
...
board.set_se_binary_workload(
    BinaryResource("./sort"),
    arguments=["1111111"],
)
```

Expected behavior
IPC with FDP enabled should be comparable to IPC with FDP disabled, even when RAS is active.
Host Operating System
Debian 12
Host ISA
X86
Compiler used
clang-16