
cpu-o3: IPC performance degradation in decoupled front-end #2808

@ChoungJX

Description

Describe the bug
I am currently evaluating the IPC performance of the Decoupled Front-End and have encountered a significant performance degradation when running benchmarks (benchmark example).

Upon comparing the stats.txt files from the two runs, I noticed a critical difference: with the Decoupled Front-End enabled, the RAS (Return Address Stack) predictions are almost entirely incorrect. The RAS misprediction rate jumps from ~0% (FDP off) to ~99.9% (FDP on).
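For anyone reproducing the comparison, a small helper like the one below can diff the RAS-related counters between the two runs. The substring match is an assumption on my part: the exact stat names differ between gem5 versions, so this greps for "ras" rather than hard-coding names.

#!/usr/bin/env python3
"""Print RAS-related stats from two gem5 stats.txt files side by side."""
import sys

def ras_stats(path):
    stats = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            # stats.txt lines look like: <name> <value> ... # <description>
            if len(parts) >= 2 and "ras" in parts[0].lower():
                stats[parts[0]] = parts[1]
    return stats

fdp_on = ras_stats(sys.argv[1])    # e.g. m5out/test3/stats.txt
fdp_off = ras_stats(sys.argv[2])   # e.g. m5out/test4/stats.txt
for name in sorted(set(fdp_on) | set(fdp_off)):
    print(f"{name}: FDP on = {fdp_on.get(name, '-')}, FDP off = {fdp_off.get(name, '-')}")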

To verify this, I disabled the RAS and re-ran the simulations. With the RAS disabled, the IPC results for the Decoupled Front-End enabled and disabled cases became very similar. This suggests there may be a synchronization or state-update issue with the RAS when it operates within the Decoupled Front-End.

I am currently debugging the BAC stage (src/cpu/o3/bac.cc), specifically investigating potential sequencing issues between RAS updates and squashes. I suspect a race condition or logic error there, but I haven't been able to pinpoint the root cause yet. Any suggestions or insights on this specific logic would be greatly appreciated.
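To illustrate the failure mode I suspect (a toy model, not gem5 code; all names here are made up for illustration): a decoupled front-end runs far ahead of execute, so every call/return applied to the RAS on a wrong path must be rolled back when the branch squashes. A single missed or misordered restore leaves the stack permanently misaligned, which would make essentially every subsequent return prediction wrong, matching the ~99.9% misprediction rate above.

class ToyRAS:
    """Toy return address stack with per-branch checkpoints (illustrative only)."""

    def __init__(self):
        self.stack = []

    def push(self, ret_addr):
        # Called when the front-end fetches a call.
        self.stack.append(ret_addr)

    def pop(self):
        # Called when the front-end predicts a return target.
        return self.stack.pop() if self.stack else None

    def checkpoint(self):
        # Snapshot taken when a branch enters the in-flight window.
        return list(self.stack)

    def restore(self, snap):
        # Must run on every squash of instructions younger than the snapshot.
        self.stack = snap

ras = ToyRAS()
ras.push(0x100)            # correct-path call; stack: [0x100]
snap = ras.checkpoint()    # a branch is predicted and goes in flight
ras.push(0x200)            # wrong-path call fetched after that branch
# The branch resolves as mispredicted. If the squash handling skips or
# reorders the restore, the wrong-path entry survives:
# ras.restore(snap)        # <-- the missing/late restore
print(hex(ras.pop()))      # predicts 0x200; the real return target is 0x100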

Affects version
Branch: release-staging-v25-1-0-0

gem5 Modifications
To disable the RAS, I modified src/cpu/pred/BranchPredictor.py:

from m5.params.null_params import NullSimObject
...
class BranchPredictor(SimObject):
...
    # Default the ras parameter to the null SimObject so that no return
    # address stack is instantiated.
    ras = Param.ReturnAddrStack(
        NullSimObject(), "Return address stack, set to NULL to disable RAS."
    )
...

To Reproduce
Steps to reproduce the behavior. Please assume starting from a clean repository:

  1. Compile gem5 with command: scons build/X86/gem5.opt -j16 --linker=mold
  2. Compile benchmark with command: g++ -static -O3 ./ksort_test.cc -o ./sort
  3. Execute the simulation with:
./build/X86/gem5.opt -d m5out/test3 configs/example/gem5_library/fdp-hello-stdlib.py --isa X86

./build/X86/gem5.opt -d m5out/test4 configs/example/gem5_library/fdp-hello-stdlib.py --isa X86 --disable-fdp

For configs/example/gem5_library/fdp-hello-stdlib.py, I modified the memory capacity and the workload as shown below, then ran the simulation for about two minutes (~20M instructions).

...
memory = SingleChannelDDR3_1600(size="2GiB")
...
board.set_se_binary_workload(
    BinaryResource("./sort"),
    arguments=["1111111"],
)

Expected behavior
IPC with FDP enabled should be comparable to IPC with FDP disabled, even when RAS is active.

Host Operating System
Debian 12

Host ISA
X86

Compiler used
clang-16

Metadata

Labels: bug, cpu-o3 (gem5's Out-Of-Order CPU)
