I am working on a branch to add unit tests for FP8 Paged MQA Logits (deepgemm) kernels and update the benchmark script.
The goal is to prepare the existing setup for landing gfx1250 support.
However, I am getting an error and a numerical mismatch on gfx950:
ValueError: 'safe_chunks_per_cta_ptr' is not in list
and
> assert diff < 1e-3, f"{diff=}"
E AssertionError: diff=tensor(1., device='cuda:0', dtype=torch.float64)
E assert tensor(1., device='cuda:0', dtype=torch.float64) < 0.001
Since I didn't change the kernel functionality, I suspect this issue exists on main as well.
If I run the benchmark on main:
python bench_deepgemm_attention.py -B 1
I also get:
ValueError: 'safe_chunks_per_cta_ptr' is not in list
@sjfeng1999 Am I missing something here?