Problems with the description of block scheduling in the paper

1. My understanding is that on NVIDIA GPUs, blocks are scheduled onto SMs (CUs) using a [round-robin policy](https://www.cs.rochester.edu/~sree/fermi-tbs/fermi-tbs.html), so a kernel's blocks should be interleaved across all CUs, not confined to a few CUs as the figure shows.

2. Regarding the “**dispatch delay**” described in the first case in the figure: why can't the blocks simply wait for idle CUs?
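
To make my understanding in point 1 concrete, here is a minimal Python sketch of the round-robin placement I have in mind. It is only a simplified model, not the actual hardware scheduler: the function name `round_robin_assign` and the parameters `num_sms` and `max_resident_per_sm` are hypothetical, and I assume that every SM can hold the same fixed number of resident blocks.

```python
from collections import defaultdict


def round_robin_assign(num_blocks, num_sms, max_resident_per_sm):
    """Place the first wave of blocks on SMs in round-robin order.

    Simplified model (an assumption, not measured hardware behavior):
    block i goes to SM (i % num_sms) until every SM holds
    max_resident_per_sm blocks; the remaining blocks wait.
    """
    placement = defaultdict(list)
    capacity = num_sms * max_resident_per_sm
    first_wave = min(num_blocks, capacity)
    for block in range(first_wave):
        placement[block % num_sms].append(block)
    # Blocks that did not fit in the first wave wait for an SM to free up.
    waiting = list(range(first_wave, num_blocks))
    return dict(placement), waiting


placement, waiting = round_robin_assign(
    num_blocks=10, num_sms=4, max_resident_per_sm=2
)
for sm, blocks in sorted(placement.items()):
    print(f"SM {sm}: blocks {blocks}")
print(f"waiting: {waiting}")
```

Under these assumptions, 10 blocks on 4 SMs with 2 resident blocks each end up spread over all four SMs (blocks 0 and 4 on SM 0, blocks 1 and 5 on SM 1, and so on), with blocks 8 and 9 left waiting. That interleaving is what I expected to see in the figure.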

![image](https://user-images.githubusercontent.com/34492736/195246087-e3c80b0f-d214-467f-9b67-4848241a443a.png)
I would be grateful if you could reply!

