@@ -159,17 +159,20 @@ int main()
 /*
 Exercises:
 1) Write a kernel where each thread first computes its ID in a register.
-Within each group of 4 consecutive threads, threads should then share their
+Within each group of 4 consecutive threads, threads should then share their
 ID with all others, using shuffling. Write this kernel once with, once without
 cooperative groups, and confirm correctness via output.
-2) Launch a COOPERATIVE KERNEL and use grid-wide synchronization to make sure
+2) Launch a COOPERATIVE KERNEL and use grid-wide synchronization to make sure
 all threads in the entire grid are at the same point in the program. Can you
-think of any use cases for this?
-3) Write a simple program with the following tasks A, B, C, each with N threads.
-In A, each thread t should compute and store t*t in its output A_out[t]. In B,
-each thread t should compute A_out[N - t - 1] - t and store it in its output
-B_out[t]. In C, each thread t should compute B_out[N - t - 1] + 4 and store it
-in its output C_out[t]. Implement this once using one kernel for each task A,
+think of any use cases for this? Your device will need to support the attribute
+cudaDevAttrCooperativeLaunch for this, check if it has it before starting.
+3) Write a simple program with the following tasks A, B, C, each with N threads.
+In A, each thread t should compute and store t*t in its output A_out[t]. In B,
+each thread t should compute A_out[N - t - 1] - t and store it in its output
+B_out[t]. In C, each thread t should compute B_out[N - t - 1] + 4 and store it
+in its output C_out[t]. Implement this once using one kernel for each task A,
 and once with a single kernel that uses grid synchronization between tasks.
 In the single kernel, do you need additional threadfences and/or volatiles?
+Again, in order to do grid sync, your device will need to support the
+cudaDevAttrCooperativeLaunch attribute, check if it has it before starting.
 */
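The added lines ask the reader to check for cudaDevAttrCooperativeLaunch before attempting grid synchronization. A minimal sketch of what that check could look like is given here, followed by a small cooperative launch. The kernel name `gridSyncDemo`, the `counter` buffer, and the launch dimensions are hypothetical and chosen only for illustration; they are not part of the file under review. The grid is kept deliberately small because a cooperative launch requires all blocks to be resident on the device at once.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical demo kernel: each thread increments a counter, then all
// threads of the grid meet at a grid-wide barrier before one of them reads it.
__global__ void gridSyncDemo(int* counter)
{
    cg::grid_group grid = cg::this_grid();
    atomicAdd(counter, 1);
    grid.sync();  // grid-wide barrier; only valid in a cooperative launch
    if (grid.thread_rank() == 0)
        printf("Threads counted after grid sync: %d\n", *counter);
}

int main()
{
    int device = 0;
    cudaGetDevice(&device);

    // Query whether this device supports cooperative launches at all.
    int supportsCoop = 0;
    cudaDeviceGetAttribute(&supportsCoop, cudaDevAttrCooperativeLaunch, device);
    if (!supportsCoop)
    {
        printf("Device %d does not support cooperative launch, skipping.\n", device);
        return 0;
    }

    int* counter = nullptr;
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));

    // Assumed small launch configuration so that all blocks can be co-resident.
    dim3 gridDim(2), blockDim(64);
    void* args[] = { &counter };

    // Cooperative kernels are launched through the runtime API, not <<<...>>>.
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void*)gridSyncDemo, gridDim, blockDim, args, 0, 0);
    if (err != cudaSuccess)
        printf("Cooperative launch failed: %s\n", cudaGetErrorString(err));

    cudaDeviceSynchronize();
    cudaFree(counter);
    return 0;
}
```

If the launch succeeds, the printed count should match the total number of threads in the grid. Depending on the toolkit version, grid synchronization may need relocatable device code enabled at compile time (for example `-rdc=true` with nvcc); treat that as something to verify against the documentation for your installation.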