Background
On Nov. 1, there was consensus in the group about scheduling resource uploads & downloads at a particular point in the device's queue so that the CPU and GPU wouldn't be accessing the same resource at the same time. This is the best (only?) solution for resources which live across multiple frames.
Let's consider the other side of the issue: when information on the CPU is only necessary on the GPU for a single frame. One example of this is the Model View Projection matrices, which are consistent throughout a single frame, but will change from frame to frame.
There are a few possible models for these kinds of resources:
- **Resource Churn.** The application allocates and destroys a new resource each frame. Obviously (I think we can all agree) we don't want this.
- **Scheduled Uploads.** The application uses a single resource, and schedules an upload at the start of each frame.
- **Explicit Recycling.** Allocate *n* resources up front, and recycle them each frame. We can piggyback off the design of the swapchain here, because the swapchain includes *n* buffers, and it guarantees that, when you're recording commands to draw into a particular swapchain buffer, that buffer is not being accessed by the GPU. If we make an array of our own resources which parallels the *n* buffers in the swapchain, the parallel item in our own resource array is unused by the GPU at recording time, and is therefore free for the CPU to populate.
- **Implicit Recycling.** Just like above, a collection of resources is created, but the collection is owned by the implementation. Every resource lives inside the implementation in either a "free pool" or an "in-use pool." When an application asks for a resource, one is pulled from the free pool, or, if the pool doesn't contain anything compatible, a new one is created. The application then attaches this resource to its recorded commands and notifies the implementation when it's done recording with the resource. At that point, however, the resource isn't returned to the free pool; the implementation only returns it to the free pool once the GPU has finished with it. This way, any resource granted to the application is free to be immediately written into by the CPU. (Indeed, the resource-acquisition function may even accept an argument specifying the resource's initial contents.) In this option, the same number of resources is created and recycled as in option 3, but the application doesn't own the array.
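To make the implicit-recycling lifecycle concrete, here is a minimal sketch of option 4's pool bookkeeping. All names (`ImplicitPool`, `acquire`, `doneRecording`, `gpuFrameCompleted`) are illustrative assumptions, not part of any real or proposed WebGPU API; the point is only the three-state flow: free, handed to the application, and pending until the GPU finishes.

```typescript
// Hypothetical sketch of option 4 (implicit recycling). The implementation
// owns all resources; the application only acquires and releases them.

class Resource {
  constructor(public readonly id: number, public readonly size: number) {}
}

class ImplicitPool {
  private freePool: Resource[] = [];
  private pendingGpu: Resource[] = []; // recording done, GPU may still read
  private nextId = 0;

  // Pull a compatible resource from the free pool, or create a new one.
  // Anything handed out here is guaranteed not in use by the GPU, so the
  // CPU may write into it immediately.
  acquire(size: number): Resource {
    const i = this.freePool.findIndex(r => r.size >= size);
    if (i >= 0) return this.freePool.splice(i, 1)[0];
    return new Resource(this.nextId++, size);
  }

  // The application is done recording with the resource. It is NOT freed
  // yet, because the GPU has not executed those commands.
  doneRecording(r: Resource): void {
    this.pendingGpu.push(r);
  }

  // Called internally when the GPU finishes the frame's work; only now do
  // pending resources return to the free pool.
  gpuFrameCompleted(): void {
    this.freePool.push(...this.pendingGpu);
    this.pendingGpu = [];
  }
}
```

Note that a second `acquire` issued before `gpuFrameCompleted` yields a fresh resource rather than recycling the pending one, which is exactly the guarantee that keeps the CPU and GPU from touching the same memory.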
Recommendation
Option 4 (implicit recycling) is the most compatible with a Web API, and should be the model for WebGPU, for a few reasons.
Option 2 is a good start, but we can do better. In this model, the CPU-side memcpy() will occur on the GPU's timeline, taking time away from the GPU's execution. The other models allow for issuing the memcpy() during command recording, before ownership of the resource has been given to the GPU.
Option 3 improves upon option 2, but has the drawback of making application logic dependent on the number of buffers in the swapchain, which is another potential source of non-portability. In particular, neither Metal nor Vulkan (and maybe not Direct3D, I don't know) lets the application specify exactly how many buffers the swapchain contains. Vulkan allows the application to request a certain number, but the implementation may return a different number than requested. Metal never tells you how many buffers are in the swapchain; it just gives you the "next one." We'd like to avoid web authors hardcoding a constant number of resources into their applications because that happened to be how many buffers their local machine was using.
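The hardcoding hazard can be seen in a short sketch of option 3. The constant and names below are illustrative assumptions; the bug is that `SWAPCHAIN_BUFFER_COUNT` is a guess baked into the application, while the true count is chosen by the platform and may differ between machines.

```typescript
// Illustrative sketch of option 3 (explicit recycling). The hardcoded
// constant is exactly what we want to avoid: the platform, not the
// application, decides how many buffers the swapchain really has.
const SWAPCHAIN_BUFFER_COUNT = 3; // non-portable guess

interface UniformBuffer { data: Float32Array; }

// One application-owned uniform buffer per presumed swapchain buffer.
const uniformBuffers: UniformBuffer[] = Array.from(
  { length: SWAPCHAIN_BUFFER_COUNT },
  () => ({ data: new Float32Array(16) }) // e.g. one 4x4 MVP matrix
);

// Each frame, the application writes into the buffer paralleling the
// swapchain buffer being recorded, which is idle on the GPU. If the real
// swapchain has a different buffer count, this indexing scheme breaks.
function bufferForFrame(frameIndex: number): UniformBuffer {
  return uniformBuffers[frameIndex % SWAPCHAIN_BUFFER_COUNT];
}
```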
Option 4 has all the benefits of option 3, but has additional benefits:
- **Portability.** Application logic is insensitive to the number of buffers in the platform's swapchain.
- **Performance.** Letting the browser automatically recycle resources means the browser can improve the performance of poorly-written applications. Recycling should be automatic; developers shouldn't have to opt in to good performance.
- **Fingerprinting.** Exposing the nature of the platform's swapchain (as option 3 would) provides more entropy for fingerprinting; option 4 keeps it hidden.