Behold another data upload proposal! It comes from lengthy internal discussions with @austinEng (basically a simplification of his idea) and @kenrussell, with input from @kvark and @kainino0x (it's actually close to one of his very old proposals).
Assumptions
- All browsers are or will be multiprocess.
- To have a pit of success, buffers are either upload buffers, readback buffers, or device-local buffers. This means that `MAP_WRITE` can only be used with `COPY_SRC` and `MAP_READ` only with `COPY_DST`. Allowing mappable usages together with usages like `VERTEX` or `UNIFORM` means developers are likely to think "great, I can have all the usages at once", have it work OK on their Intel laptop but be much slower on discrete GPUs. Having all buffers mapped or mappable also adds more pressure to the OS, even on mobile. I think it's better to allow known universal fast paths (even if that prevents some last % of optimizations) and have an optional feature for UMA, or a more expert feature for reduced restrictions on mappable buffers. (See the sketch after this list for what the allowed usage combinations look like.)
- On a large number of systems it will be possible to synchronously create a shmem in the content process and send it to the GPU process, where it will be wrapped in a GPU resource. This is possible:
  - on D3D12 with `ID3D12Device3::OpenExistingHeapFromFileMapping`
  - on Metal with `MTLDevice newBufferWithBytesNoCopy:length:options:deallocator:`
  - on Vulkan with `VK_EXT_external_memory_host`
  - on Vulkan on Android with `AHardwareBuffer` magic when the extension is not supported.
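
To make the split concrete, here is a minimal sketch of the three kinds of buffers at creation time. The descriptor shape and `GPUBufferUsage` flag names are assumed from the WebGPU API sketches of the time, not something this proposal defines.

```ts
// Assumed WebGPU surface area (not part of this proposal):
declare const device: any;          // GPUDevice
declare const GPUBufferUsage: any;  // usage flag namespace

// Upload buffer: CPU writes into it, GPU copies out of it.
const upload = device.createBuffer({
  size: 1024,
  usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
});
// Readback buffer: GPU copies into it, CPU reads from it.
const readback = device.createBuffer({
  size: 1024,
  usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});
// Device-local buffer: never mapped, carries the "real" usages.
const vertices = device.createBuffer({
  size: 1024,
  usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.VERTEX,
});
// Under this proposal, something like MAP_WRITE | VERTEX is a validation error.
```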
Proposal
`mapReadAsync` stays the same. `MAP_WRITE` (resp. `MAP_READ`) is only allowed with `COPY_SRC` (resp. `COPY_DST`). `GPUBuffer` now includes the following methods for mapping-related things:
```webidl
partial interface GPUBuffer {
    ArrayBuffer mapWrite(unsigned long offset = 0, unsigned long size = 0);
    Promise<ArrayBuffer> mapReadAsync();
    void unmap();
};
```
Calling `mapWrite` puts the buffer in the "mapped for writing" state. It's a validation error (and JS exception?) to `mapWrite` overlapping ranges of the buffer (until the next `unmap`). As usual it's an error to call `GPUQueue.submit` with a buffer in the mapped state referenced in the `commands` argument.
`mapWrite` returns a new `ArrayBuffer` that will replace the content of that range of the buffer when `unmap` is called. There are also some restrictions: a buffer can only be mapped on one thread and has to be unmapped on that same thread (that's eww compared to native :/, maybe we can find something better).
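
To illustrate the intended flow, here is a minimal sketch of an upload done with the proposed `mapWrite`. Everything except `mapWrite`/`unmap` (buffer creation, usage flag names, command encoding, the `queue` object) is assumed WebGPU surface area from around that time, not part of this proposal.

```ts
// Assumed to exist from regular WebGPU setup (not part of this proposal):
declare const device: any;          // GPUDevice
declare const queue: any;           // GPUQueue
declare const GPUBufferUsage: any;  // usage flag namespace
declare const vertexData: Float32Array;
declare const vertexBuffer: any;    // device-local buffer with COPY_DST | VERTEX

// Upload buffer: MAP_WRITE is only allowed together with COPY_SRC.
const staging = device.createBuffer({
  size: vertexData.byteLength,
  usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
});

// Synchronously map the whole range, fill it, and unmap.
const mapped = staging.mapWrite(0, vertexData.byteLength);
new Float32Array(mapped).set(vertexData);
staging.unmap(); // the written range now becomes the buffer's content

// Copy from the upload buffer into the device-local vertex buffer.
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(staging, 0, vertexBuffer, 0, vertexData.byteLength);
queue.submit([encoder.finish()]);
```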
Possible implementation
An example implementation of this proposal could work as follows:
- On the first `mapWrite` after creation or after an `unmap` call, a shmem of the same size as the buffer is created and returned to JS. At the same time it is sent to the GPU process and wrapped in a GPU resource there that replaces the previous GPU resource associated with that `GPUBuffer`. (Replacing the resource is mostly OK because of the usage restrictions.)
- On unmap, a signal is sent to the GPU process that the data has been unmapped (for submit validation).

If the OS / driver doesn't support wrapping shmem in GPU resources, `unmap()` sends a list of regions from the shmem to update in the mapped buffer.
Imagine we are a really smart user agent that knows which sub-ranges of a buffer are in use on the client side; then we can skip creating a new shmem and reuse the existing one.
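
A very rough sketch of the content-process bookkeeping this implies is below. The `SharedMemory` type, `createSharedMemory`, and `sendToGpuProcess` are made-up placeholders for whatever shmem/IPC primitives a browser actually has; real code would also validate overlapping ranges and reuse shmems as discussed above.

```ts
// Hypothetical sketch of the content-process side of the implementation
// strategy above. All shmem/IPC primitives here are made-up placeholders.
interface SharedMemory {
  // Returns an ArrayBuffer view over [offset, offset + length) of the shmem.
  view(offset: number, length: number): ArrayBuffer;
}
declare function createSharedMemory(byteLength: number): SharedMemory;
declare function sendToGpuProcess(message: object): void;

class ClientGPUBuffer {
  private shmem: SharedMemory | null = null;

  constructor(readonly id: number, readonly byteLength: number) {}

  mapWrite(offset = 0, size = 0): ArrayBuffer {
    if (this.shmem === null) {
      // First mapWrite after creation or unmap: allocate a fresh shmem and
      // ask the GPU process to wrap it in a GPU resource that replaces the
      // previous resource backing this GPUBuffer.
      this.shmem = createSharedMemory(this.byteLength);
      sendToGpuProcess({ op: "replaceBacking", buffer: this.id, shmem: this.shmem });
    }
    const length = size === 0 ? this.byteLength - offset : size;
    // (Validation of overlapping mapped ranges is omitted here.)
    return this.shmem.view(offset, length);
  }

  unmap(): void {
    // Tell the GPU process the data is final so submits referencing this
    // buffer pass validation again. On drivers that can't wrap shmem, this
    // message would instead carry the list of regions to copy into the buffer.
    sendToGpuProcess({ op: "unmap", buffer: this.id });
    this.shmem = null; // the next mapWrite allocates a new shmem
  }
}
```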
Choices
- Does `mapWrite` allow specifying the full range, or only a subrange?
- Is the content of the buffer cleared after the first `mapWrite` call after creation or `unmap`, or is the buffer content preserved?
- More stuff I missed?
Comparison with writeBuffer
In the case where we can wrap shmem in a GPU resource, the number of copies for `writeBuffer` / `mapWrite` for JS and WASM is the following:
- JS `mapWrite`: `shmem/staging -> device-local`
- WASM `mapWrite`: `wasm -> shmem/staging -> device-local`
- JS `writeBuffer`: `data -> shmem/staging -> device-local`
- WASM `writeBuffer`: `data -> shmem/staging -> device-local`
When it's not possible to wrap shmem, the `shmem/staging` step becomes `shmem -> staging`, with one extra copy happening.
`mapWrite` in JS is the fastest path and it incurs a single copy after initialization of the data. All other paths need to copy from some already-initialized data somewhere into shmem and incur 2 copies. At some point I thought WASM `writeBuffer` would be better than WASM `mapWrite` but that turned out to not be the case.
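
Concretely, the single `wasm -> shmem/staging` copy on the WASM `mapWrite` path is just a copy from the WASM heap into the mapped `ArrayBuffer`; a short sketch, where `staging`, `wasmMemory`, `srcPtr`, and `byteLength` are assumed inputs:

```ts
// Assumed inputs for the sketch (not part of the proposal):
declare const staging: { mapWrite(offset?: number, size?: number): ArrayBuffer; unmap(): void };
declare const wasmMemory: WebAssembly.Memory;
declare const srcPtr: number;
declare const byteLength: number;

const mapped = staging.mapWrite(0, byteLength);
const src = new Uint8Array(wasmMemory.buffer, srcPtr, byteLength);
new Uint8Array(mapped).set(src); // wasm -> shmem/staging (the extra copy vs. JS)
staging.unmap();
// shmem/staging -> device-local happens in the subsequent copyBufferToBuffer.
```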
A reason for doing `writeBuffer` is simplicity, but I think that `mapWrite` is fairly understandable and can easily shim `writeBuffer` (more easily than `mapWriteAsync` for example).
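
For example, a naive shim of `writeBuffer` on top of the proposed `mapWrite` could look roughly like the sketch below. It allocates a throwaway staging buffer and does its own submit per call; a real shim would pool staging buffers and batch the copies with the app's submits. The destination buffer is assumed to have `COPY_DST`, and everything other than `mapWrite`/`unmap` is assumed WebGPU surface area.

```ts
// Naive writeBuffer shim on top of the proposed mapWrite.
declare const GPUBufferUsage: any; // usage flag namespace (assumed)

function writeBufferShim(device: any, queue: any, dst: any, dstOffset: number, data: ArrayBuffer): void {
  // Throwaway upload buffer; a real shim would pool and reuse these.
  const staging = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
  });
  new Uint8Array(staging.mapWrite(0, data.byteLength)).set(new Uint8Array(data));
  staging.unmap();

  // Record and submit the staging -> device-local copy.
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(staging, 0, dst, dstOffset, data.byteLength);
  queue.submit([encoder.finish()]);
}
```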