Synchronous non-blocking mapWrite #594

@Kangz

Behold another data upload proposal! It comes from lengthy internal discussions with @austinEng (it is basically a simplification of his idea) and @kenrussell, with input from @kvark and @kainino0x (it is actually close to one of his very old proposals).

Assumptions

  1. All browsers are or will be multiprocess.

  2. To have a pit of success, buffers are either upload buffers, readback buffers, or device-local buffers. This means that MAP_WRITE can only be combined with COPY_SRC, and MAP_READ only with COPY_DST (see the sketch after this list). Allowing mappable usages together with usages like VERTEX or UNIFORM means developers are likely to think "great, I can have all the usages at once", have it work OK on their Intel laptop, but be much slower on discrete GPUs. Having all buffers mapped or mappable also adds more pressure on the OS, even on mobile. I think it's better to allow known universal fast paths (even if that prevents some last-percent optimizations) and have an optional feature for UMA, or a more expert feature for reduced restrictions on mappable buffers.

  3. On a large number of systems it will be possible to synchronously create a shmem in the content process and send it to the GPU process, where it will be wrapped in a GPU resource. This is possible:

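As a rough illustration of assumption 2, here is how buffer creation would look under these restrictions. This is only a sketch assuming the usual createBuffer / GPUBufferUsage shape of the API; the sizes are arbitrary.

// Upload buffer: the CPU writes into it, the GPU only copies out of it.
const uploadBuffer = device.createBuffer({
    size: 1024,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
});

// Readback buffer: the GPU copies into it, the CPU reads from it.
const readbackBuffer = device.createBuffer({
    size: 1024,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
});

// Device-local buffer: never mappable, used directly by the GPU.
const vertexBuffer = device.createBuffer({
    size: 1024,
    usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

// Disallowed under this proposal: mappable usage combined with VERTEX or UNIFORM.
// device.createBuffer({ size: 1024, usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.VERTEX });
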
Proposal

mapReadAsync stays the same. MAP_WRITE (resp. MAP_READ) is only allowed with COPY_SRC (resp. COPY_DST). GPUBuffer now includes the following mapping-related methods:

partial interface GPUBuffer {
    ArrayBuffer mapWrite(unsigned long offset = 0, unsigned long size = 0);
    Promise<ArrayBuffer> mapReadAsync();

    void unmap();
};

Calling mapWrite puts the buffer in the "mapped for writing" state. It's a validation error (and a JS exception?) to mapWrite overlapping ranges of the buffer (until the next unmap). As usual, it's an error to call GPUQueue.submit with a buffer in the mapped state referenced by the commands argument.

mapWrite returns a new ArrayBuffer that will replace the content of that range of the buffer when unmap is called. There's also a restriction that a buffer can only be mapped on one thread and has to be unmapped on that same thread (that's eww compared to native :/, maybe we can find something better).
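
To make the intended usage concrete, here is a minimal sketch of an upload through mapWrite, written against the IDL above. The copyBufferToBuffer / queue calls are assumptions about the rest of the API, and deviceLocalBuffer is a hypothetical COPY_DST buffer.

// Upload buffer: MAP_WRITE is only allowed together with COPY_SRC.
const upload = device.createBuffer({
    size: 256,
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
});

// Synchronously get an ArrayBuffer for a sub-range; the buffer is now
// in the "mapped for writing" state.
const mapping = upload.mapWrite(0, 256);
new Float32Array(mapping).set([1, 2, 3, 4]);

// unmap commits the written range back into the buffer; after this the
// buffer may be referenced by submitted commands again.
upload.unmap();

// Copy from the upload buffer into a device-local buffer and submit.
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(upload, 0, deviceLocalBuffer, 0, 256);
queue.submit([encoder.finish()]);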

Possible implementation

An example implementation of this proposal could work as follows:

  • On the first mapWrite after creation or an unmap call, a shmem of the same size as the buffer is created and returned to JS. At the same time it is sent to the GPU process and wrapped in a GPU resource there that replaces the previous GPU resource associated with that GPUBuffer. (replacing the resource is mostly ok because of the usage restrictions)
  • On unmap, a signal is sent to the GPU process that the data has been unmapped (for submit validation).

If the OS / driver doesn't support wrapping shmem in GPU resources, unmap() sends a list of regions from the shmem to update in the mapped buffer.

If we are a really smart user agent that knows which sub-ranges of a buffer are in use on the client side, we can skip creating a new shmem and reuse the existing one (see the sketch below).
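
A rough sketch of the content-process side of such an implementation follows. The SharedMemory and GpuProcessChannel helpers are hypothetical placeholders for whatever shmem / IPC primitives the browser actually has, not real APIs.

// Hypothetical shmem / IPC primitives (placeholders, not real browser APIs).
interface SharedMemory {
    handle: unknown;
    view(offset: number, size: number): ArrayBuffer;
}
declare const SharedMemory: { create(size: number): SharedMemory };
interface GpuProcessChannel {
    send(message: object): void;
}

// Content-process bookkeeping for a single GPUBuffer.
class BufferMappingState {
    private shmem: SharedMemory | null = null;
    private mappedRanges: Array<{ offset: number; size: number }> = [];

    constructor(private size: number, private ipc: GpuProcessChannel) {}

    mapWrite(offset: number, size: number): ArrayBuffer {
        // Validation: ranges mapped since the last unmap must not overlap.
        for (const r of this.mappedRanges) {
            if (offset < r.offset + r.size && r.offset < offset + size) {
                throw new Error("overlapping mapWrite range");
            }
        }
        if (this.shmem === null) {
            // First mapWrite after creation or unmap: allocate a fresh shmem and
            // ask the GPU process to wrap it in a new GPU resource that replaces
            // the previous one.
            this.shmem = SharedMemory.create(this.size);
            this.ipc.send({ op: "replaceBacking", shmem: this.shmem.handle });
        }
        this.mappedRanges.push({ offset, size });
        return this.shmem.view(offset, size);
    }

    unmap(): void {
        // Signal the GPU process that the buffer is unmapped (for submit validation).
        // When shmem can't be wrapped in a GPU resource, the dirty ranges let the
        // GPU process copy shmem -> staging for just the touched regions.
        this.ipc.send({ op: "unmap", ranges: this.mappedRanges });
        this.mappedRanges = [];
        // A smarter user agent that tracks which sub-ranges are still in use could
        // keep this shmem around and reuse it instead of dropping it here.
        this.shmem = null;
    }
}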

Choices

  • Does mapWrite allow specifying the full range, or only a subrange?
  • Is the content of the buffer cleared on the first mapWrite call after creation or unmap, or is the buffer content preserved?
  • More stuff I missed?

Comparison with writeBuffer

In the case where we can wrap shmem in a GPU resource, the number of copies for writeBuffer / mapWrite for JS and WASM is the following:

  • JS mapWrite: shmem/staging -> device-local
  • WASM mapWrite: wasm -> shmem/staging -> device-local
  • JS writeBuffer: data -> shmem/staging -> device-local
  • WASM writeBuffer: data -> shmem/staging -> device-local

When it's not possible to wrap shmem, the shmem/staging step becomes shmem -> staging with one extra copy happening.

mapWrite in JS is the fastest path: it incurs a single copy after initialization of the data. All other paths need to copy from some already-initialized data somewhere into shmem and incur two copies. At some point I thought WASM writeBuffer would be better than WASM mapWrite, but that turned out not to be the case.

A reason for doing writeBuffer is simplicity, but I think mapWrite is fairly understandable and can easily shim writeBuffer (more easily than mapWriteAsync, for example).
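
For example, here is a minimal sketch of shimming writeBuffer on top of the proposed mapWrite. The writeBuffer signature is an assumption about the shape of that proposal, and a real shim would pool and reuse staging buffers instead of creating one per call.

// Sketch: writeBuffer(dst, dstOffset, data) implemented with mapWrite + a copy.
function writeBufferShim(
    device: GPUDevice,
    queue: GPUQueue,
    dst: GPUBuffer,            // destination buffer, must have COPY_DST usage
    dstOffset: number,
    data: ArrayBufferView
): void {
    // Staging buffer playing the role of this proposal's upload buffer.
    const staging = device.createBuffer({
        size: data.byteLength,
        usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
    });

    // Synchronous, non-blocking map; copy the caller's data into the mapping.
    const mapping = staging.mapWrite(0, data.byteLength);
    new Uint8Array(mapping).set(
        new Uint8Array(data.buffer, data.byteOffset, data.byteLength)
    );
    staging.unmap();

    // Copy staging -> destination on the GPU timeline.
    const encoder = device.createCommandEncoder();
    encoder.copyBufferToBuffer(staging, 0, dst, dstOffset, data.byteLength);
    queue.submit([encoder.finish()]);
}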
