Motivation
Modern graphics APIs provide query mechanisms for retrieving information about how a sequence of commands is processed on the GPU. They mainly support three types:
- Occlusion Query: Counts the number of samples that pass the depth/stencil tests, or reports whether any sample passes. It is used to determine visibility or to measure the area of geometry, for example in predicated rendering (#551).
- Pipeline Statistics Query: Counts various aspects of graphics or compute pipeline operation, such as the number of vertex shader invocations or the number of primitives processed by the clip stage. These statistics give a measure of the relative complexity of different parts of an application, which helps find bottlenecks during performance tuning.
- Timestamp Query: Gets timestamps generated by the device. It can be used to measure the execution time of commands on the GPU during performance tuning.
We would like WebGPU to have a mechanism for retrieving this information. Below is an investigation of how D3D12, Metal and Vulkan support these queries.
Native APIs
Native APIs Support
Query Types | D3D12 | Metal | Vulkan |
---|---|---|---|
Occlusion | Supported | macOS 10.11+, iOS 8+ | Binary occlusion: supported. Precise occlusion: `VkPhysicalDeviceFeatures.occlusionQueryPrecise == true`. Device coverage: 98.9% Windows, 97.3% Linux, 10.6% Android |
Pipeline Statistics | Supported | macOS 10.15+, no iOS | `VkPhysicalDeviceFeatures.pipelineStatisticsQuery == true`. Device coverage: 99.5% Windows, 99.5% Linux, 58.7% Android |
Timestamp | Supported | macOS 10.15+ | `VkQueueFamilyProperties.timestampValidBits != 0` |
- Binary occlusion query is supported by all native APIs. Precise occlusion query and pipeline statistics query are optional features on Vulkan and need to be enabled at device creation time.
- Pipeline statistics and timestamp queries are not available on Metal before macOS 10.15.
- iOS 10.3+ supports GPU time (`GPUStartTime` and `GPUEndTime`), but only for the whole command buffer.
- So we can expose binary occlusion query as a core feature and the other queries as extensions.
Query Object
A query object is a collection of a specific number of queries of a particular type.
- D3D12 and Vulkan manage all types of query objects in a similar way (`ID3D12QueryHeap` or `VkQueryPool`).
- Metal stores occlusion query results in `visibilityResultBuffer`. For pipeline statistics and timestamp queries, it stores the results in `MTLCounterSampleBuffer` (available on macOS 10.15+), similar to D3D12 and Vulkan.

The query objects on native APIs are created with a descriptor (`D3D12_QUERY_HEAP_DESC`, `MTLCounterSampleBufferDescriptor`, `VkQueryPoolCreateInfo`) that specifies the query type and query count. The exception is `visibilityResultBuffer` on Metal, which is an `MTLBuffer` set in the render pass descriptor when the render pass is created.
The query objects are passed as an argument to query operations and need to be destroyed explicitly, as in Vulkan.
Query Types
Query Types | D3D12 | Metal | Vulkan |
---|---|---|---|
Occlusion | `D3D12_QUERY_HEAP_TYPE_OCCLUSION` | `MTLVisibilityResultMode` | `VK_QUERY_TYPE_OCCLUSION` |
Pipeline Statistics | `D3D12_QUERY_HEAP_TYPE_PIPELINE_STATISTICS` | `MTLCommonCounterSetStatistic` | `VK_QUERY_TYPE_PIPELINE_STATISTICS` |
Timestamp | `D3D12_QUERY_HEAP_TYPE_TIMESTAMP` | `MTLCommonCounterSetTimestamp` | `VK_QUERY_TYPE_TIMESTAMP` |
Metal has no query type for occlusion queries. It uses `MTLVisibilityResultMode` instead and stores the query results directly in an `MTLBuffer`. The other queries have their own types for creating query objects on each backend.
Query Operations
Query Types | D3D12 `BeginQuery` | D3D12 `EndQuery` | Metal `setVisibilityResultMode` | Metal `sampleCountersInBuffer` | Vulkan `vkCmdBeginQuery` | Vulkan `vkCmdEndQuery` | Vulkan `vkCmdWriteTimestamp` |
---|---|---|---|---|---|---|---|
Occlusion | √ | √ | √ | | √ | √ | |
Pipeline Statistics | √ | √ | | √ | √ | √ | |
Timestamp | | √ | | √ | | | √ |
Occlusion Query:
- On Metal, a separate API named `setVisibilityResultMode` is called with `Boolean`/`Disabled` to begin/end a binary occlusion query (`Counting` for a precise occlusion query).
- D3D12 and Vulkan have begin and end operations. D3D12 selects binary or precise queries by calling `BeginQuery` with a query type of `D3D12_QUERY_TYPE_BINARY_OCCLUSION` or `D3D12_QUERY_TYPE_OCCLUSION`. Vulkan selects them through the control flags passed to `vkCmdBeginQuery`.
Pipeline Statistics Query:
- On Metal, a pipeline statistics query is performed by calling `sampleCountersInBuffer`, a new API on macOS 10.15+. It does NOT begin and end statistics over a range of commands like D3D12 and Vulkan. Instead, it collects statistics from the beginning of the render (or compute or blit) encoder up to the point where `sampleCountersInBuffer` is called.
- To implement pipeline statistics query on Metal, we can call `sampleCountersInBuffer` twice (once for Begin() and once for End()) inside a render (or compute or blit) encoder, and store the difference of the two query results in the result buffer.
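The begin/end emulation described for Metal can be sketched as follows. This is only an illustration of the subtraction step, not Metal code: `StatisticsSample` and its field names are hypothetical stand-ins for the cumulative counters that one `sampleCountersInBuffer` call would write, and in a real implementation the subtraction would more likely run on the GPU.

```typescript
// Hypothetical snapshot of cumulative pipeline counters, standing in for the
// values written by a single sampleCountersInBuffer call on Metal.
interface StatisticsSample {
  vertexInvocations: bigint;
  fragmentInvocations: bigint;
  computeInvocations: bigint;
}

// Emulate a Begin()/End() range: the statistics for the range are the
// counters sampled at End() minus the counters sampled at Begin().
function statisticsInRange(
  begin: StatisticsSample,
  end: StatisticsSample,
): StatisticsSample {
  return {
    vertexInvocations: end.vertexInvocations - begin.vertexInvocations,
    fragmentInvocations: end.fragmentInvocations - begin.fragmentInvocations,
    computeInvocations: end.computeInvocations - begin.computeInvocations,
  };
}

// Example values are made up for illustration.
const atBegin: StatisticsSample = {
  vertexInvocations: 120n,
  fragmentInvocations: 4000n,
  computeInvocations: 0n,
};
const atEnd: StatisticsSample = {
  vertexInvocations: 420n,
  fragmentInvocations: 9000n,
  computeInvocations: 0n,
};

const inRange = statisticsInRange(atBegin, atEnd);
console.log(inRange.vertexInvocations); // 300n
```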
Timestamp Query:
- Unlike occlusion and pipeline statistics queries, a timestamp query does NOT operate over a range. It writes a timestamp generated by the device to the query object.
- The meaning of the timestamp results from the native APIs is not clear. Timestamps use different units on D3D12 (GPU ticks), Metal (nanoseconds) and Vulkan (nanoseconds), and not all timestamps can be converted to a specific date; this is platform dependent.
- So it is better to have begin/end operations for timestamp query and expose a time delta, which may be more useful than a raw timestamp.
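To make the unit difference concrete, here is a minimal sketch of turning two D3D12 tick values into a time delta in nanoseconds. It assumes `frequencyHz` is the ticks-per-second value reported by `ID3D12CommandQueue::GetTimestampFrequency()`; the function names are hypothetical, and `bigint` is used because the raw values are 64-bit.

```typescript
const NANOSECONDS_PER_SECOND = 1000000000n;

// Convert a tick count to nanoseconds. frequencyHz is assumed to be the
// ticks-per-second value reported by the command queue.
function ticksToNanoseconds(ticks: bigint, frequencyHz: bigint): bigint {
  if (frequencyHz <= 0n) {
    throw new RangeError("timestamp frequency must be positive");
  }
  // Multiply first so the integer division does not discard precision early.
  return (ticks * NANOSECONDS_PER_SECOND) / frequencyHz;
}

// Time delta between two raw timestamps, expressed in nanoseconds.
function elapsedNanoseconds(
  beginTicks: bigint,
  endTicks: bigint,
  frequencyHz: bigint,
): bigint {
  return ticksToNanoseconds(endTicks - beginTicks, frequencyHz);
}

// 25,000 ticks at 25 MHz is one millisecond (made-up example values).
const elapsed = elapsedNanoseconds(1000n, 26000n, 25000000n);
console.log(elapsed); // 1000000n
```

On backends that already report nanoseconds, only the subtraction would be needed.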
These operations on native APIs have different scopes:
Query Types | D3D12 | Metal | Vulkan |
---|---|---|---|
Occlusion | Inside or outside render pass on Direct Command List | Inside render encoder | Inside or outside render pass on Graphics Queue |
Pipeline Statistics | Inside or outside render pass on Direct Command List | Inside render/compute/blit encoders | Inside or outside render pass on Graphics and Compute Queues |
Timestamp | Inside or outside render pass on Direct and Compute Command Lists | Inside render/compute/blit encoders | Inside or outside render pass on Graphics and Compute Queues |

On D3D12, pipeline statistics query is only supported on a Direct Command List, but `ID3D12GraphicsCommandList::Dispatch()` can still execute a compute shader on that command list.
Resolve Query Results
Query Types | D3D12 | Metal | Vulkan |
---|---|---|---|
Resolve APIs | `ResolveQueryData` | `resolveCounters` | `vkGetQueryPoolResults`, `vkCmdCopyQueryPoolResults` |
Binary Occlusion Result | Binary 0/1 resolved into a buffer | Non-zero or zero integer stored in buffer | Non-zero or zero integer resolved into a buffer |
Precise Occlusion Result | The number of samples passed depth and stencil tests | The number of samples passed depth and stencil tests | The number of samples passed scissor, exclusive scissor, sample mask, alpha to coverage, stencil, and depth tests |
Pipeline Statistics Result | `D3D12_QUERY_DATA_PIPELINE_STATISTICS` | `MTLCounterResultStatistic` | `VkQueryPipelineStatisticFlagBits` |
Timestamp Result | GPU ticks resolved into a buffer. Timestamp (in ns) = Timestamp (in ticks) * 10^9 / `ID3D12CommandQueue::GetTimestampFrequency()` | Nanoseconds resolved into a buffer | Nanoseconds resolved into a buffer |
- All native APIs support resolving the results from query objects into buffer memory. The destination buffer can be accessed by subsequent pipeline work, for example as the condition for predicated rendering.
- The resolve operation must be outside a render pass on D3D12 and Vulkan, and outside a render/compute encoder on Metal.
- The state or usage of the destination buffer must be `COPY_DEST` on D3D12, `MTLStorageModeShared` or `MTLStorageModePrivate` on Metal, and `UNIFORM_BUFFER` and `TRANSFER_DST` on Vulkan.
- The offset in the destination buffer must be a multiple of 8 bytes on D3D12 and Vulkan (if resolving results as 64-bit).
- For occlusion query, Vulkan specifies more tests in its spec, but these tests also affect the occlusion results on D3D12 and Metal. If the depth/stencil tests are disabled, the result is simply the area of the rasterized primitives.
- Query results are resolved as 32-bit or 64-bit unsigned integers on Vulkan, selected by a flag, and always as 64-bit unsigned integers on D3D12 and Metal.
- We cannot return the results buffer directly, because the raw query results from the native APIs need post-processing in a compute shader:
  - Unify the results of binary occlusion queries.
  - Compute the counters in pipeline statistics results, which differ across the three native APIs. We prefer to expose the parts they have in common.
  - Compute the difference of the two timestamp queries. The time delta may be negative because the timestamp counter may be reset after a long time on some platforms. We can suggest that users skip a time delta if it is negative.
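Two of the post-processing steps above are simple enough to sketch. The issue proposes doing this in a compute shader; the CPU-side TypeScript below only illustrates the logic, and the function names are hypothetical. Writing 0 for a negative delta is one possible way to mark an invalid result, assuming timestamps are already in nanoseconds.

```typescript
// Unify binary occlusion results: some backends report any non-zero integer
// for "visible", so map every non-zero raw value to 1.
function normalizeBinaryOcclusion(raw: bigint): bigint {
  return raw !== 0n ? 1n : 0n;
}

// Compute the delta of two timestamps (assumed to be in nanoseconds).
// A negative delta means the counter was reset between the two samples,
// so this sketch reports 0 as an "invalid, skip this result" marker.
function timestampDelta(beginNs: bigint, endNs: bigint): bigint {
  const delta = endNs - beginNs;
  return delta < 0n ? 0n : delta;
}

// Made-up raw values for illustration.
const rawOcclusion: bigint[] = [0n, 1n, 37n];
const unified = rawOcclusion.map(normalizeBinaryOcclusion);
console.log(unified); // [ 0n, 1n, 1n ]

console.log(timestampDelta(5000n, 7500n)); // 2500n
console.log(timestampDelta(9000n, 100n)); // 0n (counter reset)
```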
Proposal
Extensions
Add precise occlusion, pipeline statistics and timestamp queries in `GPUExtensionName`.
QuerySet
- Define `QuerySet` instead of individual query objects, because query objects (or sample buffers on Metal) can be allocated in a contiguous part of memory.
- Create and destroy `QuerySet` on `GPUDevice`.
- Set the query set in `GPURenderPassDescriptor` for occlusion query, because Metal requires `visibilityResultBuffer` in `MTLRenderPassDescriptor` at render pass creation time.
Begin/End Query
- Occlusion query only supports begin/end on the render pass encoder, without passing a query set, because the query set has already been set in the render pass descriptor.
- Pipeline statistics and timestamp queries support begin/end on both the render pass encoder and the compute pass encoder.
- We may need to perform different types of queries in the same render/compute pass encoder, so it is better to pass a query set to beginQuery/endQuery for pipeline statistics and timestamp queries.
Resolve Query
Retrieve query results from the query set. Users can read the results from buffer memory or consume the result buffer directly.
- Query results are resolved into a GPU buffer:
Query Types | Resolved Results |
---|---|
Binary Occlusion | 0/1 |
Precise Occlusion | The number of samples that pass the depth/stencil tests. |
Pipeline Statistics | The number of vertex shader invocations, primitives processed by the clip stage, primitives output by the clip stage, fragment shader invocations, compute shader invocations. |
Timestamp | Time delta in nanoseconds. 0 marks an invalid result, which should be skipped. |
- All results in the GPU buffer are stored as `GPUSize64`. The offset must be a multiple of 8 bytes.
- Add a new GPUBuffer usage for resolving queries, which avoids exposing more detailed information about buffer usage and can be reused for predicated or conditional rendering.
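As an illustration of the 64-bit layout and the 8-byte offset rule, the sketch below reads resolved results out of a plain `ArrayBuffer`, which stands in for the contents of a mapped result buffer. `readResolvedResults` is a hypothetical helper, not part of any proposed API.

```typescript
const RESULT_SIZE_BYTES = 8; // each resolved result is one 64-bit unsigned integer

// Read `count` 64-bit results starting at `byteOffset`. The ArrayBuffer is
// assumed to hold the contents of the buffer the queries were resolved into.
function readResolvedResults(
  data: ArrayBuffer,
  byteOffset: number,
  count: number,
): bigint[] {
  if (byteOffset < 0 || byteOffset % RESULT_SIZE_BYTES !== 0) {
    throw new RangeError("offset must be a non-negative multiple of 8 bytes");
  }
  if (count < 0 || byteOffset + count * RESULT_SIZE_BYTES > data.byteLength) {
    throw new RangeError("requested results exceed the buffer size");
  }
  return Array.from(new BigUint64Array(data, byteOffset, count));
}

// Simulated buffer holding four results (made-up values).
const mapped = new ArrayBuffer(4 * RESULT_SIZE_BYTES);
new BigUint64Array(mapped).set([0n, 1n, 57n, 1000000n]);

// Skip the first result and read the next two.
const results = readResolvedResults(mapped, 8, 2);
console.log(results); // [ 1n, 57n ]
```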