Investigation: Query API #614

@haoxli

Motivation

Modern graphics APIs provide query mechanisms for getting information about how a sequence of commands is processed on the GPU. They mainly support three query types:

  • Occlusion Query: Counts the number of samples that pass the depth/stencil tests, or reports whether any sample passed. It is used to determine visibility or even to measure the area of geometry, for example in predicated rendering (#551).
  • Pipeline Statistics Query: Counts various aspects of the operation of graphics or compute pipelines, such as the number of vertex shader invocations or the number of primitives processed by the clip stage. These statistics give a measure of the relative complexity of different parts of an application, which helps to find bottlenecks during performance tuning.
  • Timestamp Query: Gets timestamps generated by the device. It can be used to measure the execution time of commands on the GPU during performance tuning.

We would like WebGPU to have such a mechanism for getting this information. This investigation covers the support for these queries on D3D12, Metal and Vulkan.

Native APIs

Native APIs Support

Query Types | D3D12 | Metal | Vulkan
Occlusion | Supported | macOS 10.11+, iOS 8+ | Binary Occlusion: supported. Precise Occlusion: VkPhysicalDeviceFeatures.occlusionQueryPrecise == true (Device Coverage: 98.9% Windows, 97.3% Linux, 10.6% Android)
Pipeline Statistics | Supported | macOS 10.15+, no iOS | VkPhysicalDeviceFeatures.pipelineStatisticsQuery == true (Device Coverage: 99.5% Windows, 99.5% Linux, 58.7% Android)
Timestamp | Supported | macOS 10.15+ | VkQueueFamilyProperties.timestampValidBits != 0

  • Binary occlusion queries are supported in all native APIs. Precise occlusion queries and pipeline statistics queries are optional features on Vulkan and need to be enabled at device creation time.
  • Pipeline statistics and timestamp queries are only available on Metal from macOS 10.15.
  • iOS 10.3+ supports GPU time (GPUStartTime and GPUEndTime), but only for a whole command buffer.
  • So we can expose the binary occlusion query as a core feature and the other queries as extensions.

Query Object

A query object is a collection of a specific number of queries of a particular type.

Query objects in the native APIs are created from a descriptor (D3D12_QUERY_HEAP_DESC, MTLCounterSampleBufferDescriptor, VkQueryPoolCreateInfo) that specifies the query type and query count. The exception is visibilityResultBuffer on Metal, which is a MTLBuffer set in the render pass descriptor when the render pass is created.

Query objects are passed as an argument to query operations and need to be destroyed when they are no longer used, as in Vulkan.

Query Types

Query Types | D3D12 | Metal | Vulkan
Occlusion | D3D12_QUERY_HEAP_TYPE_OCCLUSION | MTLVisibilityResultMode | VK_QUERY_TYPE_OCCLUSION
Pipeline Statistics | D3D12_QUERY_HEAP_TYPE_PIPELINE_STATISTICS | MTLCommonCounterSetStatistic | VK_QUERY_TYPE_PIPELINE_STATISTICS
Timestamp | D3D12_QUERY_HEAP_TYPE_TIMESTAMP | MTLCommonCounterSetTimestamp | VK_QUERY_TYPE_TIMESTAMP

Metal has no query type for occlusion queries; it uses MTLVisibilityResultMode and stores the query results directly in a MTLBuffer. The other queries have a type that is used to create query objects on each backend.

Query Operations

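Query Types | D3D12 | Metal | Vulkan
Occlusion | BeginQuery / EndQuery | setVisibilityResultMode | vkCmdBeginQuery / vkCmdEndQuery
Pipeline Statistics | BeginQuery / EndQuery | sampleCountersInBuffer | vkCmdBeginQuery / vkCmdEndQuery
Timestamp | EndQuery | sampleCountersInBuffer | vkCmdWriteTimestamp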

Occlusion Query:

  • On Metal, a separate API named setVisibilityResultMode begins and ends a binary occlusion query with the Boolean and Disabled modes (the Counting mode is used for a precise occlusion query).
  • D3D12 and Vulkan have begin and end operations. D3D12 selects binary or precise queries through the query type passed to BeginQuery (D3D12_QUERY_TYPE_BINARY_OCCLUSION or D3D12_QUERY_TYPE_OCCLUSION); Vulkan selects them through the control flags passed to vkCmdBeginQuery.

Pipeline Statistics Query:

  • On Metal, a pipeline statistics query is performed by calling sampleCountersInBuffer, an API added in macOS 10.15. Unlike D3D12 and Vulkan, it does NOT begin and end statistics over a range of commands; it collects statistics from the beginning of the render (or compute or blit) encoder up to the point where sampleCountersInBuffer is called.
  • To implement pipeline statistics queries on Metal, we can call sampleCountersInBuffer twice inside a render (or compute or blit) encoder (once for Begin() and once for End()) and store the difference of the two results in the result buffer.
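
The two-sample approach can be sketched as follows. This is a minimal illustration in TypeScript, not Metal code: the CounterSample shape, its field names and diffSamples are assumptions made up for the example, and they do not reflect the actual layout of MTLCounterResultStatistic.

```typescript
// Hypothetical snapshot of cumulative counters, as one sampleCountersInBuffer
// call might produce. Field names are illustrative only.
interface CounterSample {
  vertexInvocations: number;
  fragmentInvocations: number;
  computeInvocations: number;
}

// Emulates a Begin()/End() range: subtract the sample taken at Begin()
// from the sample taken at End(), counter by counter.
function diffSamples(begin: CounterSample, end: CounterSample): CounterSample {
  return {
    vertexInvocations: end.vertexInvocations - begin.vertexInvocations,
    fragmentInvocations: end.fragmentInvocations - begin.fragmentInvocations,
    computeInvocations: end.computeInvocations - begin.computeInvocations,
  };
}

// Two cumulative samples taken inside the same encoder.
const atBegin: CounterSample = { vertexInvocations: 300, fragmentInvocations: 1200, computeInvocations: 0 };
const atEnd: CounterSample = { vertexInvocations: 900, fragmentInvocations: 5200, computeInvocations: 0 };

// Counters attributable to the commands between Begin() and End().
const inRange = diffSamples(atBegin, atEnd);
```

Only the difference would be written to the result buffer, so the work recorded before Begin() does not leak into the query result.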

Timestamp Query:

  • Unlike occlusion and pipeline statistics queries, a timestamp query does NOT operate over a range; it writes timestamps generated by the device to query objects.
  • The meaning of the timestamp values returned by the native APIs is not clear. The units differ between D3D12 (GPU ticks), Metal (nanoseconds) and Vulkan (nanoseconds), and whether a timestamp can be converted to a specific date is platform dependent.
  • So it is better to have begin/end operations for timestamp queries and expose a time delta, which may be more useful than a raw timestamp.

These operations on native APIs have different scopes:

Query Types | D3D12 | Metal | Vulkan
Occlusion | Inside or outside render pass on Direct Command List | Inside render encoder | Inside or outside render pass on Graphics Queue
Pipeline Statistics | Inside or outside render pass on Direct Command List | Inside render/compute/blit encoders | Inside or outside render pass on Graphics and Compute Queues
Timestamp | Inside or outside render pass on Direct and Compute Command Lists | Inside render/compute/blit encoders | Inside or outside render pass on Graphics and Compute Queues

On D3D12, pipeline statistics queries are only supported on a Direct Command List, but compute work can still be covered there because ID3D12GraphicsCommandList::Dispatch() executes compute shaders on that list.

Resolve Query Results

Query Types | D3D12 | Metal | Vulkan
Resolve APIs | ResolveQueryData | resolveCounters | vkGetQueryPoolResults, vkCmdCopyQueryPoolResults
Binary Occlusion Result | Binary 0/1 resolved into a buffer | Non-zero or zero integer stored in buffer | Non-zero or zero integer resolved into a buffer
Precise Occlusion Result | The number of samples passed depth and stencil tests | The number of samples passed depth and stencil tests | The number of samples passed scissor, exclusive scissor, sample mask, alpha to coverage, stencil, and depth tests
Pipeline Statistics Result | D3D12_QUERY_DATA_PIPELINE_STATISTICS | MTLCounterResultStatistic | VkQueryPipelineStatisticFlagBits
Timestamp Result | GPU ticks resolved into a buffer. Timestamp (in ns) = Timestamp (in ticks) * 10^9 / ID3D12CommandQueue::GetTimestampFrequency() | Nanoseconds resolved into a buffer | Nanoseconds resolved into a buffer

  • All native APIs support resolving the results from query objects into buffer memory. The destination buffer can be accessed by subsequent pipeline work, for example as the condition for predicated rendering.
  • The resolve operation must happen outside a render pass on D3D12 and Vulkan, and outside a render/compute encoder on Metal.
  • The destination buffer must be in the COPY_DEST state on D3D12, use MTLStorageModeShared or MTLStorageModePrivate on Metal, and have the UNIFORM_BUFFER and TRANSFER_DST usages on Vulkan.
  • The offset in the destination buffer must be a multiple of 8 bytes on D3D12 and Vulkan (if resolving results as 64-bit).
  • For occlusion queries, Vulkan specifies more tests in its spec, but these tests also affect the occlusion results on D3D12 and Metal. If the depth/stencil tests are disabled, the result is simply the area of the rasterized primitives.
  • Query results are resolved as 32-bit or 64-bit unsigned integers on Vulkan, depending on a flag, and are always resolved as 64-bit unsigned integers on D3D12 and Metal.
  • We cannot return the results buffer directly, because the raw query results from the native APIs need post-processing by a compute shader:
    • Unify the results of binary occlusion queries.
    • Compute the counters in the pipeline statistics results, which differ across the three native APIs; we prefer to expose the parts they have in common.
    • Compute the difference of the two timestamp queries. The time delta may be negative because the timestamp counter may be reset after a long time on some platforms. We can suggest that users skip a time delta if it is negative.
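
Two of these post-processing steps can be sketched on the CPU in TypeScript. This only illustrates the arithmetic; the issue proposes doing it in a compute shader. The function names are hypothetical, values are assumed to fit in a JavaScript number (below 2^53), and returning 0 for a negative delta is one possible way to mark an invalid result.

```typescript
// Unify binary occlusion results: D3D12 reports 0/1 while Metal and Vulkan
// report "zero or non-zero", so collapse every raw value to 0 or 1.
function normalizeBinaryOcclusion(raw: number): number {
  return raw !== 0 ? 1 : 0;
}

// Convert a begin/end pair of tick counts to a time delta in nanoseconds,
// following ns = ticks * 10^9 / frequency (the D3D12 conversion).
// Returns 0 when the delta is negative (for example after a counter reset)
// so that callers can skip it.
// Assumption: inputs are plain numbers below 2^53; real results are 64-bit.
function timestampDeltaNs(beginTicks: number, endTicks: number, ticksPerSecond: number): number {
  const deltaTicks = endTicks - beginTicks;
  if (deltaTicks < 0) {
    return 0;
  }
  return Math.round((deltaTicks * 1e9) / ticksPerSecond);
}

// 250 ticks at a 10 MHz timestamp frequency.
const deltaNs = timestampDeltaNs(1000, 1250, 10000000);
```

Computing the tick difference before converting keeps the intermediate product small, which matters when the raw timestamps themselves are large.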

Proposal

Extensions

Add precise occlusion, pipeline statistics and timestamp queries to GPUExtensionName.

QuerySet

  • Define QuerySet instead of individual query objects, because query objects (or sample buffers on Metal) can be allocated in a contiguous region of memory.
  • Create and destroy QuerySet on GPUDevice.
  • Set the query set in GPURenderPassDescriptor for occlusion queries, because Metal requires visibilityResultBuffer in MTLRenderPassDescriptor at render pass creation time.

Begin/End Query

  • Occlusion queries only support begin/end on the render pass encoder, without passing a query set, because the query set has already been set in the render pass descriptor.
  • Pipeline statistics and timestamp queries support begin/end on both the render pass encoder and the compute pass encoder.
  • We may need to perform different types of queries in the same render/compute pass encoder, so it is better to pass a query set to beginQuery/endQuery for pipeline statistics and timestamp queries.
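
A rough sketch of why passing the query set matters when two query types are active in one pass. Everything here is hypothetical: the issue does not define an IDL, so QuerySetSketch, PassEncoderSketch, the method signatures and the "one open query per set" validation rule are assumptions made only for illustration.

```typescript
// Hypothetical names throughout; this is not a proposed WebGPU interface.
type QueryType = "pipeline-statistics" | "timestamp";

interface QuerySetSketch {
  type: QueryType;
  count: number; // number of queries in the set
}

class PassEncoderSketch {
  // Query sets that currently have an open begin/end range.
  private open: QuerySetSketch[] = [];

  beginQuery(querySet: QuerySetSketch, index: number): void {
    if (index < 0 || index >= querySet.count) {
      throw new Error("query index out of range");
    }
    // Assumed rule: at most one open query per set at a time.
    if (this.open.indexOf(querySet) !== -1) {
      throw new Error("query set already has an open query in this pass");
    }
    this.open.push(querySet);
  }

  endQuery(querySet: QuerySetSketch): void {
    const position = this.open.indexOf(querySet);
    if (position === -1) {
      throw new Error("endQuery without a matching beginQuery");
    }
    this.open.splice(position, 1);
  }

  openQueryCount(): number {
    return this.open.length;
  }
}

const statistics: QuerySetSketch = { type: "pipeline-statistics", count: 4 };
const timestamps: QuerySetSketch = { type: "timestamp", count: 4 };

const pass = new PassEncoderSketch();
pass.beginQuery(statistics, 0);
pass.beginQuery(timestamps, 0); // a different query type in the same pass
pass.endQuery(timestamps);
pass.endQuery(statistics);
```

Because each call names its query set, the encoder can tell the two ranges apart even though they overlap.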

Resolve Query

Retrieve query results from a query set. Users can read the results from buffer memory or consume the result buffer directly.

  • Query results are resolved into a GPU buffer:

Query Types | Resolved Results
Binary Occlusion | 0/1
Precise Occlusion | The number of samples passed depth/stencil tests.
Pipeline Statistics | The number of vertex shader invocations, primitives processed by the clip stage, primitives output by the clip stage, fragment shader invocations, compute shader invocations.
Timestamp | Time delta in nanoseconds. 0 for invalid results which need to be skipped.

  • All results in the GPU buffer are stored as GPUSize64. The offset must be a multiple of 8 bytes.
  • Add a new GPUBuffer usage for resolving queries, which avoids exposing more detailed information about buffer usage and can be reused in predicated or conditional rendering.
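
Reading results back on the CPU could look roughly like the sketch below. The readQueryResult helper is hypothetical, the buffer is assumed to be a little-endian copy of the resolved data, and the value is assumed to fit in 53 bits so that it can be returned as a JavaScript number.

```typescript
// Reads one resolved query result (a 64-bit unsigned integer) from a CPU-side
// copy of the result buffer. Enforces the 8-byte offset rule from the proposal.
// Assumptions: little-endian layout, value below 2^53.
function readQueryResult(results: ArrayBuffer, byteOffset: number): number {
  if (byteOffset % 8 !== 0) {
    throw new Error("result offset must be a multiple of 8 bytes");
  }
  const view = new DataView(results);
  const low = view.getUint32(byteOffset, true);
  const high = view.getUint32(byteOffset + 4, true);
  return high * 4294967296 + low; // high * 2^32 + low
}

// Fake result buffer holding two results: 1 (binary occlusion) and
// 25000 (a time delta in nanoseconds).
const fakeResults = new ArrayBuffer(16);
const writer = new DataView(fakeResults);
writer.setUint32(0, 1, true);
writer.setUint32(8, 25000, true);

const occlusionResult = readQueryResult(fakeResults, 0);
const timeDeltaNs = readQueryResult(fakeResults, 8);
```

A timestamp result of 0 would be treated as invalid and skipped, matching the table above.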
