mirror of
https://github.com/status-im/Vulkan-Docs.git
synced 2025-02-11 13:57:01 +00:00
* Update release number to 92. Public Issues: * Move and modify valid usage statements dealing with pname:aspectMask in flink:vkCmdClearColorImage, flink:vkCmdClearDepthStencilImage, and slink:VkClearAttachment, so they are in places where all necessary information is available (public issue 529). * Fix math markup in <<textures-texel-anisotropic-filtering, Texel Anisotropic Filtering>> (public pull request 840). * Fix misspellings (public pull request 845). Internal Issues: * Add installation instructions and a Makefile "`chunked`" target for chunked HTML generation (internal issue 1352). * Fix pipeline mesh diagram style; also fix a minor bug in the classic pipeline diagram where vertex/index buffers wrongly fed into the vertex shader (internal issue 1436). * Make asciidoctor ERROR output raise an error, and don't suppress executed command output from CI make invocation (internal issue 1454). * Minor typo fixes and clarifications for `VK_NV_raytracing`. * Cleanup extension-specific properties ** Remove duplicated documentation for pname:maxDiscardRectangles, pname:pointClippingBehavior, and pname:maxVertexAttribDivisor (they shouldn't be documented with the other members of slink:VkPhysicalDeviceLimits at all). ** Remove duplicate anchor for pname:maxVertexAttribDivisor ** Consistently document stext:VkPhysicalDevice<Extension>PropertiesKHR *** Always document pname:sType/pname:pNext (was inconsistent before) *** Always mention chaining to slink:VkPhysicalDeviceProperties2 (and not as slink:VkPhysicalDeviceProperties2KHR) *** Always include Valid Usage statements last * Update Makefile 'checklinks' target and associated scripts, and fix markup problems identified by checkLinks.py, so that we can rely on the checklinks script as part of Gitlab CI.
444 lines
17 KiB
Plaintext
444 lines
17 KiB
Plaintext
include::meta/VK_NVX_device_generated_commands.txt[]
|
|
|
|
*Last Modified Date*::
|
|
2017-07-25
|
|
*Contributors*::
|
|
- Pierre Boudier, NVIDIA
|
|
- Christoph Kubisch, NVIDIA
|
|
- Mathias Schott, NVIDIA
|
|
- Jeff Bolz, NVIDIA
|
|
- Eric Werness, NVIDIA
|
|
- Detlef Roettger, NVIDIA
|
|
- Daniel Koch, NVIDIA
|
|
- Chris Hebert, NVIDIA
|
|
|
|
This extension allows the device to generate a number of critical commands
|
|
for command buffers.
|
|
|
|
When rendering a large number of objects, the device can be leveraged to
|
|
implement a number of critical functions, like updating matrices, or
|
|
implementing occlusion culling, frustum culling, front to back sorting, etc.
|
|
Implementing those on the device does not require any special extension,
|
|
since an application is free to define its own data structure, and just
|
|
process them using shaders.
|
|
|
|
However, if the application desires to quickly kick off the rendering of the
|
|
final stream of objects, then unextended Vulkan forces the application to
|
|
read back the processed stream and issue graphics command from the host.
|
|
For very large scenes, the synchronization overhead, and cost to generate
|
|
the command buffer can become the bottleneck.
|
|
This extension allows an application to generate a device side stream of
|
|
state changes and commands, and convert it efficiently into a command buffer
|
|
without having to read it back on the host.
|
|
|
|
Furthermore, it allows incremental changes to such command buffers by
|
|
manipulating only partial sections of a command stream -- for example
|
|
pipeline bindings.
|
|
Unextended Vulkan requires re-creation of entire command buffers in such
|
|
scenario, or updates synchronized on the host.
|
|
|
|
The intended usage for this extension is for the application to:
|
|
|
|
* create its objects as in unextended Vulkan
|
|
* create a slink:VkObjectTableNVX, and register the various Vulkan objects
|
|
that are needed to evaluate the input parameters.
|
|
* create a slink:VkIndirectCommandsLayoutNVX, which lists the
|
|
elink:VkIndirectCommandsTokenTypeNVX it wants to dynamically change as
|
|
atomic command sequence.
|
|
This step likely involves some internal device code compilation, since
|
|
the intent is for the GPU to generate the command buffer in the
|
|
pipeline.
|
|
* fill the input buffers with the data for each of the inputs it needs.
|
|
Each input is an array that will be filled with an index in the object
|
|
table, instead of using CPU pointers.
|
|
* set up a target secondary command buffer
|
|
* reserve command buffer space via flink:vkCmdReserveSpaceForCommandsNVX
|
|
in a target command buffer at the position you want the generated
|
|
commands to be executed.
|
|
* call flink:vkCmdProcessCommandsNVX to create the actual device commands
|
|
for all sequences based on the array contents into a provided target
|
|
command buffer.
|
|
* execute the target command buffer like a regular secondary command
|
|
buffer
|
|
|
|
For each draw/dispatch, the following can be specified:
|
|
|
|
* a different pipeline state object
|
|
* a number of descriptor sets, with dynamic offsets
|
|
* a number of vertex buffer bindings, with an optional dynamic offset
|
|
* a different index buffer, with an optional dynamic offset
|
|
|
|
Applications should: register a small number of objects, and use dynamic
|
|
offsets whenever possible.
|
|
|
|
While the GPU can be faster than a CPU to generate the commands, it may not
|
|
happen asynchronously, therefore the primary use-case is generating "`less`"
|
|
total work (occlusion culling, classification to use specialized shaders,
|
|
etc.).
|
|
|
|
=== New Object Types
|
|
|
|
* slink:VkObjectTableNVX
|
|
* slink:VkIndirectCommandsLayoutNVX
|
|
|
|
=== New Flag Types
|
|
|
|
* elink:VkIndirectCommandsLayoutUsageFlagsNVX
|
|
* elink:VkObjectEntryUsageFlagsNVX
|
|
|
|
=== New Enum Constants
|
|
|
|
Extending elink:VkStructureType:
|
|
|
|
** ename:VK_STRUCTURE_TYPE_OBJECT_TABLE_CREATE_INFO_NVX
|
|
** ename:VK_STRUCTURE_TYPE_INDIRECT_COMMANDS_LAYOUT_CREATE_INFO_NVX
|
|
** ename:VK_STRUCTURE_TYPE_CMD_PROCESS_COMMANDS_INFO_NVX
|
|
** ename:VK_STRUCTURE_TYPE_CMD_RESERVE_SPACE_FOR_COMMANDS_INFO_NVX
|
|
** ename:VK_STRUCTURE_TYPE_DEVICE_GENERATED_COMMANDS_LIMITS_NVX
|
|
** ename:VK_STRUCTURE_TYPE_DEVICE_GENERATED_COMMANDS_FEATURES_NVX
|
|
|
|
Extending elink:VkPipelineStageFlagBits:
|
|
|
|
** ename:VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX
|
|
|
|
Extending elink:VkAccessFlagBits:
|
|
|
|
** ename:VK_ACCESS_COMMAND_PROCESS_READ_BIT_NVX
|
|
** ename:VK_ACCESS_COMMAND_PROCESS_WRITE_BIT_NVX
|
|
|
|
=== New Enums
|
|
|
|
* elink:VkIndirectCommandsLayoutUsageFlagBitsNVX
|
|
* elink:VkIndirectCommandsTokenTypeNVX
|
|
* elink:VkObjectEntryUsageFlagBitsNVX
|
|
* elink:VkObjectEntryTypeNVX
|
|
|
|
=== New Structures
|
|
|
|
* slink:VkDeviceGeneratedCommandsFeaturesNVX
|
|
* slink:VkDeviceGeneratedCommandsLimitsNVX
|
|
* slink:VkIndirectCommandsTokenNVX
|
|
* slink:VkIndirectCommandsLayoutTokenNVX
|
|
* slink:VkIndirectCommandsLayoutCreateInfoNVX
|
|
* slink:VkCmdProcessCommandsInfoNVX
|
|
* slink:VkCmdReserveSpaceForCommandsInfoNVX
|
|
* slink:VkObjectTableCreateInfoNVX
|
|
* slink:VkObjectTableEntryNVX
|
|
* slink:VkObjectTablePipelineEntryNVX
|
|
* slink:VkObjectTableDescriptorSetEntryNVX
|
|
* slink:VkObjectTableVertexBufferEntryNVX
|
|
* slink:VkObjectTableIndexBufferEntryNVX
|
|
* slink:VkObjectTablePushConstantEntryNVX
|
|
|
|
=== New Functions
|
|
|
|
* flink:vkCmdProcessCommandsNVX
|
|
* flink:vkCmdReserveSpaceForCommandsNVX
|
|
* flink:vkCreateIndirectCommandsLayoutNVX
|
|
* flink:vkDestroyIndirectCommandsLayoutNVX
|
|
* flink:vkCreateObjectTableNVX
|
|
* flink:vkDestroyObjectTableNVX
|
|
* flink:vkRegisterObjectsNVX
|
|
* flink:vkUnregisterObjectsNVX
|
|
* flink:vkGetPhysicalDeviceGeneratedCommandsPropertiesNVX
|
|
|
|
=== Issues
|
|
|
|
1) How to name this extension ?
|
|
|
|
*RESOLVED*: `VK_NVX_device_generated_commands`
|
|
|
|
As usual, one of the hardest issues ;)
|
|
|
|
Alternatives: `VK_gpu_commands`, `VK_execute_commands`,
|
|
`VK_device_commands`, `VK_device_execute_commands`, `VK_device_execute`,
|
|
`VK_device_created_commands`, `VK_device_recorded_commands`,
|
|
`VK_device_generated_commands`
|
|
|
|
2) Should we use serial tokens or redundant sequence description?
|
|
|
|
Similarly to slink:VkPipeline, signatures have the most likelihood to be
|
|
cross-vendor adoptable.
|
|
They also benefit from being processable in parallel.
|
|
|
|
3) How to name sequence description
|
|
|
|
stext:ExecuteCommandSignature is a bit long.
|
|
Maybe just stext:ExecuteSignature, or actually more following Vulkan
|
|
nomenclature: slink:VkIndirectCommandsLayoutNVX.
|
|
|
|
4) Do we want to provide code:indirectCommands inputs with layout or at
|
|
code:indirectCommands time?
|
|
|
|
Separate layout from data as Vulkan does.
|
|
Provide full flexibilty for code:indirectCommands.
|
|
|
|
5) Should the input be provided as SoA or AoS?
|
|
|
|
It is desirable for the application to reuse the list of objects and render
|
|
them with some kind of an override.
|
|
This can be done by just selecting a different input for a push constant or
|
|
a descriptor set, if they are defined as independent arrays.
|
|
If the data was interleaved, this would not be as easily possible.
|
|
|
|
Allowing input divisors can also reduce the conservative command buffer
|
|
allocation.
|
|
|
|
6) How do we know the size of the GPU command buffer generated by
|
|
flink:vkCmdProcessCommandsNVX ?
|
|
|
|
pname:maxSequenceCount can give an upper estimate, even if the actual count
|
|
is sourced from the gpu buffer at (buffer, countOffset).
|
|
As such pname:maxSequenceCount must always be set correctly.
|
|
|
|
Developers are encouraged to make well use the
|
|
slink:VkIndirectCommandsLayoutNVX's ptext:pTokens[].divisor, as they allow
|
|
less conservative storage costs.
|
|
Especially pipeline changes on a per-draw basis can be costly memory wise.
|
|
|
|
7) How to deal with dynamic offsets in DescriptorSets?
|
|
|
|
Maybe additional token etext:VK_EXECUTE_DESCRIPTOR_SET_OFFSET_COMMAND_NVX
|
|
that works for a "`single dynamic buffer`" descriptor set and then use (32
|
|
bit tableEntry + 32bit offset)
|
|
|
|
added dynamicCount field, variable sized input
|
|
|
|
8) Should we allow updates to the object table, similar to DescriptorSet?
|
|
|
|
Desired yes, people may change "`material`" shaders and not want to recreate
|
|
the entire register table.
|
|
However the developer must ensure to not overwrite a registered objectIndex
|
|
while it is still being used.
|
|
|
|
9) Should we allow dynamic state changes?
|
|
|
|
Seems a bit excessive for "`per-draw`" type of scenario, but GPU could
|
|
partition work itself with viewport/scissor...
|
|
|
|
10) How do we allow re-using already "`filled`" code:indirectCommands
|
|
buffers?
|
|
|
|
just use a slink:VkCommandBuffer for the output, and it can be reused
|
|
easily.
|
|
|
|
11) How portable should such re-use be?
|
|
|
|
Same as secondary command buffer
|
|
|
|
12) Should sequenceOrdered be part of IndirectCommandsLayout or
|
|
flink:vkCmdProcessCommandsNVX?
|
|
|
|
Seems better for IndirectCommandsLayout, as that is when most heavy lifting
|
|
in terms of internal device code generation is done.
|
|
|
|
13) Under which conditions is flink:vkCmdProcessCommandsNVX legal?
|
|
|
|
Options:
|
|
|
|
a) on the host command buffer like a regular draw call
|
|
|
|
b) flink:vkCmdProcessCommandsNVX makes use slink:VkCommandBufferBeginInfo
|
|
and serves as flink:vkBeginCommandBuffer / flink:vkEndCommandBuffer
|
|
implicitly.
|
|
|
|
c) The pname:targetCommandbuffer must be inside the "`begin`" state already
|
|
at the moment of being passed.
|
|
This very likely suggests a new tlink:VkCommandBufferUsageFlags
|
|
etext:VK_COMMAND_BUFFER_USAGE_DEVICE_GENERATED_BIT.
|
|
|
|
d) The pname:targetCommandbuffer must reserve space via a new function.
|
|
|
|
used a) and d).
|
|
|
|
14) What if different pipelines have different DescriptorSetLayouts at a
|
|
certain set unit that mismatches in code:token.dynamicCount?
|
|
|
|
Considered legal, as long as the maximum dynamic count of all used
|
|
DescriptorSetLayouts is provided.
|
|
|
|
15) Should we add "`strides`" to input arrays, so that "`Array of
|
|
Structures`" type setups can be supported more easily?
|
|
|
|
Maybe provide a usage flag for packed tokens stream (all inputs from same
|
|
buffer, implicit stride).
|
|
|
|
No, given performance test was worse.
|
|
|
|
16) Should we allow re-using the target command buffer directly, without
|
|
need to reset command buffer?
|
|
|
|
YES: new api flink:vkCmdReserveSpaceForCommandsNVX.
|
|
|
|
17) Is flink:vkCmdProcessCommandsNVX copying the input data or referencing
|
|
it ?
|
|
|
|
There are multiple implementations possible:
|
|
|
|
* one could have some emulation code that parse the inputs, and generates
|
|
an output command buffer, therefore copying the inputs.
|
|
* one could just reference the inputs, and have the processing done in
|
|
pipe at execution time.
|
|
|
|
If the data is mandated to be copied, then it puts a penalty on
|
|
implementation that could process the inputs directly in pipe.
|
|
If the data is "`referenced`", then it allows both types of implementation
|
|
|
|
The inputs are "`referenced`", and should not be modified after the call to
|
|
flink:vkCmdProcessCommandsNVX and until after the rendering of the target
|
|
command buffer is finished.
|
|
|
|
18) Why is this +NVX+ and not +NV+?
|
|
|
|
To allow early experimentation and feedback.
|
|
We expect that a version with a refined design as multi-vendor variant will
|
|
follow up.
|
|
|
|
19) Should we make the availability for each token type a device limit?
|
|
|
|
Only distinguish between graphics/compute for now, further splitting up may
|
|
lead to too much fractioning.
|
|
|
|
20) When can the pname:objectTable be modified?
|
|
|
|
Similar to the other inputs for flink:vkCmdProcessCommandsNVX, only when all
|
|
device access via flink:vkCmdProcessCommandsNVX or execution of target
|
|
command buffer has completed can an object at a given objectIndex be
|
|
unregistered or re-registered again.
|
|
|
|
21) Which buffer usage flags are required for the buffers referenced by
|
|
flink:vkCmdProcessCommandsNVX
|
|
|
|
reuse existing ename:VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT
|
|
|
|
* slink:VkCmdProcessCommandsInfoNVX::pname:sequencesCountBuffer
|
|
* slink:VkCmdProcessCommandsInfoNVX::pname:sequencesIndexBuffer
|
|
* slink:VkIndirectCommandsTokenNVX::pname:buffer
|
|
|
|
22) In which pipeline stage do the device generated command expansion
|
|
happen?
|
|
|
|
flink:vkCmdProcessCommandsNVX is treated as if it occurs in a separate
|
|
logical pipeline from either graphics or compute, and that pipeline only
|
|
includes ename:VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, a new stage
|
|
ename:VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX, and
|
|
ename:VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT.
|
|
This new stage has two corresponding new access types,
|
|
ename:VK_ACCESS_COMMAND_PROCESS_READ_BIT_NVX and
|
|
ename:VK_ACCESS_COMMAND_PROCESS_WRITE_BIT_NVX, used to synchronize reading
|
|
the buffer inputs and writing the command buffer memory output.
|
|
The output written in the target command buffer is considered to be consumed
|
|
by the ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT pipeline stage.
|
|
|
|
Thus, to synchronize from writing the input buffers to executing
|
|
flink:vkCmdProcessCommandsNVX, use:
|
|
|
|
* pname:dstStageMask = ename:VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX
|
|
* pname:dstAccessMask = ename:VK_ACCESS_COMMAND_PROCESS_READ_BIT_NVX
|
|
|
|
To synchronize from executing flink:vkCmdProcessCommandsNVX to executing the
|
|
generated commands, use
|
|
|
|
* pname:srcStageMask = ename:VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX
|
|
* pname:srcAccessMask = ename:VK_ACCESS_COMMAND_PROCESS_WRITE_BIT_NVX
|
|
* pname:dstStageMask = ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT
|
|
* pname:dstAccessMask = ename:VK_ACCESS_INDIRECT_COMMAND_READ_BIT
|
|
|
|
When flink:vkCmdProcessCommandsNVX is used with a pname:targetCommandBuffer
|
|
of `NULL`, the generated commands are immediately executed and there is
|
|
implicit synchronization between generation and execution.
|
|
|
|
23) What if most token data is "`static`", but we frequently want to render
|
|
a subsection?
|
|
|
|
added "`sequencesIndexBuffer`".
|
|
This allows to easier sort and filter what should actually be processed.
|
|
|
|
=== Example Code
|
|
|
|
Open-Source samples illustrating the usage of the extension can be found at
|
|
the following locations:
|
|
|
|
https://github.com/nvpro-samples/gl_vk_threaded_cadscene/blob/master/doc/vulkan_nvxdevicegenerated.md
|
|
|
|
https://github.com/NVIDIAGameWorks/GraphicsSamples/tree/master/samples/vk10-kepler/BasicDeviceGeneratedCommandsVk
|
|
|
|
[source,c]
|
|
---------------------------------------------------
|
|
|
|
// setup secondary command buffer
|
|
vkBeginCommandBuffer(generatedCmdBuffer, &beginInfo);
|
|
... setup its state as usual
|
|
|
|
// insert the reservation (there can only be one per command buffer)
|
|
// where the generated calls should be filled into
|
|
VkCmdReserveSpaceForCommandsInfoNVX reserveInfo = { VK_STRUCTURE_TYPE_CMD_RESERVE_SPACE_FOR_COMMANDS_INFO_NVX };
|
|
reserveInfo.objectTable = objectTable;
|
|
reserveInfo.indirectCommandsLayout = deviceGeneratedLayout;
|
|
reserveInfo.maxSequencesCount = myCount;
|
|
vkCmdReserveSpaceForCommandsNVX(generatedCmdBuffer, &reserveInfo);
|
|
|
|
vkEndCommandBuffer(generatedCmdBuffer);
|
|
|
|
// trigger the generation at some point in another primary command buffer
|
|
VkCmdProcessCommandsInfoNVX processInfo = { VK_STRUCTURE_TYPE_CMD_PROCESS_COMMANDS_INFO_NVX };
|
|
processInfo.objectTable = objectTable;
|
|
processInfo.indirectCommandsLayout = deviceGeneratedLayout;
|
|
processInfo.maxSequencesCount = myCount;
|
|
// set the target of the generation (if null we would directly execute with mainCmd)
|
|
processInfo.targetCommandBuffer = generatedCmdBuffer;
|
|
// provide input data
|
|
processInfo.indirectCommandsTokenCount = 3;
|
|
processInfo.pIndirectCommandsTokens = myTokens;
|
|
|
|
// If you modify the input buffer data referenced by VkCmdProcessCommandsInfoNVX,
|
|
// ensure you have added the appropriate barriers prior generation process.
|
|
// When regenerating the content of the same reserved space, ensure prior operations have completed
|
|
|
|
VkMemoryBarrier memoryBarrier = { VK_STRUCTURE_TYPE_MEMORY_BARRIER };
|
|
memoryBarrier.srcAccessMask = ...;
|
|
memoryBarrier.dstAccessMask = VK_ACCESS_COMMAND_PROCESS_READ_BIT_NVX;
|
|
|
|
vkCmdPipelineBarrier(mainCmd,
|
|
/*srcStageMask*/VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
|
|
/*dstStageMask*/VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX,
|
|
/*dependencyFlags*/0,
|
|
/*memoryBarrierCount*/1,
|
|
/*pMemoryBarriers*/&memoryBarrier,
|
|
...);
|
|
|
|
vkCmdProcessCommandsNVX(mainCmd, &processInfo);
|
|
...
|
|
// execute the secondary command buffer and ensure the processing that modifies command-buffer content
|
|
// has completed
|
|
|
|
memoryBarrier.srcAccessMask = VK_ACCESS_COMMAND_PROCESS_WRITE_BIT_NVX;
|
|
memoryBarrier.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;
|
|
|
|
vkCmdPipelineBarrier(mainCmd,
|
|
/*srcStageMask*/VK_PIPELINE_STAGE_COMMAND_PROCESS_BIT_NVX,
|
|
/*dstStageMask*/VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,
|
|
/*dependencyFlags*/0,
|
|
/*memoryBarrierCount*/1,
|
|
/*pMemoryBarriers*/&memoryBarrier,
|
|
...)
|
|
vkCmdExecuteCommands(mainCmd, 1, &generatedCmdBuffer);
|
|
|
|
---------------------------------------------------
|
|
|
|
=== Version History
|
|
|
|
* Revision 3, 2017-07-25 (Chris Hebert)
|
|
- Correction to specification of dynamicCount for push_constant token in
|
|
VkIndirectCommandsLayoutNVX.
|
|
Stride was incorrectly computed as dynamicCount was not treated as byte
|
|
size.
|
|
* Revision 2, 2017-06-01 (Christoph Kubisch)
|
|
- header compatibility break: add missing _TYPE to
|
|
VkIndirectCommandsTokenTypeNVX and VkObjectEntryTypeNVX enums to follow
|
|
Vulkan naming convention
|
|
- behavior clarification: only allow a single work provoking token per
|
|
sequence when creating a slink:VkIndirectCommandsLayoutNVX
|
|
* Revision 1, 2016-10-31 (Christoph Kubisch)
|
|
- Initial draft
|