// Copyright (c) 2015-2019 Khronos Group. This work is licensed under a
// Creative Commons Attribution 4.0 International License; see
// http://creativecommons.org/licenses/by/4.0/

[[shaders]]
= Shaders

A shader specifies programmable operations that execute for each vertex,
control point, tessellated vertex, primitive, fragment, or workgroup in the
corresponding stage(s) of the graphics and compute pipelines.

Graphics pipelines include vertex shader execution as a result of
<<drawing,primitive assembly>>, followed, if enabled, by tessellation
control and evaluation shaders operating on <<drawing-patch-lists,patches>>,
geometry shaders, if enabled, operating on primitives, and fragment shaders,
if present, operating on fragments generated by <<primsrast,Rasterization>>.
In this specification, vertex, tessellation control, tessellation evaluation
and geometry shaders are collectively referred to as vertex processing
stages and occur in the logical pipeline before rasterization.
The fragment shader occurs logically after rasterization.

Only the compute shader stage is included in a compute pipeline.
Compute shaders operate on compute invocations in a workgroup.

Shaders can: read from input variables, and read from and write to output
variables.
Input and output variables can: be used to transfer data between shader
stages, or to allow the shader to interact with values that exist in the
execution environment.
Similarly, the execution environment provides constants that describe
capabilities.

Shader variables are associated with execution environment-provided inputs
and outputs using _built-in_ decorations in the shader.
The available decorations for each stage are documented in the following
subsections.


[[shader-modules]]
== Shader Modules

[open,refpage='VkShaderModule',desc='Opaque handle to a shader module object',type='handles']
--

_Shader modules_ contain _shader code_ and one or more entry points.
Shaders are selected from a shader module by specifying an entry point as
part of <<pipelines,pipeline>> creation.
The stages of a pipeline can: use shaders that come from different modules.
The shader code defining a shader module must: be in the SPIR-V format, as
described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix.

Shader modules are represented by sname:VkShaderModule handles:

include::{generated}/api/handles/VkShaderModule.txt[]

--

[open,refpage='vkCreateShaderModule',desc='Creates a new shader module object',type='protos']
--

To create a shader module, call:

include::{generated}/api/protos/vkCreateShaderModule.txt[]

  * pname:device is the logical device that creates the shader module.
  * pname:pCreateInfo is a pointer to an instance of the
    sname:VkShaderModuleCreateInfo structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pShaderModule points to a slink:VkShaderModule handle in which the
    resulting shader module object is returned.

Once a shader module has been created, any entry points it contains can: be
used in pipeline shader stages as described in <<pipelines-compute,Compute
Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.

ifdef::VK_NV_glsl_shader[]
If the shader stage fails to compile, ename:VK_ERROR_INVALID_SHADER_NV will
be generated and the compile log will be reported back to the application by
`<<VK_EXT_debug_report>>` if enabled.
endif::VK_NV_glsl_shader[]

include::{generated}/validity/protos/vkCreateShaderModule.txt[]
--

[open,refpage='VkShaderModuleCreateInfo',desc='Structure specifying parameters of a newly created shader module',type='structs']
--

The sname:VkShaderModuleCreateInfo structure is defined as:

include::{generated}/api/structs/VkShaderModuleCreateInfo.txt[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to an extension-specific structure.
  * pname:flags is reserved for future use.
  * pname:codeSize is the size, in bytes, of the code pointed to by
    pname:pCode.
  * pname:pCode points to code that is used to create the shader module.
    The type and format of the code is determined from the content of the
    memory addressed by pname:pCode.

.Valid Usage
****
  * [[VUID-VkShaderModuleCreateInfo-codeSize-01085]]
    pname:codeSize must: be greater than 0
ifndef::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-codeSize-01086]]
    pname:codeSize must: be a multiple of 4
  * [[VUID-VkShaderModuleCreateInfo-pCode-01087]]
    pname:pCode must: point to valid SPIR-V code, formatted and packed as
    described by the <<spirv-spec,Khronos SPIR-V Specification>>
  * [[VUID-VkShaderModuleCreateInfo-pCode-01088]]
    pname:pCode must: adhere to the validation rules described by the
    <<spirvenv-module-validation, Validation Rules within a Module>> section
    of the <<spirvenv-capabilities,SPIR-V Environment>> appendix
endif::VK_NV_glsl_shader[]
ifdef::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-pCode-01376]]
    If pname:pCode points to SPIR-V code, pname:codeSize must: be a multiple
    of 4
  * [[VUID-VkShaderModuleCreateInfo-pCode-01377]]
    pname:pCode must: point to either valid SPIR-V code, formatted and
    packed as described by the <<spirv-spec,Khronos SPIR-V Specification>>
    or valid GLSL code which must: be written to the `GL_KHR_vulkan_glsl`
    extension specification
  * [[VUID-VkShaderModuleCreateInfo-pCode-01378]]
    If pname:pCode points to SPIR-V code, that code must: adhere to the
    validation rules described by the <<spirvenv-module-validation,
    Validation Rules within a Module>> section of the
    <<spirvenv-capabilities,SPIR-V Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-01379]]
    If pname:pCode points to GLSL code, it must: be valid GLSL code written
    to the `GL_KHR_vulkan_glsl` GLSL extension specification
endif::VK_NV_glsl_shader[]
  * [[VUID-VkShaderModuleCreateInfo-pCode-01089]]
    pname:pCode must: declare the code:Shader capability for SPIR-V code
  * [[VUID-VkShaderModuleCreateInfo-pCode-01090]]
    pname:pCode must: not declare any capability that is not supported by
    the API, as described by the <<spirvenv-module-validation,
    Capabilities>> section of the <<spirvenv-capabilities,SPIR-V
    Environment>> appendix
  * [[VUID-VkShaderModuleCreateInfo-pCode-01091]]
    If pname:pCode declares any of the capabilities listed as optional: in
    the <<spirvenv-capabilities-table,SPIR-V Environment>> appendix, the
    corresponding feature(s) must: be enabled.
****

include::{generated}/validity/structs/VkShaderModuleCreateInfo.txt[]
--

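The size rules for pname:codeSize and the SPIR-V format requirement for pname:pCode can be pre-checked on the host before calling flink:vkCreateShaderModule.
The following is a minimal sketch, not part of the Vulkan API: the helper name `spirv_blob_looks_valid` is hypothetical, and the check only covers the size rules, the five-word SPIR-V module header, and the SPIR-V magic number in host byte order.
It does not replace full SPIR-V validation.

```c
#include <stddef.h>
#include <stdint.h>

/* SPIR-V magic number: the first 32-bit word of every SPIR-V module. */
#define SPIRV_MAGIC 0x07230203u

/* Hypothetical helper (an assumption for illustration, not a Vulkan entry
 * point). Returns 1 if the blob passes the cheap checks below, 0 otherwise. */
static int spirv_blob_looks_valid(const uint32_t *code, size_t code_size_bytes)
{
    if (code == NULL)
        return 0;
    if (code_size_bytes == 0)                    /* codeSize must be > 0 */
        return 0;
    if (code_size_bytes % 4 != 0)                /* codeSize must be a multiple of 4 */
        return 0;
    if (code_size_bytes < 5 * sizeof(uint32_t))  /* SPIR-V header is 5 words */
        return 0;
    return code[0] == SPIRV_MAGIC;               /* magic number, host byte order */
}
```

An application might call this helper on the words it loaded from disk and only then fill in pname:codeSize and pname:pCode.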
[open,refpage='VkShaderModuleCreateFlags',desc='Reserved for future use',type='flags']
--
include::{generated}/api/flags/VkShaderModuleCreateFlags.txt[]

tname:VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is
currently reserved for future use.
--

ifdef::VK_EXT_validation_cache[]
include::VK_EXT_validation_cache/shader-module-validation-cache.txt[]
endif::VK_EXT_validation_cache[]


[open,refpage='vkDestroyShaderModule',desc='Destroy a shader module',type='protos']
--

To destroy a shader module, call:

include::{generated}/api/protos/vkDestroyShaderModule.txt[]

  * pname:device is the logical device that destroys the shader module.
  * pname:shaderModule is the handle of the shader module to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

A shader module can: be destroyed while pipelines created using its shaders
are still in use.

.Valid Usage
****
  * [[VUID-vkDestroyShaderModule-shaderModule-01092]]
    If sname:VkAllocationCallbacks were provided when pname:shaderModule was
    created, a compatible set of callbacks must: be provided here
  * [[VUID-vkDestroyShaderModule-shaderModule-01093]]
    If no sname:VkAllocationCallbacks were provided when pname:shaderModule
    was created, pname:pAllocator must: be `NULL`
****

include::{generated}/validity/protos/vkDestroyShaderModule.txt[]
--


[[shaders-execution]]
== Shader Execution

At each stage of the pipeline, multiple invocations of a shader may: execute
simultaneously.
Further, invocations of a single shader produced as the result of different
commands may: execute simultaneously.
The relative execution order of invocations of the same shader type is
undefined:.
Shader invocations may: complete in a different order than that in which the
primitives they originated from were drawn or dispatched by the application.
However, fragment shader outputs are written to attachments in
<<primrast-order,rasterization order>>.

The relative execution order of invocations of different shader types is
largely undefined:.
However, when invoking a shader whose inputs are generated from a previous
pipeline stage, the shader invocations from the previous stage are
guaranteed to have executed far enough to generate input values for all
required inputs.


[[shaders-execution-memory-ordering]]
== Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is
largely undefined:.
For some shader types (vertex, tessellation evaluation, and in some cases,
fragment), even the number of shader invocations that may: perform loads and
stores is undefined:.

In particular, the following rules apply:

  * <<shaders-vertex-execution,Vertex>> and
    <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
    shaders will be invoked at least once for each unique vertex, as defined
    in those sections.
  * <<shaders-fragment-execution,Fragment>> shaders will be invoked zero or
    more times, as defined in that section.
  * The relative execution order of invocations of the same shader type is
    undefined:.
    A store issued by a shader when working on primitive B might complete
    prior to a store for primitive A, even if primitive A is specified prior
    to primitive B. This applies even to fragment shaders; while fragment
    shader outputs are always written to the framebuffer in
    <<primrast-order, rasterization order>>, stores executed by fragment
    shader invocations are not.
  * The relative execution order of invocations of different shader types is
    largely undefined:.

[NOTE]
.Note
====
The above limitations on shader invocation order make some forms of
synchronization between shader invocations within a single set of primitives
unimplementable.
For example, having one invocation poll memory written by another invocation
assumes that the other invocation has been launched and will complete its
writes in finite time.
====

ifdef::VK_KHR_vulkan_memory_model[]

The <<memory-model,Memory Model>> appendix defines the terminology and rules
for how to correctly communicate between shader invocations, such as when a
write is <<memory-model-visible-to,Visible-To>> a read, and what constitutes
a <<memory-model-access-data-race,Data Race>>.

Applications must: not cause a data race.

endif::VK_KHR_vulkan_memory_model[]

ifndef::VK_KHR_vulkan_memory_model[]

Stores issued to different memory locations within a single shader
invocation may: not be visible to other invocations, or may: not become
visible in the order they were performed.

The code:OpMemoryBarrier instruction can: be used to provide stronger
ordering of reads and writes performed by a single invocation.
code:OpMemoryBarrier guarantees that any memory transactions issued by the
shader invocation prior to the instruction complete prior to the memory
transactions issued after the instruction.
Memory barriers are needed for algorithms that require multiple invocations
to access the same memory and require the operations to be performed in a
partially-defined relative order.
For example, if one shader invocation does a series of writes, followed by
an code:OpMemoryBarrier instruction, followed by another write, then the
results of the series of writes before the barrier become visible to other
shader invocations at a time earlier or equal to when the results of the
final write become visible to those invocations.
In practice, this means that another invocation that sees the results of the
final write would also see the previous writes.
Without the memory barrier, the final write may: be visible before the
previous writes.

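The write, barrier, write pattern described above has a close host-side analogue in C11 fences.
The sketch below is only an analogy under stated assumptions: it is CPU code, not SPIR-V, and the names `payload`, `ready`, `producer_publish` and `consumer_try_read` are hypothetical.
The release fence plays the role of the barrier between the series of writes and the final write, so a reader that observes the final write also observes the earlier ones.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint32_t payload[2];   /* the "series of writes" */
static atomic_uint ready;     /* the "final write" that other readers poll */

/* Writer: publish the payload, then raise the flag. */
static void producer_publish(uint32_t a, uint32_t b)
{
    payload[0] = a;
    payload[1] = b;
    /* Analogue of the barrier: earlier writes are ordered before the flag. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1u, memory_order_relaxed);
}

/* Reader: returns 1 and fills out[] only once the flag has been observed. */
static int consumer_try_read(uint32_t out[2])
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0u)
        return 0;
    /* Pairs with the release fence: the payload writes are now visible. */
    atomic_thread_fence(memory_order_acquire);
    out[0] = payload[0];
    out[1] = payload[1];
    return 1;
}
```

Removing the release fence in this analogy corresponds to the case in which the final write may: become visible before the previous writes.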
Writes that are the result of shader stores through a variable decorated
with code:Coherent automatically have available writes to the same buffer,
buffer view, or image view made visible to them, and are themselves
automatically made available to access by the same buffer, buffer view, or
image view.
Reads that are the result of shader loads through a variable decorated with
code:Coherent automatically have available writes to the same buffer, buffer
view, or image view made visible to them.
The order that coherent writes to different locations become available is
undefined:, unless enforced by a memory barrier instruction or other memory
dependency.

[NOTE]
.Note
====
Explicit memory dependencies must: still be used to guarantee availability
and visibility for access via other buffers, buffer views, or image views.
====

The built-in atomic memory transaction instructions can: be used to read and
write a given memory address atomically.
While built-in atomic functions issued by multiple shader invocations are
executed in undefined: order relative to each other, these functions perform
both a read and a write of a memory address and guarantee that no other
memory transaction will write to the underlying memory between the read and
write.
Atomic operations ensure automatic availability and visibility for writes
and reads in the same way as those to code:Coherent variables.

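The read-modify-write guarantee of atomic operations is what makes a shared counter usable from many invocations at once.
As a host-side illustration only (C11 atomics rather than SPIR-V atomic instructions, with the hypothetical names `SLOT_COUNT`, `next_slot` and `claim_slot`), each caller below receives a distinct slot index even though the order in which callers arrive is undefined.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SLOT_COUNT 8u

/* Shared counter; zero-initialized because it has static storage. */
static atomic_uint next_slot;

/* Claims the next free slot. Returns the slot index, or UINT32_MAX when all
 * SLOT_COUNT slots have already been handed out. */
static uint32_t claim_slot(void)
{
    /* The read and the increment happen as one indivisible operation, so no
     * two callers can observe the same previous value. */
    uint32_t slot = atomic_fetch_add_explicit(&next_slot, 1u,
                                              memory_order_relaxed);
    return slot < SLOT_COUNT ? slot : UINT32_MAX;
}
```

The counter only decides who owns which slot; ordering the data written into the slots still requires a barrier or other memory dependency.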
[NOTE]
.Note
====
Memory accesses performed on different resource descriptors with the same
memory backing may: not be well-defined even with the code:Coherent
decoration or via atomics, due to things such as image layouts or ownership
of the resource - as described in the <<synchronization, Synchronization and
Cache Control>> chapter.
====

[NOTE]
.Note
====
Atomics allow shaders to use shared global addresses for mutual exclusion or
as counters, among other uses.
====

endif::VK_KHR_vulkan_memory_model[]

[[shaders-inputs]]
== Shader Inputs and Outputs

Data is passed into and out of shaders using variables with input or output
storage class, respectively.
User-defined inputs and outputs are connected between stages by matching
their code:Location decorations.
Additionally, data can: be provided by or communicated to special functions
provided by the execution environment using code:BuiltIn decorations.

In many cases, the same code:BuiltIn decoration can: be used in multiple
shader stages with similar meaning.
The specific behavior of variables decorated as code:BuiltIn is documented
in the following sections.

ifdef::VK_NV_mesh_shader[]
[[shaders-task]]
== Task Shaders

Task shaders operate in conjunction with the mesh shaders to produce a
collection of primitives that will be processed by subsequent stages of the
graphics pipeline.
Their primary purpose is to create a variable number of subsequent mesh
shader invocations.

Task shaders are invoked via the execution of the
<<drawing-mesh-shading,programmable mesh shading>> pipeline.

The task shader has no fixed-function inputs other than variables
identifying the specific workgroup and invocation.
The only fixed output of the task shader is a task count, identifying the
number of mesh shader workgroups to create.
The task shader can write additional outputs to task memory, which can be
read by all of the mesh shader workgroups it created.

=== Task Shader Execution

Task workloads are formed from groups of work items called workgroups and
processed by the task shader in the current graphics pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Task shaders execute in _global workgroups_ which are divided into a number
of _local workgroups_ with a size that can: be set by assigning a value to
the code:LocalSize execution mode or via an object decorated by the
code:WorkgroupSize decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.

[[shaders-mesh]]
== Mesh Shaders

Mesh shaders operate in workgroups to produce a collection of primitives
that will be processed by subsequent stages of the graphics pipeline.
Each workgroup emits zero or more output primitives and the group of
vertices and their associated data required for each output primitive.

Mesh shaders are invoked via the execution of the
<<drawing-mesh-shading,programmable mesh shading>> pipeline.

The only inputs available to the mesh shader are variables identifying the
specific workgroup and invocation and, if applicable, any outputs written to
task memory by the task shader that spawned the mesh shader's workgroup.
The mesh shader can operate without a task shader as well.

The invocations of the mesh shader workgroup write an output mesh,
comprising a set of primitives with per-primitive attributes, a set of
vertices with per-vertex attributes, and an array of indices identifying the
mesh vertices that belong to each primitive.
The primitives of this mesh are then processed by subsequent graphics
pipeline stages, where the outputs of the mesh shader form an interface with
the fragment shader.

=== Mesh Shader Execution

Mesh workloads are formed from groups of work items called workgroups and
processed by the mesh shader in the current graphics pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Mesh shaders execute in _global workgroups_ which are divided into a number
of _local workgroups_ with a size that can: be set by assigning a value to
the code:LocalSize execution mode or via an object decorated by the
code:WorkgroupSize decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.

The _global workgroups_ may be generated explicitly via the API, or
implicitly through the task shader's work creation mechanism.

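The relationship between workgroups and invocations can be illustrated with simple index arithmetic.
The sketch below assumes a one-dimensional local workgroup size and the conventional mapping in which a global invocation index is the workgroup index times the local size plus the local invocation index.
The function names are hypothetical and the code runs on the host purely to show the arithmetic.

```c
#include <stdint.h>

/* Global index of one invocation, assuming a one-dimensional layout. */
static uint32_t global_invocation_index(uint32_t workgroup_id,
                                        uint32_t local_size,
                                        uint32_t local_index)
{
    return workgroup_id * local_size + local_index;
}

/* Total number of invocations launched for a given number of workgroups,
 * for example the task count written by a task shader. */
static uint32_t total_invocations(uint32_t workgroup_count,
                                  uint32_t local_size)
{
    return workgroup_count * local_size;
}
```

For example, with a local size of 32, a task count of 3 yields 96 mesh shader invocations, and local invocation 5 of workgroup 2 has global index 69.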
endif::VK_NV_mesh_shader[]

[[shaders-vertex]]
== Vertex Shaders

Each vertex shader invocation operates on one vertex and its associated
<<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
associated data.
ifndef::VK_NV_mesh_shader[]
Graphics pipelines must: include a vertex shader, and the vertex shader
stage is always the first shader stage in the graphics pipeline.
endif::VK_NV_mesh_shader[]
ifdef::VK_NV_mesh_shader[]
Graphics pipelines using primitive shading must: include a vertex shader,
and the vertex shader stage is always the first shader stage in the graphics
pipeline.
endif::VK_NV_mesh_shader[]

[[shaders-vertex-execution]]
=== Vertex Shader Execution

A vertex shader must: be executed at least once for each vertex specified by
a draw command.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]
During execution, the shader is presented with the index of the vertex and
instance for which it has been invoked.
Input variables declared in the vertex shader are filled by the
implementation with the values of vertex attributes associated with the
invocation being executed.

If the same vertex is specified multiple times in a draw command (e.g. by
including the same index value multiple times in an index buffer), the
implementation may: reuse the results of vertex shading if it can statically
determine that the vertex shader invocations will produce identical results.

[NOTE]
.Note
====
It is implementation-dependent when and if results of vertex shading are
reused, and thus how many times the vertex shader will be executed.
This is also true if the vertex shader contains stores or atomic operations
(see <<features-vertexPipelineStoresAndAtomics,
pname:vertexPipelineStoresAndAtomics>>).
====


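Because vertex reuse is implementation-dependent, an application can only bound the number of vertex shader invocations for an indexed draw.
The following host-side sketch uses a hypothetical helper, `count_unique_indices`, to compute the number of unique index values for a single instance and a single view.
That count is the fewest invocations that still shade every unique vertex once, while the full index count corresponds to an implementation that performs no reuse at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts distinct values in an index array. Quadratic on purpose: this is an
 * illustration, not something to run on large index buffers. */
static size_t count_unique_indices(const uint32_t *indices, size_t count)
{
    size_t unique = 0;
    for (size_t i = 0; i < count; ++i) {
        int seen_before = 0;
        for (size_t j = 0; j < i; ++j) {
            if (indices[j] == indices[i]) {
                seen_before = 1;
                break;
            }
        }
        if (!seen_before)
            ++unique;
    }
    return unique;
}
```

For the index list 0, 1, 2, 2, 1, 3 the helper returns 4, so between 4 and 6 invocations are consistent with the text above, depending on how much reuse takes place.
A shader that performs stores or atomics therefore cannot rely on either number.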
[[shaders-tessellation-control]]
== Tessellation Control Shaders

The tessellation control shader is used to read an input patch provided by
the application and to produce an output patch.
Each tessellation control shader invocation operates on an input patch
(after all control points in the patch are processed by a vertex shader) and
its associated data, and outputs a single control point of the output patch
and its associated data, and can: also output additional per-patch data.
The input patch is sized according to the pname:patchControlPoints member of
slink:VkPipelineTessellationStateCreateInfo, as part of input assembly.
The size of the output patch is controlled by the code:OpExecutionMode
code:OutputVertices specified in the tessellation control or tessellation
evaluation shaders, which must: be specified in at least one of the shaders.
The size of the input and output patches must: each be greater than zero and
less than or equal to
sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.


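The patch size rule above amounts to a simple range check.
A minimal sketch follows, with a hypothetical helper name and with the limit passed in as a plain integer; a real application would read it from sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.

```c
#include <stdint.h>

/* Returns 1 if both patch sizes are greater than zero and no larger than the
 * device limit, 0 otherwise. Hypothetical helper for illustration only. */
static int patch_sizes_are_valid(uint32_t input_patch_size,
                                 uint32_t output_patch_size,
                                 uint32_t max_tessellation_patch_size)
{
    return input_patch_size > 0u &&
           input_patch_size <= max_tessellation_patch_size &&
           output_patch_size > 0u &&
           output_patch_size <= max_tessellation_patch_size;
}
```

Here the input size would come from pname:patchControlPoints and the output size from the code:OutputVertices execution mode.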
[[shaders-tessellation-control-execution]]
=== Tessellation Control Shader Execution

A tessellation control shader is invoked at least once for each _output_
vertex in a patch.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]

Inputs to the tessellation control shader are generated by the vertex
shader.
Each invocation of the tessellation control shader can: read the attributes
of any incoming vertices and their associated data.
The invocations corresponding to a given patch execute logically in
parallel, with undefined: relative execution order.
However, the code:OpControlBarrier instruction can: be used to provide
limited control of the execution order by synchronizing invocations within a
patch, effectively dividing tessellation control shader execution into a set
of phases.
Tessellation control shaders will read undefined: values if one invocation
reads a per-vertex or per-patch attribute written by another invocation at
any point during the same phase, or if two invocations attempt to write
different values to the same per-patch output in a single phase.


[[shaders-tessellation-evaluation]]
== Tessellation Evaluation Shaders

The tessellation evaluation shader operates on an input patch of control
points and their associated data, and a single input barycentric coordinate
indicating the invocation's relative position within the subdivided patch,
and outputs a single vertex and its associated data.


[[shaders-tessellation-evaluation-execution]]
=== Tessellation Evaluation Shader Execution

A tessellation evaluation shader is invoked at least once for each unique
vertex generated by the tessellator.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]


[[shaders-geometry]]
== Geometry Shaders

The geometry shader operates on a group of vertices and their associated
data assembled from a single input primitive, and emits zero or more output
primitives and the group of vertices and their associated data required for
each output primitive.


[[shaders-geometry-execution]]
=== Geometry Shader Execution

A geometry shader is invoked at least once for each primitive produced by
the tessellation stages, or at least once for each primitive generated by
<<drawing,primitive assembly>> when tessellation is not in use.
A shader can request that the geometry shader runs multiple
<<geometry-invocations, instances>>.
A geometry shader is invoked at least once for each instance.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]


[[shaders-fragment]]
== Fragment Shaders

Fragment shaders are invoked as the result of rasterization in a graphics
pipeline.
Each fragment shader invocation operates on a single fragment and its
associated data.
With few exceptions, fragment shaders do not have access to any data
associated with other fragments and are considered to execute in isolation
of fragment shader invocations associated with other fragments.


[[shaders-fragment-execution]]
=== Fragment Shader Execution

For each fragment generated by rasterization, a fragment shader may: be
invoked.
A fragment shader must: not be invoked if the <<fragops-early,Early
Per-Fragment Tests>> cause it to have no coverage.
ifdef::VK_VERSION_1_1,VK_KHR_multiview[]
If the subpass includes multiple views in its view mask, the shader may: be
invoked separately for each view.
endif::VK_VERSION_1_1,VK_KHR_multiview[]

Furthermore, if it is determined that a fragment generated as the result of
rasterizing a first primitive will have its outputs entirely overwritten by
a fragment generated as the result of rasterizing a second primitive in the
same subpass, and the fragment shader used for the fragment has no other
side effects, then the fragment shader may: not be executed for the fragment
from the first primitive.

Relative ordering of execution of different fragment shader invocations is
not defined.

For each fragment generated by a primitive, the number of times the fragment
shader is invoked is implementation-dependent, but must: obey the following
constraints:

  * Each covered sample is included in a single fragment shader invocation.
  * When sample shading is not enabled, there is at least one fragment
    shader invocation.
  * When sample shading is enabled, the minimum number of fragment shader
    invocations is as defined in
ifdef::VK_NV_shading_rate_image[]
    <<primsrast-shading-rate-image,Shading Rate Image>> and
endif::VK_NV_shading_rate_image[]
    <<primsrast-sampleshading,Sample Shading>>.

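The constraints above can be turned into a lower bound on invocations per fragment.
The sketch below rests on an assumption that this section does not state: that the Sample Shading section defines the minimum as the larger of 1 and the sample count scaled by the minimum sample shading fraction, rounded up.
The function name and parameters are hypothetical, and the fraction is assumed to lie in the range 0 to 1.

```c
#include <stdint.h>

/* Lower bound on fragment shader invocations per fragment.
 * Assumed formula: max(ceil(min_sample_shading * sample_count), 1). */
static uint32_t min_fragment_invocations(int sample_shading_enabled,
                                         float min_sample_shading,
                                         uint32_t sample_count)
{
    if (!sample_shading_enabled)
        return 1u;                       /* at least one invocation */

    float scaled = min_sample_shading * (float)sample_count;
    uint32_t n = (uint32_t)scaled;       /* truncates toward zero */
    if ((float)n < scaled)
        ++n;                             /* round up to the next integer */
    return n > 0u ? n : 1u;
}
```

With four samples, a fraction of 1.0 gives a bound of 4 and a fraction of 0.5 gives a bound of 2.
An implementation remains free to run more invocations than this bound.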
When there is more than one fragment shader invocation per fragment, the
association of samples to invocations is implementation-dependent.

In addition to the conditions outlined above for the invocation of a
fragment shader, a fragment shader invocation may: be produced as a _helper
invocation_.
A helper invocation is a fragment shader invocation that is created solely
for the purposes of evaluating derivatives for use in non-helper fragment
shader invocations.
Stores and atomics performed by helper invocations must: not have any effect
on memory, and values returned by atomic instructions in helper invocations
are undefined:.

ifdef::VK_EXT_fragment_density_map[]
If the render pass has a fragment density map attachment, more than one
fragment shader invocation may: be invoked for each covered sample.
Stores and atomics performed by these additional invocations have the normal
effect.
Such additional invocations are only produced if
sname:VkPhysicalDeviceFragmentDensityMapPropertiesEXT::pname:fragmentDensityInvocations
is ename:VK_TRUE.

[NOTE]
.Note
====
Implementations may: generate these additional fragment shader invocations
in order to make transitions between fragment areas with different fragment
densities smoother.
====
endif::VK_EXT_fragment_density_map[]

[[shaders-fragment-earlytest]]
=== Early Fragment Tests

An explicit control is provided to allow fragment shaders to enable early
fragment tests.
If the fragment shader specifies the code:EarlyFragmentTests
code:OpExecutionMode, the per-fragment tests described in
<<fragops-early-mode,Early Fragment Test Mode>> are performed prior to
fragment shader execution.
Otherwise, they are performed after fragment shader execution.

ifdef::VK_EXT_post_depth_coverage[]
[[shaders-fragment-earlytest-postdepthcoverage]]
If the fragment shader additionally specifies the code:PostDepthCoverage
code:OpExecutionMode, the value of a variable decorated with the
<<interfaces-builtin-variables-samplemask,code:SampleMask>> built-in
reflects the coverage after the early fragment tests.
Otherwise, it reflects the coverage before the early fragment tests.
endif::VK_EXT_post_depth_coverage[]

ifdef::VK_EXT_fragment_shader_interlock[]

[[shaders-fragment-shader-interlock]]
=== Fragment Shader Interlock

In normal operation, it is possible for more than one fragment shader
invocation to be executed simultaneously for the same pixel if there are
overlapping primitives.
If the <<features-features-fragmentShaderSampleInterlock,
fragmentShaderSampleInterlock>>,
<<features-features-fragmentShaderPixelInterlock,
fragmentShaderPixelInterlock>>, or
<<features-features-fragmentShaderShadingRateInterlock,
fragmentShaderShadingRateInterlock>> features are enabled, it is possible to
define a critical section within the fragment shader that is guaranteed to
not run simultaneously with another fragment shader invocation for the same
sample(s) or pixel(s).
It is also possible to control the relative ordering of execution of these
critical sections across different fragment shader invocations.

If the <<spirvenv-capabilities-table-fragmentShaderInterlock,
code:FragmentShaderSampleInterlockEXT, code:FragmentShaderPixelInterlockEXT,
or code:FragmentShaderShadingRateInterlockEXT>> capabilities are declared in
the fragment shader, the code:OpBeginInvocationInterlockEXT and
code:OpEndInvocationInterlockEXT instructions must: be used to delimit a
critical section of fragment shader code.

To ensure each invocation of the critical section is executed in
<<drawing-primitive-order, primitive order>>, declare one of the
code:PixelInterlockOrderedEXT, code:SampleInterlockOrderedEXT, or
code:ShadingRateInterlockOrderedEXT execution modes.
If the order of execution of each invocation of the critical section does
not matter, declare one of the code:PixelInterlockUnorderedEXT,
code:SampleInterlockUnorderedEXT, or code:ShadingRateInterlockUnorderedEXT
execution modes.

The code:PixelInterlockOrderedEXT and code:PixelInterlockUnorderedEXT
execution modes provide mutual exclusion in the critical section for any
pair of fragments corresponding to the same pixel, or pixels if the fragment
covers more than one pixel.
With sample shading enabled, these execution modes are treated like
code:SampleInterlockOrderedEXT or code:SampleInterlockUnorderedEXT
respectively.

The code:SampleInterlockOrderedEXT and code:SampleInterlockUnorderedEXT
execution modes only provide mutual exclusion for pairs of fragments that
both cover at least one common sample in the same pixel; these are
recommended for performance if shaders use per-sample data structures.
If these execution modes are used in single-sample mode they are treated
like code:PixelInterlockOrderedEXT or code:PixelInterlockUnorderedEXT
respectively.

ifdef::VK_NV_shading_rate_image[]
The code:ShadingRateInterlockOrderedEXT and
code:ShadingRateInterlockUnorderedEXT execution modes provide mutual
exclusion for pairs of fragments that both have at least one common sample
in the same pixel, even if none of the common samples are covered by both
fragments.
With sample shading enabled, these execution modes are treated like
code:SampleInterlockOrderedEXT or code:SampleInterlockUnorderedEXT
respectively.
endif::VK_NV_shading_rate_image[]

ifndef::VK_NV_shading_rate_image[]
|
|
The code:ShadingRateInterlockOrderedEXT and
|
|
code:ShadingRateInterlockUnorderedEXT execution modes are not supported.
|
|
endif::VK_NV_shading_rate_image[]
|
|
|
|
endif::VK_EXT_fragment_shader_interlock[]
|
|
|
|
[[shaders-compute]]
|
|
== Compute Shaders
|
|
|
|
Compute shaders are invoked via flink:vkCmdDispatch and
|
|
flink:vkCmdDispatchIndirect commands.
|
|
In general, they have access to resources similar to those available to
shader stages executing as part of a graphics pipeline.
|
|
|
|
Compute workloads are formed from groups of work items called workgroups and
|
|
processed by the compute shader in the current compute pipeline.
|
|
A workgroup is a collection of shader invocations that execute the same
|
|
shader, potentially in parallel.
|
|
Compute shaders execute in _global workgroups_ which are divided into a
|
|
number of _local workgroups_ with a size that can: be set by assigning a
|
|
value to the code:LocalSize execution mode or via an object decorated by the
|
|
code:WorkgroupSize decoration.
|
|
An invocation within a local workgroup can: share data with other members of
|
|
the local workgroup through shared variables and issue memory and control
|
|
flow barriers to synchronize with other members of the local workgroup.
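
As a non-normative illustration, the host-side C sketch below shows how a
one-dimensional problem size might be mapped onto local workgroups, and how
a global invocation index follows from a workgroup index, the local size,
and a local invocation index.
The helper names `group_count_for` and `global_invocation_index` are
hypothetical and are not part of the API.

[source,c]
---------------------------------------------------
#include <assert.h>
#include <stdint.h>

/* Number of local workgroups needed to cover item_count work items when
 * each local workgroup holds local_size invocations (rounds up). */
static uint32_t group_count_for(uint32_t item_count, uint32_t local_size)
{
    assert(local_size > 0);
    return item_count / local_size + (item_count % local_size != 0);
}

/* Index of an invocation within the global workgroup, along one dimension. */
static uint32_t global_invocation_index(uint32_t workgroup_index,
                                        uint32_t local_size,
                                        uint32_t local_index)
{
    assert(local_index < local_size);
    return workgroup_index * local_size + local_index;
}
---------------------------------------------------

Because the group count rounds up, the last local workgroup may contain
invocations whose global index is past the end of the data, so a shader
written against this mapping would typically bounds-check its index.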
|
|
|
|
|
|
[[shaders-interpolation-decorations]]
|
|
== Interpolation Decorations
|
|
|
|
Interpolation decorations control the behavior of attribute interpolation in
|
|
the fragment shader stage.
|
|
Interpolation decorations can: be applied to code:Input storage class
|
|
variables in the fragment shader stage's interface, and control the
|
|
interpolation behavior of those variables.
|
|
|
|
Inputs that could be interpolated can: be decorated by at most one of the
|
|
following decorations:
|
|
|
|
* code:Flat: no interpolation
|
|
* code:NoPerspective: linear interpolation (for
|
|
<<line_linear_interpolation,lines>> and
|
|
<<triangle_linear_interpolation,polygons>>)
|
|
ifdef::VK_NV_fragment_shader_barycentric[]
  * code:PerVertexNV: values fetched from shader-specified primitive vertex
endif::VK_NV_fragment_shader_barycentric[]
|
|
|
|
Fragment input variables decorated with neither code:Flat nor
|
|
code:NoPerspective use perspective-correct interpolation (for
|
|
<<line_perspective_interpolation,lines>> and
|
|
<<triangle_perspective_interpolation,polygons>>).
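
The difference between the three behaviors can be sketched, non-normatively,
for a single attribute along a line segment.
In the C sketch below, `t` is the fraction of the distance from endpoint `a`
to endpoint `b`, and `wa` and `wb` are the clip-space w coordinates of the
endpoints; the function names are hypothetical.

[source,c]
---------------------------------------------------
#include <assert.h>

/* Flat: no interpolation; the provoking vertex value is used unchanged. */
static float interp_flat(float f_provoking)
{
    return f_provoking;
}

/* NoPerspective: plain linear interpolation between the endpoint values. */
static float interp_noperspective(float fa, float fb, float t)
{
    return (1.0f - t) * fa + t * fb;
}

/* Neither decoration: perspective-correct interpolation, which weights
 * each endpoint value by the reciprocal of its w coordinate. */
static float interp_perspective(float fa, float wa, float fb, float wb,
                                float t)
{
    assert(wa != 0.0f && wb != 0.0f);
    float num = (1.0f - t) * fa / wa + t * fb / wb;
    float den = (1.0f - t) / wa + t / wb;
    return num / den;
}
---------------------------------------------------

When both endpoints have the same w, the perspective-correct result reduces
to the linear one.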
|
|
|
|
The presence of and type of interpolation is controlled by the above
|
|
interpolation decorations as well as the auxiliary decorations code:Centroid
|
|
and code:Sample.
|
|
|
|
A variable decorated with code:Flat will not be interpolated.
|
|
Instead, it will have the same value for every fragment within a triangle.
|
|
This value will come from a single <<vertexpostproc-flatshading,provoking
|
|
vertex>>.
|
|
A variable decorated with code:Flat can: also be decorated with
|
|
code:Centroid or code:Sample, which will mean the same thing as decorating
|
|
it only as code:Flat.
|
|
|
|
For fragment shader input variables decorated with neither code:Centroid nor
|
|
code:Sample, the assigned variable may: be interpolated anywhere within the
|
|
fragment and a single value may: be assigned to each sample within the
|
|
fragment.
|
|
|
|
If a fragment shader input is decorated with code:Centroid, a single value
|
|
may: be assigned to that variable for all samples in the fragment, but that
|
|
value must: be interpolated to a location that lies in both the fragment and
|
|
in the primitive being rendered, including any of the fragment's samples
|
|
covered by the primitive.
|
|
Because the location at which the variable is interpolated may: be different
|
|
in neighboring fragments, and derivatives may: be computed by computing
|
|
differences between neighboring fragments, derivatives of centroid-sampled
|
|
inputs may: be less accurate than those for non-centroid interpolated
|
|
variables.
|
|
ifdef::VK_NV_shading_rate_image[]
|
|
If
|
|
slink:VkPipelineViewportShadingRateImageStateCreateInfoNV::pname:shadingRateImageEnable
|
|
is enabled, implementations may: estimate derivatives using differencing
|
|
without dividing by the distance between adjacent sample locations when the
|
|
fragment size is larger than one pixel.
|
|
endif::VK_NV_shading_rate_image[]
|
|
ifdef::VK_EXT_post_depth_coverage[]
|
|
The <<shaders-fragment-earlytest-postdepthcoverage,code:PostDepthCoverage>>
|
|
execution mode does not affect the determination of the centroid location.
|
|
endif::VK_EXT_post_depth_coverage[]
|
|
|
|
If a fragment shader input is decorated with code:Sample, a separate value
|
|
must: be assigned to that variable for each covered sample in the fragment,
|
|
and that value must: be sampled at the location of the individual sample.
|
|
When pname:rasterizationSamples is ename:VK_SAMPLE_COUNT_1_BIT, the fragment
|
|
center must: be used for code:Centroid, code:Sample, and undecorated
|
|
attribute interpolation.
|
|
|
|
Fragment shader inputs that are signed or unsigned integers, integer
|
|
vectors, or any double-precision floating-point type must: be decorated with
|
|
code:Flat.
|
|
|
|
ifdef::VK_AMD_shader_explicit_vertex_parameter[]
|
|
When the `<<VK_AMD_shader_explicit_vertex_parameter>>` device extension is
|
|
enabled, inputs can: also be decorated with the code:CustomInterpAMD
|
|
interpolation decoration, including fragment shader inputs that are signed
|
|
or unsigned integers, integer vectors, or any double-precision
|
|
floating-point type.
|
|
Inputs decorated with code:CustomInterpAMD can: only be accessed by the
|
|
extended instruction code:InterpolateAtVertexAMD, which allows accessing the
|
|
value of the input for individual vertices of the primitive.
|
|
endif::VK_AMD_shader_explicit_vertex_parameter[]
|
|
|
|
ifdef::VK_NV_fragment_shader_barycentric[]
|
|
[[shaders-interpolation-decorations-pervertexnv]]
|
|
When the pname:fragmentShaderBarycentric feature is enabled, inputs can: also
be decorated with the code:PerVertexNV interpolation decoration, including
|
|
fragment shader inputs that are signed or unsigned integers, integer
|
|
vectors, or any double-precision floating-point type.
|
|
Inputs decorated with code:PerVertexNV can: only be accessed using an extra
|
|
array dimension, where the extra index identifies one of the vertices of the
|
|
primitive that produced the fragment.
|
|
endif::VK_NV_fragment_shader_barycentric[]
|
|
|
|
ifdef::VK_NV_ray_tracing[]
|
|
include::VK_NV_ray_tracing/raytracing-shaders.txt[]
|
|
endif::VK_NV_ray_tracing[]
|
|
|
|
[[shaders-staticuse]]
|
|
== Static Use
|
|
|
|
A SPIR-V module declares a global object in memory using the code:OpVariable
|
|
instruction, which results in a pointer code:x to that object.
|
|
A specific entry point in a SPIR-V module is said to _statically use_ that
|
|
object if that entry point's call tree contains a function that contains a
|
|
memory instruction or image instruction with code:x as an code:id operand.
|
|
See the "`Memory Instructions`" and "`Image Instructions`" subsections of
|
|
section 3 "`Binary Form`" of the SPIR-V specification for the complete lists
of SPIR-V memory and image instructions.
|
|
|
|
Static use is not used to control the behavior of variables with code:Input
|
|
and code:Output storage.
|
|
The effects of those variables are applied based only on whether they are
|
|
present in a shader entry point's interface.
|
|
|
|
[[shaders-invocationgroups]]
|
|
== Invocation and Derivative Groups
|
|
|
|
An _invocation group_ (see the subsection "`Control Flow`" of section 2 of
|
|
the SPIR-V specification) for a compute shader is the set of invocations in
|
|
a single local workgroup.
|
|
For graphics shaders, an invocation group is an implementation-dependent
|
|
subset of the set of shader invocations of a given shader stage which are
|
|
produced by a single drawing command.
|
|
For indirect drawing commands with pname:drawCount greater than one,
|
|
invocations from separate draws are in distinct invocation groups.
|
|
|
|
[NOTE]
|
|
.Note
|
|
====
|
|
Because the partitioning of invocations into invocation groups is
|
|
implementation-dependent and not observable, applications generally need to
|
|
assume the worst case of all invocations in a draw belonging to a single
|
|
invocation group.
|
|
====
|
|
|
|
A _derivative group_ (see the subsection "`Control Flow`" of section 2 of
|
|
the SPIR-V 1.00 Revision 4 specification) is a set of invocations which are
|
|
used together to compute a derivative.
|
|
ifdef::VK_VERSION_1_1[]
|
|
For a fragment shader, a derivative group is generated by a single primitive
|
|
(point, line, or triangle) and includes any helper invocations needed to
|
|
compute derivatives.
|
|
If the pname:subgroupSize field of slink:VkPhysicalDeviceSubgroupProperties
|
|
is at least 4, a derivative group for a fragment shader corresponds to a
|
|
single subgroup quad.
|
|
Otherwise, a derivative group is the set of invocations generated by a
|
|
single primitive.
|
|
endif::VK_VERSION_1_1[]
|
|
ifndef::VK_VERSION_1_1[]
|
|
For a fragment shader, a derivative group is the set of invocations
|
|
generated by a single primitive.
|
|
endif::VK_VERSION_1_1[]
|
|
ifdef::VK_NV_compute_shader_derivatives[]
|
|
A derivative group for a compute shader is a single local workgroup.
|
|
endif::VK_NV_compute_shader_derivatives[]
|
|
|
|
Derivative values are undefined: for a sampled image instruction if the
|
|
instruction is in flow control that is not uniform across the derivative
|
|
group.
|
|
|
|
ifdef::VK_VERSION_1_1[]
|
|
[[shaders-subgroup]]
|
|
== Subgroups
|
|
|
|
A _subgroup_ (see the subsection "`Control Flow`" of section 2 of the SPIR-V
|
|
1.3 Revision 1 specification) is a set of invocations that can synchronize
|
|
and share data with each other efficiently.
|
|
An invocation group is partitioned into one or more subgroups.
|
|
|
|
Subgroup operations are divided into various categories as described in
|
|
elink:VkSubgroupFeatureFlagBits.
|
|
|
|
[[shaders-subgroup-basic]]
|
|
=== Basic Subgroup Operations
|
|
|
|
The basic subgroup operations allow two classes of functionality within
shaders: elect and barrier.
|
|
Invocations within a subgroup can: choose a single invocation to perform
|
|
some task for the subgroup as a whole using elect.
|
|
Invocations within a subgroup can: perform a subgroup barrier to ensure the
|
|
ordering of execution or memory accesses within a subgroup.
|
|
Barriers can: be performed on buffer memory accesses, code:Workgroup
|
|
memory accesses, and image memory accesses to ensure that any results
|
|
written are visible to other invocations within the subgroup.
|
|
An code:OpControlBarrier can: also be used to perform a full execution
|
|
control barrier.
|
|
A full execution control barrier will ensure that each active invocation
|
|
within the subgroup reaches a point of execution before any are allowed to
|
|
continue.
|
|
|
|
[[shaders-subgroup-vote]]
|
|
=== Vote Subgroup Operations
|
|
|
|
The vote subgroup operations allow invocations within a subgroup to compare
|
|
values across a subgroup.
|
|
The types of votes enabled are:
|
|
|
|
* Do all active subgroup invocations agree that an expression is true?
|
|
* Do any active subgroup invocations evaluate an expression to true?
|
|
* Do all active subgroup invocations have the same value of an expression?
|
|
|
|
[NOTE]
|
|
.Note
|
|
====
|
|
These operations are useful in combination with control flow in that they
|
|
allow for developers to check whether conditions match across the subgroup
|
|
and choose potentially faster code-paths in these cases.
|
|
====
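
The three vote types listed above can be modeled on the host, purely for
illustration, by treating a subgroup as an array with one element per
invocation and a bitmask of active invocations.
The function names and the limit of 32 invocations are assumptions of this
sketch, not properties of any implementation.

[source,c]
---------------------------------------------------
#include <stdbool.h>
#include <stdint.h>

/* True if the predicate holds for every active invocation. */
static bool vote_all(const bool *pred, uint32_t active, uint32_t size)
{
    for (uint32_t i = 0; i < size; ++i)
        if (((active >> i) & 1u) && !pred[i])
            return false;
    return true;
}

/* True if the predicate holds for at least one active invocation. */
static bool vote_any(const bool *pred, uint32_t active, uint32_t size)
{
    for (uint32_t i = 0; i < size; ++i)
        if (((active >> i) & 1u) && pred[i])
            return true;
    return false;
}

/* True if every active invocation holds the same value. */
static bool vote_all_equal(const int32_t *value, uint32_t active,
                           uint32_t size)
{
    bool seen = false;
    int32_t first = 0;
    for (uint32_t i = 0; i < size; ++i) {
        if (!((active >> i) & 1u))
            continue;
        if (!seen) {
            first = value[i];
            seen = true;
        } else if (value[i] != first) {
            return false;
        }
    }
    return true;
}
---------------------------------------------------

Inactive invocations are skipped in all three functions, which is why the
same predicate array can produce different answers under different active
masks.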
|
|
|
|
[[shaders-subgroup-arithmetic]]
|
|
=== Arithmetic Subgroup Operations
|
|
|
|
The arithmetic subgroup operations allow invocations to perform scan and
|
|
reduction operations across a subgroup.
|
|
For reduction operations, each invocation in a subgroup will obtain the same
|
|
result of these arithmetic operations applied across the subgroup.
|
|
For scan operations, each invocation in the subgroup will perform an
|
|
inclusive or exclusive scan, cumulatively applying the operation across the
|
|
invocations in a subgroup in an implementation-defined order.
|
|
The operations supported are add, mul, min, max, and, or, xor.
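
The following non-normative C sketch models the add operation for a
reduction, an inclusive scan, and an exclusive scan, with the subgroup
represented as an array holding one value per invocation.
It assumes the scan proceeds in increasing invocation index order, which is
only one possible choice of the implementation-defined order mentioned
above; all function names are hypothetical.

[source,c]
---------------------------------------------------
#include <stdint.h>

/* Reduction: every invocation obtains the same total. */
static void subgroup_reduce_add(const int32_t *in, int32_t *out,
                                uint32_t size)
{
    int32_t total = 0;
    for (uint32_t i = 0; i < size; ++i)
        total += in[i];
    for (uint32_t i = 0; i < size; ++i)
        out[i] = total;
}

/* Inclusive scan: out[i] combines in[0] through in[i]. */
static void subgroup_inclusive_add(const int32_t *in, int32_t *out,
                                   uint32_t size)
{
    int32_t running = 0;
    for (uint32_t i = 0; i < size; ++i) {
        running += in[i];
        out[i] = running;
    }
}

/* Exclusive scan: out[i] combines in[0] through in[i-1]; invocation 0
 * receives the identity of the operation, which is 0 for add. */
static void subgroup_exclusive_add(const int32_t *in, int32_t *out,
                                   uint32_t size)
{
    int32_t running = 0;
    for (uint32_t i = 0; i < size; ++i) {
        out[i] = running;
        running += in[i];
    }
}
---------------------------------------------------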
|
|
|
|
[[shaders-subgroup-ballot]]
|
|
=== Ballot Subgroup Operations
|
|
|
|
The ballot subgroup operations allow invocations to perform more complex
|
|
votes across the subgroup.
|
|
The ballot functionality allows all invocations within a subgroup to provide
a boolean value and receive, as a result, the boolean value that each
invocation provided.
|
|
The broadcast functionality allows values to be broadcast from an invocation
|
|
to all other invocations within the subgroup, given that the invocation to
|
|
be broadcast from is known at pipeline creation time.
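
As a non-normative host-side model, a ballot can be pictured as packing one
boolean per invocation into a bitmask that every invocation receives, and a
broadcast as copying one invocation's value to all of them.
The sketch below assumes a subgroup of at most 32 invocations so that the
mask fits in one `uint32_t`; the function names are hypothetical.

[source,c]
---------------------------------------------------
#include <stdbool.h>
#include <stdint.h>

/* Ballot: bit i of the result is set if invocation i provided true. */
static uint32_t subgroup_ballot(const bool *pred, uint32_t size)
{
    uint32_t mask = 0;
    for (uint32_t i = 0; i < size; ++i)
        if (pred[i])
            mask |= 1u << i;
    return mask;
}

/* Broadcast: every invocation receives the value held by invocation src. */
static void subgroup_broadcast(const int32_t *in, int32_t *out,
                               uint32_t size, uint32_t src)
{
    for (uint32_t i = 0; i < size; ++i)
        out[i] = in[src];
}
---------------------------------------------------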
|
|
|
|
[[shaders-subgroup-shuffle]]
|
|
=== Shuffle Subgroup Operations
|
|
|
|
The shuffle subgroup operations allow invocations to read values from other
|
|
invocations within a subgroup.
|
|
|
|
[[shaders-subgroup-shuffle-relative]]
|
|
=== Shuffle Relative Subgroup Operations
|
|
|
|
The shuffle relative subgroup operations allow invocations to read values
|
|
from other invocations within the subgroup relative to the current
|
|
invocation in the group.
|
|
The relative operations supported allow data to be shifted up and down
|
|
through the invocations within a subgroup.
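
The shift behavior can be sketched non-normatively in C with the subgroup
modeled as an array indexed by invocation.
In this sketch an invocation whose source index would fall outside the
subgroup simply keeps its own value; that fallback is an assumption made so
the example is well defined, and the function names are hypothetical.

[source,c]
---------------------------------------------------
#include <stdint.h>

/* Shift up: invocation i reads the value of invocation i - delta.
 * The in and out arrays must not overlap. */
static void subgroup_shuffle_up(const int32_t *in, int32_t *out,
                                uint32_t size, uint32_t delta)
{
    for (uint32_t i = 0; i < size; ++i)
        out[i] = (i >= delta) ? in[i - delta] : in[i];
}

/* Shift down: invocation i reads the value of invocation i + delta.
 * The in and out arrays must not overlap. */
static void subgroup_shuffle_down(const int32_t *in, int32_t *out,
                                  uint32_t size, uint32_t delta)
{
    for (uint32_t i = 0; i < size; ++i)
        out[i] = (delta < size - i) ? in[i + delta] : in[i];
}
---------------------------------------------------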
|
|
|
|
[[shaders-subgroup-clustered]]
|
|
=== Clustered Subgroup Operations
|
|
|
|
The clustered subgroup operations allow invocations to perform an operation
|
|
among partitions of a subgroup, such that the operation is only performed
|
|
within the subgroup invocations within a partition.
|
|
The partitions for clustered subgroup operations are consecutive
|
|
power-of-two size groups of invocations and the cluster size must: be known
|
|
at pipeline creation time.
|
|
The operations supported are add, mul, min, max, and, or, xor.
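
A clustered add can be modeled non-normatively as an independent reduction
over each consecutive, power-of-two sized run of invocations.
The C sketch below represents the subgroup as an array; the function name is
hypothetical.

[source,c]
---------------------------------------------------
#include <assert.h>
#include <stdint.h>

/* Each invocation receives the sum over the cluster that contains it. */
static void subgroup_clustered_add(const int32_t *in, int32_t *out,
                                   uint32_t size, uint32_t cluster_size)
{
    /* The cluster size must be a non-zero power of two. */
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);

    for (uint32_t base = 0; base < size; base += cluster_size) {
        /* Clamp the final cluster to the end of the subgroup. */
        uint32_t end = (size - base < cluster_size) ? size
                                                    : base + cluster_size;
        int32_t total = 0;
        for (uint32_t i = base; i < end; ++i)
            total += in[i];
        for (uint32_t i = base; i < end; ++i)
            out[i] = total;
    }
}
---------------------------------------------------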
|
|
|
|
[[shaders-subgroup-quad]]
|
|
=== Quad Subgroup Operations
|
|
|
|
The quad subgroup operations allow clusters of 4 invocations (a quad) to
|
|
share data efficiently with each other.
|
|
ifdef::VK_VERSION_1_1[]
|
|
For fragment shaders, if the pname:subgroupSize field of
|
|
slink:VkPhysicalDeviceSubgroupProperties is at least 4, each quad
|
|
corresponds to one of the groups of four shader invocations used for
|
|
<<texture-derivatives,derivatives>>.
|
|
endif::VK_VERSION_1_1[]
|
|
ifdef::VK_NV_compute_shader_derivatives[]
|
|
For compute shaders using the code:DerivativeGroupQuadsNV or
|
|
code:DerivativeGroupLinearNV execution modes, each quad corresponds to one
|
|
of the groups of four shader invocations used for
|
|
<<texture-derivatives-compute,derivatives>>.
|
|
The invocations in each quad are ordered to have attribute values of
|
|
P~i0,j0~, P~i1,j0~, P~i0,j1~, and P~i1,j1~, respectively.
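
Given that ordering, one way derivatives might be formed from a quad is by
differencing neighboring values, as in the non-normative C sketch below.
The array `q` holds the four attribute values in the order listed above, and
`lane` is an invocation's index within the quad; both function names are
hypothetical, and an implementation may compute derivatives differently.

[source,c]
---------------------------------------------------
#include <assert.h>
#include <stdint.h>

/* Horizontal difference within the row that contains the given lane. */
static float quad_ddx(const float q[4], uint32_t lane)
{
    assert(lane < 4);
    uint32_t row = lane >> 1;          /* 0 for j0, 1 for j1 */
    return q[row * 2 + 1] - q[row * 2];
}

/* Vertical difference within the column that contains the given lane. */
static float quad_ddy(const float q[4], uint32_t lane)
{
    assert(lane < 4);
    uint32_t col = lane & 1u;          /* 0 for i0, 1 for i1 */
    return q[2 + col] - q[col];
}
---------------------------------------------------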
|
|
endif::VK_NV_compute_shader_derivatives[]
|
|
|
|
ifdef::VK_NV_shader_subgroup_partitioned[]
|
|
|
|
[[shaders-subgroup-partitioned]]
|
|
=== Partitioned Subgroup Operations
|
|
|
|
The partitioned subgroup operations allow a subgroup to partition its
|
|
invocations into disjoint subsets and to perform scan and reduce operations
|
|
among invocations belonging to the same subset.
|
|
The partitions for partitioned subgroup operations are specified by a ballot
|
|
operation and can: be computed at runtime.
|
|
The operations supported are add, mul, min, max, and, or, xor.
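
A partitioned add can be modeled non-normatively as a reduction in which
each invocation supplies a ballot-style bitmask naming the members of its
own subset.
The C sketch below assumes at most 32 invocations and assumes the masks
describe a consistent partition (every member of a subset supplies the same
mask); the function name is hypothetical.

[source,c]
---------------------------------------------------
#include <stdint.h>

/* out[i] is the sum of in[j] over every invocation j named in ballot[i]. */
static void subgroup_partitioned_add(const int32_t *in,
                                     const uint32_t *ballot,
                                     int32_t *out, uint32_t size)
{
    for (uint32_t i = 0; i < size; ++i) {
        int32_t total = 0;
        for (uint32_t j = 0; j < size; ++j)
            if ((ballot[i] >> j) & 1u)
                total += in[j];
        out[i] = total;
    }
}
---------------------------------------------------

Unlike the clustered operations, the subsets here need not be consecutive
or power-of-two sized, since the masks are ordinary runtime values.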
|
|
|
|
endif::VK_NV_shader_subgroup_partitioned[]
|
|
|
|
endif::VK_VERSION_1_1[]
|
|
|
|
ifdef::VK_NV_cooperative_matrix[]
|
|
== Cooperative Matrices
|
|
|
|
A _cooperative matrix_ type is a SPIR-V type where the storage for and
|
|
computations performed on the matrix are spread across a set of invocations
|
|
such as a subgroup.
|
|
These types give the implementation freedom in how to optimize matrix
|
|
multiplies.
|
|
|
|
SPIR-V defines the types and instructions, but does not specify rules about
|
|
what sizes/combinations are valid, and it is expected that different
|
|
implementations may: support different sizes.
|
|
|
|
[open,refpage='vkGetPhysicalDeviceCooperativeMatrixPropertiesNV',desc='Returns properties describing what cooperative matrix types are supported',type='protos']
|
|
--
|
|
|
|
To enumerate the supported cooperative matrix types and operations, call:
|
|
|
|
include::{generated}/api/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.txt[]
|
|
|
|
* pname:physicalDevice is the physical device.
|
|
* pname:pPropertyCount is a pointer to an integer related to the number of
|
|
cooperative matrix properties available or queried.
|
|
* pname:pProperties is either `NULL` or a pointer to an array of
|
|
slink:VkCooperativeMatrixPropertiesNV structures.
|
|
|
|
If pname:pProperties is `NULL`, then the number of cooperative matrix
|
|
properties available is returned in pname:pPropertyCount.
|
|
Otherwise, pname:pPropertyCount must: point to a variable set by the user to
|
|
the number of elements in the pname:pProperties array, and on return the
|
|
variable is overwritten with the number of structures actually written to
|
|
pname:pProperties.
|
|
If the value referenced by pname:pPropertyCount is less than the number of
cooperative matrix properties available, at most that many structures will
be written, and ename:VK_INCOMPLETE will be returned instead of
ename:VK_SUCCESS, to indicate that not all the available cooperative matrix
properties were returned.
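
The count-then-fill behavior described above can be illustrated with a small
self-contained mock.
Everything in the C sketch below (`MockResult`, `MockProperties`,
`mock_enumerate`, and the three table entries) is hypothetical and stands in
for the real command only to show how the count is read and overwritten.

[source,c]
---------------------------------------------------
#include <stddef.h>
#include <stdint.h>

typedef enum { MOCK_SUCCESS, MOCK_INCOMPLETE } MockResult;
typedef struct { uint32_t MSize, NSize, KSize; } MockProperties;

/* Made-up table of "available" properties for the mock. */
static const MockProperties mock_available[3] = {
    { 16, 16, 16 }, { 16, 8, 8 }, { 8, 8, 16 }
};

static MockResult mock_enumerate(uint32_t *pPropertyCount,
                                 MockProperties *pProperties)
{
    const uint32_t available = 3;

    /* Query mode: report how many entries exist. */
    if (pProperties == NULL) {
        *pPropertyCount = available;
        return MOCK_SUCCESS;
    }

    /* Fill mode: write at most the caller-provided capacity. */
    uint32_t n = *pPropertyCount < available ? *pPropertyCount : available;
    for (uint32_t i = 0; i < n; ++i)
        pProperties[i] = mock_available[i];
    *pPropertyCount = n;   /* number of structures actually written */
    return n < available ? MOCK_INCOMPLETE : MOCK_SUCCESS;
}
---------------------------------------------------

A caller would typically invoke such a function once with a `NULL` array to
learn the count, allocate that many elements, and then invoke it again to
fill them.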
|
|
|
|
include::{generated}/validity/protos/vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.txt[]
|
|
--
|
|
|
|
[open,refpage='VkCooperativeMatrixPropertiesNV',desc='Structure specifying cooperative matrix properties',type='structs']
|
|
--
|
|
|
|
Each sname:VkCooperativeMatrixPropertiesNV structure describes a single
|
|
supported combination of types for a matrix multiply/add operation
|
|
(code:OpCooperativeMatrixMulAddNV).
|
|
The multiply can: be described in terms of the following variables and types
|
|
(in SPIR-V pseudocode):
|
|
|
|
[source,c]
|
|
---------------------------------------------------
|
|
%A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize
|
|
%B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize
|
|
%C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize
|
|
%D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize
|
|
|
|
%D = %A * %B + %C // using OpCooperativeMatrixMulAddNV
|
|
---------------------------------------------------
|
|
|
|
A matrix multiply with these dimensions is known as an _MxNxK_ matrix
|
|
multiply.
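
For reference, the arithmetic of an MxNxK multiply/add can be written as a
plain scalar loop.
The non-normative C sketch below assumes row-major `float` arrays and a
hypothetical function name; actual cooperative matrices are opaque, and
their storage is distributed across invocations in an
implementation-dependent way.

[source,c]
---------------------------------------------------
#include <stddef.h>

/* D = A * B + C, where A is M x K, B is K x N, and C and D are M x N.
 * All matrices are stored row-major. */
static void matrix_mul_add(size_t M, size_t N, size_t K,
                           const float *A, const float *B,
                           const float *C, float *D)
{
    for (size_t r = 0; r < M; ++r) {
        for (size_t c = 0; c < N; ++c) {
            float acc = C[r * N + c];
            for (size_t k = 0; k < K; ++k)
                acc += A[r * K + k] * B[k * N + c];
            D[r * N + c] = acc;
        }
    }
}
---------------------------------------------------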
|
|
|
|
The sname:VkCooperativeMatrixPropertiesNV structure is defined as:
|
|
|
|
include::{generated}/api/structs/VkCooperativeMatrixPropertiesNV.txt[]
|
|
|
|
* pname:sType is the type of this structure.
|
|
* pname:pNext is `NULL` or a pointer to an extension-specific structure.
|
|
* pname:MSize is the number of rows in matrices A, C, and D.
|
|
* pname:KSize is the number of columns in matrix A and rows in matrix B.
|
|
  * pname:NSize is the number of columns in matrices B, C, and D.
|
|
* pname:AType is the component type of matrix A, of type
|
|
elink:VkComponentTypeNV.
|
|
* pname:BType is the component type of matrix B, of type
|
|
elink:VkComponentTypeNV.
|
|
* pname:CType is the component type of matrix C, of type
|
|
elink:VkComponentTypeNV.
|
|
* pname:DType is the component type of matrix D, of type
|
|
elink:VkComponentTypeNV.
|
|
* pname:scope is the scope of all the matrix types, of type
|
|
elink:VkScopeNV.
|
|
|
|
If some types are preferred over other types (e.g. for performance), they
|
|
should: appear earlier in the list enumerated by
|
|
flink:vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.
|
|
|
|
At least one entry in the list must: have power of two values for all of
|
|
pname:MSize, pname:KSize, and pname:NSize.
|
|
|
|
include::{generated}/validity/structs/VkCooperativeMatrixPropertiesNV.txt[]
|
|
--
|
|
|
|
[open,refpage='VkScopeNV',desc='Specify SPIR-V scope',type='enums']
|
|
--
|
|
|
|
Possible values for elink:VkScopeNV include:
|
|
|
|
include::{generated}/api/enums/VkScopeNV.txt[]
|
|
|
|
* ename:VK_SCOPE_DEVICE_NV corresponds to SPIR-V code:Device scope.
|
|
* ename:VK_SCOPE_WORKGROUP_NV corresponds to SPIR-V code:Workgroup scope.
|
|
* ename:VK_SCOPE_SUBGROUP_NV corresponds to SPIR-V code:Subgroup scope.
|
|
* ename:VK_SCOPE_QUEUE_FAMILY_NV corresponds to SPIR-V code:QueueFamilyKHR
|
|
scope.
|
|
|
|
All enum values match the corresponding SPIR-V value.
|
|
--
|
|
|
|
[open,refpage='VkComponentTypeNV',desc='Specify SPIR-V cooperative matrix component type',type='enums']
|
|
--
|
|
|
|
Possible values for elink:VkComponentTypeNV include:
|
|
|
|
include::{generated}/api/enums/VkComponentTypeNV.txt[]
|
|
|
|
* ename:VK_COMPONENT_TYPE_FLOAT16_NV corresponds to SPIR-V
|
|
code:OpTypeFloat 16.
|
|
* ename:VK_COMPONENT_TYPE_FLOAT32_NV corresponds to SPIR-V
|
|
code:OpTypeFloat 32.
|
|
* ename:VK_COMPONENT_TYPE_FLOAT64_NV corresponds to SPIR-V
|
|
code:OpTypeFloat 64.
|
|
* ename:VK_COMPONENT_TYPE_SINT8_NV corresponds to SPIR-V code:OpTypeInt 8
|
|
1.
|
|
* ename:VK_COMPONENT_TYPE_SINT16_NV corresponds to SPIR-V code:OpTypeInt
|
|
16 1.
|
|
* ename:VK_COMPONENT_TYPE_SINT32_NV corresponds to SPIR-V code:OpTypeInt
|
|
32 1.
|
|
* ename:VK_COMPONENT_TYPE_SINT64_NV corresponds to SPIR-V code:OpTypeInt
|
|
64 1.
|
|
* ename:VK_COMPONENT_TYPE_UINT8_NV corresponds to SPIR-V code:OpTypeInt 8
|
|
0.
|
|
* ename:VK_COMPONENT_TYPE_UINT16_NV corresponds to SPIR-V code:OpTypeInt
|
|
16 0.
|
|
* ename:VK_COMPONENT_TYPE_UINT32_NV corresponds to SPIR-V code:OpTypeInt
|
|
32 0.
|
|
* ename:VK_COMPONENT_TYPE_UINT64_NV corresponds to SPIR-V code:OpTypeInt
|
|
64 0.
|
|
--
|
|
|
|
endif::VK_NV_cooperative_matrix[]
|
|
|
|
ifdef::VK_EXT_validation_cache[]
|
|
[[shaders-validation-cache]]
|
|
== Validation Cache
|
|
|
|
[open,refpage='VkValidationCacheEXT',desc='Opaque handle to a validation cache object',type='handles']
|
|
--
|
|
|
|
Validation cache objects allow the result of internal validation to be
|
|
reused, both within a single application run and between multiple runs.
|
|
Reuse within a single run is achieved by passing the same validation cache
|
|
object when creating supported Vulkan objects.
|
|
Reuse across runs of an application is achieved by retrieving validation
|
|
cache contents in one run of an application, saving the contents, and using
|
|
them to preinitialize a validation cache on a subsequent run.
|
|
The contents of the validation cache objects are managed by the validation
|
|
layers.
|
|
Applications can: manage the host memory consumed by a validation cache
|
|
object and control the amount of data retrieved from a validation cache
|
|
object.
|
|
|
|
Validation cache objects are represented by sname:VkValidationCacheEXT
|
|
handles:
|
|
|
|
include::{generated}/api/handles/VkValidationCacheEXT.txt[]
|
|
|
|
--
|
|
|
|
[open,refpage='vkCreateValidationCacheEXT',desc='Creates a new validation cache',type='protos']
|
|
--
|
|
|
|
To create validation cache objects, call:
|
|
|
|
include::{generated}/api/protos/vkCreateValidationCacheEXT.txt[]
|
|
|
|
* pname:device is the logical device that creates the validation cache
|
|
object.
|
|
* pname:pCreateInfo is a pointer to a slink:VkValidationCacheCreateInfoEXT
|
|
structure that contains the initial parameters for the validation cache
|
|
object.
|
|
* pname:pAllocator controls host memory allocation as described in the
|
|
<<memory-allocation, Memory Allocation>> chapter.
|
|
* pname:pValidationCache is a pointer to a slink:VkValidationCacheEXT
|
|
handle in which the resulting validation cache object is returned.
|
|
|
|
[NOTE]
|
|
.Note
|
|
====
|
|
Applications can: track and manage the total host memory size of a
|
|
validation cache object using the pname:pAllocator.
|
|
Applications can: limit the amount of data retrieved from a validation cache
|
|
object in fname:vkGetValidationCacheDataEXT.
|
|
Implementations should: not internally limit the total number of entries
|
|
added to a validation cache object or the total host memory consumed.
|
|
====
|
|
|
|
Once created, a validation cache can: be passed to the
|
|
fname:vkCreateShaderModule command as part of the
|
|
sname:VkShaderModuleCreateInfo pname:pNext chain.
|
|
If a sname:VkShaderModuleValidationCacheCreateInfoEXT object is part of the
|
|
sname:VkShaderModuleCreateInfo::pname:pNext chain, and its
|
|
pname:validationCache field is not dlink:VK_NULL_HANDLE, the implementation
|
|
will query it for possible reuse opportunities and update it with new
|
|
content.
|
|
The use of the validation cache object in these commands is internally
|
|
synchronized, and the same validation cache object can: be used in multiple
|
|
threads simultaneously.
|
|
|
|
[NOTE]
|
|
.Note
|
|
====
|
|
Implementations should: make every effort to limit any critical sections to
|
|
the actual accesses to the cache, which is expected to be significantly
|
|
shorter than the duration of the fname:vkCreateShaderModule command.
|
|
====
|
|
|
|
include::{generated}/validity/protos/vkCreateValidationCacheEXT.txt[]
|
|
--
|
|
|
|
[open,refpage='VkValidationCacheCreateInfoEXT',desc='Structure specifying parameters of a newly created validation cache',type='structs']
|
|
--
|
|
|
|
The sname:VkValidationCacheCreateInfoEXT structure is defined as:
|
|
|
|
include::{generated}/api/structs/VkValidationCacheCreateInfoEXT.txt[]
|
|
|
|
* pname:sType is the type of this structure.
|
|
* pname:pNext is `NULL` or a pointer to an extension-specific structure.
|
|
* pname:flags is reserved for future use.
|
|
* pname:initialDataSize is the number of bytes in pname:pInitialData.
|
|
If pname:initialDataSize is zero, the validation cache will initially be
|
|
empty.
|
|
* pname:pInitialData is a pointer to previously retrieved validation cache
|
|
data.
|
|
If the validation cache data is incompatible (as defined below) with the
|
|
device, the validation cache will be initially empty.
|
|
If pname:initialDataSize is zero, pname:pInitialData is ignored.
|
|
|
|
.Valid Usage
|
|
****
|
|
* [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534]]
|
|
If pname:initialDataSize is not `0`, it must: be equal to the size of
|
|
pname:pInitialData, as returned by fname:vkGetValidationCacheDataEXT
|
|
when pname:pInitialData was originally retrieved
|
|
* [[VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535]]
|
|
If pname:initialDataSize is not `0`, pname:pInitialData must: have been
|
|
retrieved from a previous call to fname:vkGetValidationCacheDataEXT
|
|
****
|
|
|
|
include::{generated}/validity/structs/VkValidationCacheCreateInfoEXT.txt[]
|
|
--
|
|
|
|
[open,refpage='VkValidationCacheCreateFlagsEXT',desc='Reserved for future use',type='flags']
|
|
--
|
|
include::{generated}/api/flags/VkValidationCacheCreateFlagsEXT.txt[]
|
|
|
|
tname:VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask,
|
|
but is currently reserved for future use.
|
|
--
|
|
|
|
[open,refpage='vkMergeValidationCachesEXT',desc='Combine the data stores of validation caches',type='protos']
|
|
--
|
|
|
|
Validation cache objects can: be merged using the command:
|
|
|
|
include::{generated}/api/protos/vkMergeValidationCachesEXT.txt[]
|
|
|
|
* pname:device is the logical device that owns the validation cache
|
|
objects.
|
|
* pname:dstCache is the handle of the validation cache to merge results
|
|
into.
|
|
* pname:srcCacheCount is the length of the pname:pSrcCaches array.
|
|
* pname:pSrcCaches is an array of validation cache handles, which will be
|
|
merged into pname:dstCache.
|
|
The previous contents of pname:dstCache are included after the merge.
|
|
|
|
[NOTE]
|
|
.Note
|
|
====
|
|
The details of the merge operation are implementation-dependent, but
|
|
implementations should: merge the contents of the specified validation
|
|
caches and prune duplicate entries.
|
|
====
|
|
|
|
.Valid Usage
|
|
****
|
|
* [[VUID-vkMergeValidationCachesEXT-dstCache-01536]]
|
|
pname:dstCache must: not appear in the list of source caches
|
|
****
|
|
|
|
include::{generated}/validity/protos/vkMergeValidationCachesEXT.txt[]
|
|
--
|
|
|
|
[open,refpage='vkGetValidationCacheDataEXT',desc='Get the data store from a validation cache',type='protos']
|
|
--
|
|
|
|
Data can: be retrieved from a validation cache object using the command:
|
|
|
|
include::{generated}/api/protos/vkGetValidationCacheDataEXT.txt[]
|
|
|
|
* pname:device is the logical device that owns the validation cache.
|
|
* pname:validationCache is the validation cache to retrieve data from.
|
|
* pname:pDataSize is a pointer to a value related to the amount of data in
|
|
the validation cache, as described below.
|
|
* pname:pData is either `NULL` or a pointer to a buffer.
|
|
|
|
If pname:pData is `NULL`, then the maximum size of the data that can: be
|
|
retrieved from the validation cache, in bytes, is returned in
|
|
pname:pDataSize.
|
|
Otherwise, pname:pDataSize must: point to a variable set by the user to the
|
|
size of the buffer, in bytes, pointed to by pname:pData, and on return the
|
|
variable is overwritten with the amount of data actually written to
|
|
pname:pData.
|
|
|
|
If pname:pDataSize is less than the maximum size that can: be retrieved from
|
|
the validation cache, at most pname:pDataSize bytes will be written to
|
|
pname:pData, and fname:vkGetValidationCacheDataEXT will return
|
|
ename:VK_INCOMPLETE.
|
|
Any data written to pname:pData is valid and can: be provided as the
|
|
pname:pInitialData member of the sname:VkValidationCacheCreateInfoEXT
|
|
structure passed to fname:vkCreateValidationCacheEXT.
|
|
|
|
Two calls to fname:vkGetValidationCacheDataEXT with the same parameters
|
|
must: retrieve the same data unless a command that modifies the contents of
|
|
the cache is called between them.
|
|
|
|
[[validation-cache-header]]
|
|
Applications can: store the data retrieved from the validation cache, and
|
|
use these data, possibly in a future run of the application, to populate new
|
|
validation cache objects.
|
|
The results of validation, however, may: depend on the vendor ID, device ID,
|
|
driver version, and other details of the device.
|
|
To enable applications to detect when previously retrieved data is
|
|
incompatible with the device, the initial bytes written to pname:pData must:
|
|
be a header consisting of the following members:
|
|
|
|
.Layout for validation cache header version ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT
[width="85%",cols="8%,21%,71%",options="header"]
|====
| Offset | Size | Meaning
| 0 | 4 | length in bytes of the entire validation cache header
          written as a stream of bytes, with the least
          significant byte first
| 4 | 4 | a elink:VkValidationCacheHeaderVersionEXT value
          written as a stream of bytes, with the least
          significant byte first
| 8 | ename:VK_UUID_SIZE | a layer commit ID expressed as a UUID, which uniquely
          identifies the version of the validation layers used
          to generate these validation results
|====

The first four bytes encode the length of the entire validation cache
header, in bytes.
This value includes all fields in the header, including the validation cache
version field and the length field itself.

The next four bytes encode the validation cache version, as described for
elink:VkValidationCacheHeaderVersionEXT.
A consumer of the validation cache should: use the cache version to
interpret the remainder of the cache header.
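
As a non-normative illustration, the header layout described above could be decoded along the following lines.
The sketch assumes that ename:VK_UUID_SIZE is 16 and that ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT has the value 1, and redefines both under hypothetical names so that it compiles without the Vulkan headers.
The names `SketchCacheHeader`, `sketch_parse_header` and `sketch_cache_is_compatible` are likewise illustrative and are not part of the API; how an application obtains the UUID it expects is outside the scope of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, redefined locally so the sketch builds without vulkan.h:
 * VK_UUID_SIZE == 16, VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT == 1. */
#define SKETCH_UUID_SIZE 16u
#define SKETCH_HEADER_VERSION_ONE 1u

/* Hypothetical in-memory view of the version-one header. */
typedef struct {
    uint32_t length;                  /* offset 0: header length in bytes */
    uint32_t version;                 /* offset 4: header version         */
    uint8_t  uuid[SKETCH_UUID_SIZE];  /* offset 8: layer commit ID        */
} SketchCacheHeader;

/* Read a 32-bit value stored least significant byte first. */
uint32_t sketch_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header at the start of `data`.
 * Returns 1 on success, 0 if the blob is too short, the version is not
 * version one, or the stored length is implausible. */
int sketch_parse_header(const uint8_t *data, size_t size,
                        SketchCacheHeader *out)
{
    const size_t min_len = 4u + 4u + SKETCH_UUID_SIZE;

    assert(out != NULL);
    if (data == NULL || size < min_len)
        return 0;

    out->length  = sketch_read_le32(data);
    out->version = sketch_read_le32(data + 4);

    /* The version tells the consumer how to read the rest of the header. */
    if (out->version != SKETCH_HEADER_VERSION_ONE)
        return 0;
    if (out->length < min_len || out->length > size)
        return 0;

    memcpy(out->uuid, data + 8, SKETCH_UUID_SIZE);
    return 1;
}

/* Returns 1 if the blob has a valid header whose UUID equals expected_uuid. */
int sketch_cache_is_compatible(const uint8_t *data, size_t size,
                               const uint8_t *expected_uuid)
{
    SketchCacheHeader h;

    return sketch_parse_header(data, size, &h) &&
           memcmp(h.uuid, expected_uuid, SKETCH_UUID_SIZE) == 0;
}
```

An application might run a check of this kind on data loaded from disk and fall back to an empty cache when it fails, instead of passing stale data as pname:pInitialData.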

If the value pointed to by pname:pDataSize is less than what is necessary to
store this header, nothing will be written to pname:pData and that value
will be set to zero.

include::{generated}/validity/protos/vkGetValidationCacheDataEXT.txt[]
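
The size rules described above for pname:pDataSize and pname:pData can be summarized in a small non-normative model.
This is not the Vulkan entry point: `model_get_cache_data`, `ModelResult` and the `store` parameter are hypothetical stand-ins for the command, its return code and the cache contents.
The model assumes a 24-byte header (4 + 4 + 16), assumes that a truncated copy writes exactly the requested number of bytes (the text only promises "at most"), and assumes that the header-too-small case also reports an incomplete result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes standing in for VK_SUCCESS / VK_INCOMPLETE. */
typedef enum { MODEL_SUCCESS = 0, MODEL_INCOMPLETE = 1 } ModelResult;

/* Assumed version-one header size: 4 + 4 + VK_UUID_SIZE (16). */
#define MODEL_HEADER_SIZE 24u

/* Toy model of the size negotiation.
 * `store` / `store_size` play the role of the cache contents and are
 * assumed to be at least MODEL_HEADER_SIZE bytes. */
ModelResult model_get_cache_data(const uint8_t *store, size_t store_size,
                                 size_t *pDataSize, void *pData)
{
    assert(store != NULL && pDataSize != NULL);

    /* Size query: report the maximum retrievable size. */
    if (pData == NULL) {
        *pDataSize = store_size;
        return MODEL_SUCCESS;
    }

    /* Buffer is large enough: copy everything, report bytes written. */
    if (*pDataSize >= store_size) {
        memcpy(pData, store, store_size);
        *pDataSize = store_size;
        return MODEL_SUCCESS;
    }

    /* Buffer cannot even hold the header: write nothing, report zero. */
    if (*pDataSize < MODEL_HEADER_SIZE) {
        *pDataSize = 0;
        return MODEL_INCOMPLETE;
    }

    /* Truncated copy: *pDataSize already equals the bytes written. */
    memcpy(pData, store, *pDataSize);
    return MODEL_INCOMPLETE;
}
```

A caller following this pattern would typically query the size with a `NULL` buffer, allocate that many bytes, and call again, treating an incomplete result as a signal to retry the query.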
--

[open,refpage='VkValidationCacheHeaderVersionEXT',desc='Encode validation cache version',type='enums',xrefs='vkCreateValidationCacheEXT vkGetValidationCacheDataEXT']
--
Possible values of the second group of four bytes in the header returned by
flink:vkGetValidationCacheDataEXT, encoding the validation cache version,
are:

include::{generated}/api/enums/VkValidationCacheHeaderVersionEXT.txt[]

* ename:VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one
  of the validation cache.
--

[open,refpage='vkDestroyValidationCacheEXT',desc='Destroy a validation cache object',type='protos']
--

To destroy a validation cache, call:

include::{generated}/api/protos/vkDestroyValidationCacheEXT.txt[]

* pname:device is the logical device that destroys the validation cache
  object.
* pname:validationCache is the handle of the validation cache to destroy.
* pname:pAllocator controls host memory allocation as described in the
  <<memory-allocation, Memory Allocation>> chapter.

.Valid Usage
****
* [[VUID-vkDestroyValidationCacheEXT-validationCache-01537]]
  If sname:VkAllocationCallbacks were provided when pname:validationCache
  was created, a compatible set of callbacks must: be provided here
* [[VUID-vkDestroyValidationCacheEXT-validationCache-01538]]
  If no sname:VkAllocationCallbacks were provided when
  pname:validationCache was created, pname:pAllocator must: be `NULL`
****

include::{generated}/validity/protos/vkDestroyValidationCacheEXT.txt[]
--
endif::VK_EXT_validation_cache[]