// Copyright (c) 2015-2016 The Khronos Group Inc.
// Copyright notice at https://www.khronos.org/registry/speccopyright.html

[[shaders]]
= Shaders

A shader specifies programmable operations that execute for each vertex,
control point, tessellated vertex, primitive, fragment, or workgroup in
the corresponding stage(s) of the graphics and compute pipelines.

Graphics pipelines include vertex shader execution as a result of
<<drawing,primitive assembly>>, followed, if enabled, by tessellation
control and evaluation shaders operating on
<<drawing-primitive-topologies-patches,patches>>, geometry shaders, if
enabled, operating on primitives, and fragment shaders, if present,
operating on fragments generated by <<primsrast,Rasterization>>. In this
specification, vertex, tessellation control, tessellation evaluation and
geometry shaders are collectively referred to as vertex processing stages
and occur in the logical pipeline before rasterization. The fragment shader
occurs logically after rasterization.

Only the compute shader stage is included in a compute pipeline. Compute
shaders operate on compute invocations in a workgroup.

Shaders can: read from input variables, and read from and write to
output variables. Input and output variables can: be used to transfer data
between shader stages, or to allow the shader to interact with values that
exist in the execution environment. Similarly, the execution environment
provides constants that describe capabilities.

Shader variables are associated with execution environment-provided inputs
and outputs using _built-in_ decorations in the shader. The available
decorations for each stage are documented in the following subsections.

[[shader-modules]]
== Shader Modules

_Shader modules_ contain _shader code_ and one or more entry points. Shaders
are selected from a shader module by specifying an entry point as part of
<<pipelines,pipeline>> creation. The stages of a pipeline can: use shaders
that come from different modules. The shader code defining a shader module
must: be in the SPIR-V format, as described by the <<spirvenv,{apiname}
Environment for SPIR-V>> appendix.

A shader module is created by calling:

include::../protos/vkCreateShaderModule.txt[]

* pname:device is the logical device that creates the shader module.
* pname:pCreateInfo is a pointer to an instance of the
  sname:VkShaderModuleCreateInfo structure.
* pname:pAllocator controls host memory allocation as described in the
  <<memory-allocation, Memory Allocation>> chapter.
* pname:pShaderModule points to a sname:VkShaderModule handle in which the
  resulting shader module object is returned.

include::../validity/protos/vkCreateShaderModule.txt[]

The sname:VkShaderModuleCreateInfo structure is defined as:

include::../structs/VkShaderModuleCreateInfo.txt[]

* pname:sType is the type of this structure.
* pname:pNext is `NULL` or a pointer to an extension-specific structure.
* pname:flags is reserved for future use.
* pname:codeSize is the size, in bytes, of the code pointed to by
  pname:pCode.
* pname:pCode points to code that is used to create the shader
  module. The type and format of the code are determined from the content
  of the memory addressed by pname:pCode.

include::../validity/structs/VkShaderModuleCreateInfo.txt[]
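
As an informal illustration of the pname:codeSize and pname:pCode members described above, an application might sanity-check a SPIR-V blob before filling in sname:VkShaderModuleCreateInfo. The sketch below is not part of the API: the helper name `spirv_blob_looks_valid` and both macros are hypothetical. It assumes the blob is already in host byte order, relies on the SPIR-V module layout (a stream of 32-bit words that starts with a five-word header whose first word is the magic number `0x07230203`), and is far weaker than real SPIR-V validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First word of every SPIR-V module, as defined by the SPIR-V specification. */
#define SPIRV_MAGIC_NUMBER 0x07230203u

/* Words in the SPIR-V module header: magic, version, generator, bound, schema. */
#define SPIRV_HEADER_WORDS 5u

/*
 * Hypothetical helper (not part of the API): returns 1 if the blob passes
 * basic sanity checks, 0 otherwise. codeSize is in bytes, matching the
 * codeSize member of VkShaderModuleCreateInfo.
 */
int spirv_blob_looks_valid(const uint32_t *pCode, size_t codeSize)
{
    if (pCode == NULL)
        return 0;
    /* SPIR-V is a stream of 32-bit words, so the byte size is a multiple of 4. */
    if (codeSize % sizeof(uint32_t) != 0)
        return 0;
    /* A module is at least as large as its header. */
    if (codeSize < SPIRV_HEADER_WORDS * sizeof(uint32_t))
        return 0;
    /* Assumes host byte order; a byte-swapped module would fail here. */
    return pCode[0] == SPIRV_MAGIC_NUMBER;
}
```

A caller would typically run such a check on the bytes loaded from disk and only then pass the same pointer and byte count as pname:pCode and pname:codeSize.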

Once a shader module has been created, any entry points it contains can: be
used in pipeline shader stages as described in <<pipelines-compute,Compute
Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.

To destroy a shader module, call:

include::../protos/vkDestroyShaderModule.txt[]

* pname:device is the logical device that destroys the shader module.
* pname:shaderModule is the handle of the shader module to destroy.
* pname:pAllocator controls host memory allocation as described in the
  <<memory-allocation, Memory Allocation>> chapter.

A shader module can: be destroyed while pipelines created using its
shaders are still in use.

include::../validity/protos/vkDestroyShaderModule.txt[]

[[shaders-execution]]
== Shader Execution

At each stage of the pipeline, multiple invocations of a shader may: execute
simultaneously. Further, invocations of a single shader produced as the
result of different commands may: execute simultaneously. The relative
execution order of invocations of the same shader type is undefined. Shader
invocations may: complete in a different order than that in which the
primitives they originated from were drawn or dispatched by the application.
However, fragment shader outputs are written to attachments in
<<fundamentals-queueoperation-apiorder,API order>>.

The relative order of invocations of different shader types is largely
undefined. However, when invoking a shader whose inputs are generated from a
previous pipeline stage, the shader invocations from the previous stage are
guaranteed to have executed far enough to generate input values for all
required inputs.

[[shaders-execution-memory-ordering]]
== Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is
largely undefined. For some shader types (vertex, tessellation evaluation,
and in some cases, fragment), even the number of shader invocations that
may: perform loads and stores is undefined.

In particular, the following rules apply:

* <<shaders-vertex-execution,Vertex>> and
  <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
  shaders will be invoked at least once for each unique vertex, as defined
  in those sections.
* <<shaders-fragment-execution,Fragment>> shaders will be invoked zero or
  more times, as defined in that section.
* The relative order of invocations of the same shader type is undefined.
  A store issued by a shader when working on primitive B might complete
  prior to a store for primitive A, even if primitive A is specified prior
  to primitive B. This applies even to fragment shaders; while fragment
  shader outputs are always written to the framebuffer
  <<fundamentals-queueoperation-apiorder,in primitive order>>, stores
  executed by fragment shader invocations are not.
* The relative order of invocations of different shader types is largely
  undefined.

[NOTE]
.Note
====
The above limitations on shader invocation order make some forms of
synchronization between shader invocations within a single set of primitives
unimplementable. For example, having one invocation poll memory written by
another invocation assumes that the other invocation has been launched and
will complete its writes in finite time.
====

Stores issued to different memory locations within a single shader
invocation may: not be visible to other invocations in the order they were
performed. The code:OpMemoryBarrier instruction can: be used to provide
stronger ordering of reads and writes performed by a single invocation.
code:OpMemoryBarrier guarantees that any memory transactions issued by the
shader invocation prior to the instruction complete prior to the memory
transactions issued after the instruction. Memory barriers are needed for
algorithms that require multiple invocations to access the same memory and
require the operations to be performed in a partially-defined relative
order. For example, if one shader invocation does a series of writes,
followed by an code:OpMemoryBarrier instruction, followed by another write,
then the results of the series of writes before the barrier become visible to
other shader invocations at a time earlier or equal to when the results of
the final write become visible to those invocations. In practice, this means
that another invocation that sees the results of the final write would also
see the previous writes. Without the memory barrier, the final write may: be
visible before the previous writes.
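
The "series of writes, barrier, final write" pattern described above can be sketched on the host with C11 atomics. This is only a loose CPU-side analogy and not SPIR-V: `atomic_thread_fence` stands in for code:OpMemoryBarrier, and the names `payload`, `ready`, `publish` and `try_consume` are hypothetical. The acquire fence on the reading side is a requirement of the C11 memory model and is an assumption of this sketch rather than something stated in the text above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state; the names are illustrative only. */
static int payload[2];
static atomic_int ready;

/* Writer: a series of writes, then a barrier, then a final write. */
void publish(int a, int b)
{
    payload[0] = a;                                   /* earlier writes */
    payload[1] = b;
    atomic_thread_fence(memory_order_release);        /* the barrier */
    atomic_store_explicit(&ready, 1, memory_order_relaxed); /* final write */
}

/*
 * Reader: if the final write is visible, the earlier writes are visible too.
 * Returns 1 and fills out[] on success, 0 if nothing has been published yet.
 */
int try_consume(int out[2])
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    out[0] = payload[0];
    out[1] = payload[1];
    return 1;
}
```

Note that `try_consume` returns immediately instead of spinning, in keeping with the note above that polling for another invocation's writes is not a dependable form of synchronization.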

The built-in atomic memory transaction instructions can: be used to read and
write a given memory address atomically. While built-in atomic functions
issued by multiple shader invocations are executed in undefined order
relative to each other, these functions perform both a read and a write of a
memory address and guarantee that no other memory transaction will write to
the underlying memory between the read and write.

[NOTE]
.Note
====
Atomics allow shaders to use shared global addresses for mutual exclusion or
as counters, among other uses.
====
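
The two uses mentioned in the note, counters and mutual exclusion, can be illustrated with C11 atomics on the host. This is an analogy under stated assumptions, not shader code: the function and variable names are hypothetical, and the C11 read-modify-write operations stand in for the SPIR-V atomic instructions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared words; zero-initialized because they are static. */
static atomic_uint next_slot;
static atomic_int lock_word;

/*
 * Counter: each caller receives a distinct slot index, because the read and
 * the write of next_slot happen as one indivisible operation.
 */
unsigned reserve_slot(void)
{
    return atomic_fetch_add_explicit(&next_slot, 1u, memory_order_relaxed);
}

/*
 * Mutual exclusion: returns 1 if the caller changed lock_word from 0 to 1,
 * 0 if some other caller already holds the lock. It never spins.
 */
int try_lock(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&lock_word, &expected, 1) ? 1 : 0;
}

/* Release the lock taken by a successful try_lock(). */
void unlock(void)
{
    atomic_store(&lock_word, 0);
}
```

The order in which callers obtain slots or the lock is unspecified, which mirrors the undefined relative order of atomic operations across invocations.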

[[shaders-inputs]]
== Shader Inputs and Outputs

Data is passed into and out of shaders using variables with input or output
storage class, respectively. User-defined inputs and outputs are connected
between stages by matching their code:Location decorations. Additionally,
data can: be provided by or communicated to special functions provided by
the execution environment using code:BuiltIn decorations.

In many cases, the same code:BuiltIn decoration can: be used in multiple
shader stages with similar meaning. The specific behavior of variables
decorated as code:BuiltIn is documented in the following sections.

[[shaders-vertex]]
== Vertex Shaders

Each vertex shader invocation operates on one vertex and its associated
<<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
associated data. Graphics pipelines must: include a vertex shader, and
the vertex shader stage is always the first shader stage in the graphics
pipeline.


[[shaders-vertex-execution]]
=== Vertex Shader Execution

A vertex shader must: be executed at least once for each vertex specified by
a draw command. During execution, the shader is presented with the index of
the vertex and instance for which it has been invoked. Input variables
declared in the vertex shader are filled by the implementation with the
values of vertex attributes associated with the invocation being executed.

If a vertex is a part of more than one input primitive, for example
by including the same index value multiple times in an index buffer, the
vertex shader may: be invoked only once and the results shared amongst the
resulting primitives. This is known as _vertex reuse_.

ifdef::implementation-guide[]
.Implementor's Note
****
If a vertex is repeated in a draw command (i.e. the same index is repeated
in an indexed draw), the shader may: be executed anywhere from one to the
number of repetitions times for that vertex, depending on the
implementation's ability to reuse shader results.
****
endif::implementation-guide[]
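
As a rough illustration of vertex reuse, the bounds on vertex shader invocations for one indexed draw can be computed from the index buffer alone. The sketch below is hypothetical helper code, not an API: it assumes a single instance, ignores primitive restart, and uses a quadratic scan for brevity. The lower bound corresponds to perfect reuse (one invocation per distinct index) and the upper bound to no reuse at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: bounds on vertex shader invocations for a draw. */
typedef struct InvocationBounds {
    size_t min_invocations; /* perfect reuse: one per distinct index */
    size_t max_invocations; /* no reuse: one per index in the draw */
} InvocationBounds;

/* Count distinct indices with a simple O(n^2) scan; fine for a sketch. */
InvocationBounds vertex_invocation_bounds(const uint32_t *indices, size_t count)
{
    InvocationBounds b = {0, count};
    assert(indices != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        int seen = 0;
        for (size_t j = 0; j < i; ++j) {
            if (indices[j] == indices[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            ++b.min_invocations;
    }
    return b;
}
```

For two triangles that share an edge, with indices `{0, 1, 2, 2, 1, 3}`, the bounds are four and six invocations. Where an implementation lands between them depends on its ability to reuse shader results.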

[[shaders-tessellation-control]]
== Tessellation Control Shaders

The tessellation control shader is used to read an input patch provided by
the application and to produce an output patch. Each tessellation control
shader invocation operates on an input patch (after all control points in
the patch are processed by a vertex shader) and its associated data, and
outputs a single control point of the output patch and its associated data,
and can: also output additional per-patch data. The input patch is sized
according to the pname:patchControlPoints member of
slink:VkPipelineTessellationStateCreateInfo, as part of input assembly. The
size of the output patch is controlled by the code:OpExecutionMode
code:OutputVertices specified in the tessellation control or tessellation
evaluation shaders, which must: be specified in at least one of the shaders.
The size of the input and output patches must: each be greater than zero and
less than or equal to
sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.


[[shaders-tessellation-control-execution]]
=== Tessellation Control Shader Execution

A tessellation control shader is invoked at least once for each _output_
vertex in a patch.

Inputs to the tessellation control shader are generated by the vertex
shader. Each invocation of the tessellation control shader can: read the
attributes of any incoming vertices and their associated data. The
invocations corresponding to a given patch execute logically in parallel,
with undefined relative execution order. However, the code:OpControlBarrier
instruction can: be used to provide limited control of the execution order
by synchronizing invocations within a patch, effectively dividing
tessellation control shader execution into a set of phases. Tessellation
control shaders will read undefined values if one invocation reads a
per-vertex or per-patch attribute written by another invocation at any point
during the same phase, or if two invocations attempt to write different
values to the same per-patch output in a single phase.

[[shaders-tessellation-evaluation]]
== Tessellation Evaluation Shaders

The tessellation evaluation shader operates on an input patch of control
points and their associated data, and a single input barycentric coordinate
indicating the invocation's relative position within the subdivided patch,
and outputs a single vertex and its associated data.


[[shaders-tessellation-evaluation-execution]]
=== Tessellation Evaluation Shader Execution

A tessellation evaluation shader is invoked at least once for each
unique vertex generated by the tessellator.

[[shaders-geometry]]
== Geometry Shaders

The geometry shader operates on a group of vertices and their associated
data assembled from a single input primitive, and emits zero or more
output primitives and the group of vertices and their associated data
required for each output primitive.


[[shaders-geometry-execution]]
=== Geometry Shader Execution

A geometry shader is invoked at least once for each primitive produced by
the tessellation stages, or at least once for each primitive generated by
<<drawing,primitive assembly>> when tessellation is not in use. The number
of geometry shader invocations per input primitive is determined from the
invocation count of the geometry shader specified by the
code:OpExecutionMode code:Invocations in the geometry shader. If the
invocation count is not specified, then a default of one invocation is
executed.

[[shaders-fragment]]
== Fragment Shaders

Fragment shaders are invoked as the result of rasterization in a graphics
pipeline. Each fragment shader invocation operates on a single fragment and
its associated data. With few exceptions, fragment shaders do not have
access to any data associated with other fragments and are considered to
execute in isolation of fragment shader invocations associated with other
fragments.


[[shaders-fragment-execution]]
=== Fragment Shader Execution

For each fragment generated by rasterization, a fragment shader may: be
invoked. A fragment shader mustnot: be invoked if the <<fragops-early,Early
Per-Fragment Tests>> cause it to have no coverage.

Furthermore, if it is determined that a fragment generated as the result of
rasterizing a first primitive will have its outputs entirely overwritten by
a fragment generated as the result of rasterizing a second primitive in the
same subpass, and the fragment shader used for the fragment has no other
side effects, then the fragment shader may: not be executed for the fragment
from the first primitive.

Relative ordering of execution of different fragment shader invocations is
not defined.

The number of fragment shader invocations produced per-pixel is determined
as follows:

- If per-sample shading is enabled, the fragment shader is invoked once
  per covered sample.
- Otherwise, the fragment shader is invoked at least once per fragment but
  no more than once per covered sample.
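
The two rules above can be restated as a small calculation over a fragment's coverage mask. This is an informal sketch with hypothetical names, assuming one bit per sample in a 32-bit mask. It deliberately ignores helper invocations and the earlier cases in which an implementation is allowed to skip the shader entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: allowed number of invocations for one fragment. */
typedef struct FragmentInvocationRange {
    unsigned min_count;
    unsigned max_count;
} FragmentInvocationRange;

/* Count the set bits of a coverage mask (one bit per sample). */
unsigned count_covered_samples(uint32_t coverage_mask)
{
    unsigned n = 0;
    while (coverage_mask != 0) {
        coverage_mask &= coverage_mask - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Apply the per-sample and non-per-sample rules to a single fragment. */
FragmentInvocationRange fragment_invocation_range(uint32_t coverage_mask,
                                                  int per_sample_shading)
{
    FragmentInvocationRange r = {0, 0};
    unsigned covered = count_covered_samples(coverage_mask);

    if (covered == 0)
        return r;                 /* no coverage: the shader is not invoked */
    if (per_sample_shading) {
        r.min_count = covered;    /* exactly once per covered sample */
        r.max_count = covered;
    } else {
        r.min_count = 1;          /* at least once per fragment */
        r.max_count = covered;    /* at most once per covered sample */
    }
    return r;
}
```

With three of four samples covered (mask `0xB`), per-sample shading gives exactly three invocations, while the default gives between one and three.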

In addition to the conditions outlined above for the invocation of a
fragment shader, a fragment shader invocation may: be produced as a _helper
invocation_. A helper invocation is a fragment shader invocation that is
created solely for the purposes of evaluating derivatives for use in
non-helper fragment shader invocations. Stores and atomics performed by
helper invocations mustnot: have any effect on memory, and values returned
by atomic instructions in helper invocations are undefined.


[[shaders-fragment-earlytest]]
=== Early Fragment Tests

An explicit control is provided to allow fragment shaders to enable early
fragment tests. If the fragment shader specifies the
code:EarlyFragmentTests code:OpExecutionMode, the per-fragment tests
described in <<fragops-early-mode,Early Fragment Test Mode>> are
performed prior to fragment shader execution. Otherwise, they are performed
after fragment shader execution.

[[shaders-compute]]
== Compute Shaders

Compute shaders are invoked via flink:vkCmdDispatch and
flink:vkCmdDispatchIndirect commands. In general, they have access to
resources similar to those of shader stages executing as part of a graphics
pipeline.

Compute workloads are formed from groups of work items called workgroups
and processed by the compute shader in the current compute pipeline. A
workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel. Compute shaders execute in _global
workgroups_ which are divided into a number of _local workgroups_ with a size
that can: be set by assigning a value to the code:LocalSize execution mode
either in the shader code or via
<<pipelines-specialization-constants,Specialization Constants>>. An
invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.
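
The relationship between a dispatch, its local workgroups and individual invocations can be sketched with plain integer arithmetic. The type and function names below are hypothetical. The identity used for the global invocation ID (workgroup ID times local size, plus local invocation ID) is the usual definition of the corresponding built-in and is not stated in the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-component unsigned vector. */
typedef struct UVec3 {
    uint32_t x, y, z;
} UVec3;

/*
 * Total invocations in a dispatch of group_count local workgroups, each of
 * size local_size (the LocalSize execution mode). Assumes the product fits
 * in 64 bits, which holds for realistic device limits.
 */
uint64_t total_invocations(UVec3 group_count, UVec3 local_size)
{
    return (uint64_t)group_count.x * group_count.y * group_count.z
         * local_size.x * local_size.y * local_size.z;
}

/* Position of one invocation within the whole dispatch. */
UVec3 global_invocation_id(UVec3 workgroup_id, UVec3 local_size, UVec3 local_id)
{
    UVec3 g;
    assert(local_id.x < local_size.x);
    assert(local_id.y < local_size.y);
    assert(local_id.z < local_size.z);
    g.x = workgroup_id.x * local_size.x + local_id.x;
    g.y = workgroup_id.y * local_size.y + local_id.y;
    g.z = workgroup_id.z * local_size.z + local_id.z;
    return g;
}
```

For example, dispatching 4 x 2 x 1 workgroups of local size 8 x 8 x 1 launches 512 invocations, and local invocation (5, 7, 0) of workgroup (3, 1, 0) has global ID (29, 15, 0).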

[[shaders-interpolation-decorations]]
== Interpolation Decorations

Interpolation decorations control the behavior of attribute interpolation in
the fragment shader stage. Interpolation decorations can: be applied to
code:Input storage class variables in the fragment shader stage's interface,
and control the interpolation behavior of those variables.

Inputs that could be interpolated can: be decorated by at most one
of the following decorations:

* code:Flat: no interpolation
* code:NoPerspective: linear interpolation (for
  <<line_noperspective_interpolation,lines>> and
  <<triangle_noperspective_interpolation,polygons>>).

Fragment input variables decorated with neither code:Flat nor
code:NoPerspective use perspective-correct interpolation (for
<<line_perspective_interpolation,lines>> and
<<triangle_perspective_interpolation,polygons>>).
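
The difference between the three modes can be illustrated for a value interpolated along a line segment from vertex a to vertex b. The sketch below uses hypothetical function names and follows the usual line interpolation formulas, where `t` in [0, 1] is the screen-space position along the segment and `wa`, `wb` are the clip-space w coordinates of the endpoints. The sections linked above remain the normative definitions.

```c
#include <assert.h>

/* Flat: no interpolation; every fragment receives the provoking vertex value. */
double interpolate_flat(double f_provoking)
{
    return f_provoking;
}

/* NoPerspective: linear in screen space. */
double interpolate_noperspective(double fa, double fb, double t)
{
    return (1.0 - t) * fa + t * fb;
}

/*
 * Default (neither decoration): perspective-correct. Each endpoint value is
 * weighted by 1/w, and the result is renormalized by the interpolated 1/w.
 */
double interpolate_perspective(double fa, double wa,
                               double fb, double wb, double t)
{
    double num, den;
    assert(wa != 0.0 && wb != 0.0);
    num = (1.0 - t) * fa / wa + t * fb / wb;
    den = (1.0 - t) / wa + t / wb;
    return num / den;
}
```

With `fa = 0`, `fb = 1`, `wa = 1`, `wb = 3` and `t = 0.5`, linear interpolation yields 0.5 while perspective-correct interpolation yields 0.25, because the endpoint nearer the viewer (smaller w) carries more weight. When `wa == wb` the two modes agree.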

The presence and type of interpolation are controlled by the above
interpolation decorations as well as the auxiliary decorations code:Centroid
and code:Sample.

A variable decorated with code:Flat will not be interpolated. Instead, it
will have the same value for every fragment within a triangle. This value
will come from a single <<vertexpostproc-flatshading,provoking vertex>>. A
variable decorated with code:Flat can: also be decorated with code:Centroid
or code:Sample, which will mean the same thing as decorating it only as
code:Flat.

For fragment shader input variables decorated with neither code:Centroid nor
code:Sample, the assigned variable may: be interpolated
anywhere within the pixel and a single value may: be assigned to each sample
within the pixel.

code:Centroid and code:Sample can: be used to control the location and
frequency of the sampling of the decorated fragment shader input. If a
fragment shader input is decorated with code:Centroid, a single value may:
be assigned to that variable for all samples in the pixel, but that value
must: be interpolated to a location that lies in both the pixel and in the
primitive being rendered, including any of the pixel's samples covered by
the primitive. Because the location at which the variable is interpolated
may: be different in neighboring pixels, and derivatives may: be computed by
computing differences between neighboring pixels, derivatives of
centroid-sampled inputs may: be less accurate than those for non-centroid
interpolated variables. If a fragment shader input is decorated with
code:Sample, a separate value must: be assigned to that variable for each
covered sample in the pixel, and that value must: be sampled at the location
of the individual sample. When pname:rasterizationSamples is
ename:VK_SAMPLE_COUNT_1_BIT, the pixel center must: be used for
code:Centroid, code:Sample, and undecorated attribute interpolation.

Fragment shader inputs that are signed or unsigned integers, integer
vectors, or any double-precision floating-point type must: be decorated with
code:Flat.

[[shaders-staticuse]]
== Static Use

A SPIR-V module declares a global object in memory using the code:OpVariable
instruction, which results in a pointer code:x to that object. A specific
entry point in a SPIR-V module is said to _statically use_ that object if
that entry point's call tree contains a function that contains a memory
instruction or image instruction with code:x as an code:id operand. See the
``Memory Instructions'' and ``Image Instructions'' subsections of section 3
``Binary Form'' of the SPIR-V specification for the complete list of SPIR-V
memory and image instructions.
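
The definition above amounts to a reachability question over an entry point's call tree, which the following sketch models. The `Module` structure and both functions are hypothetical simplifications: a real tool would derive the call edges and the per-function "accesses x" flags by parsing the SPIR-V binary. Note that the result depends only on which instructions are present, not on whether they would execute at run time.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FUNCS 8

/* Hypothetical, simplified model of a module with respect to one variable x. */
typedef struct Module {
    int  func_count;
    bool calls[MAX_FUNCS][MAX_FUNCS]; /* calls[f][g]: f contains a call to g */
    bool accesses_x[MAX_FUNCS];       /* f has a memory or image instruction
                                         with x as an id operand */
} Module;

/* Depth-first search of the call tree rooted at function f. */
static bool visit(const Module *m, int f, bool visited[MAX_FUNCS])
{
    if (visited[f])
        return false;
    visited[f] = true;
    if (m->accesses_x[f])
        return true;
    for (int g = 0; g < m->func_count; ++g) {
        if (m->calls[f][g] && visit(m, g, visited))
            return true;
    }
    return false;
}

/* True if the entry point statically uses x in the sense described above. */
bool statically_uses(const Module *m, int entry_point)
{
    bool visited[MAX_FUNCS] = { false };
    assert(entry_point >= 0 && entry_point < m->func_count);
    return visit(m, entry_point, visited);
}
```

In a module where entry point 0 calls a helper that calls a function accessing code:x, entry point 0 statically uses code:x, while a second entry point with no path to that function does not.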

Static use is not used to control the behavior of variables with code:Input
and code:Output storage. The effects of those variables are applied based
only on whether they are present in a shader entry point's interface.

[[shaders-invocationgroups]]
== Invocation and Derivative Groups

An _invocation group_ (see the subsection ``Control Flow'' of section 2 of the
SPIR-V specification) for a compute shader is the set of invocations in a
single local workgroup. For graphics shaders, an invocation group is an
implementation-dependent subset of the set of shader invocations of a given
shader stage which are produced by a single drawing command. For indirect
drawing commands with pname:drawCount greater than one, invocations from
separate draws are in distinct invocation groups.

[NOTE]
.Note
====
Because the partitioning of invocations into invocation groups is
implementation-dependent and not observable, applications generally need to
assume the worst case of all invocations in a draw belonging to a single
invocation group.
====

A _derivative group_ (see the subsection ``Control Flow'' of section 2 of the
SPIR-V 1.00 Revision 4 specification) for a fragment shader is the set of
invocations generated by a single primitive (point, line, or triangle),
including any helper invocations generated by that primitive. Derivatives are
undefined for a sampled image instruction if the instruction is in flow
control that is not uniform across the derivative group.