// Copyright (c) 2015-2017 The Khronos Group Inc.
// Copyright notice at https://www.khronos.org/registry/speccopyright.html

[[shaders]]
= Shaders

A shader specifies programmable operations that execute for each vertex,
control point, tessellated vertex, primitive, fragment, or workgroup in the
corresponding stage(s) of the graphics and compute pipelines.

Graphics pipelines include vertex shader execution as a result of
<<drawing,primitive assembly>>, followed, if enabled, by tessellation
control and evaluation shaders operating on
<<drawing-primitive-topologies-patches,patches>>, geometry shaders, if
enabled, operating on primitives, and fragment shaders, if present,
operating on fragments generated by <<primsrast,Rasterization>>.
In this specification, vertex, tessellation control, tessellation evaluation
and geometry shaders are collectively referred to as vertex processing
stages and occur in the logical pipeline before rasterization.
The fragment shader occurs logically after rasterization.

Only the compute shader stage is included in a compute pipeline.
Compute shaders operate on compute invocations in a workgroup.

Shaders can: read from input variables, and read from and write to output
variables.
Input and output variables can: be used to transfer data between shader
stages, or to allow the shader to interact with values that exist in the
execution environment.
Similarly, the execution environment provides constants that describe
capabilities.

Shader variables are associated with execution environment-provided inputs
and outputs using _built-in_ decorations in the shader.
The available decorations for each stage are documented in the following
subsections.


[[shader-modules]]
== Shader Modules

// refBegin VkShaderModule Opaque handle to a shader module object

_Shader modules_ contain _shader code_ and one or more entry points.
Shaders are selected from a shader module by specifying an entry point as
part of <<pipelines,pipeline>> creation.
The stages of a pipeline can: use shaders that come from different modules.
The shader code defining a shader module must: be in the SPIR-V format, as
described by the <<spirvenv,Vulkan Environment for SPIR-V>> appendix.

Shader modules are represented by sname:VkShaderModule handles:

include::../api/handles/VkShaderModule.txt[]

// refEnd VkShaderModule

// refBegin vkCreateShaderModule Creates a new shader module object

To create a shader module, call:

include::../api/protos/vkCreateShaderModule.txt[]

  * pname:device is the logical device that creates the shader module.
  * pname:pCreateInfo is a pointer to an instance of the
    sname:VkShaderModuleCreateInfo structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pShaderModule points to a sname:VkShaderModule handle in which the
    resulting shader module object is returned.

Once a shader module has been created, any entry points it contains can: be
used in pipeline shader stages as described in <<pipelines-compute,Compute
Pipelines>> and <<pipelines-graphics,Graphics Pipelines>>.

ifdef::VK_NV_glsl_shader[]
If the shader stage fails to compile, ename:VK_ERROR_INVALID_SHADER_NV will
be generated and the compile log will be reported back to the application by
+VK_EXT_debug_report+ if enabled.
endif::VK_NV_glsl_shader[]

include::../validity/protos/vkCreateShaderModule.txt[]

// refBegin VkShaderModuleCreateInfo Structure specifying parameters of a newly created shader module

The sname:VkShaderModuleCreateInfo structure is defined as:

include::../api/structs/VkShaderModuleCreateInfo.txt[]

  * pname:sType is the type of this structure.
  * pname:pNext is `NULL` or a pointer to an extension-specific structure.
  * pname:flags is reserved for future use.
  * pname:codeSize is the size, in bytes, of the code pointed to by
    pname:pCode.
  * pname:pCode points to code that is used to create the shader module.
    The type and format of the code is determined from the content of the
    memory addressed by pname:pCode.

.Valid Usage
****
  * pname:codeSize must: be greater than 0
  * pname:codeSize must: be a multiple of 4.
    If the +VK_NV_glsl_shader+ extension is enabled and pname:pCode
    references GLSL code, pname:codeSize can: be a multiple of 1
  * pname:pCode must: point to valid SPIR-V code, formatted and packed as
    described by the <<spirv-spec,Khronos SPIR-V Specification>>.
    If the +VK_NV_glsl_shader+ extension is enabled, pname:pCode can: instead
    reference valid GLSL code, which must: be written to the
    +GL_KHR_vulkan_glsl+ extension specification
  * pname:pCode must: adhere to the validation rules described by the
    <<spirvenv-module-validation, Validation Rules within a Module>> section
    of the <<spirvenv-capabilities,SPIR-V Environment>> appendix.
    If the +VK_NV_glsl_shader+ extension is enabled, pname:pCode can: be
    valid GLSL code with respect to the +GL_KHR_vulkan_glsl+ GLSL extension
    specification
  * pname:pCode must: declare the code:Shader capability for SPIR-V code
  * pname:pCode must: not declare any capability that is not supported by
    the API, as described by the <<spirvenv-module-validation,
    Capabilities>> section of the <<spirvenv-capabilities,SPIR-V
    Environment>> appendix
  * If pname:pCode declares any of the capabilities that are listed as not
    required by the implementation, the relevant feature must: be enabled,
    as listed in the <<spirvenv-capabilities-table,SPIR-V Environment>>
    appendix
****

include::../validity/structs/VkShaderModuleCreateInfo.txt[]
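
As an illustration only, the cheapest of the valid usage conditions above can be checked on the host before a module is created.
The following is a minimal sketch in C, assuming SPIR-V input rather than GLSL; the helper name `spirv_blob_passes_basic_checks` is hypothetical and not part of the API.
It tests the size rules and the SPIR-V magic number, and does not replace full validation of the module.

```c
#include <stddef.h>
#include <stdint.h>

/* First word of every SPIR-V module, as defined by the SPIR-V specification. */
#define SPIRV_MAGIC_NUMBER 0x07230203u

/*
 * Hypothetical helper (not part of the Vulkan API).
 * Returns 1 if the blob passes the cheap checks below, 0 otherwise.
 * A byte-swapped magic number is treated as a failure in this sketch.
 */
static int spirv_blob_passes_basic_checks(const uint32_t *code, size_t code_size)
{
    /* codeSize must be greater than 0. */
    if (code == NULL || code_size == 0) {
        return 0;
    }
    /* codeSize is in bytes and must be a multiple of 4. */
    if (code_size % 4 != 0) {
        return 0;
    }
    /* A SPIR-V module begins with the magic number. */
    if (code[0] != SPIRV_MAGIC_NUMBER) {
        return 0;
    }
    return 1;
}
```

A blob that passes these checks can still violate the remaining valid usage rules, such as declaring an unsupported capability.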

// refBegin vkDestroyShaderModule Destroy a shader module

To destroy a shader module, call:

include::../api/protos/vkDestroyShaderModule.txt[]

  * pname:device is the logical device that destroys the shader module.
  * pname:shaderModule is the handle of the shader module to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

A shader module can: be destroyed while pipelines created using its shaders
are still in use.

.Valid Usage
****
  * If sname:VkAllocationCallbacks were provided when pname:shaderModule was
    created, a compatible set of callbacks must: be provided here
  * If no sname:VkAllocationCallbacks were provided when pname:shaderModule
    was created, pname:pAllocator must: be `NULL`
****

include::../validity/protos/vkDestroyShaderModule.txt[]


[[shaders-execution]]
== Shader Execution

At each stage of the pipeline, multiple invocations of a shader may: execute
simultaneously.
Further, invocations of a single shader produced as the result of different
commands may: execute simultaneously.
The relative execution order of invocations of the same shader type is
undefined.
Shader invocations may: complete in a different order than that in which the
primitives they originated from were drawn or dispatched by the application.
However, fragment shader outputs are written to attachments in
<<primrast-order,rasterization order>>.

The relative order of invocations of different shader types is largely
undefined.
However, when invoking a shader whose inputs are generated from a previous
pipeline stage, the shader invocations from the previous stage are
guaranteed to have executed far enough to generate input values for all
required inputs.


[[shaders-execution-memory-ordering]]
== Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is
largely undefined.
For some shader types (vertex, tessellation evaluation, and in some cases,
fragment), even the number of shader invocations that may: perform loads and
stores is undefined.

In particular, the following rules apply:

  * <<shaders-vertex-execution,Vertex>> and
    <<shaders-tessellation-evaluation-execution,tessellation evaluation>>
    shaders will be invoked at least once for each unique vertex, as defined
    in those sections.
  * <<shaders-fragment-execution,Fragment>> shaders will be invoked zero or
    more times, as defined in that section.
  * The relative order of invocations of the same shader type is undefined.
    A store issued by a shader when working on primitive B might complete
    prior to a store for primitive A, even if primitive A is specified prior
    to primitive B.
    This applies even to fragment shaders; while fragment shader outputs are
    always written to the framebuffer
    <<fundamentals-queueoperation-apiorder,in primitive order>>, stores
    executed by fragment shader invocations are not.
  * The relative order of invocations of different shader types is largely
    undefined.

[NOTE]
.Note
====
The above limitations on shader invocation order make some forms of
synchronization between shader invocations within a single set of primitives
unimplementable.
For example, having one invocation poll memory written by another invocation
assumes that the other invocation has been launched and will complete its
writes in finite time.
====

Stores issued to different memory locations within a single shader
invocation may: not be visible to other invocations, or may: not become
visible in the order they were performed.

The code:OpMemoryBarrier instruction can: be used to provide stronger
ordering of reads and writes performed by a single invocation.
code:OpMemoryBarrier guarantees that any memory transactions issued by the
shader invocation prior to the instruction complete prior to the memory
transactions issued after the instruction.
Memory barriers are needed for algorithms that require multiple invocations
to access the same memory and require the operations to be performed in a
partially-defined relative order.
For example, if one shader invocation does a series of writes, followed by
an code:OpMemoryBarrier instruction, followed by another write, then the
results of the series of writes before the barrier become visible to other
shader invocations at a time earlier or equal to when the results of the
final write become visible to those invocations.
In practice it means that another invocation that sees the results of the
final write would also see the previous writes.
Without the memory barrier, the final write may: be visible before the
previous writes.
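
The write, barrier, write pattern described above has a close host-side analogue in C11 fences.
The sketch below is an analogy only, not shader code: the `mailbox` type and its functions are hypothetical, and the C11 release and acquire fences stand in for code:OpMemoryBarrier.
A reader that observes the final write (the `ready` flag) also observes the payload writes issued before the fence.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared record: two payload writes followed by a flag write. */
typedef struct {
    int payload[2];
    atomic_int ready;
} mailbox;

static void mailbox_init(mailbox *m)
{
    m->payload[0] = 0;
    m->payload[1] = 0;
    atomic_init(&m->ready, 0);
}

/* Writer: series of writes, then a barrier, then the final write. */
static void mailbox_publish(mailbox *m, int a, int b)
{
    m->payload[0] = a;
    m->payload[1] = b;
    /* Plays the role of the memory barrier between the two groups of writes. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Reader: if the final write is visible, the earlier writes are visible too. */
static bool mailbox_try_consume(mailbox *m, int *a, int *b)
{
    if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0) {
        return false;
    }
    atomic_thread_fence(memory_order_acquire);
    *a = m->payload[0];
    *b = m->payload[1];
    return true;
}
```

The reader checks the flag once and returns instead of polling, since nothing here guarantees that the writer will make progress.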

Writes that are the result of shader stores through a variable decorated
with code:Coherent automatically have available writes to the same buffer,
buffer view, or image view made visible to them, and are themselves
automatically made available to access by the same buffer, buffer view, or
image view.
Reads that are the result of shader loads through a variable decorated with
code:Coherent automatically have available writes to the same buffer, buffer
view, or image view made visible to them.
The order that coherent writes to different locations become available is
undefined, unless enforced by a memory barrier instruction or other memory
dependency.

[NOTE]
.Note
====
Explicit memory dependencies must: still be used to guarantee availability
and visibility for access via other buffers, buffer views, or image views.
====

The built-in atomic memory transaction instructions can: be used to read and
write a given memory address atomically.
While built-in atomic functions issued by multiple shader invocations are
executed in undefined order relative to each other, these functions perform
both a read and a write of a memory address and guarantee that no other
memory transaction will write to the underlying memory between the read and
write.
Atomic operations ensure automatic availability and visibility for writes
and reads in the same way as those to code:Coherent variables.

[NOTE]
.Note
====
Memory accesses performed on different resource descriptors with the same
memory backing may: not be well-defined even with the code:Coherent
decoration or via atomics, due to things such as image layouts or ownership
of the resource, as described in the <<synchronization, Synchronization and
Cache Control>> chapter.
====

[NOTE]
.Note
====
Atomics allow shaders to use shared global addresses for mutual exclusion or
as counters, among other uses.
====
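
The two uses named in the note above, counters and mutual exclusion, can be illustrated with a host-side analogy in C11 atomics.
This is a sketch under stated assumptions, not shader code: `claim_slot`, `try_lock` and `unlock` are hypothetical names, and the C11 operations stand in for the SPIR-V atomic instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Counter: each caller receives a distinct slot index, because the read and
 * the write of the counter happen as one indivisible operation.
 */
static unsigned claim_slot(atomic_uint *counter)
{
    return atomic_fetch_add(counter, 1u);
}

/*
 * Mutual exclusion: succeeds only for the caller that changes the lock word
 * from 0 (free) to 1 (held).
 */
static bool try_lock(atomic_uint *lock)
{
    unsigned expected = 0u;
    return atomic_compare_exchange_strong(lock, &expected, 1u);
}

static void unlock(atomic_uint *lock)
{
    atomic_store(lock, 0u);
}
```

The lock is written as a single attempt rather than a spin loop because, as noted earlier in this section, waiting on another invocation assumes that it has been launched and will finish in finite time.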


[[shaders-inputs]]
== Shader Inputs and Outputs

Data is passed into and out of shaders using variables with input or output
storage class, respectively.
User-defined inputs and outputs are connected between stages by matching
their code:Location decorations.
Additionally, data can: be provided by or communicated to special functions
provided by the execution environment using code:BuiltIn decorations.

In many cases, the same code:BuiltIn decoration can: be used in multiple
shader stages with similar meaning.
The specific behavior of variables decorated as code:BuiltIn is documented
in the following sections.


[[shaders-vertex]]
== Vertex Shaders

Each vertex shader invocation operates on one vertex and its associated
<<fxvertex-attrib,vertex attribute>> data, and outputs one vertex and
associated data.
Graphics pipelines must: include a vertex shader, and the vertex shader
stage is always the first shader stage in the graphics pipeline.


[[shaders-vertex-execution]]
=== Vertex Shader Execution

A vertex shader must: be executed at least once for each vertex specified by
a draw command.
During execution, the shader is presented with the index of the vertex and
instance for which it has been invoked.
Input variables declared in the vertex shader are filled by the
implementation with the values of vertex attributes associated with the
invocation being executed.

If the same vertex is specified multiple times in a draw command (e.g. by
including the same index value multiple times in an index buffer) the
implementation may: reuse the results of vertex shading if it can statically
determine that the vertex shader invocations will produce identical results.

[NOTE]
.Note
==================
It is implementation-dependent when and if results of vertex shading are
reused, and thus how many times the vertex shader will be executed.
This is true also if the vertex shader contains stores or atomic operations
(see <<features-features-vertexPipelineStoresAndAtomics,
pname:vertexPipelineStoresAndAtomics>>).
==================
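
To make the effect of reuse concrete, the following C sketch counts vertex shader invocations for a single instance of an indexed draw under two hypothetical extremes: an implementation that reuses every repeated index, and one that never reuses results.
The type and function names are illustrative assumptions; a real implementation may land anywhere at or above the full-reuse count.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t full_reuse; /* one invocation per unique index value */
    size_t no_reuse;   /* one invocation per entry in the index buffer */
} vertex_invocation_estimate;

/* Hypothetical helper; uses a quadratic scan to avoid any allocation. */
static vertex_invocation_estimate
estimate_vertex_invocations(const uint32_t *indices, size_t count)
{
    vertex_invocation_estimate e = {0, count};
    for (size_t i = 0; i < count; ++i) {
        int seen_before = 0;
        for (size_t j = 0; j < i; ++j) {
            if (indices[j] == indices[i]) {
                seen_before = 1;
                break;
            }
        }
        if (!seen_before) {
            ++e.full_reuse;
        }
    }
    return e;
}
```

For the index list 0, 1, 2, 2, 1, 3 (two triangles sharing an edge), this gives four invocations with full reuse and six with none, so a shader that performs stores or atomics cannot rely on either count.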


[[shaders-tessellation-control]]
== Tessellation Control Shaders

The tessellation control shader is used to read an input patch provided by
the application and to produce an output patch.
Each tessellation control shader invocation operates on an input patch
(after all control points in the patch are processed by a vertex shader) and
its associated data, and outputs a single control point of the output patch
and its associated data, and can: also output additional per-patch data.
The input patch is sized according to the pname:patchControlPoints member of
slink:VkPipelineTessellationStateCreateInfo, as part of input assembly.
The size of the output patch is controlled by the code:OpExecutionMode
code:OutputVertices specified in the tessellation control or tessellation
evaluation shaders, which must: be specified in at least one of the shaders.
The size of the input and output patches must: each be greater than zero and
less than or equal to
sname:VkPhysicalDeviceLimits::pname:maxTessellationPatchSize.


[[shaders-tessellation-control-execution]]
=== Tessellation Control Shader Execution

A tessellation control shader is invoked at least once for each _output_
vertex in a patch.

Inputs to the tessellation control shader are generated by the vertex
shader.
Each invocation of the tessellation control shader can: read the attributes
of any incoming vertices and their associated data.
The invocations corresponding to a given patch execute logically in
parallel, with undefined relative execution order.
However, the code:OpControlBarrier instruction can: be used to provide
limited control of the execution order by synchronizing invocations within a
patch, effectively dividing tessellation control shader execution into a set
of phases.
Tessellation control shaders will read undefined values if one invocation
reads a per-vertex or per-patch attribute written by another invocation at
any point during the same phase, or if two invocations attempt to write
different values to the same per-patch output in a single phase.


[[shaders-tessellation-evaluation]]
== Tessellation Evaluation Shaders

The tessellation evaluation shader operates on an input patch of control
points and their associated data, and a single input barycentric coordinate
indicating the invocation's relative position within the subdivided patch,
and outputs a single vertex and its associated data.


[[shaders-tessellation-evaluation-execution]]
=== Tessellation Evaluation Shader Execution

A tessellation evaluation shader is invoked at least once for each unique
vertex generated by the tessellator.


[[shaders-geometry]]
== Geometry Shaders

The geometry shader operates on a group of vertices and their associated
data assembled from a single input primitive, and emits zero or more output
primitives and the group of vertices and their associated data required for
each output primitive.


[[shaders-geometry-execution]]
=== Geometry Shader Execution

A geometry shader is invoked at least once for each primitive produced by
the tessellation stages, or at least once for each primitive generated by
<<drawing,primitive assembly>> when tessellation is not in use.
The number of geometry shader invocations per input primitive is determined
from the invocation count of the geometry shader specified by the
code:OpExecutionMode code:Invocations in the geometry shader.
If the invocation count is not specified, then a default of one invocation
is executed.


[[shaders-fragment]]
== Fragment Shaders

Fragment shaders are invoked as the result of rasterization in a graphics
pipeline.
Each fragment shader invocation operates on a single fragment and its
associated data.
With few exceptions, fragment shaders do not have access to any data
associated with other fragments and are considered to execute in isolation
of fragment shader invocations associated with other fragments.


[[shaders-fragment-execution]]
=== Fragment Shader Execution

For each fragment generated by rasterization, a fragment shader may: be
invoked.
A fragment shader must: not be invoked if the <<fragops-early,Early
Per-Fragment Tests>> cause it to have no coverage.

Furthermore, if it is determined that a fragment generated as the result of
rasterizing a first primitive will have its outputs entirely overwritten by
a fragment generated as the result of rasterizing a second primitive in the
same subpass, and the fragment shader used for the fragment has no other
side effects, then the fragment shader may: not be executed for the fragment
from the first primitive.

Relative ordering of execution of different fragment shader invocations is
not defined.

The number of fragment shader invocations produced per-pixel is determined
as follows:

  * If per-sample shading is enabled, the fragment shader is invoked once
    per covered sample.
  * Otherwise, the fragment shader is invoked at least once per fragment but
    no more than once per covered sample.
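
The two rules above can be summarized as a range of invocation counts for one fragment.
The C sketch below is illustrative only: the names are hypothetical, the coverage mask is assumed to hold one bit per sample, and a fragment with no covered samples is assumed to produce no invocations.
It ignores helper invocations and the cases above in which the shader is skipped.

```c
#include <stdint.h>

typedef struct {
    unsigned min_count;
    unsigned max_count;
} fs_invocation_range;

/* Number of set bits in the coverage mask, one bit per sample. */
static unsigned count_covered_samples(uint32_t coverage_mask)
{
    unsigned covered = 0;
    while (coverage_mask != 0) {
        covered += coverage_mask & 1u;
        coverage_mask >>= 1;
    }
    return covered;
}

/* Hypothetical helper applying the two rules to a single fragment. */
static fs_invocation_range
fs_invocations_for_fragment(uint32_t coverage_mask, int per_sample_shading)
{
    unsigned covered = count_covered_samples(coverage_mask);
    fs_invocation_range r = {0, 0};
    if (covered == 0) {
        return r; /* assumption: no coverage, no invocation */
    }
    if (per_sample_shading) {
        r.min_count = covered; /* once per covered sample */
        r.max_count = covered;
    } else {
        r.min_count = 1;       /* at least once per fragment */
        r.max_count = covered; /* no more than once per covered sample */
    }
    return r;
}
```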

In addition to the conditions outlined above for the invocation of a
fragment shader, a fragment shader invocation may: be produced as a _helper
invocation_.
A helper invocation is a fragment shader invocation that is created solely
for the purposes of evaluating derivatives for use in non-helper fragment
shader invocations.
Stores and atomics performed by helper invocations must: not have any effect
on memory, and values returned by atomic instructions in helper invocations
are undefined.


[[shaders-fragment-earlytest]]
=== Early Fragment Tests

An explicit control is provided to allow fragment shaders to enable early
fragment tests.
If the fragment shader specifies the code:EarlyFragmentTests
code:OpExecutionMode, the per-fragment tests described in
<<fragops-early-mode,Early Fragment Test Mode>> are performed prior to
fragment shader execution.
Otherwise, they are performed after fragment shader execution.


[[shaders-compute]]
== Compute Shaders

Compute shaders are invoked via flink:vkCmdDispatch and
flink:vkCmdDispatchIndirect commands.
In general, they have access to similar resources as shader stages executing
as part of a graphics pipeline.

Compute workloads are formed from groups of work items called workgroups and
processed by the compute shader in the current compute pipeline.
A workgroup is a collection of shader invocations that execute the same
shader, potentially in parallel.
Compute shaders execute in _global workgroups_ which are divided into a
number of _local workgroups_ with a size that can: be set by assigning a
value to the code:LocalSize execution mode or via an object decorated by the
code:WorkgroupSize decoration.
An invocation within a local workgroup can: share data with other members of
the local workgroup through shared variables and issue memory and control
flow barriers to synchronize with other members of the local workgroup.
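
A common host-side task that follows from this model is choosing how many local workgroups to dispatch for a given number of work items.
The C sketch below shows the usual round-up division for one dimension; the function names are hypothetical, and the resulting count is the kind of value an application would pass as a group count to flink:vkCmdDispatch.

```c
#include <stdint.h>

/*
 * Number of local workgroups needed to cover item_count work items when each
 * workgroup holds local_size invocations in this dimension (rounds up).
 */
static uint32_t workgroup_count(uint32_t item_count, uint32_t local_size)
{
    if (local_size == 0) {
        return 0; /* a zero local size is invalid; reported as "no groups" here */
    }
    return item_count / local_size + (item_count % local_size != 0 ? 1u : 0u);
}

/* Total invocations launched in this dimension, including any padding. */
static uint64_t total_invocations(uint32_t groups, uint32_t local_size)
{
    return (uint64_t)groups * local_size;
}
```

For 1000 items and a local size of 64, this yields 16 workgroups and 1024 invocations, so the shader needs a bounds check to make the 24 surplus invocations do nothing.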


[[shaders-interpolation-decorations]]
== Interpolation Decorations

Interpolation decorations control the behavior of attribute interpolation in
the fragment shader stage.
Interpolation decorations can: be applied to code:Input storage class
variables in the fragment shader stage's interface, and control the
interpolation behavior of those variables.

Inputs that could be interpolated can: be decorated by at most one of the
following decorations:

  * code:Flat: no interpolation
  * code:NoPerspective: linear interpolation (for
    <<line_linear_interpolation,lines>> and
    <<triangle_linear_interpolation,polygons>>).

Fragment input variables decorated with neither code:Flat nor
code:NoPerspective use perspective-correct interpolation (for
<<line_perspective_interpolation,lines>> and
<<triangle_perspective_interpolation,polygons>>).

The presence of and type of interpolation is controlled by the above
interpolation decorations as well as the auxiliary decorations code:Centroid
and code:Sample.

A variable decorated with code:Flat will not be interpolated.
Instead, it will have the same value for every fragment within a triangle.
This value will come from a single <<vertexpostproc-flatshading,provoking
vertex>>.
A variable decorated with code:Flat can: also be decorated with
code:Centroid or code:Sample, which will mean the same thing as decorating
it only as code:Flat.
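
The difference between the three behaviors can be sketched numerically for a line segment.
The C code below is an illustration under stated assumptions: the `endpoint` type and function names are hypothetical, `t` is the parametric position along the segment in the range 0 to 1, and `w` is taken to be the clip-space w coordinate of each endpoint.

```c
/* One endpoint of a line: the attribute value and the vertex's clip-space w. */
typedef struct {
    double value;
    double w;
} endpoint;

/* Flat: no interpolation, the provoking vertex's value is used as is. */
static double interp_flat(endpoint provoking)
{
    return provoking.value;
}

/* NoPerspective: plain linear blend of the two endpoint values. */
static double interp_noperspective(endpoint a, endpoint b, double t)
{
    return (1.0 - t) * a.value + t * b.value;
}

/* Undecorated: perspective-correct blend, weighting each endpoint by 1/w. */
static double interp_perspective(endpoint a, endpoint b, double t)
{
    double numerator = (1.0 - t) * a.value / a.w + t * b.value / b.w;
    double denominator = (1.0 - t) / a.w + t / b.w;
    return numerator / denominator;
}
```

When both endpoints have the same w, the perspective-correct and linear results coincide; they diverge as the w values differ, with the result pulled toward the endpoint with the smaller w.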

For fragment shader input variables decorated with neither code:Centroid nor
code:Sample, the assigned variable may: be interpolated anywhere within the
pixel and a single value may: be assigned to each sample within the pixel.

code:Centroid and code:Sample can: be used to control the location and
frequency of the sampling of the decorated fragment shader input.
If a fragment shader input is decorated with code:Centroid, a single value
may: be assigned to that variable for all samples in the pixel, but that
value must: be interpolated to a location that lies in both the pixel and in
the primitive being rendered, including any of the pixel's samples covered
by the primitive.
Because the location at which the variable is interpolated may: be different
in neighboring pixels, and derivatives may: be computed by computing
differences between neighboring pixels, derivatives of centroid-sampled
inputs may: be less accurate than those for non-centroid interpolated
variables.
If a fragment shader input is decorated with code:Sample, a separate value
must: be assigned to that variable for each covered sample in the pixel, and
that value must: be sampled at the location of the individual sample.
When pname:rasterizationSamples is ename:VK_SAMPLE_COUNT_1_BIT, the pixel
center must: be used for code:Centroid, code:Sample, and undecorated
attribute interpolation.

Fragment shader inputs that are signed or unsigned integers, integer
vectors, or any double-precision floating-point type must: be decorated with
code:Flat.

ifdef::VK_AMD_shader_explicit_vertex_parameter[]
When the +VK_AMD_shader_explicit_vertex_parameter+ device extension is
enabled, inputs can: also be decorated with the code:CustomInterpAMD
interpolation decoration, including fragment shader inputs that are signed
or unsigned integers, integer vectors, or any double-precision
floating-point type.
Inputs decorated with code:CustomInterpAMD can: only be accessed by the
extended instruction code:InterpolateAtVertexAMD, which allows accessing the
value of the input for individual vertices of the primitive.
endif::VK_AMD_shader_explicit_vertex_parameter[]


[[shaders-staticuse]]
== Static Use

A SPIR-V module declares a global object in memory using the code:OpVariable
instruction, which results in a pointer code:x to that object.
A specific entry point in a SPIR-V module is said to _statically use_ that
object if that entry point's call tree contains a function that contains a
memory instruction or image instruction with code:x as an code:id operand.
See the ``Memory Instructions'' and ``Image Instructions'' subsections of
section 3 ``Binary Form'' of the SPIR-V specification for the complete list
of SPIR-V memory and image instructions.

Static use is not used to control the behavior of variables with code:Input
and code:Output storage.
The effects of those variables are applied based only on whether they are
present in a shader entry point's interface.

[[shaders-invocationgroups]]
== Invocation and Derivative Groups

An _invocation group_ (see the subsection ``Control Flow'' of section 2 of
the SPIR-V specification) for a compute shader is the set of invocations in
a single local workgroup.
For graphics shaders, an invocation group is an implementation-dependent
subset of the set of shader invocations of a given shader stage which are
produced by a single drawing command.
For indirect drawing commands with pname:drawCount greater than one,
invocations from separate draws are in distinct invocation groups.

[NOTE]
.Note
====
Because the partitioning of invocations into invocation groups is
implementation-dependent and not observable, applications generally need to
assume the worst case of all invocations in a draw belonging to a single
invocation group.
====

A _derivative group_ (see the subsection ``Control Flow'' of section 2 of
the SPIR-V 1.00 Revision 4 specification) for a fragment shader is the set
of invocations generated by a single primitive (point, line, or triangle),
including any helper invocations generated by that primitive.
Derivatives are undefined for a sampled image instruction if the instruction
is in flow control that is not uniform across the derivative group.