include::{generated}/meta/VK_NV_shader_image_footprint.txt[]
*Last Modified Date*::
    2018-09-13
*IP Status*::
    No known IP claims.
*Contributors*::
  - Pat Brown, NVIDIA
  - Chris Lentini, NVIDIA
  - Daniel Koch, NVIDIA
  - Jeff Bolz, NVIDIA

This extension adds Vulkan support for the
{spirv}/NV/SPV_NV_shader_image_footprint.html[`SPV_NV_shader_image_footprint`]
SPIR-V extension.
That SPIR-V extension provides a new instruction
code:OpImageSampleFootprintNV allowing shaders to determine the set of
texels that would be accessed by an equivalent filtered texture lookup.

Instead of returning a filtered texture value, the instruction returns a
structure that can be interpreted by shader code to determine the footprint
of a filtered texture lookup.
This structure includes integer values that identify a small neighborhood of
texels in the image being accessed and a bitfield that indicates which
texels in that neighborhood would be used.
The structure also includes a bitfield where each bit identifies whether any
texel in a small aligned block of texels would be fetched by the texture
lookup.
The size of each block is specified by an access _granularity_ provided by
the shader.
The minimum granularity supported by this extension is 2x2 (for 2D textures)
or 2x2x2 (for 3D textures); the maximum granularity is 256x256 (for 2D
textures) or 64x32x32 (for 3D textures).
Each footprint query returns the footprint from a single texture level.
When using minification filters that combine accesses from multiple mipmap
levels, shaders must perform separate queries for the two levels accessed
("`fine`" and "`coarse`").
The footprint query also returns a flag indicating whether the texture lookup
would access texels from only one mipmap level or from two neighboring
levels.

This extension should be useful for multi-pass rendering operations that do
an initial expensive rendering pass to produce a first image that is then
used as a texture for a second pass.
If the second pass ends up accessing only portions of the first image (e.g.,
due to visibility), the work spent rendering the non-accessed portion of the
first image was wasted.
With this feature, an application can limit this waste using an initial pass
over the geometry in the second image that performs a footprint query for
each visible pixel to determine the set of pixels that it needs from the
first image.
This pass would accumulate an aggregate footprint of all visible pixels into
a separate "`footprint image`" using shader atomics.
Then, when rendering the first image, the application can kill all shading
work for pixels not in this aggregate footprint.

This extension has a number of limitations.
The code:OpImageSampleFootprintNV instruction only supports two- and
three-dimensional textures.
Footprint evaluation only supports the CLAMP_TO_EDGE wrap mode; results are
undefined for all other wrap modes.
Only a limited set of granularity values is supported, and that set does not
support separate coverage information for each texel in the original image.

When using SPIR-V generated from the OpenGL Shading Language, the new
instruction will be generated from code using the new
code:textureFootprint*NV built-in functions from the
`GL_NV_shader_texture_footprint` shading language extension.

=== New Object Types

None.

=== New Enum Constants

* Extending elink:VkStructureType:
** ename:VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_IMAGE_FOOTPRINT_FEATURES_NV

=== New Enums

None.

=== New Structures

* slink:VkPhysicalDeviceShaderImageFootprintFeaturesNV

=== New Functions

None.

=== New SPIR-V Capability

* <<spirvenv-capabilities-table-imagefootprint,ImageFootprintNV>>

=== Issues

(1) The footprint returned by the SPIR-V instruction is a structure that
includes an anchor, an offset, and a mask that represents an 8x8 or 4x4x4
neighborhood of texel groups.
But the bits of the mask are not stored in simple pitch order.
Why is the footprint built this way?

*RESOLVED*: We expect that applications using this feature will want to use
a fixed granularity and accumulate coverage information from the returned
footprints into an aggregate "`footprint image`" that tracks the portions of
an image that would be needed by regular texture filtering.
If an application is using a two-dimensional image with 4x4 pixel
granularity, we expect that the footprint image will use 64-bit texels where
each bit in an 8x8 array of bits corresponds to coverage for a 4x4 block in
the original image.
Texel (0,0) in the footprint image would correspond to texels (0,0) through
(31,31) in the original image.

In the usual case, the footprint for a single access will be fully contained
in a 32x32 aligned region of the original texture, which corresponds to a
single 64-bit texel in the footprint image.
In that case, the implementation will return an anchor coordinate pointing
at the single footprint image texel, an offset vector of (0,0), and a mask
whose bits are aligned with the bits in the footprint texel.
For this case, the shader can simply atomically OR the mask bits into the
contents of the footprint texel to accumulate footprint coverage.

In the worst case, the footprint for a single access spans multiple 32x32
aligned regions and may require updates to four separate footprint image
texels.
In this case, the implementation will return an anchor coordinate pointing
at the lower right footprint image texel and an offset identifying how
many "`columns`" and "`rows`" of the returned 8x8 mask correspond to
footprint texels to the left and above the anchor texel.
If the offset is (2,3), the 64 bits of the returned mask are arranged
spatially as follows, where each 4x4 block is assigned a bit number that
matches its bit number in the footprint image texels:

----
    +-------------------------+-------------------------+
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- 46 47 | 40 41 42 43 44 45 -- -- |
    | -- -- -- -- -- -- 54 55 | 48 49 50 51 52 53 -- -- |
    | -- -- -- -- -- -- 62 63 | 56 57 58 59 60 61 -- -- |
    +-------------------------+-------------------------+
    | -- -- -- -- -- -- 06 07 | 00 01 02 03 04 05 -- -- |
    | -- -- -- -- -- -- 14 15 | 08 09 10 11 12 13 -- -- |
    | -- -- -- -- -- -- 22 23 | 16 17 18 19 20 21 -- -- |
    | -- -- -- -- -- -- 30 31 | 24 25 26 27 28 29 -- -- |
    | -- -- -- -- -- -- 38 39 | 32 33 34 35 36 37 -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    | -- -- -- -- -- -- -- -- | -- -- -- -- -- -- -- -- |
    +-------------------------+-------------------------+
----

To accumulate coverage for each of the four footprint image texels, a shader
can AND the returned mask with simple masks derived from the x and y offset
values and then atomically OR the updated mask bits into the contents of the
corresponding footprint texel.

[source,c++]
----
    // Combine the two 32-bit mask words into a single 64-bit mask.
    uint64_t returnedMask = (uint64_t(footprint.mask.x) | (uint64_t(footprint.mask.y) << 32));
    // Bits in the columns belonging to the anchor texel (right side).
    uint64_t rightMask    = ((0xFF >> footprint.offset.x) * 0x0101010101010101UL);
    // Bits in the rows belonging to the anchor texel (bottom side).
    uint64_t bottomMask   = 0xFFFFFFFFFFFFFFFFUL >> (8 * footprint.offset.y);
    uint64_t bottomRight  = returnedMask & bottomMask & rightMask;
    uint64_t bottomLeft   = returnedMask & bottomMask & (~rightMask);
    uint64_t topRight     = returnedMask & (~bottomMask) & rightMask;
    uint64_t topLeft      = returnedMask & (~bottomMask) & (~rightMask);
----

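The accumulation step described above can be emulated on the CPU to check the mask arithmetic.
The sketch below is not shader code and makes several assumptions: the `Footprint` and `FootprintImage` types and the `accumulate` function are hypothetical, a plain `|=` stands in for the atomic OR a shader would use, offsets are assumed to lie in the range 0 to 7, and "`above`" is assumed to mean a smaller y coordinate in the footprint image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for the values returned by a footprint query.
struct Footprint {
    uint32_t anchorX, anchorY;  // footprint image texel of the lower right quadrant
    uint32_t offsetX, offsetY;  // mask columns/rows belonging to texels left of/above the anchor
    uint64_t mask;              // both 32-bit mask words already combined
};

// Hypothetical footprint image: row-major array of 64-bit coverage texels.
struct FootprintImage {
    uint32_t width, height;
    std::vector<uint64_t> texels;

    FootprintImage(uint32_t w, uint32_t h)
        : width(w), height(h), texels(size_t(w) * h, 0) {}

    // Stands in for an atomic OR; skips empty masks and out-of-range texels.
    void orInto(int64_t x, int64_t y, uint64_t bits) {
        if (bits == 0 || x < 0 || y < 0 || x >= width || y >= height) return;
        texels[size_t(y) * width + size_t(x)] |= bits;
    }
};

// Splits the returned mask into four quadrants and ORs each into its texel.
void accumulate(FootprintImage& image, const Footprint& fp) {
    assert(fp.offsetX < 8 && fp.offsetY < 8);  // assumed valid offset range
    const uint64_t rightMask  = (uint64_t(0xFF) >> fp.offsetX) * 0x0101010101010101ULL;
    const uint64_t bottomMask = ~uint64_t(0) >> (8 * fp.offsetY);
    const int64_t x = fp.anchorX;
    const int64_t y = fp.anchorY;
    image.orInto(x,     y,     fp.mask &  bottomMask &  rightMask);  // bottom right
    image.orInto(x - 1, y,     fp.mask &  bottomMask & ~rightMask);  // bottom left
    image.orInto(x,     y - 1, fp.mask & ~bottomMask &  rightMask);  // top right
    image.orInto(x - 1, y - 1, fp.mask & ~bottomMask & ~rightMask);  // top left
}
```

When the offset is (0,0), `rightMask` and `bottomMask` are both all ones, so only the anchor texel receives a non-zero mask and the other three updates are skipped.
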
(2) What should an application do to ensure maximum performance when
accumulating footprints into an aggregate footprint image?

*RESOLVED*: We expect that the most common usage of this feature will be to
accumulate aggregate footprint coverage, as described in the previous issue.
Even if you ignore the anisotropic filtering case where the implementation
may return a granularity larger than that requested by the caller, each
shader invocation will need to use atomic functions to update up to four
footprint image texels for each level of detail accessed.
Having each active shader invocation perform multiple atomic operations can
be expensive, particularly when neighboring invocations will want to update
the same footprint image texels.

Techniques that can be used to reduce the number of atomic operations
performed when accumulating coverage include:

* Have logic that detects returned footprints where all components of the
  returned offset vector are zero.
  In that case, the mask returned by the footprint function is guaranteed
  to be aligned with the footprint image texels and affects only a single
  footprint image texel.
* Have fragment shaders communicate using built-in functions from the
  `VK_NV_shader_subgroup_partitioned` extension or other shader subgroup
  extensions.
  If you have multiple invocations in a subgroup that need to update the
  same texel (x,y) in the footprint image, compute an aggregate footprint
  mask across all invocations in the subgroup updating that texel and have
  a single invocation perform an atomic operation using that aggregate
  mask.
* When the returned footprint spans multiple texels in the footprint
  image, each invocation needs to perform four atomic operations.
  In the previous issue, we had an example that computed separate masks
  for "`topLeft`", "`topRight`", "`bottomLeft`", and "`bottomRight`".
  When the invocations in a subgroup have good locality, it might be the
  case that the "`top left`" for some invocations refers to footprint
  image texel (10,10), while neighbors might have their "`top left`"
  texels at (11,10), (10,11), and (11,11).
  If you compute separate masks for even/odd x and y values instead of
  left/right or top/bottom, the "`odd/odd`" mask for all invocations in
  the subgroup holds coverage for footprint image texel (11,11), which can
  be updated by a single atomic operation for the entire subgroup.

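The subgroup aggregation idea in the list above can be illustrated with a CPU emulation that merges all pending updates targeting the same footprint image texel before any atomic operation is issued.
This is a sketch only: `PendingUpdate`, `TexelCoord`, and `aggregateSubgroup` are hypothetical names, and a real shader would use subgroup operations rather than a `std::map` to combine the masks.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical record: one footprint image update requested by one invocation.
struct PendingUpdate {
    uint32_t texelX, texelY;  // footprint image texel to update
    uint64_t mask;            // coverage bits to OR into that texel
};

using TexelCoord = std::pair<uint32_t, uint32_t>;

// Emulates subgroup aggregation: ORs together the masks of all updates that
// target the same footprint texel, so each distinct texel needs one atomic.
std::map<TexelCoord, uint64_t>
aggregateSubgroup(const std::vector<PendingUpdate>& updates) {
    std::map<TexelCoord, uint64_t> merged;
    for (const PendingUpdate& u : updates) {
        if (u.mask != 0) {  // empty masks need no atomic at all
            merged[TexelCoord(u.texelX, u.texelY)] |= u.mask;
        }
    }
    return merged;
}
```

For four invocations whose "`top left`" texels are (10,10), (11,10), (10,11), and (11,11), the 16 individual updates touch only 9 distinct footprint image texels, so merging them in this way would reduce 16 atomic operations to 9.
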
=== Examples

TBD

=== Version History

* Revision 2, 2018-09-13 (Pat Brown)
- Add issue (2) with performance tips.

* Revision 1, 2018-08-12 (Pat Brown)
- Initial draft