AMD DirectX 11 White Paper Released

Kakkoii · Jun 27, 2009

http://www.legitreviews.com/article/1001/1/

(Too big to quote.)

Shader Model 5.0 comparison chart:
http://www.legitreviews.com/im...1/directx_11_chart.jpg

1. Improved Parallelism - The following features of DirectX 11-enabled GPUs greatly enhance a programmer's ability to exploit parallelism:

* Increased Thread Group Size and 3D Thread Dispatch: A Thread Group is a set of threads that work together to efficiently implement a partitioned data parallel algorithm. DirectX 11-enabled GPUs improve the efficiency of memory accesses by allowing the coherent exchange of data between threads within a group, thus enabling parallel algorithms to execute in fewer passes. This is designed not only to increase processing speed but also to improve power efficiency, by allowing higher throughput with fewer accesses to off-chip memory. Shader Model 5.0 supports larger and more flexible thread groups with 3D indexing, giving programmers improved control over the domain in which they define their algorithms, and enabling additional throughput due to increased multi-threading in the GPU.

* Atomic Operation Support: This is a key feature of CPUs that programmers have been demanding for GPUs as well. Atomic operations enable the more efficient and accurate combination of operations that try to modify the same memory addresses. GPUs are capable of running thousands of threads or thread groups in parallel, and if two or more of these threads try to manipulate the same variable or access the same memory location, it could result in data corruption. Without atomic operations, programmers either had to modify their algorithms to avoid these situations, or otherwise serialize updates to shared variables or memory locations (effectively eliminating much of the performance benefit from parallel processing). Atomic operations allow these situations to be handled gracefully regardless of the number of parallel threads being executed, which helps maximize performance and simplify porting of algorithms from the CPU to the GPU.

* Gather4: Modern GPUs use dedicated hardware blocks known as texture units to fetch data rapidly into their processing cores. These texture units have historically been optimized for rendering graphics, where techniques such as bilinear filtering are typically used to improve image quality. Compute Shaders often make use of these same units to fetch data as well, but they generally have no use for their filtering capabilities, leaving them underutilized. GPUs with Shader Model 5.0 support have the ability to use the excess fetch capability with the Gather4 operation, which can fetch up to 4 values simultaneously and provide a 4x increase in data bandwidth.
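As a rough illustration of the first two bullets, here is a minimal CPU-side sketch in Python, not actual shader code. It assumes the usual rule that a thread's global 3D index is its group index times the group size plus its index within the group (HLSL exposes this as SV_DispatchThreadID), and it stands in for a GPU atomic add (InterlockedAdd in HLSL) with a lock-protected counter. The group sizes, group counts, and all function and class names below are hypothetical.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

GROUP_SIZE = (4, 2, 2)   # threads per group in x, y, z (hypothetical values)
GROUP_COUNT = (2, 2, 1)  # groups dispatched in x, y, z (hypothetical values)


def dispatch_thread_id(group_id, group_thread_id, group_size=GROUP_SIZE):
    """Global 3D index: group_id * group_size + group_thread_id, per axis."""
    return tuple(g * s + t for g, s, t in zip(group_id, group_size, group_thread_id))


class AtomicCounter:
    """Stand-in for a GPU atomic add: the lock makes read-modify-write indivisible."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        with self._lock:
            self._value += amount

    @property
    def value(self):
        return self._value


def run_group(group_id, counter):
    """Run every thread of one group; each thread adds 1 to the shared counter."""
    ids = []
    for local in itertools.product(*(range(s) for s in GROUP_SIZE)):
        ids.append(dispatch_thread_id(group_id, local))
        counter.add(1)
    return ids


def dispatch():
    """Run all groups on a small thread pool and collect every global thread ID."""
    counter = AtomicCounter()
    groups = list(itertools.product(*(range(c) for c in GROUP_COUNT)))
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda g: run_group(g, counter), groups))
    all_ids = [tid for ids in results for tid in ids]
    return all_ids, counter.value
```

Calling dispatch() returns one global ID per simulated thread plus the final counter value. Without the lock, concurrent updates to the counter could be lost, which is the data corruption the atomic operations bullet describes.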

2. Improved Precision and Integer Processing:

DirectX 11 enables support for double precision (64-bit) floating point operations on the GPU, according to the IEEE-754 standard. Until recently, this level of precision was only supported on CPUs, with GPUs being limited to single precision (32-bit) operations. While single precision is sufficient for most graphics applications, it can be insufficient for some simulation or number-crunching tasks that require large numbers of iterations on a single data value, or work with very large or very small values. Shader Model 5.0 also adds new integer and bit manipulation operations, such as count bits set, find first bit, insert/extract bit fields, reverse bits, and new bit shift operations. Applications such as video processing and cryptography use operations like these extensively, and can therefore benefit from improved performance on DirectX 11 GPUs.
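To make the bit manipulation operations listed above concrete, here is a small Python sketch of what each one computes on a 32-bit value. It is only a CPU-side model: the function names are made up for this example (HLSL has intrinsics such as countbits, firstbitlow, firstbithigh and reversebits for the first four), and returning -1 when no bit is set is a convention chosen for this sketch.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def count_bits(x):
    """Number of bits set to 1."""
    return bin(x & MASK32).count("1")


def first_bit_low(x):
    """Index of the lowest set bit, or -1 if x is zero."""
    x &= MASK32
    return (x & -x).bit_length() - 1 if x else -1


def first_bit_high(x):
    """Index of the highest set bit, or -1 if x is zero."""
    x &= MASK32
    return x.bit_length() - 1 if x else -1


def reverse_bits(x):
    """Mirror the 32 bits so that bit 0 becomes bit 31 and so on."""
    return int(format(x & MASK32, "032b")[::-1], 2)


def extract_bits(x, offset, width):
    """Read `width` bits starting at bit `offset`."""
    return ((x & MASK32) >> offset) & ((1 << width) - 1)


def insert_bits(x, value, offset, width):
    """Overwrite `width` bits of x at bit `offset` with the low bits of value."""
    mask = ((1 << width) - 1) << offset
    return ((x & ~mask) | ((value << offset) & mask)) & MASK32
```

For example, extract_bits(0xABCD, 4, 8) pulls out the byte 0xBC, which is the kind of field unpacking that video and cryptography code does constantly.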

3. Tight Integration between Compute Shaders and Rendering Pipeline:

Although Compute Shaders are primarily intended to handle non-graphics tasks, they can often be used to enhance or interface with a rendering pipeline to influence what is sent to a display. Examples include simulation tasks, like game physics or artificial intelligence, that can control the movement or behavior of objects and characters that are drawn on-screen; sorting techniques, like order independent transparency, that optimize the rendering of large numbers of objects; and postprocessing effects, like tone mapping and depth of field, which can apply various filters to modify and enhance an image after it has finished rendering. DirectX 11 Compute Shaders share a common instruction set with other DirectX 11 shader types used for rendering (including Vertex, Hull, Domain, Geometry, and Pixel Shaders), and can share data structures to implement these techniques in a much more practical and efficient manner.

4. Improved Ease of Programming and more efficient memory usage:

Powerful hardware is useless without software that can take advantage of the hardware's capabilities. As a compute language, Shader Model 5.0 enables significant improvements that can enhance a programmer's ability to model programs and algorithms for the GPU that were once impractical or impossible. By liberating development time from working around the restrictions of earlier GPU compute languages, the programmer's imagination and energies can be focused instead on the actual compute problem. Shader Model 5.0 adds some key features that make it easier to model and solve compute problems on the GPU, including:

* Increased Shared Memory with Improved Access: A key feature of DirectX 11 Compute Shaders is support for shared memory, which allows communication between threads. Shader Model 5.0 doubles the amount of shared memory available to a thread group, from 16 to 32 kilobytes. In addition to more shared memory, DirectX 11 class GPUs allow indexed reads and writes to this memory, whereas older DirectX 10 / 10.1 class GPUs limited access to private writes with shared reads. Allowing threads to directly read and write shared memory increases data parallelism within thread groups and simplifies porting of CPU code to run on the GPU. The combination of larger thread groups and more shared memory can also greatly reduce the number of non-local memory accesses required by some algorithms, which would reduce memory bandwidth requirements and improve performance.

* Append/Consume Buffers: Shader Model 5.0 supports a new type of data buffer that behaves like a stack or a list, instead of a fixed array of values. New data values are written to the end of the list as they are generated, or read from the end of the list as they are required. This is useful for implementing irregular data structures that require a different number of data values for each element, or for adaptive techniques like stream compaction that do a variable amount of processing for each element. Append buffers allow these processes to be performed in a single pass over the data, instead of requiring multiple passes which consume more memory bandwidth and compute cycles.

* Unordered Access Views (UAV): A UAV is a buffer that allows data to be written to or read from arbitrary locations, instead of in a pre-defined order. These accesses, also known as "scatter/gather" operations, add a great deal of flexibility that was not available in older GPUs. DirectX 11 extends this flexibility beyond what was possible with DirectX 10 class GPUs by allowing Compute Shaders to access up to 8 different UAVs at a time instead of just one. The DirectX 11 programming interface allows these UAVs to be accessed by Pixel Shaders as well, which facilitates data sharing between Compute Shaders and the rendering pipeline. These enhancements allow a variety of pre- and post-processing algorithms to be implemented more efficiently with DirectX 11 class GPUs.

* Indirect Compute Dispatch: This feature enables the generation of new workloads created by previous rendering or compute shading without CPU intervention. This further reduces CPU overhead and frees up more processing time to be used on other tasks.
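The append/consume behaviour described in the list above can be sketched in plain Python as a fixed-capacity buffer with a hidden element counter, used here for single-pass stream compaction. This is a sequential model only: the class and function names are hypothetical, and on a real GPU the order in which parallel threads append is not guaranteed, whereas this sketch appends in input order.

```python
class AppendBuffer:
    """Fixed-capacity buffer with a hidden counter, modelled on append/consume."""

    def __init__(self, capacity):
        self._data = [None] * capacity
        self._count = 0  # on a GPU this counter would be updated atomically

    def append(self, value):
        """Write value at the end of the list and bump the counter."""
        if self._count >= len(self._data):
            raise OverflowError("append buffer is full")
        self._data[self._count] = value
        self._count += 1

    def consume(self):
        """Remove and return the value at the end of the list."""
        if self._count == 0:
            raise IndexError("append buffer is empty")
        self._count -= 1
        return self._data[self._count]

    def __len__(self):
        return self._count


def compact(values, keep):
    """Stream compaction in one pass: append only the elements that pass `keep`."""
    out = AppendBuffer(len(values))
    for v in values:
        if keep(v):
            out.append(v)
    return out
```

A call such as compact([3, -1, 4, -1, 5], lambda v: v >= 0) leaves three elements in the buffer, and a later pass can consume them from the end without knowing in advance how many survived.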

thilanliyan · Jun 27, 2009

Hopefully we get to see some of the benefits and eye candy of DX11 soon after it launches but I won't hold my breath.

Sylvanas · Jun 27, 2009

Well, all we require is appropriate drivers and previous DX10/10.1 cards and we can take advantage of multi-threaded drivers, something that could offer a performance boost just by using the DX11 runtime. More about that here.

Scali · Jun 27, 2009

Originally posted by: Sylvanas
Well, all we require is appropriate drivers and previous DX10/10.1 cards and we can take advantage of multi-threaded drivers, something that could offer a performance boost just by using the DX11 runtime. More about that here.

Yes, but not by just installing the runtime on your PC. The game actually needs to use the DX11 API to take advantage of the multithreaded features. The developer has to explicitly code in the parallel handling of rendering operations.

Another bonus for DX10/10.1 cards with DX11 is Compute Shaders. Instead of using either CUDA or Stream, you can now use DX11 CS to run GPGPU code on any compatible hardware. With a bit of luck, it will give the market for GPGPU applications such as video encoding and physics a nice boost. Currently there is too little choice, too little compatibility, and too little consistency. Applications like Folding@home and Espresso may run on both AMD and nVidia hardware, but they don't produce the same results.

lavaheadache · Jun 28, 2009

how come I don't see physx in that list?

apoppin · Jun 28, 2009

Originally posted by: lavaheadache
how come I don't see physx in that list?

-Wait for the Nvidia White Paper

Physics

3. Tight Integration between Compute Shaders and Rendering Pipeline:

Although Compute Shaders are primarily intended to handle non-graphics tasks, they can often be used to enhance or interface with a rendering pipeline to influence what is sent to a display. Examples include simulation tasks, like game physics or artificial intelligence . . .

i doubt you are gonna see the "fizz" and 'X' together in an AMD white paper
... for awhile, anyway

Lonyo · Jun 28, 2009

Yeah, we just need to wait for things like Havok and PhysX to come to DX11 Compute, rather than being vendor specific (well, Havok might have been if they had ever released anything)

reallyscrued · Jun 28, 2009

Originally posted by: lavaheadache
how come I don't see physx in that list?

There's always one....

Jacen · Jun 29, 2009

Ya, that physx adoption is unbelievable. And here I thought KillerNic's had the record for mainstream success.

Glad to see the meat behind some of the demos and videos that have been floating around.

Scali · Jun 29, 2009

I wonder if any physics API will be using DX CS, to be honest. The obvious limitation is that it will only work with a DirectX application. That makes it difficult to use with non-DX applications, and impossible to use on other platforms, such as consoles, which don't have DX at all.
OpenCL seems to be a better choice, at least initially.
