D3D12 is Coming! AMD Presentation


TheELF

Diamond Member
Dec 22, 2012
Are there other DX12 slides/presentations talking about these things besides AMD slides? Or are they just too technical? Some MS slides, for example. There's always an AMD logo on these things.

DirectX 12 presentations.
Doesn't matter: 15 ms is still tens of millions of cycles (about 24 million at 1.6 GHz), even on the slowest of CPUs like the ones in the consoles, so it doesn't matter "how much more work can we feed the GPU simultaneously per cycle", because a much faster core will need far fewer cycles to do the same work.
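For scale, a minimal sketch of the arithmetic behind this post: the number of clock cycles in a time window is just the window length multiplied by the clock frequency. The 1.6 GHz and 3.4 GHz figures are illustrative assumptions (roughly the console and desktop clocks mentioned later in this thread), not measurements.

```python
# Cycles elapsed in a time window = window length x clock frequency.
# Clock speeds below are illustrative assumptions, not measured values.

def cycles_in_window(window_ms: float, clock_ghz: float) -> int:
    """Return the number of clock cycles that elapse in window_ms at clock_ghz."""
    # 1 ms at 1 GHz is 1,000,000 cycles.
    return round(window_ms * clock_ghz * 1e6)

if __name__ == "__main__":
    for clock in (1.6, 3.4):
        print(f"{clock} GHz: {cycles_in_window(15, clock):,} cycles in 15 ms")
```

Even the slower clock gives tens of millions of cycles per 15 ms, which is the scale the post is pointing at.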

And here is the Intel presentation, showing the real reason for DX12:
http://www.anandtech.com/show/8388/intel-demonstrates-direct3d-12-performance-and-power-improvements

Case 1:
the CPU works less so the GPU can work more while staying in the same TDP envelope
(DX12: 33 fps)
or case 2:
you can get the same performance as with DX11 with a lot less power consumption
 

jpiniero

Lifer
Oct 1, 2010
No, by most objective standards they do not: juddery movement due to low frame-rate targets and inconsistent frame times, and a lack of high resolution.

You do know that only like 2% of PC Gamers (according to Steam) are running any res above 1920x1200?

They maintain a static standard of quality relative to games developed by peers because the hardware cannot change, and therefore nothing better can ever be released; only through relatively minor optimization, as developers come to understand the hardware better, can they offer better games in future.

Developers have been able to make decent gains in previous consoles with better optimization. Go look at the difference between games at the beginning and the end of a lifecycle and you will see a big difference.

The vast majority of the reasons that PC games look close to console games is because developers are targeting the lowest common denominator which is the console and releasing largely the same game for both platforms, it has absolutely nothing to do with the lack of power on the PC.

No, they are specifically developing for the consoles. The PC is just there for some extra revenue. That's why PC games run so poorly on much faster hardware: they spend as little as possible on the PC version, and in some cases even less than that (see Batman AK). DX12 will greatly help with this by eliminating a huge bottleneck, but only slower hardware will benefit.
 

AtenRa

Lifer
Feb 2, 2009
Like I didn't explain it.

"While the hexa does 6*thread=6 units of work the quad will do 4*thread*2=8 units of work,in each cycle. "

Just to point out for a last time that you cannot have more than 4 units of work per cycle with a Quad core.
What you want to say is that a 2x faster Quad Core will do 8 units of work per unit time (for example, per second).

One Cycle is not the same as one second.
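A minimal sketch of the per-cycle vs per-unit-time distinction. It assumes work scales perfectly across cores and folds clock speed and IPC into one hypothetical "relative core speed" number; the 2x figure is the thread's hypothetical, not a benchmark.

```python
# Work done per unit of wall-clock time = cores x relative per-core speed.
# "relative_core_speed" is a hypothetical figure combining clock and IPC;
# perfect scaling across cores is assumed (the thread later disputes this).

def work_per_unit_time(cores: int, relative_core_speed: float) -> float:
    """Units of work completed per unit time under perfect multi-core scaling."""
    return cores * relative_core_speed

hexa = work_per_unit_time(cores=6, relative_core_speed=1.0)
quad = work_per_unit_time(cores=4, relative_core_speed=2.0)
print(f"hexa: {hexa} units per unit time, quad: {quad} units per unit time")
```

The quad's 8 units are per unit of time, not per cycle of the hexa; each core still handles one thread at a time.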
 

Paul98

Diamond Member
Jan 31, 2010
Heh, people don't understand: we are not talking only about how many draw calls each CPU can issue per second.
We are talking about command list(Buffer) and how much more work can we feed the GPU simultaneously per cycle.

Watch the video in the OP again.

also a few slides,

cmd_buffer_behavior-dx11.jpg



cmd_buffer_behavior-dx12.jpg


PS: This has nothing to do with AMD vs Intel (for those of you who mentioned it); it's purely about multi-core CPUs and throughput.

As I and others have said, these are not doing tasks in single CPU cycles. Also, since it's multithreaded, you need to think of the CPU core switching between these threads.

The way CPUs handle threading, and how many CPU cycles it takes to do things, is very different from what you are thinking.
 

AtenRa

Lifer
Feb 2, 2009
As I and others have said, these are not doing tasks in single CPU cycles. Also, since it's multithreaded, you need to think of the CPU core switching between these threads.

The way CPUs handle threading, and how many CPU cycles it takes to do things, is very different from what you are thinking.

The following pic is taken from the OP video and describes a 6-core CPU; each horizontal line represents a single cycle.

What do you think will happen in cycle #2 (6x Shadow Maps) when you have a Quad Core?

Because you only have enough cores to issue 4x Shadow Maps, the remaining 2x Shadow Maps will have to be issued in the next cycle.
But if you move 2x Shadow Maps to cycle #3, you then only have enough cores to issue the Primary View and Rear View plus the two remaining Shadow Maps in cycle #3.
That will force you to move Physics to cycle #4, and then you will have to split Network and World Update between cycle #5 and cycle #6.
Cycle #6 will be the final stage, with the remainder of Network, World Update and the Post Processing.

And that is how you get 2 more cycles (6) to finish the same job that a 6-core would finish in 4 cycles.

2ujgqrd.jpg
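One rough way to check the 6-cycle vs 4-cycle argument is greedy list scheduling, a standard textbook technique rather than anything from the AMD video. This sketch assumes 24 hypothetical, independent blocks of equal length (what a fully packed 6-core, 4-row diagram would hold) and ignores the ordering dependencies between blocks, so it is an idealization of the diagram, not a model of a real engine.

```python
import heapq

def makespan(durations, cores):
    """Greedy list scheduling: each block goes to whichever core frees up first.

    Returns the time at which the last core finishes. Blocks are assumed
    independent (no ordering constraints), which real frames do not satisfy.
    """
    free_at = [0.0] * cores  # min-heap of times at which each core becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest-available core
        heapq.heappush(free_at, start + d)  # busy until this block finishes
    return max(free_at)

blocks = [1.0] * 24  # hypothetical: 24 equal blocks, each one "row" long
print(makespan(blocks, cores=6))                    # 4.0
print(makespan(blocks, cores=4))                    # 6.0
print(makespan([d / 2 for d in blocks], cores=4))   # 3.0
```

The last line halves every block to model a quad whose cores are each twice as fast: the row count goes up, but the row height goes down, which is the point several replies below make about cycles vs jobs.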
 

PrincessFrosty

Platinum Member
Feb 13, 2008
www.frostyhacks.blogspot.com
You do know that only like 2% of PC Gamers (according to Steam) are running any res above 1920x1200?

Developers have been able to make decent gains in previous consoles with better optimization. Go look at the difference between games at the beginning and the end of a lifecycle and you will see a big difference.

No, they are specifically developing for the consoles. The PC is just there for some extra revenue. That's why PC games run so poorly on much faster hardware: they spend as little as possible on the PC version, and in some cases even less than that (see Batman AK). DX12 will greatly help with this by eliminating a huge bottleneck, but only slower hardware will benefit.

So what? Most AAA console games don't even run at 1080p (certainly none of the major titles of the last gen), and what few titles in the current gen run at that res do so with abysmal frame rates or look terrible.

Yes, console developers optimise; I acknowledged this. Saying there is a big difference is hyperbole, because the difference relative to the changing environment isn't big at all. In the time it takes developers to go from badly optimised games in year 1 to well-optimised games in year 8, they've managed to squeeze 10-20% more out of their engines or something, certainly not much more than that. Whereas PCs have gone through an aggressive cycle of doubling in speed every 2 years, giving us hardware.

Pretty much by the time the consoles are at the end of their lifespan we have the same amount of processing power in mobile/tablet chips (tegra), and the console games with their minor optimisations look like rubbish compared to modern PC games.

The start of this video massively mis-characterises the relationship of both the amount of power available to both systems and how they interact in the market, it's great that DX12 allows for lower level control now because we'll see benefits from that and it's great, but it's no reason to start telling fibs about the console/PC relationship.
 

TheELF

Diamond Member
Dec 22, 2012
The following pic is taken from the OP video and describes a 6-core CPU; each horizontal line represents a single cycle.

Sure it does, and the cull block that spans three thread blocks means that they run a single thread spanned over three cores. How do they manage to do that? Because... magic!
 

AtenRa

Lifer
Feb 2, 2009
Sure it does, and the cull block that spans three thread blocks means that they run a single thread spanned over three cores. How do they manage to do that? Because... magic!

And? What does this have to do with what we are talking about here?
 

jpiniero

Lifer
Oct 1, 2010
So what? Most AAA console games don't even run at 1080p (certainly none of the major titles of the last gen), and what few titles in the current gen run at that res do so with abysmal frame rates or look terrible.

Most of the PS4 titles are 1080p. 30 fps though.

and the console games with their minor optimisations look like rubbish compared to modern PC games

There are no 'modern PC (AAA) games' anymore. It's all console ports.
 

TheELF

Diamond Member
Dec 22, 2012
And? What does this have to do with what we are talking about here?
It's just a block diagram of how the basics work: one line is not one cycle, and one block (like the cull) does not mean it's one thread or that it runs on three cores.
It's just a simplified view of the basics.
15 ms is not one cycle, get over it.
 

AtenRa

Lifer
Feb 2, 2009
It's just a block diagram of how the basics work: one line is not one cycle, and one block (like the cull) does not mean it's one thread or that it runs on three cores.
It's just a simplified view of the basics.
15 ms is not one cycle, get over it.

OK, so if you understand how the basics work, could you present your own sequence of the same workload as above but with a Quad Core CPU (4x threads vs 6x threads)?
 

BSim500

Golden Member
Jun 5, 2013
The following pic is taken from the OP video; each horizontal line represents a single cycle. [snip]

Because you only have enough cores to issue 4x Shadow Maps, the remaining 2x Shadow Maps will have to be issued in the next cycle.
All you're doing is posting the same false, misleading AMD marketing slide after misleading AMD marketing slide. As people have repeatedly pointed out, "cycles" and "jobs" are not the same thing. You said yourself "2x faster Quad Core will do 8 units of work per unit time (example per second)", yet strangely you can't seem to grasp what that means: no matter what core a job is assigned to, or even how well or badly the game is threaded, the overall "layout" and "job order" of those AMD charts obviously won't be the same, nor will the row height (length of time required to finish a job), for different CPUs with up to 100% IPC disparities. You're arguing over rearranging the jobs per row, when a radically higher IPC means the actual number and synchronization of the "rows" themselves would no longer match up. A more accurate chart of what you're trying to explain would look more like this than AMD's infamous marketing dept slides:-
x4vAREO.png


Consoles with even lower IPC & clock speeds (for which developers typically design for and make up 80-90% of cross-platform AAA sales) are even worse:-
2aKLQzn.png


Except that it wouldn't, because in real life even 4 cores are not 100% loaded 100% of the time, so it would take even longer, with lots of "thread locks" stuck inside the jobs themselves (which are spread over hundreds or thousands of CPU cycles), plus "these particular lines of code are simply unthreadable" gaps between them, padding the whole thing out downwards, disproportionately more so the more cores you add, with often up to 55% white space in overall CPU core loading on 6-8 core CPUs. That's also why even the 8-core FX-8350 still keeps getting trounced by even an i3 in half the AAA games: the Intel chips effectively have double the "rows" (IPC), and 3.0-3.5x per core vs the consoles, whilst no matter how many extra "columns" (cores) you add, thread scaling is nowhere near 100% on anything other than "embarrassingly parallel" apps like x264 (which games certainly aren't, even with Mantle / DX12 enabled). "Greater throughput" and "thread 5 & 6 jobs will have to wait until the next cycle", the way you are using them to compare completely different architectures, are false.
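The "thread scaling is nowhere near 100%" point is usually formalized with Amdahl's law, a standard model that the post itself does not name. A minimal sketch; the parallel fractions are chosen purely for illustration and are not measurements of any game.

```python
# Amdahl's law: if a fraction p of the work parallelizes perfectly and the
# rest is serial, the speedup on n cores is 1 / ((1 - p) + p / n).
# The fractions used below are illustrative assumptions only.

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only parallel_fraction of the work can be split."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

if __name__ == "__main__":
    for p in (0.5, 0.8, 0.95):
        row = [round(amdahl_speedup(p, n), 2) for n in (2, 4, 6, 8)]
        print(f"parallel fraction {p}: speedup on 2/4/6/8 cores = {row}")
    # Hypothetical comparison at p = 0.8: six 1x cores vs four cores that are each 2x faster.
    print("6 cores @ 1x:", round(amdahl_speedup(0.8, 6), 2))
    print("4 cores @ 2x:", round(2 * amdahl_speedup(0.8, 4), 2))
```

Under these assumptions the four faster cores come out ahead (5.0 vs 3.0 relative to one slow core), because the serial part only speeds up with per-core performance.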

In every single AMD slide you see silly portrayals of how 100% of each core "will hopefully" be utilized on a hex core, and how CPU usage "must" always be 100% from all underlying game code. I'm not talking about DX11 threading bottlenecking on one thread, but about the greater bulk of the game code on the other 5/7 threads (AI, physics, audio, etc). DX12 will not magically make non-DX code any more threadable, yet you seem to be reacting as if, just because DX driver code is spread over more cores, every aspect of the game will suddenly be 100% threaded, 100% of the time, based on AMD's slides, which is patently false on any CPU. As "TheElf" said, that AMD slide you keep quoting is a grossly over-simplified "block diagram" that doesn't even remotely represent (or seemingly intend to represent) either "one job per CPU cycle" or how even very well threaded games load individual cores, or a CPU as a whole, in actual practice.
 

AtenRa

Lifer
Feb 2, 2009
@ BSim500

Orange boxes are GPU workload; the CPU only issues draw calls for them.
Gray boxes are CPU workloads. You don't spend double the cycles issuing draw calls with a slower CPU.

Change your slides and only double the cycle times for the CPU workload (AI, Physics, Culling and Network, world update)

2ujgqrd.jpg
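To make the disagreement concrete, a minimal sketch comparing the two assumptions in play: that only the gray CPU blocks take longer on a slower CPU, versus the view that issuing draw calls is itself CPU work and scales with core speed too. All block sizes are hypothetical time units, and everything is summed serially for simplicity; the sketch does not settle which assumption is correct.

```python
# Serial CPU time for one frame under two assumptions about draw-call issue cost.
# All numbers are hypothetical time units chosen for illustration.

def cpu_time(cpu_work: float, draw_issue_work: float,
             core_speed: float, draw_issue_scales: bool) -> float:
    """cpu_work: total of the gray blocks (AI, physics, culling, ...) at speed 1.0.
    draw_issue_work: CPU time spent issuing draw calls at speed 1.0.
    core_speed: relative per-core speed (2.0 means twice as fast).
    draw_issue_scales: whether draw-call issue time also shrinks with core speed.
    """
    scaled_cpu = cpu_work / core_speed
    scaled_draw = draw_issue_work / core_speed if draw_issue_scales else draw_issue_work
    return scaled_cpu + scaled_draw

baseline = cpu_time(4.0, 6.0, core_speed=1.0, draw_issue_scales=True)
fast_scaling = cpu_time(4.0, 6.0, core_speed=2.0, draw_issue_scales=True)
fast_fixed = cpu_time(4.0, 6.0, core_speed=2.0, draw_issue_scales=False)
print(baseline, fast_scaling, fast_fixed)
```

With these made-up numbers a 2x faster core cuts the frame from 10 to 5 units if draw-call issue scales, but only to 8 units if it does not.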
 

Paul98

Diamond Member
Jan 31, 2010
@ BSim500

Orange boxes are GPU workload; the CPU only issues draw calls for them.
Gray boxes are CPU workloads. You don't spend double the cycles issuing draw calls with a slower CPU.

Change your slides and only double the cycle times for the CPU workload (AI, Physics and Network, world update)

2ujgqrd.jpg

Stop trying to read this as some sort of technical information; it's not. It's just a marketing slide to show 6 threads being used. It doesn't matter what he puts in those boxes, the result is the same. That isn't technical information either, but it is at least a somewhat better representation.
 

AtenRa

Lifer
Feb 2, 2009
Stop trying to read this as some sort of technical information; it's not. It's just a marketing slide to show 6 threads being used. It doesn't matter what he puts in those boxes, the result is the same. That isn't technical information either, but it is at least a somewhat better representation.

OK, provide your own graph of the same workload then. It is fine if you don't agree with something, but you also have to provide what you believe is the right one. ;)
 

TheELF

Diamond Member
Dec 22, 2012
OK, provide your own graph of the same workload then. It is fine if you don't agree with something, but you also have to provide what you believe is the right one. ;)
You provide us with a paper that explains how draw calls are being done in one cycle, or at least how "You don't spend double the cycles issuing draw calls with a slower CPU."
 

Headfoot

Diamond Member
Feb 28, 2008
The following pic is taken from the OP video and describes a 6-core CPU; each horizontal line represents a single cycle.

What do you think will happen in cycle #2 (6x Shadow Maps) when you have a Quad Core?

Because you only have enough cores to issue 4x Shadow Maps, the remaining 2x Shadow Maps will have to be issued in the next cycle.
But if you move 2x Shadow Maps to cycle #3, you then only have enough cores to issue the Primary View and Rear View plus the two remaining Shadow Maps in cycle #3.
That will force you to move Physics to cycle #4, and then you will have to split Network and World Update between cycle #5 and cycle #6.
Cycle #6 will be the final stage, with the remainder of Network, World Update and the Post Processing.

And that is how you get 2 more cycles (6) to finish the same job that a 6-core would finish in 4 cycles.

2ujgqrd.jpg

At 3.4 GHz stock, Sandy Bridge runs about twice as many cycles per second as a 1.6 GHz console CPU. So every time the console has gone once, the Sandy has gone twice, and that's not even counting the times it gets more done per cycle because it's wider. 4 Sandy cores @ 3.3 GHz > 6 x 1.6 GHz with weaker cores.
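The clock arithmetic in this post, as a quick sketch. It counts clock cycles only (cores times GHz, using the post's own 3.4, 3.3 and 1.6 GHz figures) and ignores IPC, which per the post would widen the gap further.

```python
# Clock-cycle arithmetic only; IPC differences are deliberately ignored.
SANDY_GHZ = 3.4          # stock clock quoted in the post
SANDY_ALL_CORE_GHZ = 3.3 # the post's "@ 3.3" figure for the 4-core comparison
CONSOLE_GHZ = 1.6        # console CPU clock quoted in the post

clock_ratio = SANDY_GHZ / CONSOLE_GHZ          # cycles per second, per core
sandy_core_ghz = 4 * SANDY_ALL_CORE_GHZ        # aggregate core-GHz, 4 cores
console_core_ghz = 6 * CONSOLE_GHZ             # aggregate core-GHz, 6 cores

print(f"per-core clock ratio: {clock_ratio:.3f}x")
print(f"aggregate: {sandy_core_ghz:.1f} vs {console_core_ghz:.1f} core-GHz")
```

So the "twice as many cycles" is about 2.1x per core, and even summed over all cores the quad has more raw cycles per second (13.2 vs 9.6 core-GHz) before any IPC advantage is counted.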
 

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
OK, provide your own graph of the same workload then. It is fine if you don't agree with something, but you also have to provide what you believe is the right one. ;)

I and others have already provided explanations in this thread. A few things about how it works: each of those tasks takes something like hundreds of CPU cycles, and each task takes a different amount of time to complete. Certain things have to be done in order while others do not. For multiple threads running at once, the CPU still runs them in parallel, switching between the threads every few cycles.

There is a lot more than what I just wrote, but that is the basics.
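The point that tasks have different lengths and that some must run in order can be illustrated with a critical-path calculation over a task graph. The task names, durations and dependencies below are hypothetical (loosely echoing the kinds of blocks on the slide), not taken from the AMD video.

```python
from functools import lru_cache

# Hypothetical per-frame tasks: name -> (duration in arbitrary units, prerequisites).
TASKS = {
    "input":   (1, []),
    "ai":      (4, ["input"]),
    "physics": (5, ["input"]),
    "cull":    (2, ["physics"]),
    "draw":    (3, ["cull", "ai"]),
    "audio":   (2, ["input"]),
}

def critical_path(tasks):
    """Length of the longest dependency chain: the frame time with unlimited cores."""
    @lru_cache(maxsize=None)
    def finish(name):
        duration, deps = tasks[name]
        # A task can start only after all of its prerequisites have finished.
        return duration + max((finish(d) for d in deps), default=0)
    return max(finish(n) for n in tasks)

def total_work(tasks):
    """Sum of all task durations: the frame time on a single core."""
    return sum(duration for duration, _ in tasks.values())

print("single core:", total_work(TASKS))
print("unlimited cores:", critical_path(TASKS))
```

In this made-up graph no number of cores brings the frame below 11 units, because input, physics, cull and draw must run one after another.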
 

shady28

Platinum Member
Apr 11, 2004
Just to point out for a last time that you cannot have more than 4 units of work per cycle with a Quad core.
...

This is a false statement.

Go read up on Superscalar architecture.

Even FX-series AMD processors do more than 1 IPC. The 8-core can do about 23 per cycle, whereas the 4770K can do about 34 per cycle. You have to go all the way back to the 486 to get down to <= 1 IPC per core.

https://en.wikipedia.org/wiki/Instructions_per_second
 

dogen1

Senior member
Oct 14, 2014
In the time it takes developers to go from badly optimised games in year 1 to well-optimised games in year 8, they've managed to squeeze 10-20% more out of their engines or something, certainly not much more than that.

I think 10-20% is a big under-estimation. There are at least a few examples of games that have nearly doubled their polygon budgets, for example, going from a first to a second generation game.
 

Blitzvogel

Platinum Member
Oct 17, 2010
The shadow maps they are listing: are they shadow map draw calls, or have CPUs been rendering shadows this whole time? I thought that was the job of the unified shaders, and the vertex shaders before that?
 

AtenRa

Lifer
Feb 2, 2009
This is a false statement.

Go read up on Superscalar architecture.

Even FX-series AMD processors do more than 1 IPC. The 8-core can do about 23 per cycle, whereas the 4770K can do about 34 per cycle. You have to go all the way back to the 486 to get down to <= 1 IPC per core.

https://en.wikipedia.org/wiki/Instructions_per_second

You cannot have more than one thread per core (no SMT, etc.) per cycle.
What modern superscalar processors are able to do is split each thread into multiple sub-portions (instructions) and decode, execute and retire them. But you always have a single thread per core per cycle.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
You cannot have more than one thread per core (no SMT, etc.) per cycle.
What modern superscalar processors are able to do is split each thread into multiple sub-portions (instructions) and decode, execute and retire them. But you always have a single thread per core per cycle.
Still, some processors can execute twice the instructions every cycle compared to a slower one, just as some threads have fewer instructions than others, even half as many.

Unless you show us that draw calls are being done in only one cycle all of this doesn't matter.
If it even takes a few cycles then the faster quad will be faster than the slower hexa.
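A minimal sketch of this last argument, assuming each draw call costs the same number of cycles on both CPUs (which ignores the IPC advantage entirely) and reusing the 3.4 GHz and 1.6 GHz clocks from earlier in the thread. The 100,000-cycle cost per call is a hypothetical placeholder.

```python
import math

def batch_time_us(jobs: int, cores: int, cycles_per_job: int, clock_ghz: float) -> float:
    """Time in microseconds to finish `jobs` equal jobs spread over `cores` cores.

    Jobs run in rounds of at most `cores` at a time; each job costs
    cycles_per_job cycles. 1 GHz = 1,000 cycles per microsecond.
    """
    rounds = math.ceil(jobs / cores)
    return rounds * cycles_per_job / (clock_ghz * 1e3)

CYCLES_PER_CALL = 100_000  # hypothetical cost, assumed identical on both CPUs
hexa = batch_time_us(6, cores=6, cycles_per_job=CYCLES_PER_CALL, clock_ghz=1.6)
quad = batch_time_us(6, cores=4, cycles_per_job=CYCLES_PER_CALL, clock_ghz=3.4)
print(f"6-core @ 1.6 GHz: {hexa:.1f} us, 4-core @ 3.4 GHz: {quad:.1f} us")
```

In this model the quad needs an extra round but still finishes first (2/3.4 < 1/1.6), and the per-call cycle count cancels out of the comparison as long as it is the same on both CPUs.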