Ashes of the Singularity User Benchmarks Thread

Page 16

zlatan

Senior member
Mar 15, 2011
580
291
136
Might be a misunderstanding here. Every multiprocessor in today's GPUs uses in-order logic. GCN just uses out-of-order logic for the compute engines (ACEs).
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
Here is AMD's GCN (1.0) whitepaper.

The CU front-end can decode and issue seven different types of instructions: branches, scalar ALU or memory, vector ALU, vector memory, local data share,
global data share or export, and special instructions. Only one instruction of each type can be issued at a time per SIMD, to avoid oversubscribing
the execution pipelines. To preserve in-order execution, each instruction must also come from a different wavefront; with 10 wavefronts for each SIMD, there
are typically many available to choose from. Beyond these two restrictions, any mix is allowed, giving the compiler plenty of freedom to issue instructions
for execution.

https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

The pipeline is in-order.

http://people.engr.ncsu.edu/hzhou/ipdps14.pdf

Again everything execution related is in order.

Again - you seem to be inferring things. Straight up AMD tells you that their execution pipeline is in-order.

What is out-of-order-like is that if a particular instruction stalls due to dependencies, other SIMD units can be used to execute other, previously issued in-order instructions instead of waiting.

4 cycles = 1 wavefront instruction.
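
The 4-cycle figure follows from GCN's published dimensions: a wavefront holds 64 work-items and each SIMD is 16 lanes wide, so one vector instruction is spread over 64 / 16 = 4 cycles. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
WAVEFRONT_SIZE = 64  # work-items per GCN wavefront
SIMD_LANES = 16      # lanes per GCN SIMD unit


def cycles_per_wavefront_instruction(wavefront_size=WAVEFRONT_SIZE, lanes=SIMD_LANES):
    """Cycles needed to push one wavefront-wide vector instruction through a SIMD."""
    # Ceiling division, so a partially filled last pass still costs a full cycle.
    return -(-wavefront_size // lanes)


print(cycles_per_wavefront_instruction())
```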

Example

The GPU parallelizes some job into a number of tasks and sends them to execution units across the GPU. (In reality there are wavefronts and such but I'm summing things up). Tasks are set and defined (and they do not get reordered).

On CU #1 several tasks are being done.
Task 1: 100 cycles on 4 SIMD units
(other tasks available, waiting)

Task 1 encounters dependencies or some other problem, only 1 SIMD unit can be utilized.

CU #1 makes the decision to use the other 3 SIMD units for Task 2 (say next 4x4 group of pixels).

Tasks 1 and 2 are not reordered; the CU can, though, control which units get which tasks in the event of a dependency. Remember, Tasks 1 and 2 are not dependent on each other (parallel tasks, obviously, on a GPU).
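
The example above can be sketched as a toy scheduler. This is a deliberately simplified model, not AMD's actual issue logic: it assumes a single issue slot per cycle, and the task names, instruction labels and stall lengths are all hypothetical. The point it shows is that each task's instructions always issue in program order, while the scheduler fills stall cycles with instructions from another, independent task.

```python
def run_cu(wavefronts, stalls):
    """Issue at most one instruction per cycle from the first ready wavefront.

    wavefronts: dict of name -> list of instruction labels in program order.
    stalls: dict of (name, index) -> extra cycles the wavefront must wait
            before instruction `index` can issue (a hypothetical dependency).
    Returns a list of (cycle, name, instruction) tuples.
    """
    pc = {name: 0 for name in wavefronts}  # next instruction per wavefront
    ready_at = {name: stalls.get((name, 0), 0) for name in wavefronts}
    trace = []
    cycle = 0
    while any(pc[name] < len(instrs) for name, instrs in wavefronts.items()):
        for name, instrs in wavefronts.items():
            i = pc[name]
            if i < len(instrs) and ready_at[name] <= cycle:
                trace.append((cycle, name, instrs[i]))
                pc[name] = i + 1
                # The next instruction of this wavefront waits out its stall.
                ready_at[name] = cycle + 1 + stalls.get((name, i + 1), 0)
                break  # one issue per cycle in this toy model
        cycle += 1
    return trace


tasks = {"task1": ["a0", "a1", "a2"], "task2": ["b0", "b1"]}
# task1 hits a 3-cycle dependency before its second instruction.
for entry in run_cu(tasks, {("task1", 1): 3}):
    print(entry)
```

In the trace, task2's instructions issue while task1 is stalled, yet a0, a1, a2 and b0, b1 each stay in their original order.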

Now the driver/GPU/ACE units can rearrange tasks on a very high level but they have no say over rearranging execution. They can say, "We have a compute job and a render job - do the compute job first" and they can parallelize portions of each job (if possible - which in 99.999% of cases it will be). They do not execute the jobs out of order (I'm not sure on the specifics here).
 

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
Was this reported anywhere in the press? I know at least one website used a GTX 960 but they didn't report multiple crashes.

There are multiple reports of crashes and even being unable to run the game at all in the founders forum on Oxide's site.

A "pre-Beta" (aka Alpha) application being unstable on some platforms is not normally newsworthy; it's to be expected. In my case I got about 45 mins into the game and my video driver started crashing repeatedly.

The main thing I get from that is that they're missing some error handling code to tell the application to exit when the driver resets / stops responding. Instead it keeps trying, and the driver keeps crashing, so I lose all video.
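
A rough sketch of the missing handling described above, under loud assumptions: `DeviceLost`, `run_frames` and the `render_frame` callback are hypothetical names, not anything from the game or from a real graphics API. The idea is only that the frame loop stops on the first device loss instead of retrying against a driver that keeps resetting.

```python
class DeviceLost(Exception):
    """Hypothetical error raised when the driver resets or stops responding."""


def run_frames(render_frame, max_frames):
    """Render frames until done; exit cleanly on the first device loss.

    Returns the number of frames that completed successfully.
    """
    completed = 0
    for frame in range(max_frames):
        try:
            render_frame(frame)
        except DeviceLost as err:
            # Do not retry: repeated submissions would just trigger more resets.
            print(f"device lost on frame {frame}: {err}; exiting")
            break
        completed += 1
    return completed
```

A real Direct3D application would detect the same condition through the API's device-removed error codes rather than a Python exception.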
 

Abwx

Lifer
Apr 2, 2011
11,837
4,790
136
Here is AMD's GCN (1.0) whitepaper.

Is GCN 1.2 the same design as GCN 1.0?

According to your curious methodology, we can use Kepler's white papers to explain how Maxwell works, or any previous uarch to explain the current uarch iteration.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
Is GCN 1.2 the same design as GCN 1.0?

According to your curious methodology, we can use Kepler's white papers to explain how Maxwell works, or any previous uarch to explain the current uarch iteration.

The pipeline is (basically) the same. There may be tweaks, but on a high level it's the same functionality (GCN 1.2 would have enhanced certain things but would not have changed major properties).

Changing the pipeline is a major, major arch change.

GCN 1.0 -> 1.2: minor evolution
Kepler -> Maxwell: major change, though I would bet there are a lot of similarities in how data is managed. Nvidia is very tightlipped about this so I really don't know.

Out of Order for a GPU pipeline simply doesn't make sense. It's very power hungry, and thus of limited use in a device that scales well with die size. GPUs are designed for throughput, so it makes sense to go for more execution units rather than die- and power-hungry OoO resources.
 

Abwx

Lifer
Apr 2, 2011
11,837
4,790
136
The pipeline is (basically) the same. There may be tweaks, but on a high level it's the same functionality (GCN 1.2 would have enhanced certain things but would not have changed major properties).

Changing the pipeline is a major, major arch change.

GCN 1.0 -> 1.2: minor evolution

But what is the evolution? Isn't it precisely the ACEs?


Kepler -> Maxwell: major change, though I would bet there are a lot of similarities in how data is managed. Nvidia is very tightlipped about this so I really don't know.

So they published nothing, but still you're saying that there's a big change. I say that the big change is GPU frequency, which is proven, and that the rest is minor.

Out of Order for a GPU pipeline simply doesn't make sense. It's very power hungry, and thus of limited use in a device that scales well with die size. GPUs are designed for throughput, so it makes sense to go for more execution units rather than die- and power-hungry OoO resources.


I'm not a GPU freak, but from what I read from Zlatan it's not the pipeline that is out of order but the queue management. Indeed, I wouldn't expect you to be accurate...
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
But what is the evolution? Isn't it precisely the ACEs?

http://www.anandtech.com/show/6837/...feat-sapphire-the-first-desktop-sea-islands/2
Ultimately the differences between GCN 1.0 and GCN 1.1 are extremely minor, but they are real.

Same for 1.2. AMD changed other things such as tessellation, colour compression, PowerTune, etc., but the major execution resources are unchanged.

So they published nothing, but still you're saying that there's a big change. I say that the big change is GPU frequency, which is proven, and that the rest is minor.

They published lots of things.

Many changes happened. You can read up and educate yourself.

I'm not a GPU freak, but from what I read from Zlatan it's not the pipeline that is out of order but the queue management. Indeed, I wouldn't expect you to be accurate...

I never said the pipeline is out of order, I've been saying the opposite. GCN has dynamic hardware scheduling.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91

Of interest.

One effect of having the ACEs is that GCN has a limited ability to execute tasks out of order. As we mentioned previously GCN is an in-order architecture, and the instruction stream on a wavefront cannot be reordered. However the ACEs can prioritize and reprioritize tasks, allowing tasks to be completed in a different order than they’re received. This allows GCN to free up the resources those tasks were using as early as possible rather than having the task consuming resources for an extended period of time in a nearly-finished state. This is not significantly different from how modern in-order CPUs (Atom, ARM A8, etc) handle multi-tasking.
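
To illustrate the distinction in that quote, here is a minimal sketch of task-level reprioritization. It is an assumption-laden toy, not AMD's ACE policy: the priority numbers, task names and tie-breaking rule are all made up. Whole tasks are dispatched by priority rather than arrival order, while nothing inside a task is reordered.

```python
import heapq


def dispatch_order(tasks):
    """Return task names in dispatch order.

    tasks: list of (name, priority) in arrival order; a lower priority
           number dispatches first, and ties keep arrival order.
    """
    heap = [(priority, arrival, name) for arrival, (name, priority) in enumerate(tasks)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical queue: the compute job arrives second but is prioritized first.
print(dispatch_order([("render", 2), ("compute_a", 1), ("compute_b", 2)]))
```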
 

VR Enthusiast

Member
Jul 5, 2015
133
1
0
Why use Anandtech's GCN 1.1 review when you can use Anandtech's GCN 1.2 review instead?

http://www.anandtech.com/show/8460/amd-radeon-r9-285-review/2

[Image: GCN12ISA_575px.png]


It’s no surprise then that one of the first things we find on AMD’s list of features for the GCN 1.2 instruction set is “improved compute task scheduling”.

I hope this was a mistake and not a deliberate attempt at misinformation.
 

Red Hawk

Diamond Member
Jan 1, 2011
3,266
169
106
But what is the evolution? Isn't it precisely the ACEs?

Not necessarily. GCN 1.1 and 1.2 have more ACEs -- at least, the big chips do. But the ACEs have always been there in GCN. Tahiti, Pitcairn, and Cape Verde all had 2 each. Hawaii, Tonga, and Fiji all have 8, while Bonaire has 2.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
Why use Anandtech's GCN 1.1 review when you can use Anandtech's GCN 1.2 review instead?

http://www.anandtech.com/show/8460/amd-radeon-r9-285-review/2

[Image: GCN12ISA_575px.png]


I hope this was a mistake and not a deliberate attempt at misinformation.

I should have linked that. There are changes; however, these are small compared to Kepler -> Maxwell or VLIW4 -> GCN.

Ultimately, though, that is about as informative as Intel's continual "branch prediction improvements" for each iteration of their CPUs. It likely refers to the fact that they went from 2 ACE units to 8 (which I think everyone is aware of).
 

VR Enthusiast

Member
Jul 5, 2015
133
1
0
I should have linked that. There are changes; however, these are small compared to Kepler -> Maxwell or VLIW4 -> GCN.

Ultimately, though, that is about as informative as Intel's continual "branch prediction improvements" for each iteration of their CPUs. It likely refers to the fact that they went from 2 ACE units to 8 (which I think everyone is aware of).

My above comment wasn't aimed at you in particular. ;)

I agree that they aren't giving out a lot of information. We do know that there are other changes in preemption though from what Zlatan has said previously. GCN 1.2 supports finer grained preemption than previous GCN, and Nvidia has stated that Pascal will have fine-grained preemption as well. I have seen that personally on a slide and I have no reason to doubt what Zlatan says about it.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
My above comment wasn't aimed at you in particular. ;)

I agree that they aren't giving out a lot of information. We do know that there are other changes in preemption though from what Zlatan has said previously. GCN 1.2 supports finer grained preemption than previous GCN, and Nvidia has stated that Pascal will have fine-grained preemption as well. I have seen that personally on a slide and I have no reason to doubt what Zlatan says about it.

They do have programming guides for all their uarchs.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Here is AMD's GCN (1.0) whitepaper.

The CU front-end can decode and issue seven different types of instructions: branches, scalar ALU or memory, vector ALU, vector memory, local data share,
global data share or export, and special instructions. Only one instruction of each type can be issued at a time per SIMD, to avoid oversubscribing
the execution pipelines. To preserve in-order execution, each instruction must also come from a different wavefront; with 10 wavefronts for each SIMD, there
are typically many available to choose from. Beyond these two restrictions, any mix is allowed, giving the compiler plenty of freedom to issue instructions
for execution.

https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

The pipeline is in-order.

This is only for the Compute Unit (CU); there are lots of CUs per GPU. The HD 7970 has 32 CUs.

What we want to know is how many commands the Command Processor can fetch per cycle. I'm not 100% sure, but I believe it is only a single command per cycle.
Then you have multiple instructions and then multiple wavefronts.
 

Abwx

Lifer
Apr 2, 2011
11,837
4,790
136
They published lots of things.

Many changes happened. You can read up and educate yourself.


Why should I educate myself?

To post constant bashing about a single brand?

I need no such education because there's a specialist hanging around here and he knows better than me, and of course than you, despite your denials when addressing him. At some point you should apply your own advice; all your previous posts inaccurately quote the info provided by AMD...
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
Why should I educate myself?

To post constant bashing about a single brand?

I need no such education because there's a specialist hanging around here and he knows better than me, and of course than you, despite your denials when addressing him. At some point you should apply your own advice; all your previous posts inaccurately quote the info provided by AMD...

Enough said.
 

96Firebird

Diamond Member
Nov 8, 2010
5,738
334
126
Haha, great post...

I think the major problem here is people taking a random forum poster's posts as gospel. But I guess that is easier than educating yourself...
 
Feb 19, 2009
10,457
10
76
Might be a misunderstanding here. Every multiprocessor in today's GPUs uses in-order logic. GCN just uses out-of-order logic for the compute engines (ACEs).

Thank you for clarifying this important difference. Reading up on the articles, it is confusing when they say they can go out-of-order with the ACEs. So they can't do that with the CP/graphics?

Still, that's quite unique because it means GCN's compute can indeed run in parallel (leapfrog/bypass traffic blocks), no context switch required through the ACEs since they are separate pipelines. But it can only do this for compute tasks, hence the limited out-of-order and why it excels for VR. Makes sense now.

P.S. Everyone should educate themselves if they have an interest in uarchs. We are all indeed laymen.
 

tential

Diamond Member
May 13, 2008
7,348
642
121
I really just read that... didn't I...
I mean I know it's not possible, but I feel a statement like that really should be bannable on a forum like this. Pretty much defeats the purpose. Again, obviously not possible but..... Lol.... Wow. He just made it to my ignore list at least.

I hope everyone else does the same and doesn't take his bait.
 

Despoiler

Golden Member
Nov 10, 2007
1,968
773
136
Ok final set of benches updated for this build. What I would consider clean runs.

DX12 > DX11 on all benches. DX12 rips through everything that is thrown at it. Smooth all of the time. DX11 turns into a stutter fest the more it's pushed. As I've been saying, the next gen APIs give a superior user experience due to less FPS variance. It's not just about max FPS anymore. Average FPS is higher because min FPS is much higher.

% FPS increase for Normal / Medium / Heavy batches:
GPU bound: +7%, +21%, +58%
CPU bound: +242%, +336%, +408%
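
For readers who want to relate these percentages to raw frame rates, the arithmetic can be sketched as below. The function names are made up for illustration, and the 20 and 30 FPS inputs are hypothetical values, not numbers from these benchmark runs; only the three CPU-bound percentages come from the post.

```python
def percent_increase(old_fps, new_fps):
    """Percent FPS increase going from old_fps to new_fps."""
    return (new_fps - old_fps) / old_fps * 100.0


def speedup_factor(percent):
    """Convert a percent increase into a multiplier (+100% means 2.0x)."""
    return 1.0 + percent / 100.0


# CPU-bound increases quoted in the post (Normal / Medium / Heavy batches).
for pct in (242, 336, 408):
    print(f"+{pct}% -> {speedup_factor(pct):.2f}x")

# Hypothetical raw numbers, purely to show the formula.
print(percent_increase(20.0, 30.0))
```

Read this way, a +408% increase means the DX12 run delivered roughly five times the DX11 frame rate in that scenario.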

http://forums.anandtech.com/showpost.php?p=37652857&postcount=343
http://forums.anandtech.com/showpost.php?p=37652858&postcount=344
http://forums.anandtech.com/showpost.php?p=37652886&postcount=345
http://forums.anandtech.com/showpost.php?p=37652890&postcount=346
 