Ashes of the Singularity User Benchmarks Thread

zlatan · Aug 25, 2015

Might be a misunderstanding here. Every multiprocessor in today's GPUs uses in-order logic. GCN just uses out-of-order logic for the compute engines (ACEs).

Enigmoid · Aug 25, 2015

Here is AMD's GCN (1.0) whitepaper.

The CU front-end can decode and issue seven different types of instructions: branches, scalar ALU or memory, vector ALU, vector memory, local data share, global data share or export, and special instructions. Only one instruction of each type can be issued at a time per SIMD, to avoid oversubscribing the execution pipelines. To preserve in-order execution, each instruction must also come from a different wavefront; with 10 wavefronts for each SIMD, there are typically many available to choose from. Beyond these two restrictions, any mix is allowed, giving the compiler plenty of freedom to issue instructions for execution.

https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

The pipeline is in-order.
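
As a rough illustration of the two issue restrictions in the excerpt (one instruction per type, each from a different wavefront), here is a minimal Python sketch. The function name `select_issue`, the type labels, and the lowest-id-first arbitration are all assumptions made for the example; the whitepaper excerpt does not say how the hardware picks among eligible wavefronts.

```python
from typing import Dict, List

# Labels for the seven instruction types listed in the excerpt (names are illustrative).
INSTRUCTION_TYPES = (
    "branch", "scalar", "vector_alu", "vector_memory",
    "lds", "gds_export", "special",
)


def select_issue(wavefronts: Dict[int, List[str]]) -> Dict[str, int]:
    """Pick at most one instruction per type, each from a different wavefront.

    `wavefronts` maps a wavefront id to its pending instruction types in
    program order. Only the head of each list is eligible, which models
    in-order issue within a wavefront.
    """
    issued: Dict[str, int] = {}
    # Assumption: lowest wavefront id wins ties; real arbitration may differ.
    for wf_id in sorted(wavefronts):
        queue = wavefronts[wf_id]
        if not queue:
            continue
        head = queue[0]
        if head not in INSTRUCTION_TYPES:
            raise ValueError(f"unknown instruction type: {head}")
        # Each type can be issued once; each wavefront offers only its head,
        # so no wavefront can contribute two instructions.
        if head not in issued:
            issued[head] = wf_id
    return issued


pending = {
    0: ["vector_alu", "scalar"],
    1: ["vector_alu"],
    2: ["vector_memory"],
    3: ["scalar"],
}
print(select_issue(pending))
```

Wavefront 1 has to wait here because the vector ALU slot is already taken by wavefront 0, and wavefront 0's scalar instruction cannot jump ahead of its own vector ALU instruction.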

http://people.engr.ncsu.edu/hzhou/ipdps14.pdf

Again everything execution related is in order.

Again - you seem to be inferring things. Straight up AMD tells you that their execution pipeline is in-order.

What does resemble out-of-order behavior is that if a particular instruction stalls due to dependencies, other SIMD units can execute other instructions that were already issued in order, instead of waiting.

4 cycles = 1 wavefront instruction.

Example

The GPU parallelizes some job into a number of tasks and sends them to execution units across the GPU. (In reality there are wavefronts and such but I'm summing things up). Tasks are set and defined (and they do not get reordered).

On CU #1 several tasks are being done.
Task 1: 100 cycles on 4 SIMD units
(other tasks available, waiting)

Task 1 encounters dependencies or some other problem, so only 1 SIMD unit can be utilized.

CU #1 makes the decision to use the other 3 SIMD units for Task 2 (say next 4x4 group of pixels).

Tasks 1 and 2 are not reordered, though the CU can control which units get which tasks, including in the event of a dependency. Remember, Tasks 1 and 2 are not dependent on each other (they are parallel tasks, as is typical on a GPU).

Now the driver/GPU/ACE units can rearrange tasks on a very high level but they have no say over rearranging execution. They can say, "We have a compute job and a render job - do the compute job first" and they can parallelize portions of each job (if possible - which in 99.999% of cases it will be). They do not execute the jobs out of order (I'm not sure on the specifics here).
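
The example above can be sketched as a toy simulation: when one task stalls, the scheduler issues from another independent task, but the instructions inside each task still go out in program order. This is a simplified model with one issue slot per cycle, not a description of the real hardware; `run`, the task names, and the `stalls` table are hypothetical.

```python
from typing import Dict, List, Tuple


def run(tasks: Dict[str, List[str]],
        stalls: Dict[Tuple[str, int], int]) -> List[Tuple[int, str, str]]:
    """Issue one instruction per cycle from the first task that is ready.

    tasks:  task name -> instruction labels in program order.
    stalls: (task name, instruction index) -> first cycle at which that
            instruction may issue (models a dependency stall).
    Returns a trace of (cycle, task name, instruction label).
    """
    pc = {name: 0 for name in tasks}  # next instruction index per task
    trace = []
    cycle = 0
    while any(pc[name] < len(tasks[name]) for name in tasks):
        for name in tasks:  # dict order stands in for arrival order
            if pc[name] >= len(tasks[name]):
                continue
            ready_at = stalls.get((name, pc[name]), 0)
            if cycle >= ready_at:
                trace.append((cycle, name, tasks[name][pc[name]]))
                pc[name] += 1  # only ever advances: no reordering in a task
                break
        cycle += 1
    return trace


tasks = {"task1": ["a0", "a1", "a2"], "task2": ["b0", "b1"]}
# Assumption for the example: task1's second instruction cannot issue before cycle 3.
stalls = {("task1", 1): 3}
for entry in run(tasks, stalls):
    print(entry)
```

Task 2 fills the cycles in which Task 1 is stalled, yet a0, a1, a2 and b0, b1 each appear in their original order in the trace.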

shady28 · Aug 25, 2015

VR Enthusiast said:
Was this reported anywhere in the press? I know at least one website used a GTX 960 but they didn't report multiple crashes.

There are multiple reports of crashes and even being unable to run the game at all in the founders forum on Oxide's site.

A "pre-Beta" (aka Alpha) application being unstable on some platforms is not normally newsworthy, it's to be expected. In my case I got into the game about 45 mins and my video driver started crashing repeatedly.

The main thing I get from that is that they're missing some error handling code to tell the application to exit when the driver resets / stops responding. Instead it keeps trying, and the driver keeps crashing, so I lose all video.

Abwx · Aug 25, 2015

Enigmoid said:
Here is AMD's GCN (1.0) whitepaper.

Is GCN 1.2 the same design as GCN 1.0?

According to your curious methodology we can use Kepler's white papers to explain how Maxwell works, or any previous uarch to explain the current uarch iteration.

Enigmoid · Aug 25, 2015

Abwx said:
Is GCN 1.2 the same design as GCN 1.0?

According to your curious methodology we can use Kepler's white papers to explain how Maxwell works, or any previous uarch to explain the current uarch iteration.

The pipeline is (basically) the same. There may be tweaks, but on a high level it's the same functionality (GCN 1.2 would have enhanced certain things but would not have changed major properties).

Changing the pipeline is a major, major arch change.

GCN 1.0 -> 1.2: minor evolution
Kepler -> Maxwell: major change, though I would bet there are a lot of similarities in how data is managed. Nvidia is very tight-lipped about this so I really don't know.

Out of order for a GPU pipeline simply doesn't make sense. It's very power hungry, and thus of limited use in a device that scales well with die size. GPUs are designed for throughput, so it makes sense to go for more execution units rather than die- and power-hungry OoO resources.

Abwx · Aug 25, 2015

Enigmoid said:
The pipeline is (basically) the same. There may be tweaks, but on a high level it's the same functionality (GCN 1.2 would have enhanced certain things but would not have changed major properties).

Changing the pipeline is a major, major arch change.

GCN 1.0 -> 1.2: minor evolution

But what is the evolution? Isn't it precisely the ACEs?

Enigmoid said:
Kepler -> Maxwell: major change, though I would bet there are a lot of similarities in how data is managed. Nvidia is very tight-lipped about this so I really don't know.

So they published nothing but still you're saying that there's a big change. I say that the big change is GPU frequency, which is proven, and that the rest is minor.

Enigmoid said:
Out of order for a GPU pipeline simply doesn't make sense. It's very power hungry, and thus of limited use in a device that scales well with die size. GPUs are designed for throughput, so it makes sense to go for more execution units rather than die- and power-hungry OoO resources.

I'm not a GPU freak, but from what I read from Zlatan it's not the pipeline that is out of order but the queue management. Indeed, I wouldn't expect you to be accurate...

ShintaiDK · Aug 25, 2015

Abwx said:
But what is the evolution? Isn't it precisely the ACEs?

http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute/5

http://www.anandtech.com/show/7457/the-radeon-r9-290x-review/2

What changes were made to that structure? Not much, it seems, besides the hard limit.

Enigmoid · Aug 25, 2015

Abwx said:
But what is the evolution? Isn't it precisely the ACEs?

http://www.anandtech.com/show/6837/...feat-sapphire-the-first-desktop-sea-islands/2

Ultimately the differences between GCN 1.0 and GCN 1.1 are extremely minor, but they are real.

Same for 1.2. AMD changed other things such as tessellation, colour compression, PowerTune, etc., but the major execution resources are unchanged.

Abwx said:
So they published nothing but still you're saying that there's a big change. I say that the big change is GPU frequency, which is proven, and that the rest is minor.

They published lots of things.

Many changes happened. You can read up and educate yourself.

Abwx said:
I'm not a GPU freak, but from what I read from Zlatan it's not the pipeline that is out of order but the queue management. Indeed, I wouldn't expect you to be accurate...

I never said the pipeline is out of order, I've been saying the opposite. GCN has dynamic hardware scheduling.

Enigmoid · Aug 25, 2015

ShintaiDK said:
http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute/5

http://www.anandtech.com/show/7457/the-radeon-r9-290x-review/2

What changes were made to that structure? Not much, it seems, besides the hard limit.

Of interest.

One effect of having the ACEs is that GCN has a limited ability to execute tasks out of order. As we mentioned previously GCN is an in-order architecture, and the instruction stream on a wavefront cannot be reordered. However the ACEs can prioritize and reprioritize tasks, allowing tasks to be completed in a different order than they’re received. This allows GCN to free up the resources those tasks were using as early as possible rather than having the task consuming resources for an extended period of time in a nearly-finished state. This is not significantly different from how modern in-order CPUs (Atom, ARM A8, etc) handle multi-tasking.
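
To make the distinction in that quote concrete, here is a toy sketch of task-level prioritization: tasks complete in priority order rather than arrival order, while the instruction stream inside each task is never reordered. It is only an analogy. `completion_order`, the task names, and the numeric priorities are invented for the example, and real ACEs manage concurrent hardware queues rather than running tasks one after another.

```python
import heapq
from typing import List, Tuple


def completion_order(
    tasks: List[Tuple[str, int, List[str]]]
) -> Tuple[List[str], List[str]]:
    """Run tasks by priority instead of arrival order.

    tasks: (name, priority, instructions) in arrival order; a lower
           priority number means more urgent (an assumption of this sketch).
    Returns (task names in completion order, instruction labels as executed).
    """
    # The arrival index breaks priority ties, so equal-priority tasks
    # keep their arrival order.
    heap = [(prio, arrival, name, instrs)
            for arrival, (name, prio, instrs) in enumerate(tasks)]
    heapq.heapify(heap)

    done: List[str] = []
    executed: List[str] = []
    while heap:
        _, _, name, instrs = heapq.heappop(heap)
        executed.extend(instrs)  # instructions within a task stay in order
        done.append(name)
    return done, executed


received = [
    ("render", 2, ["r0", "r1"]),
    ("compute", 1, ["c0", "c1"]),
]
print(completion_order(received))
```

The compute task arrives second but finishes first, which is the "out of order" part; r0 still precedes r1 and c0 still precedes c1, which is the "in-order" part.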

VR Enthusiast · Aug 25, 2015

Why use Anandtech's GCN 1.1 review when you can use Anandtech's GCN 1.2 review instead?

http://www.anandtech.com/show/8460/amd-radeon-r9-285-review/2

It’s no surprise then that one of the first things we find on AMD’s list of features for the GCN 1.2 instruction set is “improved compute task scheduling”.

I hope this was a mistake and not a deliberate attempt at misinformation.

Red Hawk · Aug 25, 2015

Abwx said:
But what is the evolution? Isn't it precisely the ACEs?

Not necessarily. GCN 1.1 and 1.2 have more ACEs -- at least, the big chips do. But the ACEs have always been there in GCN. Tahiti, Pitcairn, and Cape Verde all had 2 each. Hawaii, Tonga, and Fiji all have 8, while Bonaire has 2.

Enigmoid · Aug 25, 2015

VR Enthusiast said:
Why use Anandtech's GCN 1.1 review when you can use Anandtech's GCN 1.2 review instead?

http://www.anandtech.com/show/8460/amd-radeon-r9-285-review/2

I hope this was a mistake and not a deliberate attempt at misinformation.

I should have linked that. There are changes; however, these are small compared to Kepler -> Maxwell or VLIW4 -> GCN.

Ultimately, though, that is about as informative as Intel's continual "branch prediction improvements" for each iteration of their CPUs. It likely refers to the fact that they went from 2 ACE units to 8 (which I think everyone is aware of).

VR Enthusiast · Aug 25, 2015

Enigmoid said:
I should have linked that. There are changes; however, these are small compared to Kepler -> Maxwell or VLIW4 -> GCN.

Ultimately, though, that is about as informative as Intel's continual "branch prediction improvements" for each iteration of their CPUs. It likely refers to the fact that they went from 2 ACE units to 8 (which I think everyone is aware of).

My above comment wasn't aimed at you in particular.

I agree that they aren't giving out a lot of information. We do know that there are other changes in preemption though from what Zlatan has said previously. GCN 1.2 supports finer grained preemption than previous GCN, and Nvidia has stated that Pascal will have fine-grained preemption as well. I have seen that personally on a slide and I have no reason to doubt what Zlatan says about it.

monstercameron · Aug 25, 2015

VR Enthusiast said:
My above comment wasn't aimed at you in particular.

I agree that they aren't giving out a lot of information. We do know that there are other changes in preemption though from what Zlatan has said previously. GCN 1.2 supports finer grained preemption than previous GCN, and Nvidia has stated that Pascal will have fine-grained preemption as well. I have seen that personally on a slide and I have no reason to doubt what Zlatan says about it.

They do have programming guides for all their uarchs.

VR Enthusiast · Aug 25, 2015

monstercameron said:
They do have programming guides for all their uarchs.

Yes you can get any manual you'd need here - http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/

AtenRa · Aug 25, 2015

Enigmoid said:
Here is AMD's GCN (1.0) whitepaper.

The CU front-end can decode and issue seven different types of instructions: branches, scalar ALU or memory, vector ALU, vector memory, local data share, global data share or export, and special instructions. Only one instruction of each type can be issued at a time per SIMD, to avoid oversubscribing the execution pipelines. To preserve in-order execution, each instruction must also come from a different wavefront; with 10 wavefronts for each SIMD, there are typically many available to choose from. Beyond these two restrictions, any mix is allowed, giving the compiler plenty of freedom to issue instructions for execution.

https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

The pipeline is in-order.

This is only for the Compute Unit (CU); there are lots of CUs per GPU. The HD 7970 has 32 CUs.

What we want to know is how many commands the Command Processor can fetch per cycle. I'm not 100% sure, but I believe it is only a single command per cycle.
Then you have multiple instructions and then multiple wavefronts.

Abwx · Aug 25, 2015

Enigmoid said:
They published lots of things.

Many changes happened. You can read up and educate yourself.

Why should I educate myself?

To post constant bashing about a single brand?

I need no such education because there's a specialist hanging around here and he knows better than me, and of course better than you, despite your denials when addressing him. At some point you should apply your own advice; all your previous posts inaccurately quote the info provided by AMD...

Enigmoid · Aug 25, 2015

Abwx said:
Why should I educate myself?

To post constant bashing about a single brand?

I need no such education because there's a specialist hanging around here and he knows better than me, and of course better than you, despite your denials when addressing him. At some point you should apply your own advice; all your previous posts inaccurately quote the info provided by AMD...

Enough said.

96Firebird · Aug 25, 2015

Haha, great post...

I think the major problem here is people taking a random forum poster's posts as gospel. But I guess that is easier than educating yourself...

Keysplayr · Aug 25, 2015

Abwx said:
Why should I educate myself?

I really just read that... didn't I...

Silverforce11 · Aug 25, 2015

zlatan said:
Might be a misunderstanding here. Every multiprocessor in today's GPUs uses in-order logic. GCN just uses out-of-order logic for the compute engines (ACEs).

Thank you for clarifying this important difference, because when reading the articles it is confusing: they say they can go out-of-order with the ACEs, but they can't do that with the CP/graphics?

Still, that's quite unique because it means GCN's compute can indeed run in parallel (leapfrog/bypass traffic blocks), no context switch required through the ACEs since they are separate pipelines. But it can only do this for compute tasks, hence the limited out-of-order and why it excels for VR. Makes sense now.

ps. Everyone should educate themselves, if they have an interest in uarchs. We are all indeed laymen.

tential · Aug 25, 2015

Keysplayr said:
I really just read that... didn't I...

I mean, I know it's not possible, but I feel a statement like that really should be bannable on a forum like this. It pretty much defeats the purpose. Again, obviously not possible but... lol... wow. He just made it to my ignore list at least.

I hope everyone else does the same and doesn't take his bait.

LTC8K6 · Aug 25, 2015

http://amd-dev.wpengine.netdna-cdn..../07/AMD_GCN3_Instruction_Set_Architecture.pdf

There is a GCN 3 (1.2) paper.

Despoiler · Aug 25, 2015

Ok final set of benches updated for this build. What I would consider clean runs.

DX12 > DX11 on all benches. DX12 rips through everything that is thrown at it. Smooth all of the time. DX11 turns into a stutter fest the more it's pushed. As I've been saying, the next gen APIs give a superior user experience due to less FPS variance. It's not just about max fps anymore. Average FPS is higher because min fps is much higher.

% FPS increase (Normal / Medium / Heavy batches)
GPU bound: +7% / +21% / +58%
CPU bound: +242% / +336% / +408%
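
For anyone who finds percentages above 100% hard to read, a percentage increase converts to a frame-rate multiplier as 1 + percent / 100. The short sketch below applies that to the figures above, assuming they are DX12 gains relative to DX11 as the post describes; `speedup` is just a helper name for the example.

```python
def speedup(percent_increase: float) -> float:
    """Convert a percentage FPS increase into a multiplier (e.g. +100% -> 2.0x)."""
    return 1 + percent_increase / 100


# Figures copied from the post above (Normal / Medium / Heavy batches).
results = {
    "GPU bound": [7, 21, 58],
    "CPU bound": [242, 336, 408],
}

for scenario, increases in results.items():
    multipliers = ", ".join(f"{speedup(p):.2f}x" for p in increases)
    print(f"{scenario}: {multipliers}")
```

So the CPU-bound heavy-batch figure of +408% means roughly five times the DX11 frame rate, while the GPU-bound normal-batch figure of +7% is a multiplier of about 1.07.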

http://forums.anandtech.com/showpost.php?p=37652857&postcount=343
http://forums.anandtech.com/showpost.php?p=37652858&postcount=344
http://forums.anandtech.com/showpost.php?p=37652886&postcount=345
http://forums.anandtech.com/showpost.php?p=37652890&postcount=346
