Ryzen: Strictly technical

dogen1 · Mar 2, 2017

MajinCry said:
Because it's useless. If any other draw calls are made by the game, the optimization fails. I repeat, this will not have any use in games.

http://enbseries.enbdev.com/forum/viewtopic.php?f=17&t=4869#p69741

For this to be used, the game must not have any lighting, any shadows, any shaders, any materials, any decals, and any other meshes.

You won't find this in games. Maybe if you go back to the 80s (e.g., Asteroids), but not even DOS-era games would be applicable.

Sure, but what if there's a game that renders a bunch of identical objects in a row without batching them for some reason? I'm sure it's happened before. Game development is a mess.

This will at least make any games like that run faster.

The Stilt · Mar 2, 2017

PPB said:
Stilt, to make my motherboard choice easier: if I'm targeting a 3.5GHz all-core clock on a 1700 (the second critical point, beyond which voltage scales exponentially with clock), what maximum sustained amperage should I look for in VRM designs? Also, do you have any insight into the VRM designs on B350 motherboards, considering the most powerful VRM designs tend to go into the highest-end X370 boards? Thanks in advance.

The VRM requirements for those specs would be quite low; < 90A of sustained current capability will be sufficient even for the worst-case workloads in non-ideal conditions.
I would look into ASUS's offerings as usual, mostly due to the software. The PRIME B350-PLUS, for example, is more than enough for those specs.

MajinCry · Mar 2, 2017

dogen1 said:
Sure, but what if there's a game that renders a bunch of identical objects in a row without batching them for some reason? I'm sure it's happened before. Game development is a mess.

This will at least make any games like that run faster.

You ever played a game that doesn't have shadows, lighting, object materials and shaders, that only has one (duplicated) object for the entire scene?

dogen1 · Mar 2, 2017

MajinCry said:
You ever played a game that doesn't have shadows, lighting, object materials and shaders, that only has one (duplicated) object for the entire scene?

why would it need no shadows or lighting?

it's the changes between draws that prevent this "auto batching feature"
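
To make the point about state changes concrete, here is a toy model in Python. It is only a sketch of the idea being argued about, not how any driver or GPU actually implements it: the `Draw` record and `merge_consecutive_draws` helper are hypothetical names, and real draw calls carry far more state than three strings. Back-to-back identical draws collapse into one submission with an instance count, and any differing draw in between ends the run.

```python
from itertools import groupby
from typing import NamedTuple


class Draw(NamedTuple):
    """Hypothetical stand-in for the state a draw call depends on."""
    mesh: str
    material: str
    shader: str


def merge_consecutive_draws(draws):
    """Collapse runs of back-to-back identical draws into (draw, count) pairs."""
    # groupby only groups *adjacent* equal items, which is exactly the
    # restriction discussed above: any change between draws ends the run.
    return [(draw, sum(1 for _ in run)) for draw, run in groupby(draws)]


rock = Draw("rock", "stone", "opaque")
tree = Draw("tree", "bark", "opaque")

contiguous = [rock, rock, rock, rock, tree]
interleaved = [rock, tree, rock, tree, rock]

print(len(merge_consecutive_draws(contiguous)))   # 2 submissions instead of 5
print(len(merge_consecutive_draws(interleaved)))  # 5 submissions, nothing merges
```

In this model the benefit depends entirely on how long the runs of identical draws are, which is the crux of the disagreement in this exchange.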

The Stilt · Mar 2, 2017

zir_blazer said:
Hi The Stilt, I just noticed you are still active, and, as usual, decided to ask you about some obscure features...

Do you know if AVIC (AMD counterpart to Intel APICv for APIC Virtualization, present in the HEDT platform since Ivy Bridge-E, but omitted entirely from consumer LGA 1155/1150/1151) is or will be present in all Ryzen parts? So far, thanks to a lscpu dump from the guy at Phoronix, I noticed that at least on the Ryzen 7 1800X the avic CPU Flag is present: http://openbenchmarking.org/system/1703021-RI-AMDZEN08075/Ryzen 7 1800X/lscpu

The other thing I was interested in is PCIe ACS (Access Control Services) support, both on Ryzen itself and on the chipset. It is a useful feature for PCI/VGA passthrough in virtualized environments because it disables PCIe peer-to-peer data transfers and forces everything to go through the IOMMU, thus providing proper device isolation (otherwise, devices could bypass the IOMMU via PCIe P2P, which is not intended. Intel HEDT also has it, and again, it's omitted from the PCIe controllers of consumer processors, although the consumer chipsets do support it). Sadly, I have no idea how to specifically check for ACS support.
Knowing this would give an idea of how good the default IOMMU grouping on the Ryzen AM4 platform should be for potential AM4 passthrough users. More info here, if you're interested: http://vfio.blogspot.com.ar/2014/08/iommu-groups-inside-and-out.html

If you can also get lspci -vvv and lspci -tv output from Linux in Ryzen, it would be even better, to know the platform in detail. There are also some minor features like FLR (Function Level Reset) on Chipset Devices which could also be useful to know if they're present or not. It would add a lot of flexibility if those features are supported.

AVIC is supported on all Ryzen SKUs.

Most of the stuff is handled through the SMN in Ryzen (completely different to anything previous), so frankly I have no idea how the virtualization stuff will behave.
I would expect it to be a complete nightmare until everything is fully supported. Not only due to how the functions are accessed on Zeppelin but because of all the security. Zeppelin is more secure than Fort Knox.

Unfortunately, at the moment I have no energy to even think about looking at anything *nix related. If you want advice, I would suggest you wait for the software stack (firmware, microcode) to mature before you even try anything as specific and niche (on the consumer side) as this.

PPB · Mar 2, 2017

The Stilt said:
The VRM requirements for those specs would be quite low; < 90A of sustained current capability will be sufficient even for the worst-case workloads in non-ideal conditions.
I would look into ASUS's offerings as usual, mostly due to the software. The PRIME B350-PLUS, for example, is more than enough for those specs.

Good to know! I'm waiting for B350 ITX, or B300 if that ever comes to a DIY motherboard. I'm looking at 6 phases, assuming a minimum of 20A per phase without knowing the components (only really low-end boards nowadays can't even get to 20A per phase).
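
As a rough sanity check on the numbers in this exchange, the arithmetic can be sketched in a few lines of Python. This assumes the load is shared evenly across phases and ignores thermal derating and controller behaviour, so it is a back-of-the-envelope estimate rather than a VRM design rule; `vrm_headroom` is a hypothetical helper name.

```python
def vrm_headroom(phases, amps_per_phase, required_amps):
    """Return (total sustained current, margin over the requirement) in amps.

    Assumes ideal, even current sharing between phases (a simplification).
    """
    total = phases * amps_per_phase
    return total, total - required_amps


# 6 phases at 20A each, against the < 90A figure quoted above.
total, margin = vrm_headroom(phases=6, amps_per_phase=20, required_amps=90)
print(total, margin)  # 120 30
```

Under those assumptions, a 6 x 20A design leaves about 30A of margin over the quoted requirement.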

PhonakV30 · Mar 2, 2017

http://www.hardware.fr/articles/956-22/retour-sous-systeme-memoire.html

www.hardware.fr said:
Although AMD could not give us an idea of the latency of an access to the second CCX, it did provide another, much more important figure: the bandwidth between the two CCXes, which is only 22 GB/s

How is the bandwidth between these two CCXes calculated?

MajinCry · Mar 2, 2017

dogen1 said:
why would it need no shadows or lighting?

it's the changes between draws that prevent this "auto batching feature"

That's what lighting and shadows do. They change the draws. With shadows, being a shadow caster increases the number of draws, as does having a shadow interact with another object.

Draw calls are issued per object. A single object issues around 3 or 4 draw calls, in an actual game renderer, even in a scene with no shadows and no lights. Those draws are also not identical.

Again, only in a synthetic benchmark will that faux instancing show any benefit.

itsmydamnation · Mar 2, 2017

lolfail9001 said:
Uhm, their inter-CCX bus only has 22GB/sec bandwidth?

Sorry, WHAT!?

That actually makes perfect sense; The Stilt already told you the fabric is tied to memory controller speed.

We also know that the L3s are victim caches for their local CCX.
Now we need to look at what the cache coherency protocol might look like. In general, in most coherency protocols that enable NUMA, "dirty lines" get flushed to memory to ensure stale data isn't accessed.

So how much bandwidth do you need? Exactly the amount of memory bandwidth one 72-bit ECC DDR4 interface can give you. At DDR4-2400, which lots of reviews run at, 22GB/s is more than enough.

It's all about understanding the architecture...
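
The 22GB/s figure in this post can be reproduced with simple arithmetic: peak bandwidth is the transfer rate multiplied by the bus width in bytes. The Python sketch below follows the post's framing of counting all 72 bits of an ECC interface; with only the 64 data bits the same formula gives 19.2 GB/s. Whether this is how AMD actually derived the inter-CCX number is the poster's inference, not something the sketch confirms, and `ddr_peak_bandwidth_gb_s` is a hypothetical helper name.

```python
def ddr_peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits):
    """Theoretical peak bandwidth of one DDR channel in GB/s (10^9 bytes/s)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


# DDR4-2400, counting the full 72-bit ECC interface as in the post above.
print(ddr_peak_bandwidth_gb_s(2400, 72))  # 21.6
# Same speed, 64 data bits only.
print(ddr_peak_bandwidth_gb_s(2400, 64))  # 19.2
```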

dogen1 · Mar 2, 2017

MajinCry said:
That's what lighting and shadows do. They change the draws. With shadows, being a shadow caster increases the number of draws, as does having a shadow interact with another object.

Draw calls are issued per object. A single object issues around 3 or 4 draw calls, in an actual game renderer, even in a scene with no shadows and no lights. Those draws are also not identical.

Again, only in a synthetic benchmark will that faux instancing show any benefit.

Ok, let's just imagine a game that uses deferred shading, so it renders the geometry in one go before shading the whole screen at once. Now, if there's a game that does that but for some reason draws some identical geometries in a row without batching them, voilà. This would help that hypothetical Frankenstein of a game lol.

flash-gordon · Mar 2, 2017

The Stilt said:

Great work, Stilt. This curve paints a very nice picture of Ryzen's potential in mobile. Can you do one for ST?

Results from some mobile CPUs:

http://www.notebookcheck.net/Mobile-Processors-Benchmark-List.2436.0.html

otinane · Mar 2, 2017

This question may be a bit premature, but out of all of these specs, which can we expect them to focus on for Zen II?

MajinCry · Mar 2, 2017

dogen1 said:
Ok, let's just imagine a game that uses deferred shading, so it renders the geometry in one go before shading the whole screen at once. Now, if there's a game that does that but for some reason draws some identical geometries in a row without batching them, voilà. This would help that hypothetical Frankenstein of a game lol.

But that's the thing. It can't just be "some identical geometries". It has to be that all the geometries (or, like, a few hundred objects at least) are the same, for any practical benefit to be gained.

lolfail9001 · Mar 2, 2017

itsmydamnation said:
That actually makes perfect sense; The Stilt already told you the fabric is tied to memory controller speed.

It only makes perfect sense if you treat Ryzen as a closeted MCM of 2 separate quad-core, single-channel dies. Because otherwise, what is the point of talking about 16MB of L3 when only 8MB of it is accessible in a sane amount of time (i.e. faster than memory)?

itsmydamnation · Mar 2, 2017

lolfail9001 said:
It only makes perfect sense if you treat Ryzen as a closeted MCM of 2 separate quad-core, single-channel dies. Because otherwise, what is the point of talking about 16MB of L3 when only 8MB of it is accessible in a sane amount of time (i.e. faster than memory)?

Only from the perspective of a single core. This shouldn't be a big deal going forward, after all the consoles work just fine with a much worse solution between modules.

oneb1t · Mar 2, 2017

Will there be any phenomMsrTweaker/amdmsrtweaker or similar program possible for Ryzen? Or is it all locked out?

It would be really nice to set just the single-thread turbo to 4.2GHz and leave the rest at normal frequency, as multithread performance is good even at low frequency.

The Stilt · Mar 2, 2017

lolfail9001 said:
It only makes perfect sense if you treat Ryzen as a closeted MCM of 2 separate quad-core, single-channel dies. Because otherwise, what is the point of talking about 16MB of L3 when only 8MB of it is accessible in a sane amount of time (i.e. faster than memory)?

There are many more fabrics in Zeppelin than just the data fabric.
I'd assume the inter-CCX fabric frequency is 4x DFICLK (i.e. 5333MHz @ 2666MHz DRAM), however I don't know it as a fact.
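
The arithmetic behind that guess can be written out as a short Python sketch. It assumes, as the numbers in the post imply, that DFICLK equals the DRAM clock, i.e. half the DDR data rate; the 4x multiplier is The Stilt's stated assumption rather than a confirmed figure, and `inter_ccx_clock_mhz` is a hypothetical helper name.

```python
def inter_ccx_clock_mhz(dram_data_rate_mt_s, multiplier=4):
    """Estimate the inter-CCX fabric clock from the DRAM data rate.

    Assumes DFICLK = data rate / 2 (DDR moves two transfers per clock)
    and an unconfirmed 4x multiplier on top of that.
    """
    dficlk_mhz = dram_data_rate_mt_s / 2
    return multiplier * dficlk_mhz


# "DDR4-2666" is nominally 2666.67 MT/s, which is where 5333MHz comes from.
print(round(inter_ccx_clock_mhz(8000 / 3)))  # 5333
print(inter_ccx_clock_mhz(2400))             # 4800.0
```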

The Stilt · Mar 2, 2017

oneb1t said:
Will there be any phenomMsrTweaker/amdmsrtweaker or similar program possible for Ryzen? Or is it all locked out?

It would be really nice to set just the single-thread turbo to 4.2GHz and leave the rest at normal frequency, as multithread performance is good even at low frequency.

There is very little use for such an application, unfortunately.

Configuring the Turbo or XFR is impossible, at least for the time being.

Rifter · Mar 2, 2017

If I want to run Zen at 4GHz on all cores, how beefy a motherboard do you think will be required? And how much beefier for 4.2GHz?

The Stilt · Mar 2, 2017

Rifter said:
If I want to run Zen at 4GHz on all cores, how beefy a motherboard do you think will be required? And how much beefier for 4.2GHz?

There are no guarantees that you will be able to do that, regardless of the motherboard.
ASUS PRIME X370-PRO and Crosshair VI Hero are among your best bets.

Rifter · Mar 2, 2017

The Stilt said:
There are no guarantees that you will be able to do that, regardless of the motherboard.
ASUS PRIME X370-PRO and Crosshair VI Hero are among your best bets.

I realize that it's luck of the draw with OCs; I've been overclocking for decades. I just wanted your opinion on how beefy the power phases need to be to go up to 4.2. I'm not building till summer anyway, just putting my build together now, and then I'll wait for the bugs to be worked out before I dive in.

looncraz · Mar 2, 2017

lolfail9001 said:
It only makes perfect sense if you treat Ryzen as a closeted MCM of 2 separate quad-core, single-channel dies. Because otherwise, what is the point of talking about 16MB of L3 when only 8MB of it is accessible in a sane amount of time (i.e. faster than memory)?

This is how software should treat it. DRAM as LLC... not what I expected at all.

This means memory frequency is all the more important. Ryzen may be more bandwidth sensitive than latency sensitive as a result.

I apparently won't have a motherboard for a couple weeks (ARGH!) thanks to Amazon's ineptness, otherwise I would test this. For now, I'm going to write an app to do just that.

Ajay · Mar 3, 2017

itsmydamnation said:
That actually makes perfect sense; The Stilt already told you the fabric is tied to memory controller speed.

We also know that the L3s are victim caches for their local CCX.
Now we need to look at what the cache coherency protocol might look like. In general, in most coherency protocols that enable NUMA, "dirty lines" get flushed to memory to ensure stale data isn't accessed.

So how much bandwidth do you need? Exactly the amount of memory bandwidth one 72-bit ECC DDR4 interface can give you. At DDR4-2400, which lots of reviews run at, 22GB/s is more than enough.

It's all about understanding the architecture...

Basically, a server uarch being repurposed for client. The real problem, I suppose, is that AMD didn't have the dosh to design a 'native' 8-core processor. AMD needed an implementation that could be largely shared between client (Raven Ridge) and workstation/server (Summit Ridge) - hence the development of the CCX. Pretty good for a from-scratch design - Intel has been iterating on 'Core' for 10 years.

bjt2 · Mar 3, 2017

http://support.amd.com/TechDocs/AMD Ryzen Processor and AMD Ryzen Master Overclocking Users Guide.pdf

Here is the manual for AMD's Ryzen Master overclocking utility.
It also specifies the default behaviour. In particular, page 11, point 3 describes the dLDO behaviour, so it seems that it is active in retail CPUs...
The dLDOs enter bypass mode only in OC mode.

imported_jjj · Mar 3, 2017

The Stilt said:

Curious about SMT's perf per W, would be nice if you could add that, guess 8C/8T would be easiest.
