Ryzen: Strictly technical

Status
Not open for further replies.

dogen1

Senior member
Oct 14, 2014
739
40
91
Because it's useless. If any other draw calls are made by the game, the optimization fails. I repeat, this will not have any use in games.

http://enbseries.enbdev.com/forum/viewtopic.php?f=17&t=4869#p69741


For this to be used, the game must not have any lighting, any shadows, any shaders, any materials, any decals, and any other meshes.

You won't find this in games. Maybe if you go back to the 80s (e.g., Asteroids), but not even DOS-era games would be applicable.

Sure, but what if there's a game that renders a bunch of identical objects in a row without batching them for some reason? I'm sure it's happened before. Game development is a mess.

This will at least make any games like that run faster.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Stilt, to make my motherboard choice easier: if I'm targeting a 3.5GHz all-core clock on a 1700 (the second critical point, beyond which voltage scales exponentially relative to clock), what maximum sustained amperage should I look for in VRM designs? Also, do you have any insight into the VRM designs on B350 motherboards, considering the most powerful VRM designs tend to go into the highest-end X370 boards? Thanks in advance.


The VRM requirements for those specs would be quite low; < 90A sustained current capability will be sufficient even for the worst-case workloads in non-ideal conditions.
I would look into ASUS's offerings as usual, mostly due to the software. The PRIME B350-PLUS, for example, is more than enough for those specs.
 
Reactions: Drazick

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Sure, but what if there's a game that renders a bunch of identical objects in a row without batching them for some reason? I'm sure it's happened before. Game development is a mess.

This will at least make any games like that run faster.

You ever played a game that doesn't have shadows, lighting, object materials and shaders, that only has one (duplicated) object for the entire scene?
 
Reactions: Makaveli and .vodka

dogen1

Senior member
Oct 14, 2014
739
40
91
You ever played a game that doesn't have shadows, lighting, object materials and shaders, that only has one (duplicated) object for the entire scene?

Why would it need no shadows or lighting?

It's the changes between draws that prevent this "auto batching feature".
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Hi The Stilt, I just noticed you are still active, and, as usual, decided to ask you about some obscure features...

Do you know if AVIC (AMD's counterpart to Intel APICv for APIC virtualization, which has been present on the HEDT platform since Ivy Bridge-E but is omitted entirely from consumer LGA 1155/1150/1151) is or will be present in all Ryzen parts? So far, thanks to an lscpu dump from the guy at Phoronix, I noticed that at least on the Ryzen 7 1800X the avic CPU flag is present: http://openbenchmarking.org/system/1703021-RI-AMDZEN08075/Ryzen 7 1800X/lscpu

The other thing I was interested in is PCIe ACS (Access Control Services) support, both on Ryzen itself and on the chipset. It is a feature useful for PCI/VGA passthrough in virtualized environments because it disables PCIe peer-to-peer data transfers and forces everything to go through the IOMMU, thus providing proper device isolation. (Otherwise, devices could bypass the IOMMU via PCIe P2P, which is not intended. Intel HEDT also has it, and again, it's omitted from the consumer processors' PCIe controllers, although the consumer chipsets do support it.) Sadly, I have no idea how to specifically check for ACS support.
Knowing that would give an idea of how good the default IOMMU grouping on the Ryzen AM4 platform should be, for potential AM4 passthrough users. More info here, if you're interested: http://vfio.blogspot.com.ar/2014/08/iommu-groups-inside-and-out.html

If you can also get lspci -vvv and lspci -tv output from Linux on Ryzen, it would be even better, to know the platform in detail. There are also some minor features like FLR (Function Level Reset) on chipset devices, and it would be useful to know whether they're present or not. It would add a lot of flexibility if those features are supported.

AVIC is supported on all Ryzen SKUs.

Most of the stuff is handled through the SMN in Ryzen (completely different from anything previous), so frankly I have no idea how the virtualization stuff will behave.
I would expect it to be a complete nightmare until everything is fully supported. Not only due to how the functions are accessed on Zeppelin, but because of all the security. Zeppelin is more secure than Fort Knox.

Unfortunately, at the moment I have no energy to even think about looking at anything *nix related. If you want advice, I would suggest you wait for the software stack (firmware, microcode) to mature before you even try anything as specific and niche (on the consumer side) as this.
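
For anyone who wants to check these things on their own Linux box in the meantime, here is a minimal sketch. It assumes the usual /proc/cpuinfo "flags" line and the /sys/kernel/iommu_groups sysfs layout; the function names are made up for illustration. It does not detect ACS itself. For that you would still look for an "Access Control Services" capability in the lspci -vvv output.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted list of PCI addresses in that group."""
    base = Path(root)
    if not base.is_dir():
        return {}  # IOMMU disabled, or not a Linux system
    groups = {}
    for group in base.iterdir():
        devices = group / "devices"
        if group.name.isdigit() and devices.is_dir():
            groups[int(group.name)] = sorted(d.name for d in devices.iterdir())
    return groups


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    flags = cpu_flags(cpuinfo.read_text()) if cpuinfo.exists() else set()
    print("avic flag present:", "avic" in flags)
    for number, devices in sorted(iommu_groups().items()):
        print(f"group {number}: {', '.join(devices)}")
```

Fewer devices per group generally means finer-grained passthrough, which is the property the ACS question above is really about.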
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
The VRM requirements for those specs would be quite low; < 90A sustained current capability will be sufficient even for the worst-case workloads in non-ideal conditions.
I would look into ASUS's offerings as usual, mostly due to the software. The PRIME B350-PLUS, for example, is more than enough for those specs.

Good to know! I'm waiting for B350 ITX, or B300 if that ever comes to a DIY motherboard. I'm considering 6 phases, assuming a minimum of 20A per phase without knowing the components (only really low-end boards nowadays can't get to 20A per phase).
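
The back-of-the-envelope arithmetic behind that, as a rough sketch. It assumes every phase can really sustain its rated current and ignores thermal derating and doubled phases, so treat the result as an upper bound. The function names are only illustrative.

```python
def vrm_capacity_amps(phases, amps_per_phase):
    """Naive total capacity: every phase carries its full rated current."""
    return phases * amps_per_phase


def headroom(capacity_amps, required_amps):
    """Ratio of available to required current (above 1.0 means margin)."""
    return capacity_amps / required_amps


capacity = vrm_capacity_amps(phases=6, amps_per_phase=20)
ratio = headroom(capacity, required_amps=90)  # 90A: the ceiling quoted above

print(f"capacity: {capacity} A, headroom vs 90 A: {ratio:.2f}x")
```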
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Why would it need no shadows or lighting?

It's the changes between draws that prevent this "auto batching feature".

That's what lighting and shadows do. They change the draws. With shadows, being a shadow caster increases the number of draws, as does having a shadow interact with another object.

Draw calls are issued per object. A single object issues around 3 or 4 draw calls, in an actual game renderer, even in a scene with no shadows and no lights. Those draws are also not identical.

Again, only in a synthetic benchmark will that faux instancing show any benefit.
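
A toy model of the point being argued, not the actual driver heuristic (which nobody in this thread has described in detail). It assumes a draw is just a hypothetical (mesh, pass) pair and that only back-to-back identical draws can be merged.

```python
from itertools import groupby


def mergeable_runs(draws):
    """Collapse consecutive identical draws into (draw, count) runs."""
    return [(draw, len(list(run))) for draw, run in groupby(draws)]


# Synthetic-benchmark style: the same draw issued six times in a row.
identical = [("rock", "opaque")] * 6

# Closer to a real renderer: several different draws per object,
# so no two neighbouring draws are the same.
mixed = [("rock", "depth"), ("rock", "opaque"), ("rock", "shadow")] * 2

print(len(identical), "draws ->", len(mergeable_runs(identical)), "submission(s)")
print(len(mixed), "draws ->", len(mergeable_runs(mixed)), "submission(s)")
```

In this model the six identical draws collapse into one run, while the interleaved passes stay at six, which is the disagreement in a nutshell: the benefit depends entirely on how often identical draws end up adjacent.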
 
Reactions: KTE

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
Uhm, their inter-CCX bus only has 22GB/sec bandwidth?

Sorry, WHAT!?
That actually makes perfect sense; The Stilt already told you the fabric is tied to memory controller speed.

We also know that the L3s are victim caches for their local CCX.
Now we need to look at what the cache coherency protocol might look like. In general, in most coherency protocols that enable NUMA, dirty lines get flushed to memory to ensure stale data isn't accessed.

So how much bandwidth do you need? Exactly the amount of memory bandwidth one 72-bit ECC DDR4 interface can give you. At DDR4-2400, which lots of reviews run at, 22GB/s is more than enough.

It's all about understanding the architecture...
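
The arithmetic behind that claim, as a quick sketch. It uses the standard theoretical-peak formula (transfers per second times bytes per transfer) and assumes the 72-bit interface is 64 data bits plus 8 ECC bits; real sustained bandwidth will be lower.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_bits):
    """Theoretical peak in GB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * 1e6 * (bus_bits / 8) / 1e9


data_only = peak_bandwidth_gb_s(2400, 64)  # the 64 data bits of the interface
with_ecc = peak_bandwidth_gb_s(2400, 72)   # all 72 bits, ECC included

print(f"DDR4-2400, 64-bit data:      {data_only:.1f} GB/s")
print(f"DDR4-2400, 72-bit incl. ECC: {with_ecc:.1f} GB/s")
print("22 GB/s covers one such interface:", 22 >= with_ecc)
```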
 
Reactions: looncraz and .vodka

dogen1

Senior member
Oct 14, 2014
739
40
91
That's what lighting and shadows do. They change the draws. With shadows, being a shadow caster increases the number of draws, as does having a shadow interact with another object.

Draw calls are issued per object. A single object issues around 3 or 4 draw calls, in an actual game renderer, even in a scene with no shadows and no lights. Those draws are also not identical.

Again, only in a synthetic benchmark will that faux instancing show any benefit.


Ok, let's just imagine a game that uses deferred shading, so it renders the geometry in one go before shading the whole screen at once. Now, if there's a game that does that but, for some reason, draws some identical geometries in a row without batching them, voilà. This would help that hypothetical Frankenstein of a game lol.
 

flash-gordon

Member
May 3, 2014
123
34
101
9oVGc83.png
Great work, Stilt. This curve paints a very nice potential for Ryzen in mobile. Can you do one for ST?

Results from some mobile CPUs:

Gx9kdpc.png

http://www.notebookcheck.net/Mobile-Processors-Benchmark-List.2436.0.html
 
Reactions: Bubbleawsome

otinane

Member
Oct 13, 2016
68
13
36
That question may be a bit premature, but out of all of these specs, which could we expect them to focus on for Zen II?
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Ok, let's just imagine a game that uses deferred shading, so it renders the geometry in one go before shading the whole screen at once. Now, if there's a game that does that but, for some reason, draws some identical geometries in a row without batching them, voilà. This would help that hypothetical Frankenstein of a game lol.

But that's the thing. It can't just be "some identical geometries". It has to be that all the geometries (or, like, a few hundred objects at least) are the same, for any practical benefit to be gained.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
That actually makes perfect sense; The Stilt already told you the fabric is tied to memory controller speed.
It only makes perfect sense if you treat Ryzen as closeted MCM of 2 separate quad core single channel dies. Because otherwise, what is the point talking about 16MB of L3 when only 8 of these are accessible in sane amount of time (i.e. faster than memory).
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
It only makes perfect sense if you treat Ryzen as closeted MCM of 2 separate quad core single channel dies. Because otherwise, what is the point talking about 16MB of L3 when only 8 of these are accessible in sane amount of time (i.e. faster than memory).

Only from the perspective of a single core. This shouldn't be a big deal going forward, after all the consoles work just fine with a much worse solution between modules.
 
Reactions: ZGR and looncraz

oneb1t

Junior Member
Mar 2, 2017
1
1
36
Will any phenomMsrTweaker/amdmsrtweaker or similar program be possible for Ryzen? Or is it all locked out?

It would be really nice to set just the single-thread turbo to 4.2GHz and leave the rest at normal frequency, as multithread performance is good even at low frequency.
 
Reactions: KTE

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
It only makes perfect sense if you treat Ryzen as closeted MCM of 2 separate quad core single channel dies. Because otherwise, what is the point talking about 16MB of L3 when only 8 of these are accessible in sane amount of time (i.e. faster than memory).

There are many more fabrics in Zeppelin than just the data fabric.
I'd assume the inter-CCX fabric frequency is 4x DFICLK (i.e. 5333MHz @ 2666MHz DRAM), however I don't know it as a fact.
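
Spelling that assumption out as a sketch: it takes DFICLK to be half the DDR data rate (the relationship implied by the 5333MHz figure) and uses the unconfirmed 4x multiplier as a default parameter. The function names are illustrative only.

```python
def dficlk_mhz(dram_mt_s):
    """DFICLK assumed equal to the memory clock, i.e. half the DDR data rate."""
    return dram_mt_s / 2


def assumed_inter_ccx_mhz(dram_mt_s, multiplier=4):
    """Inter-CCX fabric clock under the unconfirmed 4x DFICLK assumption."""
    return multiplier * dficlk_mhz(dram_mt_s)


# "2666" memory is nominally 2666.67 MT/s (8000 / 3).
for rate in (2133.33, 2400.0, 8000 / 3):
    print(f"DDR4 {rate:7.2f} MT/s -> ~{assumed_inter_ccx_mhz(rate):.0f} MHz")
```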
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Will any phenomMsrTweaker/amdmsrtweaker or similar program be possible for Ryzen? Or is it all locked out?

It would be really nice to set just the single-thread turbo to 4.2GHz and leave the rest at normal frequency, as multithread performance is good even at low frequency.

There is very little use for such an application, unfortunately.

Configuring the Turbo or XFR is impossible, at least for the time being.
 
Reactions: Drazick

Rifter

Lifer
Oct 9, 1999
11,522
751
126
If I want to run Zen at 4GHz on all cores, how beefy a motherboard do you think will be required? And how much beefier for 4.2GHz?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
If I want to run Zen at 4GHz on all cores, how beefy a motherboard do you think will be required? And how much beefier for 4.2GHz?

There are no guarantees that you will be able to do that, regardless of the motherboard.
The ASUS PRIME X370-PRO and Crosshair VI Hero are among your best bets.
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
There are no guarantees that you will be able to do that, regardless of the motherboard.
The ASUS PRIME X370-PRO and Crosshair VI Hero are among your best bets.

I realize that it's luck of the draw with OCs; I've been overclocking for decades. I just wanted your opinion on how beefy the power phases need to be to go up to 4.2. I'm not building till summer anyway; I'm just putting my build together now and will then wait for the bugs to be worked out before I dive in.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
It only makes perfect sense if you treat Ryzen as closeted MCM of 2 separate quad core single channel dies. Because otherwise, what is the point talking about 16MB of L3 when only 8 of these are accessible in sane amount of time (i.e. faster than memory).

This is how software should treat it. DRAM as LLC... not what I expected at all.

This means memory frequency is all the more important. Ryzen may be more bandwidth sensitive than latency sensitive as a result.

I apparently won't have a motherboard for a couple weeks (ARGH!) thanks to Amazon's ineptness, otherwise I would test this. For now, I'm going to write an app to do just that.
 

Ajay

Lifer
Jan 8, 2001
15,431
7,849
136
That actually makes perfect sense; The Stilt already told you the fabric is tied to memory controller speed.

We also know that the L3s are victim caches for their local CCX.
Now we need to look at what the cache coherency protocol might look like. In general, in most coherency protocols that enable NUMA, dirty lines get flushed to memory to ensure stale data isn't accessed.

So how much bandwidth do you need? Exactly the amount of memory bandwidth one 72-bit ECC DDR4 interface can give you. At DDR4-2400, which lots of reviews run at, 22GB/s is more than enough.

It's all about understanding the architecture...

Basically, a server uarch being repurposed for client use. The real problem, I suppose, is that AMD didn't have the dosh to design a 'native' 8-core processor. AMD needed an implementation that could be largely shared between client (Raven Ridge) and workstation/server (Summit Ridge), hence the development of the CCX. Pretty good for a scratch design; Intel has been iterating on 'Core' for 10 years.
 
Reactions: KTE and Nothingness