Ryzen: Strictly technical

Page 44
Status
Not open for further replies.

looncraz

Senior member
Sep 12, 2011
722
1,651
136
This seems to be fake, probably pinned manually. If I create 8 threads on my Ryzen, it still distributes them over all logical cores. It only looks like it is trying to avoid the SMT cores, but the spikes are all over the place.

Edit: This is what it looks like with 8 Prime95 threads for me:

XXjwGF6.png

This is because you are using high performance mode. Don't do that. You need core parking enabled to park those logical cores when they do more harm than good.
 
Reactions: Drazick

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Why does the BCLK fluctuate when you OC with the multiplier?

Ryzen (and Excavator before it) will "stretch" the clock to even out power demand over short intervals to improve power usage and stability on lower core voltage at higher clocks. The only way you'll see this is by some jitters in the base clock as the multiplier doesn't change - the stretch can be as much as 7%, but only lasts about a millisecond (it basically reacts to instantaneous Vdroop).

AMD now calls this "Pure Power," I think.
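As a rough illustration of why this shows up as base-clock jitter: with the multiplier fixed, a tool that derives BCLK from the measured core clock will report any stretch as a lower BCLK. The sketch below assumes a stretch of 7% lengthens the clock period by that fraction (so frequency falls to f / 1.07) and that the tool averages over its polling window. Both are assumptions for illustration, not documented AMD behavior, and the function names are made up.

```python
def stretched_mhz(nominal_mhz, stretch):
    """Clock while stretched, assuming the period grows by `stretch` (0.07 = 7%)."""
    return nominal_mhz / (1.0 + stretch)


def average_mhz(nominal_mhz, stretch, stretch_ms, window_ms):
    """Average clock a tool would report over one polling window (assumed model)."""
    # Fraction of the window spent stretched, capped at the whole window.
    duty = min(stretch_ms / window_ms, 1.0)
    return nominal_mhz * (1.0 - duty) + stretched_mhz(nominal_mhz, stretch) * duty


if __name__ == "__main__":
    # Worst case from the post: 7% stretch on a 100 MHz base clock.
    print(f"instantaneous: {stretched_mhz(100.0, 0.07):.2f} MHz")
    # One ~1 ms stretch seen through a 1 s and a 10 ms polling window.
    print(f"1 s window:    {average_mhz(100.0, 0.07, 1.0, 1000.0):.3f} MHz")
    print(f"10 ms window:  {average_mhz(100.0, 0.07, 1.0, 10.0):.3f} MHz")
```

Under these assumptions a single millisecond-long stretch barely moves a one-second average, so how much jitter you see depends heavily on how fast the monitoring tool samples.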
 

plopke

Senior member
Jan 26, 2010
238
74
101
Some are saying the update is only part of the latest Fast Ring Windows Insider Build. I'm calling BS on this since there's no documentation anywhere about any changes to the scheduler. Just a forum post on some Chinese forum.

KB4015438 is the latest regular Windows update, and it just replaces KB4013429.



Changing the scheduler to work better with Ryzen would be a "key change", no?
For all we know, Microsoft did not change anything except update some drivers. I vaguely remember AMD stating the Windows Ryzen driver would be updated a month or so after release.
 

Ajay

Lifer
Jan 8, 2001
15,537
7,905
136
Looks normal to me, basically load being spread on 1 physical core per thread. I would say it is painfully obvious even.

You can't and won't get rid of thread migration in Windows without manual affinity setting, so forget it.

Since power management is SMT aware, maybe MS will do something there that helps. At this point I'm kinda done with the lack of knowledge about the details of Ryzen's cache/DF/memory subsystems. I'll wait till the next stepping of Zeppelin shows up and take a look again (hopefully there will be some tweaks to the IMC and a few other features).
 

Ajay

Lifer
Jan 8, 2001
15,537
7,905
136
For all we know, Microsoft did not change anything except update some drivers. I vaguely remember AMD stating the Windows Ryzen driver would be updated a month or so after release.

Like an ACPI device driver?

Edit: duh.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Some are saying the update is only part of the latest Fast Ring Windows Insider Build. I'm calling BS on this since there's no documentation anywhere about any changes to the scheduler. Just a forum post on some Chinese forum.

KB4015438 is the latest regular Windows update, and it just replaces KB4013429.



Changing the scheduler to work better with Ryzen would be a "key change", no?

I have an old laptop with a Celeron dual core and an Intel 4 Series chipset, and this update cut the time to resume from hibernation to less than half. No other enhancement on my old system, but still noticeable...
 

iBoMbY

Member
Nov 23, 2016
175
103
86
This is because you are using high performance mode. Don't do that. You need core parking enabled to park those logical cores when they do more harm than good.

Why would anyone want to enable core parking? Core Parking is known to cause all kinds of problems, and has almost no benefit at all.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
Why would anyone want to enable core parking? Core Parking is known to cause all kinds of problems, and has almost no benefit at all.
Core parking works if/when you want to save power; it's mostly useless otherwise, even for Intel processors. The fact that something like Process Lasso does a better job of handling core parking and unparking, not to mention setting affinities or even I/O and memory priority, than Windows itself is testament to the shellacking Microsoft has given their OS. The Android governors, on the other hand, are much better at handling multiple cores, thanks in part to big.LITTLE. Remember how octa-cores were ridiculed on phones, and yet Windows is the one that can't handle 8 cores properly.
 
Reactions: french toast

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Dresdenboy, what is the bandwidth between CCX?
Have to answer again. Somehow you look familiar, BTW. ;)

iBoMbY made the aforementioned list.
DDR4-2666: 41.7 GB/s
DDR4-3000: 46.9 GB/s
DDR4-3200: 50.0 GB/s
DDR4-3400: 53.1 GB/s
DDR4-3600: 56.3 GB/s
DDR4-4000: 62.5 GB/s
It's not exactly what one gets in GB/s by calculating mem clock [GHz] * 32, because of a GB being 1.074 billion bytes, but it gives a good impression.
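For anyone who wants to reproduce the list, here is a small sketch of the arithmetic. The 32 bytes per memory clock comes from the post above. Treating the memory clock as half the DDR4 rating, and dividing MB/s by 1024 to get the listed figures, are my reading of the numbers rather than something stated in the post, and the function names are made up.

```python
BYTES_PER_CLOCK = 32  # per memory clock, as stated in the post

# Figures from the list above, keyed by DDR4 rating (MT/s).
LISTED_GBS = {2666: 41.7, 3000: 46.9, 3200: 50.0, 3400: 53.1, 3600: 56.3, 4000: 62.5}


def memclk_mhz(ddr_rating):
    """Memory clock in MHz, assumed to be half the DDR transfer rate."""
    return ddr_rating / 2.0


def decimal_gbs(ddr_rating):
    """mem clock [GHz] * 32, i.e. billions of bytes per second."""
    return memclk_mhz(ddr_rating) * BYTES_PER_CLOCK / 1000.0


def listed_style_gbs(ddr_rating):
    """Same product, but MB/s divided by 1024 (inferred from the listed numbers)."""
    return memclk_mhz(ddr_rating) * BYTES_PER_CLOCK / 1024.0


if __name__ == "__main__":
    for rating, listed in LISTED_GBS.items():
        print(f"DDR4-{rating}: {decimal_gbs(rating):.1f} decimal, "
              f"{listed_style_gbs(rating):.2f} with /1024, listed {listed}")
```

With these assumptions every listed figure lands within rounding of the /1024 result. Dividing the DDR4-2666 product by 1.074 instead would give roughly 39.7, so the list appears to use the 1024 factor.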
 
Reactions: Malogeek

TerionX6

Junior Member
Jun 29, 2015
14
20
46
Zen/Zeppelin seem obviously geared towards absolute power savings. The decision to tightly couple the IMCs and uncore was a move for efficiency of data transfer, and this appears to be corroborated by Zen's considerably higher actual vs. theoretical sustained transfer compared to Intel's products. I think just about every piece of that silicon was designed with the goal of meeting or exceeding, with few exceptions, the power characteristics Intel is currently capable of, while still on an inferior node. This has created a chip with great potential in deployments scaling from a workstation to massive clusters, and great potential going forward.
Zen is AMD's Core2/Nehalem in a sense.

My assumptions:
The GMI/IMC (and coupled PCIe/BCLK strap) is clearly the major limiting factor of this design for home users. Zen2 will again not "win" in the home/gamer market unless AMD has planned on significantly revamping the chip's clock domains.
Zen2 certainly won't be on 14LPU (at first), much too late in the design phase. No easy 10% clock improvement there.
2x inter-CCX > 1x IMC speeds may be a pipe dream for Zen F4 stepping, but man would that solve a heck of a lot of latency issues (while skyrocketing uncore power draw)

The Stilt, have you gotten your hands on any other 1800X samples to compare Vmin/Fmax? It would be interesting to see the possible ranges of binning for a single SKU.
 

SpaceBeer

Senior member
Apr 2, 2016
307
100
116
What about Raven Ridge CPUs (APUs)? They will have only one CCX (+ up to 11 GPU CUs), so there will be no "issues" with CCX communication.

So those SKUs with fewer GPU CUs might be a better choice for gaming?
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Why would anyone want to enable core parking? Core Parking is known to cause all kinds of problems, and has almost no benefit at all.

Core parking is absolutely vital for Windows scheduling with SMT. Windows' scheduler is likely completely unaware of SMT or much of anything else. Everything is done with external processes/daemons simply creating and sorting lists which the scheduler uses.

Core parking will preferentially park logical cores - giving the scheduler the appearance of being SMT aware.
Without it, you'll never see dual core turbo clocks or XFR on Ryzen and you'll see logical cores being used as if they're real cores.

This is why some people are seeing a sudden improvement in a few cases with the new Windows update - core parking was re-enabled and they didn't notice.
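To make the idea concrete, here is a toy model of "park the SMT siblings first". It is not Windows' actual parking algorithm, just an illustration of how a scheduler that knows nothing about SMT ends up looking SMT aware when the second logical CPU of each core is kept parked until the load needs it. The function name and the numbering (logical CPUs 0 and 1 on core 0, 2 and 3 on core 1, and so on) are assumptions.

```python
def unparked_cpus(physical_cores, threads_per_core, runnable_threads):
    """Logical CPU ids left unparked in this toy model.

    Assumed numbering: logical id = core * threads_per_core + sibling.
    """
    # Unpark order: first logical CPU of every core, then the second of
    # every core, and so on, so siblings are the last to be woken.
    order = [core * threads_per_core + sibling
             for sibling in range(threads_per_core)
             for core in range(physical_cores)]
    # Always keep at least one CPU awake; never ask for more than exist.
    needed = max(1, min(runnable_threads, len(order)))
    return sorted(order[:needed])


if __name__ == "__main__":
    # 8 cores / 16 threads with 8 busy threads: one logical CPU per core.
    print(unparked_cpus(8, 2, 8))
    # A ninth and tenth thread force two sibling CPUs to be unparked.
    print(unparked_cpus(8, 2, 10))
```

In this model the scheduler only ever picks from the unparked list, so with 8 busy threads it cannot place two of them on the same physical core.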
 
Reactions: Drazick

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Windows scheduler seems perfectly aware of Ryzen's SMT.

It's applications doing their own scheduling that aren't.

Applications can't do their own scheduling (with the exception of those rare kernel-on-kernel designs), they spawn threads and apportion their workload amongst those threads. Some apps are simply spawning too many worker threads and seeing negative scaling as a result because they assume AMD has no SMT and treat it like it has 16 cores.

Some apps may use thread affinity to prevent threads from roaming around, but most don't bother.
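A minimal sketch of the "too many workers" point: size the pool from physical cores instead of logical CPUs. Python's standard library only reports logical CPUs through os.cpu_count(), so threads_per_core below is an assumption the caller has to supply (2 for Ryzen with SMT enabled), and the helper names are made up. CPU-bound Python threads won't actually run in parallel, so this only illustrates the sizing, not a speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(logical_cpus=None, threads_per_core=2):
    """Workers to spawn: one per physical core (threads_per_core is assumed)."""
    if logical_cpus is None:
        # os.cpu_count() counts logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // threads_per_core)


def run_chunks(func, chunks, workers):
    """Apportion the chunks of work among a fixed number of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        return list(pool.map(func, chunks))


if __name__ == "__main__":
    workers = worker_count()  # e.g. 8 on a 16-thread part with the default
    chunks = [range(i, i + 1000) for i in range(0, 8000, 1000)]
    print(workers, run_chunks(sum, chunks, workers))
```

An app that treats a 16-thread Ryzen as 16 cores would effectively call this with threads_per_core=1 and spawn twice as many workers.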
 
Reactions: Drazick and IEC

rvborgh

Member
Apr 16, 2014
195
94
101
Applications can't do their own scheduling (with the exception of those rare kernel-on-kernel designs), they spawn threads and apportion their workload amongst those threads. Some apps are simply spawning too many worker threads and seeing negative scaling as a result because they assume AMD has no SMT and treat it like it has 16 cores.

Some apps may use thread affinity to prevent threads from roaming around, but most don't bother.

I wonder how many games use User Mode Scheduling.

https://msdn.microsoft.com/en-us/library/windows/desktop/dd627187(v=vs.85).aspx
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Reactions: Drazick and rvborgh

beginner99

Diamond Member
Jun 2, 2009
5,211
1,581
136
I always forget about UMS. Not sure how commonly it would be used by games - considering how poorly most scale with core count, I'd imagine not that often (since the investment in implementing it would not be worthwhile unless you are using a task-based processing model).

And it seems to be 64-bit only. I never really bothered to check, but how many games actually are 64-bit only nowadays? OK, probably most, but it is a rather new development. Are the game engines pure 64-bit? Don't know.
 
Reactions: looncraz

Kromaatikse

Member
Mar 4, 2017
83
169
56
I know of at least one game which is definitely 64-bit only.

By contrast, The Talos Principle has 32-bit and 64-bit binaries included.
 

imported_jjj

Senior member
Feb 14, 2009
660
430
136
Wouldn't it be fun if AMD had their ARM version of Zen on existing sockets? In both consumer and server.
Would be so much easier to sell ARM, especially in consumer and maybe they could even get Microsoft on board.
TSMC N7 HPC should be ready for tapeouts in June 2017 so a nice 8-12 ARM cores for AM4 next year could be quite interesting.

They wouldn't sell much in consumer, but it's important to have ARM as a desktop platform; otherwise ARM in server remains an uphill battle.
Using the existing infrastructure (sockets) would reduce costs for both AMD and customers.
Ofc it would be a substantial financial effort, unless they already need the ARM core on TSMC's N7 HPC for semi-custom.
 