
Ryzen: Strictly technical

Status
Not open for further replies.

looncraz

Senior member
Sep 12, 2011
718
1,642
136
Interesting. Curious why AMD would not have informed MS of this with enough lead time to get the feature added to their scheduler prior to Ryzen's release. It is possible that MS reserves such updates for major releases, as it is a very low-level fix that requires lots of QC and testing.
Microsoft has known for over a year.

I would not be surprised if there were five different patches in the pipeline for Ryzen but they're all stuck in some committee run by people with managerial degrees and no idea of how to write software.
 

DisEnchantment

Senior member
Mar 3, 2017
699
1,619
106
How does Windows 10 work on heterogeneous SoCs, like the big.LITTLE processors used in tablets running Windows RT?
The ARM world is not any better. How does Windows 10 cope with this? Given that these are RISC designs where power efficiency is the key feature, I suppose it could be considered an advantage.
 

DisEnchantment

Senior member
Mar 3, 2017
699
1,619
106
Same way as it (and any other OS) copes with any funky topology: by having scheduler be aware of said topology.
Ok, so you are saying that the MS WinRT scheduler actually has big.LITTLE awareness? I was just curious whether they really did something for the myriad implementations of the ARM architecture or simply never bothered.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
Ok, so you are saying that the MS WinRT scheduler actually has big.LITTLE awareness? I was just curious whether they really did something for the myriad ARM architectures or simply never bothered.
I can't know, but from what I've read of the Linux implementation of support, MS would not have too much trouble getting at least IKS right, and HMP is a gimmick anyway.
 

looncraz

Senior member
Sep 12, 2011
718
1,642
136
With the addition of LFC to FreeSync panels that have a sufficient FPS range to support it, the playing field is mostly equal. The only real difference I still see is that most FreeSync panels don't have overdrive tuned as well as it could be (example 1 2 3), particularly when operating in the VRR range; G-Sync still does a better job at overdrive regardless of refresh rate. So long as you are OK with possibly imperfect overdrive, and you ensure the panel supports LFC, there's no longer a reason to jump ship on your GPU to get the panel that does what you want.

I guess the answer to your question is that there isn't a big reason to repeat the test, unless you can think of something more specific you were hoping to see from such a test?
And would anyone actually notice a difference with any of this without using a scope? I find it doubtful.

FreeSync's main advantage is the variety in the ecosystem - most/all new monitors (as in newly designed) will incorporate FreeSync as it's built-in to the hardware by the chip companies (can't remember the name of the chip, brain is wonky).

I've been playing games on my wife's 1080p monitor without FreeSync... it's been torturous. I doubt moving from G-Sync to FreeSync is something anyone would ever notice if not explicitly told they had done so.
 

looncraz

Senior member
Sep 12, 2011
718
1,642
136
Ah, confirmation bias, I missed the "mostly" part in the WikiChip link.

Question arises, in what sense is it "mostly".
Some reasons it would be "mostly exclusive":

1. Because there's no strict policy of flushing L3 lines that are fetched into an L2, multiple copies of the same data can exist.

2. Data requests from the other CCX, or from another core within the CCX, result in L2 data being included in the L3 (a global-data optimization).
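The "mostly exclusive" behavior being speculated about here can be illustrated with a toy model (purely hypothetical; this is not AMD's documented policy): lines evicted from an L2 land in the L3 as victims, a line fetched back into a single core's L2 is dropped from L3 to stay exclusive, but a line that multiple cores have requested keeps an L3 copy, matching reason 2 above.

```python
class MostlyExclusiveL3:
    """Toy model of a 'mostly exclusive' victim L3 (illustrative only)."""

    def __init__(self):
        self.lines = set()    # lines currently resident in L3
        self.requesters = {}  # line -> set of cores that have ever fetched it

    def l2_evict(self, line):
        # Victim fill: a line pushed out of an L2 lands in the L3.
        self.lines.add(line)

    def l2_fetch(self, line, core):
        seen = self.requesters.setdefault(line, set())
        seen.add(core)
        if line not in self.lines:
            return "miss"
        if len(seen) > 1:            # shared data: keep an L3 copy too
            return "hit, copy retained"
        self.lines.discard(line)     # single user: stay exclusive
        return "hit, moved to L2"

c = MostlyExclusiveL3()
c.l2_evict("A")
print(c.l2_fetch("A", core=0))  # hit, moved to L2 (exclusive case)
c.l2_evict("A")
print(c.l2_fetch("A", core=1))  # hit, copy retained (the "mostly" part)
```

The second fetch keeps an L3 copy because two different cores have now touched the line, which is the kind of global-data heuristic point 2 describes.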
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
FreeSync's main advantage is the variety in the ecosystem - most/all new monitors (as in newly designed) will incorporate FreeSync as it's built-in to the hardware by the chip companies (can't remember the name of the chip, brain is wonky).
Scaler? Anyway, I think you're kind of agreeing with him.
I've been playing games on my wife's 1080p monitor without FreeSync... it's been torturous. I doubt moving from G-Sync to FreeSync is something anyone would ever notice if not explicitly told they had done so.
I would argue moving from GSync to Freesync would be noticeable if you kept the Nvidia card :D
1. Because there's not a strict policy of flushing the L3 lines that are fetched by an L2, thereby multiple copies of the same data exist.
That makes perfect sense, actually...
2. Data requests from the other CCX or another core within the CCX result in L2 data being included in the L3 (global data optimization).
That sounds kind of tricky to implement right with a mostly-exclusive victim cache. And considering the fabric debacle, I do not trust AMD to bother with it either :p
 
  • Like
Reactions: lightmanek

CatMerc

Golden Member
Jul 16, 2016
1,114
1,146
136
And would anyone actually notice a difference with any of this without using a scope? I find it doubtful.

FreeSync's main advantage is the variety in the ecosystem - most/all new monitors (as in newly designed) will incorporate FreeSync as it's built-in to the hardware by the chip companies (can't remember the name of the chip, brain is wonky).

I've been playing games on my wife's 1080p monitor without FreeSync... it's been torturous. I doubt moving from G-Sync to FreeSync is something anyone would ever notice if not explicitly told they had done so.
And another thing to note is that this is a panel issue more than anything.
My AOC G2460PF without any overdrive settings has next to no ghosting. There's only a tiny tiny amount that is only visible when I stick my eyes right up to the monitor, and that's only when I crank up the windmill speed to the max.
 

looncraz

Senior member
Sep 12, 2011
718
1,642
136
Wouldn't an engineering sample have been sufficient for Microsoft?
They wouldn't have needed even that. The Linux kernel changes would have given them all of the information they needed.

All that really needs to happen is to create two NUMA nodes (four cores, eight threads each) with a modified resistance value for moving a thread to the other node, since the penalty for doing so is easily 1/4 of what it would be in a multi-socket system (while still being 10x greater than staying within a node).
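The "modified resistance value" idea above can be sketched as a simple cost comparison (the numbers and thresholds are invented for illustration; a real scheduler's load-balancing heuristics are far more involved):

```python
# Illustrative migration-resistance check: move a thread to another
# "node" only when the load imbalance outweighs the migration penalty.
# All cost values below are made up to illustrate the relative scale.

INTRA_CCX_COST = 1     # cheap: caches stay warm within a CCX
CROSS_CCX_COST = 10    # ~10x: L3 contents must refill across the fabric
CROSS_SOCKET_COST = 40 # ~4x worse again on a real multi-socket system

def should_migrate(load_here, load_there, cost):
    # Migrate only if the load gap is large enough to pay for the move.
    return (load_here - load_there) > cost

print(should_migrate(12, 8, INTRA_CCX_COST))  # True: cheap move, worth it
print(should_migrate(12, 8, CROSS_CCX_COST))  # False: gap too small
print(should_migrate(25, 8, CROSS_CCX_COST))  # True: big enough imbalance
```

With a raised cross-CCX cost, small imbalances no longer trigger thread bouncing between CCXs, which is exactly the behavior the post argues Windows should adopt.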
 

Ajay

Diamond Member
Jan 8, 2001
7,587
2,742
136
Ah, confirmation bias, i missed the mostly part in wikichip link.

Question arises, in what sense is it "mostly".
Exactly my question! I can't find out how the L3$ logic decides what to include.
 

Elixer

Lifer
May 7, 2002
10,377
762
126
Microsoft has known for over a year.

I would not be surprised if there were five different patches in the pipeline for Ryzen but they're all stuck in some committee run by people with managerial degrees and no idea of how to write software.
They work on MS time, and QC & QA take a long, long time.
The next release of Win 10 is also feature-locked, so I don't think they would be doing it for the 'Creators Update' either, unless the insider preview has already been changed.
We need more people to use Ryzen with the insider preview builds, so MS can get more external testers to push patches to.
 
  • Like
Reactions: scannall

looncraz

Senior member
Sep 12, 2011
718
1,642
136
They work on MS time, and QC & QA take a long, long time.
The next release of Win 10 is also feature-locked, so I don't think they would be doing it for the 'Creators Update' either, unless the insider preview has already been changed.
We need more people to use Ryzen with the insider preview builds, so MS can get more external testers to push patches to.
You can apply fixes to feature-locked branches in most projects :p This would certainly be considered a patch.
 
  • Like
Reactions: Drazick

Kromaatikse

Member
Mar 4, 2017
83
169
56
How does Windows 10 work on heterogeneous SoCs, like the big.LITTLE processors used in tablets running Windows RT?
Probably by treating it as a pure power-management problem. In low-power state, the "little" core is used; in high-performance state, the "big" core is used. If you only use one at a time, there is no special scheduling problem to solve.
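The cluster-switching approach described above can be sketched as follows (a minimal sketch with invented thresholds; real governors use much richer inputs): the OS activates one cluster at a time based on a power/performance decision, so the scheduler never has to reason about mixed core types.

```python
# Sketch of big.LITTLE handled as a pure power-management problem:
# pick one cluster at a time; the scheduler only ever sees that cluster.
# The 60% threshold is invented for illustration.

def active_cluster(cpu_load_pct, on_battery):
    if on_battery and cpu_load_pct < 60:
        return "LITTLE"   # low-power state: little cores only
    return "big"          # high-performance state: big cores only

print(active_cluster(20, on_battery=True))   # LITTLE
print(active_cluster(90, on_battery=True))   # big
print(active_cluster(10, on_battery=False))  # big
```

Because only one cluster is visible at any moment, no big.LITTLE-aware scheduling is needed; the trade-off is that big and little cores can never run simultaneously, which is what HMP would add.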
 
  • Like
Reactions: looncraz

piesquared

Golden Member
Oct 16, 2006
1,651
472
136
Probably by treating it as a pure power-management problem. In low-power state, the "little" core is used; in high-performance state, the "big" core is used. If you only use one at a time, there is no special scheduling problem to solve.
What do developers do for the PS4 with its split cache?
 

Kromaatikse

Member
Mar 4, 2017
83
169
56
What do developers do for the PS4 with its split cache?
The PS4 is a known optimisation target, so any performance quirks there can be handled directly. I'm not aware of any particular bottleneck between L2 caches with Jaguar, but they would probably try to keep threads sharing data on the same cache.

Also, the PS4 doesn't run Windows, so its process scheduler almost certainly isn't insane. That always helps. :)
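"Keep threads sharing data on the same cache" can be expressed as a placement policy like the following (a hypothetical sketch; the thread names and grouping input are invented, and a real engine would pin threads with its platform's affinity API):

```python
# Illustrative placement policy for a Jaguar-style layout: two quad-core
# modules, one shared L2 per module. Threads that share data are assigned
# core sets from the same module so they hit the same L2.

MODULES = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}  # core IDs per module

def place(groups):
    """groups: list of lists of thread names that share data with each other."""
    placement = {}
    for module, group in zip(MODULES, groups):
        for thread in group:
            placement[thread] = MODULES[module]  # allowed cores for this thread
    return placement

p = place([["render", "cull"], ["audio", "io"]])
print(p["render"])  # [0, 1, 2, 3] -> same module (and L2) as "cull"
print(p["audio"])   # [4, 5, 6, 7]
```

The point is simply that cooperating threads never cross the module boundary, so their shared working set stays in one L2 instead of ping-ponging across the gap between modules.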
 

malventano

Junior Member
May 27, 2009
18
19
76
PCPer.com
And would anyone actually notice a difference with any of this without using a scope? I find it doubtful.
You mean without LFC? It's painfully noticeable once you dip below the lower VRR limit of the panel. Not as much of an issue if that number is as low as 30 Hz, but some panels have bottom ends far higher than that.
 

deadhand

Junior Member
Mar 4, 2017
21
84
51
The PS4 is a known optimisation target, so any performance quirks there can be handled directly. I'm not aware of any particular bottleneck between L2 caches with Jaguar, but they would probably try to keep threads sharing data on the same cache.

Also, the PS4 doesn't run Windows, so its process scheduler almost certainly isn't insane. That always helps. :)
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-11#post-38776963

There is a fairly severe latency bottleneck when accessing L2 caches on the opposite quad core module on the PS4. On the PS4 die there is a fairly large physical gap between the quad core modules.
Or do you mean in terms of programming for the PS4?
 
