Speculation: Ryzen 4000 series/Zen 3

Olikan · Apr 17, 2020

Gideon said:
Which is why i believe they will increase the L2 cache to 1MB

Me too. Actually, I think Zen 3 might have two L2 caches, instruction and data, similar to the IBM z13.

L2i as a traditional cache, while the L2d cache would store the full L1D plus leftovers from the uop cache (see the patent "virtualizing micro-op cache").

DisEnchantment · Apr 18, 2020

Running out of ideas to speculate.

There is nothing from Zen3, no leaks nothing.
Too much time I have these days. Been staying indoors for way too much time. I water my plants many times a day.

NostaSeronx · Apr 18, 2020

DisEnchantment said:
Running out of ideas to speculate.

I'll drop some speculation...
A new FPU ISA extension that breaks compatibility with everything from x87 through AVX512. It should, however, be a one-stop shop, killing the need for hybrid SSE2/SSE4.1/AVX2/AVX512 (REX, VEX, EVEX // 128/256/512 codepaths) for media. AMD needs to make their FPU instruction set more competitive with ARM (SVEx) and RISC-V (RVV), since Intel is killing servers with garbage server CPUs and a garbage, broken AVX3 (units and ISA, a completely borked design).

1. Reduce fetch/decode blob (Multi-prefixes to support means more power)
2. Increase throughput (1 decoder for any SIMD/Vec Length)
3. No application with multiple generations of FPU ISAs from now on, etc.

Another decade, another time AMD saves x86.

soresu · Apr 18, 2020

DisEnchantment said:
Running out of ideas to speculate.

There is nothing from Zen3, no leaks nothing.
Too much time I have these days. Been staying indoors for way too much time. I water my plants many times a day.

I feel for you, I don't get along that well with family - starting to come loose at the hinges here.

Things we might see are ML-specific/optimised instructions, Bfloat16 support, and 1-bit instruction support for Binarised Neural Networks (BNNs), which run faster and with a lower memory footprint than 8-32 bit DNNs.

Way back AMD hired John Gustafson (of Gustafson's Law) who invented Unums with the potential to replace floating point as a more efficient format - yet he subsequently left with even less fanfare than Keller or Koduri, makes me wonder if that was a key future objective.

Transactional memory extensions are another possibility too - ARM published their own TM spec called TME at the same time as SVE2 early last year, so now would be as good a time as any for AMD to embrace it before ARM chips supporting it begin pouring out of Sophia, Cambridge and Austin (and the custom ones like Marvell Triton too of course).

darkswordsman17 · Apr 21, 2020

DisEnchantment said:
Running out of ideas to speculate.

There is nothing from Zen3, no leaks nothing.
Too much time I have these days. Been staying indoors for way too much time. I water my plants many times a day.

Well, sure since we've officially settled that Zen 3 will have SMT4...

(Careful what you wish for!)

Not that it matters, what with the impending release of an ARM Megachip (its a NeuralNet Processor, a learning comput-ah!). I heard its got like 128 cores, with double the per core performance of Zen 5, and same clocks, with SMT64, and AVX9216 (its over 9000!). All in -5W (that's right, it actually provides power as it runs). Produced on GF's .0001 Nanumeter process. Its based on Bulldozer core.

NTMBK · Apr 21, 2020

darkswordsman17 said:
Well, sure since we've officially settled that Zen 3 will have SMT4...

(Careful what you wish for!)

Not that it matters, what with the impending release of an ARM Megachip (its a NeuralNet Processor, a learning comput-ah!). I heard its got like 128 cores, with double the per core performance of Zen 5, and same clocks, with SMT64, and AVX9216 (its over 9000!). All in -5W (that's right, it actually provides power as it runs). Produced on GF's .0001 Nanumeter process. Its based on Bulldozer core.

You forgot the FDSOI

NostaSeronx · Apr 21, 2020

NTMBK said:
You forgot the FDSOI

Still in progress.

Movement from CMT(One core per thread) to DMT(Many cores per many threads to many cores tied to a single thread)

^Closest existing architecture to the threading style is P9.

Current-gen 22FDX(Traditional channel lattice/gate-first/12nm soi thickness/20nm BOX) to Next-gen 22FDX or 12FDX(Novel channel superlattice/gate-last/5nm soi thickness/15nm BOX) // soi thickness is the layer above buried oxide(box), current gen is thinned to ~8.5nm which is far from the required sub-6nm for High Performance.

The only solution I have gathered from small snippets of text:
Single module
Next-gen graphics
AI/Vision engine
LPDDR memory only

Anything K8-2 and K10 related is pretty much locked to GlobalFoundries;
K8-2 https://patents.google.com/patent/US6553482B1 expired, but Current Assignee: GlobalFoundries Inc
K8-2 https://patents.google.com/patent/US6240503B1 expired, but Current Assignee: Advanced Micro Devices Inc
K10 https://patents.google.com/patent/US7793080B2 not-expired, but Current Assignee: GlobalFoundries Inc

https://patents.google.com/patent/US6240503B1 =>
"According to one embodiment, address generation units 34A and 34C are used for load memory operations and address generation units 34B and 34D are used for store memory operations. Functional units 32A and 32D are integer functional units configured to perform integer arithmetic/logical operations and execute branch instructions. Functional units 32B and 32E are multimedia execution units configured to execute multimedia instructions, and functional units 32C and 32F are floating point units configured to execute floating point instructions."

https://patents.google.com/patent/US6553482B1 (Universal dependency vector/queue entry) =>

"It is noted that, while in the present embodiment the instruction queue is physically divided into instruction queues 36A-36B, other embodiments may divide the instruction queue into even larger numbers of physical queues which may operate independently. For example, an embodiment employing four instruction queues might be employed (with four register files and four execution cores)."

Generally enough leeway to have AMD's Neoverse Nx to TSMC and AMD's Neoverse Ex to GlobalFoundries.

Pre-2006 => K9 canned
Post-2012 => Zen taking K9 w/ micro-op cache replacing the trace cache and a low-power overhaul replacing its 5 GHz push.
Pre-2002 => K8-2 canned
Pre-2006 => Bulldozer takes K8-2 and further clusters it and becomes K10 officially.
Now we just have to wait for the third overhaul, since Family 19h is the third full overhaul architecture from K9. With the growth AMD is getting again, it isn't a bad time to bring out two cores like old times.

DrMrLordX · Apr 21, 2020

NTMBK said:
You forgot the FDSOI

. . . and it's secretly CON-based.

NostaSeronx · Apr 21, 2020

DrMrLordX said:
. . . and it's secretly CON-based.

It wasn't very much a secret. imho, the secret was that it had higher IPC than Zen when it shared the FinFET node. When they retargeted the 15h family towards cost-effective mobility that plan was discarded. Sometime after 2017, they revisited the ambitious sort-of derivative before Zen became the focus with new brains.

Stoney on FT4 -> Pollock on FT5 (wasn't on any roadmap by the way) -> 22FDX/12FDX FT5(as well?)

There are more foundry techs at AMD for 12FDX than for 22FDX. Also, they missed a couple of common production-class tapeouts (late 2018, through 2019, early 2020), so it is probably not 22FDX, though it could be, as AMD/GloFo are slow as snails at launches. Maturity in tools (32 <-> 22 use the same steppers) means lower cost. Even if it is faster than Zen, it is on a much less dense node and will never have enough modules to compete, nor a roadmap as aggressive, due to GlobalFoundries: Zen3 (7nm) -> Zen4 (5nm) -> Zen5 (3nm?) -> Zen6 (2nm?), etc.

darkswordsman17 · Apr 21, 2020

NTMBK said:
You forgot the FDSOI

Good catch!

DrMrLordX said:
. . . and it's secretly CON-based.

I already covered that!

But here's the twist. The ARM chip will be from Intel! And AMD will be revealed as Emperor Palpatine! The call is coming from inside the house!

NostaSeronx · Apr 21, 2020

darkswordsman17 said:
that Zen 3 will have SMT4.

My TW guy says nope. It will be strictly SMT2(less power here) and SMT-off(more perf here). If I am wrong then there is no more GlobalFoundries projects.

DrMrLordX · Apr 21, 2020

darkswordsman17 said:
I already covered that!

Oh sorry. I zoned out after -1W TDP.

amrnuke · Apr 21, 2020

I'm bored after an hour long, brisk, dusk walk in perfect 75 degree weather.

I know everyone else has given their speculation, but I figure why not have some fun.

Wild speculation:

N7+

Keep relative spacing in the core (actual density remains far less than possible density, as it is on Zen2), +3-5% non-cache transistors
Clock bumps roughly 0.2 GHz on average
Single-threaded performance gains of 15-20%
No major change in TDP (+/- 10%)

Cache:
L1I$ 64 KB per core
L1D$ 64 KB per core
L2$ 1 MB per core
Unified L3$ for whole chiplet, remains 4 MB per core for 8 core / chiplet and 5.33 MB per core for 6 core / chiplet = 32MB L3$ available to all chiplets
(Is it possible to tag the nearest L3$ as preferential, and have a controller choose to fill the 17-32MB range on more distal areas?)

I think overall costs of the chips will be about the same, with no major lineup shakeups or additions for Ryzen 5/7/9 on release day.

Yeah yeah, whatever.

Here's my juicy prediction:
They play the numbers game and when 4900X is announced, it'll come with a 4.9 GHz boost.
And with the new process, cache facelift, and resultant IPC gains - the chip will trade blows in lightly-threaded apps and games(1080p) with the 10900K @ 5.3 GHz.

french toast · Apr 22, 2020

I would love AMD to add some transistors to increase frequency. It wouldn't be a lot, but 100 MHz or so would make a small difference.
Process would probably be N7P or N7+. Performance-wise they look quite similar, N7+ slightly better, but N7P would be quite mature at that stage, so yields will be awesome and top bins will be more numerous.
Wouldn't be surprised to see a combined 48 MB L3 with an improved memory controller and fabric to keep latency to main memory in check and improve core-to-core latency. Gaming should be the winner here.

~15% integer IPC, +5% clocks, larger gains for Gaming/latency sensitive apps, big unknown is FP IMO...likely an improvement but we don't need AVX 512 capabilities on desktop imo.
Would be nice if they could afford to create two versions of Zen3 with desktop/server... Double FP units and bigger caches/ SMT 3/4 for server only, on LP libraries.
Desktop we want frequency so relaxed HP libraries, lower cache, AVX 256, SMT 2.

AMD is already going down this route with RDNA/CDNA, and the performance benefits would not be lost on Lisa Su et al., but neither would the fab/engineering costs, so it is probably unlikely at this stage.

exquisitechar · Apr 22, 2020

amrnuke said:
Here's my juicy prediction:
They play the numbers game and when 4900X is announced, it'll come with a 4.9 GHz boost.
And with the new process, cache facelift, and resultant IPC gains - the chip will trade blows in lightly-threaded apps and games(1080p) with the 10900K @ 5.3 GHz.

4900x will be slightly faster in lightly threaded apps even at a sustained 4.6GHz. With a 15%+ IPC increase over Zen 2, which is already slightly ahead of Skylake, the 10900k will struggle. Any small clock speed increase is just the cherry on top. Games are a thing of their own, but I expect it to trade blows or be faster in them too, yeah, likely the latter.

Not sure that they will play the numbers game with boost clocks this time around. With Renoir they actually advertised a lower boost than what is achieved on Ryzen 7 and 9 SKUs, so it seems that they learned their lesson.
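As a rough sanity check on the numbers in this exchange, single-thread performance can be approximated as IPC times clock. The sketch below assumes that first-order model, treats Zen 2 and Skylake IPC as equal (the post says Zen 2 is slightly ahead, so this is conservative), and plugs in the 15% uplift and the clocks quoted above. The function name is made up for illustration, and real results depend on far more than these two factors.

```python
# First-order model only: single-thread performance ~ IPC x clock.
# Assumption: Zen 2 IPC == Skylake IPC (the post claims Zen 2 is slightly ahead).

def relative_perf(ipc_ratio, clock_ghz, ref_clock_ghz):
    """Performance relative to a reference chip with IPC ratio 1.0."""
    return ipc_ratio * clock_ghz / ref_clock_ghz

ZEN3_IPC_UPLIFT = 1.15      # "15%+ IPC increase over Zen 2" from the post
INTEL_BOOST_GHZ = 5.3       # 10900K boost clock quoted in the thread

# Sustained 4.6 GHz case from the post
sustained = relative_perf(ZEN3_IPC_UPLIFT, 4.6, INTEL_BOOST_GHZ)

# Hypothetical 4.9 GHz boost case from the earlier prediction
boosted = relative_perf(ZEN3_IPC_UPLIFT, 4.9, INTEL_BOOST_GHZ)

print(f"4.6 GHz: {sustained:.3f}x, 4.9 GHz: {boosted:.3f}x")
```

Under these assumptions the 4.6 GHz case lands at roughly parity and the 4.9 GHz case a few percent ahead, so any real Zen 2 IPC lead over Skylake would tip both in AMD's favour.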

Richie Rich · Apr 22, 2020

That rumor about samples with SMT disabled due to broken silicon. What a coincidence. Zen 1 was a much bigger technical challenge for AMD in terms of SMT, as they were dealing with SMT for the first time, and did AMD send out samples with broken SMT? No, because SMT2 was an expected feature, as Intel had it, and there was nothing to hide about SMT. Zen 3 is in a different situation. We might expect AVX512; that is no surprise, as Intel has it already. However, nobody expects SMT4, and it is also their main ace up the sleeve against Intel. AMD has a lot of reasons to hide SMT4.

french toast · Apr 22, 2020

Richie Rich said:
That rumor about samples with SMT disabled due to broken silicon. What a coincidence. Zen 1 was a much bigger technical challenge for AMD in terms of SMT, as they were dealing with SMT for the first time, and did AMD send out samples with broken SMT? No, because SMT2 was an expected feature, as Intel had it, and there was nothing to hide about SMT. Zen 3 is in a different situation. We might expect AVX512; that is no surprise, as Intel has it already. However, nobody expects SMT4, and it is also their main ace up the sleeve against Intel. AMD has a lot of reasons to hide SMT4.

I actually think it is a slight possibility, but it would increase power consumption significantly which would affect clocks, single thread would also go down, so I don't think this would be the best option for desktop at this time.. Zen 4 on 5nm? Maybe they could enable smt 3 for desktop on 5nm and smt 4 for server?.. Small chance we see that, but in all honesty I don't expect to see any significant AVX 512 support or SMT increase with zen 3..although I expect them to make the core wider, improve the cache and topology to be able to support these features in future if needed.

NTMBK · Apr 22, 2020

Richie Rich said:
That rumor about samples with SMT disabled due to broken silicon. What a coincidence. Zen 1 was a much bigger technical challenge for AMD in terms of SMT, as they were dealing with SMT for the first time, and did AMD send out samples with broken SMT? No, because SMT2 was an expected feature, as Intel had it, and there was nothing to hide about SMT. Zen 3 is in a different situation. We might expect AVX512; that is no surprise, as Intel has it already. However, nobody expects SMT4, and it is also their main ace up the sleeve against Intel. AMD has a lot of reasons to hide SMT4.

The whole point of samples is so vendors can update software to work well on the new hardware. These vendors are working under NDA. You don't disable core new features to "hide" them, you do it because they are broken.

Gideon · Apr 22, 2020

french toast said:
I actually think it is a slight possibility, but it would increase power consumption significantly which would affect clocks, single thread would also go down, so I don't think this would be the best option for desktop at this time.. Zen 4 on 5nm? Maybe they could enable smt 3 for desktop on 5nm and smt 4 for server?.. Small chance we see that, but in all honesty I don't expect to see any significant AVX 512 support or SMT increase with zen 3..although I expect them to make the core wider, improve the cache and topology to be able to support these features in future if needed.

Now obviously I don't think there is any SMT-4 in zen 3, nor do I believe there will be in near future.

However, if there ever is at some point, I'm 100% convinced that enabling it will be controlled by the BIOS and the default would still be SMT-2. It might even happen that AMD flat-out disables this option for most consumer SKUs except say threadripper and maybe the 3950x equivalent (not because of ill-will but just to avoid problems with windows and performance regressions in many apps).

Windows just doesn't really play well with lots of threads, so there is no way that SMT-4 would be enabled by default, especially as many apps that use up all the threads might very well see performance tank.

The only clients with workflows that actually benefit from SMT-4 would know how to enable it in the BIOS and would mostly be running Linux anyway. Enabling SMT-4 out of the box on consumer chips just seems dumb, IMO.

amrnuke · Apr 22, 2020

Gideon said:
Windows just doesn't really play well with lots of threads, so there is no way that SMT-4 would be enabled by default, especially as many apps that use up all the threads might very well see performance tank.

Exactly
The 3990X saw issues going from 64C/64T to 64C/128T on Windows, so theoretically I guess we could have 16C/64T before Windows chokes on it.
However, I agree with you that because of the lightly-threaded performance drop with SMT4 compared to SMT2, AMD would disable this by default for everything but 16 core and up parts.
If they do proceed down SMT4 path in the future, e.g. Zen4, my hope would be that AMD allow it to be enabled in BIOS for ALL chips.
My rationale being that there are a lot of folks like me who use the 3600 for work, but let it run R@H and WCG during downtime, and a somewhat optimized SMT4 for that purpose would be great.
I just wish it wouldn't require a reboot to turn SMT4 on and off, or if we could do it per-CCX, though that's really pie-in-the-sky.
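The thread-count arithmetic in this post is just cores times SMT ways. The sketch below checks which configurations stay at or under 64 logical threads, on the assumption that the "chokes" threshold is the Windows processor-group size of 64 logical processors. The helper names and the list of configurations are illustrative only.

```python
# Assumption: the threshold discussed above corresponds to the Windows
# processor-group size of 64 logical processors.
WINDOWS_GROUP_LIMIT = 64

def logical_threads(cores, smt_ways):
    """Logical threads exposed to the OS for a given core count and SMT level."""
    return cores * smt_ways

def fits_one_group(cores, smt_ways, limit=WINDOWS_GROUP_LIMIT):
    """True if all logical threads fit inside a single processor group."""
    return logical_threads(cores, smt_ways) <= limit

# Hypothetical configurations taken from the discussion
configs = [(64, 1), (64, 2), (16, 4), (6, 4)]
for cores, smt in configs:
    threads = logical_threads(cores, smt)
    status = "fits" if fits_one_group(cores, smt) else "spills over"
    print(f"{cores}C/{threads}T (SMT{smt}): {status}")
```

On that assumption, 64C/128T is the only configuration in the list that spans more than one group, while a 16-core part with SMT4 sits exactly at the limit.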

Geranium · Apr 22, 2020

Richie Rich said:
CORONA-VIRUS ACTUAL DATA: spread rate 7-10x per week, or double every 2 days. At least 4 weeks inertia after quarantine (infected inertia rise 17x).

1. Intel Core i9 9900K @5GHz ......... SPECint2006 score: 54.28 ...... 10.86 pts/GHz
2. Apple A13 @2.65 GHz .................. SPECint2006 score: 52.82 ...... 19.93 pts/GHz ...... +83 % IPC over 9900K
3. AMD Ryzen 3950X @4.6 GHz ...... SPECint2006 score:50.02 ...... 10.87 pts/GHz ...... + 0% IPC over 9900K .... fastest clocked Ryzen beaten by iPhone CPU
4. ARM Cortex A77@2.84 GHz ......... SPECint2006 score: 33.32 ...... 11.73 pts/GHz ...... + 8% IPC over 9900K

Only a 1.83x improvement with 4 to 8 times the L2 and Mediatek-like "optimization"!!

Apple's ARM chip has 4MB of L2 per core, compared to 512KB and 1MB per core on AMD64 chips. The whole benchmark could fit in Apple's L2 cache.
Also, was the benchmark compiled with the same compiler? Same OS? Same storage and RAM size and speed?

Edit: I was replying to the SPECint2006 benchmark. Looks like it is from the signature.
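For anyone who wants to recheck the per-GHz figures in the quoted signature, the arithmetic is just score divided by clock, then normalised to the 9900K. This sketch uses only the scores and clocks listed in the quote; treating points per GHz as "IPC" is the signature's own simplification, and the variable names here are arbitrary.

```python
# Scores and clocks copied from the quoted signature: (SPECint2006 score, GHz)
scores = {
    "Core i9 9900K": (54.28, 5.0),
    "Apple A13": (52.82, 2.65),
    "Ryzen 3950X": (50.02, 4.6),
    "Cortex A77": (33.32, 2.84),
}

# Points per GHz, the signature's stand-in for IPC
per_ghz = {name: score / ghz for name, (score, ghz) in scores.items()}

# Percentage uplift relative to the 9900K baseline
baseline = per_ghz["Core i9 9900K"]
uplift_pct = {name: (value / baseline - 1) * 100 for name, value in per_ghz.items()}

for name in scores:
    print(f"{name}: {per_ghz[name]:.2f} pts/GHz, {uplift_pct[name]:+.1f}% vs 9900K")
```

The per-GHz values match the signature. The A13 uplift works out to about 83.6%, which the signature truncates to 83%, and none of this addresses the compiler, OS, or memory differences raised above.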

moinmoin · Apr 22, 2020

Gideon said:
However, if there ever is at some point, I'm 100% convinced that enabling it will be controlled by the BIOS and the default would still be SMT-2.

While I totally agree this is what is going to happen, I personally wish the control over SMT would move into the processor, making it decide itself which amount of logical threads are most efficient for handling a given workload.

(And to be honest I'm fed up of discussing Windows as the obstacle to progress in CPU features.)

DannyH246 · Apr 22, 2020

Does anyone have any thoughts on fabrication options for the IO die moving forward? Currently it is manufactured on 14nm at GloFo. Will the Zen 3 IO die be manufactured on GloFo's enhanced 12nm process, or on TSMC's 7nm?
Another question I had: would AMD ever consider GloFo's FD-SOI process for any future IO die?

amrnuke · Apr 22, 2020

Geranium said:
Also, was the benchmark compiled with the same compiler? Same OS? Same storage and RAM size and speed?

No

amrnuke · Apr 22, 2020

moinmoin said:
While I totally agree this is what is going to happen, I personally wish the control over SMT would move into the processor, making it decide itself which amount of logical threads are most efficient for handling a given workload.

(And to be honest I'm fed up of discussing Windows as the obstacle to progress in CPU features.)

That would be really interesting to let SMT be an on-the-fly switch, perhaps even on a per-core basis.
