Discussion: Zen 7 speculation thread


soresu

Diamond Member
Dec 19, 2014
My father was talking about how he went to a dental specialist who couldn't figure out why he was in pain. He did some searching himself, and some articles suggested it might be due to a weak heart, which he does have. Another example of myopia.
Funny, but Lisa Su mentioned something similar in a recent interview when talking about the medical specialists who worked on her mother's care.
 

MadRat

Lifer
Oct 14, 1999
Opposite, actually. Process gains are starting to stall while costs are increasing at a rapid rate. A "new process gen" in 2026+ delivers maybe 0.5x the gains of one from 2003-2024, and that's on top of gains that have been shrinking for the past 20 years. After 0.18µm they needed increasingly complex processes, different materials, and more layers to advance, starting with copper interconnects.
It wasn't so long ago that engineers on this site, before the great wipe event, were scoffing at the idea of copper interconnects.
 

Thunder 57

Diamond Member
Aug 19, 2007
It wasn't so long ago that engineers on this site, before the great wipe event, were scoffing at the idea of copper interconnects.

This site barely existed when the switch to copper interconnects was implemented; we're talking 1999 here. And what is the "great wipe event"? Ironically, Coppermine did not use copper interconnects, but the Athlon did.
 

soresu

Diamond Member
Dec 19, 2014
This site barely existed when the switch to copper interconnects was implemented; we're talking 1999 here. And what is the "great wipe event"? Ironically, Coppermine did not use copper interconnects, but the Athlon did.
Wait, what did they use before copper?
 

Kepler_L2

Senior member
Sep 6, 2020
Interesting patent for essentially an infinitely large LDQ (load queue):

"Moreover, because of the flexibility presented by the systems and methods discussed herein, additional computing resources are not wasted in idling while additional LDQ or LOQ space opens up. Instead, the systems and methods discussed herein provide an out-of-order tracking solution that can potentially track out-of-order loads from every cacheline in the data cache--meaning no processor resources are wasted idling while waiting for additional tracking space opens up."
 
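Rough toy model of how I read the claim (Python; all names here are mine, not the patent's): a conventional LDQ stalls dispatch once its fixed entries run out, whereas per-cacheline tracking scales with the data cache itself, so a load never waits for a tracking slot.

```python
# Toy sketch only: "FixedLDQ" and "PerCachelineTracker" are invented
# names to illustrate the capacity difference, nothing more.

class FixedLDQ:
    def __init__(self, entries=64):
        self.entries = entries
        self.in_flight = 0

    def try_dispatch_load(self):
        if self.in_flight >= self.entries:
            return False            # stall: no free LDQ entry
        self.in_flight += 1
        return True

    def retire_load(self):
        self.in_flight -= 1


class PerCachelineTracker:
    def __init__(self):
        # Ordering metadata keyed by cacheline address: capacity is
        # effectively "every line in the data cache", so dispatch
        # never idles waiting for tracking space to open up.
        self.lines = {}

    def try_dispatch_load(self, line_addr, load_id):
        self.lines.setdefault(line_addr, []).append(load_id)
        return True                 # always room to track

    def snoop_invalidate(self, line_addr):
        # Loads tracked on an invalidated line must be replayed,
        # the same ordering check an LDQ CAM search would perform.
        return self.lines.pop(line_addr, [])
```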

Doug S

Diamond Member
Feb 8, 2020
Geez, I was going for a laugh react, not to get material for a dissertation on the evolution of the English language in the US vs the UK since the Boston Tea Party.
 

soresu

Diamond Member
Dec 19, 2014
Geez, I was going for a laugh react, not to get material for a dissertation on the evolution of the English language in the US vs the UK since the Boston Tea Party.
From what I read, it's more about academic semantics and pedantry than US/UK English differences.

One of the arguments hinges on the fact that the chemist and inventor Humphry Davy had already isolated the elements potassium, sodium, calcium, strontium, barium, and magnesium, leading some academics to argue that continuity (the -ium ending) was the better way to go.

Never mind that he also isolated boron 😂🤣; by that precedent the metal could have ended up being called aluminon.
 

marees

Golden Member
Apr 28, 2024
So assuming an even number of WGPs, a monolithic laptop Zen 7 APU (with RDNA 5+) should have at least 4 WGPs?

So if we assume three monolithic chips like Zen 6 Medusa Point, then we could have (in 2029-2030):
  1. Point 1 — 12 or 8 WGPs
  2. Point 2 — 6 or 8 WGPs
  3. Point 3 — 4 WGPs

For chips #2 and #3: the 9070 XT at 2.97 GHz lists 389 dense INT8 TOPS across its 64 CUs. 389 x 4/64 = 24.31 TOPS for 4 CUs (2 WGPs). Add ~3% clocks (3.06 GHz) and doubled per-CU throughput: ~50 TOPS. Scale to 3 WGPs: ~75 TOPS. (Quick sketch of the math below.)
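Spelling that out in Python (back-of-the-envelope; the 9070 XT's 64 CUs / 32 WGPs are per AMD's specs, the doubled RDNA 5 per-CU INT8 throughput is my assumption):

```python
# Back-of-the-envelope only; assumes 64 CUs (32 WGPs) in the 9070 XT
# and that RDNA 5 doubles per-CU INT8 throughput.
tops_9070xt = 389                        # dense INT8 TOPS @ 2.97 GHz
per_2wgp = tops_9070xt * 4 / 64          # 4 CUs = 2 WGPs -> 24.31
clock_scale = 3.06 / 2.97                # ~ +3% clocks
rdna5_2wgp = per_2wgp * clock_scale * 2  # doubled throughput -> ~50
rdna5_3wgp = rdna5_2wgp * 1.5            # 3 WGPs -> ~75
print(round(per_2wgp, 2), round(rdna5_2wgp), round(rdna5_3wgp))
# prints: 24.31 50 75
```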
 

MrMPFR

Member
Aug 9, 2025
Worst case, assuming no node or CU changes vs RDNA 5, at ~3 GHz:
#1 = 200-300 INT8 TOPS
#2 = 150-200 INT8 TOPS
#3 = 100 INT8 TOPS.

100-300 INT8 TOPS = 2.5-7.5 times the Copilot+ requirement (quick check below)!
Does this mean NPUs are getting deprecated in RDNA 5 and later iGPU-based designs? I remember reading about that somewhere in the forums.
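Quick check against Microsoft's published 40 TOPS Copilot+ floor (the per-chip TOPS figures above are my own worst-case estimates):

```python
copilot_floor = 40  # Microsoft's Copilot+ NPU requirement, TOPS
for chip, tops in (("#1", 300), ("#2", 200), ("#3", 100)):
    print(f"{chip}: {tops} TOPS = {tops / copilot_floor}x Copilot+")
# #1: 300 TOPS = 7.5x Copilot+
# #2: 200 TOPS = 5.0x Copilot+
# #3: 100 TOPS = 2.5x Copilot+
```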
 

madtronik

Junior Member
Jul 22, 2019
Worst case, assuming no node or CU changes vs RDNA 5, at ~3 GHz:
#1 = 200-300 INT8 TOPS
#2 = 150-200 INT8 TOPS
#3 = 100 INT8 TOPS.

100-300 INT8 TOPS = 2.5-7.5 times the Copilot+ requirement!
Does this mean NPUs are getting deprecated in RDNA 5 and later iGPU-based designs? I remember reading about that somewhere in the forums.
Well, I've always read that AMD sells NPUs not on raw power but as a low-power specialized circuit for background inferencing. They've always said it's a battery-saving feature. If you can get powerful low-power inference from the iGPU, I guess there's no place left for them.
 

soresu

Diamond Member
Dec 19, 2014
Well, I've always read that AMD sells NPUs not on raw power but as a low-power specialized circuit for background inferencing. They've always said it's a battery-saving feature. If you can get powerful low-power inference from the iGPU, I guess there's no place left for them.
Low-power inference does seem to be the general point of all "NPU" blocks in mobile SoCs these days; I would be surprised if XDNA 1/2 were not also cast from that mold.
 

MrMPFR

Member
Aug 9, 2025
That is the belief here on the forums.

Especially because AMD & Microsoft have 2 more years to work out something sensible!!!
I think this is what I remembered:
The funny thing about "AMD traded MALL in Strix Point for the NPU" is that both MALL and NPU are deprecated in the future

Still leaves a lot of open questions. For one, are NPUs deprecated next gen in all Zen 6 + RDNA 5 based products, or only later? If it applies to mobile too, how will AMD address the concerns around time to first token (execution latency) and power efficiency? Getting an inferior solution in terms of battery life on a brand-new product vs last gen is just unacceptable.
Are we talking about customizations to the ML hardware and core in RDNA 5 to effectively emulate an "NPU mode" to save power? Could that perhaps be very fine-grained power gating, architectural changes to the ML HW and even special modes of operation, and in general massive architectural changes to the cache/memory hierarchy and data locality? Just a bit of spitballing.

This processing-in-cache patent should increase ML and RT performance sizeably: https://patents.google.com/patent/US20240264942A1
Really, any branchy (PT, for example) or memory-hungry workload should benefit, as long as BW-heavy instructions are offloaded to the CCUs. Perhaps this is the processing-in-cache patent that was mentioned in January?

Obviously no confirmation, but it might be reasonable to expect that this is roughly how the NPU in MediaTek's Dimensity 9500 SoC achieves CIM. IIRC CIM is touted as one of the big reasons the NPU is so power efficient. (Toy sketch of the offload idea below.)
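Pure spitballing on what such an offload could look like conceptually (toy Python; "CacheComputeUnit" and everything else here is my invention, not from the patent text):

```python
# Toy model: a bandwidth-heavy reduction runs next to the data at the
# cache instead of streaming every element through the core's L0/L1,
# so only one command goes out and one scalar comes back.

class CacheComputeUnit:
    def __init__(self, lines):
        self.lines = lines  # data resident at this cache level

    def reduce_sum(self, addrs):
        # Executes beside the data: no per-element traffic to the core.
        return sum(self.lines[a] for a in addrs)


class Core:
    def run_kernel(self, ccu, addrs):
        # One offload command out, one small result back; L0/L1 untouched.
        return ccu.reduce_sum(addrs)


ccu = CacheComputeUnit({a: a % 7 for a in range(1024)})
print(Core().run_kernel(ccu, range(1024)))
```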
 

marees

Golden Member
Apr 28, 2024
I think this is what I remembered:


Still leaves a lot of open questions. For one, are NPUs deprecated next gen in all Zen 6 + RDNA 5 based products, or only later? If it applies to mobile too, how will AMD address the concerns around time to first token (execution latency) and power efficiency? Getting an inferior solution in terms of battery life on a brand-new product vs last gen is just unacceptable.
Are we talking about customizations to the ML hardware and core in RDNA 5 to effectively emulate an "NPU mode" to save power? Could that perhaps be very fine-grained power gating, architectural changes to the ML HW and even special modes of operation, and in general massive architectural changes to the cache/memory hierarchy and data locality? Just a bit of spitballing.

This patent might help increase ML performance: https://patents.google.com/patent/US20240264942A1
Sounds a bit like compute-in-memory (CIM), but as of now there's no confirmation of whether it will be implemented, and if so in which product lineups.
Interesting: co-compute linked to the L3 to avoid cache-thrashing of the L1 (for memory-intensive loads such as RT).

This seems like it would need a lot more die area 🤔
Or is this just a renaming of the RT core to a more generic co-compute core? 🤔 🤔
 