Design changes in Zen 2 (CPU/core/chiplet only)

Page 6

IRobot23

Senior member
Jul 3, 2017
I hadn't remembered the old road map; just found it.

[Image: AMD Next Horizon roadmap, Zen 3 and Zen 4]


Zen3 is 7nm+ and on track. Zen4 is surprisingly close to design completion. Maybe Zen4 will be SMT4?


https://fuse.wikichip.org/news/1815/amd-discloses-initial-zen-2-details/

I think TR4 and Epyc need DDR5 and a redesign more badly than AM4.

I don't think the next TR socket needs all that much wattage, given the efficiency gains. They should work on TR affordability and graphics output.

We can stay on AM4 and X399. All they need to do is add HBM, with low latency and high frequency, via on-die stacking. I don't care about DDR if they do that.
 

amd6502

Senior member
Apr 21, 2017
My impression is that the Zen cores already do a very good job power gating unused areas, and that the big challenge ahead for AMD is optimizing the whole uncore for mobile usage while their uncore is primarily optimized for scalability first (which is another reason why their monolithic APU designs launch last each gen).

The Dr makes a good point, though: the heterogeneous approach does have one big advantage. It's easier (and probably somewhat more effective) to disable the large, powerful, energy-hungry cores and, under low-frequency, low-load conditions, shift threads onto virtual cores that end up being processed by small, energy-efficient, low-frequency cores like those of Family 16h. (Being able to extend the wattage and frequency range as much as possible is a big plus.)
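A minimal sketch of that idea, assuming a hypothetical policy: each virtual core is backed by either a big or a small physical core, and the thread moves between them based on its recent load, with two separate thresholds (hysteresis) so it does not bounce back and forth. The names, thresholds and load trace are made up for illustration; this is not how any AMD part or OS scheduler is known to work.

```python
from enum import Enum


class Core(Enum):
    BIG = "big"      # wide, high-frequency, power-hungry core
    SMALL = "small"  # narrow, low-frequency, energy-efficient core


def next_core(current: Core, load: float, up: float = 0.6, down: float = 0.3) -> Core:
    """Pick the physical core type backing one virtual core (hypothetical policy).

    load is the thread's recent utilization in [0, 1]. The gap between
    `up` and `down` is hysteresis: a load hovering near one threshold
    does not migrate the thread every interval.
    """
    if current is Core.SMALL and load > up:
        return Core.BIG
    if current is Core.BIG and load < down:
        return Core.SMALL
    return current


def simulate(loads: list[float], start: Core = Core.BIG) -> list[str]:
    """Apply the policy to a sequence of load samples and record the core type used."""
    core = start
    trace = []
    for load in loads:
        core = next_core(core, load)
        trace.append(core.value)
    return trace


print(simulate([0.9, 0.5, 0.2, 0.4, 0.55, 0.7, 0.1]))
```

Note that loads of 0.4 and 0.55 leave the thread on the small core: only crossing the upper threshold moves it back to the big one.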
 

DrMrLordX

Lifer
Apr 27, 2000

Uncore power usage seems to be a challenge for both Intel and AMD desktop/server-oriented designs. FWIW, AMD seems to be in a "mobile last" design phase right now. I wouldn't expect them to prioritize developments aimed at the mobile sector.
 

krumme

Diamond Member
Oct 9, 2009
Guys, what's your take on Zen3 IO and chiplet die size? And what process for the IO die?

Does 7nm+ bring some density improvements?
 

jpiniero

Lifer
Oct 1, 2010

It appears AMD isn't using TSMC 7+; it looks like they are using SS7 next and then possibly going back to TSMC for 5 nm later.

SS7 is only a tad denser than TSMC's 7 nm, but if it means they don't have to use HPC, they could gain some density there.

The IO die, I think, depends on how the WSA (wafer supply agreement) looks. I'll guess GF 12 for now.
 

Tuna-Fish

Golden Member
Mar 4, 2011
Also, the CPU doesn't "look ahead 100 cycles"; that makes no sense. All out-of-order execution and ILP happens after decode. For x86 that would be uops in the uop queue.

Yes. Do note that there very often are 100 cycles' worth of instructions (or more) in the uop queue. Consider a miss through all caches to RAM. That's roughly 90 ns on modern Intel CPUs, give or take a bit depending on how good your DRAM is. If the CPU is running at 4 GHz and decoding ~3 instructions per cycle on average, it could decode more than 1000 instructions (90 ns × 4 GHz = 360 cycles, × 3 ≈ 1080) while waiting on data for that one miss. The hard limits on how far ahead it can get are the ROB and the PRFs. In other words, whenever there is a miss to memory, the CPU will fetch, decode and issue instructions until one of [ROB (224), sPRF (180), vPRF (168)] runs out of room (numbers for Coffee Lake).
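The arithmetic above can be checked with a small back-of-envelope model. It is a sketch under stated assumptions: the 90 ns latency, 4 GHz clock, ~3 instructions decoded per cycle and the Coffee Lake structure sizes come from the post, while the function names and instruction-mix fractions are hypothetical, and the model treats every PRF entry as free for renaming (in reality some hold architectural state).

```python
# Back-of-envelope model of out-of-order run-ahead during a DRAM miss.
# Latency, clock, decode rate and structure sizes are the post's figures;
# the instruction-mix fractions passed to run_ahead_window are hypothetical.


def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Core clock cycles that pass while one load waits on memory."""
    return latency_ns * clock_ghz  # ns * (cycles per ns)


def decodable_instructions(latency_ns: float, clock_ghz: float, decode_per_cycle: float) -> float:
    """Instructions the front end could decode during the stall if nothing else limited it."""
    return stall_cycles(latency_ns, clock_ghz) * decode_per_cycle


def run_ahead_window(rob: int, sprf: int, vprf: int,
                     frac_int_dest: float, frac_vec_dest: float) -> tuple[str, float]:
    """Return the structure that fills first and how many instructions fit.

    Every instruction takes a ROB entry; only the fraction that writes an
    integer (or vector) register consumes an sPRF (or vPRF) entry.
    Simplification: all PRF entries are assumed free for renaming.
    """
    limits = {"ROB": float(rob)}
    if frac_int_dest > 0:
        limits["sPRF"] = sprf / frac_int_dest
    if frac_vec_dest > 0:
        limits["vPRF"] = vprf / frac_vec_dest
    name = min(limits, key=limits.get)
    return name, limits[name]


ROB, SPRF, VPRF = 224, 180, 168  # Coffee Lake figures from the post

print(stall_cycles(90, 4.0))                         # 360.0 cycles
print(decodable_instructions(90, 4.0, 3))            # 1080.0 instructions
print(run_ahead_window(ROB, SPRF, VPRF, 0.75, 0.0))  # ('ROB', 224.0)
print(run_ahead_window(ROB, SPRF, VPRF, 1.0, 0.0))   # ('sPRF', 180.0)
print(run_ahead_window(ROB, SPRF, VPRF, 0.0, 1.0))   # ('vPRF', 168.0)
```

With about 1080 instructions decodable during the miss but only 168 to 224 back-end slots available, the ROB and PRFs, not decode bandwidth, set how far ahead the core gets; which of them fills first depends on the instruction mix.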

Even misses to L3 will give the CPU enough time to fill up half the queues. Misses are common enough (and usually happen before major computation -- you have to first get data before you can work on it) that unless your code has plenty of badly predicted branches, the normal operating condition of a CPU is executing from queues that are closer to full than empty, allowing a lot of OoO despite the in-order frontend and retire.

Of course, if you are doing things like tree traversal and branching on every node, the queue gets cleared every time the predictor fails, and you are always running with empty queues. That's just one more reason why in practice trees usually lose to hash tables, even if hash tables technically have to do more work.
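The effect of mispredictions on the usable window can be sketched with a simple model. Assumptions, all hypothetical: mispredictions are spread evenly through the instruction stream, each one flushes everything younger than the branch, and the branch density and mispredict rates below are illustrative values, not measurements.

```python
import math


def instructions_between_flushes(branch_fraction: float, mispredict_rate: float) -> float:
    """Average run of instructions between two mispredicted branches."""
    flushes_per_instruction = branch_fraction * mispredict_rate
    if flushes_per_instruction == 0:
        return math.inf  # no mispredicts: the pipeline is never flushed
    return 1.0 / flushes_per_instruction


def effective_window(rob_entries: int, branch_fraction: float, mispredict_rate: float) -> float:
    """Useful out-of-order window: capped by the ROB and by the flush distance."""
    return min(float(rob_entries),
               instructions_between_flushes(branch_fraction, mispredict_rate))


# Hypothetical workloads, both with one branch every 5 instructions.
predictable_loop = effective_window(224, 0.2, 0.02)  # ~250 between flushes, so ROB-bound
tree_search = effective_window(224, 0.2, 0.5)        # data-dependent left/right branch: ~10
print(predictable_loop, tree_search)
```

Under these assumptions the tree search keeps only about 10 useful instructions in flight, versus a full ROB for the predictable loop, which matches the "always running with empty queues" point above.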
 

NostaSeronx

Diamond Member
Sep 18, 2011
On the chiplet design.

I expect the consumer model to switch to 12FDX:
100 mm² on 14LPP to 50-60 mm² on 12FDX.
^-- Will continue to use three dies. (One I/O + two CPUs)

And the server model to switch to 7FF+:
400 mm² on 14LPP to ~300 mm² on N7+ (7LPP is also an option).
At this point I expect four dies per quadrant. (One I/O + sixteen CPUs)
The shrink is mostly for the increase in HyperTransport connections on package.
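The die-size guesses above translate into these area reductions. A trivial sketch: the `shrink` helper is hypothetical, and the inputs are the post's own estimates, not known die sizes.

```python
def shrink(old_mm2: float, new_mm2: float) -> float:
    """Fractional area reduction when a die goes from old_mm2 to new_mm2."""
    return 1.0 - new_mm2 / old_mm2


# The post's estimates: consumer die 14LPP -> 12FDX, server die 14LPP -> N7+.
for label, old, new in [("consumer, 50 mm²", 100, 50),
                        ("consumer, 60 mm²", 100, 60),
                        ("server", 400, 300)]:
    print(f"{label}: {shrink(old, new):.0%} smaller")
```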
 

amd6502

Senior member
Apr 21, 2017
I would think it would be largely seamless for the I/O chip.

For it to appear, it would need to achieve the following:
-> 12FDX becomes available.
-> 12FDX complex IP + AMD IP are Si proven.
-> 12FDX surpasses 14LPP quantity.

I don't know how often new consumer IO chips are needed. But why not 22FDX if the real goal is cost and perf/watt?

I tend to think they stagger MCM and monolithic releases.

And any guesses as to whether the 14nm one has some Vega graphics?
 

NostaSeronx

Diamond Member
Sep 18, 2011
But why not 22FDX if the real goal is cost and perf/watt?

And any guesses as to whether the 14nm one has some Vega graphics?
12FDX has a couple of features from 14FD+/10FD which make it optimized for reliance. Then, there is the portability of 14nm/12nm FF to 12FDX.
-> DITO for two diodes of FBB and RBB for dynamic optimization and static optimization.

I do not think the I/O chip will include graphics. As it is right now, the APUs are stuck in monolithic design. Until 3D SRAM comes along at GloFo, it is unlikely that the I/O chip will have ROPs or any display capability.