Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
Aside from the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

jamescox

Senior member
Nov 11, 2009
I see no reason why you couldn't stack chips made on different processes. All you have to line up are the TSVs, the locations of which are independent of process. TSVs connect to the metal layers and don't care about the location or density of transistors (other than not putting them where a TSV will poke through, in chips designed to stack more than two levels high).
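
To put that point in concrete terms, here is a toy sketch (pitch and grid size are purely illustrative, not from any foundry spec): the only contract between the stacked dies is the TSV grid itself.

```python
# Toy model: two dies built on different nodes can stack as long as
# they agree on the TSV grid. Numbers are illustrative only.

def tsv_grid(pitch_um: float, cols: int, rows: int) -> set:
    """TSV landing coordinates (in microns) for a die."""
    return {(c * pitch_um, r * pitch_um) for c in range(cols) for r in range(rows)}

cache_die_n7 = tsv_grid(pitch_um=9.0, cols=64, rows=64)  # hypothetical N7 cache die
cpu_die_n5   = tsv_grid(pitch_um=9.0, cols=64, rows=64)  # hypothetical N5 CPU die

# The transistor layers underneath differ completely, but every TSV lines up:
assert cache_die_n7 == cpu_die_n5
```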

I wonder how well TSMC's "SRAM optimized" process scales from N7 to N5. Does it scale as poorly as cache scales on its regular process (where SRAM only gets 20% scaling from N5 to N3), or does it scale more like logic does on the regular "logic optimized" process (which gets 70% scaling from N5 to N3)?

This talk of "cache optimized" processes getting 2x the density for cache versus the standard process is a bit confusing to me. Can someone point to where that claim is made, and how it squares with Apple's A13? The A13, which is also made on TSMC 7nm, has a 16 MB SLC that, per Andrei's measurement, is 8.47 mm² in size. That's basically identical density to AMD's L3 chip with 64 MB in 36 mm². So is this "cache optimized" thing real? Because that's the same density Apple is getting on a standard process. There are different cache cell types (6T vs 8T) and so forth, so maybe that accounts for the difference, but it is suspicious to me how close the densities of the A13 and AMD's L3 chips are.
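
For what it's worth, the density arithmetic behind that comparison, using the figures above:

```python
# Cache density comparison, using the figures quoted above.
a13_slc_mb, a13_area_mm2 = 16, 8.47   # Apple A13 SLC, per Andrei's measurement
amd_l3_mb,  amd_area_mm2 = 64, 36.0   # AMD's 64 MB L3 die

a13 = a13_slc_mb / a13_area_mm2       # ~1.89 MB/mm^2
amd = amd_l3_mb / amd_area_mm2        # ~1.78 MB/mm^2
print(f"A13 SLC: {a13:.2f} MB/mm^2, AMD L3: {amd:.2f} MB/mm^2")
print(f"Difference: {a13 / amd - 1:.0%}")  # ~6%; nowhere near a 2x gap
```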

This is an older article:


It lists different TSV pitches, but it also says that they might be able to get the pitch down to 0.9 microns, from 9 microns on 7 nm and 6 microns on 5 nm. That is almost a year old, so they may have a significantly smaller pitch by now. I don't see why they wouldn't be able to mix 7 and 5 nm dies; they should just need to use a compatible TSV pitch. The 7 nm cache die is presumably a different process, presumably from a different line. Mixing something made in a different fab location may be difficult, so that could be a limitation. The cache die is probably a bit cheaper to produce; I doubt that it needs as many metal layers as the full CPU die, so less processing. If we get a 5 nm version of it with Zen 4, then it may be an even larger amount of cache.

This is a much smaller pitch than micro-solder-ball solutions, so TSMC tech is enabling this for AMD. 9 microns is still 9000 nm though, so this isn't quite the same as being on the same die, but it is closer than anything else.
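
Rough numbers on what those pitches buy you (the 50-micron figure for micro-bump stacking is a ballpark assumption, not a spec): areal connection density goes with the inverse square of the pitch.

```python
# Vertical connection density scales as 1 / pitch^2.
for pitch_um in (50.0, 9.0, 6.0, 0.9):  # micro-bump ballpark, N7 SoIC, N5 SoIC, roadmap
    per_mm2 = (1000.0 / pitch_um) ** 2  # 1 mm = 1000 um
    print(f"{pitch_um:>5} um pitch -> ~{per_mm2:>12,.0f} connections/mm^2")
# 9 um -> ~12,000/mm^2; 0.9 um -> ~1,200,000/mm^2: a 100x density jump.
```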
 

DrMrLordX

Lifer
Apr 27, 2000
Warhol is real.

So is Dali. That one had people scratching their heads for a long time, though, and in the end it wasn't much. So what exactly is Warhol? Where will it wind up? Why does it seem to fit so awkwardly in AMD's release schedule? Does it have anything to do with the stacked L3 chips coming out somewhere around the Alder Lake launch?

As far as anyone outside of AMD's cloistered labs is concerned, Warhol may as well not exist. Until it finally shows up.

COW-stacking


(lol)

*new fish related code name get!*

Fish and cows? Oh my. Guess I should have seen that coming.
 

Vattila

Senior member
Oct 22, 2004
Regarding the timing of the V-Cache technology introduction, as many here have commented on, note that the hybrid bonding used in TSMC's SoIC stacking technology requires cleanroom levels of precision and cleanliness, which is not something the general outsourced assembly-and-test (OSAT) houses can currently do. So there is likely not enough capacity for AMD to roll out chips based on this technology yet. That said, TSMC has a substantial build-out plan for SoIC assembly and test. Also note that TSMC considers N6 part of the N7 generation as far as SoIC stacking is concerned, and that their roadmap has N3-on-N5 stacking for late 2023.


I wonder what Genoa is going to look like. There are some really ugly mock-ups out there based on rumours (ExecutableFix's 96-core mock-up and AdoredTV's 128-core mock-up) that assume AMD will evolve the current in-package interconnect while growing the socket to accommodate it all. Really, really ugly. So I have dreamed up my own mock-up, assuming they will use a silicon interposer, include in-package HBM and keep dimensions pretty much as is. The latter assumption is mainly because I'm lazy — I've simply reused existing Rome/Milan graphics — but let's run with it and assume they have decreased the socket pin pitch rather than increased the socket size. At least, that would be a nice thought from an infrastructure perspective, with the ability to build on current cooling solutions and form factors.

[Image: AMD EPYC Genoa (interposer speculation) mock-up]
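
A quick sanity check on the decreased-pin-pitch assumption (SP3's 4094 pins are known; the Genoa pin count below is purely a placeholder guess):

```python
import math

sp3_pins = 4094    # known SP3 pin count
genoa_pins = 6000  # placeholder guess for a pin-hungry SP5-class socket

# With the package footprint held constant, pin pitch scales with the
# square root of the pin-count ratio.
pitch_scale = math.sqrt(sp3_pins / genoa_pins)
print(f"Pitch must shrink to ~{pitch_scale:.0%} of SP3's")  # ~83%
```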

If AMD does not move to a silicon interposer with Genoa, why would that be, considering the obvious energy saving and bandwidth opportunity? AMD presentations (Teja Singh at ISSCC 2020) have stated they considered it for Rome, but interposers at the time didn't have the "reach" they needed. TSMC has produced enormous silicon interposers since then, and I think they can do stitching as well now. Or is the required size still prohibitive? Cost?

 

moinmoin

Diamond Member
Jun 1, 2017
So Zen 4 is 4Q22 at the earliest if it uses this tech from the beginning, barring any late hiccups.
Chances are high AMD really is an early adopter this time and Ryzen 5000 + V-Cache is the pipe cleaner bringing the production at TSMC up to speed (especially for more than one stack). So I'd still not rule out Zen 4 coming earlier next year (unless it uses yet another tech not quite finished, repeating this Zen 3 situation).

TSMC has produced enormous silicon interposers since then
I'm pretty sure AMD prefers to not use "enormous" silicon interposers if it can and wants to use significantly smaller dies. ;)

Other than that, very good point that TSMC's stacking tech moves the test-and-assembly bottleneck in-house (from OSAT to within the same fab), which is likely a serious impediment to mass-scale production. That is, unless TSMC planned ahead to expand that service at the same rate as its respective fab output.
 

MadRat

Lifer
Oct 14, 1999
Maybe we're missing the obvious. All 15 of those chips sit on a slightly larger rectangle. Could they all be connected to a larger piece of underlying silicon rather than simply stacked on each other? It would make sense to bond smaller chips to silicon rather than to the PCB.

Nevermind, this is exactly what you meant above as a silicon interposer.
 

jpiniero

Lifer
Oct 1, 2010
14,510
5,159
136

Some of the stuff mentioned in the article appears to be out of date, but AMD is planning on pushing Raphael into 35–65 W gaming laptops to fight Alder (Raptor) Lake-S BGA.
 

MadRat

Lifer
Oct 14, 1999
Vattila-

Your chiplets are mainly L3 cache, no logic. So your arrangement is not constrained by signal timing. The chiplets can be laid out first in a line across a common bus, then stacked in a vertical bus on each chiplet footprint. So even with a silicon interposer, why more than one row of chips? The IOC should sit on one side of the interposer rather than in the middle.
 

Doug S

Platinum Member
Feb 8, 2020
Other than that, very good point that TSMC's stacking tech moves the test-and-assembly bottleneck in-house (from OSAT to within the same fab), which is likely a serious impediment to mass-scale production. That is, unless TSMC planned ahead to expand that service at the same rate as its respective fab output.


When they rolled out InFO packaging they were able to handle Apple's iPhone volumes right away. That's a less complex technology but at the time just as cutting edge.

It isn't like customers would tell them they are interested in a new technology at the last minute. They know well in advance who will be using it, and will get a commitment for a given volume so they can build the necessary infrastructure. Of course they will plan ahead: TSMC's customers will have known this was coming for several years and been able to decide whether they are interested, and TSMC will have what they need (at least at the committed volumes).
 

Vattila

Senior member
Oct 22, 2004
I'm pretty sure AMD prefers to not use "enormous" silicon interposers if it can and wants to use significantly smaller dies.

I guess size, not cost, is the main issue, as AMD has alluded to in presentations (lack of "reach"). However, although the interposer I've drawn is well above twice the reticle limit (~800 mm²), TSMC apparently is on a roadmap to support 3x the reticle limit by this year, and 4x by 2023. Also note that the interconnect on the interposer in my mock-up does not need to reach from the far left to the far right; it only needs to reach from the CCDs to the IOD. So it could be stitched together, I would think. Regarding cost, note that the silicon interposer would be made on an old established process node (e.g. 65 nm) with just a few metal layers (which is just BEOL processing, no FEOL processing, i.e. no transistors or other small structures), hence the yield should be very good and the wafer cost reasonable. The cost is probably in the packaging service.

"The limit of CoWoS-S is all in the size of the interposer, which is often built on a 65nm manufacturing process or similar. As the interposer is a single piece of silicon, it has to be manufactured similarly, and as we move into a chiplet era, customers are asking for bigger and bigger interposers, which means TSMC has to be able to manufacture them (and give high yield). Traditional chips are limited by what is known as the size of the reticle, which is a fundamental limit inside of a machine as to how big it can ‘print’ a layer at a single instance. In order to enable products whose chips are on the reticle size, TSMC has been developing multi-reticle sized interposer technology, allowing these products to be big. Based on TSMC’s own roadmaps, we are expecting CoWoS implementations in 2023 to be around 4x the size of the reticle, allowing for over 3000+ mm2 of active logic silicon per product. We have a news item specifically covering this technology that you can read here."

anandtech.com

[Image: TSMC CoWoS reticle-size roadmap]
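
Putting numbers on that roadmap, with the ~800 mm² reticle figure from above:

```python
reticle_mm2 = 800  # approximate reticle limit, as above
for multiple in (2, 3, 4):
    print(f"{multiple}x reticle -> up to ~{multiple * reticle_mm2} mm^2 of interposer")
# 4x -> ~3200 mm^2, in line with the article's "3000+ mm^2" figure for 2023.
```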
 

Vattila

Senior member
Oct 22, 2004
Your chiplets are mainly L3 cache, no logic. [...] The IOC should sit on one side of the interposer rather than in the middle.

The V-Cache sits on top of the CCD, which sits on top of the silicon interposer, which connects to the IOD, which in turn connects to the HBM, also through the silicon interposer. It is all similar to the interconnect in the current Rome/Milan package, except that the latter has no silicon interposer, nor HBM, and the interconnect is implemented within the package substrate (with far higher energy consumption). The topology is the same.
 

DisEnchantment

Golden Member
Mar 3, 2017
Seems with the latest leaks from GN/VCZ, the L3 is going to be the same 32 MB. If this turns out to be true, for a 72 mm² Zen 4 die, the core (including L1/L2) should get a hefty 45-50% increase in MTr. Probably more, if anywhere close to TSMC's advertised 1.8x scaling for N5.
This is good news if true. They can make special SKUs with the V-Cache if needed, as announced for Zen 3.
But the vanilla Zen 4 core should get a significant uptick in logic.

Speaking of this interposer, I recall a patent application which makes a lot of sense and seems well suited to the idea.

Additionally, regarding the thick IHS, I recall a patent application related to power delivery via the IHS. The reasoning: SoIC cannot effectively deliver power to a top die containing logic, so the stacking is suitable for SRAM only; the workaround is to deliver power via the IHS.

All in all, fairly interesting bits they have. The issue remains how much money the typical customer will pay, and the problem is that the core is being designed for the lowest common denominator.
With stacking we get one more way of creating a new CCD for a new SKU that some might be willing to pay for, in addition to the existing way of adding extra cores with a second CCD.

Remains to be seen if someone finds new info on what lies beneath the IHS. Power delivery or TEC or whatever it may be, it sounds interesting.
 

tomatosummit

Member
Mar 21, 2019
Some of the stuff mentioned in the article appears to be out of date, but AMD is planning on pushing Raphael into 35–65 W gaming laptops to fight Alder (Raptor) Lake-S BGA.
That looks mostly due to the long-overdue iGPU placed in the IO die. Good news for all Zen 4 platforms, even if it's only a minimal desktop enabler.
Another factor is probably time to market: the APUs lag behind by a significant amount of time, and this will allow AMD to claim "Zen 4 on laptops", probably for the 2022 back-to-school rush.
 

jamescox

Senior member
Nov 11, 2009
I guess size, not cost, is the main issue, as AMD has alluded to in presentations (lack of "reach"). [...]

Rampant speculation ahead; as I have said, it is hard to tell which way they will go with chip stacking.

The IO die could be on an old process if it is a passive interposer. An active interposer would likely be made on a more advanced process, since it would have transistor layers. The large-pitch, micro-solder-ball type of stacking could be done with an interposer shipped from another location; for some of the other tech, I doubt that they would want to ship between locations. TSMC has talked about interposers multiple times the size of the reticle, but that is probably more expensive and likely has some limitations.

There were some rumors about a 15-chip device a very long time ago. I had wondered if that was an early mock-up of a device with cores stacked on an interposer. With an active interposer, they would put all of the physical-layer interfaces in the interposer, but they would still have things like the unified memory controller that could be put on a separate chip stacked on top. That wouldn't contain the physical IO interface, so it would be an advantage to have it on a smaller process. You could have the CPU die with stacked L3, and then a few chiplets that contain the IO-die elements that do benefit from the advanced process, like perhaps multiple memory-controller chiplets. They probably would have been working on prototypes and such several years ago to decide which direction to go.

I had wondered whether they will make a stacked device such that it would have room for 1 or 2 HBM-equipped GPUs directly on the Epyc package, maybe one on each side of the CPU interposer. That would pull a massive amount of power and be ridiculously expensive, but it would be a very compact design for supercomputers.

We may already get 288 MB of L3 on Zen 3 based Milan processors. That could go a lot higher if they take advantage of higher numbers of layers. With that much cache, the interconnect to the IO die may be less important, so initial Genoa may not have any stacking except the cache die. HBM is interesting since it would still presumably be a micro-solder-ball interface, such that we would have several types of stacking used in one package. The cost goes up a lot though, so I don't know how they will do low-end devices. I don't know if we need HBM on package that much if we have ridiculous amounts of SRAM; the bandwidth of DDR5, possibly with more channels, would make up for no HBM in most cases. AMD stayed with cheaper, standard GDDR6 for its GPUs and made up for it with Infinity Cache.

Perhaps multiple sizes of interposers, or multiple smaller interposers, would be the modular way to go. I suspect most Rome processors sold are the 4-chiplet versions. With the interposer split into quadrants, they might do essentially 1-, 2-, or 4-quadrant devices. I don't know if it would make sense to expand it to 6 domains; there haven't been any rumors of the PCI Express lane count going up to 192. The IO die or interposer can't get too small due to the massive number of IO pins required. It might be cheaper and more modular to do small separate interposers connected by Infinity Fabric, with 2 or 3 CCDs, IO, and memory controllers each. That would be up to 16 or possibly 24 cores per interposer.

Epyc processors are internally 4 domains. It was 4 separate chips in Epyc 1, but the 4 domains have been retained. You can set the NPS setting (NUMA per socket) to 1, 2, or 4. The 1-node setting stripes memory across all 8 channels. The 2-node setting stripes across 4 channels in each half (2 quadrants). The NPS4 setting essentially makes each 128-bit memory controller local to its quadrant. There are different amounts of latency going between quadrants. I think there is also a setting for making each L3 (CCX) a separate NUMA domain. The NPS4 setting may be very good with 4-CCD Rome processors, since each L3 ends up in a separate domain. This may be more important with larger and possibly slower L3 caches.
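
A minimal sketch of the three NPS modes (the stripe size, channel ordering, and contiguous node slices below are made up for illustration; the real address map is more involved):

```python
# Toy model of EPYC NPS (NUMA-per-socket) memory interleaving.
TOTAL_MEM = 1 << 30  # pretend 1 GiB of physical memory
CHANNELS = 8
STRIPE = 256         # bytes per interleave stripe (illustrative)

def locate(addr: int, nps: int) -> tuple:
    """Map a physical address to (numa_node, memory_channel)."""
    node = addr // (TOTAL_MEM // nps)  # each node owns a contiguous slice
    per_node = CHANNELS // nps         # NPS1 -> 8 ch, NPS2 -> 4 ch, NPS4 -> 2 ch
    channel = node * per_node + (addr // STRIPE) % per_node
    return node, channel

for nps in (1, 2, 4):
    hits = sorted({locate(a, nps) for a in range(0, 8 * STRIPE, STRIPE)})
    print(f"NPS{nps}: first 8 stripes -> {hits}")
# NPS1 spreads consecutive stripes across all 8 channels; NPS4 keeps
# them on the 2 channels local to the owning quadrant.
```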

Using multiple smaller interposers would allow good modularity, but the connections between the interposers would still be serial Infinity Fabric, which wouldn't be as good as a single large interposer. Perhaps they could use embedded silicon bridges (similar to Intel EMIB) to avoid the giant interposer, though. This would allow them to crank out massive numbers of the small interposers. If the interposer-to-interposer connection is bad, then sell it as a single-interposer device (desktop Ryzen). The giant interposer is great for power and such, but the manufacturability isn't that good: if something goes wrong in the last stage, you might lose a lot of expensive devices. That might be okay if they sell for $10,000 each or something.
 

eek2121

Platinum Member
Aug 2, 2005
Regarding the timing of the V-Cache technology introduction, as many here have commented on, note that the hybrid bonding used in TSMC's SoIC stacking technology requires cleanroom levels of precision and cleanliness [...]


So word on the street is that TSMC is pushing everyone to 6nm due to the not-insignificant savings in die space/production output. 7nm and 6nm were "mostly" compatible. I bring this up because I suspect that Warhol is essentially Zen 3 on a smaller chip + V-Cache. AMD is prototyping V-Cache stacking on Zen 3, but the final product will ship on 6nm... and yes, it will be on AM4.

Chances are high AMD really is an early adopter this time and Ryzen 5000 + V-Cache is the pipe cleaner bringing the production at TSMC up to speed (especially for more than one stack). [...]


I'm pretty sure AMD prefers to not use "enormous" silicon interposers if it can and wants to use significantly smaller dies. ;) [...]

Some of the stuff mentioned in the article appears to be out of date, but AMD is planning on pushing Raphael into 35–65 W gaming laptops to fight Alder (Raptor) Lake-S BGA.
I guess size, not cost, is the main issue, as AMD has alluded to in presentations (lack of "reach"). [...]
Seems with the latest leaks from GN/VCZ, the L3 is going to be the same 32 MB. [...]

Zen 4 was developed in parallel with Zen 3. With Zen 4, AMD has to allow the die to shrink significantly. They actually gave a margin target a few years ago; I can't remember the exact number, but I want to say it is 65%. For them to hit that target, the die would have to shrink significantly, so I suspect that the L3 cache won't grow at all. I guess we will see. Not sure about cache stacking, TBH.

EDIT: I want to add that I don't think a 65% margin is greedy. I think it is the norm for most high performance companies. A 65% margin allows a company to have a healthy R&D budget and lower margin products elsewhere. I bought a 5950X knowing that AMD had a > 65% margin on my purchase. I am okay with that. Intel's margins are around 65-70%.
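
For reference, the arithmetic behind a gross-margin figure like that:

```python
# Gross margin = (price - cost) / price, so price = cost / (1 - margin).
def price_at_margin(unit_cost: float, margin: float) -> float:
    return unit_cost / (1.0 - margin)

# A part costing $100 to make needs to sell at ~$286 for a 65% margin.
print(f"${price_at_margin(100, 0.65):.2f}")
```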
 

jamescox

Senior member
Nov 11, 2009
It sounds like people assume interposers must be mostly logic-free. I don't buy that notion. You probably don't want to stack over any logic on the interposer, but it doesn't mean it has to be some kind of dumb bridge.
I think most of the interposers we have seen so far (HBM GPUs) are passive interposers; dumb bridges. They are built on an older process, since the TSV pitch is something like 50 microns. They do not have any logic, since there are no transistor layers; they just do TSVs and then metal interconnect layers on top. This allows much denser interconnect than the PCB, but the geometries are still very large compared to more modern processes.

If the interposer has transistor layers then it is an active interposer with some logic. I don’t know if anyone is using what would be described as an active interposer yet, but I may have missed it. I have expected AMD to use an active interposer for the IO die, but it is unclear what form that will take. There are a lot of different stacking technologies available from TSMC. The tech used for the cache chips allows for much finer pitch TSVs, possibly an order of magnitude smaller than the older micro-solder ball stacking. This tech and the other stacking technologies from TSMC could enable all manner of different things, so it is hard to tell what direction they will go.
 

moinmoin

Diamond Member
Jun 1, 2017
So far AMD has only introduced packaging complexity when it directly allows for scaling out the design. So I'm expecting them to continue with relatively "simple" designs that strike a good balance between fewest possible distinct smallish die designs, lowest possible packaging complexity and highest possible scalability.
 

blckgrffn

Diamond Member
May 1, 2003
So far AMD has only introduced packaging complexity when it directly allows for scaling out the design. So I'm expecting them to continue with relatively "simple" designs that strike a good balance between fewest possible distinct smallish die designs, lowest possible packaging complexity and highest possible scalability.

Pragmatism FTW.

This should almost always be the answer for every business but it’s so easy to lose sight of.

My business partner and I are constantly reminding each other that simple is good, and that working really hard to solve some niche scenario or problem we are facing or trying to prevent is time wasted that we could be using to solve the real problems. The most profitable systems aren't the ones that solve for every use case.
 

MadRat

Lifer
Oct 14, 1999
I'd think that if HBM is coming to the CPU, it's coming on a standalone interposer for something disproportionately larger than anything else. It should be in GB rather than MB at that point; anything less becomes an overly complicated buffer, not worth the cost. No matter how much bandwidth the HBM provides within the chip, it's going to be only a subset of CPU bandwidth, so you can afford to create rows of HBM. This suggests HBM wouldn't come anytime soon as a next step for consumer chips, but rather be squarely aimed at EPYC customers, where the rectangular shape is necessitated.

On the move to 6nm, I think you're going to see a slight increase in core count per chip. They could move to 10, 12, 14, or 16 per die with scaling they've already been working on. They've created an architecture that can be scaled as needed. I know everyone likes iterations in powers of two, but you are no longer bound to filling out groups in powers of 2 on AMD's architecture. They can turn cores on and off at will, and with the Infinity Fabric architecture it's simple to accomplish these incremental increases into the future.

The layout of the mockups would probably look a bit different IMHO, knowing the two factors above. I could be wrong. But it seems to be evolving in that direction.
 

maddie

Diamond Member
Jul 18, 2010
Pragmatism FTW. [...]
Pareto?
 

CakeMonster

Golden Member
Nov 22, 2012
I'd love for more cores to be added to the CCDs, but if you're talking about 6nm, that's not Zen 4, which is 5nm, right? So that would be a still-unknown Zen 3 refresh, but we all assume that to be the one with more cache added, so... would there be two Zen 3 refreshes, one with more cache and one with more cores? Or would the one with more cache also be on 6nm (seems unlikely since they're already talking about measurements being similar to Zen 3)?