Hi guys, since STX1 should be moving to a chiplet design, there has been some discussion about which layout AMD will employ. I have listed 3 designs with pros and cons.
Cyan indicates the N3E process, orange the N4 process. AMD should use 2 MCDs, each containing 16MB of Infinity Cache and a 2 x 32-bit LPDDR5X-8533 memory controller.
I created 2 versions of RDNA3+ because there were rumors of 24 CUs with 1536 SPs. That would put STX1 at around 18 TFLOPS. I have my doubts, but Intel is rumored to have a 320 EU part with 2560 ALUs in the works, so AMD may need to respond.
See for yourself and let me know which one you think AMD will employ. If you have other ideas, do let me know...
Oh yeah, I used PHX1 as the scale baseline: a 178mm2 die. N3E no doubt has higher density, but STX has 8 Zen 5 cores and 4 Zen 4c cores (plus 8MB of L3 cache?), RDNA3+, and other improvements. For the sake of comparison, let's stick with 178mm2.
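The two headline numbers above are easy to sanity-check with back-of-envelope math. In this sketch the SP count and the 2 x 32-bit LPDDR5X-8533 configuration come from the post, while the 3.0 GHz GPU clock is purely my own assumption:

```python
# Back-of-envelope checks for the numbers above. The SP count and the
# 2 x 32-bit LPDDR5X-8533 config are from the post; the 3.0 GHz GPU
# clock is purely an assumption for illustration.

def gpu_tflops(shaders: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS. RDNA3-style SPs are dual-issue (2 ALUs per SP),
    and each ALU can retire one FMA (2 FLOPs) per clock."""
    alus = shaders * 2
    return alus * 2 * clock_ghz / 1000.0

def mem_bandwidth_gbs(mt_per_s: int, bus_bits: int) -> float:
    """Peak DRAM bandwidth in GB/s from transfer rate and bus width."""
    return mt_per_s * (bus_bits / 8) / 1000.0

print(gpu_tflops(1536, 3.0))          # ~18.4 TFLOPS, in line with the rumor
print(mem_bandwidth_gbs(8533, 128))   # ~136.5 GB/s for 2 MCDs x 64-bit
```

At ~3 GHz the 1536 SP configuration lands right at the rumored ~18 TFLOPS, and the combined 128-bit LPDDR5X-8533 bus gives roughly 136 GB/s.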
RDNA3+ 768SP (1536 ALU) / RDNA3+ 1536SP (3072 ALU) - Pros and Cons
- Derived from desktop 7000 CPU
- CCD contains CPU with L3 cache
- IOD contains graphics, AIE and FCH
- Small CCD saves cost
- IOD can get pretty big with more features, e.g. doubling the graphics SP count
- Diagram leaked by RGT
- CCD contains CPU + IOD
- GCD contains graphics only
- CCD die is comparatively big
- GCD can come in multiple sizes
- AMD could swap the GCD depending on the competition
- A GCD with 1536SP will draw more power; my estimate would be an additional 10-15W
- That's why AMD would create a monolithic version of STX2 to cater to the ultraportable market
- Wild speculation: in 2025, STX+ with RDNA4 and 1536SP
- Half the SP count of N43
- On N3E, the GCD should consume less power than the STX version
- All cores in one die, with external cache + memory controller dies
- Almost like a 3nm version of the M2 Pro, with Infinity Cache
- BOM will be the highest
- Lack of flexibility
"There is going to be too much capacity. TSMC did not actually start F12P9, which was meant for Intel, only F12P8. Intel has cut orders. Plenty of N3 capacity will be there by the beginning of 2024: F18 P5/6/7 are for N3, and F18P4 was also running N3 during risk production in 2022. By 2H24, Fab18P8 will be online. No more expansion is planned for N3, and TSMC is actually starting to build N2 fabs now; they already have clearance to build behind Fab12P9 in Hsinchu."

F12P8 was for Intel, I thought...
Among them, P5 of Fab 18B at the Southern Taiwan Science Park is mainly for the second-generation processors of Apple's next-generation tablets and MacBook notebooks, while the new eighth phase (P8) of the Fab 12 R&D center in Hsinchu is for Intel, where TSMC will produce the supporting chips that go alongside Intel's core processors.
F12P8 and F12P9 are meant for Intel. TSMC did not start F12P9.
They already said N3 and N4 for Zen 5. Also, on LinkedIn some engineers mentioned 64 Gbps GMI on N3.
But 3D stacking would be more mature on N4, I'd guess, at the beginning of 2024.
"I know what was said."

That is assuming you know all there is to know and are ready to share. Otherwise there is not much to argue about until official disclosure.
This also tracks well with the recent report of TSMC seeking to woo its big customers into transitioning to the N3 family of nodes faster, allegedly even pitching price cuts/discounts.
"So my thought was that the only die-to-die connections are from the IOD-fast to each other die, so that would take care of the big-ticket items like feeding the CPU and GPU. Then the IOD-slow would only have to support enough bandwidth to run whatever IO you put on there — namely, stuff like PCIe, USB4, and a few other bits and pieces."

There is no "slow" IO die bit for Epyc. With 128 PCIe 5.0 lanes, that can be more than 500 GB/s of bandwidth, so it is similar to the memory interfaces. With CXL, it may actually be memory, so latency is important also.
I think the big question would actually be the GPU. It's likely to be the most bandwidth-intensive die, so that link might need special treatment. Perhaps IFOP for the CPU/IOD-slow and "Infinity Fanout Links" or EFB for the GPU?
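The 500 GB/s claim for 128 lanes is straightforward to verify; the 32 GT/s line rate and 128b/130b encoding below are standard PCIe 5.0 spec values:

```python
# Rough check of the ">500 GB/s from 128 PCIe 5.0 lanes" claim.
# 32 GT/s per lane and 128b/130b encoding are PCIe 5.0 spec values.

def pcie_gb_per_s(lanes: int, gt_per_s: float = 32.0) -> float:
    """Per-direction bandwidth in GB/s, net of 128b/130b encoding."""
    return lanes * gt_per_s * (128 / 130) / 8

print(pcie_gb_per_s(128))   # ~504 GB/s each direction
```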
Another reason I believe Zen 5 doubles the per-CCD core count: Turin will double from 96 cores to 192 cores in the same SP5 socket.
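The arithmetic only works if something doubles. In this sketch the 12-CCD package is Genoa's known layout; carrying it over to Turin and the 16-core CCD are my assumptions:

```python
# Genoa today: 12 CCDs x 8 Zen 4 cores = 96 cores. If the package keeps
# 12 CCDs (an assumption), reaching 192 cores in the same SP5 socket
# requires 16-core Zen 5 CCDs.
ccds = 12
genoa_cores = ccds * 8
turin_cores_guess = ccds * 16
print(genoa_cores, turin_cores_guess)   # 96 192
```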
"At the very least N5 --> N3E should yield as much perf as the N7 --> N5 transition if not better."

Even taking it at face value, that comparison seems to be against base N5. Comparing N3E to N5P or N4P would show a much smaller gap.
"Zen 4 CCD has an abysmally low density for N5 at ~93 MTr/mm2. The thermal hotspot also is a constraint."

I doubt that's something that would change much on a new node. With SRAM density near constant, you're not going to get any relief there, and the high-speed logic you need to hit ~6GHz is going to be quite low density. You'd only expect a change there if AMD backed off its frequency targets significantly.
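The ~93 MTr/mm2 figure is consistent with the commonly reported Zen 4 CCD numbers; both inputs below are from public reporting, not official AMD disclosures:

```python
# The inputs are commonly reported figures, not official AMD specs:
# ~6.57B transistors and ~70.7 mm^2 for the Zen 4 CCD.
transistors_millions = 6570
die_area_mm2 = 70.7
density = transistors_millions / die_area_mm2
print(round(density))   # ~93 MTr/mm^2, matching the quoted figure
```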
"I wouldn’t mind a frequency regression if it meant improved efficiency, but I doubt AMD will willingly drop frequencies. Many people still look at clock speed to this day as an indicator of performance."

I think clock speed can also tell us a bit about how "ground-up" the new architecture is. For an iterative design, at least, you don't want to budge your cycle times much, because that means re-tuning all those critical timing loops you spent so long refining.
"I don't see any advantage in having CPU+IOD in a single 3nm die, but IGP using a separate die."

It limits their flexibility, but combining the CPU and IOD (or at least the memory controller) would make some sense from a PnP perspective. Will be very, very curious to see where they draw the line, if at all. If Strix is still on N4/P, then I think they'd probably leave it monolithic.
Of the designs you made, the first or the last is most likely.
Yeah, that idea was just meant for client. For the datacenter chips, I think if they did a multi-chip IO die, they'd just chunk it up into 2 or 4 mirrored pieces, with something fast like EFB in between.
Someone in the Zen 4 thread had this idea (forgot who it was, but all credit goes to them)
If Zen 5c is indeed on an N3 node, that explains how AMD solves the die-size problem for Turin's Bergamo successor. I was expecting 32 Zen 5c cores per CCD, but the die area seemed like it would be too big on N4. So with N3, 32 * 8 = 256 cores seems possible... not sure about stacked cache though.
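A toy area model shows why 32 cores per CCD looks tight on N4 but plausible on N3. Every number in it (the per-core size, the fixed L3/fabric budget, the 0.7x logic scaling) is my own illustrative assumption, not a leak:

```python
# Toy CCD-area model. Every input is an illustrative assumption:
# ~2.5 mm^2 per Zen 4c-class core (incl. L2) on N4-class silicon,
# ~0.7x logic scaling moving to N3, and a fixed budget for L3 +
# fabric/PHY that barely shrinks because SRAM and IO scale poorly.

def ccd_area_mm2(cores: int, core_mm2: float, fixed_mm2: float) -> float:
    return cores * core_mm2 + fixed_mm2

n4_area = ccd_area_mm2(32, 2.5, 22.0)          # uncomfortably big on N4
n3_area = ccd_area_mm2(32, 2.5 * 0.7, 22.0)    # more plausible on N3
print(round(n4_area), round(n3_area), 32 * 8)  # 102 78 256
```

Under these assumptions the N3 version drops back into typical CCD territory, which is the poster's point about why N3 makes the 256-core part possible.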
Seems I keep forgetting what I posted on the first page. From these slides, it would seem desktop Zen 5 will be on N4, with Zen 5c and Strix ("Advanced Node") on N3.
Zen 5 and Zen 5 3D V-Cache would make sense on N4; the 3D stacking tech is probably not ready on N3.
I forgot that they said EPYC will have XDNA.
"No it doesn't. The process used gains minimally in density, and the uarch improves, which means the core size will at best be equal, if not bigger."

Of course, I know. Please refer to the front page of this thread for a better idea of how AMD might put 16 cores per CCD on N4 within the same die area...
"The V-cache die is not the limitation. The bonding process is. I read somewhere that TSMC currently has a production limit of about 30K V-cache CPUs per month. Has there been any progress in increasing that rate?"

Didn't know about the bonding issues until now. But I would expect this to be resolved as long as there is demand.
"...then 600K CPUs per quarter are available for Intel and AMD to pursue."

Not sure if Intel can have access to V-cache. I think it's more likely that AMD has reserved all V-cache capacity at TSMC's fabs for the next few years. Depending on their deal, it could even be perpetually exclusive, where only AMD gets to enjoy V-cache, because maybe they had a big hand in helping TSMC develop it.
Nah, I doubt that. My understanding is that the tools they're using for hybrid bonding come from one supplier, and so far they've basically just been shipping trial systems unsuited to volume production. AMD's probably been the guinea pig for it. This year or next, they should get the proper equipment, and things should really ramp up.
Don't you find it weird that no one has V-cache on their roadmap other than AMD?
Not really, in light of what I just said. TSMC needed something they could make in fairly low volume, and the customer couldn't be dependent on the tech this early in its life cycle. AMD fit the bill. I'm sure in a year or two we'll start to see broader adoption.
I don't think that necessarily follows. It's a reasonable assumption, but AMD's clearly trying not to publicly tie any core or product to a given node. The "Advanced Node" label for Strix is them deliberately withholding that info for now. Tbh, it wouldn't be surprising if Strix uses N4P and the server parts use N3.