João Bortolace
Member
- Jan 12, 2021
You think they won't try to compete with the 64-core TR when they can? I mean, if only to have the halo product, to be able to claim they have the fastest CPU and whatnot. Surely they won't be able to say that about a 2-tile version.
Regarding HBM, I guess you are right. But do you assume then that it will be stacked on top of those tiles and that it's not part of them?
You think they won't try to compete with the 64-core TR when they can?
HBM2E stacks will be adjacent to the tiles and connected via EMIB. 64 GB would be four 16 GB stacks, probably one per tile. I'm guessing each tile will also have a pair of dual-channel DDR5 memory controllers, although a maximum of 8 channels will be exposed per module/package.
edit: I should also point out that Intel is shooting for a Q4'21 PRQ for Sapphire Rapids. That means starting the volume ramp in December for a Q2'22 launch. So I wouldn't expect retail availability of workstation/HEDT parts until the latter half of 2022.
The tiles are ridiculously large. There's a reason there have been rumors that Intel is keeping Ice Lake Server on the market and that SPR is only going to be the high-end product.
The tiles seem to have a 4x5 configuration, the same as my HCC Skylake, which is about 4 years old. I know it's a different node, but surely by now that's not considered "ridiculously large". What is the monolithic 40-core Ice Lake then? I thought the whole point of this tile approach was to lower costs thanks to better yields. If it's still considered large and expensive, it kind of misses its purpose, does it not? And why are they not making them smaller, 8-core or so, like AMD does, then?
jpiniero is "focused" on yields. The SPR tiles are around 420 mm², whereas ICL-SP XCC 40C is 19.5 mm x 32 mm = 624 mm². We actually don't know the mesh configuration at this juncture. People were looking at the micro-bump fields, which are on top of the metal layers, and somehow thought they were seeing something that related to FEOL features.
Looks like there's plenty of room on that substrate to me. They would directly abut the compute tiles and only measure 10 mm x 11 mm. For reference, the compute tiles are ~20.5 mm / side, and the package is 77.6 mm x 54 mm.
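To put the dimensions quoted in this thread side by side, here is a rough area check in Python. It is only a sketch: the counts of four compute tiles and four HBM2E stacks come from the guesses made earlier in the thread, and it compares raw areas only, ignoring placement, EMIB bridges, keep-out zones and anything else on the substrate.

```python
# Rough area arithmetic using only dimensions quoted in the thread (all in mm).
# Assumption: 4 compute tiles and 4 HBM2E stacks, as guessed in the thread.
COMPUTE_TILE_SIDE = 20.5      # "~20.5 mm / side"
HBM_STACK = (10.0, 11.0)      # "10 mm x 11 mm"
PACKAGE = (77.6, 54.0)        # "77.6 mm x 54 mm"
ICL_XCC = (19.5, 32.0)        # ICL-SP XCC 40C die
N_TILES = 4
N_STACKS = 4


def area(width_mm, height_mm):
    """Area of a rectangle in mm^2."""
    return width_mm * height_mm


tile_area = area(COMPUTE_TILE_SIDE, COMPUTE_TILE_SIDE)
hbm_area = area(*HBM_STACK)
package_area = area(*PACKAGE)
icl_area = area(*ICL_XCC)

total_compute = N_TILES * tile_area
total_hbm = N_STACKS * hbm_area
occupied_fraction = (total_compute + total_hbm) / package_area

print(f"one compute tile:  {tile_area:.1f} mm^2")
print(f"four compute tiles: {total_compute:.1f} mm^2")
print(f"four HBM2E stacks:  {total_hbm:.1f} mm^2")
print(f"ICL-SP XCC die:     {icl_area:.1f} mm^2")
print(f"package:            {package_area:.1f} mm^2")
print(f"tiles + HBM as a share of package area: {occupied_fraction:.1%}")
```

Under these assumptions the tiles and stacks together cover roughly half of the package footprint, which says nothing about whether the layout actually works, only that the raw area is not the limiting factor.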
Where do you reckon those adjacent HBM stacks would be in there?
I think it's like the original Epyc in how it's designed with the only difference being that they are connected via EMIB. So you always need 4 of them, and looking at the leaked picture it looks like at least 250 mm2 each.
250 mm2 would be less than Rocket Lake. Skylake-X is 484 mm2, almost 2x as big. It's not something never seen before.
They certainly aren't going on that package. I think HBM Sapphire Rapids is going to be the high-end stuff only, finally a real reason for Intel to charge an arm and a leg for Platinum CPUs.
@mikk, @eek2121
BTW to reiterate where I stand on Alder Lake. My estimates are:
- It will be faster (in all likelihood 10-20% faster) than Zen 3 in ST or lightly threaded productivity apps
- A top-of-the-line 8 + 8 Alder Lake will give the Ryzen 5900X a run for its money in MT performance; not sure how much faster/slower, but all-around in the same range. This means:
- 8 + 8 will be in a very similar ballpark, while an 8 + 0 config will certainly win in heavily parallel AVX-512-enabled workloads, but not in AVX2-or-below workloads that utilize more than 8 threads.
- It will not outperform a 5950X on average in MT loads. At least not in actual reviewer benchmark suites (say Tom's Hardware, Anandtech etc.)
- In gaming performance it should eke out a slight win over Rocket Lake and Zen 3, but the difference will not be large (up to 5% on average), as gaming is mostly sensitive to cache size and to cache and memory latency, and only to a lesser extent to IPC. The titles where Zen 3 is doing better than Rocket Lake (due to cache) will see the biggest gain going from 16 MB -> 24 MB of L3. All of this will only happen if all of the issues listed below are solved:
- There the gain can be considerably larger than what's stated above. They will need to fix the cache-latency issues that plague Rocket Lake for that to be the case, but that's most likely not a problem, as the latter was hampered by a backport to 14 nm.
- The Windows scheduler will also need updates so that it does not cause issues in games that scale to > 8 threads. This can be worked around by running in an 8 + 0 config.
- For clear gaming wins, either DDR4-based mobos (for best OC latency in the beginning) or very good early DDR5 XMP module availability is required.
Let's revisit this after reviews to see how terribly AMD-biased an Intel hater I end up being in the end.
I still remember people being angry at me for claiming that Rocket Lake might not significantly outperform Zen 3 in gaming (due to the aforementioned cache and memory limitations, which certainly ended up being true).
The only reason Epyc 1 always consisted of 4 dies is that AMD positioned the SP3 socket as a platform that always offers 8 memory channels, regardless of whether a top-end or bottom-end chip is used (and every die has 2 channels). Intel, on the other hand, has never been shy about introducing segmentation within a platform.
That's on 14 nm though. If the tiles are closer to 420 mm2, that's over 1600 mm2 of 10 nm for one chip.
Yeah, it's bananas, when you think about it. When people talk about HBM2E being expensive, I don't think they're fully considering how expensive these chips are going to be. A 16 GB HBM2E stack costs what, $160?
We will see. For the record, I never had faith in Rocket Lake. The reason I think your thinking is flawed is that you aren't taking into consideration the fact that ADL-S will likely be able to sustain much higher nT boost clocks (compared to Rocket Lake) thanks to lower power consumption and improved thermals. I personally expect multicore workloads to be drastically improved over Rocket Lake. Also, DDR5 will double memory bandwidth, which will also show itself in interesting ways (assuming it launches with DDR5).
I suspect it will beat the 5950X by a good 10-15%.
And it's also a much wider core than each Willow Cove core as well. Personally, I don't expect higher clocks on each WLC core within limited power budgets.
As for with unlimited power? Who knows, but 4.7GHz all core is a high bar to beat.
Agreed. For Alder Lake I'm banking on roughly a 4.5 GHz all-core boost (for higher-end SKUs), but with a 20% IPC boost.
Intel generally doesn't lower all-core boost much from generation to generation. I think they'll end up in the 4.6 GHz to 4.7 GHz range for the top Alder Lake all-core turbo. The TDP is based on base clocks, and from that I think the top Alder Lake will have base clocks in the ~3.0 GHz region, maybe a hair higher. I'd be happy with a 20% IPC boost, but I'll plan more on 15%.
God, your messages are awfully depressing now. I get it, Intel has been mis-executing and not living up to expectations, but that doesn't mean they 'shouldn't be invited'. Intel's Hot Chips program looks to be exciting, it's their latest and greatest; you want them to show Tiger Lake again? 🤷
- Is that HBM meant to serve as some kind of L4 cache, or is it supposed to serve as actual system RAM? 64 GB of system RAM for servers seems fairly low... and pricing-wise I can't see that much RAM being part of a consumer-grade product... so will the HEDT part not have it? Anyway, it would surely provide a massive performance boost by itself, would it not?
Still, I wonder about that. Tiger Lake-H actually scores very similarly to Rocket Lake-S with both limited to 35-45W according to both sets of scores from early Chinese reviews we saw earlier, so they're obviously holding similar clocks. In one review RKL-S wins, in the other TGL-H wins, both by small margins though (of course, they're not the same tests, R15 vs R20).
What's your opinion about Notebookcheck's coverage?
I will consider it a very good product if an 8+8 AL can beat Ryzen in single-thread and match or exceed the 5900X in multi-threaded. I know this forum is all about performance, but really I consider the 5900X the top-of-the-line "mainstream" AMD product. The 5950X is more of a halo product. Hopefully, the top-of-the-line AL will also be significantly cheaper than the $800 (if you can find one at MSRP) 5950X.
What's your point?
Mine is that they should leave Hot Chips for something actually meaningful. Do you seriously not remember Tiger Lake at Hot Chips? They should have given us the official die pic, or the breakdown, or how it clocked higher (aside from that it just does), or the transistor count, or details on the caches. Do I need to say more?
It won't be cache, if they decide to use HBM on HEDT parts, which I have some doubt on.
HBM will simply act as fast memory. Go look up Knights Landing.
Alder Lake: I don't think Alder Lake will do more than beat the 5900X slightly. It might have a chance to be good in notebooks. Intel will need more Golden Cove or Gracemont cores to beat AMD's top line.
They'll potentially be in a far better situation than today.
Warning: I'm going to be making a lot of assumptions here:
Assumptions:
Zen 2 IPC = Skylake IPC
Zen 3 IPC = 1.20 Zen 2 IPC
Sunny Cove IPC = 1.20 Skylake IPC
Willow Cove IPC = 1.05 Sunny Cove IPC
Golden Cove IPC = 1.20 Willow Cove IPC
Gracemont IPC = Skylake IPC
Golden Cove (ADL) All-core = 4.6GHz
Gracemont (ADL) All-core = 3.5 GHz
Zen 3 (5900X) All-core = 4.1 GHz
Zen 3 (5950X) All-core = 3.75 GHz
Alder Lake has a hardware scheduler, or the Windows scheduler is fixed and optimised for heterogeneous cores. The scheduler does well at balancing Golden Cove and Gracemont Atom cores. (Biggest assumption right now, in my opinion.)
Single-threading:
ADL 12900K: 1.20 × 1.05 × 1.00 = 1.260
Zen 3 5900X: 1.20 × 0.96 = 1.152
Zen 3 5950X: 1.20 × 0.98 = 1.176
Multi-threaded:
ADL 12900K: 1.20 × 1.05 × 0.92 × 8 + 0.7 × 8 = 14.87
Zen 3 5900X: 1.20 × 0.82 × 12 = 11.81
Zen 3 5950X: 1.20 × 0.75 × 16 = 14.40
All calculated numbers are relative to a fictional Zen 2/Skylake core at 5 GHz.
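The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same crude model (IPC × clock ratio × core count, normalised to a 5 GHz baseline core), not a performance prediction. The Golden Cove factor is taken as 1.20 × 1.05, exactly as multiplied in the Alder Lake lines above; chaining all three generational assumptions instead would give 1.20 × 1.05 × 1.20 = 1.512. The single-thread clocks of 5.0, 4.8 and 4.9 GHz are inferred from the stated results and are an assumption of this sketch, as are the function and variable names.

```python
# Crude relative-throughput model: score = IPC x (clock / 5 GHz) x cores.
# All factors are taken from the post; single-thread clocks are inferred.
BASELINE_GHZ = 5.0  # fictional Zen 2/Skylake reference core


def core_score(ipc, clock_ghz, cores=1):
    """Throughput of `cores` identical cores relative to one baseline core."""
    return ipc * (clock_ghz / BASELINE_GHZ) * cores


GOLDEN_COVE_IPC = 1.20 * 1.05   # as multiplied in the post's Alder Lake lines
GRACEMONT_IPC = 1.00            # "Gracemont IPC = Skylake IPC"
ZEN3_IPC = 1.20                 # "Zen 3 IPC = 1.20 Zen 2 IPC"

single_thread = {
    "ADL 12900K": core_score(GOLDEN_COVE_IPC, 5.0),
    "Zen 3 5900X": core_score(ZEN3_IPC, 4.8),
    "Zen 3 5950X": core_score(ZEN3_IPC, 4.9),
}

multi_thread = {
    # 8 Golden Cove cores at 4.6 GHz plus 8 Gracemont cores at 3.5 GHz
    "ADL 12900K": core_score(GOLDEN_COVE_IPC, 4.6, 8)
    + core_score(GRACEMONT_IPC, 3.5, 8),
    "Zen 3 5900X": core_score(ZEN3_IPC, 4.1, 12),
    "Zen 3 5950X": core_score(ZEN3_IPC, 3.75, 16),
}

for name in single_thread:
    print(f"{name}: ST {single_thread[name]:.3f}, MT {multi_thread[name]:.2f}")
```

Note that the model ignores SMT, power limits and scheduling overhead entirely, so changing any single factor (for example the Gracemont IPC or the all-core clocks) shifts the ranking easily.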
By these very crude estimates, Alder Lake might be able to beat Vermeer in both single threaded and multi threaded workloads/benchmarks.
Of course, Zen 4 will be out in 2022 eventually, but that's competing with Raptor Lake, not Alder Lake.
I take it you're assuming sustained PL2 for Alder Lake and stock 142W PPT for the 5950X there and using those numbers to compare? Because you certainly won't see Alder Lake clock at 4.6 GHz all-core at its 125W PL1.
Also, I don't think you've factored Gracemont's lack of SMT into the equation here.
Also, Tiger Lake doesn't seem to show a 5% IPC advantage over Ice Lake. Intel claim the same 18% as Ice Lake, and Anandtech found a small regression in performance per clock.