Discussion Intel current and future Lakes & Rapids thread

repoman27 · Apr 20, 2021

IntelUser2000 said:
Alderlake-M is 1+4 with 64EUs, not 2+8 with 96EUs.

ADL-M includes U9 which is 2+8+2. The M5 SKUs top out at 1+4 with 64EUs. This is essentially an updated version of Lakefield, which makes sense. But is there any evidence that Intel is making a separate die for these SKUs? Or is this just 2+8+2 with half the cores and 1/3 to 1/2 of the GPU disabled?

jpiniero · Apr 20, 2021

repoman27 said:
ADL-M includes U9 which is 2+8+2. The M5 SKUs top out at 1+4 with 64EUs. This is essentially an updated version of Lakefield, which makes sense. But is there any evidence that Intel is making a separate die for these SKUs? Or is this just 2+8+2 with half the cores and 1/3 to 1/2 of the GPU disabled?

The P die is gonna be huge. You're adding 2 more big cores (and the Big cores are going to be way bigger) plus the two new small core clusters. Might be too big for the "Y" packaging.

TESKATLIPOKA · Apr 20, 2021

One thing I don't understand is, why both U9 and U15 have only 2+8 version and there is no 4+8 version for U15. This one will compete against 5600U in MT at best while AMD still has the faster 5800U.

Hitman928 · Apr 20, 2021

IntelUser2000 said:
There's a decently sized thread in Notebookreview forums where this one guy goes through all the tips on increasing battery life.

Here's the link to the OP getting 1.0W on his Y CPU under these conditions:

forum.notebookreview.com

That's a Kabylake-Y. He says his friend's Broadwell-Y setup can idle at 0.2W package power, and that's under ThrottleStop, which will have some activity, never mind being unable to completely control a real-time OS like Windows.

Did you look to see what they do to get there? Manual undervolting, disabling controllers, turning off all Windows effects, registry hacks, turning off Turbo Boost, forcing worse CPU responsiveness to keep it in deep C-states. They are trying to force their CPU to run below base clocks as much as possible. They practically turn their laptops into a glorified Kindle reader. Their system responsiveness is going to be terrible and honestly unacceptable for most people buying a modern laptop.

I also very much doubt they can actually keep their CPU power consumption as low as they claim if they're actually using it normally. Having 4 tabs already open doesn't mean anything if nothing is happening in the web pages. Once they're open there's nothing to process unless there's an ad or something interactive going on in the page, which I highly doubt there is. The YouTube video playing in the background is impressive, but what video, at what resolution? I'd be really shocked if it's an actual HD video playing.

I'd also like to see an actual power consumption chart over time, as they are basically just picking out an instant in time to report power consumption. When they actually start loading web pages or their video is processing/buffering, I guarantee their consumption increases.

If they can run any actual benchmark or real activity on that laptop without terrible performance and keep consumption that low, let me know. I guarantee you this is not representative of how Intel would configure a reference system to give to a reviewer. Alderlake would be seen as a completely failed product if they tried.

jpiniero · Apr 20, 2021

TESKATLIPOKA said:
One thing I don't understand is, why both U9 and U15 have only 2+8 version and there is no 4+8 version for U15. This one will go against 5600U in MT while AMD still has 5800U.

Yields I'm guessing. OEMs do have the option of going down to 20 W on U28 and could probably go to 15 anyway.

The U15 models would presumably be tighter quality binned and have higher turbo frequencies.

repoman27 · Apr 20, 2021

TESKATLIPOKA said:
Guys, I don't have a problem with Intel releasing a separate die with fewer cores, because as you mentioned it's more cost-effective, etc.
I just don't understand why they want to make a separate die with only big cores and no small cores. Why don't they use some mobile version, or make a 4+8 32EU variant instead if the IGP is too big? I expect that 4 small cores are not much bigger than a single big core, the same as with Lakefield.
The only possibility I see is that 6 cores would perform better in 6-threaded apps than a 4+8 version, where I expect a small core to have ~1/2 of a big core's performance.

Avoiding the hybrid design and only having a single ring bus instead of dual is probably a lot simpler. This die may have been the starting point for Alder Lake and provides a way to evaluate Golden Cove cores in a traditional homogeneous scenario. The mobile versions also include integrated Thunderbolt 4 and GT2 graphics. The OEMs that cater to the high-volume, price-sensitive segment of the desktop market aren't willing to pay for integrated Thunderbolt or 96 EUs.

The mobile market is where the money is. Entry level desktop chips are the bottom of the barrel for Intel. Think of 6+0+1 as being the 6+8+2 mobile die (the largest and most expensive Alder Lake die) with everything except for the 6 big cores and 32 EUs ripped out, with no real worries about frequency scaling or leakage, in a simple flip-chip LGA package. These products aren't about benchmarks or performance, they're all about trying to preserve margins while hitting a price-point.

repoman27 · Apr 20, 2021

TESKATLIPOKA said:
One thing I don't understand is, why both U9 and U15 have only 2+8 version and there is no 4+8 version for U15. This one will compete against 5600U in MT at best while AMD still has the faster 5800U.

I would imagine it's because 4 Golden Cove cores require a TDP greater than 15 W.

repoman27 · Apr 20, 2021

jpiniero said:
The P die is gonna be huge. You're adding 2 more big cores (and the Big cores are going to be way bigger) plus the two new small core clusters. Might be too big for the "Y" packaging.

This is why I posted my cheat sheet in the first place. There isn't really a "P" die or "Y" packaging. There are two LP dies (2+8+2 and 6+8+2), two HP dies (6+0+1 and 8+8+1), and four packages (M, P, S BGA, and S). M and P are multi-chip packages including the PCH, while S BGA and S are "2-chip" platforms with separate packages for the CPU and PCH. As far as we know, all of the packages are traditional flip-chip using an organic substrate for fan-out, i.e. the size of the ball/land grid is the limiting factor, not the dimensions of the silicon dies.

There may actually be three PCH dies (M, P, and S) rather than just two (LP and H). I think Ice Lake may have had three as well (N, LP, and H). Unfortunately, most of the info regarding the Alder Lake PCHs and PCIe lanes at this point is coming from the coreboot project, and I'm not so sure the people reporting on these tidbits are correctly interpreting what they're finding there.

jpiniero · Apr 20, 2021

repoman27 said:
the size of the ball/land grid is the limiting factor, not the dimensions of the silicon dies.

Go look at the Tiger Lake Y/UP4 package. It's a tight fit as it is and we're talking about a P die that's going to be 30-40 mm² bigger. It's not a small amount. The cost of the die size is I'm sure a factor but if you want to reuse the UP4 package or something close to it, the P die isn't fitting.

Shivansps · Apr 20, 2021

Magic Carpet said:
It’s interesting to see how fast the Windows scheduler adapts to these newer CPUs. I know that it took a Windows release for AMD Thuban to properly boost its cores (it didn’t work as intended in Windows 7 OOTB). It shouldn’t take long, I’d imagine, since we’re on a faster cycle now, but it’s still interesting.

Windows already supports big.LITTLE because it runs on ARM CPUs like the 8cx, which has a 4+4 configuration, so I'm pretty sure that's already in. It will be a matter of properly identifying the x86-64 CPU as big+small, so I expect day 1 support.

The one thing that may cause issues is HT. I haven't read anything about that: is Alder Lake going to have HT on the big cores? If so, you might run into something unknown, namely HT vs small cores.

jpiniero · Apr 20, 2021

Shivansps said:
Is Alder Lake going to have HT on the big cores?

Yes.

repoman27 · Apr 20, 2021

jpiniero said:
Go look at the Tiger Lake Y/UP4 package. It's a tight fit as it is and we're talking about a P die that's going to be 30-40 mm² bigger. It's not a small amount. The cost of the die size is I'm sure a factor but if you want to reuse the UP4 package or something close to it, the P die isn't fitting.

I think we're mostly agreeing here, but I'm not entirely sure...

P is a 50 mm x 25 mm BGA1744 package, very similar to previous U/Type 3/UP3 packages. Both the 6+8+2 LP and 2+8+2 LP dies fit on it.

M is BGA1781 but ostensibly a smaller package using tighter ball-pitch, similar to previous Y/Type 4/UP4 packages. It can accommodate the 2+8+2 LP die.

The 6+8+2 LP die is only found in 28 and 45 W TDP SKUs with a minimum cTDP-down of 20 W. Y/Type 4/UP4/M is first and foremost for extremely low power platforms between 4.5 W and 9 W with a maximum cTDP-up of 15 W. 6+8+2 doesn't fit under 20 W, so it's right out, regardless of the die size.

Intel can make an FCBGA package any size they need to in order to accommodate the dies, power delivery, and I/O. I don't believe M, even M5, is going to be a chip-scale Foveros package with PoP DRAM like Lakefield was.
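The package and power argument above can be jotted down as a small lookup. This is only a sketch in Python built from figures quoted in this thread. The names `PACKAGE_DIES`, `DIE_MIN_TDP_W`, `M_MAX_TDP_W`, and `fits_m_power_envelope` are made up for illustration, and the 9 W figure for 2+8+2 comes from the U9 SKUs mentioned in the thread, not from any Intel document.

```python
# Dies each package is said to accommodate (per the post above).
PACKAGE_DIES = {
    "P": {"6+8+2", "2+8+2"},  # BGA1744, 50 mm x 25 mm
    "M": {"2+8+2"},           # BGA1781, tighter ball pitch
}

# Lowest TDP in watts each die is said to reach in this thread.
# Assumption: 9 W for 2+8+2 is taken from the U9 SKUs, 20 W is the
# minimum cTDP-down quoted for 6+8+2.
DIE_MIN_TDP_W = {"6+8+2": 20.0, "2+8+2": 9.0}

# Maximum cTDP-up quoted for the M (Y/Type 4/UP4) class of package.
M_MAX_TDP_W = 15.0


def fits_m_power_envelope(die: str) -> bool:
    """True if the die's minimum TDP is at or below M's maximum cTDP-up."""
    return DIE_MIN_TDP_W[die] <= M_MAX_TDP_W


for die in sorted(DIE_MIN_TDP_W):
    print(die, "fits M power envelope:", fits_m_power_envelope(die))
```

On these numbers 6+8+2 is excluded from M by power alone, before die area even comes into it.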

repoman27 · Apr 20, 2021

Shivansps said:
Windows already supports big.LITTLE because it runs on ARM CPUs like the 8cx, which has a 4+4 configuration, so I'm pretty sure that's already in. It will be a matter of properly identifying the x86-64 CPU as big+small, so I expect day 1 support.

The one thing that may cause issues is HT. I haven't read anything about that: is Alder Lake going to have HT on the big cores? If so, you might run into something unknown, namely HT vs small cores.

Doesn't Windows already support Lakefield just fine?

coercitiv · Apr 20, 2021

repoman27 said:
Doesn't Windows already support Lakefield just fine?

It does, although Lakefield sank into obscurity from day one, so nobody really bothered to analyze it further.

However a 1+4 configuration is hardly a good candidate to evaluate scheduler efficiency. I would expect the 6+8 die to be much more interesting, as the scheduler would arguably have more complex choices to make.

TESKATLIPOKA · Apr 20, 2021

repoman27 said:
I would imagine it's because 4 Golden Cove cores require a TDP greater than 15 W.

Intel has 2+8+2 within 9W TDP and 6+8+2 can be set to just 20W, so I don't see why 4+8+2 is not possible for 15W TDP.

Asterox · Apr 20, 2021

As a reminder, the first AMD desktop APU (Athlon 200GE) with a Vega 3 iGPU was launched in September 2018.

IntelUser2000 · Apr 20, 2021

Hitman928 said:
Did you look to see what they do to get there? Manual undervolting, disabling controllers, turning off all Windows effects, registry hacks, turning off Turbo Boost, forcing worse CPU responsiveness to keep it in deep C-states. They are trying to force their CPU to run below base clocks as much as possible. They practically turn their laptops into a glorified Kindle reader.

You haven't actually worked on a laptop per the guide. I did.

I don't go outright disabling all processes. But I know laptops out of the box come with a whole bunch of processes and software that ruin the power efficiency. A single misbehaving piece of software or hardware can prevent the CPU from going into lower power states.

Also, Haswell-U can support sleep states all the way down to C10, but in most actual implementations they don't go below C7 even on the Skylake generation. You are at the mercy of the device in that regard. If the implementation is good and the BIOS has support, potentially you can go really low. Some of the hacks are just to enable things that the manufacturer has disabled.

We're talking about the potential for a chip right?

TESKATLIPOKA said:
Guys, I don't have a problem with Intel releasing a separate die with fewer cores, because as you mentioned it's more cost-effective, etc.

Probably marketing. The rumor is that they'll change to "Hybrid Threading". So 6 cores without Hyperthreading means 6 threads, something fit for Core i3. So Core i3's go from 4/8 to 6/6.

You were right about Alderlake-M. It'll be interesting to see the package, if they are going to have Foveros on 5W or not. Lakefield was actually die size limited to get the package that small.
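The core/thread arithmetic behind "4/8 to 6/6" can be written out as a tiny helper. This is a sketch, not anything from Intel: `thread_count` is a hypothetical name, and it assumes that only the big cores can run two threads (as answered earlier in the thread) while each small core runs one.

```python
def thread_count(big_cores: int, small_cores: int, ht_on_big: bool = True) -> int:
    """Hardware threads for a big+small configuration.

    Assumption: big cores run 2 threads when HT is enabled, small cores
    always run 1 thread.
    """
    threads_per_big = 2 if ht_on_big else 1
    return big_cores * threads_per_big + small_cores


# Current-style Core i3: 4 big cores with HT.
print(thread_count(4, 0))                   # 4 cores / 8 threads
# Rumored 6+0 die with HT disabled.
print(thread_count(6, 0, ht_on_big=False))  # 6 cores / 6 threads
# 2+8 and 6+8 mobile configurations with HT on the big cores.
print(thread_count(2, 8), thread_count(6, 8))
```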

IntelUser2000 · Apr 20, 2021

TESKATLIPOKA said:
Intel has 2+8+2 within 9W TDP and 6+8+2 can be set to just 20W, so I don't see why 4+8+2 is not possible for 15W TDP.

Realistically at 9W, the TDP is low enough that it even limits single threaded performance. Look at Amberlake. If you look at every benchmark other than Geekbench, the 15W parts are 20-30% faster in single threaded applications.

You need 15W for full single-thread performance, and more than that for multi-threaded loads. 9W is probably just enough to mostly unleash the performance of the 8 Gracemont cores.

Again 28W is pushing it for 6+8. 45W makes much more sense.

TESKATLIPOKA · Apr 20, 2021

Of course performance will be limited thanks to the low TDP, but that's to be expected in this segment, and even 15W is not enough to unleash full single-core performance.
AMD has 8 big cores with a 15W TDP and no one cares that it has limited performance.

IntelUser2000 · Apr 20, 2021

Yeah, I mean, if you look at Tigerlake, 15W barely serves 4 cores. AMD has 8 because its cores are not only smaller but more power efficient too.

The decisions are probably arbitrary as well. Marketing, profits, performance all play a role. 8+8+2 is possible too, but will it make sense as a brand, perf/watt, and in revenue terms?

Unless they change the design paradigm, Golden Cove likely won't be more efficient in MT. The growth in transistors doesn't result in a corresponding increase in performance.

I wouldn't bet on ESF being as big of a gain as SF. The 10nm Icelake process sucked, so SF had a lot more potential. SF brought 20% gains. ESF, maybe 7-10% on top of that?
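For a sense of what "7-10% on top of that" adds up to, the two steps can be compounded. This is plain arithmetic on the percentages guessed at above, not measured data, and `compound_gain` is a hypothetical helper name.

```python
def compound_gain(*step_gains: float) -> float:
    """Total fractional gain from stacking several fractional gains."""
    total = 1.0
    for gain in step_gains:
        total *= 1.0 + gain
    return total - 1.0


SF_GAIN = 0.20                      # SF over the 10nm Icelake process, per the post
ESF_LOW, ESF_HIGH = 0.07, 0.10      # guessed ESF range on top of SF

low = compound_gain(SF_GAIN, ESF_LOW)
high = compound_gain(SF_GAIN, ESF_HIGH)
print(f"ESF vs Icelake 10nm: {low:.1%} to {high:.1%}")
```

Under those guesses ESF would land roughly 28% to 32% above the original Icelake process.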

Thala · Apr 20, 2021

TESKATLIPOKA said:
Of course performance will be limited thanks to the low TDP, but that's to be expected in this segment, and even 15W is not enough to unleash full single-core performance.
AMD has 8 big cores with a 15W TDP and no one cares that it has limited performance.

And almost all laptops with an 8-core Ryzen are configured for at least 25W, for good reason. I mean, 15W already limits the 4-core Tigerlake quite significantly.

Hitman928 · Apr 20, 2021

IntelUser2000 said:
We're talking about the potential for a chip right?

No, we're talking about supposed 'leaked' benchmark power consumption results, not a race to the bottom of who has the worst performing, yet least power consuming chip.

JoeRambo · Apr 21, 2021

Not sure if it was posted already, but it seems people have figured out the cache composition for Alder Lake (thanks to a lucky leak of a GB5 OpenCL bench that got scheduled on the small cores and revealed the structure).

https://twitter.com/x/status/1382239074560438273

24MB of L3 in 10 slices. Golden Cove has 1.25MB of L2, and each Gracemont cluster has 2MB of L2 shared across the whole cluster of 4 cores.

What is also interesting is the 64KB L1I cache for Gracemont. Intel probably realized that without a uop cache the L1I is a glass jaw for performance, and is increasing it from 32KB in Tremont to 64KB in Gracemont.
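The leaked figures can be turned into per-slice and per-core numbers with a few lines of arithmetic. This sketch assumes an 8+8 part, i.e. 8 big cores plus 2 small clusters, which would account for the 10 L3 slices; that mapping is an assumption, not something the leak states, and the variable names are made up for illustration.

```python
# Figures quoted from the leak above.
L3_TOTAL_MB = 24.0
L3_SLICES = 10
BIG_CORE_L2_MB = 1.25
SMALL_CLUSTER_L2_MB = 2.0
CORES_PER_SMALL_CLUSTER = 4

# Assumption: 8 big cores + 2 small clusters (one L3 slice per ring stop).
BIG_CORES = 8
SMALL_CLUSTERS = 2

l3_per_slice_mb = L3_TOTAL_MB / L3_SLICES
l2_total_mb = BIG_CORES * BIG_CORE_L2_MB + SMALL_CLUSTERS * SMALL_CLUSTER_L2_MB
l2_per_small_core_mb = SMALL_CLUSTER_L2_MB / CORES_PER_SMALL_CLUSTER

print(f"L3 per slice:         {l3_per_slice_mb:.2f} MB")
print(f"Total L2 on the chip: {l2_total_mb:.2f} MB")
print(f"L2 per small core:    {l2_per_small_core_mb:.2f} MB (if split evenly)")
```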

Ajay · Apr 21, 2021

Hmm, haven't followed the 'mont cores at all really, but why have both a shared L2$ and shared L3$ per four core cluster?
L2$ is probably inclusive and L3$ victim; sharing L2$ seems pretty old skool.

JoeRambo · Apr 21, 2021

Ajay said:
Hmm, haven't followed the 'mont cores at all really, but why have both a shared L2$ and shared L3$ per four core cluster?
L2$ is probably inclusive and L3$ victim; sharing L2$ seems pretty old skool.

L3 is shared across the whole chip and is most likely a victim cache, and sharing it with the rest of the chip is a good way to reduce memory misses. Even more so with a heterogeneous chip: you don't want a reschedule between clusters or onto a big core to turn into a 100% cold-cache scenario, and L3 will have some of the data at least.
Cluster-shared L2 increases the capacity available to every core at the cost of some latency (and some more advanced problems, like one memory-traffic-heavy core evicting lines that belong to the rest of the cores).
512KB of L2 private to each core would still be OK, but having 2MB shared at some 1X cycles of latency is even better.

I would not discount coherency traffic simplifications either: why have 4 agents when you can get away with 1, checking that inclusive L2 for everything the small core cluster holds?
