Discussion Intel current and future Lakes & Rapids thread

Page 414

TESKATLIPOKA

Golden Member
May 1, 2020
1,295
1,551
106
One thing I don't understand is why both U9 and U15 have only a 2+8 version and there is no 4+8 version for U15. This one will compete against the 5600U in MT at best, while AMD still has the faster 5800U.
 
Last edited:

Hitman928

Diamond Member
Apr 15, 2012
4,396
5,863
136
There's a decently sized thread in Notebookreview forums where this one guy goes through all the tips on increasing battery life.

Here's the link to the OP getting 1.0W on his Y CPU under these conditions:


That's a Kaby Lake-Y. He says his friend's Broadwell-Y setup can idle at 0.2W package power, and that's under ThrottleStop, which will have some activity, never mind being unable to completely control a real-time OS like Windows.
Did you look to see what they do to get there? Manual undervolting, disabling controllers, turning off all Windows effects, registry hacks, turning off Turbo Boost, forcing worse CPU responsiveness to keep it in deep C-states. They are trying to force their CPU to run below base clocks as much as possible. They practically turn their laptops into a glorified Kindle reader. Their system responsiveness is going to be terrible and honestly unacceptable for most people buying a modern laptop. I also very much doubt they can actually keep their CPU power consumption as low as they claim if actually using it normally. Having 4 tabs already open doesn't mean anything if there is nothing happening in the web pages; once they're open, there's nothing to process unless there's an ad or something interactive going on in the page, which I highly doubt there is. The YouTube video playing in the background is impressive, but what video, at what resolution? I'd be really shocked if it's an actual HD video playing. I'd also like to see an actual power consumption chart over time, as they are basically just picking out an instant in time to report power consumption, but when they actually start loading web pages or their video is processing/buffering, I guarantee their consumption increases.

If they can run any actual benchmark or real activity on that laptop without terrible performance and keep that consumption that low, let me know. I guarantee you this is not representative of how Intel would configure a reference system to give to a reviewer. Alder Lake would be seen as a completely failed product if they tried.
 
  • Like
Reactions: lightmanek

jpiniero

Lifer
Oct 1, 2010
12,864
4,152
136
One thing I don't understand is why both U9 and U15 have only a 2+8 version and there is no 4+8 version for U15. This one will go against the 5600U in MT while AMD still has the 5800U.
Yields I'm guessing. OEMs do have the option of going down to 20 W on U28 and could probably go to 15 anyway.

The U15 models would presumably be tighter quality binned and have higher turbo frequencies.
 

repoman27

Senior member
Dec 17, 2018
308
435
136
Guys, I don't have a problem with Intel releasing a separate die with fewer cores, because as you mentioned it's more cost-effective, etc.
I just don't understand why they want to make a separate die with only big cores and no small cores. Why don't they use some mobile version or make a 4+8 32EU variant instead if the IGP is too big? I expect that 4 small cores are not much bigger than a single big core, the same as with Lakefield.
The only possibility I see is that 6 cores would perform better in 6-threaded apps than a 4+8 version, where I expect a small core to have ~1/2 of the big core's performance.
Avoiding the hybrid design and only having a single ring bus instead of dual is probably a lot simpler. This die may have been the starting point for Alder Lake and provides a way to evaluate Golden Cove cores in a traditional homogeneous scenario. The mobile versions also include integrated Thunderbolt 4 and GT2 graphics. The OEMs that cater to the high-volume, price-sensitive segment of the desktop market aren't willing to pay for integrated Thunderbolt or 96 EUs.

The mobile market is where the money is. Entry level desktop chips are the bottom of the barrel for Intel. Think of 6+0+1 as being the 6+8+2 mobile die (the largest and most expensive Alder Lake die) with everything except for the 6 big cores and 32 EUs ripped out, with no real worries about frequency scaling or leakage, in a simple flip-chip LGA package. These products aren't about benchmarks or performance, they're all about trying to preserve margins while hitting a price-point.
 

repoman27

Senior member
Dec 17, 2018
308
435
136
One thing I don't understand is why both U9 and U15 have only a 2+8 version and there is no 4+8 version for U15. This one will compete against the 5600U in MT at best, while AMD still has the faster 5800U.
I would imagine it's because 4 Golden Cove cores require a TDP greater than 15 W.
 

repoman27

Senior member
Dec 17, 2018
308
435
136
The P die is gonna be huge. You're adding 2 more big cores (and the Big cores are going to be way bigger) plus the two new small core clusters. Might be too big for the "Y" packaging.
This is why I posted my cheat sheet in the first place. There isn't really a "P" die or "Y" packaging. There are two LP dies (2+8+2 and 6+8+2), two HP dies (6+0+1 and 8+8+1), and four packages (M, P, S BGA, and S). M and P are multi-chip packages including the PCH, while S BGA and S are "2-chip" platforms with separate packages for the CPU and PCH. As far as we know, all of the packages are traditional flip-chip using an organic substrate for fan-out, i.e. the size of the ball/land grid is the limiting factor, not the dimensions of the silicon dies.

There may actually be three PCH dies (M, P, and S) rather than just two (LP and H). I think Ice Lake may have had three as well (N, LP, and H). Unfortunately, most of the info regarding the Alder Lake PCHs and PCIe lanes at this point is coming from the coreboot project, and I'm not so sure the people reporting on these tidbits are correctly interpreting what they're finding there.
 

jpiniero

Lifer
Oct 1, 2010
12,864
4,152
136
the size of the ball/land grid is the limiting factor, not the dimensions of the silicon dies.
Go look at the Tiger Lake Y/UP4 package. It's a tight fit as it is and we're talking about a P die that's going to be 30-40 mm² bigger. It's not a small amount. The cost of the die size is I'm sure a factor but if you want to reuse the UP4 package or something close to it, the P die isn't fitting.
 

Shivansps

Diamond Member
Sep 11, 2013
3,657
1,331
136
It’s interesting to see how fast the Windows scheduler adapts to these newer CPUs. I know that it took a Windows release for AMD Thuban to properly boost its cores (it didn’t work as intended in Windows 7 OOTB). It shouldn’t take long, I’d imagine, since we’re on a faster cycle now, but still interesting.
Windows already supports big.LITTLE because it runs on ARM CPUs like the 8cx, which has a 4+4 configuration, so I'm pretty sure that's already in. It will be a matter of properly identifying the x86-64 CPU as big+small, so I expect day 1 support.

The one thing that may cause issues is HT. I haven't read anything about that: is Alder Lake going to have HT on the big cores? If so, you might run into something unknown, which is HT vs. small cores.
 

repoman27

Senior member
Dec 17, 2018
308
435
136
Go look at the Tiger Lake Y/UP4 package. It's a tight fit as it is and we're talking about a P die that's going to be 30-40 mm² bigger. It's not a small amount. The cost of the die size is I'm sure a factor but if you want to reuse the UP4 package or something close to it, the P die isn't fitting.
I think we're mostly agreeing here, but I'm not entirely sure...

P is a 50 mm x 25 mm BGA1744 package, very similar to previous U/Type 3/UP3 packages. Both the 6+8+2 LP and 2+8+2 LP dies fit on it.

M is BGA1781 but ostensibly a smaller package using tighter ball-pitch, similar to previous Y/Type 4/UP4 packages. It can accommodate the 2+8+2 LP die.

The 6+8+2 LP die is only found in 28 and 45 W TDP SKUs with a minimum cTDP-down of 20 W. Y/Type 4/UP4/M is first and foremost for extremely low power platforms between 4.5 W and 9 W with a maximum cTDP-up of 15 W. 6+8+2 doesn't fit under 20 W, so it's right out, regardless of the die size.

Intel can make an FCBGA package any size they need to in order to accommodate the dies, power delivery, and I/O. I don't believe M, even M5, is going to be a chip-scale Foveros package with PoP DRAM like Lakefield was.
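
The power-envelope argument above can be written as a quick check. This is only a minimal sketch in Python using the two TDP figures quoted in this post; the dictionary names and the `fits_power_envelope` helper are made up for illustration and are not from any Intel documentation.

```python
# TDP figures as quoted in the post above (watts); names are illustrative only.
DIE_MIN_TDP_W = {
    "6+8+2 LP": 20.0,  # minimum cTDP-down quoted for 6+8+2 SKUs
}
PACKAGE_MAX_TDP_W = {
    "M": 15.0,  # maximum cTDP-up quoted for Y/Type 4/UP4/M platforms
}


def fits_power_envelope(die: str, package: str) -> bool:
    """True if the die's lowest configurable TDP is within the package's ceiling."""
    return DIE_MIN_TDP_W[die] <= PACKAGE_MAX_TDP_W[package]


# 20 W minimum vs. 15 W ceiling: the die is ruled out on power alone.
print(fits_power_envelope("6+8+2 LP", "M"))  # False
```

The check never looks at die area, which is the point being made: on these numbers 6+8+2 is excluded from M before package dimensions even come into it.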
 

repoman27

Senior member
Dec 17, 2018
308
435
136
Windows already supports big.LITTLE because it runs on ARM CPUs like the 8cx, which has a 4+4 configuration, so I'm pretty sure that's already in. It will be a matter of properly identifying the x86-64 CPU as big+small, so I expect day 1 support.

The one thing that may cause issues is HT. I haven't read anything about that: is Alder Lake going to have HT on the big cores? If so, you might run into something unknown, which is HT vs. small cores.
Doesn't Windows already support Lakefield just fine?
 

coercitiv

Diamond Member
Jan 24, 2014
5,386
9,020
136
Doesn't Windows already support Lakefield just fine?
It does, although Lakefield sank into obscurity from day one, so nobody really bothered to analyze it further.

However a 1+4 configuration is hardly a good candidate to evaluate scheduler efficiency. I would expect the 6+8 die to be much more interesting, as the scheduler would arguably have more complex choices to make.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,590
3,648
136
Did you look to see what they do to get there? Manual undervolting, disabling controllers, turning off all Windows effects, registry hacks, turning off Turbo Boost, forcing worse CPU responsiveness to keep it in deep C-states. They are trying to force their CPU to run below base clocks as much as possible. They practically turn their laptops into a glorified Kindle reader.
You haven't actually worked on a laptop per the guide. I did.

I don't go outright disabling all processes. But I know laptops out of the box come with a whole bunch of processes and software that ruin power efficiency. A single piece of software or hardware not behaving can prevent the CPU from going into lower power states.

Also, Haswell-U can support sleep states all the way down to C10, but in most actual implementations they don't go below C7 even on the Skylake generation. You are at the mercy of the device in that regard. If the implementation is good and the BIOS has support, potentially you can go really low. Some of the hacks are just to enable things that the manufacturer has disabled.

We're talking about the potential for a chip right?

Guys, I don't have a problem with Intel releasing a separate die with fewer cores, because as you mentioned it's more cost-effective, etc.
Probably marketing. The rumor is that they'll change to "Hybrid Threading". So 6 cores without Hyper-Threading means 6 threads, something fit for Core i3. So Core i3s go from 4/8 to 6/6.

You were right about Alder Lake-M. It'll be interesting to see the package, and whether they are going to use Foveros at 5W or not. Lakefield was actually die-size limited to get the package that small.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,590
3,648
136
Intel has 2+8+2 within 9W TDP and 6+8+2 can be set to just 20W, so I don't see why 4+8+2 is not possible for 15W TDP.
Realistically, at 9W the TDP is low enough that it even limits single-threaded performance. Look at Amber Lake. If you look at every benchmark other than Geekbench, the 15W parts are 20-30% faster in single-threaded applications.

You need 15W for full single thread performance, and beyond that for multi-threads. 9W is probably just enough to mostly unleash performance of the 8 Gracemont cores.

Again 28W is pushing it for 6+8. 45W makes much more sense.
 

TESKATLIPOKA

Golden Member
May 1, 2020
1,295
1,551
106
Of course performance will be limited thanks to the low TDP, but that's to be expected in this segment, and even 15W is not enough to unleash full single-core performance.
AMD has 8 big cores with a 15W TDP and no one cares that it has limited performance.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,590
3,648
136
Yeah, I mean, if you look at Tiger Lake, 15W barely serves 4 cores. AMD has 8 because it's not only smaller but more power efficient too.

The decisions are probably arbitrary as well. Marketing, profits, performance all play a role. 8+8+2 is possible too, but will it make sense as a brand, perf/watt, and in revenue terms?

Unless they change the design paradigm, it's likely Golden Cove won't be more efficient in MT. The growth in transistors doesn't result in corresponding increases in performance.

I won't bet on ESF being as big of a gain as SF. The 10nm Ice Lake process sucked, so SF had a lot more potential. SF brought 20% gains. ESF, maybe 7-10% on top of that?
 
  • Like
Reactions: Tlh97

Thala

Golden Member
Nov 12, 2014
1,355
652
136
Of course performance will be limited thanks to the low TDP, but that's to be expected in this segment, and even 15W is not enough to unleash full single-core performance.
AMD has 8 big cores with a 15W TDP and no one cares that it has limited performance.
And almost all laptops with an 8-core Ryzen are configured for at least 25W, for good reason. I mean, 15W already limits 4-core Tiger Lake quite significantly.
 
Last edited:

JoeRambo

Golden Member
Jun 13, 2013
1,613
1,738
136
Not sure if it was posted already, but it seems people have figured out the cache composition for Alder Lake (due to a lucky leak of a GB5 OpenCL bench getting scheduled on small cores and revealing the structure).


24MB of L3 in 10 slices; Golden Cove has 1.25MB of L2 per core; each Gracemont cluster of 4C shares 2MB of L2.

What is also interesting is the 64KB L1I cache for Gracemont. Intel probably realized that without a uop cache, L1I is a glass jaw for performance, and is increasing it from 32KB in Tremont to 64KB in Gracemont.
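
For anyone who wants the totals, here is a rough sketch in Python that just does the arithmetic on the figures quoted above. The 8+8 core count is an assumption on my part (the leak summary above doesn't say which die it came from), and the function and constant names are made up for illustration.

```python
# Cache sizes as reported in the post above (MB).
L3_TOTAL_MB = 24
L3_SLICES = 10
GOLDEN_COVE_L2_MB = 1.25      # private L2 per big core
GRACEMONT_CLUSTER_L2_MB = 2   # L2 shared by one 4-core cluster
CORES_PER_CLUSTER = 4


def l2_total_mb(big_cores: int, small_cores: int) -> float:
    """Total L2 across all big cores and all small-core clusters."""
    clusters = small_cores // CORES_PER_CLUSTER
    return big_cores * GOLDEN_COVE_L2_MB + clusters * GRACEMONT_CLUSTER_L2_MB


# Assumed 8+8 configuration: 8 * 1.25 + 2 * 2
print(l2_total_mb(8, 8))                            # 14.0
# L3 per slice if the 24MB is split evenly over 10 slices
print(L3_TOTAL_MB / L3_SLICES)                      # 2.4
# Average L2 per Gracemont core within a cluster
print(GRACEMONT_CLUSTER_L2_MB / CORES_PER_CLUSTER)  # 0.5
```

The 0.5MB-per-core average is only a capacity figure; since the 2MB is shared, a single small core can use more than that when its neighbours are idle.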
 

Ajay

Lifer
Jan 8, 2001
12,095
5,759
136
Hmm, haven't followed the 'mont cores at all really, but why have both a shared L2$ and shared L3$ per four core cluster?
L2$ is probably inclusive and L3$ victim; sharing L2$ seems pretty old skool.
 

JoeRambo

Golden Member
Jun 13, 2013
1,613
1,738
136
Hmm, haven't followed the 'mont cores at all really, but why have both a shared L2$ and shared L3$ per four core cluster?
L2$ is probably inclusive and L3$ victim; sharing L2$ seems pretty old skool.
L3 is shared chip-wide and is most likely a victim cache, and sharing it with the rest of the chip is a good way to reduce memory misses. Even more so with a heterogeneous chip: you don't want a reschedule between clusters or onto a big core to turn into a 100% memory-cold scenario; L3 will have some of the data at least.
Cluster-shared L2 increases capacity for all cores at the cost of some latency (and some advanced problems, like one memory-traffic-heavy core evicting lines for the rest of the cores).
512KB of L2 private to each core would still be OK; having 2MB shared at some 1X cycles of latency is even better.

I would not discount coherency traffic simplifications either: why have 4 agents when you can get away with 1, checking that inclusive L2 for everything the small core cluster has in it?
 

repoman27

Senior member
Dec 17, 2018
308
435
136
I've updated this enough at this point that I'm reposting it along with links to all of the sources.

Alder Lake (ADL)

manufacturing process:
Intel 10nm Enhanced SuperFin (10+++ > 10++ > 10ESF)

dies:
2+8+2 LP = 2 Golden Cove cores + 8 Gracemont cores + GT2 graphics + 4 Thunderbolt 4 ports (Intel Family 6, Model 154, Stepping 1?)
6+8+2 LP = 6 Golden Cove cores + 8 Gracemont cores + GT2 graphics + 4 Thunderbolt 4 ports (Intel Family 6, Model 154, Stepping 0?)
6+0+1 HP = 6 Golden Cove cores + GT1 graphics (Intel Family 6, Model 151, Stepping ?)
8+8+1 HP = 8 Golden Cove cores + 8 Gracemont cores + GT1 graphics (Intel Family 6, Model 151, Stepping 1?)
*Golden Cove cores support Hyper-Threading and AVX-512

graphics:
GT1 = 32EU Xe-LP Gen12.2
GT2 = 96EU Xe-LP Gen12.2

chipsets:
ADP-LP = 600 Series on-package PCH, OPI x8 @ 4 GT/s
ADP-H = 600 Series PCH (2-chip platform), DMI Gen4 x8, 28 mm x 25 mm
*Alder Lake PCH = Alder Point (ADP), Intel 14nm

packages:
M = BGA 1781, ? (Y > Type 4 > UP4 > M)
P = BGA 1744, 50 mm x 25 mm (U > Type 3 > UP3 / H35 > P)
S BGA = BGA ?, ? (H > S BGA)
S = LGA 1700, 45 mm x 37.5 mm

memory interfaces:
M = LPDDR4X-4266 / LPDDR5-5400?
P = LPDDR4X-4266 / LPDDR5-5400? / DDR4-3200 1DPC / DDR5-4800 1DPC
S = DDR4-3200 2DPC / DDR5-4000 2DPC / DDR5-4800 1DPC

PCI Express:
M = CPU Gen5 1x8 / Gen4 1x4?, PCH Gen3 up to 10 lanes
P = CPU Gen5 1x8 + Gen4 2x4, PCH Gen3 up to 12 lanes
S = CPU Gen5 1x16 / 2x8 + Gen4 1x4, PCH Gen4 up to 16 lanes + Gen3 up to 12 lanes

platforms:
M5 = 2+8+2 LP and TGP-LP? dies, M package
U9 = 2+8+2 LP and TGP-LP? dies, M package
U15 = 2+8+2 LP and ADP-LP dies, P package
U28 = 6+8+2 LP and ADP-LP dies, P package
H45 = 6+8+2 LP and ADP-LP dies, P package
H55 = 8+8+1 HP die, S BGA package
S35 = 6+0+1 HP or 8+8+1 HP die, S package
S65 = 6+0+1 HP or 8+8+1 HP die, S package
S80 = 6+0+1 HP or 8+8+1 HP die, S package
S125 = 8+8+1 HP die, S package

launch schedule:
ADL-M/P 2+8+2 (M5/U9/U15) Aug '21? press embargo
ADL-P 6+8+2 (U28) Aug '21? press embargo
ADL-S 8+8+1 WW35'21 start of volume production > NET Dec '21 RTS
ADL-S 6+0+1 WW41'21 start of volume production > NET Jan '22 RTS
ADL-P 6+8+2 (H45) Jan '22? press embargo
ADL-S 8+8+1 (H55) Apr '22? press embargo

sources:
sharkbay PTT BBS 2020-01-02
sharkbay PTT BBS 2020-03-02
sharkbay PTT BBS 2020-05-13
@JZWSVIC Zhihu 2020-07-12
sharkbay PTT BBS 2020-07-15
Li Tang Technology interposer list
Coelacanth's Dream Alder Lake
Intel Architecture Day 2020-08-13
Notebookcheck 2020-10-03
Intel CES 2021-01-11
HXL @9550pro Twitter 2021-03-06
VideoCardz 2021-03-11
VideoCardz 2021-03-20
188号 @momomo_us Twitter 2021-03-26
HXL @9550pro Twitter 2021-04-16

edit: added links to additional sources
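
If anyone wants to query the list rather than read it, the dies and platforms sections can be transcribed into a small lookup table. This is just a sketch in Python: the `Die` class, `PLATFORMS` layout, and `platforms_using` helper are names I made up, the on-package PCH dies are left out, and the thread count assumes Hyper-Threading is enabled on every Golden Cove core with one thread per Gracemont core, which may not hold for every SKU.

```python
from typing import List, NamedTuple


class Die(NamedTuple):
    big: int    # Golden Cove cores
    small: int  # Gracemont cores

    @property
    def max_threads(self) -> int:
        # Assumption: 2 threads per Golden Cove core (Hyper-Threading, per the
        # note in the list) and 1 thread per Gracemont core.
        return 2 * self.big + self.small


DIES = {
    "2+8+2 LP": Die(2, 8),
    "6+8+2 LP": Die(6, 8),
    "6+0+1 HP": Die(6, 0),
    "8+8+1 HP": Die(8, 8),
}

# platform -> (possible CPU dies, package), transcribed from the list above
PLATFORMS = {
    "M5": (("2+8+2 LP",), "M"),
    "U9": (("2+8+2 LP",), "M"),
    "U15": (("2+8+2 LP",), "P"),
    "U28": (("6+8+2 LP",), "P"),
    "H45": (("6+8+2 LP",), "P"),
    "H55": (("8+8+1 HP",), "S BGA"),
    "S35": (("6+0+1 HP", "8+8+1 HP"), "S"),
    "S65": (("6+0+1 HP", "8+8+1 HP"), "S"),
    "S80": (("6+0+1 HP", "8+8+1 HP"), "S"),
    "S125": (("8+8+1 HP",), "S"),
}


def platforms_using(die_name: str) -> List[str]:
    """List the platforms whose entry includes the given CPU die."""
    return [name for name, (dies, _pkg) in PLATFORMS.items() if die_name in dies]


print(platforms_using("6+0+1 HP"))    # ['S35', 'S65', 'S80']
print(DIES["8+8+1 HP"].max_threads)   # 24
```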
 
Last edited:
