Discussion Intel current and future Lakes & Rapids thread

dullard · Jun 29, 2022

DrMrLordX said:
Couldn't Intel use two 6+8 compute tiles on a single package for desktop?

It is possible, yes. Advisable? I don't think so. Three main problems. (1) Intel is already probably stretched thin on their Intel 4 production capability. Doubling up on the CPU tile might not be possible due to that limitation. (2) There will be a lot of added latency between the tiles. That latency is possible to partly overcome but I don't think that is what Intel wants to see with this chip. (3) More P cores isn't advisable. If you had 28 cores in 125 W, then each core gets 4.5 W of power (and that assumes absolutely no power goes to the iGPU or the uncore). Maybe Intel could shift it so the P cores get ~6 W and the E cores get ~3 W, but even 6 W each would be a stretch. And at 6 W each, the E core is tied with or outperforms the P cores. So, why add more P cores that will perform worse?

Alder Lake’s Power Efficiency – A Complicated Picture

Reviews across the internet show Alder Lake getting very competitive performance with very high power consumption.

chipsandcheese.com
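The per-core budget arithmetic in the post above is easy to check. This is a minimal sketch: the function name and the even-split assumption are illustrative only, the 125 W and 12 P + 16 E figures come from the post, and uncore/iGPU power is ignored just as the post does.

```python
def even_split_watts(package_watts, cores, uncore_watts=0.0):
    """Watts per core if the non-uncore budget is split evenly (an assumption)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return (package_watts - uncore_watts) / cores

# Two 6P + 8E tiles give 12 P + 16 E = 28 cores sharing 125 W.
P_CORES, E_CORES = 12, 16
per_core = even_split_watts(125, P_CORES + E_CORES)
print(f"even split: {per_core:.2f} W per core")

# The "~6 W per P core, ~3 W per E core" variant mentioned in the post.
shifted_total = P_CORES * 6 + E_CORES * 3
print(f"6 W / 3 W split: {shifted_total} W for the cores")
```

Passing a non-zero `uncore_watts` shows how quickly the per-core figure shrinks once the iGPU and uncore take their share.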

coercitiv · Jun 29, 2022

dullard said:
And at 6 W each, the E core is tied with or outperforms the P cores. So, why add more P cores that will perform worse?

That's 6W for a 4x cluster, meaning ~1.5W per core. At 4.5W/core the P core offers better perf/watt.

Timmah! · Jun 29, 2022

dullard said:
It is possible, yes. Advisable? I don't think so. Three main problems. (1) Intel is already probably stretched thin on their Intel 4 production capability. Doubling up on the CPU tile might not be possible due to that limitation. (2) There will be a lot of added latency between the tiles. That latency is possible to partly overcome but I don't think that is what Intel wants to see with this chip. (3) More P cores isn't advisable. If you had 28 cores in 125 W, then each core gets 4.5 W of power (and that assumes absolutely no power goes to the iGPU or the uncore). Maybe Intel could shift it so the P cores get ~6 W and the E cores get ~3 W, but even 6 W each would be a stretch. And at 6 W each, the E core is tied with or outperforms the P cores. So, why add more P cores that will perform worse?

Alder Lake’s Power Efficiency – A Complicated Picture

Reviews across the internet show Alder Lake getting very competitive performance with very high power consumption.

chipsandcheese.com

Is it not coming in about a year? Why would you assume they are stretched thin, and why does it matter for a product that is most likely a year away?

Regarding latency, I presume they are aware that moving away from monolithic toward a tiled approach means a latency increase, but since they chose to go that way anyway, they are probably OK with that. And so was AMD, right? Tiles/chiplets/stacked CPUs are the future.

I can see power draw with more cores being an issue, but who knows: if they can make 8+16 on the current process, perhaps they can do 12+16 on Intel 4.

DrMrLordX · Jun 29, 2022

dullard said:
It is possible, yes. Advisable? I don't think so. Three main problems. (1) Intel is already probably stretched thin on their Intel 4 production capability. Doubling up on the CPU tile might not be possible due to that limitation.

The alternative is taping out an entirely-separate 8P+16E tile that may yield worse than the 6P+8E tile. They're in trouble either way. I would say that Intel has the benefit of Granite Rapids being delayed and moved to a different node, though it's probably more the delay than the node shift that will help there.

(2) There will be a lot of added latency between the tiles.

That is maybe a problem, but if Intel is supposed to have all this wonderful packaging tech, don't you think they should have overcome that by now? Or compensated for it somehow? AMD has been doing this for years.

(3) More P cores isn't advisable. If you had 28 cores in 125 W, then each core gets 4.5 W of power (and that assumes absolutely no power goes to the iGPU or the uncore). Maybe Intel could shift it so the P cores get ~6 W and the E cores get ~3 W, but even 6 W each would be a stretch. And at 6 W each, the E core is tied with or outperforms the P cores. So, why add more P cores that will perform worse?

Intel should be able to control voltages and clockspeeds well enough that a 12P+16E part should perform just as well in a <=8 thread task as a hypothetical 8P+16E part, and overall better in a >= 24 thread task. Yes, interconnect power might go up when fully utilized, so that is a possible issue. In any case, as long as Redwood Cove is anything like Golden Cove or Raptor Cove, it should gain efficiency by reducing clocks and volts a little. Having more P cores is not a bad thing.

LightningZ71 · Jun 29, 2022

Intel would likely do far, FAR better, doing two separate tiles, one for just P cores with process tweaks for clock speed and another with just E cores with process tweaks for density and power usage improvements. Just doing a relaxed, high speed P core tile with 8 cores would be enough there, then a high density 32 core E core tile, maybe they could fit 40 e cores there in a density optimized configuration. That would be 8 high speed threads, 8 HT threads, and 32-40 threads on the e cores. That neatly matches the 48 threads that a 24 core AMD processor would have, with higher overall throughput assuming that everything else is done right.

Timmah! · Jun 29, 2022

LightningZ71 said:
Intel would likely do far, FAR better, doing two separate tiles, one for just P cores with process tweaks for clock speed and another with just E cores with process tweaks for density and power usage improvements. Just doing a relaxed, high speed P core tile with 8 cores would be enough there, then a high density 32 core E core tile, maybe they could fit 40 e cores there in a density optimized configuration. That would be 8 high speed threads, 8 HT threads, and 32-40 threads on the e cores. That neatly matches the 48 threads that a 24 core AMD processor would have, with higher overall throughput assuming that everything else is done right.

Was not Arrow Lake rumored to be 8 + 32? Maybe it will be exactly the way you propose.

dullard · Jun 29, 2022

Timmah! said:
Was not Arrow Lake rumored to be 8 + 32? Maybe it will be exactly the way you propose.

Yes, that was the rumor. 8 P + 32 E would take roughly the same die area as 12 P + 16 E. But 8 P + 32 E (48 threads) would be faster in highly multi-threaded tasks than 12 P + 16 E (40 threads).

Let's give the P cores a benchmark that highlights the P-core strength and see what happens.
Rough math assumptions:

We'll use the Chips and Cheese libx264 Transcode chart since that favors P cores over E cores.
Assume a 125 W processor load.
Assume a 10 W uncore with all the various tiles (IO, iGPU, SOC, etc). This is low especially if the iGPU is used, but it gives the most advantage to the P core by giving it more power.
Assume that each P core is given double the power of each E core. (Feel free to change this assumption and recalculate as you wish).

12 P + 16 E Math:

Each block of 4 P cores uses 23 W and performs 7.7 frames/s according to the Chips and Cheese benchmark.
Each block of 4 E cores uses 11.5 W and performs 4.5 frames/s according to the Chips and Cheese benchmark.
Total power = 10 W (uncore) + 3 * 23 W (P cores) + 4 * 11.5 W (E cores) = 125 W
Total performance: 3 * 7.7 frames/s + 4 * 4.5 frames/s = 41.1 frames/s

8 P + 32 E Math:

Each block of 4 P cores uses 19.17 W and performs 6.8 frames/s according to the Chips and Cheese benchmark.
Each block of 4 E cores uses 9.58 W and performs 4.1 frames/s according to the Chips and Cheese benchmark.
Total power = 10 W (uncore) + 2 * 19.17 W (P cores) + 8 * 9.58 W (E cores) = 125 W
Total performance: 2 * 6.8 frames/s + 8 * 4.1 frames/s = 46.4 frames/s

The 8 P + 32 E wins over 12 P + 16 E on the benchmark that preferred P cores. Even if you subtract a few percent for Amdahl's law, 8 P + 32 E still wins.
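For anyone who wants to change the assumptions and recalculate, the rough math above can be sketched as follows. The class and function names are made up for illustration. The frames/s values per 4-core block are the ones quoted in the post from the Chips and Cheese chart and are treated as fixed inputs; the code does not model the power/performance curve, so changing the power split means reading new values off the chart by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    p_cores: int
    e_cores: int
    fps_per_p_block: float  # frames/s per 4 P cores, as quoted in the post
    fps_per_e_block: float  # frames/s per 4 E cores, as quoted in the post

def block_power(cfg, package_w=125.0, uncore_w=10.0, p_to_e_ratio=2.0):
    """Return (watts per 4-P-core block, watts per 4-E-core block)."""
    core_budget = package_w - uncore_w
    # Each P core counts as p_to_e_ratio "E-core shares" of the budget.
    shares = cfg.p_cores * p_to_e_ratio + cfg.e_cores
    e_core_w = core_budget / shares
    return 4 * p_to_e_ratio * e_core_w, 4 * e_core_w

def total_fps(cfg):
    """Sum the quoted per-block throughput over all blocks of 4 cores."""
    return (cfg.p_cores // 4) * cfg.fps_per_p_block + (cfg.e_cores // 4) * cfg.fps_per_e_block

def threads(cfg):
    """Assumes 2 threads per P core (Hyper-Threading) and 1 per E core."""
    return 2 * cfg.p_cores + cfg.e_cores

dual_tile = Config(p_cores=12, e_cores=16, fps_per_p_block=7.7, fps_per_e_block=4.5)
rumored = Config(p_cores=8, e_cores=32, fps_per_p_block=6.8, fps_per_e_block=4.1)

for name, cfg in (("12 P + 16 E", dual_tile), ("8 P + 32 E", rumored)):
    p_w, e_w = block_power(cfg)
    print(f"{name}: {p_w:.2f} W per P block, {e_w:.2f} W per E block, "
          f"{total_fps(cfg):.1f} frames/s, {threads(cfg)} threads")
```

Changing `p_to_e_ratio` or `uncore_w` only updates the power split; the frames/s inputs then need to be re-read from the chart for the new wattages.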

nicalandia · Jun 30, 2022

dullard said:
It is true that Intel tries for about a 12 month cadence. But, there are very notable exceptions. There are timeline slips and a few odd chips that break up that timeline.

Desktop Chip          Generation   Months As Top Generation
Nehalem                            14.2
Westmere                           12.1
Sandy Bridge          2            15.6
Ivy Bridge            3            13.1
Haswell               4            12.0
Haswell Refresh       4            12.0
Broadwell             5            2.1
Skylake               6            12.8
Kaby Lake             7            13.2
Coffee Lake           8            12.5
Coffee Lake Refresh   9            10.1
Comet Lake            10           19.3
Rocket Lake           11           7.2
Alder Lake            12           10.9? Estimated

Haswell Refresh wasn't even called a new generation, so you could argue there was a 24 month gap.

IPC-wise, there are no gains from Skylake to Comet Lake; both use the same uArch built on the same process. Intel kept adding speed and cores to an outdated design.
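The "about a 12 month cadence" claim in the quoted table can be sanity-checked with a few lines. This is only a sketch over the numbers as quoted; the Alder Lake value is the poster's estimate, so the mean is shown both with and without it.

```python
from statistics import mean

# Months each desktop generation spent as the top generation, as quoted above.
months_as_top = {
    "Nehalem": 14.2,
    "Westmere": 12.1,
    "Sandy Bridge": 15.6,
    "Ivy Bridge": 13.1,
    "Haswell": 12.0,
    "Haswell Refresh": 12.0,
    "Broadwell": 2.1,
    "Skylake": 12.8,
    "Kaby Lake": 13.2,
    "Coffee Lake": 12.5,
    "Coffee Lake Refresh": 10.1,
    "Comet Lake": 19.3,
    "Rocket Lake": 7.2,
    "Alder Lake": 10.9,  # estimated in the quoted post
}

overall = mean(months_as_top.values())
without_estimate = mean(v for k, v in months_as_top.items() if k != "Alder Lake")
# Treating Haswell and Haswell Refresh as one generation, per the quoted remark.
haswell_gap = months_as_top["Haswell"] + months_as_top["Haswell Refresh"]

print(f"mean, all rows:          {overall:.1f} months")
print(f"mean, without estimate:  {without_estimate:.1f} months")
print(f"Haswell + Refresh:       {haswell_gap:.1f} months")
```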

nicalandia · Jun 30, 2022

Also..

https://twitter.com/x/status/1542643729080655879

eek2121 · Jun 30, 2022

Timmah! said:
Is it not kind of a downgrade from 8 + 16 Raptor Lake?

Oh boy...where to start...is a 5 GHz P4 Extreme faster than a 4 GHz Core i3? (HINT: the answer is no kid. It all comes down to absolute performance, or rather, perf/watt when comparing fixed power and/or heating targets. Note that power and heat are two different things)

pakotlar said:
It sure looks like Raptor Lake is going to be about 30% better in perf/watt, and more like 50% for PL4 workloads (burst), compared to Alder Lake. At the same time AMD will have only a tiny perf/w advantage, with 230W PPT, compared to 253W PL2 for Raptor Lake, with Raptor Lake being a bit faster. AMD seems to have made a great processor, but not to Zen 3’s level, unfortunately. Had they given us 24-32 cores, it would have killed; as it stands, it's a bit underwhelming for those who don't need AVX-512. Going on rumor anyway. I suspect it's going to be quite a good desktop part.

Meteor Lake and beyond seems too speculative at this point, and maybe more focused on power efficiency for mobile.

I really want to know what you are smoking. You mention PL4 "burst" workloads and that sets off alarms. An Intel chip should never hit PL4. PL2 is "burst". PL4 is at the very limit of an Intel chip's capability to operate. If an Intel chip goes beyond PL4 (or even hits), automatic shutdown will occur. Note that under normal operating circumstances, even when overclocked, Intel chips should never come close to PL4. Period.

Raptor lake will be around 20-30% faster than Alder Lake in multicore workloads (depending on the workload), and a small percentage (under 10%) faster than Alder Lake in Single core workloads. Remember, you heard it here first. Don't bother arguing, just quote my post on launch if I'm wrong.

Zen 4, with 16 cores, will easily be able to deal with Raptor Lake, mind you, and according to what we know, the Zen 4 chip demoed by AMD did NOT use 230W, so the PPT for that chip would likely be lower, but alas, I want to keep this post about Intel.

One day Intel will realize what made the Core/Core 2 chips special: hint, it was perf/watt. Until they rediscover that, they won't be competitive with AMD, and according to the latest earnings reports, they have about 2-3 years until AMD technically becomes a larger company...and AMD is fabless.

Intel needs to stop mis-stepping and start stepping. I remember when Core came out (with little fandom), then Core 2 came out. It was Intel's exit strategy from the power-hungry GHz race. A (sub, in some cases) 2ghz chip quickly and easily beat out the hottest and fastest Pentiums. Until that moment, Intel kept releasing faster and hotter chips, while routinely making missteps, some of which required actual recalls. AMD consistently beat them with lower cost, yet superior chips.

Intel needs another Core 2 in order to be competitive. Meteor Lake may be that chip. We will see. Raptor lake is absolutely not that chip. It will provide a decent upgrade for those who buy Intel, but it won't change the status quo at all.

This message was NOT brought to you by an AMD fanboy. Just a guy who has been in tech since well before the 8088 existed.

Speaking of the Core 2 Duo (or similar AMD chips, for that matter) did you guys watch the Gamers Nexus livestream the other day?

pakotlar · Jun 30, 2022

eek2121 said:
Oh boy...where to start...is a 5 GHz P4 Extreme faster than a 4 GHz Core i3? (HINT: the answer is no kid. It all comes down to absolute performance, or rather, perf/watt when comparing fixed power and/or heating targets. Note that power and heat are two different things)

I really want to know what you are smoking. You mention PL4 "burst" workloads and that sets off alarms. An Intel chip should never hit PL4. PL2 is "burst". PL4 is at the very limit of an Intel chip's capability to operate. If an Intel chip goes beyond PL4 (or even hits), automatic shutdown will occur. Note that under normal operating circumstances, even when overclocked, Intel chips should never come close to PL4. Period.

Raptor lake will be around 20-30% faster than Alder Lake in multicore workloads (depending on the workload), and a small percentage (under 10%) faster than Alder Lake in Single core workloads. Remember, you heard it here first. Don't bother arguing, just quote my post on launch if I'm wrong.

Zen 4, with 16 cores, will easily be able to deal with Raptor Lake, mind you, and according to what we know, the Zen 4 chip demoed by AMD did NOT use 230W, so the PPT for that chip would likely be lower, but alas, I want to keep this post about Intel.

One day Intel will realize what made the Core/Core 2 chips special: hint, it was perf/watt. Until they rediscover that, they won't be competitive with AMD, and according to the latest earnings reports, they have about 2-3 years until AMD technically becomes a larger company...and AMD is fabless.

Intel needs to stop mis-stepping and start stepping. I remember when Core came out (with little fandom), then Core 2 came out. It was Intel's exit strategy from the power-hungry GHz race. A (sub, in some cases) 2ghz chip quickly and easily beat out the hottest and fastest Pentiums. Until that moment, Intel kept releasing faster and hotter chips, while routinely making missteps, some of which required actual recalls. AMD consistently beat them with lower cost, yet superior chips.

Intel needs another Core 2 in order to be competitive. Meteor Lake may be that chip. We will see. Raptor lake is absolutely not that chip. It will provide a decent upgrade for those who buy Intel, but it won't change the status quo at all.

This message was NOT brought to you by an AMD fanboy. Just a guy who has been in tech since well before the 8088 existed.

Speaking of the Core 2 Duo (or similar AMD chips, for that matter) did you guys watch the Gamers Nexus livestream the other day?

A rude way of asking “what is PL4”. It's the max power draw for a 10 ms duration; ref: https://www.igorslab.de/en/power-co...ptor-lake-s-cpus-in-comparison-exclusively/3/

The internet brings out the worst in people.

coercitiv · Jul 1, 2022

pakotlar said:
A rude way of asking “what is PL4”. It's the max power draw for a 10 ms duration; ref

The internet brings out the worst in people.

While @eek2121 over-reacted in his reply, his attitude is something most of us on the forum go through sooner or later considering how misinformation creeps in.

First, let's start with a better source for defining PL4 and PL3:
https://edc.intel.com/content/www/u...heet-volume-1-of-2/008/package-power-control/

Power Limit 3 (PL3): A threshold that if exceeded, the PL3 rapid power limiting algorithms will attempt to limit the duty cycle of spikes above PL3 by reactively limiting frequency. This is an optional setting

Power Limit 4 (PL4): A limit that will not be exceeded, the PL4 power limiting algorithms will preemptively limit frequency to prevent spikes above PL4.

So, just as @eek2121 warned you, PL4 is not intended to be used as an operating state but rather as a hard threshold at which the CPU must immediately lower clocks. There is no such thing as PL4 workloads and there is no point in attempting to define or discuss perf/watt during a PL4 excursion. AFAIK the relevance of PL3/PL4 is how these spikes affect the motherboard VRM and the PSU.
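To make the reactive-versus-preemptive wording in the datasheet definitions concrete, here is a toy model. It is not Intel's algorithm: the sample trace, the limit values, and the `max_duty` and `window` knobs are all invented for illustration. It only encodes two ideas from the quoted definitions: PL4 is a ceiling that is never exceeded, while excursions above PL3 are allowed but their duty cycle is held down after the fact.

```python
def apply_limits(requested_w, pl3_w, pl4_w, max_duty=0.25, window=4):
    """Toy limiter over a list of per-interval power requests in watts.

    PL4 (preemptive): every sample is clamped so it never exceeds pl4_w.
    PL3 (reactive): a sample may exceed pl3_w, but if that would push the
    share of the last `window` samples above pl3_w past max_duty, the
    sample is held at pl3_w instead.
    """
    delivered = []
    for req in requested_w:
        power = min(req, pl4_w)  # hard ceiling, applied before anything else
        recent = delivered[-(window - 1):] if window > 1 else []
        over = sum(1 for p in recent if p > pl3_w)
        if power > pl3_w and (over + 1) / window > max_duty:
            power = pl3_w  # too many recent spikes: fall back to PL3
        delivered.append(power)
    return delivered

# Hypothetical trace: a workload that keeps asking for 400 W spikes.
trace = [100, 400, 400, 400, 100, 400]
result = apply_limits(trace, pl3_w=250, pl4_w=350)
print(result)
```

In this model no sample ever lands above `pl4_w`, which is why talking about sustained "PL4 workloads" does not fit the definitions: the limit caps spikes, it does not describe an operating point.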

Timmah! · Jul 1, 2022

eek2121 said:
Oh boy...where to start...is a 5 GHz P4 Extreme faster than a 4 GHz Core i3? (HINT: the answer is no kid. It all comes down to absolute performance, or rather, perf/watt when comparing fixed power and/or heating targets. Note that power and heat are two different things)

So, in other words, you think that Raptor Lake -> Meteor Lake will be akin to Pentium 4 -> Nehalem?

moinmoin · Jul 1, 2022

eek2121 said:
Intel needs to stop mis-stepping and start stepping.

Make no mistake, Intel is hard at work stepping, many of them!

https://twitter.com/x/status/1542642453186232320

DrMrLordX · Jul 1, 2022

Dylan Patel/Skyjuice may be too kind to Intel, if Intel can't actually launch Sapphire Rapids until next year. Though that rumour is as-of-yet unconfirmed, I guess?

pakotlar · Jul 1, 2022

coercitiv said:
While @eek2121 over-reacted in his reply, his attitude is something most of us on the forum go through sooner or later considering how misinformation creeps in.

First, let's start with a better source for defining PL4 and PL3:
https://edc.intel.com/content/www/u...heet-volume-1-of-2/008/package-power-control/

So, just as @eek2121 warned you, PL4 is not intended to be used as an operating state but rather as a hard threshold at which the CPU must immediately lower clocks. There is no such thing as PL4 workloads and there is no point in attempting to define or discuss perf/watt during a PL4 excursion. AFAIK the relevance of PL3/PL4 is how these spikes affect the motherboard VRM and the PSU.

If you read the article and the Intel EDC link, PL4 has a duration. That duration is 10 ms. It is absolutely the burst power state on small time scales.

igor_kavinski · Jul 1, 2022

Regarding PL4, it is disabled by default. Even when enabled, it will work in 10ms bursts as long as the CPU isn't melting. So CPU boosts for 10ms and then how long before it can be cooled for PL4 boosting again? You would likely need a 360mm liquid cooler. Not something the average user is going to be using.

pakotlar · Jul 1, 2022

igor_kavinski said:
Regarding PL4, it is disabled by default. Even when enabled, it will work in 10ms bursts as long as the CPU isn't melting. So CPU boosts for 10ms and then how long before it can be cooled for PL4 boosting again? You would likely need a 360mm liquid cooler. Not something the average user is going to be using.

Defaults are not respected on many mobos, but that was not my point. My point was that, conditional on PL4 being used, Raptor Lake in that power state will have something like 1.3/0.87 ≈ 1.49x Alder Lake's perf/watt. At PL2 it's around 1.3/1.04 ≈ 1.25x Alder Lake's perf/watt.

None of that is controversial based on the current ES3 performance, which may be a bit worse than production. People seem to want to argue it anyway 🤷‍♀️
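The ratio being argued here is relative performance divided by relative power. A minimal sketch, taking the poster's estimates as inputs rather than as measured data (1.3x Alder Lake performance, at 0.87x Alder Lake power for PL4 and 1.04x for PL2); the function name is illustrative:

```python
def relative_perf_per_watt(rel_perf, rel_power):
    """Perf/watt relative to a baseline, given perf and power relative to it."""
    if rel_power <= 0:
        raise ValueError("rel_power must be positive")
    return rel_perf / rel_power

# Poster's estimates versus Alder Lake (ES3-based rumor, not measurements).
pl4_ratio = relative_perf_per_watt(1.3, 0.87)
pl2_ratio = relative_perf_per_watt(1.3, 1.04)

print(f"PL4: {pl4_ratio:.2f}x Alder Lake perf/watt")
print(f"PL2: {pl2_ratio:.2f}x Alder Lake perf/watt")
```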

Exist50 · Jul 1, 2022

eek2121 said:
Oh boy...where to start...is a 5 GHz P4 Extreme faster than a 4 GHz Core i3? (HINT: the answer is no kid. It all comes down to absolute performance, or rather, perf/watt when comparing fixed power and/or heating targets. Note that power and heat are two different things)

I really want to know what you are smoking. You mention PL4 "burst" workloads and that sets off alarms. An Intel chip should never hit PL4. PL2 is "burst". PL4 is at the very limit of an Intel chip's capability to operate. If an Intel chip goes beyond PL4 (or even hits), automatic shutdown will occur. Note that under normal operating circumstances, even when overclocked, Intel chips should never come close to PL4. Period.

Raptor lake will be around 20-30% faster than Alder Lake in multicore workloads (depending on the workload), and a small percentage (under 10%) faster than Alder Lake in Single core workloads. Remember, you heard it here first. Don't bother arguing, just quote my post on launch if I'm wrong.

Zen 4, with 16 cores, will easily be able to deal with Raptor Lake, mind you, and according to what we know, the Zen 4 chip demoed by AMD did NOT use 230W, so the PPT for that chip would likely be lower, but alas, I want to keep this post about Intel.

One day Intel will realize what made the Core/Core 2 chips special: hint, it was perf/watt. Until they rediscover that, they won't be competitive with AMD, and according to the latest earnings reports, they have about 2-3 years until AMD technically becomes a larger company...and AMD is fabless.

Intel needs to stop mis-stepping and start stepping. I remember when Core came out (with little fandom), then Core 2 came out. It was Intel's exit strategy from the power-hungry GHz race. A (sub, in some cases) 2ghz chip quickly and easily beat out the hottest and fastest Pentiums. Until that moment, Intel kept releasing faster and hotter chips, while routinely making missteps, some of which required actual recalls. AMD consistently beat them with lower cost, yet superior chips.

Intel needs another Core 2 in order to be competitive. Meteor Lake may be that chip. We will see. Raptor lake is absolutely not that chip. It will provide a decent upgrade for those who buy Intel, but it won't change the status quo at all.

This message was NOT brought to you by an AMD fanboy. Just a guy who has been in tech since well before the 8088 existed.

Speaking of the Core 2 Duo (or similar AMD chips, for that matter) did you guys watch the Gamers Nexus livestream the other day?

I think the sad part, depending how you look at it, is that Intel doesn't need a new Conroe to beat AMD in efficiency. Imagine a chip with the same topology as AMD (one IO + two compute), but instead of 2x8 Zen3 cores, you had 2x8 Golden Cove cores. Or one 8c GLC + one 64c GRT die. That chip could use a ton of power if you wanted to push it to the very limits, but would also be very efficient at lower power. Of course, one can argue that the 16+ core consumer market isn't very large today, and thus doesn't justify the effort, but while we're on the topic of making these top-to-top comparisons, might as well entertain the thought.

Now Intel's left fighting off 16x gen N+1 cores from AMD, with a full node advantage, but only dedicating 14 big cores worth of space. That may be a good product from an economic perspective, but is clearly going to be a stretch in terms of high end performance.

Ultimately, I do think we'll get a new Conroe out of Intel in the form of Royal. But it's interesting to imagine what would happen if that were to be preceded, or even accompanied, by a topology change.

moinmoin · Jul 1, 2022

Exist50 said:
I think the sad part, depending how you look at it, is that Intel doesn't need a new Conroe to beat AMD in efficiency.

What kind of efficiency are you talking about? It doesn't seem to be power. Pat himself stated that Intel is behind TSMC in process perf/watt efficiency and intends to catch up by 2024.

Exist50 · Jul 1, 2022

moinmoin said:
What kind of efficiency are you talking about? It doesn't seem to be power. Pat himself stated that Intel is behind TSMC in process perf/watt efficiency and intends to catch up by 2024.

Power, yes. But I was talking about from an iso-process kind of view. Obviously Raptor Lake will be less efficient than Raphael, but thinking longer-term, AMD has been lagging TSMC's latest by a year or two. That works quite well when Intel lags by the same or worse, but seems rather risky to rely on going forward.

igor_kavinski · Jul 1, 2022

Maybe this deserves its own thread, but anyone wanna hazard a guess how many more years before AMD gets a fab again?

scannall · Jul 1, 2022

igor_kavinski said:
Maybe this deserves its own thread, but anyone wanna hazard a guess how many more years before AMD gets a fab again?

My guess is never. Apple is more likely to do their own fab than AMD. It takes huge volume to make it worthwhile.

Ajay · Jul 1, 2022

scannall said:
My guess is never. Apple is more likely to do their own fab than AMD. It takes huge volume to make it worthwhile.

Well, it's still a 0% probability. A FAB is many orders of magnitude more complex than putting together a top notch silicon design team.

scannall · Jul 1, 2022

Ajay said:
Well, it's still a 0% probability. A FAB is many orders of magnitude more complex than putting together a top notch silicon design team.

An EUV machine alone is likely the most complex tool ever created. And it likely uses unicorn blood or something magical just to work. ;-) Putting together a FAB is easy compared to that.
