Discussion Intel current and future Lakes & Rapids thread

Racan · Aug 19, 2021

Zucker2k said:
I noticed that, too. More performance on the table with Windows 11?

Windows 11 increases Golden Cove's IPC?

Saylick · Aug 19, 2021

Racan said:
That would be extremely disappointing, it would make Zen 4 the weakest improvement since Zen>Zen+, especially after the longest wait time between Zen architectures. I hope it will be more substantial than that.

If rumors are to be believed, the IPC increase for Zen 4 is >20%. Moore's Law is Dead heard that the jump from Zen 2 to Zen 4 is similar to that of the original Zen 1. So, if you take the IPC gains for Zen 1 as 52%, and knowing that Zen 2 to Zen 3 is 19%, that implies that Zen 3 to Zen 4 is ~28%. I don't think it's going to be that high, but I personally think AMD can match Intel's 19% IPC increase, if not outdo them.
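The ~28% figure comes from dividing one compounded gain by the other. A quick sketch of that arithmetic, assuming the rumored 52% and the 19% quoted above (the variable names are only for illustration, and the inputs are rumor, not measurement):

```python
# Figures quoted in the post; nothing here is measured.
zen2_to_zen4 = 1.52  # assumed: same jump as the original Zen 1 (52%)
zen2_to_zen3 = 1.19  # Zen 2 -> Zen 3 IPC gain (19%)

# IPC gains compound multiplicatively, so divide to isolate the last step.
zen3_to_zen4 = zen2_to_zen4 / zen2_to_zen3
implied_gain_pct = (zen3_to_zen4 - 1) * 100

print(f"Implied Zen 3 -> Zen 4 IPC gain: {implied_gain_pct:.1f}%")
```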

dullard · Aug 19, 2021

Just in case some haven't seen it, here is the official Architecture day slide deck:

https://download.intel.com/newsroom/2021/client-computing/intel-architecture-day-2021-presentation.pdf

It has a lot of stuff in it that I didn't see in the couple of review websites that I read. The slide deck covers: Golden Cove, Gracemont, Thread Director, Alder Lake, Xe HPG, Sapphire Rapids, IPU, Oak Springs Canyon, Arrow Creek, Mount Evans, and Ponte Vecchio.

The one that really stuck out like an impossible target is 1000x by 2025 (slides 5 to 11). That is, assuming that 1000x refers to performance.

IntelUser2000 · Aug 19, 2021

Gracemont versus Skylake
Single thread
-40% higher performance at same power, or 40% of the power at same performance
-Total performance is 7-8% higher

Multi Thread
-4C Gracemont is 80% higher performance than 2C4T Skylake. Based on SpecCPUs scaling lying in the 85-90% range, the few % advantage in single thread performance carries onto here without losses
-80% less power at same performance

A Gracemont cluster, consisting of 4 cores and the 2MB L2 cache, takes up the same area as a single Skylake core with its 256KB L2 cache. Skylake is larger at 8.7 mm² since it's on the 14nm process. A Tremont cluster roughly equals a Sunny Cove core. So again a very efficient area/performance improvement.

Full AVX2 support with FMA and 256-bit FP!

Innovative new concepts introduced, and just as the x86 optimization manual alluded to, it has a hardware load balancer to improve throughput on the dual-cluster decode.

Golden Cove's 19% gains are meh considering the sizable changes.

Asterox · Aug 19, 2021

Racan said:
Windows 11 increases Golden Cove's IPC?

Maybe in the Twilight zone, hm.

https://twitter.com/x/status/1428419323232768006

IntelUser2000 · Aug 19, 2021

So if Golden Cove is only 19% faster per clock compared to Sunny Cove, and Sunny Cove is 18% faster than Skylake, and Gracemont is 7% faster than Skylake,

Golden Cove is only 31% faster than Gracemont. So much for a "little core". It sips power like a little core, and uses die area like a little core but much higher performing.

ondma · Aug 19, 2021

IntelUser2000 said:
So if Golden Cove is only 19% faster per clock compared to Sunny Cove, and Sunny Cove is 18% faster than Skylake, and Gracemont is 7% faster than Skylake,

Golden Cove is only 31% faster than Gracemont. So much for a "little core". It sips power like a little core, and uses die area like a little core but much higher performing.

Your point still stands, but the gains in the big cores are not simply additive. The real gain is 1.18 x 1.19 or 40%. So GC is 33% faster than Gracemont. The problem with Gracemont is that it lacks hyperthreading and can't clock as high as the big cores.
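For anyone who wants to check the compounding, here is a minimal sketch using only the per-clock figures quoted in this exchange (the variable names are made up for illustration). Note that the same multiplicative logic applies to the last step: dividing by Gracemont's 1.07 rather than subtracting 7 points gives roughly 31%, a little under the 33% you get by subtraction.

```python
# Per-clock gains as quoted in the thread (vendor claims, not measurements).
sunny_cove_vs_skylake = 1.18
golden_cove_vs_sunny_cove = 1.19
gracemont_vs_skylake = 1.07

# Generational gains multiply; they do not add.
golden_cove_vs_skylake = sunny_cove_vs_skylake * golden_cove_vs_sunny_cove

# Comparing two cores that share a baseline is a ratio, not a difference.
golden_cove_vs_gracemont = golden_cove_vs_skylake / gracemont_vs_skylake

print(f"Golden Cove vs Skylake:   +{(golden_cove_vs_skylake - 1) * 100:.1f}%")
print(f"Golden Cove vs Gracemont: +{(golden_cove_vs_gracemont - 1) * 100:.1f}%")
```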

moinmoin · Aug 19, 2021

lobz said:
Fyi, incredibly high power lasers use that incredibly high power only for an incredibly short amount of time as well

Spot welding!

Abwx · Aug 19, 2021

IntelUser2000 said:
Gracemont versus Skylake
Single thread
-40% higher performance at same power, or 40% less power at same performance
-Total performance is 7-8% higher

A shrunk SKL would consume about 50% less power as well as getting 35-40% better perf at isofrequency, and 8% higher perf is 2C vs 1C/2T.

IntelUser2000 said:
Golden Cove's 19% gains are meh considering the sizable changes.

I would say, considering the software used to get this number, I guess there's some marketing going on as a means to get the same as below:

gdansk · Aug 19, 2021

dullard said:
Just in case some haven't seen it, here is the official Architecture day slide deck:

https://download.intel.com/newsroom/2021/client-computing/intel-architecture-day-2021-presentation.pdf
It has a lot of stuff in it that I didn't see in the couple of review websites that I read. The slide deck covers: Golden Cove, Gracemont, Thread Director, Alder Lake, Xe HPG, Sapphire Rapids, IPU, Oak Springs Canyon, Arrow Creek, Mount Evans, and Ponte Vecchio.

The one that really stuck out like an impossible target is 1000x by 2025 (slides 5 to 11). That is, assuming that 1000x refers to performance.

They probably mean 1000x in some niche like matrix operations of low precision int/fp. Shouldn't be too hard to achieve given by 2025 they'll have massive 600W+ MCM GPU derivatives of Xe HP or Xe HPC. And right now they have nothing comparable except a few Xeons with VNNI extensions.

gdansk · Aug 19, 2021

jpiniero said:
The ones that look like regressions might be AVX-512 tests.

While possible, I would expect them to drop more than 15% in those tests.

uzzi38 · Aug 19, 2021

19% IPC is strong for a generational uplift but disappointing considering the baseline for that measurement is Cypress Cove, which already displayed lower IPC than Zen 3 (and mind you, also lower than both Sunny and Willow Cove), and the sheer size of some of the core compared to Z3 as well. I mean, 2x the ROB? Jeez.

Gracemonts are definitely the real star of the show here.

gdansk · Aug 19, 2021

uzzi38 said:
19% IPC is strong for a generational uplift but disappointing considering the baseline for that measurement is Cypress Cove, which already displayed lower IPC than Zen 3 (and mind you, also lower than both Sunny and Willow Cove), and the sheer size of some of the core compared to Z3 as well. I mean, 2x the ROB? Jeez.

Gracemonts are definitely the real star of the show here.

Regardless, it should put them ahead of Zen 3 IPC and (I presume) Intel will keep these clocked slightly higher than Zen 3. So it comes down to how much Zen 3D will actually improve in these small benchmarks for IPC claims.

So I expect Zen 3D and Alder Lake to be trading blows. On mobile, however, it looks like Zen 3 will be much less competitive. Alder Lake will have the IPC and (probably) the maximum clock speed to win in ST. And those 8 E cores will more than make up for the 2 missing P cores in MT workloads, even if GC cannot sustain good boost clocks at low power.

uzzi38 · Aug 19, 2021

gdansk said:
Regardless, it should put them ahead of Zen 3 IPC and (I presume) Intel will keep these clocked slightly higher than Zen 3. So it comes down to how much Zen 3D will actually improve in these small benchmarks for IPC claims. I think we might see a year where which is better depends entirely on whether your workload prefers more L3 or has more ILP for Golden Cove to work with.

Zen 3 with V-Cache is probably going to be very workload-dependent in its gains. Among synthetics, SPEC is definitely going to benefit the most; for Cinebench and CPU-Z I expect minor improvements at best.

In terms of real world workloads though? Anything that reaches into memory often - gaming, code compilation, video editing... I struggle to think of an actual workload that won't significantly benefit actually.

JoeRambo · Aug 19, 2021

dullard said:
Just in case some haven't seen it, here is the official Architecture day slide deck:
https://download.intel.com/newsroom.../intel-architecture-day-2021-presentation.pdf

The full presentation has some real "gems" about hardware-assisted scheduling. I had a good laugh at this slide:

Well, that is a great example of throwing performance under the bus. Imagine a game thread, say one of the thread pool members specialized in heavy processing of some physics or AI or whatever data each frame, that has finished its work for the frame and is busy-waiting in a spin loop for the next frame to start.
The L2 cache is hot and full of relevant data. Then morons from Intel and Microsoft arrive and move this recently-busy-but-now-idle thread to a small core. The context switch takes a ton of time, caches are cold, and once the thread is back to work again, it is once again promoted to a big core and the cycle repeats, at the huge cost of cache misses.

So Intel and MS move in to rein in this ping-pong with scheduler tunables and heuristics and so on, and we are back to square one with schedulers. Don't forget that we are already at a negative starting position, since the scheduler needs to take into account all that rich performance data from hardware to make its decisions, and that also takes cycles and dirties caches.
While it is nice to have less static scheduling that can actually react to changes in the characteristics of threads, that will still come at a cost to both peak performance and performance consistency. Alder Lake will be great at full MT load, great up to 8T of load, and suffer from the scheduler everywhere else, and those problems might not even rear their ugly heads until games start to use just the right number of threads in the wrong way.

Hougy · Aug 19, 2021

I just noticed that since Alder Lake is releasing on "Intel 7", Charlie was right that Intel would never release a 10 nm desktop chip.

Abwx · Aug 19, 2021

gdansk said:
Regardless, it should put them ahead of Zen 3 IPC and (I presume) Intel will keep these clocked slightly higher than Zen 3. So it comes down to how much Zen 3D will actually improve in these small benchmarks for IPC claims.

Numbers for the small cores are not Spec_int but Spec_rate, which is a full system test that takes big advantage of bandwidth.

Hulk · Aug 19, 2021

The Golden Cove performance improvement/architectural changes I expected. If true, the Gracemont power/performance compared to Skylake is kind of mind blowing. ST Gracemont is basically faster, more power efficient, and much smaller than Skylake. This begs the question as to why we even need the Golden Cove cores?

After thinking about it a bit I believe the reason stems from the fact that ultimately ST performance is still very important. There are many apps, like Handbrake that don't scale well beyond 8 cores. Throwing 40 Gracemont cores at that app in the same die space as Alder Lake won't show the same performance as 8+8 ADL. Partly because of the higher clocks of Golden Cove, the higher inherent IPC, and as I wrote above, there are still quite a few apps that don't handle high core counts in a linear fashion performance-wise.
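To put rough numbers on that intuition, here is a toy Amdahl-style model. Everything in it is an assumption for illustration: the function names, the 1.3x per-core speed for Golden Cove relative to Gracemont (a round figure loosely based on the per-clock numbers in this thread, ignoring clocks and hyperthreading), and the idea that a workload splits cleanly into a serial part and a perfectly parallel part.

```python
def run_time(parallel_fraction: float, serial_speed: float,
             parallel_throughput: float) -> float:
    """Amdahl-style time for one unit of work (lower is better)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction / serial_speed + parallel_fraction / parallel_throughput


# Assumed relative per-core speeds, not measured values.
E_SPEED = 1.0  # Gracemont
P_SPEED = 1.3  # Golden Cove


def hybrid_8p_8e(parallel_fraction: float) -> float:
    # Serial part runs on one P core; parallel part uses all 16 cores.
    return run_time(parallel_fraction, P_SPEED, 8 * P_SPEED + 8 * E_SPEED)


def all_small_40e(parallel_fraction: float) -> float:
    # Serial part runs on one E core; parallel part uses all 40 cores.
    return run_time(parallel_fraction, E_SPEED, 40 * E_SPEED)


for p in (0.5, 0.8, 0.95, 1.0):
    print(f"parallel={p:.2f}  8P+8E={hybrid_8p_8e(p):.4f}  40E={all_small_40e(p):.4f}")
```

In this toy model the hybrid layout comes out ahead when a sizable share of the work is serial, and the 40 small cores only win as the work approaches perfectly parallel.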

Starting to get just a little more excited about ADL...

gdansk · Aug 19, 2021

Hougy said:
I just noticed that since Alder Lake is releasing on "Intel 7", Charlie was right that Intel would never release a 10 nm desktop chip.

If I recall correctly Charlie was talking about their first generation 10nm. He was already right, long ago, as they trashed their first generation 10nm. Even if they kept the name as 10nm ESF.

eek2121 · Aug 19, 2021

Hulk said:
The Golden Cove performance improvement/architectural changes I expected. If true, the Gracemont power/performance compared to Skylake is kind of mind blowing. ST Gracemont is basically faster, more power efficient, and much smaller than Skylake. This begs the question as to why we even need the Golden Cove cores?

After thinking about it a bit I believe the reason stems from the fact that ultimately ST performance is still very important. There are many apps, like Handbrake that don't scale well beyond 8 cores. Throwing 40 Gracemont cores at that app in the same die space as Alder Lake won't show the same performance as 8+8 ADL. Partly because of the higher clocks of Golden Cove, the higher inherent IPC, and as I wrote above, there are still quite a few apps that don't handle high core counts in a linear fashion performance-wise.

Starting to get just a little more excited about ADL...

We already knew everything that Intel announced today. It is just that a lot of people did not believe the rumors/leaks. I remember getting yelled at for saying that gracemont would be faster than skylake…

dullard · Aug 19, 2021

JoeRambo said:
The full presentation has some real "gems" about hardware-assisted scheduling. I had a good laugh at this slide:
View attachment 49077

Well, that is a great example of throwing performance under the bus. Imagine a game thread, say one of the thread pool members specialized in heavy processing of some physics or AI or whatever data each frame, that has finished its work for the frame and is busy-waiting in a spin loop for the next frame to start.
The L2 cache is hot and full of relevant data. Then morons from Intel and Microsoft arrive and move this recently-busy-but-now-idle thread to a small core. The context switch takes a ton of time, caches are cold, and once the thread is back to work again, it is once again promoted to a big core and the cycle repeats, at the huge cost of cache misses.

So Intel and MS move in to rein in this ping-pong with scheduler tunables and heuristics and so on, and we are back to square one with schedulers. Don't forget that we are already at a negative starting position, since the scheduler needs to take into account all that rich performance data from hardware to make its decisions, and that also takes cycles and dirties caches.
While it is nice to have less static scheduling that can actually react to changes in the characteristics of threads, that will still come at a cost to both peak performance and performance consistency. Alder Lake will be great at full MT load, great up to 8T of load, and suffer from the scheduler everywhere else, and those problems might not even rear their ugly heads until games start to use just the right number of threads in the wrong way.

1) Simple fix to your concern: put a minimum idle limit in the guidance for considering a thread idle. Suppose a game is running at 60 fps. Then each frame takes 16.7 ms. Don't move an idle thread unless it is idle for more than 16.7 ms. Then your game cannot be impacted at all by the scheduler. No ping-pong possible. Note: I'm not saying 16.7 ms is the ideal number, this is just an example. I'm not saying that this is an easy thing to solve (which is why it took a big change from both Intel and Microsoft). I'm just saying that your example is an easy thing to solve.

2) It is a new hardware based microcontroller doing the heavy lifting. Thus, with separate hardware doing much of the calculations, it is less work on the software, fewer cycles and less dirty cache.

3) Why are you suddenly running some processor heavy other program in the middle of gaming?
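The minimum-idle rule from point 1 can be sketched in a few lines. This is only an illustration of the idea, not how Thread Director or the Windows scheduler actually works; `ThreadState`, `update_placement`, and the 16.7 ms default are all hypothetical.

```python
from dataclasses import dataclass

FRAME_TIME_MS = 1000 / 60  # about 16.7 ms per frame at 60 fps


@dataclass
class ThreadState:
    """Hypothetical per-thread bookkeeping; not a real scheduler structure."""
    on_big_core: bool = True
    idle_ms: float = 0.0


def update_placement(thread: ThreadState, was_idle: bool, elapsed_ms: float,
                     min_idle_ms: float = FRAME_TIME_MS) -> ThreadState:
    """Demote a thread only after it has been idle longer than min_idle_ms."""
    if was_idle:
        thread.idle_ms += elapsed_ms
        if thread.idle_ms > min_idle_ms:
            thread.on_big_core = False
    else:
        # Any activity resets the idle timer and puts the thread on a big core.
        thread.idle_ms = 0.0
        thread.on_big_core = True
    return thread


# A game thread that works 10 ms and spins 6 ms each frame never gets moved.
game_thread = ThreadState()
for _ in range(600):
    update_placement(game_thread, was_idle=False, elapsed_ms=10.0)
    update_placement(game_thread, was_idle=True, elapsed_ms=6.0)
print("game thread still on big core:", game_thread.on_big_core)
```

A thread that stays idle for longer than one frame would be demoted under this rule, which is the behavior the threshold is meant to allow.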

dullard · Aug 19, 2021

Hulk said:
After thinking about it a bit I believe the reason stems from the fact that ultimately ST performance is still very important. There are many apps, like Handbrake that don't scale well beyond 8 cores.

Exactly. User interface needs high ST performance for a fast "feel". You could have a billion slow cores, and while it will crunch numbers quickly, it would feel like a dog on the user interface thread. And many applications just physically can't be divided into a large number of threads no matter what the programmer tries. There is always a need for a few very fast ST cores.

gdansk · Aug 19, 2021

Presently available schedulers already try to avoid ping-pong of threads even between the same core type. The same mechanisms which keep busy threads running on the same core should apply here. Intel presented an example and I expect it is exactly that, an example. I presume that Intel engineers know that tasks which are using a spin lock do so because it is a low latency lock. But who knows.

Hougy · Aug 19, 2021

gdansk said:
If I recall correctly Charlie was talking about their first generation 10nm. He was already right, long ago, as they trashed their first generation 10nm. Even if they kept the name as 10nm ESF.

Actually, he said they cancelled 10 nm. So all 10 nm products. He even admitted he was wrong.

itsmydamnation · Aug 19, 2021

eek2121 said:
We already knew everything that Intel announced today. It is just that a lot of people did not believe the rumors/leaks. I remember getting yelled at for saying that gracemont would be faster than skylake…

Faster per clock; it won't be faster overall.

Clock/power scaling of Gracemont will be very interesting.
Did I see 3-cycle L1 latency? That's gotta put a lid on clocks.
