Discussion Intel current and future Lakes & Rapids thread

insertcarehere · Jun 13, 2021

IntelUser2000 said:
See the figure of Sunny Cove without L2 being 4.5mm2? Well Tremont is actually only 0.85mm2. The ENTIRE Tremont cluster is only 4.9mm2. So that includes the 2MB L2 and the I/O that's required for a quad core cluster.

Core-only Tremont is only 0.85mm2. Sunny Cove is 5-6x as large, not 4x.

The Front End, the L3 cache, Load/Store units, or the Vector units are each a Tremont core!
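As a quick sanity check on the ratio quoted above, here is a minimal sketch that uses only the area figures from the post (4.5mm2 for Sunny Cove without L2, 0.85mm2 for a core-only Tremont, 4.9mm2 for the whole Tremont cluster). The function and variable names are made up for illustration.

```python
def area_ratio(big_mm2: float, small_mm2: float) -> float:
    """Return how many times larger the first area is than the second."""
    return big_mm2 / small_mm2

# Figures as quoted in the post (not independently measured).
sunny_cove_no_l2 = 4.5   # mm2, Sunny Cove core without L2
tremont_core = 0.85      # mm2, core-only Tremont
tremont_cluster = 4.9    # mm2, quad-core cluster incl. 2MB L2 and I/O

core_ratio = area_ratio(sunny_cove_no_l2, tremont_core)
cluster_ratio = area_ratio(tremont_cluster, sunny_cove_no_l2)

print(f"Sunny Cove (no L2) vs Tremont core: {core_ratio:.1f}x")
print(f"Tremont quad cluster vs Sunny Cove (no L2): {cluster_ratio:.2f}x")
```

With these inputs the core-to-core ratio lands a little above 5x, and the full quad-core Tremont cluster comes out only slightly larger than a single Sunny Cove core without its L2.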

If tremont/gracemont turn out to be decent performing cores in their own right, the path to competitive MT performance lies with putting as many of them in a die as possible with a few Core cores to handle lightly-threaded tasks.

Come to think of it, the team that designed those cores should take over on Golden Cove and beyond as well...

SAAA · Jun 13, 2021

insertcarehere said:
If tremont/gracemont turn out to be decent performing cores in their own right, the path to competitive MT performance lies with putting as many of them in a die as possible with a few Core cores to handle lightly-threaded tasks.

Come to think of it, the team that designed those cores should take over on Golden Cove and beyond as well...

I posted earlier in favour of 8+32 core design going forward, would make sense given Raptor has been said to be 8+16, I could see that competitive MT performance happening very soon.
The issue left looks to be area/power of the big cores, unless they are somehow hiding larger IPC gains with golden and sandbagging on purpose. Would be a surprise if it turns out to be a lot larger than 20% IPC over Tiger but at lower average clockspeed keeping up with the 20% claim.
That or golden+ and redwood cove cores are due to some efficiency tuning.

coercitiv · Jun 13, 2021

diediealldie said:
Still smaller by a margin. As for a power consumption, who knows how Golden Cove is clocked.

Based on what we've seen with Willow Cove and 10SF, Intel seems to be aiming for high clocks. I don't see how people can judge SPR as a power hog based on TDP alone, not when we've already seen TGL-H managing a near-linear efficiency curve as the 8-core die uses 80W+ all by itself.

Having such low efficiency to start is not good at all, but being able to catch up in efficiency at higher power levels means they may actually seek those high performance levels to improve their figures relative to the competition. As far as I can tell, high performance levels also play well with the inclusion of HBM on the package.

uzzi38 · Jun 13, 2021

eek2121 said:
Yes, it is a significant difference, but Intel will still make more money per chip than AMD because they own their own fabs.

Don’t write off AVX-512. It is very important for certain workloads. That 350W TDP is likely with AVX workloads. Without it will likely be a bit lower.

The top Genoa chips are rumored to have a 370W TDP.

On margins, we don't really have a per-product breakdown of Intel/AMD's margins. Heck, we don't even know what final prices Xeon/EPYC chips actually sell for, because it's rarely ever MSRP. But if Intel wants to price competitively (like they are now), I think they'll struggle to maintain the same margins.

The 350W TDP will be for all workloads, not just AVX-512 ones. The TDP rating works the same way as that for any Xeon released since they came to be.

Top end Genoa is 320W TDP (cTDP can go higher of course, but it's not the default for the chip).

João Bortolace · Jun 13, 2021

I was thinking that if Intel brought big.LITTLE to servers, they could do something on the small cores similar to what ARM did with the A510: share the AVX-512 and AMX units between 2 cores, so they would pay a smaller penalty in area and power.

Ajay · Jun 13, 2021

scineram said:
So why not 16 cores? Looks like a 4x4 grid. What happened to one of the slots?

It's a 4x5 grid. 15 of those are clearly compute cores (by my eyes), 3 are definitely not, and I can't tell on the other 2.
Unfortunately, the false coloration of these metal layers is a bit messed up. The Skylake-SP die shot shown above is much clearer.
There's a lot of I/O around the periphery of the chip, as expected.

semiman · Jun 14, 2021

coercitiv said:
Based on what we've seen with Willow Cove and 10SF, Intel seems to be aiming for high clocks. I don't see how people can judge SPR as a power hog based on TDP alone, not when we've already seen TGL-H managing a near-linear efficiency curve as the 8-core die uses 80W+ all by itself.

Having such low efficiency to start is not good at all, but being able to catch up in efficiency at higher power levels means they may actually seek those high performance levels to improve their figures relative to the competition. As far as I can tell, high performance levels also play well with the inclusion of HBM on the package.

I agree with you in general, but achieving the same performance with an additional 10W isn't really a good sign (same score at 75W and 85W).
This is the 8-core 11800H, and you'll need to use almost 100W to provide a clear PPC advantage against the Ryzen 5800H. If it were a 52-core server variant, power consumption would be a whopping 700+W. More than 10 watts per core.

Sapphire Rapids' 350W power budget means ~7W per core, which is quite far from the current 10SF / 7nm intercept points (where Intel provides more performance per watt). 10ESF + the new Golden Cove core might give Intel more room, but I don't think they can fix this soon.

I think Sapphire Rapids' true power comes from HPC, bandwidth-limited tasks and RDBMS. Raw computing capability per socket, maybe not.
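The per-core power arithmetic above can be sketched as follows. This is a rough illustration only: it assumes package power divides evenly across cores and scales proportionally with core count, and it takes the "almost 100W" figure as exactly 100W. The helper name is hypothetical.

```python
def watts_per_core(package_watts: float, cores: int) -> float:
    """Naive per-core power: package power divided evenly across cores."""
    return package_watts / cores

# Figures taken from the post; 100 W stands in for "almost 100W".
tgl_h_per_core = watts_per_core(100, 8)    # 8-core Tiger Lake-H
spr_per_core = watts_per_core(350, 52)     # 52-core part at a 350 W budget

# Purely proportional scaling of the 8-core result to 52 cores.
scaled_52_core_watts = tgl_h_per_core * 52

print(f"TGL-H at ~100 W: {tgl_h_per_core:.1f} W per core")
print(f"350 W across 52 cores: {spr_per_core:.1f} W per core")
print(f"52 cores at the TGL-H per-core figure: {scaled_52_core_watts:.0f} W")
```

Under these assumptions the proportional estimate lands in the same ballpark as the figure in the post, and the 350W budget works out to a bit under 7W per core.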

coercitiv · Jun 14, 2021

diediealldie said:
This is the 8-core 11800H, and you'll need to use almost 100W to provide a clear PPC advantage against the Ryzen 5800H. If it were a 52-core server variant, power consumption would be a whopping 700+W. More than 10 watts per core.

Sapphire Rapids' 350W power budget means ~7W per core, which is quite far from the current 10SF / 7nm intercept points (where Intel provides more performance per watt). 10ESF + the new Golden Cove core might give Intel more room, but I don't think they can fix this soon.

I wasn't trying to build a narrative where SPR would reach the crossover point where it matches or beats the competition. The point of my post was to show Intel may be in a position where increasing power per core doesn't really hurt their efficiency. (the bad start in efficiency is another problem).

eek2121 · Jun 14, 2021

The 11980HK was tested, not the 11800H. I will have my hands on an 11800H system next week, so I can post some tests here.

On the subject of small cores, I expect a cluster of 4 to perform somewhere between a Core i5 6500 and a Core i5 7500. Possibly a bit faster or slower depending on final clocks.

EDIT: We all saw the low Geekbench scores. I assumed it was due to low clocks, but what if those were the small cores running the test? It lines up perfectly with the expected performance. I guess we will find out. If it were the small cores doing the lifting for the tests, then Alder Lake is actually going to be quite a beast.

EDIT 2: If my first edit is correct, it would match up with my earlier expectation that Alder Lake will be a bit faster than a 5950X.

coercitiv · Jun 14, 2021

eek2121 said:
but what if those were the small cores running the test?

You mean what if 8 Gracemont cores were just as fast as 8 Cypress Cove cores? Sure, why stop at the moon...

NTMBK · Jun 14, 2021

João Bortolace said:
I was thinking that if Intel brought big.LITTLE to servers, they could do something on the small cores similar to what ARM did with the A510: share the AVX-512 and AMX units between 2 cores, so they would pay a smaller penalty in area and power.

Share the front end too, and go Full Bulldozer!

Mopetar · Jun 14, 2021

Never go full Bulldozer.

eek2121 · Jun 14, 2021

coercitiv said:
You mean what if 8 Gracemont cores were just as fast as 8 Cypress Cove cores? Sure, why stop at the moon...

I don’t know what you are talking about. Alder Lake scored similarly to how a hypothetical 8-core Core i5 7500 would: https://browser.geekbench.com/v5/cpu/search?q=alder

Hulk · Jun 14, 2021

eek2121 said:
I don’t know what you are talking about. Alder Lake scored similarly to how a hypothetical 8-core Core i5 7500 would: https://browser.geekbench.com/v5/cpu/search?q=alder

Did we ever get any confirmation on what frequency that Golden Cove core was running for the Geekbench 5 single core score? I think without knowing that and the core speeds for the MT test we don't really learn much from these benches. If that ST score was at 2GHz then wow, that's fantastic. If it's 4GHz then not so much.

coercitiv · Jun 14, 2021

eek2121 said:
I don’t know what you are talking about. Alder Lake scored similarly to how a hypothetical 8-core Core i5 7500 would: https://browser.geekbench.com/v5/cpu/search?q=alder

11700 does 9500-10k in GB5.
10700 does 8500-8700.
9700 hits around 7500-7800.

Your hypothetical 8-core 7500 would hover under 7500 in GB5, yet Alder Lake went as high as 9000, hitting close to the i7 11700 at arguably lower clocks. Either it was using all 14 cores at lower speed, or the Gracemont clusters are at Cypress Cove level. Make your pick.

Thala · Jun 14, 2021

coercitiv said:
11700 does 9500-10k in GB5.
10700 does 8500-8700.
9700 hits around 7500-7800.

Your hypothetical 8-core 7500 would hover under 7500 in GB5, yet Alder Lake went as high as 9000, hitting close to the i7 11700 at arguably lower clocks. Either it was using all 14 cores at lower speed, or the Gracemont clusters are at Cypress Cove level. Make your pick.

Just to help Eek2121 with his pick:
Jasper Lake scores at highest configuration in N6000 (4xTremont@3.3GHz): link

Sure, Gracemont is supposed to be somewhat faster than Tremont...but still...

IntelUser2000 · Jun 14, 2021

I agree with @eek2121 for the single threaded portion. The ones scoring close to 1.3k seem a bit too high, but ones in the 1000-range are plausible.

Plus user-submitted scores are notoriously unreliable indicators for comparison since you can have the same setup but have immensely different scores.

The early Geekbench scores for Lakefield were really, really poor especially on the ST side because it was probably running on the low-clocked Tremont cores.

The early nature of the submission could mean some are running on Gracemont, some on Golden Cove.

You can see from the cache data that it doesn't recognize that properly either. It says 10x 64KB L1-I and 10x 32KB L1-D, when we know that it's Gracemont that has that configuration and Golden Cove is 32KB L1-I and 48KB L1-D.

eek2121 · Jun 14, 2021

coercitiv said:
11700 does 9500-10k in GB5.
10700 does 8500-8700.
9700 hits around 7500-7800.

Your hypothetical 8-core 7500 would hover under 7500 in GB5, yet Alder Lake went as high as 9000, hitting close to the i7 11700 at arguably lower clocks. Either it was using all 14 cores at lower speed, or the Gracemont clusters are at Cypress Cove level. Make your pick.

AnandTech had a score of 1338/9019 in their testing for a 10700. They don’t have a 11700 to test sadly, but both chips are “65W” chips.

We know that Gracemont will support AVX2, and based on leaks, the ES2 samples were hitting 3.4 GHz. We also know that the cache will be bigger, and that the platform supports DDR5. All of this is before any performance uplift from the cores themselves.

I think you are being overly pessimistic about the performance of an Atom chip on 10ESF. I guess we will see who is right.

NostaSeronx · Jun 14, 2021

NTMBK said:
Share the front end too, and go Full Bulldozer!

Intel has a different technology from the VISC buyout they can do.

Tremont core: 0.88mm2. Total quad-core cluster size: 5.14mm2.

1. Get rid of local front-ends.
2. Use a single non-processor integrated global front-end.
3. Implement a large micro-op/L0i cache in the place of the local front-end.
4. etc.

Ideally, there is interest to also get rid of local memory execution for a global memory execution cluster as well.

1. Get rid of local back-ends.
2. Use a single non-processor integrated global back-end.
3. Implement a large L0d cache in place of the local back-end.
4. etc.

Then, watch as all four 3-ALU pipelines become one 12-wide virtual core or twelve 1-wide virtual cores and everything in between.

!!1.5~4.5 MB L2!! -> <<GFE Branch Predict/Fetch/Pick/Decode -> 128KB L1i -> GFE Dispatch/Retire/Allocate>> -> ~~16KB L0i, etc.~~
!!L2 voltage/clock domain!!
<<GFE&BE voltage/clock domain>>
~~Processor voltage/clock domain~~

There is also the Mirage config, which allows the Atom cores to be converted back into InO processors. Since the global part is OoO in itself, this leads to one 12-wide OoO virtual core executing on four 3-wide InO physical cores.

The FPU is integrated within the core for Intel, so with the Mirage setup it can dispatch a part of the AVX512 to each AVX128 SIMD unit. However, the performance angle makes it more ideal for each of the cores to get wider SIMDs instead.

Instead of a 12-wide 64-bit I-pipe, 4-wide 128-bit M-pipe and 4-wide 128-bit A-pipe, due to the efficiencies given by the Mirage setup it would be a 12-wide 64-bit I-pipe, 4-wide 512-bit M-pipe and 4-wide 512-bit A-pipe.

Current CPU w/ VISC at Intel has legacy to this:
https://dada.cs.washington.edu/smt/memoryLogix.pdf

IntelUser2000 · Jun 14, 2021

eek2121 said:
AnandTech had a score of 1338/9019 in their testing for a 10700. They don’t have a 11700 to test sadly, but both chips are “65W” chips.

If the ST performance is exactly Skylake, then it won't perform like the 7500. Because Gracemont lacks Hyperthreading.

Hulk · Jun 14, 2021

IntelUser2000 said:
I agree with @eek2121 for the single threaded portion. The ones scoring close to 1.3k seem a bit too high, but ones in the 1000-range are plausible.

Geekbench 5 small core (Gracemont) scores ~1000? If true I think that is very impressive for the small-er Alder Lake core. Very impressive considering my Haswell based 4770K doesn't reach 1000.

But I do have a caveat. I have a feeling that modern CPUs that are scoring well on GB5 work "well" with that benchmark (tuning for some of the apps in there) but that doesn't always translate to many real world apps. As I wrote above, that Gracemont score is better than my 4770K, but I wonder if a quad cluster of them in most applications wouldn't beat the 4770K? If that is the case ADL could be very special indeed.

IntelUser2000 · Jun 14, 2021

Hulk said:
But I do have a caveat. I have a feeling that modern CPUs that are scoring well on GB5 work "well" with that benchmark (tuning for some of the apps in there) but that doesn't always translate to many real world apps.

You are right. Tremont is about ~Ivy Bridge. In Geekbench it's almost Haswell level. If you look at side-by-side architectural comparisons, one wins in one area but loses in others. Tremont can at peak decode more if the two clusters work well, and it has more ROB entries plus it has more ports. But Haswell has advantages in Load/Store units and has a uop cache. The lower number of pipeline stages means a lower branch mispredict penalty for Tremont.

Let's normalize the results:
-Take the Integer score, because the overall result is still flawed. Integer tells us the real uarch differences.
-Windows vs Windows.
-Clock scaling is pretty much linear.

702 for 3.3GHz Jasper Lake versus 850-900 for 3.9GHz 4770K.

Of course in MT 4770K is much faster because of Hyperthreading, and Jasper Lake being limited in MT applications because of power. One test says it runs at 2GHz.

Same with Cinebench. In R15 Goldmont Plus gets 80. The 10W J5040 refresh gets to 95. Tremont is 130-140 which is really not far away from Haswell.

This means Gracemont will beat Haswell per clock, and not by a small amount.
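The normalization above can be sketched as a per-GHz comparison. This is a minimal illustration that takes the post's own assumption of linear clock scaling and its quoted integer scores (702 at 3.3GHz for Jasper Lake, 850-900 at 3.9GHz for the 4770K); the helper name is made up.

```python
def score_per_ghz(score: float, clock_ghz: float) -> float:
    """Per-clock score, assuming the score scales linearly with frequency."""
    return score / clock_ghz

# GB5 integer scores and clocks as quoted in the post.
tremont = score_per_ghz(702, 3.3)        # Jasper Lake (Tremont)
haswell_low = score_per_ghz(850, 3.9)    # 4770K, low end of quoted range
haswell_high = score_per_ghz(900, 3.9)   # 4770K, high end of quoted range

# Tremont per-clock result relative to Haswell, as a range.
rel_worst = tremont / haswell_high
rel_best = tremont / haswell_low

print(f"Tremont: {tremont:.0f} points per GHz")
print(f"Haswell: {haswell_low:.0f}-{haswell_high:.0f} points per GHz")
print(f"Tremont relative to Haswell per clock: {rel_worst:.0%}-{rel_best:.0%}")
```

On these numbers Tremont sits within roughly ten percent of Haswell per clock in the integer subtest, which is the gap any Gracemont gain over Tremont would have to close.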

VirtualLarry · Jun 15, 2021

www.newegg.com

New Acer with Jasper Lake Celeron N4500 is out.

Intel® Celeron® Processor N4500 (4M Cache, up to 2.80 GHz) - Product Specifications | Intel

ark.intel.com

Price seems high, $389.

I paid $230-250 for my Zen 3020e and 3050e laptops, and the 3050e is 1080P.

Curious how performance compares to Zen. But Intel is still a NO-BUY for me, until they add something similar to AMD's VSR into their drivers. I much prefer an AMD 3050e APU-based laptop with a 1080P screen that I can actually crank up to 4k UHD effectively.

coercitiv · Jun 15, 2021

eek2121 said:
I think you are being overly pessimistic about the performance of an atom chip on 10ESF. I guess we will see who is right.

Pessimistic!? I'm not the one making claims about Gracemont cluster performance. In fact, in all my posts I have accepted GC = Skylake IPC as a basis for any napkin math.

Problem is you're already shooting past Skylake. I don't know whether Anandtech's GB5 scores are under Windows or Linux, but what I do know is that even when looking at the i7 9700 in GB5, which is an 8C/8T CPU running 4.5GHz all-core and 4.7GHz on 1-2 cores, you can barely reach a 1300 ST score and MT scores hover around 7500, let's say 8000 to be safe & generous.

So what does this mean, when supposedly Gracemont running below 4GHz peaked at a slightly higher ST and considerably higher MT score than a 4.5GHz+ Skylake? Even if we try to justify the higher MT scores by invoking memory scaling in the benchmark, the ST score is still a problem since we know the benchmark doesn't scale much with bandwidth on a single core.

The only "reasonable" explanation, as long as we keep our presumption of GB5 running on the Gracemont clusters alone, is that Gracemont on the ADL platform has considerably higher IPC than Skylake (enough to compensate for a 10-20% clock deficit).

So how am I the pessimist in this discussion?
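To put a number on "enough to compensate for a 10-20% clock deficit", here is a minimal sketch. It assumes the simple model performance = IPC x clock, so matching performance at a lower clock needs IPC to rise by 1 / (1 - deficit). The function name is hypothetical.

```python
def required_ipc_uplift(clock_deficit: float) -> float:
    """IPC multiplier needed to match performance at a lower clock.

    Assumes performance = IPC * clock, with clock_deficit given as a
    fraction (0.10 means the clock is 10% lower).
    """
    if not 0 <= clock_deficit < 1:
        raise ValueError("clock_deficit must be in [0, 1)")
    return 1.0 / (1.0 - clock_deficit)

for deficit in (0.10, 0.20):
    uplift = required_ipc_uplift(deficit)
    print(f"{deficit:.0%} lower clock -> {uplift - 1:.0%} higher IPC needed")
```

So under this model, merely matching Skylake at a 10-20% lower clock already implies roughly 11-25% higher IPC, before accounting for any score lead.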

eek2121 · Jun 15, 2021

coercitiv said:
Pessimistic!? I'm not the one making claims about Gracemont cluster performance. In fact, in all my posts I have accepted GC = Skylake IPC as a basis for any napkin math.

Problem is you're already shooting past Skylake. I don't know whether Anandtech's GB5 scores are under Windows or Linux, but what I do know is that even when looking at the i7 9700 in GB5, which is an 8C/8T CPU running 4.5GHz all-core and 4.7GHz on 1-2 cores, you can barely reach a 1300 ST score and MT scores hover around 7500, let's say 8000 to be safe & generous.

So what does this mean, when supposedly Gracemont running below 4GHz peaked at a slightly higher ST and considerably higher MT score than a 4.5GHz+ Skylake? Even if we try to justify the higher MT scores by invoking memory scaling in the benchmark, the ST score is still a problem since we know the benchmark doesn't scale much with bandwidth on a single core.

The only "reasonable" explanation, as long as we keep our presumption of GB5 running on the Gracemont clusters alone, is that Gracemont on the ADL platform has considerably higher IPC than Skylake (enough to compensate for a 10-20% clock deficit).

So how am I the pessimist in this discussion?

The i5 6500 is a 4-core, 4-thread CPU running at 3.2-3.6 GHz. Previous rumors HAVE claimed the new cores will have performance similar to Skylake. We also have rumors that say ES2 was running at 3.4 GHz. AnandTech has the 6500 as having a GB5 single-core score of 1001 and a multi-core score of 3372. You can't have it both ways: either it will offer performance similar to Skylake or it won't. I'm inclined to think it will. There is no reason Intel can't make that kind of performance leap.
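For reference, the i5 6500 figures above can be scaled to eight cores with a rough sketch. It assumes perfectly linear multi-core scaling from 4 to 8 cores at unchanged clocks, which is optimistic, and it uses the roughly 9000-point Alder Lake result mentioned earlier in the thread. All names are illustrative.

```python
def scale_mt_score(mt_score: float, from_cores: int, to_cores: int) -> float:
    """Scale a multi-core score linearly with core count (optimistic)."""
    return mt_score * to_cores / from_cores

i5_6500_mt = 3372       # GB5 multi-core score quoted from AnandTech
alder_lake_peak = 9000  # approximate peak Alder Lake result cited in the thread

eight_core_estimate = scale_mt_score(i5_6500_mt, from_cores=4, to_cores=8)
gap = alder_lake_peak / eight_core_estimate

print(f"Hypothetical 8-core i5 6500: {eight_core_estimate:.0f}")
print(f"Alder Lake peak relative to that estimate: {gap:.2f}x")
```

Even with ideal scaling, the doubled 6500 score stays under 7000, so the leaked 9000-point result is about a third higher than what eight Skylake-class cores at those clocks would be expected to produce.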

Apparently Razer canceled my preorder this morning for the 11800h system. Intel just lost a sale. I needed a laptop before July, so I bought the razer blade advanced 14 with a 5900hx and a 3070 off Amazon. It will be here later this week.
