Discussion Intel current and future Lakes & Rapids thread


insertcarehere

Senior member
Jan 17, 2013
639
607
136
See the figure of Sunny Cove without L2 being 4.5mm2? Well Tremont is actually only 0.85mm2. The ENTIRE Tremont cluster is only 4.9mm2. So that includes the 2MB L2 and the I/O that's required for a quad core cluster.

Core-only Tremont is only 0.85mm2. Sunny Cove is 5-6x as large, not 4x.
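
For anyone who wants to check the ratio, here is a minimal sketch using only the area figures quoted in this post (4.5mm2 for Sunny Cove without L2, 0.85mm2 for a Tremont core, 4.9mm2 for the whole quad-core Tremont cluster). The variable and function names are made up for illustration.

```python
# Area figures as quoted in the post above (mm^2).
sunny_cove_core_mm2 = 4.5    # Sunny Cove core without L2
tremont_core_mm2 = 0.85      # Tremont core only
tremont_cluster_mm2 = 4.9    # 4x Tremont + 2MB L2 + cluster I/O


def area_ratio(larger_mm2: float, smaller_mm2: float) -> float:
    """Return how many times larger the first area is than the second."""
    return larger_mm2 / smaller_mm2


# Core vs core: how many Tremont cores fit in one Sunny Cove core's area.
core_ratio = area_ratio(sunny_cove_core_mm2, tremont_core_mm2)

# Whole Tremont cluster vs a single Sunny Cove core (without its L2).
cluster_ratio = area_ratio(tremont_cluster_mm2, sunny_cove_core_mm2)

print(f"Sunny Cove core / Tremont core: {core_ratio:.2f}x")
print(f"Tremont cluster / Sunny Cove core: {cluster_ratio:.2f}x")
```

With these inputs the core-to-core ratio lands a bit above 5x, and the full quad-core cluster is only slightly larger than one Sunny Cove core without L2.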

The Front End, the L3 cache, the Load/Store units, or the Vector units are each the size of a Tremont core!
If tremont/gracemont turn out to be decent performing cores in their own right, the path to competitive MT performance lies with putting as many of them in a die as possible with a few Core cores to handle lightly-threaded tasks.

Come to think of it, the team that designed those cores should take over on Golden Cove and beyond as well...
 

SAAA

Senior member
May 14, 2014
541
126
116
If tremont/gracemont turn out to be decent performing cores in their own right, the path to competitive MT performance lies with putting as many of them in a die as possible with a few Core cores to handle lightly-threaded tasks.

Come to think of it, the team that designed those cores should take over on Golden Cove and beyond as well...
I posted earlier in favour of an 8+32 core design going forward. It would make sense given Raptor Lake has been said to be 8+16; I could see that competitive MT performance happening very soon.
The issue left looks to be the area/power of the big cores, unless they are somehow hiding larger IPC gains with Golden Cove and sandbagging on purpose. It would be a surprise if it turns out to be a lot more than a 20% IPC gain over Tiger Lake, but at a lower average clock speed that still keeps up with the 20% claim.
That, or the Golden+ and Redwood Cove cores are due some efficiency tuning.
 

coercitiv

Diamond Member
Jan 24, 2014
6,185
11,851
136
Still smaller by a margin. As for power consumption, who knows how Golden Cove is clocked.
Based on what we've seen with Willow Cove and 10SF, Intel seems to be aiming for high clocks. I don't see how people can judge SPR as a power hog based on TDP alone, not when we've already seen TGL-H managing a near-linear efficiency curve as the 8-core die uses 80W+ all by itself.

tgl-h.jpg

Having such low efficiency to start with is not good at all, but being able to catch up in efficiency at higher power levels means they may actually seek those high performance levels to improve their figures relative to the competition. As far as I can tell, high performance levels also play well with the inclusion of HBM on the package.
 

uzzi38

Platinum Member
Oct 16, 2019
2,621
5,873
146
Yes, it is a significant difference, but Intel will still make more money per chip than AMD because they own their own fabs.



Don’t write off AVX-512. It is very important for certain workloads. That 350W TDP is likely with AVX workloads. Without it will likely be a bit lower.

The top Genoa chips are rumored to have a 370W TDP.

On margins, we don't really have a per-product breakdown of Intel/AMD's margins - heck, we don't even know what final prices Xeon/EPYC chips actually sell for because it's rarely ever MSRP. But if Intel want to price competitively (like they are now), I think they'll struggle to maintain the same margins.

The 350W TDP will be for all workloads, not just AVX-512 ones. The TDP rating works the same way as that for any Xeon released since they came to be.

Top end Genoa is 320W TDP (cTDP can go higher of course, but it's not the default for the chip).
 
Jan 12, 2021
30
64
61
I was thinking that if Intel brought big.LITTLE to servers, they could do something on the small cores similar to what ARM did with the A510: share the AVX-512 and AMX units between 2 cores, so they would have a smaller penalty in area and power.
Captura de tela 2021-06-13 121907.png
 

Ajay

Lifer
Jan 8, 2001
15,429
7,847
136
So why not 16 cores? Looks like a 4x4 grid. What happened to one of the slots?
It's a 4x5 grid. 15 of those are clearly compute cores (by my eyes), 3 are definitely not and I can't tell on the other 2.
Unfortunately, the false coloration of these metal layers is a bit messed up. The Skylake-SP die shot shown above is much clearer.
There's a lot of I/O around the periphery of the chip - as expected.
 

diediealldie

Member
May 9, 2020
77
68
61
Based on what we've seen with Willow Cove and 10SF, Intel seems to be aiming for high clocks. I don't see how people can judge SPR as a power hog based on TDP alone, not when we've already seen TGL-H managing a near-linear efficiency curve as the 8-core die uses 80W+ all by itself.

View attachment 45694

Having such low efficiency to start with is not good at all, but being able to catch up in efficiency at higher power levels means they may actually seek those high performance levels to improve their figures relative to the competition. As far as I can tell, high performance levels also play well with the inclusion of HBM on the package.

I agree with you in general, but achieving the same performance with an additional 10W isn't really a good sign (same score at 75W and 85W).
This is an 8-core 11800H, and you'd need to use almost 100W to provide a clear PPC advantage against the Ryzen 5800H. If it were a 52-core server variant, power consumption would be a whopping 700+W. More than 10 watts per core.

Sapphire Rapids' 350W power budget means ~7W per core, which is quite far from the current 10SF / 7nm intercept points (where Intel provides more performance per watt). 10ESF + the new Golden Cove core might give Intel more room, but I don't think they can fix this soon.
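
As a rough sanity check on the per-core numbers, here is a small sketch. It assumes package power divides evenly across cores and scales linearly with core count, ignoring uncore, I/O and HBM power, which is a big simplification. The 100W, 8-core, 350W and 52-core inputs are the figures from this post; the function names are made up.

```python
def watts_per_core(package_watts: float, cores: int) -> float:
    """Naive per-core power: package power split evenly over all cores."""
    return package_watts / cores


def scaled_package_watts(per_core_watts: float, cores: int) -> float:
    """Naive package power if every core drew the given per-core power."""
    return per_core_watts * cores


# Laptop part at the "almost 100W" point, 8 cores.
laptop_per_core = watts_per_core(100, 8)

# Sapphire Rapids: 350W budget over 52 cores.
spr_per_core = watts_per_core(350, 52)

# What 52 cores would draw at the laptop part's per-core power.
naive_52_core_watts = scaled_package_watts(laptop_per_core, 52)

print(f"Laptop part: {laptop_per_core:.1f} W/core")
print(f"SPR budget:  {spr_per_core:.2f} W/core")
print(f"52 cores at laptop per-core power: {naive_52_core_watts:.0f} W")
```

Under this even-split assumption the 52-core extrapolation comes out at 650W, a bit under the 700+W above but in the same ballpark, and the SPR budget works out to just under 7W per core.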

I think Sapphire Rapids' true power comes from HPC, bandwidth limited tasks and RDBMS. Raw computing capability per socket, maybe not.
 

coercitiv

Diamond Member
Jan 24, 2014
6,185
11,851
136
This is an 8-core 11800H, and you'd need to use almost 100W to provide a clear PPC advantage against the Ryzen 5800H. If it were a 52-core server variant, power consumption would be a whopping 700+W. More than 10 watts per core.

Sapphire Rapids' 350W power budget means ~7W per core, which is quite far from the current 10SF / 7nm intercept points (where Intel provides more performance per watt). 10ESF + the new Golden Cove core might give Intel more room, but I don't think they can fix this soon.
I wasn't trying to build a narrative where SPR would reach the crossover point where it matches or beats the competition. The point of my post was to show Intel may be in a position where increasing power per core doesn't really hurt their efficiency (the bad start in efficiency is another problem).
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
The 11980HK was tested, not the 11800H. I will have my hands on an 11800H system next week, so I can post some tests here.

On the subject of small cores, I expect a cluster of 4 to perform somewhere between a Core i5 6500 and a Core i5 7500. Possibly a bit faster or slower depending on final clocks.

EDIT: We all saw the low Geekbench scores. I assumed it was due to low clocks, but what if those were the small cores running the test? It lines up perfectly with the expected performance. I guess we will find out. If it were the small cores doing the lifting for the tests, then Alder Lake is actually going to be quite a beast.

EDIT 2: If my first edit is correct, it would match up with my earlier expectation that Alder Lake will be a bit faster than a 5950X.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
I don’t know what you are talking about. Alder Lake scored similarly to how a hypothetical 8-core Core i5 7500 would: https://browser.geekbench.com/v5/cpu/search?q=alder

Did we ever get any confirmation on what frequency that Golden Cove core was running for the Geekbench 5 single core score? I think without knowing that and the core speeds for the MT test we don't really learn much from these benches. If that ST score was at 2GHz then wow, that's fantastic. If it's 4GHz then not so much.
 

coercitiv

Diamond Member
Jan 24, 2014
6,185
11,851
136
I don’t know what you are talking about. Alder Lake scored similarly to how a hypothetical 8-core Core i5 7500 would: https://browser.geekbench.com/v5/cpu/search?q=alder
11700 does 9500-10k in GB5.
10700 does 8500-8700.
9700 hits around 7500-7800.

Your hypothetical 8-core 7500 would hover under 7500 in GB5, yet Alder Lake went as high as 9000, hitting close to the i7 11700 at arguably lower clocks. Either it was using all 14 cores at lower speed, or the Gracemont clusters are at Cypress Cove level. Take your pick.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
11700 does 9500-10k in GB5.
10700 does 8500-8700.
9700 hits around 7500-7800.

Your hypothetical 8-core 7500 would hover under 7500 in GB5, yet Alder Lake went as high as 9000, hitting close to the i7 11700 at arguably lower clocks. Either it was using all 14 cores at lower speed, or the Gracemont clusters are at Cypress Cove level. Take your pick.

Just to help Eek2121 with his pick:
Jasper Lake scores at highest configuration in N6000 (4xTremont@3.3GHz): link

Sure, Gracemont is supposed to be somewhat faster than Tremont...but still...
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I agree with @eek2121 for the single threaded portion. The ones scoring close to 1.3k seem a bit too high, but the ones in the 1000 range are plausible.

Plus user-submitted scores are notoriously unreliable indicators for comparison since you can have the same setup but have immensely different scores.

The early Geekbench scores for Lakefield were really, really poor especially on the ST side because it was probably running on the low-clocked Tremont cores.

The early nature of the submission could mean some are running on Gracemont, some on Golden Cove.

You can see from the cache data that it doesn't recognize that properly either. It says 10x 64KB L1-I and 10x 32KB L1-D, when we know that it's Gracemont that has that configuration and Golden Cove is 32KB L1-I and 48KB L1-D.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
11700 does 9500-10k in GB5.
10700 does 8500-8700.
9700 hits around 7500-7800.

Your hypothetical 8-core 7500 would hover under 7500 in GB5, yet Alder Lake went as high as 9000, hitting close to the i7 11700 at arguably lower clocks. Either it was using all 14 cores at lower speed, or the Gracemont clusters are at Cypress Cove level. Take your pick.
AnandTech had a score of 1338/9019 in their testing for a 10700. They don’t have a 11700 to test sadly, but both chips are “65W” chips.

We know that Gracemont will support AVX2, and based on leaks, the ES2 samples were hitting 3.4 GHz. We also know that the cache will be bigger, and that the platform supports DDR5. All of this is before any performance uplift from the cores themselves.

I think you are being overly pessimistic about the performance of an atom chip on 10ESF. I guess we will see who is right.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Share the front end too, and go Full Bulldozer!
Intel has a different technology from the VISC buyout they can do.

Tremont core 0.88 mm sqrd, Total quad-core cluster size 5.14 mm sqrd

1. Get rid of local front-ends.
2. Use a single non-processor integrated global front-end.
3. Implement a large micro-op/L0i cache in the place of the local front-end.
4. etc.

Ideally, there is interest to also get rid of local memory execution for a global memory execution cluster as well.

1. Get rid of local back-ends.
2. Use a single non-processor integrated global back-end.
3. Implement a large L0d cache in place of the local back-end.
4. etc.

Then, watch as all four 3-ALU pipelines become one 12-wide virtual core or twelve 1-wide virtual cores and everything in between.

!!1.5~4.5 MB L2!! -> <<GFE Branch Predict/Fetch/Pick/Decode -> 128KB L1i -> GFE Dispatch/Retire/Allocate>> -> ~~16KB L0i, etc.~~
!!L2 voltage/clock domain!!
<<GFE&BE voltage/clock domain>>
~~Processor voltage/clock domain~~

There is also the Mirage config, which allows the Atom cores to be converted back into InO processors. Since the global part is OoO in itself, this leads to one 12-wide OoO virtual core executing on four 3-wide InO physical cores.

The FPU is integrated within the core for Intel, so with the Mirage setup it can dispatch a part of the AVX512 to each AVX128 SIMD unit. However, the performance angle makes it more ideal for each of the cores to get wider SIMDs instead.

Instead of a 12-wide 64-bit I-pipe, a 4-wide 128-bit M-pipe and a 4-wide 128-bit A-pipe, due to the efficiencies given by the Mirage setup it would be a 12-wide 64-bit I-pipe, a 4-wide 512-bit M-pipe and a 4-wide 512-bit A-pipe.

:laughing:

Current CPU w/ VISC at Intel has legacy to this:
https://dada.cs.washington.edu/smt/memoryLogix.pdf
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,005
136
I agree with @eek2121 for the single threaded portion. The ones scoring close to 1.3k seem a bit too high, but the ones in the 1000 range are plausible.

Geekbench 5 small core (Gracemont) scores ~1000? If true I think that is very impressive for the small-er Alder Lake core. Very impressive considering my Haswell based 4770K doesn't reach 1000.

But I do have a caveat. I have a feeling that modern CPUs that are scoring well on GB5 work "well" with that benchmark (tuning for some of the apps in there) but that doesn't always translate to many real world apps. As I wrote above, that Gracemont score is better than my 4770K, but I wonder if a quad cluster of them in most applications wouldn't beat the 4770K? If that is the case ADL could be very special indeed.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
But I do have a caveat. I have a feeling that modern CPUs that are scoring well on GB5 work "well" with that benchmark (tuning for some of the apps in there) but that doesn't always translate to many real world apps.

You are right. Tremont is at about Ivy Bridge level. In Geekbench it's almost at Haswell level. If you look at side-by-side architectural comparisons, one wins in one area but loses in others. Tremont can decode more at peak if the two clusters work well, and it has more ROB entries plus more ports. But Haswell has advantages in Load/Store units and has a uop cache. The lower number of pipeline stages means a lower branch mispredict penalty for Tremont.

Let's normalize the results:
-Take the Integer score, because the overall result is still flawed. Integer tells us the real uarch differences.
-Windows vs Windows.
-Clock scaling is pretty much linear.

702 for 3.3GHz Jasper Lake versus 850-900 for 3.9GHz 4770K.
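
To make the normalization explicit, here is a quick sketch that divides each integer score by its clock, taking the "clock scaling is pretty much linear" assumption at face value. The scores and clocks are the ones given above; the helper name is made up.

```python
def score_per_ghz(score: float, clock_ghz: float) -> float:
    """Per-clock score, assuming the score scales linearly with frequency."""
    return score / clock_ghz


# Jasper Lake (Tremont) integer score at 3.3GHz.
tremont_per_ghz = score_per_ghz(702, 3.3)

# 4770K (Haswell) integer score range at 3.9GHz.
haswell_low_per_ghz = score_per_ghz(850, 3.9)
haswell_high_per_ghz = score_per_ghz(900, 3.9)

# Haswell's per-clock lead over Tremont at both ends of the range.
lead_low = haswell_low_per_ghz / tremont_per_ghz
lead_high = haswell_high_per_ghz / tremont_per_ghz

print(f"Tremont: {tremont_per_ghz:.1f} points/GHz")
print(f"Haswell: {haswell_low_per_ghz:.1f} to {haswell_high_per_ghz:.1f} points/GHz")
print(f"Haswell per-clock lead: {(lead_low - 1) * 100:.1f}% to {(lead_high - 1) * 100:.1f}%")
```

On these numbers Haswell keeps a per-clock lead in the low single digits to high single digits of a percent, so a modest per-clock gain from Gracemont over Tremont would be enough to pass it.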

Of course in MT the 4770K is much faster because of Hyper-Threading, and because Jasper Lake is limited by power in MT applications. One test says it runs at 2GHz.

Same with Cinebench. In R15 Goldmont Plus gets 80. The 10W J5040 refresh gets to 95. Tremont is 130-140 which is really not far away from Haswell.

This means Gracemont will beat Haswell per clock, and not by a small amount.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,326
10,034
126

New Acer with Jasper Lake Celeron N4500 is out.


Price seems high, $389.

I paid $230-250 for my Zen 3020e and 3050e laptops, and the 3050e is 1080P.

Curious how performance compares to Zen. But Intel is still a NO-BUY for me, until they add something similar to AMD's VSR into their drivers. I much prefer an AMD 3050e APU-based laptop with a 1080P screen that I can actually crank up to 4k UHD effectively.
 

coercitiv

Diamond Member
Jan 24, 2014
6,185
11,851
136
I think you are being overly pessimistic about the performance of an atom chip on 10ESF. I guess we will see who is right.
Pessimistic!? I'm not the one making claims about Gracemont cluster performance. In fact, in all my posts I have accepted GC = Skylake IPC as a basis for any napkin math.

Problem is you're already shooting past Skylake. I don't know whether AnandTech's GB5 scores are under Windows or Linux, but what I do know is that even when looking at the i7 9700 in GB5, which is an 8C/8T CPU running 4.5GHz all-core and 4.7GHz on 1-2 cores, you can barely reach a 1300 ST score, and MT scores hover around 7500, let's say 8000 to be safe & generous.

So what does this mean, when supposedly Gracemont running below 4GHz peaked at a slightly higher ST and considerably higher MT score than a 4.5GHz+ Skylake? Even if we try to justify the higher MT scores by invoking memory scaling in the benchmark, the ST score is still a problem since we know the benchmark doesn't scale much with bandwidth on a single core.

The only "reasonable" explanation, as long as we keep our presumption of GB5 running on the Gracemont clusters alone, is that Gracemont on the ADL platform has considerably higher IPC than Skylake (enough to compensate for a 10-20% clock deficit).
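
To put a number on "enough to compensate", here is a minimal sketch. It assumes the score is simply proportional to IPC times clock, which is a simplification (memory and turbo behaviour are ignored), and the function name is made up.

```python
def required_ipc_ratio(clock_deficit: float) -> float:
    """IPC ratio needed to match a rival's score while clocked lower.

    Assumes score ~ IPC * clock, so matching the score at a clock of
    (1 - clock_deficit) times the rival's clock needs 1 / (1 - clock_deficit)
    times the rival's IPC.
    """
    if not 0 <= clock_deficit < 1:
        raise ValueError("clock_deficit must be in [0, 1)")
    return 1.0 / (1.0 - clock_deficit)


for deficit in (0.10, 0.20):
    uplift_pct = (required_ipc_ratio(deficit) - 1.0) * 100
    print(f"{deficit:.0%} clock deficit -> about {uplift_pct:.0f}% higher IPC just to tie")
```

So under that assumption a 10-20% clock deficit needs roughly 11-25% more IPC than Skylake just to tie, before accounting for the scores being higher.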

So how am I the pessimist in this discussion?
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
Pessimistic!? I'm not the one making claims about Gracemont cluster performance. In fact, in all my posts I have accepted GC = Skylake IPC as a basis for any napkin math.

Problem is you're already shooting past Skylake. I don't know whether AnandTech's GB5 scores are under Windows or Linux, but what I do know is that even when looking at the i7 9700 in GB5, which is an 8C/8T CPU running 4.5GHz all-core and 4.7GHz on 1-2 cores, you can barely reach a 1300 ST score, and MT scores hover around 7500, let's say 8000 to be safe & generous.

So what does this mean, when supposedly Gracemont running below 4GHz peaked at a slightly higher ST and considerably higher MT score than a 4.5GHz+ Skylake? Even if we try to justify the higher MT scores by invoking memory scaling in the benchmark, the ST score is still a problem since we know the benchmark doesn't scale much with bandwidth on a single core.

The only "reasonable" explanation, as long as we keep our presumption of GB5 running on the Gracemont clusters alone, is that Gracemont on the ADL platform has considerably higher IPC than Skylake (enough to compensate for a 10-20% clock deficit).

So how am I the pessimist in this discussion?

The i5 6500 is a 4-core, 4-thread CPU running at 3.2-3.6 GHz. Previous rumors HAVE claimed the new cores will have performance similar to Skylake. We also have rumors that say ES2 was running at 3.4 GHz. AnandTech has the 6500 as having a GB5 single-core score of 1001 and a multi-core score of 3372. You can't have it both ways; either it will offer performance similar to Skylake or it won't. I'm inclined to think it will. There is no reason Intel can't make that kind of performance leap.

Apparently Razer canceled my preorder this morning for the 11800H system. Intel just lost a sale. I needed a laptop before July, so I bought the Razer Blade 14 Advanced with a 5900HX and a 3070 off Amazon. It will be here later this week.