Discussion Intel current and future Lakes & Rapids thread

Mar 10, 2006
11,715
2,011
126
I know it's apples and oranges, but didn't Apple move to a full L3 victim cache on their A9 and A10 CPUs? Personally, with a large L2 cache, I don't really see the downside to an L3 victim cache.

Now the smaller size, higher latency, and lower clocks do impact performance.
They had been using a victim cache since the A7. The A9X did away with the L3 though and the A10X kept the L3 off and saw a big jump in L2 cache size (from 3MB to 8MB), which helped boost perf/clock a lot.

For a client design you really want a big, fast, low-latency cache shared among your cores, and that's why you see Intel with Gemini Lake moving from 1MB/core pair in Apollo Lake to 4MB/quad core cluster. In a single-thread scenario, your core has access to a big pile of L2, and even when all four cores are loaded up, they have access to a large L2 pool.

I would like to see Intel diverge their cores significantly -- Atom for lower-end client/IoTG/etc., Core for high-end client, and then a separate core design for Xeon.

By trying to "converge" the cores to save on some R&D spending, Intel makes trade-offs that are annoying :p
 

IntelUser2000

Elite Member
Oct 14, 2003
8,079
2,883
136
I know it's apples and oranges, but didn't Apple move to a full L3 victim cache on their A9 and A10 CPUs? Personally, with a large L2 cache, I don't really see the downside to an L3 victim cache.

Now the smaller size, higher latency, and lower clocks do impact performance.
The 3-level design actually makes a lot of sense. You can't just have a 64KB L1 and then a 4MB L2. If you want the blisteringly high clocks that Intel goes for, you have to sacrifice something, and that means a slow 4MB L2. Since there will be code that finds 256KB-1MB plenty, a slow 4MB L2 would end up lowering performance for it. Three cache levels let you optimize for most workloads.
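To put rough numbers on that trade-off, here is a minimal average-memory-access-time (AMAT) sketch. Every latency and hit rate in it is an illustrative assumption, not a measurement of any Intel part. The only point is that a workload which mostly fits in a small, fast L2 can come out behind when the L2 is made big and slow.

```python
# Average memory access time (AMAT) for a cache hierarchy.
# All latencies (in cycles) and hit rates are illustrative assumptions.

def amat(levels, memory_latency):
    """levels: list of (hit_latency_cycles, local_hit_rate), L1 first."""
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency      # every access reaching a level pays its latency
        reach *= (1.0 - hit_rate)     # only the misses continue to the next level
    return total + reach * memory_latency

MEMORY_LATENCY = 200  # assumed DRAM latency in cycles

# Hypothetical three-level design: small fast L2 backed by an L3.
three_level = [(4, 0.90), (12, 0.80), (40, 0.90)]
# Hypothetical two-level design: one big but slow 4MB-style L2.
two_level = [(4, 0.90), (30, 0.95)]

print(f"three-level: {amat(three_level, MEMORY_LATENCY):.1f} cycles")
print(f"two-level:   {amat(two_level, MEMORY_LATENCY):.1f} cycles")
```

With these made-up numbers the three-level hierarchy averages about 6.4 cycles per access against about 8.0 for the big slow L2, even though the big L2 has the higher hit rate. Change the assumed latencies and the conclusion can flip, which is the whole design argument.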

The L3 on the new CPUs is a step back, yeah -- higher latency in clocks, lower clocks, and it's a victim cache. But I'm not sure it was reworked like this because of the AVX512 units. Probably more because they beefed up the L2 per core and sticking with the old cache structure would've taken up a lot of die space.
I don't have a problem with the SKL-SP core. It's a fine design for servers. The revised L2 and L3 caches are said to offer something like a 10% increase in certain server scenarios. The same core in SKL-X is a bit of a thorny issue.
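For anyone unclear on what "victim cache" means here, a toy sketch: the L3 only receives lines evicted from the L2, and a hit in the L3 moves the line back up into the L2. The class name, the fully associative LRU policy, and the tiny sizes are all assumptions for illustration; real hardware is far more involved.

```python
from collections import OrderedDict

class VictimHierarchy:
    """Toy L2 plus victim L3, both fully associative with LRU replacement."""

    def __init__(self, l2_lines, l3_lines):
        self.l2 = OrderedDict()
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def access(self, addr):
        """Return where the line was found: 'L2', 'L3', or 'MEM'."""
        if addr in self.l2:
            self.l2.move_to_end(addr)   # refresh LRU position
            return "L2"
        if addr in self.l3:
            del self.l3[addr]           # the line leaves the L3 on a hit
            source = "L3"
        else:
            source = "MEM"              # fills from memory bypass the L3
        self._fill_l2(addr)
        return source

    def _fill_l2(self, addr):
        if len(self.l2) >= self.l2_lines:
            victim, _ = self.l2.popitem(last=False)  # evict the LRU line from L2
            self._fill_l3(victim)                    # the victim lands in L3
        self.l2[addr] = True

    def _fill_l3(self, addr):
        if len(self.l3) >= self.l3_lines:
            self.l3.popitem(last=False)              # drop the LRU line from L3
        self.l3[addr] = True

h = VictimHierarchy(l2_lines=2, l3_lines=2)
trace = [h.access(a) for a in "ABCA"]
print(trace)
```

In this run the second access to A is served from the L3, because A was pushed out of the two-line L2 when C came in. No line ever sits in both levels at once, so the usable capacity is L2 plus L3 rather than just the L3.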
 

IntelUser2000

Elite Member
Oct 14, 2003
8,079
2,883
136
Intel's reluctance on making more core varieties.

You'll notice that Intel cancels projects that would look fine on their own if they were run by a much smaller company. I think they have an internal policy where if a project doesn't fit into their "Grand Vision" of materially impacting their bottom line, they cancel it. Hence their numerous "failures" with set-top box SoCs, even earlier ones like the consumer-oriented Intel Play devices, and the more recent DIY IoT products like Edison.

That may be the real reason Intel can't branch out from their bread-and-butter PC line. Sometimes, you just have to stick to doing it for years until it works. Not everything grows like their x86 division did.

What does this have to do with my headline?

Intel proliferates an astonishing number of dies and SKUs out of the top die. The ENTIRE market from high-performance tablets to K-series desktop chips has a single top die it spreads from. That's the 4+2 configuration.

They do that for servers too. Which explains why they make them the basis for HEDT, because the HEDT in itself doesn't impact their bottom line in a big way.

I assume the reason they keep the mobile PC and desktop cores the same is that the usage scenarios overlap heavily between the two markets. If they did something like the SKL-SP core deviation to split the mobile PC and desktop PC cores, they may find that one of the two ends up more optimal for both markets than the other. The number of people that would benefit from a split is way too small to justify making a third line.

Perhaps in a hypothetical future, if the enthusiast gaming market gets big enough to count as a third market next to consumer PC and server, and they see significant future growth in it, we'll see a "Gaming Core" chip.

Until then, don't expect that to change.
 

TheGiant

Senior member
Jun 12, 2017
748
353
106
Intel's reluctance on making more core varieties.

Until then, don't expect that to change.
I wonder if SKL-SP wouldn't perform better if they had just implemented the mesh for the Skylake architecture, kept the cache untouched, and left AVX512 out.

I bet the power, performance and clocks would be better than with the current implementation.

Can the mesh be clocked higher with 6-8 cores? Like 4GHz+
 
Mar 10, 2006
11,715
2,011
126
I wonder if SKL-SP wouldn't perform better if they had just implemented the mesh for the Skylake architecture, kept the cache untouched, and left AVX512 out.

I bet the power, performance and clocks would be better than with the current implementation.

Can the mesh be clocked higher with 6-8 cores? Like 4GHz+
Mesh seems to top out at 3.2GHz, even with a lot of voltage shoved down its throat.
 
Mar 10, 2006
11,715
2,011
126
You know, I thought Intel was going to do a real split between mainstream and server. That was until it became obvious that Intel is cutting/redirecting R&D.
The naming suggests they're going to split. Sapphire Rapids is...Sapphire Rapids, not Tiger Lake. Cascade Lake is...Cascade Lake, not Coffee Lake Server.
 

jpiniero

Lifer
Oct 1, 2010
11,347
3,058
136
The naming suggests they're going to split. Sapphire Rapids is...Sapphire Rapids, not Tiger Lake. Cascade Lake is...Cascade Lake, not Coffee Lake Server.
I think it's premature to assume that Sapphire Rapids won't have a mainstream version.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,079
2,883
136
I wonder if SKL-SP wouldn't perform better if they had just implemented the mesh for the Skylake architecture, kept the cache untouched, and left AVX512 out.

I bet the power, performance and clocks would be better than with the current implementation.
Skylake-SP is server. I think you meant Skylake-X.

The move to mesh, the 4x larger L2 cache, and the exclusive L3 cache all benefit server workloads.

I guess the underwhelming result on SKL-X can cloud our judgment of everything based on that core, but SKL-X is the way it is because they used a server-optimized core, not a client-optimized one.
 

DrMrLordX

Lifer
Apr 27, 2000
19,176
7,930
136
Xeon D is also for network appliances. It was a result of collaboration with Facebook, and an answer to ARM servers being used where compute requirements are low but networking requirements are high. Their chips are quite widely used in software router/switch applications for ISPs, but they needed a more purpose-built SoC that could also go into cheaper, lower-power devices.

The Atom based ones are there to fill the even lower end of the segment.
Okay, didn't know Xeon-D was also going into networking appliances. I would think they would have made a killer CPU for a CPU-based render farm given their efficiency.

Makes me wonder if we'll see any more Xeon-Ds at all then, since Xeon-D itself had interesting origins.

The large cores, though, are used in broader usage scenarios where other types of applications run in addition to software networking. In cases where more purpose-built networking chips were needed, the regular Xeon chips weren't as well suited. The answer became Xeon D. The server market isn't so simple that there's just one type of usage. It's as varied as any market.
Right, different strokes for different folks. The situation with Xeon-D is a bit more complicated since the CPUs themselves can do just about everything the same as their bigger brothers, albeit with different core counts and clockspeeds. Stuff like Denverton is notably weaker than what you get in a Xeon.

Intel's reluctance on making more core varieties.

Intel proliferates an astonishing number of dies and SKUs out of the top die. The ENTIRE market from high-performance tablets to K-series desktop chips has a single top die it spreads from. That's the 4+2 configuration.

Until then, don't expect that to change.
It may last until someone makes a purpose-built series of SKUs that are cheaper and/or better than their multipurpose Core die in a particular segment.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,079
2,883
136
Right, different strokes for different folks. The situation with Xeon-D is a bit more complicated since the CPUs themselves can do just about everything the same as their bigger brothers, albeit with different core counts and clockspeeds. Stuff like Denverton is notably weaker than what you get in a Xeon.
That's true. But by offering it as a more specialized solution, the platform is cheaper for those that don't need the compute power. It also uses less power.

Another line that doesn't get much attention from regular folks is their Puma networking line. They got the IP by acquiring the division from TI. Shaw (a Canadian broadband internet company) uses Puma 5/6 in the routers it sends to its customers. They use an Atom-based core.

Recently, though, the Puma line has had issues with spikes in network latency under enough simultaneous network requests, so a few more people know about it now. Too bad for Intel, because they seem to have skimped on some features and the Puma core is at fault for the slowdown :p
 

Bouowmx

Golden Member
Nov 13, 2016
1,138
550
146
I'm not sure if this was ever posted here, but a friend of mine found this a couple of months back and I just remembered about it.
https://browser.geekbench.com/v4/cpu/2400363
Probably lost in the Intel Skylake thread, before the thread was changed to be exclusively about 14-nm products.

Processor detection is all over the place: 1 core, 48 KB L1D (up from 32 KB), and 12 MB L3 (like a 6-core). Don't know if any of it is useful information.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,079
2,883
136
Probably lost in the Intel Skylake thread, before the thread was changed to be exclusively about 14-nm products.

Processor detection is all over the place: 1 core, 48 KB L1D (up from 32 KB), and 12 MB L3 (like a 6-core). Don't know if any of it is useful information.
The latency part is suspicious. It's 10-20x better than the 7700K. That would make main memory latency lower than the L3 cache's, and almost on par with the L2 cache's. I think the results shouldn't be discarded, but they're not representative of a product that will ship.
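A quick sanity check of that claim, as a sketch. The nanosecond figures below are rough ballpark assumptions for a 7700K-class desktop chip, not measurements and not values taken from the Geekbench entry.

```python
# Ballpark latencies in nanoseconds for a 7700K-class desktop part.
# These are rough assumed figures for illustration, not measurements.
ASSUMED_NS = {"L2": 3.0, "L3": 10.0, "DRAM": 60.0}

def implied_dram_latency(speedup):
    """DRAM latency implied by a result that is `speedup` times better."""
    return ASSUMED_NS["DRAM"] / speedup

for speedup in (10, 20):
    ns = implied_dram_latency(speedup)
    print(f"{speedup}x better -> {ns:.1f} ns "
          f"(assumed L3 ~{ASSUMED_NS['L3']} ns, L2 ~{ASSUMED_NS['L2']} ns)")
```

Under those assumptions a 10x improvement already puts "main memory" below the L3 figure, and 20x lands it at the L2 figure, which is why the score looks like a broken test rather than a real memory subsystem.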

As for the L1 cache info: there were random wiki entries about a 128KB L1 cache on future Intel products as early as Skylake. I'm assuming some fanboy with his own idea of what an ideal future Intel CPU should be decided to edit a wiki entry.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,079
2,883
136
According to the PC Watch article (thanks to Sweepr for the link), Cannonlake will release in limited quantities later this year. It also says that it'll ship in great volume by Q2 of next year.

Actually, I'm not 100% sure about the release this year. It is, after all, based on automatic translation. It could also mean that we'll see it "shipping" with products arriving later, perhaps a CES launch for the limited part. I think it claims both U and Y will see a limited launch. They also say the yield problem may be on the GPU side, which is why we won't see GT3 and GT4 versions.

Originally Cannonlake was supposed to be,

GT1 - 24EUs
GT2 - 40EUs
GT2.5 - 56EUs
GT3 - 72EUs
GT4 - 104EUs

Also, Intel was originally planning to push parts with HBM memory in 2018. If we go by the original plans, that means Icelake. HBM has issues of its own, though, likely with yielding enough functioning chips or reaching the desired clock speeds.

That does mean there's a possibility we'll see Icelake as a token launch in 2018, with real volume coming 2H of 2019. Assuming no further delays of course.
 

jpiniero

Lifer
Oct 1, 2010
11,347
3,058
136
The latency part is suspicious. It's 10-20x better than the 7700K. That would make main memory latency lower than the L3 cache's, and almost on par with the L2 cache's. I think the results shouldn't be discarded, but they're not representative of a product that will ship.
I have to think they were testing a part with LPDDR4X fused in via EMIB. That latency score is likely (?) unrealistic but fast enough to break the test.
 

Glo.

Diamond Member
Apr 25, 2015
4,973
3,588
136
Do you guys think that Icelake will bring quad-channel memory to the mainstream offering?
 

NTMBK

Diamond Member
Nov 14, 2011
9,698
3,543
136
Do you guys think that Icelake will bring quad-channel memory to the mainstream offering?
Nope. Big jump in complexity and cost, for minimal gains in consumer workloads. They'll just keep riding the increasing DDR4 speeds.
 
Mar 10, 2006
11,715
2,011
126
Nope. Big jump in complexity and cost, for minimal gains in consumer workloads. They'll just keep riding the increasing DDR4 speeds.
Consumer workloads are more sensitive to latency than to raw bandwidth, so there's no point in going quad channel when DDR speeds keep increasing. Better to invest in technologies like eDRAM.
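For reference, theoretical peak DRAM bandwidth is just channels times 8 bytes per transfer (a DDR4 channel is 64 bits wide) times the transfer rate. A minimal sketch follows; the speed grades are examples picked for illustration, not anything stated about Icelake.

```python
BYTES_PER_TRANSFER = 8  # one DDR4 channel is 64 bits wide

def peak_bandwidth_gbs(channels, megatransfers_per_s):
    """Theoretical peak bandwidth in GB/s (decimal), ignoring all overheads."""
    return channels * BYTES_PER_TRANSFER * megatransfers_per_s / 1000

# Example configurations (assumed speed grades, for illustration only).
configs = [
    ("dual-channel DDR4-2400", 2, 2400),
    ("dual-channel DDR4-3200", 2, 3200),
    ("quad-channel DDR4-2400", 4, 2400),
]

for name, channels, rate in configs:
    print(f"{name}: {peak_bandwidth_gbs(channels, rate):.1f} GB/s")
```

Faster DIMMs close part of the gap to quad channel on paper (38.4 to 51.2 GB/s versus 76.8 GB/s), and none of this arithmetic says anything about latency, which extra channels don't improve.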
 

Bouowmx

Golden Member
Nov 13, 2016
1,138
550
146
Ice Lake U/Y
Status : 2017 Q3 Pre-release
Does anybody remember what the status description of Ice Lake was before "Pre-release"?

I noticed that Ice Lake's description changed:
The Ice Lake processor family is a successor to the 8th generation Intel® Core™ processor family.
The Ice Lake processor family is the next generation Intel® Core™ processor family.
 

TheGiant

Senior member
Jun 12, 2017
748
353
106
Does anybody remember what the status description of Ice Lake was before "Pre-release"?

I noticed that Ice Lake's description changed:
The Ice Lake processor family is a successor to the 8th generation Intel® Core™ processor family.
The Ice Lake processor family is the next generation Intel® Core™ processor family.
So this is the last lake?

What is coming next? I hope it's finally something revolutionary like C2D.
 
