Discussion Intel current and future Lakes & Rapids thread

Page 719

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Because it's AP, it's very plausible it was a version with AVX-512 support, maybe with 1x 512-bit units like the client cores, or double-pumped over 2 cycles like AMD's, which is double Gracemont's width.
I sincerely doubt they're adding AVX-512 to Atom, and they're certainly not going to do so for just one product line.

If Sierra Forest maxes out just a bit above GNR, let's say 144 cores for nice math, then that yields an area equivalent to ~36 RWC. That would almost certainly be part of the SP line, not AP, and thus max out well below the IO and power capabilities of an AP product. Really, the only reason I could see for such a decision is to limit development resources.
So while the E cores are actually power efficient when you put them in their proper habitat, it also follows a bell curve, and in the case of desktop chips it's entirely about area density.
Same applies to server. Density is very valuable from a TCO and product cost perspective.

Oh, and while I'm on the topic, I fully expect Royal to be added to the "revolutionary technologies" list. Ah, wish they would hurry up with it.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
I sincerely doubt they're adding AVX-512 to Atom, and they're certainly not going to do so for just one product line.

I have to think they will add it eventually. Unless they are planning on ditching big.LITTLE at some point.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
What's your reasoning as to why not?

Because they like AVX-512. The issue with the client parts has more to do with their inability to get cross-ISA working well enough. And then because of the 10 (and now 7) nm debacle, Intel is still releasing products with cores defined a long time ago... probably before they realized cross-ISA wasn't going to work. They just haven't caught up yet.
 
Jul 27, 2020
15,738
9,803
106
The issue with the client parts has more to do with their inability to get cross-ISA working well enough.
It could have been done through whitelisting (AVX-512 executables aren't that many, and Windows could just maintain a list of such EXEs and run them only on the P-cores). A similar thing could be done in Linux.

They sacrificed AVX-512 for the sake of seamless operation and wasted valuable die space. They were really, REALLY desperate to compete with AMD at any cost. The launch with AVX-512 enabled proved that disabling AVX-512 was a last minute decision from someone at the top (maybe Pat?). I'm definitely not happy that they fused off the AVX-512 units. The only reason they got away with it is because most people don't need them. Should have left it to the users' discretion and disabled it in BIOS by default, like it was at launch.
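The whitelisting idea above can be sketched with Linux's process-affinity API. This is a minimal illustration, not anything Intel or Microsoft actually ship: the whitelist entries and the P-core ID set are made-up assumptions, and `os.sched_setaffinity` is the real Linux-only Python call for pinning a process to specific logical CPUs.

```python
import os

# Hypothetical whitelist of AVX-512-using executables (illustrative names only)
AVX512_WHITELIST = {"ffmpeg", "y-cruncher", "prime95"}

# Assumed P-core logical CPU IDs for a hypothetical hybrid part
P_CORES = {0, 1, 2, 3, 4, 5, 6, 7}

def pin_if_whitelisted(pid: int, exe_name: str,
                       whitelist=AVX512_WHITELIST, p_cores=P_CORES) -> bool:
    """Pin a process to the P-cores if its executable is on the whitelist.

    Returns True if affinity was restricted, False if the EXE wasn't listed.
    """
    if exe_name not in whitelist:
        return False
    # Linux-only: restrict the process (pid 0 = calling process)
    # to the given set of logical CPUs.
    os.sched_setaffinity(pid, p_cores)
    return True
```

An OS service could run this check at process launch; the obvious weakness, as pointed out below, is keeping the list complete without developer cooperation.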
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
I could have sworn that before Zen4 came out, Intel was being lambasted to make a new ISA inline with RVV/SVE.

Currently it is this:
SSE2+ 128-bit
AVX+ 128-bit
AVX_VL 128-bit
&&
AVX+ 256-bit
AVX_VL 256-bit
&&
AVX512

On one of the next advanced architecture design group speculation forums:
x-bit Reg FPFMA
V.x=0: 1-wide SISD/scalar FMA x-bit op [if V.x=0, allows 80-bit/EP for x-bit]
V.x=1: 2-wide SIMD FMA x-bit op
V.x=2: 4-wide SIMD FMA x-bit op
V.x=3: 8-wide SIMD FMA x-bit op
This can be overlapped with every x87->AVX512 operation with the fewest instructions. [The above covers Centaur's unreleased Extended x87 instruction set, CT64-x87.]

Microsoft's SSEx->NEON transpiling emulator could be improved with the above, with the added benefit that the base remains the same; only the extension is swapped: x64 (SSEx) -> x64 (new ISE).
----
AVX512 1024-bit (11)
AVX512 512-bit (01 or 10)
AVX512 256-bit (10 or 01)
AVX512 128-bit (00)
AVX512 Scalar :: 5 instructions for same op
AVX 256-bit
AVX 128-bit
AVX Scalar :: 3 instructions for same op
SSE 128-bit
SSE Scalar :: 2 instructions for same op
10 instructions for same op

-> New instruction set extension:
1 instruction can implement all of the above. Moving the complexity of operations to the rename/scheduler is more CISC than placing the complexity on decode.

New definitions:
CISC -> Grand Library of 1-ISC(OISC)
RISC -> Small Library of 1-ISC(OISC)

Complexity is still available, but it is up to the microarchitecture, not the extension, to achieve it.

However, you could just google HPC portability to see this simplified further. HPC, for example, is moving away from per-generation tuning toward portability:
SIMD architecture/implementation-agnostic high-level code:
Gen1 arch: 256-bit low-level runtime code
Gen9 arch: 2048-bit, plus Packed SIMD Streaming (8x 256-bit) low-level runtime code

Because of the above, modern SIMD instruction sets have made HPC computers lean towards custom RISC-V/ARM, since they can upgrade and expand rapidly and know the code will work across generations.

HPC Computer Facility-example:
Building A Phase 1: Gen1 computers
Building E Phase 12: Gen9 computers

If the instruction op is shared between heterogeneous architectures (CPU and GPU):
CPU does V.a_tot to 512-bit~1024-bit && GPU does V.b_tot to 1024-bit~2048-bit; the new instruction extension is definitely needed.

RVV Class A custom CPU: RVV supports 128-bit through 1024-bit.
RVV Class B custom GPU (minus graphics): RVV supports 1024-bit through 8192-bit.
Rapids and Shores being around this is a bit silly.
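The vector-length-agnostic model being gestured at above (one instruction sequence serving every hardware width, as in RVV/SVE, instead of separate SSE/AVX/AVX-512 encodings) can be illustrated with a strip-mined loop where the vector length is a runtime property rather than baked into the code. This is a plain-Python sketch of the programming model, not real intrinsics; `vlen` stands in for whatever the hardware reports (RVV's `vsetvl` idea):

```python
def vla_add(a, b, vlen):
    """Vector-length-agnostic elementwise add.

    The same loop works whether the 'hardware' vector holds 2, 4, or 64
    elements -- only the runtime-reported vlen changes, never the code.
    """
    out = []
    i = 0
    while i < len(a):
        # 'setvl' step: the hardware tells us how many lanes to process
        n = min(vlen, len(a) - i)
        out.extend(x + y for x, y in zip(a[i:i + n], b[i:i + n]))
        i += n
    return out
```

Because the result is identical for any `vlen`, a binary written this way keeps working across generations that widen the vector unit, which is the cross-generation portability argument made for RVV/SVE over fixed-width x86 extensions.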
 
Last edited:
  • Wow
Reactions: Grazick

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Really, the only reason I could see for such a decision is to limit development resources.

Yeah, probably because it still needs to test the waters. No matter the hype, Sierra Forest (and it's much more true for Falcon Shores) is an untested platform.

If they got an -AP out, there'd be no point in having cores with low FP throughput. That's the whole point of the product line.

They didn't just port Silvermont to Xeon Phi. They called it "modified Silvermont", but they extended things like the ROB and it was its own core. It was "modified Silvermont" much as Goldmont Plus is "modified Goldmont". I doubt Sierra Forest would simply port Crestmont.

Oh, and while I'm on the topic, I fully expect Royal to be added to the "revolutionary technologies" list. Ah, wish they would hurry up with it.

I think Royal is starting to be hyped up too much considering how early it is in the life cycle. Even the hyped-up Apple chips didn't get there in one generation.
 
Last edited:

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
It could have been done through whitelisting (AVX-512 executables aren't that many and Windows could just maintain a list of such EXEs and run these only on the P-cores). Similar thing could be done in Linux.

They sacrificed AVX-512 for the sake of seamless operation and wasted valuable die space. They were really, REALLY desperate to compete with AMD at any cost. The launch with AVX-512 enabled proved that disabling AVX-512 was a last minute decision from someone at the top (maybe Pat?). I'm definitely not happy that they fused off the AVX-512 units. The only reason they got away with it is because most people don't need them. Should have left it to the users' discretion and disabled it in BIOS by default, like it was at launch.

Uh, no. It is impossible for any company to maintain 100% accuracy with a whitelist without 100% developer cooperation.

Since they have an open ecosystem, that will never happen.
 

JustViewing

Member
Aug 17, 2022
135
232
76
Uh no. It is impossible to maintain a 100% accuracy rate with a white list without 100% developer cooperation for any company.

Since they have an open ecosystem, that will never happen.
Applications can request the OS to run them only on cores which have AVX-512. But is it worth it: 8x AVX-512 cores vs. 24x 256-bit AVX2 cores?
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Applications can request OS to run only on cores which have AVX512. But is it worth compared with 8 x AVX512 vs 24 x AVX256 ?
The problem is not the 512-bit width but all the good extensions that would be useful at lower widths as well, yet are disabled along with AVX-512.

Intel got itself into a really stupid mess, packaging and then disabling all that the way it did.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
The problem is not the 512 bit width but all the good extensions that would be useful at lower width as well but are disabled along with AVX512.

Intel got in a really stupid mess packaging then disabling all that the way it did.

At least they figured out how to enable VNNI on parts without AVX512. By adding a 256b version to AVX2 . . .
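That 256-bit backport (AVX-VNNI) keeps the per-lane semantics of the AVX-512 version; only the register width changes. As a sketch of what one 32-bit lane of the VPDPBUSD instruction computes (per Intel's documented semantics: four unsigned bytes times four signed bytes, summed into a 32-bit accumulator, non-saturating wrap), here is a reference model in plain Python — illustrative only, not real intrinsics:

```python
def vpdpbusd_lane(acc, u8x4, s8x4):
    """Reference model of one 32-bit lane of VPDPBUSD (AVX-VNNI / AVX512-VNNI).

    Multiplies 4 unsigned 8-bit values by 4 signed 8-bit values,
    sums the products, and adds them to the 32-bit accumulator.
    """
    assert all(0 <= u <= 255 for u in u8x4)
    assert all(-128 <= s <= 127 for s in s8x4)
    total = acc + sum(u * s for u, s in zip(u8x4, s8x4))
    # Non-saturating variant: wrap to signed 32-bit
    total &= 0xFFFFFFFF
    return total - 0x100000000 if total >= 0x80000000 else total
```

A 256-bit AVX-VNNI register just holds eight such lanes instead of sixteen, which is why the "goodness" (fused int8 dot products for ML inference) survives the narrower width.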
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Intel is still releasing products with cores defined a long time ago... probably before they realized cross-ISA wasn't going to work. They just haven't caught up yet.
That may be true to some extent, but it looks like the same applies to Meteor Lake, and that should have been defined after Alder Lake was known.
They did it for Phi.
While true, Phi was a CPU pretending to be an accelerator, while Sierra Forest is a CPU actually being used as a CPU. AVX-512 would be a waste.
The problem is not the 512 bit width but all the good extensions that would be useful at lower width as well but are disabled along with AVX512.

Intel got in a really stupid mess packaging then disabling all that the way it did.
See, that's why I think they have a way out. Most of the goodness from AVX-512 is independent of the 512b vector length, but it's that vector length that drives some of the big hardware tradeoffs. If they could backport the rest to 256b width, that might be a reasonable compromise.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
If they got -AP out, there'd be no point of having cores that have low FP throughput. That's the whole point of the product line.
I think you're mistaken about the purpose of the AP line. It doesn't seem to be about FP, but rather just more compute than SP. And there's absolutely a cloud market for an extreme amount of dense compute that doesn't care much about FP.
I think Royal is starting to be hyped up too much considering how early it is in the life cycle. Considering even the hyped up Apple chips didn't get there in one generation.
Hah, be honest, you just mean me here, if for no other reason than no one seemingly understanding what Royal actually is. Though it's not entirely blind optimism on my part.
 
  • Haha
Reactions: Geddagod

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
See, that's why I think they have a way out. Most of the goodness from AVX-512 is independent of the 512b vector length, but it's that vector length that drives some of the big hardware tradeoffs. If they could backport the rest to 256b width, that might be a reasonable compromise.
I wish Intel would create a clear roadmap for x86 ISA extensions that applies for consumer chips.

Currently Intel treats extensions only as a selling point for server chips, which has led to the situation where AVX-512 was essentially retracted belatedly, so far without replacement, in the consumer space. And AMD understandably doesn't seem eager to push its own extensions anymore.
 

Cardyak

Member
Sep 12, 2018
72
159
106
Hah, be honest, you just mean me here, if for no other reason than no one seemingly understanding what Royal actually is. Though it's not entirely blind optimism on my part.

Do we have any information on Royal yet? I still don’t understand what makes it so incredible.

Possibilities I can think of which are cool but I’m really just shooting in the dark here:

- Dropping support for older instructions and emulating them in software to reduce decode complexity and technical debt?
- Hyper/Hybrid threading magic (I.e: changing the entire way threads are handled in some way or another that no one has thought of yet) ?
- More aggressive speculation with Value Prediction?
- Extending more (or maybe all) of the pipeline to be out of order? (An entire core that is out of order from fetch all the way through to retirement would be a huge innovation)

This is literally just me fantasising though
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
And there's absolutely a cloud market for an extreme amount of dense compute that doesn't care much about FP.

And yet both Bergamo and Sierra Forest are ballpark 128 cores.

Hah, be honest, you just mean me here, if for no other reason than no one seemingly understanding what Royal actually is. Though it's not entirely blind optimism on my part.

You are one of them, yes, but many, many others believe it, mostly from MLID hype. I will say the opinions on Royal are very optimistic regardless of the source, though.

Do we have any information on Royal yet? I still don’t understand what makes it so incredible.

Hope for good, but prepare for when it's not.

How revolutionary are we talking about here?
-Like out of order transition?
-Netburst?
-Bulldozer?
-Ahem... Itanium?

There are many patents that sound very promising but are never productized. The Chip Architect site (run by Hans de Vries) goes over the patents and ideas the "original K8" might have had. It sounds fantastic, but the actual K8 was nowhere near what was promised, as good as the actual chip was.

That's because the ideas have to be realistic when it comes to cost and ease of implementation, and have to show real benefit versus what's just in your head.

I could see Royal being "revolutionary" in the sense that Intel finally figures things out and, with all that optimism and a capable engineering team, takes perf/W leadership away from Apple.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
There are many patents that sound very promising but are never productized. The Chip Architect site (run by Hans de Vries) goes over the patents and ideas the "original K8" might have had. It sounds fantastic, but the actual K8 was nowhere near what was promised, as good as the actual chip was.

You have been into this longer than I have, I'm sure. K8 was nowhere near as promised? It did very well.

What I remember are the delays though. IIRC it was supposed to be out in 2001. Then it was just constantly delayed. After all that time, they still didn't get what they wanted in there? I do remember a lot of speculation back then. Reverse Hyperthreading comes to mind.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Based on the most recent info we have on Grand Ridge and Sierra Forest, they will not have AVX-512, but they are adding new extensions to their AVX support.

For example: AVX-VNNI and AVX-IFMA

[Attached image: 1669556568418.png]
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
I wish Intel would create a clear roadmap for x86 ISA extensions that applies for consumer chips.

Currently Intel treats extensions only as a selling point for server chips, which led to the situation that AVX512 essentially was belatedly retracted without replacement so far in the consumer space. And AMD understandably doesn't seem eager to push own extensions anymore.
An ISA roadmap for x86 in general would be a godsend. I think the problem is that right now, Intel doesn't have a clear roadmap themselves, and kind of cobbles it together as they go. AVX-512 was spearheaded by Xeon Phi, and then there's all the ML and HPC stuff added along the way, and it seems like a great big mess at the end of the day. But beyond that, they seem to view ISA differentiation as a competitive advantage, and thus keep it secret till pretty much the last possible moment. It's kind of ironic, since it gives AMD years to implement the same while the software catches up.

It would be even more helpful in reverse. Imagine how much better things would be if Intel could say, for example, "All processors released past 2025 will no longer support x87 or MMX", and actually hold to that plan. Though the ideal state would be Intel and AMD actually working together; that would better fit the reality of a world where ARM is a real competitive threat.
 
  • Like
Reactions: Tlh97 and moinmoin

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Do we have any information on Royal yet? I still don’t understand what makes it so incredible.

Possibilities I can think of which are cool but I’m really just shooting in the dark here:

- Dropping support for older instructions and emulating them in software to reduce decode complexity and technical debt?
- Hyper/Hybrid threading magic (I.e: changing the entire way threads are handled in some way or another that no one has thought of yet) ?
- More aggressive speculation with Value Prediction?
- Extending more (or maybe all) of the pipeline to be out of order? (An entire core that is out of order from fetch all the way through to retirement would be a huge innovation)

This is literally just me fantasising though
I don't think there's any real info about Royal around. Even its existence as a discrete core is a matter of some disagreement. I think you're the only one I've seen propose actual microarchitectural features, and quite interesting ones at that. You were the one who first noted André Seznec joining Intel, right?
 

Cardyak

Member
Sep 12, 2018
72
159
106
You were the one who first noted André Seznec and his joining Intel, right?

Yes that’s right, he joined Intel a couple of years ago I think. He’s also the pre-eminent figure of academic research into Value Prediction.

With Royal Core, I think I'm most curious to see whether it is a completely revolutionary idea involving crazy new features similar to some of my earlier suggestions, or a more conventional, evolutionary core (e.g. the standard improvement of wider, deeper, smarter — just lots more of it, restarted with a clean slate).

Essentially, if you are designing a brand new core from scratch, do you either:

A) Stick to standard conventions and just remove all the old clutter and have a fresh start with traditional methods. (Basically what AMD did with Zen)

OR

B) Grasp this opportunity and do something completely new and innovative, which could result in either a "home run" or a complete disaster: high risk/high reward. As someone mentioned earlier in this thread, we've seen Intel attempt to do this before with Netburst, and it was a catastrophe.