Discussion Intel current and future Lakes & Rapids thread

Page 719

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Because it's AP, it's very plausible it was a version with AVX-512 support, maybe with 1x 512-bit units like the client cores, or double-pumped over 2 cycles like AMD's, which is double Gracemont's width.
I sincerely doubt they're adding AVX-512 to Atom, and they're certainly not going to do so for just one product line.

If Sierra Forest maxes out just a bit above GNR, let's say 144 cores for nice math, then that yields an area equivalent to ~36 RWC. That would almost certainly be part of the SP line, not AP, and thus max out well below the IO and power capabilities of an AP product. Really, the only reason I could see for such a decision is to limit development resources.
So while the E cores are actually power efficient when you put them in their proper habitat, it also follows a bell curve, and in the case of desktop chips it's entirely about area density.
Same applies to server. Density is very valuable from a TCO and product cost perspective.

Oh, and while I'm on the topic, I fully expect Royal to be added to the "revolutionary technologies" list. Ah, wish they would hurry up with it.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
I sincerely doubt they're adding AVX-512 to Atom, and they're certainly not going to do so for just one product line.

I have to think they will add it eventually. Unless they are planning on ditching big.LITTLE at some point.
 

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
What's your reasoning as to why not?

Because they like AVX-512. The issue with the client parts has more to do with their inability to get cross-ISA working well enough. And then because of the 10 (and now 7) nm debacle, Intel is still releasing products with cores defined a long time ago... probably before they realized cross-ISA wasn't going to work. They just haven't caught up yet.
 
Jul 27, 2020
15,738
9,803
106
The issue with the client parts has more to do with their inability to get cross-ISA working well enough.
It could have been done through whitelisting (AVX-512 executables aren't that many, and Windows could just maintain a list of such EXEs and run them only on the P-cores). A similar thing could be done in Linux.

They sacrificed AVX-512 for the sake of seamless operation and wasted valuable die space. They were really, REALLY desperate to compete with AMD at any cost. The launch with AVX-512 enabled proved that disabling AVX-512 was a last minute decision from someone at the top (maybe Pat?). I'm definitely not happy that they fused off the AVX-512 units. The only reason they got away with it is because most people don't need them. Should have left it to the users' discretion and disabled it in BIOS by default, like it was at launch.
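The whitelisting idea above can be sketched with Linux's process-affinity API. This is a minimal illustration, not anything Intel or Microsoft actually ship: the whitelist entries and the P-core ID set are made-up assumptions, and `os.sched_setaffinity` is the real Linux-only Python call for pinning a process to specific logical CPUs.

```python
import os

# Hypothetical whitelist of AVX-512-using executables (illustrative names only)
AVX512_WHITELIST = {"ffmpeg", "y-cruncher", "prime95"}

# Assumed P-core logical CPU IDs for a hypothetical hybrid part
P_CORES = {0, 1, 2, 3, 4, 5, 6, 7}

def pin_if_whitelisted(pid: int, exe_name: str,
                       whitelist=AVX512_WHITELIST, p_cores=P_CORES) -> bool:
    """Pin a process to the P-cores if its executable is on the whitelist.

    Returns True if affinity was restricted, False if the EXE wasn't listed.
    """
    if exe_name not in whitelist:
        return False
    # Linux-only: restrict the process (pid 0 = calling process)
    # to the given set of logical CPUs.
    os.sched_setaffinity(pid, p_cores)
    return True
```

An OS service could run this check at process launch; the obvious weakness, as pointed out below, is keeping the list complete without developer cooperation.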
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
I could have sworn that before Zen4 came out, Intel was being lambasted to make a new ISA inline with RVV/SVE.

Currently it is this:
SSE2+ 128-bit
AVX+ 128-bit
AVX_VL 128-bit
&&
AVX+ 256-bit
AVX_VL 256-bit
&&
AVX512

On one of the next advanced architecture design group speculation forums:
x-bit Reg FPFMA
V.x=0: 1-wide SISD/scalar FMA x-bit op [if V.x=0, allows 80-bit/EP for x-bit]
V.x=1: 2-wide SIMD FMA x-bit op
V.x=2: 4-wide SIMD FMA x-bit op
V.x=3: 8-wide SIMD FMA x-bit op
This can be overlapped with every x87->AVX512 operation with the fewest instructions. [The above covers Centaur's unreleased Extended x87 instruction set, CT64-x87.]

Microsoft's SSEx->NEON transpiling emulator could be improved with the above, with the added benefit that the base remains the same; only the extension is swapped: x64 (SSEx) -> x64 (new ISE).
----
AVX512 1024-bit (11)
AVX512 512-bit (01 or 10)
AVX512 256-bit (10 or 01)
AVX512 128-bit (00)
AVX512 Scalar :: 5 instructions for same op
AVX 256-bit
AVX 128-bit
AVX Scalar :: 3 instructions for same op
SSE 128-bit
SSE Scalar :: 2 instructions for same op
10 instructions for same op

-> New instruction set extension:
1 instruction can implement all of the above. Moving the complexity of operations to the rename/scheduler is more CISC than placing the complexity on decode.

New definitions:
CISC -> Grand Library of 1-ISC(OISC)
RISC -> Small Library of 1-ISC(OISC)

Complexity is still available, but it is up to the microarchitecture, not the extension, to achieve it.

However, you could just google HPC portability to see this simplified further. HPC, for example, is moving away from per-generation tuning toward portability:
SIMD architecture/implementation-agnostic high-level code:
Gen1 arch: 256-bit low-level runtime code
Gen9 arch: 2048-bit, plus Packed SIMD Streaming (8x 256-bit) low-level runtime code

Because of the above, modern SIMD instruction sets have made HPC computers lean towards custom RISC-V/ARM, since they can upgrade and expand rapidly and know the code will work across generations.

HPC Computer Facility-example:
Building A Phase 1: Gen1 computers
Building E Phase 12: Gen9 computers

If the instruction op is shared between heterogeneous architectures (CPU and GPU):
CPU does V.a_tot to 512-bit~1024-bit && GPU does V.b_tot to 1024-bit~2048-bit; the new instruction extension is definitely needed.

RVV Class A custom CPU: RVV supports 128-bit through 1024-bit.
RVV Class B custom GPU (minus graphics): RVV supports 1024-bit through 8192-bit.
Rapids and Shores being around this is a bit silly.
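The vector-length-agnostic model being gestured at above (one instruction sequence serving every hardware width, as in RVV/SVE, instead of separate SSE/AVX/AVX-512 encodings) can be illustrated with a strip-mined loop where the vector length is a runtime property rather than baked into the code. This is a plain-Python sketch of the programming model, not real intrinsics; `vlen` stands in for whatever the hardware reports (RVV's `vsetvl` idea):

```python
def vla_add(a, b, vlen):
    """Vector-length-agnostic elementwise add.

    The same loop works whether the 'hardware' vector holds 2, 4, or 64
    elements -- only the runtime-reported vlen changes, never the code.
    """
    out = []
    i = 0
    while i < len(a):
        # 'setvl' step: the hardware tells us how many lanes to process
        n = min(vlen, len(a) - i)
        out.extend(x + y for x, y in zip(a[i:i + n], b[i:i + n]))
        i += n
    return out
```

Because the result is identical for any `vlen`, a binary written this way keeps working across generations that widen the vector unit, which is the cross-generation portability argument made for RVV/SVE over fixed-width x86 extensions.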
 
Last edited:
  • Wow
Reactions: Grazick

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Really, the only reason I could see for such a decision is to limit development resources.

Yeah, probably because it still needs to test the waters. No matter the hype, Sierra Forest (and it's much more true for Falcon Shores) is an untested platform.

If they got an -AP out, there'd be no point in having cores with low FP throughput. That's the whole point of the product line.

They didn't just port Silvermont to Xeon Phi. They called it "modified Silvermont", but they extended things like the ROB and it was its own core. It was "modified Silvermont" much as Goldmont Plus is "modified Goldmont". I doubt Sierra Forest would simply port Crestmont.

Oh, and while I'm on the topic, I fully expect Royal to be added to the "revolutionary technologies" list. Ah, wish they would hurry up with it.

I think Royal is starting to be hyped up too much considering how early it is in the life cycle. Even the hyped-up Apple chips didn't get there in one generation.
 
Last edited:

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
It could have been done through whitelisting (AVX-512 executables aren't that many and Windows could just maintain a list of such EXEs and run these only on the P-cores). Similar thing could be done in Linux.

They sacrificed AVX-512 for the sake of seamless operation and wasted valuable die space. They were really, REALLY desperate to compete with AMD at any cost. The launch with AVX-512 enabled proved that disabling AVX-512 was a last minute decision from someone at the top (maybe Pat?). I'm definitely not happy that they fused off the AVX-512 units. The only reason they got away with it is because most people don't need them. Should have left it to the users' discretion and disabled it in BIOS by default, like it was at launch.

Uh, no. It is impossible for any company to maintain 100% accuracy with a whitelist without 100% developer cooperation.

Since they have an open ecosystem, that will never happen.
 

JustViewing

Member
Aug 17, 2022
135
232
76
Uh no. It is impossible to maintain a 100% accuracy rate with a white list without 100% developer cooperation for any company.

Since they have an open ecosystem, that will never happen.
Applications can request the OS to run them only on cores which have AVX-512. But is it worth it: 8x AVX-512 cores vs. 24x 256-bit AVX2 cores?
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Applications can request OS to run only on cores which have AVX512. But is it worth compared with 8 x AVX512 vs 24 x AVX256 ?
The problem is not the 512-bit width but all the good extensions that would be useful at lower widths as well, yet are disabled along with AVX-512.

Intel got itself into a really stupid mess, packaging and then disabling all that the way it did.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
The problem is not the 512 bit width but all the good extensions that would be useful at lower width as well but are disabled along with AVX512.

Intel got in a really stupid mess packaging then disabling all that the way it did.

At least they figured out how to enable VNNI on parts without AVX512. By adding a 256b version to AVX2 . . .
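That 256-bit backport (AVX-VNNI) keeps the per-lane semantics of the AVX-512 version; only the register width changes. As a sketch of what one 32-bit lane of the VPDPBUSD instruction computes (per Intel's documented semantics: four unsigned bytes times four signed bytes, summed into a 32-bit accumulator, non-saturating wrap), here is a reference model in plain Python — illustrative only, not real intrinsics:

```python
def vpdpbusd_lane(acc, u8x4, s8x4):
    """Reference model of one 32-bit lane of VPDPBUSD (AVX-VNNI / AVX512-VNNI).

    Multiplies 4 unsigned 8-bit values by 4 signed 8-bit values,
    sums the products, and adds them to the 32-bit accumulator.
    """
    assert all(0 <= u <= 255 for u in u8x4)
    assert all(-128 <= s <= 127 for s in s8x4)
    total = acc + sum(u * s for u, s in zip(u8x4, s8x4))
    # Non-saturating variant: wrap to signed 32-bit
    total &= 0xFFFFFFFF
    return total - 0x100000000 if total >= 0x80000000 else total
```

A 256-bit AVX-VNNI register just holds eight such lanes instead of sixteen, which is why the "goodness" (fused int8 dot products for ML inference) survives the narrower width.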
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Intel is still releasing products with cores defined a long time ago... probably before they realized cross-ISA wasn't going to work. They just haven't caught up yet.
That may be true to some extent, but it looks like the same applies to Meteor Lake, and that should have been defined after Alder Lake was known.
They did it for Phi.
While true, Phi was a CPU pretending to be an accelerator, while Sierra Forest is a CPU actually being used as a CPU. AVX-512 would be a waste.
The problem is not the 512 bit width but all the good extensions that would be useful at lower width as well but are disabled along with AVX512.

Intel got in a really stupid mess packaging then disabling all that the way it did.
See, that's why I think they have a way out. Most of the goodness from AVX-512 is independent of the 512b vector length, but it's that vector length that drives some of the big hardware tradeoffs. If they could backport the rest to 256b width, that might be a reasonable compromise.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
If they got -AP out, there'd be no point of having cores that have low FP throughput. That's the whole point of the product line.
I think you're mistaken about the purpose of the AP line. It doesn't seem to be about FP, but rather just more compute than SP. And there's absolutely a cloud market for an extreme amount of dense compute that doesn't care much about FP.
I think Royal is starting to be hyped up too much considering how early it is in the life cycle. Considering even the hyped up Apple chips didn't get there in one generation.
Hah, be honest, you just mean me here, if for no other reason than no one seemingly understanding what Royal actually is. Though it's not entirely blind optimism on my part.
 
  • Haha
Reactions: Geddagod

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
See, that's why I think they have a way out. Most of the goodness from AVX-512 is independent of the 512b vector length, but it's that vector length that drives some of the big hardware tradeoffs. If they could backport the rest to 256b width, that might be a reasonable compromise.
I wish Intel would create a clear roadmap for x86 ISA extensions that applies for consumer chips.

Currently Intel treats extensions only as a selling point for server chips, which has led to the situation where AVX-512 was essentially retracted belatedly, so far without replacement, in the consumer space. And AMD understandably doesn't seem eager to push its own extensions anymore.
 

Cardyak

Member
Sep 12, 2018
72
159
106
Hah, be honest, you just mean me here, if for no other reason than no one seemingly understanding what Royal actually is. Though it's not entirely blind optimism on my part.

Do we have any information on Royal yet? I still don’t understand what makes it so incredible.

Possibilities I can think of which are cool but I’m really just shooting in the dark here:

- Dropping support for older instructions and emulating them in software to reduce decode complexity and technical debt?
- Hyper/Hybrid threading magic (I.e: changing the entire way threads are handled in some way or another that no one has thought of yet) ?
- More aggressive speculation with Value Prediction?
- Extending more (or maybe all) of the pipeline to be out of order? (An entire core that is out of order from fetch all the way through to retirement would be a huge innovation)

This is literally just me fantasising though
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
And there's absolutely a cloud market for an extreme amount of dense compute that doesn't care much about FP.

And yet both Bergamo and Sierra Forest are ballpark 128 cores.

Hah, be honest, you just mean me here, if for no other reason than no one seemingly understanding what Royal actually is. Though it's not entirely blind optimism on my part.

You are one of them, yes, but many, many others believe it, mostly from MLID hype. I will say the opinions on Royal are very optimistic regardless of the source, though.

Do we have any information on Royal yet? I still don’t understand what makes it so incredible.

Hope for good, but prepare for when it's not.

How revolutionary are we talking about here?
-Like out of order transition?
-Netburst?
-Bulldozer?
-Ahem... Itanium?

There are many patents that sound very promising but are never productized. The Chip Architect site (run by Hans de Vries) goes over the patents and ideas the "original K8" might have had. It sounds fantastic, but the actual K8 was nowhere near what was promised, as good as the actual chip was.

That's because the ideas have to be realistic when it comes to cost and ease of implementation, and have to show real benefit versus what's just in your head.

I could see Royal being "revolutionary" in the sense that Intel finally figures things out and, with all that optimism and a capable engineering team, takes perf/W leadership away from Apple.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
There are many patents that sound very promising but are never productized. The Chip Architect site (run by Hans de Vries) goes over the patents and ideas the "original K8" might have had. It sounds fantastic, but the actual K8 was nowhere near what was promised, as good as the actual chip was.

You have been into this longer than I have, I'm sure. K8 was nowhere near as promised? It did very well.

What I remember are the delays though. IIRC it was supposed to be out in 2001. Then it was just constantly delayed. After all that time, they still didn't get what they wanted in there? I do remember a lot of speculation back then. Reverse Hyperthreading comes to mind.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Based on the most recent info we have on Grand Ridge and Sierra Forest, they will not have AVX-512, but they are adding new extensions to their AVX support.

For example: AVX-VNNI and AVX-IFMA

[Attached image: 1669556568418.png]
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
I wish Intel would create a clear roadmap for x86 ISA extensions that applies for consumer chips.

Currently Intel treats extensions only as a selling point for server chips, which led to the situation that AVX512 essentially was belatedly retracted without replacement so far in the consumer space. And AMD understandably doesn't seem eager to push own extensions anymore.
An ISA roadmap for x86 in general would be a godsend. I think the problem is that right now, Intel doesn't have a clear roadmap themselves, and kind of cobbles it together as they go. AVX-512 was spearheaded by Xeon Phi, and then there's all the ML and HPC stuff added along the way, and it seems like a great big mess at the end of the day. But beyond that, they seem to view ISA differentiation as a competitive advantage, and thus keep it secret till pretty much the last possible moment. It's kind of ironic, since it gives AMD years to implement the same while the software catches up.

It would be even more helpful in reverse. Imagine how much better things would be if Intel could say, for example, "All processors released past 2025 will no longer support x87 or MMX", and actually hold to that plan. Though the ideal state would be Intel and AMD actually working together; that would better fit the reality of a world where ARM is a real competitive threat.
 
  • Like
Reactions: Tlh97 and moinmoin

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Do we have any information on Royal yet? I still don’t understand what makes it so incredible.

Possibilities I can think of which are cool but I’m really just shooting in the dark here:

- Dropping support for older instructions and emulating them in software to reduce decode complexity and technical debt?
- Hyper/Hybrid threading magic (I.e: changing the entire way threads are handled in some way or another that no one has thought of yet) ?
- More aggressive speculation with Value Prediction?
- Extending more (or maybe all) of the pipeline to be out of order? (An entire core that is out of order from fetch all the way through to retirement would be a huge innovation)

This is literally just me fantasising though
I don't think there's any real info about Royal around. Even its existence as a discrete core is a matter of some disagreement. I think you're the only one I've seen propose actual microarchitectural features, and quite interesting ones at that. You were the one who first noted André Seznec joining Intel, right?
 

Cardyak

Member
Sep 12, 2018
72
159
106
You were the one who first noted André Seznec and his joining Intel, right?

Yes that’s right, he joined Intel a couple of years ago I think. He’s also the pre-eminent figure of academic research into Value Prediction.

With Royal Core, I think I'm most curious to see whether it is a completely revolutionary idea involving crazy new features similar to some of my earlier suggestions, or a more conventional, evolutionary core (e.g. the standard improvement of wider, deeper, smarter — just lots more of it, restarted with a clean slate).

Essentially, if you are designing a brand new core from scratch, do you either:

A) Stick to standard conventions and just remove all the old clutter and have a fresh start with traditional methods. (Basically what AMD did with Zen)

OR

B) Grasp this opportunity and do something completely new and innovative, which could result in either a "home run" or a complete disaster: high risk/high reward. As someone mentioned earlier in this thread, we've seen Intel attempt to do this before with Netburst, and it was a catastrophe.