> I have to think they will add it eventually. Unless they are planning on ditching big.LITTLE at some point.
Or they could ditch AVX-512.
> Or they could ditch AVX-512.
They definitely aren't doing that.
> They definitely aren't doing that.
What's your reasoning as to why not?
> What's your reasoning as to why not?
Because they like AVX-512. The issue with the client parts has more to do with their inability to get cross-ISA working well enough. And then, because of the 10 nm (and now 7 nm) debacle, Intel is still releasing products with cores defined a long time ago... probably before they realized cross-ISA wasn't going to work. They just haven't caught up yet.
> I sincerely doubt they're adding AVX-512 to Atom, and they're certainly not going to do so for just one product line.
They did it for Phi.
> The issue with the client parts has more to do with their inability to get cross-ISA working well enough.
It could have been done through whitelisting (AVX-512 executables aren't that many, and Windows could just maintain a list of such EXEs and run them only on the P-cores). A similar thing could be done in Linux.
> It could have been done through whitelisting (AVX-512 executables aren't that many, and Windows could just maintain a list of such EXEs and run them only on the P-cores). A similar thing could be done in Linux.
I doubt they trust Microsoft for even that.
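For what it's worth, the plumbing for that exists today. A minimal sketch on Linux, assuming a launcher that already knows a binary is on the whitelist (the P-core IDs below are made up; a real tool would read them from sysfs, e.g. /sys/devices/cpu_core/cpus):

```c
// Minimal sketch of the "whitelist to P-cores" idea on Linux.
// The P-core IDs here are hypothetical; a real launcher would
// discover them from sysfs instead of hardcoding 0-7.
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t pcores;
    CPU_ZERO(&pcores);
    for (int cpu = 0; cpu < 8; cpu++)   // assume CPUs 0-7 are P-cores
        CPU_SET(cpu, &pcores);
    // Pin the current process (and its children) to the P-cores,
    // so an AVX-512 binary never lands on an E-core.
    if (sched_setaffinity(0, sizeof(pcores), &pcores) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to P-cores; safe to exec the AVX-512 binary\n");
    return 0;
}
```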
On one of the next advanced architecture design group speculation forums:
x-bit Reg FPFMA
V.x=0, 1-wide SISD/Scalar FMA x-bit op [if V.x=0, allows 80-bit/EP for x-bit]
V.x=1, 2-wide SIMD FMA x-bit op
V.x=2, 4-wide SIMD FMA x-bit op
V.x=3, 8-wide SIMD FMA x-bit op
Which can be overlapped with every x87->AVX512 operation with the least amount of instructions. [The above covers Centaur's unreleased Extended x87 instruction set, CT64-x87.]
Microsoft's SSEx->NEON transpiler emulator can be improved upon with the above. With the added benefit that the base remains the same and only the extension is swapped: x64-(SSEx) -> x64-(New ISE).
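To make the V.x field concrete, here is a toy software model; every name in it is invented, it just shows one opcode spanning scalar through 8-wide via lanes = 1 << V.x:

```c
#include <stddef.h>

// Toy software model of the proposed width field: one FMA "opcode"
// whose lane count is 1 << V.x (V.x = 0..3 -> 1/2/4/8 lanes).
// All names here are invented for illustration.
static void fma_vx(double *d, const double *a, const double *b,
                   const double *c, unsigned vx) {
    size_t lanes = (size_t)1 << vx;        // V.x=0 scalar ... V.x=3 8-wide
    for (size_t i = 0; i < lanes; i++)
        d[i] = a[i] * b[i] + c[i];         // same op at every width
}
```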
----
AVX512 1024-bit (11)
AVX512 512-bit (01 or 10)
AVX512 256-bit (10 or 01)
AVX512 128-bit (00)
AVX512 Scalar :: 5 instructions for same op
AVX 256-bit
AVX 128-bit
AVX Scalar :: 3 instructions for same op
SSE 128-bit
SSE Scalar :: 2 instructions for same op
10 instructions for same op
-> New instruction set extension:
1 instruction can implement all of the above. Moving the complexity of operations to the rename/scheduler is more CISC than placing the complexity on decode.
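For contrast, here is what the proliferation looks like with today's intrinsics; the same a*b+c needs a separate instruction at each width (a sketch assuming FMA3 and AVX-512F hardware):

```c
#include <immintrin.h>

// The same fused multiply-add needs a distinct instruction per width
// today (build with -mfma -mavx512f, hardware permitting):
__m128 f128(__m128 a, __m128 b, __m128 c) { return _mm_fmadd_ps(a, b, c); }
__m256 f256(__m256 a, __m256 b, __m256 c) { return _mm256_fmadd_ps(a, b, c); }
__m512 f512(__m512 a, __m512 b, __m512 c) { return _mm512_fmadd_ps(a, b, c); }
```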
New definitions:
CISC -> Grand Library of 1-ISC(OISC)
RISC -> Small Library of 1-ISC(OISC)
Complexity is available, but it is on the micro-architecture, not the extension, to achieve it.
However, you could just google HPC portability to get a more simplified version of this:
HPC, for example, is moving away from per-architecture tuning in favor of portability;
SIMD Architecture-implementation Agnostic high-level code:
Gen1-Arch: 256-bit low-level runtime code
Gen9-Arch: 2048-bit + Packed SIMD Streaming (8x 256-bit), also low-level runtime code
Because of the above, modern SIMD instruction sets have made HPC computers lean towards custom RISC-V/ARM, since they can upgrade and expand rapidly and know the code will work across generations.
HPC Computer Facility-example:
Building A Phase 1: Gen1 computers
Building E Phase 12: Gen9 computers
If the instruction op is shared between o.g. heterogeneous architectures CPU and GPU:
CPU does V.a_tot to 512-bit~1024-bit && GPU does V.b_tot to 1024-bit~2048-bit, the new instruction extension is definitely needed.
RVV Class A custom CPU: RVV supports 128-bit through 1024-bit.
RVV Class B custom GPU(minus graphics): RVV supports 1024-bit through 8192-bit.
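A hedged sketch of why that portability works, using the RVV C intrinsics: the loop below never encodes a vector width, so the same binary runs on a 128-bit Class A core and an 8192-bit Class B design:

```c
#include <riscv_vector.h>
#include <stddef.h>

// Vector-length-agnostic d[i] = a[i]*b[i] + c[i] using the RVV 1.0
// C intrinsics. vsetvl asks the hardware how many elements fit per
// iteration, so no width is baked into the code.
void vla_fma(double *d, const double *a, const double *b,
             const double *c, size_t n) {
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e64m1(n);            // hw picks the chunk
        vfloat64m1_t va = __riscv_vle64_v_f64m1(a, vl);
        vfloat64m1_t vb = __riscv_vle64_v_f64m1(b, vl);
        vfloat64m1_t vc = __riscv_vle64_v_f64m1(c, vl);
        vc = __riscv_vfmacc_vv_f64m1(vc, va, vb, vl);   // vc += va * vb
        __riscv_vse64_v_f64m1(d, vc, vl);
        a += vl; b += vl; c += vl; d += vl; n -= vl;
    }
}
```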
Rapids and Shores being around this is a bit silly.
> Really, the only reason I could see for such a decision is to limit development resources.
Yea, probably because it still needs to test the waters. No matter the hype, Sierra Forest (and much more so for Falcon Shores) is an untested platform.
> Oh, and while I'm on the topic, I fully expect Royal to be added to the "revolutionary technologies" list. Ah, wish they would hurry up with it.
I think Royal is starting to be hyped up too much considering how early it is in the life cycle. Even the hyped-up Apple chips didn't get there in one generation.
> It could have been done through whitelisting (AVX-512 executables aren't that many, and Windows could just maintain a list of such EXEs and run them only on the P-cores). A similar thing could be done in Linux.
Uh, no. It is impossible to maintain a 100% accuracy rate with a whitelist without 100% developer cooperation, for any company.
They sacrificed AVX-512 for the sake of seamless operation and wasted valuable die space. They were really, REALLY desperate to compete with AMD at any cost. The launch with AVX-512 enabled proved that disabling AVX-512 was a last minute decision from someone at the top (maybe Pat?). I'm definitely not happy that they fused off the AVX-512 units. The only reason they got away with it is because most people don't need them. Should have left it to the users' discretion and disabled it in BIOS by default, like it was at launch.
> Uh, no. It is impossible to maintain a 100% accuracy rate with a whitelist without 100% developer cooperation, for any company.
Applications can request the OS to run them only on cores which have AVX-512. But is it worth it: 8 x AVX-512 vs 24 x AVX-256?
Since they have an open ecosystem, that will never happen.
> Applications can request the OS to run them only on cores which have AVX-512. But is it worth it: 8 x AVX-512 vs 24 x AVX-256?
The problem is not the 512-bit width but all the good extensions that would be useful at lower widths as well but are disabled along with AVX-512.
> The problem is not the 512-bit width but all the good extensions that would be useful at lower widths as well but are disabled along with AVX-512.
At least they figured out how to enable VNNI on parts without AVX-512. By adding a 256-bit version to AVX2...
Intel got in a really stupid mess packaging and then disabling all that the way it did.
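That 256-bit backport is AVX-VNNI; a minimal sketch of the dot-product op it provides, assuming a compiler with -mavxvnni support:

```c
#include <immintrin.h>

// AVX-VNNI: the AVX512-VNNI dot product at 256 bits, no AVX-512
// required (build with -mavxvnni, hardware permitting).
// Multiplies unsigned bytes of a with signed bytes of b and
// accumulates each group of four products into a 32-bit lane of acc.
__m256i dp_u8s8(__m256i acc, __m256i a, __m256i b) {
    return _mm256_dpbusd_avx_epi32(acc, a, b);
}
```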
> Intel is still releasing products with cores defined a long time ago... probably before they realized cross-ISA wasn't going to work. They just haven't caught up yet.
That may be true to some extent, but it looks like the same applies to Meteor Lake, and that should have been defined after Alder Lake was known.
> They did it for Phi.
While true, Phi was a CPU pretending to be an accelerator, while Sierra Forest is a CPU actually being used as a CPU. AVX-512 would be a waste.
> The problem is not the 512-bit width but all the good extensions that would be useful at lower widths as well but are disabled along with AVX-512.
See, that's why I think they have a way out. Most of the goodness from AVX-512 is independent of the 512b vector length, but it's that vector length that drives some of the big hardware tradeoffs. If they could backport the rest to 256b width, that might be a reasonable compromise.
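To illustrate "goodness independent of width": with AVX512VL, the mask registers already work on 256-bit vectors, so a backport is largely about guaranteeing those encodings without the 512-bit datapath. A minimal sketch, assuming AVX512F+VL hardware:

```c
#include <immintrin.h>

// AVX512VL already exposes AVX-512 masking at 256-bit width
// (build with -mavx512f -mavx512vl). The zero-masked add below
// keeps only the lanes selected by k; no 512-bit vectors involved.
__m256 masked_add(__mmask8 k, __m256 a, __m256 b) {
    return _mm256_maskz_add_ps(k, a, b);
}
```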
> If they got -AP out, there'd be no point of having cores that have low FP throughput. That's the whole point of the product line.
I think you're mistaken about the purpose of the AP line. It doesn't seem to be about FP, but rather just more compute than SP. And there's absolutely a cloud market for an extreme amount of dense compute that doesn't care much about FP.
> I think Royal is starting to be hyped up too much considering how early it is in the life cycle. Even the hyped-up Apple chips didn't get there in one generation.
Hah, be honest, you just mean me here, if for no other reason than no one seems to understand what Royal actually is. Though it's not entirely blind optimism on my part.
> See, that's why I think they have a way out. Most of the goodness from AVX-512 is independent of the 512b vector length, but it's that vector length that drives some of the big hardware tradeoffs. If they could backport the rest to 256b width, that might be a reasonable compromise.
I wish Intel would create a clear roadmap for x86 ISA extensions that applies to consumer chips.
> Hah, be honest, you just mean me here, if for no other reason than no one seems to understand what Royal actually is. Though it's not entirely blind optimism on my part.
Do we have any information on Royal yet? I still don’t understand what makes it so incredible.
> And there's absolutely a cloud market for an extreme amount of dense compute that doesn't care much about FP.
And yet both Bergamo and Sierra Forest are ballpark 128 cores.
> Hah, be honest, you just mean me here, if for no other reason than no one seems to understand what Royal actually is. Though it's not entirely blind optimism on my part.
You are one of them, yes, but many, many others believe it, mostly from MLID hype. I will say the opinions on Royal are very optimistic regardless of the source, though.
> Do we have any information on Royal yet? I still don’t understand what makes it so incredible.
Hope for good, but prepare for when it's not.
> And yet both Bergamo and Sierra Forest are ballpark 128 cores.
You have been into this longer than I have, I'm sure. K8 was nowhere near as promised? It did very well.
How revolutionary are we talking about here?
- Like the out-of-order transition?
- Netburst?
- Bulldozer?
- Ahem... Itanium?
There are many patents that sound very promising but never get productized. The Chip Architect site (run by Hans de Vries) goes over the patents and ideas the "original K8" might/could/possibly have had. It sounds fantastic, but the actual K8 was nowhere near what was promised, as good as the actual chip was.
That's because the ideas have to be realistic when it comes to cost, ease of implementation, and whether the benefit is real versus just in your head.
I could see Royal being "revolutionary" in the sense that Intel finally figures things out and, with all the optimism and a capable engineering team, takes perf/W leadership away from Apple.
> I wish Intel would create a clear roadmap for x86 ISA extensions that applies to consumer chips.
An ISA roadmap for x86 in general would be a godsend. I think the problem is that right now, Intel doesn't have a clear roadmap themselves, and kind of cobbles it together as they go. AVX-512 was spearheaded by Xeon Phi, and then there's all the ML and more HPC stuff added along the way, and it seems like a great big mess at the end of the day. But beyond that, they seem to view ISA differentiation as a competitive advantage, and thus keep it secret till pretty much the last possible moment. It's kind of ironic, since it gives AMD years to implement the same while the software catches up.
Currently Intel treats extensions only as a selling point for server chips, which has led to a situation where AVX-512 has essentially been belatedly retracted in the consumer space with no replacement so far. And AMD understandably doesn't seem eager to push its own extensions anymore.
> Do we have any information on Royal yet? I still don’t understand what makes it so incredible.
I don't think there's any real info about Royal around. Even its existence as a discrete core is a matter of some disagreement. I think you're the only one I've seen propose actual microarchitectural features, and quite interesting ones at that. You were the one who first noted André Seznec and his joining Intel, right?
Possibilities I can think of which are cool, but I’m really just shooting in the dark here:
- Dropping support for older instructions and emulating them in software to reduce decode complexity and technical debt?
- Hyper/Hybrid threading magic (i.e., changing the entire way threads are handled, in some way or another that no one has thought of yet)?
- More aggressive speculation with Value Prediction? (see the sketch after this list)
- Extending more (or maybe all) of the pipeline to be out of order? (An entire core that is out of order from fetch all the way through to retirement would be a huge innovation)
This is literally just me fantasising, though.
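Since value prediction keeps coming up: a toy last-value predictor in C, just to show the flavor of the hardware structure; the table size and confidence scheme are made up, and real proposals (e.g. Seznec's VTAGE) are far more elaborate:

```c
#include <stdint.h>
#include <stdbool.h>

// Toy last-value predictor, indexed by instruction address (PC).
// The 1024-entry table and 2-bit-style confidence are invented
// purely for illustration.
#define LVP_ENTRIES 1024

typedef struct {
    uint64_t last_value;  // last result seen for this PC
    uint8_t  confidence;  // saturating counter; predict only when high
} lvp_entry_t;

static lvp_entry_t lvp[LVP_ENTRIES];

// At rename/dispatch: return true (and a predicted value) only when
// this PC has produced the same result repeatedly.
static bool lvp_predict(uint64_t pc, uint64_t *value) {
    lvp_entry_t *e = &lvp[(pc >> 2) % LVP_ENTRIES];
    if (e->confidence >= 3) {
        *value = e->last_value;
        return true;
    }
    return false;
}

// At execute/retire: train on the actual result.
static void lvp_train(uint64_t pc, uint64_t actual) {
    lvp_entry_t *e = &lvp[(pc >> 2) % LVP_ENTRIES];
    if (e->last_value == actual) {
        if (e->confidence < 3) e->confidence++;
    } else {
        e->confidence = 0;           // misprediction: reset and relearn
        e->last_value = actual;
    }
}
```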
> You were the one who first noted André Seznec and his joining Intel, right?
Yes, that’s right, he joined Intel a couple of years ago, I think. He’s also the pre-eminent figure in academic research on Value Prediction.
> B) Grasp this opportunity and do something completely new and innovative, which could result in either a “home run” or a complete disaster - High Risk/High Reward. As someone mentioned earlier in this thread, we’ve seen Intel attempt to do this before with Netburst and it was a catastrophe.
Ditto for Itanium.
> Yes, that’s right, he joined Intel a couple of years ago, I think. He’s also the pre-eminent figure in academic research on Value Prediction.
At least in the early stages of development, do you think they could be pursuing both A and B until one path looks like it's leading to a better result? This could be a situation where it would be good for the person at the top, who will most likely eventually be steering the ship, to be someone versed in tech first and finance second rather than the reverse.
With Royal Core, I think I’m most curious to see whether it is a completely revolutionary idea that involves crazy new features similar to some of my earlier suggestions, or whether it is a more contemporary, evolutionary core. (E.g., the standard improvement of wider, deeper, smarter; just lots more of it, restarted with a clean slate.)
Essentially, if you are designing a brand new core from scratch, do you either:
A) Stick to standard conventions and just remove all the old clutter and have a fresh start with traditional methods. (Basically what AMD did with Zen)
OR
B) Grasp this opportunity and do something completely new and innovative, which could result in either a “home run” or a complete disaster - High Risk/High Reward. As someone mentioned earlier in this thread, we’ve seen Intel attempt to do this before with Netburst and it was a catastrophe.