
Zen 6 Speculation Thread

Page 392
Both have advantages and disadvantages. I'm saying that the comparison is not as straightforward as it might seem on the surface. I'm sure if AMD and Intel could "start over" with a completely new instruction set and OS they could do what Apple has done with the M series.
What you compile for also matters. x86_64 software still isn't compiled for AVX despite it being ages old; Intel is one of the reasons as well.
Google Chrome is compiled with SSE3/4, not exactly sure which version.
 
What you compile for also matters. x86_64 software still isn't compiled for AVX despite it being ages old; Intel is one of the reasons as well.
Google Chrome is compiled with SSE3/4, not exactly sure which version.
Yes. x86 developers are still writing for Skylake. A necessary downside of a completely open infrastructure.
 
Bulldozer's FPU was not a problem, though. It was so starved on the memory side that it was not really possible to saturate the FPU execution units on any realistic load.

It's weird how the narrative around that core worked out. It was a bad design, but the parts that made it a bad design (combination of write-through L1 and slow-as-molasses L2), were harder to see on a block diagram than the shared FPU, so everyone concentrated on that even though it was basically never the bottleneck.
Back in the day, on an old forum now defunct, I raised this question to an AMD rep that participated in the forum. His reply to me was

"Don't worry, we are going to keep these execution units fed!"

He ended up being quite wrong and my architectural concerns were absolutely on target after all.
 
Skylake is rather new; people still write for SSE4, aka pre-Nehalem 🤣🤣
Pre-Nehalem only had SSE4.1, and that's just one generation, Penryn, which actually fell out of Microsoft Windows support, of all things.
Nehalem/SSE4.2 is the new baseline requirement of *Windows*.
 
Back in the day, on an old forum now defunct, I raised this question to an AMD rep that participated in the forum. His reply to me was

"Don't worry, we are going to keep these execution units fed!"

He ended up being quite wrong and my architectural concerns were absolutely on target after all.
From what I remember, the biggest flaw with Bulldozer was CMT: basically two logical processors sharing execution resources. It was an attempt to maximize compute per die area. Kaveri fixed a lot of the front-end starvation issues by adding per-core decoders and better branch prediction, but by then it was too late, because Skylake was "born" around the same time and the Bulldozer-era cores couldn't compete. A new direction was needed, AMD realized that, and Zen was born.
 
Pre-Nehalem only had SSE4.1, and that's just one generation, Penryn, which actually fell out of Microsoft Windows support, of all things.
Nehalem/SSE4.2 is the new baseline requirement of *Windows*.

IMHO, AVX2 will become the common minimum pretty fast, if not already (except for games, maybe).
I had to retire my beloved X79 system because some software dropped pre-AVX2 support...
 
I hope most of their IPC gains this time around come from integer workloads. There are certainly low-hanging fruits (of small benefit) on the decoder side, like finally allowing two branches to be decoded per cycle for a single thread, or bringing back some Zen 4 optimizations. Not that decoding is even the main bottleneck.

Yeah, the majority of new resources/optimizations should be on integer.

AVX-512 is already fine and AMD is adding separate units for Matrix Multiplications. So, it will be all about integer on the cores and the SoCs on the client.
 
While I understand your point, and I would agree the efficiency of Apple's M core is astounding, I don't think it is better at running x64 code than Skymont.

It's like comparing a car built for road racing to one built for off-road. Sure, they are both vehicles, but they run on completely different infrastructures. Apple has a tightly controlled ecosystem. No DIY builds, very limited upgrades, a limited software base, no third-party hardware, limited third-party software (only through the App Store, etc.), no requirement to run apps from 50 years ago, etc. It's a tightly controlled, perfectly paved race track with no bumps. Due to their tightly controlled system they've even been able to start totally from scratch with their code base a few times over the years.

x64, on the other hand, is a jungle of thousands of third-party hardware parts and software apps spanning 50-plus years that have to work across generations and generations of processors and hardware. CPUs built for x64 have to be able to handle wild terrain.

Both have advantages and disadvantages. I'm saying that the comparison is not as straightforward as it might seem on the surface. I'm sure if AMD and Intel could "start over" with a completely new instruction set and OS they could do what Apple has done with the M series.
Hulk, I was talking about the new M core, as in the middle-core architecture in the M5.

We've reached a point where x64 isn't the only side making fast cores. This isn't 2010. Yes, I know that Apple doesn't do DIY/third-party hardware, but DIY is a very SMALL piece of the TAM for x64 CPUs. I don't really care if x64 can run an app that was made 50 years ago natively; most people don't care, and when comparing uarchs that is completely irrelevant, as it doesn't make a design worse.

Apple laptops, however, are very popular here and elsewhere, and pretty much own the premium laptop segment, i.e. laptops over $999. Now they are moving down to the $499-$599 budget segment, which is where the bulk of Windows laptops are sold, and they're selling well.

As for the Apple OS side, you can install outside the App Store, and yes, you can get to a root folder on a Mac. I don't know why you have this idea that Macs run iOS; they don't.

Apple’s CPUs are not good because they are built for a controlled environment, but because they are simply just good (see Apple CPUs running on macOS vs Linux: it’s the same performance for CPU tasks, or even better for some server tasks on Linux).

Your argument for Apple's cores being the best is that the ARM ISA is "new" and that Intel/AMD could do something similar. That's not even remotely true: RISC-V is a newer ISA, and RISC-V Linux doesn't have the same baggage as x86 Linux, and yet not a single RISC-V core can compete with Skymont in terms of perf/mm² or absolute perf. Not a SINGLE company has managed it with their latest RISC-V designs.
 
So, it will be all about integer on the cores and the SoCs on the client.
No, it's just perf. FP included.
Never waste your lead and all.
As for the Apple OS side, you can install outside the App Store, and yes, you can get to a root folder on a Mac. I don't know why you have this idea that Macs run iOS; they don't.
Atta boy, Gatekeeper is annoying.
Apple’s CPUs are not good because they are built for a controlled environment, but because they are simply just good (see Apple CPUs running on macOS vs Linux: it’s the same performance for CPU tasks, or even better for some server tasks on Linux).
YES
 
The only "issue" I'm aware of for Apple cores and server workloads would be the 16K minimum page size, and the fact that many real servers spend a stupid amount of I/O writing tiny log messages to disk/network.

What would be interesting is what Apple's steady-state, high-load core performance looks like when you bolt it onto a memory subsystem designed to scale to large numbers (> 128) of cores with "good" NUMA performance. I suspect still very, very good, but that fast 16 MB L2 is yum, and that thing is probably not that easy to scale.
 
This time AMD has a nice and meaty roadmap, though.
Yeah, Intel was also at the peak of their semiconductor powers, and foundries were dropping like flies; you could get away with a lot when Moore's law was alive just by being ahead in semi. I don't have much faith in Intel these days, so I guess the question is: I wonder when ARM will decide it would like some money from vendors for DC cores?
 
Hulk, I was talking about the new M core, as in the middle-core architecture in the M5.

We've reached a point where x64 isn't the only side making fast cores. This isn't 2010. Yes, I know that Apple doesn't do DIY/third-party hardware, but DIY is a very SMALL piece of the TAM for x64 CPUs. I don't really care if x64 can run an app that was made 50 years ago natively; most people don't care, and when comparing uarchs that is completely irrelevant, as it doesn't make a design worse.
DIY x64 might be small but the percentage of computers using x86/x64 is huge. It IS harder to maintain compatibility for over 50 years than to just be able to start from scratch when you feel like it because you have a tightly controlled ecosystem.
Apple laptops, however, are very popular here and elsewhere, and pretty much own the premium laptop segment, i.e. laptops over $999. Now they are moving down to the $499-$599 budget segment, which is where the bulk of Windows laptops are sold, and they're selling well.
I'm sure they are great for people who don't know what a directory tree is or couldn't care less about it.
As for the Apple OS side, you can install outside the App Store, and yes, you can get to a root folder on a Mac. I don't know why you have this idea that Macs run iOS; they don't.
Got it. It is still a very closed (East Berlin) type of system.
Apple’s CPUs are not good because they are built for a controlled environment, but because they are simply just good (see Apple CPUs running on macOS vs Linux: it’s the same performance for CPU tasks, or even better for some server tasks on Linux).
Your argument for Apple's cores being the best is that the ARM ISA is "new" and that Intel/AMD could do something similar. That's not even remotely true: RISC-V is a newer ISA, and RISC-V Linux doesn't have the same baggage as x86 Linux, and yet not a single RISC-V core can compete with Skymont in terms of perf/mm² or absolute perf. Not a SINGLE company has managed it with their latest RISC-V designs.
So are you saying that if Apple wanted to, they could make their cores run x64 natively and immediately clean the clocks of Intel and AMD in the x64 space? I'm sorry, but I find that hard to believe. I find it more likely they would run into the same design challenges that Intel and AMD have been struggling with for the past five decades. ARM was designed with efficiency in mind; x86, not so much. I mean, ARM was designed from the start to be a full 32-bit RISC architecture, with no need for uops. AMD and Intel have been working around that limitation since the PowerPC and still are!
 
I wonder when ARM will decide it would like some money from vendors for DC cores?
They already did, v9 rates are up and CSS rates are especially up.
Makes Venice an offer no one refuses.
So are you saying if Apple wanted to they could make their cores run x64 natively and immediately clean the clocks of Intel and AMD in the x64 space?
Yes.
I mean, they already do TSO (which is the tricky part).
 
The only "issue" I'm aware of for Apple cores and server workloads would be the 16K minimum page size, and the fact that many real servers spend a stupid amount of I/O writing tiny log messages to disk/network.

VM page size doesn't affect that stuff. For years we had most CPUs with a 4K VM page size and hard drives with a 512-byte sector size. It wasn't until they crossed the 2 TB barrier that sector sizes went to 4096 bytes to match the VM page size. Hopefully, if you are writing log files, you are allowing some level of write caching rather than syncing to storage with every 50-byte message. That's gonna be a MUCH bigger problem for NAND than the difference between 4K and 16K VM page size.

Ditto for network: interfaces use DMA to move data from DRAM to the wire, and the VM page size doesn't enter into it.

The major efficiency hit of larger pages is for applications that do a lot of small memory allocations and deallocations and fragment your allocation area. Modern allocators do a good job of dealing with this, but applications that think they can do a better job than the OS and manage it themselves may not. But there are some advantages to larger page sizes, such as making it easier to implement larger low-level caches and simpler page table structures, so it is probably mostly a wash.

DEC Alpha seemed to work pretty well for servers, despite its 8K VM page size. And that was over 30 years ago, when server main memory was three orders of magnitude smaller than it is today. Apple's 16K page sizes are a hassle for compatibility with poorly written software that assumes a 4K page size, meaning porting would be a problem, but that's about its only meaningful impact.
 