FlameTail
Diamond Member
- Dec 15, 2021
- 4,384
- 2,754
- 106
> Here's the thing: if AMD drops their frequency to ARM levels, that's not going to suddenly give them a free double-digit IPC gain. Increasing IPC is hard, and as things stand now, ARM vendors have a large IPC advantage over AMD.

So I feel it does not make sense for AMD to keep chasing high-clocking designs unless the engineers feel that's the optimal way to extract performance at this point in time. The only reason I can think of that might make them wary of switching to a lower-frequency, higher-IPC design, if it's indeed more performant, is maybe a severe area penalty? So severe that it would not work in server?
> Here's the thing: if AMD drops their frequency to ARM levels, that's not going to suddenly give them a free double-digit IPC gain. Increasing IPC is hard, and as things stand now, ARM vendors have a large IPC advantage over AMD.

That's for another thread, but I suspect they'll be better off with a focus on clock rate for a bit; at least it worked for Apple.
That was about ARM. Apple is coming from much lower clock rates, so it is much easier for them to meaningfully gain frequency without blowing up power too much. Further, it appears that the M3->M4 clock gain may be largely due to FinFlex allowing them to use HP cells for the P cores, which AMD is already using, so AMD won't get any clock benefit there (their potential benefit would be in the opposite direction: using HD cells for the 'c' cores to save a bit of power).
> Further, it appears that the M3->M4 clock gain may be largely due to FinFlex allowing them to use HP cells for the P cores, which AMD is already using, so AMD won't get any clock benefit there (their potential benefit would be in the opposite direction: using HD cells for the 'c' cores to save a bit of power).

FinFlex is on N3E and later; AMD doesn't have a product using it yet.
My position is that RDNA clocks and Zen clocks will go up with the next node. Other designers are getting a lot of clock uplift going to N3B, so why not AMD on N3E or N3P, which are even better?
AMD's physical implementation teams seem good enough.
> Here's the thing: if AMD drops their frequency to ARM levels, that's not going to suddenly give them a free double-digit IPC gain. Increasing IPC is hard, and as things stand now, ARM vendors have a large IPC advantage over AMD.

AMD is 14 best case and 18 worst case. If they go down to 11 like the X925, yes, they will get double-digit gains. They are infected with a subtler version of the Bulldozer/Netburst ideology of chasing high clocks... and so is Intel, their main competitor.
> Now, answering the question of why the ARM and x86 vendors have arrived at such different conclusions regarding this trade-off requires someone who actually knows what he's talking about; I have no idea personally. Any guesses as to what might be going on here?

Different markets, different targets.
Heh, I might regret having written that 😀 And Nothingness is on record saying that another 10% would be a good result, which is in line with the roadmap.
> Did they? Except for Apple, which only sells consumer hardware, aren't all the other ARM vendors at similar PPC* and area levels compared to Zen (c) cores?
> *At least on average, considering both int and fp PPC

I'm not sure it makes sense to average INT and FP scores in general. Design choices are very different if you want to target high FP performance.
Informative, slightly off topic, but I wasn't aware of this:

> I'm not sure it makes sense to average INT and FP scores in general. Design choices are very different if you want to target high FP performance.
That being said, if we look at PPC in SPECint 2017, Arm's PPC is significantly above AMD's, but it comes with significantly lower clocks.
There's a nice sheet on David Huang's blog that summarizes this for many chips.
> Regarding the performance of macOS: due to differences in operating environments (especially macOS's libc/malloc), processors of all kinds, x86_64 and ARM64 alike, running 523.xalancbmk under macOS have a significant advantage over the default Linux/glibc configuration, while the other sub-tests trade wins. In the end, the overall macOS score is about 3-4% ahead of Linux.

For some reason I thought SPEC would have its own libc/glibc equivalent just to provide a level playing field, but maybe that would be a nightmare to maintain for relatively little benefit.
> Informative, slightly off topic, but I wasn't aware of this:

Even though SPEC tries to abstract score results from platform-specific things as much as possible, there's not much you can do about some of them. The effect of the malloc implementation is notorious, and that's why many official results use jemalloc instead of the system's default libc allocator.
> For some reason I thought SPEC would have its own libc/glibc equivalent just to provide a level playing field, but maybe that would be a nightmare to maintain for relatively little benefit.

> Even though SPEC tries to abstract score results from platform-specific things as much as possible, there's not much you can do about some of them. The effect of the malloc implementation is notorious, and that's why many official results use jemalloc instead of the system's default libc allocator.
> SPEC doesn't try to abstract platform-specific things; that's the whole point. It has always been clear that it is a SYSTEM benchmark, not a CPU benchmark.

I meant SPEC can't come with its own libraries, much less rely on external libraries beyond the standard C/C++/Fortran ones (and OpenMP, IIRC), just as it can't use CPU-specific intrinsics or assembly-language inlines/routines.
> The use of special malloc libraries in SPEC results is particularly annoying to me. Either the system malloc implementation is slow (if so, fix the damn thing), or the replacement is fragile/limited and really suitable only for SPEC runs, in which case its use should be banned, IMHO.
>
> Long ago I was involved with some software that supported 1000+ users on an entry-level workstation. Its memory was maxed out, so reducing the size of each process was really important, and I went to some extraordinary lengths to make that happen. Not by replacing malloc, but by eliminating its use entirely. I replaced a few libc functions because linking them pulled in a ton of other stuff and ballooned the data size of the process. That meant replacing printf/sprintf with a cut-down version that could only do the things this particular software needed.
>
> I can't help wondering if these special SPEC mallocs are similar to my printf.

jemalloc is better at memory fragmentation management and better for multithreading too. It seems to be used by default on FreeBSD and in Firefox. All this is from the jemalloc page; I never studied it thoroughly.
Having worked on embedded systems, I feel the pain you had with your app. printf is particularly nasty given the range of features it supports (FP printing is quite complex, for instance). And don't get me started on C++ libraries...
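For flavor, here is a minimal sketch of the kind of cut-down replacement described above. This is hypothetical code of my own, not the original software: an integer-only formatter that avoids stdio entirely and emits bytes with a raw write(2).

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sketch of a cut-down print helper in the spirit of the
 * post above: formats an int without touching stdio, then emits it
 * with a raw write(2). Names and structure are illustrative. */

/* Render v into out (at least 12 bytes, enough for "-2147483648"
 * plus NUL); returns the length written. */
static size_t fmt_int(char out[12], int v) {
    char tmp[12];
    char *p = tmp + sizeof tmp;
    unsigned u = v < 0 ? -(unsigned)v : (unsigned)v;
    do {                          /* emit digits, least significant first */
        *--p = (char)('0' + u % 10);
        u /= 10;
    } while (u);
    if (v < 0)
        *--p = '-';
    size_t n = (size_t)(tmp + sizeof tmp - p);
    for (size_t i = 0; i < n; i++)
        out[i] = p[i];
    out[n] = '\0';
    return n;
}

/* Print an int to stdout without linking in any stdio machinery. */
static void put_int(int v) {
    char buf[12];
    size_t n = fmt_int(buf, v);
    write(STDOUT_FILENO, buf, n);
}
```

Something like this trades all of printf's generality (no %f, no field widths, no locale) for not dragging buffering and floating-point formatting code into the binary.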
> I am as convinced of no clock increase with Zen 6 as I am of the 32% IPC Zen 5.

The clocks will absolutely rise, even if first and foremost for MT workloads and mobile chips. I also believe slightly on desktop, but we'll see.
> The clocks will absolutely rise, even if first and foremost for MT workloads and mobile chips. I also believe slightly on desktop, but we'll see.

Note that Vmax is going down on future nodes.
> *above is speculation

Haha, very funny. You should post that above the speculation.
> Desktop is still AM5.

I keep running into people who think desktop Medusa is AM6. I think AMD is quite happy with three gens on a socket.
> MEDUSA POINT
> 12C/24CU
> MEDUSA HALO
> 24C/72CU

No earthly way AMD will increase their APU CU count by 1.5x and 1.8x just one generation after increasing it 1.33x on the base APU and creating the big-APU SKU.
> 72 CU would wipe out the point of all the RDNA4 SKUs when they are still barely on the market.

That doesn't matter. Halo APUs are not available on the desktop; they only potentially compete in large laptops, and a large APU is preferable to AMD over a discrete part because it's an angle where AMD can compete against NV in an asymmetric way. If AMD has a customer willing to pay for such a part, they will make it, and whatever other GPU parts they have in their lineup is irrelevant.
> Strix Halo having 16C sounds like a lot, but desktop mainstream has had that since the 3950X.

Dragon Range was there first.
> I doubt the APU side will increase CPU core counts over Strix Halo for quite a while.

The Halo parts are supposed to use similar core chiplets to the desktop parts, so their core counts track the desktop products. If there are more cores on desktop, there are more cores on Medusa Halo.
On the other hand, it's also been quite a while since the 3950X, and 32C doesn't seem like a stretch for desktop mainstream/high-end in 2026+.
> The bigger issue I have with such a large CU count is memory. What memory interface would keep that fed? Would they go for soldered LPDDR6 with a very wide interface?

As I mentioned, 384 bits of LPDDR6-10667 would be the dream for Medusa Halo: ~450 GB/s of bandwidth (66% higher than Strix Halo's 273 GB/s).