Have AMD themselves mentioned that the FPUs differ?
Will be interesting to see iso-clock benches with neutered-FPU Strix vs full-FPU GNR.
At Computex they claimed Zen 5 trashes the 14900K. If these numbers are right, the 14900K will be slightly faster instead.
I'm not convinced the clustered decode on Zen 5 works well on ST. David Huang measured zero gain from it in his ST tests.
You are too noble. I would manifest for it to fall into my lap
If this is the case, 9950X will be priced at or below $600. I am going extremely optimistically with $499 and know that will be unlikely unless I can manifest it hard enough. I've got my copy of The Secret on my desk and I try to re-read parts of it every day.
yes, i said in the other thread the fact they are trying to look further ahead for branches and that the uop cache is multi-ported says they are trying to run ahead and stick stuff in the op cache
He used a sequence of NOPs specially crafted to measure it, not realistic code. The explanation could be that the sequence had no branches. Different microbenchmarking code would be needed to catch the effect of both decoder clusters getting used.
He used a sequence of NOPs specially crafted to measure it, not realistic code. The explanation could be that the sequence had no branches. Different microbenchmarking code would be needed to catch the effect of both decoder clusters getting used.
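For illustration, a hedged sketch of the distinction being made above: a pure NOP sled contains no branches, so fetch never crosses a predicted-branch boundary, while the same work chopped into blocks linked by jumps gives the front end boundaries that clustered decoders could split across. The helper names and NASM-style output below are my own invention, not David Huang's actual harness; the script only generates kernel source and measures nothing.

```python
def nop_sled(n):
    """Straight-line kernel: n NOPs, no branches at all."""
    return "\n".join(["nop"] * n)

def branchy_sled(blocks, nops_per_block):
    """Same NOP work, but split into blocks linked by unconditional jumps,
    so every block boundary is a branch the predictor (and decode-cluster
    steering) can act on."""
    lines = []
    for i in range(blocks):
        lines.append(f"block{i}:")
        lines.extend(["nop"] * nops_per_block)
        lines.append(f"jmp block{i + 1}")
    lines.append(f"block{blocks}:")
    return "\n".join(lines)

# Two kernels with identical NOP counts but different branch density.
print(nop_sled(8))
print(branchy_sled(2, 4))
```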
except for all the op fusion they kept. why would dual decode affect op fusion, it's post-decode at dispatch AFAIK.
I'm not sure whether this is pertinent to the benchmark results, but it's worth mentioning Clark stated NOP fusion was removed in Zen 5 during the Chips & Cheese interview. Reading between the lines a bit, it sounds like AMD sacrificed a number of optimizations that would have needed to be rebuilt for dual decode on the altar of cadence.
except for all the op fusion they kept. why would dual decode affect op fusion, it's post-decode at dispatch AFAIK.
why lie by omission.............
From Mike Clark:
Part of the reason I would say we didn’t put let’s say no op fusion into Zen 5 is that we had that wider dispatch. Zen 1 to Zen 4 had that 6 wide dispatch and 4 ALUs, so getting the most out of that 6-wide dispatch was important and it drove some complexity into the dispatch interface to be able to do that. When looking at having the capability of an 8-wide dispatch and putting no op fusion on top of it, it didn’t really seem to pay off for the complexity because we had that wider dispatch natively. But you may see it come back. Zen 5 is sort of a foundational change to get to that 8-wide dispatch and 6 ALUs. We’re now going to try to optimize that pinch point of the architecture to get more and more out of it and so you know as we move forward, no op fusion is likely to come back as a good leverage of that eight wide dispatch. But for the first generation, we didn’t want to bite off the complexity.
There are two micro-op caches... are there two micro-op queues also? If the two paths converge at dispatch, it's plausible there's complexity in fusing ops that arrive from different paths. If they converge prior to dispatch, maybe fusion is taking place earlier?
Mike Clark: We don’t support no op (NOP) fusion. We do have a lot of op fusion that’s similar, we still fuse branches and there’s some other cases that we fuse.
Part of the reason I would say we didn’t put let’s say no op fusion into Zen 5 is that we had that wider dispatch. Zen 1 to Zen 4 had that 6 wide dispatch and 4 ALUs, so getting the most out of that 6-wide dispatch was important and it drove some complexity into the dispatch interface to be able to do that. When looking at having the capability of an 8-wide dispatch and putting no op fusion on top of it, it didn’t really seem to pay off for the complexity because we had that wider dispatch natively. But you may see it come back. Zen 5 is sort of a foundational change to get to that 8-wide dispatch and 6 ALUs. We’re now going to try to optimize that pinch point of the architecture to get more and more out of it and so you know as we move forward, no op fusion is likely to come back as a good leverage of that eight wide dispatch. But for the first generation, we didn’t want to bite off the complexity.
why lie by omission.............
Nothing I can find. And if what David found is true, I mean the silicon is different and it wasn't an ES effect [not fully working microcode etc], then it's a big lie by omission on AMD marketing dept.'s part, seeing the press materials never differentiate the Strix Point Zen 5 core from the Granite Ridge Zen 5 core; they only mention the distinction between Zen5 and Zen5c.
Have AMD themselves mentioned that the FPUs differ?
Watch out!
RTG: Ladies and Gentlemen, I was told by sources that Zen 6 will have a mid-double-digit IPC gain. Well, the middle between 10% and 99% is roughly 60%. Zen60% confirmed.
Source (probably): Expect modest gains for Zen 6, like no more than 15% IPC gain.
Will the leaker increase the screenshot dosage now that r23 scores are out anyway?
You are too noble. I would manifest for it to fall into my lap
You can get fine wine already with CachyOS with their latest Zen 4/5 optimised release.
Cores do usually gain performance over time as codebases get updated, but Z5 does seem to be an outsized FineWine candidate based on what Clark said.
Stop teasing and post the juicy stuff already
So at 80W Z5 is roughly equivalent to a stock 5950X (140W?)
I was talking about ALU count, looks like it's the same six as Zen 4? Did they just make them wider then? I was under the impression that the ALU count was substantially increased, hence all this 'jebaited expectations' spiel of the last few weeks.
The FPU had large changes.
2x vector register file
It's 12% faster than a stock 5950X; from the previous pic at 60W, where it is 8.45% below, it should match it at about 68W on this platform.
So at 80W Z5 is roughly equivalent to a stock 5950X (140W?)
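That ~68W figure follows from straight-line interpolation between the two quoted operating points (8.45% slower at 60W, 12% faster at 80W). A quick sketch, assuming the performance gap scales linearly with power over this small range (a simplification; perf/power curves are sub-linear):

```python
# Two reported operating points for the Zen 5 part vs a stock 5950X.
low_w, low_gap = 60.0, -8.45    # at 60 W: 8.45% slower
high_w, high_gap = 80.0, 12.0   # at 80 W: 12% faster

# Power at which the gap crosses zero, assuming a linear gap-vs-power trend.
match_w = low_w + (high_w - low_w) * (-low_gap) / (high_gap - low_gap)
print(f"{match_w:.1f} W")  # ~68.3 W, consistent with the "about 68W" estimate
```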
Why would they increase the execution resources on the FP side if they cannot sustain more than 2x512b loads per cycle? [It's still a great improvement from Zen 4 btw, which could do only 1x512b]. Not sure what the story is with FP stores, whether they can do 2x512b or 1x512b, but either of them is also a nice improvement over Zen 4, which could only do 0.5x512b per cycle store.
I was talking about ALU count, looks like it's the same six as Zen 4? Did they just make them wider then? I was under the impression that the ALU count was substantially increased, hence all this 'jebaited expectations' spiel of the last few weeks.
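Converted to bytes per cycle (one 512-bit access = 64 B), the load/store figures above work out as below. The Zen 5 store entry uses the conservative 1x512b case, since the post says the 2x512b case is unconfirmed:

```python
VEC_BYTES = 512 // 8  # one 512-bit vector access = 64 bytes

# Load bandwidth per cycle implied by the figures in the post.
loads = {"Zen 4": 1 * VEC_BYTES, "Zen 5": 2 * VEC_BYTES}    # 64 vs 128 B/cycle

# Store bandwidth per cycle; Zen 5 taken as the conservative 1x512b case.
stores = {"Zen 4": VEC_BYTES // 2, "Zen 5": 1 * VEC_BYTES}  # 32 vs 64 B/cycle

print(loads, stores)
```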
Dunno if real numbers, but this was posted over at the WCCFTech "forum" by a 13900KF user
13900KF @ 150w packet power = 34.9k points in Cinebench r23
13900KF @ 170w packet power = 36.3k points in Cinebench r23
13900KF @ 190w packet power = 37.2k points in Cinebench r23
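Taking the three quoted points at face value, the marginal scaling can be computed directly; a small sketch showing the diminishing points-per-watt as the power limit rises:

```python
# Reported 13900KF Cinebench R23 scores at three package-power limits.
scores = {150: 34900, 170: 36300, 190: 37200}

# Marginal points gained per extra watt between consecutive power limits.
powers = sorted(scores)
marginal = {
    (p0, p1): (scores[p1] - scores[p0]) / (p1 - p0)
    for p0, p1 in zip(powers, powers[1:])
}
print(marginal)  # {(150, 170): 70.0, (170, 190): 45.0}
```

Each extra watt past 170 W buys noticeably less than the watts before it, which is the usual shape of these power-limit curves.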