> If it's nothing new the patent is invalid and completely wasted.

Very often patents only differ in the details, while the general approach might be common knowledge.
Found an interesting patent from AMD for increasing IPC by concurrently executing both paths of a branch.
ALTERNATE PATH FOR BRANCH PREDICTION REDIRECT
They'd need a good-sized bump in the register file and the other OoO resources to pull this off.
There has been research/work on this since at least the '90s, and while I believe a few CPUs may do it on an extremely limited basis (I've seen claims that Apple's big cores can run both paths in certain cases, though that may simply be to allow progress before the branch predictor has its result ready), no one has gone all-in on it, because branch predictors are so good these days that you won't get much out of it.
Sure, there are some branches that are essentially impossible to predict, where it would be of benefit (so long as they aren't quickly followed by more such branches), but then you are paying a price in additional transistors, the power to operate them, and verification time for something that doesn't help you very often.
> Watched RedGamingTech's video about the current Meteor Lake/Arrow Lake and Zen 5 rumors, and he claimed that Zen 5 will still be just 8 cores per chip, because more cores would be starved by its Infinity Fabric connection to the IOD. Do you reckon that's true? I thought that when Infinity Fabric is an issue, because of the penalty it incurs on cross-chip communication, having more cores per chip is actually a good thing, because the chance of needing additional cores housed on the second CCD is lower. Or do they mean that there would not be enough physical space to connect, say, 12 cores instead of 8 to the IOD?

Likely not the Infinity Fabric.
> We do see core counts growing, and we will continue to increase the number of cores in our core complex that are shared under an L3. As you point out, communicating through that has both latency problems and coherency problems, but that's what architecture is, and that's what we signed up for. It's what we live for - solving those problems. So I'll just say that the team is already looking at what it takes to grow to a complex far beyond where we are today, and how to deliver that in the future.
> and he claimed that Zen 5 will still be just 8 cores per chip

CCD or CCX? Because I thought we were already expecting Bergamo to be 16 cores per chip, two CCXs per CCD.
Exactly this.
That idea is sometimes called eager execution. Doing twice the work and throwing half of that burned power away is quite a bad deal, given the accuracy of current predictors. And turning eager execution on and off based on the prediction history doesn't sound easy. There is an RWT thread about this topic.
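A loose software analogy for the trade-off (this is if-conversion in software, not how the hardware would implement eager execution; names and numbers are illustrative):

```c
#include <stdint.h>

/* Branchy version: on random data the predictor is right ~50% of
 * the time, so roughly half the iterations eat a misprediction. */
int64_t sum_branchy(const int32_t *a, int n, int32_t threshold) {
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > threshold)   /* hard to predict on random input */
            sum += a[i];
    }
    return sum;
}

/* "Eager" version: compute both outcomes unconditionally, then
 * select. More work per iteration -- the analogue of burning power
 * on the wrong path -- but no control dependence left to mispredict. */
int64_t sum_eager(const int32_t *a, int n, int32_t threshold) {
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        int64_t taken     = sum + a[i]; /* outcome if taken     */
        int64_t not_taken = sum;        /* outcome if not taken */
        /* usually compiled to a conditional move, not a branch */
        sum = (a[i] > threshold) ? taken : not_taken;
    }
    return sum;
}
```

Whether the eager version wins depends entirely on how predictable the branch is, which is exactly the argument above: with predictors in the high-90s percent accurate, paying for both paths all the time is a net loss.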
> Watched RedGamingTech's video about the current Meteor Lake/Arrow Lake and Zen 5 rumors, and he claimed that Zen 5 will still be just 8 cores per chip, because more cores would be starved by its Infinity Fabric connection to the IOD. Do you reckon that's true?

Honestly, this is a bulls**t argument by RGT. Yes, some SKUs are already a bit starved by the IFoP, but AMD made that decision knowingly.
> Honestly, this is a bulls**t argument by RGT. Yes, some SKUs are already a bit starved by the IFoP, but AMD made that decision knowingly.

I think a much more likely culprit for why the CCD might stay at 8 cores is the core-to-core interconnect within the CCD and poor node scaling. SRAM is essentially staying the same size, and the Zen 5 core should grow relative to Zen 4 (if they stayed on the same node, I mean) because they are making the core wider.
If AMD identified the Infinity Fabric CCX link to be a significant bottleneck, they are entirely free to widen it as much as they like and as their target costing allows. @DisEnchantment already mentioned several indications. As this is an on-package matter, they don't even need to change the socket, chipset, or anything else that would break AM5 compatibility. There are options such as InFo-R and EFB for a physical implementation beyond plain old IFoP.
And I am still speculating a bit that Zen 4c might already be in for a surprise.
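To put rough numbers on the starvation argument (a back-of-envelope sketch; the 32 B/cycle read and 16 B/cycle write per FCLK figures are the commonly cited Zen 4 IFoP widths and are assumptions here, not confirmed Zen 5 specs):

```c
#include <stdio.h>

int main(void) {
    /* Assumed Zen 4-style IFoP link: FCLK ~2.0 GHz,
     * 32 B/cycle read, 16 B/cycle write. Illustrative only. */
    const double fclk_ghz  = 2.0;
    const double read_gbs  = 32.0 * fclk_ghz;  /* 64 GB/s total read  */
    const double write_gbs = 16.0 * fclk_ghz;  /* 32 GB/s total write */

    /* The link width is fixed, so the per-core share shrinks with count. */
    for (int cores = 8; cores <= 16; cores += 4)
        printf("%2d cores/CCD: %5.1f GB/s read total, %4.1f GB/s per core\n",
               cores, read_gbs, read_gbs / cores);
    printf("write path: %.1f GB/s total\n", write_gbs);
    return 0;
}
```

Doubling the cores behind an unchanged link halves each core's share, which is the "starved" scenario; widening the link (or moving to InFo-R/EFB-class packaging, as above) is how you buy it back.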
> I think a much more likely culprit for why the CCD might stay at 8 cores is the core-to-core interconnect within the CCD and poor node scaling. SRAM is essentially staying the same size, and the Zen 5 core should grow relative to Zen 4 (if they stayed on the same node, I mean) because they are making the core wider.

Maybe V-cache Zen 5 is a 2-Hi stack.
I could maybe imagine a 12-core CCD, or two 8-core CCXs in one CCD connected with Infinity Fabric like Zen 2 was, but even that I doubt.
Some people claim that Zen 5 defaults to L3 stacked onto the CCD, but I don't see that happening, since AMD already confirmed there are separate V-cache models for Zen 5.
> Honestly, this is a bulls**t argument by RGT.

That's exactly what I thought, hence why I posted about it in the first place. I don't question that they will stick to 8 cores, but the reasoning given for it seemed suspect.
> Regarding V-cache, stacking L3 cache purely onto the CCD would have to be preceded by solving the clock penalty. I mean, they are stacking cache on top of cache, not on the cores themselves, and still have to clock the chips lower. Removing the cache from the die completely would mean stacking it on top of the cores; I don't see that happening by next year. Probably not by 2025 either.

They could alternatively stack it underneath. That would have its own challenges, but might be better for thermals. Though we never did get a deep dive on what the limiting factor actually is for clocks with V-cache.
> Well, AMD has long been rumored to substantially increase core counts with Turin, and they probably need to in order to maintain a comfortable lead over Intel in servers. I don't think AMD would plan to stay at 96 cores for servers, because servers are their #1 priority and they aren't dumb.

I think they could still get away with 96c reasonably comfortably if they had to, but if not, they could wait for a 3nm Zen 5 version (refresh?) and then do a sort of mid-cycle upgrade. Or they could just make a new socket with a 128c, 16-channel config, but that's probably not an ideal outcome.
> they could wait for a 3nm Zen 5 version (refresh?)

Evidence points to Strix Point already being 3nm; as to whether we will see a refreshed SKU lineup using 3nm CCDs, who knows.
> Though we never did get a deep dive on what the limiting factor actually is for clocks with V-cache.

I strongly suspect that die thickness plays a significant role, especially the thickness of the die on top, so that there is a shorter thermal dissipation path between the power-heavy bottom die and the IHS.
> I strongly suspect that die thickness plays a significant role, especially the thickness of the die on top, so that there is a shorter thermal dissipation path between the power-heavy bottom die and the IHS.

I could imagine that the contact patches for the TSVs introduce quite some resistance, producing heat, consuming more power, causing more electromagnetic trouble, and so on.
> But RGT is such a garbage-tier leaker that it's not even worth putting much speculation into anything he says. Lots of people here hate on MLID, but at least he has some decent sources. RGT doesn't even seem to have that, and his speculation is always much worse. RDNA3 at triple RDNA2 performance, anyone?

Agreed, except about the other guy; neither of them has brought any positives to the hardware community overall. I love the AT forums because we usually discuss the core technologies instead of the frauds and clickbaiters that unfortunately dominate other hardware communities.
> I don't see how that happens with 8-core CCDs unless you can stack them on top of one another.

Based on all the official info so far, I'm very much inclined to bet on Zen 5 keeping an 8c main CCD for the consumer lineup. The question is whether they feel the pressure to go for a secondary 16c CCD to keep up with Intel's core counts. Maybe the Zen 4 X3D lineup with two different CCDs is a sign that they're moving toward that, with all the scheduling that entails. If the node advance is only minor, then they will need those transistors to achieve the main 8c CCD's IPC improvements that Mike Clark hinted at in Ian's interview 18 months ago.
It might also be possible that there are two different CCDs with different core counts. If AMD does make a 16-core CCD, that would be pretty overkill for a lot of the market.
> Evidence points to Strix Point already being 3nm; as to whether we will see a refreshed SKU lineup using 3nm CCDs, who knows.

What evidence points to STX being on N3?
I think it more likely that 3nm will be saved for Zen5c.
> RGT claims only 16C/32T for the top Zen 5 desktop AM5 part. That seems kinda low, as I was expecting AMD to do Zen 5 + Zen 5c (or whatever they'll call it). They could easily do 8C Zen 5 + 16C Zen 5c for a total of 24 Zen 5 cores. The ISA would be the same; Zen 5c might clock ~15-20% lower, but that's fine as they would still get a ~30% boost versus 16C Zen 5 in MT workloads.

Zen 4c is coming out almost a year after Zen 4; how would Zen 5c be ready at the same time as Zen 5 for the GNR launch?
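For what it's worth, the quoted ~30% figure checks out under simple assumptions (perfect MT scaling and a Zen 5c core worth ~0.8x a Zen 5 core; both are assumptions):

```c
#include <stdio.h>

int main(void) {
    /* Assumption: MT throughput scales linearly with cores, and a
     * compact core delivers ~0.8x a big core (clocks ~20% lower). */
    const double c_ratio = 0.80;
    const double hybrid  = 8 + 16 * c_ratio;  /* 8x Zen 5 + 16x Zen 5c */
    const double big16   = 16;                /* 16x Zen 5 baseline    */

    printf("hybrid = %.1f big-core equivalents, uplift = %.0f%%\n",
           hybrid, (hybrid / big16 - 1.0) * 100.0);  /* 20.8, ~30% */
    return 0;
}
```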
> Turin with 128c is doable as 16 * 8c CCDs. The IFOP was said to have a ~20mm reach. Now Genoa features 3-deep stacks of CCDs; Turin might rearrange them into "quads", since a straight line of 4-deep CCDs would likely be out of the IFOP reach.
> Zen 5c could have 12 * dual-CCX CCDs = 192c.

I would expect Zen 5 to start using stacked dies in some manner, so it may look more like MI300 than Genoa, which is partially why I am so interested in exactly what is in MI300. If they can economically use the same interconnect used between RDNA3 and its MCDs, that would reduce power consumption significantly. Going up to PCIe 5 or 6 speeds for SerDes-based GMI has to cost a lot of power. Speculating on how stacked dies are going to be used or arranged is very difficult without more information.
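Just to keep the quoted layouts' counts straight (this only restates the speculation above; it says nothing about reach or routing):

```c
#include <stdio.h>

/* Hypothetical package layouts from the quoted speculation. */
struct layout {
    const char *name;
    int ccds, ccx_per_ccd, cores_per_ccx;
};

int main(void) {
    const struct layout l[] = {
        { "Turin (Zen 5)",        16, 1, 8 },  /* 16 x 8c CCDs       */
        { "Turin-dense (Zen 5c)", 12, 2, 8 },  /* 12 x dual-CCX CCDs */
    };
    for (int i = 0; i < 2; i++)
        printf("%-22s %2d CCDs x %d CCX x %dc = %3d cores\n",
               l[i].name, l[i].ccds, l[i].ccx_per_ccd, l[i].cores_per_ccx,
               l[i].ccds * l[i].ccx_per_ccd * l[i].cores_per_ccx);
    return 0;  /* prints 128 and 192 */
}
```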
> I could imagine that the contact patches for the TSVs introduce quite some resistance, producing heat, consuming more power, causing more electromagnetic trouble, and so on.

I thought I saw something about using TSVs to help transfer heat, since they are copper rather than silicon. Keeping the thermal expansion differences in check would be difficult, though.
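For a sense of scale on the die-thickness point, a minimal 1-D conduction sketch (all values are assumptions for illustration; it ignores the bond interface, TIM, and lateral spreading, which likely dominate in a real stack):

```c
#include <stdio.h>

/* 1-D slab conduction: dT = P * t / (k * A). */
static double slab_dt(double power_w, double thickness_m,
                      double k_w_per_mk, double area_m2) {
    return power_w * thickness_m / (k_w_per_mk * area_m2);
}

int main(void) {
    const double P    = 60.0;    /* W flowing up through the die (assumed) */
    const double A    = 70e-6;   /* ~70 mm^2 die area, in m^2 (assumed)    */
    const double k_si = 150.0;   /* W/(m*K), bulk silicon, approximate     */
    const double k_cu = 400.0;   /* W/(m*K), copper, approximate           */

    printf("750 um of Si above the hot die: %.1f K\n", slab_dt(P, 750e-6, k_si, A));
    printf(" 50 um of Si (thinned die):     %.2f K\n", slab_dt(P,  50e-6, k_si, A));
    printf(" 50 um Cu (TSV-like) path:      %.2f K\n", slab_dt(P,  50e-6, k_cu, A));
    return 0;
}
```

Even at full thickness the bulk silicon only accounts for a few kelvin, so if V-cache parts really do lose clocks to thermals, the bond interfaces and hot-spot spreading are more plausible culprits than the silicon itself, which would fit the observation that nobody has published a clean answer.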