Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

poke01 · Jul 24, 2024

CouncilorIrissa said:
No, CPU design and manufacturing is just complicated.
These companies are packed with talented and highly professional people, yet things go wrong from time to time. This (Zen 4/RPC) gen should be very indicative of that.

True. I am not denying that. It’s just that AMD needs better communication. On July 15th they said it was launching on July 31st, and just one week later they found out that testing wasn’t done properly.

This should have been identified months ago not weeks before launch.

Hitman928 · Jul 24, 2024

Abwx said:
That's not a more efficient condition, rather the contrary, since MOSFETs are slower at very low temps: the higher the temperature, the higher the transconductance, up to a given temperature (generally in the 100-120°C range) at which the conduction and speed at a given voltage start to decline. It's just that under LN2 you won't overheat the chip with a hugely over-specced TDP.

Actually, at -100°C you need to pump a higher voltage to reach the same frequencies as at 50°C. And as already said, I don't believe that the 9950X has only 3.85% better 7-Zip performance than the 7950X; that number just doesn't make sense.

Edit: Methinks that in reviews the CB R23 score will be up to 10% higher and the 7-Zip one 15-20% higher; AMD's stated performance improvement over the 7950X is 22%.

Ryzen 9000/AI 300: Details on Zen 5, RDNA 3.5, Zen 6(c) & Zen 7

Ryzen 9000 arrives on July 31. Further technical details on the desktop CPUs and on Ryzen AI 300 Strix Point for notebooks are already available today.

www.computerbase.de

Temperature decrease reduces all of the conductor and channel resistances as well as decreases the static power consumption of the FETs. So your RC delay goes down and your static power goes down. In turn, the CPU will consume less power as the temperature is decreased, all else being equal, and will lead to more efficient performance by the CPU.
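To put a rough number on the conductor-resistance part of this argument, here is a minimal Python sketch of a linear temperature model for wire resistance and the resulting RC time constant. It assumes a bulk-copper temperature coefficient of about 0.0039 per °C referenced to 20°C, plus a hypothetical 100 Ω / 10 fF wire; real on-die interconnect has higher resistivity and a different effective coefficient, and the linear model is only a rough fit far below room temperature. It covers the wire term only and says nothing about transistor mobility, threshold voltage, or leakage.

```python
# Illustrative sketch: linear temperature model for wire resistance and RC delay.
# ASSUMPTION: bulk-copper TCR of ~0.0039 per degC referenced to 20 degC.
# On-chip interconnect deviates from this; treat results as rough trends only.

ALPHA_CU = 0.0039  # temperature coefficient of resistance, 1/degC (assumed)
T_REF_C = 20.0     # reference temperature for ALPHA_CU, degC


def resistance_at(r_ref_ohm: float, temp_c: float, alpha: float = ALPHA_CU) -> float:
    """Linear model: R(T) = R_ref * (1 + alpha * (T - T_ref))."""
    return r_ref_ohm * (1.0 + alpha * (temp_c - T_REF_C))


def rc_delay_s(r_ohm: float, c_farad: float) -> float:
    """First-order RC time constant; capacitance is held fixed with temperature."""
    return r_ohm * c_farad


if __name__ == "__main__":
    r_ref = 100.0    # hypothetical wire resistance at 20 degC, ohms
    c_wire = 10e-15  # hypothetical wire capacitance, 10 fF
    for t in (90.0, 50.0, 0.0, -100.0):
        r = resistance_at(r_ref, t)
        print(f"{t:7.1f} degC  R = {r:6.1f} ohm  RC = {rc_delay_s(r, c_wire) * 1e12:5.3f} ps")
```

Under these assumed numbers, going from 50°C (111.7 Ω) to -100°C (53.2 Ω) roughly halves the wire term; whether that wins overall is exactly what the thread goes on to argue about.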

Without CB testing a 9950X, you shouldn't compare their 7-Zip results with AMD's as evidence that AMD's stock test is underperforming. Too many variables come into play.

Josh128 · Jul 24, 2024

Abwx said:
That's not a more efficient condition, rather the contrary, since MOSFETs are slower at very low temps: the higher the temperature, the higher the transconductance, up to a given temperature (generally in the 100-120°C range) at which the conduction and speed at a given voltage start to decline. It's just that under LN2 you won't overheat the chip with a hugely over-specced TDP.

Actually, at -100°C you need to pump a higher voltage to reach the same frequencies as at 50°C. And as already said, I don't believe that the 9950X has only 3.85% better 7-Zip performance than the 7950X; that number just doesn't make sense.

Edit: Methinks that in reviews the CB R23 score will be up to 10% higher and the 7-Zip one 15-20% higher; AMD's stated performance improvement over the 7950X is 22%.

Ryzen 9000/AI 300: Details on Zen 5, RDNA 3.5, Zen 6(c) & Zen 7

Ryzen 9000 arrives on July 31. Further technical details on the desktop CPUs and on Ryzen AI 300 Strix Point for notebooks are already available today.

www.computerbase.de

22% is for Blender, not CB R23. They gave a specific 17% IPC uplift for R23. That will not change. At least for ST, all you need to do is extrapolate from the 7950X score and you will have the score for the 9950X: 2050 * 1.17 ≈ 2398.
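The extrapolation above is a single multiplication; here it is as a small Python sketch. It takes the post's own inputs (a 7950X CB R23 single-thread baseline of about 2050 and a 17% uplift) and assumes the IPC uplift carries straight over to the score, which only holds if effective single-thread clocks are the same on both chips. `extrapolate_score` is a hypothetical helper name.

```python
# Back-of-the-envelope score extrapolation, as in the post above.
# ASSUMPTIONS: ~2050 CB R23 ST baseline for the 7950X (from the post) and a 17%
# IPC uplift that maps 1:1 onto the score, i.e. identical effective clocks.


def extrapolate_score(baseline: float, uplift_pct: float) -> float:
    """Scale a benchmark score by a percentage uplift (hypothetical helper)."""
    return baseline * (1.0 + uplift_pct / 100.0)


if __name__ == "__main__":
    est = extrapolate_score(2050, 17)
    print(f"Estimated 9950X CB R23 ST: ~{est:.1f}")
```

The exact product is 2398.5; any clock difference between the two chips would scale the estimate by the same ratio.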

Abwx · Jul 24, 2024

Hitman928 said:
Temperature decrease reduces all of the conductor and channel resistances as well as decreases the static power consumption of the FETs. So your RC delay goes down and your static power goes down. In turn, the CPU will consume less power as the temperature is decreased, all else being equal, and will lead to more efficient performance by the CPU.

Static power is reduced because the lower transconductance implies lower leakage, but that also means lower speed at a given voltage.

Besides, the conductors' resistance shouldn't be significant at the currents required to hit 5 GHz, and the lower RC delay won't automatically compensate for the lower transconductance, which increases the time required to charge parasitic capacitances of all kinds (which are left unchanged by the low temperature).

Hitman928 said:
Without CB testing a 9950X, you shouldn't compare their 7-Zip results with AMD's as evidence that AMD's stock test is underperforming. Too many variables come into play.

We can argue at length, but I guess that's the only thing left that gives a reasonable clue. So far there hasn't been a single Cinebench score under normal conditions; perhaps the Italian guy will spill the beans. In the meantime we're trying to locate a cat in a dark room.

Abwx · Jul 24, 2024

Josh128 said:
22% is for Blender, not CB R23. They gave a specific 17% IPC uplift for R23. That will not change. At least for ST, all you need to do is extrapolate from the 7950X score and you will have the score for the 9950X: 2050 * 1.17 ≈ 2398.

The 22% in the slide I linked is an aggregate number averaged over several benchmarks; that's in AMD's footnotes.

Hitman928 · Jul 24, 2024

Abwx said:
Static power is reduced because the lower transconductance implies lower leakage, but that also means lower speed at a given voltage.

Besides, the conductors' resistance shouldn't be significant at the currents required to hit 5 GHz, and the lower RC delay won't automatically compensate for the lower transconductance, which increases the time required to charge parasitic capacitances of all kinds (which are left unchanged by the low temperature).

Conductor resistance is a big deal on advanced nodes and channel mobility increases with lower temperature, it's not just the conductors.

I mean, we have direct tests of power use vs. temperature and decades of practical overclocking experience to tell us that your theory is not correct. I honestly thought this was just established knowledge at this point, at least in overclocking communities.

Saylick · Jul 24, 2024

IEC said:
Per The Verge:

Some people on Xitter are saying it might be a packaging issue, whatever that means. Obviously, that's not in reference to the box the CPU comes in, although it would be funny if the issue was to fix a typo on the box (gotta remove that "+32% IPC" claim).

poke01 · Jul 24, 2024

https://twitter.com/x/status/1816203495340421274

It’s good that AMD found this issue before launch. But finding it this close to launch is a bit concerning. AMD should have tested all scenarios and not skipped any.

Hitman928 · Jul 24, 2024

Saylick said:
Some people on Xitter are saying it might be a packaging issue, whatever that means. Obviously, that's not in reference to the box the CPU comes in, although it would be funny if the issue was to fix a typo on the box.

According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.

poke01 · Jul 24, 2024

Hitman928 said:
According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.

That’s not what Hardwareluxx is reporting. According to them, it’s also a hardware issue.

Saylick · Jul 24, 2024

Hitman928 said:
According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.

Maybe. Ryan seems to think it's a packaging issue as well:

https://twitter.com/x/status/1816237383895113808

poke01 · Jul 24, 2024

poke01 said:
That’s not what Hardwareluxx is reporting. According to them, it’s also a hardware issue.

More info, translated:

Quality problems are prompting a complete recall of the samples as well as of the processors already delivered to retailers. All processors delivered so far will therefore be replaced with units from a fresh production batch. AMD does not say exactly which quality problems occurred, but it is apparently a hardware problem that cannot be fixed by software.

SK10H · Jul 24, 2024

biostud said:
Do you remember the launch problems of Zen 4 and the burned sockets? Maybe AMD simply prefers a launch without bugs, since in September no one will remember whether it was a late July or early August launch. They will, however, remember if their CPU doesn't work.

Not sure if this affects later Zen 4, but the three 7900X retail samples I got in Feb 2023, with IHSes dated Jul-Aug 2022, all failed single-thread CoreCycler AVX2 y-cruncher/Prime95 at stock clocks with no PBO or Curve Optimizer.
I just live with the last one, using +10 curve on some cores, but I set a static clock almost all the time. I recently sidegraded to a 7800X3D at no cost for power-efficient V/F 24/7 operation, as the Zen 4 reg voltage is stupidly unoptimized below the 4.8 GHz range, as I tested last year. The X3D I have is obviously a better-quality die at lower clocks, so it passes this AVX2 test just fine at ~-20 curve.

Looking forward to people testing single-thread AVX2 CoreCycler on Zen 5 at stock clocks with no PBO/curve, and to seeing what the V/F curve looks like, since they sure know how to tweak the X3D die. 😏

Abwx · Jul 24, 2024

Hitman928 said:
Conductor resistance is a big deal on advanced nodes and channel mobility increases with lower temperature, it's not just the conductors.

I mean, we have direct tests of power use vs. temperature and decades of practical overclocking experience to tell us that your theory is not correct. I honestly thought this was just established knowledge at this point, at least in overclocking communities.

They said that the cold bug for the 9950X occurs at -130°C. That means at this temperature the device is just too slow to work, which says that at extremely low temperatures the lowered transconductance has more impact than the lower resistances.

It's just that under LN2 they must make sure the silicon reaches a minimum temperature to be functional, because even with LN2 it will be way above this temperature once it has booted and is somewhat loaded.

Hitman928 · Jul 24, 2024

Saylick said:
Maybe. Ryan seems to think it's a packaging issue as well:

https://twitter.com/x/status/1816237383895113808

poke01 said:
More info, translated:

Quality problems are prompting a complete recall of the samples as well as of the processors already delivered to retailers. All processors delivered so far will therefore be replaced with units from a fresh production batch. AMD does not say exactly which quality problems occurred, but it is apparently a hardware problem that cannot be fixed by software.

I mean, this is just them speculating based upon the fact that mobile isn't being recalled and it can't be fixed in software. Whereas you have an AMD rep directly stating that it isn't a hardware issue but a testing one. That makes the most sense (if they're not sending out new firmware) because, like I said, if it was something in the chip, there is zero chance they could get replacements out this quickly. It's possible that some bad samples went out because they were damaged during packaging (packaging has defects and yields too) and didn't go through the proper QA testing to catch it before shipping, but that would still be what the AMD rep said, that some chips slipped through QA and so they were sending out new chips they know went through the proper testing.

Abwx · Jul 24, 2024

That could be as trivial as badly aligned SMD caps on the CPU substrate. The chips would still work reliably, but that's something to be corrected because it wouldn't look professional.

Hitman928 · Jul 24, 2024

Abwx said:
They said that the cold bug for the 9950X occurs at -130°C. That means at this temperature the device is just too slow to work, which says that at extremely low temperatures the lowered transconductance has more impact than the lower resistances.

It's just that under LN2 they must make sure the silicon reaches a minimum temperature to be functional, because even with LN2 it will be way above this temperature once it has booted and is somewhat loaded.

Cold bugs aren't because it's too slow; they happen either because of timing violations or because analog parts of the CPU fail with the increased Vth at cold temperatures. I don't think the analog part is really a concern with modern CPUs, so it's most likely a hold time violation: the timing paths shift too far at the extreme temperatures, the data misses the edge window of the flip-flop, and it fails to propagate to the next stage. It's not running too slow, the timings just weren't designed for operation that cold.

Markfw · Jul 24, 2024

All I can add, is after the Intel fiasco, AMD wants to be SURE there is nothing at all wrong with what they send out, even if it causes a slight delay. 2 weeks is a slight delay. You can't get pissed about that.

Abwx · Jul 24, 2024

Hitman928 said:
I don't think the analog part is really a concern with modern CPUs, so it's most likely a hold time violation as the timing paths shift too far with the extreme temperatures and the data misses the edge window of the flip flop and fails to propagate to the next stage. It's not running too slow, the timings just weren't designed for that cold of operation.

But for a timing violation to occur, or for interstage propagation to be too slow, something has to limit the speed at which the transistors switch, since lower resistances are supposed to help...

This means the parasitic capacitances can't be charged fast enough, that is, the supplied currents are too low, which brings us back to too-low transistor conductance. Actually, low temperature would be an advantage for higher speed if it weren't for the transistors' worse characteristics under this condition.

In2Photos · Jul 24, 2024

Markfw said:
You can't get pissed about that.

Sure we can! This is the Internet!

Hitman928 · Jul 24, 2024

Abwx said:
But for a timing violation, or for interstage propagation to be too slow, something has to limit the speed at which the transistors switch, since lower resistances are supposed to help...

This means the parasitic capacitances can't be charged fast enough, that is, the supplied currents are too low, which brings us back to too-low transistor conductance. Actually, low temperature would be an advantage for higher speed if it weren't for the transistors' worse characteristics under this condition.

Timing violation does not mean too slow, it just means off. It can also be too fast. Flip flops need a narrow window for the signal to be present and held in. If the signal is too early, it will also be a timing violation. A hold time violation cannot be fixed by lowering the frequency (i.e., the signal is propagating too quickly), hence a cold bug will still be there even if you down clock as low as possible. Again, your theory is wrong. You can argue all you want, but real world tests have shown that it is not correct.
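The hold-time point above can be illustrated with the textbook single-cycle setup and hold checks for one flip-flop-to-flip-flop path. This is a minimal Python sketch, not a model of any AMD design: all delays are hypothetical picosecond values, clock skew is ignored, and temperature is not modelled; it only shows that the clock period appears in the setup check but not in the hold check. `PathTiming`, `setup_slack` and `hold_slack` are hypothetical names; positive slack means the check is met.

```python
# Minimal static-timing sketch for a single launch-flop -> logic -> capture-flop path.
# ASSUMPTIONS: hypothetical delays in ps, zero clock skew, single-cycle path.
from dataclasses import dataclass


@dataclass
class PathTiming:
    t_clk_to_q: float   # launch flop clock-to-Q delay
    t_logic_min: float  # fastest combinational path delay
    t_logic_max: float  # slowest combinational path delay
    t_setup: float      # capture flop setup requirement
    t_hold: float       # capture flop hold requirement


def setup_slack(p: PathTiming, clock_period: float) -> float:
    """Latest data must arrive t_setup before the NEXT capturing edge."""
    latest_arrival = p.t_clk_to_q + p.t_logic_max
    return clock_period - p.t_setup - latest_arrival


def hold_slack(p: PathTiming) -> float:
    """New data must not arrive until t_hold after the SAME edge.
    The clock period does not appear here, so downclocking cannot fix it."""
    earliest_arrival = p.t_clk_to_q + p.t_logic_min
    return earliest_arrival - p.t_hold


if __name__ == "__main__":
    # Hypothetical path whose shortest route has become too fast for the capture flop.
    path = PathTiming(t_clk_to_q=15.0, t_logic_min=5.0, t_logic_max=150.0,
                      t_setup=20.0, t_hold=30.0)
    for period_ps in (200.0, 1000.0, 10000.0):  # 5 GHz, 1 GHz, 100 MHz
        print(f"T={period_ps:8.0f} ps  setup slack={setup_slack(path, period_ps):8.1f}"
              f"  hold slack={hold_slack(path):6.1f}")
```

With these numbers the setup slack grows from 15 ps to 9815 ps as the clock slows down, while the hold slack stays at -10 ps at every frequency.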

poke01 · Jul 24, 2024

Markfw said:
All I can add, is after the Intel fiasco, AMD wants to be SURE there is nothing at all wrong with what they send out, even if it causes a slight delay. 2 weeks is a slight delay. You can't get pissed about that.

Yep, I'm happy AMD is doing this. I've cooled down a bit, and yeah, better to do it now and have a smooth launch.

Josh128 · Jul 24, 2024

In2Photos said:
Sure we can! This is the Internet!

Amen, brotha.

RnR_au · Jul 24, 2024

In2Photos said:
Sure we can! This is the Internet!

If we can't have our daily drama... life becomes rather dull...

/me throws Lisa Su off the hypetrain... (╯°□°)╯︵ ┻━┻

Abwx · Jul 24, 2024

Hitman928 said:
Timing violation does not mean too slow, it just means off. It can also be too fast. Flip flops need a narrow window for the signal to be present and held in. If the signal is too early, it will also be a timing violation.

It doesn't matter if it's too early as long as the clock's rising and falling edges are fast enough; once triggered, the flip-flop will keep its state for at least the duration of a clock cycle.

Hitman928 said:
A hold time violation cannot be fixed by lowering the frequency (i.e., the signal is propagating too quickly), hence a cold bug will still be there even if you down clock as low as possible. Again, your theory is wrong.

Same as above: if the signal propagates swiftly, this allows for better level validation. What is actually a problem is when the clock signal edges are not fast enough, at which point level coherency can no longer be maintained, since the flip-flops can't be switched on/off correctly if the clock signals are not well formed, no matter what the data signal levels and shapes are.

Hitman928 said:
You can argue all you want, but real world tests have shown that it is not correct.

I never use such sentences, I mean such arguments, or rather the lack thereof, you know, things like "it's well known that", "it's shown in real-world tests" and so on.
