Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


poke01

Diamond Member
Mar 8, 2022
4,196
5,542
106
No, CPU design and manufacturing is just complicated.
These companies are packed with talented and highly professional people, yet things go wrong from time to time. This (Zen 4/RPC) gen should be very indicative of that.
True. I am not denying that. It’s just that AMD needs to communicate better. On July 15th they said it’s launching by the 31st of July. So in just one week they found out that testing wasn’t done properly.

This should have been identified months ago not weeks before launch.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
That's not a more efficient condition, rather the contrary, since MOSFETs are slower at very low temps. The higher the temp, the higher the transconductance, up to a given temp, generally in the 100-120°C range, at which conduction and speed at a given voltage start to decline. It's just that under LN2 you won't overheat the chip with a hugely over-specced TDP.

Actually, at -100°C you need to pump higher voltage to reach the same frequencies as at 50°C, and as already said, I don't believe that the 9950X has only 3.85% better 7-Zip perf than the 7950X; this very number just doesn't make sense.

Edit: Methinks that in reviews the CB R23 score will be up to 10% higher and the 7-Zip one 15-20% higher; AMD's stated perf improvement over the 7950X is 22%.

[Image: 11-1280.639cc65d.png]



Temperature decrease reduces all of the conductor and channel resistances as well as decreases the static power consumption of the FETs. So your RC delay goes down and your static power goes down. In turn, the CPU will consume less power as the temperature is decreased, all else being equal, and will lead to more efficient performance by the CPU.
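
As a rough way to read the argument above, here is the usual first-order textbook CMOS power and wire-delay model. It is not taken from the thread or from AMD; the activity factor \alpha, switched capacitance C, and the temperature-dependent terms I_leak(T) and R(T) are generic symbols assumed for illustration.

```latex
P_{\text{total}} \approx \underbrace{\alpha\, C\, V_{dd}^{2}\, f}_{\text{dynamic}}
  + \underbrace{V_{dd}\, I_{\text{leak}}(T)}_{\text{static}},
\qquad
\tau_{\text{wire}} \approx R(T)\, C
```

In this sketch, both I_leak(T) and the metal resistance R(T) fall as temperature falls, so at a fixed voltage and frequency the static term and the wire RC delay shrink. The model says nothing by itself about transistor drive current, which is the point the posters disagree on.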

Without CB testing a 9950X, you shouldn't compare their 7-Zip results with AMD's as evidence that AMD's stock test is underperforming. Too many variables come into play.
 

Josh128

Golden Member
Oct 14, 2022
1,318
1,983
106
That's not a more efficient condition, rather the contrary, since MOSFETs are slower at very low temps. The higher the temp, the higher the transconductance, up to a given temp, generally in the 100-120°C range, at which conduction and speed at a given voltage start to decline. It's just that under LN2 you won't overheat the chip with a hugely over-specced TDP.

Actually, at -100°C you need to pump higher voltage to reach the same frequencies as at 50°C, and as already said, I don't believe that the 9950X has only 3.85% better 7-Zip perf than the 7950X; this very number just doesn't make sense.

Edit: Methinks that in reviews the CB R23 score will be up to 10% higher and the 7-Zip one 15-20% higher; AMD's stated perf improvement over the 7950X is 22%.

[Image: 11-1280.639cc65d.png]


22% is for Blender, not CB R23. They gave a specific 17% IPC uplift for R23. That will not change. At least for ST, all you need to do is extrapolate the score vs 7950X and you will have the score for 9950X. 2050*1.17= ~2398
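
The extrapolation above can be written out as a small sketch. The helper name `extrapolate_score` is hypothetical, the 2050 baseline and 17% uplift are the figures quoted in the post, and the calculation assumes single-thread clocks are equal between the two chips so that an IPC uplift maps directly onto the score.

```python
def extrapolate_score(baseline: float, uplift_pct: float) -> float:
    """Scale a baseline benchmark score by a claimed percentage uplift.

    Hypothetical helper for illustration; assumes equal clocks, so the
    IPC uplift translates one-to-one into the score.
    """
    return baseline * (1 + uplift_pct / 100)


baseline_7950x_st = 2050   # 7950X CB R23 ST score used in the post
claimed_ipc_uplift = 17    # percent, the R23 figure cited in the post

estimate_9950x_st = extrapolate_score(baseline_7950x_st, claimed_ipc_uplift)
print(f"Estimated 9950X ST score: {estimate_9950x_st:.1f}")
```

If the shipping part boosts higher or lower than the 7950X in this test, the estimate would shift by roughly the same ratio.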
 

Abwx

Lifer
Apr 2, 2011
11,884
4,873
136
Temperature decrease reduces all of the conductor and channel resistances as well as decreases the static power consumption of the FETs. So your RC delay goes down and your static power goes down. In turn, the CPU will consume less power as the temperature is decreased, all else being equal, and will lead to more efficient performance by the CPU.
Static power is reduced because the lower transconductance implies lower leakage, but that also means lower speed at a given voltage.

Besides, the conductors' resistance shouldn't be significant at the currents required to hit 5GHz. The RC delay getting lower won't automatically compensate for the lower transconductance, which will increase the time required to charge the parasitic capacitances of all kinds (which are left unchanged by the low temp).


Without CB testing a 9950X, you shouldn't compare their 7-Zip results with AMD's as evidence that AMD's stock test is underperforming. Too many variables come into play.

We can argue at length, but I guess that's the only thing left to get a reasonable clue. So far there hasn't been a single Cinebench score in normal conditions; perhaps the Italian guy will spill the beans. In the meantime we're trying to locate a cat in some obscure black room.
 

Abwx

Lifer
Apr 2, 2011
11,884
4,873
136
22% is for Blender, not CB R23. They gave a specific 17% IPC uplift for R23. That will not change. At least for ST, all you need to do is extrapolate the score vs 7950X and you will have the score for 9950X. 2050*1.17= ~2398
The 22% in the slide I linked is a global number comprising an average of several benches; that's in AMD's footnotes.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Static power is reduced because the lower transconductance implies lower leakage, but that also means lower speed at a given voltage.

Besides, the conductors' resistance shouldn't be significant at the currents required to hit 5GHz. The RC delay getting lower won't automatically compensate for the lower transconductance, which will increase the time required to charge the parasitic capacitances of all kinds (which are left unchanged by the low temp).

Conductor resistance is a big deal on advanced nodes and channel mobility increases with lower temperature, it's not just the conductors.

I mean, we have direct tests of power use vs. temperature and decades of practical overclocking experience to tell us that your theory is not correct. I honestly thought this was just established knowledge at this point, at least in overclocking communities.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Some people on Xitter are saying it might be a packaging issue, whatever that means. Obviously, that's not in reference to the box the CPU comes in, although it would be funny if the issue was to fix a typo on the box.

According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.
 

poke01

Diamond Member
Mar 8, 2022
4,196
5,542
106
According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.
That’s not what hardwareluxx is reporting. It’s also a hardware issue.
 

Saylick

Diamond Member
Sep 10, 2012
4,033
9,454
136
According to AMD, it is not a design or packaging issue, but that they discovered that not all chips that were sent out went through QA, so they are sending out new chips to make sure they were properly tested before being sold/reviewed. See post #16,557.

Edit: If it was an actual issue with the chips, there's no chance they would be able to get them fixed and new ones out the door within a week or two. It would have to be either the QA testing miss as explained, or something wrong with the microcode/firmware that they could fix and push out quickly.
Maybe. Ryan seems to think it's a packaging issue as well:
 

poke01

Diamond Member
Mar 8, 2022
4,196
5,542
106
That’s not what hardwareluxx is reporting. It’s also a hardware issue.
More info, translated:

Quality problems are prompting a complete recall of the samples and also of the processors already delivered to retailers. All processors delivered so far will therefore be replaced by a fresh production batch. AMD has not said exactly which quality problems occurred. But apparently it is a hardware problem that cannot be fixed by software.
 

SK10H

Member
Jun 18, 2015
128
62
101
Do you remember the launch problems of Zen 4 and the burned sockets? Maybe AMD simply prefers a launch without bugs, as in September no one will remember if it was a late July or early August launch. They will however remember if their CPU is not working.
Not sure if this affects later Zen 4, but the three 7900X retail samples I got in Feb 2023, which had IHS dates of Jul-Aug 2022, all failed single-thread CoreCycler AVX2 y-cruncher/P95 at stock clocks with no PBO or Curve Optimizer.
I just live with the last one with a +10 curve on some cores, but set a static clock almost all the time. I recently sidegraded to a 7800X3D at no cost for power-efficient v/f 24/7 operation, as the Zen 4 reg voltage is stupidly unoptimized below the 4.8GHz range, as I tested last year. The X3D I have is obviously a better-quality die at lower clocks, so it passes this AVX2 test just fine at a ~-20 curve.

Looking forward to people testing single-thread AVX2 CoreCycler on Zen 5 at stock clocks with no PBO/curve, and to seeing what the v/f curve looks like, since they sure know how to tweak the X3D die. 😏
 

Abwx

Lifer
Apr 2, 2011
11,884
4,873
136
Conductor resistance is a big deal on advanced nodes and channel mobility increases with lower temperature, it's not just the conductors.

I mean, we have direct tests of power use vs. temperature and decades of practical overclocking experience to tell us that your theory is not correct. I honestly thought this was just established knowledge at this point, at least in overclocking communities.

They said that the cold bug for the 9950X occurs at -130°C. It means that at this temp the device is just too slow to work, which says that at extremely low temps the lowered transconductance has more impact than the lower resistances.

It's just that under LN2 they must make sure that the silicon reaches a minimal temperature to be functional, because even with LN2 it will be way over this temp once it has booted and is somewhat loaded.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Maybe. Ryan seems to think it's a packaging issue as well:

More info, translated:

Quality problems are prompting a complete recall of the samples and also of the processors already delivered to retailers. All processors delivered so far will therefore be replaced by a fresh production batch. AMD has not said exactly which quality problems occurred. But apparently it is a hardware problem that cannot be fixed by software.

I mean, this is just them speculating based upon the fact that mobile isn't being recalled and it can't be fixed in software. Whereas you have an AMD rep directly stating that it isn't a hardware issue but a testing one. That makes the most sense (if they're not sending out new firmware) because, like I said, if it was something in the chip, there is zero chance they could get replacements out this quickly. It's possible that some bad samples went out because they were damaged during packaging (packaging has defects and yields too) and didn't go through the proper QA testing to catch it before shipping, but that would still be what the AMD rep said, that some chips slipped through QA and so they were sending out new chips they know went through the proper testing.
 

Abwx

Lifer
Apr 2, 2011
11,884
4,873
136
That could be as trivial as badly aligned SMD caps on the CPU substrate. The chips would still work reliably, but that's something to be corrected because it wouldn't look professional.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
They said that the cold bug for the 9950X occurs at -130°C. It means that at this temp the device is just too slow to work, which says that at extremely low temps the lowered transconductance has more impact than the lower resistances.

It's just that under LN2 they must make sure that the silicon reaches a minimal temperature to be functional, because even with LN2 it will be way over this temp once it has booted and is somewhat loaded.

Cold bugs aren't because it's too slow, they happen because of either timing violations or that there are analog parts of the CPU that fail with the increased Vth from cold temperatures. I don't think the analog part is really a concern with modern CPUs, so it's most likely a hold time violation as the timing paths shift too far with the extreme temperatures and the data misses the edge window of the flip flop and fails to propagate to the next stage. It's not running too slow, the timings just weren't designed for that cold of operation.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,235
16,106
136
All I can add, is after the Intel fiasco, AMD wants to be SURE there is nothing at all wrong with what they send out, even if it causes a slight delay. 2 weeks is a slight delay. You can't get pissed about that.
 

Abwx

Lifer
Apr 2, 2011
11,884
4,873
136
I don't think the analog part is really a concern with modern CPUs, so it's most likely a hold time violation as the timing paths shift too far with the extreme temperatures and the data misses the edge window of the flip flop and fails to propagate to the next stage. It's not running too slow, the timings just weren't designed for that cold of operation.

But for a timing violation to occur, or for interstage propagation to be too slow, something has to limit the speed at which the transistors are switching, since lower resistances are supposed to help...

This means that the parasitic capacitances can't be charged fast enough, that is, that the provided currents are too low, which gets us back to too-low transistor conductance. Actually, low temp would be an advantage for higher speed if it weren't for the transistors' worse characteristics under this condition.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
But for a timing violation to occur, or for interstage propagation to be too slow, something has to limit the speed at which the transistors are switching, since lower resistances are supposed to help...

This means that the parasitic capacitances can't be charged fast enough, that is, that the provided currents are too low, which gets us back to too-low transistor conductance. Actually, low temp would be an advantage for higher speed if it weren't for the transistors' worse characteristics under this condition.

Timing violation does not mean too slow, it just means off. It can also be too fast. Flip flops need a narrow window for the signal to be present and held in. If the signal is too early, it will also be a timing violation. A hold time violation cannot be fixed by lowering the frequency (i.e., the signal is propagating too quickly), hence a cold bug will still be there even if you down clock as low as possible. Again, your theory is wrong. You can argue all you want, but real world tests have shown that it is not correct.
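
To make the setup-versus-hold distinction concrete, here is a minimal sketch of the two standard flip-flop timing checks. The `TimingPath` class, the function names, and every delay value are made up for illustration; they are not measurements of any real CPU. The point it shows is only that the clock period appears in the setup check and not in the hold check.

```python
from dataclasses import dataclass


@dataclass
class TimingPath:
    """One flop-to-flop path; all values in picoseconds (illustrative only)."""
    clk_to_q_ps: float  # launch flop clock-to-Q delay
    logic_ps: float     # combinational delay between the two flops
    setup_ps: float     # capture flop setup requirement
    hold_ps: float      # capture flop hold requirement
    skew_ps: float      # capture clock arrival minus launch clock arrival


def setup_slack(path: TimingPath, period_ps: float) -> float:
    # Data must arrive setup_ps before the NEXT capture edge,
    # so a longer clock period directly adds slack.
    arrival = path.clk_to_q_ps + path.logic_ps
    return (period_ps + path.skew_ps) - (arrival + path.setup_ps)


def hold_slack(path: TimingPath) -> float:
    # Data must stay stable for hold_ps after the SAME capture edge.
    # The clock period does not appear here at all.
    arrival = path.clk_to_q_ps + path.logic_ps
    return arrival - (path.hold_ps + path.skew_ps)


# Assumed numbers for a path whose data arrives too early.
fast_path = TimingPath(clk_to_q_ps=20.0, logic_ps=10.0,
                       setup_ps=15.0, hold_ps=25.0, skew_ps=15.0)

for period_ps in (200.0, 1000.0, 10000.0):
    print(f"period={period_ps:>8.0f} ps  "
          f"setup slack={setup_slack(fast_path, period_ps):>8.1f} ps  "
          f"hold slack={hold_slack(fast_path):>6.1f} ps")
```

With these assumed values the setup slack grows as the period is stretched, while the hold slack stays negative at every clock speed, which is the behavior described above for a failure that downclocking cannot cure.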
 

poke01

Diamond Member
Mar 8, 2022
4,196
5,542
106
All I can add, is after the Intel fiasco, AMD wants to be SURE there is nothing at all wrong with what they send out, even if it causes a slight delay. 2 weeks is a slight delay. You can't get pissed about that.
Yep, I'm happy AMD is doing this. I've cooled down a bit, and yeah, better to do it now and have a smooth launch.
 

Abwx

Lifer
Apr 2, 2011
11,884
4,873
136
Timing violation does not mean too slow, it just means off. It can also be too fast. Flip flops need a narrow window for the signal to be present and held in. If the signal is too early, it will also be a timing violation.

It doesn't matter if it's too early as long as the clock's rising and falling edges are fast enough; once triggered, the flip-flop will keep its state for at least the duration of a clock cycle.

A hold time violation cannot be fixed by lowering the frequency (i.e., the signal is propagating too quickly), hence a cold bug will still be there even if you down clock as low as possible. Again, your theory is wrong.

Same as above: if the signal is propagated swiftly, this will allow for better level validation. What is actually a problem is when the clock signal edges are not fast enough, at which point level coherency can no longer be maintained, since the flip-flops can't be switched on/off correctly if the clock signals are not well formed, no matter what the data signals' levels and shapes are.

You can argue all you want, but real world tests have shown that it is not correct.
I never use such sentences, I mean such arguments, or rather lack thereof, you know, things like "it's well known that", "it's shown in real world tests" and so on.
 