Intel processors crashing Unreal Engine games (and others)


positivedoppler

Golden Member
Apr 30, 2012
1,144
236
116
Bro, that was so long ago I don't remember details, just that the first was DOA (replaced), the 2nd lasted less than a month (replaced), then the 3rd lasted a little over a year before it quit working. And yeah, it gives me pause; I don't mean I'm breaking out into hives and huffing into a paper bag lol. I'd be lying if I said I wasn't still a little shy.

That's why I'm in here asking the questions. Some of the folks in here are going to be leaps and bounds smarter than I am, more informed, etc. I wouldn't call myself the average desktop user, but I wouldn't really call myself enthusiast-level either. Kind of sitting in between.

Even a badly made CPU shouldn't go 0 for 3, and I had a Cyrix.
My overclocked 1.4 GHz Athlon lasted me at least 7 years.
 

gdansk

Diamond Member
Feb 8, 2011
4,195
7,035
136
A lot of spitballing on causes, I think. But I'm still not convinced that even Intel knows it all. They are still investigating why the mobile parts aren't impacted, apparently.
 

lakedude

Platinum Member
Mar 14, 2009
2,778
528
126
I remember people sticking with Intel 20+ years ago because the perception was Intel = stable and AMD = unstable. Where are these people today?
Right here, partner. AMD and Intel have both had their share of problems over the years. Our company sold only Intel back when AMD lacked thermal protection, and if you don't understand why that was a good decision at the time, please don't bother responding.

AMD had several bad years when they screwed over their foundry and then their foundry screwed them back.

Now that they are with TSMC they are doing much better.

We are not married to any company. You gotta earn our business, and AMD did exactly that for our most recent laptop and desktop purchases. If this thread is any indication, it looks like we dodged a bullet.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
CEO Snowball :hearteyecat:

Steve mentioned the used market issues this will create. I could not, in good conscience, sell a CPU I am not highly confident will serve the next owner well for years to come. My reputation and integrity mean far more to me than a few $100.

He is getting the quote for the FA lab; he is going to try to send one off for educational and edification purposes. If any of you Anandtech ballers can afford to buy some merch or donate, now is the time.

The company that reached out to him is much bigger than MSI or Asus, I still have Dell on my bingo card.

Blanket "do not buy 13th or 14th gen" until further notice is still in effect. He advised that anyone who bought the CPU recently either do all the MC and BIOS stuff next month and put it out of their mind, or return it. Anyone that has ruled out common instability issues should RMA pronto.

He threw a lot of shade at the dirtbag way they have gone about all of this. Along with all of the RMA rejection revisits ala Asus.

Epic going to AMD is another kick in the jimmy. Buying those high ASP SKUs too. The lost sales, the lost confidence, the avalanche of RMAs, the coming lawsuits, the nightmare media attention = category 5 crapstorm with massive financial damages that will take years to know the full extent.

More reports of 50% failure rates with significant performance stealing settings not resolving the problems.

Says this is Intel's biggest FUBAR in the 16yrs since he started GN.

@dmens told us Intel was falling apart. This must be him watching all of this go down

omg-yes-antonio-banderas.gif
Heh I was told I’ve been name-dropped so I figure I’d show up like Beetlejuice. Been so busy at the fruit company and time flies.

So, the official Intel explanation makes zero sense. What really boggles the mind is Intel throwing the fab under the bus by blaming “oxidation” on vias. OK, so if it is a premature long term failure (i.e. degradation after 10+ years showing up prematurely), how in the world can it be fixed with a pcode adjustment, which can only possibly alter voltage (or frequency, but that makes zero difference here)?

No, this is obviously voltage-induced stress that Intel was supposed to catch by running accelerated aging simulations in the racks. Here is how that works: the design team (not the fab folks) designs an aging simulation by running a power virus at elevated temps (~120C) and elevated voltages. This protocol should have exposed any issues related to the grotesque over-voltage that Intel has deployed in recent years in an attempt to stay competitive in benchmarks.

That leaves some possible scenarios:
- Raptor Lake was rushed out so quickly they didn’t even have enough time to run the whole aging simulation: seems unlikely since it only takes 6-8 weeks
- The aging simulation is broken: how? The content should not have changed… the corners should be well above what is productized.
- They didn’t even bother running the aging simulation: that would be beyond irresponsible.
- They ran the simulation and found issues but either ignored them due to time-to-market or competitive pressure, or management told the engineering team to swag an operating point where the issues wouldn’t crop up. The problem with that is the aging simulation is a statistical exercise: they likely wouldn’t have enough parts and rack time to re-verify at the actual swagged voltage. The entire point of the aging simulation is to go well beyond the actual points and tease out the errors without having to run millions of parts for months on end. Management must have known this and brought the part to market anyways.

This is obviously a recall situation. I don’t know how blaming the fab makes a difference either way, it is obviously an error on the part of the design team management operating under extreme marketing pressure. I am personally not shocked by any of this: running parts at the kinds of voltages that Intel has pushed for the last couple years would inevitably result in something like this happening.
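
For anyone wondering how a 6-8 week rack run can stand in for years in the field, here is a minimal sketch of the usual arithmetic. It assumes the textbook Arrhenius model for temperature and a simple exponential model for voltage; neither the models nor any of the numbers (activation energy, voltage coefficient, use conditions, voltages) come from Intel or from the posts above. Only the ~120C stress temperature and the 6-8 week duration are taken from this thread; everything else is a placeholder picked for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def temperature_acceleration(t_use_c, t_stress_c, activation_energy_ev):
    """Arrhenius acceleration factor between use and stress temperatures (Celsius in)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


def voltage_acceleration(v_use, v_stress, gamma_per_volt):
    """Exponential voltage acceleration factor; gamma is an assumed fitting constant."""
    return math.exp(gamma_per_volt * (v_stress - v_use))


def equivalent_field_years(stress_weeks, af_temperature, af_voltage):
    """Field time (years) represented by a stress run, treating the two factors as multiplicative."""
    return stress_weeks * af_temperature * af_voltage / 52.0


if __name__ == "__main__":
    # All values below are hypothetical except the ~120C stress temperature
    # and the 6-week duration, which are the figures quoted in the thread.
    af_t = temperature_acceleration(t_use_c=60.0, t_stress_c=120.0, activation_energy_ev=0.7)
    af_v = voltage_acceleration(v_use=1.30, v_stress=1.50, gamma_per_volt=5.0)
    years = equivalent_field_years(stress_weeks=6.0, af_temperature=af_t, af_voltage=af_v)
    print(f"temperature AF: {af_t:.1f}")
    print(f"voltage AF:     {af_v:.1f}")
    print(f"6 stress weeks ~ {years:.1f} field years (under these assumptions)")
```

The point of the sketch is the one made in the last scenario: the stress corner only tells you something if it sits well above the shipped operating point. If the shipped voltage creeps up toward the stress voltage, the voltage factor collapses toward 1 and the same rack time covers far less field life.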
 

Nothingness

Diamond Member
Jul 3, 2013
3,292
2,360
136
I can only imagine how much Intel engineers are frustrated at the moment. Depressing.
 
Jul 27, 2020
25,996
17,941
146
I can only imagine how much Intel engineers are frustrated at the moment. Depressing.
Are they really? There's nothing they can do. It's not their fault. This is purely a management mistake. They told the engineers what to do. The engineers must've voiced their concerns (as one of MLID's sources said that the ring bus thingy was a major concern even with Alder Lake). Management was like, "F U, peasants! You do what we tell you. Now crank this thing up to 11 and don't you dare speak up again!". If this happened to me, I would remain perfectly calm and unconcerned when things go bad. Whatever management says to me at that point, my simple answer would be, "We had this conversation before. I told you this and this and this. My advice was ignored. I did my job to the best of my ability. Don't ask me to clean up the mess that I was not willing to create in the first place."

This is actually what I told MY management when they didn't like the performance improvement of our core application in Azure Cloud. I did the necessary benchmarking and told them before going live that they were spending way too much for relatively paltry gains. They ignored me. If I hadn't done my job, I would be out on the streets right now because the easiest thing for them would be to lay all the blame on me. Now they can't coz I have evidence and they won't dare touch my hornet's nest :D
 

jpiniero

Lifer
Oct 1, 2010
16,490
6,983
136
OK, so if it is a premature long term failure (i.e. degradation after 10+ years showing up prematurely), how in the world can it be fixed with a pcode adjustment, which can only possibly alter voltage (or frequency, but that makes zero difference here)?

Because it's a bug in how high it sets the voltage?
 

positivedoppler

Golden Member
Apr 30, 2012
1,144
236
116
I don't know if staying quiet is the best choice. They are risking their own jobs and retirement. This can nuke Intel and cascade them down to 2016 AMD level unless Congress saves them again. It's not only the current RMA cost, but the future cost as consumers and volume corporate customers abandon Intel with its brand reputation now in the gutter. Intel needs this high volume and a dominant position in market share to fund new nodes. Without a competitive node, the ship will eventually sink.
 

gdansk

Diamond Member
Feb 8, 2011
4,195
7,035
136
The fact that Intel can sell defective CPUs is astonishing, and that no one has opened a class action lawsuit yet is strange.
It is odd. AMD settled a Bulldozer class action lawsuit over misleading terminology. And I think this situation is far more misleading.

They have reviews that mainly use settings that Intel no longer recommends. They have millions of chips that may have reduced lifespan due to a microcode bug. There are an unknown number already crashing due to it and in a way that will often cause people to suspect/blame software and other hardware. Is it causing reputational damage to Nvidia and Unreal because their software is what triggers a CPU bug?

If they don't want to do a recall or extend the warranty period for Raptor Lake S then I do think a lawsuit is in order.
 

inquiss

Senior member
Oct 13, 2010
464
685
136
A lot of spitballing on causes, I think. But I'm still not convinced that even Intel knows it all. They are still investigating why the mobile parts aren't impacted, apparently.
Who says the mobile parts are not affected? Alderon Games said they had found that they were.
 

jpiniero

Lifer
Oct 1, 2010
16,490
6,983
136
That would be a design issue, not an oxidation problem with the process as they claim.

Didn't Intel claim that the oxidation issue was fixed a long time ago and only affected a small number of units? The fast degradation is being caused by too high a voltage.

If they don't want to do a recall or extend the warranty period for Raptor Lake S then I do think a lawsuit is in order.

Best you will get is maybe being more generous with RMA. Intel typically only produces product for 3 years, and K is usually even shorter than that. They simply wouldn't have product that long to have an extended warranty period.

Especially when it seems that, with the AI hype, Intel feels it necessary to bring out another gen (Bartlett Lake) rather than extending Raptor Lake for as long as needed.
 
Jul 27, 2020
25,996
17,941
146
They simply wouldn't have product that long to have an extended warranty period.

Especially when it seems that with AI hype, Intel feels necessary to bring out another gen (Bartlett Lake) rather than extending Raptor Lake for as long as needed.
Yes, they would. You mentioned it in your post and apparently forgot about it? All affected RPL-S/RPL-R customers get upgraded to Barty Lakey!