Question Rome has a 1044-day continuous-operation core-crash bug.

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,098
16,014
136
You just have to reboot after almost 3 years... Damn, sensationalism? Most data centers update the BIOS and reboot sooner than that.

Get a grip, Larry.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
amd said they wouldn't be fixing it. it's not a major issue; it's nothing a restart won't fix. the unabashed dream of getting a high uptime is a dick measuring contest going back forever.

ryzen 8000 for desktop surprised me. this may be their attempt to fix their generational naming again but as it's amd they'll mess it up again. originally i thought they wanted another generation of thinking before using the 9000 name series to avoid clashing with the fx9000 processors but then remembered the fx8000 was a series that existed. either way this is good and hopefully they think of something clever for after 9000.

don't drink the lead tainted milk, amd, don't be like intel and their now stupid naming.
 

A///

Intel will come out with a CPU that lasts for 4 years...a new competitive cycle and Bullet Point will be born.
last? the rome chip doesn't die. a restart will fix the issue. intel needs to be competitive in dc for any new contract purchasers to consider them.
 

sandorski

No Lifer
Oct 10, 1999
70,677
6,250
126
last? the rome chip doesn't die. a restart will fix the issue. intel needs to be competitive in dc for any new contract purchasers to consider them.

All that downtime![/someFanBoy].....I just wonder who would test it?
 

A///

All that downtime![/someFanBoy].....I just wonder who would test it?
people who home brew? they run used servers at home for, idk really. rome is cheap enough, and has been for a few years, to pick up 2nd hand with a decent motherboard and lots of ram. i gather amd got enough reports in directly or through dc's that don't upgrade as frequently as others. I haven't bought ex dc processors in a few years, since the pandemic began. In the xeon era the usual cycle was 3-5 years depending on the model before it got decommissioned. rome's 2nd gen but its performance has been surpassed by 3rd gen and now current, or we're on 5th, I don't keep track. It wasn't long ago, maybe 5 or 6 years now, that intel's plat 28 core was making waves, especially when hp and dell equipped their premade ws with it in dual socket form too. 10k a pop; they trade for 1100 or less on ebay now, and a couple hundred for es or qs processors. amd decimated intel's share in dc and will continue to do so, to the true definition of that word.
 

Markfw

All that downtime![/someFanBoy].....I just wonder who would test it?
Or me... If it wasn't for power outages, my 3 7742's would still be up, and probably with this bug.
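
For anyone wanting to check how close a long-running box is to the mark, here is a minimal sketch. It assumes the 1044-day figure from the thread title (about 2.86 years) and uses OS uptime as a stand-in for time since the last reset, which may not be exactly what the hardware counts. The function names are made up for illustration, and the `/proc/uptime` reader is Linux-only.

```python
from pathlib import Path

# Figure taken from the thread title; treat it as approximate.
CRASH_LIMIT_DAYS = 1044
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_limit(uptime_seconds: float, limit_days: int = CRASH_LIMIT_DAYS) -> float:
    """Days left before uptime reaches limit_days (negative once past it)."""
    return limit_days - uptime_seconds / SECONDS_PER_DAY


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Linux only: the first field of /proc/uptime is seconds since boot."""
    return float(Path(path).read_text().split()[0])
```

On a Linux host, `days_until_limit(read_uptime_seconds())` gives the remaining margin, so a reboot can be scheduled in a maintenance window well before it reaches zero.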
 

Mopetar

Diamond Member
Jan 31, 2011
8,438
7,633
136
Not a huge issue, but depending on how long it takes to restart or the amount of redundancy, this may cause issues for service contracts that only allow for minutes of downtime per year (or decade).

It is kind of funny in a way as the only way to discover this bug is otherwise rock solid uptime.
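
To put numbers on "minutes of downtime per year", here is a rough sketch that converts an availability percentage into a downtime budget. The function name and the 365.25-day year are assumptions for illustration, not anything taken from a real service contract.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average year length


def downtime_budget_seconds(availability_percent: float, years: float = 1.0) -> float:
    """Allowed downtime in seconds for a given availability over `years`."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR * years
```

With these assumptions, 99.999% works out to a bit over five minutes per year, so a single slow server reboot with no redundancy could use up most of that budget.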
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,044
3,524
126
Yeah, in my early programming days I was using Sun workstations - I'd just get to, like, 345 days, and there would be a power outage ^(*&^%%! No 365 day uptimes for me.

LOL i know this feeling....

Also I'm pretty sure someone on the FreeNAS forum has an X58 FreeNAS server which has probably broken 7+ yrs uninterrupted.

I wouldn't doubt it on a Gainestown system. Like I've said many times, I'd like to see you actually kill a Bloomfield cpu using it properly and not as a race car, and even then they took abuse and still lasted long enough for you to upgrade to something else, or longer than the board you were running it on at the very least.

Not a huge issue, but depending on how long it takes to restart or the amount of redundancy, this may cause issues for service contracts that only allow for minutes of downtime per year (or decade).

It is kind of funny in a way as the only way to discover this bug is otherwise rock solid uptime.

To the DC and Mining crowd... downtime was pretty close to listening to the trumpets of heaven right before the coming of the apocalypse.
 

A///

the mining farm owners ought to be pushed off a cliff into the depths of hell for what they did.
 

aigomorla

the mining farm owners ought to be pushed off a cliff into the depths of hell for what they did.

nah there will be many boxes of dead gpu's they took with them, to cushion the fall.
Also Jensen will be there with us, and im sure the AI he created will save the ex miners somehow, because we are the ones responsible for skynet.
 

A///

nah there will be many boxes of dead gpu's they took with them, to cushion the fall.
Also Jensen will be there with us, and im sure the AI he created will save the ex miners somehow, because we are the ones responsible for skynet.
The mob allegedly got rid of people with cement blocks for shoes. maybe the governments of those countries overseas could duct tape the heavy cards to them and throw them overboard as a message to others. there's no benefit to these scam coins, everyone fails in the end, even the dumbasses who had farms and held onto them right as they became worthless. but saying that is admitting they had worth at some point. even beads had worth back in history's time, doesn't mean they were worth what they were claimed to be worth.
 

Markfw

nah there will be many boxes of dead gpu's they took with them, to cushion the fall.
Also Jensen will be there with us, and im sure the AI he created will save the ex miners somehow, because we are the ones responsible for skynet.
Glad you did not include DC in the comments. We rarely overclock, as we need stability above all.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,112
136
Also I'm pretty sure someone on the FreeNAS forum has an X58 FreeNAS server which has probably broken 7+ yrs uninterrupted.

I wouldn't doubt it on a Gainestown system. Like I've said many times, I'd like to see you actually kill a Bloomfield cpu using it properly and not as a race car, and even then they took abuse and still lasted long enough for you to upgrade to something else, or longer than the board you were running it on at the very least.
I had a Bloomfield (920 D0), overclocked to hell, for ~3 years, till I upgraded to Gulftown: six cores at 4.2GHz (would have needed water cooling to go higher). I had the i7 970 for, I think, 9 years. Crazy. I upgraded *everything else* on it over that time frame more than once. Awesome platform.
 

eek2121

Diamond Member
Aug 2, 2005
3,384
5,011
136
You just have to reboot after almost 3 years... Damn, sensationalism? Most data centers update the BIOS and reboot sooner than that.

Get a grip, Larry.

amd said they wouldn't be fixing it. it's not a major issue; it's nothing a restart won't fix. the unabashed dream of getting a high uptime is a dick measuring contest going back forever.

ryzen 8000 for desktop surprised me. this may be their attempt to fix their generational naming again but as it's amd they'll mess it up again. originally i thought they wanted another generation of thinking before using the 9000 name series to avoid clashing with the fx9000 processors but then remembered the fx8000 was a series that existed. either way this is good and hopefully they think of something clever for after 9000.

don't drink the lead tainted milk, amd, don't be like intel and their now stupid naming.
Hot patching/reloading is a thing. 😉
 

A///

Hot patching/reloading is a thing. 😉
Yeah linux has had it for over a decade. M$ only recently got that ability. sounds like M$ parlayed their knowledge gained into the windows 12 partitioned system files for live patching and not requiring reboot. Live patching is great for critical systems where redundancy may not carry over or there isn't enough. definition of critical systems varies between people.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,045
3,835
136
Yeah linux has had it for over a decade. M$ only recently got that ability. sounds like M$ parlayed their knowledge gained into the windows 12 partitioned system files for live patching and not requiring reboot. Live patching is great for critical systems where redundancy may not carry over or there isn't enough. definition of critical systems varies between people.
If you have a critical system and you can't take a node offline, then your failure domain is ~1.
No matter what you do, you're gonna be lucky hitting 99.x% availability over a prolonged period.

Some of my systems have 99.999999% availability targets. You think I care about patching 1 box?
If you do proper fault tree analysis you will end up around n+3 at every layer to hit 8 9's.
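
As a back-of-the-envelope illustration of why redundancy rather than single-box uptime gets you to 8 9's, here is a minimal sketch. It assumes one working node is enough to serve (n = 1) and that nodes fail independently, which real fault tree analysis would not take for granted; the function name and the 99% per-node figure are hypothetical. It uses `Fraction` so the comparison is exact.

```python
from fractions import Fraction


def replicas_needed(node_unavailability: Fraction, target_unavailability: Fraction) -> int:
    """Smallest number of independent replicas whose chance of all being
    down at once is at or below target_unavailability.

    Assumes independent failures and that any single replica can carry
    the load; correlated failures would push the real number higher.
    """
    if not 0 < node_unavailability < 1:
        raise ValueError("node_unavailability must be between 0 and 1")
    if target_unavailability <= 0:
        raise ValueError("target_unavailability must be positive")
    replicas = 1
    all_down = node_unavailability
    while all_down > target_unavailability:
        replicas += 1
        all_down *= node_unavailability
    return replicas


# Hypothetical inputs: 99% available nodes, 99.999999% (8 9's) target.
eight_nines = replicas_needed(Fraction(1, 100), Fraction(1, 10**8))
```

Under those assumptions the answer is four replicas, i.e. one node plus three spares, which lines up with the n+3 figure above; with that many spares, taking a single box down to patch or reboot barely registers.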

I find some of the things people bring up in this forum really perplexing: mental gymnastics to score some points.