2x A770 16GB issue with ASRock Rack mobo

Jul 27, 2020
Posting here because most server mobo owners are here.

Installed the two cards. The debug LED gets stuck at 94 (PCI bus enumeration), which means the cards aren't being recognized. On the 8th or 9th power cycle, though, it did get into Windows, installed the drivers, and both cards showed as OK in Device Manager. The driver installer recommended a reboot, and now I'm back to code 94. Any tricks to get it booting reliably?
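For reference, a tiny lookup sketch for the debug LED codes around the PCI bus steps. It assumes the board runs AMI Aptio firmware and that the values match the generic AMI code table printed in many board manuals; check them against your own board's manual before relying on them.

```python
# Subset of AMI Aptio DXE-phase POST codes covering the PCI bus steps.
# Assumption: copied from the generic AMI table; verify against your manual.
AMI_PCI_POST_CODES = {
    0x92: "PCI Bus initialization is started",
    0x93: "PCI Bus Hot Plug Controller Initialization",
    0x94: "PCI Bus Enumeration",
    0x95: "PCI Bus Request Resources",
    0x96: "PCI Bus Assign Resources",
}


def describe_post_code(code_text: str) -> str:
    """Translate the two hex digits shown on the debug LED into a description."""
    code = int(code_text, 16)
    return AMI_PCI_POST_CODES.get(code, "not in this subset; see the board manual")


print(describe_post_code("94"))  # PCI Bus Enumeration
```

A board hanging on 94 is therefore stuck while walking the PCI bus, which fits the later suspicion about PCIe resource allocation.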

Both cards and the mobo (a Zen 2 EPYC board) are made by ASRock. Annoyingly ironic.
 
Jul 27, 2020
I'm not sure. But even if it is turned off, why did the mobo boot that one time? Sometimes the debug LED turns off. Why the weird behavior?
 

Icecold

Golden Member
Nov 15, 2004
I've noticed all kinds of flakiness with multiple GPUs when Above 4G Decoding is turned off. I would try turning it on (if it isn't already) and see what happens. With it off, there aren't enough PCIe resources (address space below 4 GB) to fully allocate to the GPUs, so maybe the one time it booted you got lucky: perhaps it disabled your onboard NICs or something, which let it boot. Hard to say, but that's definitely the first thing I'd check with the symptoms you're describing.
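To make the "not enough PCIe resources" point concrete, here is a simplified sketch that checks whether a set of BARs (memory windows a card asks for) can be packed into one MMIO window below 4 GB. The window location and the BAR sizes are hypothetical illustrations, not values read from this board or these cards, and real firmware allocation (bridge windows, other devices) is more involved.

```python
GiB = 1 << 30
MiB = 1 << 20


def fits_below_4g(bar_sizes, window_base, window_size):
    """Simplified check: can these BARs be packed into one MMIO window?

    Each BAR size is a power of two and must be naturally aligned, i.e.
    its start address is a multiple of its size. BARs are placed
    largest first, each at the next suitably aligned address.
    """
    addr = window_base
    end = window_base + window_size
    for size in sorted(bar_sizes, reverse=True):
        addr = -(-addr // size) * size  # round up to the next multiple of size
        if addr + size > end:
            return False
        addr += size
    return True


# Hypothetical 1 GiB MMIO window sitting just below the 4 GiB line.
window_base, window_size = 3 * GiB, 1 * GiB

# Two cards each asking for a full 16 GiB aperture: cannot fit below 4 GiB.
print(fits_below_4g([16 * GiB, 16 * GiB], window_base, window_size))  # False

# Two cards with a small aperture plus a register BAR each: fits.
print(fits_below_4g([256 * MiB, 16 * MiB] * 2, window_base, window_size))  # True
```

With Above 4G Decoding enabled, the firmware may place 64-bit BARs above 4 GB instead, so large requests no longer have to squeeze into that small window.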
 
Jul 27, 2020
Makes sense. I got it to boot two more times. The first time, I tried to load Win11, but it failed with a VIDEO_TDR_FAILURE error. The second time, it automatically went to Windows Server 2019 because I wasn't there, and it was sluggish as hell. Also, the first time it booted fine into Win11, Windows showed that it had disabled the 2nd card because the card reported some issue. But after about 15 minutes of installing the driver (an unusually long time), both cards looked fine in Device Manager. Now I need to get into the BIOS the moment it shows the ASRock Rack splash screen.
 

Skillz

Golden Member
Feb 14, 2014
I would try the cards in a different system just to make sure they work reliably. Outside of that, I'm assuming this is the EPYCD8 or ROMED8 board you're using. I have both, and I've got 2 to 7 GPUs in mine. Do you have PCIe power connected to the motherboard's PCIe power plug on the other side of the PCIe slots? Also, I would disable discrete video output in the BIOS and let the video output go to the BMC. You'll need to either plug your monitor in with a VGA cable or get the BMC/IPMI IP address and remotely log in to the host from another computer.

Also the BMC/IPMI logs might give some insight on what's going on.
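One way to pull those logs from another machine is `ipmitool -I lanplus -H <bmc-ip> -U <user> -P <password> sel elist`. Below is a minimal sketch that splits the pipe-separated lines of that output and keeps the PCIe-looking events. The sample lines and the keyword list are made up for illustration; real sensor names and descriptions vary by BMC firmware.

```python
from typing import NamedTuple


class SelEvent(NamedTuple):
    record_id: str
    date: str
    time: str
    sensor: str
    description: str
    direction: str


def parse_sel_elist(text: str) -> list[SelEvent]:
    """Split `ipmitool sel elist` output into fields (assumes 6 '|' columns)."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 6:
            events.append(SelEvent(*fields))
    return events


def pcie_events(events, keywords=("PCI", "Critical Interrupt", "Bus")):
    """Keep events whose sensor or description mentions a keyword.

    The keyword list is a hypothetical starting point; adjust it after
    looking at what your BMC actually logs.
    """
    wanted = [k.lower() for k in keywords]
    return [
        e for e in events
        if any(k in f"{e.sensor} {e.description}".lower() for k in wanted)
    ]


# Illustrative sample, not real output from this board.
sample = """\
   1 | 03/31/2025 | 10:15:42 | Critical Interrupt #0x21 | Bus Uncorrectable error | Asserted
   2 | 03/31/2025 | 10:16:03 | Power Unit #0x01 | Power off/down | Asserted
"""

for event in pcie_events(parse_sel_elist(sample)):
    print(event.record_id, event.sensor, event.description)
```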
 
Jul 27, 2020
[Screenshot]

I gave up. Now I need to check whether I burnt out these two cards or something, coz it was pretty warm around here and the cards probably aren't designed to spend so much time outside an OS. They were scorching hot when I touched them. I haven't been able to get either of them to boot since the fluke boot where I installed the drivers and confirmed that the cards showed up in Device Manager without issue. I really, REALLY regret rebooting after that successful driver install. At least I could've checked whether the cards were working fine.

I checked in the BIOS, and Above 4G Decoding is enabled. On the last successful boot (from the onboard VGA), the Arc card was installed but did not show up in Device Manager. This sucks big time.
 

Skillz

Golden Member
Feb 14, 2014
Try turning off all virtualization and legacy boot options in the BIOS, while leaving Above 4G Decoding on.
 
Jul 27, 2020
Ah. Funny. I turned on virtualization after checking that Above 4G Decoding was on, and that led to a 100% boot failure rate for the Arc card. Any idea why that is?

My fingers hurt from pressing the stupid PCIe slot release lever. I need to first confirm that both cards work in another consumer system. Then I'll try again :(
 
Jul 27, 2020
Skillz said: "Outside of that, I'm assuming this is the EPYCD8 or ROMED8 board you're using. I have both, and I've got 2 to 7 GPUs in mine. Do you have PCIe power connected to the motherboard's PCIe power plug on the other side of the PCIe slots?"
Yeah, it's one of those (I don't remember the exact model). You're using riser cables to have all those cards plugged in, right? Coz a single card blocks the next PCIe x16 slot. Did you face any issues the first time you installed those cards?
 

Skillz

Golden Member
Feb 14, 2014
You have the EPYCD8 model.

Yes, riser cards are a #$$@# because some of them don't work out of the box. Testing them all takes a lot of time.

I ran into issues getting the cards to show up on a couple of hosts, but usually solved them by changing various settings: Above 4G Decoding, virtualization crap, legacy boot crap, and the PCIe gen version.

In fact, now that I think about it, that might be your problem. I think those are Gen4 cards. Try changing the PCIe Gen setting to Gen3; it's probably on Auto right now.
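Forcing Gen3 trades link bandwidth for (hopefully) more reliable link training. A quick sketch of the raw numbers, using the standard per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4, both with 128b/130b encoding) and ignoring packet/protocol overhead, so real throughput is somewhat lower:

```python
# Per-lane signalling rate in GT/s; Gen3 and Gen4 both use 128b/130b encoding.
GT_PER_LANE = {3: 8.0, 4: 16.0}


def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s (encoding overhead only)."""
    return GT_PER_LANE[gen] * lanes * (128 / 130) / 8


for gen in (3, 4):
    print(f"Gen{gen} x16: {pcie_bandwidth_gbs(gen, 16):.2f} GB/s")
# Gen3 x16: 15.75 GB/s
# Gen4 x16: 31.51 GB/s
```

Halving the link bandwidth may matter little for workloads that mostly run out of VRAM, but that is workload-dependent and worth measuring rather than assuming.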
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Use a regular VGA monitor cable (the 15-pin connector) with only one of these cards installed. Don't connect anything to the PCI Express cards, just the VGA port on the motherboard. It always works. Even if the PCI Express card has a problem, I'm pretty sure it will still boot.
 
Jul 27, 2020
OK, this time it rebooted twice before getting to the Windows bootloader, and then there was some display output crap. I turned the monitor off and back on, and the Arc's HDMI output showed the Win11 login screen!

Testing it right now in some benchmarks. Then I'm going to put the other one in and see if that works too.
 

Skillz

Golden Member
Feb 14, 2014
Glad you got it figured out.

You should run some PrimeGrid and other BOINC GPU projects for us on those GPUs. I don't think any of us have them. It would be nice to know how they compare to the Nvidia and AMD GPUs we normally use.
 
Jul 27, 2020
Skillz said: "You should do some PrimeGrid and other BOINC GPU projects for us on those GPUs."
I could do a trial run for comparison for a few hours. I can't keep them running at full steam all the time coz I don't want the landlord freaking out at the increased electricity bill. Which one is the easiest to get into, with the least setup effort? My guess is that the performance will be between an RTX 3060 12GB and an RTX 3060 Ti. Anything higher would be a surprise.
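A rough way to size the electricity cost of a trial run: cards × watts × hours gives watt-hours, divide by 1000 for kWh, then multiply by the local tariff. The 225 W figure is Intel's rated board power for the A770 and is only an upper-end assumption (actual draw under a given BOINC workload will differ); the 0.20-per-kWh price and the 5-hour run are placeholders.

```python
def run_cost(cards: int, watts_per_card: float, hours: float, price_per_kwh: float) -> float:
    """Energy cost of running `cards` GPUs at a constant draw for `hours`."""
    kwh = cards * watts_per_card * hours / 1000  # watt-hours -> kWh
    return kwh * price_per_kwh


# Hypothetical: both cards at 225 W for a 5-hour trial at 0.20 per kWh.
print(f"{run_cost(2, 225, 5, 0.20):.2f}")  # 0.45
```

Measuring the actual wall draw with a plug-in power meter would give a better number than the rated board power.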

Also, my normal internet hasn't been working for the past week. The guy who owns the router went away for a few days, and I don't know what he did (maybe turned off the router? It says limited connection and no internet). I've been using a sorry excuse for an alternative, but the speed is atrocious. Hopefully the internet situation gets resolved in a few days.
 

Skillz

Golden Member
Feb 14, 2014
I'd just wait until the internet situation gets resolved first.

Then sign up at www.primegrid.com and run a few tasks of GFN18, GFN19, GFN20 and GFN21.

Let me know when you sign up and get it going. If you can, join our Discord so I (or one of the other team members) can walk you through how to run those specific projects without downloading 1000s of tasks at once.

If you can't, I'll try to set up a detailed tutorial with screenshots showing what settings you want.
 
Jul 27, 2020
I'll try signing up and downloading the client when my internet is back to normal. If any issues, I will ask in this thread.

I try to avoid Discord. Not a fan of its UI (and annoying login issues).
 

Skillz

Golden Member
Feb 14, 2014
Go here to configure what tasks you download:
Your Account > PrimeGrid preferences



[Screenshot]

Here, you want to keep resource share at 0 (or blank, like in the screenshot).
Turn 'Use CPU' to no.
Turn ATI/AMD GPU, Nvidia GPU, Apple M-series... GPU to no. (Optional: they're not in the system, so it won't really matter.)

[Screenshot]

Make sure you fill this out correctly. If a test does turn out to find a prime, you'll get an entry on the Top 5000 Primes website. (Optional, but recommended.)

[Screenshot]

In this screenshot, you can see I'm running PPS and PPSE on CPUs. What you want is to make sure ALL the CPU columns are blank/empty.
Check GFN18 and leave everything else blank. Make sure you choose Intel ARC GPU for it. Then, when you attach the project, it will download a GFN18 task. Let this task run. **
When that task is done, go back to that page, uncheck GFN18 and select GFN19.
Keep repeating this process for the remaining GFNs.


**There are two types of tasks:
Main tasks - These are the tasks we want. They run the longest and do the actual prime checking.
Proof tasks - These tasks don't matter. They're identifiable by a "c" after the long string of numbers in the name, and they run very briefly.

Make sure you are running at least one main task to completion.
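If you'd rather sort task names in a script than by eye, a small heuristic sketch based on the description above (proof tasks have a "c" right after the long run of digits). The example task names are hypothetical; check the pattern against the real names your client shows before trusting it.

```python
import re

# Heuristic from the description above: a long run of digits followed by "c".
# Assumption: "long" means 6+ digits; adjust if real PrimeGrid names differ.
PROOF_PATTERN = re.compile(r"\d{6,}c")


def is_proof_task(task_name: str) -> bool:
    """Return True if the name looks like a short proof task."""
    return PROOF_PATTERN.search(task_name) is not None


# Hypothetical task names, for illustration only.
tasks = ["genefer18_12345678_0", "genefer18_12345678c_1"]
main_tasks = [t for t in tasks if not is_proof_task(t)]
print(main_tasks)  # ['genefer18_12345678_0']
```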

Post a link to your host here so I or one of the other team members can look up the tasks you ran and compare that exact task on other hardware.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
No config changes for CPU??

Edit: I see you are testing the GPUs. Sorry.
 
Jul 27, 2020
I did all the settings. Some perplexing questions:

There is no account login option in BOINC Manager.

It does show my username correctly, but it started running this Sierpinski Sieve thing, and only one GPU is being used, with 1.2GB of VRAM utilized. Something seems off, I think.
 
Jul 27, 2020
[Screenshot]

I think SR5 was the one running, which I suspended. I can't see it on the Project Preferences page to disable it.
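Besides the BOINC Manager GUI, tasks can be listed with `boinccmd --get_tasks` and suspended with `boinccmd --task <project URL> <task name> suspend`. A minimal sketch that picks SR5 tasks out of the `--get_tasks` text and builds the suspend commands without running them. Assumptions: each task block has `name:` and `project URL:` lines as below, the `llrSR5` name prefix and the sample output are hypothetical, and the project URL should be copied from what your client actually shows.

```python
import shlex


def parse_get_tasks(output: str) -> list[tuple[str, str]]:
    """Collect (project URL, task name) pairs from `boinccmd --get_tasks` text.

    Assumes each task block has a 'name: ...' line followed later by a
    'project URL: ...' line. 'WU name:' lines are ignored.
    """
    pairs, name = [], None
    for line in output.splitlines():
        key, _, value = line.strip().partition(": ")
        if key == "name":
            name = value
        elif key == "project URL" and name is not None:
            pairs.append((value, name))
            name = None
    return pairs


def suspend_commands(pairs, marker: str) -> list[list[str]]:
    """Build `boinccmd --task URL NAME suspend` argv lists for matching tasks."""
    return [["boinccmd", "--task", url, name, "suspend"]
            for url, name in pairs if marker in name]


# Hypothetical excerpt of `boinccmd --get_tasks` output.
sample = """\
1) -----------
   name: llrSR5_123456789_0
   WU name: llrSR5_123456789
   project URL: https://www.primegrid.com/
2) -----------
   name: genefer18_12345678_0
   WU name: genefer18_12345678
   project URL: https://www.primegrid.com/
"""

for argv in suspend_commands(parse_get_tasks(sample), "SR5"):
    print(shlex.join(argv))  # copy/paste into a terminal, or pass to subprocess.run
```

Suspending only stops the tasks already downloaded; whether new SR5 work keeps arriving is controlled by the project preferences on the website.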