Question Necessity of letting GPUs "breathe"? (Dual RX 5700 XT "Raw II" from XFX, top card has VRAM temps near or at 100C!)

VirtualLarry

No Lifer
Aug 25, 2001
56,339
10,044
126
So, I'm trying to figure this out, whether one card is way out of spec, or what's going on here. (Bad thermal pads on VRAM?)

Two identical (?) brand-new cards, installed into an ASRock Z170 Pro4S ATX desktop board (G4560, 2x8GB DDR4-2400) with two PCI-E x16 slots at triple-slot spacing.

Turns out that the XFX "Raw II" RX 5700 XT (which supposedly has "fixed" or "better" cooling compared to the DD ("Double Dissipation") model) is, in effect, a triple-slot card on its own.

When I installed them, there was VERY LITTLE clearance between the two for air to get into the dual fans.

Top card, has GPU temps of like 72C, VRAM temps of 96C (with a small Rosewill external USB fan shoved in front of both cards, mostly the top card), or 102C (without additional fan), while mining.

The bottom GPU, is running nice and cool at 54C, and VRAM temps of like 82C.

Also, the top card is apparently taking 105W, and the bottom card is only taking 80W, at the same undervolt settings (1350 MHz, 860mV) and Power Limit settings (-20%). That seems like a big disparity to me, more than just the temp difference would entail. (Hotter chips take more power, I know this.)
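To put rough numbers on that disparity, here is a quick sketch using only the readings quoted in this post. The variable names are illustrative, and the post does not say whether every reading was taken under the same fan conditions, so treat the gaps as approximate.

```python
# Readings quoted in the post above; names are illustrative only.
top_power_w, bottom_power_w = 105, 80   # software-reported board power
top_vram_c, bottom_vram_c = 102, 82     # top card reading is without the extra USB fan
top_gpu_c, bottom_gpu_c = 72, 54        # fan conditions for these are not stated

power_gap_w = top_power_w - bottom_power_w
power_gap_pct = 100 * power_gap_w / bottom_power_w
vram_gap_c = top_vram_c - bottom_vram_c
gpu_gap_c = top_gpu_c - bottom_gpu_c

print(f"power gap: {power_gap_w} W ({power_gap_pct:.2f}% over the bottom card)")
print(f"VRAM gap: {vram_gap_c} C, GPU gap: {gpu_gap_c} C")
```

So the top card reports roughly a third more power than the bottom one at the same settings, alongside a VRAM gap of about 20C.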

Is this RMA-worthy? Or "take apart GPU cooler and re-paste and re-apply thermal pads"? Or just "it's normal, par for the course"? Or "XFX has crap Q.C."?

It should be noted that it's possible that the temp difference on the VRAM is also due to the top GPU handling the Windows Desktop, and maybe it would even things out if I plugged the HDMI into the bottom-most card for output. Maybe I'll try that, or run the display output off of the Intel onboard, so that the GPUs only have to handle the mining load. Also, mining ETH is mostly a load on the memory, not the cores. I DO NOT have the VRAM overclocked on either card (1750 MHz stock). Not with those VRAM temps.

Edit: Important! This PC is NOT in a cubby, and the side-panel is removed.
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
What happens if you swap the two cards? I'd try that to see if anything changes which would help you figure out if it's just the card itself, or the arrangement impacting the cooling ability.
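The logic of the swap test can be sketched roughly as below. The function name and the card labels are hypothetical, and real results may be less clear-cut (both factors can contribute at once).

```python
def diagnose_swap(hot_card_before: str, hot_card_after: str) -> str:
    """Interpret a card swap, given which physical card ran hot before and after.

    Card labels (e.g. "A", "B") are hypothetical tags for the two physical cards.
    """
    if hot_card_before == hot_card_after:
        # The same physical card runs hot in either slot.
        return "follows the card: suspect the card itself"
    # The heat stayed with the slot position, not the card.
    return "follows the slot: suspect the arrangement and airflow"


# Card "A" was hot in the top slot; after swapping, card "B" (now on top) is hot.
print(diagnose_swap("A", "B"))
```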
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,808
7,163
136
I feel like the answer is "Of course card placement is going to have an effect on card temps and performance" when the top card is actually helping cool the bottom card (by pulling hot air off the back) while also trying to cool itself.

No such thing as a free lunch, as your top card has discovered.

I agree with @Mopetar though: some basic due diligence is definitely in order, swapping the cards and cross-checking the resulting behavior.
 
Mar 11, 2004
23,074
5,557
146
Probably just isn't as good of a chip (seen in the higher power use to get similar clocks). And don't you have that backwards, that chips using more power are hotter chips? Obviously cooling also plays a role so yes, blocking the airflow while also being the higher card (heat rises after all) will definitely have an impact. But it seems likely it just isn't as good for your use.

As far as RMA goes, that'd depend on spec. Is the card able to run at the rated specs without issue? If not, then yeah probably something up and you might see about RMAing it, although because of your use I would say that's probably not a good idea (since you're inhibiting its cooling). If it does, then you got what you paid for. Now, you might check if you can remove the cooler and check the heatsink/thermal pads/TIM without voiding the warranty and then give it a look.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,339
10,044
126

Hmm, RedPandaMining got a "Code 43" on his Radeon VII card, and he claims that it was "bricked".

I actually GOT a "Code 43" after a few hours. After the PC spontaneously rebooted somehow, it was waiting for me at the desktop, but the mining software was no longer loaded or running. One card (the non-display card, so the COOLER, bottom card) had the Code 43.

But I was able to "Disable", then re-"Enable" them in Device Manager, and the Code 43 went away. But then I rebooted the PC (without mining), and the Code 43 came back.

So I noticed that I had a BIOS update, from 7.30 to 7.50, on my ASRock Z170 Pro4S ATX mobo. I flashed it, and re-installed the newest Adrenalin 2020 drivers, and so far, it seems to have kept the "Code 43" away.

Also changed the Win10 Virtual Memory limit to 32768 (MB), or 32GB. That might actually not be high enough, because 16GB RAM + 2x 8GB GPUs, well, not much room for actual extra virtual memory.
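The arithmetic behind that worry, as a small sketch. It only restates the figures in this post; the idea that the pagefile has to cover system RAM plus total VRAM is the poster's working assumption, not something this thread confirms.

```python
MB_PER_GB = 1024

pagefile_mb = 32768                   # Virtual Memory limit set in Win10
system_ram_mb = 16 * MB_PER_GB        # 2x8GB DDR4
vram_total_mb = 2 * 8 * MB_PER_GB     # two 8GB RX 5700 XT cards

# Headroom left if the pagefile must cover RAM plus all VRAM (an assumption).
headroom_mb = pagefile_mb - (system_ram_mb + vram_total_mb)
print(f"headroom: {headroom_mb} MB")
```

Under that assumption, the 32GB setting is fully accounted for, with nothing left over.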

Also, I have another one of these boards running 2x RX 5700XT cards as well (Asus), with no real issues.

I do suspect the PSU in this rig, though, too, because of the reboot, and the fact that when I loaded the mining software and started benchmarking, without tweaking the cards first, I got a HARD REBOOT.

It's a RaidMax 735W "Thunder"(?), with 2x PCI-E modular leads, each with 2x 6+2-pin connectors, which is enough for both cards, but the PSU and the whole PC was in storage for over a year, and the PSU is probably nearly 4-5 years old at this point, and I don't know if it could even deliver 735W continuous.

I have some 650W 80Plus Gold Rosewill Photon PSUs, hopefully those will be enough for both of my 2x RX 5700XT rigs.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,339
10,044
126

He fixed the thermal-throttling (VRAM-related?) on his RTX 3080 (Gigabyte OC), with some Thermal Grizzly Thermal Pads ("Minus 8", 3mm).

I'm thinking that, besides/after card-swapping, if the problem follows the card, that perhaps I should investigate replacing the thermal pads on my "problem GPU", with the 102C+ VRAM temps. It's kind of crazy. Cranking up fanspeed doesn't seem to affect VRAM temps that well, so perhaps something is messed up with their application, or XFX didn't care to put proper pads on or there's a gap or something.

PS. @ 99.97MH/sec, the RTX 3080 seems like a decent card for mining now, watch out! That's what I get with TWO RX 5700XT cards in one PC, and I think that, properly-tuned, a single RTX 3080 actually draws less power.
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
VirtualLarry said: "Also, the top card is apparently taking 105W, and the bottom card is only taking 80W, at the same undervolt settings (1350 MHz, 860mV) and Power Limit settings (-20%). That seems like a big disparity to me, more than just the temp difference would entail. (Hotter chips take more power, I know this.)"

While the power readings for Nvidia cards are quite accurate, the AMD ones are notoriously bad. I have a 5700 that reports 73W but uses more power at the wall than a 5700 XT that reports 89W and almost as much as another 5700 XT that reports 118W.

On a related note, if you aren't after the highest hash rates and can settle for something in the 50-51 Mhash/s range, you can underclock/undervolt the 5700s even further. At 1150 MHz, they can still do about 50.4.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,339
10,044
126
I fixed the "Code 43" problems, that started popping up again.

The PC that these two identical-model cards are running in was pulled from my "warehouse", after nearly a year or so in storage. It was dusty, and the PSU in there was a 4-5 year old Raidmax 735W "Lightning" or "Thunder" model or something. It had served its purpose back in the day, but I felt that I should probably have changed out the PSU when I pulled the PC, yet I did not. (Lazy me, bad idea.)

So I just swapped it out now, and swapped in a brand-new Rosewill 650W Photon 80Plus Gold full-modular PSU. Now it boots and runs mining software on both cards, at stock (165W-180W) power levels. Something that the other PSU didn't do.
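A rough power-budget sketch for the new PSU, using only the figures in this thread. The draw of the rest of the system (CPU, board, drives, fans) is not stated here, and the per-card figures are software-reported, so this is an estimate under those assumptions rather than a measurement.

```python
psu_rated_w = 650                      # Rosewill Photon rating
cards = 2
stock_low_w, stock_high_w = 165, 180   # reported stock power per card

gpu_low_w = cards * stock_low_w
gpu_high_w = cards * stock_high_w
share_low_pct = 100 * gpu_low_w / psu_rated_w
share_high_pct = 100 * gpu_high_w / psu_rated_w
remaining_w = psu_rated_w - gpu_high_w  # left for everything else at the high end

print(f"GPUs: {gpu_low_w}-{gpu_high_w} W "
      f"({share_low_pct:.1f}%-{share_high_pct:.1f}% of the PSU rating)")
print(f"left for the rest of the system: {remaining_w} W")
```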

One other thing that I noticed, when I swapped them, which I neglected, apparently, with the Raidmax - the ASRock Z170 Pro4S has a molex plug above the first PCI-E slot, which is used for supplemental power for the slots, when using both of them. I had neglected to power that.

So whether it was plugging that PCI-E slot-power molex in, or swapping to the 80Plus Gold 650W Rosewill Photon, either way, now it seems to be running "spiffy".

I've also power-tuned it again, and raised fan-speeds, and undervolted, and even bumped up the VRAM clock, and it seems stable, will leave it running overnight to see.
 