XDMA transfers of images on the 290X calculation

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
Listening to the techreport podcast today got me wondering just how much bandwidth is used over the PCI-E bus to transfer just the image data from one card to the other. Since they dropped the crossfire bridge, the final image data now has to go across the bus. One reason for doing that was 4k resolutions, which couldn't go over the bridge as there wasn't enough bandwidth. So I wondered: just how much data are we talking about?

The PCI-E 3.0 bandwidth is 985 MB/s per lane.
PCI-E 2.0 bandwidth is 500MB/s per lane.

A 4k image with standard RGB encoding is 3840 * 2160 * 4 (32 bits is 4 bytes) = 31.64 MB

Assuming we are getting 60 fps with AFR, only 30 fps worth of frames need transferring, which comes out to 30 * 31.64 = 949.2 MB/s

So just how much data usage is that transfer of 4k images really going to take out of the bus when looking at crossfire configurations on recent platforms?

X79 IB-E PCI-E 3.0 dual crossfire 16x lanes bandwidth available is 15760 MB/s total usage is 6%

X79 SB-E PCI-E 2.0 dual crossfire 16x bandwidth available is 8000 MB/s total usage is 11.9%.

X79 IB-E PCI-E 3.0 triple crossfire 16x 8x 8x, 30 frames worth of images arriving at the 16x card, 15 fps sent from each of the 8x cards. Total bus usage is 6%.

X79 SB-E PCI-E 2.0 triple crossfire 16x 8x 8x - 30 frames arriving on the 16x, 15 fps sent from each 8x - 11.9% usage

Haswell PCI-E 3.0 dual crossfire 16x 4x = 949.2 MB/s transferred of 3940 MB/s bandwidth (the 4x link is the bottleneck) is 24%
Haswell PCI-E 3.0 dual crossfire 8x 8x = 949.2 MB/s transferred of 7880 MB/s bandwidth is 12%

Sandy Bridge PCI-E 2.0 dual crossfire 16x 4x = 949.2 MB/s transferred of 2000 MB/s bandwidth is 47.5%
Sandy Bridge PCI-E 2.0 dual crossfire 8x 8x = 949.2 MB/s transferred of 4000 MB/s bandwidth is 23.7%
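The percentages above can be reproduced with a short sketch (the helper names are mine; it assumes 4 bytes per pixel, the per-lane figures quoted above, and that the narrowest slot in each configuration is the bottleneck link):

```python
# Sketch of the bandwidth maths from the post above. Assumes AFR means
# only half the displayed frames cross the bus, and that the slowest
# link in the configuration is the one that matters.
BYTES_PER_PIXEL = 4            # 32-bit pixels (e.g. 8:8:8:8 or 11:11:10)
MB = 1024 * 1024

def frame_mb(width, height):
    """Size of one uncompressed frame in MB (MiB, as used in the post)."""
    return width * height * BYTES_PER_PIXEL / MB

def xdma_usage(lanes, mb_per_lane, fps_transferred, width=3840, height=2160):
    """Fraction of one link's bandwidth consumed by AFR frame transfers."""
    traffic = fps_transferred * frame_mb(width, height)   # MB/s
    return traffic / (lanes * mb_per_lane)

print(round(frame_mb(3840, 2160), 2))           # ~31.64 MB per 4k frame
print(round(100 * xdma_usage(16, 985, 30)))     # PCI-E 3.0 16x: ~6%
print(round(100 * xdma_usage(16, 500, 30)))     # PCI-E 2.0 16x: ~12%
print(round(100 * xdma_usage(4, 985, 30)))      # PCI-E 3.0 4x: ~24%
print(round(100 * xdma_usage(4, 500, 30), 1))   # PCI-E 2.0 4x: ~47.5%
```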

edit: Updated based on AFR sending only half the fps (duh, stupid mistake on my part). Also changed it to 32 bits per pixel, because the more I think about it the more I suspect we use 11:11:10 encoding these days.
 
Last edited:

Granseth

Senior member
May 6, 2009
258
0
71
If they want 60 fps, they probably only have to send 30 images per second to the master controller (GPU1), not 60, as GPU1 is rendering the other 30.
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
Currently the difference between single card x8 and x16 PCIe 3 is zero, give or take.
That means anything up to 50% use isn't an issue.

If you are running a 4k display with an HD 290X on an SB PCIe 2.0 system... well, what?

The Crossfire bridge also only had 900MB/s of bandwidth, so 4k would already kill it and make it useless, unless they updated the bridge spec to give it more bandwidth.

Plus what was said, you're only sending half the frames.
 

Imouto

Golden Member
Jul 6, 2011
1,241
2
81
Plus what was said, you're only sending half the frames.

BrightCandle doing maths wrong again isn't really surprising.

Him making a thread to try to show AMD in a bad light neither.

Warning issued for personal attack.
-- stahlhart
 
Last edited by a moderator:

TerryMathews

Lifer
Oct 9, 1999
11,464
2
0
Listening to the techreport podcast today got me wondering just how much bandwidth is used over the PCI-E bus to transfer just the image data from one card to the other. Since they dropped the crossfire bridge, the final image data now has to go across the bus. One reason for doing that was 4k resolutions, which couldn't go over the bridge as there wasn't enough bandwidth. So I wondered: just how much data are we talking about?

The PCI-E 3.0 bandwidth is 985 MB/s per lane.
PCI-E 2.0 bandwidth is 500MB/s per lane.

A 4k image with standard RGB encoding is 3840 * 2160 * 3 (a byte per colour) = 23.73 MB

Assuming we are getting 60 fps that comes out to 60 * 23.73 = 1423.8 MB/s

So just how much data usage is that transfer of 4k images really going to take out of the bus when looking at crossfire configurations on recent platforms?

X79 IB-E PCI-E 3.0 dual crossfire 16x lanes bandwidth available is 15760 MB/s total usage is 9%

X79 SB-E PCI-E 2.0 dual crossfire 16x bandwidth available is 8000 MB/s total usage is 17.8%.

X79 IB-E PCI-E 3.0 triple crossfire 16x 8x 8x, 120 frames worth of images to the same card on 16x and 8x for 60 frames each. Total data transferred is 18%.

X79 SB-E PCI-E 2.0 triple crossfire 16x 8x 8x - triple crossfire - 120 frames, total data 2847.6 = 35.6% usage

Haswell PCI 3.0 dual crossfire 16x 4x = 1423.8 MB/s transferred of 3940 MB/s bandwidth is 36.1%
Haswell PCI 3.0 dual crossfire 8x 8x = 1423.8 MB/s transferred of 7880 MB/s bandwidth is 18%

Sandy Bridge PCI-E 2.0 dual crossfire 16x 4x = 1423.8 MB/s transferred of 2000 MB/s bandwidth is 71.1%!
Sandy Bridge PCI-E 2.0 dual crossfire 8x 8x = 1423.8 MB/s transferred of 4000 MB/s bandwidth is 35.6%


Some of these platforms really aren't realistic for 4k resolutions and dual/triple cards in crossfire. Even 1080p is potentially reaching problematic bandwidths since its just 1/4 of these values. The SB PCI-E 2.0 16x 4x configuration for example would still show 17.8% usage of its bandwidth on just 1080p. I don't know what percentage is too much to start impacting game performance but its an interesting theoretical treatment, anyone with a pair of these cards fancy playing with the PCI-E lanes and telling us what happens to performance?

Its also possible that they are using 32 bits for colour instead for dealing with the higher quality monitors which would increase all of the usage by a third.

AMD uses alternate frame rendering. Try again.
 

TerryMathews

Lifer
Oct 9, 1999
11,464
2
0
Wow. You both took the time to disrespect the OP, but neither offered any help or correction. I'd help but I don't really know.

A: Nowhere did I "disrespect" the OP.

B: Alternate frame rendering is pretty self-explanatory. I assumed if he could calculate PCIe bandwidth, that he could read alternate and go "Oh yeah, I need to divide by 2 or 3. Duh!"
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
A: Nowhere did I "disrespect" the OP.

B: Alternate frame rendering is pretty self-explanatory. I assumed if he could calculate PCIe bandwidth, that he could read alternate and go "Oh yeah, I need to divide by 2 or 3. Duh!"

So I'm getting 110 average fps using crossfired 290s over the PCI-e bus. What happens?
 

Gikaseixas

Platinum Member
Jul 1, 2004
2,836
218
106
Keys you just feel compelled to defend OP because he's a Crossfire self-proclaimed victim. All he does is attack CF every time he sees an opportunity and that’s how your bond was born, cute

Infraction issued for thread crapping.
-- stahlhart
 
Last edited by a moderator:

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Keys you just feel compelled to defend OP because he's a Crossfire self-proclaimed victim. All he does is attack CF every time he sees an opportunity and that’s how your bond was born, cute

Doesn't quite help me understand XDMA transfers, but thanks for trying to help.
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
Totally correct, I came up with the idea rather late and hadn't thought the calculations through. The main post has been updated with 30 fps as the base number of images transferred, and I also changed it to 32 bit because I suspect we now use 11:11:10 encoding, or it's much faster and cheaper to transfer that 4th byte over the bus. They probably can't just send 24 bits, but I could be wrong on that one. Looking through the possible Windows settings, I found nothing other than 32 bit now. It used to be true you could set 24 bits, but that option is gone altogether and I think the world has moved on. I was also recalling an article I read recently that used an 11:11:10 colour space based on luminance in the GPU pipeline, and the conversion back, I am pretty certain, was still 32 bit. So I think I was wrong there as well and it should be 4 bytes and not 3.

From what I have seen in reviews, most games (I haven't seen any 290X reviews, but I assume it's basically the same) don't seem to show any real performance benefit from going from 8x/8x to 16x/16x, even on PCI-E 2.0. But once we go to 4x, that is where the trade-off starts to happen and performance starts to drop a reasonable amount. The realistic necessity for gaming is somewhere between 8x and 4x. Let's say it's 6x PCI-E, and thus most games need around 3000 MB/s for general usage.

Under any of the scenarios I have given, do the XDMA transfers result in worse performance because of the extra image data? I think the answer is that only PCI-E 2.0 8x/8x is brought close to its bandwidth limit by the XDMA transfers where it wasn't having a problem before. It's only got about 1000 MB/s spare and almost all of it is going to the XDMA transfer. It fits, but barely, which means it's probably going to delay critical things getting to the card and will slow it down a little.

The 16x/4x on PCI-E 2.0 becomes even less viable than it was before. Not only will you lose performance from the 4x being less than necessary to run the game, but there's an additional loss of nearly 50% of the bandwidth due to the image transfers! Even at 1080p this system would lose frame rate compared to running cards with bridges, due to the constrained bandwidth it has.

Every other system I think is by and large unaffected by the additional transfers.
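To make the headroom argument concrete, here's a rough sketch using the post's own estimates (the 3000 MB/s game-traffic figure is the speculative 6x-lane guess from above, not a measurement):

```python
# Headroom check: how much link bandwidth is left after general game
# traffic (rough ~6x-lane estimate from the post) plus the AFR frame
# transfers at 4k (30 fps * ~31.64 MB per frame).
GAME_MB_S = 3000     # speculative estimate from the post, not measured
XDMA_MB_S = 949.2    # 30 fps * 31.64 MB per 4k frame

def headroom(lanes, mb_per_lane):
    """Bandwidth left on a link after game traffic and XDMA transfers."""
    return lanes * mb_per_lane - GAME_MB_S - XDMA_MB_S

print(round(headroom(8, 500), 1))   # PCI-E 2.0 8x: ~50.8 MB/s spare, barely fits
print(round(headroom(4, 500), 1))   # PCI-E 2.0 4x: negative, over-committed
print(round(headroom(8, 985), 1))   # PCI-E 3.0 8x: ~3930.8 MB/s spare
```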
 
Last edited:

Granseth

Senior member
May 6, 2009
258
0
71
Isn't PCI-E full duplex, so if the GPU is using 4x to receive data it will still have 4x to send?
 
Last edited:

blackened23

Diamond Member
Jul 26, 2011
8,548
2
0
I don't really get the purpose of the math here. Is it a correlation to frame pacing or smoothness? Brent Justice is working on a crossfire 290X review compared to Titan SLI specifically for 4k resolution. He states that all frame pacing and microstutter issues are solved; everything is smooth in terms of gameplay on CF 290X at 4k. As smooth as Titan SLI. If you'll remember, HardOCP was the first website that brought the issues with crossfire to light; although they didn't do frame pacing, they always mentioned the smoothness of SLI in reviews. It sounds like CF is resolved in that respect on the 290X. The bigger concern is whether AMD applies this to the older generation cards, which it sounds like they have not done at this point. (for surround, that is)

Essentially, regardless of how it works and why, the math seems irrelevant - it works just fine from what I've read. That said, the 780 price cuts changed the value proposition of the 290X entirely - definitely makes the 290X a harder sell when it will obviously be louder in CF. I definitely would not opt for 290X when I could get 780 SLI for cheaper and get quieter cards to boot.
 
Last edited:

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Totally correct, I came up with the idea rather late and hadn't thought the calculations through. The main post has been updated with 30 fps as the base number of images transferred, and I also changed it to 32 bit because I suspect we now use 11:11:10 encoding, or it's much faster and cheaper to transfer that 4th byte over the bus. They probably can't just send 24 bits, but I could be wrong on that one. Looking through the possible Windows settings, I found nothing other than 32 bit now. It used to be true you could set 24 bits, but that option is gone altogether and I think the world has moved on. I was also recalling an article I read recently that used an 11:11:10 colour space based on luminance in the GPU pipeline, and the conversion back, I am pretty certain, was still 32 bit. So I think I was wrong there as well and it should be 4 bytes and not 3.

From what I have seen in reviews, most games (I haven't seen any 290X reviews, but I assume it's basically the same) don't seem to show any real performance benefit from going from 8x/8x to 16x/16x, even on PCI-E 2.0. But once we go to 4x, that is where the trade-off starts to happen and performance starts to drop a reasonable amount. The realistic necessity for gaming is somewhere between 8x and 4x. Let's say it's 6x PCI-E, and thus most games need around 3000 MB/s for general usage.

Under any of the scenarios I have given, do the XDMA transfers result in worse performance because of the extra image data? I think the answer is that only PCI-E 2.0 8x/8x is brought close to its bandwidth limit by the XDMA transfers where it wasn't having a problem before. It's only got about 1000 MB/s spare and almost all of it is going to the XDMA transfer. It fits, but barely, which means it's probably going to delay critical things getting to the card and will slow it down a little.

The 16x/4x on PCI-E 2.0 becomes even less viable than it was before. Not only will you lose performance from the 4x being less than necessary to run the game, but there's an additional loss of nearly 50% of the bandwidth due to the image transfers! Even at 1080p this system would lose frame rate compared to running cards with bridges, due to the constrained bandwidth it has.

Every other system I think is by and large unaffected by the additional transfers.

Are your calculations based on the assumption that v-sync will always be used and a 30fps cap on each card?
 

Grooveriding

Diamond Member
Dec 25, 2008
9,147
1,329
126
I don't really get the purpose of the math here. Is it a correlation to frame pacing or smoothness? Brent Justice is working on a crossfire 290X review compared to Titan SLI specifically for 4k resolution. He states that all frame pacing and microstutter issues are solved; everything is smooth in terms of gameplay on CF 290X at 4k. As smooth as Titan SLI. If you'll remember, HardOCP was the first website that brought the issues with crossfire to light; although they didn't do frame pacing, they always mentioned the smoothness of SLI in reviews. It sounds like CF is resolved in that respect on the 290X. The bigger concern is whether AMD applies this to the older generation cards, which it sounds like they have not done at this point. (for surround, that is)

That said, the 780 price cuts changed the value proposition of the 290X entirely - definitely makes the 290X a harder sell when it will obviously be louder in CF. I definitely would not opt for 290X when I could get 780 SLI for cheaper.

He does?!?! Well, none of that matters, as PCPER, nvidia's co-opted shill and mouthpiece, says otherwise. The definitive voice of nvidia's opinion on AMD's multi-gpu solution. :biggrin:
 

blackened23

Diamond Member
Jul 26, 2011
8,548
2
0
Well, smoothness is one part of the equation. It seems 290X is working fine in crossfire by that metric. Obviously buyers have other considerations such as software, overall cost, quietness, features etc, etc. (as I mentioned in my earlier post)

Here's the thread where he commented on it:

http://hardforum.com/showthread.php?p=1040324400&highlight=#post1040324400

Performance and smoothness seem fine. Like I said, though, I still would not go for 290X crossfire. Not until the cooler situation is improved. It's really annoying that AMD got so close to creating a great 10-out-of-10 GPU despite the higher power use, only to slap the same cooler the 7970 used on it... oh well.
 
Last edited:

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
I don't really get the purpose of the math here. Is it a correlation to frame pacing or smoothness? Brent Justice is working on a crossfire 290X review compared to Titan SLI specifically for 4k resolution. He states that all frame pacing and microstutter issues are solved; everything is smooth in terms of gameplay on CF 290X at 4k. As smooth as Titan SLI. If you'll remember, HardOCP was the first website that brought the issues with crossfire to light; although they didn't do frame pacing, they always mentioned the smoothness of SLI in reviews. It sounds like CF is resolved in that respect on the 290X. The bigger concern is whether AMD applies this to the older generation cards, which it sounds like they have not done at this point. (for surround, that is)

That said, the 780 price cuts changed the value proposition of the 290X entirely - definitely makes the 290X a harder sell when it will obviously be louder in CF. I definitely would not opt for 290X when I could get 780 SLI for cheaper.

Very good news! For me as well, as I am planning on an aftermarket-cooled 290X and possibly crossfiring it. I wonder if Nvidia will follow suit or if their SLI bridge is good to go.
 

Granseth

Senior member
May 6, 2009
258
0
71
Are your calculations based on the assumption that v-sync will always be used and a 30fps cap on each card?
As I said: do his calculations matter, given that PCI Express is full duplex, i.e. it can send data both ways at the same time? So the 4-lane option will only limit the maximum number of frames GPU2 can send to GPU1.
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
The other aspect I was thinking about was latency. With the crossfire bridge supporting 900 MB/s of bandwidth (thanks Lonyo, I was unable to find a source on this one), a 4k image takes 35ms to transfer across that bridge! That is ridiculous. It also means a 1080p image takes 8.8ms to transfer. That is a lot of extra latency in both cases, and I haven't seen anyone talk about the actual numbers before. I have seen people show graphs with a little bit of extra time, but that is more than half the rendering time of the image, which is enormous. They must be putting out the pixels as soon as they have them to reduce the latency, because otherwise that is an insane amount of extra time, and it would explain why it's so hard to get the variance down.

Switching to XDMA and PCI-E 3.0, a 4k image takes just 2ms to transfer. A 1080p image takes 0.5ms. It's contending with game assets and the driver generally talking to the GPU, but even so these are enormous improvements over the crossfire bridge.

XDMA on a PCI-E 2.0 8x link with a 4k image is still a big improvement over the bridge. While it has a quarter of the bandwidth overall, it's still only 8ms to transfer a 4k image and 2ms for 1080p.

So it looks like XDMA would be a good idea for NVidia to copy; it's an advancement that should dramatically improve frame timings and latency, at least in theory. The impact on bandwidth is negligible, although those with 16x/4x on PCI-E 2.0, who might have been fine with previous cards, should likely avoid the new bridgeless GPUs or upgrade their motherboards + CPUs at the same time.
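The latency figures in this post are just frame size divided by link bandwidth; a quick sketch, assuming uncompressed 32-bit frames and the 900 MB/s bridge figure quoted earlier in the thread:

```python
# Transfer latency = frame size / link bandwidth. PCI-E numbers use
# 985 MB/s (3.0) and 500 MB/s (2.0) per lane, as in the first post.
MB = 1024 * 1024

def transfer_ms(width, height, bandwidth_mb_s, bytes_per_pixel=4):
    """Milliseconds to move one uncompressed frame over the given link."""
    frame_mb = width * height * bytes_per_pixel / MB
    return 1000 * frame_mb / bandwidth_mb_s

print(round(transfer_ms(3840, 2160, 900), 1))        # CF bridge, 4k: ~35.2 ms
print(round(transfer_ms(1920, 1080, 900), 1))        # CF bridge, 1080p: ~8.8 ms
print(round(transfer_ms(3840, 2160, 16 * 985), 1))   # PCI-E 3.0 16x, 4k: ~2.0 ms
print(round(transfer_ms(3840, 2160, 8 * 500), 1))    # PCI-E 2.0 8x, 4k: ~7.9 ms
```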

Personally, in the process of going through this I learnt something; hope you guys did as well.
 

TerryMathews

Lifer
Oct 9, 1999
11,464
2
0
So I'm getting 110 average fps using crossfired 290s over the PCI-e bus. What happens?

Take the OP's original figures, reduce them by 10%, and find the right scenario for you.

Or post and someone will be glad to help you. We need to know 2.0/3.0, how many cards, and interface speed (16/16,16/8,8/8)
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
As I said: do his calculations matter, given that PCI Express is full duplex, i.e. it can send data both ways at the same time? So the 4-lane option will only limit the maximum number of frames GPU2 can send to GPU1.

Yes, but it's always in the same direction. The supporting cards are all sending to a single card and never receiving, and the receiving card is always receiving and never sending, so it's all just in one direction. The full duplex nature of the PCI-E bus doesn't really help the endpoints, which are serial in design, and I believe the total bandwidth I have given for any one endpoint is correct.
 

Granseth

Senior member
May 6, 2009
258
0
71
(...)

So it looks like XDMA would be a good idea for NVidia to copy, its an advancement that should dramatically improve frame timings and latency, at least in theory. (...)

I think I read somewhere that nVidia already uses the PCI-E link for data transfer and the SLI bridge for signaling/synchronization. Think it was in some x90 (probably 590) review where they found the PCI-E bridge as well as the SLI bridge.
 

Granseth

Senior member
May 6, 2009
258
0
71
Yes, but it's always in the same direction. The supporting cards are all sending to a single card and never receiving, and the receiving card is always receiving and never sending, so it's all just in one direction. The full duplex nature of the PCI-E bus doesn't really help the endpoints, which are serial in design, and I believe the total bandwidth I have given for any one endpoint is correct.
I believe you are wrong, as each PCI-E lane has 4 wires:
A lane is composed of two differential signaling pairs: one pair for receiving data, the other for transmitting.
http://en.wikipedia.org/wiki/PCI_Express