Suggestion for fastest small size GPU needed

froky

Member
Jul 19, 2015
59
0
0
I'm working on a project for which I need some advice on what GPU to use. You can read about the project here: http://www.fpgaarcade.com/punbb/viewtopic.php?id=483
Basically I need a fast GPU which can render several thousand frames per second. That's the number of frames which will be displayed by the high-speed projector. On screen and from the GPU ports it will be 120 Hz 24-bit 1024x768 frames. The specialized projector will decompose each 24-bit frame it gets into 24 1-bit frames (so 24 * 120 = 2880 frames each second).
But because of how OpenGL works, the 3D scenes have to be rendered in RGB, then converted to black and white in shaders, and then 24 of these are joined into one 24-bit frame again. In other words, the scene really does need to be rendered 2880 times each second.
The scenes are going to be simple, and a few universities successfully finished a similar project a few years ago, but we really want the fastest GPU we can afford to get as much detail as we can in real time.
Final requirement: the device design is small, so we want to use mini ATX/ITX and a GPU which can fit in such cases.
Right now I'm only aware of the GTX 970 Mini GPU. If there are others in this form factor with similar or better performance, please let me know.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
A 750 Ti is fairly small and power efficient. You can also wait for the Radeon R9 Fury Nano, which is supposed to be pretty small.

What is the complexity of each frame? I mean, rendering something 2880 times per second is easy if the scene is simple.
 

Piroko

Senior member
Jan 10, 2013
905
79
91
Most mATX cases and some ITX cases can fit reasonably large GPUs actually, here's an example video (ignore the heat comparisons though). So, a GTX980Ti with stock cooler or a Fury X would be the limit.
 

froky

Member
Jul 19, 2015
59
0
0
Hm, okay, I should have been clearer: we are not going to use an actual ITX case, but put everything inside the case of our device, similar to this one: http://i.imgur.com/JPog21E.jpg but smaller.

The case is going to have a motherboard, GPU, projector board, optics, and a motor inside. It all hardly fits already, so there won't be space for long GPUs.

However, the 750 Ti seems similar in size to the 970 Mini. I'll check the difference in performance and sizes.
EDIT: Okay, it's slower.
 

Piroko

Senior member
Jan 10, 2013
905
79
91
Then yes, the 970 mini is probably the best option. It's a bit taller than a slot/the 750 Ti though, and you need some clearance for the fans and decent airflow; keep that in mind.
As for performance: the 970 mini is about three times the speed, at ~140W power consumption vs. ~60W.

The absolute fastest card that is reasonably small would be a Fury X, but you need quite a bit of extra room for the radiator. It's also not that much faster than a 970, unless the architecture suits your code better.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
No - nothing confirmed yet.

How much rendering power do you actually need?

Is it possible to borrow a GPU to see how it would perform, before you spend the money?

My point is, don't buy a 970 Mini unless you are sure you actually need that much power. The 750 Ti is slower, sure, but maybe it is fast enough? Using the smallest GPU possible saves you money, produces less heat and less noise.
 

froky

Member
Jul 19, 2015
59
0
0
As many polygons as possible.
If it can render 1000 polygons at 2880 frames per second, then it's acceptable.
If it can do 5000, or higher-res textures, even better. Our artist can always think of more complex and interesting interactive scenes for the presentation of the device if I can raise his polygon budget, texture resolution, etc.
 

NTMBK

Lifer
Nov 14, 2011
10,411
5,677
136
As many polygon as possible.
If it can render 1000 polygons at 2880 frames, then its acceptable.
If it can do 5000, or higher res textures, even better. Our artist can always think of more complex and interesting interactive scenes for the presentation of the device if I can increase his minimum poly count, etc. requirement.

The higher your polycount, the more work it will be for your artist.

If you really want the fastest though, wait for benchmarks of the AMD R9 Nano.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
Cool, that makes sense.

In that case, go for an unlocked Intel CPU, so that you can get that clockspeed as high as possible. You can even disable some cores if you want. Single threaded performance could make a big difference.

What are you programming this in? It might be worth looking at something like Mantle - CPU overhead could cost you a lot of performance here.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
The higher your polycount, the more work it will be for your artist.

If you really want the fastest though, wait for benchmarks of the AMD R9 Nano.

I've spent at least as much time "optimizing" poly count as making the mesh in the first place. For game modeling you've always got one eye on the polygon count.
 

froky

Member
Jul 19, 2015
59
0
0
What are you programming this in? It might be worth looking at something like Mantle - CPU overhead could cost you a lot of performance here.
Any reasons you think there might be a CPU bottleneck here?
Scene is rendered in OpenGL. Any logic, AI, physics, particle simulation, etc. has to be done only once. The rendering is what has to be done several hundred times more. Textures, geometry, scene data have to be loaded from files once as well.
The code which has to process the frames to work with the high speed projector properly is executed on the GPU.
I'm not sure how a CPU bottleneck would arise.


The higher your polycount, the more work it will be for your artist.
Not always. Hardware limitations can mean less artistic freedom as well as less work for the artist. With less artistic freedom, they have to try to make something impressive with the limited resources they have, which isn't less effort in itself. In short, he'd be glad.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
Because the CPU is still involved every frame. It is in charge of sending the batches to the graphics card, driver overhead, etc etc.

If you were targeting 60 FPS, sure the CPU would not be a limiting factor. But because you are targeting 2880 FPS, you might find that your CPU suddenly becomes a limiting factor. It might become the bottleneck, even if it wasn't before.

Put it this way - 2880 FPS means you have 0.34ms to render a frame. That includes driver overhead, batching, etc etc. If your CPU takes .20ms to render a frame - which ordinarily would be extremely quick - that leaves less than half of that time for the GPU to actually render the frame.
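
The arithmetic behind that budget can be checked with a short sketch. The 0.20 ms CPU cost is the hypothetical figure from the post, not a measurement, and, like the post, the sketch treats CPU and GPU work as strictly sequential. Drivers normally queue commands so the two overlap to some degree, so this is a pessimistic simplification.

```python
# Frame-time budget for the projector described in this thread.
BITPLANES_PER_FRAME = 24   # one 1-bit image per bit of a 24-bit frame
OUTPUT_HZ = 120            # refresh rate at the GPU port


def frame_budget_ms(renders_per_second: int) -> float:
    """Time available for one render pass, in milliseconds."""
    return 1000.0 / renders_per_second


renders_per_second = BITPLANES_PER_FRAME * OUTPUT_HZ   # 2880
budget_ms = frame_budget_ms(renders_per_second)        # about 0.347 ms

# Hypothetical CPU cost per render pass (the 0.20 ms figure from the post).
cpu_ms = 0.20
gpu_ms_left = budget_ms - cpu_ms                       # about 0.147 ms

print(f"{renders_per_second} passes/s, {budget_ms:.3f} ms per pass")
print(f"GPU time left after {cpu_ms:.2f} ms of CPU work: {gpu_ms_left:.3f} ms")
```

Under these assumptions the GPU gets roughly 0.15 ms per pass, which is less than half of the total budget.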

EDIT:

NTMBK isn't talking about hardware limitations. He is talking about high poly count - two very different things. Higher poly count normally requires more powerful hardware, but powerful hardware does not mean you need to use a higher poly count.
 

froky

Member
Jul 19, 2015
59
0
0
Because the CPU is still involved every frame. It is in charge of sending the batches to the graphics card, driver overhead, etc etc.
...
Put it this way - 2880 FPS means you have 0.34ms to render a frame. That includes driver overhead, batching, etc etc. If your CPU takes .20ms to render a frame - which ordinarily would be extremely quick - that leaves less than half of that time for the GPU to actually render the frame.
This is true. Some things have to be turned off or set to not run every frame. For example, culling definitely should be turned off, as culling 2880 times a second would be overkill. I'll need to think some more about this.

NTMBK isn't talking about hardware limitations. He is talking about high poly count - two very different things. Higher poly count normally requires more powerful hardware, but powerful hardware does not mean you need to use a higher poly count.
What?
He said using more polys requires the artist to do more work. I said that's not always true and gave reasons why.
I said if we can use more complex scenes geometry-wise, we will. We know we don't have to; that's very basic stuff, and if we didn't know it we wouldn't be qualified for a project like this. My point is we want to create content that is as detailed as possible for the presentation of the device, and the content creators share this view.
 

SystemVipers

Member
May 18, 2013
162
171
116
this NANO looks killer for small boxes

The AMD Radeon R9 Nano was announced during the live-stream event last month. The card will be a small form factor Fiji-based solution with 4 GB HBM and air-based cooling.

The product would be 19cm in length, offers 2x the perf of the 290X at half the power - it is now confirmed launching in August.

AMD confirmed this info during its quarterly results conference call. It will be the 3rd product derived from the Fiji range of GPUs and follows a mITX form factor. The specs for this product have not been released yet; however, expect it to be a cut-down Fiji GPU. The original design has 64 ROPs, 256 texture mapping units, and 64 GCN units with 4096 stream processors.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,632
4,561
75
I don't suppose there's anywhere to put a 120mm exhaust fan on that thing? Maybe give it feet and exhaust through the bottom? If so, you could try a full R9 Fury X in there - if you can get the water cooling lines to wrap around properly.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Yeah the Fury X is small but has an attached water cooler. Faster than a gtx 970 mini by quite a bit.

If you end up using any OpenCL code, AMD has traditionally been faster at that than nVidia and that has held true for the 970 and Fury cards.
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
Fury X if you can find a spot for the cooler. Otherwise the cooler could be removed in favor of some air cooler and the GPU downclocked to where the performance is reasonable. Fury Nano. You probably should check out mantle. AMD might even get involved. They aren't pushing it for games anymore but seem to target other applications.

970 mini
 

n0x1ous

Platinum Member
Sep 9, 2010
2,574
252
126
.

What? Who told you this? So you think the Nano will be as fast as an overclocked GTX 980 Ti while using 140 watts? That's crazy!

It's actually twice the performance per watt and "faster" than the 290X. Likely a downclocked Fiji that is slightly faster than the 290X.
 

Elixer

Lifer
May 7, 2002
10,371
762
126
On screen and from the GPU ports it will be 120 Hz 24bit 1024x768 frames. The specialized projector will decompose the 24 bit frames it gets into 1 bit frames (so in total each second 24 * 120 frames).
But because how OpenGL works the 3d scenes have to be rendered in RGB, then in shaders converted to black and white and then 24 of these joined into 1 24 bit frame again. In other words, the scene really does need to be rendered 2880 times each second.
I would wait for Nano as well, you need memory bandwidth from what I am reading, and that should cover you.

However, let me see if I understand this, you have a 24 bit RGB 1024x768 image source, and then you need to turn that into 24 monochrome images, then that gets converted back to 24bit RGB by the projector to display it, right?
If that is the case, then your way seems vastly inefficient. Instead, you can convert the 24bit image into 6 RGBA textures and ping-pong them out as needed. You can also do double/triple buffer to keep feeding the GPU as well for more performance. Hope that makes sense. :)
 
Aug 11, 2008
10,451
642
126
.

What? Who told you this? So you think the Nano will be as fast as an overclocked GTX 980 Ti while using 140 watts? That's crazy!

Yea, I caught that too. If I recall correctly, it was supposed to have the performance of the 290X at half the power. It supposedly has twice the performance per watt, not twice the absolute performance. I don't think even the full-power Fury X is twice as fast as the 290X, except maybe in some artificial benchmark.

Edit: but I agree, it might be worth waiting to see how the Fury nano turns out. OTOH, with HBM it may be expensive and in short supply, or even delayed.
 

froky

Member
Jul 19, 2015
59
0
0
However, let me see if I understand this, you have a 24 bit RGB 1024x768 image source, and then you need to turn that into 24 monochrome images, then that gets converted back to 24bit RGB by the projector to display it, right?
No. The projector displays monochrome images at a 2880 Hz refresh rate. The only way to stream such a high amount of data is via its HDMI port. Because of how HDMI works, I can't just stream 2880 monochrome frames each second from my PC; it has to be 24-bit frames at no more than 120 Hz. So the trick (or the only way) is to combine 24 monochrome frames into each 24-bit frame, so I can send that instead and overcome the 120 Hz limit. The projector hardware then decodes each 24-bit frame back into 24 monochrome frames. That's how the projector hardware works and I can't change it.
Inefficient, yes, but the only way.
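
The packing described above can be sketched on the CPU to show the bit layout. This is only an illustration: in the real pipeline the combining runs in a shader on the GPU, and the mapping of monochrome frame i to bit i of the 24-bit pixel is an assumption, since the actual order is defined by the projector hardware. The names `pack_bitplanes` and `unpack_bitplanes` are hypothetical.

```python
from typing import List

Frame1Bit = List[List[int]]    # rows of 0/1 pixels
Frame24Bit = List[List[int]]   # rows of 24-bit integers (0xRRGGBB)


def pack_bitplanes(planes: List[Frame1Bit]) -> Frame24Bit:
    """Combine 24 monochrome frames into one 24-bit frame.

    Plane i is stored in bit i of each pixel (assumed order).
    """
    if len(planes) != 24:
        raise ValueError("expected exactly 24 monochrome frames")
    height, width = len(planes[0]), len(planes[0][0])
    packed = [[0] * width for _ in range(height)]
    for bit, plane in enumerate(planes):
        for y in range(height):
            for x in range(width):
                packed[y][x] |= (plane[y][x] & 1) << bit
    return packed


def unpack_bitplanes(packed: Frame24Bit) -> List[Frame1Bit]:
    """Split a 24-bit frame back into 24 monochrome frames (projector side)."""
    return [[[(pixel >> bit) & 1 for pixel in row] for row in packed]
            for bit in range(24)]


# Tiny 2x2 example: even-numbered planes are all white, odd ones all black.
planes = [[[1 - (i % 2)] * 2 for _ in range(2)] for i in range(24)]
packed = pack_bitplanes(planes)
print(hex(packed[0][0]))                     # 0x555555
print(unpack_bitplanes(packed) == planes)    # True
```

Each 24-bit frame sent over HDMI at 120 Hz then carries 24 monochrome frames, which is where the 2880 Hz figure comes from.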
 

Pottuvoi

Senior member
Apr 16, 2012
416
2
81
You can also render images into a big atlas, there is no need for 3D texture.

Render to a buffer big enough to fit all 24 images at a time (i.e. 4096x4608) and then use a shader to combine the result into the final 24-bit texture (preferably with some sort of dithering).
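
As a rough sketch of that atlas, the 4096x4608 size corresponds to a grid of 4 columns by 6 rows of 1024x768 tiles. The grid shape and the row-major tile order are assumptions; any arrangement of 24 tiles would work. Each tuple has the same form as the arguments to `glViewport(x, y, width, height)`.

```python
from typing import List, Tuple

TILE_W, TILE_H = 1024, 768   # resolution of one monochrome frame
COLS, ROWS = 4, 6            # 4 * 6 = 24 tiles (layout is an assumption)


def atlas_size() -> Tuple[int, int]:
    """Width and height of the buffer holding all 24 tiles."""
    return COLS * TILE_W, ROWS * TILE_H


def tile_viewports() -> List[Tuple[int, int, int, int]]:
    """(x, y, width, height) of each of the 24 tiles, in row-major order."""
    return [((i % COLS) * TILE_W, (i // COLS) * TILE_H, TILE_W, TILE_H)
            for i in range(COLS * ROWS)]


viewports = tile_viewports()
print(atlas_size())    # (4096, 4608)
print(viewports[5])    # (1024, 768, 1024, 768)
```

Before relying on a buffer this large, it is worth checking the driver's `GL_MAX_TEXTURE_SIZE` limit.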

I'm sure there are ways to make such simple rendering very efficient, as all 24 images share an identical object.
You can render 24 instances of the object into the same buffer with slightly different rotation, etc.

You might want to test using MSAA to get the best possible image quality as well; most cards have more than enough fillrate to make it work.