Workstation Graphics Cards worth it?

Apr 20, 2008
10,067
990
126
Are you saying that consumer cards ship with known defects in the SPs? Considering how many CUDA apps there are, and GPGPU in general, that doesn't seem wise, nor correct.

Does it explain why badaboom and other GPU encoding apps look so crappy? I mean they are just doing millions of calculations, yet CPU encoding looks the same (great) while GPU encoding looks blurry and grainy.

Or maybe why F@H requires many of the same data sets to be completed by both CPU and GPU?
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Does it explain why badaboom and other GPU encoding apps look so crappy? I mean they are just doing millions of calculations, yet CPU encoding looks the same (great) while GPU encoding looks blurry and grainy.
They just aren't using the same algorithms and filters.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
I think you have to establish what software you're using first.
A lot of Autodesk software can use D3D these days.
In the case of D3D there is no real difference between workstation and consumer cards/drivers. The D3D standard has a pretty rigid definition of how cards should render things, so in D3D mode there is pretty much zero difference between the two.

OpenGL is slightly less strict, and there you do see differences (mostly in the drivers) between workstation and consumer cards. Quite often there was a workaround to 'soft-mod' a consumer card into a workstation card, so the driver would think it was running on a workstation card and enable the extra functionality (e.g. antialiased lines, different performance optimizations for CAD software, etc.).
I haven't heard about soft-modding in a while, so perhaps it is no longer possible (or at least, people haven't figured out how to do it). But yes, there is a difference, albeit mostly artificial.

You can use consumer cards just fine in most CAD/professional software, as baseline performance is very good anyway. Whether you think the extra performance/features are worth the huge price premium of a workstation card is up to you (some of the lower-end Quadros aren't actually very fast, and would probably be beaten on performance by equally priced consumer cards).
 

Modelworks

Lifer
Feb 22, 2007
16,240
7
76
When you say CAD you need to be more specific as to which program and version. Also, what kind of technical ability does the person who will be using the PC have?

The thing that workstation-level cards get you is support: support from the card's manufacturer and support from the software company. Companies like Autodesk spend a ton of time testing their applications with specific hardware so that people can buy those exact setups and be assured that they will not have problems.

If, while drawing out a hidden line, your user starts seeing the wrong line pattern on screen, is he going to be able to find the cause of the problem if it is in the driver for the gaming card? Will he have the time to stop work and find out?

With a professional card he can pick up the phone and call support and they will walk him through the problem and how to fix it. Call support and tell them you are using a gaming card and they will refer you to the qualified hardware list.

The problem with gaming cards is that 3D apps are not tested with every driver release. The current driver may work fine with CAD; you upgrade to the next version of the driver, and suddenly you have line-drawing problems because they changed the driver to fix a game compatibility issue. I have seen it with 3dsmax more than a few times. If you do go with a gaming card, find the driver version that works and do not upgrade it just because a new version comes out. Most people who run workstations rarely if ever update drivers or other software, because the risk of breaking what already works is too costly.

While some gaming-based graphics hardware will work with the 3D display features in AutoCAD, it is neither optimized for performance nor supported by the graphics hardware vendor for CAD use. If you encounter problems, Autodesk will not be able to obtain support from the graphics hardware vendor to fix them.

If you have the ability and the time to fix things when they break, then a gaming card makes next to no difference in pro apps most of the time. If you don't have the ability, then get a professional card.
 
Last edited:

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Ahm no, sure there may be some bugs in one driver that aren't in the other, but better accuracy? If the architecture doesn't support double-precision FP, no driver in the whole world will change anything, and if it does support it, you get exactly that accuracy.

The only reason those drivers are better is that they're better optimized for those application areas, so they're faster, but you won't magically lose a few bits of precision just because you switch from a workstation card to a gaming card of the same architecture.
ATI presumably bins the workstation cards exactly like their gaming parts: if they reach certain performance goals at certain voltage levels and don't produce any errors, they can be sold, otherwise not. The same chips with another name on them - but if a card produces errors, they won't sell it as either a gaming or a workstation card.


But as 3DVagabond showed, the drivers really do affect performance, so if you want hardware support and don't want to play with hacked BIOSes (you also don't OC servers, although you could save money there...), that's probably worth the money.

There's a big difference between hardware capability and how you use it, especially with something as imprecise as floating point numbers. If precision were a non-issue, then why do some games exhibit graphical glitches until a driver fix? For games, it's mainly an annoyance, though occasionally you get something game-breaking. For workstations, it's simply unacceptable.

The biggest issue with floating point calculation has always been precision. You literally cannot provide 100% accurate calculations when simply storing the number in the first place limits accuracy. Start executing operations and you lose a bit here and there with each step. With floating point, standards are in place to ensure minimum precision and software works around those standards. Drivers are essentially software that translate standard API calls into hardware-specific instructions. That's the whole reason they exist.
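To make that concrete, here's a tiny plain C++ sketch (nothing GPU-specific, just ordinary IEEE 754 single precision) showing both effects: the error you get simply from storing a value, and the error that builds up as you keep operating on it:

Code:
#include <cstdio>

int main() {
    // 0.1 has no exact binary representation, so the error starts at storage time.
    float x = 0.1f;
    printf("0.1 stored as a float: %.10f\n", x);

    // Every operation can round again, so the error accumulates step by step.
    float sum = 0.0f;
    for (int i = 0; i < 1000000; ++i)
        sum += 0.1f;
    printf("0.1 added a million times: %f (exact answer would be 100000)\n", sum);
    return 0;
}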

With games, nobody really cares if the precision is off slightly. Your line might be off a couple pixels, your color off by a few bits, but you don't really notice for the most part. Gaming cards and drivers are optimized for performance with good enough precision. As long as nothing is noticeably wrong, you don't need to fix it, or you at least don't need to prioritize a fix.
With workstation apps, the expectations are higher, which is why support is better in the first place. Something may not be obviously wrong, at least when displayed on screen, but if you start doing bit comparisons with your expected output, problems crop up. For some applications, that usually means when you port the result to the end of your workflow, you end up with something that's not exactly what was intended. For companies that care, that means you send everything back to the beginning and figure out what went wrong. For others, it's how you get products of inconsistent quality. That's why drivers are different, to provide better stability, compatibility, and, yes, precision. Performance is secondary because it doesn't matter if your end result is completely off, completely unusable, or you're unable to even get there.

There is no single method of executing every API function, especially since each GPU almost certainly uses an ISA incompatible with the previous chip generation's. With a modern GPU, there are also so many semi-independent cores that you're essentially looking at a many-core computation, where a myriad of issues can result from something as simple as issuing instructions in a certain order, or issuing one instruction where you should probably have split it into several.
Again, with games, you don't really care about precision, so you go with the simplest, fastest set of instructions. You don't really care about the hardware limitations unless something becomes obviously incorrect. You may not even bother conforming exactly to the standard if you find an execution path that's significantly faster but only a few bits off. With workstation applications you usually do care, depending on what you're doing, so even if it's slower, even if it's quirky, even if it requires a unique workaround, you don't simply use the same path as a card designed for gaming. Sometimes you might, but more likely you won't. Speed comes later.

For someone who doesn't care to go that far, someone just starting out, or working on a hobby, workstation-class equipment is almost certainly not cost-effective. You get better precision, better reliability, and better support, but you may not need them. For those who do care, the extra cost is worth it.
 
Last edited:

Voo

Golden Member
Feb 27, 2009
1,684
0
76
And still, if the hardware supports 52 bits of precision (i.e. standard IEEE double-precision FP) you'll get that and nothing else. It's not as if the driver would just throw away a few bits for the gaming variants.
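For reference, a quick standard C++ check of what those formats give you (the 52 stored mantissa bits of an IEEE double plus the implicit leading bit show up as 53 significant bits):

Code:
#include <cstdio>
#include <limits>

int main() {
    // digits = significant mantissa bits, including the implicit leading 1
    printf("float : %d significant bits, epsilon = %g\n",
           std::numeric_limits<float>::digits,
           std::numeric_limits<float>::epsilon());
    printf("double: %d significant bits, epsilon = %g\n",
           std::numeric_limits<double>::digits,
           std::numeric_limits<double>::epsilon());
    return 0;
}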

Sure, there are some algorithms for special cases where you can improve the overall accuracy, and that's something you'll only find in the workstation drivers (actually I very much doubt that they do that stuff in the drivers; that's something the programmer should take care of, since it's at the algorithm level and has nothing to do with the HW), but the accuracy itself stays the same.
No idea what you mean by different execution paths... do you really think the HW has different execution paths for SP/DP FP and so on? I very much doubt that. You can use different algorithms, but that's again something at the application level (how you compile it and what libs you link against), not stuff for the drivers.

And you're sure they don't write those APIs in higher-level languages, write compilers for the different architectures, and optimize afterwards? After all, they need those in either case.
Sure, there will be more errors in the gaming drivers, since they aren't tested as thoroughly, and there are the obvious performance deltas, but it's not as if the gaming variants would somehow magically lose precision.
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
And still, if the hardware supports 52 bits of precision (i.e. standard IEEE double-precision FP) you'll get that and nothing else. It's not as if the driver would just throw away a few bits for the gaming variants.

Quite possibly, yes. I can't possibly know everything that goes on at that level, but I wouldn't find it surprising to learn that some operations carried out in a different order are faster, but lose a couple of extra bits here and there.
Also, the obvious source of any errors lies in the output stages. Any data that isn't going anywhere except the display is more tolerant of calculation errors or even simple copy errors. The standards for consumer-grade parts are much lower. Workstation grade or better is generally more reliable in terms of data integrity. That's part of the reason for the higher price tag, along with the better support (supposedly).
It's the same concept as building ECC capability into memory controllers specifically for workstation/server parts, building them into a general design and sorting based on which sections are functional, or simply taking the same entire design and binning the good chips for the workstations.
Even if the better chips could probably run fine at much higher clocks (as overclockers are fond of pointing out), the reason for clocking them at equivalent or even lower speeds than the same chip binned for consumers is that errors are less likely to occur at lower speeds. Every processor eventually calculates an erroneous result; the only question is how long between errors on average. Consumers are more tolerant of the occasional glitch or blue screen. With workstation/server-level hardware, the extra precautions significantly reduce the chance of errors, and some users are willing to pay for that reliability.

As far as I know, GPUs haven't followed server processors in adding failover into the design. There probably just isn't enough die space to add the necessary functionality without significantly crippling performance. Some server-specific CPUs executed the same instructions in duplicate to ensure integrity. Some would even shut down a specific pipeline on the fly if they determined it was producing errors. Naturally, those types of features added so much cost that the designs didn't fare as well in the long run.

Sure, there are some algorithms for special cases where you can improve the overall accuracy, and that's something you'll only find in the workstation drivers (actually I very much doubt that they do that stuff in the drivers; that's something the programmer should take care of, since it's at the algorithm level and has nothing to do with the HW), but the accuracy itself stays the same.
No idea what you mean by different execution paths... do you really think the HW has different execution paths for SP/DP FP and so on? I very much doubt that. You can use different algorithms, but that's again something at the application level (how you compile it and what libs you link against), not stuff for the drivers.

And you're sure they don't write those APIs in higher-level languages, write compilers for the different architectures, and optimize afterwards? After all, they need those in either case.

Yes, that's essentially how a graphics driver works these days: it's a just-in-time compiler. The driver takes API calls and compiles them at runtime into machine code. No software company compiles down to GPU binary; they basically meet halfway and the GPU manufacturer takes it from there. That's the whole point of an API in the first place.
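To illustrate the "meet halfway" part: on the D3D side, the application (or its tool chain) only ever compiles HLSL down to vendor-neutral bytecode, and the driver JIT-compiles that into the GPU's native ISA. A minimal sketch using the D3DCompile API (error handling trimmed, the shader is just a placeholder):

Code:
#include <d3dcompiler.h>   // link with d3dcompiler.lib
#include <cstdio>

// A trivial pixel shader in HLSL source form (placeholder for illustration).
static const char kShader[] =
    "float4 main() : SV_Target { return float4(1, 0, 0, 1); }";

int main() {
    ID3DBlob* bytecode = nullptr;
    ID3DBlob* errors = nullptr;

    // Compile HLSL -> portable bytecode. This is as far as the application goes;
    // the GPU driver later JIT-compiles this bytecode into its own machine code.
    HRESULT hr = D3DCompile(kShader, sizeof(kShader) - 1, nullptr, nullptr, nullptr,
                            "main", "ps_5_0", D3DCOMPILE_OPTIMIZATION_LEVEL3, 0,
                            &bytecode, &errors);
    if (FAILED(hr)) {
        if (errors) printf("%s\n", (const char*)errors->GetBufferPointer());
        return 1;
    }
    printf("bytecode blob: %zu bytes\n", bytecode->GetBufferSize());
    bytecode->Release();
    return 0;
}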

As for different execution paths, that's due primarily to different GPU architectures. A GPU isn't quite the same as a CPU, not yet, so you can't treat, say, a Radeon HD5870 chip as 1600 really small but really fast independent processors. Each processing unit is probably better compared to an execution unit in a modern CPU, though that's really hard to say with the latest generations. At the same time, you can't call a Radeon HD5870 a 1600-issue CPU. The closest analogy is probably 20 really wide VLIW CPUs. I don't know for sure; I'm already getting into a topic where I know very little.
So while a single instruction more than likely passes through only one stream processor at a time, the result is not guaranteed to stay there. For all we know, data that requires multiple passes through one processor is simply handed off to another processor, assembly-line style. You're also not guaranteed to have each atomic step in your algorithm, or even your API call, translate into one GPU instruction. I'm certain each step down breaks into multiples.
One instruction that would normally go to one execution unit on a modern CPU may end up getting split among multiple stream processors or recycled on the same one. The order in which you execute operations can affect the end result for floating point. It may also affect performance, though the idea is to flood each processor so that order doesn't matter.
Anyway, since the driver is the one doing the actual talking, you have only the GPU vendor's assurance that your code will render exactly what you asked. There's a lot of data that gets sent off for crunching, but you don't bother asking for it back to check its integrity.
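Here's a small plain C++ illustration of that order dependence: summing the exact same numbers sequentially versus split into two halves (roughly the way work divided across units might be combined) doesn't have to give bit-identical results. The data is made up, but the effect is general:

Code:
#include <cstdio>
#include <vector>

// Sum left to right, the way a single sequential unit might.
float sum_sequential(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Sum the two halves separately and combine, the way split-up work might be.
float sum_split(const std::vector<float>& v) {
    float a = 0.0f, b = 0.0f;
    const size_t half = v.size() / 2;
    for (size_t i = 0; i < half; ++i)        a += v[i];
    for (size_t i = half; i < v.size(); ++i) b += v[i];
    return a + b;
}

int main() {
    // Mixing large and small magnitudes makes the rounding differences visible.
    std::vector<float> v;
    for (int i = 0; i < 100000; ++i)
        v.push_back((i % 2) ? 1e-3f : 1e4f);

    printf("sequential: %.3f\n", sum_sequential(v));
    printf("split:      %.3f\n", sum_split(v));   // usually not bit-identical
    return 0;
}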

Optimization always involves moving, deleting, or even adding instructions. Compilers have options you can toggle that will vary the resulting binary code, changing the order of operations or the presence of error checks. The tightest optimizations result in less code and fewer checks, but you're almost guaranteed to have boundary or round-off errors. Usually, you find a middle ground that's reasonably stable, then work out the bugs as you find them.
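The same knobs exist in ordinary CPU compilers, which is a fair mental model for what a driver's shader compiler is allowed to do. For example, this little C++ program can give different answers depending on whether it's built with strict floating-point settings or a relaxed mode like GCC/Clang's -ffast-math or MSVC's /fp:fast:

Code:
#include <cstdio>
#include <cmath>

int main() {
    // volatile keeps the compiler from folding everything away at compile time.
    volatile double a = 1e16, b = -1e16, c = 1.0;

    // Evaluated strictly as written this is (1e16 - 1e16) + 1 = 1. A relaxed
    // FP mode may reassociate it as a + (b + c), and the 1.0 is then lost in
    // the rounding of -1e16 + 1.
    double sum = (a + b) + c;
    printf("order-sensitive sum: %f\n", sum);

    // Relaxed modes also tend to assume NaN never occurs, so this standard
    // x != x self-test for NaN may be optimized into a constant "no".
    volatile double minus_one = -1.0;
    double nan = std::sqrt(minus_one);
    printf("NaN detected via x != x: %s\n", (nan != nan) ? "yes" : "no");
    return 0;
}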

Whether or not Nvidia or AMD really do purposely lower precision for gaming card drivers, I don't know. I don't work for either of them. Still, it's not entirely outside the realm of probability. It's also not entirely outside the realm of probability that both drivers are essentially the same, with just a few checks here and there querying the GPU for a model string.
However, judging by the performance variations in certain applications and games with the same chip, one marked for gaming one marked for workstation, it's safe to assume the code is significantly different.
Lowering precision slightly is simply one possible method to speed up code for an application that probably doesn't care. Another possibility is that the tweaks meant to speed up specific mixes of API calls result in optimizations that can't be shared with a significantly different mix.
If precision were the primary difference between the two drivers, then a gaming card would outperform a workstation card in all applications, though it would produce errors in all of them. That is not the case, so it is more likely that manufacturers follow the common-sense approach of simply optimizing for performance in specific applications. For workstation cards, precision is a careful consideration while optimizing, whereas for gaming cards it's simply a reasonable necessity.

Sure, there will be more errors in the gaming drivers, since they aren't tested as thoroughly, and there are the obvious performance deltas, but it's not as if the gaming variants would somehow magically lose precision.

That's pretty much what's been reiterated in this thread. Gaming cards are not guaranteed for more than reasonable operation and very likely produce more errors than workstation cards. Top to bottom and in between, gaming cards are more tolerant of errors, even in key areas where accuracy really is necessary no matter what you're running. It's simply not a consideration unless it produces glaring problems.
Buying a workstation-class card, even though it uses the same chip design, essentially eliminates the possibility of purposeful imprecision and comes with the understanding that accidental imprecision beyond standard tolerances will be fixed rather than ignored just because nobody really notices. Again, for some users, it's worth the extra cost for peace of mind and guaranteed results. Some applications will also query cards for the appropriate model type. For most people, it's probably not worth it.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
As far as I know, GPUs haven't followed server processors in adding failover into the design. There probably just isn't enough die space to add the necessary functionality without significantly crippling performance.

Fermi has single-error correction and double-error detection.
Although I don't know if that's enabled on all models, including the consumer ones.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Whether or not Nvidia or AMD really do purposely lower precision for gaming card drivers, I don't know. I don't work for either of them. Still, it's not entirely outside the realm of probability.
And that's the only thing I disagree with here. Lower performance? Yep. More bugs? Probably. But there's no way for a GPU to just compute the first 20 bits of precision and then stop - so precision really is the one thing you simply get from your hardware. Both Nvidia and ATI follow the IEEE specs more or less, so you'll get exactly the results that have been specified.

Sure, you can use less strict algorithms, but that's not something the driver does; that depends on what other libraries you link and how you compile the code.

Anyway I don't think that point is important, we should stop here..
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Sure, you can use less strict algorithms, but that's not something the driver does; that depends on what other libraries you link and how you compile the code.

Well, the code is compiled by the driver, so it could cut some corners here and there (e.g. it may replace expensive operations such as sqrt or trigonometry with fast partial approximations; you can do the same with SSE on a CPU, for example. Various compilers can also compile floating-point code either for maximum speed or for maximum strictness).
However, I think in most cases, visual rendering bugs are exactly that: bugs. Not a result of deliberate shortcuts on precision. Just unexpected results in unexpected situations.
Because generally the margin of error is very small with these sorts of compiler tricks... and at the end of the day, RGB only has 8-bit to 10-bit precision on your screen. Even single-precision floats have 24 bits of precision, so you would need to lose a lot of bits of accuracy along the way before you see clear flickering and other obvious visual bugs.
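For a concrete sense of that kind of corner-cutting, here's a rough C++/SSE sketch: the rsqrtss estimate is only good to about 12 bits, and one Newton-Raphson step recovers most (not all) of full single precision while often still being cheaper than an exact sqrt plus divide:

Code:
#include <cstdio>
#include <cmath>
#include <xmmintrin.h>   // SSE intrinsics

// Approximate 1/sqrt(x): hardware estimate (~12 bits) plus one Newton-Raphson
// refinement step. Close to full float precision, but not bit-exact.
float fast_rsqrt(float x) {
    float est = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return est * (1.5f - 0.5f * x * est * est);
}

int main() {
    float x = 3.0f;
    float exact  = 1.0f / std::sqrt(x);
    float approx = fast_rsqrt(x);
    printf("exact : %.9f\napprox: %.9f\nerror : %.2e\n",
           exact, approx, std::fabs(approx - exact));
    return 0;
}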

Even back when GPUs only had 16-bit mantissas (Radeon 9x00 series), you never really saw such issues. The difference between GPUs with 16-bit and 24-bit mantissas is generally almost impossible to notice.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Well, the code is compiled by the driver, so it could cut some corners here and there (e.g. it may replace expensive operations such as sqrt or trigonometry with fast partial approximations; you can do the same with SSE on a CPU, for example. Various compilers can also compile floating-point code either for maximum speed or for maximum strictness).
Sure, but that's the compiler. From my understanding (I've only played a bit with CUDA, so that's hardly complete), you compile the program into some kind of universal machine language that the driver translates into the actual machine language usable by the card, which sounds more like an interpreter than a compiler to me, and it also happens at runtime.


Sure they could replace code with a faster, "good enough" approximation, but all in all, isn't that exactly what compilers are for? I'd assume some game optimizations work that way, but I'd hope they work more on stuff a compiler can't do - after all, if someone compiles their code with a strict-math flag, you'd hope they have a reason for that.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Sure, but that's the compiler. From my understanding (I've only played a bit with CUDA, so that's hardly complete), you compile the program into some kind of universal machine language that the driver translates into the actual machine language usable by the card, which sounds more like an interpreter than a compiler to me, and it also happens at runtime.

No, that's called a compiler as well.
It's similar to how Java/.NET work: the 'binary' that is generated at compile time is universal bytecode, and the actual native code is compiled just-in-time by the 'virtual machine' (in this case, the display driver).
This is where certain optimizations/replacements can occur, or even complete shader replacement, if the application is known beforehand.

Sure they could replace code with a faster, "good enough" approximation, but all in all, isn't that exactly what compilers are for?

No, the universal bytecode is just a compact way to store the program code for final compilation.
Certain instructions in the bytecode may not exist in hardware at all, and need to be replaced by the driver with the proper functionality. E.g., in the old days there was no actual implementation of sin/cos. It was a valid instruction in D3D shaders, but it was replaced by a simple Taylor polynomial approximation by the driver. Obviously the driver has a bit of control over the accuracy of the approximation. It also has a bit of control over exactly where to insert the instructions.
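As a rough illustration of the idea (not any vendor's actual code), a driver hitting a sin instruction with no hardware implementation could splice in a short polynomial like the one below; where the series is truncated decides how many bits of accuracy survive:

Code:
#include <cstdio>
#include <cmath>

// Truncated Taylor series for sin(x): x - x^3/3! + x^5/5! - x^7/7!.
// Reasonable on roughly [-pi/2, pi/2]; a real implementation would also
// range-reduce the argument first. Purely illustrative.
float sin_taylor(float x) {
    float x2 = x * x;
    return x * (1.0f - x2 / 6.0f * (1.0f - x2 / 20.0f * (1.0f - x2 / 42.0f)));
}

int main() {
    for (float x = -1.5f; x <= 1.5f; x += 0.5f)
        printf("x=%5.2f  taylor=%.7f  libm=%.7f\n", x, sin_taylor(x), std::sin(x));
    return 0;
}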
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Thanks for the explanation - yep, makes sense from that point of view.
So the DX (and co.) specs are loose enough to allow many different kinds of implementations, which means you can't optimize even those general things beforehand.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Thanks for the explanation - yep, makes sense from that point of view.
So the DX (and co.) specs are loose enough to allow many different kinds of implementations, which means you can't optimize even those general things beforehand.

Yes, instruction-level optimization with shaders is pretty much impossible.
You can do general optimizations, such as picking algorithms that use the operations which will generally be the cheapest... and trying to get a good balance between texture and arithmetic workload... but other than that, it depends a lot on the underlying hardware and on how good the driver's compiler is.

Sometimes it can be downright weird.
For example, on my Intel IGP, I have shaders which are faster when I use D3D9 than when I use D3D10/11. They are the exact same shaders (literally, the same source code files passed to the compiler), but apparently the D3D9 compiler/driver is better optimized.
On nVidia and AMD hardware, I don't notice that difference.