News Crytek CryEngine shows raytracing technology demo (runs on Radeon RX Vega 56)


BFG10K

Lifer
Aug 14, 2000
22,709
2,972
126
Tech demos are shown all the time as proof of concept, but often do not carry the same performance when an actual game is built on top of it.
Yep, and Port Royal falls into that exact category. Yet we have people using it as proof that Volta gets "destroyed" by RTX. Meanwhile in an actual game (BF5) we see RTX is at best 45% faster.

The easiest way to see that some of the arguments above are nonsense is to consider how much R&D and die space on these chips NV is devoting to the RT cores. They would not be doing that if the cores didn't do something really quite valuable.
Oh, I'm sure they were created for something valuable, just not for ray tracing. It's looking more and more like RT / Tensor is aimed at workstation purposes, but they've been bludgeoned into consumer space and passed off as "gaming features", emperor's-new-clothes style.

Presumably all parties involved (nVidia, Dice, 4A, etc) tested the games at least once before release/patching. So if we assume all parties aren't legally blind and understand elementary benchmarking, there's absolutely no way they could be surprised at how horrifically DXR and DLSS have failed in practice.

So the only logical conclusion is that it was done on purpose, in the hope the sheeple wouldn't notice under the "10 gigarays!" and "it just works!" buzzwords.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Professional workstation rendering I think it was? They didn't hide that. The Tensor cores have very obvious/massive AI applications.

RTX has mostly suffered so far from still not being quite fast enough to run all that well, even with the non-trivial speed-up all this extra hardware gives.
(Plus limited experience both in using the hardware in rendering engines and with the actual effects in games.)
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,972
126
Professional workstation rendering I think it was? They didn't hide that. The Tensor cores have very obvious/massive AI applications.
Sure they did, when they tried to sell them as gaming features, and charged gamers exorbitant workstation pricing to fund said features. That's why their financials tanked.

RTX: it just works...just flip a switch and it works automatically everywhere with no development effort...no more game hacks will be required - all lies.

DLSS: it just works...no mention of API/resolution/RTX/hardware locks...4K/64xSSAA quality at much higher performance - all lies.

Instead what we got was badly upscaled reflective puddles @ 1080p60, and usage combinations that required nVidia's blessing before they could be used.

And suddenly the concept of high refresh/FPS/resolution completely disappeared from nVidia's vocabulary. Meanwhile they were still selling $2000 4K/144 monitors.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
Yep, and Port Royal falls into that exact category. Yet we have people using it as proof that Volta gets "destroyed" by RTX. Meanwhile in an actual game (BF5) we see RTX is at best 45% faster.
So how far above 45% does it have to be before you can call it destroyed?
Also, BF5 was the case you talk about later on: they just flipped the switch in the engine and were running uncontrolled raytracing. For all we know it could be the same deal as way back with tessellation, where they had tessellation active on the whole map even though most of the map was covered and wasn't even showing any tessellation.

It does "just work" as nvidia advertised but obviously you are buying a GPU from 2019 and not one from 2219.
Raytracing is hard to do.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Yep, and Port Royal falls into that exact category. Yet we have people using it as proof that Volta gets "destroyed" by RTX. Meanwhile in an actual game (BF5) we see RTX is at best 45% faster.

The argument was about absolute performance going down from a tech demo to an actual game. Your argument is about relative performance with and without RTX, which is a different matter.

But even if we go with a 45% relative performance gain for a second, even though this might be at the lower end of the spectrum: it means that, with the current progress of technology, you can achieve the same raytracing performance using RT cores a few years(!) earlier compared to using more general compute resources.
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
The argument was about absolute performance going down from a tech demo to an actual game. Your argument is about relative performance with and without RTX, which is a different matter.
His argument is about BF5 running with RTX enabled on both Turing and Volta - Titan RTX is roughly 45% faster than Titan V with Ultra reflections. The advantage drops to 25% when Low preset is used.

[image: BF5 DXR benchmark chart - Titan RTX vs Titan V]


Tech demos may paint the best case scenario, but benchmarks may end up painting the worst case scenario instead, since at the moment games are hybrid rendering environments with different raster & ray tracing ratios.

Volta vs. Turing is also the best comparison we have, but may not be accurate enough anyway, as there may be extra memory subsystem tweaks in Turing that further enhance RT performance. (We don't even know how many Turing general compute cores we could cram into the same area as TU102; any significant difference in core count and/or operating clocks can significantly alter the numbers above.)

But even if we go with a 45% relative performance gain for a second, even though this might be at the lower end of the spectrum: it means that, with the current progress of technology, you can achieve the same raytracing performance using RT cores a few years(!) earlier compared to using more general compute resources.
The argument being made here and in other threads was that the current raster / ray tracing ratio used in games looks like it may favor tuned general compute resources instead of specialized hardware, with the advantage of offering more "horsepower" across the entire GPU lineup in raster-heavy games, and the disadvantage of a bigger performance penalty once the raster / RT ratio shifts significantly in the future.

As it stands now Turing seems to be in an awkward position: it pushes specialized hardware in an underdeveloped software environment, and does so with considerable sacrifices in both acquisition price and overall performance.

Unfortunately I doubt this discussion can productively go anywhere; we simply lack the tech details needed even for a basic analysis, so we're bound to support our favorite prediction over fact-based conclusions.
 
Last edited:

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Explosions, bokeh, filtering, occlusion, etc. are all graphics resources. AI, scripting, sound, networking, etc. may have little to do with graphics but all still impact performance. Either way, you completely missed the entire point of what I was saying. Tech demos are designed to look and run awesome. It's a very controlled scenario putting a best foot forward to highlight small aspects of a larger picture. Just because a tech demo can run 30fps on a Vega 56 does not mean a Vega 56 or a 1080 Ti or whatever will run a full-featured game based on the same engine.

Knee-jerk reactions to a video of an unreleased tech demo not yet running a game, or even having an announced game, on the market is exactly that. Knee-jerk.

For sure. Tech demos have always just been that. My only point was that the CPU should have zero impact on how the RT was being done. I would not be surprised if Crytek unveils a game at this year's E3 (or maybe next year's) which uses this technology.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
His argument is about BF5 running with RTX enabled on both Turing and Volta - Titan RTX is roughly 45% faster than Titan V with Ultra reflections. The advantage drops to 25% when Low preset is used.

This does not contradict my statement that the original argument regarding tech demos was related to a general performance drop on the same HW, while here we are comparing two implementations relative to each other.

The argument being made here and in other threads was that the current raster / ray tracing ratio used in games looks like it may favor tuned general compute resources instead of specialized hardware, with the advantage of offering more "horsepower" across the entire GPU lineup in raster-heavy games, and the disadvantage of a bigger performance penalty once the raster / RT ratio shifts significantly in the future.

And I was stating that, even considering the ratios used in a first-generation game, which only utilizes raytracing for reflections on a limited amount of geometry, we already see a speed-up of 45% - which equates to several years before we see the same performance using general compute units for raytracing. That's not what I would call "favor general compute".
The achievable speed-up is much higher if more ray-tracing is used, for instance for global illumination and not just reflections - now you may want to translate this into years again.
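
To put rough numbers on the "several years" claim, here is a minimal sketch; the 15-30% yearly improvement rates are my own assumption, not figures from this thread:

```python
# Rough sketch: years of compounded general-compute improvement needed to
# close a given RT-core speed advantage. The yearly rates are assumptions.
import math

def years_to_catch_up(advantage, yearly_gain):
    """Years until (1 + yearly_gain)^years >= 1 + advantage."""
    return math.log(1 + advantage) / math.log(1 + yearly_gain)

for advantage in (0.45, 1.0):         # 45% (BF5-style) and 100% (heavier RT) advantage
    for yearly_gain in (0.15, 0.30):  # assumed yearly general-compute improvement
        years = years_to_catch_up(advantage, yearly_gain)
        print(f"{advantage:.0%} advantage, {yearly_gain:.0%}/year -> {years:.1f} years")
```

Under those assumed rates, a 45% gap corresponds to roughly 1.5-2.5 years and a 100% gap to roughly 2.5-5 years of general-compute progress.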
 
Last edited:

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
This does not contradict my statement that the original argument regarding tech demos was related to a general performance drop on the same HW, while here we are comparing two implementations relative to each other.
It was a clarification. I wasn't aware of any need for contradiction.

And I was stating that, even considering the ratios used in a first-generation game, which only utilizes raytracing for reflections on a limited amount of geometry, we already see a speed-up of 45% - which equates to several years before we see the same performance using general compute units for raytracing. That's not what I would call "favor general compute".
And I addressed your point by stressing the difference between comparing RTX vs non-RTX and comparing Volta vs. Turing - it's not apples and oranges but it's not apples vs. apples either, since a Turing equivalent w/o RTX cores might still be faster than Volta due to a number of reasons:
  • possibly higher compute core count (TU102 has 4608 units, GV100 has 5120 but comes with extra FP64 capabilities and a higher number of tensor cores)
  • possibly better memory subsystem (more cache, maybe faster)
  • possibly higher clocks
All of the factors above can have significant performance impact. What happens if a non-RTX implementation of Turing ends up being 20-30% faster than Titan V for the same size as TU102? Would that favor general compute?
 

Timmah!

Golden Member
Jul 24, 2010
1,429
638
136
Please notice how much RT cores accelerate rendering with Octane render. Almost 3x faster than without. Unless you think the Turing chip could have 3x more standard CUDA cores if not for the RT cores, I think it's safe to say that the current solution with fixed-function RT cores is simply faster at raytracing than the (currently viable) alternative without them.

At the same time, it needs to be said that this 3x speed-up is under ideal conditions, when the rendered scene is polygon-heavy. That is the use case where the RT cores shine. Apparently, in many cases the speed-up is just a meager 20 percent (1.2x)... mostly in scenes which are not geometry-heavy enough. Those scenes would very likely benefit from more CUDA cores at the expense of RT cores, since CUDA cores still do the shading... so obviously, more CCs = faster shading.

Just leaving it here for anyone not following Octane render and this kind of stuff, the point being that things are not quite black and white in the raytraced world. But ultimately, if this is what we are moving toward, it's IMO a good start. I for one definitely look forward to the RTX tech being implemented in the regular production Octane version, not just the benchmark as it is for now.

Regarding Crysis, apparently that's tech called SVOGI, or VXGI; in other words it's based on voxels, so it's not quite raytracing, even though it could be that so-called "voxel cone tracing" - in which case RT cores could probably be programmed to accelerate it the same way they accelerate raytracing. Anyway, it definitely looks great, comparable to raytraced reflections in BF5, and if it comes at negligible performance impact, then awesome. I would not say it makes RTX tech pointless or obsolete; it just means you have another, less taxing way of doing nice realistic reflections, so you can use raytracing for other stuff like GI or shadows, which IMO are the more important anyway when it comes to realistic graphics.

 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
All of the factors above can have significant performance impact. What happens if a non-RTX implementation of Turing ends up being 20-30% faster than Titan V for the same size as TU102? Would that favor general compute?

I am sure that the NVidia engineers did precisely such an estimation when deciding to go for RT cores. Current indications are that the 45% speed-up is on the low side, and your estimate of a 20-30% performance gain at iso-area looks to be on the high side of the spectrum. Still a discrepancy of 15-25%, mind you, which cannot be easily mitigated within the same technology.

Also not sure why you take BF5 as the reference - a title where only reflections on reflective surfaces are considered. Why not take a title like Metro Exodus as the reference, where all the GI calculations are done on RT cores and where they have to shoot a photon at almost each pixel? As far as I am concerned, using raytracing for GI and shadows is much more impactful than reflections.
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,747
4,689
136
I am sure that the NVidia engineers did precisely such an estimation when deciding to go for RT cores. Current indications are that the 45% speed-up is on the low side, and your estimate of a 20-30% performance gain at iso-area looks to be on the high side of the spectrum. Still a discrepancy of 15-25%, mind you, which cannot be easily mitigated within the same technology.

Also not sure why you take BF5 as the reference - a title where only reflections on reflective surfaces are considered. Why not take a title like Metro Exodus as the reference, where all the GI calculations are done on RT cores and where they have to shoot a photon at almost each pixel? As far as I am concerned, using raytracing for GI and shadows is much more impactful than reflections.
Glad you brought up Metro Exodus.

Take a look at this slide.
https://www.anandtech.com/show/1410...n-pascal-gpus-dxr-support-in-unity-and-unreal
[image: Nvidia GDC slide - Metro Exodus frame time breakdown]


What I find very strange is they're comparing the RTX 2080 using the RT cores and not using them.
Notice the frame before the RT cores (green) come into play. Turing (no RT) has an identical shape to Turing (with RT), as would be expected. The only difference is that the job appears to finish a lot faster (66% of the time?) even though only integer and float cores are in use and we're speaking of the identical frame.

This should be equal. A similar thing is seen after the RT core usage. The (with RT) run appears to have compressed the work to give less time taken.

I strongly feel we're being manipulated with some 1st class BS.
 
Last edited:

Muhammed

Senior member
Jul 8, 2009
453
199
116

Just read the AnandTech description:


In this example, NVIDIA showed a representative frame of Metro Exodus using DXR for global illumination. The top graph shows a Pascal GPU, with only FP32 compute, having a long render time in the middle for the effects. The middle bar, showing an RTX 2080 but could equally be a GTX 1660 Ti, shows FP32 and INT32 compute working together during the RT portion of the workload and speeding the process up. The final bar shows the effect of adding RT cores to the mix, and tensor cores at the end.
 

Ajay

Lifer
Jan 8, 2001
15,468
7,871
136
Glad you brought up Metro Exodus.

Take a look at this slide.
https://www.anandtech.com/show/1410...n-pascal-gpus-dxr-support-in-unity-and-unreal
[image: Nvidia GDC slide - Metro Exodus frame time breakdown]


What I find very strange is they're comparing the RTX 2080 using the RT cores and not using them.
Notice the frame before the RT cores (green) come into play. Turing (no RT) has an identical shape to Turing (with RT), as would be expected. The only difference is that the job appears to finish a lot faster (66% of the time?) even though only integer and float cores are in use and we're speaking of the identical frame.

This should be equal. A similar thing is seen after the RT core usage. The (with RT) run appears to have compressed the work to give less time taken.

I strongly feel we're being manipulated with some 1st class BS.
Shocking! Could it be that Nvidia would manipulate data for their press deck? ;)
 

poohbear

Platinum Member
Mar 11, 2003
2,284
5
81
What's the big deal if it's not an open world, non-fixed, real-time demo? Can't any modern hardware render a fixed ray tracing scene? It's the open world, real-time ray tracing that brings hardware to its knees. We've already seen photorealistic computer graphics for set scenes in movies, but try to play a game in a free, open, and real-time environment and all modern hardware chokes.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Glad you brought up Metro Exodus.

Take a look at this slide.
https://www.anandtech.com/show/1410...n-pascal-gpus-dxr-support-in-unity-and-unreal
[image: Nvidia GDC slide - Metro Exodus frame time breakdown]


What I find very strange is they're comparing the RTX 2080 using the RT cores and not using them.
Notice the frame before the RT cores (green) come into play. Turing (no RT) has an identical shape to Turing (with RT), as would be expected. The only difference is that the job appears to finish a lot faster (66% of the time?) even though only integer and float cores are in use and we're speaking of the identical frame.

This should be equal. A similar thing is seen after the RT core usage. The (with RT) run appears to have compressed the work to give less time taken.

I strongly feel we're being manipulated with some 1st class BS.

I believe this graph is correct; you just have to realize that the last frame (Turing RTX) is not rendered at the same resolution as the first two (Pascal & Turing) because we have DLSS enabled.
So, frames 1 and 2 are rendered for example at 4K and frame 3 (Turing RTX) is rendered at 1080p and then up-scaled to 4K with the Tensor Cores (DLSS).
This is why both the start and the end of frame 3 (Turing RTX) are not the same as the first two.
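
A back-of-the-envelope sketch of why the non-RT portions shrink in that bar (the 4K and 1080p figures are the example resolutions from the post above; the assumption that shading cost scales roughly with pixel count is mine):

```python
# If DLSS renders internally at 1080p and upscales to 4K, only a quarter of the
# pixels are shaded (assuming shading cost scales ~linearly with pixel count).
native_4k = 3840 * 2160        # pixels in the native output frame
internal_1080p = 1920 * 1080   # pixels actually shaded before upscaling

ratio = internal_1080p / native_4k
print(f"Shaded pixels: {internal_1080p:,} of {native_4k:,} ({ratio:.0%} of native)")
# -> 25%, which is why the non-RT parts of the "Turing RTX" bar look compressed
#    next to the Pascal / Turing-without-RT-cores bars.
```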
 

Muhammed

Senior member
Jul 8, 2009
453
199
116
I believe this graph is correct; you just have to realize that the last frame (Turing RTX) is not rendered at the same resolution as the first two (Pascal & Turing) because we have DLSS enabled.
Exactly.
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
Also not sure why you take BF5 as the reference - a title where only reflections on reflective surfaces are considered. Why not take a title like Metro Exodus as the reference, where all the GI calculations are done on RT cores and where they have to shoot a photon at almost each pixel? As far as I am concerned, using raytracing for GI and shadows is much more impactful than reflections.
I'm taking BF5 as the reference because it's the only real game I've seen benchmarked with both Turing and Volta with ray tracing enabled. Metro Exodus would be a useful addition. I too feel GI and shadows are much more important than reflections, but my limited experience says these can be implemented with a lower performance impact (GI especially should respond better to downsampling and other performance-enhancing tricks which would otherwise likely ruin image quality in reflections).

I am sure that the NVidia engineers did precisely such an estimation when deciding to go for RT cores. Current indications are that the 45% speed-up is on the low side, and your estimate of a 20-30% performance gain at iso-area looks to be on the high side of the spectrum. Still a discrepancy of 15-25%, mind you, which cannot be easily mitigated within the same technology.
RT cores are definitely faster than general cores, much faster than 45%. Use the GPU to render a fully ray-traced scene and Turing probably obliterates Volta, but that process won't be in real time anymore. If we want real-time, we use hybrid rendering, with RT taking up only a fraction of the time needed to render the frame. The rest of the time is directly dependent on raster performance. In the example from Nvidia the RT portion takes only roughly 10% of the frame time.

[image: Nvidia example frame - RT portion vs raster portion of frame time]


If we were to assume Turing was 100% faster at RT than Volta, the frame above would only take about 10% longer to render on Volta. We can tilt this ratio to favor RT cores, but the problem is that when doing so we end up tanking overall performance hard enough that only benchmarks can afford to do it.
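
A minimal sketch of that frame-time arithmetic, using the post's own 1:9 RT-to-raster split and the assumed 2x RT-speed gap:

```python
# Toy model of the hybrid frame above: 1:9 RT-to-raster split (from the post),
# and a card whose RT portion runs at half speed (the assumed 100% gap).
rt_time, raster_time = 0.1, 0.9              # Turing-with-RT-cores frame split
slow_rt_frame = rt_time * 2.0 + raster_time  # same frame if RT runs 2x slower

base_frame = rt_time + raster_time
print(f"Frame time {base_frame:.2f} -> {slow_rt_frame:.2f} "
      f"(+{slow_rt_frame / base_frame - 1:.0%})")   # ~ +10% frame time
```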

As for my estimates of a 20-30% perf gain at iso-area, keep in mind I was talking about a jump from Volta to non-RTX Turing - from a compute-oriented arch to a "nimble" gaming-oriented product. Take a look at them side by side and judge for yourself:

[image: Volta (GV100) vs Turing (TU102) side-by-side comparison]


There's FP64 and Tensor cores in there which could arguably make plenty of room for more compute units even with added FP16 units. I'm inclined to believe a 10% increase going from Volta to non-RTX Turing would not only be possible, but rather on the low side of the spectrum. Then comes the frequency question. Titan RTX lists 10%+ higher base clocks and 20%+ higher boost clocks over Titan V. Again, a 10% increase in clocks in power-limited scenarios seems adequate. Add these two together and a 20% perf increase is already probable.

We should also keep in mind that performance increases in general compute cores pay off twice in frame times (as opposed to specialized hardware), as they lower both RT and raster time. For the example above (with a 1:9 RT to raster ratio), a 20% increase in overall performance may allow the general compute card to spend 100-200% more time on ray-tracing than it did before, assuming iso performance targets. On the other side, a 200% increase in RT core performance would yield a ~7% improvement in FPS.
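
The same toy model, comparing the two scenarios in the paragraph above (numbers taken from the post; purely illustrative):

```python
# Same toy frame (1:9 RT-to-raster), comparing the two scenarios above.
rt, raster = 0.1, 0.9
base_frame = rt + raster

# Scenario 1: +20% general compute speeds up both portions; at an iso
# frame-time target, the time freed on raster can be re-spent on ray tracing.
fast_rt, fast_raster = rt / 1.2, raster / 1.2
rt_budget = base_frame - fast_raster               # RT budget at the old frame time
print(f"+20% general compute: RT time budget grows {rt_budget / fast_rt - 1:.0%}")

# Scenario 2: +200% RT-core performance shrinks only the RT slice.
frame_with_faster_rt = rt / 3.0 + raster
print(f"+200% RT cores: FPS gain {base_frame / frame_with_faster_rt - 1:.1%}")  # ~7%
```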

It's only when the RT ratio starts to go up, when we use it for reflections, GI, shadows and maybe more at the same time, that the numbers make a lot more sense for specialized hardware, but one can argue that won't be happening in the near future, as today's RTX cards wouldn't be able to handle such a load anyway. For now all we can reasonably do is take a look at RT-enabled games running with decent FPS and try to extrapolate where that threshold is. To me, the 45% performance advantage Turing has over Volta is not convincing enough, for the reasons outlined above (especially considering we can turn it into a 25% perf advantage if we run Low RT details).

Give me an offline render scenario and I'll beg for Turing. Give me a modern, fast paced game, and I'll start asking questions.
 
Last edited:

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I'm taking BF5 as the reference because it's the only real game I've seen benchmarked with both Turing and Volta with ray tracing enabled. Metro Exodus would be a useful addition. I too feel GI and shadows are much more important than reflections, but my limited experience says these can be implemented with a lower performance impact (GI especially should respond better to downsampling and other performance-enhancing tricks which would otherwise likely ruin image quality in reflections).

The Metro Exodus developer already reasoned about this - they already limit the number of rays they are shooting such that it still gives great results. Still, on non-RTX hardware it would take upwards of 60% of the frame time (as you can also see in the pictures linked here).

RT cores are definitely faster than general cores, much faster than 45%. Use the GPU to render a fully ray-traced scene and Turing probably obliterates Volta, but that process won't be in real time anymore. If we want real-time, we use hybrid rendering, with RT taking up only a fraction of the time needed to render the frame. The rest of the time is directly dependent on raster performance. In the example from Nvidia the RT portion takes only roughly 10% of the frame time.

And without RT cores, the RT portion running on general shader cores takes up 60% of the frame time. Amdahl's law tells us that the overall speed-up would be close to 100% - which is precisely what we are seeing.
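
A quick Amdahl's-law check of that figure (only the 60% share comes from the posts above; the RT-core speed-up factors are illustrative assumptions):

```python
# Amdahl's law: overall speed-up when only the RT share of the frame is accelerated.
def amdahl_speedup(fraction, factor):
    """Overall speed-up when `fraction` of the work runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

rt_fraction = 0.6                    # RT share of the frame on general shaders (from the posts)
for factor in (4, 6, float("inf")):  # illustrative RT-core speed-up factors
    print(f"RT portion {factor}x faster -> overall {amdahl_speedup(rt_fraction, factor):.2f}x")
# 6x on the RT slice already gives ~2.0x overall (~100%); the hard cap is 2.5x.
```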

We should also keep in mind that performance increases in general compute cores pay off twice in frame times (as opposed to specialized hardware), as they lower both RT and raster time. For the example above (with a 1:9 RT to raster ratio), a 20% increase in overall performance may allow the general compute card to spend 100-200% more time on ray-tracing than it did before, assuming iso performance targets. On the other side, a 200% increase in RT core performance would yield a ~7% improvement in FPS.

It's only when the RT ratio starts to go up, when we use it for reflections, GI, shadows and maybe more at the same time, that the numbers make a lot more sense for specialized hardware, but one can argue that won't be happening in the near future, as today's RTX cards wouldn't be able to handle such a load anyway.

Just take the Metro Exodus example. We are seeing the RT portion going down from 60% frame time to 10% or so, resulting in an overall speed-up of roughly 100%. It's clear that adding an equivalent amount of general compute resources cannot be a solution. It also shows that RT taking up 60% of the frame time is already feasible today - because this part can be sufficiently sped up thanks to RT cores. We are not talking about a distant future where the RT part takes possibly >90% of the frame time - and even higher speed-ups can be observed (as we see currently in the off-line render cases).
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
The Metro Exodus developer already reasoned about this - they already limit the number of rays they are shooting such that it still gives great results. Still, on non-RTX hardware it would take upwards of 60% of the frame time (as you can also see in the pictures linked here).
With Nvidia soon enabling DXR on both Pascal and non-RTX Turing, we'll be able to more accurately assess RT to raster ratios.

On one side, looking at the slides Nvidia provided, I'm inclined to change my mind and acknowledge that specialized units have their utility even today; on the other side, looking at the performance impact of RTX in BF5 vs. Metro, something seems off - Metro has a ~20% performance penalty for enabling RTX while BF5 is still around 30%+, yet Volta does "well" in BF5 w/ RTX.

Either the Metro workload is much better suited for RT cores and will tank hard on Volta and non-RTX Turing alike, or Volta comes with an innate advantage of its own that we hadn't considered, or the numbers provided by Nvidia are stretched towards the worst possible values.

I guess I'll patiently wait for RTX-enabled numbers on the 1660 Ti to better understand the workloads. Volta running Metro w/ RTX enabled would also help.
 

SirDinadan

Member
Jul 11, 2016
108
64
71
boostclock.com
Ray-tracing applications have already started to roll out early support to utilize RT Cores inside Turing. The OctaneBench RTX preview was already mentioned; Arnold will have a public beta soon.
[image: OctaneBench RTX preview results]

Fermat, NVLabs' physically based research renderer, can also harness the power of RTX Turing and logs the shading and ray-tracing workloads separately.
[image: Fermat museum hall render - shading vs ray-tracing times]

More @ BoostClock.com - GPU rendering RTX ray tracing benchmarks - RTX 2080 Ti | GTX 1080 Ti | TITAN V - OptiX 6.0 Fermat & RTX OctaneBench 2019 Preview