News Crytek CryEngine shows raytracing technology demo (runs on Radeon RX Vega 56)


BFG10K

Lifer
Aug 14, 2000
22,709
2,972
126
Tech demos are shown all the time as proof of concept, but often do not carry the same performance when an actual game is built on top of it.
Yep, and Port Royal falls into that exact category. Yet we have people using it as proof that Volta gets "destroyed" by RTX. Meanwhile in an actual game (BF5) we see RTX is at best 45% faster.

The easiest way to see that some of the arguments above are nonsense is to consider how much R&D and die space on these chips NV is devoting to the RT cores. They would not be doing that if the cores didn't do something really quite valuable.
Oh, I'm sure they were created for something valuable, just not for ray tracing. It's looking more and more like RT / Tensor is aimed at workstation purposes, but they've been bludgeoned into consumer space and passed off as "gaming features", emperor's-new-clothes style.

Presumably all parties involved (nVidia, Dice, 4A, etc) tested the games at least once before release/patching. So if we assume all parties aren't legally blind and understand elementary benchmarking, there's absolutely no way they could be surprised at how horrifically DXR and DLSS have failed in practice.

So the only logical conclusion is that it was done on purpose, in the hope the sheeple wouldn't notice under the "10 gigarays!" and "it just works!" buzzwords.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Professional workstation rendering I think it was? They didn't hide that. The Tensor cores have very obvious/massive AI applications.

RTX has mostly suffered so far from still not being quite fast enough to run all that well, even with the non-trivial speed-up all this extra hardware gives.
(Plus limited experience both in using the hardware in rendering engines and with the actual effects in games.)
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,972
126
Professional workstation rendering I think it was? They didn't hide that. The Tensor cores have very obvious/massive AI applications.
Sure they did, when they tried to sell them as gaming features, and charged gamers exorbitant workstation pricing to fund said features. That's why their financials tanked.

RTX: it just works...just flip a switch and it works automatically everywhere with no development effort...no more game hacks will be required - all lies.

DLSS: it just works...no mention of API/resolution/RTX/hardware locks...4K/64xSSAA quality at much higher performance - all lies.

Instead what we got was badly upscaled reflective puddles @ 1080p60, and usage combinations that required nVidia's blessing before they could be used.

And suddenly the concept of high refresh/FPS/resolution completely disappeared from nVidia's vocabulary. Meanwhile they were still selling $2000 4K/144 monitors.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
Yep, and Port Royal falls into that exact category. Yet we have people using it as proof that Volta gets "destroyed" by RTX. Meanwhile in an actual game (BF5) we see RTX is at best 45% faster.
So how far above 45% does it have to be before you can call it destroyed?
Also, BF5 was the case you talk about later on: they just flipped the switch in the engine and were running uncontrolled raytracing. For all we know it could be the same deal as way back with tessellation, where they had tessellation active on the whole map even though most of the map was covered and wasn't even showing any tessellation.

It does "just work" as nvidia advertised but obviously you are buying a GPU from 2019 and not one from 2219.
Raytracing is hard to do.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Yep, and Port Royal falls into that exact category. Yet we have people using it as proof that Volta gets "destroyed" by RTX. Meanwhile in an actual game (BF5) we see RTX is at best 45% faster.

The argument was about absolute performance going down from a tech demo to an actual game. Your argument is about relative performance with and without RTX, which is a different matter.

But even if we go with a 45% relative performance gain for a second, even though this might be at the lower end of the spectrum: it means that, with the current progress of technology, you can achieve the same raytracing performance using RT cores a few years(!) earlier compared to using more general compute resources.
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
The argument was about absolute performance going down from a tech demo to an actual game. Your argument is about relative performance with and without RTX, which is a different matter.
His argument is about BF5 running with RTX enabled on both Turing and Volta - Titan RTX is roughly 45% faster than Titan V with Ultra reflections. The advantage drops to 25% when Low preset is used.

[image: BF5 DXR benchmark chart - Titan RTX vs Titan V]


Tech demos may paint the best case scenario, but benchmarks may end up painting the worst case scenario instead, since at the moment games are hybrid rendering environments with different raster & ray tracing ratios.

Volta vs. Turing is also the best comparison we have, but may not be accurate enough anyway, as there may be extra memory subsystem tweaks in Turing that further enhance RT performance. (We don't even know how many Turing general compute cores we could cram into the same area as TU102; any significant difference in core count and/or operating clocks can significantly alter the numbers above.)

But even if we go with a 45% relative performance gain for a second, even though this might be at the lower end of the spectrum: it means that, with the current progress of technology, you can achieve the same raytracing performance using RT cores a few years(!) earlier compared to using more general compute resources.
The argument being made here and in other threads was that the current raster / ray tracing ratio used in games looks like it may favor tuned general compute resources instead of specialized hardware, with the advantage of offering more "horsepower" across the entire GPU lineup in raster-heavy games, and the disadvantage of a bigger performance penalty once the raster / RT ratio shifts significantly in the future.

As it stands now Turing seems to be in an awkward position: it pushes specialized hardware in an underdeveloped software environment, and does so with considerable sacrifices in both acquisition price and overall performance.

Unfortunately I doubt this discussion can productively go anywhere; we simply lack the tech details needed even for a basic analysis, so we're bound to support our favorite prediction over fact-based conclusions.
 
Last edited:

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Explosions, bokeh, filtering, occlusion, etc. are all graphics resources. AI, scripting, sound, networking, etc. may have little to do with graphics but all still impact performance. Either way, you completely missed the entire point of what I was saying. Tech demos are designed to look and run awesome. It's a very controlled scenario putting a best foot forward to highlight small aspects of a larger picture. Just because a tech demo can run 30fps on a Vega 56 does not mean a Vega 56 or a 1080 Ti or whatever will run a full-featured game based on the same engine.

Knee-jerk reactions to a video of an unreleased tech demo not yet running a game, or even having an announced game, on the market is exactly that. Knee-jerk.

For sure. Tech demos have always just been that. My only point was that the CPU should have zero impact on how the RT was being done. I would not be surprised if Crytek unveils a game at this year's E3 (or maybe next year's) which uses this technology.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
His argument is about BF5 running with RTX enabled on both Turing and Volta - Titan RTX is roughly 45% faster than Titan V with Ultra reflections. The advantage drops to 25% when Low preset is used.

This does not contradict my statement that the original argument regarding tech demos was related to a general performance drop on the same HW, while here we are comparing two implementations relative to each other.

The argument being made here and in other threads was that the current raster / ray tracing ratio used in games looks like it may favor tuned general compute resources instead of specialized hardware, with the advantage of offering more "horsepower" across the entire GPU lineup in raster-heavy games, and the disadvantage of a bigger performance penalty once the raster / RT ratio shifts significantly in the future.

And I was stating that, even considering the ratios used in a first-generation game, which only utilizes raytracing for reflections on a limited amount of geometry, we already see a speed-up of 45% - which equates to several years before we see the same performance using general compute units for raytracing. That's not what I would call "favor general compute".
The achievable speed-up is much higher if more ray-tracing is used, for instance for global illumination and not just reflections - now you may want to translate this into years again.
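
To put rough numbers on the "several years" claim, here is a minimal sketch; the 15-30% yearly improvement rates are my own assumption, not figures from this thread:

```python
# Rough sketch: years of compounded general-compute improvement needed to
# close a given RT-core speed advantage. The yearly rates are assumptions.
import math

def years_to_catch_up(advantage, yearly_gain):
    """Years until (1 + yearly_gain)^years >= 1 + advantage."""
    return math.log(1 + advantage) / math.log(1 + yearly_gain)

for advantage in (0.45, 1.0):         # 45% (BF5-style) and 100% (heavier RT) advantage
    for yearly_gain in (0.15, 0.30):  # assumed yearly general-compute improvement
        years = years_to_catch_up(advantage, yearly_gain)
        print(f"{advantage:.0%} advantage, {yearly_gain:.0%}/year -> {years:.1f} years")
```

Under those assumed rates, a 45% gap corresponds to roughly 1.5-2.5 years and a 100% gap to roughly 2.5-5 years of general-compute progress.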
 
Last edited:

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
This does not contradict my statement that the original argument regarding tech demos was related to a general performance drop on the same HW, while here we are comparing two implementations relative to each other.
It was a clarification. I wasn't aware of any need for contradiction.

And I was stating that, even considering the ratios used in a first-generation game, which only utilizes raytracing for reflections on a limited amount of geometry, we already see a speed-up of 45% - which equates to several years before we see the same performance using general compute units for raytracing. That's not what I would call "favor general compute".
And I addressed your point by stressing the difference between comparing RTX vs non-RTX and comparing Volta vs. Turing - it's not apples and oranges but it's not apples vs. apples either, since a Turing equivalent w/o RTX cores might still be faster than Volta due to a number of reasons:
  • possibly higher compute core count (TU102 has 4608 units, GV100 has 5120 but comes with extra FP64 capabilities and a higher number of tensor cores)
  • possibly better memory subsystem (more cache, maybe faster)
  • possibly higher clocks
All of the factors above can have significant performance impact. What happens if a non-RTX implementation of Turing ends up being 20-30% faster than Titan V for the same size as TU102? Would that favor general compute?
 

Timmah!

Golden Member
Jul 24, 2010
1,429
638
136
Please notice how much RT cores accelerate rendering with Octane render. Almost 3x faster than without. Unless you think the Turing chip could have 3x more standard CUDA cores if not for the RT cores, I think it's safe to say that the current solution with fixed-function RT cores is simply faster at raytracing than the (currently viable) alternative without them.

At the same time, it needs to be said that this 3x speed-up is under ideal conditions, when the rendered scene is polygon-heavy. That is the use case where the RT cores shine. Apparently, in many cases the speed-up is just a meager 20 percent (1.2x)... mostly in scenes which are not geometry-heavy enough. Those scenes would very likely benefit from more CUDA cores at the expense of RT cores, since CUDA cores still do the shading... so obviously, more CCs = faster shading.

Just leaving it here for anyone not following Octane render and this kind of stuff, the point being that things are not quite black and white in the raytraced world. But ultimately, if this is what we are moving toward, it's IMO a good start. I for one definitely look forward to the RTX tech being implemented in the regular production Octane version, not just the benchmark as it is for now.

Regarding Crysis, apparently that's tech called SVOGI, or VXGI; in other words it's based on voxels, so it's not quite raytracing, even though it could be that so-called "voxel cone tracing" - in which case RT cores could probably be programmed to accelerate it the same way they accelerate raytracing. Anyway, it definitely looks great, comparable to raytraced reflections in BF5, and if it comes at negligible performance impact, then awesome. I would not say it makes RTX tech pointless or obsolete; it just means you have another, less taxing way of doing nice realistic reflections, so you can use raytracing for other stuff like GI or shadows, which IMO are the more important anyway when it comes to realistic graphics.

 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
All of the factors above can have significant performance impact. What happens if a non-RTX implementation of Turing ends up being 20-30% faster than Titan V for the same size as TU102? Would that favor general compute?

I am sure that the NVidia engineers did precisely such an estimation when deciding to go for RT cores. Current indications are that the 45% speed-up is on the low side, and your estimate of a 20-30% performance gain at iso-area looks to be on the high side of the spectrum. Still a discrepancy of 15-25%, mind you, which cannot be easily mitigated within the same technology.

Also not sure why you take BF5 as the reference - a title where only reflections on reflective surfaces are considered. Why not take a title like Metro Exodus as the reference, where all the GI calculations are done on RT cores and where they have to shoot a photon at almost each pixel? As far as I am concerned, using raytracing for GI and shadows is much more impactful than reflections.
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,747
4,689
136
I am sure that the NVidia engineers did precisely such an estimation when deciding to go for RT cores. Current indications are that the 45% speed-up is on the low side, and your estimate of a 20-30% performance gain at iso-area looks to be on the high side of the spectrum. Still a discrepancy of 15-25%, mind you, which cannot be easily mitigated within the same technology.

Also not sure why you take BF5 as the reference - a title where only reflections on reflective surfaces are considered. Why not take a title like Metro Exodus as the reference, where all the GI calculations are done on RT cores and where they have to shoot a photon at almost each pixel? As far as I am concerned, using raytracing for GI and shadows is much more impactful than reflections.
Glad you brought up Metro Exodus.

Take a look at this slide.
https://www.anandtech.com/show/1410...n-pascal-gpus-dxr-support-in-unity-and-unreal
[image: Nvidia GDC slide - Metro Exodus frame time breakdown]


What I find very strange is they're comparing the RTX 2080 using the RT cores and not using them.
Notice the frame before the RT cores (green) come into play. Turing (no RT) has an identical shape to Turing (with RT), as would be expected. The only difference is that the job appears to finish a lot faster (66% of the time?) even though only integer and float cores are in use and we're speaking of the identical frame.

This should be equal. A similar thing is seen after the RT core usage. The (with RT) run appears to have compressed the work to give less time taken.

I strongly feel we're being manipulated with some 1st class BS.
 
Last edited:

Muhammed

Senior member
Jul 8, 2009
453
199
116

Just read the AnandTech description:


In this example, NVIDIA showed a representative frame of Metro Exodus using DXR for global illumination. The top graph shows a Pascal GPU, with only FP32 compute, having a long render time in the middle for the effects. The middle bar, showing an RTX 2080 but could equally be a GTX 1660 Ti, shows FP32 and INT32 compute working together during the RT portion of the workload and speeding the process up. The final bar shows the effect of adding RT cores to the mix, and tensor cores at the end.
 

Ajay

Lifer
Jan 8, 2001
15,468
7,871
136
Glad you brought up Metro Exodus.

Take a look at this slide.
https://www.anandtech.com/show/1410...n-pascal-gpus-dxr-support-in-unity-and-unreal
[image: Nvidia GDC slide - Metro Exodus frame time breakdown]


What I find very strange is they're comparing the RTX 2080 using the RT cores and not using them.
Notice the frame before the RT cores (green) come into play. Turing (no RT) has an identical shape to Turing (with RT), as would be expected. The only difference is that the job appears to finish a lot faster (66% of the time?) even though only integer and float cores are in use and we're speaking of the identical frame.

This should be equal. A similar thing is seen after the RT core usage. The (with RT) run appears to have compressed the work to give less time taken.

I strongly feel we're being manipulated with some 1st class BS.
Shocking! Could it be that Nvidia would manipulate data for their press deck? ;)
 

poohbear

Platinum Member
Mar 11, 2003
2,284
5
81
What's the big deal if it's not an open world, non-fixed, real-time demo? Can't any modern hardware render a fixed ray tracing scene? It's the open world, real-time ray tracing that brings hardware to its knees. We've already seen photorealistic computer graphics for set scenes in movies, but try to play a game in a free, open, and real-time environment and all modern hardware chokes.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Glad you brought up Metro Exodus.

Take a look at this slide.
https://www.anandtech.com/show/1410...n-pascal-gpus-dxr-support-in-unity-and-unreal
[image: Nvidia GDC slide - Metro Exodus frame time breakdown]


What I find very strange is they're comparing the RTX 2080 using the RT cores and not using them.
Notice the frame before the RT cores (green) come into play. Turing (no RT) has an identical shape to Turing (with RT), as would be expected. The only difference is that the job appears to finish a lot faster (66% of the time?) even though only integer and float cores are in use and we're speaking of the identical frame.

This should be equal. A similar thing is seen after the RT core usage. The (with RT) run appears to have compressed the work to give less time taken.

I strongly feel we're being manipulated with some 1st class BS.

I believe this graph is correct; you just have to realize that the last frame (Turing RTX) is not rendered at the same resolution as the first two (Pascal & Turing) because we have DLSS enabled.
So, frames 1 and 2 are rendered for example at 4K and frame 3 (Turing RTX) is rendered at 1080p and then up-scaled to 4K with the Tensor Cores (DLSS).
This is why both the start and the end of frame 3 (Turing RTX) are not the same as the first two.
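
A back-of-the-envelope sketch of why the non-RT portions shrink in that bar (the 4K and 1080p figures are the example resolutions from the post above; the assumption that shading cost scales roughly with pixel count is mine):

```python
# If DLSS renders internally at 1080p and upscales to 4K, only a quarter of the
# pixels are shaded (assuming shading cost scales ~linearly with pixel count).
native_4k = 3840 * 2160        # pixels in the native output frame
internal_1080p = 1920 * 1080   # pixels actually shaded before upscaling

ratio = internal_1080p / native_4k
print(f"Shaded pixels: {internal_1080p:,} of {native_4k:,} ({ratio:.0%} of native)")
# -> 25%, which is why the non-RT parts of the "Turing RTX" bar look compressed
#    next to the Pascal / Turing-without-RT-cores bars.
```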
 

Muhammed

Senior member
Jul 8, 2009
453
199
116
I believe this graph is correct; you just have to realize that the last frame (Turing RTX) is not rendered at the same resolution as the first two (Pascal & Turing) because we have DLSS enabled.
Exactly.
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
Also not sure why you take BF5 as the reference - a title where only reflections on reflective surfaces are considered. Why not take a title like Metro Exodus as the reference, where all the GI calculations are done on RT cores and where they have to shoot a photon at almost each pixel? As far as I am concerned, using raytracing for GI and shadows is much more impactful than reflections.
I'm taking BF5 as the reference because it's the only real game I've seen benchmarked with both Turing and Volta with ray tracing enabled. Metro Exodus would be a useful addition. I too feel GI and shadows are much more important than reflections, but my limited experience says these can be implemented with a lower performance impact (GI especially should respond better to downsampling and other performance-enhancing tricks which would otherwise likely ruin image quality in reflections).

I am sure that the NVidia engineers did precisely such an estimation when deciding to go for RT cores. Current indications are that the 45% speed-up is on the low side, and your estimate of a 20-30% performance gain at iso-area looks to be on the high side of the spectrum. Still a discrepancy of 15-25%, mind you, which cannot be easily mitigated within the same technology.
RT cores are definitely faster than general cores, much faster than 45%. Use the GPU to render a fully ray-traced scene and Turing probably obliterates Volta, but that process won't be in real time anymore. If we want real-time, we use hybrid rendering, with RT taking up only a fraction of the time needed to render the frame. The rest of the time is directly dependent on raster performance. In the example from Nvidia the RT portion takes only roughly 10% of the frame time.

[image: Nvidia example frame - RT portion vs raster portion of frame time]


If we were to assume Turing was 100% faster at RT than Volta, the frame above would only take about 10% longer to render on Volta. We can tilt this ratio to favor RT cores, but the problem is that when doing so we end up tanking overall performance hard enough that only benchmarks can afford to do it.
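
A minimal sketch of that frame-time arithmetic, using the post's own 1:9 RT-to-raster split and the assumed 2x RT-speed gap:

```python
# Toy model of the hybrid frame above: 1:9 RT-to-raster split (from the post),
# and a card whose RT portion runs at half speed (the assumed 100% gap).
rt_time, raster_time = 0.1, 0.9              # Turing-with-RT-cores frame split
slow_rt_frame = rt_time * 2.0 + raster_time  # same frame if RT runs 2x slower

base_frame = rt_time + raster_time
print(f"Frame time {base_frame:.2f} -> {slow_rt_frame:.2f} "
      f"(+{slow_rt_frame / base_frame - 1:.0%})")   # ~ +10% frame time
```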

As for my estimates of a 20-30% perf gain at iso-area, keep in mind I was talking about a jump from Volta to non-RTX Turing - from a compute-oriented arch to a "nimble" gaming-oriented product. Take a look at them side by side and judge for yourself:

[image: Volta (GV100) vs Turing (TU102) side-by-side comparison]


There's FP64 and Tensor cores in there which could arguably make plenty of room for more compute units even with added FP16 units. I'm inclined to believe a 10% increase going from Volta to non-RTX Turing would not only be possible, but rather on the low side of the spectrum. Then comes the frequency question. Titan RTX lists 10%+ higher base clocks and 20%+ higher boost clocks over Titan V. Again, a 10% increase in clocks in power-limited scenarios seems adequate. Add these two together and a 20% perf increase is already probable.

We should also keep in mind that performance increases in general compute cores pay off twice in frame times (as opposed to specialized hardware), as they lower both RT and raster time. For the example above (with a 1:9 RT to raster ratio), a 20% increase in overall performance may allow the general compute card to spend 100-200% more time on ray-tracing than it did before, assuming iso performance targets. On the other side, a 200% increase in RT core performance would yield a ~7% improvement in FPS.
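
The same toy model, comparing the two scenarios in the paragraph above (numbers taken from the post; purely illustrative):

```python
# Same toy frame (1:9 RT-to-raster), comparing the two scenarios above.
rt, raster = 0.1, 0.9
base_frame = rt + raster

# Scenario 1: +20% general compute speeds up both portions; at an iso
# frame-time target, the time freed on raster can be re-spent on ray tracing.
fast_rt, fast_raster = rt / 1.2, raster / 1.2
rt_budget = base_frame - fast_raster               # RT budget at the old frame time
print(f"+20% general compute: RT time budget grows {rt_budget / fast_rt - 1:.0%}")

# Scenario 2: +200% RT-core performance shrinks only the RT slice.
frame_with_faster_rt = rt / 3.0 + raster
print(f"+200% RT cores: FPS gain {base_frame / frame_with_faster_rt - 1:.1%}")  # ~7%
```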

It's only when the RT ratio starts to go up, when we use it for reflections, GI, shadows and maybe more at the same time, that the numbers make a lot more sense for specialized hardware, but one can argue that won't be happening in the near future, as today's RTX cards wouldn't be able to handle such a load anyway. For now all we can reasonably do is take a look at RT-enabled games running with decent FPS and try to extrapolate where that threshold is. To me, the 45% performance advantage Turing has over Volta is not convincing enough, for the reasons outlined above (especially considering we can turn it into a 25% perf advantage if we run Low RT details).

Give me an offline render scenario and I'll beg for Turing. Give me a modern, fast paced game, and I'll start asking questions.
 
Last edited:

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I'm taking BF5 as the reference because it's the only real game I've seen benchmarked with both Turing and Volta with ray tracing enabled. Metro Exodus would be a useful addition. I too feel GI and shadows are much more important than reflections, but my limited experience says these can be implemented with a lower performance impact (GI especially should respond better to downsampling and other performance-enhancing tricks which would otherwise likely ruin image quality in reflections).

The Metro Exodus developer already reasoned about this - they already limit the number of rays they are shooting such that it still gives great results. Still, on non-RTX hardware it would take upwards of 60% of the frame time (as you can also see in the pictures linked here).

RT cores are definitely faster than general cores, much faster than 45%. Use the GPU to render a fully ray-traced scene and Turing probably obliterates Volta, but that process won't be in real time anymore. If we want real-time, we use hybrid rendering, with RT taking up only a fraction of the time needed to render the frame. The rest of the time is directly dependent on raster performance. In the example from Nvidia the RT portion takes only roughly 10% of the frame time.

And without RT cores, the RT portion running on general shader cores takes up 60% of the frame time. Amdahl's law tells us that the overall speed-up would be close to 100% - which is precisely what we are seeing.
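
A quick Amdahl's-law check of that figure (only the 60% share comes from the posts above; the RT-core speed-up factors are illustrative assumptions):

```python
# Amdahl's law: overall speed-up when only the RT share of the frame is accelerated.
def amdahl_speedup(fraction, factor):
    """Overall speed-up when `fraction` of the work runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

rt_fraction = 0.6                    # RT share of the frame on general shaders (from the posts)
for factor in (4, 6, float("inf")):  # illustrative RT-core speed-up factors
    print(f"RT portion {factor}x faster -> overall {amdahl_speedup(rt_fraction, factor):.2f}x")
# 6x on the RT slice already gives ~2.0x overall (~100%); the hard cap is 2.5x.
```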

We should also keep in mind that performance increases in general compute cores pay off twice in frame times (as opposed to specialized hardware), as they lower both RT and raster time. For the example above (with a 1:9 RT to raster ratio), a 20% increase in overall performance may allow the general compute card to spend 100-200% more time on ray-tracing than it did before, assuming iso performance targets. On the other side, a 200% increase in RT core performance would yield a ~7% improvement in FPS.

It's only when the RT ratio starts to go up, when we use it for reflections, GI, shadows and maybe more at the same time, that the numbers make a lot more sense for specialized hardware, but one can argue that won't be happening in the near future, as today's RTX cards wouldn't be able to handle such a load anyway.

Just take the Metro Exodus example. We are seeing the RT portion going down from 60% frame time to 10% or so, resulting in an overall speed-up of roughly 100%. It's clear that adding an equivalent amount of general compute resources cannot be a solution. It also shows that RT taking up 60% of the frame time is already feasible today - because this part can be sufficiently sped up thanks to RT cores. We are not talking about a distant future where the RT part takes possibly >90% of the frame time - and even higher speed-ups can be observed (as we see currently in the off-line render cases).
 

coercitiv

Diamond Member
Jan 24, 2014
6,211
11,941
136
The Metro Exodus developer already reasoned about this - they already limit the number of rays they are shooting such that it still gives great results. Still, on non-RTX hardware it would take upwards of 60% of the frame time (as you can also see in the pictures linked here).
With Nvidia soon enabling DXR on both Pascal and non-RTX Turing, we'll be able to more accurately assess RT to raster ratios.

On one side, looking at the slides Nvidia provided, I'm inclined to change my mind and acknowledge that specialized units have their utility even today; on the other side, looking at the performance impact of RTX in BF5 vs. Metro, something seems off - Metro has a ~20% performance penalty for enabling RTX while BF5 is still around 30%+, yet Volta does "well" in BF5 w/ RTX.

Either the Metro workload is much better suited for RT cores and will tank hard on Volta and non-RTX Turing alike, or Volta comes with an innate advantage of its own that we hadn't considered, or the numbers provided by Nvidia are stretched towards the worst possible values.

I guess I'll patiently wait for RTX-enabled numbers on the 1660 Ti to better understand the workloads. Volta running Metro w/ RTX enabled would also help.
 

SirDinadan

Member
Jul 11, 2016
108
64
71
boostclock.com
Ray-tracing applications have already started to roll out early support to utilize RT Cores inside Turing. The OctaneBench RTX preview was already mentioned; Arnold will have a public beta soon.
[image: OctaneBench RTX preview results]

Fermat, NVLabs' physically based research renderer, can also harness the power of RTX Turing and logs the shading and ray-tracing workloads separately.
[image: Fermat museum hall render - shading vs ray-tracing times]

More @ BoostClock.com - GPU rendering RTX ray tracing benchmarks - RTX 2080 Ti | GTX 1080 Ti | TITAN V - OptiX 6.0 Fermat & RTX OctaneBench 2019 Preview