Pascal cards - Octane Render performance

Timmah!

Golden Member
Jul 24, 2010
1,418
630
136
So I ran into an interesting piece of info on the Octane Render forums... one of the users benched both a GTX 1060 and a 1080 in Octanebench, and the results are surprising, to say the least. Usually, when it comes to Octane performance, the most important stats are the number of CUDA cores and the clock frequency. Obviously, higher means faster in both cases.

Knowing this, comparing different GPUs is fairly easy, at least within the same generation, where there are no architectural changes that could sway the results...

One example: the GTX 970 has an average score of 79. It has 1664 CUDA cores versus 2816 on the 980 Ti... that's 1.69x more in favor of the 980 Ti. 79 x 1.69 ≈ 133.5, slightly above the listed average score of the 980 Ti, 126. That could perhaps be explained by some frequency differences, or simply the app not scaling 100 percent... still, it's pretty close.
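The estimate above is plain linear scaling by CUDA core count. A minimal sketch of that arithmetic, assuming the score scales linearly with cores at equal clocks and architecture (that is the assumption being tested here, not something Octane guarantees); `scaled_score` is a hypothetical helper name:

```python
def scaled_score(base_score: float, base_cores: int, target_cores: int) -> float:
    """Naive estimate: scale a score linearly by CUDA core count.

    Only meaningful within one architecture at similar clocks.
    """
    return base_score * target_cores / base_cores


# GTX 970: 1664 cores, average score 79; GTX 980 Ti: 2816 cores, listed average 126
predicted = scaled_score(79, 1664, 2816)
listed = 126
print(f"predicted {predicted:.1f}, listed {listed}, ratio {listed / predicted:.0%}")
```

The listed 980 Ti average lands a few percent under the linear prediction, which is the "doesn't scale 100 percent" effect described above.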

Now let's get to the 1060 and 1080, both apparently at stock, and look at this:

[Attached: Octanebench results for the GTX 1060 and GTX 1080]


1060 = 89, 1080 = 134. That's 2/3 of the performance, even though the 1060 has only 1/2 of the CUDA cores. At stock, the frequencies should be more or less the same, I assume.
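Putting the two cards side by side makes the anomaly explicit. A small sketch; the scores are the ones quoted above, and the core counts are the reference specs, assuming the 6GB GTX 1060 (1280 cores) since that matches the "half the cores" remark, versus 2560 on the GTX 1080:

```python
cards = {
    # name: (CUDA cores, Octanebench score); cores assume the 6GB GTX 1060
    "GTX 1060": (1280, 89),
    "GTX 1080": (2560, 134),
}

cores_1060, score_1060 = cards["GTX 1060"]
cores_1080, score_1080 = cards["GTX 1080"]

core_ratio = cores_1060 / cores_1080   # what linear core scaling predicts
score_ratio = score_1060 / score_1080  # what the bench actually measured

for name, (cores, score) in cards.items():
    print(f"{name}: {score / cores * 1000:.1f} points per 1000 cores")
print(f"core ratio {core_ratio:.2f} vs score ratio {score_ratio:.2f}")
```

Per core, the smaller card scores noticeably higher, which is the scaling puzzle this thread is about.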

Any idea what's going on?
 

Timmah!

Oooh, very nice... I have two 1080 FTWs still in their boxes, to replace my current 580, so this is the kind of pic I want to see. What clocks is this at? Above 2GHz? VRAM OC too?
 

Dufus

Senior member
Sep 20, 2010
675
119
101
That was run in P-State 2 (CUDA) at 2088/5580 (GPU/memory clock). Something to watch out for, as memory clocks are reduced in P2. Increasing to 2202/5580 only achieved a little over 156. Not sure if that Octane test version is fully optimized; GPU and memory loads were not being pegged.

Edit: Looks like the CPU is lightly loaded and this can cause latency issues. Some effect from DRAM bandwidth too. Seems my Haswell doesn't clock high enough to take full advantage of this bench with a single 1080. :(

Haswell i7-4700MQ at 4.5GHz with only C0/C1 active still doesn't give 100% GPU load: the max is 96% and the average is below 90%. GPU 2214-2202/5580.

[Screenshot]



Something weird happens with the scaling above a 1.4GHz GPU clock.

[Image]
 

Timmah!

Good stuff!

Octanebench is actually not up to date and officially doesn't even support Pascal cards. It only works because somebody figured out that swapping some files from the actual app (where the cards do work) into the benching app could help, and indeed it does. That said, I'm not holding my breath that an eventual official version will magically solve any performance discrepancies. If it doesn't scale well now, color me a pessimist, but I don't believe it will start to scale well later.

Still interesting. I thought the weird scaling was related to CUDA core counts; now it seems it has to do with clocks as well. Most peculiar. I'm curious about further developments and would love to get some explanation of what's going on.

BTW, I don't get the CPU part. Do you think CPU utilization has anything to do with this? I always assumed Octane is strictly GPU-bound and the CPU matters only for scene loading and its speed, with zero influence on the actual rendering.

The other thing: these 2200 clocks, are they only achievable in Octane, or can your card actually run 3DMark or a demanding game like Witcher at those clocks without crashing? Because TBH, I have yet to see any test showing a 1080 stable at those clocks. Then again, all the testing is done with games or benching apps, which I don't care about. What matters to me are stable clocks in Octane, and if it can go up to 2200 there, even if it can't in games, that's good news indeed. That it brings little additional performance due to bad scaling, not so much...

Still, my current 580 gets 65 points. Two FTWs should get 154 x 2 = 308, almost 5x faster. That's decent, and if it translates into real-world performance in my own scenes, I won't regret spending the money instead of waiting for a 1080 Ti or whatever is coming next...
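The two-card figure assumes Octanebench scores add up linearly across GPUs. A quick sketch of that arithmetic; the 154 per-card and 65 GTX 580 figures are from the post, while the linear multi-GPU scaling is an assumption and `multi_gpu_speedup` is a hypothetical helper:

```python
def multi_gpu_speedup(per_card_score: float, cards: int, current_score: float) -> float:
    """Speedup over the current setup, assuming scores add linearly per GPU."""
    return per_card_score * cards / current_score


speedup = multi_gpu_speedup(154, 2, 65)  # two 1080 FTWs vs one GTX 580
print(f"{154 * 2} points, {speedup:.2f}x the GTX 580")
```

If multi-GPU scaling falls short of linear, the real speedup will be lower than this estimate.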

EDIT: Wait, 173 now at about 2200? I read 156 and didn't look at the pics properly... so which one is it? 173 is awesome.
 

Dufus

Timmah! said:
EDIT> Wait, 173 now at cca 2200? I read 156 and did not look at the pics properly... so which one is it?
Both.

Getting rid of the CPU latency increased the score. Should it affect the bench? I don't know; that's a question for the author(s). I did try posting on that forum (OTOY), but there are too many hoops to jump through and I can't post in the relevant thread. I did post a bench, but it's under "admin approval", so it might not even see the light of day.

Here's a run with the stock VBIOS (174 @ 2.1GHz).
[Screenshot]

The 2.2GHz results were with an x-flashed VBIOS that enables higher voltage and unlimited power. It does seem a little slower clock for clock than the stock VBIOS, but the extra frequency does allow a 26k+ graphics score in 3DMark Fire Strike 1.1. My own GTX 1080's voltage scaling gets poor above 2GHz, so performance gains can be very expensive. The Fire Strike bench can consume over 300W at 1.2V, so while I'm okay with that for the occasional short bench, I would not use it for gaming.
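Why the last few hundred MHz get so expensive: to first order, CMOS dynamic power scales as P ~ f * V^2, so the voltage bump costs quadratically while the clock gain is only linear. A rough sketch; the voltages are hypothetical placeholders, not measurements from this card, and leakage (which also rises with voltage) is ignored:

```python
def relative_dynamic_power(f1_mhz: float, v1: float, f2_mhz: float, v2: float) -> float:
    """Ratio P2/P1 under the first-order dynamic power model P ~ f * V^2.

    Leakage power is ignored, so the real increase is likely larger.
    """
    return (f2_mhz / f1_mhz) * (v2 / v1) ** 2


# Hypothetical operating points: 2.0GHz at 1.05V vs 2.2GHz at 1.2V
clock_gain = 2200 / 2000
power_gain = relative_dynamic_power(2000, 1.05, 2200, 1.2)
print(f"+{clock_gain - 1:.0%} clock for about +{power_gain - 1:.0%} dynamic power")
```

Under these assumed voltages, a 10% clock gain costs roughly 44% more dynamic power.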

With Octanebench, rendering gains also drop off at the top frequencies, so even 2000/1395 can net a score above 165. At 1400/1395 I can get 140+ with only 75W max consumption. It also seems strange that Octanebench can complete in a shorter time with lower clocks and a lower score. Maybe it's just a buggy bench.
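The drop-off is easier to see as the share of a clock increase that actually shows up in the score. A small sketch using the two data points above (1400/1395 gives 140+, 2000/1395 gives above 165); the "+" lower bounds are taken at face value, which is an assumption, and `scaling_efficiency` is a hypothetical helper:

```python
def scaling_efficiency(f_lo: float, s_lo: float, f_hi: float, s_hi: float) -> float:
    """Relative score gain divided by relative clock gain (1.0 = perfect scaling)."""
    clock_gain = f_hi / f_lo - 1
    score_gain = s_hi / s_lo - 1
    return score_gain / clock_gain


# 1400/1395 -> "140+" and 2000/1395 -> "above 165"; lower bounds taken at face value
eff = scaling_efficiency(1400, 140, 2000, 165)
print(f"clock +{2000 / 1400 - 1:.0%}, score +{165 / 140 - 1:.0%}, efficiency {eff:.2f}")
```

Only about 40% of the extra clock turns into extra score across this range.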
 

Timmah!

Thanks!

Not sure where your post is on the Octane forums, so I took the liberty of posting your last pic in one of the GTX 1080 topics in General Discussion over there. I hope you don't mind; I credited you, of course, as the author of the screenshot / performer of the bench.

So, how did you get rid of the CPU latency then? By OCing the CPU to 4.5GHz, is that what you mean?
 

Dufus

If the CPU enters an idle state (C3-C7), it needs to be woken up again before it can carry on and do some work. Waking the CPU may take some tens of microseconds, which in compute time is a lot of cycles. There can also be extra latency while the VRM ramps up the CPU core voltage, and even more with EIST. The bench doesn't seem to use the CPU constantly, so there are plenty of opportunities for the CPU to enter those idle states.
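To put "tens of microseconds" into perspective, here is a rough sketch converting wake-up latency into core clock cycles, plus a toy model of GPU busy time if every short batch of GPU work had to wait for one CPU wake-up. The 4.5GHz clock is from earlier in the thread; the 50 µs latency and 1000 µs batch length are hypothetical values, and both helper names are made up for illustration:

```python
def wake_cycles(latency_us: float, cpu_ghz: float) -> float:
    """Core clock cycles that pass while waiting for the CPU to exit a C-state.

    1 microsecond at 1 GHz is 1000 cycles.
    """
    return latency_us * cpu_ghz * 1000


def gpu_busy_fraction(batch_us: float, wake_latency_us: float) -> float:
    """Toy model: GPU busy fraction if every batch of GPU work is followed by
    an idle gap of one CPU wake-up before the next batch is submitted."""
    return batch_us / (batch_us + wake_latency_us)


# Hypothetical values: 50 us exit latency at 4.5 GHz, 1000 us of GPU work per batch
print(f"{wake_cycles(50, 4.5):,.0f} cycles per wake-up")
print(f"GPU busy {gpu_busy_fraction(1000, 50):.1%} of the time")
```

The shorter the GPU work per CPU interaction, the larger the share of time lost to wake-ups, which would fit the bursty, lightly loaded CPU pattern described above.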

By disabling package C-states and core states C3-C7, the CPU should stop at C1, which just halts the clock while leaving the CPU ready to restart execution when required. A similar scenario can happen with SSD benches, where the CPU is idle while waiting for data from the disk.

This can be done via the BIOS setup plus enabling the High Performance power plan in Windows. It remains to be seen whether the effect is due to the bursty nature of the CPU load or a problem with the bench.

The post finally went through on OTOY. I replied to a poster, but it's held up waiting for admin approval once again.
 

Dufus

@Timmah! Here's a run with the newer version 3.04 using a stock WC Founders Edition GTX 1080:

[Screenshot]


Pretty much the same: peak power draw 140W, 178 pts, 2100MHz GPU and 1395MHz memory. I'm giving up on posting at OTOY; waiting for admin approval on every post is just too much.
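Comparing this run with the earlier 1400/1395 result (140+ points at 75W max) shows where the efficiency goes. A quick sketch; the two runs used different Octanebench builds and peak rather than average power, and "140+" is taken at face value, so treat it as a rough comparison only:

```python
# (score, peak watts) as quoted in the thread
low_clock = (140, 75)    # 1400/1395, earlier Octanebench build
high_clock = (178, 140)  # 2100/1395, version 3.04

for label, (score, watts) in (("1400 MHz", low_clock), ("2100 MHz", high_clock)):
    print(f"{label}: {score / watts:.2f} points per watt")

score_gain = high_clock[0] / low_clock[0] - 1
power_gain = high_clock[1] / low_clock[1] - 1
print(f"+{score_gain:.0%} score for +{power_gain:.0%} peak power")
```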
 

Dufus

Version 2.17 does not support Pascal; you need version 3.

You can download the latest version 3 demo from here, copy the benchmark_data folder from 2.17 into it, and run "octane.exe --benchmark"
or download an older version 3 with the benchmark included from here.

You should be able to get over 200 pts with a Titan X Pascal. Memory helps a lot, especially since the bench runs in P-State P2, where the memory clock is reduced by default. I got a nearly 20% increase in the benchmark just by increasing the memory clock.

The benchmark is very light on the CPU, so it can leave the CPU idle in states with a high exit latency when woken; I got a further increase in score by disabling C-states. I wonder if that's the same on older cards; if so, it might mean a lot of time lost in rendering.

EDIT: Forgot to mention that the benchmark upload button is disabled by OTOY, so you won't be able to upload scores to the database as yet.
 

alcoholbob

Diamond Member
May 24, 2005
6,271
323
126
[Screenshot]


171.85? Seems lower than the 1080s here. I wonder if there's some other bottleneck with the Titan X? It only reaches around 55% power consumption during the test.

Edit: Maybe a Windows 10 issue? All the high scores seem to be Windows 7 based. Windows 10 scores all seem low.
 

Dufus

@alcoholbob Nice one. Please check your memory clock, as it may be different in P2 than in P0. IIRC AB (Afterburner) doesn't support setting the P2 memory clock, but if you are going to use it, then 4.3.0 beta 14 is AFAIK the latest for use with Pascal.
 

Timmah!

alcoholbob said:
171.85? Seems lower than the 1080s here. I wonder if there's some other bottleneck with the Titan X? [...]

Edit: Maybe a Windows 10 issue? All the high scores seem to be Windows 7 based. Windows 10 scores all seem low.

Well, I finally have my new rig with two EVGA 1080 FTWs put together, and the scores are a bit underwhelming too. Definitely not in line with what Dufus posted: I'm getting just 249 at stock clocks / 257 OCed (+91 core, +204 VRAM), which works out to only about 125/128 per card. That's a far cry from 178. I'm running my CPU and RAM at stock so far and haven't messed with C-states either, but even so I expected at least the 140-150 range per card. Especially OCed, when Afterburner reports the cards running at 2113/4174 MHz... Am I missing something? Or is there really some Win10 issue?

EDIT: Someone on the Octane forums is getting a 263 score with two 1070s @ 1987MHz. So I could have just bought two 1070s for the same performance and saved a lot of money. Slightly annoyed now...

Seems he is running a 3930K at 4.4GHz, while I run my 6850K at stock. Could this have some influence?

EDIT no. 2: OCed the cards a bit further (core +99, VRAM +309), which slightly increased the score from 257/259 to 261... then I changed the Windows power plan from Balanced to High Performance and got the score bumped to 283... that's more like it! But I still want more, at least the magic 300. Will see if OCing the CPU eventually helps there.
 

Dufus

Timmah! said:
Especially OCed, when the Afterburner reports cards running at 2113/4174 MHz....

4174MHz, is that a typo? Note that for memory I was running close to 11200MT/s in P2. Also note that AFAIK Afterburner doesn't allow separate memory clocks for P0 and P2 but applies the same offset to both. Since Nvidia's default is to have the P2 memory clock 1000MT/s lower than P0, AB would not be suitable for overclocking P2. Someone please correct me if I'm wrong.

Just to make things more confusing, GDDR5X should be running in QDR mode at max performance, although it can switch to DDR, which is perhaps why it was never called GQDR5. So at the default speed of 10Gbps, the memory clock should be 2500MHz. Don't know why AB shows 5000; maybe that's just the way Nvidia reports it.
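The memory numbers in this thread (1395MHz, 5580 in AB, close to 11200MT/s, 2500MHz at 10Gbps) look like the same clock seen through different reporting conventions. A sketch, assuming GDDR5X in QDR mode moves 8 transfers per base (command) clock, with the write clock WCK at twice the base clock, and that AB shows half the effective data rate; these factors are inferred from the thread's numbers (1395 x 4 = 5580, 1395 x 8 = 11160), not taken from an Nvidia document:

```python
def effective_mts(base_mhz: float) -> float:
    """Effective data rate in MT/s, assuming QDR GDDR5X: 8 transfers per base clock."""
    return base_mhz * 8


def wck_mhz(base_mhz: float) -> float:
    """Write clock (WCK), assumed to run at twice the base clock; data is QDR on WCK."""
    return base_mhz * 2


def afterburner_mhz(base_mhz: float) -> float:
    """Afterburner appears to show half the effective data rate."""
    return effective_mts(base_mhz) / 2


def base_from_afterburner(ab_mhz: float) -> float:
    return ab_mhz / 4


for ab in (5005, 5580):  # stock setting vs the P2 overclock used in this thread
    base = base_from_afterburner(ab)
    print(f"AB {ab}: base {base} MHz, WCK {wck_mhz(base)} MHz, {effective_mts(base):.0f} MT/s")
```

Under these assumptions, the stock 5005 works out to 10010MT/s with WCK at about 2500MHz, matching the 10Gbps spec, and 5580 is the ~11160MT/s overclock.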

I don't know why P2 memory clocks are set lower than the default GDDR5X memory clocks; perhaps someone from Nvidia could explain. I have both P0 and P2 set to a little under 11200MT/s. I hope it's not because of possible thermal problems with the memory. For me, there is a sharp drop in memory performance once going over 11200MT/s, after which performance starts increasing again. This may be due to memory retraining, I'm not sure, but there is little point for me in going beyond 11200MT/s.

I run Windows in Balanced mode but have set the CPU settings in hardware so they override Windows. For me, the CPU is either running at full speed or stopped and idle.

If you're not happy changing C-states in the BIOS, you can try the old trick of running a single-threaded 32M SuperPi during the bench. Some people use it when benching SSDs, where CPU latency has a big effect on 4K random R/W speeds. It can be downloaded from HwBot.
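The SuperPi trick works by keeping at least one core busy for the whole bench. A minimal stand-in sketch in Python (a hypothetical substitute, not SuperPi itself): one busy core keeps the package out of package C-states, since those are only entered when every core is idle, although the other cores can still drop into their own core C-states:

```python
import time


def spin(seconds: float) -> int:
    """Busy-wait on one core for `seconds` and return the loop count.

    While this runs, at least one core stays in C0, so the package cannot
    enter a package C-state. Other cores may still enter core C-states,
    so this is a partial substitute for changing C-states in the BIOS.
    """
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations
```

For example, call `spin(300)` in a separate Python process just before starting the bench. The five minutes is an arbitrary choice, so match it to how long your bench runs.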

One other note: you probably already know this, but there have been problems with EVGA voltage regulators and/or memory possibly overheating/failing. A fix is in place, so if you didn't already know, check out the latest from them.
 

Timmah!

Dufus said:
4174MHz, is that a typo? [...]

That's the number Afterburner reports. Now I've OCed it further, to +409, and it says 4920. I thought it was the result of those quad-pumped memory shenanigans and didn't really pay attention.

BTW, is Afterburner a bit buggy? Or the Nvidia drivers? For whatever reason, while playing around with it today and trying different clocks, one of my cards (listed as No. 2 in AB) stopped boosting. AB would report its frequency as 1721 even under load when I ran the bench, while the other card ran at the usual 2100 MHz. The bench results didn't seem to change though; actually, I think I got an even higher score than before LOL, so I wonder if something was really wrong with the card/drivers or AB just reported the wrong number... then again, so did GPU-Z. Anyway, rebooting the machine seemed to fix it.

BTW, what GPU load do you get reported? In my case, it hovers around 80 percent during the bench, and I think even while rendering one of my own scenes in the actual production app. Is that normal? I expected it to be close to 100 percent.

Thanks for the heads-up regarding the EVGA issues; I'm aware of them. So far both my cards seem to run stable, though I've only done a bit of benching and played CoD IW for about 45 minutes (on one card, not SLI). Everything seems fine here; hope it stays that way. I applied for their thermal pads though, and once they arrive I will probably install them.
 

Dufus

Timmah! said:
Thats the number the Afterburner reports. Now i OCed it further, to 409 and it says 4920 now.
With GDDR5X specified to run at 10Gbps, that is not an OC, it is an underclock. The default memory speed in AB would be 5000 (5005), but this gets cut down in P2, and because AFAIK AB doesn't allow overclocking P2 and P0 memory separately, overclocking P2 to, say, 5500 would result in P0 running the memory at 6000, which may cause a crash and/or artifacts. I have P0 and P2 set individually, which allows both to run at what would be 5580 in AB IIRC, but then I don't use AB to OC.
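The shared-offset problem can be made concrete. A sketch in AB-style units (half the effective MT/s), assuming a P0 default of 5005 and a P2 default 500 lower, i.e. the 1000MT/s gap mentioned earlier; real P2 defaults may differ slightly (Timmah!'s card reports 4511), and the helper names are hypothetical:

```python
P0_DEFAULT = 5005              # AB units (half the effective MT/s)
P2_DEFAULT = P0_DEFAULT - 500  # assumed: P2 sits 1000 MT/s (= 500 AB units) lower


def clocks_with_offset(offset: int) -> tuple[int, int]:
    """(P0, P2) memory clocks when a single offset is applied to both states."""
    return P0_DEFAULT + offset, P2_DEFAULT + offset


def offset_for_p2_target(p2_target: int) -> int:
    """Offset needed to bring P2 up to the target clock."""
    return p2_target - P2_DEFAULT


offset = offset_for_p2_target(5500)
p0, p2 = clocks_with_offset(offset)
print(f"offset +{offset}: P2 runs {p2}, but P0 jumps to {p0}")
```

Because one offset moves both states, any P2 target drags P0 up by the same amount, which is why a separate P2 setting matters here.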

Pascal is buggy, but not enough to be a showstopper for the majority, so I don't know if those bugs will get fixed. For instance, forcing P-states may end up breaking the video clock, and OC'ing may become 'awkward'.

You can see in my previous screenshots that GPU load peaks at 98%. It doesn't stay pegged at 98% but generally runs in the 90s.
 

Timmah!

Dufus said:
With GDDR5X specified to run at 10gps, that is not an OC it is an underclock. [...]

You can see in my previous screen shots that GPU load peaks at 98%. It doesn't peg 98% but generally runs in the 90's.

I see now. So what am I supposed to do about it? Or rather, is it actually running at P0 under load even though AB reports P2 clocks (so the clocks under load are actually 5000 + my 409 OC, not 4920 as AB/GPU-Z says)? Or am I truly running my VRAM underclocked all the time? AB reports my default speed as 4511 (4920 - 409), not 5000. Could this perhaps explain my lower scores compared to you / other people?

I looked at the GPU load again and it goes into the 90s too; it's just that some parts of the bench (directlighting) are probably less taxing, so the load drops to an 80% average there. And I guess I was looking exactly during that period when I checked the first time.
 

Dufus

The P2 memory downclock has been around for a while for compute applications that use CUDA. For instance, here

You could ask the author of AB to add a separate P2 memory clock, but you might have better luck convincing EVGA to add it to Precision XOC, especially if you point out that it would be a feature AB doesn't have. I have also read that some people have had success using NvidiaInspector, but unfortunately I don't remember where exactly. Might have been a mining blog; lots of crunchers out there are experiencing the same problems with P2.
 

Timmah!

So, if I looked at Afterburner while playing games, it would show 5000/5005 MHz (+409 with my OC), but while using CUDA the card works in that P2 state, where the memory is underclocked? Did I get this right?

If you messed around with it and OCed your VRAM to run in Octane Bench at the same clocks as it would in games, I guess that would explain the high scores you were able to get.
 

Dufus

Two things: memory OC and C-state latency.

Remember, I ran this benchmark just to see what score I could achieve, and that's it. You, however, will have to decide how far to optimize to get the best efficiency for your work.

Here's a run with W7 and C-states enabled. Score 157.

[Screenshot]



And W7 with C-states disabled except for C0 and C1. Score 176.

[Screenshot]
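The two Windows 7 runs give a direct measure of what deep C-states cost in this bench. A quick sketch of the arithmetic using the scores above (157 with C-states enabled, 176 with only C0/C1); the render-time line assumes render time scales inversely with the bench score, which is an assumption:

```python
with_cstates, without_cstates = 157, 176  # Windows 7 runs above

gain = without_cstates / with_cstates - 1  # score gained by disabling C3+ states
lost = 1 - with_cstates / without_cstates  # throughput lost with them enabled

# Assumes render time scales inversely with the bench score
render_minutes = 60 * without_cstates / with_cstates
print(f"+{gain:.1%} score from disabling deep C-states ({lost:.1%} lost with them on)")
print(f"a 60-minute render (C-states off) would take ~{render_minutes:.0f} minutes with them on")
```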



A run on Linux using Coolbits.

[Screenshot]


Coolbits is a little limited and has the same problem with memory overclocking: there's just one offset used. It does, however, show all four performance levels. Starting the bench forces level 2 for as long as the bench application is kept open, so as a workaround we can cancel the bench run itself and set the memory overclock while still in level 2 (P2). You can see the memory clock is now 11186 MT/s, which in AB under Windows would show as roughly half that, about 5595. I have to be a little careful here, as the level 3 (P0) memory clock is now set to 12170 MT/s, and if we exit the bench application the clocks will change to level 3 (or even level 1 for that matter). My memory doesn't run that high, so it would crash the system.

Tip: if you want to see the other performance states in Windows, try running NvidiaInspector.

Okay, I'm off for a couple of weeks; hope you find a resolution.
 