MunkyMark

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Ok, I just made some updates, and uploaded the newer version. Hopefully it will fix some of the issues people were having with SM2 cards not running and hanging while closing the app. Let me know how it works.
 

xtknight

Elite Member
Oct 15, 2004
12,974
0
71
Congrats, it works nicely in wine. That's some potential!

GeForce 7800 GT/PCI/SSE2
PS3, single pass, dynamic rendering
iterations: 255
fps: 11
SM3=6.11, SM2=6.00, FBO=6.10, CPU=1.56

Not a big deal, but in VMware it still crashes with this. I'm not sure why (if the capabilities are unavailable there should be a graceful exit though). (DEP is only on for essential services.)

The exception unknown software exception (0xc0000005) occurred in the application at location 0x00000000.

Click on OK to terminate the program
Click on CANCEL to debug the program
 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
Originally posted by: xtknight
Congrats, it works nicely in wine. That's some potential!

GeForce 7800 GT/PCI/SSE2
PS3, single pass, dynamic rendering
iterations: 255
fps: 11
SM3=6.11, SM2=6.00, FBO=6.10, CPU=1.56

Not a big deal, but in VMware it still crashes with this. I'm not sure why (if the capabilities are unavailable there should be a graceful exit though). (DEP is only on for essential services.)

The exception unknown software exception (0xc0000005) occurred in the application at location 0x00000000.

Click on OK to terminate the program
Click on CANCEL to debug the program

So does that mean if my SM3 score is 24 and yours is 6 my 1900XTX is four times faster?
 

schneiderguy

Lifer
Jun 26, 2006
10,801
91
91
Originally posted by: Zstream
Originally posted by: xtknight
Congrats, it works nicely in wine. That's some potential!

GeForce 7800 GT/PCI/SSE2
PS3, single pass, dynamic rendering
iterations: 255
fps: 11
SM3=6.11, SM2=6.00, FBO=6.10, CPU=1.56

Not a big deal, but in VMware it still crashes with this. I'm not sure why (if the capabilities are unavailable there should be a graceful exit though). (DEP is only on for essential services.)

The exception unknown software exception (0xc0000005) occurred in the application at location 0x00000000.

Click on OK to terminate the program
Click on CANCEL to debug the program

So does that mean if my SM3 score is 24 and yours is 6 my 1900XTX is four times faster?

no

edit: well in this benchmark, yes. but not in real games
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: schneiderguy
Originally posted by: Zstream
Originally posted by: xtknight
Congrats, it works nicely in wine. That's some potential!

GeForce 7800 GT/PCI/SSE2
PS3, single pass, dynamic rendering
iterations: 255
fps: 11
SM3=6.11, SM2=6.00, FBO=6.10, CPU=1.56

Not a big deal, but in VMware it still crashes with this. I'm not sure why (if the capabilities are unavailable there should be a graceful exit though). (DEP is only on for essential services.)

The exception unknown software exception (0xc0000005) occurred in the application at location 0x00000000.

Click on OK to terminate the program
Click on CANCEL to debug the program

So does that mean if my SM3 score is 24 and yours is 6 my 1900XTX is four times faster?

no

edit: well in this benchmark, yes. but not in real games

The SM3 method uses per-pixel dynamic branching. Depending on the hardware, it may or may not provide an increase in performance. Factors such as the cost of dynamic branching and how many pixels per batch the GPU works on will affect the results. For comparison, use the single-pass SM2 method to measure raw pixel shader (PS) performance without the benefits or penalties of dynamic branching.
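To illustrate why pixels per batch matters, here is a rough cost model, not the benchmark's actual code. It assumes that all pixels in a batch run in lockstep, so a batch is only done when its slowest pixel is done. The per-pixel iteration counts and the `batched_cost` helper are made up for illustration; only the 255-iteration cap comes from the thread.

```python
MAX_ITER = 255  # iteration cap reported in the thread

def batched_cost(iteration_counts, batch_size):
    """Total iteration slots spent if each batch waits for its slowest pixel."""
    total = 0
    for start in range(0, len(iteration_counts), batch_size):
        batch = iteration_counts[start:start + batch_size]
        # Every pixel in the batch occupies the hardware until the slowest exits.
        total += max(batch) * len(batch)
    return total

# Hypothetical iteration counts for 8 neighbouring pixels (illustrative only).
counts = [3, 4, 255, 255, 5, 3, 2, 2]

no_branching = MAX_ITER * len(counts)   # SM2-style: every pixel runs the full loop
fine_batches = batched_cost(counts, 1)  # ideal case: each pixel exits on its own
coarse_batches = batched_cost(counts, 4)  # 4 pixels per batch, in lockstep

print(no_branching, fine_batches, coarse_batches)
```

Under this model, larger batches waste more of the early-exit savings wherever fast and slow pixels sit next to each other, which is one plausible reason scores differ so much between architectures.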
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Yep. I got some ideas from looking at other similar apps, but I made this myself. Mainly I wanted to see just how much faster a GPU can crunch math than a CPU.
 

xtknight

Elite Member
Oct 15, 2004
12,974
0
71
When multiple threads are used for cpu rendering, each thread renders an interlaced image - meaning that for 2 threads each thread renders every other line. With 4 threads each one does every 4th line and so on. The number of rendering threads is equal to the number of processors detected.

Would you mind explaining how this is? Aren't 3D apps rendered using triangle/vertex arrays and line strips/etc? Or are you doing it in a way that is slower but multithreaded, allowing it to do one line at a time?
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: GundamSonicZeroX
How come some people are getting 1 digit results while others are getting high 20's?

The x1k cards from ATI have much more efficient dynamic branching than other cards. Dynamic branching is used in the SM3 mode to break out of a loop when the result is found, instead of running all 255 iterations for each pixel.
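A minimal sketch of the difference, written in Python rather than shader code. The thread does not say what math the loop performs, so the escape-time iteration z -> z*z + c used here is an assumption, and the function names are hypothetical. Each function returns the result together with the number of loop trips actually executed.

```python
MAX_ITER = 255  # iteration cap reported in the thread

def step(zx, zy, cx, cy):
    """One iteration of z -> z*z + c (assumed workload, not confirmed by the thread)."""
    return zx * zx - zy * zy + cx, 2.0 * zx * zy + cy

def sm3_style(cx, cy):
    """Early exit: leave the loop as soon as the result is known."""
    zx = zy = 0.0
    for trips in range(1, MAX_ITER + 1):
        zx, zy = step(zx, zy, cx, cy)
        if zx * zx + zy * zy > 4.0:
            return trips, trips          # (result, loop trips executed)
    return MAX_ITER, MAX_ITER

def sm2_style(cx, cy):
    """No early exit: the loop always runs MAX_ITER times and keeps the first result."""
    zx = zy = 0.0
    result, done, trips = MAX_ITER, False, 0
    for i in range(1, MAX_ITER + 1):
        trips += 1
        if not done:                     # stands in for a select/predicated write
            zx, zy = step(zx, zy, cx, cy)
            if zx * zx + zy * zy > 4.0:
                result, done = i, True
    return result, trips

for point in [(2.0, 2.0), (0.0, 0.0), (-0.75, 0.3)]:
    print(point, sm3_style(*point), sm2_style(*point))
```

Both versions produce the same result per pixel; only the amount of work differs, and only for pixels that finish early.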
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: xtknight
When multiple threads are used for cpu rendering, each thread renders an interlaced image - meaning that for 2 threads each thread renders every other line. With 4 threads each one does every 4th line and so on. The number of rendering threads is equal to the number of processors detected.

Would you mind explaining how this is? Aren't 3D apps rendered using triangle/vertex arrays and line strips/etc? Or are you doing it in a way that is slower but multithreaded, allowing it to do one line at a time?

The CPU method still uses OpenGL for the actual drawing on the screen, by simply reading from a buffer that the CPU writes to. The drawing is not multithreaded; it just draws pixels based on the data in the buffer, without any triangles or vertices. But the CPU does all the math, and if there are 2 threads, each thread calculates the values for every other line: thread 1 does even-numbered lines, and thread 2 does odd-numbered lines. Both threads write to different parts of the same buffer. When both threads are finished, OpenGL reads the buffer and draws the result.
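The interlaced split described above can be sketched roughly as follows. This is not the app's source: the image size, the `pixel_value` math, and all function names are assumptions for illustration, and the OpenGL draw step is left out. Note that CPython threads will not actually speed up this kind of math because of the interpreter lock; the sketch only shows how the rows are divided.

```python
import os
import threading

WIDTH, HEIGHT, MAX_ITER = 64, 48, 255  # small hypothetical image; 255 cap from the thread

def pixel_value(x, y):
    """Assumed per-pixel math: escape-time count for z -> z*z + c, in 1..255."""
    cx = -2.0 + 3.0 * x / WIDTH
    cy = -1.0 + 2.0 * y / HEIGHT
    zx = zy = 0.0
    for i in range(1, MAX_ITER + 1):
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
        if zx * zx + zy * zy > 4.0:
            return i
    return MAX_ITER

def render_rows(buffer, first_row, step):
    """Fill rows first_row, first_row + step, ... of the shared buffer."""
    for y in range(first_row, HEIGHT, step):
        for x in range(WIDTH):
            buffer[y * WIDTH + x] = pixel_value(x, y)

def render(num_threads):
    """Render with num_threads workers, each taking every num_threads-th row."""
    buffer = bytearray(WIDTH * HEIGHT)   # one shared buffer, disjoint rows per thread
    workers = [
        threading.Thread(target=render_rows, args=(buffer, t, num_threads))
        for t in range(num_threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()                         # the draw step would read the buffer after this
    return buffer

image = render(os.cpu_count() or 1)      # one thread per detected processor
```

Because no two threads ever touch the same row, the buffer needs no locking, and the result is identical for any thread count.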
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: JRW
Originally posted by: x80064
I just use Escape to exit the program, works fine every time

Yeah, I did read the readme.txt lol, Escape doesn't work for me even in the updated version.

My scores seem low after reading through this thread. I'm using the latest official NVIDIA driver (91.47).

Results: http://img127.imageshack.us/img127/8761/clip2ic9.jpg

Your SM2 score does seem too low, even compared to other 7-series users. Not sure what the issue is, but your CPU and FBO scores are almost the same as mine. I'll have to try some more ideas on the refusing-to-exit issue. Do the F-keys work for you, at least?
 

xtknight

Elite Member
Oct 15, 2004
12,974
0
71
Originally posted by: munky
Originally posted by: xtknight
When multiple threads are used for cpu rendering, each thread renders an interlaced image - meaning that for 2 threads each thread renders every other line. With 4 threads each one does every 4th line and so on. The number of rendering threads is equal to the number of processors detected.

Would you mind explaining how this is? Aren't 3D apps rendered using triangle/vertex arrays and line strips/etc? Or are you doing it in a way that is slower but multithreaded, allowing it to do one line at a time?

The CPU method still uses OpenGL for the actual drawing on the screen, by simply reading from a buffer that the CPU writes to. The drawing is not multithreaded; it just draws pixels based on the data in the buffer, without any triangles or vertices. But the CPU does all the math, and if there are 2 threads, each thread calculates the values for every other line: thread 1 does even-numbered lines, and thread 2 does odd-numbered lines. Both threads write to different parts of the same buffer. When both threads are finished, OpenGL reads the buffer and draws the result.

Gotcha.

All F-Keys and Escape work fine here.

Edit: disregard what was asked here (about the multiple images), it was covered in the readme.
 

JRW

Senior member
Jun 29, 2005
569
0
76
Originally posted by: munky
Originally posted by: JRW
Originally posted by: x80064
I just use Escape to exit the program, works fine every time

Yeah, I did read the readme.txt lol, Escape doesn't work for me even in the updated version.

My scores seem low after reading through this thread. I'm using the latest official NVIDIA driver (91.47).

Results: http://img127.imageshack.us/img127/8761/clip2ic9.jpg

Your SM2 score does seem too low, even compared to other 7-series users. Not sure what the issue is, but your CPU and FBO scores are almost the same as mine. I'll have to try some more ideas on the refusing-to-exit issue. Do the F-keys work for you, at least?

Yep, the F-keys work, as well as the other controls (Tab etc.). The Escape key just doesn't wanna respond, and I made sure my ESC key does work outside of the program ;)
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Ok, all the people who have trouble exiting the app, I made a new version with an optional "-debug" command line switch. The switch forces the app to run in a 640x480 window, so try it and tell me if you still can't exit correctly. That way I'll know if it's related to the fullscreen mode or not.
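For anyone curious how a switch like this is typically wired up, here is a small sketch. It is not the app's code: the function name and the returned settings dictionary are hypothetical, and the fullscreen resolution is left unspecified because the post does not state it. Only the "-debug" flag and the 640x480 window come from the post.

```python
import argparse

def parse_display_mode(argv):
    """Map the command line to display settings (hypothetical structure)."""
    parser = argparse.ArgumentParser(prog="benchmark")
    parser.add_argument(
        "-debug",
        action="store_true",
        help="run in a 640x480 window instead of fullscreen",
    )
    args = parser.parse_args(argv)
    if args.debug:
        return {"fullscreen": False, "width": 640, "height": 480}
    # Fullscreen resolution is not given in the post, so it is left open here.
    return {"fullscreen": True, "width": None, "height": None}

print(parse_display_mode(["-debug"]))
print(parse_display_mode([]))
```

Running the same build windowed and fullscreen is a cheap way to tell whether an exit hang comes from the display-mode switch or from somewhere else.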
 

JRW

Senior member
Jun 29, 2005
569
0
76
Originally posted by: munky
Ok, all the people who have trouble exiting the app, I made a new version with an optional "-debug" command line switch. The switch forces the app to run in a 640x480 window, so try it and tell me if you still can't exit correctly. That way I'll know if it's related to the fullscreen mode or not.

Escape does work when running it with -debug option.
 

GundamSonicZeroX

Platinum Member
Oct 6, 2005
2,100
0
0
Originally posted by: munky
Originally posted by: GundamSonicZeroX
How come some people are getting 1 digit results while others are getting high 20's?

The x1k cards from ATI have much more efficient dynamic branching than other cards. Dynamic branching is used in the SM3 mode to break out of a loop when the result is found, instead of running all 255 iterations for each pixel.

Call me dumb, but are you going to recode it so that NV cards get more reasonable scores?
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: GundamSonicZeroX
Originally posted by: munky
Originally posted by: GundamSonicZeroX
How come some people are getting 1 digit results while others are getting high 20's?

The x1k cards from ATI have much more efficient dynamic branching than other cards. Dynamic branching is used in the SM3 mode to break out of a loop when the result is found, instead of running all 255 iterations for each pixel.

Call me dumb, but are you going to recode it so that NV cards get more reasonable scores?

What do you mean? Currently I use the same shaders for both ATI and Nvidia cards, and even more surprisingly, both cards can actually run it without errors. If the next generation of Nvidia cards improves dynamic branching efficiency, then they will get increased SM3 scores without me having to change anything.
 

Paratus

Lifer
Jun 4, 2004
17,637
15,825
146
Still no joy. I can get the debug window but it still just goes away and never comes back. I still have to kill it with the task manager. (Seems to be using 50% of my P4 w HT so it's only spawning 1 thread)
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: Paratus
Still no joy. I can get the debug window but it still just goes away and never comes back. I still have to kill it with the task manager. (Seems to be using 50% of my P4 w HT so it's only spawning 1 thread)

Does it at least respond to the F-keys or Enter? Does it even display anything?
 

Paratus

Lifer
Jun 4, 2004
17,637
15,825
146
Originally posted by: munky
Originally posted by: Paratus
Still no joy. I can get the debug window but it still just goes away and never comes back. I still have to kill it with the task manager. (Seems to be using 50% of my P4 w HT so it's only spawning 1 thread)

Does it at least respond to the F-keys or Enter? Does it even display anything?

Nope. Without the debug switch it resets the screen to 1024x768 and then it goes black. Neither the F-keys nor ESC work. In the task manager it's using about 34,000K of memory and 50% of the CPU. I also no longer get the SM3 warning message.

Does it take several minutes to do anything? While I'm on an older machine it's not that old (P4 3.2E 800FSB w HT skt 478 - 2GB Ram, AGP 8x 9600XT -128mb Vram all stock)

I would still like to try it if you feel like doing some more work on it.