raycaster on an FPGA

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
So, I have a final project coming up and I was thinking about doing a raycaster-type program on my FPGA. I was wondering if y'all think I can get everything to fit and get the timing tight enough? I feel like there should be a problem, given that FPGAs are so much slower than ASICs and the raycasters I programmed in C ran at only 60 FPS. But honestly, I cannot find a problem. It seems like I could easily get the timing down. At 640x480 @ 20 FPS I would have 162 ns to play with per pixel, which seems like a pretty good amount of time. Also, the program could be divided easily; I was thinking 4 pixel pipelines working in parallel, using the 10MB DRAM as the frame buffer. My sinusoidal lookup tables can also easily occupy the DRAM, or FLASH since they are constants. I'm not sure of the DRAM bandwidth, but the sinusoids only load 3 per horizontal line (i.e. trivial), and I only need about 100MB/s on the pixels (256 colors). (Actually, now that I think about it, textures would probably be difficult, so 256 colors is even pushing it; I only really need 16 colors, so half that bandwidth, although textures would be good if there is time.) Keyboard and mouse handlers would also be programmed, of course, for simple movement logic. So you could have a basic proof-of-concept raycasting program running in hardware, and actually have a game that looks more like DOOM than the silly FROGGER and SPACE INVADERS type games everyone else will probably do.

EDIT: So I've been doing more work and it DEFINITELY looks doable, but I am still a little skeptical because the timing constraints seem so easy to meet, yet a similar program on my 2GHz P4 back in the day only got the same framerate that I am expecting to get now running at 6MHz (and almost all of the calculations run on the 13kHz cycles, since they are reused for every pixel). Just amazing how much better a special-purpose integrated circuit (like a GPU) can be compared to a general-purpose CPU.
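For anyone who wants to check the numbers, here is a rough Python sketch of the arithmetic behind the 162 ns, 6MHz and 13kHz figures. It is only an illustration: the 50 MHz value is the board's main clock mentioned further down the thread, and the variable names are made up for this example.

```python
# Timing budget for the figures quoted above (640x480 at 20 FPS).
WIDTH, HEIGHT, FPS = 640, 480, 20

pixels_per_second = WIDTH * HEIGHT * FPS   # rate of per-pixel work
columns_per_second = WIDTH * FPS           # rate of per-column (per-ray) work
ns_per_pixel = 1e9 / pixels_per_second     # time available for each pixel

# Assumption: 50 MHz main clock, as mentioned later in the thread.
CLOCK_HZ = 50_000_000
cycles_per_pixel = CLOCK_HZ / pixels_per_second
cycles_per_column = CLOCK_HZ / columns_per_second

print(f"pixel rate : {pixels_per_second / 1e6:.3f} MHz")
print(f"column rate: {columns_per_second / 1e3:.1f} kHz")
print(f"per pixel  : {ns_per_pixel:.2f} ns ({cycles_per_pixel:.2f} clock cycles)")
print(f"per column : {cycles_per_column:.2f} clock cycles")
```

So the per-pixel logic has to keep up with roughly 6.1 MHz, while anything that is computed once per column only has to keep up with 12.8 kHz.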
 

bobsmith1492

Diamond Member
Feb 21, 2004
3,875
3
81
P.S. : When you say "timming," do you mean "trimming" or "timing" ? There is a big difference...
 

Special K

Diamond Member
Jun 18, 2000
7,098
0
76
So you are going to code the pixel pipelines in Verilog or something? How difficult would that be?
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Yeah, I'm gonna do it in VHDL, and that should be pretty easy to be honest. I wrote pseudocode for the raycaster in C this morning, and 90% of the calculations only have to be done once per vertical line. So in fact only one pipeline will be needed, and it will probably only take ~10% of the chip space. Actually, I've gone from thinking that this would be too much to realising that it would be pretty trivial to implement. I sorta want something a little more calculation-intensive that requires pipelining and parallelism, but right now it looks like this can just be done with a single-stage pipeline without any difficulty in meeting the timing constraints. I guess I'll go ahead and texture map it too, since that is a per-pixel operation.
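To show what "once per vertical line" means, here is a minimal Python sketch of the usual grid-stepping (DDA) raycaster loop. This is not my VHDL or my C pseudocode, just an illustration under assumptions: the 8x8 map, the function names, and the evenly spaced ray angles are all made up for the example.

```python
import math

# Hypothetical 8x8 map: solid border (1), empty interior (0).
SIZE = 8
WORLD = [[1 if x in (0, SIZE - 1) or y in (0, SIZE - 1) else 0
          for x in range(SIZE)] for y in range(SIZE)]

def cast_ray(px, py, view_angle, ray_angle):
    """Step a ray through the grid; return (corrected distance, side hit)."""
    dx, dy = math.cos(ray_angle), math.sin(ray_angle)
    map_x, map_y = int(px), int(py)
    # Ray length needed to cross one whole cell in x or in y.
    delta_x = abs(1.0 / dx) if dx != 0 else math.inf
    delta_y = abs(1.0 / dy) if dy != 0 else math.inf
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # Ray length from the start position to the first x and y grid lines.
    side_x = ((map_x + 1 - px) if dx > 0 else (px - map_x)) * delta_x
    side_y = ((map_y + 1 - py) if dy > 0 else (py - map_y)) * delta_y
    while True:
        if side_x < side_y:          # next crossing is a vertical grid line
            dist, side = side_x, 0
            side_x += delta_x
            map_x += step_x
        else:                        # next crossing is a horizontal grid line
            dist, side = side_y, 1
            side_y += delta_y
            map_y += step_y
        if WORLD[map_y][map_x]:
            break
    # Project onto the view direction to remove the fisheye effect.
    return dist * math.cos(ray_angle - view_angle), side

def wall_height(dist, screen_h):
    """Wall slice height in pixels for a given corrected distance."""
    return min(screen_h, int(screen_h / max(dist, 1e-6)))

def render_columns(px, py, view_angle, fov, screen_w, screen_h):
    """One ray per screen column; everything here runs once per column."""
    heights = []
    for col in range(screen_w):
        # Simplification: ray angles are spaced evenly across the field of view.
        ray_angle = view_angle + fov * (col / (screen_w - 1) - 0.5)
        dist, _side = cast_ray(px, py, view_angle, ray_angle)
        heights.append(wall_height(dist, screen_h))
    return heights
```

All the trig and grid stepping sits inside the per-column loop. The only per-pixel work left for an untextured wall is comparing the row number against the slice height, which is why a single simple pixel stage looks like enough.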
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
New Question:

The monitor currently refreshes row by row, which I assume is standard. But since the raycaster generates information column by column, I wanted to change the monitor to run this way. However, I do not know if this is possible or if it will hurt the monitor. Can I just change the Vsync and Hsync pulses so that the monitor updates column by column, or does it assume that information will only come in row by row and get confused by this?
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Hmm, it seems to be working when I click it, but there are some stupid popups :(. I should probably get a better link, like a wiki article on the same topic, that would be more reliable and less pop-up-y.

As for the Vsync stuff, I figured out a way to get around that: I put in what is more or less a Z-buffer (since raycasters don't have a Z-axis that's a misnomer; it's actually a Y-buffer), which stores the depth and color values for each column until they are needed for the pixels.
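Roughly, the idea looks like this in Python. This is a sketch only, not the VHDL: the tiny screen size and the palette indices are made up, and here the buffer holds a wall height per column (which would be derived from the stored depth) plus a color.

```python
# Hypothetical tiny screen and palette indices, for illustration only.
WIDTH, HEIGHT = 16, 12
CEILING, FLOOR = 0, 1

def pixel_at(x, y, heights, colors):
    """Color of pixel (x, y), looked up from the per-column buffer."""
    top = (HEIGHT - heights[x]) // 2      # first row of the wall slice
    bottom = top + heights[x]             # first row below the wall slice
    if y < top:
        return CEILING
    if y >= bottom:
        return FLOOR
    return colors[x]

def scan_out(heights, colors):
    """Produce the frame in raster order: row by row, left to right."""
    return [[pixel_at(x, y, heights, colors) for x in range(WIDTH)]
            for y in range(HEIGHT)]

# The raycaster fills these one column at a time...
heights = [4] * WIDTH
colors = [5] * WIDTH
# ...and the display logic reads them back one row at a time.
frame = scan_out(heights, colors)
```

The raycaster writes one entry per column whenever it likes, and the row-by-row scan only ever needs a lookup and two comparisons per pixel, so the monitor timing never has to change.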

UPDATE: So, this is taking longer than I thought, but I think I've got everything set up so I can get into the real code. Of course, the fact that it took me 8 hours of work just to get the framework up before starting the real raycaster is kinda scary, but now all the necessary information is routed where it needs to be, and all the memory signals are set up so the correct addresses and data are sent and the hold times are long enough for writes to complete. Now I just need to put the transforms in to go from the input information to the correct output information, instead of just the debugging inputs I'm using now. Hopefully tomorrow I can get the first beta of the renderer working so that there is actually something meaningful to see, and then I can expand it from there.
 

MrDudeMan

Lifer
Jan 15, 2001
15,069
94
91
ouch. 6MHz? The FPGA we use runs at 50MHz, so video with double buffering at 60Hz was dead simple.

what hardware are you using?
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
No, its main clock is 50 MHz; I'm just saying that the logic only has to run at 6MHz to achieve the same results as a P4 at 2GHz. Like I said, too, the vast majority of the calculations are only done at a much slower clock rate. Currently they are going at 56kHz, but that isn't because they couldn't run at several MHz or anything like that; it's simply because that's all they need to run at.
 

Lord Banshee

Golden Member
Sep 8, 2004
1,495
0
0
BrownTown,

Nice project :) Can't wait to see pics..

BTW, what hardware are you going to be running this on? I am just wondering, as I am always looking for something to replace the not-so-great board from my digital design class.
 

MrDudeMan

Lifer
Jan 15, 2001
15,069
94
91
Originally posted by: BrownTown
Altera DE2

Just what the professor gave us at the beginning of the class, nothing fancy so far as I know.

That is the board we are using. IMO (and everyone else in the class + professor) it is very lacking software wise. The only reason we haven't given them away for free is because of the VHDL support. Most of the C/C++ Altera provides is garbage, especially the ethernet driver. I wrote my own driver for the DE2 because the one that comes with it is SO slow. The VGA core is also pretty bad because the university core limits you to something like 80x60 pixels, maybe less. As a class we wrote our own video driver and it works better than the university core driver or the stand-alone driver provided by Altera. The USB, PS2, and sound drivers are all pretty shoddy as well.

On more than one occasion we have found errors in their code and called their support line to confirm our find. So far we are 5/5. As an example, if you are using ethernet, you have to pick your own MAC for each board as they are all shipped with the same one because the software for one is the software for all. The ethernet interrupt routine provided by Altera doesn't respond after the buffer has 1 or more packets waiting to be processed in the form it is shipped in. You have to add a few lines of code at the bottom for it to work, which can be found in other interrupt routines throughout the Altera software.

Overall my experience with the DE2 has been poor at best. I wouldn't recommend it to an enemy. It is, however, better than its predecessors by a wide margin (UP2, UP3).

Sorry about the threadjack BrownTown.
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
We are using a library from a third party that has a lot of the basic drivers, so I guess I haven't really run into that problem.
 

Lord Banshee

Golden Member
Sep 8, 2004
1,495
0
0
lol, that's a very nice board compared to ours: http://www.bin-tek.com/item.aspx?cid=391&iid=btu001

Not only is it 135 dollars, but it doesn't come with samples of anything and has hardly any expansion. Everything was made with my two hands and my laptop :) So these DE2 boards come with software and models; is this IP cores we are talking about here?

Anyway, I wish I had that board, but I've been thinking about waiting for an update with the Cyclone 3 first and then making up my mind.

Also sorry for thread jacking

So any update?
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Nah, today I have to do the work that's due this week, and the project isn't due till May 1st. Maybe if I get that other work done I can get back to the FPGA, but I've been a little frustrated with the different signs and magnitudes of the different signals. I think I'll probably just take the easy way out and use really high precision for everything instead of trying to be elegant about it, since it's not like I am even gonna come close to using all the resources anyway. I think I should put the lookup table memories on a different PLL too, since right now the clock network is apparently pretty heavily loaded (at least that's what all the warnings say; I dunno much about that stuff), which might cause problems for the VGA display, which actually has to run at a decent speed.
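To give an idea of the sign and magnitude bookkeeping, here is a small Python model of a signed fixed-point sine table of the sort the lookup memories would hold. It is a sketch under assumptions: the Q2.14 format, the 256-entry table, and the function names are made up for the example and are not the widths I actually use.

```python
import math

FRAC_BITS = 14            # hypothetical signed format with 14 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0
TABLE_SIZE = 256          # hypothetical: 256 angle steps per full turn

# Sine table as signed integers, as it might be stored in a ROM.
SIN_LUT = [round(math.sin(2 * math.pi * i / TABLE_SIZE) * ONE)
           for i in range(TABLE_SIZE)]

def fx_sin(angle_index):
    """Fixed-point sine of an angle given in table steps."""
    return SIN_LUT[angle_index % TABLE_SIZE]

def fx_cos(angle_index):
    """Cosine reuses the sine table, offset by a quarter turn."""
    return SIN_LUT[(angle_index + TABLE_SIZE // 4) % TABLE_SIZE]

def fx_mul(a, b):
    """Signed fixed-point multiply: the raw product has 2*FRAC_BITS
    fractional bits, so shift right to get back to FRAC_BITS."""
    return (a * b) >> FRAC_BITS
```

Every signal that comes out of the table or a multiplier carries a sign bit and a known number of fractional bits, and every multiply doubles the fractional width until it is shifted back down. Using generous widths everywhere just means never having to argue about which bits can be safely dropped.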
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
So, if anyone cares, the untextured version works now, so I'm moving on to texturing. Does anyone know a good program that can take a bunch of textures and create a common palette, or can convert textures into arbitrary un-paletted color depths (i.e. 64 colors built from three 2-bit channels instead of a 64-entry colormap like is normally used)? Basically I just want ways to use as little memory for storing textures as possible. Also, if you know any good websites with very SIMPLE (computationally cheap) procedural textures, those would be nice too.
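To make the un-paletted format concrete, this is the kind of conversion I mean, sketched in Python. The function names and the 64x64 texture size are made up for the example, and the simple truncation used here is only one of several ways to quantize.

```python
def pack_rgb222(r, g, b):
    """Quantize 8-bit R, G, B to 2 bits each and pack them as 0bRRGGBB."""
    return ((r >> 6) << 4) | ((g >> 6) << 2) | (b >> 6)

def unpack_rgb222(c):
    """Expand a 6-bit color back to 8-bit channels (0, 85, 170 or 255)."""
    r, g, b = (c >> 4) & 3, (c >> 2) & 3, c & 3
    return r * 85, g * 85, b * 85

def texture_bytes(width, height, bits_per_texel=6):
    """Storage needed for one texture, rounded up to whole bytes."""
    return (width * height * bits_per_texel + 7) // 8

print(pack_rgb222(255, 0, 0))      # pure red
print(unpack_rgb222(63))           # brightest color
print(texture_bytes(64, 64))       # hypothetical 64x64 texture
```

With three 2-bit channels there is no colormap to look up at all: the 6 stored bits can drive the top bits of the video DAC directly, at the cost of only 4 levels per channel.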
 

Fallen Kell

Diamond Member
Oct 9, 1999
6,223
540
126
Originally posted by: BrownTown
New Question:

The monitor currently refreshes row by row, which I assume is standard. But since the raycaster generates information column by column, I wanted to change the monitor to run this way. However, I do not know if this is possible or if it will hurt the monitor. Can I just change the Vsync and Hsync pulses so that the monitor updates column by column, or does it assume that information will only come in row by row and get confused by this?

Why not simply rotate the image you are processing by 90 degrees, which would cause the "columns" to become "rows" so that it outputs properly to the monitor? It is a real solution to the problem and not a hardware-specific hack for that one monitor, which means you could then use the FPGA with any monitor, not just your custom one.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: BrownTown
EDIT: So I've been doing more work and it DEFINITELY looks doable, but I am still a little skeptical because the timing constraints seem so easy to meet, yet a similar program on my 2GHz P4 back in the day only got the same framerate that I am expecting to get now running at 6MHz (and almost all of the calculations run on the 13kHz cycles, since they are reused for every pixel). Just amazing how much better a special-purpose integrated circuit (like a GPU) can be compared to a general-purpose CPU.

Either your code is really bad, vsync was on, or there's more going on than you're telling us. A 2GHz CPU would produce astronomical framerates on a raycaster like Wolfenstein 3D, or even a more advanced renderer like Doom 1.
 

Special K

Diamond Member
Jun 18, 2000
7,098
0
76
Just out of curiosity what class is this for, and are you working alone on this project?
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Originally posted by: CTho9305
Originally posted by: BrownTown
EDIT: So I've been doing more work and it DEFINITELY looks doable, but I am still a little skeptical because the timing constraints seem so easy to meet, yet a similar program on my 2GHz P4 back in the day only got the same framerate that I am expecting to get now running at 6MHz (and almost all of the calculations run on the 13kHz cycles, since they are reused for every pixel). Just amazing how much better a special-purpose integrated circuit (like a GPU) can be compared to a general-purpose CPU.

Either your code is really bad, vsync was on, or there's more going on than you're telling us. A 2GHz CPU would produce astronomical framerates on a raycaster like Wolfenstein 3D, or even a more advanced renderer like Doom 1.

The renderers in those programs are written in assembly and fine-tuned for performance. I would challenge you to try and actually write the code so as to achieve such astronomical frame rates, especially considering I wrote mine as a sophomore in high school, so it's not like I was an efficiency expert at the time.

@ Special K: This is for an FPGA Design class, and it's a two-person group. It's done now, so it's just writing the 10-page report about it and the 15-minute PowerPoint presentation left to do, but that should only take a few hours.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
I wrote a 3D engine in JavaScript: http://ctho.ath.cx/toys/3d.html - it uses a real z-buffer when you put it in filled polygon mode. The overhead of the actual display slows it down by maybe 2 orders of magnitude - if I were using mode 13h for the display and C for the logic, it'd be screaming.

http://www.abrahamjoffe.com.au/ben/canvascape/ is a raycaster and he quite clearly gets good framerates. He's even got a textured version. Again, he's being hit hard by the poor performance of canvas for display (in the textured case, I suspect multiple large image transformations are occurring for each column of pixels), and the overhead involved in using an interpreted language (it's not even JITed in Firefox/SeaMonkey).

What did you write your renderer in? Do you still have the code?
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
I wrote two incarnations in high school: sophomore year I wrote one in Visual Basic that made like 10 FPS (no lookup tables FTL), and senior year I made one in Java that got 40-100 FPS textured with lighting on a 256x256 map. I don't have the code for either of those. The one I am doing now is in VHDL (60 FPS essentially, but no Vsync, so the screen tears) and is untextured with a 16x16 map; I can send you the code for that if you want it. I also made a raytracer in C++ with reflections, diffuse and dielectric effects, but none of the fancy stuff (that's like 0.1 FPS btw). The problem here is that we're not actually hitting the graphics card (or even using SSE, unless the compiler is doing it), so the performance is terrible compared to what it could be (a raycaster should be hitting thousands of FPS if written efficiently). If I just had a chip with some DSP elements (i.e. a graphics card or CPU vector units), I could make a raycaster or raytracer MUCH faster, since the calculations per pixel (or per column in the raycaster's case) are independent.