Looking at the PS2 specs, I can imagine programming for that thing was quite the nightmare.
Soo, if Sega had been able to push such a patent through, anything using memory and storage chips would have owed royalties to Sega. :whistle:
It was fun as hell for someone like me. I absolutely enjoyed the challenge of keeping the DMAC packets going from pipe to pipe without stalls. Had to triple buffer the VU1's 16K of data memory so that the GS could read ready-to-render vertex data, the VU could process the current data packet in place (or have enough room for src and dest), and a third section was allocated for incoming DMA for the next packet of data. It all happens concurrently in a heavily pipelined organization. The packet size and workload had to be experimented with so that by the time the current incoming DMA completed and kicked off a VU1 execute on it, everything stalled for minimal time: the GS would have JUST finished, as would the currently running VU1 program, which would xgkick another GS draw just in time and go idle to accept the next VU1 execute request on the new data, and the stalled DMA would start uploading the next packet to the new floating destination buffer. It was all about avoiding stalls just in time.
Btw the EE (the R5900 MIPS main CPU) and VU0 were both idle and free to do their own thing simultaneously, because the entire chain of events I just described is commanded entirely by the very intelligent DMA controller. You chain packets together with DMA tags, and once you kick it off, the DMAC just hops from tag to tag, however many are chained, until it's all done. So powerful. The DMAC ruled the show; the CPU was just a resource. You could build an entire frame in sorted draw order in a DMA display list, including synchronized texture swaps, VU1 program changes, and constants uploads all at once, and just send it off, while the EE and VU0 were doing AI and physics and preparing the next display list.
Then throw texture synchronization into the mix: texture uploads bypassed VIF1 and went straight to the GIF in parallel, synchronized with their dependent geometry by tags that stalled the VIF if the texture upload at the GIF hadn't completed by the time that geometry stream came up to bat.
Optimally you allocated the remaining video memory for two textures: the one currently rendering, and the next one in sorted draw order, uploaded before it's needed. The PS2 was a streaming architecture; it was MADE to utilize GOBS of bus bandwidth and stream data as needed rather than keep things in memory at all times. And with about 1MB of texture memory left after the front, back, and Z buffers, you had no choice. If you tried to hoard memory and keep textures resident, you'd have low-res 8-bit textures and an idle bus.
It was my favorite system to program for.
The first time you get a transformed, perspective-projected, clipped triangle on your TV, and you double-check that your code only uploaded to VIF/VU1, never touched the GS directly, and performed NO geometry processing on the EE, confirming your VU1 transform-and-clip program is working, it's straight up orgasmic :awe:
And VU programming in and of itself, with its dual side-by-side scalar and SIMD instruction pairs executed simultaneously, and having to pipeline vertex processing around the 4-cycle load/store and 7-cycle perspective-divide latencies, was an art all of its own. God I love that system.
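For flavor, here's an illustrative (not assembled or verified) fragment of what such a VU1 inner loop looks like, with the upper FMAC instruction and the lower-pipe instruction paired on each line. Mnemonics and register choices are approximate, and a real loop would interleave more vertices to fully hide the divide latency; note vf00w is the hardwired 1.0, so the DIV computes 1/w:

```
; one VLIW line = upper (FMAC) op | lower (FDIV/LSU/IALU) op, issued together
  mulax.xyzw  ACC, vf04, vf12x    div     Q, vf00w, vf15w   ; start vtx N's transform; kick 7-cycle divide on vtx N-1's w
  madday.xyzw ACC, vf05, vf12y    lq.xyzw vf13, 0(vi01)     ; accumulate; load vtx N+1 (4-cycle load latency to cover)
  maddaz.xyzw ACC, vf06, vf12z    iaddiu  vi01, vi01, 1     ; accumulate; bump the input pointer in a spare lower slot
  maddw.xyzw  vf16, vf07, vf12w   nop                       ; vf16 = clip-space vtx N, matrix multiply done
  mulq.xyz    vf17, vf15, Q       nop                       ; project vtx N-1, whose divide has had time to finish
  nop                             sq.xyzw vf17, 0(vi02)     ; store it to the output buffer for the eventual xgkick
  ; (register rotation between iterations elided)
```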
I used a DMS3 mod chip to boot from the memory card into the Pukklink IP loader, and had Visual Studio configured to run gcc and send the ELF to the IP client on the PC, which rebooted the PS2 and ran the program over the LAN on hitting F5 and printed debug messages back to the PC. VU programs I just organized in Excel to work out my dual-instruction dependencies, pasted into Notepad, assembled, and included the output as raw block data in .h files in the main program.