Love all the great stories of development on the PS2; found this one a while back.
http://robertoconcerto.blogspot.fi/2013/03/my-hardest-bug.html
Never had anything that hellish.
Worst for me took a day. My Game Boy Advance demos worked on emulators but not on real hardware.
Finally narrowed it down to the DMA code. It worked in C but not in inlined asm, and the failure only occurred on back-to-back transfers, as when loading a title screen or any map (tiles/charset, nametable, and palette).
DMA looks like:
Write dest address to DMA channel DST reg
Write src address to DMA channel SRC reg
Write flags, etc, and start the DMA to DMA CTRL reg
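For anyone following along, here is a rough C sketch of that three-write sequence. The struct layout (source, destination, then a 32-bit count/control word) and the two flag bits match my understanding of the GBA DMA registers; `DmaChannel` and `dma_start` are made-up names for illustration, not anything from an official SDK.

```c
#include <stdint.h>

/* One DMA channel's registers in memory order:
   source address, destination address, then count/control. */
typedef struct {
    volatile uint32_t src;   /* source address register */
    volatile uint32_t dst;   /* destination address register */
    volatile uint32_t ctrl;  /* low 16 bits: unit count, high 16 bits: flags */
} DmaChannel;

#define DMA_ENABLE (1u << 31)  /* start the transfer */
#define DMA_32BIT  (1u << 26)  /* move 32-bit units instead of 16-bit */

/* Hypothetical helper: program one transfer in the order listed above. */
static void dma_start(DmaChannel *ch, uint32_t dst, uint32_t src, uint32_t units)
{
    ch->dst  = dst;
    ch->src  = src;
    /* Written last: setting the enable bit is what kicks off the DMA. */
    ch->ctrl = (units & 0xFFFFu) | DMA_32BIT | DMA_ENABLE;
}
```

On real hardware you would point a `DmaChannel *` at the channel's register block (0x040000D4 for channel 3, if I have the memory map right); on a PC you can pass an ordinary struct to check the values being written.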
On paper, doing this back to back should be fine: the GBA has no cache and the CPU is automatically halted when DMA takes the bus, so there is no need to wait on or check DMA status between consecutive transfers.
Turns out the GBA needs two cycles for the DMA to latch the CPU-accessible DEST/SRC registers into its internal address counters, so the CPU gets off two more instructions after a write to DMA CTRL.
So
write src
write dst
write ctrl ; tiles
write src
write dst
write ctrl ; nametable
write src
write dst
write ctrl ; palette
The CPU was writing the new src and dst while the DMA was still latching the previous values, causing a hardware lockup.
Simply adding two NOPs between consecutive DMA uses solved the problem. It worked in C because the function epilogue/prologue code and the return provided the necessary two-instruction wait before the DMA was accessed again.
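A minimal sketch of the fix in C, assuming a GCC-style toolchain for the inline asm. `DmaJob`, `dma_settle`, and `dma_run_jobs` are hypothetical names, and `regs` is assumed to point at a channel's three consecutive 32-bit registers (source, destination, control).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one queued transfer. */
typedef struct {
    uint32_t src;
    uint32_t dst;
    uint32_t ctrl;  /* count and flags, including the enable bit */
} DmaJob;

/* Give the DMA unit two instructions to latch src/dst before the CPU
   touches the channel registers again. The "memory" clobber keeps the
   compiler from moving the NOPs relative to the register writes. */
static inline void dma_settle(void)
{
    __asm__ volatile ("nop\n\tnop" ::: "memory");
}

/* regs[0] = source, regs[1] = destination, regs[2] = control.
   Issues every job in order and returns the number issued. */
static size_t dma_run_jobs(volatile uint32_t *regs, const DmaJob *jobs, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        regs[0] = jobs[i].src;
        regs[1] = jobs[i].dst;
        regs[2] = jobs[i].ctrl;  /* starts the transfer */
        dma_settle();            /* never write src/dst right after ctrl */
    }
    return i;
}
```

In a loop like this the branch overhead would probably cover the delay anyway, much as the function call overhead did in my C version, but the explicit NOPs keep it safe if the compiler unrolls or inlines the calls.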
I think this might have even been documented in the official manual, but we have to find stuff like that ourselves the hard way when doing homebrew.