I should start a Kickstarter to fund me learning more VHDL so I can make a 2D accelerator for the IBM PC on an 8-bit ISA card.
None existed until Windows, and by that point they were 16-bit or wider ISA/VLB/PCI cards and proprietary. There is no DOS-capable VGA card with a hardware blitter, and the VLB2.0 accelerator functions were never implemented in any known card. Even the earliest Mach 8 and other 8514-based accelerator cards are 640x480 and 1024x768 in 16 colors and would only offer asynchronous hardware VRAM-to-VRAM copy commands.
So I looked at using a Yamaha V9990 VDP and making my own card, but it's pretty crappy too: only 2 layers, restricted to 16 colors and 256 x 212. Not even Sega Genesis quality despite having 512k VRAM.
So maybe I'll make my own chip too:
320 x 240 x 16 bpp, 4 layers of 80 x 60 tiles of 8x8 size (4 screens of scrolling) (128 bytes per tile)
256 sprites 8x8 in size sharing the same tile memory.
128k x 16 bit SRAM for 2048 8x8 565 RGB tiles.
32k x 16 SRAM for layer name tables
Support for H/V flip, BG and sprite priority, and vblank and hblank interrupts.
32k-64k sections of the card's RAM specified by user to be mapped at D0000-DFFFF or such to allow linear framebuffer access for level loads, tile animations, etc without conflicting with VGA @ A0000-AFFFF.
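
A quick sanity check of the numbers in that list, as a rough Python sketch. The only thing assumed beyond the spec is one 16-bit name table word per map cell; everything else is taken straight from the figures above.

```python
# Back-of-envelope check of the memory figures in the spec list.
TILE_W = TILE_H = 8
BYTES_PER_PIXEL = 2                      # 16 bpp RGB565
TILE_BYTES = TILE_W * TILE_H * BYTES_PER_PIXEL

TILE_SRAM_WORDS = 128 * 1024             # 128k x 16-bit tile SRAM
NAME_SRAM_WORDS = 32 * 1024              # 32k x 16-bit name table SRAM
LAYERS = 4
MAP_W, MAP_H = 80, 60                    # tiles per layer
SCREEN_W, SCREEN_H = 320, 240

tile_count = TILE_SRAM_WORDS * 2 // TILE_BYTES
# Assumption: one 16-bit word per map cell.
name_entries = LAYERS * MAP_W * MAP_H
spare_name_words = NAME_SRAM_WORDS - name_entries
screens_per_layer = (MAP_W * TILE_W * MAP_H * TILE_H) // (SCREEN_W * SCREEN_H)
tile_index_bits = (tile_count - 1).bit_length()

print("bytes per tile:     ", TILE_BYTES)
print("tiles in tile SRAM: ", tile_count)
print("name table words:   ", name_entries, "used,", spare_name_words, "spare")
print("screens per layer:  ", screens_per_layer)
print("tile index bits:    ", tile_index_bits, "of 16")
```

If the one-word-per-cell assumption holds, the tile index does not fill the whole 16-bit entry, so the leftover bits could carry the H/V flip and priority flags.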
It should be relatively simple as a standalone card (and 100x more complex integrating VGA). The BG layer logic is pretty much just an address generator and some counters. During the scan of line n, simultaneously compute the initial pattern table address and pixel pattern address counters from the scroll registers and get them ready. From there it's just incrementing the pixel counter until it hits 8, wraps around to 0, and increments the tile counter. Use SRAM clocked at 28 MHz or so to be able to perform the 4-5 reads per pixel at a 15 kHz scan rate. Separate SRAMs for tile data and pattern table can help keep clocks somewhat "era correct" and allow simultaneously reading BG(x+1)'s name table while also reading BG(x)'s tile using the previously fetched name table value.
initial pixel address = (__BGxTILE << 7) + ((__BGxSX & 7) << 1) + ((__BGxSY & 7) << 4)
for tiles stored linearly as 8 rows of 16 bytes (8 pixels each), rows 0-7 end to end, with address modifiers to invert and reverse the counters for H/V flip.
All 4 BG layer pixel reads feed into a write combiner/merge circuit, which has the BG register priority bits or blend operators feeding its select (mux) inputs to select which pixel gets latched to the CRTC/DAC during scan out.
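
Here is the address generator as a small Python model, just to check the arithmetic before committing anything to VHDL. It treats the address formula as shifts and adds (tile * 128 + row * 16 + column * 2). The function names, the XOR-with-7 trick for flips and the "name table entry is a bare tile index" layout are all my own assumptions for the sketch.

```python
def bg_pixel_addr(tile, sx, sy, hflip=False, vflip=False):
    """Byte address of one RGB565 pixel in tile memory.

    Each tile is 128 bytes: 8 rows of 16 bytes (8 pixels x 2 bytes).
    """
    px = sx & 7
    py = sy & 7
    if hflip:
        px ^= 7          # assumed flip method: invert the 3-bit column counter
    if vflip:
        py ^= 7          # assumed flip method: invert the 3-bit row counter
    return (tile << 7) + (py << 4) + (px << 1)


def scan_line(name_table, scroll_x, scroll_y, line,
              width=320, map_w=80, map_h=60):
    """Walk one scanline with counters, the way the hardware would."""
    y = line + scroll_y
    row = (y >> 3) % map_h               # tile row, wraps around the map
    col = (scroll_x >> 3) % map_w        # starting tile column
    px = scroll_x & 7                    # starting pixel inside that tile
    addrs = []
    for _ in range(width):
        # Assumption: the name table entry is just the tile index.
        tile = name_table[row * map_w + col]
        addrs.append(bg_pixel_addr(tile, px, y))
        px += 1
        if px == 8:                      # pixel counter wraps ...
            px = 0
            col = (col + 1) % map_w      # ... and bumps the tile counter
    return addrs


# Dummy map: every cell points at a different tile (mod 2048).
demo_map = [i % 2048 for i in range(80 * 60)]
demo_addrs = scan_line(demo_map, scroll_x=5, scroll_y=3, line=0)
print([hex(a) for a in demo_addrs[:5]])
```

Only the initial values depend on the scroll registers; after that the loop body is one increment and one compare per pixel, which is the whole point.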
Sprites would go to a 5th "layer" of the merge/output control.
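
A rough Python model of the priority half of the merge stage, with sprites as the 5th input. None of the details are decided yet, so treat these as assumptions: a reserved RGB565 colour key marks transparent pixels, a higher priority number wins, and ties go to the lower layer index.

```python
TRANSPARENT = 0xF81F   # hypothetical colour key (magenta in RGB565)


def merge(pixels, priorities, backdrop=0x0000):
    """Pick the pixel to latch for scan out.

    pixels:     5 RGB565 values in the order BG0, BG1, BG2, BG3, sprite
    priorities: 5 ints from the layer registers, higher wins (assumed)
    """
    best_pri = None
    best_pix = backdrop
    for pix, pri in zip(pixels, priorities):
        if pix == TRANSPARENT:
            continue                     # layer contributes nothing here
        if best_pri is None or pri > best_pri:
            best_pri = pri               # strict '>' means ties keep the
            best_pix = pix               # earlier (lower index) layer
    return best_pix


T = TRANSPARENT
print(hex(merge([T, 0x1234, 0x5678, T, T], [3, 1, 2, 0, 0])))
```

In hardware this would be a priority encoder driving a 5:1 mux rather than a loop, but the selection rule is the same.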
The sprite logic seems to be the most complex. I'm not sure whether I should use a CAM lookup to get hits on all sprites on the current scanline, or implement a matrix decode method: use the Y value to activate the sprites on the current scanline as columns, then use the X value of the current scan position to pick, as a row match in increasing X, the active sprite/pixel to send to the merge circuit.
I thought about having the merge circuit read only the pixel needed from SRAM to minimize reads, but if I want to do color addition/subtraction I have to read all 4 BG pixels plus the sprite pixel anyway, so I might as well hard-wire it to read all layers.
I'd love to see what the original 8088 @ 4.77 MHz could do with a proper console-style, pattern-table-based accelerated graphics chip instead of having to read/write 1 or 4 (VGA latches) pixels at a time over the < 2 MB/sec ISA bus.
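
Either way it boils down to two stages, which can be modelled in a few lines of Python to get the behaviour straight first. The `Sprite` fields, the function names and the "lowest sprite index wins on overlap" rule are placeholders I made up for the sketch, not a decision.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    x: int
    y: int
    tile: int


def sprites_on_line(sprites, line, height=8):
    """Stage 1: the Y match (what a CAM or a bank of comparators flags)."""
    return [i for i, s in enumerate(sprites) if s.y <= line < s.y + height]


def sprite_at(sprites, active, x, width=8):
    """Stage 2: among the Y-matched sprites, find the one covering x.

    Returns (sprite index, column inside the tile) or None.
    Assumption: on overlap the lowest sprite index wins.
    """
    for i in active:
        s = sprites[i]
        if s.x <= x < s.x + width:
            return i, x - s.x
    return None


demo_sprites = [Sprite(10, 20, 1), Sprite(14, 24, 2), Sprite(100, 100, 3)]
demo_active = sprites_on_line(demo_sprites, line=25)
print(demo_active, sprite_at(demo_sprites, demo_active, x=19))
```

Stage 1 only has to run once per scanline, so it could be done during hblank; stage 2 is the part that has to keep up with the pixel clock.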
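
For reference, this is roughly what colour addition/subtraction on two RGB565 pixels would look like, sketched in Python. Saturating (clamping) per channel is my assumption here; wrapping or half-intensity modes would be equally possible.

```python
def unpack565(p):
    """Split an RGB565 word into (r, g, b) channel values."""
    return (p >> 11) & 0x1F, (p >> 5) & 0x3F, p & 0x1F


def pack565(r, g, b):
    """Pack channel values back into one RGB565 word."""
    return (r << 11) | (g << 5) | b


def blend_add(a, b):
    """Per-channel add, clamped at each channel's maximum (assumed)."""
    ra, ga, ba = unpack565(a)
    rb, gb, bb = unpack565(b)
    return pack565(min(ra + rb, 31), min(ga + gb, 63), min(ba + bb, 31))


def blend_sub(a, b):
    """Per-channel subtract, clamped at zero (assumed)."""
    ra, ga, ba = unpack565(a)
    rb, gb, bb = unpack565(b)
    return pack565(max(ra - rb, 0), max(ga - gb, 0), max(ba - bb, 0))


print(hex(blend_add(0xF800, 0x001F)), hex(blend_sub(0xFFFF, 0xF800)))
```

Both operands have to be on hand at the same pixel clock, which is why the blend case forces all layers to be read regardless of priority.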
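
To put a number on that, here is the idealized arithmetic in Python. It assumes the 8088's minimum 4-clock bus cycle with one byte per cycle and no wait states or instruction fetches, so real throughput would be lower. The 320x200 256-color frame is just a familiar VGA reference point, and the 120-byte column assumes one 16-bit name table word per tile.

```python
CPU_HZ = 14_318_180 / 3          # PC master crystal divided by 3, about 4.77 MHz
CLOCKS_PER_BUS_CYCLE = 4         # minimum 8088 bus cycle, 1 byte per cycle

peak_bytes_per_sec = CPU_HZ / CLOCKS_PER_BUS_CYCLE

frame_bytes = 320 * 200          # one full 256-color VGA frame
full_redraw_fps = peak_bytes_per_sec / frame_bytes

# Scrolling on the tile chip: write scroll registers plus one new map column.
# Assumption: 60 rows x one 16-bit name table word each.
column_bytes = 60 * 2
column_time_ms = column_bytes / peak_bytes_per_sec * 1000

print(f"peak bus rate:     {peak_bytes_per_sec / 1e6:.2f} MB/s")
print(f"full frame redraw: {full_redraw_fps:.1f} fps at best")
print(f"one map column:    {column_time_ms:.2f} ms")
```

So even with zero CPU work per pixel a full software redraw cannot reach 20 fps, while a one-tile scroll on a name-table chip costs a tiny fraction of a frame.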
Wondering if I could fit this in a Cyclone II...
Anyway.... carry on
