After reconsidering the need for paired memory, these are the likely possibilities:
256bit = High end:
512MB/256bit (8x64 DDR/375-425MHz - most practical 512MB 'Extreme Edition' solution)
256MB/256bit (4x64 DDR2/400-500MHz - most practical 256MB/256bit solution)
128bit = Mid-Range:
256MB/128bit (4x64 DDR/375-425MHz)
128MB/128bit (2x64 DDR2/400-500MHz - most practical 128MB/128bit solution)
64bit = Low end (all solutions would still necessitate the use of on-mainboard expansion slots):
128MB/64bit (2x64 DDR/375-425MHz)
64MB/64bit (2x32 DDR/325-375MHz)
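For reference, here's a rough sketch in C of what those bus widths work out to in peak bandwidth. The clock values are just midpoints of the ranges above, and it assumes double data rate (two transfers per clock); treat it as back-of-the-envelope, not spec numbers.

#include <stdio.h>

/* Peak-bandwidth sketch for the configurations above. Assumes DDR
 * (two transfers per clock); clocks are midpoints of the listed ranges. */
static double peak_gb_per_s(int bus_width_bits, double clock_mhz)
{
    double bytes_per_transfer = bus_width_bits / 8.0;
    double transfers_per_sec  = clock_mhz * 1e6 * 2.0;
    return bytes_per_transfer * transfers_per_sec / 1e9;
}

int main(void)
{
    printf("256bit @ 400MHz DDR: %.1f GB/s\n", peak_gb_per_s(256, 400.0)); /* 25.6 */
    printf("128bit @ 400MHz DDR: %.1f GB/s\n", peak_gb_per_s(128, 400.0)); /* 12.8 */
    printf(" 64bit @ 400MHz DDR: %.1f GB/s\n", peak_gb_per_s(64, 400.0));  /*  6.4 */
    return 0;
}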
By placing the memory controller on-die and mounting the memory physically next to the processor, you minimize the pathway - the most direct path possible - from controller to memory bank. Don't forget that high-end 500MHz GDDR can be down to one-fourth the latency of current 166MHz desktop memory! The GPU functions require a lot of memory bandwidth, so what you said about latency being king is not entirely true in all cases, especially for a design with an integrated GPU. Plus, by sharing the memory across architectures we simplify the system, eliminating costly duplication of components - which in this case includes processor functions and physical memory.
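To put a rough number on the latency point, here's a quick sketch; the CAS figures are my own assumptions for illustration, not datasheet values. At the same cycle count, access time scales inversely with clock, which is where the several-fold gap between 500MHz GDDR and 166MHz desktop DDR comes from.

#include <stdio.h>

/* CAS latency converted to wall-clock time. The cycle counts are
 * illustrative assumptions, not datasheet figures. */
static double cas_ns(double cas_cycles, double clock_mhz)
{
    return cas_cycles / clock_mhz * 1000.0; /* cycles / MHz = microseconds; x1000 = ns */
}

int main(void)
{
    printf("166MHz desktop DDR, CL2.5: %.1f ns\n", cas_ns(2.5, 166.0)); /* ~15 ns */
    printf("500MHz GDDR,        CL2.5: %.1f ns\n", cas_ns(2.5, 500.0)); /* ~5 ns  */
    return 0;
}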
I like to think of this design as more of a brain than just an MPU, where the CPU is like a right brain and the GPU is like the left brain. Or do I have that backwards?

The memory controller would be the medulla oblongata, routing data traffic to the proper location. The raw memory bandwidth would be devoured by the GPU functions, and the high clock speeds would enable low latency for the CPU functionality. Common data no longer has to travel an independent bus or port between the components, because the two would be practically built on top of each other and the transfer time between MPU halves would be negligible. In the end, both sides of the equation benefit.
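To make that 'medulla' a bit more concrete, here's a toy arbiter sketch. It is purely conceptual - the policy, names, and threshold are invented for illustration, not any real controller design - but it shows the idea of one controller serving a latency-sensitive CPU client and a bandwidth-hungry GPU client from the same memory.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a shared on-die memory controller arbitrating between a
 * latency-sensitive CPU client and a bandwidth-hungry GPU client.
 * Conceptual only; the policy here is made up for illustration. */
typedef enum { GRANT_NONE, GRANT_CPU, GRANT_GPU } grant_t;

typedef struct {
    bool cpu_pending;  /* CPU has an outstanding request */
    bool gpu_pending;  /* GPU has an outstanding request */
    int  gpu_wait;     /* cycles the GPU request has been held off */
} arbiter_t;

static grant_t arbitrate(arbiter_t *a)
{
    /* If the GPU has been held off long enough to hurt its streaming
     * bandwidth, let it through; otherwise favour CPU latency. */
    if (a->gpu_pending && a->gpu_wait >= 4) {
        a->gpu_wait = 0;
        return GRANT_GPU;
    }
    if (a->cpu_pending) {
        if (a->gpu_pending)
            a->gpu_wait++;  /* GPU waits while the CPU goes first */
        return GRANT_CPU;
    }
    if (a->gpu_pending) {
        a->gpu_wait = 0;
        return GRANT_GPU;
    }
    return GRANT_NONE;
}

int main(void)
{
    arbiter_t a = { .cpu_pending = true, .gpu_pending = true, .gpu_wait = 0 };
    for (int cycle = 0; cycle < 8; cycle++) {
        grant_t g = arbitrate(&a);
        printf("cycle %d: %s\n", cycle,
               g == GRANT_CPU ? "CPU" : g == GRANT_GPU ? "GPU" : "idle");
    }
    return 0;
}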
The ALU and FPU in current processors are basically hamstrung by the front-side bus. They can never achieve broad parallelization because of the lack of raw bandwidth feeding them, meaning there is little point in further beefing up those units while the front-side buses are so hampered. In reality, both AMD and Intel have moved away from strengthening the raw FPU for this exact reason, instead moving as much FPU functionality as possible over to SIMD structures. Intel builds their ALU functions off their SSE2 units, which takes this evolution one step further toward replacing the traditional ALU in their design. (Amazingly, though, it's rumoured that Intel is actually moving back away from the SSE2 unit for Prescott's ALU units.) SSE3 may be yet another step for Intel toward making the traditional FPU a secondary function in future processors, but we don't really have enough information out there yet to know. ALUs are not nearly as complex as many of the other components of the processor, and they are one of the components with a lot of headroom for gains in raw clock speed without moving to longer pipelines. In the current Intel and AMD designs, the ALUs have a substantially higher IPC than the FPU units. Heck, the P4's ALUs are double-pumped even on the highest-end models and are still relatively weak in comparison to specialized processors that key in on these functions.
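As a concrete illustration of FPU work migrating to SIMD, here's a minimal example, assuming a compiler with SSE2 intrinsics (emmintrin.h); the arrays and loop are just made-up sample data. It does the same double-precision add the traditional one-at-a-time way and then as packed SSE2, two doubles per instruction.

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdio.h>

#define N 8

int main(void)
{
    double a[N], b[N], scalar[N], simd[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 10.0 * i; }

    /* Traditional FPU path: one double-precision add at a time. */
    for (int i = 0; i < N; i++)
        scalar[i] = a[i] + b[i];

    /* SSE2 path: two packed double-precision adds per instruction. */
    for (int i = 0; i < N; i += 2) {
        __m128d va = _mm_loadu_pd(&a[i]);
        __m128d vb = _mm_loadu_pd(&b[i]);
        _mm_storeu_pd(&simd[i], _mm_add_pd(va, vb));
    }

    for (int i = 0; i < N; i++)
        printf("%g %g\n", scalar[i], simd[i]);
    return 0;
}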
I'm willing to bet that these design issues were all tackled years ago. They may not have been at the scale I've laid out, but the same issues you've brought up were likely addressed when Intel worked on SOC technology. In my opinion, memory speeds and raw processor clock speeds have made this technology very feasible now.