Originally posted by: Matthias99
Suffice it to say that there's a *lot* more to it than just "put[ting] 2 GPUs on a circuit board".
Originally posted by: MadRat
How about we just integrate the CPU to work off the GPU's memory controller. How's that for dual architecture?
Originally posted by: m1sterr0gers
Originally posted by: Matthias99
Suffice it to say that there's a *lot* more to it than just "put[ting] 2 GPUs on a circuit board".
guess these guys know a *lot* about it
Originally posted by: Sahakiel
How about we just ditch the whole cost-effective principle? How's that for practicality?
Originally posted by: MadRat
How about we just integrate the CPU to work off the GPU's memory controller. How's that for dual architecture?
Originally posted by: MadRat
Originally posted by: Sahakiel
How about we just ditch the whole cost-effective principle? How's that for practicality?
Originally posted by: MadRat
How about we just integrate the CPU to work off the GPU's memory controller. How's that for dual architecture?
I'm still trying to jar some of these whiz-bang processor designers into toying with the idea of using internal memory bandwidth on par with the latest, greatest graphics cards. It was only seven or eight years ago that $500 was mid-range for a CPU core. It's not like people aren't doling out $1000 for the P4EE on the belief that the bleeding edge is worth it.
I think the design lines between CPU and GPU would certainly blur if the CPU core suddenly gained a 256-bit memory controller attached to a local 256MB memory package operating in the 20-25GB/sec range. Graphics cards have been pushing these theoretical bandwidth ranges for better than a year now, so why not CPUs? There should be a way to do it in the $500-$600 range, especially if a NUMA approach were used where people could run a second set of memory, like in conventional slots, to push the total system memory up to where it's useful for the largest of programs. For the SMP crowd, each processor enjoys its own uninterrupted, dedicated memory, and all of the processors would (because of the NUMA approach) share the second expansion set of memory placed in the slots on the mainboard.
Less expensive mainstream-grade modules in the $150-$400 range could use 128-bit memory controllers tied to 128MB of memory running in the 10-12GB/sec range. Cheap bastards in the entry-level crowd could use the NUMA expansion memory slots (otherwise without any high-speed memory tied to them) if they're too cheap to fork over $150... (Yes, it would sell, just like people really do buy Pentium4-based Celerons when real P4s are but a little more money!)
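The bandwidth figures being thrown around here fall out of a simple width-times-transfer-rate calculation. A rough sketch in C, where the bus widths and effective transfer rates are illustrative assumptions rather than the specs of any particular part:

```c
#include <stdio.h>

/* Peak theoretical bandwidth = bus width (bytes) * effective transfer rate.
 * The configurations below are hypothetical examples, not real products. */
static double peak_gb_per_sec(int bus_bits, double effective_mts) {
    return (bus_bits / 8.0) * effective_mts * 1e6 / 1e9;
}

int main(void) {
    /* 256-bit bus at 700 MT/s effective: ~22.4 GB/s, the 20-25 GB/s range above */
    printf("256-bit @ 700 MT/s: %.1f GB/s\n", peak_gb_per_sec(256, 700.0));
    /* 128-bit bus at the same rate: ~11.2 GB/s, the 10-12 GB/s mainstream range */
    printf("128-bit @ 700 MT/s: %.1f GB/s\n", peak_gb_per_sec(128, 700.0));
    /* Single 64-bit channel of DDR266: ~2.1 GB/s, typical desktop memory of the day */
    printf(" 64-bit @ 266 MT/s: %.1f GB/s\n", peak_gb_per_sec(64, 266.0));
    return 0;
}
```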
The real limit of CPU power today, in my opinion, is that we don't yet have a killer application that makes more power worthwhile. Not many of us even push our current systems in any meaningful way 99% of the time they are on.
Link's dead.
Originally posted by: m1sterr0gers
Originally posted by: Matthias99
Suffice it to say that there's a *lot* more to it than just "put[ting] 2 GPUs on a circuit board".
guess these guys know a *lot* about it
Originally posted by: Sahakiel
I spent half an hour typing up a long flame. Good grief, I'm starting to dread reading your posts.
Long story short, what you propose is absolutely possible.... SEVERAL YEARS FROM NOW.
Go read a book on basic computer architecture. Patterson and Hennessy come highly recommended.
My personal opinion of your posts has nothing to do with your character beyond the "get-rich-quick" type of mentality. First off, whether or not the technology is available is only one problem. 10 GHz transistors have been in existence for at least a year. There are several damn good reasons you don't see a 10 GHz Pentium IV and it has little to do with marketing.
Originally posted by: MadRat
Like I care if you have some kind of personal problem with my ideas. This is not something that is possible only several years away; the technology exists today. Integrating memory into the mainboard is nothing new, nor is mating dedicated memory to an MPU. Different companies have already tackled the technology behind the memory controllers, so it's not like the memory technology does not exist.
CPUs on expansion cards are nothing new. How the heck do you think backplanes came into existence? SCSI wasn't always a cable bus for your hard drives. SCSI backplanes hooked up to CPU riser cards have been around since at least the early 80's. Heck, I even worked on a PC with a Pentium 233 MMX on a CPU card hooked up to a PCI backplane circa 1996. The idea is nothing new. The PROBLEM you don't seem to want to acknowledge is the fact that those systems were both more expensive than ATX layouts, due to cramming everything into one card, and everything else NOT soldered on board was hella slow. Also, I think you're ignoring the fact that both Intel and AMD moved away from the slot format as soon as was feasible. The main reason they even used the slot in the first place was that die sizes were too large to integrate enough cache on die to match performance requirements. The first iterations, the Pentium Pro, used a dual-cavity socket design for the sole reason that cache at that time was becoming a serious bottleneck, much the same way that DRAM being a HUGE bottleneck spawned the introduction of said caches in the first place.
Originally posted by: MadRat
What I proposed was putting the CPU on an expansion card, perhaps even moving back to a slot since the interface would no longer need the huge pin count used by current fancy motherboard-centric designs, and then dedicated memory on the expansion card.
Let me try to hammer this point in one more time: Graphics cards have different design architectures than CPUs. Why do you think graphics cards have such great memory? Because it's soldered on, close to the GPU, and the card design is somewhat different than motherboards. Plus, I think the memory used for graphics cards is somewhat different than the DRAM used for system memory; not only in packaging, but also density and interfacing/accessing.
Originally posted by: MadRat
I realize you may be thinking that memory dedicated to the card would mean the architecture is akin to parallel processing, but this can be done in an x86-compatible machine using a NUMA-aware OS. The NUMA-aware OS is already available now, too. If graphics cards can do this kind of bandwidth for nary the cost of a single CPU, then there is a margin with which to build a profit upon.
What planet did you hail from? Die size of CPUs is somewhat smaller each year, that's true. But how does this make room for a GPU? You have a 50+ million transistor Pentium IV versus a 110 million transistor Radeon 9800 (125M for the GeForceFX) and somehow these two are supposed to work together? We haven't even started counting the transistors required for coupling the two very different processors. In other words, try later, preferably SEVERAL YEARS LATER.
Originally posted by: MadRat
With die size dropping so drastically every couple of years, it makes more sense to integrate the GPU and CPU together now than ever before.
Shrinking memory increases density more than speed. In the DRAM market, density is everything. Think about it.
Originally posted by: MadRat
Memory has shrunk to the point where current slots are inefficient for the speeds easily possible; what could be better than putting the memory right there next to the CPU? In order to do a combination core like this effectively, the thing must have adequate memory bandwidth. Varying levels of performance can be gleaned by varying the amounts of and pathways to the memory on the card. The idea is massive memory bandwidth for either graphics work or computing depending on need, but all of the horsepower in one package. Basically it becomes a jack-of-all-trades CPU/GPU core at an affordable price, and motherboard layouts again become simplified like they should be kept.
In short, AMD and NVidia better beware graphics card makers doing this before they do it themselves.
Originally posted by: User1001
There's been all this talk about the failed XGI Duo cards with 2 GPUs. Couldn't ATI and NVIDIA tell manufacturers to put 2 GPUs on a circuit board and offer the dual architecture?
Unfortunately, you're the only one I know that exemplifies such passion.
Originally posted by: MadRat
1. Sorry, User1001, for derailing the discussion.
2. Sahakiel, get real. You fire a torpedo at my idea before even contemplating the big picture. I am beginning to think you have some frustrations with something, perhaps at school, and would rather deflect it my way. Perhaps it has to do with you spending long hours up at night reading ATHT...
I never said the technology was impossible today. Perhaps you should read carefully and see that I keep saying that such a design for CPUs will far exceed reasonable cost. Plus, you still have the socket problem at the system level.
Originally posted by: MadRat
3. The 256-bit pathway to memory solution already exists and is used by today's graphics cards. Likewise, high-speed memory exists on the latest graphics cards at speeds that far exceed desktop memory. The GeForce FX5800U uses a 256-bit memory controller tied to 500MHz DDR (1000MHz eq.) memory. I rest my case that 256-bit pathways to dedicated memory are possible using today's technology.
That's why I keep saying wait several years. It takes relatively little time to take an existing system design, shave off everything except the bare necessities, and then plop it onto an expansion board. Notebooks have been doing that for years (except for the expansion card layout). However, if you want to integrate it into a socket, well, that's a whole 'nother story. That requires at least one or two new process technologies to even shrink the CPU to a decent size to make room for integration of other system components.
Originally posted by: MadRat
4. The reason a slot or expansion card type of interface COULD be used is for simplicity reasons. As the technology was polished off, a socket interface could then be used to shrink the entire package. The design would not need as many pins for external memory due to the lesser need for high-speed memory on the motherboard; the high-speed memory would already be on the package. A simpler 64-bit pathway should be acceptable for consumer boards, whereas they'd still be able to use dual-channel pathways for servers and other high-end products.
How many times do I have to tell you that graphics cards and motherboards have different design methodologies? You can try integrating the CPU, memory controller, and DRAM onto one package (or Opteron + DRAM) but you'll run into problems with upgrades. Unfortunately, upgradeability supersedes performance when designing for the consumer and server market. That, and you run into more problems with I/O running through the socket pins. I'm not even sure if HyperTransport would work through a socket.
Originally posted by: MadRat
5. The memory is put on graphics cards close to the controller to limit trace length, else the timing would be thrown off at high clock speeds. This is the same hurdle that current SDRAM is running into using conventional DDR modules, with 250MHz memory pushing the limits of even the best boards out right now. If we want to entertain the idea of 500MHz memory then the trace lengths have to shrink to a quarter of their current length, meaning it's absolutely necessary to put the memory next to the controller. Hey, that's exactly what current graphics cards do...
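The trace-length argument comes down to signal flight time versus the clock period. A back-of-the-envelope sketch, assuming roughly 15 cm/ns propagation on FR-4 and that about a quarter of each cycle is available as flight-time budget (both ballpark assumptions for illustration, not board-design rules):

```c
#include <stdio.h>

/* Rough flight-time budget: how far a signal can travel in a given
 * fraction of one clock period, assuming ~15 cm/ns propagation on FR-4. */
static double max_trace_cm(double clock_mhz, double budget_fraction) {
    double period_ns = 1000.0 / clock_mhz;   /* clock period in ns       */
    double prop_cm_per_ns = 15.0;            /* assumed FR-4 trace speed */
    return period_ns * budget_fraction * prop_cm_per_ns;
}

int main(void) {
    /* Assume a quarter of the cycle is left over for trace flight time. */
    printf("250 MHz clock: ~%.1f cm of trace budget\n", max_trace_cm(250.0, 0.25));
    printf("500 MHz clock: ~%.1f cm of trace budget\n", max_trace_cm(500.0, 0.25));
    return 0;
}
```

Doubling the clock halves the distance budget, which is the basic reason high-speed memory ends up soldered next to its controller.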
The memory hierarchy you describe is pretty well exploited with caching. The main difference I can see is whether or not the OS is aware of it.
Originally posted by: MadRat
6. With a NUMA approach the clock speeds of main memory and expansion memory become less relevant. Therefore new memory technology can be used on the upper echelon of processors without affecting the bottom line of the mainstream processors using my approach. Lesser-performing cards could use less expensive memory configurations to keep costs down. Current graphics cards use memory controllers that allow multiple memory configurations, some offering the choice of 64-bit, 128-bit or 256-bit settings.
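On the "NUMA-aware OS" point: node-aware allocation really is exposed to user space today, for example through Linux's libnuma. A minimal sketch (compile with -lnuma); treating node 0 as the fast on-package memory and the highest node as the slower expansion memory is MadRat's hypothetical mapping, not anything the API itself defines:

```c
#include <stdio.h>
#include <stdlib.h>
#include <numa.h>   /* Linux libnuma; link with -lnuma */

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    /* 64 MB on node 0 -- standing in for fast, processor-local memory. */
    size_t size = 64UL * 1024 * 1024;
    void *local = numa_alloc_onnode(size, 0);

    /* Another buffer on the highest-numbered node -- standing in for the
     * slower, shared expansion memory out on the mainboard. */
    int far_node = numa_max_node();
    void *remote = numa_alloc_onnode(size, far_node);

    if (local == NULL || remote == NULL) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }

    printf("nodes available: 0..%d\n", far_node);
    printf("local buffer:  %p (node 0)\n", local);
    printf("remote buffer: %p (node %d)\n", remote, far_node);

    numa_free(local, size);
    numa_free(remote, size);
    return 0;
}
```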
If I remember correctly, S3's Savage4 had around 9M transistors at 110MHz core and 125MHz memory. At the same time, the Pentium 3 had around 5M transistors running at 600 MHz with 100MHz memory. The Savage4's memory was based on SDRAM, the same type as desktops. 125MHz SDRAM was not exactly "dirt" cheap, but it was cheaper than, say, SGRAM. Notice that S3 could not hit above 150 MHz no matter how much they tried. I do believe S3 could have made the Savage4 run even 200MHz if they had the financial backing of Intel. However, even today's Radeon9800 and GeForceFX run slower than that 600 MHz Pentium3.
Originally posted by: MadRat
I remember when 32MB of memory was relatively expensive, yet the Savage4 cost roughly 50-60% more than the price of just a 32MB stick of RAM, meaning the whole graphics card was dirt cheap. The Savage4 had a huge transistor count, as high as a lot of CPUs at the time, yet somehow they managed to cram 32MB into the package along with that core, all for a measly pittance in price. The secret of the low price was that the 32MB of memory was dirt cheap and had relatively poor performance. This meant that the card's low memory speed was more relevant to its low price than the high transistor count of the GPU. They sure sold a lot of these cards, so someone had to be making a profit along the line, else they wouldn't have been so common.
I can't recall off the top of my head how nVidia challenged Intel, but I can speculate. In terms of nVidia producing a CPU to challenge Intel, that would be very surprising. Like I said many times before (and you've so far ignored), GPUs and CPUs have different design methodologies. They have different tasks and different data sets. It wouldn't be hard for nVidia (or any other company, for that matter) to produce an x86 CPU. What would be difficult is ramping up clock speeds to match Intel and AMD's level of performance. No other company has as much experience with the x86 ISA as those two. Plus, you gotta wonder where nVidia would produce said CPUs. They don't have a new fab coming online in the next couple of years, which means if they suddenly decided to drop $6+ billion today for a cutting-edge 65nm fab, it wouldn't be ready for at least 3 years. After that, they have to tweak the process to get anything to work, then tweak it some more to get anything running fast, send the data back to the design lab, and tweak the modified design a bit more. Once that fab's ready and the CPUs come rolling off the line ready to market, you have to deal with Intel's brute-force fab production capacity. It's hard selling your new, untested CPU when the established competition can flood the market and drive you out. The only way nVidia could pull off something of that scale is with a new CPU that easily surpasses anything Intel can offer at a low price AND keep it up for years and years. Intel has an established reputation that's damn hard to beat (see how long AMD has been at it).
Originally posted by: MadRat
7. AMD and Intel have dual-core processors in their roadmaps and it's rumoured to be because the cores are becoming pad-limited as they shrink. GPUs, on the other hand, have been lagging a good 12 months behind CPUs when it comes to process size. The graphics makers just might decide to add the less complex CPU component into their cores, which is why I said AMD and INTEL need to beware. What was that direct challenge NVidia made towards Intel a short while ago?
You just gave the reason for 6-layer boards: chipset/memory complexities. That, and I/O and expansion boards.
Originally posted by: MadRat
8. My idea would work for embedded systems, but it takes a lesson from the embedded world to simplify the desktop world. A lot of modern mainboards had to move to 6 layers because the chipset/memory complexities are too much for a 4-layer design. Why not make it easier to stay with 4 layers if that is what keeps the system simpler and cheaper? Removing the need for a 6-layer mainboard is incentive to work on this approach.
Not a new idea. AMD pushed the methodology you describe during the SlotA era. AMD motherboards at that time were relatively simple changes to existing Intel boards. Just swap out the chipset and socket, add a few tweaks here and there, and we have an AMD board. Unfortunately, nowadays the CPU architecture and the accompanying bus architecture are so different, and the complexity of the board has increased to the point, that I don't think that situation can occur. My guess is the expansion and basic I/O areas could be relatively untouched, but the power regulation, memory, northbridge, basically anything else would require major redesigns. However, I know very little of what goes on underneath, so they may be able to share more than I know.
Originally posted by: MadRat
Hell, it could make it so Intel and AMD could share a common motherboard design but use different sockets to connect their CPU packages. Kind of like in the Socket A/370 days where some VIA reference designs could merely call for a different northbridge and CPU socket (they had common pinouts) for AMD or Intel processor support, yet that was the only way to tell them apart. This would conserve valuable design time for motherboard layouts yet allow them to keep things proprietary.
Damn, now you just sound like Romero.
Originally posted by: MadRat
Innovation comes from someone taking a lead and running with the design they know will work. Convincing naysayers that it is possible would cost too much time, so the deed gets done and the naysayers get left behind. I have a feeling that AMD and INTEL had better watch out for some renegade GPU maker someday soon issuing a design that makes their designs outdated. They would catch up, but any serious outside competition can have serious long-term consequences for one's bottom line. I think my idea is basically a bridge from what others have already done to what can be done by taking it another step in a new direction. If it doesn't work, then okay, I'm wrong. But if it revolutionized the industry, then who wants to be the one in the marketplace looking at the other guy's back?
Originally posted by: MadRat
I'm not sure what I said implied embedded memory. I was thinking more along the lines of designing the CPU's package (you know, the thing they mount the CPU core onto) around a flat card that includes a first stage of high-speed memory dedicated to the local processor. The memory would not be built into the CPU core, but rather mounted onto the CPU's packaging to place it right up close to the core. The second stage of memory, the NUMA memory, would be external to the CPU packaging and likely on the mainboard like we do it today. The card could include CPU and GPU functions, and given that they would share such memory bandwidth, it would be possible to avoid using AGP/PCI-Express video cards.
I forget the exact quote made by the chairman of NVidia, but I believe he said NVidia has big plans to sink Intel across the whole market spectrum. He may have only meant integrated graphics, but it sounded like he meant the GPU was going to be taking on functions impossible to do at anywhere near the same performance on current CPUs. Sounded like a thinly disguised threat to Intel's overall processor and chipset businesses.
Your "general purpose jack-of-all-trades-master-of-neither core" is a regular CPU. I can't begin to fathom how you don't see that.Originally posted by: MadRat
Sahakiel-
The socket would enlarge, true, but it takes on enough components to justify the cost. Two 256-bit memory controllers are not necessary. A single 256-bit memory controller would suffice for the local dedicated memory, with the CPU core acting as both CPU and GPU. One core. Not two, but one general purpose jack-of-all-trades-master-of-neither core. General purpose with adequate mainstream performance, supplemented with capacity for both high and low end offshoots, is the key to reducing cost.
That's great, but much like the nForce2, CPUs aren't designed to sustain that much bandwidth. You can't just take fifty years of development and throw it out the window. Caching systems have been in place for years and for a good reason: memory is slower than the CPU. Even if you were to somehow get DRAM to run at 3 GHz to match your 3GHz Pentium IV, the gains you'll see are really not worth the cost. Something on the order of 15% better performance for close to 1000% the cost, depending on DRAM speed. What you're proposing is technically just another level of cache. It just happens to be off-die (like older versions) and high speed (like the slot era).
Originally posted by: MadRat
A general purpose core is never going to outperform standalone, specialized chips. Then again, the standalones each require so many components that it basically ends up adding to the overall cost to support them. We're cutting out the independent memory controllers for each GPU and CPU, thereby simplifying the system to one high-speed memory bus. So what if we cannibalize the GPU side of the performance somewhat; the CPU is going to have unfettered memory access that it never had before.
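The "another level of cache" argument can be put in numbers with the standard average-memory-access-time formula, AMAT = hit time + miss rate x miss penalty. The hit rates and latencies below are made-up illustrative values, not measurements of any real CPU, but they show why even halving DRAM latency moves average access time only modestly once the caches absorb most accesses:

```c
#include <stdio.h>

/* AMAT for a two-level cache in front of DRAM:
 * AMAT = L1_hit + L1_miss_rate * (L2_hit + L2_miss_rate * DRAM_latency)
 * All times in CPU cycles; all numbers are illustrative assumptions. */
static double amat(double l1_hit, double l1_miss_rate,
                   double l2_hit, double l2_miss_rate,
                   double dram_latency) {
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * dram_latency);
}

int main(void) {
    /* Assumed: 2-cycle L1 with 5% misses, 12-cycle L2 with 20% of those missing. */
    double slow_dram = amat(2.0, 0.05, 12.0, 0.20, 200.0); /* ~200-cycle DRAM     */
    double fast_dram = amat(2.0, 0.05, 12.0, 0.20, 100.0); /* DRAM twice as fast  */

    printf("AMAT with 200-cycle DRAM: %.2f cycles\n", slow_dram);
    printf("AMAT with 100-cycle DRAM: %.2f cycles\n", fast_dram);
    /* The improvement applies only to memory accesses; whole-program
     * speedup is smaller still, since most instructions never miss. */
    printf("AMAT improvement from halving DRAM latency: %.1f%%\n",
           (slow_dram / fast_dram - 1.0) * 100.0);
    return 0;
}
```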
If you're referring to GeForce2 MX200 level of performance, how is your statement supposed to mean anything? The GF2 MX200 has 1.6 GB/s in memory bandwidth. One single channel of 266DDR would more than satisfy that. Heck, it seems to me SDRAM on the GPU's 128-bit memory bus would suffice. You're left with 2.6 GB/s for the CPU. That's just a bit shy of a 333FSB, but you'll probably run faster RAM with a 333FSB than a 200 or 266 FSB anyway.
Originally posted by: MadRat
In this scheme the GPU is not a standalone device, actually, so it may not affect the performance as much as you might think in comparison to standalone architecture. The nForce 200 with a matched set of 266DDR was actually quite similar to the GF4 MX200 in performance, even though the integrated graphics core shared memory with the CPU! So why would we expect this scheme to perform differently?
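The bandwidth budget behind that comparison is simple subtraction: the total bandwidth of the shared bus minus what the integrated graphics core consumes. The sketch below uses dual-channel DDR266 for the total and the 1.6 GB/s quoted above for the graphics core; peak theoretical numbers only:

```c
#include <stdio.h>

int main(void) {
    /* Dual-channel DDR266: 2 channels * 64 bits * 266 MT/s = ~4.3 GB/s peak. */
    double total_gbps    = 2.0 * (64.0 / 8.0) * 266e6 / 1e9;
    /* Bandwidth quoted above for a GeForce2 MX200-class graphics core. */
    double graphics_gbps = 1.6;

    double cpu_gbps = total_gbps - graphics_gbps;
    printf("total shared bandwidth: %.1f GB/s\n", total_gbps);
    printf("left for the CPU:       %.1f GB/s\n", cpu_gbps);
    /* ~2.6-2.7 GB/s remaining, roughly a 333 MHz FSB's ~2.7 GB/s peak. */
    return 0;
}
```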
"bridge between memory and storage"... are you referring to DMA, now? Or did you suddenly integrate storage capabilities into your new "super-sized" socket? Might I remind you that the more you integrate into a single board, the more you move towards embedded systems. High performance single-board solutions cost a pretty penny and cheap solutions perform rather low. All hail the now defunct internet web appliances, which were highly integrated solutions that suddenly found themselves too expensive when compared to desktop PCs AND were more limited in functionality and upgrades.The external (NUMA) memory architecture does not have to be as fast as the primary memory, just the bridge between memory and storage. The primary memory would be the high end performance memory. The external memory (NUMA) would be whatever the market would bear as efficent for the customer base, with each market segment able to use different memory types if need be. This gives the mainboard manufacture alot of flexibility, something that they've been working towards for years now.
You're quite literally only a few steps away from SOC. Your socket design contains just about everything needed for SOC. It basically just lacks nonvolatile memory and a simple I/O interface.
Originally posted by: MadRat
MrSheep-
This card would contain a distinct:
1. MPU (multipurpose processor unit)
2. flexible-setting memory controller (256-, 128-, or 64-bit)
3. flexible support for memory quantity (could be anywhere from 64MB to 2GB)
This is not a SOC. The external memory is not a uniform standard where, say, SDRAM or RDRAM has to be affixed. The northbridge on the mainboard (for boards that would support external memory) would need to be compatible with the pathway from CPU to northbridge, but the memory need not be defined.
And if your supplier lets you down, you're screwed. Still, the trend is towards fabless technology companies because, like I said, Intel is probably the only company that can keep using cutting-edge process technology. What it means is that Intel will always have a leg up on the competition since they're the first to try out new processes.
Originally posted by: MadRat
The writing is on the wall. High-tech companies need not own their own fab. Few megacorporations can even afford to do it on their own. NVidia has been a fabless company for years now. ATI is a fabless company. So what? They still get their products out the old-fashioned way: they bid for fab time. That's how the free market system works, you know.
128MB of DRAM still requires multiple chips. A quick check on Samsung shows the highest DDR density yields a 64MB chip. Okay, so it's only two chips. However, that's two chips in a stacked TSOP package. Pull out your CPU and tell me how much real estate is left to place a memory controller (if you don't have an Opteron) and two DRAMs plus associated support chips and traces. You want to expand the socket? Great. Fantastic. Now explain why nobody does that for desktops. Oh, could it be, *gasp*, COST?!?!?
Originally posted by: MadRat
What this design gives is a way to integrate memory into the core's socket without defining the support by the mainboard manufacturers. The CPU package does not need much memory for basic functions, but we are to the point where it's ridiculous to think of memory in any quantity below 128MB. I can run just about anything on my machine with 128MB, not that I'd want to, but for beginners and low-end customers it's plenty acceptable.
System integrators could opt for absolutely no expansion slots for memory and place no embedded memory on the mainboard for absolute barebones cost. With this strategy there can be one board for no external memory and another for using external memory. Either way, whichever CPU package they choose largely determines what performance spectrum they want to target. The customer can still be sold on upgrading the CPU later to something bigger and better!
I STILL don't see how this is any better than simply adding another level of cache.
Originally posted by: MadRat
As for a lot of SKUs, why would one offer 36 SKUs? The GPU functions are inherent to the core, not separate. The memory on the CPU package would vary by size and quantity to fit the target market. I don't see them using much more than 3-4 memory sizes, with the memory bandwidth configuration varying in but three ways. Let's think using today's technology: At the high end you may see a 256-bit/1GB config and a 256-bit/2GB config with much slower memory. At the mid-market level you might see a 128-bit/1GB config, a 128-bit/512MB config, and maybe even a 128-bit/256MB config. At the low end you might see a 64-bit/256MB config and a 64-bit/128MB config. Notice how the different market segments are related? The 256-bit/2GB config and the 128-bit/1GB config are one and the same configuration, only the latter with half the installed memory. Likewise, the configurations overlap all the way down to the bottom of the spectrum. So we see but six SKUs for a very flexible range of CPU packaging configurations.
I should perhaps have used smaller memory sizes to denote the "current technology" - keeping it in line with what's on the market. I was using what's possible as my watermark. After doing some reading, I need to lower my expectations.
Figure the most practical solutions would be 64MB chips in either DDR or DDR2 configurations. We could practically weed out the less efficient possibilities to make a whopping three simple, efficient solutions overall, with all of them using DDR2. If you made it possible to use DDR then it really doesn't make sense to offer it in more than the 512MB/256-bit, 256MB/128-bit and 128MB/64-bit solutions. To add in DDR support would make a total of nine symmetrical DDR configurations possible via the employment of 16MB or 32MB chips, although these smaller chips aren't really practical. That makes a grand total of six worthwhile SKUs.
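The chip-count arithmetic behind those three DDR2 configurations can be laid out directly. The 64MB chip density comes from the discussion above; the 400 MT/s effective rate is an assumed DDR2-era figure used purely for illustration:

```c
#include <stdio.h>

struct sku {
    int bus_bits;   /* width of the on-package memory bus */
    int total_mb;   /* total on-package memory            */
};

int main(void) {
    /* The three DDR2 configurations proposed above. */
    struct sku skus[] = {
        { 256, 512 },
        { 128, 256 },
        {  64, 128 },
    };
    const int chip_mb = 64;          /* assumed densest available DRAM chip */
    const double rate_mts = 400e6;   /* assumed effective transfer rate     */

    for (size_t i = 0; i < sizeof(skus) / sizeof(skus[0]); i++) {
        int chips = skus[i].total_mb / chip_mb;
        double gbps = (skus[i].bus_bits / 8.0) * rate_mts / 1e9;
        printf("%3d-bit / %3dMB: %d chips of %dMB, ~%.1f GB/s peak\n",
               skus[i].bus_bits, skus[i].total_mb, chips, chip_mb, gbps);
    }
    return 0;
}
```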
Some of the current GPU memory controllers, like those in NVidia's 5900-series of cards, are fully capable of both DDR and DDR2 support.