Would categorizing data types into serial/parallel-optimized make sense?

MadRat

Lifer
Oct 14, 1999
11,960
278
126
Seems to me that with the advent of x86-64, AMD and Microsoft ought to team up to try some new wrinkles with memory management. Instead of strictly setting memory management inside of each program with the OS loosely arranging the big picture, why not develop some new tricks for it? Seems like so many tasks are optimized for memory to be either serial or parallel in nature. Imagine if the OS could address two separate memory types to optimize for these differences. Add in motherboards with memory banks 0 and 1 supporting 200MHz DDR RAM while memory banks 2 and 3 would be for PC1200 RDRAM devices. The user would ultimately decide whether he wanted to load up on one memory type or the other, all the while enjoying the benefits of using each type of RAM. Perhaps this would be little more complicated than current dual-channel chipsets, where the chipset already switches back and forth between two separate memory channels to feed the main vein, so to speak; the trick is apparently all in the buffering.

Now this idea isn't quite the NUMA approach that people have brought up in the past. And it would require support from BOTH hardware (memory and chipset) manufacturers and software (OS and applications) programmers to get it to work. What other hurdles do you engineers out there see in a design like this?
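To make that concrete, here's a rough sketch of the kind of allocation hint I'm imagining. Nothing like this exists today; every name in it is made up purely to illustrate the idea, and the stub just forwards to malloc().

/* Hypothetical sketch only: no such API exists, and the names are invented.
 * The idea is a malloc() that carries a placement hint so the OS could put
 * the pages in whichever physical bank type (high-bandwidth vs. low-latency)
 * suits the expected access pattern. */
#include <stdlib.h>

typedef enum {
    MEM_HINT_STREAMING,  /* long sequential sweeps: favor the high-bandwidth banks */
    MEM_HINT_RANDOM      /* scattered lookups: favor the low-latency banks */
} mem_hint_t;

/* Stub that just forwards to malloc(); in the imagined system the OS would
 * pick physical pages from the hinted bank type here. */
static void *malloc_hinted(size_t size, mem_hint_t hint)
{
    (void)hint;
    return malloc(size);
}

int main(void)
{
    /* A big buffer we expect to walk front to back. */
    double *stream = malloc_hinted(16u * 1024 * 1024 * sizeof(double), MEM_HINT_STREAMING);

    /* A table we expect to poke at random. */
    void **table = malloc_hinted(1024u * 1024 * sizeof(void *), MEM_HINT_RANDOM);

    free(stream);
    free(table);
    return 0;
}

The OS would be free to ignore the hint, the same way it ignores most advisory calls today.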
 

borealiss

Senior member
Jun 23, 2000
913
0
0
MadRat

the problem, i think, is determining which tasks are more "serial" and which are more "parallel". different programs are going to have different access profiles to main memory, so who's going to profile these programs? and once you've profiled them, you have to be sure you've captured the memory access behavior of each program in its entirety in order to optimize memory accesses under the dual memory approach you're proposing. from a feasibility standpoint this requires a lot of overhead, not to mention the hardware complexity. there's also the factor that the profiling may not be accurate at all, because the accesses to main memory may be a function of the dataset the program is working with, something like a database for instance. queries doing multiple joins on huge sets will access main memory in a very parallel way if the memory is there, while many small queries on small sets will have a more random, "serial" type of access pattern. this is all the same database software.
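just to illustrate the access-profile point (a toy sketch, not a benchmark; the sizes and the timing method are arbitrary): the exact same data walked in two different orders behaves completely differently.

/* toy sketch: same array, same total work, two access orders.
 * the sequential sweep streams cache lines; the shuffled walk misses
 * on nearly every read once the array is bigger than the caches. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M ints, ~64MB: well past any cache of the day */

int main(void)
{
    int *data = malloc(N * sizeof *data);
    int *idx  = malloc(N * sizeof *idx);
    if (!data || !idx) return 1;

    for (int i = 0; i < N; i++) { data[i] = i; idx[i] = i; }

    /* fisher-yates shuffle of the index array for the "random" pass
     * (two rand() calls per step so it works even with a small RAND_MAX) */
    srand(1);
    for (int i = N - 1; i > 0; i--) {
        int j = (int)((((unsigned long)rand() << 15) ^ (unsigned long)rand()) % (unsigned long)(i + 1));
        int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }

    long long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++) sum += data[i];          /* sequential sweep */
    printf("sequential: %.2fs (sum %lld)\n", (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

    sum = 0;
    t0 = clock();
    for (int i = 0; i < N; i++) sum += data[idx[i]];     /* scattered reads */
    printf("scattered:  %.2fs (sum %lld)\n", (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

    free(data);
    free(idx);
    return 0;
}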

memory performance is more dependent on the actual hardware platform you're running on, i.e. the cpu, not necessarily the program you are running. the object of the memory hierarchy present in computers now is to hide as much of the hardware implementation from the program as possible. each type of ram is also just a storage medium; how the memory controller handles the interface between the cpu and the actual modules is what's going to determine the performance factor. dual channel ddr can certainly compete with the likes of rambus in terms of bandwidth, and does better with latency. rambus could compete in the latency department if enough channels are interleaved. even serverworks implemented a design using standard pc133 sdram that had enough bandwidth to compete with the likes of ddr.

as it stands right now, i think that if you make the memory damn fast, the performance you get is going to be good enough not to warrant a dedicated channel for disk buffering and another for video streaming etc., as long as you take the worst-case scenarios into account. with hammer coming out, the hardware complexity will increase even more because you have the memory controller onboard now, and i see that as a trend for the future.
 

MadRat

Lifer
Oct 14, 1999
11,960
278
126
So in your opinion, even if a programmer had the ability to define the data type, it wouldn't become important?
 

borealiss

Senior member
Jun 23, 2000
913
0
0
i don't think data type really has anything to do with it. programmers are able to define datatypes that are objects in most object oriented programming languages, but how a datatype is handled by the system is really a function of the compiler writer and the program. datatype has limited effect on how your program accesses memory, and i think there are a lot of other factors that you're not taking into account here. a short is going to be less memory intensive than a double, but what if you are fetching a bunch of shorts, or a bunch of doubles? there are prefetch effects from the chipset and other things to take into account. i don't think you can sum up the entire scope of memory access behavior with just the datatype you're using in a program.

in short, i don't think it would have much effect. too many other factors.
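a quick sketch of that shorts-vs-doubles point (again arbitrary sizes and stride, numbers will vary by machine): the "lighter" data type loses badly when the access pattern wastes the cache lines being fetched.

/* sketch: same amount of data (64MB of each type), every element read once.
 * sequential doubles stream whole cache lines; shorts read with a
 * one-line stride burn a new cache line fetch for every 2-byte value. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BYTES  (64u * 1024 * 1024)
#define STRIDE 32                   /* 32 shorts = 64 bytes, one assumed cache line */

int main(void)
{
    size_t nd = BYTES / sizeof(double);
    size_t ns = BYTES / sizeof(short);
    double *d = malloc(nd * sizeof *d);
    short  *s = malloc(ns * sizeof *s);
    if (!d || !s) return 1;

    for (size_t i = 0; i < nd; i++) d[i] = (double)i;
    for (size_t i = 0; i < ns; i++) s[i] = (short)i;

    clock_t t0 = clock();
    double dsum = 0;
    for (size_t i = 0; i < nd; i++) dsum += d[i];              /* sequential doubles */
    printf("sequential doubles: %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    long long ssum = 0;
    for (size_t off = 0; off < STRIDE; off++)                  /* strided shorts:   */
        for (size_t i = off; i < ns; i += STRIDE)              /* a fresh line for  */
            ssum += s[i];                                      /* nearly every read */
    printf("strided shorts:     %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    fprintf(stderr, "%g %lld\n", dsum, ssum);   /* keep the sums from being optimized out */
    free(d); free(s);
    return 0;
}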
 

kpb

Senior member
Oct 18, 2001
252
0
0
I think what he's really saying is it doesn't matter.

The program doesn't need to know and doesn't want to know what type of memory the system is dealing with. That is handled at a completely different level in the computer system. The operating system doesn't know what type of memory it is. Heck, even the processor doesn't know. The only thing that knows what type of memory it's dealing with is the memory controller in the north bridge/MCH. (The exception to the processor part is the new AMD processor that has the memory controller integrated.) Really, the type of memory doesn't matter. What matters for system memory performance is latency and bandwidth: how fast you can move the info and how long you have to wait before you get it. The more the program has to worry about, the more complicated it is to do things and the less cross-compatible things are. Just think about our current crop of processors, the P4 and Athlon. They have both had multiple types of memory used with them. The P4 has had standard PC133, DDR from 266-400, and several different flavors of RDRAM. The Athlon has had all the same minus the RDRAM stuff. Imagine if each one of those memory types caused programs to stop working or required patches to get them working right.

Every step of the process is designed so that it knows how to talk to the next step in the process and doesn't care what's past that or how it's handled. That's what allows things to work together without needing a custom solution for every situation.
 

MadRat

Lifer
Oct 14, 1999
11,960
278
126
So basically, when they figure out a practical NUMA architecture, nobody will want to use it?
 

TronX

Member
Apr 9, 2003
147
0
0
Originally posted by: MadRat
Seems to me that with the advent of x86-64, AMD and Microsoft ought to team up to try some new wrinkles with memory management. Instead of strictly setting memory management inside of each program with the OS loosely arranging the big picture, why not develop some new tricks for it? Seems like so many tasks are optimized for memory to be either serial or parallel in nature. Imagine if the OS could address two separate memory types to optimize for these differences. Add in motherboards with memory banks 0 and 1 supporting 200MHz DDR RAM while memory banks 2 and 3 would be for PC1200 RDRAM devices. The user would ultimately decide whether he wanted to load up on one memory type or the other, all the while enjoying the benefits of using each type of RAM. Perhaps this would be little more complicated than current dual-channel chipsets, where the chipset already switches back and forth between two separate memory channels to feed the main vein, so to speak; the trick is apparently all in the buffering.

Now this idea isn't quite the NUMA approach that people have brought up in the past. And it would require support from BOTH hardware (memory and chipset) manufacturers and software (OS and applications) programmers to get it to work. What other hurdles do you engineers out there see in a design like this?


There are no wrinkles in the memory management. To change x86 would cost money, and I'm sure no one would put billions into anything new at this point in time. You would have to get away from anything x86.

What AMD could do is have the chipsets refined to run faster. This would be like Intel's new chipset that uses PAT. However, there is a cost for doing these tweaks, and it might look good in Sandra while the games suffer. The games would hurt because they are low-level coded to use AMD and Intel tricks. To change how the memory is used would call on M$ to spend money in an area I'm sure they don't feel is needed. Then every program that runs 32- and 64-bit would need to be patched just to run.

So like they said, "it's all about money."

As an x86 system, things are running like they should in 32-bit or 64-bit. It's all about the BUFFER in a PC x86 system. You've got cache on the HDs, CPUs, videocards, motherboards, etc. Adding more cache and RAM here and there lets the system run smooth and hitch-free. Just because AMD and M$ could add a few tweaks in one spot does not mean the system will run any better, because it's the same information that gets processed by the CPU in the end. And this is why the CPU has cache built onto it: because the RAM is slow. But you need the RAM because it's the second best buffer. The third buffer is the hard drive, and we all know at that point it's SLOW CRUNCH TIME!

64-bit is nice and all, but I'd rather have more built-on CPU cache. Just think of an AMD XP with 1GB of L1/L2 cache. I guess you wouldn't even need RAM then, would ya?

/blah I'm rambling

 

Lynx516

Senior member
Apr 20, 2003
272
0
0
OK, this doesn't seem to make much sense to me.

1) What do you mean by serial and parallel data? The CPU can only accept data in 32-bit chunks and can only perform functions on those 32 bits, so a memory controller needs 32 bits of data from memory before sending anything off to the CPU.

2) The CPU ideally uses the RAM as little as possible. 99% of the time it retrieves data from the caches, as this is a lot faster than the RAM and using the caches prevents pipeline stalling.

3) If this were implemented, the trace count would be extraordinary; you would have to use at least six-, maybe eight-layer motherboards, dramatically increasing the price of a motherboard.

4) I think I just got what you are getting at. You are talking about sequential reads. This would be near impossible to implement, as you would have to mirror the memory since it would be impossible to define the two "types" of memory access efficiently. That would increase the cost, as you would need twice as much RAM.

Can't think of other reasons at the moment.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
...optimized for memory to be either serial or parallel in nature
I think you mean serial vs. random access. The problem is, it's hard to optimize for random access because...uh...it's random. The other problem is, PC1200 is better at both random and sequential access than DDR333 (not sure about ddr400 yet).
 

tinyabs

Member
Mar 8, 2003
158
0
0
Originally posted by: zephyrprime
...optimized for memory to be either serial or parallel in nature
I think you mean serial vs. random access. The problem is, it's hard to optimize for random access because...uh...it's random. The other problem is, PC1200 is better at both random and sequential access than DDR333 (not sure about ddr400 yet).

I believe the cache already does this. When a piece of data is read from memory at a random address, the cache loads a whole line of data, something like 256 bytes(?), serially, and stores and tags it. The data is often cohesive, as in a record or an object, so the cache implicitly loads part or all of that record for fast pickup.

If you are talking about memcpy/movsd operations, then serial access might be a bit faster, but programs rarely move huge chunks of data from one place to another except in graphical apps. Otherwise I believe random and serial access do not differ much in performance.
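Here's a small sketch of that line-fill behavior (the 64-byte line size is just an assumption; parts of this era use anywhere from 32 to 128 bytes): one random read pulls in a whole aligned block, and everything else in that block comes along for free.

/* sketch: which aligned block of memory a single "random" read drags in.
 * LINE_SIZE is an assumption; the point is only that the neighbors of the
 * record land in cache together with the field actually requested. */
#include <stdio.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* assumed cache line size in bytes */

struct record {
    int    id;
    int    flags;
    double balance;
    char   name[48];
};

int main(void)
{
    struct record rec = { 7, 0, 19.95, "example" };

    uintptr_t addr  = (uintptr_t)&rec;
    uintptr_t first = addr & ~(uintptr_t)(LINE_SIZE - 1);   /* start of the line */

    printf("record of %zu bytes at %p\n", sizeof rec, (void *)&rec);
    printf("reading rec.id fills the line starting at %p,\n", (void *)first);
    printf("so bytes %p..%p are now cached too\n",
           (void *)first, (void *)(first + LINE_SIZE - 1));
    return 0;
}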
 

MadRat

Lifer
Oct 14, 1999
11,960
278
126
I keep polling for an L3 cache on modern CPUs, but the engineers around here seem to say it's "not necessary". I can go blow an extra $100 on a videocard easily enough for an extra 64MB of high-speed RAM for marginally better video performance, yet Intel engineers are telling me they wouldn't care to support me paying an extra $100 for the same type of support on the mainboard. No wonder the PC industry is in the craphole it has currently settled into...
 

Venix

Golden Member
Aug 22, 2002
1,084
3
81
Just because you'd pay for it doesn't mean it's a good idea. Computer performance is dictated by much more than simple cache size and speed; the addition of an L3 cache to modern CPUs won't necessarily result in any performance gain. Changing cache size, speed, or associativity is completely different from increasing the amount of memory in the system.
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Would categorizing data types into serial/parallel-optimized make sense?
Adding extra steps in memory management to differentiate between serial and parallel memory accesses isn't necessary, and is most likely a waste of time and money. It'll actually increase latency for every single access and add several new problems dealing with noise, signaling, etc.
No one memory type is the holy grail of computing. All memory types and all data types have strengths and weaknesses, depending on the task at hand. Adding a flag in your code to tell the memory manager what type of memory you want to be stored in doesn't mean you'll speed up general performance.
What happens if you run out of memory space? Well, then you'll just have to be stored onto disk because there's no room for you in memory.
Oh, but we can always store the data in the other memory type. Well, yeah, but you're gonna slow down a whole 3-4% or so, and most of it due to the extra latency in the memory controller.
Oh, that's too costly. Let's stick it on disk.
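The placement logic described above might look roughly like this. It's a hypothetical sketch; none of these functions or pools exist anywhere, and they're invented purely to show where the fallback steps land.

/* Hypothetical sketch of the placement decision described above.  Nothing
 * here is a real API; the pools and helpers are invented for illustration. */
#include <stdlib.h>

typedef enum { POOL_STREAMING, POOL_RANDOM } pool_t;

/* Stubs so the sketch compiles; a real allocator would hand out physical
 * pages from the chosen bank type here. */
static void *pool_alloc(pool_t pool, size_t size) { (void)pool; return malloc(size); }
static void *swap_to_disk(size_t size)            { (void)size; return NULL; }

static void *place(pool_t preferred, size_t size)
{
    void *p = pool_alloc(preferred, size);
    if (p) return p;                       /* got the memory type we asked for */

    /* Preferred pool full: the other memory type still beats the disk by a
     * mile, even if the extra controller latency costs a few percent. */
    p = pool_alloc(preferred == POOL_STREAMING ? POOL_RANDOM : POOL_STREAMING, size);
    if (p) return p;

    /* Both pools full: page out to disk, exactly as a one-pool system would. */
    return swap_to_disk(size);
}

int main(void)
{
    void *buf = place(POOL_STREAMING, 4096);
    free(buf);
    return 0;
}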

Originally posted by: MadRat
I keep polling for an L3 cache on modern CPUs, but the engineers around here seem to say it's "not necessary". I can go blow an extra $100 on a videocard easily enough for an extra 64MB of high-speed RAM for marginally better video performance

An extra 64MB of RAM on your vidcard doesn't cost you another $100. On budget cards, the price difference ranges from less than $20 to maybe $50.
The marginal increase in performance is due to the fact that textures and graphics data for your game don't take up the extra space, so your number of accesses to main memory via the slower AGP bus isn't really reduced by any significant amount.
Now, running the same pretty graphics on your old voodoo 1 with a whopping 8MB memory is another story.

yet Intel engineers are telling me they wouldn't care to support me paying an extra $100 for the same type of support on the mainboard -
You seem to imply an understanding of diminishing returns for increasing RAM on video cards. Why can't you understand the diminishing returns of more cache for CPUs?

no wonder the PC industry is in the craphole it has currently settled into...
If you have such great ideas for the computer industry that are so obviously easy and revolutionary, why are you posting in this forum? Go patent and sell your idea, or, better yet, do what everyone else does and start your own company.

If you want to get a faster computer for your gaming needs, you'll need a different architecture, one specialized for games.
Oh, wait, that's been done. It's called a gaming console.
 

MadRat

Lifer
Oct 14, 1999
11,960
278
126
Originally posted by: Sahakiel
An extra 64MB of RAM on your vidcard doesn't cost you another $100. On budget cards, the price difference ranges from less than $20 to maybe $50.

I was speaking more to the idea of the top runners on the market where an extra 64MB does cost around an additional $100.

Originally posted by: Sahakiel
You seem to imply an understanding of diminishing returns for increasing RAM on video cards. Why can't you understand the diminishing returns of more cache for CPUs?

I'm looking at it from a revenue point of view. Intel is into commoditization of the market because they virtually own the x86 world. If they can design a product that coaxes $10 more profit per unit, then they just upped the ante. The simplification of the Intel product lines is a direct result of their market share being pretty well maxed out and the market itself tapering off its growth potential, if nothing else for lack of a new gee-whiz product.

Originally posted by: Sahakiel
If you want to get a faster computer for your gaming needs, you'll need a different architecture, one specialized for games. Oh, wait, that's been done. It's called a gaming console.

The console isn't everything it's cracked up to be. Besides that, there are relatively "old timer" products on the market that don't run a whole lot faster on the new architecture simply because they are memory bandwidth constrained. The move to 512K of cache was a nice gesture from Intel/AMD, but it doesn't have the macho appeal of sticking on bigger memory cache modules like we had in the days of the first Pentiums.
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Originally posted by: MadRat
I was speaking more to the idea of the top runners on the market where an extra 64MB does cost around an additional $100.
That extra 64MB costs an extra $100 because the cards themselves cost about $1000. You're comparing workstations to gaming machines. Another 64MB does make a difference when rendering.
Actually, these days, those cards sport 256MB or more, easily.

I'm looking at it from a revenue point of view. Intel is into commoditization of the market because they virtually own the x86 world. If they can design a product that coaxes $10 more profit per unit, then they just upped the ante. The simplification of the Intel product lines is a direct result of their market share being pretty well maxed out and the market itself tapering off its growth potential, if nothing else for lack of a new gee-whiz product.

Adding more cache doesn't equate to just $10 more. Sure, the sand is cheap, but semiconductor manufacturing may be the only industry where the equipment costs virtually erase material costs. A fab itself costs a few billion to build and may operate at that level for only a few years before requiring an upgrade to a new process. Start calculating how many chips you can build in that time and how much energy and wages you'll have to pay for. Oh, don't forget backups and insurance.
Yield is the important issue. Increasing die size by a few square mm may end up costing you anywhere from 10% to 50% of your optimal yield, depending on where the flaws are likely to show up. This is the main reason nobody makes a wafer-sized CPU; it's simply not practical. Die sizes are usually calculated and/or adjusted to maintain net revenue. Say a wafer of 100 candidate CPUs gives me 60 good chips, and assume doubling the cache makes each CPU exactly 1.5 times as large, so the wafer now holds at most 66 CPUs.
However, wafers are round, so 66 may not fit; let's say 60.
Fewer, larger dies means more flaws land on any given die, so the number of good chips drops to, say, 20.
Those 20 CPUs now cost 3x as much as the original 60 with less cache, and may yield anywhere from 0-.
Hardly anyone who isn't running time-critical applications, where even the smallest reduction in execution time means something, is going to buy a CPU that costs ten times as much but gives back only 2% extra performance.
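The arithmetic is easy to sanity-check with a toy yield model. Every number below is invented to match the example; a simple Poisson model comes out a bit gentler than the 20-chip figure, but the direction is the same.

/* Toy wafer-yield arithmetic for the example above.  All the numbers are
 * made up; real fabs use far more detailed models. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double dies_small  = 100.0;   /* candidate dies per wafer, small die  */
    double yield_small = 0.60;    /* 60 of them come out good             */
    double growth      = 1.5;     /* die grows 1.5x when cache is doubled */

    /* Simple Poisson model: yield = exp(-defects * area).  Back out the
     * defect term from the small die, then scale the area by 1.5. */
    double da        = -log(yield_small);
    double yield_big = exp(-da * growth);

    double dies_big   = dies_small / growth;       /* ~66 bigger dies fit  */
    double good_small = dies_small * yield_small;
    double good_big   = dies_big * yield_big;

    printf("small die: %.0f candidates -> %.0f good chips\n", dies_small, good_small);
    printf("1.5x die:  %.0f candidates -> %.0f good chips\n", dies_big, good_big);
    printf("cost per good chip rises roughly %.1fx\n", good_small / good_big);
    return 0;
}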

The console isn't everything it's cracked up to be. Besides that, there are relatively "old timer" products on the market that don't run a whole lot faster on the new architecture simply because they are memory bandwidth constrained. The move to 512K of cache was a nice gesture from Intel/AMD, but it doesn't have the macho appeal of sticking on bigger memory cache modules like we had in the days of the first Pentiums.

I find it highly unlikely AMD and Intel moved to 512K L2 simply to please customers. Intel probably did it because the PIV seriously needed it. Plus, it's always possible they calculated optimum die space vs yield and found room left over to fill up with extra memory. AMD, on the other hand, probably released 512K as a holdover until Hammer could reach the desktop and also as a way to counter the PIV's bigger number.

The reason sticking extra cache improved performance so well with Pentiums is due entirely to older technology. The Pentium had a front side bus of 60 or 66MHz. The back side bus wasn't any better, if I recall. However, the main performance advantage likely came from reduced latency vs the main memory's 60ns. Add to that the severe lack of available memory in the first place, which results in frequent disk access, and you can see where most of the performance is coming from.
Nowadays, memory isn't as constrained. Just look around and you'll see plenty of debates over whether 512MB is better than 1GB, which usually end with a grudging "In most cases, 512 is enough, but we're seeing some programs taking up more." Also, compilers and data structures are a lot better these days, each doing their part to reduce disk access.
Truth be told, we're probably already beyond the point where the presence of serial versus parallel memory accesses plays a large role in performance. RDRAM runs fast enough to compete with DDR and vice versa. By the time you figure out which will end up faster, something better will be available and the point will be moot.


If you really believe cache makes all the difference in the world, I suggest you try a Xeon with 2MB and compare it to a Xeon with 512K. See what kind of performance gains you get, then take a good look at the extra $3000 the 2MB version costs.
If adding such a "monstrous" cache is so cheap and easy and the performance gain is so noticeable, why don't any desktop CPUs have one? If the cost were only 10% for a large percentage gain, don't you think a lot of people would pay that extra 10%? At that point, you've got a Willamette vs a Northwood.
The thing is, the extra cost is already included in CPU prices, and it isn't always 10% or less.
 

MadRat

Lifer
Oct 14, 1999
11,960
278
126
You keep talking about on-die costs, which is off topic. I'm talking about an off-die cache. Remove the whole idea of being on die and 90% of your rant is lost.

The memory controller on the videocards has a direct impact on the price of various cards. On videocards with 64-bit controllers you typically see from $5 to $15 difference moving from 64MB to 128MB. On videocards with 128-bit controllers you typically see from $15 to $45 difference. On videocards with 256-bit controllers it is more like $25 to more than $100. If we're talking about professional workstation cards it can get higher than that, but like you said it's typically for 128MB differences between cards and not just 64MB. The point being that videocard makers recognized that memory bandwidth is important for their products and unfortunately the public doesn't know that memory size and bandwidth aren't necessarily the same. So they tend to take advantage of this presumed need by the public for "more memory", which can be sated without actually increasing the bandwidth of the cards. (Hence Nvidia can still sell healthy wads of GeForce4 MX440 cards...)
 

Lynx516

Senior member
Apr 20, 2003
272
0
0
The problem with it being off-die is that it will be slow. And if it is slow, it gets to the point where the latency to the cache approaches the latency to the RAM, at which point there is no point in the cache. Also, at the moment CPUs have cache hit rates of 99.5-99.9%, more if it is a highly repetitive process, so to push that to a point where it makes a 20-30% difference you would need 5-6MB of cache if not more, at which point the latency goes up even if it is on-die, and if it is on-die the cost would be huge. Look at the Itanium 2: 3MB on-die, and it costs a bomb.
 

Sahakiel

Golden Member
Oct 19, 2001
1,746
0
86
Originally posted by: MadRat
You keep talking about on-die costs, which is off topic. I'm talking about an off-die cache. Remove the whole idea of being on die and 90% of your rant is lost.
In most cases, moving silicon off die decreases die costs but increases packaging costs. A silicon die isn't just simply dropped into its packaging and sealed off. One has to keep a fine balance between less yield with less packaging and more yield with more packaging.
Plus, moving cache off die also increases latency. Signals have to travel farther and propagate through an environment with more noise, and there's no guarantee the cache will run as fast as the CPU. Just look at the old Pentium II for a well-known example: four times the cache of the Celeron versions, but with that cache off-die running at half speed, it only outpaced the Celeron during cache overflows. (The Celeron A's smaller cache was on-die and ran at full core speed.)
The Pentium III equivalents were all on-die, and in most situations the PIII Celeron performed just as well with half the cache and half the associativity.
In a nutshell, move the cache off die and "90% of my rant is lost." However, moving it off die also brings in other considerations, and some of the same problems rear their ugly heads.

The memory controller on the videocards has a direct impact on the price of various cards. On videocards with 64-bit controllers you typically see from $5 to $15 difference moving from 64MB to 128MB. On videocards with 128-bit controllers you typically see from $15 to $45 difference. On videocards with 256-bit controllers it is more like $25 to more than $100.
The cost doesn't come from the extra memory modules alone. Getting 128 signals to arrive at the same point at the same time is a few orders of magnitude more difficult than 64 traces, and 256 is a few orders harder still. That's considering the same clock speeds in all cases, and it doesn't even get into the fact that a higher trace count usually means a newer technology that actually runs faster. In other words, you aren't going to find a 64-bit memory controller on a video card that runs just as fast as a 256-bit one.

The point being that videocard makers recognized that memory bandwidth is important for their products and unfortunately the public doesn't know that memory size and bandwidth aren't necessarily the same. So they tend to take advantage of this presumed need by the public for "more memory", which can be sated without actually increasing the bandwidth of the cards. (Hence Nvidia can still sell healthy wads of GeForce4 MX440 cards...)
With the AGP bus being so slow, increasing memory size has a profound effect on running games with all the glorious detail. This was recognized long ago by Intel who then pushed the original AGP specification to increase the pipe because videocards back then had so little memory it was difficult even running 2D applications above 1280x960 or so.
Video card manufacturers recognized the bandwidth constraints of even the AGP bus, so they increased the memory size of their video cards to fully hold everything your game will need. They also increased the bandwidth, because a graphics processor is highly analogous to a DSP: the faster you can shove the data stream through, the more operations you can perform on each pixel.
However, increasing memory has limits as far as performance goes, which is why Voodoo 2's weighed in at 12MB per board, the TNT maxed at 16, the TNT2 and GeForce at 32, the GeForce2 at 64, the GeForce3 predominantly 64 with a few 128 boards (and only at the very end of the product cycle), and current cards are still at 64/128 with only a single board even considering 256MB. There simply isn't enough data to store in memory, so why waste money?
Also, increasing the bandwidth doesn't necessarily increase performance. Look at the original TNT2 and compare it to the TNT2 M64. The original had more bandwidth, but the newer version performed just as well in most cases simply because the processor wasn't fast enough to require a fatter pipe.
The same goes for CPU cache and system main memory. Increasing your memory brings huge benefits: less disk access, which is on the order of 1,000 to 10,000 times slower than RAM. However, at a certain point the costs bring in less return, because the data required is already stored in RAM.

Why won't alternating memory types help? Probably because it'll introduce extra latency that'll kill any returns. You're also trying to fatten a bottleneck that's already wider than other bottlenecks (CPU, disk, etc.). The problem with today's processors isn't memory capacity; systems can easily exceed the optimal price vs. performance point. Memory bandwidth isn't the problem either, since dual-channel DDR does jack squat for the Athlon architecture, and increasing memory bandwidth beyond the Pentium IV's limit does the same thing. It's not memory type either, since the Pentium IV performs just about equally given dual-channel DDR and dual-channel RDRAM with the same bandwidth.
So having two types of memory in the same system isn't going to help. Supporting two helps transitioning, since if one gets faster than the other you can just ditch the old memory and get new sticks. In fact, back in the old days of EDO vs. SDRAM, I do believe no board could have both types installed at the same time. It was one or the other, likely due to engineering considerations and cost vs. the performance benefit.
 

borealiss

Senior member
Jun 23, 2000
913
0
0
MadRat

I think you're oversimplifying the prospects of an off-die cache. From a performance perspective, yes, it would increase performance, but probably not by much except in a very few circumstances. Remember that the old Celeron A cores (Mendocino?) outperformed their P2 and in some cases P3 equivalents that had the L2 cache off the die. There was a reason, from a performance/cost perspective, that Intel and AMD engineers moved the cache onto the CPU die.

With a big, slower off-die cache you're going to have a better hit rate, but remember that it won't be nearly as fast as an on-die cache, and with caching schemes the instruction/data fetches always check the closest, fastest cache first, then the next level out, and eventually go to memory if everything misses. This means all memory fetches incur a higher penalty for each cache miss, and since an off-die L3 is slower, that just exacerbates the issue. Also remember that prefetch is more important with bigger cache sizes, so in order to fully take advantage of an L3 cache, different prefetch schemes may have to be implemented. That can mean reworking the prefetch on the northbridge (if present), or the CPU, or both. And prefetches take more memory bandwidth; if they pull in memory words that are not needed, because of say a branch misprediction, that's wasted bandwidth.

It's a bit hard for me to explain the entirety of the situation, but if you've ever seen statistics from software simulations of cache hit/miss rates as cache size increases, you'd see what I'm talking about. In almost all instances, a smaller, faster L2 cache is going to benefit a CPU design more than a bigger, slower L3 cache will. Databases and webserving are another matter altogether. But if a system design would benefit much from an L3 cache, it would benefit that much more from more on-die L2, since it's faster, miss penalties are lower, and the bus interface is usually much wider than external cache because it isn't limited by the pinout of the CPU.
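A back-of-the-envelope average-memory-access-time estimate shows the shape of it. Every hit rate and latency below is an assumed number chosen for illustration, not a measurement of any real CPU.

/* Rough AMAT (average memory access time) comparison, two-level vs.
 * three-level hierarchy.  All latencies (cycles) and hit rates are
 * assumptions for the sketch. */
#include <stdio.h>

int main(void)
{
    double l1_hit = 0.95, l1_lat = 2;
    double l2_hit = 0.97, l2_lat = 10;    /* hit rate of accesses that reach L2 */
    double l3_hit = 0.60, l3_lat = 40;    /* slower because it sits off-die     */
    double mem_lat = 200;

    /* L1 -> L2 -> memory */
    double amat2 = l1_lat + (1 - l1_hit) * (l2_lat + (1 - l2_hit) * mem_lat);

    /* L1 -> L2 -> off-die L3 -> memory */
    double amat3 = l1_lat + (1 - l1_hit) *
                   (l2_lat + (1 - l2_hit) * (l3_lat + (1 - l3_hit) * mem_lat));

    printf("AMAT, no L3:      %.2f cycles\n", amat2);
    printf("AMAT, off-die L3: %.2f cycles\n", amat3);
    printf("improvement:      %.1f%%\n", 100.0 * (amat2 - amat3) / amat2);
    return 0;
}

With those numbers only about 0.15% of all accesses ever reach the L3 at all, which is exactly why databases and webserving, with their much worse L2 hit rates, are the one place a big L3 starts to pay for itself.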
 

MadRat

Lifer
Oct 14, 1999
11,960
278
126
I've always thought that certain things that are memory bound, like the ALUs, would benefit from having some supercharged bypass to feed them. (The P4 kind of does this, but they cut the width of the ALUs in half, right?) Maybe it would make more sense to have a high-speed port hooked up to the AGP slot; that way it could leech bandwidth from the videocard memory.

Would leeching the raw memory bandwidth off the videocard through the AGP port be possible?
 

borealiss

Senior member
Jun 23, 2000
913
0
0
I guess it is possible in Linux, but I still don't see why you would want to do it. The P4, afaik, implements double-pumped ALUs; there's nothing about cutting the width of the ALUs in half. Intel even implemented a smaller, faster cache to feed data to them. As for some type of super bypass specifically for the ALU in a NUMA architecture, you would have to have a dedicated instruction/data fetch module just for the ALU, with lines just for it, as opposed to the generic instruction/data fetch module in CPUs now. Then you'd have to design a new northbridge just to steer ALU-bound data into separate memory from the rest. Again, this involves breaking code down into different functional parts. More complexity means more $$, and I don't think the target audience these parts are meant for is going to pay for the extra die space to implement that. I can see some of your points for maybe ultra-high-end scientific computing, like the Earth Simulator in Japan or something along those lines, but for general computing I think what you're proposing is a bit overkill and can be solved through conventional means rather than reworking the entire computer system architecture.
 

MadRat

Lifer
Oct 14, 1999
11,960
278
126
With all that raw power sitting in the videocards it would be awesome to find new ways to tap it. Think about the bandwidth that is going to be sitting in the baseline videocard just a year from now. The main memory bandwidth won't even come close.