Memory Gap Keeps Growing -- Um, Yeah?

Torquemada

Junior Member
Feb 4, 2012
18
0
0
Pardon if this has been done ad nauseam, but I've seen so many recent references to the "memory gap" that I had to whip out a pocket calculator. Maybe we can set a timeline, or better yet, draw a graph of the "memory gap". (No, not last night, but RAM B/W contra CPU powah.)

I had a nice Athlon rig that was 750 @ 1000. (OK, it was really 850 when naked. Those were the days.) It had single-channel PC133 memory. Let's make this a 7.5 ratio for the sake of conversation. Now I have a rig at 3000 that has triple-channel 1600 memory. That's like a 6.25 ratio. (The CPU is possibly more effective at caching and using what it gets.) So... Does the memory gap keep growing? No doubt quad-channel is around the corner, and I mean already lurking there.

(And don't give me any of that old stuff about "dual-pumped" or "rising and falling edges". The databus speed is really what it is. Clock triggering is a minor detail.)

¿Qué?

Edit: Sweet Jebus, I'm an idiot; I temporarily forgot about multicore. And then I noticed I can't count either. Aiiigh. Interesting to see the replies though. I shan't retract my bad.
 
Last edited:

LokutusofBorg

Golden Member
Mar 20, 2001
1,065
0
76
The "old stuff" about double data rate is not a minor detail. Data is actually being moved across the bus on the rising and falling edges. I don't know how you could think this is a minor detail...
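
Back of the envelope, here's roughly what that does to the numbers. This is just a sketch using theoretical peak figures for one 64-bit channel, and DDR-266 on a 133 MHz bus is only the illustrative case:

```python
# Rough sketch: moving data on both clock edges doubles the theoretical peak.
# One 64-bit (8-byte) channel; these are peak figures, not sustained throughput.

BUS_WIDTH_BYTES = 8  # one 64-bit DIMM channel

def peak_bandwidth_gbs(bus_clock_mhz, transfers_per_clock):
    """Peak GB/s = bus clock (MHz) * transfers per clock * bus width (bytes)."""
    return bus_clock_mhz * 1e6 * transfers_per_clock * BUS_WIDTH_BYTES / 1e9

print("PC133 SDRAM:", peak_bandwidth_gbs(133, 1), "GB/s")  # one transfer per clock -> ~1.06 GB/s
print("DDR-266    :", peak_bandwidth_gbs(133, 2), "GB/s")  # both edges -> ~2.13 GB/s
```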
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
The concept of the memory gap spans the whole chain: from the time it takes to initially access data on the primary non-volatile storage all the way up to the CPU registers. When the data sets get very, very large, as has become the norm lately with the ever-growing use of virtual machines, bloated applications written in languages like Java, SHVHQXL-HD+++++ video files, 10000-gigapixel family photo albums, large databases, 50 GB email archives, game installs that span 20 DVDs, 92.8-channel 128 MHz-sampled 256-bit audio, etc., the only things that matter anymore are the disk access time and data transfer rate.

And of that chain, the HDD/SSD is the greatest offender by a factor of millions. The HDD is the only thing in a 2012-era PC that still measures throughput in kilobytes and megabytes per second. In parts of the world, people's broadband connections are starting to give their primitive 50-year-old HDD technology a run for its money in access time...

Where I work, when we download large files and disk images over the network, the network has to slow down because the HDD is overloaded (i.e. the disk light is solid and the drive is never left wanting for data).

CPUs get faster. The amount of data we need to store and work with just keeps getting more gargantuan. The ability to store and retrieve that data instantly from the primary device where it resides at power-off is what is quickly becoming the crippling bottleneck. In the time that main memory has gone from 1 GB/sec to 30 GB/sec (30-fold), HDDs have gone from 50 MB/sec to 150 MB/sec (a scant 3-fold). Loading 1000 times the data off a disk that is only 3 times faster means waiting around on our new super-fast computers roughly 333 times longer than we used to. No matter how fast RAM is, you still need to store that data to a permanent location, and with the size of today's gargantuan data-driven applications, disk I/O is the greatest offender. In the last 50 years we've gone from sipping martinis to sipping the oceans dry, but we are still using a martini straw.
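
To put rough numbers on that last comparison, here's a quick back-of-the-envelope sketch. The GB/sec and MB/sec figures are the same ballpark ones quoted above, and the "1000 times the data" is just the illustrative assumption, not a measurement:

```python
# Back-of-the-envelope for "1000x the data on a disk that is only 3x faster".
# All rates and sizes are the ballpark figures from the post, not measurements.

def load_time_sec(data_gb, rate_mb_per_sec):
    """Seconds to read a payload sequentially at the given rate."""
    return data_gb * 1024 / rate_mb_per_sec

old_wait = load_time_sec(data_gb=1,    rate_mb_per_sec=50)   # "back then": 1 GB off a 50 MB/sec HDD
new_wait = load_time_sec(data_gb=1000, rate_mb_per_sec=150)  # today: 1000x the data off a 150 MB/sec HDD

print(f"then: {old_wait:.0f} s, now: {new_wait:.0f} s, ratio: {new_wait / old_wait:.0f}x")
# -> roughly 333x longer waiting, even though the disk itself is 3x faster
```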

CPU: processes 100 GB/sec
|
RAM: bandwidth of 30 GB/sec
|
|
|
SSD array: 1+ GB/sec
|
|
Single SSD: 0.5 GB/sec
|
|
|
| <- this is the worst part of the memory chasm today
|
|
|
| <- this is the reason people complain their new 16-core 30 GHz uber computer is just as slow as their 15-year-old Compaq Presario
|
|
|
|
|
HDD: < 100 MB/sec when the moon is full and the tides are aligned (more like 1.5 MB/sec in normal random use)
|
Cassette tapes
|
33.6k modems
|
Punch cards
|
Caveman with a chisel and hammer
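
For scale, here's a little sketch that turns those tiers into time-to-move-a-payload figures. The bandwidth numbers are the same rough ones as in the diagram, and the 10 GB payload is just an arbitrary example:

```python
# Rough time to move a 10 GB payload at each tier of the chasm above.
# Bandwidths are the same ballpark figures as in the diagram, nothing measured.

tiers_gb_per_sec = {
    "RAM":            30.0,
    "SSD array":       1.0,
    "Single SSD":      0.5,
    "HDD sequential":  0.1,     # ~100 MB/sec on a good day
    "HDD random":      0.0015,  # ~1.5 MB/sec of small random reads
}

PAYLOAD_GB = 10

for name, rate in tiers_gb_per_sec.items():
    print(f"{name:15s} {PAYLOAD_GB / rate:10.1f} s")
# RAM finishes in a fraction of a second; random HDD access takes the better part of two hours.
```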
 
Last edited:

exdeath

Lifer
Jan 29, 2004
13,679
10
81
A breakthrough in fast, non-volatile, high-density main memory (read: no HDD/SSD required to retain data on power loss) would be the biggest revolution in computing since, and would greatly exceed the importance of, the inventions of the transistor and the microprocessor.

We've been I/O kneecapped since the birth of the first abacus. Even mechanical computers were limited by how fast they could pull punch cards without destroying them.

In theory, STT-MRAM doesn't wear out like flash, is as fast as the SRAM in CPU caches, is as dense as DRAM or flash (a one-transistor, one-MTJ cell), and retains data for over 30 years without power. Not only would "drives" and "disks" be a thing of the past in a new era of truly solid-state, zero-wait-state computing, but with main memory as fast as CPU cache, two-thirds of the CPU die would be freed up for more cores instead of massive L2/L3 caches.

Sadly, there isn't much interest in this, as flash memory is a huge cash cow right now that nobody is willing to kill. Or shall I say, nobody is willing to endanger the golden goose known as NAND flash. Any time something better shows potential, it gets cast aside in favor of milking the inferior technology to death before progress can be made.
 
Last edited:

Torquemada

Junior Member
Feb 4, 2012
18
0
0
The "old stuff" about double data rate is not a minor detail. Data is actually being moved across the bus on the rising and falling edges. I don't know how you could think this is a minor detail...

"On the edges" is a somewhat ambiguous phrase. The data is encoded simply as voltage differentials, and the data pins physically work at the "full" speed. The waveform edges refer only to how that clockspeed gets triggered from the clock pins. So I'd say how the operating speed gets derived is a minor detail in the whole system. Might as well bring in PLLs and oscillators while going into the detail.

Short version: the rising and falling edges have nothing to do with how data actually gets moved, only with the way the frequency for it is generated.

I hope you get my particular focus here; I'm often none too clear about it :p

(As we know, the DDR mechanism came about for two reasons: 1) it's easier to avoid jitter in a high final clock when you transmit a slower base clock to the device and derive the rest from it, and 2) memory devices needed a separate internal working speed anyway once the capacitors in the bit cells could no longer keep up with transistor switching speeds, so DDR and later iterations have instead consecutively doubled the internal access width. However, the internal speed is also a minor detail; the external speed of the device at the address and data pins is all that matters for the system's performance. Latencies aside for now. Do we agree on this stuff?)
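
For concreteness, here's a sketch of how those doubled internal widths line up with the usual marketing numbers. These are the standard JEDEC-style figures for one 64-bit channel, DDR3-1600 is just the example, and as always they're theoretical peaks:

```python
# Rough decomposition of DDR3-1600 on one 64-bit channel (theoretical peak).
# Standard JEDEC-style figures, used here purely as an example.

core_clock_mhz      = 200  # internal DRAM array clock
prefetch            = 8    # DDR3 fetches 8 words per column access internally
transfers_per_clock = 2    # data moves on both edges of the I/O clock

io_clock_mhz      = core_clock_mhz * prefetch / transfers_per_clock  # 800 MHz I/O bus clock
transfer_rate_mts = io_clock_mhz * transfers_per_clock               # 1600 MT/s
bandwidth_gbs     = transfer_rate_mts * 1e6 * 8 / 1e9                # x 8 bytes per transfer

print(io_clock_mhz, "MHz I/O clock |", transfer_rate_mts, "MT/s |", bandwidth_gbs, "GB/s")
# -> 800.0 MHz I/O clock | 1600.0 MT/s | 12.8 GB/s
```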
 
Last edited:

Torquemada

Junior Member
Feb 4, 2012
18
0
0
exdeath, good insight, and easy to agree with all that. But I suspect people (well, computer geeks) don't typically use "memory gap" in such a wide sense. Don't really know though.
 

lamedude

Golden Member
Jan 14, 2011
1,230
68
91
I guess it will grow, since DDR4 isn't supposed to show up until 2014, and while Intel may not raise clock speeds much this year, AMD will try to. DDR3-2133 is a JEDEC standard, so that would be an option if bandwidth were an issue, but as quad-channel SNB-E shows, it's not.
That Athlon's 2/5- or 1/3-speed L2$ (unless it was a T-bird) was a bigger problem than its memory speed. AMD has a similar problem today, IIRC, with its L3$ being much slower than Intel's.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,004
126
I had a nice Athlon rig that was 750 @ 1000. (OK, it was really 850 when naked. Those were the days.) It had single-channel PC133 memory. Let's make this a 7.5 ratio for the sake of conversation. Now I have a rig at 3000 that has triple-channel 1600 memory. That's like a 6.25 ratio.
Uh, what? That makes no sense whatsoever.

Ratio 1 = 1000 MHz : 1.066 GB/sec (PC-133 SDRAM).
Ratio 2 = 3000 MHz : 38.4 GB/sec (12.8 GB/sec x 3 for triple channel DDR3-1600).

Clock speed has gone up by a factor of three while bandwidth has increased by over 36 times.
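
For anyone who wants to check the arithmetic, here's a tiny sketch. Both bandwidth figures are theoretical peaks for 64-bit channels:

```python
# Theoretical peak bandwidths behind the comparison above (64-bit channels).
pc133_gbs = 133e6 * 1 * 8 / 1e9    # single-channel PC-133 SDRAM, 1 transfer/clock -> ~1.06 GB/s
ddr3_gbs  = 1600e6 * 8 / 1e9 * 3   # triple-channel DDR3-1600, 1600 MT/s per channel -> 38.4 GB/s

clock_ratio     = 3000 / 1000      # 1 GHz Athlon -> 3 GHz CPU
bandwidth_ratio = ddr3_gbs / pc133_gbs

print(f"clock: {clock_ratio:.0f}x, bandwidth: {bandwidth_ratio:.0f}x")
# -> clock: 3x, bandwidth: 36x
```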

(And don't give me any of that old stuff about "dual-pumped" or "rising and falling edges". The databus speed is really what it is. Clock triggering is a minor detail.)
I don’t think you understand how memory bandwidth works.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Uh, what? That makes no sense whatsoever.

Ratio 1 = 1000 MHz : 1.066 GB/sec (PC-133 SDRAM).
Ratio 2 = 3000 MHz : 38.4 GB/sec (12.8 GB/sec x 3 for triple channel DDR3-1600).

Clock speed has gone up by a factor of three while bandwidth has increased by over 36 times.

We gotta look at multiple timelines. For example, we could go back to the first PC-133 SDRAM days and the initial Pentium IIIs, which were at 600 MHz. Nehalem was the first system to have 3-channel memory, so it's a better endpoint for the comparison.

Just looking at the Athlon 750 to Nehalem, it's roughly 36:36. More like 50:36 if we count the Pentium IIIs running at 500 MHz. Performance per clock went up significantly too; it's probably at least 2x from Pentium III to Nehalem.

I don't think the "memory gap" is as bad for consumers as it is for workstations/servers though. The advent of advanced cache memories has made it moot for smaller working data sets.
 

DominionSeraph

Diamond Member
Jul 22, 2009
8,386
32
91
What "memory gap"? Sandy Bridge with dual-channel memory isn't memory bandwidth limited for 99% of desktop tasks. DDR3-1333 is just as good as DDR3-2400.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
The real problem with memory from a GPU's standpoint is bandwidth. More GB/s means more chips and pins, which means more cost.

The real problem with memory from a CPU's standpoint is access time. Writing takes time. Reading takes time. Changing to a different address takes time. For the CPU, it's usually much more about latency, and that has only been improving slightly over the years. A CPU waiting on RAM is not doing you any good, and there are several common reasons that can happen (bad branch prediction, wrong prefetching, overwriting data soon to be needed, way conflicts, truly random access, a data set larger than cache, a data set able to be processed faster than RAM can be accessed for it, etc.). Caches allow a small subset of data in RAM to be 'near', but they are far from perfect.
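
To see why latency hurts so much more than the spec-sheet bandwidth suggests, here's a minimal average-memory-access-time sketch. The cycle counts and miss rates are made-up but plausible illustrative values, not measurements of any real CPU:

```python
# Minimal average memory access time (AMAT) sketch.
# The cycle counts and miss rates are illustrative assumptions only.

def amat(hit_time, miss_rate, miss_penalty):
    """Average cycles per access = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

CACHE_HIT_CYCLES = 4     # assumed cost of a cache hit
DRAM_MISS_CYCLES = 200   # assumed cost of going all the way out to RAM

for miss_rate in (0.01, 0.05, 0.20):
    print(f"miss rate {miss_rate:4.0%}: {amat(CACHE_HIT_CYCLES, miss_rate, DRAM_MISS_CYCLES):5.1f} cycles/access")
# Even a small miss rate drags the average way up; that's the latency wall,
# and extra bandwidth alone does little about it.
```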

So, let's look at gaming, a task which drives sales and tends to be limited by good old scalar, branchy code.

SDR -> DDR typically gained 10% or so for gaming.
Single-channel -> dual-channel typically gained 10% or so for gaming.
12.8GB/s (DDR2-800) -> 25.6GB/s (DDR3-1600) typically gained 5% or so for gaming.

In fact, you can check out articles on DDR2 and DDR3 timings, and find that tighter timings (lower latencies) are as good as, and often better than, having more bandwidth.

There's no simple fix, unless someone comes up with a cheap dense RAM technology better than DRAM.
 

Torquemada

Junior Member
Feb 4, 2012
18
0
0
Uh, what? That makes no sense whatsoever.

Ratio 1 = 1000 MHz : 1.066 GB/sec (PC-133 SDRAM).
Ratio 2 = 3000 MHz : 38.4 GB/sec (12.8 GB/sec x 3 for triple channel DDR3-1600).

Clock speed has gone up by a factor of three while bandwidth has increased by over 36 times.

Sorry, I indeed had a real brainfart with the math there. I forgot to mention that the latter CPU is quad-core, so the respective "ratio" ends up 2.5 (I don't know where that oddball 6.25 came from). So,

Ratio 1 = 1x1000 : 1x133 = 7.5
Ratio 2 = 4x3000 : 3x1600 = 2.5

IOW, looking at this rough example the "memory gap" seems to have diminished by a factor of 3 or so, not grown. (This ignores CPU core development though. Nehalem > K7 and so on. Anyhow.)
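
Spelled out, the back-of-the-envelope I'm using looks like this. The "ratio" is just total core-MHz divided by total channel-MT/s, so treat it as a toy metric and nothing more:

```python
# Toy "cores x clock vs channels x transfer rate" ratio from the post above.
# It's only a rough gauge, not a real measure of the memory gap.

def toy_ratio(cores, core_mhz, channels, mem_mts):
    return (cores * core_mhz) / (channels * mem_mts)

old = toy_ratio(cores=1, core_mhz=1000, channels=1, mem_mts=133)   # Athlon @ 1000, PC133
new = toy_ratio(cores=4, core_mhz=3000, channels=3, mem_mts=1600)  # quad-core @ 3000, triple DDR3-1600

print(f"old: {old:.1f}, new: {new:.1f}, shrunk by {old / new:.1f}x")
# -> old: 7.5, new: 2.5, shrunk by 3.0x
```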

I don’t think you understand how memory bandwidth works.
I meant that clock edges have nothing to do with the data bus between a memory controller and a memory device; only with how the speed for that bus gets derived. To put it more precisely, the CLKN and CLKP pins involve clock edges, but the actual DQ pins used for data transfer do not. Like, in DDR3-1600 memory, the I/O really happens at physically 1600 MHz, never mind that the frequency is derived from a lower clock elsewhere. Only this I/O speed is relevant for bandwidth. That's why using "effective" or "MT/s" or "Gbps" for memory bandwidth (instead of just plain old MHz/GHz figures for the data bus) is a bit of much ado about nothing.