BSN: AMD's Fusion Kaveri APU Supports GDDR5 Memory and PCIe 3.0


Rezist

Senior member
Jun 20, 2009
726
0
71
I was under the impression that DDR memory latency is measured in cycles, and that while GDDR5 has a higher cycle count for a given latency, those cycles happen faster, so it's offset somewhat.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
Memory latency is most appropriately measured in nanoseconds. In terms of nanosecond latency, DDR3 and GDDR5 are very similar. Someone already posted a link that says that GDDR5 has programmable CAS latency (from 5 to 20 cycles), which if you take the increased clockspeed of GDDR5 into account, means that the latency in terms of nanoseconds is very similar to DDR3.
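
As a quick back-of-the-envelope sketch of that cycles-to-nanoseconds conversion (the clock speeds and CAS values below are illustrative assumptions, not figures from any datasheet):

```python
# Latency in nanoseconds = cycles / clock frequency (MHz) * 1000.
# Illustrative parts: DDR3-1600 (800 MHz bus clock, CL11) versus a
# GDDR5 part with an assumed 1250 MHz command clock and CL15.
def cas_latency_ns(cas_cycles, clock_mhz):
    """Convert a CAS latency in clock cycles to nanoseconds."""
    return cas_cycles / clock_mhz * 1000

print(cas_latency_ns(11, 800))   # DDR3:  13.75 ns
print(cas_latency_ns(15, 1250))  # GDDR5: 12.0 ns
```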

GDDR5's architecture is different from typical DDR3 in only a few important performance-oriented ways:
* More banks (16 in GDDR5 versus 8 [typical] in DDR3; this is actually a latency-*reducing* feature).
* More data pins per GDDR5 device (32 in GDDR5 versus [typically] 8 or [very rarely] 16 in DDR3). This means you can get all of the data for a cache line (or whatever granularity of access you're talking about) in a reasonable number of cycles from a single chip. In DDR3, all 8 chips in a rank work together to provide 64 bits per transfer of a cache line (8 bits per transfer each). This width, plus the very high clockspeed of GDDR5, has the net effect of data transfer taking *less time* with GDDR5 than with DDR3 (see the sketch after this list), but data transfer is never the latency bottleneck in a DRAM system, so this part isn't very important. Suffice it to say, this does not have any negative impact on GDDR5 latency.
* Obviously also very high clockspeeds and data transfer speeds.
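
Following up on the data-pins point, a sketch of the burst-transfer arithmetic (the transfer rates here are assumed, typical-looking values):

```python
# Time to move one 64-byte cache line across the data pins.
# Assumed rates: a 64-bit DDR3 rank at 1.6 GT/s versus a single
# 32-bit GDDR5 device at 4.8 GT/s.
def burst_time_ns(line_bytes, bus_width_bits, transfer_rate_gts):
    transfers = line_bytes * 8 / bus_width_bits
    return transfers / transfer_rate_gts  # GT/s == transfers per ns

print(burst_time_ns(64, 64, 1.6))  # DDR3 rank:    8 transfers, 5.0 ns
print(burst_time_ns(64, 32, 4.8))  # GDDR5 device: 16 transfers, ~3.33 ns
```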

As a source, other than the link provided earlier in the page: I myself researched DRAM for a couple of years, and my lab mate just finished the GDDR5 version of his DRAM simulator, with all the industry-correct timing parameters involved, and according to him the two have almost identical latency.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
Why doesn't anyone think the memory might be for video use only? It might not be available as CPU memory at all.
 

Gideon

Platinum Member
Nov 27, 2007
2,030
5,035
136
I was under the impression that DDR memory latency is measured in cycles, and that while GDDR5 has a higher cycle count for a given latency, those cycles happen faster, so it's offset somewhat.

That's definitely somewhat true. For instance, DDR3 has a higher cycle latency than DDR2, but this is offset by a faster clock speed (more cycles per unit of time).

However, AFAIK, GDDR5 is only clocked at 1/4 of what the "effective clock speed" implies. That's just a marketing convention for comparing it to DDR3 (DDR3 would need to run at that speed to offer similar bandwidth). So in layman's terms, GDDR5 is 4 times wider than DDR3 but runs at lower clock speeds.

So all in all, it should do LESS cycles per clock than DDR3.
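
To make the DDR2-vs-DDR3 offsetting point concrete (the timings below are typical-looking assumptions, not datasheet values):

```python
# More cycles at a faster clock can still mean similar (or lower) latency.
# Assumed typical parts: DDR2-800 CL5 (400 MHz) vs DDR3-1600 CL9 (800 MHz).
print(5 / 400 * 1000)  # DDR2: 12.5 ns
print(9 / 800 * 1000)  # DDR3: 11.25 ns
```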
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Why doesn't anyone think the memory might be for video use only? It might not be available as CPU memory at all.

I've thought about this too but then I remind myself that doing such a thing, breaking the unified memory space like that, would be an absolute step backwards against the whole "the future is fusion" strategy which has aspirations for someday creating a unified hardware model (HSA) for programs to run on APUs and so forth.

Doing what you propose would be the absolute best thing to do if all AMD wanted to do was advance the performance of their integrated GPU for graphics applications only. But they want these things to evolve into truly usable APUs and for that to happen they need a unified memory architecture.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
So all in all, it should do LESS cycles per clock than DDR3.

What? Cycles per clock is like saying 1/1<1. I think what you're referring to is transfers per clock, in which case GDDR5 does twice as many as DDR3. DDR3 (DDR stands for "double data rate") transfers two bits per clock cycle. In other words, for every clock cycle the DDR3 chip experiences, it can send two bits per wire across its data channel to the memory controller. The data bus operates at 2x the DRAM core frequency. In GDDR5, the data bus operates at 4x the DRAM core frequency. So why isn't GDDR5 called GQDR5 (for quad data rate)? I don't know; that's a good question.

Concrete examples: My DDR3 chips in my PC operate at 800MHz, but they transmit data at a rate of 1.6Gbps per data wire in the data bus. The GDDR5 chips on my graphics card operate at 1.2GHz, but they transmit data at a rate of 4.8Gbps per data wire in the data bus.
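
Those two concrete examples, written out (the numbers are the ones from this post):

```python
# Per-pin data rate = bus clock (GHz) * bits transferred per clock.
# DDR3 at 800 MHz moving 2 bits/clock; GDDR5 at 1.2 GHz moving
# 4 bits/clock, per the framing in the post above.
def data_rate_gbps(clock_ghz, bits_per_clock):
    return clock_ghz * bits_per_clock

print(data_rate_gbps(0.8, 2))  # DDR3:  1.6 Gbps per pin
print(data_rate_gbps(1.2, 4))  # GDDR5: 4.8 Gbps per pin
```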
 

SocketF

Senior member
Jun 2, 2006
236
0
71
... the lowest available GDDR5 speed grade is 900 MHz for 3.6 GT/s.

BSN wrote 800 MHz QDR ... that sounds more like DDR4-3200 to me.

And if they were going to use GDDR5, why would they limit themselves to something worse than the lowest speed grade? You can already buy 1750 MHz GDDR5 chips at 7 GT/s, which would be double the bandwidth.

One can argue that these APUs are only prototypes, but even then... DDR4 would be more reasonable, as you won't have a 4 GB memory limitation either, and it is planned to scale to at least 4.2 GT/s.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
similar latency?

gimme a link please...

I believe the argument is that they "could have" similar latency if one desired to push the latency of GDDR5 toward what DDR3 already operates at... at the expense of cost (binning), power (voltage), and so on, of course.

Nothing in the device physics precludes the two memory types from having comparable latency; it just isn't something that has been prioritized in GDDR5 binning to date.

GPU core logic runs at vastly lower clockspeeds than CPU core logic does, a factor of 4-5x in some cases. GDDR5 latency being 4-5x worse than DDR3 RAM is not a problem in that case.

But it will have to be addressed if they intend to make a 4GHz CPU dependent on the latency of GDDR5 memory accesses, and as we all know from buying DDR3 RAM, you have to pay for lower-latency chips.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
I've thought about this too but then I remind myself that doing such a thing, breaking the unified memory space like that, would be an absolute step backwards against the whole "the future is fusion" strategy which has aspirations for someday creating a unified hardware model (HSA) for programs to run on APUs and so forth.

Doing what you propose would be the absolute best thing to do if all AMD wanted to do was advance the performance of their integrated GPU for graphics applications only. But they want these things to evolve into truly usable APUs and for that to happen they need a unified memory architecture.

It would also be very expensive, because you'd need wires coming out of your APU to both DDR3 and GDDR5 channels. Unless you're thinking of just a 64-bit DDR3 and 64-bit GDDR5 channel, but then you're gimping yourself on both fronts (CPU and GPU bandwidth). The solution they are actually going with has 0 drawbacks.
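
A rough sketch of the bandwidth tradeoff being described (channel speeds assumed for illustration):

```python
# Peak channel bandwidth = width in bytes * transfer rate.
# Assumed rates: DDR3-1600 (1.6 GT/s) and GDDR5 at 5 GT/s effective.
def bandwidth_gbs(width_bits, rate_gts):
    return width_bits / 8 * rate_gts

print(bandwidth_gbs(64, 1.6))   # one 64-bit DDR3 channel:  12.8 GB/s
print(bandwidth_gbs(64, 5.0))   # one 64-bit GDDR5 channel: 40.0 GB/s
print(bandwidth_gbs(128, 1.6))  # dual-channel DDR3:        25.6 GB/s
```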
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
I believe the argument is that they "could have" similar latency if one desired to push the latency of GDDR5 toward what DDR3 already operates at... at the expense of cost (binning), power (voltage), and so on, of course.

I'm not saying that they *could* have similar latency, I'm saying that they *do* have similar latency.

... GDDR5 latency being 4-5x worse than DDR3 RAM is not a problem in that case.

These figures get thrown around the internet a lot, but I've never once seen a source for them. Care to share? It would be even better if you could get super technical and describe to me the wire- and transistor-level reasons why you think that GDDR5 has 4-5x worse latency than DDR3. I can take it.
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
I've thought about this too but then I remind myself that doing such a thing, breaking the unified memory space like that, would be an absolute step backwards against the whole "the future is fusion" strategy which has aspirations for someday creating a unified hardware model (HSA) for programs to run on APUs and so forth.

Doing what you propose would be the absolute best thing to do if all AMD wanted to do was advance the performance of their integrated GPU for graphics applications only. But they want these things to evolve into truly usable APUs and for that to happen they need a unified memory architecture.

Why couldn't AMD implement unified memory with both GDDR5 and traditional DDR3? They could implement a virtual address layer to sit on top of both memory types to allow a unified address space for all general OS calls, and then use Catalyst profiles to force DMA to the GDDR5 when games are launched.

Just a thought but I think it could work.
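
Purely as a thought experiment, the proposed policy might look something like this sketch (the pool names and the profile check are hypothetical, and nothing here reflects AMD's actual design):

```python
# Hypothetical sketch: one virtual address space backed by two pools,
# with a Catalyst-profile-style policy steering allocations.
GDDR5_POOL = "gddr5"  # small, high-bandwidth pool
DDR3_POOL = "ddr3"    # large, cheap, lower-power pool

def choose_pool(workload_is_game):
    """Games get the fast pool; everything else lands in DDR3."""
    return GDDR5_POOL if workload_is_game else DDR3_POOL

def allocate(size_bytes, workload_is_game):
    pool = choose_pool(workload_is_game)
    # A real implementation would map pages from `pool` into the unified
    # virtual address space; here we just report the placement decision.
    return {"pool": pool, "size": size_bytes}

print(allocate(256 * 2**20, workload_is_game=True))   # -> gddr5
print(allocate(256 * 2**20, workload_is_game=False))  # -> ddr3
```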
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
Why couldn't AMD implement unified memory with both GDDR5 and traditional DDR3? They could implement a virtual address layer to sit on top of both memory types to allow a unified address space for all general OS calls, and then use Catalyst profiles to force DMA to the GDDR5 when games are launched.

Just a thought but I think it could work.

But why do you want to have DDR3 at all in that scenario?
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
Memory latency is most appropriately measured in nanoseconds. In terms of nanosecond latency, DDR3 and GDDR5 are very similar. Someone already posted a link that says that GDDR5 has programmable CAS latency (from 5 to 20 cycles), which if you take the increased clockspeed of GDDR5 into account, means that the latency in terms of nanoseconds is very similar to DDR3.

GDDR5's architecture is different from typical DDR3 in only a few important performance-oriented ways:
* More banks (16 in GDDR5 versus 8 [typical] in DDR3; this is actually a latency-*reducing* feature).
* More data pins per GDDR5 device (32 in GDDR5 versus [typically] 8 or [very rarely] 16 in DDR3). This means you can get all of the data for a cache line (or whatever granularity of access you're talking about) in a reasonable number of cycles from a single chip. In DDR3, all 8 chips in a rank work together to provide 64 bits per transfer of a cache line (8 bits per transfer each). This width, plus the very high clockspeed of GDDR5, has the net effect of data transfer taking *less time* with GDDR5 than with DDR3, but data transfer is never the latency bottleneck in a DRAM system, so this part isn't very important. Suffice it to say, this does not have any negative impact on GDDR5 latency.
* Obviously also very high clockspeeds and data transfer speeds.

As a source, other than the link provided earlier in the page: I myself researched DRAM for a couple of years, and my lab mate just finished the GDDR5 version of his DRAM simulator, with all the industry-correct timing parameters involved, and according to him the two have almost identical latency.

So why not use GDDR5 all the time? Cost? Density?

I believe that the original Xbox used some form of GDDR memory, and it didn't seem to suffer too much for it.
If AMD gives the CPU adequate cache, the latency of GDDR5 shouldn't be too hurtful, although I expected they'd follow Intel and give the GPU a cache; it's probably cheaper to give the CPU adequate cache than to give one to the GPU.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
So why not use GDDR5 all the time? Cost? Density?

GDDR5 cannot be used in a DIMM form factor because it requires point-to-point connections, meaning it has to be soldered onto the motherboard. Because motherboards are only so big, and GDDR5 only comes in 256MB chips, this puts a severe limit on the amount of total system memory you can have if you go for an all-GDDR5 solution. 4GB is 16 chips, and that would already make for a pretty crowded motherboard.
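
The arithmetic behind that limit, using the 256MB-per-chip figure from the post:

```python
# Total capacity scales linearly with chip count at 256MB per GDDR5 chip.
CHIP_MB = 256
for chips in (8, 16, 32):
    print(f"{chips} chips -> {chips * CHIP_MB / 1024:.1f} GB")
# 8 chips -> 2.0 GB, 16 -> 4.0 GB, 32 -> 8.0 GB (a very crowded board)
```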
 

Madpacket

Platinum Member
Nov 15, 2005
2,068
326
126
GDDR5 cannot be used in a DIMM form factor because it requires point-to-point connections, meaning it has to be soldered onto the motherboard. Because motherboards are only so big, and GDDR5 only comes in 256MB chips, this puts a severe limit on the amount of total system memory you can have if you go for an all-GDDR5 solution. 4GB is 16 chips, and that would already make for a pretty crowded motherboard.

Yes, and that's why I'm thinking AMD could be implementing a hybrid DDR3/GDDR5 solution. DDR3 is much cheaper and uses less power, so it makes sense to leave that as the system memory and outfit the boards with 1GB or 2GB GDDR5 options.

I just hope they can figure out a way to socket the GDDR5 so it's upgradable to larger sizes, but I doubt they'll do this unless they implement a custom low-latency bus, as PCIe would probably be too slow.

DDR4 can't come soon enough :)
 
Last edited:

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Now, if AMD really does this... it will cause even more pain to the memory market.
...Even with AMD's low market share, losing another 10% of overall sales is scary as hell; another manufacturer will probably have to close its doors :(
 

Evleos

Member
Jan 23, 2004
45
24
81
GDDR5 cannot be used in a DIMM form factor because it requires point-to-point connections, meaning it has to be soldered onto the motherboard. Because motherboards are only so big, and GDDR5 only comes in 256MB chips, this puts a severe limit on the amount of total system memory you can have if you go for an all-GDDR5 solution. 4GB is 16 chips, and that would already make for a pretty crowded motherboard.

I believe Samsung is making much larger chips now, cf. the 8 GB of GDDR5 in the PS4, but I can't find the source right now.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
These figures get thrown around the internet a lot, but I've never once seen a source for them. Care to share? It would be even better if you could get super technical and describe to me the wire- and transistor-level reasons why you think that GDDR5 has 4-5x worse latency than DDR3. I can take it.

It would be great if you could back up the claims you are making, since your claims are the ones that run against conventional expectations.

I'm not saying that they *could* have similar latency, I'm saying that they *do* have similar latency.

You seem keen to make sure people believe your statements on that very topic; how about giving us something other than blind faith to go on here?

There is a reason why GDDR5 has to be soldered rather than put on DIMMs. You don't get the required eye pattern to support the clockspeeds for the bandwidth and latency targets if the signal-to-noise ratio degrades, as it will when the chips are on DIMMs rather than BGA-mounted.

I suspect you know all this already; you don't need me to hold your hand while I point out the self-evident differences in how these RAM types are implemented at the product level (and why they are implemented differently).

But as I have nothing to prove here, while you seem keen to disprove the figures that "get thrown around the internet a lot," it is you who needs to walk the walk and give some proof of your assertions.

If you can't, or won't, then don't be surprised if people just ignore your assertions to the contrary. The internet is full of anonymous dudes who want to talk the talk but bail on the convo as soon as they are pressed to walk the walk. So the ball is firmly in your court. Prove the following assertion, please, and explain why GDDR5 is soldered while DDR3 can be on a DIMM (because it is a factor in latencies, a factor you need to account for):
I'm not saying that they *could* have similar latency, I'm saying that they *do* have similar latency.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
This is probably what you're looking for. I just spent about 30 minutes going through this document and I think it should answer all your questions.

http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR(Rev1.0).pdf

Start on page 131 for the list of timings (133 for the row activation, precharge, and column read commands). Nothing in there suggests higher latency for ACT, CAS or PRE commands (the things we care about when we're talking about latency) compared to DDR3.

If you mean something different than this when you say "GDDR5 has high latency," then please let me know, because that would mean that you and I have been speaking two different languages so far.

EDIT: I should say that all I've been arguing so far is that GDDR5 *devices* are not inherently high-latency machines, but I've said nothing about nVidia's or AMD's ability to create a low-latency memory controller. When it comes to memory controller design, high bandwidth and low latency are fundamentally at odds with one another. Graphics memory controllers (so far the most common examples of GDDR5 controllers) care more about bandwidth and don't care much about latency, so AMD and nVidia have never bothered to design a low-latency GDDR5 controller. I'm arguing that you could design a memory controller that could achieve low latencies similar to a DDR3 controller, or even a controller that could switch modes between high bandwidth and low latency (which is what I bet AMD is doing with Kaveri, depending on whether it's working with a CPU or GPU load).
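
For anyone who wants to run the comparison from the datasheet themselves, this is the arithmetic I mean; the timing values below are placeholders chosen to show the method, not numbers taken from the Hynix document:

```python
# Random-access latency from the usual timing trio, in nanoseconds:
# tRP (precharge) + tRCD (activate) + CL (column read), all in cycles.
def access_latency_ns(trp, trcd, cl, clock_mhz):
    return (trp + trcd + cl) / clock_mhz * 1000

# Placeholder timings, for illustration only:
print(access_latency_ns(11, 11, 11, 800))   # DDR3-ish:  41.25 ns
print(access_latency_ns(15, 15, 15, 1250))  # GDDR5-ish: 36.0 ns
```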
 
Last edited:

Pohemi

Lifer
Oct 2, 2004
10,884
16,969
146
I'm arguing that you could design a memory controller that could achieve low latencies similar to a DDR3 controller, or even a controller that could switch modes between high bandwidth and low latency (which is what I bet AMD is doing with Kaveri, depending on whether it's working with a CPU or GPU load).

That would be great, but what would happen with mixed loads? Just have the controller auto-adjust on a "sliding scale," as opposed to a one-way-or-the-other switch?
 

GammaLaser

Member
May 31, 2011
173
0
0
What? Cycles per clock is like saying 1/1<1. I think what you're referring to is transfers per clock, in which case GDDR5 does twice as many as DDR3. DDR3 (DDR stands for "double data rate") transfers two bits per clock cycle. In other words, for every clock cycle the DDR3 chip experiences, it can send two bits per wire across its data channel to the memory controller. The data bus operates at 2x the DRAM core frequency. In GDDR5, the data bus operates at 4x the DRAM core frequency. So why isn't GDDR5 called GQDR5 (for quad data rate)? I don't know; that's a good question.

Concrete examples: My DDR3 chips in my PC operate at 800MHz, but they transmit data at a rate of 1.6Gbps per data wire in the data bus. The GDDR5 chips on my graphics card operate at 1.2GHz, but they transmit data at a rate of 4.8Gbps per data wire in the data bus.

The "DDR" in GDDR5 probably refers to the forwarded write clock rather than the command clock in a GDDR5 DRAM. So for a DRAM operating its command clock at 1.2GHz, the forwarded clock physically runs at 2.4GHz.

Also, what you mention about the DRAM core frequency is not quite right. DDR3 and GDDR5 use a prefetch-8n architecture, so the internal core is 8 times wider but 8 times slower than the I/O bit rate.
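
Putting those clock domains in one place, for an assumed 4.8 Gbps/pin GDDR5 part (a sketch of the relationships, not a spec):

```python
# GDDR5 clock domains under prefetch-8n, for an assumed 4.8 Gbps/pin part.
data_rate_mbps = 4800
core_mhz = data_rate_mbps / 8  # internal core: 1/8 of I/O bitrate -> 600 MHz
ck_mhz = data_rate_mbps / 4    # command clock CK:  data rate / 4  -> 1200 MHz
wck_mhz = data_rate_mbps / 2   # write clock WCK:   data rate / 2  -> 2400 MHz
print(core_mhz, ck_mhz, wck_mhz)  # 600.0 1200.0 2400.0
```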
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
The "DDR" in GDDR5 probably refers to the forwarded write clock rather than the command clock in a GDDR5 DRAM. So for a DRAM operating its command clock at 1.2GHz, the forwarded clock physically runs at 2.4GHz.

Also, what you mention about the DRAM core frequency is not quite right. DDR3 and GDDR5 use a prefetch-8n architecture, so the internal core is 8 times wider but 8 times slower than the I/O bit rate.

I see. So in GDDR5, the WCK controls reads and writes, and it operates at twice the CK (which is the command clock).

Any idea why the convention is to say "my GDDR5 is running at 1.2GHz" and "my DDR3 is running at 800MHz," when the transfer rates are really 4.8Gbps/pin and 1.6Gbps/pin respectively?