AMD summit today; Kaveri cuts out the middle man in Trinity.

Page 8
Status
Not open for further replies.

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
To add to the above, though: Trinity now features a "borrow"-type TDP-style turbo (one that's actually more efficient and responsive than Ivy Bridge's).

Are you talking about the performance when running CPU and GPU intensive applications?

More EUs + more cache should spell better performance but don't expect miracles.

There's potential for improvement per EU over Ivy Bridge. Peak flops on Ivy Bridge are 2x Sandy Bridge, but on average the benefit is only 50% per EU, meaning you'd end up at 2x overall with 33% more EUs. Peak flops increased because Ivy Bridge's EU allows co-issuing with the previously special-math-only unit, but that only happens 2/3rds of the time. That suggests improving the co-issue rate to 90% should improve sustained flops by 20% or so.
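That sustained-throughput argument can be sanity-checked with a little arithmetic. This is a sketch using the rates quoted above; the issue model itself is my simplification, not Intel's documented behavior:

```python
# Toy model of EU throughput: an EU issues 1 op/cycle, plus a co-issued op
# to the (formerly special-math-only) unit whenever co-issue succeeds.
# Rates are the figures quoted in the post, not measured values.

snb_per_eu = 1.0                # Sandy Bridge EU, normalized
coissue_now = 2.0 / 3.0         # co-issue succeeds ~2/3 of the time
coissue_goal = 0.90             # hypothetical improved rate

sustained_now = snb_per_eu * (1.0 + coissue_now)    # ~1.67x SNB per EU
sustained_goal = snb_per_eu * (1.0 + coissue_goal)  # ~1.90x SNB per EU
gain = sustained_goal / sustained_now - 1.0

print(f"sustained today:  {sustained_now:.2f}x SNB per EU")
print(f"sustained at 90%: {sustained_goal:.2f}x SNB per EU")
print(f"improvement:      {gain:.0%}")
```

With the quoted 2/3 rate the model lands somewhat above the "50% average benefit" figure, and pushing co-issue to 90% yields a mid-teens-percent gain, the same ballpark as the ~20% guess.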
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
Are you talking about the performance when running CPU and GPU intensive applications?

Yea. The Turbo Core feature in Bulldozer was actually really, really good. AMD developed it further and then implemented a "how much headroom is available" style of turbo for both CPU and GPU, like the one Intel employs.

There's potential for improvement per EU over Ivy Bridge. Peak flops on Ivy Bridge are 2x Sandy Bridge, but on average the benefit is only 50% per EU, meaning you'd end up at 2x overall with 33% more EUs. Peak flops increased because Ivy Bridge's EU allows co-issuing with the previously special-math-only unit, but that only happens 2/3rds of the time. That suggests improving the co-issue rate to 90% should improve sustained flops by 20% or so.

Theoretical figures always differ from actual performance, and both chipmakers abuse "theoretical" persistently. I think the wild card here is going to be cache availability. The sizes should increase overall (I guess this still depends on which GT tier you get, but in general it should be higher), so that should help performance as well, assuming they increase the dedicated amount somewhat. The clock speeds, though, I doubt will increase much. Whatever the gains are will likely come from the increase in EUs and, like you said, from improving the co-issue rate.

It's kind of shocking to see just how much on-die GPUs are improving compared to the stall we've seen on the CPU side. It goes to show just how far behind software is and how much more "visual" small computing has become.
 
Aug 11, 2008
10,451
642
126
That costs money. Triple and quad channel configurations are expensive.

This seems like a big problem for APUs, maybe more so for AMD, since their APUs usually go into mainstream to lower-end products. Why would you want to go wild with expensive memory on a low-to-mainstream product? (I am thinking about off-the-shelf systems mainly.) I can't see Dell or Acer or HP putting expensive quad-channel memory into their systems. Right now they sometimes don't even set up the memory in a dual-channel configuration.

On the desktop, if you want to game, I still don't see the place for APUs. Just add a discrete card and avoid all the bottlenecks of memory bandwidth and limited TDP.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
This seems like a big problem for APUs, maybe more so for AMD, since their APUs usually go into mainstream to lower-end products. Why would you want to go wild with expensive memory on a low-to-mainstream product? (I am thinking about off-the-shelf systems mainly.) I can't see Dell or Acer or HP putting expensive quad-channel memory into their systems. Right now they sometimes don't even set up the memory in a dual-channel configuration.

On the desktop, if you want to game, I still don't see the place for APUs. Just add a discrete card and avoid all the bottlenecks of memory bandwidth and limited TDP.

That's my thinking as well, but I wouldn't be so quick on the desktop end. If you remove (alleviate, more aptly) the DDR bus and speed issue, then you're free to grow your chips bigger. So big, in fact, that you could potentially get equal performance from an APU as from a mid/high-end desktop discrete GPU. In reality, little differs between the two other than size and location, one sitting on the PCIe bus while the other sits attached to the CPU. They're the same architecture, both rely on external DDR (one on the card's PCB, the other on the motherboard) and... well, that's it, really ;P Quad- and triple-channel configurations are also more common in servers (so common they've become a necessity), so if you scale a server-grade APU down to the desktop (think socket 2011 workstation but with an on-die GPU), 3x/4x-channel memory makes far more sense.

For low-end systems it makes much less sense, though. The 3x/4x-channel memory configurations are expensive and take up too much room, meaning they're essentially a nonstarter for laptops and mobile devices. Most of the chips sold to consumers are in mobile devices as well, so triple- and quad-channel memory configs will never happen for low-end desktop and mobile. That's not even a possibility. On-package RAM, though, makes far more sense for both. Just how it would look depends on the implementation. Maybe we'll see something like a GPU-style PCB rather than your typical motherboard, where the RAM is attached to the PCB itself rather than in individual sticks. Considering AMD is selling their "unified memory address" between CPU and GPU, you'd figure that would help too.

Like I said, they're going to have to be creative or follow Intel's lead with cache dedication. It should certainly be interesting :)
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
That costs money. Triple and quad channel configurations are expensive.

Bulls**t. Poor reason.

Right now you can buy 2 x 1GB DDR3-1600 for $23 on Newegg.
I'm not sure how much they make when they sell RAM, but I bet OEMs buy RAM a lot cheaper than that.

So let's say giving someone 4GB (4 x 1GB DDR3-1600) costs ~$35.

Now if you build an entire PC... that tiny increase in cost is very small overall.
The performance gain would be BIG (worth more than the maybe $13 or so extra the system costs).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Yea. The Turbo Core feature in Bulldozer was actually really, really good. AMD developed it further and then implemented a "how much headroom is available" style of turbo for both CPU and GPU, like the one Intel employs.

Actually newer drivers vastly improved the issue:

Old: http://www.hardware.fr/articles/815-10/intel-hd-graphics-cpu-vs-igp.html
New: http://www.hardware.fr/articles/863-7/hd-graphics-cpu-vs-igp-quicksync.html

It was merely a power-sharing problem. My WAG is that early drivers dedicated a little too much to the CPU, so the GPU had less room to increase performance. In desktops, the CPU is a major portion of the TDP for Sandy and Ivy Bridge.

You can see the 2700K doing slightly worse in Cinebench compared to the 2600K with older drivers (guessing due to the driver shifting more power to the GPU), but much better in HAWX, losing nothing with 4 threads and doing a lot better with 8.

The A8-3850 isn't completely immune either.

http://www.hardware.fr/articles/837-3/architecture-gpu-memoire-partagee.html

We need to see mobile Trinity with simultaneous CPU and GPU turbo to see how it behaves.
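The power-sharing idea above can be sketched as a toy model. All numbers here are invented for illustration; real turbo governors track thermal time constants, per-domain limits and much more:

```python
# Toy model of package-TDP sharing between CPU and iGPU turbo.
# All wattages and the priority split are invented for illustration.

PACKAGE_TDP = 35.0  # watts; a typical mobile envelope (assumed)

def share_budget(cpu_demand_w, gpu_demand_w, cpu_priority=0.5):
    """Split the package budget between CPU and GPU.

    cpu_priority is the fraction of a contended budget reserved for the
    CPU first; the post's guess is that early drivers set this too high,
    starving the GPU.
    """
    if cpu_demand_w + gpu_demand_w <= PACKAGE_TDP:
        return cpu_demand_w, gpu_demand_w  # no contention, both run free
    # Contended: give the CPU its share first, GPU gets the remainder
    cpu_w = min(cpu_demand_w, PACKAGE_TDP * cpu_priority)
    gpu_w = min(gpu_demand_w, PACKAGE_TDP - cpu_w)
    return cpu_w, gpu_w

# CPU wants 25 W, GPU wants 20 W: 45 W of demand against a 35 W budget
print(share_budget(25.0, 20.0, cpu_priority=0.75))  # CPU-biased: (25.0, 10.0)
print(share_budget(25.0, 20.0, cpu_priority=0.5))   # balanced:   (17.5, 17.5)
```

The same total budget, split differently, is enough to explain the driver-version swings in the linked benchmarks.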
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
Bulls**t. Poor reason.

Right now you can buy 2 x 1GB DDR3-1600 for $23 on Newegg.
I'm not sure how much they make when they sell RAM, but I bet OEMs buy RAM a lot cheaper than that.

So let's say giving someone 4GB (4 x 1GB DDR3-1600) costs ~$35.

Now if you build an entire PC... that tiny increase in cost is very small overall.
The performance gain would be BIG (worth more than the maybe $13 or so extra the system costs).
Yeah, a more complex PCB with more layers to accommodate a wider memory bus, not to mention a more complex socket with more pins, costs nothing. It's only about the memory sticks.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
So I really think you can safely increase your expectations for Haswell.

From past experience with subpar first implementations of new instructions, I prefer to wait for the real chips to be truly convinced.

But, indeed, it appears they are aiming for very high-performance implementations in the near future. From this post at RWT:
http://www.realworldtech.com/forum/?threadid=123653&curpostid=123697
I learned that a scatter/gather patent was disclosed recently (published June 7, 2012): http://www.faqs.org/patents/app/20120144089

It looks like it will be possible to reach best-case throughput in most cases where all the lines are available in the L1D cache, so we were in fact too conservative in assuming the best case requires all elements in a single cache line.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Yeah, a more complex PCB with more layers to accommodate a wider memory bus, not to mention a more complex socket with more pins, costs nothing. It's only about the memory sticks.


Let's see:

AMD Llano A8-3870 ~$110
ASUS F1A55-M LX PLUS FM1 ~$70
2 x 1GB DDR3-1600 G.Skill ~$23

total = $203 (for motherboard & CPU (w/ IGP) & RAM)

vs:

Let's say the motherboard price goes up $5 in cost, a hypothetical CPU does as well, and you spend another $24 or so on RAM.

new total = $237 (16% increase in total price)

(If you add in PSU, monitor, speakers, keyboard+mouse and the case, the extra $34 is nothing; probably less than 5%.)

Now let's assume you see performance gains of about 40% in gaming at higher resolutions.
You don't think that's worth the extra cost?

I do. I think it's a bad decision AMD is making by staying dual-channel with their APUs.
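For what it's worth, the arithmetic above tallies up mechanically. Prices are the rough 2012 retail figures from the post; the increase comes out to ~17%, which the post rounds to 16%:

```python
# Tallying the cost comparison from the post above.
# All prices are the poster's rough 2012 retail figures, not current ones.

base = {
    "A8-3870 APU":      110,
    "FM1 motherboard":   70,
    "2x1GB DDR3-1600":   23,
}
base_total = sum(base.values())  # 203

extra = {
    "quad-channel board premium":  5,
    "hypothetical CPU premium":    5,
    "two more DIMMs":             24,
}
quad_total = base_total + sum(extra.values())  # 237

increase = quad_total / base_total - 1.0
print(f"base ${base_total}, quad-channel ${quad_total}, +{increase:.0%}")
```

Lepton87's objection below is precisely that the per-part premiums assumed here are too low, not that the addition is wrong.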
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
OEMs don't like triple and quad memory channels for low-cost budget systems.

APUs came to drive cost down, not to elevate it. For an OEM system builder, that $34 more could be spent on a discrete GPU that will differentiate their system from the competition.

I don't see AMD ever installing triple- or quad-channel in the desktop/mobile APU models.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
@AtenRa

Where can you find a GPU for $34 that's new and faster than the IGP in, say, a top-end Trinity APU?
You can't <.< (The $34 was the consumer price; for OEMs it would probably be half that.)

Even something like a 6670 is around ~$50 on Newegg.

Spending $34 more as a consumer on the APU's IGP would give better performance than you can currently get from spending the same amount on a low-end discrete GPU.

There *should* be quad-channel motherboards so they could feed the APUs.
Hell, if they want to *cut* something out, skip the ~$15 chip for the Thunderbolt option.
I'd rather have a 40% stronger IGP from quad-channel RAM than have Thunderbolt on the motherboard.

It's about weighing the options. I'm sure motherboard manufacturers could cut out a lot of excess "cost" if they had to, and I'd be willing to bet their systems would sell just as well if they showed theirs had quad-channel and 40% better performance than the system next to it (even if it was missing Thunderbolt, etc.).
 

Atreidin

Senior member
Mar 31, 2011
464
27
86
You realize that there are costs that go into a product that have nothing to do with how much the materials cost, right? It's called development. It isn't like you can just add materials and, bam, you've got quad channel. It costs time and money to figure out how to make it happen. The companies themselves have to do a risk/reward analysis to see if it is profitable for them to do it. Engineering is difficult and expensive, and I'm sure you don't know all of the problems that must be overcome. Everyone who does armchair engineering always misses things.

Don't get me wrong, I'd like 4 channels; hell, why not 8 or 16? But there are people paid to figure out how to do it profitably, and they are good at their jobs. They'd better be, because otherwise they would be out of jobs.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Ok, let's see what happens if AMD Trinity were a quad-memory-channel implementation.

CPU
Current Trinity die size is 246mm2; we get close to 234 dies per 300mm wafer.

Adding 2x 64-bit memory controllers would increase the die size by approximately 8.5%, making the die ~266mm2. That gets us ~216 dies per 300mm wafer.

This is for a quad-CPU-core die; with a dual-core die it would be even worse.
Fewer dies equals higher manufacturing cost. That equals a higher purchase price for OEMs, or less money for AMD if they sell at the same price they currently sell dual-channel Trinity CPUs.

CPU package cost will also increase because of the extra memory lanes, extra layers and extra pins (larger socket array).
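Those die counts are consistent with the standard dies-per-wafer approximation. This is a first-order estimate that ignores scribe lines and defect yield, so it lands slightly above the numbers quoted above:

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """First-order estimate: usable wafer area minus an edge-loss term.
    Ignores scribe lines, defects and die aspect ratio, so it's an
    upper bound on good candidate dies."""
    r = wafer_diameter_mm / 2
    return math.floor(
        math.pi * r**2 / die_area_mm2
        - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    )

trinity = dies_per_wafer(246)   # post quotes ~234
quad_imc = dies_per_wafer(266)  # post quotes ~216 for the ~8.5% larger die
print(trinity, quad_imc)        # this estimate: 244 and 224

loss = 1 - quad_imc / trinity
print(f"~{loss:.0%} fewer candidate dies per wafer")
```

Either way, a ~8% hit in candidate dies per wafer for the bigger IMC is the cost the post is pointing at.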

Motherboard
A bigger socket array will add cost (bigger CPU socket, bigger socket clip, etc.)

Quad-channel memory will need more PCB layers; that will add cost.

Adding four more memory slots (2x 4 DIMMs) will add cost.

It will make the Mini-ITX format impossible (can't fit quad DIMM slots).

System

Two more memory DIMMs will add cost and power usage.

Smaller case formats may need a redesign of the power supply and the air/thermal characteristics of the case, adding more cost.

You may think that none of this is that expensive, but when you are manufacturing hundreds of thousands or millions of units, it saves you millions.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
I still think it could be worth it... if not quad, then at least triple-channel.

A bottleneck is an ugly thing. It would be awesome if IGPs could match mid-level GPUs... or at least ~7750/5770-level performance. And there's just no way when they can't get the memory bandwidth.

Most people buy a CPU and still end up buying a discrete GPU for $100.
Why? Because the IGPs aren't up to snuff.

As long as they're treated as "just barely" decent enough, they don't give enough incentive (which means people don't buy them).
They need to be made to match mid-level discretes or so... (make it stand out, charge more for it)
and that isn't going to happen on dual-channel RAM, because of memory bandwidth issues.

I know there are other reasons, like a 7750 using around 50 watts TDP, so an IGP would probably use around that amount too, which is a lot, plus the extra heat, etc., not to mention what it would do to the die size of the APUs (probably into the 350-400mm^2 range).

I just think it would be a better way to go. I mean, there's NO frekking way you're ignoring the graphics advantage if it's THAT big, which is what AMD should have aimed for. Instead they settled for just meh 2x-Intel-IGP performance (which Intel mostly disregards as just barely good enough for HTPC purposes).

I just think it's a shame IGPs aren't growing at a faster rate, even if Trinity adds another 25-30% IGP performance on top of what Llano has.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
OEMs don't like triple and quad memory channels for low-cost budget systems.

APUs came to drive cost down, not to elevate it. For an OEM system builder, that $34 more could be spent on a discrete GPU that will differentiate their system from the competition.

I don't see AMD ever installing triple- or quad-channel in the desktop/mobile APU models.

I don't see why that would remove the possibility of having a quad-channel memory controller. AMD would need to make it able to run in dual-channel mode and allow for simple and cheap termination of the extra leads on cheaper two-channel main boards, but I don't see any technical reason having the option for more channels would be a large cost driver.
 

Khato

Golden Member
Jul 15, 2001
1,282
366
136
A bottleneck is an ugly thing. It would be awesome if IGPs could match mid-level GPUs... or at least ~7750/5770-level performance. And there's just no way when they can't get the memory bandwidth.

Here's a question: why exactly would AMD want to effectively invalidate the market for their mid-level GPUs? That's the cost factor that you haven't put into your equations. The manufacturing cost of going quad-channel (CPU area, packaging, motherboard changes, and extra sticks of memory) is going to be somewhere around $40... and the profit margin on a 7750 sale is likely right in that area as well. So now you're looking at an $80 increase in price for the OEM, which is pretty much exactly what buying a normal 7750 costs them.

So yeah, it's not just about the extra manufacturing cost; for AMD it's also about protecting their discrete graphics business. They want the iGPU to be powerful for GPGPU, where bandwidth isn't going to be the bottleneck, and merely decent/better than what Intel can offer for graphics.
 

moonbogg

Lifer
Jan 8, 2011
10,731
3,440
136
Sometimes I have a hard time sleeping at night. When that happens, I look for a spec sheet on an AMD cpu, and I read it. Before I can get past the first paragraph I am sleeping like a baby.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
I still think it could be worth it... if not quad, then at least triple-channel.

A bottleneck is an ugly thing. It would be awesome if IGPs could match mid-level GPUs... or at least ~7750/5770-level performance. And there's just no way when they can't get the memory bandwidth.
Extra channels are not the only way to increase effective bandwidth. A lot of the GPU's memory accesses are for color and depth data, and those buffers only take a few tens of megabytes. So by having, say, 64 MB of eDRAM acting as a cache, the off-package bandwidth can stay the same while still letting performance scale up.

Here's a thread on this topic: Breakdown of graphics bandwidth usage. Beware that it's an old thread and the graphs are for Unreal Tournament 2004, but it should still have some relevance.
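As a rough illustration of the working-set arithmetic: the buffer-size math below is straightforward, but the overdraw factor and traffic model are invented assumptions for illustration (the linked thread has measured numbers):

```python
# Why a modest eDRAM can absorb a lot of framebuffer traffic.
# Buffer sizes are plain arithmetic; overdraw and the 1-read/1-write
# traffic model are assumptions for illustration only.

width, height = 1920, 1080
bytes_color = 4      # RGBA8 color buffer
bytes_depth = 4      # 24-bit depth + 8-bit stencil
overdraw = 3         # each pixel touched ~3x per frame (assumed)
fps = 60

pixels = width * height

# Resident footprint: one color buffer + one depth buffer
footprint_mb = pixels * (bytes_color + bytes_depth) / 2**20

# Per-second read+write traffic to those buffers
# (assuming 1 read + 1 write per touch)
traffic_gbs = pixels * (bytes_color + bytes_depth) * overdraw * 2 * fps / 1e9

print(f"framebuffer footprint: {footprint_mb:.0f} MB")   # ~16 MB, fits in 64 MB
print(f"color+depth traffic:   {traffic_gbs:.1f} GB/s")
```

Even this conservative model moves several GB/s off the DDR bus; with MSAA, higher overdraw or render-to-texture the absorbed traffic grows quickly while the footprint stays cache-sized.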

Haswell GT3 is presumably Intel's guinea pig for this approach. Note that eDRAM should also be more power-efficient; hence GT3 will be introduced in laptop/ultrabook chips. On the desktop, gamers' expectations are higher, so they'll buy a discrete card anyway, and Intel only offers GT1 and GT2 for the non-gamers.

It will be interesting to see how Kaveri deals with the bandwidth issue.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
I don't see why that would remove the possibility of having a quad-channel memory controller. AMD would need to make it able to run in dual-channel mode and allow for simple and cheap termination of the extra leads on cheaper two-channel main boards, but I don't see any technical reason having the option for more channels would be a large cost driver.

It is a very significant cost driver. For an equivalent platform you'd likely pay upwards of $100 more.

Quad- and triple-channel memory won't happen. It's not even a remote possibility for the mainstream market, and the only place it makes sense is where quad-channel implementations already thrive -- meaning you don't have to add the memory channels or adjust the IMC, the pins, the chipset and the PCB layout -- and that's server CPUs or APUs.

As has been said again and again, it'll have to be something other than more channels and DDR4. Neither will suffice for providing the bandwidth expansion needed for significantly higher throughput; the former due to cost and layout (four DDR channels take up a lot of space), the latter because it just doesn't provide the necessary performance.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
It is a very significant cost driver. For an equivalent platform you'd likely pay upwards of $100 more.

Quad- and triple-channel memory won't happen. It's not even a remote possibility for the mainstream market, and the only place it makes sense is where quad-channel implementations already thrive -- meaning you don't have to add the memory channels or adjust the IMC, the pins, the chipset and the PCB layout -- and that's server CPUs or APUs.

As has been said again and again, it'll have to be something other than more channels and DDR4. Neither will suffice for providing the bandwidth expansion needed for significantly higher throughput; the former due to cost and layout (four DDR channels take up a lot of space), the latter because it just doesn't provide the necessary performance.

AMD has already done something similar by including both a DDR2 and a DDR3 IMC on the same processor. My point was simply that it probably would not take up that much additional die space, and it shouldn't add many pins for the extra channels. The main boards don't need the additional traces if they are only going to use two of the channels. In fact, I think some low-end chips sit on boards that only have traces for a single channel even though the chip itself is dual-channel.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
AMD has already done something similar by including both a DDR2 and a DDR3 IMC on the same processor. My point was simply that it probably would not take up that much additional die space, and it shouldn't add many pins for the extra channels. The main boards don't need the additional traces if they are only going to use two of the channels. In fact, I think some low-end chips sit on boards that only have traces for a single channel even though the chip itself is dual-channel.

But why?

You're still inflating costs for the mainstream market (your entire market, mind you; AMD has only ~5% server market share nowadays) and selling most of your chips as handicapped, bigger dies with features turned off. That makes no sense.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
But why?

You're still inflating costs for the mainstream market (your entire market, mind you; AMD has only ~5% server market share nowadays) and selling most of your chips as handicapped, bigger dies with features turned off. That makes no sense.

It is a simple solution to the biggest bottleneck in the APU. It may not be the best solution, but it is easy and quick to implement, and relatively inexpensive. It takes time to develop more complex and likely more ingrained solutions.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
It is a simple solution to the biggest bottleneck in the APU. It may not be the best solution, but it is easy and quick to implement, and relatively inexpensive. It takes time to develop more complex and likely more ingrained solutions.

Relatively inexpensive? For whom? And what about practicality?

AMD sells most of its chips on the low end, where triple- and quad-channel configurations aren't just too expensive but impossible. They won't fit into such devices, and the cost of the platform would nearly double, if not triple or quadruple.

They're sticking with FM2 for Kaveri/Steamroller and likely just pushing up the DDR3 speeds. This should buy them time to come up with something that actually makes sense.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
Let's see:

AMD Llano A8-3870 ~$110
ASUS F1A55-M LX PLUS FM1 ~$70
2 x 1GB DDR3-1600 G.Skill ~$23

total = $203 (for motherboard & CPU (w/ IGP) & RAM)

vs:

Let's say the motherboard price goes up $5 in cost, a hypothetical CPU does as well, and you spend another $24 or so on RAM.

new total = $237 (16% increase in total price)

(If you add in PSU, monitor, speakers, keyboard+mouse and the case, the extra $34 is nothing; probably less than 5%.)

Now let's assume you see performance gains of about 40% in gaming at higher resolutions.
You don't think that's worth the extra cost?

I do. I think it's a bad decision AMD is making by staying dual-channel with their APUs.

Somehow your cost analysis does not seem right. How much does a good socket 2011 motherboard cost over a 1155 counterpart?
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
http://www.hotchips.org/wp-content/...A/HC23.18.320-HybridCube-Pawlowski-Micron.pdf

Micron HMC memories could entirely fix this problem in the future. But who knows when these memories will be commercially available, let alone available in mainstream computers with APUs.

I figure that within the same number of off-chip CPU pins (the main manufacturing cost concern) as a typical dual-channel DDR3 implementation offering ~20GB/s of combined read/write bandwidth, you'd be able to fit 160GB/s of read plus 80GB/s of write bandwidth using HMCs. They achieve these speeds by fixing the main problem with DRAM bandwidth: devices built on a DRAM process are not very good at driving their output pins, which is why even high-end GDDR5 only runs its output pins at ~6Gbps. And those memories are obviously impossible to use as a drop-in replacement for desktop memory anyway.
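A rough per-pin comparison shows why serial links change the picture. The lane count and signaling rate below are illustrative assumptions for the sketch, not HMC datasheet values:

```python
# Back-of-the-envelope per-pin bandwidth comparison.
# Lane counts and link speeds are rough illustrative assumptions,
# not datasheet values for any real part.

# Dual-channel DDR3-1600: 2 channels x 64 data bits x 1600 MT/s
ddr3_gbs = 2 * 64 * 1600e6 / 8 / 1e9
print(f"dual-channel DDR3-1600: {ddr3_gbs:.1f} GB/s peak")

# Serial-link style memory: each lane signals ~10 Gb/s vs DDR3's
# 1.6 Gb/s per data pin, so a similar pin budget carries several
# times the bandwidth.
lanes = 64              # assumed lanes fitting a similar pin budget
lane_rate_gbps = 10.0   # assumed per-lane signaling rate
serial_gbs = lanes * lane_rate_gbps / 8
print(f"{lanes} lanes @ {lane_rate_gbps:.0f} Gb/s: {serial_gbs:.1f} GB/s")
print(f"ratio: {serial_gbs / ddr3_gbs:.1f}x for a similar pin count")
```

The multiplier comes entirely from per-pin signaling rate, which is exactly the DRAM-process weakness the post describes the HMC logic die working around.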
 