Nvidia GPUs soon a fading memory?


Scali

Banned
Scali, WHAT is this agenda?

You talk about fanboy rubbish. This is a WORLDWIDE market where consumers matter. YOU are the Intel fanboy and it is PATHETIC.

lol wut?
I thought this was about GPUs, clearly NOBODY could be a fan of Intel's GPUs. I certainly am not.
Just because I don't like AMD fanboys doesn't mean I'm in the other camp.
I have a Radeon 5770 GPU myself, as I already said.

Let us hear FACTS.

I gave many facts about how a DDR3 modular system is inefficient, and how a CPU+GPU shared memory controller is even MORE inefficient.
Sadly everyone ignored these facts, so there is no discussion possible.
 

markdvdman

Junior Member
Man, you gave facts about CONJECTURE.

You are making an assumption that they will not sort out the memory bandwidth issues. You may well be right, but how can you say for certain that you are right? You cannot. Let us see, and for the greater good let us hope they succeed!
 

Scali

Banned
Or at least we don't see how it solves the problem.

No, it doesn't solve the problem, period.
You can't create bandwidth when it simply isn't there in the memory you're using. It's as simple as that. PERIOD.

Because there is really no competition pressing them to have an IGP with 5570 performance level.

This could also be interpreted as AMD not having a reason to deliver an IGP at the 5570 performance level.
Where does that notion come from anyway?
Is that the conclusion you jumped to just because there is a rumour circulating that it will have 400-480 SPs?
Well, clearly there's a lot more to performance than 400-480 SPs, so even if Llano does have 400-480 SPs, that doesn't mean that AMD is trying to reach the 5570 performance level.
Did AMD ever claim they were?
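To put rough numbers on "there's a lot more to performance than SP count" (a quick sketch; the 650 MHz figure is the reference HD 5570 core clock, while the lower clock is purely hypothetical, not a Llano specification):

```python
# Peak shader throughput depends on clock as much as on SP count.
# The 650 MHz figure is the HD 5570 reference core clock; the lower
# clock is a hypothetical illustration, not a Llano specification.

def gflops(sp_count, clock_ghz, ops_per_sp_per_clock=2):
    """Peak single-precision GFLOPS (a multiply-add counts as 2 ops)."""
    return sp_count * clock_ghz * ops_per_sp_per_clock

print(gflops(400, 0.650))  # HD 5570 reference: 520 GFLOPS
print(gflops(400, 0.500))  # same 400 SPs, lower clock: 400 GFLOPS
```

And theoretical ALU throughput says nothing about whether the memory can actually feed those SPs.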

To expect that AMD will realize their 400-480 SP GPU will be bandwidth-limited is wishful thinking?

No, to think that gluing 400-480 SPs onto a CPU+GPU will guarantee 5570-level performance is wishful thinking.
It could be so bandwidth-limited that it's barely faster than a 790GX. Who knows at this point?


Are you telling me that if the Llano GPU reaches 5570 performance you'll be in pain?

Who knows, I'd be in an alternate universe with completely different laws of physics, apparently.

But then again, people said it was impossible for Conroe to reach the performance that the initial information claimed. And it reached that performance.

Yeah, funny enough, it was the same people who are now claiming that Fusion is the best thing since sliced bread, the same people who bought the Barcelona nonsense.
It's always the AMD fanboys.

Again, everyone looking at the Llano GPU specs will see the bandwidth problems immediately. It is so obvious that it is almost asinine to think AMD engineers wouldn't catch it.

So the question is: what are they going to do about it? The information we have so far tells us they're NOT going to do anything about it. It's going to use dual-channel DDR3 memory. Am I correct?
We already know what the bandwidth of dual-channel DDR3 memory is, and it's not good enough. PERIOD. THE END. We can reiterate this point until we're blue in the face, but it isn't going to change anything. It's just NOT going to happen.

Also, at this time, according to the rumour, Llano is already sampling and this time AMD is comparing the GPU performance with their own products.

I'm quite sure they can spin the information in a positive way as well. It's not that difficult to cook up some benchmarks that aren't very bandwidth-intensive and say, "See! It's just as fast as a 5570"... just not in real-life situations.
 

Scali

Banned
Man, you gave facts about CONJECTURE.

You are making an assumption that they will not sort out the memory bandwidth issues. You may well be right, but how can you say for certain that you are right? You cannot. Let us see, and for the greater good let us hope they succeed!

I can be certain because they use dual-channel DDR3 memory. That is technology that has been on the market for a few years now; I know for certain how it performs.
 

markdvdman

Junior Member
You cannot be certain at all. You are ASSUMING they will use DDR3 on the RELEASE product?

However, the caching, and how it is delivered, is more important.

Latency is a big issue, but if they sort the caching issues out it may not be.

Some posters suppose Fusion will deliver Radeon 5850-class graphics on a CPU die. In 5 years maybe, but no chance now.

It will work well (maybe), but you are using conjecture by DEFAULT.
 

Scali

Banned
You cannot be certain at all. You are ASSUMING they will use DDR3 on the RELEASE product?

No, their roadmaps and slides explicitly STATE that they are using DDR3 on the RELEASE product.
See this image pasted earlier by Martimus, for example:
[Image: roadmap.png]


Is that conjecture? Oh please.
 

Scali

Banned
I'm not arguing that AMD can't implement Fusion. I'm merely arguing that they cannot overcome the shortcomings of DDR3, which will stand in the way of '5570 performance'.
Perfectly sound logic. You'll find that out when Llano releases, I suppose.
 

psoomah

Senior member
I'm not arguing that AMD can't implement Fusion. I'm merely arguing that they cannot overcome the shortcomings of DDR3, which will stand in the way of '5570 performance'.
Perfectly sound logic. You'll find that out when Llano releases, I suppose.

You might want to check out Beyond3D on this. Might change your mind.
 

Scali

Banned
you might want to check out beyond 3d on this. might change your mind.

Yea, let's go to Dave Baumann's site, you know, Dave Baumann, Product Manager at AMD?
I'm sure that will give us nice, reliable, unbiased information on AMD products without any marketing or spin-doctoring!
 

GaiaHunter

Diamond Member
Yeah, funny enough, it was the same people who are now claiming that Fusion is the best thing since sliced bread, the same people who bought the Barcelona nonsense.
It's always the AMD fanboys.

Please find a post where I talk about Barcelona.

Contrary to what you think, I'm not trying to prove to you whether the Llano concept works or not.

That is AMD's job: to get their products working.

I've just posted the information that is available.

Have you seen the Llano die shot? How many SIMDs/SPs does it seem to have? Do you disagree that it seems to be 8 SIMDs, which means 480 SPs?

Does that alone make the performance comparable to a 5570?

Of course not.

What about clock speeds? It's a 32nm SOI process, not bulk 40nm. That could mean higher clock speeds.

Memory bandwidth? How AMD handles this seems to be what will make this product great or a POS. DDR3-1600 dual channel is 25.6 GB/s. And I don't care that the CPUs can't use that much; as you said, GPUs use memory differently. I guess you believe that AMD will just create a direct path to the memory controller without changing how the memory controller works and handles the loads.
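For reference, that 25.6 GB/s figure is just the theoretical peak, which works out from simple arithmetic (a quick sketch of the standard calculation):

```python
# Theoretical peak bandwidth of a DDR3 configuration: transfer rate
# times channel count times 8 bytes per 64-bit channel.

def ddr3_peak_gbs(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Return theoretical peak bandwidth in GB/s."""
    return mega_transfers_per_s * 1e6 * channels * bytes_per_transfer / 1e9

print(ddr3_peak_gbs(1600, 2))  # dual-channel DDR3-1600: 25.6 GB/s
print(ddr3_peak_gbs(1600, 3))  # triple-channel DDR3-1600: 38.4 GB/s
```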

Then we have the rumours since Llano started sampling. The most recent is Tom's Hardware saying that AMD suggested Llano will have performance similar to a 5570. I don't know if Tom's Hardware is lying, if AMD is lying, or both. It is there; you can go check the site. Maybe AMD was only talking about FP performance. Tom's Hardware suggested 3D performance.

You believe AMD are a bunch of liars who can't make it happen due to the obvious bandwidth limitation. That is fine.

But people in here are just talking about what happens if the rumours are in fact true.

You dislike rumours. That is fine too.

What I don't understand is why the possibility of Llano reaching 5570 performance on the graphics side causes such a reaction on your part.

Are you afraid people will start queuing at stores waiting to buy Llanos? Are you afraid someone will read this thread and shares will go up or down? Are you afraid people will stop buying CPUs or laptops in the meantime?

Because if that is what you are afraid of, you have loads of threads on several forums to crash, starting with this one on the AT CPU forums: http://forums.anandtech.com/showthread.php?t=2073362

Because if there is no problem, well, we can just wait. No need to shout. We all have access to the same information. We all know that bandwidth seems to be the key. Look at all those "ifs" around.

I gave many facts about how a DDR3 modular system is inefficient, and how a CPU+GPU shared memory controller is even MORE inefficient.
Sadly everyone ignored these facts, so there is no discussion possible.

Not that I disagree in theory, but it would be interesting if you could provide hard data for this one. Although I don't see how you could, since the only CPU+GPU shared memory controllers similar to Llano's are the Llano chips themselves.
 

Fox5

Diamond Member
If you can tell me the secret to how they will overcome the bandwidth problems with a DDR3 motherboard, maybe...

AMD has two possible solutions already implemented in existing products.

1. A tile-based deferred renderer greatly reduces bandwidth cost. They could even implement eDRAM like on the Xbox 360 if they really wanted to. (Optional for higher-end APUs.)

2. Attach another memory bus to it and external memory. Once again, it only has to be implemented on the higher-end products, on-chip with the APU or maybe located on the motherboard. If an extra HyperTransport bus is too big for this, then maybe a simple GDDR3/5 memory controller (without the extra logic that HyperTransport needs) would be feasible. If dies smaller than Llano can include a GDDR5 memory controller, I'd think AMD could dedicate some space to that. Do we have a die layout of the graphics portion of Llano yet?

AMD might not even need to add an extra HyperTransport port.
Look at the diagram on this page:
http://www.hardwarereview.net/Reviews/AMD Athlon2 X4-630/AMD-Phenom2-X4-630.htm
From the appearance of it, the memory bus is SEPARATE from the HyperTransport bus. Thus, there's 21 GB/s of bandwidth available to the memory, and additional memory could hang off the HyperTransport bus for a TOTAL of 37 GB/s of bandwidth, assuming no changes are made from the current AM3 socket.
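For what it's worth, the 16 GB/s the HyperTransport side contributes to that total can be derived from the link parameters (a sketch; it assumes a 16-bit HT 3.0 link at 2.0 GHz, which is one plausible AM3 configuration, not a confirmed spec):

```python
# HyperTransport link bandwidth from first principles. Assumes a
# 16-bit link at 2.0 GHz with double-data-rate signalling; these
# parameters are an assumption for illustration.
link_clock_ghz = 2.0
transfers_per_clock = 2            # DDR signalling
link_width_bytes = 2               # 16-bit link

per_direction = link_clock_ghz * transfers_per_clock * link_width_bytes
print(per_direction)       # 8.0 GB/s each way
print(per_direction * 2)   # 16.0 GB/s aggregate, matching 37 - 21 above
```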


And AMD may want to create an IGP with 5570-level performance because that's only ONE generation beyond what you would expect an IGP to be. A late 2012 or 2013 IGP would be expected to be at that performance level. It needs a way to differentiate its products, and it simply can't make a CPU that is competitive with Intel on performance or power consumption.
 

cbn

Lifer
And AMD may want to create an IGP with 5570-level performance because that's only ONE generation beyond what you would expect an IGP to be. A late 2012 or 2013 IGP would be expected to be at that performance level. It needs a way to differentiate its products, and it simply can't make a CPU that is competitive with Intel on performance or power consumption.

If ATI could get HD 5570 performance out of Llano, that would be amazing.

According to this, the HD 5570, HD 4670 and HD 3870 are all at approximately the same level (in the majority of games).
 

Mr. Pedantic

Diamond Member
I don't see why they couldn't. The 5750's TDP is about 85-90W. Clock the core/memory a bit slower, push down the voltage, or take out some cores, then put that in with a 65W CPU (AMD seems to like those) and you have something at roughly 130-140W that will do Turbo in both CPU- and GPU-intensive tasks, that requires only one cooling solution, and that OEMs would probably love. There are obviously some problems with doing that, which is probably why AMD pushed it back so far, but I think it's definitely doable.
 

Acanthus

Lifer
I'm not arguing that AMD can't implement Fusion. I'm merely arguing that they cannot overcome the shortcomings of DDR3, which will stand in the way of '5570 performance'.
Perfectly sound logic. You'll find that out when Llano releases, I suppose.

If it's a new socket, it could easily be triple/quad channel, which would dramatically increase bandwidth.
 

cbn

Lifer
I don't see why they couldn't. The 5750's TDP is about 85-90W. Clock the core/memory a bit slower, push down the voltage, or take out some cores, then put that in with a 65W CPU (AMD seems to like those) and you have something at roughly 130-140W that will do Turbo in both CPU- and GPU-intensive tasks, that requires only one cooling solution, and that OEMs would probably love. There are obviously some problems with doing that, which is probably why AMD pushed it back so far, but I think it's definitely doable.

So you are saying 130-140 watts if Llano were made on a 45nm/40nm process?

What TDP do you think will be achieved on the 32nm SOI process?
 

cbn

Lifer
I think Nvidia's future largely depends on markets that we aren't used to them being a huge part of, at least not until recently.

I think discrete video cards will continue to be a big part of their business, but I am willing to guess that as time goes on it'll be less and less. Nvidia has their foot in the door of the HPC world, but it remains to be seen where that takes them.

Good point. I think a lot of people are interested in where HPC is going.
 

busydude

Diamond Member
So you are saying 130-140 watts if Llano were made on a 45nm/40nm process?

What TDP do you think will be achieved on the 32nm SOI process?

Yes, IMO it's possible, looking at the TDP of the 6-core Thuban (125W @ 45nm) and a possible 95W version of the 1055T/1035T.

Four cores at 65W seems achievable.
 

Scali

Banned
Memory bandwidth? How AMD handles this seems to be what will make this product great or a POS. DDR3-1600 dual channel is 25.6 GB/s.

No, it's not. I already proved with actual benchmarks that neither Intel nor AMD gets anywhere NEAR 25.6 GB/s with dual-channel DDR3-1600. Only triple channel can do that.
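Anyone who wants to check this for themselves can run a STREAM-style copy test; sustained bandwidth always comes in well under the theoretical peak. A minimal sketch (numpy-based, so it measures the whole memory subsystem, not just the raw bus):

```python
# Minimal STREAM-style copy benchmark: measures sustained memory
# bandwidth, which on real dual-channel DDR3-1600 systems lands
# well below the 25.6 GB/s theoretical peak.
import time
import numpy as np

N = 16 * 1024 * 1024          # 128 MB per array (float64)
a = np.ones(N)
b = np.empty(N)

reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    np.copyto(b, a)           # one read + one write per element
elapsed = time.perf_counter() - t0

bytes_moved = 2 * N * 8 * reps
print(f"sustained copy bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```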

And I don't care that the CPUs can't use that much; as you said, GPUs use memory differently.

GPUs do, yes, because they don't use a system with memory modules. The chips are mounted directly on the PCB, and as such you don't really have dual or triple channel. The buses are much wider, and they also suffer less from signal degradation, which means the signalling frequencies can be higher, etc.
I've already discussed all of this many times, but it keeps getting ignored by people who don't have a clue. Ask any EE and they will confirm what I say.

What I don't understand is why the possibility of Llano reaching 5570 performance on the graphics side causes such a reaction on your part.

I just think it's pathetic that people want to believe in fairy tales so badly. Wake up, people!
If you want to know the truth, I would be VERY happy if AMD (or anyone else) pulls it off.
Being a D3D/OpenGL developer myself, I HATE IGPs with a passion, because they are so poor on features and performance. The gap with discrete cards is so incredibly large that you have to jump through a LOT of hoops in order to get your stuff running on an IGP at all, let alone acceptably.
So I want nothing more than for AMD to pull this off, trust me. It would make my life a lot easier.
But it's just not realistic.

Not that I disagree in theory, but it would be interesting if you could provide hard data for this one. Although I don't see how you could, since the only CPU+GPU shared memory controllers similar to Llano's are the Llano chips themselves.

I have provided hard data; it's just being ignored by you, because you don't realize that it IS hard data.
 

Scali

Banned
AMD has two possible solutions already implemented in existing products.

1. A tile base deferred renderer greatly reduces bandwidth cost. They could even implement edram like on the xbox 360 if they really wanted to. (optional for higher end APUs)

AMD doesn't have any tile-based deferred renderer technology.
They could implement eDRAM, except none of the information indicates that Llano does.
If we go back to the transistor count... where are the transistors for an eDRAM framebuffer? It doesn't add up. As people have pointed out, the transistor count pretty much adds up to a quad-core Stars chip plus a '5570' chip, as in 400-480 SPs. No eDRAM (the eDRAM chip in the Xbox is slightly over 100 million transistors, hard to miss).
People have analyzed the die shots as well; nobody saw any indication of eDRAM.
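As a rough sanity check on that tally (the Propus and Redwood figures below are commonly cited public numbers, used here as assumptions; nothing in AMD's Llano material confirms them):

```python
# Back-of-the-envelope transistor budget. All figures are commonly
# cited public numbers, used as assumptions for illustration only.
quad_core_stars_m = 300   # ~Athlon II X4 'Propus' die, approximate
redwood_5570_m = 627      # Redwood (HD 5570-class) GPU
xbox_edram_m = 105        # Xbox 360 eDRAM die, 'slightly over 100M'

without_edram = quad_core_stars_m + redwood_5570_m
print(without_edram)                  # ~927M: CPU cores + '5570' GPU
print(without_edram + xbox_edram_m)   # ~1032M: an eDRAM framebuffer
                                      # would be a hard-to-miss chunk
```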

2. Attach another memory bus to it and external memory.

I already indicated that this is not possible.
Firstly, another memory bus means that you need more memory modules (it effectively becomes triple or quad channel). This will greatly increase the cost of the system, and that is not going to happen (remember the image a few posts ago? Llano is a MAINSTREAM part, and cost is an important factor for the platform).
Secondly, you can't get GDDR in module form, so there's no way you can stick it on a motherboard.
Pre-soldering GDDR chips onto a motherboard is not going to happen either, again because of cost (and the fact that AMD never mentioned any such thing; they specifically said DDR3 memory, again, look at the image).
The only logical conclusion is that they will use a standard dual-channel DDR3 memory solution. Which is not going to work.

But again, I'm just repeating myself. Please try to actually UNDERSTAND my posts, so I don't have to keep refuting new crackpot theories that I've already disproved in earlier posts anyway. In fact, most of it is disproved by information that AMD released itself.
 

Mr. Pedantic

Diamond Member
So you are saying 130-140 watts if Llano were made on a 45nm/40nm process?

What TDP do you think will be achieved on the 32nm SOI process?
No idea. I'm not an engineer. But looking at what Intel's done with 32nm in terms of Gulftown and i3/i5, AMD could do some seriously good stuff with it.
 

BenSkywalker

Diamond Member
This is a VERY small market: a little image processing, maybe some nuclear simulation, some climate modeling and a few other specialties where they sell maybe 10-15 supercomputers a year.

How about data analysis on any large data set with a lot of variables? How about every store you walk into that uses a computer-generated ordering system, every company that plots sales trends, every company that assesses risk-based scenarios (every insurance policy, loan officer, etc.)? The potential market is HUGE; right now it doesn't exist because the technology hasn't been there. We have server racks right now that spend hours trying to compute what it takes people who know their jobs seconds to figure out, in a lot of cases. It is painful watching the creaky old x86 FPU trying to compute data sets without the ability to dynamically weight data without slowing to a crawl. The potential market for HPC GPGPU is huge, and I say this as someone who could end up being moved out of my job because of it (I get paid because x86 is too weak to pull it off). Even a dozen $2K graphics cards is a hell of a lot less than my salary; if they can do the same job I can, that is another $24K in revenue for nV, and there are a lot of people who have jobs like mine.

Frankly, nVidia is smoking some hefty crack, or they truly believe discrete graphics is a dead market in 5-10 years and are trying to position themselves in a market that will exist.

Don't you realize that the inverse is also true? How much raw GPGPU power is it going to take to run a JIT compiler on code and handle any x86-based apps on the GPU? Outside of a very few instances (most of which are handled quite nicely by GPUs), raw CPU power is becoming increasingly useless for the typical consumer. What happens first: does nV get a GPU flexible enough to handle all general-purpose computing, with perhaps some assists from a small ARM core or comparable, or does Intel figure out how to make a GPU that people won't fall down laughing over? Right now I think nV is closer on this one :)

Intel is 100% correct about that, and it's the reason AMD and nVidia saw Larrabee as such a threat: x86 FPUs with the power of discrete graphics chips will destroy the discrete graphics market.

Larrabee was so stupid Intel had to cancel all short-term plans for it. Larrabee was, without question, an abject failure in no uncertain terms. At best, in prototype form, it could run a several-years-old game at single-digit framerates. It is odd you would bring it up when trying to defend their line of thought; it proved how shockingly bad their idea was. They are offering performance levels below the original GeForce; they are about a decade behind GPUs, and GPUs aren't stopping any time soon. Perhaps by 2020 Intel will have something comparable to Larrabee that can compete with today's parts, but the idea that AMD or nVidia are going to wait for them to catch up is a rather huge mistake.

With continued process shrinks and better FPU and matrix operations on the CPU, discrete graphics are going away. We are nearly to the point where a general CPU with some advanced FPUs (derived from graphics chips) is going to be capable of real-time ray tracing.

Ray tracing sucks as a final rendering solution. Ray tracing was really impressive back in the late 80s/early 90s. Now it is a sad joke that can offer cool reflections in real time, but that's about it. Ray tracing is terrible for diffuse lighting; it is a technique best utilized in a hybrid setup along with rasterization, or it is going to look, quite frankly, bad. If Intel hits RTRT by 2020 it will be quaint, but if they think that is going to remove the need for discrete graphics they are delusional.

There are some additional areas, such as finite element analysis in structural engineering and materials science, where these cards might be useful, but this market will never be even 10% of the size of the enthusiast graphics market. Yet nVidia is betting the whole farm on HPC.

Do you not follow nVidia at all? Honestly, it seems to me the company as a whole is putting more emphasis on Tegra being the next big thing than on Tesla, by quite a bit. HPC is the focus of their high-end discrete offerings, but it seems like Tegra is what they are planning on using as their bread and butter (and from what I have seen of the market, they dominate the competition in that space).

Either that or get Microsoft and all the software makers to port all the software to ARM.

I took those two quotes of yours out of order due to the fact that the biggest supporter of Tegra at the moment is MS. It seems that MS is planning on using Tegra as the foundation for their ultra-portable market. Given the tablet segment, and how nice it would be to have full functionality between Tegra and Windows-based apps, let's just say I can see why MS may be well served to start exploring ARM development.
 

evolucion8

Platinum Member
Do you not follow nVidia at all? Honestly, it seems to me the company as a whole is putting more emphasis on Tegra being the next big thing than on Tesla, by quite a bit. HPC is the focus of their high-end discrete offerings, but it seems like Tegra is what they are planning on using as their bread and butter (and from what I have seen of the market, they dominate the competition in that space).

Following? LOL you meant stalking nVidia. Creepy.... :)

I see the opposite: nVidia putting more emphasis on Tesla. Besides the Zune HD (which I own, and it is an awesome piece of hardware), I can't recall seeing Tegra in other devices, or seeing a lot of talk from companies wanting to adopt it yet. If it stays like that, Tegra, which is a nice piece of engineering, will fall. Tesla will gain pace eventually, slowly though.
 

Fox5

Diamond Member
AMD doesn't have any tile-based deferred renderer technology.
They could implement eDRAM, except none of the information indicates that Llano does.
If we go back to the transistor count... where are the transistors for an eDRAM framebuffer? It doesn't add up. As people have pointed out, the transistor count pretty much adds up to a quad-core Stars chip plus a '5570' chip, as in 400-480 SPs. No eDRAM (the eDRAM chip in the Xbox is slightly over 100 million transistors, hard to miss).
People have analyzed the die shots as well; nobody saw any indication of eDRAM.



I already indicated that this is not possible.
Firstly, another memory bus means that you need more memory modules (it effectively becomes triple or quad channel). This will greatly increase the cost of the system, and that is not going to happen (remember the image a few posts ago? Llano is a MAINSTREAM part, and cost is an important factor for the platform).
Secondly, you can't get GDDR in module form, so there's no way you can stick it on a motherboard.
Pre-soldering GDDR chips onto a motherboard is not going to happen either, again because of cost (and the fact that AMD never mentioned any such thing; they specifically said DDR3 memory, again, look at the image).
The only logical conclusion is that they will use a standard dual-channel DDR3 memory solution. Which is not going to work.

But again, I'm just repeating myself. Please try to actually UNDERSTAND my posts, so I don't have to keep refuting new crackpot theories that I've already disproved in earlier posts anyway. In fact, most of it is disproved by information that AMD released itself.

Well, OK, if GDDR isn't possible, then how about straight-up DDR3? AMD's current IGPs integrate that on the motherboard, and AMD has HyperTransport links that it could use to link the memory (that's how it's done in the current 790GX configuration, I believe; the GPU is just integrated onto the CPU die now). An extra 16 GB/s of memory bandwidth would be very helpful.

AMD has tackled this problem before, and suddenly you think they just won't attempt to solve it now?
 

Scali

Banned
Well, OK, if GDDR isn't possible, then how about straight-up DDR3? AMD's current IGPs integrate that on the motherboard, and AMD has HyperTransport links that it could use to link the memory (that's how it's done in the current 790GX configuration, I believe; the GPU is just integrated onto the CPU die now). An extra 16 GB/s of memory bandwidth would be very helpful.

As I already said... if you want more bandwidth, you need to add extra channels.
That is not very likely, as more channels require more modules in order to run the system.
Since AMD markets it as a mainstream platform, it is unlikely that they will demand more than 2 modules (dual channel) as a minimum for a running system.
Not even Intel has triple channel in their mainstream products, even though Intel does have triple-channel products on the market. It's just too expensive for the mainstream market.

Which leaves us with the only option: AMD's current DDR3 interface from the Stars architecture, yielding about 13 GB/s, is going to be shared between CPU and GPU.
Now, they may be able to improve that 13 GB/s a bit (Intel gets 16 GB/s from their dual-channel quad-core CPUs with the same DDR3 memory; AMD currently seems to be somewhat bottlenecked by their cache bandwidth), but it is too far-fetched that AMD can get 20+ GB/s out of just two channels, no matter what they do (let alone that the GPU can get 20+ GB/s while the CPU is also using memory, which it will in any non-trivial 3D application, such as games).
DDR3 isn't exactly new technology, and both AMD and Intel have been optimizing their memory controllers for a few years now. If there were going to be a breakthrough in memory performance, it would have happened already. But neither AMD nor Intel has succeeded so far, and both their DDR3 controllers are quite mature and optimized solutions by now.
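Put those numbers together and the GPU's slice looks grim (a sketch using the figures cited above; the CPU demand number is a purely hypothetical illustration, not a measurement):

```python
# What's left for the GPU once the CPU shares the controller.
# The 13 GB/s figure is the measured number cited above; the CPU
# demand figure is a hypothetical assumption for illustration.
measured_dual_channel_gbs = 13.0   # AMD Stars DDR3 controller, as cited
cpu_demand_gbs = 4.0               # hypothetical load from a running game

gpu_share = measured_dual_channel_gbs - cpu_demand_gbs
print(gpu_share)   # ~9 GB/s for the GPU, vs. the 28.8 GB/s a real
                   # HD 5570 gets from its dedicated 128-bit memory
```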

AMD has tackled this problem before, and suddenly you think they just won't attempt to solve it now?

How has AMD tackled it before?
 