ATI RV870


MrSpadge

Member
Sep 29, 2003
Originally posted by: Azn
Such an argument is wrong. Just because it always has been like that doesn't mean it has to be like that. It's likely, but not necessary. Rjc's argument is that 16 ROPs for 256 Bit was just the best balance between performance and cost, considering the data rate of the memory and the chip clock speed. Now that the data rate has almost doubled, but the chip clock stays about the same, they may want to change this balance (or make the ROPs more powerful). I don't know if this is possible though.

Unless there's evidence that the ROP-to-memory-bus ratio has changed, we can only assume it will stay the same.

Look, the situation is the following: we don't know what will be in RV870. We could speculate in one way or another, but for this point it doesn't matter. RJC says they could increase the ROP number while staying with a 256 bit bus, whereas you say this is not possible. Now comes the interesting part: I'm telling you that you can't base your argument on the fact that in the last years a 256 Bit bus has always been coupled with 16 ROPs. This does not necessarily tell us that it has to be like that. That's what I'm saying, not more and not less.

And regarding your argument with the diagrams.. sure, the ROPs are tied to the mem controller and in case of RV770 are decoupled from the shaders / TMUs. But look at that picture you are always linking to. There is one block labeled ROP next to each 64 Bit mem channel. It does not say "4 ROPs" or anything like that!

And please don't get me wrong, I'm not taking much of a position in this discussion, except that I'm convinced ATI can increase the ROP throughput as necessary, be it by increased numbers or improved performance. But I'm not in a position to judge what they're going to do and/or need to do. What I am telling you is that your point does not hold up under critical assessment.

He doesn't say that. It's just an example. Anyway, the German article says the die size will be 205 mm^2, so they do add features, which should already have been obvious from the claims of DX11 and >1000 SPs.

Of course he said that. He's saying ATI needs to sell smoke alarms and fire extinguishers with every single card because somehow ATI is going to sell pad-limited chips.

Please, don't be ridiculous! Let me quote him again:

If it's RV770 at 40nm, with a perfect shrink the chip will go from ~260mm2 -> 140mm2. Unless packaging improves dramatically, that is too small for a 256bit interface, let alone a 384 bit one. Maybe if you put 2 together somehow on one die you could get it big enough for a 384bit bus....2 billion transistors for sure. Bonus smoke alarm and fire extinguisher with every sale

There's an "if" at the beginning of this sentence for a reason! It's a "Gedanken experiment", not something declared to be a fact. You are twisting his words while ignoring some of them, turning the statement into something which was obviously not intended and looks like rubbish. Really, he's just saying three things here:

- if ATI went for a pure optical shrink of RV770 to 40 nm he would expect the chip to be ~140 mm²
- on a 140 mm² chip it's difficult to implement a 256 Bit bus (due to the limited number of pads which can physically fit beneath such a *small* chip), let alone a 384 Bit bus (which had been brought up in the discussion before)
- putting 2 of these chips with 1 billion transistors each onto one single card will create a lot of heat -> fire extinguishers reference

Compare these statements with what you are reading into them. Isn't that totally different? In fact so different that one might suspect you misunderstand him deliberately?

Applying it to GT200 with its huge die is certainly out of proportion. But if you wanted to feed a 140 mm^2 GPU with a 384 Bit mem bus you'd get into serious trouble with the pads. These 100 - 150W also have to be delivered in some way.

Why are you pairing a 140mm chip with a 384 bit memory bus? Baseless assumptions on your part as well. ATI is somehow going to shrink RV770 and stick a 384bit bus on it? Was I even saying that in the first place? Yeah really... :roll:

This entire point also revolves around the Gedankenexperiment above: an optical shrink of RV770. It was created to show you what being pad limited means: that the chip is physically too small to accommodate a wider memory interface. RJC brought up the 140 mm² chip and you'd like to see a 384 Bit bus. It's just the extreme cases covered: GT200 on one side and this hypothetical chip on the other side. The first is fine with its pads whereas the latter is not feasible. Of course you can add more functional units to the chip to make it larger, but then you get a different chip, don't you? More expensive, higher power consumption and higher performance.

If these early rumors are true, which I highly doubt, it seems ATI is aiming for a 256bit bus with faster GDDR5 memory. They could maybe have 32 ROPs, say 8 ROPs for every 64bit memory controller, in which case it wouldn't matter whether they used GDDR3 or 5. That would make the chip 2x larger than RV770 is now, so that's probably out of the question.

The ROPs are just a small part of the chip, so doubling their number shouldn't increase die size too much. If you double die size you'd also have to double SPs, TMUs.. everything.

Wow, you got all that from reading Nordic, who quoted that unreliable German site? A 128bit bus with GDDR5? That makes no sense. Why would they lower specs? They want faster performance, not slower.

The German site is not mentioning this 128 Bit version at all. They don't have much of a reputation (yet =), but actually they look much more serious than... the usual suspects for rumors.

Why would they want such a chip? (~800 SPs, 40 nm, 128 Bit high speed GDDR5) To lower cost, obviously! They'd lose some performance due to the smaller mem interface and could make up for that with a higher chip clock (compared to the 4850). They'd achieve about the same performance with smaller chips (cheaper if the yield is comparable) and a simpler / cheaper board layout. They could offer you a better price / performance compromise. And it has been done before: 3850 -> 4670. Of course this won't be their new flagship..
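The cost trade-off above comes down to peak memory bandwidth, which is bus width in bytes times the per-pin data rate. A minimal Python sketch; the data rates are assumptions for illustration (roughly 2.0 Gbit/s for the GDDR3 on an HD 4850, 3.6 Gbit/s for the GDDR5 on an HD 4870), and the 128-bit part is the hypothetical chip discussed here, not a confirmed product.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbit_s: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes times per-pin data rate in Gbit/s."""
    return bus_width_bits / 8 * data_rate_gbit_s


# Assumed per-pin data rates, for illustration only.
GDDR3_GBIT_S = 2.0  # roughly what an HD 4850 runs
GDDR5_GBIT_S = 3.6  # roughly what an HD 4870 runs

wide_gddr3 = peak_bandwidth_gb_s(256, GDDR3_GBIT_S)    # 4850-style board
narrow_gddr5 = peak_bandwidth_gb_s(128, GDDR5_GBIT_S)  # hypothetical 128-bit GDDR5 chip

print(f"256-bit GDDR3: {wide_gddr3:.1f} GB/s")
print(f"128-bit GDDR5: {narrow_gddr5:.1f} GB/s "
      f"({narrow_gddr5 / wide_gddr3:.0%} of the 256-bit GDDR3 board)")
```

Under these assumptions the narrow GDDR5 board keeps about 90% of the bandwidth with half the memory data lines, which is the "smaller chip + cheaper board" argument in numbers.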

You linked me to a Beyond3D forum thread where they talk about a 384bit bus being a viable option with only a 300mm die size, whereas you were quoting 400mm in your very first reply. With a die shrink, 24 ROPs, 96 TMUs and 1280 SPs on a 384 bit bus with high speed GDDR5 isn't out of the question. If the die size is too small, they can easily add more SPs and texture units to compensate, just like they did with RV770.

So you are accepting the concept of "need certain die size for mem interface" and the only question remaining is "how bad is it"? Damn it, really got to go to bed..

(I'll just skip the quadro discussion for now, I don't think it adds anything to the discussion)

MrS
 

VirtualLarry

No Lifer
Aug 25, 2001
Originally posted by: Karathkasun
Think back to the 1950GT: the same exact chip as the 1950/XT/XTX, but with a laser cut on the die to disable one ROP quad. Same goes for the X800 and X800XL/XT, or the X850 and X850XT.
I thought that the 1950GT was the same as the 1950Pro, only with lower clocks. I have both a 1950Pro and a 1950GT; I guess I should hook up the other machine and check with GPU-Z.

 

Tempered81

Diamond Member
Jan 29, 2007
If RV870 is on schedule, which it should be, you should be able to buy one in June/July 2009.
 

rjc

Member
Sep 27, 2007
Originally posted by: Azn
So what does that have to do with your original statement that GDDR5 can use 32 ROPs while GDDR3 cannot?
What I meant to say is that doubling the ROP power would be useful if the chip was GDDR5 only, but a waste of transistors if they are planning to sell it mainly with GDDR3, i.e. the GDDR3 would be the bottleneck.
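The bottleneck argument can be put in rough numbers by asking how many pixels per core clock a 256-bit memory interface can absorb. A Python sketch under loudly stated assumptions: a 750 MHz core, about 2.0 Gbit/s per pin for GDDR3 and 3.6 Gbit/s for GDDR5, and either 4 bytes per pixel (plain 32-bit color write) or 8 bytes (blending: read + write). It ignores color compression and caching, so it only shows a direction, not real ROP behaviour.

```python
def pixels_per_clock_memory_can_feed(bus_width_bits: int, data_rate_gbit_s: float,
                                     core_clock_ghz: float, bytes_per_pixel: int) -> float:
    """Pixels per core clock that peak memory bandwidth can sustain (no compression, no caching)."""
    bandwidth_gb_s = bus_width_bits / 8 * data_rate_gbit_s  # GB/s
    bytes_per_core_clock = bandwidth_gb_s / core_clock_ghz  # (GB/s) / (Gclk/s) = bytes per clock
    return bytes_per_core_clock / bytes_per_pixel


CORE_CLOCK_GHZ = 0.75  # assumed core clock
ROPS = 16              # RV770-style ROP count

for memory, rate in (("GDDR3", 2.0), ("GDDR5", 3.6)):  # assumed per-pin data rates
    for bytes_per_pixel, mode in ((4, "plain write"), (8, "blended")):
        px = pixels_per_clock_memory_can_feed(256, rate, CORE_CLOCK_GHZ, bytes_per_pixel)
        verdict = "memory-bound" if px < ROPS else "headroom beyond 16 ROPs"
        print(f"256-bit {memory}, {mode}: {px:.1f} px/clk -> {verdict}")
```

Read this way, blending on 256-bit GDDR3 already starves 16 ROPs, while GDDR5 leaves some headroom; whether that headroom justifies 32 ROPs depends on workloads the sketch does not model.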

Considering the RV770 ROPs are tied to the memory bus, I don't see how they are going to raise the ROP count without widening the memory bus, unless they went back to their old ring bus design, which was a power hog with higher latencies. I don't think ATI is going to do that.

If these early rumors are true, which I highly doubt, it seems ATI is aiming for a 256bit bus with faster GDDR5 memory. They could maybe have 32 ROPs, say 8 ROPs for every 64bit memory controller, in which case it wouldn't matter whether they used GDDR3 or 5. That would make the chip 2x larger than RV770 is now, so that's probably out of the question.
They did it with the RV770: its ROPs have double the power of RV670's (see the Rage3D link posted previously). As GDDR5 memory goes up in bandwidth, I am speculating they will need to do it again. If you can find labelled die shots of the RV670 and RV770, you could compare the die area the ROPs use on each to see the cost of this improvement. From that you could make a reasonable guess at how much another doubling would cost.

Originally posted by: rjc
Apparently, according to current rumors, the first 40nm chip from ATI will be a die-shrunk RV770 which, because of its size, will have a 128bit bus with GDDR5. The idea is to replace the 4850 in the lineup with it, the smaller chip + cheaper board offsetting the GDDR5 cost. After that follows the RV870, with extra stuff to fill it up so it's big enough for a 256 bit memory interface.

Wow, you got all that from reading Nordic, who quoted that unreliable German site? A 128bit bus with GDDR5? That makes no sense. Why would they lower specs? They want faster performance, not slower.
First they go with a test chip on the new 40nm process, usually a smaller one or a straight shrink, to make sure everything works as expected and to minimise variables. The straight RV770 shrink + 128bit bus is what people are calling the RV740 chip. This is supposed to be ATI's test.

If that works (say, available by Q2 next year), then they will take the plunge on the RV870, hoping for late Q3 perhaps.

No, the chip is for CUDA. Crunching numbers. Memory bandwidth or ROPs have nothing to do with crunching numbers; it's the SPs that do all the work.
I have been looking everywhere for a review of this card but haven't found a single one; the specs provided on the Nvidia website are kind of minimal.

You linked me to a Beyond3D forum thread where they talk about a 384bit bus being a viable option with only a 300mm die size, whereas you were quoting 400mm in your very first reply. With a die shrink, 24 ROPs, 96 TMUs and 1280 SPs on a 384 bit bus with high speed GDDR5 isn't out of the question. If the die size is too small, they can easily add more SPs and texture units to compensate, just like they did with RV770.
The 400mm suggestion I made was just a guess off the top of my head; I was thinking about G80 at 484mm2 and couldn't remember any other 384bit chips offhand.

I personally think the 300mm die size is not reachable for the RV880 with 384bit memory; starting from the current RV770 at 260, I'd put money on maybe 330 or 350. At 40nm, though, that is space for >2 RV770 chips....how likely is that?

Contrast that with existing 256 bit chips, say the RV670 at 195mm2 or so and the G94b (9600GT) at around 190mm2: that would put the RV870 at a shrunk RV770 + ~40% free die area to add more stuff.
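The "+40%" and "space for >2 RV770 chips" figures follow from comparing die areas with an ideal 55 nm to 40 nm shrink of RV770. A small Python sketch; the areas are the ones quoted in this post, and it assumes, as the post does, that an existing ~190-195 mm² chip indicates roughly the minimum size for a 256-bit interface.

```python
RV770_55NM_MM2 = 260.0
# Ideal optical shrink: area scales with the square of the feature-size ratio.
rv770_40nm = RV770_55NM_MM2 * (40 / 55) ** 2

# Existing 256-bit chips (areas as quoted in the thread), used as a rough floor for a 256-bit bus.
REFERENCE_256BIT_MM2 = {"RV670": 195.0, "G94b (9600GT)": 190.0}
for name, area in REFERENCE_256BIT_MM2.items():
    print(f"{name}: {area / rv770_40nm - 1:.0%} more area than a shrunk RV770")

# The 330-350 mm^2 guesses for a 384-bit chip, expressed in shrunk RV770s.
for guess in (330.0, 350.0):
    print(f"{guess:.0f} mm^2 = {guess / rv770_40nm:.1f} x shrunk RV770")
```

This only restates the thread's own numbers; real shrinks are less than ideal, which would make the free area smaller.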
 

AzN

Banned
Nov 26, 2001
Originally posted by: MrSpadge
Originally posted by: Azn
Such an argument is wrong. Just because it always has been like that doesn't mean it has to be like that. It's likely, but not necessary. Rjc's argument is that 16 ROPs for 256 Bit was just the best balance between performance and cost, considering the data rate of the memory and the chip clock speed. Now that the data rate has almost doubled, but the chip clock stays about the same, they may want to change this balance (or make the ROPs more powerful). I don't know if this is possible though.

Unless there's evidence that the ROP-to-memory-bus ratio has changed, we can only assume it will stay the same.

Look, the situation is the following: we don't know what will be in RV870. We could speculate in one way or another, but for this point it doesn't matter. RJC says they could increase the ROP number while staying with a 256 bit bus, whereas you say this is not possible. Now comes the interesting part: I'm telling you that you can't base your argument on the fact that in the last years a 256 Bit bus has always been coupled with 16 ROPs. This does not necessarily tell us that it has to be like that. That's what I'm saying, not more and not less.

And regarding your argument with the diagrams.. sure, the ROPs are tied to the mem controller and in case of RV770 are decoupled from the shaders / TMUs. But look at that picture you are always linking to. There is one block labeled ROP next to each 64 Bit mem channel. It does not say "4 ROPs" or anything like that!

And please don't get me wrong, I'm not taking much of a position in this discussion, except that I'm convinced ATI can increase the ROP throughput as necessary, be it by increased numbers or improved performance. But I'm not in a position to judge what they're going to do and/or need to do. What I am telling you is that your point does not hold up under critical assessment.


Maybe if you actually read what RJC and I wrote in our later posts, you'd see this isn't even remotely what we had a disagreement over.

RJC says you can raise the ROP count with GDDR5 but not with GDDR3, as if the ROPs weren't tied to the RV770's memory bus. Sure, you can raise the ROPs per bus width, and I don't doubt that, but that doesn't mean GDDR3 can't do this too.



He doesn't say that. It's just an example. Anyway, the German article says the die size will be 205 mm^2, so they do add features, which should already have been obvious from the claims of DX11 and >1000 SPs.

Of course he said that. He's saying ATI needs to sell smoke alarms and fire extinguishers with every single card because somehow ATI is going to sell pad-limited chips.

Please, don't be ridiculous! Let me quote him again:

If it's RV770 at 40nm, with a perfect shrink the chip will go from ~260mm2 -> 140mm2. Unless packaging improves dramatically, that is too small for a 256bit interface, let alone a 384 bit one. Maybe if you put 2 together somehow on one die you could get it big enough for a 384bit bus....2 billion transistors for sure. Bonus smoke alarm and fire extinguisher with every sale

There's an "if" at the beginning of this sentence for a reason! It's a "Gedanken experiment", not something declared to be a fact. You are twisting his words while ignoring some of them, turning the statement into something which was obviously not intended and looks like rubbish. Really, he's just saying three things here:

- if ATI went for a pure optical shrink of RV770 to 40 nm he would expect the chip to be ~140 mm²
- on a 140 mm² chip it's difficult to implement a 256 Bit bus (due to the limited number of pads which can physically fit beneath such a *small* chip), let alone a 384 Bit bus (which had been brought up in the discussion before)
- putting 2 of these chips with 1 billion transistors each onto one single card will create a lot of heat -> fire extinguishers reference

Compare these statements with what you are reading into them. Isn't that totally different? In fact so different that one might suspect you misunderstand him deliberately?

Why even mention a perfect die shrink of the exact same chip and stick a 384 bit bus on it? It doesn't make sense to bring up a point and loosely attach that "if" to confuse people. That isn't even the chip I was proposing in the first place.

Wow misunderstood him deliberately? :disgust:


Applying it to GT200 with its huge die is certainly out of proportion. But if you wanted to feed a 140 mm^2 GPU with a 384 Bit mem bus you'd get into serious trouble with the pads. These 100 - 150W also have to be delivered in some way.

Why are you pairing a 140mm chip with a 384 bit memory bus? Baseless assumptions on your part as well. ATI is somehow going to shrink RV770 and stick a 384bit bus on it? Was I even saying that in the first place? Yeah really... :roll:

This entire point also revolves around the Gedankenexperiment above: an optical shrink of RV770. It was created to show you what being pad limited means: that the chip is physically too small to accommodate a wider memory interface. RJC brought up the 140 mm² chip and you'd like to see a 384 Bit bus. It's just the extreme cases covered: GT200 on one side and this hypothetical chip on the other side. The first is fine with its pads whereas the latter is not feasible. Of course you can add more functional units to the chip to make it larger, but then you get a different chip, don't you? More expensive, higher power consumption and higher performance.

So my idea about raising the ROP count and the texture/SP ratio doesn't even figure into your ridiculous experiment? Of course you can add more functional units to the chip. That's what I was proposing in the first place. :roll: That's why it's not called RV770 with a die shrink; it's called RV870. :music:


If these early rumors are true, which I highly doubt, it seems ATI is aiming for a 256bit bus with faster GDDR5 memory. They could maybe have 32 ROPs, say 8 ROPs for every 64bit memory controller, in which case it wouldn't matter whether they used GDDR3 or 5. That would make the chip 2x larger than RV770 is now, so that's probably out of the question.

The ROPs are just a small part of the chip, so doubling their number shouldn't increase die size too much. If you double die size you'd also have to double SPs, TMUs.. everything.

That's basically it.

Wow, you got all that from reading Nordic, who quoted that unreliable German site? A 128bit bus with GDDR5? That makes no sense. Why would they lower specs? They want faster performance, not slower.

The German site is not mentioning this 128 Bit version at all. They don't have much of a reputation (yet =), but actually they look much more serious than... the usual suspects for rumors.

Why would they want such a chip? (~800 SPs, 40 nm, 128 Bit high speed GDDR5) To lower cost, obviously! They'd lose some performance due to the smaller mem interface and could make up for that with a higher chip clock (compared to the 4850). They'd achieve about the same performance with smaller chips (cheaper if the yield is comparable) and a simpler / cheaper board layout. They could offer you a better price / performance compromise. And it has been done before: 3850 -> 4670. Of course this won't be their new flagship..

This is what puzzles me. I don't even know where he got this idea that RV870 is going to be a die shrink at 140mm with a 128bit bus to emulate the 4850.

That would be more of a mainstream part. They wouldn't want to put GDDR5 on a lower-end part either, at least not yet.

You linked me to a Beyond3D forum thread where they talk about a 384bit bus being a viable option with only a 300mm die size, whereas you were quoting 400mm in your very first reply. With a die shrink, 24 ROPs, 96 TMUs and 1280 SPs on a 384 bit bus with high speed GDDR5 isn't out of the question. If the die size is too small, they can easily add more SPs and texture units to compensate, just like they did with RV770.

So you are accepting the concept of "need certain die size for mem interface" and the only question remaining is "how bad is it"? Damn it, really got to go to bed..

(I'll just skip the quadro discussion for now, I don't think it adds anything to the discussion)

MrS

I never denied the facts. :confused:

 

AzN

Banned
Nov 26, 2001
Originally posted by: MrSpadge
Back again.. pretty tired, but there should be enough time left for a reply.

The scaling:
Features like transistors are placed on a die in a single plane (don't bother with height and interconnects for now). That means e.g. a transistor occupies an area A defined by a certain length l and a width w. Primary school tells us that A = l * w. So if your typical feature is reduced from 55 to 40 nm, you have to apply the scaling to both l and w. Now A = (40/55)*l * (40/55)*w = (40/55)^2 *l*w. 40/55 ≈ 0.727 and 0.727*0.727 ≈ 0.53. 260 mm^2 * 0.53 = 137 mm^2. You see, there's not much room for any mistakes in these simple steps. Is it clear where this comes from?
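The same arithmetic as a few lines of Python; this is a sketch of the ideal case only (both dimensions scale by the node ratio, so area scales by its square), with no allowance for the inefficiencies discussed next.

```python
def ideal_shrink_area(area_mm2: float, old_node_nm: float, new_node_nm: float) -> float:
    """Ideal optical shrink: length and width each scale by new/old, so area scales by (new/old)**2."""
    linear = new_node_nm / old_node_nm
    return area_mm2 * linear * linear


print(f"linear factor: {40 / 55:.3f}")         # per-dimension scaling
print(f"area factor:   {(40 / 55) ** 2:.3f}")  # what the die area is multiplied by
print(f"RV770 at 40 nm: {ideal_shrink_area(260, 55, 40):.1f} mm^2 (before any inefficiency)")
```

Rounding the area factor to 0.53 before multiplying gives the 137 mm² figure quoted above.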

Regarding your very valid question: note that I put in brackets "note that this assumes perfect scaling, reality will be a bit worse than that". Not all elements on a chip can be scaled down the same way as transistors and I'm not 100% sure if transistors always get the full size reduction in both dimensions. These factors depend on the details of the actual physical implementation. Additionally in cases of "a dumb optical shrink", where you don't redesign your chip, you might get wasted space between functional blocks. You have to rearrange the blocks to optimize space usage, but then you'd have to redo all the timings, thus negating the advantage of the "simplicity" of a pure optical shrink.

Let's take a look at another example: Athlon 64 going from 130 nm Newcastle (144 mm²) to 90 nm Venice (83.5 mm²). According to your method you'd get 90/130 * 144 = 99.7 mm². Theoretically it should be (90/130)² * 144 = 69 mm². So going with the square of the "smallest feature size" (the xx nm number) and allowing for some inefficiency is a good rule of thumb, although it may underestimate the size after the die shrink, whereas just going with the feature size overestimates the size and is thus wrong. In the first case it's not the formula / maths which is wrong, it's just that we don't know the exact efficiency.
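The Athlon 64 figures above also let you estimate how far a real shrink fell short of the ideal. A Python sketch using only the die sizes quoted in this paragraph; applying the same shortfall to RV770 at the end is a hypothetical illustration, since a GPU shrink need not behave like this CPU shrink.

```python
NEWCASTLE_MM2 = 144.0  # 130 nm
VENICE_MM2 = 83.5      # 90 nm, actual die
ratio = 90 / 130

linear_estimate = NEWCASTLE_MM2 * ratio          # scaling with feature size only (overestimates)
quadratic_estimate = NEWCASTLE_MM2 * ratio ** 2  # ideal area scaling (underestimates)
overhead = VENICE_MM2 / quadratic_estimate       # how much larger than ideal the real die came out

print(f"linear: {linear_estimate:.1f} mm^2, ideal: {quadratic_estimate:.1f} mm^2, actual: {VENICE_MM2} mm^2")
print(f"actual die is {overhead - 1:.0%} larger than the ideal shrink")

# Hypothetical: an RV770 shrink to 40 nm with the same overhead.
rv770_ideal = 260.0 * (40 / 55) ** 2
print(f"RV770 at 40 nm with that overhead: {rv770_ideal * overhead:.0f} mm^2")
```

The actual Venice die lands between the two estimates, which is the point of the rule of thumb: square the ratio, then allow some slack.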

And something else to note: 40, 55, 80, 110 and 150 nm are "half node" processes, which, if I remember correctly, means that not all features are scaled down from the corresponding normal / "full node" processes (45, 65, 90, 130 and 180 nm).

MrS

Thanks for the explanation. I guess that makes sense.
 

AzN

Banned
Nov 26, 2001
Originally posted by: rjc
Originally posted by: Azn
So what does that have to do with your original statement that GDDR5 can use 32 ROPs while GDDR3 cannot?
What I meant to say is that doubling the ROP power would be useful if the chip was GDDR5 only, but a waste of transistors if they are planning to sell it mainly with GDDR3, i.e. the GDDR3 would be the bottleneck.

It's just not feasible to add 32 ROPs on a GDDR5-only chip just because it has the bandwidth. You would have to raise everything else along with the ROPs. If you think about it, the card would be so big and expensive to produce. That's why I mentioned a 384bit bus with 24 ROPs: something in the middle to compensate for the size and bandwidth limitations of the chip, even with GDDR5 on a 256bit bus.

Considering the RV770 ROPs are tied to the memory bus, I don't see how they are going to raise the ROP count without widening the memory bus, unless they went back to their old ring bus design, which was a power hog with higher latencies. I don't think ATI is going to do that.

If these early rumors are true, which I highly doubt, it seems ATI is aiming for a 256bit bus with faster GDDR5 memory. They could maybe have 32 ROPs, say 8 ROPs for every 64bit memory controller, in which case it wouldn't matter whether they used GDDR3 or 5. That would make the chip 2x larger than RV770 is now, so that's probably out of the question.
They did it with the RV770: its ROPs have double the power of RV670's (see the Rage3D link posted previously). As GDDR5 memory goes up in bandwidth, I am speculating they will need to do it again. If you can find labelled die shots of the RV670 and RV770, you could compare the die area the ROPs use on each to see the cost of this improvement. From that you could make a reasonable guess at how much another doubling would cost.

In theory the ROPs have doubled in performance. That's not entirely true: the only thing it really supercharges is AA performance. Considering RV670 did the AA resolve entirely on the SPs, AMD didn't really need this fix until RV770 moved the AA resolve back to the ROPs. In 32 bit color, both RV770 and RV670 do 16 pixels per clock.

A quick synthetic bench will show it hasn't doubled.

http://techreport.com/r.x/rade...870/3dm-color-fill.gif

Notice the 4850 and 3870: not much difference, with nearly the same bandwidth and clocks. The 4870 is different since it uses GDDR5, which is able to saturate the ROPs a lot more than GDDR3. The same effect can be seen with the 2900XT compared to the 3870, since the 2900XT has a 512bit bus and more bandwidth.

http://techreport.com/r.x/rade...0xt/gpu-3dm-single.gif
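One simple way to read that chart: color fill is capped by whichever is lower, ROP throughput (ROPs x core clock) or memory bandwidth divided by bytes per pixel. A rough Python model; the clocks and bandwidths below are reference specs as I recall them and should be treated as assumptions, as should the 8 bytes per pixel (read + write, as with blending). It is not meant to reproduce the chart's numbers, only the pattern.

```python
from typing import NamedTuple, Tuple


class Card(NamedTuple):
    name: str
    rops: int
    core_mhz: float
    bandwidth_gb_s: float


# Assumed reference specs (not taken from the thread).
CARDS = [
    Card("HD 3870", 16, 775, 72.0),   # GDDR4
    Card("HD 4850", 16, 625, 63.6),   # GDDR3
    Card("HD 4870", 16, 750, 115.2),  # GDDR5
]


def color_fill_gpix_s(card: Card, bytes_per_pixel: int = 8) -> Tuple[float, str]:
    """Fill-rate bound = min(ROP limit, bandwidth limit); also report which one binds."""
    rop_limit = card.rops * card.core_mhz / 1000.0     # Gpixel/s
    mem_limit = card.bandwidth_gb_s / bytes_per_pixel  # Gpixel/s
    if rop_limit <= mem_limit:
        return rop_limit, "ROP-bound"
    return mem_limit, "bandwidth-bound"


for card in CARDS:
    rate, limit = color_fill_gpix_s(card)
    print(f"{card.name}: {rate:.1f} Gpixel/s ({limit})")
```

Under these assumptions the 3870 and 4850 sit close together and bandwidth-bound, while the 4870 becomes ROP-bound. With 4 bytes per pixel all three would be ROP-bound instead, so the conclusion hinges on what the test actually writes.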

When you consider that the SPs were completely redesigned with RV770 and unwanted parts got removed, I don't think another major optimization to shrink the SPs again would be possible with RV870 for a 32 ROP card. When you raise the ROP count, you have to raise everything else, or it won't perform like it's supposed to just from adding 32 ROPs. Just look at GT200 with 32 ROPs. It sure isn't performing like everyone had hoped, is it?

It would be one giant chip with everything else raised, much bigger than the 384bit chip I had proposed, which you estimated at 400mm with over 2 billion transistors (an estimate that became much smaller in your later post). At least 25% bigger, if not more. Not to mention bandwidth limited, like a G92 would be.

Originally posted by: rjc
Apparently, according to current rumors, the first 40nm chip from ATI will be a die-shrunk RV770 which, because of its size, will have a 128bit bus with GDDR5. The idea is to replace the 4850 in the lineup with it, the smaller chip + cheaper board offsetting the GDDR5 cost. After that follows the RV870, with extra stuff to fill it up so it's big enough for a 256 bit memory interface.

Wow, you got all that from reading Nordic, who quoted that unreliable German site? A 128bit bus with GDDR5? That makes no sense. Why would they lower specs? They want faster performance, not slower.
First they go with a test chip on the new 40nm process, usually a smaller one or a straight shrink, to make sure everything works as expected and to minimise variables. The straight RV770 shrink + 128bit bus is what people are calling the RV740 chip. This is supposed to be ATI's test.

If that works (say, available by Q2 next year), then they will take the plunge on the RV870, hoping for late Q3 perhaps.

Where did you get this info from? Have you got a link? It seems you are presenting speculation as fact.


You linked me to a Beyond3D forum thread where they talk about a 384bit bus being a viable option with only a 300mm die size, whereas you were quoting 400mm in your very first reply. With a die shrink, 24 ROPs, 96 TMUs and 1280 SPs on a 384 bit bus with high speed GDDR5 isn't out of the question. If the die size is too small, they can easily add more SPs and texture units to compensate, just like they did with RV770.
The 400mm suggestion I made was just a guess off the top of my head; I was thinking about G80 at 484mm2 and couldn't remember any other 384bit chips offhand.

I personally think the 300mm die size is not reachable for the RV880 with 384bit memory; starting from the current RV770 at 260, I'd put money on maybe 330 or 350. At 40nm, though, that is space for >2 RV770 chips....how likely is that?

Contrast that with existing 256 bit chips, say the RV670 at 195mm2 or so and the G94b (9600GT) at around 190mm2: that would put the RV870 at a shrunk RV770 + ~40% free die area to add more stuff.

Yeah, that was on 80nm and a completely different chip.

It's a shame really. You completely ignored what I had said in my very first reply or the post above.

I had proposed raising the SP and texture counts by 50%, and, in the post you replied to, 2 more SP clusters. If the chip doesn't fit, they can easily add more SPs and texture units, as they did with RV770; even your very first link, to Tom's article, says ATI added more SPs and TMUs to make it fit. Pad limiting isn't an issue for the most part; they can always improvise. Also, you can't count on perfect scaling, as MrSpadge explained in the post above.

I just don't know how you got 330 or 350. An exact 50% increase on a 55nm RV770 would be 390mm. That's not including the 2 more SP clusters and 50% more texture units, which you ignored, on top of the 50% raise when scaling to 24 ROPs with 384 bits. All added together it's more like 500mm, if not more, on the same process. Since perfect scaling never happens, 300mm seems about right, if not a little bigger, on 40nm.
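The estimate above can be laid out step by step. A Python sketch using only the figures in this post (260 mm² RV770, +50% giving 390 mm², ~500 mm² with the extra SP clusters and TMUs); the 0.53 factor is the ideal 55 to 40 nm shrink, and the "implied factor" merely shows how much worse than ideal the scaling must be for ~300 mm² to come out.

```python
RV770_55NM_MM2 = 260.0
IDEAL_FACTOR = (40 / 55) ** 2  # ideal optical shrink, ~0.53

plus_50pct_55nm = RV770_55NM_MM2 * 1.5  # 24 ROPs / 384-bit, everything scaled by 50%
with_extras_55nm = 500.0                # the post's figure incl. extra SP clusters and TMUs

for label, area in (("+50%", plus_50pct_55nm), ("+50% + extras", with_extras_55nm)):
    print(f"{label}: {area:.0f} mm^2 at 55 nm -> {area * IDEAL_FACTOR:.0f} mm^2 at 40 nm (ideal)")

# Shrink factor needed for the ~500 mm^2 design to land at ~300 mm^2 on 40 nm.
implied_factor = 300.0 / with_extras_55nm
print(f"implied factor {implied_factor:.2f} vs ideal {IDEAL_FACTOR:.2f}")
```

An implied factor of 0.60 against an ideal 0.53 is roughly the kind of shortfall the Athlon 64 example suggests, but that is a comparison, not evidence about RV870.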
 

MrSpadge

Member
Sep 29, 2003
OK, let's go back a little:

1. in this thread the idea came up that 16 ROPs won't be enough for RV870
2. RJC said they could up the number of ROPs as needed or make them more powerful
3. you are saying that you need to use a wider memory bus to use more ROPs and thus his idea can not work (hence you suggest 24 ROPs + 384 Bit bus)
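The disagreement between (2) and (3) can be stated as a single parameter. A Python sketch, assuming ROP throughput is organized per 64-bit memory channel as in the RV770 diagram discussed earlier; `rops_per_channel` is a hypothetical knob for illustration, not an ATI specification. Point (3) treats it as fixed at 4, point (2) treats it as something the designers can change.

```python
def total_rops(bus_width_bits: int, rops_per_channel: int = 4) -> int:
    """ROP count if ROPs are grouped per 64-bit memory channel (hypothetical model)."""
    if bus_width_bits % 64:
        raise ValueError("bus width must be a multiple of 64 bits")
    channels = bus_width_bits // 64
    return channels * rops_per_channel


# Point (3): ratio fixed at 4 ROPs per channel, so more ROPs require a wider bus.
print("fixed ratio:", {bus: total_rops(bus) for bus in (256, 384, 512)})
# Point (2): keep 256 bits and change the per-channel count (or make each ROP stronger).
print("256-bit, variable ratio:", {n: total_rops(256, n) for n in (4, 6, 8)})
```

Both versions can reach 24 ROPs; they differ only in whether that takes 384 bits or a different per-channel count on 256 bits, which is exactly the open question.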

That's the fundamental problem. If this is not clarified, any further discussion is almost pointless; the discussion can't get anywhere. So I'm trying to bring some order into this mess by telling you that your point (3) is based on two arguments, which are both invalid. This doesn't automatically make rjc's point (2) valid, but you cannot rightfully claim yours to be true. I feel this is important because your entire concept of RV870 revolves around statement (3).

And since you seem to put a heavy emphasis on this GDDR3/5 argument: I think RJC is not arguing well here. The point I'm taking away from his posts is that 16 ROPs are enough for 256 bit GDDR3, but with high speed GDDR5 you'll want more ROP power (be it by number or capability). That makes sense to me, whereas the rest looks more wrong than just sketchy. That's why I thought this field was not worth further discussion.

Maybe if you actually read what RJC and I wrote in our later posts, you'd see this isn't even remotely what we had a disagreement over.

RJC says you can raise the ROP count with GDDR5 but not with GDDR3, as if the ROPs weren't tied to the RV770's memory bus. Sure, you can raise the ROPs per bus width, and I don't doubt that, but that doesn't mean GDDR3 can't do this too.

This entire GDDR3/5 discussion is one of his weak points, but (2) still remains. Now you are saying "Sure you can raise ROP per bit bus and I don't doubt that", whereas in one of the first posts you said:

If they keep the 256bit bus that means they will still have 16ROP. It's about time ATI needs to raise up their ROP count. They've been on 16ROP count for more than 5 years.

Can you see my problem with your argument now? Is the statement above not valid any more? Let's say ATI decides that RV870 should get 96 TMUs and 1280 SPs. They don't want to make it too big, to keep power and cost in check. They figure they're better off with 24 ROPs than with 16, so these go in as well. Now what about the memory bus? If (2) holds they can stick with 256 Bit and high speed GDDR5, but if (3) is true they'd have to go for 384 Bits, which would require a larger chip and make everything more expensive. The larger chip would be considerably more powerful and a real power hog due to the raw number of transistors. This is important for us as customers: does AMD have to make RV870 a huge, power-hungry chip, or do they have a realistic option to stay with their "more small chips" strategy? This is so fundamental that I'd like to clarify it.

And related:

It's just not feasible to add 32 ROPs on a GDDR5-only chip just because it has the bandwidth. You would have to raise everything else along with the ROPs.

I ask again, why do you have to double everything else along with ROPs? Especially when you agree to "The ROPs are just a small part of the chip, so doubling their number shouldn't increase die size too much." and to "Sure you can raise ROP per bit bus and I don't doubt that"?

MrS
 

MrSpadge

Member
Sep 29, 2003
Why even mention a perfect die shrink of the exact same chip and stick a 384 bit bus on it? It doesn't make sense to bring up a point and loosely attach that "if" to confuse people. That isn't even the chip I was proposing in the first place.

Everything here about RV870 is speculation at this point. We are evaluating possibilities, since we lack the hard facts. It's only natural to use "if"s in such cases.

To be provocative: suppose you said something like "If they added a 512bit memory controller to a 16ROP ..." and I jumped in with "What, they're gonna make RV870 512 bit? Are you crazy?" You'd reply: "Well, I didn't say that." And me: "Then don't confuse me with that if!". I know it would be hilarious, but basically it's the same situation, except that in your hypothetical statement you might have added a subclause explaining the situation.

So why was that chip brought up anyway? We (actually rjc) were still explaining this pad-limited thing. It's a means to get a feel for the numbers involved, exploring the extreme cases to better judge everything in between.

Wow, misunderstood him deliberately?

I know this sounds harsh and I'm sure you're not doing it. But just look at the statements, he said:

- if ATI went for a pure optical shrink of RV770 to 40 nm he would expect the chip to be ~140 mm²
- on a 140 mm² chip it's difficult to implement a 256 Bit bus (due to the limited number of pads which can physically fit beneath such a *small* chip), let alone a 384 Bit bus (which had been brought up in the discussion before)
- putting 2 of these chips with 1 billion transistors each onto one single card will create a lot of heat -> fire extinguishers reference
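The pad-limit point can be illustrated with a deliberately crude model. This sketch assumes a square die, that the number of usable memory pads grows with the die's perimeter, and (hypothetically) that RV770 at 256 mm² sits exactly at its limit with a 256 Bit bus. Real flip-chip packages place bumps across the die area and pitches change between generations, so read the output as a trend, not a spec.

```python
import math

# Reference point used in the thread: RV770, ~256 mm² at 55 nm, 256 Bit bus.
REF_AREA_MM2 = 256.0
REF_BUS_BITS = 256
CHANNEL_BITS = 64  # one memory channel, as in the RV770 block diagram


def edge_length_mm(area_mm2: float) -> float:
    """Perimeter of a square die with the given area."""
    return 4.0 * math.sqrt(area_mm2)


def naive_max_bus_bits(area_mm2: float) -> int:
    """Scale the reference bus width by relative perimeter, then round down
    to whole 64 Bit channels.

    HYPOTHETICAL model: assumes RV770 is exactly at its pad limit and that
    usable pads grow with die perimeter. Not a real packaging rule.
    """
    ratio = edge_length_mm(area_mm2) / edge_length_mm(REF_AREA_MM2)
    raw_bits = REF_BUS_BITS * ratio
    return int(raw_bits // CHANNEL_BITS) * CHANNEL_BITS


if __name__ == "__main__":
    for area in (140.0, 256.0):
        print(f"{area:5.0f} mm²: perimeter {edge_length_mm(area):5.1f} mm, "
              f"naive limit {naive_max_bus_bits(area)} bit")
```

Under these assumptions a ~140 mm² shrink has room for only two 64 Bit channels, in line with the argument that such a shrink would need its memory interface halved.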

And you:

ATI is going to release the exact same card without adding more SPs and texture units to the chip?

OK, by now we know you missed the "if", and without it his entire statement turns into garbage. I just wanted to show that, once you account for the "if", his statements and yours differ dramatically; that's why I brought it up.

This is what puzzles me. I don't even know where he got this idea that RV870 is going to be a die shrink with a 140mm with 128bit bus to emulate 4850.

That would be more of a mainstream part. They wouldn't want to use GDDR5 on a lower-end part either, at least not yet.

I may start to sound like a broken record, but again: he didn't say so. He said that according to Nordic Hardware there's gonna be a 40 nm chip with a similar internal config to RV770 but with 128 Bit GDDR5. This is their test chip for 40 nm, called RV740, and RV870 follows shortly if things go smoothly.

What really puzzles me is how anyone could misunderstand such a clearly written statement into "this idea that RV870 is going to be a die shrink with a 140mm with 128bit bus".

So my idea about raising the ROP count and the texture/SP ratio doesn't even get into your ridiculous experiment?

I haven't yet said anything about what I think RV870 can or should be. I'm still trying to sort out the basics. Based on the idea that it needs more ROP power, the first question to solve is how to give it to the chip. Do we need a ~300 mm² chip for that, together with 384 Bit mem? Or is it possible to have a comparatively slim 200 - 250 mm² chip with 256 Bit mem, which still has enough bandwidth? I think the latter is in AMD's spirit, at least the post-R600 spirit. They'd like their single GPU cards to cover the €200 - 300 range, which prohibits large and expensive chips and boards. Everything above will be taken care of by the multi-GPU cards.

This leaves us with 2 options: the large chip with 384 Bit mem would have many SPs, probably more than the 1280 you suggested. Even on 40 nm that means high power consumption, which will surely limit clock speed. And consider this: chip area is expensive for AMD, because they pay TSMC directly for the area. Clock speed, however, is cheap, as the customer pays for it.. to the local electricity company. That's why they'd rather have a smaller chip, which doesn't have to be throttled back due to power consumption (like e.g. GT200). They'll want the memory interface just powerful enough not to hold back performance in most cases; anything more is inefficient.

That's why I suppose they'll want to stick with a rather small chip, which dictates 256 Bit mem. Somehow they'll manage not to be ROP limited, except maybe at the highest resolutions. SP count somewhere between 1000 and 1300. And raising the TMU/SP ratio? I don't think so. Ever since R520 ATI has put more emphasis on raw number crunching power than nVidia, so I don't expect them to divert from this path.

MrS
 

Ares202

Senior member
Jun 3, 2007
Originally posted by: MrSpadge
This leaves us with 2 options: the large chip with 384 Bit mem would have many SPs, probably more than the 1280 you suggested. Even on 40 nm that means high power consumption, which will surely limit clock speed. And consider this: chip area is expensive for AMD, because they pay TSMC directly for the area. Clock speed, however, is cheap, as the customer pays for it.. to the local electricity company. That's why they'd rather have a smaller chip, which doesn't have to be throttled back due to power consumption (like e.g. GT200). They'll want the memory interface just powerful enough not to hold back performance in most cases; anything more is inefficient.

That's why I suppose they'll want to stick with a rather small chip, which dictates 256 Bit mem. Somehow they'll manage not to be ROP limited, except maybe at the highest resolutions. SP count somewhere between 1000 and 1300. And raising the TMU/SP ratio? I don't think so. Ever since R520 ATI has put more emphasis on raw number crunching power than nVidia, so I don't expect them to divert from this path.

It's theoretically possible to fit any bus width on any size of chip; you just need equipment that can machine smaller links between the memory and the core. The question is whether ATI has this kind of technology, so don't dismiss 384/512 bit based on the size of the GPU core itself.

For ATI to add DX11 support and double the size and performance of RV770 within a year would blow everything currently on the market out of the water (G80, anyone?). It's possible but highly unlikely, and not theoretically sound (Moore's law). Making big, expensive chips is not what ATI is aiming for anymore, so we can assume they'll want to make a small, cost-effective chip that undercuts Nvidia.

I also assume that ATI will stick to its current architecture. I can see something intermediary happening, as already stated in this thread: building something like RV770, just bigger, most likely x1.5

768MB and 1.5GB GDDR5 (96x8/192x8)
384 bit
24 ROPs
1200 SPs
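The list above is essentially RV770 scaled by 1.5. A quick check, assuming the RV770 baseline of the HD 4870 512 MB (800 SPs, 40 TMUs, 16 ROPs, 256 bit bus) and uniform scaling of every unit, which a real design need not follow:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    sps: int
    tmus: int
    rops: int
    bus_bits: int
    mem_mb: int


# Assumed baseline: RV770 as used on the HD 4870 512 MB.
RV770 = GpuConfig(sps=800, tmus=40, rops=16, bus_bits=256, mem_mb=512)


def scale(cfg: GpuConfig, factor: float) -> GpuConfig:
    """Hypothetical uniform scaling of every unit count."""
    return GpuConfig(
        sps=round(cfg.sps * factor),
        tmus=round(cfg.tmus * factor),
        rops=round(cfg.rops * factor),
        bus_bits=round(cfg.bus_bits * factor),
        mem_mb=round(cfg.mem_mb * factor),
    )


if __name__ == "__main__":
    big = scale(RV770, 1.5)
    print(big)
    # Each GDDR5 device has a 32 bit interface, so a 384 bit bus needs 12 chips.
    chips = big.bus_bits // 32
    print(chips, "chips:", chips * 64, "MB with 512 Mbit parts,",
          chips * 128, "MB with 1 Gbit parts")
```

The 768MB and 1.5GB options fall out of the 12-chip count with 512 Mbit and 1 Gbit devices respectively.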


55nm to 40nm is basically a 1/3 shrink, so RV770 at 256mm shrinks by 1/3 to about 190mm; x1.5 = 285mm, which is hardly larger than RV770. Slap that new cooler ATI is developing on it and you're all set.
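The shrink arithmetic can be checked two ways. Reading "1/3 shrink" as one third off the area gives ~171 mm² rather than ~190 mm²; ideal optical scaling, where area goes with the square of the feature-size ratio, gives ~135 mm². Both assume perfect scaling, which real shrinks rarely achieve, and the x1.5 step assumes every block grows uniformly.

```python
RV770_AREA_55NM = 256.0  # mm², figure used in the thread


def area_one_third_off(area: float) -> float:
    """Reading of '1/3 shrink' as a one-third reduction in area."""
    return area * (1.0 - 1.0 / 3.0)


def area_ideal_optical(area: float, old_nm: float = 55.0, new_nm: float = 40.0) -> float:
    """Ideal optical shrink: area scales with (new/old)^2."""
    return area * (new_nm / old_nm) ** 2


if __name__ == "__main__":
    for label, shrunk in (("1/3 off", area_one_third_off(RV770_AREA_55NM)),
                          ("ideal (40/55)^2", area_ideal_optical(RV770_AREA_55NM))):
        print(f"{label:16s}: {shrunk:6.1f} mm² -> x1.5 = {shrunk * 1.5:6.1f} mm²")
```

Either way the x1.5 chip lands at roughly RV770's current size or below on paper, before any shrink inefficiencies are added back.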

Power consumption would be up by as much as 1/3, but that's hardly affecting the sales of the 4870s and 4850s at the moment. I think people in the enthusiast and high-ish end don't see power usage as a big selling point; the performance itself does the hard sell, and the rest is an afterthought until the electricity bills come through the letterbox.
 

MrSpadge

Member
Sep 29, 2003
- you're right, packaging technology could improve or some solution may already exist which allows for more pins / pads but is just more expensive. I don't know the details, but progress here is generally not as quick.

- adding DX 11 capability will also eat some transistors

- yes, power consumption doesn't matter much for high end - unless it starts to hurt. 150W (4870) can be handled by current coolers, but 250W and above gets really painful.

MrS
 

AzN

Banned
Nov 26, 2001
This entire GDDR3/5 discussion is one of his weak points, but (2) still remains. Now you are saying "Sure you can raise ROP per bit bus and I don't doubt that", whereas in one of the first posts you said:

Perhaps I worded it wrong, considering my English isn't very strong. I guess that's why you thought I meant 16 ROPs is the only option for a 256bit bus. I should have said: since ATI has been using 16 ROPs, it's most likely that they will keep using 16 ROPs. Our disagreement was over his claim that GDDR5 can feed 32 ROPs where GDDR3 can't, not over whether a 256bit bus can only use 16 ROPs.

Can you see my problem with your argumentation now? Is the statement above not valid any more? Let's say ATI decides that RV870 should get 96 TMUs and 1280 SPs. They don't want to make it too big, to keep power and cost in check. They figure they're better off with 24 ROPs than with 16, so these go in as well. Now what about the memory bus? If (2) holds they can stick with 256 Bit and high speed GDDR5, but if (3) is true they'd have to go for 384 Bits, which would require a larger chip and make everything more expensive. The larger chip would be considerably more powerful and a real power hog due to the raw number of transistors. This is important for us as customers: does AMD have to make RV870 a huge, power-hungry chip or do they have a realistic option to stay with their "more small chips" strategy? This is so fundamental that I'd like to clarify this issue.

Why isn't it valid? Why not raise ROP so it can do 24 pixels per clock instead of 16?

What's another 20 watts if it performs 2x faster than the current 4870? In the PC world performance is everything.


I ask again, why do you have to double everything else along with ROPs? Especially when you agree to "The ROPs are just a small part of the chip, so doubling their number shouldn't increase die size too much." and to "Sure you can raise ROP per bit bus and I don't doubt that"?

I think I pretty much explained this in the next paragraph here....

When you consider that the SPs were completely redesigned with RV770 to get rid of unwanted parts, I don't think another major optimization to shrink the SPs again would be possible with RV870 for a 32 ROP card. When you raise the ROP count you have to raise everything else, or it won't perform like it's supposed to just from adding 32 ROPs. Just look at GT200 with 32 ROPs. It sure isn't performing like everyone had hoped, is it?

It would be one giant chip with everything else raised. Much bigger than the 384bit chip I had proposed, which you estimated at 400mm with over 2 billion transistors, a figure that shrank considerably in your later post. At least 25% bigger, if not more. Not to mention bandwidth limited, like a G92.




Everything here about RV870 is speculation at this point. We are evaluating possibilities, since we lack the hard facts. It's only natural to use "if"s in such cases. Being provocative, I could put it this way: say you wrote something like "If they added a 512bit memory controller to a 16ROP ...", and I jumped in with "What, they're gonna make RV870 512 bit? Are you crazy?" You'd reply: "Well, I didn't say that." And me: "Then don't confuse me with that if!" I know it would be hilarious, but basically it's the same situation, except that in your hypothetical first statement you might have added a subclause explaining the situation. So why was that chip brought up anyway? We (actually rjc) were still explaining this pad-limited thing. It's a means to get a feel for the numbers involved, exploring the extreme cases to better judge everything in between.

When RJC explained the pad limiting I got it the first time around and read Tom's link. Why mention a die shrink of RV770 with a 384bit bus when that wasn't what I had been proposing since my very first reply? It's like taking what I said and sticking an "if" in there for kicks.

I know this sounds harsh and I'm sure you're not doing it. But just look at the statements, he said:

OK, by now we know you missed the "if", and without it his entire statement turns into garbage. I just wanted to show that, once you account for the "if", his statements and yours differ dramatically; that's why I brought it up.

So what? You still think I'm deliberately trying to misunderstand him? :roll:


I may start to sound like a broken record, but again: he didn't say so. He said that according to Nordic Hardware there's gonna be a 40 nm chip with a similar internal config to RV770 but with 128 Bit GDDR5. This is their test chip for 40 nm, called RV740, and RV870 follows shortly if things go smoothly. What really puzzles me is how anyone could misunderstand such a clearly written statement into "this idea that RV870 is going to be a die shrink with a 140mm with 128bit bus".

It states nothing like that in Nordic's link. :disgust:

RV870, dubbed Lil' Dragon, will be the first 40nm GPU from AMD. It's slated for a summer launch in 2009. We expect the silicon to be taped out before the end of the year, but we don't know the specifications. A lot of numbers have now been posted over at a German site, much like the first specifications of RV770 that turned out to be planted misinformation. Even though much of this seem reasonable we urge you to consider this information with a truckload of salt until we can confirm this.

Just the fact that this source claims that AMD wants the GPU ready for market in late Q1 makes us doubtful, but we still feel we owe you to forward this information. They claim a memory bandwidth of up to 150-160GB/s with possibly a 512-bit bus. The memory bandwidth may be near the truth but there will NOT be a 512-bit bus. There is no need for it with GDDR5. With 5GHz GDDR5 over 256-bit bus you get 160GB/s, which should be enough.

The overall performance is said to be around 1.5TFLOPS, which matches the first stories of a performance boost of around 1.2 times that of RV770. This points to four added shader clusters, with 40 shaders and 2 texture units per cluster, adding up to 960 shaders and 48 texture units. This was not stated in the article, but mere speculation on our part.

The last piece goes on to say that R800 will feature a MCM (Multi-Chip Module) solution with two RV870 dies under the same IHS. This rumor has been going around for long, and when they claim it is already made, we doubt it even more. On the other hand, it turned out that the rather poor specifications of RV770 turned out to be completely off and that the unfeasible information of 800 shaders turned out to be right...
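The article's numbers hang together under two simple formulas: peak bandwidth = effective data rate x bus width / 8, and shader throughput = SPs x FLOPs per clock x clock. A minimal sketch, assuming "5GHz" means an effective 5 Gbps per pin and 2 FLOPs per SP per clock (one multiply-add, the basis of RV770's quoted 1.2 TFLOPS at 750 MHz). The 781 MHz output is only the clock this model would need for 1.5 TFLOPS with 960 SPs, not a rumored spec.

```python
def bandwidth_gb_s(data_rate_gbps: float, bus_bits: int) -> float:
    """Peak bandwidth: effective per-pin data rate (Gbps) x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_bits / 8.0


FLOPS_PER_SP_PER_CLOCK = 2  # assumption: one multiply-add per SP per clock


def gflops(sps: int, clock_mhz: float) -> float:
    return sps * FLOPS_PER_SP_PER_CLOCK * clock_mhz / 1000.0


def clock_for_gflops(target_gflops: float, sps: int) -> float:
    """Core clock (MHz) this simple model needs to reach a target."""
    return target_gflops * 1000.0 / (sps * FLOPS_PER_SP_PER_CLOCK)


if __name__ == "__main__":
    print(bandwidth_gb_s(5.0, 256))   # 160.0, the article's 256-bit figure
    print(bandwidth_gb_s(5.0, 512))   # 320.0, far above the rumored 150-160GB/s
    sps = 800 + 4 * 40                # RV770 plus four 40-shader clusters
    tmus = 40 + 4 * 2                 # plus 2 texture units per cluster
    print(sps, tmus)                  # 960 48
    print(gflops(800, 750))           # 1200.0, RV770 on the HD 4870
    print(round(clock_for_gflops(1500, sps)))  # 781
```

The 512-bit case shows why a 512-bit bus paired with 5 Gbps GDDR5 would be overkill for the quoted bandwidth, which is the article's own reason for doubting it.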



I haven't yet said anything about what I think RV870 can or should be. I'm still trying to sort out the basics. Based on the idea that it needs more ROP power, the first question to solve is how to give it to the chip. Do we need a ~300 mm² chip for that, together with 384 Bit mem? Or is it possible to have a comparatively slim 200 - 250 mm² chip with 256 Bit mem, which still has enough bandwidth? I think the latter is in AMD's spirit, at least the post-R600 spirit. They'd like their single GPU cards to cover the €200 - 300 range, which prohibits large and expensive chips and boards. Everything above will be taken care of by the multi-GPU cards. This leaves us with 2 options: the large chip with 384 Bit mem would have many SPs, probably more than the 1280 you suggested. Even on 40 nm that means high power consumption, which will surely limit clock speed. And consider this: chip area is expensive for AMD, because they pay TSMC directly for the area. Clock speed, however, is cheap, as the customer pays for it.. to the local electricity company. That's why they'd rather have a smaller chip, which doesn't have to be throttled back due to power consumption (like e.g. GT200). They'll want the memory interface just powerful enough not to hold back performance in most cases; anything more is inefficient. That's why I suppose they'll want to stick with a rather small chip, which dictates 256 Bit mem. Somehow they'll manage not to be ROP limited, except maybe at the highest resolutions. SP count somewhere between 1000 and 1300. And raising the TMU/SP ratio? I don't think so. Ever since R520 ATI has put more emphasis on raw number crunching power than nVidia, so I don't expect them to divert from this path.

Why not dominate the high-end market as well when they've got a good design on their hands? ATI pretty much disabled SPs and TMUs on the 4830. They could easily do the same with a 384bit chip to scale accordingly and rule from the top end to the upper midrange. 300mm isn't that much bigger than 200-250mm. ATI could easily charge $400~$600 instead of the $200~$300 they've been charging with RV770 and take market share away from Nvidia. Multi-GPU cards do not scale as well as a single-core card. They're just not practical for most people, though that doesn't stop some people. At 300mm an X2 isn't out of the question either. Nvidia is trying to do a GX2 with GT200 using a die shrink that is still about 400mm. http://techreport.com/discussions.x/15807

 

rjc

Member
Sep 27, 2007
Originally posted by: Azn
In theory the ROPs have doubled in performance. Not entirely true. The only thing it really supercharges is AA performance. Considering RV670 did AA resolve entirely on the SPs, AMD didn't really need this fix until RV770 moved AA resolve back to the ROPs. In 32 bit color RV770 or RV670 does 16 pixels per clock.

A quick synthetic bench will show it hasn't doubled.

http://techreport.com/r.x/rade...870/3dm-color-fill.gif

Notice the 4850 and 3870. Not much difference with nearly the same bandwidth and clocks. The 4870 is different since it uses GDDR5. It's able to saturate the ROPs a lot more than GDDR3 can.
Doesn't the 3870 have higher clocks and more bandwidth (GDDR4) than the 4850? Maybe the 3850 is a better comparison? From the same site:
Techreport 4830 Review

There seems to be some improvement from the 3850 to the 4850, say 25% in that test. Yes, it's obviously not double.

First they go with a test chip at the new 40nm process, usually a smaller one or straight shrink to make sure everything works as expected, minimise variables. The straight RV770 shrink + 128bit bus people are calling the RV740 chip. This is supposed to be ATI's test.

If that works (say available by Q2 next year) then they will take the plunge on RV870, hoping for late Q3 perhaps.

Where did you get this info from? You've got a link? Seems you are speculating like it's a fact.
Back in July, Nordic mentioned it here. Since then it has apparently taped out. Details on the spec are still not conclusive, with most people saying a 40nm RV770 with a 128bit bus.

Yeah that was on a 80nm and completely different chip.

It's a shame really. You completely ignored what I had said in my very first reply or the post above.
:( Which part? The discussion is hard to keep on track; it's easy to lose parts. With regard to G80 being different, yeah sure, but there are no other recent data points for 384bit.

I just don't know how you got 330 or 350? An exact 50% increase on a 55nm RV770 would be 390mm. That's not including the 2 more SP clusters and 50% more texture units you ignored, on top of the 50% increase when scaling to 24 ROPs with 384 bits. All added together it's more like 500mm, if not more, on the same process. Since perfect scaling never happens, 300mm seems about right, if not a little bigger, on 40nm.

The 330/350 figure was a guess. It's their second chip (after RV740) on 40nm, so they are unlikely to get it as tight as what they are currently doing on 55nm...which is really 65nm, as only some parts are shrunk. I think their first 65nm chip was released back in early 2007 or something...RV630 (i.e. the 2600) maybe?

With large chips you also have to add more redundancy, otherwise the yield gets too low. Also, I am thinking the number of power pins will have to increase quite a bit with so many logic transistors in such a small area. Finally, as the product is planned for the second half of next year, I am wondering if they will need to do something about RoHS so they can sell it in Europe in 2010, i.e. TechReport, second-to-last paragraph.
 

AzN

Banned
Nov 26, 2001
Originally posted by: rjc
Originally posted by: Azn
In theory the ROPs have doubled in performance. Not entirely true. The only thing it really supercharges is AA performance. Considering RV670 did AA resolve entirely on the SPs, AMD didn't really need this fix until RV770 moved AA resolve back to the ROPs. In 32 bit color RV770 or RV670 does 16 pixels per clock.

A quick synthetic bench will show it hasn't doubled.

http://techreport.com/r.x/rade...870/3dm-color-fill.gif

Notice the 4850 and 3870. Not much difference with nearly the same bandwidth and clocks. The 4870 is different since it uses GDDR5. It's able to saturate the ROPs a lot more than GDDR3 can.
Doesn't the 3870 have higher clocks and more bandwidth (GDDR4) than the 4850? Maybe the 3850 is a better comparison? From the same site:
Techreport 4830 Review

There seems to be some improvement from the 3850 to the 4850, say 25% in that test. Yes, it's obviously not double.

Let's not forget 3DMark Vantage is testing 64bit color fill in this test, where RV770 does 32 pixels per clock vs 16 on RV670. In real-world terms Vantage is testing HDR scenes, while the majority of games use 32bit color with hints of HDR. That's where the discrepancy comes from. GDDR3 vs GDDR4 doesn't make much difference: GDDR4 just has slightly higher clocks with higher latencies.

Considering bandwidth has a whole lot to do with ROP throughput, that's why you see the 4850 with only a 25% advantage over RV670 in 64bit color fill, even though it can do 32 pixels per clock in 64bit color. It's the same reason the 4870 seems to do well here compared to the GTX 280: the 4870 can stretch its 64bit color fill with its bandwidth.
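The bandwidth argument can be put in numbers: the color traffic the ROPs could generate versus what the memory can deliver. This sketch assumes the commonly quoted reference clocks (HD 4850: 625 MHz core, 256 bit GDDR3 at ~1.99 Gbps; HD 4870: 750 MHz core, 256 bit GDDR5 at 3.6 Gbps), none of which are stated in the thread, counts only color writes (blending and Z would add more traffic), and uses the pixels-per-clock figures from the post above.

```python
def gb_s(data_rate_gbps: float, bus_bits: int) -> float:
    return data_rate_gbps * bus_bits / 8.0


def color_write_demand_gb_s(pixels_per_clock: int, bytes_per_pixel: int,
                            core_mhz: float) -> float:
    """Bandwidth the ROPs would need to write color at full rate (no blending/Z)."""
    return pixels_per_clock * bytes_per_pixel * core_mhz / 1000.0


def achievable_gpix_s(pixels_per_clock: int, bytes_per_pixel: int,
                      core_mhz: float, mem_gb_s: float) -> float:
    """Fill rate is the lower of the ROP limit and the bandwidth limit."""
    rop_limit = pixels_per_clock * core_mhz / 1000.0
    bw_limit = mem_gb_s / bytes_per_pixel
    return min(rop_limit, bw_limit)


# Assumed reference clocks (not from the thread).
CARDS = {
    "HD 4850": {"core_mhz": 625.0, "mem": gb_s(1.986, 256)},
    "HD 4870": {"core_mhz": 750.0, "mem": gb_s(3.6, 256)},
}
# (pixels per clock, bytes per pixel) for RV770, per the post above.
FORMATS = {"32bit": (16, 4), "64bit": (32, 8)}

if __name__ == "__main__":
    for card, spec in CARDS.items():
        for fmt, (ppc, bpp) in FORMATS.items():
            need = color_write_demand_gb_s(ppc, bpp, spec["core_mhz"])
            fill = achievable_gpix_s(ppc, bpp, spec["core_mhz"], spec["mem"])
            print(f"{card} {fmt}: needs {need:5.1f} GB/s, has {spec['mem']:5.1f} GB/s "
                  f"-> ~{fill:4.1f} Gpix/s")
```

In this simple model 32bit fill stays ROP-limited on both cards, while 64bit fill is capped by memory, which is where the GDDR5 card pulls ahead.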

First they go with a test chip at the new 40nm process, usually a smaller one or straight shrink to make sure everything works as expected, minimise variables. The straight RV770 shrink + 128bit bus people are calling the RV740 chip. This is supposed to be ATI's test.

If that works (say available by Q2 next year) then they will take the plunge on RV870, hoping for late Q3 perhaps.

Where did you get this info from? You've got a link? Seems you are speculating like it's a fact.
Back in July, Nordic mentioned it here. Since then it has apparently taped out. Details on the spec are still not conclusive, with most people saying a 40nm RV770 with a 128bit bus.

This is a low end part. They don't talk about a die shrink of RV770 on a 128bit bus with GDDR5. They are speculating on the next-generation RV740 core, not an RV770 die shrink on a 128bit bus. You've pretty much presented the rest of your speculation as fact when it isn't the case.


Yeah that was on a 80nm and completely different chip.

It's a shame really. You completely ignored what I had said in my very first reply or the post above.
:( Which part? The discussion is hard to keep on track; it's easy to lose parts. With regard to G80 being different, yeah sure, but there are no other recent data points for 384bit.

1.5x texture ratio is the first thing I mentioned. I don't know how you missed it. Perhaps you should read before replying. It might help. :p You seem like a smart dude; you can easily do the math yourself to fill in the blanks. The post you replied to had 24 ROPs, 96 TMUs and a 384bit bus. If you do the simple math, it has a 50% higher texture ratio than RV770. Why are you comparing to G80? It's an entirely different chip on a different process!
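The 50% figure is a ratio, not an absolute count: TMUs per SP go from 40/800 on RV770 to 96/1280 in the proposal. A quick check with exact fractions, assuming RV770's 40 TMUs and 800 SPs as the baseline; it also shows that 24 ROPs on 384 bits keeps RV770's 16 bits of bus per ROP.

```python
from fractions import Fraction


def tmu_per_sp(tmus: int, sps: int) -> Fraction:
    return Fraction(tmus, sps)


def bits_per_rop(bus_bits: int, rops: int) -> Fraction:
    return Fraction(bus_bits, rops)


RV770 = {"sps": 800, "tmus": 40, "rops": 16, "bus": 256}
PROPOSAL = {"sps": 1280, "tmus": 96, "rops": 24, "bus": 384}  # the 24 ROP / 384bit proposal

if __name__ == "__main__":
    ratio = (tmu_per_sp(PROPOSAL["tmus"], PROPOSAL["sps"])
             / tmu_per_sp(RV770["tmus"], RV770["sps"]))
    print(ratio)  # 3/2, i.e. 50% more texture units per SP
    print(bits_per_rop(RV770["bus"], RV770["rops"]),
          bits_per_rop(PROPOSAL["bus"], PROPOSAL["rops"]))  # 16 16
```

By the same measure, simply tripling RV730 (960 SPs, 96 TMUs) would double the TMU/SP ratio rather than raise it by 50%.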


I just don't know how you got 330 or 350? An exact 50% increase on a 55nm RV770 would be 390mm. That's not including the 2 more SP clusters and 50% more texture units you ignored, on top of the 50% increase when scaling to 24 ROPs with 384 bits. All added together it's more like 500mm, if not more, on the same process. Since perfect scaling never happens, 300mm seems about right, if not a little bigger, on 40nm.

The 330/350 figure was a guess. It's their second chip (after RV740) on 40nm, so they are unlikely to get it as tight as what they are currently doing on 55nm...which is really 65nm, as only some parts are shrunk. I think their first 65nm chip was released back in early 2007 or something...RV630 (i.e. the 2600) maybe?

With large chips you also have to add more redundancy, otherwise the yield gets too low. Also, I am thinking the number of power pins will have to increase quite a bit with so many logic transistors in such a small area. Finally, as the product is planned for the second half of next year, I am wondering if they will need to do something about RoHS so they can sell it in Europe in 2010, i.e. TechReport, second-to-last paragraph.

A bad guess, considering 50% more than 260mm is 390mm. Yet you completely ignored my 50% higher texture ratio and the 2 more SP clusters just to give a falsified answer. A 384 bit bus with 24 ROPs isn't really out of the question considering the scale of the chip. It seems to me it fits perfectly, even with more TMUs and SPs added, just like they did with RV770, which you disagreed with from the get-go. Considering neither you nor Mr. Spadge posts here often, I find it odd. You seem to be infatuated with 32 ROPs on GDDR5 with a 256bit bus just to prove me wrong? Well, I gave your idea a thought, and 32 ROPs on a 256bit bus doesn't seem logical without raising everything else, which would make it bigger than the one I proposed, and starved for bandwidth.
 

rjc

Member
Sep 27, 2007
Originally posted by: Azn
This is a low end part. They don't talk about a die shrink of RV770 on a 128bit bus with GDDR5. They are speculating on the next-generation RV740 core, not an RV770 die shrink on a 128bit bus. You've pretty much presented the rest of your speculation as fact when it isn't the case.

OK, to do this I need to make an assumption: that ATI's first 40nm chip will be a straight die shrink. They won't alter anything unless it can't be avoided.

Looking at ATI's current designs we have from most recent the RV730, RV770 and RV710. They are all at 55nm, so shrinking to 40nm:
RV730 146mm2 -> 77mm2
RV770 256mm2 -> 135mm2
RV710 73mm2 -> 38.6mm2
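These figures match an ideal optical shrink, where die area scales with the square of the feature-size ratio, (40/55)² ≈ 0.53. A minimal sketch assuming perfect scaling, which real shrinks rarely reach:

```python
SCALE = (40.0 / 55.0) ** 2  # ideal optical shrink: area scales with feature size squared

DIES_55NM = {"RV730": 146.0, "RV770": 256.0, "RV710": 73.0}  # mm², from the post


def shrink_area(area_mm2: float, factor: float = SCALE) -> float:
    return area_mm2 * factor


if __name__ == "__main__":
    print(f"area factor {SCALE:.3f}")  # 0.529
    for name, area in DIES_55NM.items():
        print(f"{name} {area:.0f}mm2 -> {shrink_area(area):.1f}mm2")
```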

Shrinking RV710 really doesn't make much sense, as the resulting die is just too small; it's IGP territory, and they would struggle to fit a 64bit bus on the chip. That leaves 2 choices: push RV730 down to entry level or RV770 down to mainstream level. In both cases, due to pad limiting, the memory interface would likely have to be halved, which saves some die area. These savings would likely be eaten up by inefficiencies in the shrink.

What breaks the tie between these 2 choices? Not much really, but the codename 'RV740' is a bit of a giveaway, strongly suggesting the resulting part will fit between the RV770 and RV730 in performance. That makes it much more likely to be the RV770 shrink + 128bit bus at 135mm2. And hence its code name 'little dragon'

Apparently it has taped out, with results showing not much better performance or power savings compared to its parent, but good die space savings (i.e. the shrink got reasonably close to the numbers the maths predicted).

Sources are (the usual suspects):
Rage3d
Beyond3d
VR-Zone
(Sorry, can't be bothered linking to individual posts within those threads at the moment)

1.5x texture ratio is the first thing I mentioned. I don't know how you missed it. Perhaps you should read before replying. It might help. :p You seem like a smart dude; you can easily do the math yourself to fill in the blanks. The post you replied to had 24 ROPs, 96 TMUs and a 384bit bus. If you do the simple math, it has a 50% higher texture ratio than RV770. Why are you comparing to G80? It's an entirely different chip on a different process!
OK, so you want me to work out how big an RV770 with 1.5x textures + 384bit bus would be? As it's a bit time consuming to get an RV770 die shot, calculate the area of each feature and add them all up, maybe I'll try a different way?

Luckily the just released RV730 has rebalanced the shaders to 320 with 32 textures, 8 ROPs and a 128bit bus!

Multiplying this by 3x gives 960sp's, 96tmu, 24rops and 384bit bus. From above area at 55nm is 146mm2 so x 3 = 438mm2. Hmmm smaller than G80 or new GT206 at around 480mm2 or so(ouch!).

Shrinking 438mm2 -> 232mm2
A bit small perhaps; even allowing for inefficiencies in the shrink, they would need to add more stuff to get it up into the 300mm2-350mm2 range for the memory interface to work.
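The tripled-RV730 estimate as a sketch. It assumes area scales linearly with unit count (which would also triple blocks such as display and video logic that a real chip needs only once) and then applies the ideal (40/55)² shrink; the helper name `tile` is hypothetical.

```python
SCALE_55_TO_40 = (40.0 / 55.0) ** 2  # ideal optical shrink factor

RV730 = {"sps": 320, "tmus": 32, "rops": 8, "bus_bits": 128, "area_55nm": 146.0}


def tile(chip: dict, n: int) -> dict:
    """Hypothetical: n copies of every block, area scaled linearly."""
    return {key: value * n for key, value in chip.items()}


if __name__ == "__main__":
    big = tile(RV730, 3)
    print(big)  # 960 SPs, 96 TMUs, 24 ROPs, 384 bit, 438.0 mm² at 55nm
    print(round(big["area_55nm"] * SCALE_55_TO_40))  # 232 mm² at 40nm, ideal
```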

Considering neither you nor Mr. Spadge posts here often, I find it odd. You seem to be infatuated with 32 ROPs on GDDR5 with a 256bit bus just to prove me wrong? Well, I gave your idea a thought, and 32 ROPs on a 256bit bus doesn't seem logical without raising everything else, which would make it bigger than the one I proposed, and starved for bandwidth.
In the beyond3d thread linked above they talk (around page 8 or 9) about whether there is a need for more ROP power on the new chip.

For what seems a very long time GDDR3 has been around, its bandwidth only increasing slowly, so much so that people stopped talking about bandwidth directly and instead expressed a graphics card's power in terms of interface width (i.e. 64bit is unspeakable, 128bit for small children, 256 acceptable, 384 for real men, 512 for Chuck Norris). At least for the next year or so that shorthand won't work: with GDDR5 increasing bandwidth so much, we'll need to talk about bandwidth directly, at least for a while.
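To illustrate why bus width alone stops being a useful shorthand, here is peak bandwidth for three boards. The memory configurations are the commonly quoted reference clocks (900 MHz GDDR3 on the 8800 GTX, 900 MHz GDDR5 on the HD 4870, 1107 MHz GDDR3 on the GTX 280) converted to effective per-pin rates; they are not from the thread, so treat them as assumptions.

```python
def bandwidth_gb_s(data_rate_gbps: float, bus_bits: int) -> float:
    """Peak bandwidth: effective per-pin data rate x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_bits / 8.0


# Assumed reference memory configurations: (name, effective Gbps per pin, bus width).
CARDS = [
    ("8800 GTX (GDDR3)", 1.8, 384),
    ("HD 4870 (GDDR5)", 3.6, 256),
    ("GTX 280 (GDDR3)", 2.214, 512),
]

if __name__ == "__main__":
    for name, rate, bits in CARDS:
        print(f"{name:18s} {bits:3d} bit -> {bandwidth_gb_s(rate, bits):6.1f} GB/s")
```

The 256 bit GDDR5 board out-delivers the 384 bit GDDR3 one, which is exactly why "384 for real men" no longer says much on its own.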
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: rjc
Originally posted by: Azn
This is a low end part. They don't talk about a die shrink of RV770 on a 128bit bus with GDDR5. They are speculating on the next-generation RV740 core, not an RV770 die shrink on a 128bit bus. You've pretty much presented the rest of your speculation as fact when it isn't the case.

OK, to do this I need to make an assumption: that ATI's first 40nm chip will be a straight die shrink. They won't alter anything unless it can't be avoided.

Looking at ATI's current designs we have from most recent the RV730, RV770 and RV710. They are all at 55nm, so shrinking to 40nm:
RV730 146mm2 -> 77mm2
RV770 256mm2 -> 135mm2
RV710 73mm2 -> 38.6mm2

Shrinking RV710 really doesn't make much sense, as the resulting die is just too small; it's IGP territory, and they would struggle to fit a 64bit bus on the chip. That leaves 2 choices: push RV730 down to entry level or RV770 down to mainstream level. In both cases, due to pad limiting, the memory interface would likely have to be halved, which saves some die area. These savings would likely be eaten up by inefficiencies in the shrink.

What breaks the tie between these 2 choices? Not much really, but the codename 'RV740' is a bit of a giveaway, strongly suggesting the resulting part will fit between the RV770 and RV730 in performance. That makes it much more likely to be the RV770 shrink + 128bit bus at 135mm2. And hence its code name 'little dragon'

Apparently it has taped out, with results showing not much better performance or power savings compared to its parent, but good die space savings (i.e. the shrink got reasonably close to the numbers the maths predicted).

Sources are (the usual suspects):
Rage3d
Beyond3d
VR-Zone
(Sorry, can't be bothered linking to individual posts within those threads at the moment)

You are presenting speculation as if that's what it said. It will no doubt be a redesign, but an RV770 die shrink with a 128bit bus is not what it says.



1.5x texture ratio is the first thing I mentioned. I don't know how you missed it. Perhaps you should read before replying. It might help. :p You seem like a smart dude; you can easily do the math yourself to fill in the blanks. The post you replied to had 24 ROPs, 96 TMUs and a 384bit bus. If you do the simple math, it has a 50% higher texture ratio than RV770. Why are you comparing to G80? It's an entirely different chip on a different process!
OK, so you want me to work out how big an RV770 with 1.5x textures + 384bit bus would be? As it's a bit time consuming to get an RV770 die shot, calculate the area of each feature and add them all up, maybe I'll try a different way?

Luckily the just released RV730 has rebalanced the shaders to 320 with 32 textures, 8 ROPs and a 128bit bus!

Multiplying this by 3x gives 960sp's, 96tmu, 24rops and 384bit bus. From above area at 55nm is 146mm2 so x 3 = 438mm2. Hmmm smaller than G80 or new GT206 at around 480mm2 or so(ouch!).

Shrinking 438mm2 -> 232mm2
A bit small perhaps; even allowing for inefficiencies in the shrink, they would need to add more stuff to get it up into the 300mm2-350mm2 range for the memory interface to work.

Unfortunately 3x the specs of RV730 doesn't add up to what I proposed. Yet you still don't read my post, just to give a falsified answer. :disgust:

Did I mention 24 ROPs, 960 SPs, 96 TMUs and a 384 bit bus, or did I mention 24 ROPs, 1280 SPs and 96 TMUs on a 384bit bus? :eek: So what would 4 more SP clusters add to the size of the chip? I'm assuming it will be about 500mm. Add DX11 hardware and it would seem to fit well with a die shrink, wouldn't it? :lips:

Considering neither you nor Mr. Spadge posts here often, I find it odd. You seem to be infatuated with 32 ROPs on GDDR5 with a 256bit bus just to prove me wrong? Well, I gave your idea a thought, and 32 ROPs on a 256bit bus doesn't seem logical without raising everything else, which would make it bigger than the one I proposed, and starved for bandwidth.
In the beyond3d thread linked above they talk (around page 8 or 9) about whether there is a need for more ROP power on the new chip.

GDDR3 has been around for what seems like a very long time, with its bandwidth increasing only slowly, so much so that people stopped talking about bandwidth directly and instead expressed a graphics card's power in terms of the interface width (i.e. 64-bit is unspeakable, 128-bit for small children, 256-bit acceptable, 384-bit for real men, 512-bit for Chuck Norris). For the next year or so that shorthand won't work: with GDDR5 raising data rates so much, we will need to talk about the bandwidth directly, at least for a while.
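Peak bandwidth is just bus width in bytes times the per-pin data rate, which is why width alone stops being a useful shorthand once the data rate jumps. In this sketch the data rates (2.0 Gbps GDDR3, 3.6 Gbps GDDR5) are example values, not the specs of any particular card:

```python
# Sketch: peak bandwidth (GB/s) = (bus width in bits / 8) * per-pin data rate (Gbps).
# The data rates below are illustrative example values.

def peak_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits / 8 * gbps_per_pin


EXAMPLES = [
    ("256-bit GDDR3 @ 2.0 Gbps", 256, 2.0),
    ("512-bit GDDR3 @ 2.0 Gbps", 512, 2.0),
    ("256-bit GDDR5 @ 3.6 Gbps", 256, 3.6),
    ("384-bit GDDR5 @ 3.6 Gbps", 384, 3.6),
]

if __name__ == "__main__":
    for label, bits, rate in EXAMPLES:
        print(f"{label}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
```

With these example rates a 256-bit GDDR5 bus lands much closer to a 512-bit GDDR3 bus than to a 256-bit GDDR3 one.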

A little more than 16 ROPs wouldn't hurt, especially at 1920x1200 or 2560x1600, or in newer games that are much more demanding. This is why the 8800 GTX did so well for so many years: it was a well-balanced chip for its time, with the right amount of bandwidth, TMU count and SPs. Considering that current GPUs' TMU and SP counts are rising 2-3x, you need more bandwidth. GDDR5 changes a lot of that, of course, since it offers roughly 2x the per-pin bandwidth of GDDR3. But for the next generation of GPUs with nearly 100 TMUs, it wouldn't hurt to have a little more bandwidth.

It's kind of the same reason why the 9800 GTX, with its higher texture fill rate, is actually slower than the GTX 260 with its greater bandwidth, especially with AA, even in games that aren't shader-intensive. It starts to drag behind.
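The texel-rate versus bandwidth trade-off can be tabulated in a few lines. The card figures below are commonly quoted reference specs entered by hand (TMU count, core clock, bus width, effective memory data rate); treat them as assumptions rather than measurements. `Card` is a hypothetical helper class:

```python
# Sketch: texel rate vs. memory bandwidth for two cards.
# Assumption: the figures are hand-entered reference specs, not measurements.
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    tmus: int
    core_mhz: float
    bus_bits: int
    mem_gbps: float  # effective per-pin data rate

    def texel_rate_gts(self) -> float:
        return self.tmus * self.core_mhz / 1000

    def bandwidth_gbs(self) -> float:
        return self.bus_bits / 8 * self.mem_gbps


CARDS = [
    Card("9800 GTX", tmus=64, core_mhz=675, bus_bits=256, mem_gbps=2.2),
    Card("GTX 260", tmus=64, core_mhz=576, bus_bits=448, mem_gbps=1.998),
]

if __name__ == "__main__":
    for c in CARDS:
        print(f"{c.name}: {c.texel_rate_gts():.1f} GT/s, {c.bandwidth_gbs():.1f} GB/s, "
              f"{c.bandwidth_gbs() / c.texel_rate_gts():.2f} bytes per texel")
```

On these numbers the 9800 GTX has more texel throughput but far fewer bytes of bandwidth per texel, which fits the post's point about it falling behind once AA piles on memory traffic.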
 

rjc

Member
Sep 27, 2007
Sorry for the delay, Mr Azn, I have been away!

Originally posted by: Azn
Did I mention 24 ROPs, 960 SPs, 96 TMUs and a 384-bit bus, or did I mention 24 ROPs, 1280 SPs and 96 TMUs on a 384-bit bus? :eek: So what would 4 more SP clusters add to the size of the chip? I'm assuming it will be about 500mm². Add DX11 hardware and it would seem to fit well with a die shrink, doesn't it? :lips:
Here is a labelled die shot of the RV770 from Rage3D:

Very roughly, using a ruler, the shaders in total occupy about 25% of the die area, so:
25% × 256mm² × (1280 - 960)/800 = 25.6mm² for the extra 320 shaders.
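The ruler estimate, written out as a sketch. It uses the post's inputs (about 256mm² for RV770, shaders about 25% of the die, 800 SPs) and assumes shader area scales linearly with SP count; `extra_shader_area_mm2` is a hypothetical helper:

```python
# Sketch of the ruler-based estimate: area added by extra SPs.
# Inputs from the post; assumption: shader area scales linearly with SP count.

def extra_shader_area_mm2(die_mm2: float, shader_fraction: float,
                          base_sps: int, added_sps: int) -> float:
    shader_area = die_mm2 * shader_fraction  # mm^2 occupied by all base shaders
    return shader_area * added_sps / base_sps


if __name__ == "__main__":
    print(f"{extra_shader_area_mm2(256, 0.25, 800, 1280 - 960):.1f} mm^2")
```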

Therefore the total die size for your proposed chip at 55nm would be 438 + 25.6 ≈ 464mm². It would still need some extra stuff, say more redundancy or DX11 features, to get it over 500mm², so that when shrunk to 40nm the 384-bit bus would work.
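Putting the pieces together, this sketch adds the two 55nm figures from the thread and applies an ideal 55nm to 40nm shrink to both the estimate and a 500mm² padded version. Ideal (feature-size squared) scaling is an assumption that real shrinks don't achieve, so actual 40nm areas would come out larger; `ideal_shrink` is a hypothetical helper:

```python
# Sketch: total 55nm estimate and its ideal shrink to 40nm.
# Assumptions: 438 mm^2 and 25.6 mm^2 from earlier in the thread,
# and ideal (to_nm / from_nm)^2 area scaling.

def ideal_shrink(area_mm2: float, from_nm: float = 55, to_nm: float = 40) -> float:
    return area_mm2 * (to_nm / from_nm) ** 2


if __name__ == "__main__":
    total_55 = 438 + 25.6
    for label, area in [("proposed", total_55), ("padded to 500", 500.0)]:
        print(f"{label}: {area:.0f} mm^2 at 55nm -> ~{ideal_shrink(area):.0f} mm^2 at 40nm")
```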