Looks like an accurate slide for 6970/Cayman specs

dangerman1337

Senior member
Sep 16, 2010
333
5
81
85536385.jpg


http://bbs.expreview.com/viewthread.php?tid=37918&from=recommend_f

Doesn't seem like that much of an improvement, judging from AMD's new pricing and positioning strategy, though if the rumors are true about the 69** series having some new architecture in it, that should hopefully give it a good performance boost over the 58** series.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Well, if it's true that the "off-chip buffering" is there to help make the ROPs more effective, then even if there are no more of them than the 5870 has, it should still be an increase compared to the older series.

30 SIMD engines is a 50% increase...hmm.

160 GB/s of memory bandwidth seems lower than I'd have expected <_<'.

I thought these chips were supposed to be 5.5 GHz - 6.0 GHz? Why run them slower than that? To save on power? To give good overclocking headroom? I was expecting to see them use: 256 x 1.500 / 8 x 4 = 192 GB/s.
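For reference, the GDDR5 bandwidth arithmetic above can be sketched as a quick Python check (my own throwaway helper; the 5.0 GT/s effective rate implied by the slide's 160 GB/s is an inference, not a confirmed spec):

```python
def gddr5_bandwidth_gbs(bus_width_bits, effective_rate_gts):
    """Peak GDDR5 bandwidth: bus width (bits) x effective transfer rate (GT/s),
    divided by 8 bits per byte."""
    return bus_width_bits * effective_rate_gts / 8

# 6.0 GT/s effective (1.5 GHz memory clock, quad-pumped GDDR5: 1.5 x 4):
print(gddr5_bandwidth_gbs(256, 6.0))  # 192.0 GB/s
# The 160 GB/s on the slide would imply a 5.0 GT/s effective rate:
print(gddr5_bandwidth_gbs(256, 5.0))  # 160.0 GB/s
# HD 5870 for comparison (4.8 GT/s effective):
print(gddr5_bandwidth_gbs(256, 4.8))  # 153.6 GB/s
```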

20% more SPs, 20% more TUs.

I look forward to benchmarks :)



bbs.expreview.com quotes:

NV got a 6950 test card with 1536SP that was 10% worse than the 580, ha ha ~!~
NV is doomed to tragedy
Doubt that's true though.
 
Last edited:

tannat

Member
Jun 5, 2010
111
0
0
This looks very much like the recent rumors.

Much of the potential improvement may be hidden in that slide: the shaders being 4D versus the 5D in the HD5870. The shader count has in this case increased by 50%.

Multiply this by the much-improved shader utilization already seen with Barts (224 shader groups versus 320 in the HD5870, with close performance) and you may have a monster, with an improvement of anything between 30% and 100% over the HD5870.

Will be interesting.
 

dangerman1337

Senior member
Sep 16, 2010
333
5
81
I just hope that the 69** is more than a stopgap for the 7*** series like the 68** is. But then again, there are rumors floating around that the 69** is at least partially a new architecture as well.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
memory bandwidth starvation, anyone?


If it's only 160 GB/s, yes... It'll be the "norm" to OC the hell out of the memory when you get these cards.

People were expecting, because of the rumors of 5.5 - 6.0 GHz chips being used, that we would see:

(5.5 GHz / 4 = 1375 MHz)
6950: 256 x 1.375 / 8 x 4 = 176 GB/s memory bandwidth

(6.0 GHz / 4 = 1500 MHz)
6970: 256 x 1.500 / 8 x 4 = 192 GB/s memory bandwidth


160 GB/s looks to be on the low side. IF that's the real figure, then hopefully the memory on them can OC well; they might be running at slower speeds to make the cards use less power?
 

blastingcap

Diamond Member
Sep 16, 2010
6,654
5
76
Nah, I was expecting lower than 192 for sure. What they really needed was a 384-bit bus. But maybe I'm wrong, and the main bottleneck is somewhere else. I guess we'll all find out soon enough. 2 polys/clock looks interesting, I wonder if they have 2-way geometry engines rather than 16-way like NV has with its top GPU.

If it's only 160 GB/s, yes... It'll be the "norm" to OC the hell out of the memory when you get these cards.

People were expecting, because of the rumors of 5.5 - 6.0 GHz chips being used, that we would see:

(5.5 GHz / 4 = 1375 MHz)
6950: 256 x 1.375 / 8 x 4 = 176 GB/s memory bandwidth

(6.0 GHz / 4 = 1500 MHz)
6970: 256 x 1.500 / 8 x 4 = 192 GB/s memory bandwidth


160 GB/s looks to be on the low side. IF that's the real figure, then hopefully the memory on them can OC well; they might be running at slower speeds to make the cards use less power?
 

Borealis7

Platinum Member
Oct 19, 2006
2,914
205
106
I'm more impressed by the "2 polygons per clock" line. That could have a big impact.
 

Lonyo

Lifer
Aug 10, 2002
21,939
6
81
One would think that if they have only increased memory bandwidth slightly, it's not a serious limiting factor in performance.
It wouldn't make much sense to release a product which was totally memory constrained when faster chips are available (and being used), unless they didn't manage to make their memory controller run well enough.
 
Feb 19, 2009
10,457
10
76
Speculating, but I think it's one of two possibilities:
1. The memory controller doesn't like to operate at crazy GDDR5 speeds (drains too much power).

2. It's not bandwidth-starved, since the new architecture improves efficiency. Drop some speed on the VRAM to lower TDP.

Quote from Napoleon (translated): "NV is in for a tragedy... the cards they got hold of are 1536SP ones, just like the 900SP HD6870 samples back then... AMD is doing a really good job of keeping things secret now: the cards they send out all have SP units deliberately disabled via the BIOS, and the full-SP BIOS only goes out just before the official launch." Meaning AMD has been really careful about leaks, to prevent NV from finding out the performance targets of the new architecture - even giving false-info BIOSes to AIBs in sample cards.

According to an AIB source/rumor, Cayman PRO is giving the GTX 580 a good run for its money. Huang has told NV partners to prep high factory-OC GTX 580s to rain on AMD's launch.

Edit: "The HD6970 pairs each group of 5 SIMD cores with its own tessellator" (translated) = six independent tessellators, one per 5 SIMD cores. I guess that's what AMD was referring to with the scalable tessellation.
 
Last edited:

blastingcap

Diamond Member
Sep 16, 2010
6,654
5
76
Edit: "The HD6970 pairs each group of 5 SIMD cores with its own tessellator" (translated) = six independent tessellators, one per 5 SIMD cores. I guess that's what AMD was referring to with the scalable tessellation.

That ought to make Scali happy. Or at least happier than when AMD had just one.

As for OC'd GTX 580 vs 6970, I hope we don't get another repeat of the GTX 460 FTW situation. Either use it with asterisks and explanations, or not at all. Even if an OC'd GTX 580 beats a stock 6970, that won't be considered a legitimate win by any reviewer that matters. I sorta doubt that the 6950 will give a GTX 580 a run for its money, though. I think the 6970 will have a hard time beating a GTX 580 as it is. I'm expecting them to trade blows.

But I could be entirely wrong.

Competition is good. :)
 
Last edited:

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
One would think that if they have only increased memory bandwidth slightly, it's not a serious limiting factor in performance.
It wouldn't make much sense to release a product which was totally memory constrained when faster chips are available (and being used), unless they didn't manage to make their memory controller run well enough.


That is what I fear might have happened, lmao.

There is a reason why Nvidia doesn't use a small bus running at insane speeds with fast memory: it's hard to get working. It saves space, so you get better performance/mm^2, which means more profit per wafer. Which means that if you can do it, it doesn't make sense not to.

If AMD has chips that can run at 6 GHz used on these, and the product isn't running at those speeds, there must be a reason for it:
1) The added bandwidth isn't worth the increased power usage?
2) They had problems getting the unit running at those speeds.

I frankly think 2 is more likely.
However, we won't know whether these cards are really bandwidth-starved until we see benchmarks.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Edit: "The HD6970 pairs each group of 5 SIMD cores with its own tessellator" (translated) = six independent tessellators, one per 5 SIMD cores. I guess that's what AMD was referring to with the scalable tessellation.


o_O holy hell...

6 tessellator units, so we'll likely see 6x or more the tessellation power of the older 5xxx cards.

And the off-chip buffering thingy is also there to help with tessellation performance. These cards should have a LOT better tessellation than the older cards.

*edit:
I wonder if this means these cards will be beating the GTX 5xx cards in tessellation performance?
 

Hauk

Platinum Member
Nov 22, 2001
2,808
0
0
Not sure if this is possible to ballpark, but can someone speculate how much cheaper (per unit) a 256-bit design is compared to 384-bit or higher?
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
Should be an interesting product when released, and may be very, very competitive with GF110. I still firmly believe, though, that delays are indicative of problems. If the chip has been taped out for some time, a company doesn't delay because of possible supply shortages - it's better to release what supply is available and spread as much positive info about the product as possible rather than keep everything a secret.
 

Ares1214

Senior member
Sep 12, 2010
268
0
0
This slide has been out a LONG time, or at least something very much like it. I'd have to dig months back to find it; back then it was dismissed as a rumor, since it didn't have the 192 GB/s bandwidth. This looks more like 6950 specs.
 

Daedalus685

Golden Member
Nov 12, 2009
1,386
1
0
Some quick maths:

1920/30/4 = 16
1920/30/5 = 12.8

The card has to be a new 4D architecture for the numbers to be correct, as you can't split fractions of shader groups across different SIMD cores.

Thus, if we assume the 4D groups perform equal to the old 5D groups (give or take), we are looking at an increase from 320 to 480 groups, or 50% more shading power.

If the 5870 was memory-bottlenecked (I don't think it was, but many seemed to feel it was at least held back), this is 50% more shaders with almost no increase in memory BW. Seems crazy to me, and could be a problem.
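The divisibility argument above can be checked mechanically (a throwaway sketch; 1920 SPs and 30 SIMDs are the slide's rumored numbers, and the "4-wide performs like 5-wide" assumption is the thread's, not a confirmed fact):

```python
sps, simds = 1920, 30               # rumored Cayman shader count and SIMD count
sps_per_simd = sps / simds          # 64.0 SPs per SIMD
print(sps_per_simd / 4)             # 16.0 -> whole VLIW4 groups per SIMD: consistent
print(sps_per_simd / 5)             # 12.8 -> fractional VLIW5 groups: inconsistent
# Group counts, assuming a 4-wide group roughly matches an old 5-wide group:
print(sps / 4)                      # 480 groups, vs Cypress's 1600 / 5 = 320
print((sps / 4) / (1600 / 5) - 1)   # 0.5 -> 50% more shading power
```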
 
Last edited:

Lonyo

Lifer
Aug 10, 2002
21,939
6
81
Some quick maths:

1920/30/4 = 16
1920/30/5 = 12.8

The card has to be a new 4D architecture for the numbers to be correct, as you can't split fractions of shader groups across different SIMD cores.

Thus, if we assume the 4D groups perform equal to the old 5D groups (give or take), we are looking at an increase from 320 to 480 groups, or 50% more shading power.

If the 5870 was memory-bottlenecked (I don't think it was, but many seemed to feel it was at least held back), this is 50% more shaders with almost no increase in memory BW. Seems crazy to me, and could be a problem.

The number of shader groups doesn't matter, does it?
Having more shader groups just means (assuming there's still a split between basic and complex shaders, such as the 2+2 people have talked about) that the lowest-end performance would increase, meaning it's better in worst-case scenarios (for its shaders), but that wouldn't suddenly require more memory bandwidth. The increase in total shaders plus any increase in clock speed would make the difference in requiring more bandwidth under "optimal" conditions where it performs highest.

http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/page3.asp
The HD5870 wasn't really memory-bandwidth-bound; overclocking the RAM gives minor gains that don't even come close to scaling with clock speed. At higher resolutions, 2GB of RAM will do more than faster RAM anyway, assuming it ships as a 2GB card.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
96 Texture Units / 30 SIMDs = 3.2 Texture Units per SIMD

Not possible, unless the Texture Units will not be part of the SIMDs anymore
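AtenRa's divisibility check, spelled out (my own throwaway sketch; 96 TUs and 30 SIMDs are the slide's rumored figures):

```python
texture_units, simds = 96, 30  # rumored Cayman figures from the slide
print(texture_units / simds)   # 3.2 -> non-integer, so TUs can't live inside the SIMDs
# For comparison, Cypress (HD 5870): 80 TUs across 20 SIMD engines
print(80 / 20)                 # 4.0 -> a clean 4 TUs per SIMD
```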
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/page4.asp

Crysis is still one of the most demanding graphics tests we've got, and here we see a definite performance advantage in favor of graphics core over memory overclocking. Bumping up the graphics core's speed 9% to 930MHz yielded a performance improvement of 5% in most of our tests; only at 2560x1600 does performance trail off. OC'ing the memory by the same 9% yielded a performance gain of just 2%.
The 5870 seemed to benefit more from a given percentage increase in GPU core clock than from the same percentage increase in memory clock.

Still, 160 GB/s doesn't seem like much of an increase over the 5870's 153.6 GB/s.

A ~4% increase in memory bandwidth, when you'll have at least 20% more shaders? It does stand to reason that the 6970 will be more memory-bandwidth-bound than a 5870 is.

Is 160 GB/s enough? Hard to say... Nvidia is using 192 GB/s on their 580s, and I don't think they'd do so without reason.

Does the "off-chip buffering" for tessellation affect this? I have no idea whether stuff like tessellation takes up memory bandwidth. *IF* it does, and that traffic is no longer taking up bandwidth, then granted, the 160 GB/s may be worth more than just a ~4% increase.
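For what it's worth, the mismatch described above can be put in numbers (a quick sketch using the rumored figures from this thread; none of these are confirmed specs):

```python
# Cypress (HD 5870) vs rumored Cayman (HD 6970)
bw_old, bw_new = 153.6, 160.0   # memory bandwidth, GB/s
sp_old, sp_new = 1600, 1920     # stream processor counts
bw_gain = (bw_new / bw_old - 1) * 100
sp_gain = (sp_new / sp_old - 1) * 100
print(f"bandwidth: +{bw_gain:.1f}%")  # ~+4.2%
print(f"shaders:   +{sp_gain:.1f}%")  # ~+20.0%
```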
 
Last edited:

Daedalus685

Golden Member
Nov 12, 2009
1,386
1
0
The number of shader groups doesn't matter, does it?
Having more shader groups just means (assuming there's still a split between basic and complex shaders, such as the 2+2 people have talked about) that the lowest-end performance would increase, meaning it's better in worst-case scenarios (for its shaders), but that wouldn't suddenly require more memory bandwidth. The increase in total shaders plus any increase in clock speed would make the difference in requiring more bandwidth under "optimal" conditions where it performs highest.

http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/page3.asp
The HD5870 wasn't really memory-bandwidth-bound; overclocking the RAM gives minor gains that don't even come close to scaling with clock speed. At higher resolutions, 2GB of RAM will do more than faster RAM anyway, assuming it ships as a 2GB card.

The 4D was rumored to perform on par with the 5D; I am merely guessing that it would then require as much memory bandwidth to feed it to get the same performance.

At any rate, if the 4D is even close to the 5D and these stats are correct, the shading power of these cards is going to be massively increased over the 5870.


As for the texture units, there have been several rumors floating around that they are decoupled from the SIMDs this time around (such that the 6950 will not have the same ratio of texture units to shaders as the 6970). But then again, it could just be a sign that the numbers are poorly faked, and not that the decoupling rumor was correct.
 

chewietobbacca

Senior member
Jun 10, 2007
291
0
0
Not sure if this is possible to ballpark, but can someone speculate how much cheaper (per unit) a 256-bit design is compared to 384-bit or higher?

Apparently the memory bus is the hardest thing to scale down. That is, even if we pretend that shaders scale down linearly with process size, the memory bus wouldn't scale linearly. Thus it costs a lot more to do 384-bit than 256-bit (not to mention you'd have to add more memory chips, and PCB complexity increases).

So? 512 cores are fewer than 1600, and yet the GTX580 still destroys the HD5870.
Numbers aren't everything, unless the number is FPS.

Actually, AMD is 1600/5 = 320 SIMDs

The number of shader groups doesn't matter does it?
Having more shader groups just means (assuming there's a split between basic and complex shaders still such as the 2+2 people have talked about) that the lowest end performance would increase, meaning it's better in worst case scenarios (for its shaders), but that wouldn't suddenly require more memory bandwidth. The increase in total shaders + increase in clockspeed (if any) would make the difference of requiring more bandwidth for "optimal" conditions where it performs highest.

http://www.firingsquad.com/hardware/ati_radeon_5870_overclocking/page3.asp
HD5870 wasn't really memory bandwidth bound, overclocking RAM gives minor gains that don't even come close to scaling with clockspeed. At the higher resolutions, 2GB RAM will do more than faster RAM anyway, assuming it ships as a 2GB card.

Shader groups matter in real-world benchmarks because, apparently, it was extremely hard to feed AMD's 5-way configuration. In fact, it was rumored that internal benchmarks showed the 5-way configuration was only being utilized 80% of the time - meaning the 1600SP Cypress acted more like a 1280SP card. Synthetic benchmarks coded specifically to use all 5, however, were able to utilize all 1600.

Going to 4D apparently puts it VERY close to the equivalent in 5D - meaning that a 320-SIMD 4D configuration is very close to the currently existing 320-SIMD 5-way config.

Thus SIMD count matters a lot: going from 320 to 480 is a 50% increase.

And apparently a more authentic slide:

amdradeonhd6900slide6.jpg