Richland ES now out in the wild.


ShintaiDK

Lifer
Apr 22, 2012
Well, from Llano to Trinity, AMD just made the IGP even more bandwidth-starved... what really bumped the performance was the improvements in bandwidth sharing.

From VLIW4 to GCN, the increase will be bigger...
at the same clocks/shaders/bandwidth: http://www.inpai.com.cn/doc/hard/168475_30.htm

Add further improvements in the IMC, and it is a decent bump
(but it will probably get owned by GT3)

IMC improvements? Where?

It looks awfully familiar... with these numbers:
http://www.anandtech.com/show/4061/amds-radeon-hd-6970-radeon-hd-6950/13

Not to mention the GCN cores are clocked 33% higher, but they certainly don't perform 33% better. The TDP of the two cards is around the same, though one is 40nm and the other 28nm. GCN actually looks like a disaster. The hunt for GPGPU capabilities shows its pain.

If anything, GCN cores are slower, but that is offset by the smaller process node, which lets you add more shaders, higher clocks, etc.

Your link only confirmed my expectations.
 

Idontcare

Elite Member
Oct 10, 1999
I thought they validated the IMC to work with higher memory clock speeds? It may not have increased efficiency, but the top-end bandwidth allowed at stock settings improves, right?

Naturally, if you are comparing an OC'ed memory setting (essentially OC'ing the IMC) to a stock certified memory setting, then you probably won't see bandwidth improvements unless the IMC was dramatically reworked.
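The point about a higher validated memory speed raising stock top-end bandwidth can be made concrete with the usual peak-bandwidth formula: channels × bus width × transfer rate. A minimal sketch, assuming a dual-channel DDR3 setup with 64 bits per channel; the speed grades below are illustrative only and are not a claim about what Trinity or Richland is certified for:

```python
def peak_bandwidth_gbs(channels: int, bus_bits_per_channel: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal, 10^9 bytes per second)."""
    bytes_per_transfer = channels * bus_bits_per_channel / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# Hypothetical example speed grades, not official APU specifications.
for speed in (1600, 1866, 2133):
    print(f"DDR3-{speed}: {peak_bandwidth_gbs(2, 64, speed):.1f} GB/s")
```

On an APU this peak is shared between the CPU cores and the IGP, which is why the memory speed the IMC is validated for matters so much for graphics.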
 

Abwx

Lifer
Apr 2, 2011
I see benchmarks used as arguments, but I fail to see the link with GCN,
since the Radeon 6xxx series doesn't use this architecture...
 

Hitman928

Diamond Member
Apr 15, 2012
ShintaiDK said:
IMC improvements? Where?

It looks awfully familiar... with these numbers:
http://www.anandtech.com/show/4061/amds-radeon-hd-6970-radeon-hd-6950/13

Not to mention the GCN cores are clocked 33% higher, but they certainly don't perform 33% better. The TDP of the two cards is around the same, though one is 40nm and the other 28nm. GCN actually looks like a disaster. The hunt for GPGPU capabilities shows its pain.

If anything, GCN cores are slower, but that is offset by the smaller process node, which lets you add more shaders, higher clocks, etc.

Your link only confirmed my expectations.

I think you missed that the graphs show they downclocked the GCN card to the same speed as the VLIW4 card, so the improvement you see is clock for clock: GCN appears to have substantial gains over VLIW4.
 

ShintaiDK

Lifer
Apr 22, 2012
Hitman928 said:
I think you missed that the graphs show they downclocked the GCN card to the same speed as the VLIW4 card, so the improvement you see is clock for clock: GCN appears to have substantial gains over VLIW4.

The point is that you are living off the 40->28nm shrink. Take an HD6870, shrink it to 28nm, expand its transistor count from 1.7B to 2.8B, and buff the memory speed. Suddenly the HD7870 doesn't look so good, and its speed increase comes only from the expanded transistor budget.

Trinity and Richland use the same process node, and expanding the transistor count of a 242mm² chip won't be easy.

VLIW5 to VLIW4 benefited from dropping the underutilized (bogus) fifth shader, so the number of VLIW units went from 320 on the HD5870 to 384 on the HD6970. That is also why performance was a bit hit-or-miss depending on what you did, and why the VLIW4 cards suffered from throttling due to overheating.
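How much extra transistor budget a 40nm to 28nm move can buy is easy to estimate under an idealized linear shrink, where density scales with the square of the feature-size ratio. A rough sketch; the square-law model is a simplifying assumption (real nodes rarely scale this cleanly), and the 1.7B and 2.8B counts are the HD6870 and HD7870 figures quoted in the post:

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized transistor-density gain if every feature shrinks linearly.

    This is an upper-bound style estimate for illustration, not a real node figure.
    """
    return (old_nm / new_nm) ** 2

# HD6870 -> HD7870 transistor counts as quoted in the post.
transistor_growth = 2.8e9 / 1.7e9

print(f"Ideal 40nm -> 28nm density gain: {ideal_density_gain(40, 28):.2f}x")
print(f"Quoted transistor growth: {transistor_growth:.2f}x (+{(transistor_growth - 1) * 100:.0f}%)")
```

The comparison only frames the argument; it ignores die area, clocks, and memory speed, which also differ between the two cards.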
 

wlee15

Senior member
Jan 7, 2009
ShintaiDK said:
IMC improvements? Where?

It looks awfully familiar... with these numbers:
http://www.anandtech.com/show/4061/amds-radeon-hd-6970-radeon-hd-6950/13

Not to mention the GCN cores are clocked 33% higher, but they certainly don't perform 33% better. The TDP of the two cards is around the same, though one is 40nm and the other 28nm. GCN actually looks like a disaster. The hunt for GPGPU capabilities shows its pain.

If anything, GCN cores are slower, but that is offset by the smaller process node, which lets you add more shaders, higher clocks, etc.

Your link only confirmed my expectations.

No, it doesn't. A 0% increase in shader count, core frequency, ROPs, and memory bandwidth resulted in a 14% increase in performance, and that is before the 20% performance increase from the 12.11 drivers.
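Successive percentage gains multiply rather than add. A small sketch, assuming the driver gain applies on top of the clock-for-clock gain in the same workload (the thread does not say the two figures were measured on identical tests):

```python
def compound(*gains_percent: float) -> float:
    """Combine successive percentage gains multiplicatively; returns the total gain in percent."""
    total = 1.0
    for g in gains_percent:
        total *= 1 + g / 100
    return (total - 1) * 100

# Clock-for-clock architectural gain, then the driver gain.
print(f"{compound(14, 20):.1f}%")
```

So a 14% gain followed by a 20% gain is roughly a 37% combined gain, not 34%.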
 

Hitman928

Diamond Member
Apr 15, 2012
ShintaiDK said:
The point is that you are living off the 40->28nm shrink. Take an HD6870, shrink it to 28nm, expand its transistor count from 1.7B to 2.8B, and buff the memory speed. Suddenly the HD7870 doesn't look so good, and its speed increase comes only from the expanded transistor budget.

VLIW5 to VLIW4 benefited from dropping the underutilized (bogus) fifth shader, so the number of VLIW units went from 320 on the HD5870 to 384 on the HD6970. That is also why performance was a bit hit-or-miss depending on what you did, and why the VLIW4 cards suffered from throttling due to overheating.


What they compared was the limited-edition 6930. BTW, the 6870 was still VLIW5. The 69xx series has 2.64B transistors compared to 2.8B on the 78xx series; that's an increase of 6%, which is nowhere near what a full shrink can give you in the same die area. So a 6% increase in transistor count provided a 15-20% increase in performance. GCN is more efficient than VLIW4, not just for compute. Also, the fifth shader in VLIW5 wasn't bogus; it just wasn't used except in compute situations. That's why the 5xxx series can outperform the 6xxx series in most compute tasks when graphics performance is equal.
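The transistor and performance figures in this argument can be turned into a performance-per-transistor estimate. A sketch using only the numbers in the post (2.64B vs 2.8B transistors, a 15-20% performance gain); it ignores clocks and counts any disabled units as present, which is exactly the point disputed in the replies:

```python
def per_transistor_gain(perf_gain_pct: float, old_transistors: float, new_transistors: float) -> float:
    """Change in performance per transistor, in percent, relative to the older design."""
    transistor_ratio = new_transistors / old_transistors
    return ((1 + perf_gain_pct / 100) / transistor_ratio - 1) * 100

growth = (2.8e9 / 2.64e9 - 1) * 100
print(f"transistor growth: {growth:.1f}%")
for perf in (15, 20):
    print(f"{perf}% faster -> {per_transistor_gain(perf, 2.64e9, 2.8e9):.1f}% more performance per transistor")
```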

ShintaiDK said:
Trinity and Richland use the same process node, and expanding the transistor count of a 242mm² chip won't be easy.

I don't see anyone arguing otherwise. Richland, from what I've seen, is most likely a mildly tweaked Trinity to nudge up perf / watt, nothing more. If they do end up using GCN, you will see bigger gains in graphics performance and compute.
 

ShintaiDK

Lifer
Apr 22, 2012
Hitman928 said:
What they compared was the limited-edition 6930. BTW, the 6870 was still VLIW5. The 69xx series has 2.64B transistors compared to 2.8B on the 78xx series; that's an increase of 6%, which is nowhere near what a full shrink can give you in the same die area.

Hold on there, you are comparing a cut-down product with disabled parts to a full one in terms of transistor count? Take a 6970 then and shrink it to 28nm. Add whatever clock speed you can within 190W and see how it fares against an HD7870.

Without a shrink and its benefits, GCN doesn't look good at all.
 

inf64

Diamond Member
Mar 11, 2011
Shintai, shrinking transistors in a brand-new design is not a trivial task and you know it (unless you are Intel, of course). AMD did quite a good job with GCN considering the tech they had to work with.
 

Vesku

Diamond Member
Aug 25, 2005
If I had to choose between an integrated 7750 or an integrated VLIW4 transistor count equivalent, I'd take the 7750.
 

wlee15

Senior member
Jan 7, 2009
And Finally
2012-11-21 08:16:49 < Skyler_> BugMaster|work: it definitely doesn't do AVX2 and I didn't read that Steamroller supports AVX2 anywhere
2012-11-21 08:16:55 < Skyler_> AMD would have likely bragged about that if it did
2012-11-21 08:17:03 < Skyler_> I really doubt it because they didn't even shift to 256-bit execution units
2012-11-21 08:18:02 < MasterNobody> any way there is link to video drivers you can try but they from June so dunno if worse the try
2012-11-21 08:18:41 < BugMaster|work> *worth
2012-11-21 08:28:28 * Skyler_ reads
2012-11-21 08:30:02 < Skyler_> ah
2012-11-21 08:30:09 < Skyler_> note, I should mention that the shipping manifest is the thing that claimed Kaveri
2012-11-21 08:30:16 < Skyler_> I don't know if it said Kaveri anywhere on the actual chip/box
2012-11-21 08:30:48 < Skyler_> or wait. wow, I am dumb, it's a richland.
2012-11-21 08:31:13 < Skyler_> AHHHH I see what happened.
2012-11-21 08:31:22 < Skyler_> It said richland, I googled richland, I saw "richland is kaveri, okay"
2012-11-21 08:31:40 < Skyler_> but google seems to say that Richland *is* Kaveri?
2012-11-21 08:31:43 < Skyler_> aka Steamroller?
2012-11-21 08:31:53 < Skyler_> "AMD's Kaveri-based Richland APU [...]"
2012-11-21 08:32:48 < kshishkov> hasn't AMD shifted to some incomprehencible alphanumeric IDs for their products instead of random codenames?
2012-11-21 08:32:50 < Skyler_> This whole thing is really confusing, there seems to be a lot of contradictory information and I'm not even sure what AMD themselves claim.
2012-11-21 08:33:11 < Skyler_> AMD really does seem to be a mess ._.
2012-11-21 08:34:03 < kshishkov> don't worry, there are rumours it might cease to exist soon
2012-11-21 08:35:12 < Skyler_> For those reading these logs, I'm pretty confident they just sent me a refresh of Bulldozer. Benchmarks show that this 3.2ghz 4-core chip is almost exactly 1/4 + 5% as fast as a 3ghz 16-core bulldozer I've benched on, so I'm pretty confident there's zero speed improvement here
2012-11-21 08:35:17 < Skyler_> Steamrolle would be faster.
2012-11-21 08:35:26 < BugMaster|work> there was slide at http://www.donanimhaber.com/islemci...-ailesi-icin-2013te-guncelleme-gorunmuyor.htm that says that Richland is piledrive
2012-11-21 08:36:07 < Skyler_> It claims a 35W TDP though, which is extremely good -- I recall earlier that we were getting the same TDP off of ~2ghz parts, so it's probably the shrink.

If Richland really can stay at 3.2 GHz under load at a 35W TDP, then that is quite an improvement over Trinity.
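Skyler_'s "zero speed improvement" conclusion in the log can be sanity-checked with core-count and clock arithmetic. A rough sketch, assuming equal per-core, per-clock performance and perfect scaling across all 16 cores (neither is guaranteed), and reading "1/4 + 5%" as 5% above one quarter:

```python
def expected_ratio(cores_a: int, ghz_a: float, cores_b: int, ghz_b: float) -> float:
    """Throughput of chip A relative to chip B, assuming identical per-core,
    per-clock performance and perfect multithreaded scaling."""
    return (cores_a * ghz_a) / (cores_b * ghz_b)

# 4-core 3.2 GHz sample vs. a 16-core 3.0 GHz Bulldozer system, as described in the log.
expected = expected_ratio(4, 3.2, 16, 3.0)
# Assumption: "1/4 + 5%" means 5% above one quarter.
observed = 0.25 * 1.05
print(f"expected {expected:.4f}, observed {observed:.4f}, "
      f"per-core per-clock ratio {observed / expected:.3f}")
```

A per-core, per-clock ratio close to 1.0 (about 0.98 here) is consistent with the sample behaving like the existing core design rather than a faster one.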
 

Hitman928

Diamond Member
Apr 15, 2012
ShintaiDK said:
Hold on there, you are comparing a cut-down product with disabled parts to a full one in terms of transistor count? Take a 6970 then and shrink it to 28nm. Add whatever clock speed you can within 190W and see how it fares against an HD7870.

Without a shrink and its benefits, GCN doesn't look good at all.

A) Even if you take into account how many transistors are disabled, you still get much better efficiency from GCN, especially once you add another 10-20% improvement from the latest drivers.

B) You can look at lots of comparisons. Check out the 6850 versus the 7770. The 6850 has over twice the memory bandwidth, 50% more shaders, twice the ROPs, and 20% more texture units. The 6850 also has 17% more transistors. The 7770 has 17.6% higher clocks. With this, you would think the 6850 would crush the 7770, but it doesn't. In fact, the 7770 matches or beats the 6850 in many cases. Where it loses, it doesn't lose by nearly as much as it would if they were using the same architecture.

http://www.anandtech.com/show/5541/amd-radeon-hd-7750-radeon-hd-7770-ghz-edition-review/21
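The spec ratios above can be turned into a rough per-shader, per-clock estimate. A sketch using only the ratios quoted in the post; it deliberately ignores memory bandwidth, ROPs, texture units, and VLIW slot utilization, and the 92% input is just an example value:

```python
# Ratios quoted in the post.
shader_ratio_6850 = 1.50   # the 6850 has "50% more shaders"
clock_ratio_7770 = 1.176   # the 7770 has "17.6% higher clocks"

# Raw shader throughput (shaders x clock) of the 7770 relative to the 6850.
raw_7770_vs_6850 = clock_ratio_7770 / shader_ratio_6850

def per_shader_clock_efficiency(measured_ratio: float) -> float:
    """Work per shader per clock the 7770 needs, relative to the 6850, to reach measured_ratio."""
    return measured_ratio / raw_7770_vs_6850

print(f"raw throughput ratio: {raw_7770_vs_6850:.3f}")
print(f"at 92% of 6850 performance: {per_shader_clock_efficiency(0.92):.2f}x per shader per clock")
```

Under these simplifications, matching the 6850 with about 78% of its raw shader throughput implies each GCN shader does noticeably more work per clock.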

7770 versus 6850 (7770 performance as a percentage of the 6850):

Crysis Warhead: 79% 80%
Metro 2033: 92% 87%
Dirt 3: 107% 108%
Shogun 2: 98% 101%
Arkham City: 95% 98%
Portal 2: 87% 88%
Battlefield 3: 92% 94% 95%
Starcraft 2: 79% 93%
Skyrim: 156% 85%
Civ 5: 130%

Compute:
114%
112%
140%
258%

And this is before the driver improvements for GCN, which help the 7770 in 6 of these games, allowing it to overtake the 6850 in a couple where it lost at launch. Mostly, if the 7770 isn't bandwidth-limited by its half-sized memory controller and slower memory, it does quite well against the 6850 despite all the other advantages the 6850 has.
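A single summary figure for the gaming numbers listed above can be obtained with a geometric mean, a common way to average performance ratios. A sketch; treating every listed number as one equally weighted result is an assumption (games with several numbers presumably correspond to different settings), and the post itself does not compute an average:

```python
import math

# HD 7770 performance as a percentage of the HD 6850, copied from the gaming list.
results = {
    "Crysis Warhead": [79, 80],
    "Metro 2033": [92, 87],
    "Dirt 3": [107, 108],
    "Shogun 2": [98, 101],
    "Arkham City": [95, 98],
    "Portal 2": [87, 88],
    "Battlefield 3": [92, 94, 95],
    "Starcraft 2": [79, 93],
    "Skyrim": [156, 85],
    "Civ 5": [130],
}

def geometric_mean(values):
    """Geometric mean: the nth root of the product, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

ratios = [pct / 100 for pcts in results.values() for pct in pcts]
print(f"{len(ratios)} results, geometric mean {geometric_mean(ratios):.1%}")
```

The geometric mean keeps a single outlier such as the 156% Skyrim result from dominating the summary as much as it would in an arithmetic mean.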

C) I can show you evidence over and over and over again, but if you only want to believe what you want to believe, well, that's up to you.
 

ShintaiDK

Lifer
Apr 22, 2012
You are still comparing performance/TDP for 28nm vs 40nm. The 6850 would be clocked higher on 28nm. Again, without 28nm to the rescue, GCN doesn't look good.
 

Hitman928

Diamond Member
Apr 15, 2012
ShintaiDK said:
You are still comparing performance/TDP for 28nm vs 40nm. The 6850 would be clocked higher on 28nm. Again, without 28nm to the rescue, GCN doesn't look good.

I mentioned nothing of TDP. I'm comparing shader performance per clock, essentially. Because GCN gets more performance out of every shader, even if the 6850 were shrunk and clocked higher to make up for GCN's efficiency, it would still have more transistors and a higher clock at the same VDD, so power usage goes up to achieve the same performance, and you don't get the compute benefits of GCN.

You can continue to deny it, but GCN is a very efficient architecture, more efficient than anything AMD has put out in recent years.
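The power argument above rests on the standard first-order model of dynamic CMOS power, P ≈ αCV²f. A minimal sketch; the capacitance, voltage, and frequency ratios are hypothetical inputs rather than measurements of any card, and leakage power is ignored:

```python
def relative_dynamic_power(cap_ratio: float, volt_ratio: float, freq_ratio: float) -> float:
    """Dynamic power ratio new/old under P ~ a * C * V^2 * f.

    Each argument is a new/old ratio; the activity factor a is assumed unchanged.
    """
    return cap_ratio * volt_ratio ** 2 * freq_ratio

# Hypothetical: 17% more switched capacitance, 15% higher clock, same VDD.
print(f"{relative_dynamic_power(1.17, 1.0, 1.15):.3f}x power")
# Hypothetical: same case, but the higher clock also needs 5% more voltage.
print(f"{relative_dynamic_power(1.17, 1.05, 1.15):.3f}x power")
```

Power grows linearly with capacitance and clock at a fixed voltage, and quadratically once the voltage has to rise, which is why extra transistors plus extra clock speed are an expensive way to match per-clock efficiency.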
 

ShintaiDK

Lifer
Apr 22, 2012
Hitman928 said:
I mentioned nothing of TDP. I'm comparing shader performance per clock, essentially. Because GCN gets more performance out of every shader, even if the 6850 were shrunk and clocked higher to make up for GCN's efficiency, it would still have more transistors and a higher clock at the same VDD, so power usage goes up to achieve the same performance, and you don't get the compute benefits of GCN.

You can continue to deny it, but GCN is a very efficient architecture, more efficient than anything AMD has put out in recent years.

The HD6850 is essentially a 192-shader part; marketing just got to it. It's also why VLIW4 was faster: the fifth shader was underutilized, so you actually went from 320 to 384 with HD5870->HD6970. By the same convention, Nvidia could have called the GTX580 a 1024-SP part.
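The 192, 320, and 384 figures are VLIW unit (bundle) counts rather than the advertised shader counts. Dividing each card's advertised stream-processor count by its bundle width reproduces them. The ALU counts below are the cards' public specifications, not numbers stated in the thread, and the GTX580 comparison is not modeled here:

```python
def vliw_units(alus: int, slots_per_unit: int) -> int:
    """Number of VLIW units, given the advertised ALU ("stream processor") count."""
    if alus % slots_per_unit:
        raise ValueError("ALU count is not a multiple of the VLIW width")
    return alus // slots_per_unit

# Advertised stream-processor counts and VLIW widths for the cards named in the thread.
cards = {
    "HD 5870 (VLIW5)": (1600, 5),
    "HD 6970 (VLIW4)": (1536, 4),
    "HD 6850 (VLIW5)": (960, 5),
}

for name, (alus, width) in cards.items():
    print(f"{name}: {alus} ALUs -> {vliw_units(alus, width)} VLIW units")
```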
 

Hitman928

Diamond Member
Apr 15, 2012
ShintaiDK said:
The HD6850 is essentially a 192-shader part; marketing just got to it. It's also why VLIW4 was faster: the fifth shader was underutilized, so you actually went from 320 to 384 with HD5870->HD6970. By the same convention, Nvidia could have called the GTX580 a 1024-SP part.

Which doesn't contradict anything I've shown...

Could an architecture with no compute support perform better in games than GCN? Yes, but the problem is that's not the way the industry is headed. More and more games, as well as non-game applications, are starting to use compute functions on the GPU. However, that doesn't change the fact that even without the compute functions, GCN is a better architecture than VLIW.
 

Vesku

Diamond Member
Aug 25, 2005
I think ShintaiDK should take this VLIW4 > GCN talk to the Video Cards and Graphics forum. I haven't seen anyone float that idea there yet.
 

Idontcare

Elite Member
Oct 10, 1999

May as well just quote the relevant part here so folks see it upfront:

hansmoleman said:
Kabini Cinebench R10
Processor : AMD Eng Sample: 2M14F100J4460_17/14/08/06_9832
MHz :
Number of CPUs : 4
Operating System : WINDOWS 64 BIT 6.1.7601
Graphics Card : KB 4C 17W (N-1) (9832)
CINEBENCH R10
******************************
Rendering (Single CPU): 1568 CB-CPU
Rendering (Multiple CPU): 5653 CB-CPU
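The two Cinebench R10 scores above give a quick estimate of how well this 4-core Kabini sample scales across threads. A sketch; if the single-threaded run benefited from turbo, this ratio understates per-core scaling at equal clocks:

```python
single = 1568   # Rendering (Single CPU), CB-CPU
multi = 5653    # Rendering (Multiple CPU), CB-CPU
cores = 4       # "Number of CPUs : 4"

speedup = multi / single
efficiency = speedup / cores
print(f"speedup {speedup:.2f}x, scaling efficiency {efficiency:.0%}")
```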
 

Idontcare

Elite Member
Oct 10, 1999
wlee15 said:
And Finally
2012-11-21 08:16:49 < Skyler_> BugMaster|work: it definitely doesn't do AVX2 and I didn't read that Steamroller supports AVX2 anywhere
2012-11-21 08:16:55 < Skyler_> AMD would have likely bragged about that if it did
2012-11-21 08:17:03 < Skyler_> I really doubt it because they didn't even shift to 256-bit execution units
2012-11-21 08:18:02 < MasterNobody> any way there is link to video drivers you can try but they from June so dunno if worse the try
2012-11-21 08:18:41 < BugMaster|work> *worth
2012-11-21 08:28:28 * Skyler_ reads
2012-11-21 08:30:02 < Skyler_> ah
2012-11-21 08:30:09 < Skyler_> note, I should mention that the shipping manifest is the thing that claimed Kaveri
2012-11-21 08:30:16 < Skyler_> I don't know if it said Kaveri anywhere on the actual chip/box
2012-11-21 08:30:48 < Skyler_> or wait. wow, I am dumb, it's a richland.
2012-11-21 08:31:13 < Skyler_> AHHHH I see what happened.
2012-11-21 08:31:22 < Skyler_> It said richland, I googled richland, I saw "richland is kaveri, okay"
2012-11-21 08:31:40 < Skyler_> but google seems to say that Richland *is* Kaveri?
2012-11-21 08:31:43 < Skyler_> aka Steamroller?
2012-11-21 08:31:53 < Skyler_> "AMD's Kaveri-based Richland APU [...]"
2012-11-21 08:32:48 < kshishkov> hasn't AMD shifted to some incomprehencible alphanumeric IDs for their products instead of random codenames?
2012-11-21 08:32:50 < Skyler_> This whole thing is really confusing, there seems to be a lot of contradictory information and I'm not even sure what AMD themselves claim.
2012-11-21 08:33:11 < Skyler_> AMD really does seem to be a mess ._.
2012-11-21 08:34:03 < kshishkov> don't worry, there are rumours it might cease to exist soon
2012-11-21 08:35:12 < Skyler_> For those reading these logs, I'm pretty confident they just sent me a refresh of Bulldozer. Benchmarks show that this 3.2ghz 4-core chip is almost exactly 1/4 + 5% as fast as a 3ghz 16-core bulldozer I've benched on, so I'm pretty confident there's zero speed improvement here
2012-11-21 08:35:17 < Skyler_> Steamrolle would be faster.
2012-11-21 08:35:26 < BugMaster|work> there was slide at http://www.donanimhaber.com/islemci/haberleri/DH-Ozel-AMDnin-FX-islemci-ailesi-icin-2013te-guncelleme-gorunmuyor.htm that says that Richland is piledrive
2012-11-21 08:36:07 < Skyler_> It claims a 35W TDP though, which is extremely good -- I recall earlier that we were getting the same TDP off of ~2ghz parts, so it's probably the shrink.

If Richland really can stay at 3.2 GHz under load at a 35W TDP, then that is quite an improvement over Trinity.

Any chance you'll be amending your thread title so as to more accurately reflect what this was all about?

Edit: never mind, thanks OP for accommodating the request!
 

nismotigerwvu

Golden Member
May 13, 2004
ShintaiDK said:
Hold on there, you are comparing a cut-down product with disabled parts to a full one in terms of transistor count? Take a 6970 then and shrink it to 28nm. Add whatever clock speed you can within 190W and see how it fares against an HD7870.

Without a shrink and its benefits, GCN doesn't look good at all.

He is making a comparison between Pitcairn and Cayman, and not really referring to cut-down cards. http://www.anandtech.com/bench/Product/540?vs=548 Are you seriously going to argue that the 7870 is slower here? Are you that confident that a ~5% bump in transistors and 100MHz would close that gap? Also, are you willing to completely rule out that GCN might actually scale better and reach higher clocks given equal fabrication, versus what is essentially a five-and-a-half-year-old architecture? AMD is a pretty easy target right now, man; it's pretty tough to whiff that hard.
 

wlee15

Senior member
Jan 7, 2009
Idontcare said:
Any chance you'll be amending your thread title so as to more accurately reflect what this was all about?

Done

The Kabini ES looks pretty intriguing. Based on the OPN, it appears to run at 1.4 GHz with a 1.7 GHz turbo. So it's a 4-core, 17W part that matches a low-end 1.6 GHz, 800 MHz FSB Core 2 in single-threaded Cinebench R10, while in multithreaded it matches a 2.6 GHz Conroe Core 2 Duo and a 2.8 GHz Barcelona-based Athlon II.