ATI Northern Islands (6xxx) also at 40nm?


MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
The major difference between HD5xxx and HD4xxx is that HD5xxx has some type of error correcting memory.

Apparently instead of the memory clock failing it just keeps on retransmitting signals. This allows the user to increase the memory frequency even more, yet little performance is actually gained. I think the first Anandtech article on Cypress talks about this.

EDIT: Here is the article -->http://www.anandtech.com/video/showdoc.aspx?i=3643&p=12

Therefore using your example we cannot say memory speed isn't a limiting factor.
Possibly, but the linear nature of the performance curve would make one assume otherwise. There's a point after 1300MHz where performance stops increasing and actually starts to decrease, which I attribute to ECC kicking in. If memory speed was a bottleneck, even a small increase in frequency would add an approximately linear performance gain. But it doesn't, not even close actually.
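
One way to put a number on the "approximately linear gain" argument is to compare the percentage FPS change with the percentage clock change. This is only a minimal sketch: the function name `scaling_efficiency` and the sample figures are hypothetical and are not taken from any review. A result near 1.0 would suggest the clock being changed is the bottleneck; a result near 0 would suggest it isn't.

```python
def scaling_efficiency(fps_before, fps_after, clock_before, clock_after):
    """Fraction of a clock increase that shows up as an FPS increase."""
    clock_gain = clock_after / clock_before - 1.0
    if clock_gain == 0:
        raise ValueError("clock_before and clock_after must differ")
    fps_gain = fps_after / fps_before - 1.0
    return fps_gain / clock_gain

# Hypothetical numbers for illustration only, not measurements.
# Memory clock raised 10% (1200 -> 1320 MHz) in both cases.
bandwidth_bound = scaling_efficiency(50.0, 55.0, 1200, 1320)   # FPS also up 10%
barely_affected = scaling_efficiency(50.0, 50.5, 1200, 1320)   # FPS up only 1%

print(f"bandwidth-bound case: {bandwidth_bound:.2f}")
print(f"barely affected case: {barely_affected:.2f}")
```

Repeating this at several resolutions and AA settings would show whether the result holds generally or only in one scenario.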
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Possibly, but the linear nature of the performance curve would make one assume otherwise. There's a point after 1300MHz where performance stops increasing and actually starts to decrease, which I attribute to ECC kicking in. If memory speed was a bottleneck, even a small increase in frequency would add an approximately linear performance gain. But it doesn't, not even close actually.

When BFG10K looked for bottlenecks in HD5770 he got around this error correcting memory issue by lowering clocks on both the core and memory and then checking FPS.

Even though lowering core speed resulted in a greater decrease in performance (compared to lowering memory speed) it was very close. In fact, I think his final opinion was that the card was balanced.

Therefore I think we can assume that memory speed is a limiting factor in the architecture of these HD5xxx cards.
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Yeah, Cypress performed about 40% faster on average compared to RV790 in games.



I've made a mistake there, it should have been 400 more SP's. So you ask, how can a 25% increase in power, @ 2000SP/100TMU, give a 40% boost?

If for example, we take Juniper (5770), it has 800SP/40TMU, and compare it with a 4870, also 800SP/40TMU, and set the clock speeds and memory bandwidth to be the same, we'll notice that the older 4870 series card is actually 15-20% faster. ATI cut back certain things to get the die sizes of their 5xxx series GPU's down, but in the process lost about 15-20% performance there.

So if for Northern Islands ATI simply puts back what they cut out from the 5xxx cards, they would gain 15%, then add the 25% gain from 400 more SP's and 20 more TMU's, you get ~40% total increase. Of course I'm just giving an example here, but this would not be very hard for ATI to do. And such a GPU would be big (by ATI standards) but not be much larger than 450mm^2.
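
As a rough check of the arithmetic in that estimate, here is a small sketch. It assumes, purely for illustration, that performance scales perfectly with SP/TMU count; the 15% and 25% figures come from the post, and the function names are made up here. Adding the two gains gives the post's ~40%, while compounding them gives slightly more.

```python
# Figures from the post above.
restore_gain = 0.15               # regained by undoing the 5xxx cutbacks
unit_gain = 2000 / 1600 - 1.0     # 400 more SPs on top of 1600 -> 0.25

def additive(a, b):
    """Simply add the two percentage gains."""
    return a + b

def compounded(a, b):
    """Apply the second gain on top of the first."""
    return (1.0 + a) * (1.0 + b) - 1.0

print(f"unit gain:  {unit_gain:.1%}")
print(f"additive:   {additive(restore_gain, unit_gain):.1%}")
print(f"compounded: {compounded(restore_gain, unit_gain):.1%}")
```

Either way the result is in the neighborhood of 40%, so the estimate holds up under the stated assumption of perfect scaling.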

What you posted there makes sense; that would explain why the HD 5870 isn't twice as fast as the HD 4890 even with everything doubled. About TMUs, I'm not sure about the bottleneck in the HD 5x00 series, but I do know that TMU filtering performance tended to be a bottleneck in many scenarios with the RV770 architecture; moving that function to the shaders considerably helped its texture performance.

So putting back everything that was taken from the RV770, plus more shaders, would make for a nice refresh of the architecture, but I wonder how much wider the architecture can go before it becomes bottlenecked by its compiler performance; the HD 5x00 series is already wide enough.
 

KingstonU

Golden Member
Dec 26, 2006
1,405
16
81
I'm surprised few people are focusing on the fact that HD5XXX is (for the most part) only a die shrink of HD4XXX, whereas HD6XXX is supposed to be a completely new architecture with completely new possibilities. So comparing manufacturing process, # of transistors, shaders, TMUs, clocks, etc. on HD4XXX to HD5XXX is apples to apples, but comparing HD5XXX to HD6XXX likely will not be at all. Like Conroe to Penryn, then Penryn to Nehalem.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91

that's funny, I used to read bbspot every day; that site is hilarious!


pretty weak anagram. (aadmit or something that keeps the "a" at the start would seem more appropriate. plus any word that starts with 2 vowels is automatically funny. i.e. aardvark, aaron burr)

hadn't heard it till today. seems weird to be using it still. the implied pejorative is based on the idea that the merger was a bad idea at the time. given that it seems to be working out, the user is kinda identifying themself as one of those shorter sighted individuals. admittedly i still call them ati every so often.

I think that I brought it up here, actually. I wouldn't consider myself a fanboi of either camp, but I think that most here would agree that I'm more likely to take a favorable posture towards ati/amd (aka daamit) than nvidia. I used daamit because the cards I was talking about were originally ati creations but ended up getting released by the merged company, aka "daamit". daamit isn't used much any more because most, if not all, of their current gpus have been conceived since the merger and are thus amd products. I have actually received criticism in the past for continuing to call some of the more recent cards "ati" instead of "amd".
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Well, here's an easy question to answer.
AMD has a ~2 billion transistor chip in the HD58xx cards.

NV are making a ~3 billion transistor chip called Fermi on the same process as that ~2 billion transistor chip. It uses more power.

That means AMD have a theoretical 50% increase in transistors to play with. However will they manage to improve performance?! They can also increase power consumption for a single chip.

well, they will be hard-pressed to make a 3200sp gpu at 40nm. however, if NI is really a whole new design then they might still be able to get ~ double performance while not making the card too much larger. It will certainly be interesting to see how things shake out. Of course, if fermi keeps dragging out its release date we might end up seeing NI before fermi...:|
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
What you posted there makes sense; that would explain why the HD 5870 isn't twice as fast as the HD 4890 even with everything doubled. About TMUs, I'm not sure about the bottleneck in the HD 5x00 series, but I do know that TMU filtering performance tended to be a bottleneck in many scenarios with the RV770 architecture; moving that function to the shaders considerably helped its texture performance.

So putting back everything that was taken from the RV770, plus more shaders, would make for a nice refresh of the architecture, but I wonder how much wider the architecture can go before it becomes bottlenecked by its compiler performance; the HD 5x00 series is already wide enough.

For sure. I've checked the reviews of the 5830 today, and I have only one word about its performance: SAD. ATI has done some serious castration to the 5000 GPUs. How can a 1120SP/56TMU card be slower than the 4890, which only has 800SP/40TMU? Yes, the 4890 is clocked at 850MHz, but that is only 6% more, while the 5830 supposedly has 40% higher shader power, and it should be +30% faster than the 4890.
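
The "+30%" figure can be reproduced with a quick sketch. It treats theoretical shader throughput as simply SP count times core clock, which is a simplification, and it assumes an 800MHz core clock for the 5830 (implied by the "only 6% more" remark about the 4890's 850MHz). The helper name `shader_throughput` is made up for illustration.

```python
def shader_throughput(sp_count, core_mhz):
    """Crude theoretical throughput: shader units x core clock."""
    return sp_count * core_mhz

hd4890 = shader_throughput(800, 850)
hd5830 = shader_throughput(1120, 800)   # 800 MHz is an assumption, see above

sp_advantage = 1120 / 800 - 1.0         # 5830 has more shaders
clock_advantage = 850 / 800 - 1.0       # 4890 has the higher clock
expected_lead = hd5830 / hd4890 - 1.0   # 5830 over 4890, on paper

print(f"5830 shader count advantage: {sp_advantage:.0%}")
print(f"4890 clock advantage:        {clock_advantage:.1%}")
print(f"5830 expected lead on paper: {expected_lead:.1%}")
```

On paper the 5830 should lead by roughly 30%, which is what makes the review results look so poor.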
 

Kuzi

Senior member
Sep 16, 2007
572
0
0
In a previous post, I mentioned that those cut backs in the 5000 hardware resulted in 15-20% less performance, but I was comparing with the 5770 card. And it seems this difference grows with the higher end cards. For example, the 5770 lost ~20%, the 5830 lost ~30%, the 5850 lost ~45%, the 5870 lost ~55%. Right now, the 5870 is about 40% faster than the 4890, if we add the lost 55%, you get a 95% difference, which is about double (the hardware doubled after all).

These numbers are not very accurate, but they give a rough idea. And if this is correct, a hypothetical Northern Islands with the "old" chip configuration of the RV770/RV790, but with 1600SP/80TMU, would be clock for clock ~50% faster than the 5870. Of course faster GDDR5 memory and/or a wider bus (384bit) needs to be used also.
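
To see what those rough figures imply relative to the 5870 itself, here is a small sketch using the 4890 as the baseline (1.0). All inputs are the post's own estimates, not measurements, and the variable names are made up here. Measured against the 5870 rather than the 4890, the gain comes out nearer 40% than 50%, which is within the "not very accurate" caveat.

```python
# Performance relative to the 4890 (= 1.0), using the post's estimates.
hd5870_actual = 1.40                    # "about 40% faster than the 4890"
lost_to_cutbacks = 0.55                 # "the 5870 lost ~55%"
hd5870_uncut = hd5870_actual + lost_to_cutbacks   # the post's "95% difference"
ideal_doubling = 2.0                    # hardware doubled, perfect scaling

# Gain of the hypothetical uncut 1600SP/80TMU part over the real 5870.
gain_from_estimate = hd5870_uncut / hd5870_actual - 1.0
gain_if_doubled = ideal_doubling / hd5870_actual - 1.0

print(f"uncut part vs 4890:           {hd5870_uncut - 1.0:.0%} faster")
print(f"uncut part vs 5870:           {gain_from_estimate:.0%} faster")
print(f"perfectly doubled vs 5870:    {gain_if_doubled:.0%} faster")
```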

Anand's RV870 article mentioned that Cypress would have been ~400mm^2 in size without the cut backs. So there is a possibility that NI would just be the "original" planned Cypress without the cuts, with some tweaks and higher memory bandwidth. You get a GPU @40nm that is about 400mm^2 in size, with similar or higher performance than the 5970, pretty impressive if correct.
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
The best speed comparison was the HD 5450 to the HD 4550. They have identical numbers of shaders, identical memory bus and memory speeds. The only difference is the HD 5450 has a 50MHz faster graphics clock. Hence if the 5xxx series was identical clock for clock to the 4xxx series, the HD 5450 should be faster. It's not; the HD 4550 beat it in every test.
http://www.anandtech.com/video/showdoc.aspx?i=3734
 

MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
When BFG10K looked for bottlenecks in HD5770 he got around this error correcting memory issue by lowering clocks on both the core and memory and then checking FPS.

Even though lowering core speed resulted in a greater decrease in performance (compared to lowering memory speed) it was very close. In fact, I think his final opinion was that the card was balanced.

Therefore I think we can assume that memory speed is a limiting factor in the architecture of these HD5xxx cards.
I might have posted this before, but I don't agree with that conclusion at all, as it relies on faulty logic. It implies that performance scales linearly with memory bandwidth in all situations (which it doesn't). The fact that he put the card in memory-bandwidth-limited situations to stress it invalidates the conclusion. If he had run the tests across multiple resolutions, especially something like 1024x768, I'd bet the conclusions would be much different. Basically, the best conclusion you can draw from those tests is that at high resolutions with AA, the 5770's core and memory bus complement each other. Most importantly, the conclusion only applies to the 5770, as the 5870 is a different beast. It's much less memory bandwidth limited than the 5770, because, surprise, it has much more bandwidth. If you want to form a concrete opinion about the 5xxx series, you need to test all the cards in the 5xxx series (or at least most of them) in many different scenarios. Anything else is a generalization, and, especially when it's based on false assumptions, is worth crap.

The best speed comparison was the HD 5450 to the HD 4550. They have identical numbers of shaders, identical memory bus and memory speeds. The only difference is the HD 5450 has a 50MHz faster graphics clock. Hence if the 5xxx series was identical clock for clock to the 4xxx series, the HD 5450 should be faster. It's not; the HD 4550 beat it in every test.
http://www.anandtech.com/video/showdoc.aspx?i=3734
Excellent observation! Looking at the review, you can see that the 4550 is on average 10-15% faster than the 5450 in any given situation, despite the 5450 having a faster clock speed. Some parts of the 5xxx architecture were definitely loosened up compared to the 4xxx series architecture, and that's the point I was making earlier: there's something (or several things) that makes the 4xxx series faster clock for clock. I couldn't tell you what it is though.