AMD Pirate Islands: R9 300 Series Alleged Specifications Detailed (wccftech)

Page 8 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
My bad, thanks. That still doesn't help AMD since 580 OC annihilated a 6970 OC.
http://www.techspot.com/review/423-gigabyte-geforce-gtx580-soc/page5.html

Of course, since one could purchase almost two 6950s for the price of one 580 and unlock them to 6970s, the 580 was an awful value. With price/performance, unlocking, and bitcoin mining out the window, AMD doesn't have those perks anymore. AMD can't afford to release a $600 flagship that is 15-20% slower than Maxwell unless NV raises prices to $800.



The 680 undercut the 7970 by $50 and outperformed it at launch. It's hard to call NV unethical here since the consumer votes with his wallet. Some did vote and bought the 7970 for mining, or waited for the 780/Ti and skipped GK104 entirely. Some of the more savvy gamers got 7950s and overclocked them, which gave 90% of the performance of 680s/7970s. Then, during the mining craze, they offloaded those 7950s and got 780s/R9 290s. One can hope for an unlockable R9 390.



DP is worse on Hawaii than Tahiti XT, as AMD neutered it to 1/8th rate. In any case, DP doesn't matter for gamers. Maxwell brings ~35% higher IPC in games.


Look at the W9100. Hawaii itself has 2x the DP rate of Tahiti; it's artificially neutered on the 290s. They must have decided that shipping it fully enabled on a consumer card wasn't the best move after Tahiti, I guess?
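For a sense of scale, here's a quick sketch of the FP64 math. The shader count and reference clocks (2816 ALUs, 930 MHz W9100, 1 GHz 290X) are public Hawaii specs I'm adding, not from this thread:

```python
# Back-of-the-envelope FP64 throughput for the same Hawaii silicon at its
# two shipped DP rates (reference clocks assumed).
def dp_tflops(shaders, clock_ghz, dp_rate):
    # 2 FLOPs per ALU per clock (fused multiply-add), scaled by the FP64 rate
    return shaders * 2 * clock_ghz * dp_rate / 1000

w9100   = dp_tflops(2816, 0.93, 1/2)  # FirePro W9100, full 1/2-rate DP
r9_290x = dp_tflops(2816, 1.00, 1/8)  # R9 290X, cut to 1/8 rate
print(round(w9100, 2), round(r9_290x, 2))  # 2.62 0.7
```

Same die, roughly a 3.7x gap in DP throughput purely from the rate fuse.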

Nice to have you posting again. Beats the endless BS we see all of the time. Although, I'm not too sure how much we are going to agree. You seem to be of the opinion that AMD has no answers for Maxwell and they should just mail the "crown" to Jen-Hsun. :p
 
Last edited:

wand3r3r

Diamond Member
May 16, 2008
3,180
0
0
Finally some interesting debate about an intriguing topic. Thankfully it's remained quite civil too. :thumbsup:

@RussianSensation

I think you missed the posts by sushiwarrior. I would take that over any of these other rumors at this point, so I'm willing to bet that we are going to see the largest die from AMD yet. I actually think this next battle will shape up to be something we have been missing for a while, although the cost is going to be prohibitive going by the symptoms of this generation.

Maxwell is shaping up to be a great architecture; however, I think GCN 2 with large dies will be very competitive, perhaps more competitive than you are giving it credit for.

I agree with your assessment, and historically speaking I would agree with your analysis; however, we have the potential largest die yet coming, and I would expect GCN 2 to bring more than GCN 1.1 brought over 1.0.

I think Maxwell vs. GCN 2 will be a great showdown, and I await months of speculation followed by months of sticker shock once they do arrive.
 

VulgarDisplay

Diamond Member
Apr 3, 2009
6,188
2
76

Luckily, anyone with a high-end GPU from this generation should be able to wait many months for the sticker-shock pricing to come back to reality, especially at lower resolutions, while we wait for 4K prices to trend down as well.
 

OCGuy

Lifer
Jul 12, 2000
27,224
37
91
That card can't even run Titanfall on ultra detail, hardly "top of the line".


If a basic Source-engine game is maxing out 3GB, you don't have enough.


The card is called Titan Black.


This is just....special.
 

blastingcap

Diamond Member
Sep 16, 2010
6,654
5
76
Although I don't plan to get 4K, thank goodness it's coming, because I don't want to go multi-GPU for my 3K (Eyefinity 1080p) config. It's about time new high-end cards aim at something higher than 1600p for once. A single 780 Ti/290X isn't really enough for a 3K+ system in many games with the eye candy turned up.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,815
1,294
136
Hawaii vs Future GPU

8 ACEs + 1 GCP vs 8 ACE F32 RISC units
4 Shader Engines vs 4 Compute Shader Engines
4 GEs + 4 RAS vs 1 Unified Compute GE/RAS
44 CUs vs 64 CUs
8 128KB L2s vs 4 512KB L2s
16 RBEs/64 ROPs (8 per L2) vs 16 RBEs/64 ROPs (16 per L2)
8 GDDR5 Memory Controllers vs 4 HBM Memory Controllers
512-bit Interface (Off-die memory chips) vs 4096-bit Interface (On-interposer stacked memory chips)

My alleged specifications for the GPU above Hawaii. (More of a successor to Tahiti than Hawaii)​
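The bus widths in the list above translate to raw bandwidth as follows. The per-pin data rates (5 Gbps GDDR5 as Hawaii shipped, 1 Gbps/pin for first-generation HBM) are my assumptions, not part of the rumor:

```python
# Peak bandwidth from bus width and per-pin data rate.
def bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin  # bits -> bytes per transfer

gddr5 = bandwidth_gbs(512, 5.0)   # 320.0 GB/s (Hawaii, 5 Gbps GDDR5)
hbm   = bandwidth_gbs(4096, 1.0)  # 512.0 GB/s (first-gen HBM, 1 Gbps/pin)
print(gddr5, hbm)  # 320.0 512.0
```

So even at a far lower per-pin rate, the 4096-bit interface comes out ~60% ahead.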
 
Last edited:

buletaja

Member
Jul 1, 2013
80
0
66

One F32 unit provides up to 32 threads,
so two of them are enough for 64 CUs.
It could also be embedded into the CPU itself; anyway, that's the concept:
let the GPU do what it does best --> vector/throughput processing.

One unit means one cluster of cores.

Rather than 4x vec16, a future GPU
could use 16x vec4; the catch is that it is in DP.
 
Last edited:

buletaja

Member
Jul 1, 2013
80
0
66
Ready for DX12, an XTX
l6ZQyxe.jpg


What about 16 execution units x (4 ALUs, 64-bit) (Carrizo FMAC 256)?
Compared to GCN 1.1, 1 GCN 2.0 CU = 2x the old CU.
nkBqAIi.jpg


The spec at first looks like the usual 32 CUs, but the bandwidth is sized for ~64 CUs.
The CPU represents 2 F32 units.
XKGOFal.jpg



*) the images are from the MICRO-46 event; the PDF is available on the site
*) only one next-gen console currently shares the same idea, or is in fact the original one
*) when 1 CU is not 1 CU
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,815
1,294
136
One F32 unit provides up to 32 threads,
so two of them are enough for 64 CUs.
From what I can tell, the F32 unit is for the Global Data Share and Flat Virtual Memory.

What I find odd is adding it to the ACE unit when it should be in the Run List Controller. The RLC controls the interrupts, preemption, and the context of the ACE units.
 
Last edited:

NostaSeronx

Diamond Member
Sep 18, 2011
3,815
1,294
136
Could someone translate this to normal-people language? ;)
RLC Consists of
- Scratch RAM
- Block Power Manager
- Streaming Performance Monitor
- Multi-threaded F32 RISC Processor
- Save and Restore Machine

F32 is the (RISC) instruction set, which points it to Global and Flat Memory.

ACE F32 = ACE Unit + F32 Processor, but there should only be one F32 processor in the RLC.

slide-18-1024.jpg


Future Compute Shader Concurrent Multithreading.

The F32 part in this case is the Doorbell hardware, which enables architected, efficient queue write-pointer updates.
 
Last edited:

Nakai

Junior Member
Apr 13, 2014
1
0
0
Again, 500-550mm^2 isn't even in the same ballpark AFAIK. It's not that big because they're cramming shaders in though.

More than 600mm² should be a problem for TSMC.

Maybe it's an MCM: 2x Treasure Islands + MemChip = Fiji.

Treasure Islands ~ 250mm²
MemChip ~ 150mm²

2 * 250mm² + 150mm² = 650mm²

:D
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
RLC Consists of
- Scratch RAM
- Block Power Manager
- Streaming Performance Monitor
- Multi-threaded F32 RISC Processor
- Save and Restore Machine

F32 is the (RISC) instruction set, which points it to Global and Flat Memory.

ACE F32 = ACE Unit + F32 Processor, but there should only be one F32 processor in the RLC.

slide-18-1024.jpg


Future Compute Shader Concurrent Multithreading.

The F32 part in this case is the Doorbell hardware, which enables architected, efficient queue write-pointer updates.

gulp! I am not a human...:(
 
Last edited:

buletaja

Member
Jul 1, 2013
80
0
66
What I'm thinking is simple:
as Mike Mantor said, GPUs are becoming more and more about compute.
So my idea is that the 1536-ALU Treasure Islands is in fact the same die area as 16 CUs/1024 ALUs.

The idea is that compute doesn't need all the fixed-function or geometry hardware,
so physically the die area of 1536 ALUs is the same as that of 1024 ALUs (16 CUs).

512 ALUs (8 CUs) is the graphics part, with 2 RBEs (24 ROPs each)
512 ALUs (8 CUs x 2, XTX) is the compute part, the same die area as the graphics part (but twice the ALUs)

3072 ALUs is:
1024 (16 CUs), the graphics part, with 3 RBEs = 72 ROPs
1024 (16 CUs x 2), the compute part

4224 ALUs:
2 graphics parts:
Shader Engine 1: 11 CUs, 2 RBEs, 48 ROPs
Shader Engine 2: 11 CUs, 2 RBEs, 48 ROPs
1 compute engine: 22 CUs (22 x 2 = 44 CUs)
66 CUs total

And so on. It fits, with each shader engine able to have 1-4 RBEs,
while the compute engine doesn't need RBEs at all.

It also fits 20nm TSMC bulk, which as far as I know only reduces area by 20-30%.
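To make the area claim concrete, here's a toy accounting under buletaja's assumption that a compute-only CU costs ~50% the area of a full graphics CU (no RBE or fixed-function share). All figures are relative units made up for illustration, not real mm²:

```python
# Toy area model: compute-only CUs assumed to cost half a graphics CU.
GFX_CU = 1.0      # relative area of a full graphics CU (64 ALUs)
COMPUTE_CU = 0.5  # assumed: compute-only CU, also 64 ALUs

def area(gfx_cus, compute_cus):
    return gfx_cus * GFX_CU + compute_cus * COMPUTE_CU

def alus(gfx_cus, compute_cus):
    return (gfx_cus + compute_cus) * 64

# 8 graphics CUs + 16 compute CUs occupy the same area as 16 graphics
# CUs, but carry 1536 ALUs instead of 1024 -- the claim in the post.
print(area(8, 16) == area(16, 0), alus(8, 16))  # True 1536
```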
 
Last edited:

DiogoDX

Senior member
Oct 11, 2012
757
336
136
But if you increase the number of ALUs per CU without increasing the frontend, wouldn't that create a bottleneck for gaming?

Hawaii already has double the frontend compared to Tahiti (4 geometry engines and 8 ACEs) and 2x the ROPs for only ~40% more shaders, so maybe they have room to add more ALUs per CU.
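The Tahiti-to-Hawaii scaling can be checked quickly; the unit counts below are the public Tahiti/Hawaii specs, not from this thread:

```python
# Tahiti -> Hawaii scaling: front end and ROPs doubled while the
# shader count grew only ~40%.
tahiti = {"shaders": 2048, "geometry_engines": 2, "rops": 32}
hawaii = {"shaders": 2816, "geometry_engines": 4, "rops": 64}

ratios = {k: hawaii[k] / tahiti[k] for k in tahiti}
print(ratios)  # {'shaders': 1.375, 'geometry_engines': 2.0, 'rops': 2.0}
```

So per unit of frontend, Hawaii actually carries fewer shaders than Tahiti did, which supports the "room for more ALUs per CU" argument.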
 

el etro

Golden Member
Jul 21, 2013
1,584
14
81
AMD needs to decrease the bandwidth dependency of its architectures... AMD's memory controller tech, on both CPUs and GPUs, is so bad...
 

VulgarDisplay

Diamond Member
Apr 3, 2009
6,188
2
76
AMD needs to decrease the bandwidth dependency of its architectures... AMD's memory controller tech, on both CPUs and GPUs, is so bad...

Just curious as to why you think AMD's memory controllers are a problem when they make up ground on faster GPUs at resolutions where bandwidth is needed?
 

buletaja

Member
Jul 1, 2013
80
0
66
But if you increase the number of ALUs per CU without increasing the frontend, wouldn't that create a bottleneck for gaming?

Hawaii already has double the frontend compared to Tahiti (4 geometry engines and 8 ACEs) and 2x the ROPs for only ~40% more shaders, so maybe they have room to add more ALUs per CU.

GP compute, or just compute shaders, don't need the frontend at all.
All they need are TMUs; other than that,
it's just the scheduler to the CUs to the L2/memory controllers.

For GP compute there's no need for RBEs or a frontend, reducing the die area by almost 50%.
Basically, split the pipe into a dual pipe (the XTX code name).

So a 1536-ALU allocation:
a normal graphics pipe (512) and a compute pipe (2x 512), as the compute-only part is 50% of the die area.
 
Last edited:

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91

buletaja

Member
Jul 1, 2013
80
0
66
Modern GPUs can use compute shaders to bypass the ROPs,
but there is a problem.

From GDC 2014, Avalanche Studios:
Gc4kUrI.jpg


The solution is to bypass the ROPs by using only CS, as games use more and more CS-only passes:
cCecwtI.jpg

kdllR37.jpg


The only problem is that current GPUs have only one render pipe,
but the suggestion is that this will change in the future (next-gen GPUs / next-gen consoles?):
89AOaJr.jpg



AMD's partial solution is the R9 290X / R9 290, where each shader engine is a complete, separate block.
But if someday more games/apps utilize compute shaders more heavily, then
it's like wasting die area on the front end and back end.

So the best of both worlds is a dual render pipe (just like the slide above),
where the 2nd pipe is compute-only (to save die area).
It will also show up on the Pirate Islands GPU; it's why the ROP and ALU counts don't
make sense at first.

Probably a FULL DX12 implementation (the full feature set, not just compatibility) needs a
dual render pipe, as also shown in AMD's HSA material for the next-gen GPU;
each pipe will have its own CUs.

DF interview:
Xbox One hardware supports two concurrent render pipes.
This can allow the system rendering to make use of the ROPs for fill, for example, while the title is simultaneously doing synchronous compute operations on the Compute Units.
 
Last edited:

el etro

Golden Member
Jul 21, 2013
1,584
14
81
I don't think so.

7970 GHz - 3GB - 384-bit - 32 ROPs - 288 GB/s
GTX 780 Ti - 3GB - 384-bit - 48 ROPs - 336 GB/s

The GTX goes from 39.5% better at 1600p to 34.8% better across 3 monitors.
http://www.techpowerup.com/reviews/AMD/R9_295_X2/24.html
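For what it's worth, the size of that shift can be quantified from the two TechPowerUp figures quoted above:

```python
# How much of the 780 Ti's lead erodes going from 1600p to triple-1080p,
# using the relative-performance numbers cited above.
lead_1600p = 0.395     # 39.5% faster at 2560x1600
lead_surround = 0.348  # 34.8% faster across 3 monitors
eroded = (lead_1600p - lead_surround) / lead_1600p
print(round(eroded, 3))  # 0.119 -> ~12% of the lead disappears
```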


RAM bandwidth (and framebuffer size) is not the only thing that matters for high-resolution gaming performance. AMD architectures are better optimized for big resolutions (and this goes back a long time, to the pre-R600 era).
And the number of ROPs is not the only factor that determines how well a GPU handles bigger images. But obviously we can see how well AMD's ROPs perform against Nvidia's.

Go see which of these two architectures gets proportionally better results from vRAM overclocks: compare a Tahiti vRAM OC with a GK104 vRAM OC, or a Hawaii vRAM OC with a GK110 vRAM OC.


Just curious as to why you think AMD's memory controllers are a problem when they make up ground on faster GPUs at resolutions where bandwidth is needed?

It's not a problem, but it's not a strength either. Nvidia's memory controller, or rather Nvidia's memory system, is really better.

This dates from Fermi vs NI era: http://www.sisoftware.co.uk/?d=qa&f=gpu_mem_latency&l=en&a=



AMD being the pioneer in adopting each new vRAM tech doesn't mean they have the best memory controller/management tech.
 
Feb 19, 2009
10,457
10
76
With a 512-bit bus or stacked memory, bandwidth isn't a concern on the high end.

Even a mid-range part with a 256-bit bus and stacked vRAM will have ample bandwidth.

And if it's just GDDR5, chips are spec'd to run at 7 Gbps now; AMD shipped with 5 Gbps clocks, so there's room left if they can improve their MC.
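The GDDR5 headroom claim is easy to put a number on (5 Gbps is the shipped Hawaii clock, 7 Gbps the rated spec):

```python
# Headroom on Hawaii's 512-bit bus if the memory controller could drive
# GDDR5 at its rated 7 Gbps instead of the shipped 5 Gbps.
shipped = 512 / 8 * 5.0  # 320.0 GB/s, as the R9 290X shipped
rated   = 512 / 8 * 7.0  # 448.0 GB/s at GDDR5's 7 Gbps rating
print(round(rated / shipped - 1, 2))  # 0.4 -> 40% untapped bandwidth
```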
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
How did we decide it was the memory controller holding clocks back and not simply lower-spec RAM? Why pay for 7 Gbps RAM when you have a 512-bit memory bus that doesn't require the RAM to be that fast?
 

rtsurfer

Senior member
Oct 14, 2013
733
15
76
With a 512-bit bus or stacked memory, bandwidth isn't a concern on the high end.

Even a mid-range part with a 256-bit bus and stacked vRAM will have ample bandwidth.

And if it's just GDDR5, chips are spec'd to run at 7 Gbps now; AMD shipped with 5 Gbps clocks, so there's room left if they can improve their MC.

If they can get stacked RAM out in time, that is. I have a feeling it will most probably be delayed.

Even if we do get stacked RAM with Pirate Islands, it won't be before Q3 2015.