wccftechAMD Pirate Islands : R9 300 Series Alleged Specifications Detailed

3DVagabond · Apr 12, 2014

RussianSensation said:
My bad, thanks. That still doesn't help AMD since 580 OC annihilated a 6970 OC.
http://www.techspot.com/review/423-gigabyte-geforce-gtx580-soc/page5.html

Of course since one could purchase almost 2x 6950s and unlock them to 6970, the 580 was an awful value. With price/performance, unlocking and bitcoin mining out the window, AMD doesn't have those perks anymore. AMD can't afford to release a $600 flagship that is 15-20% slower than Maxwell unless NV raises prices to $800.

680 undercut 7970 by $50 and outperformed it at launch. It's hard to call NV unethical here since the consumer votes with his wallet. Some did vote and bought 7970 for mining or waited for 780/Ti and skipped GK104 entirely. Some of the more savvy gamers got 7950s and overclocked them which gave 90% of the performance of 680s/7970s. Then during the mining craze, offloaded those 7950s and got 780/R9 290s. One can hope for an unlockable R9 390.

DP is worse on Hawaii than Tahiti XT as AMD neutered it to 1/8th. In any case, for gamers DP doesn't matter. Maxwell brings 35% IPC in games.

Look at the W9100. Hawaii itself has 2x the IPC of Tahiti. It's artificially neutered on the 290's. They must have decided that being fully enabled on a consumer card wasn't the best move after Tahiti, I guess?

Nice to have you posting again. Beats the endless BS we see all of the time. Although, I'm not too sure how much we are going to agree. You seem to be of the opinion that AMD has no answers for Maxwell and they should just mail the "crown" to Jen-Hsun.

wand3r3r · Apr 12, 2014

Finally some interesting debate about an intriguing topic. Thankfully it's remained quite civil too. :thumbsup:

@RussianSensation

I think you missed the posts by sushiwarrior. I would take that over any of these other rumors at this point so I'm willing to bet that we are going to see the largest die from AMD yet. I actually think this next battle will shape up to be something we have been missing for a while, although the cost is going to be prohibitive going by the symptoms of this generation.

Maxwell is shaping up to be a great architecture, however I think GCN 2 with large dies will be very competitive, perhaps more competitive then you are giving thought to.

I agree with your assessment, and historically speaking I would agree with your analysis, however we have the potential largest die yet coming and I would expect GCN 2 to bring more then GCN 1.1 brought over 1.0.

I think maxwell vs. GCN 2 will be a great showdown and await months of speculation followed by months of sticker shock once they do arrive.

VulgarDisplay · Apr 12, 2014

wand3r3r said:
Finally some interesting debate about an intriguing topic. Thankfully it's remained quite civil too. :thumbsup:

@RussianSensation

I think you missed the posts by sushiwarrior. I would take that over any of these other rumors at this point so I'm willing to bet that we are going to see the largest die from AMD yet. I actually think this next battle will shape up to be something we have been missing for a while, although the cost is going to be prohibitive going by the symptoms of this generation.

Maxwell is shaping up to be a great architecture, however I think GCN 2 with large dies will be very competitive, perhaps more competitive then you are giving thought to.

I agree with your assessment, and historically speaking I would agree with your analysis, however we have the potential largest die yet coming and I would expect GCN 2 to bring more then GCN 1.1 brought over 1.0.

I think maxwell vs. GCN 2 will be a great showdown and await months of speculation followed by months of sticker shock once they do arrive.

Luckily anyone with a high end GPU from this generation should be able to wait many months for the sticker shock pricing to come back to reality. Especially at lower resolutions while we wait for 4k prices to trend down as well.

OCGuy · Apr 12, 2014

TreVader said:
That card can't even run titan fall on ultra detail, hardly "top of the line".

If a basic source-engined game is maxing out 3GB, you don't have enough.

The card is called Titan Black.

This is just....special.

blastingcap · Apr 12, 2014

Although I don't plan to get 4K, thank goodness it's coming because I don't want to go multi-GPU for my 3K (Eyefinity 1080p) config. It's about time that new high-end cards will be aiming at something higher than 1600p for once. A single 780 Ti/290X isn't really enough for a 3K+ system in many games with eye candy turned up.

NostaSeronx · Apr 13, 2014

Hawaii vs Future GPU

8 ACEs + 1 GCP vs 8 ACE F32 RISC units
4 Shader Engine vs 4 Compute Shader Engines
4 GEs + 4 RAS vs 1 Unified Compute GE/RAS
44 CUs vs 64 CUs
8 128KB L2s vs 4 512KB L2s
16 RBEs/64 ROPs (8 per L2) vs 16 RBEs/64 ROPs (16 per L2)
8 GDDR5 Memory Controllers vs 4 HBM Memory Controllers
512-bit Interface (Off-die memory chips) vs 4096-bit Interface (On-interposer stacked memory chips)

My alleged specifications for the GPU above Hawaii. (More of a successor to Tahiti than Hawaii)

buletaja · Apr 13, 2014

NostaSeronx said:
Hawaii vs Future GPU

8 ACEs + 1 GCP vs 8 ACE F32 RISC units
4 Shader Engine vs 4 Compute Shader Engines
4 GEs + 4 RAS vs 1 Unified Compute GE/RAS
44 CUs vs 64 CUs
8 128KB L2s vs 4 512KB L2s
16 RBEs/64 ROPs (8 per L2) vs 16 RBEs/64 ROPs (16 per L2)
8 GDDR5 Memory Controllers vs 4 HBM Memory Controllers
512-bit Interface (Off-die memory chips) vs 4096-bit Interface (On-interposer stacked memory chips)

My alleged specifications for the GPU above Hawaii. (More of a successor to Tahiti than Hawaii)

1 F32 provided up to 32 thread
2 unit of it is enough for 64 CU
it could also embedded into the CPU itself, anyway that the concept
what is best the GPU can do --> vector/troughput processing

1 Unit means 1 cluster of cores,

Future GPU will use rather than 4 vec16
what about 16 vec4 the catch is it is in DP

buletaja · Apr 13, 2014

Ready for DX12 , an XTX

What about 16 Execution Unit x (4ALU 64bit) (Carrizo FMAC 256) ,
compare to GCN1.1 , 1 GCN 2.0 CU = 2x old CU

The Spec at first is like usual 32CU, but the BW is ~ for 64CU
The CPU represent 2 unit of F32

*) the image from micro46 event, pdf available on the site
*) only 1 nextgen console currently share the same idea, or infact the original one
*) when 1 CU is not 1 CU

NostaSeronx · Apr 13, 2014

buletaja said:
1 F32 provided up to 32 thread
2 unit of it is enough for 64 CU

As from what I can tell the F32 unit is for the Global Data Share and Flat Virtual Memory.

What I find odd is adding it to ACE unit where it should be in the Run List Controller. The RLC controls the interrupts, preemption, and the context of the ACE units.

Erenhardt · Apr 13, 2014

Could someone translate this to normal-people language?

NostaSeronx · Apr 13, 2014

Erenhardt said:
Could someone translate this to normal-people language?

RLC Consists of
- Scratch RAM
- Block Power Manager
- Streaming Performance Monitor
- Multi-threaded F32 RISC Processor
- Save and Restore Machine

F32 the (RISC) Instruction Set which points it to Global and Flat Memory.

ACE F32 = ACE Unit + F32 Processor, but there should only be one F32 processor in the RLC.

Future Compute Shader Concurrent Multithreading.

The F32 part in this case is the Doorbell Hardware, which enables architected efficient queue write pointer update.

Nakai · Apr 13, 2014

sushiwarrior said:
Again, 500-550mm^2 isn't even in the same ballpark AFAIK. It's not that big because they're cramming shaders in though.

More than 600mm² should be a problem for TSMC.

Maybe it's a MCM. 2xTreasureIslands + MemChip = Fiji.

TreasureIslands ~ 250mm²
MemChip ~ 150mm²

2*250mm² + 150mm² = 650mm2

krumme · Apr 13, 2014

NostaSeronx said:
RLC Consists of
- Scratch RAM
- Block Power Manager
- Streaming Performance Monitor
- Multi-threaded F32 RISC Processor
- Save and Restore Machine

F32 the (RISC) Instruction Set which points it to Global and Flat Memory.

ACE F32 = ACE Unit + F32 Processor, but there should only be one F32 processor in the RLC.

Future Compute Shader Concurrent Multithreading.

The F32 part in this case is the Doorbell Hardware, which enables architected efficient queue write pointer update.

gulp! I am not a human...

buletaja · Apr 13, 2014

what i thinking is simple:
as Mike Mantor said, GPU is become more and more about compute
then in my idea the 1536 Treasure Island is infact same area as 16 CU/1024 ALU

the idea is for compute they dont need all fixed function or geometry or whatever
so physically the die area of 1536 ALU is same as 1024 ALU (16CU)

512 (8CU) is Gfx part with 2RBE (each 24 ROPS)
512 (8CU x 2, XTX) is compute part same die area as Gfx part (but twice the ALUs)

3072 is
1024 (16CU) Gfx part with 3 RBE = 72 ROPS
1024 (16CU x 2), Compute Part

4224
2 Gfx Part
Shader engine 1 , 11 CU 2RBE , 48 ROPS
Shader engine 2 , 11 CU 2 RBE, 48 ROPS
1 COmpute Engine , 22 CU (22 x 2 = 44CU)
total 66CU

and so on, it fit with each Shader engine can have 1-4 RBE
Compute Engine dont need RBE at all

it fit with 20nm TSMC Bulk, as far as i know is only reduce area 20-30%

DiogoDX · Apr 14, 2014

But if you increase the number of ALU per CU and not increase the frontend would not create a bottleneck for gaming?

Hawaii already have the double frontend compared to thaiti (4 geometry and 8 ACE) and 2x ROPS for only 40% more shader so maybe they have room to do more ALU per CU.

el etro · Apr 14, 2014

AMD needs to decrease bandwidth dependency on its architectures... Both CPU and GPU AMD tech of memory controllers is so bad...

VulgarDisplay · Apr 14, 2014

el etro said:
AMD needs to decrease bandwidth dependency on its architectures... Both CPU and GPU AMD tech of memory controllers is so bad...

Just curious as to why you think amd's memory controllers are a problem when they make up ground on faster GPU's at resolutions where bandwidth is needed?

buletaja · Apr 14, 2014

DiogoDX said:
But if you increase the number of ALU per CU and not increase the frontend would not create a bottleneck for gaming?

Hawaii already have the double frontend compared to thaiti (4 geometry and 8 ACE) and 2x ROPS for only 40% more shader so maybe they have room to do more ALU per CU.

GP COMpute, or just compute shader dont need frontend at all
what they need only TMU, other than that
is just scheduler to CU to L2/memory controller

for GP Compute = no need for RBE or frontend, reduce the die area almost 50%
basically split the pipe into dual pipe (XTX code name)

so 1536 ALU alocation
Normal Gfx pipe (512) & Compute Pipe (2x 512) (as the compute only is 50% die area)

DiogoDX · Apr 14, 2014

el etro said:
AMD needs to decrease bandwidth dependency on its architectures... Both CPU and GPU AMD tech of memory controllers is so bad...

I don't think so.

7970ghz - 3GB - 384bits - 32rops - 288GB/s
GTX780Ti - 3GB - 384bits -48rops - 366GB/s

GTX goes from 39,5% better in 1600P to 34,8% in 3 monitors.
http://www.techpowerup.com/reviews/AMD/R9_295_X2/24.html

Enigmoid · Apr 14, 2014

DiogoDX said:
I don't think so.

7970ghz - 3GB - 384bits - 32rops - 288GB/s
GTX780Ti - 3GB - 384bits -48rops - 366GB/s

GTX goes from 39,5% better in 1600P to 34,8% in 3 monitors.
http://www.techpowerup.com/reviews/AMD/R9_295_X2/24.html

Maxwell reduces BW needs even further.

http://www.notebookcheck.net/NVIDIA-GeForce-GTX-850M.107795.0.html

3000 points in 3dmark firestrike with 1000 mhz DDR3 is impossible with any AMD gpu.

buletaja · Apr 14, 2014

Modern GPU can use Compute Shader to bypass ROPS
but there is a problem

From GDC 2014, Avalanche Studio

Solution is to bypass ROPS, by using only CS, as more Games are use more and more CS only

The only Problem current GPU only have one render pipe
But suggest it would change in the future (NextGEN GPU / NextGen Console ?)

Partial solution from AMD is R9 290x / R9 290, each shader engine is a complete separated block,
but if someday more games / apps utilizing Compute shader more,then,
it is like wasting die area for front end & Back end.

So the best of both world is Dual render pipe (just like slide above)
the 2nd pipe is compute only (save die area)
It will also shown on Pirate Island GPU, it is why the ROPS and the ALU count dont
make sense at first.

Probably for FULL DX12 implementation (FULL Feature set, not just compatible) need
dual render pipe, also shown on AMD HSA as next gen GPU
each pipe will have its own CU units.

DF interview:

Xbox One hardware supports two concurrent render pipes.

This can allow the system rendering to make use of the ROPs for fill, for example, while the title is simultaneously doing synchronous compute operations on the Compute Units.

el etro · Apr 14, 2014

DiogoDX said:
I don't think so.

7970ghz - 3GB - 384bits - 32rops - 288GB/s
GTX780Ti - 3GB - 384bits -48rops - 366GB/s

GTX goes from 39,5% better in 1600P to 34,8% in 3 monitors.
http://www.techpowerup.com/reviews/AMD/R9_295_X2/24.html

RAM bandwidth(and framebuffer) is not the only thing that matter in High Resolution gaming performance. AMD Archs are better optimized to handle big resolutions(and this have a long time, is up to pre-R600 era).
And the number of rops is not the unique factor that defines how well GPUs will handle with bigger images. But obviously we can see how well AMD rops perform against Nvidia rops.

Go see in what of these two archs vRAM overclocks achieve better(proportionally, compare Tahiti vRAM Oc with GK104 vRAM Oc or Hawaii vRAM Oc with GK110 vRAM Oc) results.

VulgarDisplay said:
Just curious as to why you think amd's memory controllers are a problem when they make up ground on faster GPU's at resolutions where bandwidth is needed?

Is not a problem, but is not a strength. Nvidia memory controller, or better, Nvidia memory system is really better.

This dates from Fermi vs NI era: http://www.sisoftware.co.uk/?d=qa&f=gpu_mem_latency&l=en&a=

AMD being pioneer at using each newer Vram tech don't mean they have the best memory control/management tech.

Silverforce11 · Apr 14, 2014

512 bit bus, stacked IMB memory, bandwidth isn't a concern on the high end.

Even a mid-range part with 256bit bus and stacked vram will have ample bandwidth.

And if its just GDDR5, they are spec to run at 7G now, AMD shipped with 5G clocks, there's room left if they can improve their MC.

3DVagabond · Apr 14, 2014

How did we decide it was the mem controller holding clocks back and not simply lower spec RAM? Why pay for 7G ram when you have a 512 mem bus that doesn't require RAM to be that fast?

rtsurfer · Apr 14, 2014

Silverforce11 said:
512 bit bus, stacked IMB memory, bandwidth isn't a concern on the high end.

Even a mid-range part with 256bit bus and stacked vram will have ample bandwidth.

And if its just GDDR5, they are spec to run at 7G now, AMD shipped with 5G clocks, there's room left if they can improve their MC.

If they can get stacked RAM out in time that is. I have feeling that it will most probably be delayed.

Even if we get stacked RAM with Pirate Islands, it won't be before Q3 2015.

wccftechAMD Pirate Islands : R9 300 Series Alleged Specifications Detailed

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Member

Member

Diamond Member

Diamond Member

Diamond Member

Junior Member

Diamond Member

Member

Senior member

Golden Member

Diamond Member

Member

Senior member

Platinum Member

Member

Golden Member

Lifer

Lifer

Senior member