[BitsAndChips] 390X ready for launch - AMD ironing out drivers - Computex launch


Cloudfire777

Golden Member
Mar 24, 2013
1,787
95
91
I think that's per module!

No, you don't know what you are talking about.
HBM1 supports 1GB per stack, or 2Gb (bit, not byte) per DRAM die. You can stack a maximum of 4 of these DRAM dies together to make one stack.
[Image: sk_hynix_hbm_dram_2.jpg]


We have already seen the leak on GFXBench. It's 4096-bit. And how can that be with 8GB of HBM1? Two controllers, each accessing their own 4 x 1GB stacks. One controller per GPU die. Two dies working as one.
Not 8 stacks giving an 8192-bit bus, or 4 stacks of 2GB each, because that isn't supported with HBM1, which we know the 390X will have.
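For what it's worth, the arithmetic behind those spec numbers is easy to check. A quick back-of-the-envelope in Python, using the HBM1 figures cited above (2Gb per die, max 4 dies per stack, and the standard 1024-bit interface per stack):

```python
# Back-of-the-envelope check of the HBM1 numbers cited above.
GBIT_PER_DIE = 2           # 2Gb per DRAM die (HBM1)
DIES_PER_STACK = 4         # max 4-Hi stacking with HBM1
BUS_BITS_PER_STACK = 1024  # each HBM stack has a 1024-bit interface

stacks = 4                                        # one controller's worth of stacks
gb_per_stack = GBIT_PER_DIE * DIES_PER_STACK / 8  # bits -> bytes: 1 GB per stack
total_capacity = stacks * gb_per_stack            # 4 GB per controller
total_bus = stacks * BUS_BITS_PER_STACK           # 4096-bit, matching the GFXBench leak

print(f"{total_capacity:.0f} GB on a {total_bus}-bit bus")
# -> 4 GB on a 4096-bit bus; 8 GB would therefore imply two such sets of stacks
```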
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
No, you don't know what you are talking about.
HBM1 supports 1GB per stack, or 2Gb (bit, not byte) per DRAM die. You can stack a maximum of 4 of these DRAM dies together to make one stack.
[Image: sk_hynix_hbm_dram_2.jpg]


We have already seen the leak on GFXBench. It's 4096-bit. And how can that be with 8GB of HBM1? Two controllers, each accessing their own 4 x 1GB stacks. One controller per GPU die. Two dies working as one.
Not 8 stacks giving an 8192-bit bus, or 4 stacks of 2GB each, because that isn't supported with HBM1, which we know the 390X will have.

So you are saying it has to be dual GPU for 8GB?
 

Cloudfire777

Golden Member
Mar 24, 2013
1,787
95
91
Yes.
How else would they use the two controllers but still only have 4096-bit?
If you ask me, it's getting more and more convincing that we may see two Tongas under one die.

Unless the die picture on the previous page is just a concept and two controllers aren't really needed for HBM.
 

Shehriazad

Senior member
Nov 3, 2014
555
2
46
So you are saying it has to be dual GPU for 8GB?

Yeah... just how you need a 2nd CPU to run a dual-channel DDR3 setup... right? *cough* XDDD


All these assumptions are just running rampant.

There is nothing that stops AMD or anyone else from using HBM1 and "dual-linking" (triple/quad-linking) it for 8 (12, 16) GB of HBM... performance might not scale 100% like that, but it's still gonna destroy GDDR5 by far.

I'm not even that tech savvy... but we are still talking about memory here... you can always get "moar". So what if there are 2 x 4GB memory blocks? Just connect those two and it should work fine... Of course it's more than "just connecting" them... but that's what engineers are there for... they're being paid for that kind of stuff.


I am 1000% sure that SK Hynix, Nvidia and AMD would not waste their time and money on a technology that is impossible to scale beyond 4GB when it's quite obvious that the future is 4K gaming... and thus 4GB is already cutting it close if you want to push it forward. Sure, HBM2 will have "moar"... but if HBM1 were so useless, it would never have been made into a product in the first place and would never have left their R&D department until it could effectively be used in phones or some bs like that.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
TechReport mentioned dual-linking in an article last year as a way to get 8GB with HBM1. Not sure about the technical details of it; even they weren't. But it's not just something that has cropped up out of thin air recently.
 

msi2

Junior Member
Oct 23, 2012
22
0
66
No, you don't know what you are talking about.
HBM1 supports 1GB per stack, or 2Gb (bit, not byte) per DRAM die. You can stack a maximum of 4 of these DRAM dies together to make one stack.
[Image: sk_hynix_hbm_dram_2.jpg]


We have already seen the leak on GFXBench. It's 4096-bit. And how can that be with 8GB of HBM1? Two controllers, each accessing their own 4 x 1GB stacks. One controller per GPU die. Two dies working as one.
Not 8 stacks giving an 8192-bit bus, or 4 stacks of 2GB each, because that isn't supported with HBM1, which we know the 390X will have.


Did you even look at the slide I posted?
 

Cloudfire777

Golden Member
Mar 24, 2013
1,787
95
91
Did you even look at the slide I posted?

That doesn't fit Hynix's own specifications for HBM1 and HBM2. If they made some advancements in the technology, it does, in which case they could make 8GB on a 4096-bit bus. I'm not sure how they could put 2Gb DRAM dies next to each other just like that, though.

Conflicting information here.
 

msi2

Junior Member
Oct 23, 2012
22
0
66
One thing is sure: I don't believe for one second in the dual-GPU hypothesis (in this case, a dual-Tonga configuration).
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
Hmmm, would GPUs connected via an interposer need to use AFR/SFR etc. techniques like ones connected via PCIe (meaning possible scaling issues), or is it more like all the cores being on one die?
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,699
2,623
136
Hmmm, would GPUs connected via an interposer need to use AFR/SFR etc. techniques like ones connected via PCIe (meaning possible scaling issues), or is it more like all the cores being on one die?

In principle you could make a much wider and faster interface through the interposer than through any traditional means, which might let you drive two separate chips as one. The bottleneck in doing this is that you have to be able to route all ROP and memory accesses from all sources on an any <-> any network. That's a lot of inter-chip bandwidth.
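To put a rough number on that bottleneck, here is a sketch of the estimate in Python. The 1 Gbps-per-pin figure is HBM1's published speed (500 MHz DDR); the 50% cross-traffic share is purely an assumption for illustration:

```python
# Rough estimate of the inter-chip bandwidth an any<->any layout implies.
BUS_BITS = 4096       # total HBM1 bus width from the leak
PIN_SPEED_GBPS = 1.0  # HBM1: 500 MHz DDR = 1 Gbps per pin

mem_bw = BUS_BITS * PIN_SPEED_GBPS / 8  # raw memory bandwidth in GB/s: 512
cross_share = 0.5                       # assumed: half of all accesses land on the other die
link_bw = mem_bw * cross_share

print(f"memory: {mem_bw:.0f} GB/s, die-to-die link needs ~{link_bw:.0f} GB/s")
```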

Long enough in the future, interposers might drive GPUs into only having a single chip optimized for yields manufactured, with the higher-end models being just more of them.

I really don't expect them to do this on the first model. The only "multi-chip magic" I expect is that if AMD makes the bottom-most logic die of the stack, the one that works as the memory controller, they might move some logic from the main die there, as I doubt you need the entire die just for the memory controller. Cache or even ROPs, maybe?

The "2x Tonga" thing seems asisine to me.
 

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
Almost mid April now and no release.

I still think their new cards will have a lot of Tonga in it, up to R9 380X.

And I suspect it will look something like this :

http://videocardz.com/52834/full-amd-tonga-gpu-might-feature-384-bit-memory-interface



2048 SPs
128 TMU
32-48 ROPs
384bit bus

"The Tonga XT has 256 more shader count than the Tonga PRO GPU, which means that AMD is using the age old tactic of holding the good stuff back till it is actually needed. The full fat Tonga XT packs a total of 32 compute unit and six memory controllers. Since each is 64 bits wide, it adds up to a 384 bit interface."

Read more: http://wccftech.com/tonga-xt-hsa-support-384-bit-bus/
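For what it's worth, those numbers are internally consistent with GCN's layout. A quick cross-check in Python (the 64 SPs per CU and 64-bit controller width are standard GCN figures, not something from the article itself):

```python
# Cross-checking the quoted Tonga XT figures against GCN's layout.
SPS_PER_CU = 64       # GCN: 64 stream processors per compute unit
CTRL_WIDTH_BITS = 64  # each GCN memory controller is 64 bits wide

cus = 32
controllers = 6
print(cus * SPS_PER_CU)               # 2048 shaders, as quoted
print(controllers * CTRL_WIDTH_BITS)  # 384-bit bus, as quoted
```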
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
2. Again, the two Tonga cores would not be connected through Crossfire. AMD should fire all their engineers if they couldn't connect two cores on the same die with an internal connection that goes directly from one core to the other. Whether it's through TSVs or through the L2 cache by routing crossbars I don't know, but if they can do it with CPU cores and an IGP, they can do it with a dual GPU as well. This should have zero performance hit.

3. HBM1 has a limitation of 4GB. HBM2 won't be ready until 2016. Still, AMD's slide says "up to 8GB" for the 390 WCE. Dual controllers like in the dual-core picture, anyone...?


2. It isn't as easy as you'd think to split a core between multiple dies. You either need to create a heterogeneous interposer interface which can connect a set of interconnects from the separate dies to an external memory controller, or you need to build a different die for each side, or the memory controller must be slowed down so that it can accept non-synchronized inputs (using latching).

It could be done, sure, but I don't see any advantage at all. The memory controller would still need to interface with the memory, or you would need to duplicate memory controllers, in which case you need a common external L3 for each memory controller to use to transfer data from the HBM chips it controls in the (common) event that an SP from the other die needs some chunk of memory. That means there will be considerable overhead and added latency.

The easiest solution to provide 8GB, using one large die, is... well... simple. Which brings me to your point 3.


3. HBM1 is limited to 4GB using a standard direct bus, true, but that doesn't stop the memory controller from being able to address more than 4GB. It just needs an added address line, and the HBM modules organized such that you can selectively address one of them using the same bus. That's only a single added trace on the interposer per memory bus if the memory modules are aware.

[GPU] ...[BUS]... [HBM:0] + [HBM:1]

When the chip-select address line on the bus is low, [HBM:0] responds; when the line is high, [HBM:1] responds.

However, if the memory is not able to support this, the situation will require some decoupling on the bus for the action lines (Get, Set, Reset, Write, whatever they're called on HBM). The memory should ignore any signals on the lines unless one of its action lines is pulled high or low, depending on its design.

[GPU] [ACTION DECOUPLER]@[ACTION BUS:0]->[HBM:0]
[GPU] [ACTION DECOUPLER]@[ACTION BUS:1]->[HBM:1]
[GPU] [RAM BUS]:[HBM:0&1]

The RAM BUS would include the clock signal (likely needs to be driven a bit more, or amplified), power, ground, references, and the data bus. The memory controller could only address one module at a time, which means that memory locality suddenly becomes much more important. That requires some fairly careful logic to prevent any performance degradation. But all that logic will be in the memory controller, with some driver adjustments needed to take care of any oversights during the hardware design (it happens) or because the designers weren't 100% sure of the performance effects in certain cases and left it for the software team to handle.
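As a toy model of that scheme, the Python sketch below (hypothetical names, software standing in for hardware) shows how a single extra select bit lets one controller front two 1GB stacks behind a shared bus. It models only the addressing logic, not the timing, decoupling, or locality handling described above:

```python
# Toy model of the chip-select idea: two HBM stacks share one data bus,
# and one extra address line picks which stack responds. Illustration only.

STACK_SIZE = 1 * 1024**3  # 1GB per HBM1 stack

class HBMStack:
    """Stand-in for one 1GB stack; stores written values by offset."""
    def __init__(self):
        self.cells = {}

    def read(self, offset):
        return self.cells.get(offset, 0)

    def write(self, offset, value):
        self.cells[offset] = value

class DualStackController:
    """Memory controller fronting two stacks behind one shared bus."""
    def __init__(self):
        self.stacks = [HBMStack(), HBMStack()]

    def _decode(self, addr):
        select = addr // STACK_SIZE  # the "extra address line": 0 or 1
        offset = addr % STACK_SIZE   # address driven on the shared bus
        return self.stacks[select], offset

    def read(self, addr):
        stack, offset = self._decode(addr)
        return stack.read(offset)

    def write(self, addr, value):
        stack, offset = self._decode(addr)
        stack.write(offset, value)

ctrl = DualStackController()
ctrl.write(0x0000_0000, 0xAB)        # select line low -> lands in stack 0
ctrl.write(STACK_SIZE + 0x10, 0xCD)  # select line high -> lands in stack 1
assert ctrl.read(0x0000_0000) == 0xAB
assert ctrl.read(STACK_SIZE + 0x10) == 0xCD
```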

In any event, pretty much every solution makes more sense than going with a bifurcated die (and duplicate memory controllers).

In fact, going with duplicate memory controllers on the same die makes more sense. You'd just split the address space right down the middle.
 

Alatar

Member
Aug 3, 2013
167
1
81
To be fair, he was the first one to confirm 8GB (a few days before the slide stack leaked), even though his reasoning and explanations about the situation were complete garbage.
 

Cloudfire777

Golden Member
Mar 24, 2013
1,787
95
91
http://www.fudzilla.com/news/graphics/37566-two-amd-fiji-cards-coming-in-june
Fiji XT: faster than the GTX 980, slower than the Titan X.
Fiji VR: some dual card... :eek:

As much as I'd like to believe this, two huge dies with big TDPs on one card? 2 x 2816 shaders on one card (295X2) is one thing, but 2 x 4096 shaders seems impossible. 2 x cut-down Fiji with 3500 shaders may be theoretically possible, but to me it seems like a stretch. Not just TDP-wise, but also with the power delivery available and fitting two enormous dies on one card.
It's like Nvidia putting 2 x Titan X on one card. They used 2 x GK104 for a reason. I think 2 x GM204 is the line here as well.
 

destrekor

Lifer
Nov 18, 2005
28,799
359
126
As much as I'd like to believe this, two huge dies with big TDPs on one card? 2 x 2816 shaders on one card (295X2) is one thing, but 2 x 4096 shaders seems impossible. 2 x cut-down Fiji with 3500 shaders may be theoretically possible, but to me it seems like a stretch. Not just TDP-wise, but also with the power delivery available and fitting two enormous dies on one card.
It's like Nvidia putting 2 x Titan X on one card. They used 2 x GK104 for a reason. I think 2 x GM204 is the line here as well.

Well to be fair, you probably would have said that very same thing before the 295X2 was announced. Hawaii XT in a dual-GPU configuration? A single Hawaii XT card is so hot and has such massive power draw, no way it's possible!
Oh wait.

Just because the number of shaders is increasing doesn't change the situation. Transistor counts and shader/stream processor counts are constantly increasing, yet power draw and heat output remain relatively the same as they improve their techniques.

Now, with the way Nvidia's GM200 has shaped up, I thoroughly doubt they will be releasing a proper dual-GM200 any time soon, but long-term they may develop such a solution. I would expect them to potentially use cut-down GM200s for a dual-GPU card; that's one way to use up the lesser performers.

Dual Fiji XT around launch? Nope. And it seems ludicrous to name a dual-GPU configuration (or a dual-die one, which, again, ridiculous) Fiji XT, when convention has that name simply denoting the larger die.
 

Subyman

Moderator, VC&G Forum
Mar 18, 2005
7,876
32
86
If AMD does make dual-die work like Intel did with CPUs, then we could see huge performance boosts akin to multi-core CPUs around the Q6600 era. Even if AMD gets a year or so ahead of Nvidia on this type of technology, it could be what they need to get back in the market. It would open a new path for development, exclusive to AMD for some time, instead of chasing Nvidia the traditional way.

I'd be surprised to see it happen, but would be excited to see how it shakes up the market.
 

destrekor

Lifer
Nov 18, 2005
28,799
359
126
If AMD does make dual-die work like Intel did with CPUs, then we could see huge performance boosts akin to multi-core CPUs around the Q6600 era. Even if AMD gets a year or so ahead of Nvidia on this type of technology, it could be what they need to get back in the market. It would open a new path for development, exclusive to AMD for some time, instead of chasing Nvidia the traditional way.

I'd be surprised to see it happen, but would be excited to see how it shakes up the market.

I thought about that kind of performance boost and advancement, but I don't think it translates here like it did with CPUs. GPUs are already essentially a multi-threaded architecture, broken into many tiny cores, whereas multi-die CPUs were really the advent of huge leaps in multi-core, multi-threaded x86 performance. I don't think a multi-die GPU translates to the same performance advancements. I could be wrong, however.
 

Elfear

Diamond Member
May 30, 2004
7,169
829
126
I thought about that kind of performance boost and advancement, but I don't think it translates here like it did with CPUs. GPUs are already essentially a multi-threaded architecture, broken into many tiny cores, whereas multi-die CPUs were really the advent of huge leaps in multi-core, multi-threaded x86 performance. I don't think a multi-die GPU translates to the same performance advancements. I could be wrong, however.

I think what Subyman was alluding to was that if AMD is able to make a dual-die GPU interconnect function the same as a single die (i.e. without all the compromises of an XDMA/SLI interconnect), it would be huge. AMD would then be able to scale GPUs out much more easily, and we would see performance gains like we used to see with each new generation.

I'm 99.9% sure AMD doesn't have that capability yet but I can dream.