Discussion RDNA4 + CDNA3 Architectures Thread

Page 8 - AnandTech Forums

DisEnchantment

Golden Member
Mar 3, 2017
1,587
5,702
136

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has been much shorter, presumably to prevent leaks.
Even so, the flurry of activity in LLVM amounts to a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan early (perhaps to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Hopper, though, had the problem of no host CPU capable of PCIe 5 arriving in the very near future, so it may have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it :grimacing:

This is nuts; the MI100/200/300 cadence is impressive.


Previous thread on CDNA2 and RDNA3 here

 
Last edited:

Ajay

Lifer
Jan 8, 2001
15,332
7,789
136
Memory bandwidth shouldn't be a problem for AMD (or NV) next year. 32 GT/s GDDR7 on a 384-bit interface is a lot. Even if we assume first-gen GDDR7 can't hit the PR value of 32 GT/s, a lower 28 GT/s still gives an incredible 1344 GB/s. By next year GDDR6 should be able to hit ~1000 GB/s on a 384-bit interface @ 22 GT/s.
According to Ryan's post on the front page, volumes will be too low in 2024 to use on consumer GFX cards. Those dice will be headed to high-end HPC/ML GPUs. Also, Samsung wants to get the power consumption down for consumer cards.
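The figures above are just data rate times bus width; a quick sketch of the arithmetic (rates and widths taken from the post):

```python
# Back-of-the-envelope check: peak GB/s = data rate (GT/s) * bus width (bits) / 8.
def gddr_bandwidth_gbps(data_rate_gtps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s for a GDDR interface."""
    return data_rate_gtps * bus_width_bits / 8

print(gddr_bandwidth_gbps(32, 384))  # GDDR7 PR value: 1536.0 GB/s
print(gddr_bandwidth_gbps(28, 384))  # conservative first-gen GDDR7: 1344.0 GB/s
print(gddr_bandwidth_gbps(22, 384))  # fast GDDR6: 1056.0 GB/s
```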




As an aside - some people think the next round of consumer GPUs won't hit till 2025. I have two thoughts on this. First, I don't see AMD or NV doing much to change the release schedule of the next generation of cards - in part because AIB manufacturers will want something new and shiny to sell, especially given the lackluster sales of this gen. Second, AMD really needs an improvement over the RDNA3 GPUs and, IMHO, can't afford a delay. Nvidia surely isn't going to sit still and let itself be put in the rear-view mirror when AMD's top RDNA4 GPU is released.

So, I believe the 2H2024 rumors.
 
  • Like
Reactions: Joe NYC

menhera

Junior Member
Dec 10, 2020
21
66
61
Maybe AMD should merge the LDS and the two separate L0 cache slices in a WGP, practically doubling L0 without more transistors, just as Nvidia did with Turing. Radeon GPU Profiler indicates that most workloads use very little LDS, or none at all. In Nvidia's architectures, unused LDS (what Nvidia calls Shared Memory) serves as additional L1 cache. Clever design.
 

SteinFG

Senior member
Dec 29, 2021
386
445
106
I think that effective BW is just marketing BS.
(attached AMD slides on Infinity Cache hit rates and effective bandwidth)

Averaging it is just nonsense; either you have the data in cache or you don't.
If you have it, then you get the maximum ~1940 GB/s BW in the case of RDNA2 N21, or ~4470 GB/s for RDNA3 N31.
If you don't, then you're left with only the GDDR6/7 BW.

P.S. I wonder how much IC would be needed for a 90% hitrate at 4K. 1GB? :D
I think 4 stacks of Hynix HBM3E with 4 TB/s (1 TB/s per stack) and 64-96 GB VRAM (16-24 GB per stack) could end up cheaper to make.

Edit: It looks like IC size matters for total BW.
I think what Locuza wrote as theoretical Infinity Cache BW is wrong for N23/N24. N22 is 0.75 of N21 if we exclude hitrate, but N23 and N24 are not 1/4 and 1/8 of N21.
Then 1 GB of IC, excluding hitrate, would have 10.67x higher BW than N31 has?
If I take AMD's numbers, there's an easy way to calculate the L3 hitrate they're expecting. And about N23/N24 - yes, it's ~1/4 and ~1/8, otherwise the numbers don't add up at all.
(attached table of calculated hit rates)
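If AMD's "effective bandwidth" figure really is a hit-rate-weighted average of IC and VRAM bandwidth, the implied hit rate can be back-solved. A minimal sketch (the 1600 GB/s effective figure is a made-up illustration, not an AMD number; ~1940 GB/s is the N21 IC peak mentioned above and 512 GB/s is N21's GDDR6 peak):

```python
def implied_hit_rate(effective_bw: float, cache_bw: float, vram_bw: float) -> float:
    """Solve eff = h*cache + (1-h)*vram for the hit rate h."""
    return (effective_bw - vram_bw) / (cache_bw - vram_bw)

# N21-like peaks: ~1940 GB/s Infinity Cache, 512 GB/s GDDR6.
# 1600 GB/s is a hypothetical "effective bandwidth" marketing figure.
h = implied_hit_rate(1600, 1940, 512)
print(f"implied hit rate: {h:.1%}")  # ~76.2%
```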
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,319
2,805
106
If I take AMD's numbers, there's an easy way to calculate the L3 hitrate they're expecting. And about N23/N24 - yes, it's ~1/4 and ~1/8, otherwise the numbers don't add up at all.
Your L3 hit rate is seriously wrong.
How can N22's 96MB IC have a higher hit rate than N21's 128MB IC? That's nonsense. Or N23's 32MB IC having only a 3% higher hit rate than N24's 16MB IC?
N31 has the same amount of IC as N22, yet you got only 48% vs 60%.
You should recalculate it. BTW, the hit rates are already provided in the second picture.
 
Last edited:

SteinFG

Senior member
Dec 29, 2021
386
445
106
How can N22's 96MB IC have a higher hit rate than N21's 128MB IC?
Because N22 is targeting 1440p, and N21 is targeting 4K. You already shared a slide where a card has different hit rates depending on the resolution, so it shouldn't be hard to grasp.
Or N23's 32MB IC having only 3% higher hit rate than N24's 16MB IC?
That's a weird one; I don't have an explanation.
Maybe there's something I'm missing. Most likely the L3 BW is wrong on this one, as there's no info about it, and I'm just taking a % of N21's L3 BW instead.
EDIT: one possible explanation is that because N24 works with half the VRAM speed and half the VRAM size of N23, it has almost the same hit rate even with half the L3. idk.
You should recalculate It. BTW, hit rates are already provided in the second picture.
1) Hit rates are what I was calculating, so that kinda defeats the purpose.
2) The hit rates on the slide shown have huge deltas; I can't really do any math with that. I prefer concrete numbers, which is why I took the route of searching for exact figures, even if they're mostly for marketing.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,319
2,805
106
Because N22 is targeting 1440p, and N21 is targeting 4K. You already shared a slide where a card has different hit rates depending on the resolution, so it shouldn't be hard to grasp.
Then you should mention that in your table; it's not like I know where you got that effective BW data, except the one for N21. Where did you find them?
Also, what you calculated vs. the data from the graph is quite different:
N21: 58% vs 62%. This isn't that different.
N22: 60% vs 69%. This is very different.
N23 and N24 look weird, but you already know that.

@SteinFG
EDIT: one possible explanation is that because N24 works with half the VRAM speed and half the VRAM size of N23, it has almost the same hit rate even with half the L3. idk.
Hit rate depends on resolution and Infinity Cache size. VRAM size or speed has nothing to do with it.
N24 could have a hitrate comparable to N23's only if it were at a lower resolution.
 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,587
5,702
136
Maybe AMD should merge LDS and two separate L0 cache slices in a WGP, practically doubling L0 without more transistors just as Nvidia did with Turing. Radeon GPU Profiler indicates that most workloads use very little LDS or not at all. In Nvidia's architectures, unused LDS (what Nvidia calls Shared Memory) serves as additional L1 cache. Clever design.
Using the LDS to augment L0 (depicted as L1 in the pic) when unused, and sharing L0 across the SIMDs in a WGP, appeared in the patent below.
RDNA2 has problems when one CU in a WGP wants data present in the other CU's L0 within the WGP: it has to go via the LDS. This was covered in a presentation by Lou Kramer of AMD. Not sure if the same issue is there in RDNA3.
(patent figure attachment)
PROCESSING DEVICE AND METHOD OF SHARING STORAGE BETWEEN CACHE MEMORY, LOCAL DATA STORAGE AND REGISTER FILES
<https://www.freepatentsonline.com/y2023/0069890.html>

I think you know your stuff, but you are playing along :)
 

DisEnchantment

Golden Member
Mar 3, 2017
1,587
5,702
136
A recap of the bunch of new RT-related patents too.
Compare the RT-related patents leading up to RDNA3 vs the ones below leading up to RDNA4.

BOUNDING VOLUME HIERARCHY HAVING ORIENTED BOUNDING BOXES WITH QUANTIZED ROTATIONS
From <https://www.freepatentsonline.com/y2023/0099806.html>
ACCELERATION STRUCTURES WITH DELTA INSTANCES
From <https://www.freepatentsonline.com/y2023/0097562.html>
TECHNIQUES FOR INTRODUCING ORIENTED BOUNDING BOXES INTO BOUNDING VOLUME HIERARCHY
From <https://www.freepatentsonline.com/y2023/0027725.html>
SPATIAL HASHING FOR WORLD-SPACE SPATIOTEMPORAL RESERVOIR RE-USE FOR RAY TRACING
From <https://www.freepatentsonline.com/y2022/0406002.html>
MACHINE-LEARNING BASED COLLISION DETECTION FOR OBJECTS IN VIRTUAL ENVIRONMENTS
From <https://www.freepatentsonline.com/y2022/0319096.html>
This one appeared in a paper as well recently. https://gpuopen.com/download/publications/HPG2023_NeuralIntersectionFunction.pdf

OVERLAY TREES FOR RAY TRACING
From <https://www.freepatentsonline.com/y2023/0196669.html>
GRAPHICS PROCESSING UNIT TRAVERSAL ENGINE
From <https://www.freepatentsonline.com/y2023/0206543.html>
VARIABLE WIDTH BOUNDING VOLUME HIERARCHY NODES
From <https://www.freepatentsonline.com/y2023/0206542.html>
COMMON CIRCUITRY FOR TRIANGLE INTERSECTION AND INSTANCE TRANSFORMATION FOR RAY TRACING
From <https://www.freepatentsonline.com/y2023/0206541.html>
BOUNDING VOLUME HIERARCHY BOX NODE COMPRESSION
From <https://www.freepatentsonline.com/y2023/0206540.html>
BVH NODE ORDERING FOR EFFICIENT RAY TRACING
From <https://www.freepatentsonline.com/y2023/0206539.html>
FRUSTUM-BOUNDING VOLUME INTERSECTION DETECTION USING HEMISPHERICAL PROJECTION
From <https://www.freepatentsonline.com/y2023/0206544.html>

What stands out is that a lot of the new concepts in these patents don't seem to rely solely on shader code.
 

menhera

Junior Member
Dec 10, 2020
21
66
61
Compare RT related patents leading up to RDNA3 vs the ones below leading up to RDNA4. […]
What is standing out is that a lot of the new concepts in these patents don't seem to rely solely on shader code.
Nice patents revealed recently. The TRAVERSAL ENGINE one looks like the thing AMD needs most.
 

Aapje

Golden Member
Mar 21, 2022
1,267
1,705
96
As an aside - some people think the next round of consumer GPUs won’t hit till 2025. I have two opinions on this. First, I don’t see AMD or NV doing much to change the release schedules of the next generation of AIBs - in part because AIB manufacturers will want something new and shiny to sell, especially given the lackluster sales of this gen.
What makes you think that Nvidia cares at all about the well-being of AIBs?
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,196
5,197
136
Is a 512-bit bus even possible with GDDR6/X? Let's say with a custom PCB?

Certainly. Why wouldn't it be?

It's just expensive in terms of die area, and a pain to route, because that's 16 memory channels, requiring 16 memory chips.

We probably won't see a 512-bit+ GDDR bus again. If they need more than a 384-bit bus can deliver, they will probably go HBM.
 
  • Like
Reactions: Tlh97 and SmokSmog

SteinFG

Senior member
Dec 29, 2021
386
445
106
With GDDR6W it's easy: just use 8 chips. Each chip has 64 IO lanes; 8×64 = 512. I assume that this is the primary reason for creating GDDR6W.
 
  • Like
Reactions: Ajay

Heartbreaker

Diamond Member
Apr 3, 2006
4,196
5,197
136
With GDDR6W it's easy, just use 8 chips. Each chip has 64 io lanes. 8x64=512. I assume that this is the primary reason for creating GDDR6W.

The benefits of GDDR6W are small. You get a slight board packaging/routing advantage, mainly useful if you want a very wide GDDR bus for your GPU.

The problem with that is that GPU makers want to use the narrowest bus they can get away with, and they want to have multiple suppliers. Plus there is no improvement in capacity per channel: these are just double-capacity chips that take two bus channels, leaving total capacity the same.

I suppose it could make headway if all the major GDDR makers got on board, in partnership with GPU makers, but overall GDDR6W really doesn't move the needle on anything other than a bit of board packaging benefit.

We might switch to GDDR6W or we might not, but it really doesn't matter to the consumer at all. We won't see any benefit from the change, if/when it happens.
 
Last edited:

SteinFG

Senior member
Dec 29, 2021
386
445
106
The benefits are small for GDDR6W. You get a slight board packaging/routing advantage. […] We won't see any benefit from the change, if/when it happens.
The question was how to get a 512-bit GPU to work with GDDR6, not whether it makes sense 乁⁠|⁠ ⁠・⁠ ⁠〰⁠ ⁠・⁠ ⁠|⁠ㄏ
 

Ajay

Lifer
Jan 8, 2001
15,332
7,789
136
What makes you think that Nvidia cares at all about the well-being of AIBs?
Uh, to a degree, yes. I'm sure EVGA dropping out got their attention. That said, Nvidia has really squeezed their AIBs in recent years, and the AIBs have responded with higher prices to keep their profits up.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,789
136
With GDDR6W it's easy, just use 8 chips. Each chip has 64 io lanes. 8x64=512. I assume that this is the primary reason for creating GDDR6W.
👍 I think we will see GDDR6W in use in consumer GFX cards before GDDR7. Also, pretty sure the GDDR6W chips are 32 Gb (though I may be wrong).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,319
2,805
106
👍I think we will see GDDR6W in use in consumer GFX cards b/4 GDDR7. Also, pretty sure the GDDR6W's are 32 Gb chips (though I may be wrong).
They are 32 Gbit chips, but the problem is that the IO is doubled too.
As I understand it, on a 128-bit bus you will still be limited to the same BW and 8 GB of VRAM.
The only advantage is that a 32 Gbit GDDR6W chip has the same or a similar size as a 16 Gbit GDDR6 chip, so you need only half the memory chips. This would be very useful in laptops.
If AMD or Nvidia doubled the memory controller width, or at least increased it by 50%, then yes, it would provide increased capacity along with higher BW with the same or fewer chips.
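The point above can be sketched with a toy model (my own illustration; the lane and density figures are the ones quoted in this thread): on the same 128-bit bus, GDDR6W halves the chip count but leaves bandwidth and capacity unchanged.

```python
# Toy comparison of GDDR6 vs GDDR6W board configs on the same bus.
def board_config(bus_bits: int, lanes_per_chip: int, gbit_per_chip: int, gbps_per_pin: float):
    chips = bus_bits // lanes_per_chip
    return {
        "chips": chips,
        "vram_gb": chips * gbit_per_chip // 8,   # Gbit -> GB
        "bw_gbps": bus_bits * gbps_per_pin / 8,  # bits per transfer -> bytes
    }

# 128-bit bus at 18 Gbps per pin:
print(board_config(128, 32, 16, 18))  # GDDR6:  4 chips, 8 GB, 288.0 GB/s
print(board_config(128, 64, 32, 18))  # GDDR6W: 2 chips, 8 GB, 288.0 GB/s
```

Only the chip count changes, which is exactly the laptop/board-area argument made above.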
 
Last edited:

Ajay

Lifer
Jan 8, 2001
15,332
7,789
136
They are 32gbit chips, but the problem is that IO is doubled.
As I understand It, using a 128-bit bus you will still be limited to same BW and 8GB Vram size.
The only advantage is that this 32gbit GDDR6w chip has the same or similar size as a 16gbit GDDR6 chip, so you need only 1/2 of memory chips. This would be very useful in laptops.
If AMD or Nvidia doubled or at least increased memory controller width by 50% then yes, It would provide increased capacity along with higher BW with the same or fewer amount of chips.
Ah, I see. So what is the benefit of GDDR6W? Cost and component area? Bummer, from a desktop perspective.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,196
5,197
136
The question was how to get 512bit GPU to work with GDDR6, not why it should makes sense 乁⁠|⁠ ⁠・⁠ ⁠〰⁠ ⁠・⁠ ⁠|⁠ㄏ

A 512-bit bus can work with GDDR6, just like it worked with GDDR5 (about the same package size and only marginally more pins). You don't need GDDR6W to get a 512-bit bus working.

There simply is no need/desire for a 512-bit bus anymore.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,319
2,805
106
Ah, I see. So, what is the benefit of GDDR6W?? Cost and component area? Bummer, from a desktop perspective.
With current GPUs you only save some PCB space and PCB cost, and likely one GDDR6W chip is cheaper than 2 GDDR6 ones.
For example:
RTX 4060 8GB -> 4 GDDR6 chips or 2 GDDR6W chips
RTX 4090 24GB -> 12 GDDR6 chips or 6 GDDR6W chips
This doesn't sound like much and is mostly useful for laptops, where space is limited.

If AMD and Nvidia were willing to widen the memory bus, and GDDR7 was limited to 16 Gbit chips or wasn't released at the time, then it could be more interesting.
An example:
| | CU (WGP) | Clockspeed | Memory width | Memory type | Memory chips | VRAM total | Bandwidth |
|---|---|---|---|---|---|---|---|
| RX 7600 | 32 (16) | 2655 MHz | 128-bit | 18 Gbps GDDR6 | 4× 16 Gbit | 8 GB | 288 GB/s |
| RX 8600v1 | 48 (24) +50% | 3151 MHz +18.6% | 128-bit | 32 Gbps GDDR7 | 4× 16 Gbit | 8 GB | 512 GB/s +78% |
| RX 8600v2 | 48 (24) +50% | 3540 MHz +33.3% | 256-bit | 18 Gbps GDDR6W | 4× 32 Gbit | 16 GB | 576 GB/s +100% |
P.S. I scaled the clockspeeds so that TFLOPs (CU count × clock) rise in step with the BW increase for both version 1 and version 2.

As shown above, GDDR6W could have some good advantages, especially in laptops.
Advantages over GDDR7:
1.) Likely the PCB size wouldn't change
2.) 2x VRAM if GDDR7 is still limited to 16 Gbit chips
3.) Higher BW
4.) Maybe GDDR6W would be cheaper than or the same price as GDDR7

Disadvantages over GDDR7:
1.) The wider memory bus would probably result in higher power consumption
2.) A bigger GPU because of the extra 128-bit PHY - maybe an extra 25-30 mm² at 6 nm? This shouldn't affect the GPU package size much, so it shouldn't increase PCB size either.

Not sure if I forgot something.
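The table's bandwidth deltas can be reproduced with the same per-pin arithmetic (a quick sketch using only the rates and widths from the table above):

```python
def bw(gbps_per_pin: float, bus_bits: int) -> float:
    """Peak bandwidth in GB/s."""
    return gbps_per_pin * bus_bits / 8

base = bw(18, 128)  # RX 7600-like: 288.0 GB/s
v1 = bw(32, 128)    # hypothetical GDDR7 version: 512.0 GB/s
v2 = bw(18, 256)    # hypothetical GDDR6W version: 576.0 GB/s
print(f"v1: +{v1 / base - 1:.0%}, v2: +{v2 / base - 1:.0%}")  # v1: +78%, v2: +100%
```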
 
Last edited:

Heartbreaker

Diamond Member
Apr 3, 2006
4,196
5,197
136
With current GPUs you only save some space on PCB and cost for PCB and likely this chip is cheaper than 2 GDDR6 ones.

Likely, as a specialized part, it ends up more expensive than two equivalent GDDR6 parts. That's the problem with it not being in competition: you can buy regular GDDR6 from Micron, Samsung, and Hynix, but so far only Samsung is talking GDDR6W.

For example: […]
As shown above, this GDDR6W could have some good advantages, especially in laptops. […]
Not sure If I forgot something.

None of those are advantages vs just using regular GDDR6. "W" is still just slightly easier packaging; everything else is the same between GDDR6 and GDDR6W. It's not an incentive to double the bus, because the biggest expense of a wider bus is on the GPU side: it takes significant GPU silicon area to implement, and "W" doesn't change that.
 

SteinFG

Senior member
Dec 29, 2021
386
445
106
Honestly, my idea is that GDDR6W will be good for making GPUs more compact. Have you seen the reference 4060 PCB?

(reference RTX 4060 PCB photo)

Imagine this size of PCB, but for something like a 4080, with the VRAM on-package. Here's a quick Paint mockup:
(mockup image attachment)
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,319
2,805
106
Likely as a specialized part, it ends up more expensive than two equivalent GDDR6 parts. That's the problem with it not being in competition. You can buy regular GDDR6 from Micron, Samsung, Hynix, so far only Samsung is talking GDDR6W.
In laptops it could still be worth it for OEMs, even if it ends up costlier, maybe.

None of those are advantages vs still just using regular GDDR6. "W" is still just slightly easier packaging. Everything else is still the same GDDR6 vs GDDR6W. It's not an incentive to double the bus, because the biggest expense of a wider bus is on the GPU side, as it uses significant GPU silicon area to implement. "W" doesn't change that.
You are right. For higher BW you would still widen the bus, and using GDDR6W wouldn't change anything except saving PCB space compared to GDDR6. Actually, GDDR7 would be preferable with a narrower bus, especially if they made 32 Gbit chips.
 
  • Like
Reactions: Tlh97