
Question Speculation: RDNA2 + CDNA Architectures thread


Glo.

Diamond Member
Apr 25, 2015
4,824
3,443
136
Yeah it's Rembrandt, 12 CU with 2 SA and no L3/IC.
Would Infinity Cache that is available to the CPU be visible in the GPU driver?

THAT is the key question to get answered.

I don't believe that Rembrandt has no IC and no L3 cache. That would make zero sense for it.
 

moinmoin

Platinum Member
Jun 1, 2017
2,676
3,485
136
Would Infinity Cache that is available to the CPU be visible in the GPU driver?
Does the GPU driver need to manage the IC? Honestly, I assumed all this time that it's managed by the hardware, transparently to the driver and OS; I never thought to check the Linux driver to see whether that's actually the case.
 

Gideon

Golden Member
Nov 27, 2007
1,429
2,875
136
Does the GPU driver need to manage the IC? Honestly, I assumed all this time that it's managed by the hardware, transparently to the driver and OS; I never thought to check the Linux driver to see whether that's actually the case.
At least the very first IC leaks were from Linux drivers where the size was listed.
 

Bigos

Member
Jun 2, 2019
38
61
61
I believe the only IC-related feature in the Linux AMD graphics drivers (other than listing its size, for API query purposes) is MALL (IIRC "Memory Access at Last Level"), a mechanism that lets the display controller source the framebuffer solely from the IC to improve power use.

The fact the L3 is not listed for Rembrandt in the Linux compute drivers suggests it doesn't exist. It would make no sense to hide it from the compute applications (which could size their working set based on the L3 size otherwise).

Also, "Infinity Cache" is a marketing term. The Zen CCX L3 cache != RDNA L3 cache. They are different on a couple fronts:
  • The Zen L3 seems to be a victim cache, i.e. it holds cache lines that have been evicted from a core's L2 cache. That means in order to fill an L3 cache line, the line must have been present in at least one core's L2 cache.
    On the other hand, RDNA L3 cache seems to be a transparent memory controller cache (given how it is sized based on the memory controller width), i.e. it caches requests to memory.
  • As mentioned before, the cache line size differs. It is 64 bytes on CPU and 128 bytes on GPU. It is still possible to respond to a GPU cache read using two CPU cache reads, but it makes the whole mechanism more complicated (how to handle partial hits?) and not something you want to have on a fast path (that's probably one of the reasons the fastest memory for GPU on an APU is so called "uncached memory", which is not cached by CPU at all).
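
To make the partial-hit problem concrete, here is a toy model (purely illustrative, not actual hardware behavior): a 128-byte GPU line spans exactly two 64-byte CPU lines, so a GPU-line-sized read against CPU-cached data can hit fully, hit partially, or miss entirely.

```python
# Toy model of the cache-line-size mismatch described above. All addresses
# and line sizes are illustrative; real coherence protocols are far richer.

CPU_LINE = 64
GPU_LINE = 128

def gpu_read_status(addr, cpu_cached_lines):
    """Classify a GPU-line-sized read against a set of cached 64-byte CPU line addresses."""
    base = addr - (addr % GPU_LINE)          # align to the 128-byte GPU line
    halves = [base, base + CPU_LINE]         # the two 64-byte CPU lines it covers
    hits = sum(h in cpu_cached_lines for h in halves)
    return {0: "miss", 1: "partial hit", 2: "full hit"}[hits]

cached = {0x1000, 0x1040, 0x2000}            # CPU holds these 64-byte lines
print(gpu_read_status(0x1000, cached))       # both halves cached -> "full hit"
print(gpu_read_status(0x2000, cached))       # only first half cached -> "partial hit"
print(gpu_read_status(0x3000, cached))       # neither half cached -> "miss"
```

The "partial hit" case is exactly the awkward one: the response has to be assembled from one cached half and one memory fetch, which is why it's unattractive on a fast path.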
However, if part of "Zen 3+" is a redesign of the L3 cache, we might see it used more often by the GPU. The GPU might be connected to it in a similar way to the CPUs, for example. The lack of any mention of it in amdkfd makes that fairly unlikely, however.
 

beginner99

Diamond Member
Jun 2, 2009
4,845
1,232
136
In terms of the 6800M, it seems Asus skimped on the RAM in the G15, making their flagship laptop, as reviewed by AnandTech, CPU-limited in some games.

In Shadow of the Tomb Raider the FPS goes from 119 (120 in the AnandTech bench) to 135 and above, beating some 3080 offerings. Even worse, performance also seems gimped when playing on the laptop's own screen (just 100 fps). So external screen + new RAM = 1/3 better performance. :confused_old:
 
  • Like
Reactions: Rigg and lightmanek

moinmoin

Platinum Member
Jun 1, 2017
2,676
3,485
136
In terms of the 6800M, it seems Asus skimped on the RAM in the G15, making their flagship laptop, as reviewed by AnandTech, CPU-limited in some games.

In Shadow of the Tomb Raider the FPS goes from 119 (120 in the AnandTech bench) to 135 and above, beating some 3080 offerings. Even worse, performance also seems gimped when playing on the laptop's own screen (just 100 fps). So external screen + new RAM = 1/3 better performance. :confused_old:
Looks like chips scarcity hit again, in this case better RAM simply isn't available in sufficient quantity. Not really a good look for the new "AMD Advantage" certification that it allows that though.
 

blckgrffn

Diamond Member
May 1, 2003
8,103
1,357
126
www.teamjuchems.com
Some RX 6700M reviews are out, but nothing for RX 6600M, that's infuriating.
Wasn't that part of the launch announcement? That the high- and low-end cards were shipping now and the mid-range shipping soon?

And with all the leeway they are giving the chassis folks, especially on the 6600M I expect performance to vary a lot from chassis to chassis.

Yay, laptop GPUs. /s

I mean, I want them, but their "configurability" makes it so hard to know what you are getting.
 

blckgrffn

Diamond Member
May 1, 2003
8,103
1,357
126
www.teamjuchems.com

Second to last paragraph. 6800M and 6600M are shipping now, 6700M is coming someday.

Ah I see what you mean.

I forgot that it was 6800M/6700M/6600M - I had it shifted one number. My bad. I thought there was a 6500M to go with the 6700M - the 6700M should be the last one to get benchmarks, though with that SKU results will be all over the map depending on TDP, memory speeds and so on.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
3,284
1,831
136
In this environment, I hope some of the reviewers get out of their comfort zone and bench laptops against some desktop hardware.

It might not be an apples to apples comparison, but a $ to $ comparison in this market would still ultimately be useful.

If I needed a new computer right now, it would be nice to know how much performance I'd be getting out of, say, a $1000 gaming laptop vs the GPU landscape.
 
  • Like
Reactions: scineram

Glo.

Diamond Member
Apr 25, 2015
4,824
3,443
136
AMD just released Radeon Pro W6600.

10.4 TFLOPs, from 1792 ALUs.

That works out to a 2.9 GHz core clock.

From 100W GPU.

Let that sink in.
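
For anyone wanting to check the arithmetic: RDNA2 FP32 throughput is ALUs × 2 FLOPs per cycle (one FMA) × clock, so the implied boost clock falls straight out of the quoted TFLOPs figure. A quick sketch:

```python
# Implied boost clock from a quoted FP32 TFLOPs figure, assuming
# 2 FLOPs per ALU per cycle (one fused multiply-add), as on RDNA2.

def implied_clock_ghz(tflops, alus):
    flops = tflops * 1e12           # quoted peak FLOP/s
    return flops / (alus * 2) / 1e9 # per-ALU rate, converted to GHz

print(round(implied_clock_ghz(10.4, 1792), 2))  # W6600 -> 2.9
```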
 
  • Wow
Reactions: lightmanek

TESKATLIPOKA

Senior member
May 1, 2020
441
425
96
AMD just released Radeon Pro W6600.

10.4 TFLOPs, from 1792 ALUs.

That equals to 2.9 GHz core clock.

From 100W GPU.

Let that sink in.
I think there must be a mistake, even if it's on the official AMD pages and presentations.
Link Link
As you said, it would mean a 2.9 GHz (turbo) clock speed for this 28CU GPU at 100W TBP.
For comparison, the 40CU RX 6700XT at 2.58 GHz (turbo) has a 230W TBP; it makes no sense for it to need 2.3x more power.

Then we have the 28CU RX 6600M with 8.77 TFLOPs, i.e. 2.45 GHz (turbo) or 450 MHz lower, and its 100W figure is not TBP but just GPU power, which is the same. This doesn't make any sense either.

P.S. I already found an error in the memory interface for the Radeon Pro W6600: it should be 128-bit but they list 4096-bit.
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
4,824
3,443
136
I think there must be a mistake, even if it's on the official AMD pages and presentations.
Link Link
As you said, it would mean a 2.9 GHz (turbo) clock speed for this 28CU GPU at 100W TBP.
For comparison, the 40CU RX 6700XT at 2.58 GHz (turbo) has a 230W TBP; it makes no sense for it to need 2.3x more power.

Then we have the 28CU RX 6600M with 8.77 TFLOPs, i.e. 2.45 GHz (turbo) or 450 MHz lower, and its 100W figure is not TBP but just GPU power, which is the same. This doesn't make any sense either.

P.S. I already found an error in the memory interface for the Radeon Pro W6600: it should be 128-bit but they list 4096-bit.
The clocks for those GPUs are nowhere to be found on the spec sheets.

Do you believe AMD would be THAT incompetent, posting incorrect peak TFLOPs numbers for ALL the important workloads in its professional lineup?

Those clocks are unbelievable. But being wrong would require an incredible lack of competence from whoever runs the professional group at AMD, because they have to sign off on everything their teams do - including the marketing.
 

TESKATLIPOKA

Senior member
May 1, 2020
441
425
96
If you believe they can't be that incompetent, fine, but can you give me a reasonable answer why this GPU has a much higher turbo (~18%) or TFLOPs than the mobile version even though they have the same configuration, yet consumes less power per the official specs (GPU power 100W vs TBP 100W)?

For example:
Radeon PRO W6800 has 17.83 TFLOPs, which means a 2.3 GHz turbo at 250W TBP.
RX 6800 has 16.17 TFLOPs, which means a 2.1 GHz turbo at 250W TBP.
~10% higher turbo at the same TBP, though it's true the VRAM is doubled. A new, better revision maybe? Interesting.
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
4,824
3,443
136
If you believe they can't be that incompetent, fine, but can you give me a reasonable answer why this GPU has a much higher turbo or TFLOPs than the mobile version even though they have the same configuration, yet consumes less power per the official specs (GPU power 100W vs TBP 100W)?

For example:
AMD Radeon PRO W6800 has 17.83 TFLOPs, which means a 2.3 GHz turbo at 250W TBP. This is comparable to the desktop RX 6800.
It may mean that the mobile 6600 actually runs at a much lower thermal envelope, and that the "up to 100W" rating allows the GPU to clock way, way higher than advertised.

I'm not saying this is the case. But it could be one possible explanation: that AMD sandbags the mobile 6600M way below its potential.
 

TESKATLIPOKA

Senior member
May 1, 2020
441
425
96
I personally don't believe in a higher clock speed or lower power consumption, or that AMD is sandbagging us. The reason is N22 and its specs in comparison to N23.

BTW, no 32CU version is out yet. Is everything really going to Tesla?
 

TESKATLIPOKA

Senior member
May 1, 2020
441
425
96
Do you think Tesla also gets only the 28CU version?
Then where are the full 32CU versions? Unless N23 physically has only 28CU.
That would be hilarious.

If this boost turns out to be true, then even a 16CU N24 could perform better than the RX 5500XT, and if it could fit into the 75W limit, it would be a great GPU for an HTPC.
 
Last edited:
  • Like
Reactions: Tlh97

Glo.

Diamond Member
Apr 25, 2015
4,824
3,443
136
Do you think Tesla also gets only the 28CU version?
Then where are the full 32CU versions? Unless N23 physically has only 28CU.
That would be hilarious.

If this boost turns out to be true, then even a 16CU N24 could perform better than the RX 5500XT, and if it could fit into the 75W limit, it would be a great GPU for an HTPC.
Potentially 28 CUs because AMD has to use every single die possible, hence the 28 CU limit on every SKU.

Also this might mean that the 6600XT is actually 28 CUs.

Yeah, even a 1024 ALU Navi 24 clocked at 3 GHz (for giggles) would be 6 TFLOPs of compute power.
 

TESKATLIPOKA

Senior member
May 1, 2020
441
425
96
Potentially 28 CUs because AMD has to use every single die possible, hence the 28 CU limit on every SKU.

Also this might mean that the 6600XT is actually 28 CUs.
The weird thing is that we have the 6800M, which is a fully unlocked N22 chip, and a partially disabled 6700M, but the smaller N23 is limited to only 28CU.

Yeah, even a 1024 ALU Navi 24 clocked at 3 GHz (for giggles) would be 6 TFLOPs of compute power.
At 3 GHz it should be able to go up against the 3050 Ti. :cool:
 
Last edited:
  • Like
Reactions: Tlh97

Glo.

Diamond Member
Apr 25, 2015
4,824
3,443
136
The weird thing is that we have the 6800M, which is a fully unlocked N22 chip, and a partially disabled 6700M, but the smaller N23 is limited to only 28CU.


At 3 GHz it should be able to go up against the 3050 Ti. :cool:
Navi 14 with its full 24 CUs and 16 Gbps GDDR6 is already capable of going up against the 3050 Ti, let alone a GPU with more TFLOPs (1024 ALUs at 3 GHz is better for gaming than 1536 ALUs at 1.9 GHz).
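
The raw FLOPs behind that comparison, as a quick sketch (gaming performance depends on more than FLOPs - clocks also lift fixed-function hardware - so treat this as a ballpark only):

```python
# FP32 throughput for the two configurations discussed above,
# assuming 2 FLOPs per ALU per cycle (one fused multiply-add).

def tflops(alus, clock_ghz):
    return alus * 2 * clock_ghz / 1000  # GFLOP/s -> TFLOP/s

print(tflops(1024, 3.0))  # hypothetical Navi 24 at 3 GHz: ~6.1 TFLOPs
print(tflops(1536, 1.9))  # full Navi 14 at typical clocks: ~5.8 TFLOPs
```

Nearly identical raw compute, but the 1024-ALU part gets there at a much higher clock, which is the point being made.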
 
