
Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

The better question: if you are going to solder RAM anyway, you could save board space and have wider buses by going on-package, but what are the disadvantages of soldering memory to the board vs. MoP?

I can’t think of any for STX halo.
 
I can’t think of any for STX halo.
MoP means SKU spam.
Suboptimal for a new swimlane, a new part, new-new-new.
 
I am a straight-up bubble huffer. I say that AI functions will be the main reason for sales of computing devices by 2040. I still find it bizarre that so many people on various tech site forums don't see that AI is THE future of computing in society.

To be sure, new algorithms are needed. The current ones are just stopgaps, as impressive as they may be at these small simple tasks. But, just as current computers are many orders of magnitude more complex and capable than a Commodore 64, neural networks are going to follow the same trajectory, at about triple the speed.

Research is already underway to boost reasoning capacity by 100 to 1000 fold. Those algorithms will be here in a few short years, ones that will make large language models feel like stone age technology. Critically, those won’t be the last ones.

I believe the next 30 years will yield a bigger technological and social transformation than the period from the dawn of civilization to 2024.

Yeah, I’m huffing big time.
This shows a complete lack of understanding of the hardware your algorithms are running on, of how hardware evolves, and of what the future looks like.

Your computations are already running on hyper-targeted and optimised hardware with the biggest BOMs outside of mainframes we have ever seen. Also, who exactly is making any money from AI outside Nvidia? I'm watching this daily, as I want to move before the a** falls out of the market.
 
matrix math isn't all that useful.

And yet it is the main focus of every major chipmaker on the planet, as well as nearly all of the minor ones. From CEOs, CTOs, and PhD fellows all the way down to interns, it is the most important aspect of every architecture and chip at every stage, from concept through design, production, and validation. Hmmmmmmmm.
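Quip aside, a back-of-envelope FLOP count shows why chipmakers obsess over matrix math: in a transformer-style layer, matrix multiplies dwarf everything else. A minimal sketch, where the layer sizes are illustrative assumptions rather than any specific model:

```python
# Back-of-envelope FLOP count for one transformer-style layer
# (sizes below are illustrative assumptions, not a real model's).
d, seq = 1024, 512                    # hidden size, sequence length

qkv  = 3 * 2 * seq * d * d            # Q, K, V projections (matmuls)
attn = 2 * 2 * seq * seq * d          # QK^T and attention-times-V
proj = 2 * seq * d * d                # output projection
mlp  = 2 * 2 * seq * d * (4 * d)      # two MLP matmuls, 4x expansion
matmul_flops = qkv + attn + proj + mlp

other_flops = 10 * seq * seq + 20 * seq * d   # rough softmax/norm cost

share = matmul_flops / (matmul_flops + other_flops)
print(f"matmul share of layer FLOPs: {share:.4f}")
```

Even with generous constants on the non-matmul side, GEMM work ends up well above 99% of the arithmetic, which is exactly the workload the dedicated engines target.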

xir we're launching TLAMs at Taiwan in a few years.
You sure?
Surely, you’ve noticed that multiple countries have begun spending hundreds of billions of dollars to ensure that they have angstrom scale fabs inside their borders. Each of those countries will massively increase spending and several other nations will join them.

This shows a complete lack of understanding of the hardware your algorithms are running on, of how hardware evolves, and of what the future looks like.

Your computations are already running on hyper-targeted and optimised hardware with the biggest BOMs outside of mainframes we have ever seen. Also, who exactly is making any money from AI outside Nvidia? I'm watching this daily, as I want to move before the a** falls out of the market.
Well obviously, I think the lack of understanding is fully in your camp.

Llama 3 405B has for the last week been used and tested by thousands of people who have been heavily using LLMs for two to three years now. So far it consistently demonstrates capabilities effectively equal to ChatGPT-4o. And it will run on two MacBook Pros connected with a single Thunderbolt cable.

More importantly, LLMs are not the AI revolution. They are the precursor.
 
And yet it is the main focus of every major chipmaker on the planet, as well as nearly all of the minor ones. From CEOs, CTOs, and PhD fellows all the way down to interns, it is the most important aspect of every architecture and chip at every stage, from concept through design, production, and validation. Hmmmmmmmm.
First time?
Surely, you’ve noticed that multiple countries have begun spending hundreds of billions of dollars to ensure that they have angstrom scale fabs inside their borders. Each of those countries will massively increase spending and several other nations will join them.
That's not how it works.
Gigafabs are Taiwan-only, and R&D expertise is also pretty much non-transferable.
 
oh, nooooooooo



Translation added:

As for why the equivalent delay is 1 and not 0.5: this is the main problem I'm stuck on at the moment. With the current microcode version, it seems a single thread cannot see both decoders no matter what. That is, once the op$ is flushed or turned off, the front end becomes 4-wide and can only fetch from one decoder per cycle (regardless of whether there is a branch jump). This is obviously inconsistent with AMD's claim that a single thread can use both decoders, and more investigation is needed.
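To make the "1 vs. 0.5 equivalent delay" concrete, here is a toy throughput model. This is my own simplification for illustration, not AMD's documented behavior:

```python
import math

# Toy front-end model: cycles for one thread to decode n instructions
# through one 4-wide decode cluster vs. both clusters (4 + 4 wide).
# A deliberate simplification: no op cache, no fetch or branch limits.
def decode_cycles(n_instructions, clusters, width=4):
    return math.ceil(n_instructions / (clusters * width))

one_cluster  = decode_cycles(1024, clusters=1)  # what the test observes
two_clusters = decode_cycles(1024, clusters=2)  # what the slides imply
print(one_cluster, two_clusters)  # 256 vs 128 cycles
```

If a single thread could really use both clusters, decode-bound stretches would take half the cycles, i.e. the 0.5 equivalent delay the measurement was looking for.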
Mod DAPUNISHER
 
>RDNA double-issue flashbacks.
Well yeah, AMD is doing that funny PPA trick of stripping out hardware, adding double-pump logic because it is cheap, and then having to rely on compilers to actually utilise the hardware.
As it turns out, the software is lagging the hardware and holding it back, as is tradition for AMD.
If there is a new AGESA in the next week or so that magically enables the core to use dual decoders for compatible 1T workloads, well, that would make any review delay justified.
Yes, I am coping.
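For anyone who missed the RDNA saga: why "relying on compilers" matters can be sketched with a toy dual-issue packer. The pairing rule here is hypothetical and far simpler than real VOPD constraints; the point is just that paired slots only pay off when independent ops sit next to each other:

```python
# Toy dual-issue packer (hypothetical rule, not RDNA's real VOPD
# constraints): pair adjacent ops when the second op doesn't read
# the first op's destination; dependent chains issue one per cycle.
def dual_issue_cycles(ops):
    """ops: list of (dest_register, set_of_source_registers)."""
    cycles = i = 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i][0] not in ops[i + 1][1]:
            i += 2          # two independent ops co-issue
        else:
            i += 1          # dependency: issue alone
        cycles += 1
    return cycles

independent = [("a", {"x"}), ("b", {"y"}), ("c", {"z"}), ("d", {"w"})]
chained     = [("a", {"x"}), ("b", {"a"}), ("c", {"b"}), ("d", {"c"})]
print(dual_issue_cycles(independent), dual_issue_cycles(chained))  # 2 4
```

Same instruction count, 2x difference in cycles, and whether you get the good case depends entirely on how the compiler schedules the stream. That is the PPA bet.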
 
Well yeah, AMD is doing that funny PPA trick of stripping out hardware, adding double-pump logic because it is cheap, and then having to rely on compilers to actually utilise the hardware.
As it turns out, the software is lagging the hardware and holding it back, as is tradition for AMD.
If there is a new AGESA in the next week or so that magically enables the core to use dual decoders for compatible 1T workloads, well, that would make any review delay justified.
Yes, I am coping.
^we are coping.

:copiumhuff:
 
And yet it is the main focus of every major chipmaker on the planet, as well as nearly all of the minor ones. From CEOs, CTOs, and PhD fellows all the way down to interns, it is the most important aspect of every architecture and chip at every stage, from concept through design, production, and validation. Hmmmmmmmm.


Surely, you’ve noticed that multiple countries have begun spending hundreds of billions of dollars to ensure that they have angstrom scale fabs inside their borders. Each of those countries will massively increase spending and several other nations will join them.


Well obviously, I think the lack of understanding is fully in your camp.

Llama 3 405B has for the last week been used and tested by thousands of people who have been heavily using LLMs for two to three years now. So far it consistently demonstrates capabilities effectively equal to ChatGPT-4o. And it will run on two MacBook Pros connected with a single Thunderbolt cable.

More importantly, LLMs are not the AI revolution. They are the precursor.
So one model that can't do anything useful is more efficient than another model that can't do anything useful....

How many more 10x's of everything do we need to get somewhere useful?

We kind of had 10x scaling in hardware moving from CPU SIMD to MIMD to GEMM engines to hardware-aware sparsity, memory compression, etc. Now we have massive clusters with chips at reticle limits that are limited by the speed of light across derivatives of Clos fabrics, in a world where CMOS scaling is dead.
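The arithmetic-intensity argument behind that progression is easy to sketch: GEMM's FLOPs-per-byte grows with matrix size, which is why matmul engines rather than ever-wider elementwise SIMD delivered those 10x steps. The constants below (fp32, compulsory traffic only, no reuse modeling) are my own back-of-envelope assumptions:

```python
# Arithmetic intensity of an n x n GEMM, assuming fp32 operands and
# counting only compulsory traffic (read A and B, write C): intensity
# grows linearly with n, while elementwise SIMD work stays at O(1)
# FLOPs per byte and so hits the bandwidth wall instead.
def gemm_intensity(n, bytes_per_elem=4):
    flops = 2 * n**3                        # n^2 dot products of length n
    traffic = 3 * n**2 * bytes_per_elem     # A and B in, C out
    return flops / traffic

for n in (64, 1024, 16384):
    print(n, round(gemm_intensity(n), 1))   # grows as n / 6 FLOPs per byte
```

That linear growth is what lets a matmul engine stay fed as it scales, and it is also why the wins get harder once you are reticle-limited and bandwidth-limited rather than compute-limited.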

I've seen one good use case of AI that works today in my field (high-end tech), and it's not replacing a single job or driving large efficiencies. It will just give better situational awareness during failure conditions.
 
Somehow 9% improvement in INT 1T doesn't sound as bad if it's still (apparently) 4-wide decode.
Didn't think they could get any more out of that.
 
Well yeah, AMD is doing that funny PPA trick of stripping out hardware, adding double-pump logic because it is cheap, and then having to rely on compilers to actually utilise the hardware.
As it turns out, the software is lagging the hardware and holding it back, as is tradition for AMD.
If there is a new AGESA in the next week or so that magically enables the core to use dual decoders for compatible 1T workloads, well, that would make any review delay justified.
Yes, I am coping.
I am afraid it's about a bad marketing message and miscommunication. The materials mentioned that the decoders are statically partitioned in SMT mode. Traditionally, when you wanted to turn off SMT, you went to the BIOS and disabled it. Now the question is: is the SMT mode static when enabled [if SMT is on in the BIOS, is the core always in SMT mode], or is it dynamic, as the interviews are leading us to believe?
 
🤔 It seems that the AMD employee who submitted the patch knew it was 4-wide all along.
We were bamboozled by Mike Clark yet again.
It might be other things too. These were early patches; they might have wanted just to get a znver5 option added that would not dramatically break things for people using -march=native, rather than to give an accurate representation of the core. They might have wanted not to share everything, or they might have forgotten to update it. Don't forget that CPUs are designed to handle less-than-ideal code [OoO, branch prediction, etc.], so this won't have a terrible effect. It would be much worse if somebody forgot to turn on all the available instruction sets, as that would hamper the generated code far more.
 
To be sure, new algorithms are needed.
The real AI has never been tried, just wait 2-3 weeks (insert 💲💲 here). Well, seriously, what use do the current transformers have apart from replacing politicians (as they can lie and get even more delusional than the most flamboyant political figures around the world)? And why do we need it integrated into general-purpose CPUs and GPUs at all? It could just be a separate add-on board or card, and we wouldn't need to sacrifice 16 MB of cache and employ a weird dual-CCD setup for this deadweight silicon.
 