AMD Bristol/Stoney Ridge Thread


NostaSeronx

Platinum Member
Sep 18, 2011
2,520
341
126
The only threading design that can replace CMT is another CMT design.

Due to the timeline of 18FDS; https://community.arm.com/developer/ip-products/physical/b/physical-ip-blog/posts/samsung-foundry-and-arm-collaborate-on-18nm-fdsoi
12FDX is more likely to be used in a 2020 SoC, after the 2019 Stoney Ridges.

32nm/28nm, Bulldozer to Excavator => High-cost design, High performance computing, Low density.
12FDX, Derived to new CMT architecture => Low-cost design, Cost-effective mobility, High density.

While peak performance will be lost, there should still be a substantial boost in performance, with the added benefit of costing less to manufacture and buy overall. The move to 256-bit datapaths warrants a design that stays at 128-bit datapaths.

https://www.semanticscholar.org/paper/Steamroller-Module-and-Adaptive-Clocking-System-in-Wilcox-Cole/f523cc70532a205032d6d8459fad0718bff339b1
Figure 2 => Most of the design is RVT; 34.4% at normal length, 59.2% at extended length, 5.9% RVT that is memory.

https://www.semanticscholar.org/paper/Zen:-An-Energy-Efficient-High-Performance-$\times-$-Singh-Schaefer/7b06ddb135ffc595ee3856a3b78b56b9d44a6ac4
Figure 21 => Zen is much more diverse in VT. You might be wondering what a wimpy VT is;
"For example, nominal finFETs and wimpy finFETs each have different performance characteristics and thus it may be desirable to choose either a nominal finFET or a wimpy finFET based upon the particular requirements of a semiconductor device. For example, a larger gate length provides for lower leakage and variability. Due to the slightly larger gate length of the wimpy finFET, the device drive current of the wimpy finFET is lower than that of a nominal finFET. In addition, it may be desirable to include a mixture of nominal finFETs and wimpy finFETs on the same substrate such as in certain complementary metal oxide semiconductor (CMOS) product applications."

12FDX brings bi-directional body-biasing. FBB w/ RVT = LVT-like, RBB w/ RVT = HVT-like, yadda yadda yadda. SRAM/memory gets to use single-well or mixed-well (6T being N/P well, and 2T with N-well for 8T-like perf and 6T-like density, etc.). For the extended-length Vt, body biasing can cull that as well. I'm also pretty sure gate-first FDSOI poly-biasing is cheaper than gate-last bulk poly-biasing.
 

ao_ika_red

Golden Member
Aug 11, 2016
1,377
528
106
The new Carrizos are starting to arrive here; I just set up one for testing



Notice that Windows seems to misread the cache or something. Also, are Carrizo full modules without CMT? It should not be reading it as 2 cores 2 threads, or Windows 10 is just misreading everything.
Should get the 7680 instead. That one is an 845 with an iGPU, which in my experience should be more than enough as an HTPC and browsing machine.
 
  • Like
Reactions: Insert_Nickname

amd6502

Senior member
Apr 21, 2017
634
152
86
The only threading design that can replace CMT is another CMT design.

Due to the timeline of 18FDS; https://community.arm.com/developer/ip-products/physical/b/physical-ip-blog/posts/samsung-foundry-and-arm-collaborate-on-18nm-fdsoi
12FDX is more likely to be used in a 2020 SoC, after the 2019 Stoney Ridges.

32nm/28nm, Bulldozer to Excavator => High-cost design, High performance computing, Low density.
12FDX, Derived to new CMT architecture => Low-cost design, Cost-effective mobility, High density.
Well, going backwards with XV to Piledriver's shared front end might help efficiency, but this also lowers the capability of single-module products. I think 22FDX is more promising than 12FDX. Keep things cheap, simple, and stupid for the budget A-series.

The only threading design that can replace CMT is another CMT design.
...did you miss the bit where SMT Zen replaced CMT Excavator in every use case?
Yes I think there are plenty of options, CMT or otherwise.

In particular, I think on-demand coarse-grain MT would make a lot of sense for Jaguar cores used to assist in big.LITTLE roles. This would allow big cores to power down and forward threads to little cores.

It sounds quite doable for Jaguar, which I think already exists in SOI and FinFET (Xbox One X) form. A pair of such little cores assisting Stoney++ (28nm or 22FDX) would greatly improve wattage in the lowest p-states (during near-idle loads) and boost its multithread performance to decent levels. Or, for a FinFET project, such a pair would allow a two-Zen-core APU to power down one of its big cores and hand off the threads whenever entering the lowest p-state for a while. (It would also boost MT by almost 20%, raise the total threads to 6, and allow marketing to consider it a quadcore.)

Assistance from little cores via on-demand coarse-grain multithreading would be the way to maximize battery life for any 15W-and-below laptop. I think a Stoney++ (2+2 quadcore) test run would be a pretty cheap and simple project, as ~99% of the building blocks already exist on 28nm.
 

Shivansps

Platinum Member
Sep 11, 2013
2,594
502
126
While Larry is right that MT scaling is bad, it is a lot faster than older FM2 CPUs. At least according to Cinebench.


Compared to MT scores of older FM2 CPUs



Not only is the CPU faster, but the IGP is bigger as well.

Should get the 7680 instead. That one is an 845 with an iGPU, which in my experience should be more than enough as an HTPC and browsing machine.
That one is around the 200GE price, there is no point.
 

amd6502

Senior member
Apr 21, 2017
634
152
86
While Larry is right that MT scaling is bad, it is a lot faster than older FM2 CPUs. At least according to Cinebench.

Compared to MT scores of older FM2 CPUs

Not only is the CPU faster, but the IGP is bigger as well.

Should get the 7680 instead. That one is an 845 with an iGPU, which in my experience should be more than enough as an HTPC and browsing machine.
That one is around the 200GE price, there is no point.
Unless you're building on FM2+ with matching components on hand such as memory. The Athlon 200 series is limited strictly to AM4 DDR4 builds, and its GPU is also quite a bit less suited for gaming than the A8's. So the 7680 is priced decently, allowing you to bypass a low-end $60+ dGPU.

The MT scaling for multiple threads of pure (double-precision) float code would be equivalent to SMT scaling. For mixed loads or pure INTs, however, the scaling of CMT is very high (especially post-Piledriver, with the doubled-up front end).

The trouble with synthetic benches is that they rarely (if ever) give mixed loads, which are very common in real life. They will test some category, first in single thread, then in multithread. Stuff like that rarely happens in real life.
 

Shivansps

Platinum Member
Sep 11, 2013
2,594
502
126
Unless you're building on FM2+ with matching components on hand such as memory. The Athlon 200 series is limited strictly to AM4 DDR4 builds, and its GPU is also quite a bit less suited for gaming than the A8's. So the 7680 is priced decently, allowing you to bypass a low-end $60+ dGPU.

The MT scaling for multiple threads of pure (double-precision) float code would be equivalent to SMT scaling. For mixed loads or pure INTs, however, the scaling of CMT is very high (especially post-Piledriver, with the doubled-up front end).

The trouble with synthetic benches is that they rarely (if ever) give mixed loads, which are very common in real life. They will test some category, first in single thread, then in multithread. Stuff like that rarely happens in real life.
The 200GE is a better buy than the A8-9600, and I don't see how the A8-7680 could be any different. You lose GPU power, but it is an acceptable compromise compared to what you gain with AM4 and a 200GE.

From the little gaming tests I did on the A6-7460, I can say right now that the IGPU is a bit faster than Vega 3, even with DDR3-1600. BUT it is bottlenecked by the CPU: even in the simplest games I've tried to run, the CPU just went to 100%. I think W10 is not helping.
No doubt the A8-7680 will be faster than the 200GE IN SOME GAMES, but as I say, it's just not worth it, especially with DDR4 prices returning to normal. It is a valid upgrade for anyone with a dual-core FM2, though.
 

Abwx

Diamond Member
Apr 2, 2011
9,037
789
126
.
While Larry is right that MT scaling is bad, it is a lot faster than older FM2 CPUs. At least according to Cinebench.


.
If those numbers are accurate, scaling is 68%, since ST runs at 3.8GHz and MT at 3.5GHz. Besides, it's barely 25% with Zen/SKL in CB R20, while it is 85% and 40% respectively with CB R15.
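
One way to read the 68% figure is as clock-normalized scaling: the raw MT/ST score ratio, corrected for the MT run clocking lower (3.5GHz) than the ST run (3.8GHz). Below is a minimal sketch of that arithmetic, assuming this is the calculation meant. The function name is made up for illustration, and the scores are hypothetical placeholders, since the benchmark screenshots are not reproduced in the thread.

```python
def clock_normalized_scaling(st_score, mt_score, st_clock_ghz, mt_clock_ghz):
    """Return the MT gain over ST as a fraction, corrected for the clock gap.

    Assumption: scores scale linearly with clock, so the MT score is
    scaled up by st_clock / mt_clock before comparing it with the ST score.
    """
    raw_ratio = mt_score / st_score
    clock_correction = st_clock_ghz / mt_clock_ghz
    return raw_ratio * clock_correction - 1.0


# Hypothetical scores (NOT taken from the thread): ST = 100, MT = 155.
# Clocks are the ones quoted in the post: ST at 3.8GHz, MT at 3.5GHz.
scaling = clock_normalized_scaling(100.0, 155.0, 3.8, 3.5)
print(f"clock-normalized MT scaling: {scaling:.1%}")
```

With these placeholder scores the raw gain is 55%, and the clock correction lifts it to roughly 68%, which shows how much of the apparent "bad scaling" can come from the lower all-core clock alone.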
 

ao_ika_red

Golden Member
Aug 11, 2016
1,377
528
106
That one is around the 200GE price, there is no point.
If somebody was daft enough to purchase a 7400K four years ago, which is a dual-thread chip, I believe the 7680 is still a good deal, considering that they only need to update their motherboard.
 

Shivansps

Platinum Member
Sep 11, 2013
2,594
502
126
If somebody was daft enough to purchase a 7400K four years ago, which is a dual-thread chip, I believe the 7680 is still a good deal, considering that they only need to update their motherboard.
That's the thing. Now that I've been checking, AMD's mistake here was launching these Carrizo APUs when almost all FM2+ motherboards are EOL; there is no BIOS for most motherboards out there. OEMs are updating boards that are still in support, which excludes pretty much every chipset BUT the A68H, and in some cases not even that. For example, Gigabyte has a GA-F2A68HM-H rev 1.0 and a rev 1.1, and only the 1.1 has Carrizo APU support. They also have several other A68H boards with no support.

As things are, these are not upgrades, but are intended to serve the low-end market in 3rd world countries where they are still selling FM2+, kind of a pseudo AM1 replacement.
 

amd6502

Senior member
Apr 21, 2017
634
152
86
The 200GE is a better buy than the A8-9600, and I don't see how the A8-7680 could be any different. You lose GPU power, but it is an acceptable compromise compared to what you gain with AM4 and a 200GE.
That's going to vary from person to person.

If you're doing a build completely from scratch, then yes, for most people the Athlon (or 2200G for those wanting a real GPU) would be a better choice. I mean heck, in that situation, going with the AM4 platform is almost a no-brainer, because the newer platform has a great upgrade path and because there are more choices to pick from. That's a big assumption, however, that doesn't hold true for many. And as long as there is surplus DDR3 and DDR4 stays above $35 per 8GB, there will be people for whom FM2+ makes sense.

If somebody was daft enough to purchase a 7400K four years ago, which is a dual-thread chip, I believe the 7680 is still a good deal, considering that they only need to update their motherboard.
Or an A4-7300 (A4-6300 rebrand), or 5300. The 7400K is not too bad (the extra decoder helps those dual cores nicely); it's about the same performance as the new A6, but just runs hotter. Porting those new Bristol Ridges to FM2+ and extending support is a very nice thing to do.

The 7400K GPU should be overclockable to match Vega 3. The new A6 (equiv to A6-9500) should noticeably exceed Vega 3; these A6 have 384SPs at 1GHz. Some tests I just found on YouTube:

( Clearly some of those games were not meant for running on 2c/2t, making the A8 or Athlon a better fit. )
 

ao_ika_red

Golden Member
Aug 11, 2016
1,377
528
106
I once had that GA-F2A68HM-DS2, which got Carrizo APU support, but I traded it for an A88X board because it has better VRM cooling and audio codec. A bit gutted, because I can't toy with this APU. As for older board support, in this neck of the woods the ASRock FM2A68HM DG3+ is the most common A68H board, followed by the Asus A68HM-K and some ultra budget boards (like Maxsun, Colorful, Biostar, etc). And those boards (except for the ultra budget boards) support Carrizo APUs; it's just a matter of whether the owner knows that there's a "new" APU for their board or not.
 

Insert_Nickname

Diamond Member
May 6, 2012
3,628
395
126
As things are, these are not upgrades, but are intended to serve the low-end market in 3rd world countries where they are still selling FM2+, kind of a pseudo AM1 replacement.
There is an additional use case for FM2+ systems, as I found out. Even if it's very niche, and will be somewhat uncommon.

Now, don't laugh, but Windows XP can run on the platform. Though with the Carrizo IGP you'd need a discrete card of some kind that still has XP drivers available*. There is still a need for replacement XP systems in certain cases. Of course, these should be kept off the internet.

*E.g. a GT710, or an older spare card. Enthusiasts usually have something in the spare parts pile.
 

NostaSeronx

Platinum Member
Sep 18, 2011
2,520
341
126
...did you miss the bit where SMT Zen replaced CMT Excavator in every use case?
Only a CMT design can adequately dethrone a CMT design. A 14LPP Bulldozer3(BD/PD=BD1, SR/XV=BD2) aimed at HPC would dominate a 14LPP Zen.

Premium wise;
Four FMACs(4 Adds/4 Muls/4 FMAs) vs two FMACs(2 Adds/2 Muls/2 FMAs)
Each thread gets its own 4 ALUs vs each thread has to compete for a single shared set of 4 ALUs.
A huge branch predictor vs a size-efficient branch predictor.*
*(XV => The L1 BTB contains 128 sets of 6 ways for a total of 768 entries, while the L2 BTB has 1024 sets of 10 ways for a total of 10240 entries.
vs ZN => L0BTB holds 4 forward taken branches and 4 backward taken branches, and predicts with zero bubbles. L1BTB has 256 entries and creates one bubble if prediction differs from L0BTB. L2BTB has 4096 entries and creates four bubbles if its prediction differs from L1BTB.)
- Skylake-SP is more indicative of a 14LPP Bulldozer, where the L3 is a part of the module. The 40h-4Fh models of 15h are implied to use a mesh to get 8 modules in it. (I also think the 20h-2Fh 5-module version had a similar L3, but it was just a ring.)
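
The BTB entry counts quoted above follow from sets times ways, and the size gap between the two predictors can be checked with a few lines of arithmetic. This is only a sketch using the figures given in this post; the variable and function names are made up for illustration, and entry count alone says nothing about prediction accuracy or latency.

```python
def btb_entries(sets, ways):
    """Total entries of a set-associative BTB: sets * ways."""
    return sets * ways


# Excavator (XV), figures as quoted in the post.
xv_l1 = btb_entries(128, 6)     # 768 entries
xv_l2 = btb_entries(1024, 10)   # 10240 entries
xv_total = xv_l1 + xv_l2

# Zen (ZN), figures as quoted in the post.
zen_l0 = 4 + 4                  # 4 forward + 4 backward taken branches
zen_l1 = 256
zen_l2 = 4096
zen_total = zen_l0 + zen_l1 + zen_l2

print(f"XV BTB entries:  {xv_total}")
print(f"Zen BTB entries: {zen_total}")
print(f"XV/Zen ratio:    {xv_total / zen_total:.2f}x")
```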

A CMT module can re-use small cores to get a big module. If Bobcat/Jaguar has two ALUs, then that Bulldozer generation would have two ALUs per core. If Zen has four ALUs, then that HPC Bulldozer generation would have four ALUs per core. Add the function of having a FPU comparable to two Zen FPUs. Etc.

An SMT core can't compete with two cores in a CMT module if all cores have the same general-purpose computational power, with the FPU doubled up so that a single CMT core can use two SMT cores' worth of FPU.

Zen does not replace Excavator, as it is not a CMT module. It doesn't even support backwards compatibility. It can't run XOP!
fyi, all my apps run XOP, so Zen can't replace every use case for me. (/sarcasm)

However, the big HPC CMT core is most likely going to be released by Intel. They need a big core that is efficient, to accommodate the ever-growing need for more execution ports. Which of course is bad for AMD's Zen2/Zen3.
 

amd6502

Senior member
Apr 21, 2017
634
152
86
Only a CMT design can adequately dethrone a CMT design. A 14LPP Bulldozer3(BD/PD=BD1, SR/XV=BD2) aimed at HPC would dominate a 14LPP Zen.

Premium wise;
Four FMACs(4 Adds/4 Muls/4 FMAs) vs two FMACs(2 Adds/2 Muls/2 FMAs)
Each thread gets its own 4 ALUs vs each thread has to compete for a single shared set of 4 ALUs.
SMT is much more resource efficient; better utilization of execution units than CMT. If you want big threads that don't get penalized for a second thread you can have the option of toggling to an aSymmetric multithread mode (aSMT).

You could also fix big core CMT's worse utilization by having aSMT within a CMT core. So, a BD++ module would have two big threads, and two "small" threads. But it would be a more complex design than SMT/aSMT; so it doesn't seem right, because one of the main advantages of CMT was simpler design complexity.

They have a good thing going with this new generation. They should stick with it for the FinFET products and HPC. Small-die budget products are another thing. I think they should chase the 10W-and-below market with FDX products that are much more competitive than Stoney.


There is an additional use case for FM2+ systems, as I found out. Even if it's very niche, and will be somewhat uncommon.

Now, don't laugh, but Windows XP can run on the platform.
There are some great legacy apps (that mostly don't need the internet), so for many people a dual-boot Linux + XP box actually has many advantages over a Linux-only box. Linux lacks certain apps that XP fills in nicely. In the Linux world, I haven't come across a single user-friendly and competent architectural CAD program, while on XP there are some very good choices. An FM2+ A6 (or an A8) would fit nicely here.


As things are, these are not upgrades, but are intended to serve the low-end market in 3rd world countries where they are still selling FM2+, kind of a pseudo AM1 replacement.
Other than higher TDP and pricier boards (more VRMs, memory channels, etc.), FM2+ should be a good replacement. There can be a lot of nice machines built out of these XV APUs and recycled but perfectly good components. The hardware has come along so far that it's not the limitation; usually, it's all in the software.
 

NTMBK

Diamond Member
Nov 14, 2011
8,404
1,244
126
Only a CMT design can adequately dethrone a CMT design. A 14LPP Bulldozer3(BD/PD=BD1, SR/XV=BD2) aimed at HPC would dominate a 14LPP Zen.
If only AMD had hundreds of talented engineers working for them, modelling that exact thing and actually determining the best design! Such a shame that they threw the wonderful space-heater CPU design in the skip, and had to choose an efficient SMT core instead. If only you'd been there to show them the error of their ways.
 

Shivansps

Platinum Member
Sep 11, 2013
2,594
502
126
There is an additional use case for FM2+ systems, as I found out. Even if it's very niche, and will be somewhat uncommon.

Now, don't laugh, but Windows XP can run on the platform. Though with the Carrizo IGP you'd need a discrete card of some kind that still has XP drivers available*. There is still a need for replacement XP systems in certain cases. Of course, these should be kept off the internet.

*E.g. a GT710, or an older spare card. Enthusiasts usually have something in the spare parts pile.
XP on Kaveri was tricky; I would not take it for granted that it is going to work. It MAY work in IDE mode. But you are pretty much right, there are some people that still need XP, mainly because they run ancient software. For those people I'm recommending the A4-6300; it works without issue out of the box.

The most important thing NOW is that these Carrizo APUs seem to fully support Windows 7 x64, with both AHCI and IGP drivers.
 

Insert_Nickname

Diamond Member
May 6, 2012
3,628
395
126
XP on Kaveri was tricky; I would not take it for granted that it is going to work. It MAY work in IDE mode.
It works just fine in IDE mode on the A68H chipset. It probably would in AHCI mode too, but getting it to run in AHCI mode involves either slipstreaming the required AHCI drivers (which are available for XP) or doing some kind of floppy disk emulation*, since XP doesn't support other media during install. Floppy connectors aren't exactly common on newer boards. This is the biggest weakness with XP.

Anyway, AHCI doesn't help much when running from a HDD.

*Can't use a USB floppy drive before USB drivers are loaded. :(
 

SPBHM

Diamond Member
Sep 12, 2012
4,862
294
126
It's not ideal and you can easily destroy your XP install, but if you do the setup in legacy IDE mode, then once it's done replace the generic controller driver with the AHCI one in Device Manager (it's critical to be 100% sure you have the right drivers), and on reboot set the BIOS to AHCI, it works fine.
But if you get it wrong (wrong drivers) you are going to be in trouble.
 

Shivansps

Platinum Member
Sep 11, 2013
2,594
502
126
It's not ideal and you can easily destroy your XP install, but if you do the setup in legacy IDE mode, then once it's done replace the generic controller driver with the AHCI one in Device Manager (it's critical to be 100% sure you have the right drivers), and on reboot set the BIOS to AHCI, it works fine.
But if you get it wrong (wrong drivers) you are going to be in trouble.
I've done that a few times, but instead of replacing the already installed disk controller I've added a new one manually. This way two disk controllers will be present, and this is completely risk free. It also works for moving Win7 installs to another motherboard without having to generalize with sysprep.
 

NostaSeronx

Platinum Member
Sep 18, 2011
2,520
341
126
Such a shame that they threw the wonderful space-heater CPU design in the skip, and had to choose an efficient SMT core instead.
I see that you didn't get the post that CMT is more EPI-efficient than SMT. Zen is the space-heater CPU, not a future next-gen Bulldozer core/module. The whole point of moving Bulldozer over to bulk was for Bulldozer to reuse Cat IP at lower migratory cost. A 14LPP Bulldozer would ultimately be reusing IP/assets from Zen. Basically, making it the cheapest HPC CMT design AMD built, period.
SMT is much more resource efficient; better utilization of execution units than CMT.
The area AMD's CMT is aimed at is not over-using critical resources.

CMT in AMD's case gets all the SMT benefits from a double-sized front-end and a double-sized FPU, while not creating a scenario of over-utilization. For example:

AMD CMT w/ 4 ALUs;
Both threads require the use of 4 GP ALUs => No slowdown and race to finish boosting.
AMD SMT w/ 4 ALUs;
Both threads require the use of 4 GP ALUs => Slowdown and drag to finish throttling.
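
The contention argument in this example can be put into a deliberately crude toy model. The sketch below assumes each thread wants a fixed number of ALU operations per cycle and that shared ALUs are split evenly under oversubscription. It ignores the front-end, memory stalls, and instruction dependencies, all of which dominate on real hardware, so it only illustrates the claim as stated rather than measuring anything. The function name and parameters are hypothetical.

```python
def per_thread_alu_rate(demand_per_thread, alus, threads=2, shared=False):
    """ALU ops per cycle each thread achieves in a toy contention model.

    shared=False models CMT-style private integer clusters: every thread
    has its own `alus` units. shared=True models SMT-style sharing: all
    threads draw from one pool of `alus` units, split evenly when the
    combined demand exceeds the pool.
    """
    if not shared:
        return min(demand_per_thread, alus)
    total_demand = demand_per_thread * threads
    if total_demand <= alus:
        return demand_per_thread
    return alus / threads


# Both threads want all 4 ALUs every cycle (the case described above).
cmt = per_thread_alu_rate(4, alus=4, shared=False)
smt = per_thread_alu_rate(4, alus=4, shared=True)
print(f"CMT, 4 private ALUs per thread: {cmt} ops/cycle per thread")
print(f"SMT, 4 shared ALUs:             {smt} ops/cycle per thread")

# With lighter demand (2 ops/cycle each) the toy model shows no difference.
print(per_thread_alu_rate(2, alus=4, shared=False),
      per_thread_alu_rate(2, alus=4, shared=True))
```

In this model the two designs only diverge when both threads saturate the integer units at once; whenever combined demand fits in the shared pool, the SMT case loses nothing while using half the ALUs.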

The extra benefit of the SMT non-critical + CMT critical design is the possible future inclusion of CSMT in non-critical areas.

If the focus is HPC, all the above prefers CMT over SMT.

A 12FDX CMT design will kill anything in the ULP market, since 12FDX has the performance and power of a 7nm FinFET node, which makes it the more viable successor to 28nm Excavator. There is also the migration of 14LPP to 12FDX being similar to 120nm CPP Tahiti and 130nm CPP Fiji. A 12FDX product can re-use memory compilers, logic libraries if available, some models, etc. ULP or HPC, the product would be the cheapest for AMD within the 7nm/14nm generation.
 

Insert_Nickname

Diamond Member
May 6, 2012
3,628
395
126
It's not ideal and you can easily destroy your XP install, but if you do the setup in legacy IDE mode, then once it's done replace the generic controller driver with the AHCI one in Device Manager (it's critical to be 100% sure you have the right drivers), and on reboot set the BIOS to AHCI, it works fine.
But if you get it wrong (wrong drivers) you are going to be in trouble.
I've done that a few times, but instead of replacing the already installed disk controller I've added a new one manually. This way two disk controllers will be present, and this is completely risk free. It also works for moving Win7 installs to another motherboard without having to generalize with sysprep.
You're both right of course. There are a few ways around that particular limitation, but since it's running on an old-fashioned (well, if you'd call a 2.5" 1TB WD Black "old-fashioned") HDD, it's not really worth the bother. AHCI benefits SSDs far more than regular HDDs.
 

amd6502

Senior member
Apr 21, 2017
634
152
86
The area AMD's CMT is aimed at is not over-using critical resources.

CMT in AMD's case gets all the SMT benefits from a double-sized front-end and a double-sized FPU, while not creating a scenario of over-utilization. For example:

AMD CMT w/ 4 ALUs;
Both threads require the use of 4 GP ALUs => No slowdown and race to finish boosting.
AMD SMT w/ 4 ALUs;
Both threads require the use of 4 GP ALUs => Slowdown and drag to finish throttling.

The extra benefit of the SMT non-critical + CMT critical design is the possible future inclusion of CSMT in non-critical areas.

If the focus is HPC, all the above prefers CMT over SMT.
In that case adding an aSMT mode to Zen would be the simplest solution. The SMT design is done, and the only thing to add would be an asymmetric scheduler.
 

krumme

Diamond Member
Oct 9, 2009
5,786
1,390
136
Feeding a fat frontend and FPU in a long-pipeline design sounds like a wonderful idea for a Chromebook 6W TDP design.
First thing that pops into the chief architect's mind. /S
 

NostaSeronx

Platinum Member
Sep 18, 2011
2,520
341
126
Feeding a fat frontend and FPU in a long-pipeline design sounds like a wonderful idea for a Chromebook 6W TDP design.
Fat front-end in this case is the same as Zen. The integer cores would basically be the same as Zen without SMT tags and priority algorithms. The general-purpose side is easier to define because of the changes from pre-Bulldozer CMP to Bulldozer CMT and on to Steamroller/Excavator. The FPU would basically combine the 2x (2 FADD + 2 FMUL) into 4x FMA from Zen. It is possible to convert Zen, as it re-uses Bulldozer/Excavator IP, which makes the conversion between the two threading styles cheaper if on the same node.

However, that is for HPC/FinFET and is not the ideal candidate for a 12FDX design. The cheapest for 12FDX is an overhaul migration of Excavator. At most, 7.5T is a shrink + 84CPP/56Mx is a shrink + gate-first and SOI (more in relation to IP migrated from 14LPP/12LP) shrinks + Continuous-RX or Single Diffusion Break shrinks to cells + body-bias is also capable of shrinking area, etc. There are also any changes needed to make it go from HPC to ULP. For example, each core getting an L0i can mean that the L1i is shrunk, similar to how going from 16 KB L1d to 32 KB L1d reduced L2 from 2 MB to 1 MB. Overall, the goal of this is to get lower EPI and lower area, both of which for 12FDX mean at most an easy acquisition of 5 GHz clocks in 6W. As 28nm = 2 GHz peak, 14nm/22nm(CNRX+FBB) = 3 GHz peak, 7nm(HPC)/12nm(CNRX+FBB) = 5 GHz peak. Lower gate capacitance from FDSOI compared to bulk/FinFET, reduced wire resistance from shorter wires compared to 32nm/28nm, superior boost scaling from Vt modulation by body bias (ABS; adaptive body-bias scaling), etc.

Late edit for the image;
 
