Zhaoxin's ZX-F/KX-7000/KH-40000 and beyond


NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Don't forget custom ARM/RISC-V efforts...

Phytium's

CAS's

Alibaba's/T-head's

The bottom two being open-source means less risk in implementing them in custom SoCs. That's not including China's SiFive (StarFive), which can license SiFive cores to customers in China.

RISC-V also has the benefit of implementations that explore GPGPU designs, such as ThinkSilicon's RISC-V NEOX:
"(If) the main CPU is also RISC-V based, it is possible to dynamically off-load the main CPU of some tasks making some of the GPU cores appear as additional system cores"
Basically, offloading VPU tasks to the GPU with a homogeneous instruction set (the RVV extension when RVVX/RVVG isn't in use). If the workload isn't GPU-related or doesn't need the GPU, then you get extra deep VPU little-cores.

Phytium's D2000 can already do some x86/x86-64 tasks via Box86(~80% native)/Box64(~90% native)+wine.

In general, Zhaoxin is largely limited to Linux anyway, with most optimizations for Zhaoxin's first post-S3 DX12 graphics going towards Linux/Android. I believe that is the goal going forward: to get Zhaoxin into x86/ARM handhelds, so VIA can re-brand them and distribute them.
 

Kosusko

Member
Nov 10, 2019
161
119
116
I really don't understand why you keep posting about these. They are much slower than anything else out there.

Are you a marketing salesman for them?
Why can't I just be a fan of Centaur Technology (formerly Cyrix) processors?
Is that so hard for an AnandTech moderator to understand?
It's not all about Intel and AMD.
Please respect my otherness.
Well thank you.

P.S. In the summer I gave an interview to a Czech media outlet. Maybe it will say something about me.
source: https://www.lupa.cz/clanky/cinske-x...ou-ale-dostacujici-rika-muz-ktery-je-testuje/
 
  • Like
Reactions: lightmanek

eek2121

Platinum Member
Aug 2, 2005
2,883
3,859
136
I addressed that directly. Zhaoxin more-or-less did what you suggested, though it's unclear how much money has been committed to their effort to date. What is clear is that they have not succeeded in producing a competitive CPU. There is the very real risk that anyone who attempted to buy out Zhaoxin would meet the same fate, no matter how many financial resources they committed to their new company. The inherent value of owning an x86 license today isn't so great that spending billions on a design team, facilities, and wafer commitments from a major fab like Samsung (you would be on a waiting list at TSMC) makes any sense.

Can you imagine if IBM decided to give it a shot? 😂
 

DrMrLordX

Lifer
Apr 27, 2000
21,570
10,762
136
That would be weird. And ironically I don't think they have an x86 license, despite being the same company that brought x86 PCs to the masses so long ago. I still fantasize about someone creating an OpenPOWER-based console, though I think OpenCAPI is dead in the water by this point. And, of course, OpenPOWER wouldn't have anything to do with the x86 market.
 

NTMBK

Lifer
Nov 14, 2011
10,207
4,939
136
IBM had a license back in the day, they used to manufacture their own versions of 486s and the Cyrix 5x86. Not sure how that would work today though, as they never made anything 64-bit... I doubt they have the rights to make an AMD64 chip.

OpenPOWER console would be weird but cool. IBM only open sourced the instruction set, not any CPU designs, right? So it's not like people could just go fab a POWER8 derivative?
 

DrMrLordX

Lifer
Apr 27, 2000
21,570
10,762
136
IBM had a license back in the day, they used to manufacture their own versions of 486s and the Cyrix 5x86. Not sure how that would work today though, as they never made anything 64-bit... I doubt they have the rights to make an AMD64 chip.

Hmm. AMD has a cross-licensing agreement with Intel. Do they have one with IBM? Or VIA/Zhaoxin?

OpenPOWER console would be weird but cool. IBM only open sourced the instruction set, not any CPU designs, right? So it's not like people could just go fab a POWER8 derivative?

There are OpenPOWER systems out there with POWER9 CPUs in them, but I'm pretty sure the actual CPUs are sourced from IBM sooooo it's still technically an IBM machine. For a console, I don't think that would work very well, though. Power consumption would be too high etc. Someone would have to do a custom design. There is this thing:


But an SoC fabricated on a 180nm node is not going to be in any way suitable for consumer hardware in 2021. I had thought the PinePhone Pro would have a POWER-based SoC in it, but that never happened. Instead it has a Rockchip ARM CPU in it (the RK3399s).

Which kind of brings us full-circle back to the main topic we were discussing: initiating a new CPU design effort is really difficult and terrifically expensive. If anyone thinks Zhaoxin's CPUs are bad . . . you should try a 180nm POWER CPU instead!
 
  • Like
Reactions: amd6502

NTMBK

Lifer
Nov 14, 2011
10,207
4,939
136
Oh, I managed to forget about the thing I posted a thread about! https://forums.anandtech.com/threads/ibm-open-sources-a2o-cpu-core.2585193/ They open sourced an obscure core called the A2O, which apparently never actually made it into any shipping products. But that isn't actually compatible with the OpenPower ISA (it supports an older ISA version), and still needs updates to be compatible. The Github project looks... not lively.
 

The Hardcard

Member
Oct 19, 2021
46
38
51
IBM had a license back in the day, they used to manufacture their own versions of 486s and the Cyrix 5x86. Not sure how that would work today though, as they never made anything 64-bit... I doubt they have the rights to make an AMD64 chip.

OpenPOWER console would be weird but cool. IBM only open sourced the instruction set, not any CPU designs, right? So it's not like people could just go fab a POWER8 derivative?
There is an open-sourced core, but not from that POWER line. I forgot the actual core, but I think it is a Blue Gene derivative, or whatever the name is.

EDIT: Posted again before I finished reading.
 
  • Like
Reactions: Tlh97 and NTMBK

Kosusko

Member
Nov 10, 2019
161
119
116
Image analysis puts the pin count of the KH-40000's LGA package at 1988, which is close to the Xeon's LGA 2011, so it is estimated to have four-channel memory. I assume that the KH-40000 series is built on the Centaur CNS microarchitecture.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Taiwan's VIA has caught up with 0th-generation dozer, x86-wise, but combined with some machine learning coprocessor. Somebody on HN linked this recent Tom's article on the newest (but not so new) Centaur.


The engineering sample had a power draw significantly lower than what an 8c BD would draw, ~65 watts, which is a decent improvement over what a high bin Piledriver (eg 8300 FX @95W) would draw. Not bad, but probably not great compared to any modern AMD or Intel server or high end mobile product.

I think Intel snagged the Centaur engineering team, so I'm not sure what's left of Centaur for VIA.
 

DrMrLordX

Lifer
Apr 27, 2000
21,570
10,762
136
I think Intel snagged the Centaur engineering team, so I'm not sure what's left of Centaur for VIA.

They hired them, but it's not clear what they're actually doing, or how relevant they were to further development of products for Zhaoxin.
 

NTMBK

Lifer
Nov 14, 2011
10,207
4,939
136
Taiwan's VIA has caught up with 0th-generation dozer, x86-wise, but combined with some machine learning coprocessor. Somebody on HN linked this recent Tom's article on the newest (but not so new) Centaur.


The engineering sample had a power draw significantly lower than what an 8c BD would draw, ~65 watts, which is a decent improvement over what a high bin Piledriver (eg 8300 FX @95W) would draw. Not bad, but probably not great compared to any modern AMD or Intel server or high end mobile product.

I think Intel snagged the Centaur engineering team, so I'm not sure what's left of Centaur for VIA.

You'd hope that it is more efficient than Piledriver, considering that it's 10 years old- and was inefficient even at the time.

Looks like that chip was a bit of a failure. Worse single thread performance than Bulldozer, while running at 65W. I can see why VIA let the team go to Intel- hopefully with Intel support and funding they'll be able to achieve something better.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
So far, Zhaoxin is either:
1. Utilizing a modified CNS core
2. Actually making their own core

8-core at ~65W (CHA A1/A2 die)
to
4-core+next DX12 Integrated GPGPU at ~15W, seems like a big leap in power-efficiency given same clock of >2.0 GHz.

<3 GHz/KX-6000(8-core+IGP)=65W TDP & 2.2 GHz/CHA(8-core+No IGP)=65W TDP to >2 GHz/KX-7000(4-core+NxtIGP)=15W

In this case, from the specs:
KX-6640MA at 25W to KX-7000 at 15W on the same process, with CPU cores and a GPU that can do more (higher IPC).
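
A rough way to sanity-check the leap described above is to divide each quoted TDP by its core count. This is only a sketch: TDP is a rating rather than measured draw, the KX-7000 figure also has to cover its iGPU, and the dictionary labels are made up for illustration, with the numbers taken from this post.

```python
# Quoted TDP (watts) and CPU core counts from the post. TDP is a design
# rating, not measured power, so treat the results as ballpark only.
parts = {
    "CHA 8-core, no IGP": (65.0, 8),
    "KX-6000 8-core + IGP": (65.0, 8),
    "KX-7000 4-core + next IGP": (15.0, 4),
}


def watts_per_core(tdp_watts: float, cores: int) -> float:
    """Naive per-core budget: the whole TDP divided evenly across CPU cores."""
    return tdp_watts / cores


per_core = {name: watts_per_core(tdp, cores) for name, (tdp, cores) in parts.items()}

for name, value in per_core.items():
    print(f"{name}: {value:.3f} W/core")

# How many times smaller the KX-7000 per-core budget is than CHA's.
ratio = per_core["CHA 8-core, no IGP"] / per_core["KX-7000 4-core + next IGP"]
print(f"CHA vs KX-7000 per-core budget: {ratio:.2f}x")
```

Since the KX-7000 number includes the GPU, the real per-core gap would be somewhat larger than this naive division suggests, if the quoted TDPs hold.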
 

amd6502

Senior member
Apr 21, 2017
971
360
136
You'd hope that it is more efficient than Piledriver, considering that it's 10 years old- and was inefficient even at the time.

Looks like that chip was a bit of a failure. Worse single thread performance than Bulldozer, while running at 65W. I can see why VIA let the team go to Intel- hopefully with Intel support and funding they'll be able to achieve something better.

An 8c Piledriver system (or higher corecount Opteron server) gets pretty good Perf/Watts between 1.5 - 2.5 GHz. Under 3 GHz is not bad either. The CMT nature (very good almost 2x scaling in real life load mixes) makes my FX-8300 and 8350 perform way better at multitasking than my 4c/8t 2500u. On my FX's I throw 6-7 BOINC tasks at it and often/usually cannot even notice a slowdown during day to day tasks (I cannot do the same on my 2500u, which is why I don't even bother to use it for distributed computing).

These sort of characteristics could make a difference for some server applications.

I'm guessing the Centaur chip is monothreading, so it will have similar to better characteristics than CMT Opterons.

I think it's premature to declare this Centaur chip a failure based on just these benchmark results; it could have found a niche in its very specialized segment of that market for neural network applications. Maybe in real life it was a failure, just because this niche was incredibly small at the time.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,640
3,697
136
Why would you expect a 15W laptop CPU to be good at DC? I find it hard to believe that near 2X scaling is possible, and that would be very workload dependent. Even in a pure integer workload you are sharing resources including L2 cache.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
I've never done any measurements but am just going by perceived responsiveness during my day to day computer usage, as well as my estimate that CMT is very similar to monothreading, especially in natural/normal mixed loads. Typically at least one thread among a pair of threads is not all purely FPU code. As far as cache sharing, usually halving the L2 cache does not make a hugely dramatic impact, especially with a nice big L3 backing it.

For A-series APUs you may see more of the L2 effect you mention, though I wonder if it will be that dramatic. I did notice, on my A-series laptops, the thing would more than occasionally bog and seem limited by its memory controller when running 3 BOINC threads; even on A-series desktops I was less happy with the responsiveness when loading them with 3 threads. My workaround was to taskset the BOINC threads onto frequency-limited cores. This became much simpler when Linux refined its kernel, and now it is almost as effective to set the frequency governor flag /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load to 1. (So once you run these BOINC tasks at or below the frequencies that give peak perf/watt, the memory controller can keep up just fine again.)
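
For anyone wanting to script the two workarounds above, here is a minimal Python sketch, assuming Linux. The helper names (pin_process, read_flag, set_flag) are my own for illustration; os.sched_setaffinity is the standard-library call that does what taskset does, and the sysfs path is the one quoted above, which only exists while the ondemand governor is in use.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Path quoted in the post; present only while the ondemand governor is active.
IGNORE_NICE_LOAD = Path("/sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load")


def pin_process(pid: int, cpus: Iterable[int]) -> None:
    """Restrict a process to the given CPUs, like `taskset -p` (Linux only)."""
    os.sched_setaffinity(pid, set(cpus))


def read_flag(path: Path = IGNORE_NICE_LOAD) -> Optional[int]:
    """Return the tunable's current value, or None if the file is absent."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def set_flag(value: int, path: Path = IGNORE_NICE_LOAD) -> None:
    """Write 0 or 1 to the tunable; on the real sysfs file this needs root."""
    if value not in (0, 1):
        raise ValueError("ignore_nice_load takes 0 or 1")
    path.write_text(f"{value}\n")


# Hypothetical usage (PID and CPU numbers are placeholders):
#   pin_process(12345, {2, 3})   # keep a BOINC worker on CPUs 2 and 3
#   set_flag(1)                  # niced tasks stop pushing clocks up
```

With ignore_nice_load at 1, the ondemand governor does not count niced tasks as load when deciding whether to raise the clock, which is why it can stand in for pinning by hand.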

I think Atenra measured around 1.8 scaling, and I assume this is with synthetic benchmarks and loads (you may have to use wayback archive to recover some of the missing Atenra benchmark links). Natural mixed loads should do better, so my guess is the scaling might be very often in the region of 1.9 plus or minus 0.05; in any case, it's a far cry from some of the poor SMT scaling seen in 2nd-3rd gen core-series (This example, 1.33 is unfair as it's two pure synthetic loads loaded with floats, where CMT would not do well either. So for such odd cases, CMT no longer is much like monothreading. The thing is, this doesn't happen very often, if at all, for household type usage).
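
To make the scaling numbers concrete, here is a small sketch of how the factor is computed and what it means for the second thread. The 1.8 and 1.33 figures are the ones quoted above; the function names and the 100/180 scores are hypothetical.

```python
def scaling(one_thread_score: float, two_thread_score: float) -> float:
    """Throughput of two threads on one module/core relative to one thread."""
    return two_thread_score / one_thread_score


def second_thread_gain(scale: float) -> float:
    """Extra throughput contributed by the second thread, as a fraction."""
    return scale - 1.0


# Figures quoted in the post: ~1.8 for CMT, 1.33 for the SMT example.
for label, scale in (("CMT, measured", 1.8), ("SMT, float-heavy example", 1.33)):
    print(f"{label}: second thread adds {second_thread_gain(scale):.0%}")

# Hypothetical benchmark scores, only to show the calculation itself.
print(f"example scaling: {scaling(100.0, 180.0):.2f}")
```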
 

Thunder 57

Platinum Member
Aug 19, 2007
2,640
3,697
136
Prior to Zen, AMD wasn't very good at cache. I can't really speak to the pre K7 years, but with only a few exceptions did AMD have good cache. K7/8/10 L1 were good. K10 L3 was pretty good. I don't know if they ever had a standout L2 before Zen.

BD has horrible cache all around. They couldn't make the L2 fast enough so they oversized it. L3 was stupidly slow and IIRC even AMD said it would really only be useful in servers. They also stuck to an exclusive L2 for far too long.

Whoever was responsible for Zen's cache structure did a great job. They finally added a uop cache. They redid the whole thing basically.
 
  • Like
Reactions: Tlh97

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Prior to Zen, AMD wasn't very good at cache. I can't really speak to the pre K7 years, but with only a few exceptions did AMD have good cache. K7/8/10 L1 were good. K10 L3 was pretty good. I don't know if they ever had a standout L2 before Zen.
Stop calling Family 10h by the K10 moniker.

K9-65nm is four issue:
P0: 1 ALU / 1 AGU
P1: 1 ALU / 1 AGU
P2: 1 ALU / 1 AGU
P3: 1 ALU

K10-45nm takes P0 and P1 of K9 and replicates them and widens issue width to four:
P0: 1 ALU
P1: 1 ALU
P2: 1 AGU
P3: 1 AGU
Second iteration:
P4: 1 ALU
P5: 1 ALU
P6: 1 AGU
P7: 1 AGU

On the google group leak:
K9 had SMT later on, which was developed into K10's CMT. Which later developed into Bulldozer's lackluster incomplete Clustered Architecture/CMT implementation.

K9 = Monolithic 4-issue width
K10 = Clustered 8-issue width

SMT/CMT additions were to solve the 3rd/4th ALU/AGU problem AMD had in their architectures. While K9 couldn't gate off the second half, K10 could.

K10 however also has an implementation that is just K9. So, it is very important to realize K10-1 => K9 on 45nm(SMT-addition) and K10-2 = K10/Early Bulldozer on 45nm(CMT-redux of SMT). While Family 10h was never K10 but rather K8L's IPC enhanced cores, not to be confused with K8G's LP-enhanced cores in Griffin.
K8L was officially renamed to Greyhound =>
2006 CAD Program Management
Provided CAD tool and flow program management, including estimation, scheduling, tracking of both construction and analysis flow enhancements for AMD 65nm/45nm 'greyhound core' CPU designs.
Which officially killed off the K-naming and retroactively renames all cores before it:
K7 => Argon
K8 => Hammer => 0Fh Family
K8-128-bit => Greyhound => 10h Family
K9 => B-something, first heavy machine core => Open64-NDA has K9 and 15h Family sharing optimization version space.
K10 => Bulldozer (could refer to K9-SMT or actual K10-CMT for extra confusion just like K8L as K10 when Phenom is 10h and not K10) => 15h Family
K11 => Bobcat => Family 14h.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,640
3,697
136
I'll call it whatever I think people will recognize it by. If I started calling K8 0Fh Family, people would have to look that up. Sort of like when trying to figure out what the hell Tunnelborer is.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
K11 => Bobcat => Family 14h.

amd-bobcat.png


Bobcat was pretty interesting. Again, as a zeroth generation product, it was very rough, dirty, and unbalanced. But it seemed to have set a good basis to start the Cat cores. Jaguar 4c APU was such a great jump (and one of the things done, having the cores share the L2 was maybe Bulldozer influenced)


BD has horrible cache all around. They couldn't make the L2 fast enough so they oversized it. L3 was stupidly slow and IIRC even AMD said it would really only be useful in servers.

I don't have much experience with BD (many years ago I briefly tried a 6100 to see how bad it really was). I'm not sure if major cache improvements were made on PD's L3 or whether it's basically the same as BD's; if I had to guess, it'd be the latter.

I do suspect though that the L3 plays a huge role once you start getting a bit more than 4 threads on an FX-8000 series (especially if some of those threads are memory access intensive). This is for the reason I mentioned in my last post, with the suspected memory controller bottleneck observed on the 4c APU (which has no L3) when loading with memory intensive distributed computing tasks. With two threads per module, the L2 often won't be big enough to fit what it ideally would, and the L3 is there to make that penalty much lower than if it had to go through RAM. (Sure, the L3 was slow, but still much faster than RAM, and it also served to make that very busy memory controller less busy).

The above sort of loads were not very typical household loads around the time that Piledriver launched, but would over the years (around almost the time the FX series was phased out from sale) become pretty common as browsers and games started to heavily multithread.

The criticism that the L2 was a bit slow seems valid. And this is one (of a few) reason(s) why the 3rd gen (Excavator) APUs had a big performance bump (they used a new L2 cache handed down from the Zen team).
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
I'll call it whatever I think people will recognize it by. If I started calling K8 0Fh Family, people would have to look that up. Sort of like when trying to figure out what the hell Tunnelborer is.
Tunnelborer is a later-generation Family 15h Model A0h-FFh processor that followed the trend of fusing the integer core aspect and improving the FPU.

Bulldozer/Piledriver =
Integer Datapath0 | Integer Scheduler/Rename0 | Integer Scheduler/Rename1 | Integer Datapath1
PRF to PRF length is practically full-length Module-wall to Module-opp-wall.
bulldozerpiledriver.png

Steamroller/Excavator =
Integer Scheduler/Rename0 | Integer Scheduler/Rename1
Integer Datapath0 | Integer Datapath1
PRF to PRF length is 2-wide decode-side to 2-wide decode-opp-side.
40h-4Fh server part has 8 modules
Diagram example for SR/XV is the combination of PNGs above and below; shown in second image: https://www.anandtech.com/show/9319...leap-of-efficiency-and-architecture-updates/3

1st Iteration of 3rd Gen Fam 15h => Distributed PRF between both cores -> As both cores at 20nm post-Excavator returned initial Clustered Architecture capability.
2nd Iteration of 3rd Gen Fam 15h => Add arithmetic operations to AGLUs converting to Address ALUs -> 14nm/10nm Iteration <-- Tunnelborer.
Next server part would have been able to use Modules as 8-cores(~>4 IPC workloads) or 16-cores(~<2 IPC workloads), custom fit like SMT4/SMT8 POWER9/10.
tunnelborer.png

However, AMD swapped LP/HP cores: Bobcat(LP) -> Jaguar(LP) -> Zen(HP)
Bulldozer(HP) -> Steamroller(HP) -> Tunnelborer(LP) <-- No longer its name.

Zen(x); x = any integer above 0 = High Performance to Low-power // Performance to Value
ULP(x); x = any integer above 0 = Low-power to Ultra-low-power // Value to Pervasive

---
Zhaoxin has replaced Centaur at VIA. There is potential for Zhaoxin to introduce their own core rather than CNS-redux.
CPU IP from VIA, Mobo IP from VIA, and full support of VIA going forward. VIA has made more money from Zhaoxin than with Centaur.

Centaur if eye-balling USA => ~280 million users on internet
Zhaoxin if eye-balling China => ~970 million users on internet, with plans for ~1.3 billion by 2026
 
  • Like
Reactions: Tlh97 and amd6502

amd6502

Senior member
Apr 21, 2017
971
360
136
Yup, this Centaur 8c is already more than enough performance for home users, if run at ~40w in an SFF or AiO desktop. They could use these cores, put it together with a semi decent iGPU or GPU on chipset, and it'd be an alternative to western X86 or ARM SoCs.

Sure, Chinese kids that are more into gaming or power users will still stick with AMD or Apple.

And there's also an export market in China's neighboring countries: India and SE Asia. For most of these, they'll probably want to improve power efficiency first.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Zhaoxin's new CPU architecture is named "Yongfeng". It is family 7 model 91 (0x5b).

CNS is family 6.
Lujiazui/kx-6000 is family 7.
WuDaoKou/kx-5000 is family 7.
Zx-c/isaiah is family 6.
So,
family 6 = small cores,
family 7 = big cores?
Family 6 = Centaur
Family 7 = Zhaoxin

ZhangJiang => Family 6, Model 0F(Zhaoxin/Centaur) => adds Padlock SM3/SM4 instructions
CNS => Family 6, Model 47(Centaur)

Developed from ZhangJiang
WuDaoKou => Family 7, Model 1B (internal ZX core)
LuJiaZui => Family 7, Model 3B (2nd internal ZX core)

CNS-redux or 3rd redux of ZhangJiang:
YongFeng => Family 7, Model 5B (3rd internal ZX core)
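
For reference, the family/model pairs above come out of the CPUID leaf 1 EAX signature. Below is a minimal decoding sketch. It assumes the Linux kernel's convention of applying the extended model bits for any family of 6 or higher (vendor manuals word that rule more narrowly), and the raw signature values in the usage note are my own reconstructions with a placeholder stepping of 0, not values read from real chips.

```python
from typing import Tuple


def decode_signature(eax: int) -> Tuple[int, int, int]:
    """Split a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family field is 0xF.
    family = (base_family + ext_family) if base_family == 0xF else base_family
    # Assumption: Linux-style rule, extended model applies for family >= 6.
    model = (base_model | (ext_model << 4)) if family >= 6 else base_model
    return family, model, stepping


def describe(eax: int) -> str:
    """Format a signature the way the list above writes it."""
    family, model, stepping = decode_signature(eax)
    return f"Family {family:X}, Model {model:02X}, Stepping {stepping:X}"


# Reconstructed signatures (stepping 0 is a placeholder, not a real value).
print(describe(0x000507B0))  # YongFeng-style: family 7, model 5B
print(describe(0x00040670))  # CNS-style: family 6, model 47
```

Model 5B needs the extended model bits because the base model field is only four bits wide, which is why a family 7 part can report model 91 (0x5b) at all.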

Zhaoxin's evolution before YongFeng (which optimization path is unknown currently) is aimed at smaller cores at higher GHz.

The more interesting thing is the node potentially used. As TSMC has been delisted from Gov contracts, the majority of the orders need to come from HLMC's 14nm or SMIC's 14nm/12nm and not TSMC's 16nm.

HLMC = 2020 ramp-up for 14nm
SMIC = Q4 2019 ramp-up for 14nm

Both of which are currently using FinFET transistor optimizations from Shanghai Integrated Circuit R&D Center, which is located in Zhangjiang Hi-Tech Park, Shanghai, adjacent to HLMC and SMIC.
 
  • Like
Reactions: prosty_mirek

prosty_mirek

Junior Member
Nov 1, 2020
13
4
51
Zhaoxin has had its own CNS sample since 2019, and it reports as family 6. Maybe that is because it is an engineering sample with an unchanged core, or it is a software/Geekbench bug.
https://browser.geekbench.com/v5/cpu/526995.gb5
Zhaoxin's CNS clock was lower (2 GHz) than Centaur's (desired 2.5 GHz), which was already clocked lower than ZX-E, and rumour has it Zhaoxin wants 4 GHz on the new CPU. I wonder if this, and the decision not to release ZX-F, was because of problems with porting to a different process (TSMC's 16FFC, which Centaur uses, vs. Chinese 14nm).