Zhaoxin's ZX-F/KX-7000/KH-40000 and beyond


amd6502

Senior member
Apr 21, 2017
971
360
136
to Gideon:

Thank you, but please provide exact results, not scaled ones.

IPC @ 2.0 GHz, like this:

or this:

IPC @ 2.0 GHz:

AMD FX-8300 3.3GHz @ 2.00 GHz (65,25%)
vs
KX-7000 ZX-F OctaCore 2000MHz @ 2.00 GHz (+53,27%)



link: https://browser.geekbench.com/v5/cpu/compare/526995?baseline=562975


Why does GB5 then say the frequency of the 8300 is at 2.9ghz? You're saying it was run at 2ghz? The IPC looks impressive.

If you let the 8300 run at stock it still has a hard time keeping up sometimes

https://browser.geekbench.com/v5/cpu/compare/526995?baseline=498633

Seems like a wide architecture. High IPC, low frequency. Unfortunately I cannot find any details like block diagrams. https://en.wikichip.org/wiki/zhaoxin/microarchitectures/lujiazui https://en.wikichip.org/wiki/zhaoxin/microarchitectures/wudaokou
 
Last edited:

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
R5 2500U = 2670 MT score / 3.5 / 4 = 190.714 MT points per core per Ghz
Er... the 2500U has a base clock far lower than its boost (2 GHz, in fact). I think you may be overlooking that and then some; come back when you have a comparable score using only the base clock.

Considering the 2500U's base clock is the same as this KX-7000 sample's, it should give a pretty exact lowdown on comparative performance.

Either way that score is comparing with Zen1 and not Zen2, so still not up to AMD performance even if it is accurate.
 

Kosusko

Member
Nov 10, 2019
161
120
116
Why does GB5 then say the frequency of the 8300 is at 2.9ghz? You're saying it was run at 2ghz? The IPC looks impressive.

If you let the 8300 run at stock it still has a hard time keeping up sometimes

https://browser.geekbench.com/v5/cpu/compare/526995?baseline=498633

Seems like a wide architecture. High IPC, low frequency. Unfortunately I cannot find any details like block diagrams. https://en.wikichip.org/wiki/zhaoxin/microarchitectures/lujiazui https://en.wikichip.org/wiki/zhaoxin/microarchitectures/wudaokou

If you want to know the real processor frequency, append .gb5 to the result URL: https://browser.geekbench.com/v5/cpu/562975.gb5


BIOS AMD FX-8300
CPU Frequency Multiplier: [x10.0 2000 MHz]
AMD Turbo Core Technology: [Disable]
Cool' n' Quiet: [Disable]

"processor_frequency": {
  "frequencies": [
    1981, 1983, 1958, 1980, 1983, 1980, 1984, 1994, 1967, 1982,
    1979, 1995, 1997, 1999, 1998, 1997, 1998, 1992, 1998, 1994
  ]
}


source: https://browser.geekbench.com/v5/cpu/562975.gb5
source: https://browser.geekbench.com/v5/cpu/562975
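For anyone who wants to reduce that frequency list to a single number, here is a minimal sketch in Python. It assumes the excerpt quoted above is pasted in by hand and wrapped in outer braces so it parses on its own; the helper name `summarize_frequencies` is made up for illustration, and nothing here downloads the .gb5 file.

```python
import json
from statistics import mean

# Hand-copied excerpt of the FX-8300 run's JSON, as quoted in the post.
# The outer braces are added so the fragment is valid JSON by itself;
# the real .gb5 document contains more fields than this.
GB5_EXCERPT = """
{
  "processor_frequency": {
    "frequencies": [
      1981, 1983, 1958, 1980, 1983, 1980, 1984, 1994, 1967, 1982,
      1979, 1995, 1997, 1999, 1998, 1997, 1998, 1992, 1998, 1994
    ]
  }
}
"""


def summarize_frequencies(gb5_text):
    """Return (min, mean, max) of the sampled clock speeds in MHz."""
    doc = json.loads(gb5_text)
    freqs = doc["processor_frequency"]["frequencies"]
    return min(freqs), mean(freqs), max(freqs)


lowest, average, highest = summarize_frequencies(GB5_EXCERPT)
print(f"min {lowest} MHz, mean {average:.2f} MHz, max {highest} MHz")
```

The samples all sit just under 2000 MHz, which is consistent with the x10.0 multiplier setting and Turbo Core being disabled.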
 
  • Like
Reactions: amd6502

Gideon

Golden Member
Nov 27, 2007
1,637
3,672
136
Er... the 2500U has a base clock far lower than its boost (2 GHz, in fact). I think you may be overlooking that and then some; come back when you have a comparable score using only the base clock.

Considering the 2500U's base clock is the same as this KX-7000 sample's, it should give a pretty exact lowdown on comparative performance.

Either way that score is comparing with Zen1 and not Zen2, so still not up to AMD performance even if it is accurate.
No, I didn't use (some random) base clocks because I actually checked the frequencies the CPUs ran the test at. You can do that by adding ".gb5" at the end of the links, which shows you the info of the run in JSON format. Under "frequencies" you can see what clockspeed the CPU was running at. The 2500U was actually running at 3.5 GHz almost the entire test, while the KX-7000 was at 1993 MHz the entire test.

(2500U) here:
and KX-7000 here:

Obviously my results are approximations "calculated on a napkin", but just using base clocks would be far less accurate.
 
  • Like
Reactions: amd6502

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Yeeeaaahhh....

I just looked up scores for the Ryzen 2700x and 3900x. (Single thread only)

Using that gb5 thing mentioned earlier I got the gist of the average clockspeed of both and derived what the KX-7000 score (using the linked 2 GHz score of 469) would be if it scaled linearly to the same frequencies:

Ryzen 2700X at 4.1 Ghz = 1052
KX-7000 at 4.1 Ghz = 961.45

9.418% advantage to Zen+.

Ryzen 3900X at 4.5 Ghz = 1317
KX-7000 at 4.5 Ghz = 1055.25

24.804% advantage to Zen2.
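The arithmetic behind those figures is plain linear scaling of the 469 single-thread score with clock speed. A quick sketch in Python, assuming performance scales perfectly with frequency (the function names `scale_score` and `advantage_pct` are just illustrative):

```python
def scale_score(score, from_ghz, to_ghz):
    """Linearly rescale a benchmark score from one clock speed to another."""
    return score * to_ghz / from_ghz


def advantage_pct(score_a, score_b):
    """Percentage by which score_a exceeds score_b."""
    return (score_a / score_b - 1.0) * 100.0


KX7000_ST_SCORE = 469   # single-thread score from the linked 2 GHz run
KX7000_CLOCK_GHZ = 2.0

# (name, approximate average clock in GHz, single-thread score) from the post
ryzen_results = [
    ("Ryzen 2700X", 4.1, 1052),
    ("Ryzen 3900X", 4.5, 1317),
]

for name, ghz, ryzen_score in ryzen_results:
    kx_scaled = scale_score(KX7000_ST_SCORE, KX7000_CLOCK_GHZ, ghz)
    lead = advantage_pct(ryzen_score, kx_scaled)
    print(f"{name} at {ghz} GHz: {ryzen_score} vs KX-7000 scaled {kx_scaled:.2f} "
          f"({lead:.3f}% advantage)")
```

This reproduces the 961.45 and 1055.25 figures; the percentages land within rounding of the 9.418% and 24.804% quoted.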

Obviously I'm assuming that the 469 score is from engineering samples, but I wouldn't expect a huge difference from that.

Of course, even this doesn't account for all the other things that matter, power consumption, latencies, heavy SIMD performance on codecs and such.

Then again, how much info do we have on VIA/Zhaoxin cores with the various Spectre/Meltdown security problems and their necessary performance degrading mitigations?

Assuming they aren't baked in of course, they may be more on the ball than Intel here.
 

Kosusko

Member
Nov 10, 2019
161
120
116
Last edited:

amd6502

Senior member
Apr 21, 2017
971
360
136
Well, that scaling is a bit of an approximation that will always favor the genuinely lower-clocked parts (because memory latency is constant in absolute time, so it costs fewer cycles at lower clocks). Even so, the high IPC says quite a bit. This is also why I think Zen3 and 4 will go a tiny bit wider yet. There is more ILP to extract yet for x86.
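To illustrate why linear upscaling flatters the lower-clocked chip, here is a toy model in Python. Everything in it is an assumption for illustration: runtime is split into core-bound cycles that shrink with clock speed and a fixed memory stall time that does not, and the two constants are made-up numbers, not measurements of any real CPU.

```python
def runtime_s(core_cycles, mem_stall_s, ghz):
    """Toy runtime: core-bound cycles scale with clock, memory stalls do not."""
    return core_cycles / (ghz * 1e9) + mem_stall_s


def score(core_cycles, mem_stall_s, ghz):
    """Toy benchmark score, taken as the inverse of runtime."""
    return 1.0 / runtime_s(core_cycles, mem_stall_s, ghz)


# Hypothetical workload: 2e9 core cycles plus 0.25 s of memory stalls.
CORE_CYCLES = 2e9
MEM_STALL_S = 0.25

score_at_2ghz = score(CORE_CYCLES, MEM_STALL_S, 2.0)
linear_guess_at_4ghz = score_at_2ghz * (4.0 / 2.0)       # napkin extrapolation
modelled_at_4ghz = score(CORE_CYCLES, MEM_STALL_S, 4.0)  # toy model's answer

print(f"linear guess:  {linear_guess_at_4ghz:.3f}")
print(f"toy model:     {modelled_at_4ghz:.3f}")
print(f"model / guess: {modelled_at_4ghz / linear_guess_at_4ghz:.3f}")
```

With these made-up constants the model reaches only about 83% of the linear guess at 4 GHz, which is the direction of the bias being described; the real size of the effect depends on the workload and the memory subsystem.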
 

Kosusko

Member
Nov 10, 2019
161
120
116
to amd6502:

Yes, that's true. But Geekbench 5 doesn't show everything in that Centaur microarchitecture. The MLPerf machine-learning inference benchmark does (10x faster than 8-core part).

We remember how FPU coprocessor changed (added) x86 microarchitecture. Or SIMD. As you remember VIA C3 was the first native x86 processor with embedded security features (PadLock Advanced Cryptography Engine "AES"; Random Number Generator "RNG" etc.) that enhance the protection of sensitive corporate and personal data.

As the demands in computing change, exciting challenges in machine learning, neural networking and artificial intelligence are at the forefront. Tomorrow is the time for the next step that will change the future of server and scalable computing, because their latest design includes an industry-unique deep-learning coprocessor that achieves high performance with minimal additional silicon cost.
 

DrMrLordX

Lifer
Apr 27, 2000
21,629
10,841
136
@soresu

Look up the older 1800x. Here's a stereotypical result for the 1800x, on a Biostar board of all things:


Note the average clockspeed seems to be around 3.8 GHz. Since downscaling performance is more reliable than upscaling, if we scale the results downward to 2 GHz we get:

ST: ~516
MT: ~3644

That's 9.8% faster in ST and 11.6% faster in MT than the VIA chip. They're still chasing Summit Ridge.

I remain skeptical of how integrating a machine-learning coprocessor is really going to improve the product. Clearly it isn't going to compete with heavy-duty machine learning clusters based on tons of GPUs. Is VIA trying to compete with lightweight machine-learning tasks, such as those popping up on mobile devices?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
ZX-C = 2 GHz @ 18W TDP (Quad-core/3-issue)
ZX-D = 2 GHz @ 56W TDP (Octo-core/4-issue)
ZX-E = 3 GHz @ 56W TDP (Octo-core/Enhanced-width Media Ports-etc)
ZX-E @ 2 GHz = ~18W TDP due to 3x performance/power ratio.
ZX-F might also be another 3x performance/power ratio => 1.6 GHz ~ 2 GHz => 8-cores @ ~6-watts
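The ~18W and ~6W estimates in that list follow from dividing the TDP by the performance/power gain while holding the clock at 2 GHz. A small sketch in Python, assuming each generation delivers exactly 3x performance per watt at unchanged performance (the function name `tdp_at_same_clock` is hypothetical):

```python
def tdp_at_same_clock(prev_tdp_w, perf_per_watt_gain):
    """Estimated TDP if performance is held constant and perf/W improves."""
    return prev_tdp_w / perf_per_watt_gain


ZX_D_TDP_W = 56.0         # ZX-D: 2 GHz @ 56W TDP, from the list above
PERF_PER_WATT_GAIN = 3.0  # assumed gain per generation, as in the post

zx_e_at_2ghz = tdp_at_same_clock(ZX_D_TDP_W, PERF_PER_WATT_GAIN)
zx_f_at_2ghz = tdp_at_same_clock(zx_e_at_2ghz, PERF_PER_WATT_GAIN)

print(f"ZX-E @ 2 GHz: ~{zx_e_at_2ghz:.1f} W")
print(f"ZX-F @ 2 GHz: ~{zx_f_at_2ghz:.1f} W (speculative)")
```

The results come out slightly above the rounded ~18W and ~6W figures, and the ZX-F step is speculation stacked on speculation.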

We don't know if there is variable cache data rate.. for all we know the L2 is half-rate and the L3 is 2/3rd or quarter-rate.

1.6 GHz ~ 2.0 GHz is suspicious, since they have already moved to a high-frequency physical design with ZX-E. 16nm to 7nm, which is a bigger improvement than 28nm to 16nm, should thus be faster at the given 56W TDP mark.
 
Last edited:
  • Like
Reactions: amd6502

Hitman928

Diamond Member
Apr 15, 2012
5,262
7,890
136
The problem is that by the time KX-7000 (ZX-F) CPUs are actually available, AMD will have Zen3 CPUs out, while ZX-F will at best match the original Zen architecture (from what we have so far). That's still a huge leap competitively from where they were just a couple of years ago, but it won't be enough to catch up to AMD/Intel yet.
 

amd6502

Senior member
Apr 21, 2017
971
360
136

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
4-wide decode plus op cache? any guesses how many pipelines and the ratio?
There is no op-cache, but there is a micro-op queue.
^-- which is the µop-decode stage of WuDaoKou (ZX-D, pulled up from Isaiah)

WuDaoKou & onwards (ZX-E & ZX-F) should continue the 15-stage mispredict.

ZX-D & ZX-E are still seven execution units.

ZX-C/Isaiah-exact clone => 3 x86 instructions decode, 3 x86 macro-ops dispatch, 3 fused x86 micro-ops issue.
ZX-D&E/Isaiah-evolved => 4 x86 instructions decode, 4 x86 macro-ops dispatch, 4 fused x86 micro-ops issue.
 
Last edited:
  • Like
Reactions: amd6502

Kosusko

Member
Nov 10, 2019
161
120
116

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
ZX-C = 2 GHz @ 18W TDP (Quad-core/3-issue)
ZX-D = 2 GHz @ 56W TDP (Octo-core/4-issue)
ZX-E = 3 GHz @ 56W TDP (Octo-core/Enhanced-width Media Ports-etc)
ZX-E @ 2 GHz = ~18W TDP due to 3x performance/power ratio.
ZX-F might also be another 3x performance/power ratio => 1.6 GHz ~ 2 GHz => 8-cores @ ~6-watts

We don't know if there is variable cache data rate.. for all we know the L2 is half-rate and the L3 is 2/3rd or quarter-rate.

1.6 GHz ~ 2.0 GHz is suspicious, since they have already moved to a high-frequency physical design with ZX-E. 16nm to 7nm, which is a bigger improvement than 28nm to 16nm, should thus be faster at the given 56W TDP mark.
You make an assumption that desktop is their only aim.

To my knowledge, the only standalone VR headset with a 'high end' chip (and yes, I use the term loosely) was the one that used a full Carrizo chip. They may be chasing decent standalone VR as the next thing; the Chinese supposedly can't get enough of 3D at the cinema, so it makes sense they would prioritise VR.
 

Kosusko

Member
Nov 10, 2019
161
120
116
I remain skeptical of how integrating a machine-learning coprocessor is really going to improve the product. Clearly it isn't going to compete with heavy-duty machine learning clusters based on tons of GPUs. Is VIA trying to compete with lightweight machine-learning tasks, such as those popping up on mobile devices?

MLPerf Inference v0.5 Results
source: https://mlperf.org/inference-results


COCO object detection on SSD-MobileNet v1 (images/sec)
Intel Core i3-1005G1 1.2GHz up to 3.4GHz Ice Lake 2C/4T (Intel® UHD Graphics): 217,93 (33,43 % of the Centaur result)
Centaur Integrated x86 CPUs @ 2.5GHz~2.3GHz 8C/8T (Centaur Integrated AI Coprocessor): 651,89 (299,13 % of the Intel result)

ImageNet image classification on MobileNet v1 (images/sec)
Intel Core i3-1005G1 1.2GHz up to 3.4GHz Ice Lake 2C/4T (Intel® UHD Graphics): 507,71 (8,40 % of the Centaur result)
Centaur Integrated x86 CPUs @ 2.5GHz~2.3GHz 8C/8T (Centaur Integrated AI Coprocessor): 6 042,34 (1190,12 % of the Intel result)

ImageNet image classification on ResNet-50 v1.5 (images/sec)
Intel Core i3-1005G1 1.2GHz up to 3.4GHz Ice Lake 2C/4T (Intel® UHD Graphics): 100,93 (8,28 % of the Centaur result)
Centaur Integrated x86 CPUs @ 2.5GHz~2.3GHz 8C/8T (Centaur Integrated AI Coprocessor): 1 218,48 (1207,25 % of the Intel result)

source: https://www.intel.ai/mlperf-nov2019/#gs.h01eu8
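The percentages next to each throughput figure appear to be each system's result expressed relative to the other's. A short Python check, with the numbers copied from the post (decimal commas converted to points) and a hypothetical helper `relative_pct`:

```python
def relative_pct(value, reference):
    """Express value as a percentage of reference."""
    return value / reference * 100.0


# images/sec as quoted above: (Intel Core i3-1005G1, Centaur with AI coprocessor)
MLPERF_RESULTS = {
    "COCO / SSD-MobileNet v1": (217.93, 651.89),
    "ImageNet / MobileNet v1": (507.71, 6042.34),
    "ImageNet / ResNet-50 v1.5": (100.93, 1218.48),
}

for workload, (intel, centaur) in MLPERF_RESULTS.items():
    print(f"{workload}: Intel is {relative_pct(intel, centaur):.2f}% of Centaur, "
          f"Centaur is {relative_pct(centaur, intel):.2f}% of Intel")
```

Recomputing the ratios this way matches the quoted percentages to two decimal places, so the table is at least internally consistent.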
 

DrMrLordX

Lifer
Apr 27, 2000
21,629
10,841
136
@Kosusko

Thanks for making my point? You're comparing this to an IceLake i3. Compare a rack full of these things to a rack of systems running enterprise-class GPUs. ML functions on small devices are mostly gimmicks at this point, especially on laptop/PC (mobile devices may make better use of ML functions, but we'll see).
 

Kosusko

Member
Nov 10, 2019
161
120
116
THE INDUSTRY’S FIRST HIGH-PERFORMANCE X86 SOC WITH SERVER-CLASS CPUS AND INTEGRATED AI COPROCESSOR TECHNOLOGY


MockPageAI-6-505x600.jpg


source:


You must post your own comments when referencing an article or image. You may not just drop images by themselves.
Here are the rules.


esquared
Anandtech Forum Director
 
Last edited by a moderator:
  • Wow
Reactions: amd6502

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
"This SoC architecture requires less than 195mm2 in TSMC 16nm and provides an extensible platform with 44 PCIe lanes and 4 channels of PC3200 DDR4."

Whoa, Centaur did it again, but it isn't really ZX-F(7nm). More like a much larger ZX-E w/ AVX512... however, it has the same cache layout as the ZX-F. (16 MB L3 on 7nm and 16 MB L3 on 16nm)
"The x86 microprocessor cores deliver high instructions/clock (IPC) for server-class applications and support the latest x86 extensions such as AVX 512 and new instructions for fast transfer of AI data."
 

Kosusko

Member
Nov 10, 2019
161
120
116
Today they have access to TSMC's 16nm FinFET process in Nanjing, China. They did not want to wait for free 7nm capacity in 2021.


Centaur’s internal code name: “NCORE”
• SoC design is “CHA”, and x86 core is “CNS”

• Centaur developed a new x86 microprocessor with high instructions/clock (IPC)
• Microarchitecture designed for server-class applications with extensions such as AVX-512
• New x86 technology now proven in silicon with 8 CPU cores and 16MB L3 caches
• SoC architecture provides an extensible platform with 44 PCIe lanes and 4 channels of PC3200
• Including AI coprocessor, requires less than 195mm2 in 16nm TSMC
• AI Coprocessor is 34.4mm2 in 16FFC
• Reference platform running at 2.5GHz today
• Simultaneous execution of x86 cores and 20 TOPS AI Coprocessor
• Delivers 20 peak terabytes/sec to AI Coprocessor from dedicated 16MB SRAM

More technical details will follow soon:

Microprocessor Report article will be released on December 2, 2019. This report will be a deep dive into the technical details.

source: https://centtech.com/wp-content/uploads/PRSlides_1118_Release.pdf
source: https://centtech.com/wp-content/uploads/November-18-2019-press-release-1.pdf