48 CORES! Investigating Cavium's ThunderX: The First ARM Server SoC with Ambition

FIVR

Diamond Member
Jun 1, 2016
3,753
911
106
Pretty interesting article, I am surprised it hasn't been posted here, especially considering it's an Anandtech article....

http://www.anandtech.com/show/10353/investigating-cavium-thunderx-48-arm-cores



Scary benchmarks for Intel, especially the decompression. This is a SoC manufactured on 28nm HKMG competing with Intel's latest 14nm Broadwell Xeons. They already offer an advantage over Intel for certain content-server applications and the like.


This is not a joke. This competes DIRECTLY with Intel's latest server and datacenter offerings. If they can do this on 28nm (beat Intel in some aspects, trail in others), what can they do on 14nm? 10nm?
 

Nothingness

Diamond Member
Jul 3, 2013
3,292
2,360
136
Single-thread performance is still lacking (though less than what I would have expected on SPEC) and power consumption is bad (28nm). Both points will certainly limit market penetration.

But for the rest I agree, this looks rather good, certainly much better than X-Gene. Can't wait to see how well ThunderX 2 will do.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
It's an absolutely paywall-class article. Stunning work as always. Where else on the net do you find work of this quality besides AT?

I also wondered why it hadn't been posted, but I think people are very conservative.

The technical side is over my head, but it's interesting that such a well-targeted product can be this successful, judging by the solid profit they consistently make. CDN load balancing and caching are niches, but looking at how the net and CDNs are evolving, what will not be CDN in 5 years? Lol
 

FIVR

Diamond Member
Jun 1, 2016
3,753
911
106
Single thread performance is seriously lacking, it's usually 1/3 to 1/2 of what Intel is offering. That's on a low power part too. I don't think going to 14nm would bring it to parity with Intel's single thread, but it would definitely help. At this point, the single thread performance is just way too low to compete.


Very interested to see where this goes. Hopefully this will put some pressure on the current absurd pricing structure of Intel's latest offerings.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Can it run a mainline Linux kernel? If not, then it's just another reason to stick with x86 and not deal with vendor lock-in.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
At this point, the single thread performance is just way too low to compete.

How needed is single thread performance really?

I honestly don't know. Seems like a web server would be better with a bunch of little cores right?
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
@jhu If you buy a 10,000-unit server farm only for, e.g., CDN, how important is that?
The TCO calculation changes as we see more consolidation all over and more cloud.
ARM will certainly be moving like crazy in the next year, when this is, e.g., Ares-based and on 14nm, but the market situation is perhaps moving even faster in its favor.
Looking at this development, one can hope AMD Zen is not too fat :)
 

SAAA

Senior member
May 14, 2014
541
126
116
How needed is single thread performance really?

I honestly don't know. Seems like a web server would be better with a bunch of little cores right?

Judging by how much single-thread performance affects my everyday web surfing, I'd rather point to the 2-3x faster Xeons for that purpose. :sneaky:

Overall performance is much better than that of any other ARM competitor; let's see how well they do with 14nm and a new arch, and especially when. That's critical because Intel is moving too, and next year you have more cores, Skylake-D, -X, etc.

28nm bulk to 14nm FinFET will be the single biggest jump for all the ARM vendors; after that, 10nm and 7nm are mere scaling with diminishing returns, so it's a one-time opportunity to get much closer to Intel and take some market share.
If they can really double single-thread performance by next gen, I'll be a lot less sceptical.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
Judging by how much single-thread performance affects my everyday web surfing, I'd rather point to the 2-3x faster Xeons for that purpose. :sneaky:

Sure but aren't servers a whole different ballgame than your desktop?

Hell for my desktop I would love a 6GHz dual core just to finally kick Dolphin's ass.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Judging by how much single-thread performance affects my everyday web surfing, I'd rather point to the 2-3x faster Xeons for that purpose.
That's nonsense. The road to the web server is cluttered with CPUs of all kinds and goes many ways. It's no faster than the weakest link along the path of your request. And you have to pay for it. What about 2 fast Xeons in your router and the same for your modem? How is that for a start? Runs standard Linux, whoa :)
 

SAAA

Senior member
May 14, 2014
541
126
116
That's nonsense. The road to the web server is cluttered with CPUs of all kinds and goes many ways. It's no faster than the weakest link along the path of your request. And you have to pay for it. What about 2 fast Xeons in your router and the same for your modem? How is that for a start? Runs standard Linux, whoa :)

Moving data from A to B, with a little processing here and there, is one thing. Generating pages, running code, streaming video and serving thousands of people simultaneously is another.
Yes, there are hardware accelerators, parallelizable software, etc., but half or less of the single-thread performance must hurt somewhere.

Btw even Atom handles 10Gbit, but that doesn't make it look so good when you have to compute something... or rather, what about 72 of these cores, then we can talk! PS: I'm referring to Xeon Phi: if more cores is the road, then that is quite the match against ARM.
Price-wise no, until you look at performance/power consumption → TCO.

...future will tell, my crystal ball has spoken for now.
 

dark zero

Platinum Member
Jun 2, 2015
2,655
140
106
A nice try from Cavium, but it seems small cores have their limits... Time to rethink their uArch.

I wonder whether Qualcomm or Apple will decide to go big and launch big ARM cores.

PS: BTW I am waiting for AMD K12... supposedly it is an even higher-frequency ARM processor with fewer (but bigger) cores that includes features the rest won't have... maybe HT and L3.
 

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
Wake me up when there's an ARM version of ESXi and Server 2012R2. Until then this doesn't really matter much.
 

DrMrLordX

Lifer
Apr 27, 2000
22,696
12,651
136
Hate to say it, but the ThunderX doesn't yet look that good compared to Xeon-D. Needs work on power management software and probably needs a process shrink to a 14nm node. Among other things.

Interesting product, though.
 

beginner99

Diamond Member
Jun 2, 2009
5,315
1,760
136
How needed is single thread performance really?

I honestly don't know. Seems like a web server would be better with a bunch of little cores right?

Disagree. Even on web servers with low load. Why? Latency. Especially on heavier pages. ST performance matters greatly. I'd rather have a Xeon with multiple VMs on it running low-usage web apps than this slow 48-core chip running unvirtualized.

The advantage of virtualization is also obvious. Install a basic OS once, then you can reuse it. It makes maintenance of web servers much cheaper. I mean, just look at the trouble AT had installing Ubuntu. I can say with certainty that the IT department of the company I work for would be incapable of doing this.

28nm bulk to 14nm FinFET will be the single biggest jump for all the ARM vendors; after that, 10nm and 7nm are mere scaling with diminishing returns, so it's a one-time opportunity to get much closer to Intel and take some market share.
If they can really double single-thread performance by next gen, I'll be a lot less sceptical.

They could only double performance at the same power use. The problem is they need to halve power use and increase ST performance. Good luck with that. The only chance they have is to target a niche that Intel does not serve or where it offers no advantage. They will also need some software goodies to make this happen. Why the heck not bundle it with a working copy of a Linux distro plus a very specific install guide? Even better, guides for all common Linux distros. Plus some management/maintenance software targeting that niche.

I'm however unclear what said niche would actually be...
 

videogames101

Diamond Member
Aug 24, 2005
6,783
27
91
Disagree. Even on web servers with low load. Why? Latency. Especially on heavier pages. ST performance matters greatly. I'd rather have a Xeon with multiple VMs on it running low-usage web apps than this slow 48-core chip running unvirtualized.

The advantage of virtualization is also obvious. Install a basic OS once, then you can reuse it. It makes maintenance of web servers much cheaper. I mean, just look at the trouble AT had installing Ubuntu. I can say with certainty that the IT department of the company I work for would be incapable of doing this.



They could only double performance at the same power use. The problem is they need to halve power use and increase ST performance. Good luck with that. The only chance they have is to target a niche that Intel does not serve or where it offers no advantage. They will also need some software goodies to make this happen. Why the heck not bundle it with a working copy of a Linux distro plus a very specific install guide? Even better, guides for all common Linux distros. Plus some management/maintenance software targeting that niche.

I'm however unclear what said niche would actually be...

I think they have a fair shot, because (excuse me for saying) I don't think Cavium's in-house uarch is exactly at the bleeding edge of efficiency OR ST perf. I think you have to imagine a more Apple A9-like core.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
163
106
I think they have a fair shot, because (excuse me for saying) I don't think Cavium's in-house uarch is exactly at the bleeding edge of efficiency OR ST perf. I think you have to imagine a more Apple A9-like core.
This is from the last few pages ~
The ThunderX at 2 GHz performs more or less like an A57 core at the same speed. Considering that AMD only got eight A57 cores inside a power envelope of 32W using similar process technology, you could imagine that a A57 chip would be able to fit 32 cores at the most in a 120W TDP envelope. So Cavium did quite well fitting about 50% more cores inside the same power envelope using an old 28 nm high-k metal gate process.
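The arithmetic behind that quoted estimate can be checked quickly. This is only a back-of-envelope sketch: the numbers are the ones in the excerpt, and the assumption that power scales linearly with core count (ignoring uncore, I/O and clock differences) is a simplification of mine, not something the article states.

```python
# Figures taken from the quoted article excerpt.
A57_CORES = 8            # cores in AMD's A57-based chip
A57_POWER_W = 32         # its power envelope in watts
THUNDERX_TDP_W = 120     # ThunderX TDP in watts
THUNDERX_CORES = 48      # ThunderX core count

# Assumption: power scales linearly with core count.
watts_per_a57_core = A57_POWER_W / A57_CORES
a57_cores_in_tdp = THUNDERX_TDP_W / watts_per_a57_core

# The excerpt rounds the A57 estimate up to "32 cores at the most".
ARTICLE_A57_ESTIMATE = 32
extra_core_fraction = THUNDERX_CORES / ARTICLE_A57_ESTIMATE - 1

print(f"Watts per A57 core: {watts_per_a57_core}")
print(f"A57 cores that fit in {THUNDERX_TDP_W} W: {a57_cores_in_tdp}")
print(f"ThunderX core advantage over the article's estimate: {extra_core_fraction:.0%}")
```

Under the linear assumption the A57 figure comes out at 30 cores, so the excerpt's "32 at the most" is a slightly generous rounding, and 48 cores is indeed 50% more than 32.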
Imagine an A73-equivalent core on 14nm, or 10nm* for that matter, & how it'll stack up against Xeon D.
Code:
Xeon D-1557	45 W	54	99	100	46	73	99
Xeon D-1581	65 W	59	123	125	66	97	124
Xeon E5-2640 v4	90 W	76	135	143	67	71	138
ThunderX	120 W	141	204	223	82	46	190
Xeon E5-2690 v3	135 W	84	249	254	170	47	241
The second-to-last column shows transactions per watt, & if the trend from the mobile sector continues it'll beat the Xeon D handsomely, though purely in terms of performance the E3 Xeons & above should still beat ARM servers in raw power.
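For reference, here is how the chips in the table compare today on that transactions-per-watt column. A small sketch: the values are copied from the table above, and reading the column as transactions per watt follows the post, since the table itself has no headers.

```python
# Second-to-last column of the table above, which the post identifies
# as transactions per watt (the table carries no header row).
transactions_per_watt = {
    "Xeon D-1557": 73,
    "Xeon D-1581": 97,
    "Xeon E5-2640 v4": 71,
    "ThunderX": 46,
    "Xeon E5-2690 v3": 47,
}

# Express every chip relative to the ThunderX.
baseline = transactions_per_watt["ThunderX"]
relative = {name: value / baseline for name, value in transactions_per_watt.items()}

most_efficient = max(transactions_per_watt, key=transactions_per_watt.get)

for name, ratio in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {ratio:.2f}x ThunderX")
print(f"Most efficient in this table: {most_efficient}")
```

On these figures the 28nm ThunderX currently trails both Xeon D parts, so beating the Xeon D is a projection for a future process node, not something the table shows.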
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Moving data from A to B, with a little processing here and there, is one thing. Generating pages, running code, streaming video and serving thousands of people simultaneously is another.
Yes, there are hardware accelerators, parallelizable software, etc., but half or less of the single-thread performance must hurt somewhere.

Btw even Atom handles 10Gbit, but that doesn't make it look so good when you have to compute something... or rather, what about 72 of these cores, then we can talk! PS: I'm referring to Xeon Phi: if more cores is the road, then that is quite the match against ARM.
Price-wise no, until you look at performance/power consumption → TCO.

...future will tell, my crystal ball has spoken for now.
The page I'm looking at doesn't mention web servers but rather things like CDN caching and load balancing as the areas where there is good gain.

http://www.anandtech.com/show/10353/investigating-cavium-thunderx-48-arm-cores/20

As CDN evolves, the delivery of information is not what it used to be.
 
Apr 30, 2015
131
10
81
Thinking of a simple model of what ARM and partners do, compared with Intel, the latter is a producer of CPUs, and the former a producer of SoCs. It is also worth remembering that the network/server-farm model is changing, more to a distributed-data system, with caching of data throughout the network, or so I understand.
There are a dozen or so ARM network/server SoC suppliers announced; there is scope for a whole diverse range of SoCs. Each use-case could have its own contenders; this is possible, because the Intel solution is so expensive.
ARM had 4000 employees at the end of 2015, and they are recruiting more this year. They have, maybe, more than one architecture team, apart from the processor design teams, and they are setting up a diverse set of server-processor design teams, it would seem. SoCs could feature special accelerators, per use case. This approach provides scope for efficiency per use case. Maybe ARM will develop their own accelerators. Also, virtualisation is a feature of ARMv8.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
This is from the last few pages ~

I take this ThunderX vs Cortex-A57 conclusion with some serious salt. 7-zip doesn't come anywhere close to painting a comprehensive picture. I doubt the two are really that close over a wide range of tests.
 

Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
How needed is single thread performance really?

I honestly don't know. Seems like a web server would be better with a bunch of little cores right?
Latency in server workloads is an issue, thus single-thread performance will need to be decent to a degree.
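The latency-versus-throughput trade-off debated in this thread can be put into a toy model. Everything below is an illustrative sketch: the 2x and 3x slowdowns mirror the "1/3 to 1/2" single-thread estimate given earlier in the thread, while the 16-core Xeon, the 10 ms request cost and the assumption of perfectly parallel, queue-free requests are hypothetical choices of mine.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    cores: int
    slowdown: float  # per-thread slowdown relative to the Xeon (Xeon = 1.0)

    def service_time_ms(self, work_ms: float) -> float:
        """Time one core needs for one request (no queueing modeled)."""
        return work_ms * self.slowdown

    def peak_requests_per_s(self, work_ms: float) -> float:
        """Aggregate throughput if every core is busy with independent requests."""
        return self.cores * 1000.0 / self.service_time_ms(work_ms)


WORK_MS = 10.0  # hypothetical cost of one request on a Xeon core

chips = [
    Chip("Xeon (hypothetical 16 cores)", cores=16, slowdown=1.0),
    Chip("ThunderX at 1/2 ST speed", cores=48, slowdown=2.0),
    Chip("ThunderX at 1/3 ST speed", cores=48, slowdown=3.0),
]

for chip in chips:
    print(
        f"{chip.name:<30} "
        f"latency {chip.service_time_ms(WORK_MS):5.1f} ms, "
        f"peak {chip.peak_requests_per_s(WORK_MS):6.0f} req/s"
    )
```

In this model the many-core chip can match or exceed the aggregate throughput, but each individual request still takes two to three times longer, which is the latency point being made here.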