
The opposite of Virtualization: Calxeda's Quad-Core ARM Server Part

cbn

Lifer
Mar 27, 2009
12,968
221
106
With HP's ARM server plans finally becoming official today, I thought I would dig up some info on the Calxeda server parts:

http://arstechnica.com/business/new...-new-quad-core-arm-part-for-cloud-servers.ars

[Image: Calxeda EnergyCard system card]


On Tuesday, Austin-based startup Calxeda launched its EnergyCore ARM system-on-chip (SoC) for cloud servers. At first glance, Calxeda's SoC looks like something you'd find inside a smartphone, but the product is essentially a complete server on a chip, minus the mass storage and memory. The company puts four of these EnergyCore SoCs onto a single daughterboard, called an EnergyCard, which is a reference design that also hosts four DIMM slots and four SATA ports. A systems integrator would plug multiple daughterboards into a single mainboard to build a rack-mountable unit, and those units could then be linked via Ethernet to scale out into a single system that's home to some 4,096 EnergyCore processors (or a little over 1,000 four-processor EnergyCards).
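As a quick sanity check of those scale-out figures (the four-SoCs-per-card, quad-core, and 4,096-processor numbers come straight from the article; the rest is just arithmetic), here is a trivial Python sketch:

# Back-of-envelope check of the EnergyCard/EnergyCore scale-out figures
SOCS_PER_CARD = 4        # four EnergyCore SoCs per EnergyCard (per the article)
MAX_SOCS = 4096          # maximum processors in one scaled-out system
CORES_PER_SOC = 4        # each EnergyCore is a quad-core Cortex-A9 part

cards_at_full_scale = MAX_SOCS // SOCS_PER_CARD   # 1024, i.e. "a little over 1,000"
cores_at_full_scale = MAX_SOCS * CORES_PER_SOC    # 16,384 ARM cores in one system
print(cards_at_full_scale, cores_at_full_scale)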

The current EnergyCore design doesn’t support classic, hypervisor-based virtualization; instead, it supports Ubuntu’s lightweight, container-based LXC virtualization scheme for system management. The reason that you won’t see a hypervisor running on Calxeda hardware anytime soon is that Calxeda’s whole approach to server efficiency is the exact opposite of what one typically sees in a virtualized cloud server. The classic virtualization model squeezes higher utilization and power efficiency out of a group of high-powered server processors—typically from Intel or AMD—by running multiple OS instances on each processor. In this way, a typical 2U virtualized server might use two Xeon processors and a large pool of RAM to run, say, 20 virtual OS instances.

With a Calxeda system, in contrast, you would run 20 OS instances in 2U of rack space by physically filling that rack space with five EnergyCards, which, at four EnergyCore chips per card and one OS instance per chip, would give you 20 virtual servers. This high-density, one-OS-per-chip approach is often called "physicalization," and Calxeda's bet is that it represents a cheaper and lower-power way to run those 20 virtual servers than what a Xeon-based system could offer. And for certain types of cloud workloads, this bet will no doubt pay off when you consider that a single EnergyCard gives you four quad-core servers in just 20 watts of power (an average of 5W per server and 1.25W per core). Contrast this with a single quad-core Intel Xeon E3, which can run anywhere from 45W to 95W depending on the model.
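To put rough numbers on that comparison, here is a quick Python sketch using only the figures quoted above (20W per EnergyCard, four one-OS nodes per card, 45-95W for a quad-core Xeon E3); note the Xeon figure is chip TDP only and ignores the rest of the platform, so treat this as a sketch rather than a benchmark:

# Rough power sketch for 20 one-OS-per-chip nodes in 2U, using the article's figures
CARD_POWER_W = 20        # one EnergyCard: four quad-core EnergyCore SoCs
NODES_PER_CARD = 4       # one OS instance per SoC
TARGET_NODES = 20

cards = TARGET_NODES // NODES_PER_CARD           # 5 EnergyCards
calxeda_total_w = cards * CARD_POWER_W           # 100 W for 20 physical nodes
watts_per_node = calxeda_total_w / TARGET_NODES  # 5 W per node, 1.25 W per core

print(f"{cards} cards, {calxeda_total_w} W total, {watts_per_node} W per node")
print("vs. 45-95 W TDP for a single quad-core Xeon E3")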

The new EnergyCore chips will be sampling at the end of this year, and are scheduled to ship in volume in the second half of next year.


[Image: Calxeda EnergyCore SoC block diagram]


The EnergyCore custom SoC that lies at the heart of Calxeda's approach to power efficiency is built around four ARM Cortex-A9 cores that can run at 1.1 to 1.4GHz. The four cores share a 4MB L2 cache, a set of memory controllers, and basic I/O blocks (10Gb and 1Gb Ethernet channels, PCIe lanes, and SATA ports).

The EnergyCore Fabric Switch that sits in between the Ethernet blocks and the ARM cores is the key to Calxeda’s ability to scale out a single system to as many as 4,096 processors using any network topology that the system integrator or customer chooses. This switch presents two virtual Ethernet ports to the OS, so that the combination of switch, Ethernet channels, and Calxeda’s proprietary daughtercard interface (the latter carries Ethernet traffic to connected nodes) is transparent to the software side of the system while providing plenty of bandwidth for inter-node transport.

The crown jewel in Calxeda’s approach is the block labelled EnergyCore Management Engine. This block is actually another processor core that runs specialized monitoring and management software and is tasked with doing dynamic power-optimization of the rest of the chip. The management engine can turn on and off the separate power domains on the SoC in response to real-time usage, so that the parts of the chip that are idle at any given moment cease drawing power.

The management engine is also what presents the virtualized Ethernet to the OS, so it works in conjunction with the fabric switch to do routing and power optimization. There are also OEM hooks into the proprietary software that runs on the engine, so that OEMs can roll their own management offerings as a value add.
ARM vs. x86 and Calxeda vs. SeaMicro

It’s helpful to contrast Calxeda’s approach with that of its main x86-based competitor, SeaMicro. SeaMicro makes a complete, high-density server product based on Intel’s low-power Atom chips that is built on many of the principles described above. Aside from the choice of Atom over ARM, the main place that SeaMicro’s credit-card-sized dual-Atom server nodes differ from Calxeda’s EnergyCards is in the way that the latter handles disk and networking I/O.

As described above, the Calxeda system virtualizes Ethernet traffic so that the EnergyCards don’t need physical Ethernet ports or cables in order to do networking. They do, however, need physical SATA cables for mass storage, so in a dense design you’ll have to thread SATA cables from each EnergyCard to each hard drive card. SeaMicro, in contrast, virtualizes both Ethernet and SATA interfaces, so that the custom fabric switch on each SeaMicro node carries both networking and storage traffic off of the card. By putting all the SATA drives in a separate physical unit and connecting it to the SeaMicro nodes via this virtual interface, SeaMicro systems save on power and cooling vs. Calxeda (again, the latter has physical SATA ports on each card for connecting physical drives). So that’s one advantage that SeaMicro has.

One disadvantage that SeaMicro has is that it has to use off-the-shelf Atom chips. Because SeaMicro can’t design its own custom SoC blocks and integrate them with Atom cores on the same die, the company uses a separate physical ASIC that resides on each SeaMicro card to do the storage and networking virtualization. This ASIC is the analog to the on-die fabric switch in Calxeda’s SoC.

Note that SeaMicro’s current server product is Atom-based, but the company has made clear that it won’t necessarily restrict itself to Atom in the future. So Calxeda had better be on the lookout for some ARM-based competition from SeaMicro in the high-density cloud server arena.

Higher-wattage server parts with virtualization vs. ARM's "physicalization": how much energy saving are we really talking about?

Or is it not energy costs, but the purchase price of the hardware/software, that is driving "ARM servers"?
 

Concillian

Diamond Member
May 26, 2004
3,751
8
81
In this way, a typical 2U virtualized server might use two Xeon processors and a large pool of RAM to run, say, 20 virtual OS instances.

Okay, two Xeons = 20 OS instances. Therefore one Xeon is 10 OS instances.

And for certain types of cloud workloads, this bet will no doubt pay off when you consider that a single EnergyCard gives you four quad-core servers in just 20 watts of power (an average of 5W per server and 1.25W per core). Contrast this with a single quad-core Intel Xeon E3, which can run anywhere from 45W to 95W depending on the model.

But 4 ARMs would be 4 OS instances, and the article is comparing that to a quad-core 45-65W Xeon, which it previously said was 10 OS instances... err... so multiply that 20W by 2.5 and you get 50W, right around the same level as the Xeon, but less able to handle spike loads by any single OS instance.
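Spelling that multiplication out as a quick sketch (it assumes, as this post does, one OS instance per quad-core EnergyCore chip):

# Scaling argument: match the 10 OS instances one Xeon is said to host
CARD_POWER_W = 20        # one EnergyCard (four quad-core EnergyCore chips)
OS_PER_CARD = 4          # one OS instance per chip
target_instances = 10

cards = target_instances / OS_PER_CARD   # 2.5 EnergyCards
arm_power_w = cards * CARD_POWER_W       # 50 W of EnergyCards
print(f"{cards} EnergyCards, {arm_power_w:.0f} W, vs. one 45-95 W quad-core Xeon")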

Notice a complete absence of any information regarding the speed of 10 ARM chips vs. one "10 OS" quad-core Xeon.

Sounds like a plan...
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Okay, two Xeons = 20 OS instances. Therefore one Xeon is 10 OS instances.

But 4 ARMs would be 4 OS instances, and the article is comparing that to a quad-core 45-65W Xeon, which it previously said was 10 OS instances... err... so multiply that 20W by 2.5 and you get 50W, right around the same level as the Xeon, but less able to handle spike loads by any single OS instance.

Notice a complete absence of any information regarding the speed of 10 ARM chips vs. one "10 OS" quad-core Xeon.

Sounds like a plan...

I know it is confusing, but each ARM quad-core is 5 watts, making each ARM core 1.25 watts.

Therefore 10 OS instances on ARM would be 12.5 watts... not 50 watts.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
http://www.calxeda.com/products/energycore/ecx1000/techspecs
1-4 A9 cores
4 MB shared L2 cache with ECC
72-bit DDR controller with ECC support
'bout time. With HP being willing to call them servers, I figured they had to have ECC RAM and caches, but I half-expected 36-bit DDR2/3, and definitely was not expecting a decent size cache. We might actually get to see what a modern ARM core can do, when it's not breathing through a straw.

I don't quite get the whole them v. virtualization bit, but that's probably just the marketing guys doing their thing. Virtualization also gets you system management capabilities that a physical server won't have (it's not uncommon for servers to be partitioned, with no over-provisioning, to take advantage of this).

For churning on small memory-bound workloads that scale well to large numbers of physical nodes (MapReduce, Hadoop, maybe cache/proxy servers, random Erlang apps for the Hell of it, and oversold-to-the-nines shared hosting), I can see these doing well. 4GB/CPU is somewhat limiting, but ARM is kind of behind the ball on upping the address space, so the only thing either HP or Calxeda can do is hope these servers sell well enough to fund an ARMv8 iteration.

physicalization?
Recently made-up word, an antonym of virtualization, for the practice of putting many distinct servers in a small space.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
physicalization?

http://en.wikipedia.org/wiki/Physicalization

Physicalization, the opposite of virtualization, is a way to place multiple physical machines in a rack unit.[1] It can be a way to reduce hardware costs, since in some cases server processors cost more per core than energy-efficient laptop processors, which may make up for the added cost of board-level integration.[2] While Moore's law makes increasing integration less expensive, some jobs require lots of I/O bandwidth, which may be less expensive to provide using many less-integrated processors.

Applications and services that are I/O-bound are likely to benefit from such physicalized environments. This ensures that each operating system instance runs on a processor that has its own network interface card, host bus, and I/O subsystem, unlike in the case of multi-core servers where a single I/O subsystem is shared between all the cores/VMs.
 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
I know it is confusing, but each ARM quad-core is 5 watts, making each ARM core 1.25 watts.

Therefore 10 OS instances on ARM would be 12.5 watts... not 50 watts.

I don't think so...

The article says
a single EnergyCard gives you four quad-core servers in just 20 watts of power (an average of 5W per server and 1.25W per core). Contrast this with a single quad-core Intel Xeon E3, which can run anywhere from 45W to 95W depending on the model.

It also says things like "one OS per chip". Even though the "chips" are quad-core, those 4 cores only run 1 virtual OS, not 4. You need 4 ARM cores to run a server; a single core is just too slow for any modern OS. So for 10 OS instances you need 10 "chips", aka 40 cores, which is 50W... not really better than what is already possible using AMD or Intel CPUs.
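The disagreement upthread boils down to whether an "OS instance" maps to a single EnergyCore core or to a whole quad-core chip. A quick sketch of both readings, using the article's 1.25W-per-core and 5W-per-chip figures:

# Two readings of "10 OS instances" on Calxeda hardware (article figures)
WATTS_PER_CORE = 1.25
WATTS_PER_CHIP = 5.0     # one quad-core EnergyCore SoC
instances = 10

one_os_per_core = instances * WATTS_PER_CORE   # 12.5 W under a one-OS-per-core reading
one_os_per_chip = instances * WATTS_PER_CHIP   # 50 W under the article's one-OS-per-chip model
print(one_os_per_core, one_os_per_chip)

Since the article describes one OS per chip, the 50W number is the one that matters for the comparison.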


Even if it were more efficient in this simplistic view, I don't like it because it is a lot less flexible and a lot less efficient. The great thing about a VM is handling servers that, on average, don't get a lot of activity but occasionally spike up in usage for a brief period of time. With this "physicalization" approach, if a server needs to be fast enough to handle a spike, it needs to have that many cores dedicated to it at all times.
 

quest55720

Golden Member
Nov 3, 2004
1,339
0
0
Even if it were more efficient in this simplistic view, I don't like it because it is a lot less flexible and a lot less efficient. The great thing about a VM is handling servers that, on average, don't get a lot of activity but occasionally spike up in usage for a brief period of time. With this "physicalization" approach, if a server needs to be fast enough to handle a spike, it needs to have that many cores dedicated to it at all times.

Pretty much sums up why something like this will not ever end up in our server room at work. It is basically going backwards IMO.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
The only reason I can see to buy these is for voicemail, PBX, and VoIP services. They can try to market it more generally, but I don't see that getting anywhere soon.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
The only reason I can see to buy these is for voicemail, PBX, and VoIP services. They can try to market it more generally, but I don't see that getting anywhere soon.
But why? x86 is already strong there, both in terms of usage and performance. There is already affordable hardware to offload the grunt work, too, if you want to use weak CPUs.

There will really be no point in marketing them as general-purpose servers. They aren't. General-purpose servers can't fit >250 DRAM channels in a 2U box.

Even if it were more efficient in this simplistic view, I don't like it because it is a lot less flexible and a lot less efficient. The great thing about a VM is handling servers that, on average, don't get a lot of activity but occasionally spike up in usage for a brief period of time. With this "physicalization" approach, if a server needs to be fast enough to handle a spike, it needs to have that many cores dedicated to it at all times.
You're thinking about it all wrong! :awe: We will host you with your own dedicated quad-core server for cheap! Oh, now that you're in the contract, we may as well tell you it's a quad cell phone chip.

No 1st-party virtual servers are going to move. Any that do will be hosted for clients on abstracted platforms, where said clients want to pay as little as possible for their share of the infrastructure.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I don't think so...

The article says

It also says things like "one OS per chip". Even though the "chips" are quad-core, those 4 cores only run 1 virtual OS, not 4. You need 4 ARM cores to run a server; a single core is just too slow for any modern OS. So for 10 OS instances you need 10 "chips", aka 40 cores, which is 50W... not really better than what is already possible using AMD or Intel CPUs.

Even if it were more efficient in this simplistic view, I don't like it because it is a lot less flexible and a lot less efficient. The great thing about a VM is handling servers that, on average, don't get a lot of activity but occasionally spike up in usage for a brief period of time. With this "physicalization" approach, if a server needs to be fast enough to handle a spike, it needs to have that many cores dedicated to it at all times.

Okay that makes sense.

Physicalization: just like the desktop, one OS uses however many CPU cores are contained on the chip.

Virtualization: each multi-core chip can have many OS instances running on each CPU core (with the option to reduce OS instances on any particular core or group of cores should the need arise).

So let's say a virtualized 50-watt quad-core server experienced:

1. A heavy single-threaded load: up to 12.5 W of TDP could be allocated to the task (assuming Turbo does not exist), and greater than 12.5 W if Turbo mode is available.

2. A heavy multi-threaded load: up to 50 W of TDP could be allocated to the task, assuming the load is parallel enough and occupies four threads.

With a 50-watt physicalized server (10 quad-core chips, i.e. two and a half EnergyCards):

1. A heavy single-threaded load: up to 1.25 W of TDP could be allocated to the task.
2. A heavy multi-threaded load: up to 5 W of TDP could be allocated to the task (assuming the load is parallel enough and occupies four threads). Rough numbers for both models are sketched below.
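Treating TDP as a crude proxy for how much compute can be thrown at a task, the scenarios above work out roughly as follows (a sketch only; real Turbo behavior would raise the virtualized single-thread number):

# Per-task power budget under the two models (TDP as a crude proxy for burst capacity)
XEON_TDP_W, XEON_CORES = 50, 4    # the hypothetical 50 W quad-core virtualized box
ARM_CHIP_W, ARM_CORES = 5.0, 4    # one quad-core EnergyCore SoC at ~5 W

virt_single = XEON_TDP_W / XEON_CORES   # 12.5 W to a single thread (no Turbo)
virt_multi = XEON_TDP_W                 # 50 W across four threads
phys_single = ARM_CHIP_W / ARM_CORES    # 1.25 W to a single thread
phys_multi = ARM_CHIP_W                 # 5 W across four threads (one chip per OS)

print(f"single-thread budget ratio: {virt_single / phys_single:.0f}x")   # 10x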

Conclusions:

The major drawback of the physicalized server is its poor ability to handle heavy single-threaded loads, by a factor of 10... which could stretch beyond a factor of 10 if the virtualized server were able to "Turbo" the single core being stressed.

The advantage of the physicalized server is higher I/O.

So a physicalized server would only be the proper choice if a person knows the loads will always be small and needs more I/O.

P.S. I have read that demand for this type of "physicalized" server segment is still small at the moment. But I just wonder how large it could grow and, more importantly, how much cheaper it could be for the small-load/high-I/O market vs. an existing virtualized server environment using big-iron Intel or AMD x86 chips.

If the price ends up being substantially cheaper for a "physicalized" server, then maybe it is worth pursuing even if it occupies a low percentage of the total server market.
 

deimos3428

Senior member
Mar 6, 2009
697
0
0
There may be a niche for these, but I'd imagine the density of ARM chips still needs to be higher for this to be a feasible option for many. It appears to be great on power-saving, but it just doesn't pack much CPU power into that 2U space relative to x86 options.

For comparison, at my workplace we're building out x86 clusters that are both virtualized and "physicalized" at 48 physical cores (96 w/hyperthreading) per 2U chassis. They are definitely going to run a lot more than 20 virtual server instances; probably averaging somewhere around 120 per chassis in our case. (Heh. That's exactly the same as 10 instances per (Xeon) quad-core as someone stated above, now that I do the math.)
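For reference, that per-chassis math checks out like this (the 48-core and ~120-instance figures are as stated above; the rest is just division):

# Sanity check of the x86 chassis numbers quoted above
cores_per_chassis = 48         # physical cores per 2U chassis, per the post
instances_per_chassis = 120    # rough planning figure from the post

instances_per_core = instances_per_chassis / cores_per_chassis   # 2.5
print(f"~{instances_per_core * 4:.0f} instances per quad-core's worth of cores")  # ~10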

While it obviously has much higher power consumption than the ARM solution, it's still pretty darn good for x86 and a lot less power-hungry than our current dense-pizza-box solution. It's also a lot more flexible in what it can do. Should we need to deploy highly CPU- or memory-intensive instances in the future for whatever reason, we can.

For reference, we're using these. I'm not sure on the exact model TBH, but we're running dual X5650s on each server module:
http://www.supermicro.com/products/system/2U/2026/SYS-2026TT-H6RF.cfm