Xbit Labs: Dell starts to test ARM microprocessors in servers

cbn

Lifer
Mar 27, 2009
12,968
221
106
http://www.xbitlabs.com/news/other/...s_to_Test_ARM_Microprocessors_in_Servers.html

Dell Starts to Test ARM Microprocessors in Servers.

Dell: If Customers Want ARM, We Will Provide!
[02/29/2012 11:11 PM]
by Anton Shilov

Dell, a major PC company that used Intel Corp.'s microprocessors exclusively just six years ago, now not only utilizes both AMD's and Intel's chips, but is also working on servers that can be powered by various ARM-architecture microprocessors. Dell claims that if customers want such machines, it will offer appropriate products.

"We have had ARM systems in our lab for over a year. If that is what our customers demand that’s what we will offer," said Forrest Norrod, general manager for Dell’s server solutions group, in an interview with Forbes.

Hewlett-Packard and some other manufacturers are also experimenting with ARM-based servers, and startups like Calxeda are working on ARM chips with special server capabilities. ARM itself is developing the ARMv8 32/64-bit architecture with servers in mind. Dell did not elaborate on which ARM chips it uses for testing the architecture.

Dell believes that switching from x86 to ARM will not bring overly drastic changes to servers in general: all the industry standards as well as proprietary technologies will work with ARM systems-on-chips.

“Our [server] management [software] is independent of the processor powering the server. If we wanted to incorporate ARM into our server lineup, to any management tool it just looks like a PowerEdge server," claimed Mr. Norrod.

The most important advantage of ARM over x86 is its ultra-low power consumption and therefore potentially better performance scalability. Still, until there are commercially available ARMv8-based SoCs with server-specific features, it does not make sense to use ARM chips inside machines that host websites or run critical applications. Such 64-bit chips are projected to emerge in late 2013 at the earliest.

"ARM has some interesting advancements around power density. [...] I don’t believe customer are going to want to port their applications back to 32 bits from 64 bits," concluded Mr. Norrod.

Looks like the industry could be headed for a change.

It will be interesting to see how AMD responds to this with their SeaMicro-based systems.

According to this Ars Technica article, SeaMicro has power/cooling advantages over Calxeda. The disadvantage is that SeaMicro needs a separate physical ASIC for network and storage virtualization.

It’s helpful to contrast Calxeda’s approach with that of its main x86-based competitor, SeaMicro. SeaMicro makes a complete, high-density server product based on Intel’s low-power Atom chips that is built on many of the principles described above. Aside from the choice of Atom over ARM, the main place that SeaMicro’s credit-card-sized dual-Atom server nodes differ from Calxeda’s EnergyCards is in the way that the latter handles disk and networking I/O.

As described above, the Calxeda system virtualizes Ethernet traffic so that the EnergyCards don’t need physical Ethernet ports or cables in order to do networking. They do, however, need physical SATA cables for mass storage, so in a dense design you’ll have to thread SATA cables from each EnergyCard to each hard drive card. SeaMicro, in contrast, virtualizes both Ethernet and SATA interfaces, so that the custom fabric switch on each SeaMicro node carries both networking and storage traffic off of the card. By putting all the SATA drives in a separate physical unit and connecting it to the SeaMicro nodes via this virtual interface, SeaMicro systems save on power and cooling vs. Calxeda (again, the latter has physical SATA ports on each card for connecting physical drives). So that’s one advantage that SeaMicro has.

One disadvantage that SeaMicro has is that it has to use off-the-shelf Atom chips. Because SeaMicro can't design its own custom SoC blocks and integrate them with Atom cores on the same die, the company uses a separate physical ASIC that resides on each SeaMicro card to do the storage and networking virtualization. This ASIC is the analog to the on-die fabric switch in Calxeda's SoC.

Note that SeaMicro’s current server product is Atom-based, but the company has made clear that it won’t necessarily restrict itself to Atom in the future. So Calxeda had better be on the lookout for some ARM-based competition from SeaMicro in the high-density cloud server arena.

.....But with AMD taking over SeaMicro, it stands to reason that the mentioned disadvantage will no longer exist, as AMD can design its own server-specific SoCs.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Just as the ugly-duckling that was x86 marched right in and upset the entire big-iron server market despite all its shortcomings, ARM could very well achieve the exact same outcome for the exact same reasons.

Consider that 10yrs before DEC went belly-up they were the titan of the industry, with nearly untouchable performance and massive sales/revenue (second only to IBM). Had you analyzed DEC in 1988, the last thing you would have expected looking forward 10yrs to 1998 was that they would meet their demise at the hands of Intel's paltry x86 microprocessor line.

We look at Intel today and conclude they are too big to fail, that 10yrs from now they will simply be bigger and even more untouchable. But I'm not so convinced.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Consider that 10yrs before DEC went belly-up they were the titan of the industry with nearly untouchable performance and massive sales/revenue (second only to IBM).

I remember drooling over the Alpha processors. Now they are just a footnote in history.
 

Dravic

Senior member
May 18, 2000
892
0
76
Just as the ugly-duckling that was x86 marched right in and upset the entire big-iron server market despite all its shortcomings, ARM could very well achieve the exact same outcome for the exact same reasons.

Consider that 10yrs before DEC went belly-up they were the titan of the industry, with nearly untouchable performance and massive sales/revenue (second only to IBM). Had you analyzed DEC in 1988, the last thing you would have expected looking forward 10yrs to 1998 was that they would meet their demise at the hands of Intel's paltry x86 microprocessor line.

We look at Intel today and conclude they are too big to fail, that 10yrs from now they will simply be bigger and even more untouchable. But I'm not so convinced.


Especially with the necessary decoupling of high-level software code from the underlying hardware in the move to cloud computing frameworks. It becomes purely a case of transactions per watt.


In 15 years, I wonder what the decoupling from the cloud will be called.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
ARM is probably great for certain types of basic servers, but anything requiring FPU performance will not work on ARM at all.

The thing is, the lower end of the market is where most of the sales are, so this type of thing could really hurt AMD and Intel.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
One thing I am trying to understand is how much "IP other than the CPU" will factor into these server SoC strategies.

For example, I have seen it mentioned that one major reason Qualcomm was able to make such progress in the smartphone SoC market was that the company had superior non-CPU IP.

How much will the server SoC battle resemble this? In what ways will it differ?

That being said, it would still be great to see AMD design the best CPU of its type and then share that same advanced CPU across multiple product lines to reduce costs.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Especially with the necessary decoupling of high-level software code from the underlying hardware in the move to cloud computing frameworks. It becomes purely a case of transactions per watt.


In 15 years, I wonder what the decoupling from the cloud will be called.

Thank you for the post and insight.

Can you explain this in a little more detail?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Especially with the necessary decoupling of high-level software code from the underlying hardware in the move to cloud computing frameworks. It becomes purely a case of transactions per watt.

So true. Provided the maximum latency for an individual transaction is acceptable on the human timescale (say 10 ms), the TPM/$ metric is all that is going to matter to a majority of the market.
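To put rough numbers on that metric, here is a quick C sketch. Every figure is hypothetical, invented purely for illustration; neither box is a real product:

```c
#include <stdio.h>

/* Rough sketch of the "good enough + TPM/$" buying decision.
 * All numbers are hypothetical, invented purely to illustrate
 * the metric; neither row describes a real product. */
int main(void) {
    /* Assumed boxes: both meet the ~10 ms human-scale latency bar. */
    double x86_tpm = 500000.0, x86_cost = 5000.0;
    double arm_tpm = 180000.0, arm_cost = 1500.0;

    printf("x86 box: %.0f TPM/$\n", x86_tpm / x86_cost); /* 100 */
    printf("ARM box: %.0f TPM/$\n", arm_tpm / arm_cost); /* 120 */
    /* Once latency is off the table, the higher TPM/$ box wins,
       even though it is much slower in absolute terms. */
    return 0;
}
```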

I see that in a different way, but by the same analogy, with my SSDs.

In benchmarks, the difference between my 160GB G2 and my 240GB V3 is astounding, but in practice their performance is the same to my perception, and both are vastly superior to the spindle drives they replaced.

Just as x86 merely needed to be good-enough computing 15yrs ago to supplant the server market, so too for ARM. It merely needs to be good enough, and economically advantageous, and that will be all she wrote.

In 10yrs time, x86 in servers could be as commonplace as DEC servers were in 2002.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
So true. Provided the maximum latency for an individual transaction is acceptable on the human timescale (say 10 ms), the TPM/$ metric is all that is going to matter to a majority of the market.

I see that in a different way, but by the same analogy, with my SSDs.

In benchmarks, the difference between my 160GB G2 and my 240GB V3 is astounding, but in practice their performance is the same to my perception, and both are vastly superior to the spindle drives they replaced.

Just as x86 merely needed to be good-enough computing 15yrs ago to supplant the server market, so too for ARM. It merely needs to be good enough, and economically advantageous, and that will be all she wrote.

In 10yrs time, x86 in servers could be as commonplace as DEC servers were in 2002.

So let's say all the fat gets shaved off these server boxes, to the point where the energy overhead of the x86 instruction set becomes a big deal for a very small CPU core..... Who becomes the main competitor to ARM in servers?

MIPS?

How about AMD and MIPS? <----Does this fit into the company's long-term server plans?
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Back when I looked at the ARM Cortex-A9 vs. MIPS, MIPS actually came out on top for energy efficiency ----> http://forums.anandtech.com/showpost.php?p=32695414&postcount=7

This is the highest-performance MIPS processor I could find on the company website:

http://www.mips.com/products/cores/3...specifications


Base Core: 1074Kf (with FPU)
Configuration: dual core
Process: 40nm G (TSMC)
Libraries: TSMC 12-track, MVt/OD
Frequency: >1.2 GHz, 1.5 GHz
CoreMark/MHz (per core): 2.55
DMIPS/MHz (per core): 2.03
Power: 0.36 mW/MHz, 0.43 mW/MHz
Area: 4.1 mm²

Now compare this to the ARM Cortex-A9: http://www.arm.com/products/processo.../cortex-a9.php (click the Performance tab for specifications)

Cortex-A9 Dual-Core Hard Macro Implementation
Process: 40nm G (TSMC)
Frequency: 2000 MHz (performance-optimized), 800 MHz (power-optimized)
Performance (total DMIPS): 10,000, 4,000
Energy efficiency (DMIPS/mW): 5.26, 8.0
Total power at target frequency: 1.9 W, 0.5 W

So it appears the MIPS 1074Kf core achieves ~80% of the Dhrystone MIPS/MHz of the Cortex-A9 core.

Trying to equalize the two CPU designs (with respect to Dhrystone MIPS and power consumption), I come up with the following:

1.2 GHz MIPS 1074Kf (with FPU) dual core = 4,872 DMIPS, 432 mW power consumption.
800 MHz Cortex-A9 (power-optimized) dual core = 4,000 DMIPS, 500 mW power consumption.

So overall, the 1.2 GHz MIPS dual core (on TSMC 40nm G) actually beats the 800 MHz Cortex-A9 dual core (on TSMC 40nm G), according to the marketing specs taken from both companies.
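For anyone who wants to check the arithmetic, here is the same normalization spelled out in a small C sketch. It uses only the vendor marketing figures quoted above, plus my reading that the 0.36 mW/MHz figure applies to the dual-core configuration, so treat it as a sketch rather than a measurement:

```c
#include <stdio.h>

/* Re-deriving the comparison from the vendor figures quoted above.
 * Both parts are on TSMC 40nm G; these are marketing numbers, and
 * the MIPS power figure assumes 0.36 mW/MHz covers the dual-core
 * configuration, per the post above. */
int main(void) {
    /* MIPS 1074Kf dual core at 1.2 GHz, 2.03 DMIPS/MHz per core. */
    double mips_dmips = 2.03 * 1200.0 * 2.0;  /* = 4872 DMIPS */
    double mips_mw    = 0.36 * 1200.0;        /* = 432 mW     */

    /* Cortex-A9 dual-core power-optimized macro: 4000 DMIPS total
       at 800 MHz, 8.0 DMIPS/mW. */
    double a9_dmips = 4000.0;
    double a9_mw    = a9_dmips / 8.0;         /* = 500 mW     */

    printf("1074Kf: %.0f DMIPS, %.0f mW -> %.2f DMIPS/mW\n",
           mips_dmips, mips_mw, mips_dmips / mips_mw);  /* ~11.28 */
    printf("A9:     %.0f DMIPS, %.0f mW -> %.2f DMIPS/mW\n",
           a9_dmips, a9_mw, a9_dmips / a9_mw);          /*  8.00  */
    return 0;
}
```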
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Ah, but that's an absolutely pointless comparison. Neither core by itself is capable of running any software. You need peripheral I/O controllers, added cache, and a RAM controller before you can even think about it. By the time you have a whole useful system that can read data and change it, those differences will have long been lost.

On top of that, per-MHz ratings like DMIPS/MHz are purely theoretical, and only useful for diagnostics. L1 and L2 miss rates matter. RAM latency matters. Average outstanding cache misses matter. Even for common content consumption, Dhrystone doesn't cut it. And regardless of your use for the chip, CoreMark is useless.

That is not to say any ARM or MIPS core is better or worse when the potential performance and power are both sufficient. Just that it's the final computer (SoC + firmware, really) that matters. ARM has the industry support and mindshare, so you will get an ARM today.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Ah, but that's an absolutely pointless comparison. Neither core by itself is capable of running any software. You need peripheral I/O controllers, added cache, and a RAM controller before you can even think about it. By the time you have a whole useful system that can read data and change it, those differences will have long been lost.

On top of that, per-MHz ratings like DMIPS/MHz are purely theoretical, and only useful for diagnostics. L1 and L2 miss rates matter. RAM latency matters. Average outstanding cache misses matter. Even for common content consumption, Dhrystone doesn't cut it. And regardless of your use for the chip, CoreMark is useless.

That is not to say any ARM or MIPS core is better or worse when the potential performance and power are both sufficient. Just that it's the final computer (SoC + firmware, really) that matters. ARM has the industry support and mindshare, so you will get an ARM today.

I think the point is more in the pursuit of answering the hypothetical question of "if you were to launch a server-ARM design project today, which microarchitecture would you be better off using as your starting template for such an endeavor?"

One justified answer is "the MIPS core, because..."

That's not pointless; that is called contemplation and discourse. We do that around here on rare occasion, and even then it is mostly by accident, I am quite convinced, when the fanboy flame wars die down :p
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
Just as the ugly-duckling that was x86 marched right in and upset the entire big-iron server market despite all its shortcomings, ARM could very well achieve the exact same outcome for the exact same reasons.

Consider that 10yrs before DEC went belly-up they were the titan of the industry, with nearly untouchable performance and massive sales/revenue (second only to IBM). Had you analyzed DEC in 1988, the last thing you would have expected looking forward 10yrs to 1998 was that they would meet their demise at the hands of Intel's paltry x86 microprocessor line.

We look at Intel today and conclude they are too big to fail, that 10yrs from now they will simply be bigger and even more untouchable. But I'm not so convinced.
DEC failed because it stopped innovating while other companies managed to create faster processors at lower prices. I just don't see this happening to Intel any time soon. ARM doesn't have any design that's even remotely a viable alternative to Sandy Bridge, let alone its successors. And Medfield, which is a 64-bit design inside, shows they can also create very power-efficient chips based on x86. Last but not least, they have a huge lead in process technology that they keep investing in.

So this raises the question: what sort of innovation does ARM bring to the table? I honestly can't think of anything significant. It's a RISC architecture, but Intel has already beaten many other competitors' RISC architectures. Aside from the decoders, modern x86 processors are RISC inside anyway.

ARM manufacturers have yet to deliver 64-bit parts, and they also need much bigger caches to become competitive in the server market. Plus, they have to be able to run reliably 24/7 at full load for decades. That can certainly all be achieved, but it will make these chips a lot more expensive. So what's the secret ingredient that would allow ARM to succeed where many others have failed?
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
So what's the secret ingredient that would allow ARM to succeed where many others have failed?

Nobody realistically expects ARM to compete on performance. But if only performance mattered, everyone would be using IBM parts. Cost matters more, especially for workloads that scale out and can tolerate wimpier individual CPUs. ARM chips are cheap. The economies of scale that already apply to phones would make the same chips very, very cheap as servers.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
Nobody realistically expects ARM to compete on performance. But if only performance mattered, everyone would be using IBM parts. Cost matters more, especially for workloads that scale out and can tolerate wimpier individual CPUs. ARM chips are cheap. The economies of scale that already apply to phones would make the same chips very, very cheap as servers.
AMD's chips are considerably cheaper than Intel's, and yet it's Intel that rules the server market.

Using "whimpier" processors doesn't work because then you have a lot of identical silicon which is idle for much of the time, and a lot of expensive peripheral hardware. For instance it pays off to have a ~20 MB cache which minimizes the cache misses, and share it between ~8 powerful cores which each process 2 threads. It would take perhaps a hundred ARM processors to get anywhere near the same throughput, which each need their own cache, memory controller, socket, power regulator, etc. In the end that's not cheaper at all, and response times are longer.

Also note that in 2006 ARM considered Hyper-Threading to be inefficient. In 2010, they announced that they would include simultaneous multithreading in future chips... :rolleyes:

They clearly have a lot of catching up to do. And I honestly don't see what kind of technological advantage they could have over Intel. Heck, Intel already showed off a 48-core single-chip cloud computer in 2009. If it had any merit, they would have already started selling it by now.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
What I call pointless is a synthetic-benchmark comparison with some of the most important parts of today's designs missing. Not that MIPS is bad, but the differences will largely evaporate on a CPU-core basis by the time a SoC is up and running, and the 'uncore' will really determine both real application performance and power efficiency.
That's not pointless; that is called contemplation and discourse. We do that around here on rare occasion, and even then it is mostly by accident, I am quite convinced, when the fanboy flame wars die down :p
No, that's not pointless. Not only that, but I would go as far as to say that ARM's popularity is why it has been chosen as the warrior to free us from Intel's grasp ^_^, and the technical aspects take a back seat to a 50/50 split between (a) getting press attention and (b) licensing issues (which I imagine are not too dissimilar between MIPS and ARM).

MIPS may be just as good, but the management of random IT-using company X knows about ARM now, whereas MIPS was then. Honestly, I think that's a near-impossible hurdle for MIPS, as far as any mass-market adoption goes. Calxeda, for instance, didn't just choose ARM because they thought the ARM cores would work. They chose ARM because they thought the ARM cores would work and because ARM has the image of a serious underdog when it comes to eating into the space of x86, PPC, MIPS, etc. So they would have a better chance of getting real press than if they had used a MIPS core, even if the resulting racks of computers performed identically in every meaningful way.

MIPS is still around and competitive, just hidden away in infrastructure that doesn't paint the MIPS name on it, much like Renesas' stuff (SH, H8, R8, RX).

So this raises the question: what sort of innovation does ARM bring to the table? I honestly can't think of anything significant. It's a RISC architecture, but Intel has already beaten many other competitors' RISC architectures. Aside from the decoders, modern x86 processors are RISC inside anyway.
A complete Thumb, so most executable code would be 16-bit instructions except where 32-bit offered a useful feature, which might also free up 32-bit opcode space for more complicated instructions instead of just having Thumb for compression? Nah. A block-aware ISA, so that they could add trace speculation and caching without needing to inspect the code instruction by instruction (well, at least branch by branch)? Nah. Not using more registers, since only big software-unrolled loops tend to really need more than 12-16 GPRs (ARM is 13 for most code, IIRC), so as to keep potentially valuable instruction bits for encoding more useful information per instruction (just make sure every new core renames registers, and be done with it)? Nah.

On one hand, how much can they afford to risk when making such a major ISA move? On the other hand, how can they afford not to try something new and different, with competitors on all sides in their traditional markets (Renesas and MIPS have been getting pretty serious in the last few years), Intel in the markets they want to get into, and now Intel trying to force its way down into phones?

TBH, it's not an enviable spot to be in.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
A complete Thumb, so most executable code would be 16-bit instructions except where 32-bit offered a useful feature, which might also free up 32-bit opcode space for more complicated instructions instead of just having Thumb for compression? Nah. A block-aware ISA, so that they could add trace speculation and caching without needing to inspect the code instruction by instruction (well, at least branch by branch)? Nah. Not using more registers, since only big software-unrolled loops tend to really need more than 12-16 GPRs (ARM is 13 for most code, IIRC), so as to keep potentially valuable instruction bits for encoding more useful information per instruction (just make sure every new core renames registers, and be done with it)? Nah.
Are you being sarcastic or not?

Just in case, note that x86 is a register-memory ISA while ARM is a register-register (load-store) ISA. So it always takes an extra instruction to read memory on ARM. This pretty much cancels the advantage of Thumb. And it also ties up registers with data that could have been read from the L1 cache instead.
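A tiny C sketch of that difference; the instruction sequences in the comments are schematic, hand-written rather than verified compiler output, which will vary with flags and calling convention:

```c
/* Same source line, two ISAs. The assembly in the comments is
 * schematic, just to show the register-memory vs. load-store
 * difference, not exact compiler output. */
int add_from_memory(int a, const int *p)
{
    /* x86 (register-memory): the load can fold into the ALU op,
     *      add  eax, [rsi]        ; one instruction
     *
     * ARM (register-register / load-store): memory must be loaded
     * into a register first,
     *      ldr  r2, [r1]          ; extra instruction...
     *      add  r0, r0, r2        ; ...and an extra register, r2
     */
    return a + *p;
}
```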

I'm not claiming this makes x86 a better ISA, but it's certainly not as bad as some seem to think it is. There is no clear winner in the RISC versus CISC debate; the best architectures combine ideas from both ends of the spectrum.

So it's a mystery to me how ARM is expected to become serious competition for Intel, especially in the server market.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Heck, Intel already showed off a 48-core single-chip cloud computer in 2009. If it had any merit, they would have already started selling it by now.

According to the link, the Intel 48-core cloud computer chip has a 567 mm² die.

So what kind of yields and economics are we looking at for that chip? Also, I just have to wonder which other Intel chips that 48-core, 567 mm² chip would compete against.
 

tangrisser

Member
Feb 21, 2012
27
0
0
I don't think that ARM will chase the regular server market, where they would most likely have to design a new chip and do a lot more than they are really capable of.

Most likely they are going after the high-density server market, similar to what SeaMicro/AMD caters to. SeaMicro has what looks to be an 8U box with more than 512 cores. If ARM vendors can pull off something similar with lower power usage while providing better performance, Atom and Zacate (or whatever AMD produces in their place) will have some serious competition in high-density server CPUs.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
According to the link, the Intel 48-core cloud computer chip has a 567 mm² die.
That was at 45 nm, and Beckton (also at 45 nm) was 684 mm². It would be much smaller at 32 or 22 nm. Also, since the cores are highly independent, yields should be very good when fusing off one or two.
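To see why fusing off a core or two helps so much at this die size, here is a minimal yield sketch under a simple Poisson defect model; the defect density is an assumed, illustrative value, not a published Intel figure, and uncore area is ignored:

```c
#include <stdio.h>
#include <math.h>

/* Why "fuse off one or two cores" rescues yield on a ~567 mm^2 die.
 * Simple Poisson defect model: per-core yield y = exp(-area * D).
 * The defect density D is an assumed, illustrative value, not a
 * published Intel number, and uncore area is ignored. */
int main(void) {
    const int    n = 48;                  /* cores on the die        */
    const double core_mm2 = 567.0 / n;    /* ~11.8 mm^2 per core     */
    const double D = 0.005;               /* defects/mm^2 (assumed)  */
    const double y = exp(-core_mm2 * D);  /* P(a given core is good) */

    /* Binomial: P(exactly k bad cores), accumulated for k = 0..2. */
    double p_k = pow(y, n);               /* k = 0: perfect die      */
    double sellable = p_k;
    for (int k = 1; k <= 2; k++) {
        p_k *= (double)(n - k + 1) / k * (1.0 - y) / y;
        sellable += p_k;
    }
    printf("perfect 48-core dies:        %.1f%%\n", 100.0 * pow(y, n));
    printf("sellable with <=2 fused off: %.1f%%\n", 100.0 * sellable);
    return 0;  /* with these assumptions: ~5.9%% vs. ~47.5%% */
}
```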

In contrast, ARM doesn't even have a 64-bit chip on the market yet.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
With AMD being a possible casualty of an ARM server invasion, the company must have some other plans it is considering.

One reason I brought up MIPS is that Rory Read (AMD's CEO) has said he wants the company to focus on "emerging markets", "cloud", and "low power".

With China (as the emerging market) investing in MIPS for servers, I just have to wonder if the MIPS CPU (rather than the likely more expensive ARM) is the one AMD wants to eventually pursue.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
I don't think that ARM will chase the regular server market, where they would most likely have to design a new chip and do a lot more than they are really capable of.

Most likely they are going after the high-density server market, similar to what SeaMicro/AMD caters to. SeaMicro has what looks to be an 8U box with more than 512 cores. If ARM vendors can pull off something similar with lower power usage while providing better performance, Atom and Zacate (or whatever AMD produces in their place) will have some serious competition in high-density server CPUs.
"Intel said it was internally developing technology to remain competitive in the dense server market.

We are developing integrated fabrics to boost I/O and high-performance networking and storage in servers, said Jason Waxman, general manager of Intel's data center business unit, in an interview. The technologies are being developed as the company tries to boost its presence in the market for dense servers used in cloud computing deployments in data centers." - AMD's Acquisition of SeaMicro Puts Intel on the Defensive

So again, exactly how does ARM hope to gain server market share when it doesn't even have 64-bit chips yet? By the time it does, it will be facing Intel's next generation of server products.
 

Dravic

Senior member
May 18, 2000
892
0
76
Thank you for the post and insight.

Can you explain this in a little more detail?

My first post was just to point out that the underlying muscle of the computing hardware is quickly becoming insignificant with regard to which high-level programming language you use. With cloud computing architectures you are usually another software layer removed from the bare metal.

The resellers are going to charge me for CPU resource usage, so I'm going to find the cheapest, most reliable provider I can that meets my risk tolerance. If the ISA doesn't matter to me and ARM-based architectures end up giving a denser, more power-efficient footprint, ARM will win.

This is especially true for cloud resellers like Amazon, who just want revenue and utilization out of their existing cloud services.


My second point was about the cyclical nature of this industry, ~40 years in. The mainframe rides again. It may not even take 15 years; I'm not convinced the existing internet is built for a trusted (secure) cloud solution yet.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
"Intel said it was internally developing technology to remain competitive in the dense server market.

We are developing integrated fabrics to boost I/O and high-performance networking and storage in servers, said Jason Waxman, general manager of Intel's data center business unit, in an interview. The technologies are being developed as the company tries to boost its presence in the market for dense servers used in cloud computing deployments in data centers." - AMD's Acquisition of SeaMicro Puts Intel on the Defensive

So again, exactly how does ARM hope to gain server market share when it doesn't even have 64-bit chips yet? By the time it does, it will be facing Intel's next generation of server products.

http://www.dailytech.com/Project+Moonshot+HPs+Secret+ARM+Servers+Get+Official/article23175.htm

Also, we have to consider what kind of sales lower-priced servers could see in "emerging markets", where salaries (in general) are a lot lower.

As I understand things, in the US engineer/technician salaries and server software licensing costs are quite high compared to the actual price of the server equipment. But in other countries, I'd imagine being able to get the server equipment cheaper could make a noticeable dent in the total cost of the operation.
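A quick hypothetical split shows the shape of that argument; every figure below is invented purely for illustration, not measured data:

```c
#include <stdio.h>

/* Illustrative-only TCO split; all figures are invented to show the
 * shape of the argument, not measured data. */
int main(void) {
    double hw        = 20000.0;  /* assumed server hardware cost     */
    double staff_us  = 150000.0; /* assumed US salaries + licensing  */
    double staff_low = 25000.0;  /* assumed lower-wage-market figure */

    printf("US:       hardware = %.0f%% of total\n",
           100.0 * hw / (hw + staff_us));   /* ~12% */
    printf("Emerging: hardware = %.0f%% of total\n",
           100.0 * hw / (hw + staff_low));  /* ~44% */

    /* A 40% cheaper ARM-class box barely moves the US total but
       cuts nearly a fifth off the emerging-market total. */
    printf("Emerging savings: %.0f%% of total\n",
           100.0 * (0.4 * hw) / (hw + staff_low)); /* ~18% */
    return 0;
}
```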
 

Dravic

Senior member
May 18, 2000
892
0
76
So again, exactly how does ARM hope to gain server market share when it doesn't even have 64-bit chips yet? By the time it does, it will be facing Intel's next generation of server products.

Intel is spending money on dense-computing research.
AMD bought a dense-computing outfit.
Dell is starting to test ARM in servers.

You seem to be throwing up walls that don't exist. 64-bit will matter more for memory density than computing efficiency. We are talking about armies of low-cost CPUs serving up web services programmed at an abstracted layer in some rapid-deployment framework like Ruby on Rails or Django. This isn't high-performance computing at all. Economies of scale will have a much greater impact.