Author:
Jon Masters (Chief ARM Architect at Red Hat)
Standardization: ARM server success in 2016
Dec 12, 2015
ARM servers are on track to become extremely popular in the mainstream over the next two years. You don’t have to believe me, but I’m not wrong. These mainstream servers will all be completely standardized, will function similarly to their x86 and POWER cousins, and will provide the kind of user experience that datacenter customers demand before any new technology is taken seriously. That’s the “boring” stuff. The stuff that developers hate, and that end users love. The stuff that takes a server from a science experiment to real world use.
I’m going to be publishing a series of guides over the coming weeks and months, beginning with a white paper in the new year that will articulate everything you should demand of your system vendor when making a purchasing decision, in order to have a completely “boring” and non-adventurous out of the box experience. Eventually, this really will be a “take it for granted” situation. But in the early days, not all vendors are created equal, and not all have fully worked out how to succeed in the Enterprise (as opposed to embedded) market. We are going to help them, together, to ensure that they build only servers the market wants, and that they give you the experience you expect. The good news is that there are already a number of ways to get that great out of the box experience today. There need to be more.
The upshot of this is that industry standard ARM servers will “just work”. They will allow you to choose the Operating System of your choice, install it onto a server, and then upgrade, downgrade, experiment, and migrate - all of the things that you expect from real world use cases in your real world datacenters today. There will be no “special distribution built for server X” that shipped with the system from vendor Y in real world datacenter scenarios. There will be only industry standard systems running real Linux kernels, built from upstream sources, shipping in major distros. To get to that point faster, we need to work together.
You need to demand of your vendors that they only ship industry standard hardware, and that they ensure “upstream” (Linus Torvalds) kernels get all of the drivers needed to make their systems work.
Early in the new year of 2016, I will provide you with a nicely articulated list of system hardware requirements (all based upon your normal datacenter experience) that you can share with your system vendor as a starting point to ensure a “boring” and non-adventurous out of the box experience with any compliant Operating System of your choice. In return, I need your help to keep up the pressure toward universal adoption of standardized platforms. I need your help to get the other Linux distros to follow through on doing the right thing, because you as an end user want competition, you want compatibility, and you want to be able to migrate from one to another with ease. In some cases they need to do a little work to get with the program, and we’re going to do that together. You’re going to create the pressure; we’re going to ensure only industry standards are used.
Why is this so critically important? Why does a boring out of the box experience matter?
Standardization remains key to the success of ARM in servers (and, conversely, the lack of appropriate platform standardization is one of the major reasons why alternative architectures of yesteryear failed in the market). Customers and end users in the real world datacenter market don’t often make choices directly based upon firmware, boot loaders, or other aspects of the platform runtime (because it’s implied that these are already doing the right thing in contemporary setups). Instead, customers and end users make choices based upon well established datacenter industry best practices, their own expectations, and the demands of their datacenter environment. Existing infrastructure choices also form a significant factor in the decision processes surrounding traditional datacenter adoption. And while contemporary “Cloud providers” are willing to adapt faster in some respects than their peers in more traditional settings, they nevertheless have their own limits of reasonable accommodation for any new technology being considered for use at hyperscale. Besides, all of the major Cloud vendors have long lists of requirements that make what we are discussing here look quaint by comparison. Those succeeding at getting into that space with ARM will quickly come to understand what I mean, and there will be many such success stories.
Taking a step further back: the industry today has certain key built-in, assumed expectations for server technology going into real datacenters, as opposed to systems being used for demos and early proof of concept work. These are so ingrained that they’re just taken for granted today. This means that customers don’t even ask for them, but they do expect them to be there. Here are just a few of the minimal expectations that must be satisfied - and that will be satisfied by the wealth of production quality ARM servers entering the market over the coming years:
That the installation process for any server is a simple matter of booting some install media (over the network, or locally) in a completely standardized, totally reproducible way. Precisely nothing custom to an individual system may be included in this process - no special platform data, no special address for the console (UART) logs, no custom kernel, and never “just type this magic sequence” (the famous last words of any attempt at being successful in market with something new). The server should turn on, boot up, and operate consistently every time. Standard BMCs, IPMI, SMBIOS, APEI, watchdogs - all of these must “just work”, the way they “just work” today on many ARM servers and on all of their contemporary counterparts (the first sketch after these expectations shows how to probe for that standard platform description).
That the server (platform) and the Operating System are decoupled and disjoint. The same pre-built, binary Operating System release must run on multiple different, independent servers. It must be upgradeable, and it must provide a consistent experience from one generation to the next. It is never acceptable to require mainstream datacenter users to upgrade their Operating System and hardware (firmware) platform in lockstep. Similarly, firmware and Operating System software must be broadly forward and backward compatible. Updates should occur in standardized ways (e.g. using UEFI Capsule - the second sketch after these expectations shows how to check for that) and require no special customization from one system to the next. An x86 server from OEM or ODM X must function very similarly to a comparable ARM based system, in terms of the overall user experience. I have heard too many totally avoidable horror stories from those who received a hacked up distro (not mine) built for one specific platform, which shipped together with the system and now has no upgrade path, and no future. In those cases, I have dug people out by pointing them toward standard firmware to replace the hacks they were running, giving them a path to use their existing silicon with future Operating System releases (from whichever vendor they want to go with - that’s called an open market). All of that nonsense was totally avoidable, and entirely caused by laziness on the part of certain folks shipping systems who bundle firmware intended for embedded use simply out of reflex, while simultaneously making real enterprise versions available to other users. EVERY system shipping, EVERY time, should run exactly the agreed upon industry standard solution. FROM DAY ONE. These negative out of the box experiences are totally avoidable today, and they must cease if we are to collectively succeed. You don’t get a second chance to make a good impression on a datacenter customer after you’ve needlessly squandered your first opportunity by showing them a useless hack - a hack with no upgrade path when they decide to go to production.
That the same Operating System release they run today will run on tomorrow’s servers in a backward compatible way. If the exact same pre-built binary Operating System image does not boot and run on a server released six months from now (albeit with certain caveats in terms of performance tradeoffs), then the development model in use is broken and successful adoption is needlessly hindered. It is acceptable to innovate with new devices requiring updated drivers, but the foundational components of the system must continue to work (perhaps without the new device). Thus, system vendors face a well-understood tradeoff: incorporate a new plug-in adapter (or on-SoC device) at a point when Operating System support is broadly present for those users who will be consuming that hardware in their environment. In plain English: you can always turn on your server, boot it up, get to the hard disk, and get on the network. If you want to add a whizzbang new disk controller or an Ethernet controller that’s running at a bazillion Gigabits with RDMA, go have fun, but the existing Operating System will continue to work while you’re having your innovative fun.
There are many other expectations that the industry has. These are just a few of them.
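To make the first of those expectations concrete: a standards compliant server describes itself through SMBIOS, and the Linux kernel surfaces that data in one uniform place, with no per-system magic. Here is a minimal sketch in Python - an illustration only, assuming nothing beyond the standard sysfs location /sys/class/dmi/id - that reads the identity fields:

```python
#!/usr/bin/env python3
# Minimal sketch (not a spec): read the SMBIOS/DMI identity fields that a
# standards compliant server exposes through the Linux kernel. Assumes only
# the standard sysfs location /sys/class/dmi/id.
from pathlib import Path

DMI = Path("/sys/class/dmi/id")
FIELDS = ("sys_vendor", "product_name", "bios_vendor", "bios_version")

def read_dmi():
    if not DMI.is_dir():
        raise SystemExit("no DMI/SMBIOS data exposed - not a standard platform?")
    for field in FIELDS:
        node = DMI / field
        value = node.read_text().strip() if node.exists() else "(missing)"
        print("{0:>14}: {1}".format(field, value))

if __name__ == "__main__":
    read_dmi()
```

On a compliant ARM server this prints the same kind of vendor and firmware identity strings you would see on any x86 box; if the directory is missing entirely, the platform is not describing itself the standard way.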
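Similarly for the firmware update expectation: UEFI Capsule capable systems advertise their updatable firmware through the EFI System Resource Table (ESRT). Another minimal sketch, assuming a Linux kernel (4.2 or later) that exposes the ESRT under sysfs:

```python
#!/usr/bin/env python3
# Minimal sketch (not a spec): list the updatable firmware resources that a
# UEFI Capsule capable system advertises via the EFI System Resource Table.
# Assumes a Linux kernel (4.2 or later) exposing the ESRT under sysfs.
from pathlib import Path

ESRT = Path("/sys/firmware/efi/esrt/entries")

def list_firmware_resources():
    if not ESRT.is_dir():
        raise SystemExit("no ESRT found - standardized capsule updates unavailable?")
    for entry in sorted(ESRT.iterdir()):
        fw_class = (entry / "fw_class").read_text().strip()
        version = (entry / "fw_version").read_text().strip()
        print("{0}: class={1} version={2}".format(entry.name, fw_class, version))

if __name__ == "__main__":
    list_firmware_resources()
```

If that directory is absent, the platform is not offering the standardized update path, and you are back in per-vendor flashing-tool territory - exactly the lockstep nonsense described above.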
One of the most dangerous phrases in the English language is “we’ll just do it this way for now”. There is never a “for now”. There is always a legacy. A “forever”. This is why I have been so forthright that there will never be an acceptable period of time in which ARM servers landing in end user or customer hands are anything other than completely standards compliant. To do otherwise gives users a hideously unfortunate and totally avoidable experience, out of sheer laziness on the part of those building the platforms. Time and time again, I have seen users who are keen to try out ARM based servers foiled by having to figure out which kernel version to boot with which system, where to get a special OS release from, or which device tree they should load using some magic incantation with that special OS build. This does more damage than good, and it must not happen. Real servers are built using agreed upon industry standards. On ARM, as with x86, this means UEFI, ACPI, PCIe, and many other acronyms. All of those pieces are available today, and it is trivial to verify that a given system uses them (see the sketch below). The only reason not to universally adopt such standards is short term thinking spawned by an embedded mentality and laziness. It must, and will, end.
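To make “available today” concrete, here is one more minimal Python sketch - again an illustration only, assuming nothing beyond the standard Linux sysfs locations - that checks whether a running system actually booted through those standards rather than via some platform specific hack:

```python
#!/usr/bin/env python3
# Minimal sketch (not a spec): confirm that a server booted the standard way.
# A standards compliant ARM server looks just like its x86 cousin here.
from pathlib import Path

CHECKS = [
    ("UEFI firmware interface", Path("/sys/firmware/efi")),
    ("ACPI tables", Path("/sys/firmware/acpi/tables")),
    ("PCI(e) device enumeration", Path("/sys/bus/pci/devices")),
]

def main():
    for name, path in CHECKS:
        status = "present" if path.exists() else "MISSING"
        print("{0:>28}: {1} ({2})".format(name, status, path))

if __name__ == "__main__":
    main()
```

On a compliant ARM server, all three show up exactly as they do on x86. A MISSING line means someone took a shortcut.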
Developers love hacks. They love non-standard firmware that lets them play. They are happy with words and phrases that begin with “and then I just did…”. It can even seem fast, expeditious, even progressive. “Look! We can run OpenStack on ARM!!!” they will say (aside: this is indeed awesome, but you can already do it today using fully standardized ARM based platforms that don’t have such hacks). They will then neglect to mention that they’re doing so using some hacked up distribution and a kernel that never even remotely saw upstream, isn’t going to be upgradeable, and is a support and end user experience nightmare. This is what happens when you allow your customer experience to be dictated solely by the whims of keen, well meaning developers who don’t realize that many end users running an app have absolutely no idea what firmware really is, how a Linux kernel works, or what it means to load some platform specific gunk. What they know is that their x86 server powers on and “just works” with the OS release they found on the Internet yesterday. And if their ARM based one needs someone to go find some special wiki page and waste more than exactly 5 seconds getting the system to work, something is wrong. I have heard of weeks being wasted trying to deal with shoddy, unacceptable vendor hacks that were handed over as a result of short-term, narrowly focused thinking that used the words “we’ll just do it this way for now”. Again, all totally avoidable, since the right solutions exist TODAY. And should be used.
This all ends. Right now. We have built some wonderful industry standards for ARM servers. All of the real contenders are already adopting them, and it already works great. By this time in 2017 you will find many different, fully compatible ARM based servers built around UEFI, ACPI, PCIe, and all of the technologies that datacenter customers know and love (and really want in order for us to be taken seriously). I am not prepared to wait around while we spend another year fighting the nonsense that is the resistance to industry standardization on ARM from a small subset of folks who are making this more painful for the rest. We are trying to snatch defeat from the jaws of victory, and it will not stand. We need this to stop. Some very serious contenders are building some very nice hardware, and we will not fail to execute just because some folks on the Internet failed to grasp the importance of standardized, upstream, mainstream industry servers that can run any software in a fully uniform way.
I am going to publish a number of guides over the coming weeks and months. They will call out precisely what you should look for in an ARM server that runs in a real datacenter. All of the pieces are available TODAY, right now, but there is still too much needless fragmentation. I intend for all of that nonsense to go away in 2016. And if I catch you building a non-standard ARM server, shipping one, or otherwise not engaging with others based upon the agreed specifications that we (the vendors) all built together over the past 5 years, I will find you. When I find you, I will do all I can to compel you to understand that short term hacks, kludges, and other nonsense will not drive us to where we need to be in 2016 and beyond.
We are going to succeed in this effort. We are going to build fully standardized platforms built using open technologies with cross-vendor support. We are going to be able to migrate from one ARM server to another, just like we can on x86 today. There will be no zoo. Those who still don’t get that have no idea how much I am willing to do to prove them wrong.