Yes, but it is far easier to develop a custom interconnect than to develop a custom ARM core. Even AMD could afford to buy an interconnect maker, and AMD's financial position is subpar.
AMD's SeaMicro will make a difference on highly complex or specific designs. For your everyday servers, the chip itself will matter a lot more than the interconnect.
So they are switching to cheap, vanilla ARM servers, most likely inferior to their competitors', which means that whatever gain there is comes from manufacturing the boxes themselves. I doubt that business will be able to muster enough margin to sustain the company in the long run.
I cannot but agree. MIPS beat them all to the punch by years in some ways, but not by making anything COTS. Even if there end up being enough markets (no single niche will be enough for more than one company, I don't think), AMD would only have a significant advantage vs. Calxeda and Baserock, who only support rather slow interconnects. For fighting thermal density with parallel workloads, I'd be willing to bet that the forthcoming Atoms will smash the upcoming ARM SoCs, and that AMD would have been better off making a RAS-heavy Jaguar and trying to stuff it into every niche they could find (and the same chip with those features turned off could be an OK mobile and AIO CPU).
JHH is at least working towards known-profitable areas, like HPC, but I'm not sure how well AMD will be able to hold out with other ARM vendors fighting over small margins, unless they have another trick up their sleeve (so far, they don't seem to).
Samsung is working on ARM-based server parts for release in 2014. At 32 bits, ARM is inadequate for large server applications, but with the v8 instruction set it becomes viable.
32-bit isn't viable for anything fancier than a SheevaPlug. Yes, it works, but 64-bit makes the world, as the CPU, OS, RAM-heavy services, and file-heavy services see it, orders of magnitude simpler once you get into using GBs of RAM and 100s of MBs of data sets. Wider registers are a nice bonus for small copies and DB work, too. It's the difference between admins getting pissed off and disabling the OOM killer, then recompiling the kernel, then getting pissed off all over again when they still run out of memory with plenty available...versus occasionally having to tweak a VM tunable.
For now, I don't think anybody is running LAMP on GPUs. That may change, but my understanding is that for an individual thread, a GPU is very slow, to the point where, if you have any latency restrictions at all, a GPU thread is not likely to meet your performance requirements. I've heard that for a number of server workloads, the current generation of low-power processor cores (from all major vendors) just don't have quite enough performance to get a foot in the door. Websites care very much about how long a page takes to load, and if you can't, e.g., serve the AnandTech forums in under 100 ms (a random, made-up number), it doesn't matter how cheap/low-power/etc. your chip is.
Pretty much. I could see OLAP going GPU, but (Apache/Nginx)+(MySQL/PostgreSQL)+(PHP/Python/Ruby), or anything like that, will benefit much more from big caches and fast memory access than from more processors. Not only that, but they benefit greatly from caching in more abstract ways than the memory cache: sharing chunks of files through the OS, sharing their own data structures in memory (Python will explicitly need a 3rd-party object cache, PHP can use APC with a few config changes), and implicitly sharing most of their VM contents (depending on server config, anyway; upstream defaults are not good for web apps). In addition, MySQL and PostgreSQL both scale very well to many cores, provided there are also many concurrent transactions.
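The APC bit really is only a couple of ini lines. A minimal sketch (the size here is illustrative, not a tuned recommendation):

```ini
; php.ini - turn on APC's shared-memory opcode + user cache
apc.enabled  = 1
apc.shm_size = 64M   ; illustrative; size to fit your code + user data
```

After that, the opcode cache is automatic, and apps can share data structures across requests via apc_store()/apc_fetch().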
So, even in cases where more threads, cores, etc., could be more useful than fast response times per thread, you'd really want an expandable NUMA system, where each card adds to the total CPUs and RAM (and with a kernel aware of the layout), rather than each card being a little ARM blade server. Also, a program counter can't reasonably be shared between threads, even when they're running the same binary, so a GPGPU's vector lanes would be unusable: scaling would be limited to the number of independent program counters the processor supports (vendors call vector data lanes "threads," so it's important to make the distinction).
I can see three possibly useful cases:
1. Shared hosting (standard and VPS), where everybody gets slow 'servers' anyway, and the hosting service would benefit from reduced thermal density and from not needing to share nearly as much cache and RAM between sets of sites.
2. Massive low-bandwidth parallel processing, where time to completion matters less than being able to have more jobs in flight. With a system like Hadoop, or Erlang, the scaling could be excellent. The idea here is to look at them as sets of networked DRAM banks that happen to have some CPUs attached.
3. Memcache servers. I'm not sure how they'd actually perform here relative to fewer, larger servers per U, but once 64-bit arrives, each node could have 8+ GB RAM, so you might be able to pack much more total cache per U than with standard servers.
Will v8 cores clock at 3 GHz? 4 GHz? Or am I missing something?
All you're missing is that it's new, and there's obviously some demand. But with no useful parts out, nobody knows exactly how it will shape up. So anybody who can wants to be ready over the next few years, in case it ends up being more than a few users totaling 1%. If they are good enough for enough users, and/or people find unexpected ways to exploit them, and it explodes, you don't want to be the guy who said it would be a waste of time to pursue, do you?