LLano, the last hurrah for AMD K8, will make itself into a new niche, the unofficial new console, i expect it will take marketshare from Xbox360/PS3,
No. Consoles' strength is in being specialized devices, with a fairly vertical software market. Llano's main targets are mobile Core i3 and Core 2 Duo laptops. I'm hoping AMD has a deal with somebody like HP, to make some good midrange ones.
Bulldozer BD needs to turn up to the fight first. won't compete as well against SB in most applications, but for per core, cloud computing it will be significantly more competitive than intel SB. by turning up late, it has already ceded client side computing to intel.
It ceded single-threaded/client performance to Intel way before turning up late, and AMD has been server-first since the K8. There's no reason it won't be a good desktop CPU--but, good like a Phenom II, not good like an i7. On the server side, they can tout the shared resources in perf/watt; on the desktop, they can make decent margins (hopefully) by getting more CPUs per wafer than if they hadn't gone that route.
Secondly, cloud computing, in the sense that you just have a UI, and the work is done on servers, will still be best handled with good per-thread performance, rather than more servers or threads. IOW, you want more cores, but 8 fast cores will usually be better than 64 cores each 1/8 as fast, even if each of those has plenty of IO resources per thread. One core 8x as fast brings you up to non-air cooling, so that's right out. It's a balancing act, in which AMD and Intel have to find the closest point to the intersection of many different performance curves...and Intel can afford to throw money at R&D in a way that only IBM can match.
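To put rough numbers on the 8-fast-vs-64-slow point, here is a minimal sketch using an Amdahl's-law style model. The model, the function name `run_time`, and the parallel fractions are my own assumptions for illustration, not measurements of any real CPU.

```python
def run_time(parallel_fraction, cores, core_speed):
    """Time for one unit of work under a simple Amdahl-style model.

    parallel_fraction: share of the work that can be spread over cores (0..1)
    cores:             number of cores available
    core_speed:        per-core speed relative to a baseline core (assumed)
    """
    serial_part = (1.0 - parallel_fraction) / core_speed
    parallel_part = parallel_fraction / (core_speed * cores)
    return serial_part + parallel_part


if __name__ == "__main__":
    # Hypothetical workloads, from half-parallel to perfectly parallel.
    for p in (0.5, 0.9, 0.99, 1.0):
        fast = run_time(p, cores=8, core_speed=1.0)
        slow = run_time(p, cores=64, core_speed=0.125)
        print(f"parallel fraction {p:.2f}: "
              f"8 fast cores {fast:.4f}, 64 slow cores {slow:.4f}")
```

Under this model the two setups only tie when the work is perfectly parallel; any serial portion runs 8x slower on the weak cores, which is why the fast cores usually win.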
Voo said:
Selling cores for cheap is only great as long as we can use them, more single threaded performance is always welcome.
Per-thread performance will improve for the foreseeable future. More is not merely welcome; more is needed. AMD's BD should perform well when there aren't many threads, and then perform better than the same number of logical threads from Intel when there are. It's not like they're making an x86 version of Niagara/Rock. They're focusing on how to make those many threads work well, as they will slowly grind themselves into the ground if they try to compete with Intel on raw per-thread performance, and many threads are a reality, and they don't have the R&D that Intel has to make SRAM smaller, and they are consistently at least one process node behind Intel (it may somewhat overlap for BD, but Intel's 32nm is already mature, where AMD/GloFo will still be ironing issues out for BD).
Frickin' Internet Explorer will use more threads (whole processes, really) than most of our desktops have. 4-8 threads, usable by common desktop software, is not a far-flung future. It's already becoming mainstream. Every common code base that gets a significant change will get changed so as to use more threads, should it make sense.
If your workload needs RAM, redundancy, and more RJ45s, Atom- or Bobcat-type servers on the cheap will be for you (give Bobcat some server reliability features, and the rest should fall into place). Proxies (including load-balancing layers for front ends), some web apps, select-only DB clusters, etc., can make better use of more computers than faster ones. Likewise, file servers, DNS servers, and other servers that need a physical resource, not a virtual one, could benefit from lower power consumption and reduced rack space.
If you have IO contention, but latency is secondary, and you've got the deep pockets (or if the hardware SSL is a killer feature for you), something like the Oracle T3 may be up your alley. However, there are potential vendor lock-ins to worry about as well, and last I knew, 64-bit support for those machines in Linux still wasn't ideal.
For everyone else, even as apps scale out better and better, performance per thread context still needs to keep going up. Even with something that scales out fairly well, the fewer cores it must scale out to, the better, and the fewer sockets involved for the performance you need, the better. Moving data between sockets in the same computer, for instance, can have a significant impact on performance. The finer the granularity you need as you spread out your previously-not-multithreaded code, the more it matters. Likewise, some workloads, even ones that scale out well, don't always scale across threads in a way that keeps latency down, giving you a situation where you have to really think about exactly how you want to handle it*. More cores are the future, but very weak cores will only serve small niches of the future.
* Let's say you have a server/workstation app, and it needs to scale out to many threads. For the resources you can put in, you expect to be able to use 8 cores to decrease the time a normal task takes by 30-50%, and scale out to several tasks. Or, you can make the code for each task more efficient per thread, and scale out to 8 tasks at a time, but with that, any tasks depending on previous tasks will take much longer than with the first option. Using the former, all the additional threads to be managed could hurt performance for users with fewer threads to use, and/or who could benefit from many tasks running at once. Using the latter, some users simply won't have 8 tasks to scale out to, so they would not see the benefit from the new version's code, which they are paying for as much as the users who have >=8 tasks for their >=8 logical threads. What exactly do you do?
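A rough way to see that trade-off in numbers is the toy model below. Everything in it is an assumption made up for illustration: the names `wide_time` and `many_time`, the 40% reduction (picked from the middle of the 30-50% range), and the simplification that the first option runs tasks back to back while the second keeps each task at its baseline single-thread time.

```python
import math


def wide_time(n_tasks, task_time, reduction):
    # Option 1 (assumed model): each task is spread over all 8 cores and
    # finishes `reduction` faster; tasks run one after another.
    return n_tasks * task_time * (1.0 - reduction)


def many_time(n_tasks, task_time, slots=8, dependent=False):
    # Option 2 (assumed model): each task stays on one thread at baseline
    # speed; up to `slots` independent tasks run at the same time.
    if dependent:
        # A chain of dependent tasks cannot overlap at all.
        return n_tasks * task_time
    return math.ceil(n_tasks / slots) * task_time


if __name__ == "__main__":
    t = 1.0   # hypothetical baseline time per task
    r = 0.4   # hypothetical reduction, within the 30-50% range
    for n in (1, 4, 8, 16):
        print(f"{n:2d} tasks: "
              f"option 1 = {wide_time(n, t, r):.1f}, "
              f"option 2 independent = {many_time(n, t):.1f}, "
              f"option 2 dependent = {many_time(n, t, dependent=True):.1f}")
```

With these made-up numbers, the first option wins for a single task or a dependent chain, and the second wins once there are enough independent tasks to fill the threads, which is the dilemma in a nutshell.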