X86 varieties this year

ydnas7

Member
Jun 13, 2010
160
0
0
Atom, developing into a true ARM competitor for internet-aware mobile applications.

Bobcat, similar to Atom, but when a choice is made between computing performance and battery life, it chooses computing performance. For any application that is mains-powered, Bobcat would be the better choice. A throughput CPU using many Bobcats would probably shame both SB and BD for equivalent process and chip area.

Llano, the last hurrah for AMD K8, will make itself into a new niche, the unofficial new console. I expect it will take market share from the Xbox 360/PS3, will severely hurt Nvidia, but will not perform so well against Intel.

Sandy Bridge (SB), for any workload that is not highly threaded (Amdahl's-law constrained), will be the king. Also good for laptops etc. Together with Llano, it will mostly kill low-end discrete graphics (100mm2 will tend to become the new low-end graphics chip).

Bulldozer (BD) needs to turn up to the fight first. It won't compete as well against SB in most applications, but for per-core cloud computing it will be significantly more competitive than Intel's SB. By turning up late, it has already ceded client-side computing to Intel.
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Bulldozer (BD) needs to turn up to the fight first. It won't compete as well against SB in most applications, but for per-core cloud computing it will be significantly more competitive than Intel's SB. By turning up late, it has already ceded client-side computing to Intel.

To be honest, the Bulldozer design doesn't seem to be focused on client-side computing. It is very much aggregate-throughput focused (at least compared to Sandy Bridge), so it should be superior for heavily threaded situations, but inferior for single-threaded IPC.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
It is very much aggregate-throughput focused (at least compared to Sandy Bridge), so it should be superior for heavily threaded situations, but inferior for single-threaded IPC.
Yeah, which means that as long as the number of threads is <= the max number of SB threads, the performance will be worse. And we all know how many consumer applications can handle more than 4 threads atm (encoding software and folding stuff, yippie). And even if people focus more on multithreading, there are enough algorithms that just don't scale well (or at all).
Selling cores for cheap is only great as long as we can use them; more single-threaded performance is always welcome.
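The scaling ceiling behind this is Amdahl's law: with a parallel fraction p of the work and n cores, speedup = 1/((1-p) + p/n). A quick sketch with purely illustrative numbers:

```python
# Amdahl's law: speedup from n cores when a fraction p of the
# work is parallelizable (illustrative values, not benchmarks).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even a well-threaded app (p = 0.8) flattens out fast:
for n in (2, 4, 8, 64):
    print(n, round(amdahl_speedup(0.8, n), 2))
```

At p = 0.8, 64 cores still deliver less than a 5x speedup, which is exactly why cheap extra cores only help as far as the code can use them.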
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Yeah, which means that as long as the number of threads is <= the max number of SB threads, the performance will be worse. And we all know how many consumer applications can handle more than 4 threads atm (encoding software and folding stuff, yippie). And even if people focus more on multithreading, there are enough algorithms that just don't scale well (or at all).
Selling cores for cheap is only great as long as we can use them; more single-threaded performance is always welcome.

I don't quite agree. Looking at the architecture, it looks like BD will be better at retiring 2 threads than SB will be, let alone 8. What it will likely be worse at is single-threaded IPC.

EDIT: Maybe it is because I haven't slept in about a month, but I can see now that you're likely right. The AMD design seems to have more throughput per module than the Intel design, but lower IPC per module for an individual thread. However, while there are changes to make the modules communicate more quickly, there is still likely an advantage for SB cores over BD modules when there is only 1 thread per module/core.

SB seems to be the better client processor, while BD seems to be the better server processor for highly threaded applications. I plan to purchase an SB i7-2600K next year, as long as it isn't much over $300.
 
Last edited:

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Llano, the last hurrah for AMD K8, will make itself into a new niche, the unofficial new console. I expect it will take market share from the Xbox 360/PS3,
No. Consoles' strength is in being specialized devices, with a fairly vertical software market. Llano's main targets are the mobile Core i3 and Core 2 Duo laptops. I'm hoping AMD has a deal with somebody like HP to make some good midrange ones.

Bulldozer (BD) needs to turn up to the fight first. It won't compete as well against SB in most applications, but for per-core cloud computing it will be significantly more competitive than Intel's SB. By turning up late, it has already ceded client-side computing to Intel.
It ceded single-threaded/client performance to Intel way before turning up late, and AMD has been server-first since the K8. There's no reason it won't be a good desktop CPU--but good like a Phenom II, not good like an i7. On the server side, they can tout the shared resources in perf/watt; on the desktop, they can make decent margins (hopefully) by getting more CPUs per wafer than if they hadn't gone that route.

Secondly, cloud computing, in the sense that you just have a UI and the work is done on servers, will still be best handled with good per-thread performance, rather than more servers or threads. IoW, you want more cores, but 8 fast cores will usually be better than 64 cores each 1/8 as fast, even if each of those has plenty of IO resources per thread. One core 8x as fast brings you up to non-air cooling, so that's right out. It's a balancing act, in which AMD and Intel have to find the closest point to the intersection of many different performance curves...and Intel can afford to throw money at R&D in a way that only IBM can match.

Voo said:
Selling cores for cheap is only great as long as we can use them, more single threaded performance is always welcome.
Per-thread performance will improve for the foreseeable future. More is not just welcome; more is needed. AMD's BD should perform well when there aren't many threads, and then perform better than the same number of logical threads from Intel when there are. It's not like they're making an x86 version of Niagara/Rock. They're focusing on how to make those many threads work well, since they would slowly grind themselves into the ground if they tried to compete with Intel on raw per-thread performance: many threads are a reality, they don't have the R&D that Intel has to make SRAM smaller, and they are consistently at least one process node behind Intel (it may somewhat overlap for BD, but Intel's 32nm is already mature, while AMD/GloFo will still be ironing out issues for BD).

Frickin' Internet Explorer will use more threads (whole processes, really) than most of our desktops have cores. 4-8 threads, usable by common desktop software, is not a far-flung future; it's already becoming mainstream. Every common code base that gets a significant change will get changed to use more threads, should it make sense.

If your workload needs RAM, redundancy, and more RJ45s, Atom- or Bobcat-type servers on the cheap will be for you (give Bobcat some server reliability features, and the rest should fall into place). Proxies (including load-balancing layers for front ends), some web apps, select-only DB clusters, etc., can make better use of more computers than of faster ones. Likewise, file servers, DNS servers, and other servers that need a physical resource, not a virtual one, could benefit from lower power consumption and reduced rack space.

If you have IO contention but latency is secondary, and you've got the deep pockets (or if the hardware SSL is a killer feature for you), something like the Oracle T3 may be up your alley. However, there are potential vendor lock-ins to worry about as well; and last I knew, 64-bit support for those machines in Linux still wasn't ideal.

For everyone else, even as apps scale out better and better, performance per thread context still needs to keep going up. Even with something that scales out fairly well, the fewer cores it must scale out to, the better, and the fewer sockets involved for the performance you need, the better. Moving data between sockets in the same computer, FI, can have a significant impact on performance. The finer the granularity you need as you spread out your previously-not-multithreaded code, the more it matters. Likewise, some workloads, even ones that scale out well, don't always scale across threads in a way that keeps latency down, giving you a situation where you have to really think about exactly how you want to handle it*. More cores are the future, but very weak cores will only serve small niches of the future.

* Let's say you have a server/workstation app, and it needs to scale out to many threads. For the resources you can put in, you expect to be able to use 8 cores to decrease the time a normal task takes by 30-50%, and scale out to several tasks. Or, you can make the code for each task more efficient per thread, and scale out to 8 tasks at a time; but with that, any tasks depending on previous tasks will take much longer than with the first option. Using the former, all the additional threads to be managed could hurt performance for users with fewer threads to use, and/or who could benefit from many tasks running at once. Using the latter, some users simply won't have 8 tasks to scale out to, so they would not see the benefit from the new version's code that they are paying for as much as the users who have >=8 tasks for their >=8 logical threads. What exactly do you do?
 
Last edited:

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Frickin' Internet Explorer will use more threads (whole processes, really) than most of our desktops have cores. 4-8 threads, usable by common desktop software, is not a far-flung future; it's already becoming mainstream. Every common code base that gets a significant change will get changed to use more threads, should it make sense.
Yep, but most of those threads/processes won't do any heavy lifting at all; separating every tab into its own process isn't done for performance reasons (actually it harms performance and increases the footprint of the app.. still worth the advantages though). So the question is how many threads can be used to render a page, and I doubt the work can be split well enough for 8, 16 or 32 threads (but what do I know? Haven't written a JS renderer; maybe that is one of those tasks that scales fairly well.. but I doubt it).

The rest of your post seems to be more focused on enterprise/server software, which more often than not scales fairly well, and where performance/watt and other features are more important than performance alone. But yeah, I agree that even there good single-threaded performance is important as well, at least where latency is still of concern (like you point out ;) )


But for desktops, more than 4 threads seems kinda useless atm if you're not into rendering or F@H. And while there are interesting solutions for writing multithreaded programs out there, they're all rather new or clunky/complicated; and not to forget, multithreaded algorithms are usually by design WAY more complicated than the standard ones (let an average undergrad implement a tree contraction..). So let's hope that more multithreaded libraries show up, along with better multithreading support in the important standard libraries; but all in all, it makes SW engineering even more complex than it already is.


PS: But I'm not too sure about great IPC improvements in the next few years - I mean, till 2005 we got most of our performance improvements from more GHz. SB allegedly brings ~15-20% IPC improvements, which is nice, but the question is whether Intel/AMD can continue to do that every two years from now on.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
To be honest, the Bulldozer design doesn't seem to be focused on client-side computing. It is very much aggregate-throughput focused (at least compared to Sandy Bridge), so it should be superior for heavily threaded situations, but inferior for single-threaded IPC.

And unfortunately, while the absolute best we can hope for is to make some fairly reasonable "guesses" regarding Bulldozer's single-thread IPC, we must acknowledge that we have zero information to go on to formulate any credible speculation as to clockspeed and power consumption.

Will single-threaded performance of Bulldozer be lackluster? We need to know clockspeed, power consumption, and pricing to answer that question.

(I'm not saying this to disagree with you; I'm just saying it while quoting you, since the sentiments in my post are best read as reflecting those expressed in yours.)
 

Jovec

Senior member
Feb 24, 2008
579
2
81
Is there any app where AMD's lower single-threaded IPC is holding anyone back besides gaming? The majority of what the average computer user does is still input-limited. And the majority of what the average computer does that is heavily CPU-limited will benefit from more cores (primarily video encoding).

I don't want to be the 640K memory guy, but isn't any modern $100 (or less) CPU enough for standard desktop usage? Wouldn't typical consumers be better served with either bigger displays, more GPU, more RAM, faster Internet, or an SSD than more CPU? If anything, mobile power consumption is a bigger issue for AMD today than IPC. AMD does need to improve its IPC because they cannot constantly add more cores to their desktop lineup, but they certainly don't need to match Intel's IPC to be competitive.
 
Last edited:

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Is there any app where AMD's lower single-threaded IPC is holding anyone back besides gaming? The majority of what the average computer user does is still input-limited. And the majority of what the average computer does that is heavily CPU-limited will benefit from more cores (primarily video encoding).

I don't want to be the 640K memory guy, but isn't any modern $100 (or less) CPU enough for standard desktop usage? Wouldn't typical consumers be better served with either bigger displays, more GPU, more RAM, faster Internet, or an SSD than more CPU? If anything, mobile power consumption is a bigger issue for AMD today than IPC. AMD does need to improve its IPC because they cannot constantly add more cores to their desktop lineup, but they certainly don't need to match Intel's IPC to be competitive.

I suppose the same argument could be made of every automobile purchased. Does anyone really need a 6-cylinder (or 8, or 10!?) vehicle that gets less than 30mpg?

Couldn't they all just be happy with a 4-cylinder gas-sipper shoved into whatever auto-body they liked (be it a 2-door coupe or a hummer) with whatever features embellishing the interior to their liking? (10-speaker stereo? built-in nav? cup holders?)

The truth is people consider the horsepower under the hood just as much as they account for the paint-color when choosing the vehicle that is right for them. It will never be any different for any other personal gadget or machine. Be it a computer, a car, a cellphone, their house, etc.

When is it ever about need? It is always about want.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Good point there, IDC. Although with cars, most people have to abide by speed limits and safety rules, while computers don't have that limitation.

People who don't really use the extra speed tend not to know much about computers either. They'll still have to replace it when it breaks down, and the halo effect provided by new computers and the faster=better mentality still exists.

Semiconductors are about the only area where the product five years from now will be better in almost every way than the current one. You honestly can't even see a place where an older system is better than the newer one. Even if you use it as a doorstop, the smaller form factor will probably be easy on you, since it's easier to move around. :\
 

Jovec

Senior member
Feb 24, 2008
579
2
81
I suppose the same argument could be made of every automobile purchased. Does anyone really need a 6-cylinder (or 8, or 10!?) vehicle that gets less than 30mpg?

Couldn't they all just be happy with a 4-cylinder gas-sipper shoved into whatever auto-body they liked (be it a 2-door coupe or a hummer) with whatever features embellishing the interior to their liking? (10-speaker stereo? built-in nav? cup holders?)

The truth is people consider the horsepower under the hood just as much as they account for the paint-color when choosing the vehicle that is right for them. It will never be any different for any other personal gadget or machine. Be it a computer, a car, a cellphone, their house, etc.

When is it ever about need? It is always about want.

You can feel the difference between a 4-cyl econobox and a 500HP sports car in everyday driving even while obeying speed laws. That doesn't apply to modern computers. Intel might be generating 500HP compared to 350HP for AMD in IPC, but the transmissions (average usage scenarios) in both cars are only capable of handling 200HP.

Outside of gaming, show me where the typical user feels the difference between an i7 and an Athlon X4 browsing the web, playing videos, creating a Word doc, etc. Slower CPUs are masked behind slow storage, slow bandwidth, and slow user input.

Do we like faster stuff? Of course. Do we buy more than we need? Of course. I will also not argue that perception of speed (or lack thereof) can ultimately hurt AMD in the long run.
 
Last edited:

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
You can feel the difference between a 4-cyl econobox and a 500HP sports car in everyday driving even while obeying speed laws.

My analogy was to suggest that regardless of the "difference" you feel while obeying the speed laws, except in rather niche end-user situations you don't "need" that difference. It is a nice-to-have, not a need-to-have.

But making this point is itself pointless; nobody buys based on need. They justify purchases based on the perception of need, but we all buy based on want and the ability to be manipulated into wanting.

Good thing too, a lot of folks would be out of jobs if marketing did not play a role :p

Outside of gaming, show me where the typical user feels the difference between an i7 and an Athlon X4 browsing the web, playing videos, creating a Word doc, etc.

How can I "show" you that any more than you can show where a typical user feels the difference between a 4-cyl econobox and a 500HP sports car in everyday driving?
 

extra

Golden Member
Dec 18, 1999
1,947
7
81
Meh... I do agree that a lowly Athlon II X4 (or a tri-core) is perfectly adequate for almost everything normal users do. Even photo editing, gaming, and such. However, most of us on this forum aren't "normal" users. We want the best (which is why I have an OC'd i7)...

But at work, for productivity use, if you gave me a choice between an Athlon II (or an old Core 2 Duo) with an SSD and an i7 with a regular HDD, I'd take the lowly Athlon II (or Core 2 Duo) in a heartbeat--it just helps soooo much more with normal day-to-day productivity use... night/day difference.

Let's face it: at this point in time, for *most* users, the CPU doesn't really matter. It's reached the "good enough" stage. If you have, say, an E5200 overclocked a bit, that's still a good enough CPU for almost everything. Replace that CPU with a top Phenom X6 or i7 and they probably would barely (if at all) notice. Replace the HDD with an SSD and they'd notice in the first minute of using their computer that something was vastly different.

Which is why I think both Bobcat and Llano will do really well (they are focusing on getting GPU performance up for most people, and on efficiency). And it's why Intel seems to be focusing strongly on efficiency too, imho (and on GPU performance!... reading about Sandy Bridge, it's mind-blowing how much of the technology in it is dedicated to making the chip use less energy). It's the smart thing to do right now for most people. Anyway, sorry, I wrote a lot for what is basically a rant about how great SSDs are!
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Yep, but most of those threads/processes won't do any heavy lifting at all; separating every tab into its own process isn't done for performance reasons (actually it harms performance and increases the footprint of the app.. still worth the advantages though).
It does not harm performance; it radically improves it. Those advantages, like quickly switching from one window to another, and one window's heavy load not affecting another, are important performance metrics which have been ignored, because single CPU cores kept getting so much faster that the problem took care of itself by throwing a little money at it. Users care when they have to wait on the PC. If an app is slower at some throughput-limited task, but faster at responding to the user, then it is really faster. Every JS benchmark you can name is just a bunch of BS compared to being able to go to another page, click a link, and have it begin loading almost instantly, while your JS benchmark is running in another viewport somewhere.

The holy grail for desktop applications is not keeping all your cores at 100%. The holy grail is for the user to never have to wait on your work, to get their GUI to respond to them.

Software which performs many tasks in many windows has historically had to deal with events from one affecting the others, and it being an ordeal to fix. More processes, more threads, with more distinct data, can help make it easier to prevent such events, which reduce both throughput and responsiveness. Don't think about the little loops. Think about the big modal dialog boxes. Think about file locks. Think about exception handlers that come back to bite you. Et cetera. Then, once all that is taken care of, think about the little loops, and when doing that, try to find the ones that will give you the most performance return for your time, not merely the ones that take the most time.

Spreading out our applications to many threads is not the only way to do it, but it is a sensible way to do it, given how mainstream computer software and hardware have developed. With cheap RAM, these days, multiple processes are perfectly good, too.
But for desktops, more than 4 threads seems kinda useless atm if you're not into rendering or F@H. And while there are interesting solutions for writing multithreaded programs out there, they're all rather new or clunky/complicated; and not to forget, multithreaded algorithms are usually by design WAY more complicated than the standard ones (let an average undergrad implement a tree contraction..). So let's hope that more multithreaded libraries show up, along with better multithreading support in the important standard libraries; but all in all, it makes SW engineering even more complex than it already is.
Using many threads at a very fine level isn't usually needed on the desktop, except for games, which face problems similar to servers' (an expectation of certain increases in performance and data-set size over time, yet without the serial resources to do it). IoW, I think 4 being enough is a bit myopic, but the idea that we'll be running 50 pure CPU threads at a time on our desktops is fantasy, barring unpredictable UI paradigm changes (direct thought input to the computer, FI).

"At the moment" is key. In just a few years, we've seen quite a few applications start to use small numbers of cores well, and it's only increasing. There's still plenty of room to grow, even in the short term. While there is a point of diminishing returns (more important than absolute limits, on the desktop), for most software we're not close to it. There's still room to exploit thread resources before the multithreaded libraries for your work are needed (though I agree that will be a future necessity for mainstream software). Map your data dependencies on a large scale, then work your way down, and see if you can't find things that can be done in worker threads, for any task that might take more than maybe 200ms, or any situation which could lead to needlessly blocking non-dependent execution.
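A minimal sketch of that "push anything slow onto a worker thread" idea (Python used for illustration; long_task is a hypothetical stand-in for any job over ~200ms):

```python
import queue
import threading

def long_task(n):
    # stand-in for any computation slow enough to block a GUI
    return sum(i * i for i in range(n))

results = queue.Queue()

def worker(n):
    results.put(long_task(n))

# Hand the slow work to a background thread; the main thread
# (standing in for the GUI event loop) stays free to respond.
t = threading.Thread(target=worker, args=(100_000,))
t.start()
# ...main thread would keep handling events here...
t.join()
print(results.get())
```

The queue is what keeps the hand-off safe: the worker never touches the "GUI's" state directly, it just posts its result.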

PS: But I'm not too sure about great IPC improvements in the next few years - I mean, till 2005 we got most of our performance improvements from more GHz. SB allegedly brings ~15-20% IPC improvements, which is nice, but the question is whether Intel/AMD can continue to do that every two years from now on.
There is an obvious human efficiency wall, there, eventually. I don't trust anyone's prediction on how or when we'll hit it, nor how we'll deal with it (especially w/ Arthur C. Clarke being dead :)).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
When I went from a Core 2 Duo E6600 + X25-M to a Core i5 661 + X25-M, the response time noticeably improved. Even with the Core 2 there were cases where I wished it would respond faster, and I'm the type that closes tabs and applications that aren't my main focus.

While there are some applications (like loading games) whose response times can't be improved by the CPU, overall the improvement of going to the new CPU was no less than what I felt when I went from the 36GB Raptor to the X25-M.

(I was an early adopter of the X25-M. I bought it a month or two after it came out for $800. It might have been due to the big anticipation I had, but I felt a greater improvement when I first bought the Raptor drive. It was slightly disappointing.)

One bottleneck gets fixed, and another opens up. Rather than being forced to choose between a slow CPU + SSD and a fast CPU + platter hard drives, I'd do a lot to get a fast CPU + SSD. :)
 
Last edited:

Voo

Golden Member
Feb 27, 2009
1,684
0
76
It does not harm performance; it radically improves it. Those advantages, like quickly switching from one window to another, and one window's heavy load not affecting another, are important performance metrics which have been ignored, because single CPU cores kept getting so much faster that the problem took care of itself by throwing a little money at it. Users care when they have to wait on the PC.
Umn, the point was that using processes for that isn't done for performance reasons, because, well, performance isn't everything. A context switch between processes is much more work than between threads; the only reason to use processes instead of threads is the enhanced security. But I assume you thought I meant using one thread for all the tabs - nope, that'd be idiotic ;)
But: you click on that link and the page gets rendered - since that's (at least now) a single-threaded task, you'll get the lowest latency from the highest single-threaded performance; so you may have 30 threads, but only one of them is doing real work.

The holy grail for desktop applications is not keeping all your cores at 100%. The holy grail is for the user to never have to wait on your work, to get their GUI to respond to them.
That's true, but if you can't do the work in the background and occupy the user somehow - presumably because he wants the task he told the program to do to be finished ;) - you'll want it done as quickly as possible and let the OS scheduler make sure everything stays responsive.

Making the GUI multithreaded is an approach that has been abandoned by most frameworks I know of so far, because it just causes so many more problems than the performance increase is worth (e.g. Java and the event-dispatching thread). But who knows, maybe that'll change, though I'm not sure there's a lot of performance to gain there.

Background worker threads, on the other hand, are much easier, but don't really tackle the problem.. if you've got lots of independent small tasks, that's great: put them in a queue or something and let the background workers do their work. If you can do that, that's the best approach you can take. But at least for the stuff I've worked on, there's usually some heavy-lifting task involved that can't be split up easily, and that's where it gets interesting and complicated.. and if we're especially unlucky, there's not even an algorithm out there for the problem, or it has prohibitive constant factors (ah, sparse graphs, how I love thee :p ).
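That queue-plus-background-workers pattern, sketched minimally in Python (the squaring is a hypothetical stand-in for a small independent task):

```python
import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        item = tasks.get()
        if item is None:        # sentinel: tells this worker to quit
            break
        out = item * item       # stand-in for a small independent task
        with lock:
            results.append(out)

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
for i in range(10):             # enqueue the work...
    tasks.put(i)
for _ in workers:               # ...then one sentinel per worker
    tasks.put(None)
for w in workers:
    w.join()
print(sorted(results))
```

This is exactly the easy case described above: it only works this cleanly because every task is independent of the others.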


But I agree that 4 threads aren't the end of the game; we're talking about 1H 2011 CPUs here, though, so while we'll have much better multithreaded apps by 2013 or whatever, by then there will be at least one new generation of CPUs out there.
I bought an E8400 last time because I assumed that for my usage scenario I would be better off with 2 higher-clocked cores than 4, and that worked out great for me; if I had to buy today, I'd get something with 4 cores, so we're improving ;)
 

OBLAMA2009

Diamond Member
Apr 17, 2008
6,574
3
0
imma say it right now, imma say it right now: BDs out a year from now will underperform current i7s clock for clock
 
Last edited:

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
imma say it right now, imma say it right now: BDs out a year from now will underperform current i7s clock for clock

If it is prediction time, then I predict it will clock 20% higher than current i7's when it is released, so the IPC deficit is meaningless.
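The back-of-the-envelope arithmetic behind that prediction (toy numbers, not real BD/SB figures): per-thread performance scales roughly as IPC × clock, so a ~20% clock advantage can offset a ~15% IPC deficit:

```python
# Per-thread performance ~ IPC x clock (toy, illustrative numbers only)
sb_perf = 1.00 * 1.00    # baseline: relative IPC x relative clock
bd_perf = 0.85 * 1.20    # hypothetical: ~15% lower IPC, 20% higher clock
print(round(bd_perf / sb_perf, 2))
```

With these made-up figures the two land within a couple of percent of each other, which is the sense in which a clock advantage could make an IPC deficit "meaningless".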
 

OBLAMA2009

Diamond Member
Apr 17, 2008
6,574
3
0
i will go so far as to say that BDs out a year from now will underperform the average i7 out today, regardless of the clockspeed at release time, and that when released BD will have errors/defects that will cause a delay in shipping or even a recall
 

busydude

Diamond Member
Feb 5, 2010
8,793
5
76
i will go so far as to say that BDs out a year from now will underperform the average i7 out today, regardless of the clockspeed at release time, and that when released BD will have errors/defects that will cause a delay in shipping or even a recall

Can you please cut down on your thread crapping? If you are so sure.. can you corroborate your statements and enlighten me and the forum members? I am really curious to know.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
I'm going to guess at Bulldozer's theoretical clock. If it's not thermally limited, it should be able to reach a 10% higher non-Turbo clock than Sandy Bridge.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Umn, the point was that using processes for that isn't done for performance reasons, because, well, performance isn't everything. A context switch between processes is much more work than between threads; the only reason to use processes instead of threads is the enhanced security. But I assume you thought I meant using one thread for all the tabs - nope, that'd be idiotic ;)
But: you click on that link and the page gets rendered - since that's (at least now) a single-threaded task, you'll get the lowest latency from the highest single-threaded performance; so you may have 30 threads, but only one of them is doing real work.
For browsers in particular, yes, there's more going on. So, OK, I cede that it was not the best example :). It works well, but the separate processes do complicate matters. OTOH, the results are similar: each viewport is its own unique universe, so communicating between the processes is going to be somewhat rare, even when using a web app that spans more than one at a time.

That's true, but if you can't do the work in the background and occupy the user somehow - presumably because he wants the task he told the program to do to be finished ;) - you'll want it done as quickly as possible and let the OS scheduler make sure everything stays responsive.
Yet this can often be difficult to handle with a single thread, and lends itself to more, if your program needs to be usable while also doing work. Not because it should be, but because there end up being so many layers of code that you can't become an expert on them all in the time it takes to get the work done. When they click and something happens, users tend not to notice small inefficiencies, and I've yet to find anyone who has enough time to do everything on their todo list. But when waiting on work to be done, even for a second, if the user's input is blocked and/or they get no response, they notice much more than if the GUI responds but they don't get any kind of update on their work.

Making the GUI multithreaded is an approach that has been abandoned by most frameworks I know of so far, because it just causes so many more problems than the performance increase is worth (e.g. Java and the event-dispatching thread). But who knows, maybe that'll change, though I'm not sure there's a lot of performance to gain there.
The actual GUI itself? Nah. Video acceleration has been good enough to prevent the need for that. More making the GUI work on its own, not also doing the real work. Voluntarily act like you're writing for BeOS ;).

Background worker threads, on the other hand, are much easier, but don't really tackle the problem.. if you've got lots of independent small tasks, that's great: put them in a queue or something and let the background workers do their work. If you can do that, that's the best approach you can take. But at least for the stuff I've worked on, there's usually some heavy-lifting task involved that can't be split up easily, and that's where it gets interesting and complicated.. and if we're especially unlucky, there's not even an algorithm out there for the problem, or it has prohibitive constant factors (ah, sparse graphs, how I love thee :p ).
Sometimes there's no need, and sometimes there's just nowhere to go. It's not that everything will use several threads, or even be able to without costing too much to do so, but that there's much room to improve; and it's being held back by old code with hidden side effects waiting to bite you, if you try to wrangle it into using more than that one lone thread.

However, as far as setting up and using workers goes, I wish every imperative language did it like C#. It's very easy, and it makes it easy to try out cases you might not be sure about. There's still some overhead, but not such a need to work directly with pthreads, or to figure out all the quirks of random framework X, which are far more of a problem than the inherent complexities of multithreading - unless you are trying to split up something that is rather insanely serial, and/or each step is too latency-sensitive to be moving data between caches. Do-all frameworks can be a whole other evil themselves...
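I can't show the C# version here, but Python's concurrent.futures has a similarly low-ceremony feel; a sketch (not the C# API being discussed, and crunch is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def crunch(n):
    # placeholder for a case you want to "just try" on a worker
    return sum(range(n))

# One line to hand work to a pool, one line to collect results.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(crunch, n) for n in (10, 100, 1000)]
    totals = [f.result() for f in futures]
print(totals)  # [45, 4950, 499500]
```

The point is the ergonomics: no manual thread lifecycle or locking for this kind of fan-out, which makes it cheap to experiment with what actually benefits from a worker.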

But I agree that 4 threads aren't the end of the game; we're talking about 1H 2011 CPUs here, though, so while we'll have much better multithreaded apps by 2013 or whatever, by then there will be at least one new generation of CPUs out there.
I bought an E8400 last time because I assumed that for my usage scenario I would be better off with 2 higher-clocked cores than 4, and that worked out great for me; if I had to buy today, I'd get something with 4 cores, so we're improving ;)
I'm holding off until I can get 8, depending on how BD manages (if it's good and really cheap, or genuinely really fast, I'd probably get one; if it's good but only just competitive, I'd probably wait a couple of months for Intel to respond and go w/ Intel again). But I think we're at one of those points where the software is playing catch-up - though I see it happening fast enough, and well enough, that I'm not too worried.