CPU technology - less is more

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Well, it was fun while it lasted. The RISC model was the popular choice in the early '90s. It made sense - simplify the CPU and you could make it faster. But AMD and Intel kept competing and adding more features, and the transistor count keeps growing. Now we have these high-performance processors with two cores on them, doubling the transistors.

Why not revisit RISC? Maybe even MISC? About 6 to 8 years ago I was really into MISC technology, pushed by the Forth community. They were building ultra-tiny dies on outdated fabrication processes that could execute four 5-bit opcodes from a single fetch, and the whole thing was stack-based and extremely efficient. Unfortunately stack-based processing, and especially Forth, is not very popular (or well known) outside of embedded/realtime programmers.
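
To make the packing idea concrete, here's a toy sketch in C (my own invented opcode numbering and word layout, not modeled on any real MISC chip): one fetched word carries four 5-bit stack-machine operations, so a single memory fetch feeds four execution steps.

[code]
/* Toy MISC-style interpreter: four 5-bit opcodes packed into one 20-bit word. */
#include <stdint.h>
#include <stdio.h>

enum { OP_NOP = 0, OP_PUSH1 = 1, OP_ADD = 2, OP_DUP = 3 };  /* invented opcodes */

static int stack[64], sp;

static void exec(unsigned op)
{
    switch (op) {
    case OP_PUSH1: stack[sp++] = 1;                 break;  /* push constant 1   */
    case OP_ADD:   sp--; stack[sp-1] += stack[sp];  break;  /* add top two items */
    case OP_DUP:   stack[sp] = stack[sp-1]; sp++;   break;  /* duplicate top     */
    default: /* OP_NOP */                           break;
    }
}

int main(void)
{
    /* one "fetch": four 5-bit slots packed into the low 20 bits of the word */
    uint32_t word = OP_PUSH1 | (OP_DUP << 5) | (OP_ADD << 10) | (OP_NOP << 15);

    for (int slot = 0; slot < 4; slot++)            /* run all four ops per fetch */
        exec((word >> (5 * slot)) & 0x1F);

    printf("top of stack: %d\n", stack[sp-1]);      /* 1 + 1 = 2 */
    return 0;
}
[/code]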

Perhaps MISC is a little extreme, but why not go back to a RISC approach? Cut these dies to about a tenth of their size, get rid of all these extra instructions that only do very specific tasks that a minority of software actually uses (like 3DNow!, SSE2, etc.), and then put about 10 of these dies onto a single CPU along with the memory controller. That would be some serious multitasking, would it not?
 

ghackmann

Member
Sep 11, 2002
39
0
0
Originally posted by: SagaLore
Why not revisit RISC? Maybe even MISC?
Two words: legacy apps.

Although it's worth noting that modern x86 processors are more-or-less RISC chips. The x86 instruction set is still CISC, but the CPU internally converts from x86 to RISC micro-operations.
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: ghackmann
Originally posted by: SagaLore
Why not revisit RISC? Maybe even MISC?
Two words: legacy apps.

Although it's worth noting that modern x86 processors are more-or-less RISC chips. The x86 instruction set is still CISC, but the CPU internally converts from x86 to RISC micro-operations.

Three words: legacy app emulation. ;)

Yeah, I've had a class that explained the whole CISC-within-a-RISC arrangement. I'd rather see the CISC operations offloaded to the OS as emulation support, so future applications could optimize for the slimmer instruction set.
 

BEL6772

Senior member
Oct 26, 2004
225
0
0
Now that Intel and AMD agree that clock speed is not the be-all and end-all of performance, we may start seeing some more creative ways of improving the user experience. It seems that the first step is multi-core products based on existing cores, but I think it's safe to say that both companies are looking beyond this first step and exploring options like multiple small execution units on a chip. Should be fun to watch!
 

icarus4586

Senior member
Jun 10, 2004
219
0
0
Modern CPUs can't really be described as RISC or CISC or whatever. x86, POWER, SPARC, etc. all contain elements of both RISC and CISC. To explain this, here's the transformation code goes through on the way to execution. Start with the user-understood language: C/C++, Perl, whatever. This is then translated by a compiler to assembly, which represents binary ISA operations. In most modern microprocessors, these operations are decoded, then broken into smaller pieces (as in NetBurst's micro-ops). This involves a hardware decoder and a microcode engine. The presence of the microcode engine means that all of today's CPUs are essentially RISC back-ends coupled with a decoder/microcode front-end. This is done mainly for legacy code support and is why a Pentium 4 can run i386 code. And really, all this decoding and pseudo-emulating doesn't cost much performance-wise - theoretically, maybe a couple percent. The transistor cost, and consequently the die-size cost, is greater, though. But such is the price of compatibility.
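
For illustration, here's a toy C sketch of that "cracking" step (the uop structure and opcode names are invented; real decoders are hardware, not C): a single CISC-style add-to-memory instruction becomes three RISC-like micro-ops, so the back end only ever sees simple load/compute/store operations.

[code]
/* Toy illustration of a decoder cracking one CISC-style instruction into micro-ops. */
#include <stdio.h>

typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;
typedef struct { uop_kind kind; int dst, src, addr; } uop;

/* "ADD [addr], reg" -- one CISC-style instruction, cracked into three micro-ops */
static int crack_add_mem_reg(int addr, int reg, uop *out)
{
    out[0] = (uop){ UOP_LOAD,  100, -1,  addr };  /* tmp(100)  <- mem[addr] */
    out[1] = (uop){ UOP_ADD,   100, reg, -1   };  /* tmp       <- tmp + reg */
    out[2] = (uop){ UOP_STORE, -1,  100, addr };  /* mem[addr] <- tmp       */
    return 3;
}

int main(void)
{
    uop uops[3];
    int n = crack_add_mem_reg(0x1000, 3, uops);
    printf("cracked into %d micro-ops, first is a load from 0x%x\n", n, uops[0].addr);
    return 0;
}
[/code]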
 

Calin

Diamond Member
Apr 9, 2001
3,112
0
0
Originally posted by: SagaLore
Originally posted by: ghackmann
Originally posted by: SagaLore
Why not revisit RISC? Maybe even MISC?
Two words: legacy apps.

Although it's worth noting that modern x86 processors are more-or-less RISC chips. The x86 instruction set is still CISC, but the CPU internally converts from x86 to RISC micro-operations.

Three words: legacy app emulation. ;)

Yea I've had a class that explained the CISC within a RISC conglomeration. I'd rather see the CISC operations offloaded to the OS as emulation support so future applications could optimize for the slimmer instruction set.

One word: performance

Emulation would reduce performance too much for anyone to like it.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Modern MPU design has reached the point where the ISA matters little (one scalar ISA is just about as good as another); they're mainly a programming convenience. Decoding is no longer as much of a burden now as it was back when the RISC craze hit. Not to say that x86 is the perfect ISA, but its drawbacks have pretty much been eliminated with micro-architectural finesse.

That being said, the move does seem to be towards larger and more complicated cores. The move to multi-core is an attempt at going backwards: using smaller, simpler cores but having them run in parallel. While this may not be as efficient in terms of processing power per transistor (and, in many cases, heat), it does solve one of the biggest problems in MPU design today: the growing complexity and verification effort required.

It should be noted that improving processor performance while maintaining simplicity by changing the ISA has been tried and, arguably, wasn't that successful. We see this in IA-64 and its VLIW-like ISA. The core of an Itanium chip has more processing power than most other chips out there, and it's still very small. The drawback is that it relies heavily on the compiler and on its memory subsystem.

The whole RISC vs CISC thing is old and moot. The ISA (as far as scalar ISAs are concerned) matters very little nowadays. The majority of the processor's die is spent on complicated features like very long pipelines (which mean a very large window of instructions, a re-order buffer, branch prediction logic, etc.) or really wide cores (a lot of execution units in parallel, with ports and register-access routes that go from each register port to each execution port, taking up a lot of space). All that complexity decreases the performance/watt and performance/transistor ratios significantly, as having twice the execution resources will not get you twice the performance.
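
To put a rough number on that last point (back-of-the-envelope only, assuming a full forwarding/bypass network that connects every execution port to every other one):

$$\text{bypass paths} \sim N_{\text{ports}}^{2}, \qquad \frac{(2N)^{2}}{N^{2}} = 4$$

i.e. doubling the execution resources roughly quadruples the bypass wiring, while the extra units only help when there's enough instruction-level parallelism to feed them.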

The same could be said of multi-core but at least multi-core gets rid of the design complexity and verification time.
 

Calin

Diamond Member
Apr 9, 2001
3,112
0
0
Also, multi-core processors will only be viable as a normal desktop replacement when most widely used programs are capable of working much faster on two processors than on a single one. As long as two 1.5GHz processing units work no faster than a single one for the usual needs, multicore is doomed.
Sun's Niagara and Rock projects will have up to 64 simpler cores in a single package, and the focus is on systems where 64 cores will (hopefully) work 64 times faster than a single one of those cores.
The need for multi-core in workstations is not yet big enough, even if some of the programs used there work much faster on multiprocessors.

Anyway, the point now is that most programs out there are NOT ready for multithreading, so multiple processors/cores won't help.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Anyway, the point now is that most programs out there are NOT ready for multithreading, so multiple processors/cores won't help.

Unless, of course, you might want to do something crazy like running multiple programs at the same time. But, I mean, nobody *ever* does that. :p

The only places multi-CPU or multicore setups would be totally useless are games (where you're running one singlethreaded app that needs all the CPU power it can get) and workstations running singlethreaded apps (again, where you're usually doing one thing at a time and want all the CPU power you can get for it). For most 'normal' applications, today's CPUs are way, way overkill anyway, so I would think most people would see a better computing experience with a dual-core chip that was, say, 2/3 the speed of a single modern processor.
 

Sunner

Elite Member
Oct 9, 1999
11,641
0
76
Originally posted by: Matthias99
Anyway, the point now is that most programs out there are NOT ready for multithreading, so multiple processors/cores won't help.

Unless, of course, you might want to do something crazy like running multiple programs at the same time. But, I mean, nobody *ever* does that. :p

The only places multi-CPU or multicore setups would be totally useless are games (where you're running one singlethreaded app that needs all the CPU power it can get) and workstations running singlethreaded apps (again, where you're usually doing one thing at a time and want all the CPU power you can get for it). For most 'normal' applications, today's CPUs are way, way overkill anyway, so I would think most people would see a better computing experience with a dual-core chip that was, say, 2/3 the speed of a single modern processor.

But then again, for most users, gaming is the only place where performance is truly needed.
Workstations and servers aside, what do people do?

Office apps, watch a movie, surf the net, and so forth.
It isn't until the kids (or the techies "working" late) break out the games that all the performance of that brand new P4/A64 is truly needed; the rest of the day it mostly goes to waste.

I have a hard time seeing where my parents would ever need a dual core chip.
I could use one at work, but at home I don't need it either: I do all my work at work, and at home I mostly play games, where what I need is fast single-threaded performance.
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Originally posted by: Sunner
Originally posted by: Matthias99
Anyway, the point now is that most programs out there are NOT ready for multithreading, so multiple processors/cores won't help.

Unless, of course, you might want to do something crazy like running multiple programs at the same time. But, I mean, nobody *ever* does that.

But then again, for most users, gaming is the only place where performance is truly needed.
Workstations and servers aside, what do people do?

You guys are looking at this too big. Open up Task Manager. CPUs don't process by application, they process by individual processes. Typically a game is only a single process, but you still have all that extra junk the OS is running: your personal firewall, your antivirus, your antispyware, your web browser, sometimes printer drivers and sound drivers, etc.

When you're gaming, isn't it really the video card that matters most? Any other tasks the CPU(s) have to handle would probably be really simple anyway, so multiple slimmer cores would improve the gaming experience.
 

clarkey01

Diamond Member
Feb 4, 2004
3,419
1
0

x86 does have its problems, but a lot of PPC and IA-64 supporters honestly couldn't tell you what those problems are, only that they exist. The PPC supporters are especially guilty of this; not that the latest PowerPC processors aren't impressive (they're really quite nice), just that Apple fanboyism often leaks over into architecture arguments.

If you ask me, we're past most of what made x86, well, suck. Modern x86 CPUs are really just RISC in sheep's clothing (hey, who said I couldn't randomly mix analogies?) for the most part, and the only reason x86 is said to drag us down is that CPUs basically have to translate x86 instructions into an easier-to-digest form (basically/semi-incorrectly: CISC to RISC, stupid to smart). Do we lose some speed doing this? Sure! Do we lose enough that we need to go through making an entirely new architecture for PCs? Well.. maybe not.

The trouble isn't so much designing the new architecture, really. The engineers doing this probably find it fun. The trouble is that you have to tell the market "hey, we're going to break compatibility with everything out right now, but look at it this way-- if you buy all this expensive new hardware then run the expensive software coded for this new architecture, you'll get a moderate speed boost over the old hardware!" Who out there is going to say "ooh! Me first!"?

There are more... delicate ways to handle this situation, of course. Ace's Hardware went over it in that Kill x86 article of theirs. It may not be the easiest thing in the world, but it would be relatively painless for the market. The point is that these light-handed ways of introducing a new instruction set/architecture (light-handed on the "market treatment" side; never mind the poor engineers who're told they have to make a CPU that's effectively two architectures in one) were not the ways Intel chose.
 

Sunner

Elite Member
Oct 9, 1999
11,641
0
76
Originally posted by: SagaLore
Originally posted by: Sunner
Originally posted by: Matthias99
Anyway, the point now is that most programs out there are NOT ready for multithreading, so multiple processors/cores won't help.

Unless, of course, you might want to do something crazy like running multiple programs at the same time. But, I mean, nobody *ever* does that.

But then again, for most users, gaming is the only place where performance is truly needed.
Workstations and servers aside, what do people do?

You guys are looking at this too big. Open up Task Manager. CPUs don't process by application, they process by individual processes. Typically a game is only a single process, but you still have all that extra junk the OS is running: your personal firewall, your antivirus, your antispyware, your web browser, sometimes printer drivers and sound drivers, etc.

When you're gaming, isn't it really the video card that matters most? Any other tasks the CPU(s) have to handle would probably be really simple anyway, so multiple slimmer cores would improve the gaming experience.

Yeah, but all those background tasks won't amount to more than 1-2% CPU usage, if that, on any modern box.
I remember Carmack stating that Quake3 spent >75% of its CPU time talking to the graphics driver, which is why no phenomenal speedups were possible with it.
 

nyarrgh

Member
Jan 6, 2001
112
0
71
Even games have the POTENTIAL to benefit from multiple cores. One of the reasons Creative still sells sound cards is that it takes less CPU power to run a Creative card than some others. Half-Life 2 has its own sound routines that would benefit greatly if they could be offloaded to another CPU. The new 7.1 HD audio integrated on motherboards (mostly Intel) would benefit too. Hopefully NPC AI can someday be multithreaded as well (another NPC came into the room? Spawn another thread!)
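
As a rough sketch of the sound-offloading idea (mix_one_block() and run_game_frame() are placeholders, not anything from a real engine), something like this puts the audio work on its own thread so a second core is free to pick it up:

[code]
/* Sketch only: a game loop on the main thread, sound mixing on a worker thread. */
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_int running = 1;

static void mix_one_block(void)  { usleep(1000); }   /* stand-in for real audio mixing */
static void run_game_frame(void) { usleep(16000); }  /* stand-in for one ~60 fps frame */

static void *sound_worker(void *arg)
{
    (void)arg;
    while (atomic_load(&running))
        mix_one_block();            /* free to run on the second core */
    return NULL;
}

int main(void)
{
    pthread_t snd;
    pthread_create(&snd, NULL, sound_worker, NULL);

    for (int frame = 0; frame < 300; frame++)
        run_game_frame();           /* main thread keeps doing rendering/AI/input */

    atomic_store(&running, 0);      /* tell the mixer to stop, then wait for it */
    pthread_join(snd, NULL);
    return 0;
}
[/code]

(Built with -pthread; the OS decides which core the worker actually lands on.)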
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Yeah, but all those background tasks won't amount to more than 1-2% CPU usage, if that, on any modern box.

However, having multiple cores/CPUs saves a lot of the 'extra' CPU overhead from context swaps to background tasks, etc. It generally also improves cache hit rates, since you're not switching between processes as often per core (I can't remember if x86 CPUs flush the cache on memory space swaps -- I don't think they do -- but you certainly start to fill up the instruction and data caches with data from the new process). So that '1-2%' of background usage actually impacts your foreground tasks more than the numbers would directly indicate on a single-CPU machine. Not to mention if you're running something CPU-intensive (say, a virus scan, or encoding an audio/video file) in the background. It sounds minor, but that's a big part of what makes multiprocessor machines run 'smoother' from the user's point of view while multitasking.
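
Just to illustrate with some made-up but plausible numbers (512KB of L2, 64-byte lines, ~200 cycles per miss; none of these come from a real spec sheet): worst case, refilling a cache that a background task has polluted costs on the order of

$$\frac{512\,\text{KB}}{64\,\text{B/line}} \times 200\ \text{cycles/miss} \approx 1.6\times 10^{6}\ \text{cycles} \approx 0.8\ \text{ms at 2 GHz}$$

and that time shows up as slowdown in the foreground process, not as CPU usage charged to the '1-2%' background task.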

I remember Carmack stating that Quake3 spent >75% of its CPU time talking to the graphics driver, which is why no phenomenal speedups were possible with it.

That does sort of tend to become the limiting factor with games that do very simple physics and very rudimentary AI. :p
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Originally posted by: Matthias99
Yeah, but all those background tasks won't amount to more than 1-2% CPU usage, if that, on any modern box.

However, having multiple cores/CPUs saves a lot of the 'extra' CPU overhead from context swaps to background tasks, etc. It generally also improves cache hit rates, since you're not switching between processes as often per core (I can't remember if x86 CPUs flush the cache on memory space swaps -- I don't think they do -- but you certainly start to fill up the instruction and data caches with data from the new process). So that '1-2%' of background usage actually impacts your foreground tasks more than the numbers would directly indicate on a single-CPU machine. Not to mention if you're running something CPU-intensive (say, a virus scan, or encoding an audio/video file) in the background. It sounds minor, but that's a big part of what makes multiprocessor machines run 'smoother' from the user's point of view while multitasking.

Again, this is trivial when dealing with something that takes 1-2% CPU time. The OS is smart enough not to switch to a background process for 1 or 2 instructions and then switch back. More than likely the time-slice given to that background process lasts its entire lifetime and occurs only rarely (hence the 1-2% usage). That would mean context switches and cache flushes don't occur often either. The difference would therefore be unnoticeable.

I remember Carmack stating that Quake3 spent >75% of its CPU time talking to the graphics driver, which is why no phenomenal speedups were possible with it.

That does sort of tend to become the limiting factor with games that do very simple physics and very rudimentary AI. :p

Modern games are using more and more complex AI and physics engines.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Again, this is trivial when dealing with something that takes 1-2% CPU time. The OS is smart enough not to switch to a background process for 1 or 2 instructions and then switch back.

So... the OS knows in advance exactly how long every process is going to run for? I didn't know MS had psychic scheduling algorithms. What if the process runs 5 instructions, checks some status register in memory, and calls yield() (as, say, a polling service of some sort might do)?
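
Something like this minimal sketch (the flag is just a stand-in for whatever the service is actually watching) burns almost no CPU but still asks for the CPU over and over:

[code]
/* Sketch of a polling service: a few instructions of work, then yield. */
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int status_flag;      /* stand-in for "some status register in memory" */

int main(void)
{
    int polls = 0;
    while (!atomic_load(&status_flag) && polls < 1000) {
        polls++;                    /* the handful of instructions of real "work"... */
        sched_yield();              /* ...then hand the rest of the timeslice back */
    }
    printf("polled %d times\n", polls);
    return 0;
}
[/code]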

Active processes have to be given a timeslice every now and then, so if you have 10-20 background processes running, you're going to get at least 10-20 such switches a second, even if each one only lasts, say, half a millisecond. Maybe more than that, depending on how often it gives each process a shot at the CPU.

More than likely the time-slice given to that background process lasts its entire lifetime and occurs only rarely (hence the 1-2% usage). That would mean context switches and cache flushes don't occur often either. The difference would therefore be unnoticeable.

The preemptive scheduler, at least in Windows, operates on 10ms timeslices (1/100th of a second). So a process that does not explicitly yield back to the scheduler will use at least 10ms of CPU time before it switches back to the foreground process. I would think most background processes and drivers are well-written enough to do this (and so might only spend a couple ms running at most), but I wouldn't necessarily count on a regular application doing so. Try running a couple low-priority instances of Prime95 in the background and see if it is 'unnoticeable'.

You really can't tell if "1%" usage is a single 10ms timeslice, or 10 1ms timeslices, or 100 .1ms timeslices being given to a process in a second. Well, you probably can, but not through the simple displays in Task Manager.

Modern games are using more and more complex AI and physics engines.

Uh... that was my point? :confused: The earlier poster stated that Q3 was 75% graphics-driver limited, but I was trying to imply that games that use more sophisticated AI, physics, sound processing, etc. will not be like that.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
Originally posted by: Matthias99

So... the OS knows in advance exactly how long every process is going to run for? I didn't know MS had psychic scheduling algorithms. What if the process runs 5 instructions, checks some status register in memory, and calls yield() (as, say, a polling service of some sort might do)?

No, any good OS would not constantly switch processes. It's smart enough to give reasonable and variable time-slices to programs depending on priority and often on other data (such as how long previous runs of the process took). There are many different process-scheduling schemes, such as round-robin or first-come, first-served. I would expect most modern OSes to also have different priority levels for system processes vs. user-level processes (with user-level processes usually taking higher priority, since the user will notice a slowdown).
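
As a toy illustration of round-robin (nothing to do with how Windows actually implements its scheduler), each runnable task gets a fixed quantum and then goes to the back of the line:

[code]
/* Toy round-robin scheduling loop; task "work" is just a counter here. */
#include <stdio.h>

#define NTASKS  3
#define QUANTUM 10                              /* pretend ticks per timeslice */

int main(void)
{
    int remaining[NTASKS] = { 25, 40, 15 };     /* pretend work left per task  */
    int done = 0;

    while (done < NTASKS) {
        for (int t = 0; t < NTASKS; t++) {      /* visit runnable tasks in order */
            if (remaining[t] <= 0)
                continue;
            int run = remaining[t] < QUANTUM ? remaining[t] : QUANTUM;
            remaining[t] -= run;                /* "run" this task for one quantum */
            printf("task %d ran %d ticks, %d left\n", t, run, remaining[t]);
            if (remaining[t] == 0)
                done++;
        }
    }
    return 0;
}
[/code]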

Active processes have to be given a timeslice every now and then, so if you have 10-20 background processes running, you're going to get at least 10-20 such switches a second, even if each one only lasts, say, half a millisecond. Maybe more than that, depending on how often it gives each process a shot at the CPU.

Firstly, not all processes are active all the time; in fact, most aren't. Secondly, most processes that are active hardly need to be switched to every second. Lastly, on a processor that does 2 billion cycles a second, and assuming a context switch takes 50 cycles, 20 switches in a second comes to 1,000 out of 2 *billion* cycles devoted to context switching, or a 0.00005% decrease in performance.

The preemptive scheduler, at least in Windows, operates on 10ms timeslices (1/100th of a second). So a process that does not explicitly yield back to the scheduler will use at least 10ms of CPU time before it switches back to the foreground process. I would think most background processes and drivers are well-written enough to do this (and so might only spend a couple ms running at most), but I wouldn't necessarily count on a regular application doing so. Try running a couple low-priority instances of Prime95 in the background and see if it is 'unnoticeable'.

Luckily, most background processes are not processor-intensive like Prime95, and their lifecycles are very short. If you're arguing that dual-core processors will let you run an instance of Prime95 crunching in the background without too noticeable a decrease in foreground performance, I'm not going to argue with you there. But who exactly *does* that?

You really can't tell if "1%" usage is a single 10ms timeslice, or 10 1ms timeslices, or 100 .1ms timeslices being given to a process in a second. Well, you probably can, but not through the simple displays in Task Manager.

No, but no program would actually do such a thing. Well, let me rephrase that: no *good* program would do that. Unless you have a *very* malformed program running in the background, constantly sleeping and waking, it's very unlikely that context switches will eat up anywhere close to the 2 billion CPU cycles per second you have.
Even assuming you'd have to go to memory to switch to a process (which, if it's switching *that* much in a single second, everything would probably be in cache anyway), taking ~200 cycles to fetch the first instruction, let's give it an upper bound of 300 cycles per context switch. To make a dent in processing time (~10%) on a 2 GHz processor, context switching would have to eat 200 million cycles every second. That means close to a *million* context switches per second to produce a noticeable 10% drop in performance. Just think about how feasible that number is.
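
Writing that arithmetic out, with the same assumed 300-cycle cost per switch:

$$\frac{0.10 \times 2\times 10^{9}\ \text{cycles/s}}{300\ \text{cycles/switch}} \approx 6.7\times 10^{5}\ \text{switches/s}$$

so you'd need on the order of a million context switches every second before the switching overhead alone cost you 10%.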