Theory Questions - Simplest Possible CPU

campbbri

Junior Member
Nov 20, 2011
8
0
0
This is mostly "food for thought", but I'm hoping to understand processor design a little better and I know some Anandtech members would have insight into this pretend scenario.

Suppose Intel wanted to create an ultra-reduced instruction set CPU that only allowed the following:

1. Create variables in a register/RAM or delete them.
2. Add, Subtract, Multiply, Divide
3. Whatever the minimum is to allow loops and If / Then statements, or anything else needed for basic code to function

Intel builds this on a 22nm process and uses all modern design elements to make the absolute fastest CPU possible without increasing clock speeds (so no 20 GHz chips allowed).

How many transistors would this CPU have? Could it have modern design elements like multiple levels of cache, out of order processing, branch prediction, and so on or does the compiler have to be aware of these features? If I wrote a simple program that told it to count to a trillion would it be faster than modern x86 CPUs?

Basically, when I buy a CPU with a billion transistors I wonder how much of that chip is used for "core" calculations and how much is hardware shortcuts for common functions, video, and so on.

Thanks!
 

Revolution 11

Senior member
Jun 2, 2011
952
79
91
The only real way to find out the answer to your question would be to hire Intel's CPU design team (pick one, doesn't matter) and give them billions of dollars, a 22-nm fab with tools, and several years. :p
 

MrTeal

Diamond Member
Dec 7, 2003
3,917
2,704
136
It would still be much bigger than you think, just from needing to interface to the outside world.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
If you're willing to throw away all the modern interfaces to the outside world, then the CPU you make could be ridiculously small. If you're willing to do divide and multiply in software, then the core would be much, much smaller than that even.

EDIT: I'm assuming a new, non-x86, super-simple ISA for this hypothetical core.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,635
4,562
75
It sounds like you might be asking a RISCy question. ;)

"how much is hardware shortcuts for common functions" depends on your definition of "shortcuts for common functions". As mentioned above, multiplication and division can be done in software using just addition, subtraction, and preferably shift instructions. Floating-point math and 64-bit math can be emulated (slowly) on a CPU with as few as four integer bits per register, maybe even less. Though memory addressing can be tricky on such a simple CPU.

The CPU with the fewest transistors I could find was the Intel 4004 at just 2,300. So, basically, almost all the transistors in a CPU go to methods of speeding it up.
 

campbbri

Junior Member
Nov 20, 2011
8
0
0
Thanks for the replies.

Wow. I'm amazed at how small those old chips were. I assumed even a handheld calculator would have more transistors than the 3,500 in an Intel 8008.

I think I worded my question poorly. We now have thousands of times as many transistors as an Intel 8008 or a 486, and CPUs are obviously many times more powerful. Of course much of the improvement is due to faster clockspeeds, but I don't think this requires any more transistors.

So why have transistor counts gone way up? I can think of three possible reasons:

1. Interface. As MrTeal points out you can use some of your transistor budget to communicate with other components more efficiently.

2. Basic calculation improvements. You can cache frequently used data, try to predict and precalculate portions of the code, etc. The code is unaware of this, so you could improve performance of programs compiled for the 8008, for example.

3. Additional features and extensions. You can add hardware support to graphics, certain algorithms, simultaneous threads, and so on. These improvements only take effect if you modify (or recompile) the code to support these features.

Each time Intel (or AMD) doubles transistor counts, is it all pretty much going to #3? If we can't keep improving #2 through more sophisticated design, then we are essentially just adding more and more hardware "shortcuts". If that's the case, Intel is basically saying "We can't really improve your current, simple code very much anymore. You'll have to make your code less universal by designing it for our specific extensions".

Thanks again. I'm unfortunately both very interested and ignorant about the current state of computing.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
Of course cache has huge performance implications (going from zero cache to "some" cache can do wonders for performance, but you run into diminishing returns as you keep adding more), and cache takes lots of transistors.

Even calculations can be sped up by more transistors. Integer divide (let alone floating-point divide) can be done in software with shifts and adds, and the units that perform those shifts and adds are super duper tiny, BUT it will be super duper slow. You can GREATLY speed up your divide latency by building a dedicated divide unit, but this new divide unit is GIGANTIC compared to your old adder/shifter.
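As a sketch of my own (again plain C, not from any real runtime library) of why the software path is so slow: restoring division needs a compare/subtract and a couple of shifts for every quotient bit, and every step depends on the one before it:

Code:
#include <stdint.h>

/* Restoring division using only shifts, compares and subtracts.
   32 iterations for 32-bit operands; a dedicated divide unit does the
   same job in a few cycles, but costs far more transistors.
   (divisor must be non-zero, just like the hardware instruction.) */
uint32_t soft_div(uint32_t dividend, uint32_t divisor)
{
    uint32_t quotient = 0, remainder = 0;
    for (int i = 31; i >= 0; i--) {
        remainder = (remainder << 1) | ((dividend >> i) & 1); /* bring down next bit */
        if (remainder >= divisor) {       /* does the divisor fit? */
            remainder -= divisor;
            quotient |= 1u << i;          /* then this quotient bit is 1 */
        }
    }
    return quotient;
}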

Next, adding more transistors *can* increase clockspeeds. In order to get really high clockspeeds you need to do pipelining. Longer and longer pipelines (where each pipeline stage performs a smaller and smaller amount of work) mean higher and higher clockspeeds (and more and more performance, if you can magically avoid branch mispredicts). Adding more pipeline stages requires spending more transistors to save and move state between the pipeline stages.

Finally, branch prediction requires lots of transistors. If you have a pipelined architecture (and you should), then branches become a problem, which can be mostly-ish fixed by branch prediction. More transistors let you track more historical branch targets/results, which lets you predict more accurately.
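For a concrete picture of what those transistors store, here is a toy sketch of mine of the classic 2-bit saturating-counter predictor; real predictors add global history, multiple tables, and a branch target buffer on top of this, which is where the budget goes:

Code:
#include <stdint.h>
#include <stdbool.h>

/* Textbook 2-bit saturating-counter predictor. Each branch, indexed by
   its low PC bits, gets a counter 0..3; values >= 2 predict "taken".
   More entries and more history bits = better accuracy = more transistors. */
#define PRED_ENTRIES 4096
static uint8_t counter[PRED_ENTRIES];    /* all start at 0: strongly not-taken */

bool predict_taken(uint32_t pc)
{
    return counter[pc % PRED_ENTRIES] >= 2;
}

void train(uint32_t pc, bool taken)      /* called once the branch resolves */
{
    uint8_t *c = &counter[pc % PRED_ENTRIES];
    if (taken  && *c < 3) (*c)++;        /* saturate at strongly taken */
    if (!taken && *c > 0) (*c)--;        /* saturate at strongly not-taken */
}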
 

Pilum

Member
Aug 27, 2012
182
3
81
This is mostly "food for thought", but I'm hoping to understand processor design a little better and I know some Anandtech members would have insight into this pretend scenario.

Suppose Intel wanted to create an ultra-reduced instruction set CPU that only allowed the following:

1. Create variables in a register/RAM or delete them.
2. Add, Subtract, Multiply, Divide
3. Whatever the minimum is to allow loops and If / Then statements, or anything else needed for basic code to function

Intel builds this on a 22nm process and uses all modern design elements to make the absolute fastest CPU possible without increasing clock speeds (so no 20 GHz chips allowed).

How many transistors would this CPU have? Could it have modern design elements like multiple levels of cache, out of order processing, branch prediction, and so on or does the compiler have to be aware of these features? If I wrote a simple program that told it to count to a trillion would it be faster than modern x86 CPUs?
If it's "minimal", it would be several thousands to millions of times slower for any practical task. That would be like a 4-bit CPU with 4-bit address bus (i.e. total internal storage of 8 bytes) and 6 instructions or so (ADD, AND, JMP, INC, IN, OUT). Should fit into 1.500 transistors, probably less. All meangingful calculation would have to hit HD storage of course... 4 bits at a time. Of course, all instructions would take several cycles to execute. Counting to a trillion... oh, IDK. I can't do assembly on such abominations... 100 instructions per addition for 40-bit numbers? 200? With 3 cycles for each instruction? So, only a few hundred times slower than a modern x86 CPU – but just because you can't execute counting up one number to a trillion. If you count up four numbers, the upcoming Haswell could do that in parallel each cycle, raising the performance difference to 2.000 times or so. And of course, for bigger numbers the minimal CPU would get slower; a modern x86 CPU doesn't care if it adds 20 bits, 40 bits or 64 bits; the minimal CPU would take an additional 60% hit for adding 64-bit numbers compared to 40-bit.

Now, do floating-point calculations in 64-bit. Haswell will do 8 64-bit FP additions and multiplies each cycle; the minimal CPU would probably take… a few thousand cycles per operation, for each ADD and MUL – it has to hit external storage for the additions and muls… oh, and for the code, too (in the above example as well, actually). Say 2,000 cycles for ADD and 8,000 for MUL, 10,000 for both. Well, maybe slower. It really depends on the mass storage interface. The CPU will spend most of its time reading/writing values and managing the mass storage – to get a feeling for the effect, try booting Windows 7 with 256MiB RAM.

So 10,000 cycles for a single ADD+MUL operation, times 8 for parallel execution on Haswell, so a factor of 80,000. Of course Haswell can do some other work on the other vector and integer pipelines as well. So, a factor of 100,000 in speed difference at the same clock for FP/vector work, I guess, maybe more. Definitely no less than 10,000.

Of course, you can add any of the modern features you mentioned – but these all eat up transistors like mad. Which directly leads you to our modern CPUs and their transistor counts, because basically Transistor Count = Performance Per Clock.

And you can't really get the same performance we have with fewer transistors; if that were possible, some company would have figured it out. The industry is pretty much at an optimum in minimizing transistor count for the performance we take for granted. Of course you can implement many simpler cores with the same transistor count, but you lose single-threaded performance. That's useful for some special embedded applications, but the most primitive core still in production today is still an order of magnitude more complex than a truly "minimal" CPU.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
2. Basic calculation improvements. You can cache frequently used data, try to predict and precalculate portions of the code, etc. The code is unaware of this, so you could improve performance of programs compiled for the 8008, for example.

3. Additional features and extensions. You can add hardware support to graphics, certain algorithms, simultaneous threads, and so on. These improvements only take effect if you modify (or recompile) the code to support these features.
Not the case for multithreading. Just run the code.

Each time Intel (or AMD) doubles transistor counts, is it all pretty much going to #3? If we can't keep improving #2 through more sophisticated design than we are essentially just adding more and more hardware "shortcuts". If that's the case Intel is basically saying "We can't really improve your current, simple code very much anymore. You'll have to make your code less universal by designing it for our specific extensions".
No. Well, kind of yes, but not really.

The Memory Wall is the fundamental problem. It was theorized that by now we would have processors in excess of 1,000 cycles out to main memory. That didn't happen, though in the early '00s we did get into the hundreds of cycles. There are other things, too (clock speed/power, for instance), but trying to make good use of limited memory is the biggest.

The simpler instructions get, the less dense the code gets. You can encode each instruction in fewer bits, but then you need more of them to do the work of one larger instruction, and your potential gains start going away: you burn back-to-back clock cycles on a chain of dependent operations to compute a value, instead of issuing fewer instructions with fewer dependencies between them. Meanwhile, an ISA that is simple but not densely encoded can get the job done in fewer instructions, yet the binary still ends up larger, which effectively increases the bandwidth needed for code and reduces the effectiveness of any given cache size (quite a few RISCs are this way relative to x86).

So, you have a limited amount of bandwidth, a high latency, and you need to keep the CPU full of work to do. We do not have a solution to that problem which is efficient, and that is why high performance CPUs keep getting bigger. If you make software do all the work, then you start eating up too much bandwidth (also, hardware speculation has historically been better performing, for non-real-time work, even if there's memory IO to spare).
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
So why have transistor counts gone way up? I can think of three possible reasons:

(1) Consider the explosion in instructions supported by the ISA:

[attached chart: growth in the number of x86 ISA instructions over time]


(2) 4 bit -> 8 bit -> 16 bit -> 32 bit -> 64 bit (and even higher for specific instructions, up to 256 bit)

(3) Big difference between pre-Pentium and post-pentium: OOO

(4) On-die cache

(5) On-die integrated functionality: memory controllers, GPUs, thermal management (PCU), I/O functions like PCIe, etc.

Basically the bottom line is that you are looking at xtors being added to accomplish things that are necessary to support the magnitude of IPC (let alone the absolute clockspeed itself) that we harness in today's processors.

A 486 sexed up on 22nm 3D xtors may clock in at 10GHz, but the IPC will be abysmal. Its absolute performance won't be any less than that of a 133 MHz 486, but it won't be much higher either, despite the significantly elevated clockspeed.

So designers spend a lot of xtors on prefetchers, ILP, IMC, cache, etc., all so those 4GHz CPUs don't stall out and do nothing for microseconds at a time.
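If you want to feel that stall for yourself, here's a little pointer-chasing microbenchmark sketch (mine; the array size and the exact numbers are machine-dependent assumptions): the dependent random walk defeats the prefetchers and runs many times slower than the sequential walk over the very same memory:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Pointer-chasing microbenchmark (illustration only). Each load depends on
   the previous one, so every cache miss stalls the core for the full trip
   to main memory. */
#define N (1u << 24)   /* 16M entries, ~128 MiB of pointers; shrink if needed */

static double chase(const size_t *next)
{
    clock_t t0 = clock();
    size_t p = 0;
    for (size_t i = 0; i < N; i++)
        p = next[p];                       /* dependent load chain */
    volatile size_t sink = p; (void)sink;  /* keep the loop from being optimized away */
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;

    for (size_t i = 0; i < N; i++) next[i] = (i + 1) % N;   /* sequential cycle */
    printf("sequential walk: %.2f s\n", chase(next));

    /* Sattolo's shuffle of the identity gives one big random cycle, so the
       chase really does visit N distinct lines in random order, and the
       prefetchers can no longer hide the memory latency. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (((size_t)rand() << 16) ^ (size_t)rand()) % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    printf("random walk:     %.2f s\n", chase(next));

    free(next);
    return 0;
}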

If you wanted a modern ISA (say that of IvyBridge or Bulldozer) implemented in a minimal footprint with absolutely no regard to actual performance or IPC then you could probably get a 10MHz processor with ~1 million xtors (maybe even less than that, bare-bones could probably weigh in at 500k) but it would truly be a worthless processor.
 
Dec 30, 2004
12,553
2
76
This is mostly "food for thought", but I'm hoping to understand processor design a little better and I know some Anandtech members would have insight into this pretend scenario.

Suppose Intel wanted to create an ultra-reduced instruction set CPU that only allowed the following:

1. Create variables in a register/RAM or delete them.
2. Add, Subtract, Multiply, Divide
3. Whatever the minimum is to allow loops and If / Then statements, or anything else needed for basic code to function

Intel builds this on a 22nm process and uses all modern design elements to make the absolute fastest CPU possible without increasing clock speeds (so no 20 GHz chips allowed).

How many transistors would this CPU have? Could it have modern design elements like multiple levels of cache, out of order processing, branch prediction, and so on or does the compiler have to be aware of these features? If I wrote a simple program that told it to count to a trillion would it be faster than modern x86 CPUs?

Basically, when I buy a CPU with a billion transistors I wonder how much of that chip is used for "core" calculations and how much is hardware shortcuts for common functions, video, and so on.

Thanks!

why do that when they could just buy an AMD? :sneaky:
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,635
4,562
75
So why have transistor counts gone way up?
Aha! Time to link to my favorite brief description of CPU architecture, which I like to call "Why the Pentium 4 Sucks".

Note that you can stop reading at the point where he discusses the Pentium 4, because nobody uses that architecture. Not even AMD. :p

Even Haswell is basically like a Pentium III. It has 64-bit support, more execution units, better branch prediction, an on-die video processing chip, and they've probably fiddled with the caches since then; but each core is basically similar in design to a Pentium III.
 

piasabird

Lifer
Feb 6, 2002
17,168
60
91
I had one of the last PIII Celerons with the larger 256K cache, capable of using PC133 RAM, and it ran just as fast as the P4s. I think they quit making the PIIIs too early. Basically the Pentium M was a beefed-up PIII. It wasn't till the Core 2 Duos that they made any real improvements.
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,379
126
Aha! Time to link to my favorite brief description of CPU architecture, which I like to call "Why the Pentium 4 Sucks".

Note that you can stop reading at the point where he discusses the Pentium 4, because nobody uses that architecture. Not even AMD. :p

Even Haswell is basically like a Pentium III. It has 64-bit support, more execution units, better branch prediction, an on-die video processing chip, and they've probably fiddled with the caches since then; but each core is basically similar in design to a Pentium III.

Pentium 4 was okay for most of its life. Intel ran into big trouble with the P3 towards the end, not being able to keep up with Socket A Athlons passing the 1GHz range. Remember the recalled 1.13 Coppermine? Even with the previous Katmai core they pushed too hard and had to ship the Slot 1 600 overvolted to 2.05V at stock. P4/Netburst seemed like a good idea at the time, but it was released too early imho, and might even have been entirely avoided had they had a little more patience.

The P3 Tualatin was pretty solid; IIRC it was somewhat on par with the AXPs clock for clock, but I think Intel was worried about not being able to catch up clockspeed-wise.

Certainly the biggest blunder with P4 was going with RDRAM in the beginning. Very little gain, huge expense and inconvenience for everyone from home users to big box vendors.

Anyhow, once P4 reached the Northwood era, it was really fine. Price, performance, feature set, overclockability, power usage, DDR chipsets available, etc. – it was all good. Unless you were dumb enough to buy the highest clock speed model or something (reminds me of when AMD had $1k CPUs for idiots). But Northwood did have a pretty long run; it was neck and neck with the AXP and finally pulled completely ahead at the end, so you had everything from a 1.6A to the 3.2 and rare 3.4GHz models out there over a pretty long lifespan. Even the first models of Athlon 64, particularly on Socket 754, weren't hands-down superior at all. They were better in more than half of things, but still fell behind stock vs. stock at encoding and a handful of gaming benches and stupid synthetic benches. Hence a P4 3.2 Northwood vs. an Athlon 64 3200+ was basically a wash.

I think most people seem determined to remember P4 for the early stupidity (not enough clock speed yet to justify the new architecture, and tied to idiotic RDRAM), or the last bridge too far, Prescott, as when AMD ramped up to 3500+, 3800+, etc., it was just too much to keep up with; Prescott didn't deliver at all what they had expected. So P4 is a weird era. It began in failure and ended in failure, but for the bulk of its life it was extremely successful, selling competitive product at competitive prices.
 

bononos

Diamond Member
Aug 21, 2011
3,928
186
106
..........

Anyhow, once P4 reached the Northwood era, it was really fine. Price, performance, feature set, overclockability, power usage, DDR chipsets available, etc. – it was all good. Unless you were dumb enough to buy the highest clock speed model or something (reminds me of when AMD had $1k CPUs for idiots). But Northwood did have a pretty long run; it was neck and neck with the AXP and finally pulled completely ahead at the end, so you had everything from a 1.6A to the 3.2 and rare 3.4GHz models out there over a pretty long lifespan. Even the first models of Athlon 64, particularly on Socket 754, weren't hands-down superior at all. They were better in more than half of things, but still fell behind stock vs. stock at encoding and a handful of gaming benches and stupid synthetic benches. Hence a P4 3.2 Northwood vs. an Athlon 64 3200+ was basically a wash.

I think most people seem determined to remember P4 for the early stupidity (not enough clock speed yet to justify the new architecture, and tied to idiotic RDRAM), or the last bridge too far, Prescott, as when AMD ramped up to 3500+, 3800+, etc., it was just too much to keep up with; Prescott didn't deliver at all what they had expected. So P4 is a weird era. It began in failure and ended in failure, but for the bulk of its life it was extremely successful, selling competitive product at competitive prices.

Did the P4s start pulling ahead when they added hyperthreading capability?
Aside from this, were the P4 ALUs double-pumped (ran at 2x clockrate) from the beginning, or was that around the time of the Prescott cores?
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,379
126
Did the P4s start pulling ahead when they added hyperthreading capability?
Aside from this, were the P4 ALUs double-pumped (ran at 2x clockrate) from the beginning, or was that around the time of the Prescott cores?

Well, the P4 had a quad-pumped FSB, so it was actually stranger than you're thinking. But it doesn't mean the bus was quicker than the clockrate.

P4s started pulling ahead around the 3GHz+ range compared to the 3000+, and the 3.2GHz Northwood won basically every benchmark against a 3200+ Athlon XP. It wasn't too long after that that the Athlon 64 3000+/3200+ came along, which basically broke even with the P4 3.2 and soon afterwards ramped up to speeds that Intel couldn't match; Prescott was broken from the start.

There were massive chipset differences as well. An Athlon XP on a VIA 266A/333/etc. mobo was somewhat hindered compared to running that same CPU on a good NF2 board, just like running a P4 on a VIA SDRAM chipset or an old pre-533FSB chipset was a losing proposition.

I don't think anyone smart (in the OC world) bought either a 3200+ or a P4 3.2, though. It was easy enough to get a 2500+ or mobile 1700+ to 3200+ speeds, and ditto even a 2.4, 2.53, 2.6, or 2.8GHz P4 to 3.2GHz and beyond. Going back even further, getting a 1500+ to 1800+/2000+/2200+ speeds was cake, and so was taking a 1.6A to 2.0-2.4GHz.

As for hyperthreading, it was actually less useful back then, and imho didn't really contribute much of anything to P4 winning in the end vs. AXP; a forgotten victory, as it wasn't long at all from that timeframe to when the A64 came about.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Did the P4s start pulling ahead when they added hyperthreading capability?
No, Nehalem did that. HT could help some, but it wasn't really ready for prime time until it had a nice wide CPU to work on.
Aside from this, were the P4 ALUs double-pumped (ran at 2x clockrate) from the beginning, or was that around the time of the Prescott cores?
Prescott refined the whole thing, and ran more of the ALU and AGU at normal speed; though I forget some of the details, and the links I can find are now 404s.

It's not that all the ideas in the P4 had no uses; they just got implemented in the most reality-defying ways Intel could come up with (people caring about the performance of 386-, 486-, Pentium-, and P6-optimized binaries is part of the x86 reality; and memory limits are everyone's reality).

It's not that they ran at double clock speed, but that they did things like width-pipelining to make it work, and kept the whole CPU narrow, negating the most useful advantages of a faster ALU*. It took ideal cases for it to be as effective as a single 32-bit ALU, much less perform better (unless you were a major image processing company whose software was commonly benchmarked; then Intel might offer to help you out). On top of that, they slowed down shifting operations, which are not only common, but were previously the recommended shortcuts for tasks like multiplying and dividing by small constants.
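(For readers who haven't seen that idiom, these are the shift shortcuts in question, sketched in C; compilers emit the same kind of thing for multiplies and divides by suitable constants:)

Code:
/* Shift "shortcuts" for multiply/divide by small constants, the idiom
   (and common compiler output) that the P4's slow shifter penalized. */
unsigned times8 (unsigned x) { return x << 3; }               /* x * 8             */
unsigned div16  (unsigned x) { return x >> 4; }               /* x / 16 (unsigned) */
unsigned times10(unsigned x) { return (x << 3) + (x << 1); }  /* x*8 + x*2 = x*10  */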

A double-pumped ALU working at 32 bits a slice, but only handling the simplest of instructions, or cascaded 32-bit ALUs, probably would have been an all-around performance gain, including in software not specifically targeting the P4 (>99% of x86 software).

The degree of software pipelining necessary to make it work well was just too much, given the long hardware pipelines. A great deal of code out there doesn't have enough intrinsic static parallelism to be able to take advantage of it, even with a clever compiler and access to the source, and programmers with good knowledge of the P4, as well.

It wasn't too long after that that the Athlon 64 3000+/3200+ came along, which basically broke even with the P4 3.2
The very existence of S754 was a blunder by AMD, too, limiting the performance of mainstream parts. They would have done better to have come out with 939 as the only non-server socket from the start. Aside from that, it was better for both AMD and Intel at the time, in that AMD's fastest CPUs didn't need to be cheap, so while everybody wanted an A64 3200+, a Prescott might be as or more affordable. And, how long did the X2 3800+ stay at $300?

I don't think anyone smart (in the OC world) bought either a 3200+ or P4 3.2 though.
AutoCAD loved AXPs, and then A64s. Outside of that kind of scenario, where you're already paying a bunch for software and peripherals, and there was a real perceptible difference when you sat down and used it, nothing above the [Barton] 2800+ really made sense. The 2.8C/E and faster also made sense, even for those situations where they would perform a bit worse, if you could use HT.

Ultimately, I think Ken Olsen was right on target, many years earlier, when he said he feared success, more than any competitor. AMD certainly didn't handle their success well.

* Consider the 32-bit registers A, B, C, D, each with a low (L) set of bits and a high (H) set of bits. IIRC, the P4 would do A <- B+C, D <- C << 6 as:
Code:
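; format: <cycle>. <low-half (L) slice op> ; <high-half (H) slice op, one cycle behind>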
1. B.L + C.L ; no-op
2. C.L << 6 ;  B.H + C.H + carry
3. no-op ;     C.H << 6 << carry
Now, that would be OK if the ALUs were wide, and the front-end were moderately wide, so that the ALUs could stay fed. On a high-GHz, narrow-throughput machine that isn't a DSP or GPU, it's an example of Intel largely abandoning much of what kept x86 so successful, which wasn't the high speeds and generational recompiles of RISC CPUs. For such width-pipelining to work well, it would need to be done in conjunction with widening the core (i.e., Core 2, SB, Haswell), not against it, and would need either special-casing of 'bad' instructions, like right shifts, or the ability to mix and match high and low slices (maybe have 1-2 LSB->MSB ports, and then a MSB->LSB one, or something like that?).
 
Last edited:

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Back when I was in college, a fellow engineering student who loved to code RTL and I - who liked messing with synthesis and place 'n route tools - got together and coded up and synthesized a simple CPU built on our own homebrew instruction set. It was our final project for our VLSI class, and we did a good job on it as I recall. It was in-order, single-pipeline, 8 total instructions (including a NOP), no caching, and we didn't bother with any interfacing with anything (so, no real analog-front-end front-side bus). It actually didn't have memory - instructions just magically appeared in the decode logic and it loaded constants for data - though we had a register or two. My buddy coded it - although I helped a bit - and I synthesized it and used a routing program to do the layout, although as I recall I didn't really completely wire it: I just let the tools plunk down cells and it routed, but when I had conflicts, I just ignored them. So there was no fabbing this one. I don't think I even really routed power or the clock.

My friend wrote the programs that ran on it and liked watching it work by looking at the waveforms. I was more interested in how many transistors it would take, and how fast it would run. As I recall, we managed the whole thing in under 40k FETs - including the ALU - but it wasn't fully functional... and I don't recall if it was CMOS or just PMOS (I think the latter). It was written in VHDL I think, and I used the libraries from some foundry in Europe called Eurochip or something like that... it was 1um (1000nm) technology as I recall. And I don't remember what tools I used for the synthesis, except that we had an academic license for a professional suite and I seem to remember them being easier than what I actually use now. Somewhere I have a photograph of a test running on it, and the layout of it zoomed way out.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
One of my friends' VLSI projects in school was an 8-bit stack machine with an adder, shifter, nand/nor/inv (and a couple of others), emulated loads and stores, and a conditional brancher; it came out to about 50k transistors.
 

Arkaign

Lifer
Oct 27, 2006
20,736
1,379
126
The very existence of S754 was a blunder by AMD, too, limiting the performance of mainstream parts. They would have done better to have come out with 939 as the only non-server socket from the start. And, how long did the X2 3800+ stay at $300?

I agree completely. I actually built an Opteron 144 / socket 940 system before Athlon 64 was available, and it was quite excellent. I remember being pretty underwhelmed by socket 754 and puzzled as to why they would dilute things like that.

X2 3800+ stayed at ~$300 for a fairly long time. Basically from the release date of August 1st, 2005 through the release of Core 2 Duo.

http://www.dailytech.com/article.aspx?newsid=2800

July 24th, 2006. So basically a solid year at $300. I remember that during this era I built one pretty early, then sold it, then built an Opty 165 setup, sold it, and finally settled on a heavily overclocked Pentium D 805 that I got for about 1/3rd the cost. Once in the mid-3GHz range it was competitive enough for my uses, which were mainly encoding video for a big church project and playing Enemy Territory (the RTCW one), along with checking things out such as Titan Quest, Quake 4, etc. As AMD had no dual-core options below the $300 range for that year, I think this was their worst value era. Intel really wasn't much better, actually WORSE, if you weren't an overclocker. The chips also ran hotter if you didn't run good aftermarket cooling, and you had to have a good power supply and a good mobo choice to have a stable, happy OC with the PD.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
You can actually implement all the functionality of a processor with just one instruction - the one I'm most familiar with is "subtract and branch if negative". With that one instruction, you can implement addition, subtraction, multiplication, loops, etc.
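For anyone curious, here's a tiny sketch of mine of that idea, using the "subleq" convention (subtract and branch if the result is less than or equal to zero, which works out the same for this trick): a one-instruction interpreter plus a four-triple program that adds 5 and 7 with nothing else:

Code:
#include <stdio.h>

/* One-instruction machine ("subleq"): the triple (a, b, c) means
       mem[b] -= mem[a]; if (mem[b] <= 0) jump to c;
   The "subtract and branch if negative" variant works much the same way.
   The little program below computes 5 + 7 using only this instruction. */
int mem[] = {
    12, 14,  3,   /* Z -= X        -> Z = -X              */
    14, 13,  6,   /* Y -= Z        -> Y = Y + X           */
    14, 14,  9,   /* Z -= Z        -> Z = 0 again         */
    12, 12, -1,   /* X -= X = 0, branch to -1: halt       */
     5,           /* mem[12] = X                          */
     7,           /* mem[13] = Y                          */
     0,           /* mem[14] = Z (scratch, starts at 0)   */
};

int main(void)
{
    int pc = 0;
    while (pc >= 0) {                       /* negative target = halt */
        int a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
        mem[b] -= mem[a];
        pc = (mem[b] <= 0) ? c : pc + 3;
    }
    printf("5 + 7 = %d\n", mem[13]);        /* prints 12 */
    return 0;
}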