Trends in Multithreaded Processing


Scali

Banned
Dec 3, 2004
2,495
1
0
I think the only reason why ISA doesn't matter is because x86 has a truckload more time and money invested in it than anything else, and operates on a much larger scale.
All this effort put into all this logic to decode x86 code, fuse it, solve dependencies, run microcode emulation for obscure and outdated instructions, realmode, pmode, long mode etc...
Somewhere somehow it's got to be possible to come up with something faster/smaller/more efficient if you apply more modern techniques while using the same resources/time/scale etc benefits of x86.

And no, x86 is not meant for a compiler.
x86 predates the compiler-era (that is, compilers existed, but especially for smaller computers such as PCs, handcrafted assembly was still the way to go. Compilers were for the big iron).
RISC instructionsets are meant for a compiler. CISC instructionsets generally contain lots of esoteric tricks so a human can hand-optimize things to save a few bytes here and there (yes, size was very important back then).
The sense behind x86 has long eroded. If you look at hand-optimized code for 8088 or 286, and apply it to a modern x86, it will generally be incredibly inefficient, because it uses all these esoteric instructions which have been moved into microcode emulation, because compilers would never use them anyway. But they're still supported in every x86 (at least Motorola actually evicted legacy instructions completely from the 68k hardware, and put a software emulation routine in the invalid instruction trap handler).
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
I think the only reason why ISA doesn't matter is because x86 has a truckload more time and money invested in it than anything else, and operates on a much larger scale.
All this effort put into all this logic to decode x86 code, fuse it, solve dependencies, run microcode emulation for obscure and outdated instructions, realmode, pmode, long mode etc...
Somewhere somehow it's got to be possible to come up with something faster/smaller/more efficient if you apply more modern techniques while using the same resources/time/scale etc benefits of x86.
Sure it's possible to come up with something better -- a computer architect's pipe dream -- but of course it won't happen, for the reasons you've mentioned. But it's not just the hardware cost -- the cost of making new software is way, way higher. The only help in that regime is dynamic binary translation.
And no, x86 is not meant for a compiler.
x86 predates the compiler-era (that is, compilers existed, but especially for smaller computers such as PCs, handcrafted assembly was still the way to go. Compilers were for the big iron).
x86 and compilers grew up together. The whole stack based architecture reads like a compiler text. I believe in human-coded ASM, especially in the past and for occasional good reasons today, and for embedded devices. But x86 certainly does not predate compilers. Just the widespread adoption of compilers on what is now commodity hardware. Part of the success of compilers is due to x86.

RISC instructionsets are meant for a compiler. CISC instructionsets generally contain lots of esoteric tricks so a human can hand-optimize things to save a few bytes here and there (yes, size was very important back then).
Agreed. Lessons learned from CISC led to the RISC experiment. But several trends limit compilers' ability to leverage fancy CISC ops. For one, they're not often useful. For another, compilers are restricted to implementing precisely what a programmer specifies, which may not be precisely what is done by a particular CISC op. Both of these are artifacts of today's languages and today's compilers. On the other hand, an optimizing FORTRAN compiler from the 80s was pretty good at using those things, given the correct program input.

The sense behind x86 has long eroded. If you look at hand-optimized code for 8088 or 286, and apply it to a modern x86, it will generally be incredibly inefficient, because it uses all these esoteric instructions which have been moved into microcode emulation, because compilers would never use them anyway. But they're still supported in every x86 (at least Motorola actually evicted legacy instructions completely from the 68k hardware, and put a software emulation routine in the invalid instruction trap handler).

Evicting uncommon operations from hardware is a common trick. IBM did it, x86 does it, even SPARC does it to some extent for some traps. And naturally, microarchitecture-specific optimization has its place.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
x86 and compilers grew up together. The whole stack based architecture reads like a compiler text.

Not sure why you refer to x86 as 'stack based architecture'.
x87 is stack based (and actually hell for a compiler to optimize for), but with x86 you generally use it via its register file, operations normally don't go via stack.
It's the first time I've heard someone refer to x86 as 'stack based architecture' in my ~25 years of experience with the architecture.
I don't see x86 as any more stack-oriented than, say, a 68k or even PowerPC. They all have a 'dedicated' stack pointer register. Okay, technically 68k/PPC don't have specific push/pop instructions, but because of their addressing modes, they don't need them.

But x86 certainly does not predate compilers. Just the widespread adoption of compilers on what is now commodity hardware. Part of the success of compilers is due to x86.

Not initially.
I grew up with 6502, x86 and 68k, and we rarely used compilers... most of the machines simply weren't powerful enough to run a compiler on... and even if you could run a compiler, the level of optimization was generally unacceptable, so you'd still whip out the ole assembler.
It wasn't until the early 90s that compilers started to play a larger role in the x86/PC world (partly because optimizations became better... partly because CPUs became so fast that it no longer mattered that much).
If you look at the source code of eg Wolf3D or Doom, you'll still find tons of hand-optimized assembly in there.
By then, x86 was already about 15 years old, so 'grew up together'? Nah.

Agreed. Lessons learned from CISC led to the RISC experiment. But several trends limit compilers' ability to leverage fancy CISC ops. For one, they're not often useful. For another, compilers are restricted to implementing precisely what a programmer specifies, which may not be precisely what is done by a particular CISC op. Both of these are artifacts of today's languages and today's compilers. On the other hand, an optimizing FORTRAN compiler from the 80s was pretty good at using those things, given the correct program input.

I see it completely differently from you.
Modern x86s are pretty much the same as the RISC CPUs of yore... Partly because compilers don't use the esoteric instructions anyway.
So these are moved to microcode emulation. However, unlike RISC CPUs, x86 still needs complex decoders to handle the complex (and variable-length, uggghh!!) instruction encoding. Something that was eliminated with RISC.
So while technically a modern x86 has pretty much the same efficient lightweight RISC-backend, it takes a lot more effort to get the instructions there.

Evicting uncommon operations from hardware is a common trick. IBM did it, x86 does it, even SPARC does it to some extent for some traps. And naturally, microarchitecture-specific optimization has its place.

x86 has never removed anything from hardware. They just changed it from a 'hardwired' implementation to a 'software' implementation (executing a microcode program/macro).
IBM and Motorola have actually physically removed instructions altogether, causing an invalid opcode trap, which is then handled in software. I don't know of any x86 that does this.
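The trap-and-emulate scheme described here can be sketched in a few lines. This is a toy Python model, not real 68k behavior; the opcode names and the handler table are invented for illustration.

```python
# Toy model of trap-based emulation: an instruction evicted from hardware
# raises an invalid-opcode trap, and a software routine installed in the
# trap handler emulates it. Opcode names are invented, not real 68k ones.

class InvalidOpcode(Exception):
    pass

def hw_add(regs, dst, src):              # still implemented in hardware
    regs[dst] += regs[src]

HARDWIRED = {"add": hw_add}

def emul_mulu(regs, dst, src):           # evicted; emulated in software
    regs[dst] *= regs[src]

TRAP_HANDLERS = {"mulu": emul_mulu}      # hooked in by the BIOS or OS

def execute(regs, op, dst, src):
    try:
        if op not in HARDWIRED:
            raise InvalidOpcode(op)      # hardware raises the trap
        HARDWIRED[op](regs, dst, src)    # fast, hardwired path
    except InvalidOpcode:
        TRAP_HANDLERS[op](regs, dst, src)  # slow, software path

regs = {"d0": 6, "d1": 7}
execute(regs, "mulu", "d0", "d1")        # runs via the trap handler
print(regs["d0"])  # 42
```

Old binaries keep working either way; the only difference is which path they take.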
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Not sure why you refer to x86 as 'stack based architecture'.
x87 is stack based (and actually hell for a compiler to optimize for), but with x86 you generally use it via its register file, operations normally don't go via stack.
It's the first time I've heard someone refer to x86 as 'stack based architecture' in my ~25 years of experience with the architecture.
It certainly isn't as stack-heavy as the x87 (thank goodness). But it is pretty stack-friendly. I believe what he meant is that x86 is naturally set up to use stack operations (push, pop, and the esp register). It doesn't HAVE to use them, but for things like function calling, it is pretty natural to use the stack.


x86 has never removed anything from hardware. They just changed it from a 'hardwired' implementation to a 'software' implementation (executing a microcode program/macro).
IBM and Motorola have actually physically removed instructions altogether, causing an invalid opcode trap, which is then handled in software. I don't know of any x86 that does this.

Yep, pretty much. AFAIK, x86 has never really removed an instruction. The ever-useless "enter" instruction still remains after 20+ years of obsolescence.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
It certainly isn't as stack-heavy as the x87 (thank goodness). But it is pretty stack-friendly. I believe what he meant is that x86 is naturally set up to use stack operations (push, pop, and the esp register). It doesn't HAVE to use them, but for things like function calling, it is pretty natural to use the stack.

But most CPUs are.
Stacks are just very handy for everyday use.

There are REAL stack-based CPUs however, where the operands for instructions are passed through the stack, rather than through registers (eg Java's virtual machine uses such a design).
x86 isn't anything like that. It's closer to an accumulator-based architecture, since a lot of instructions rely on (r/e)ax, and the 'a' actually stands for accumulator (and 'b' for base, 'c' for counter, 'd' for data).
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
But most CPUs are.
Stacks are just very handy for everyday use.

There are REAL stack-based CPUs however, where the operands for instructions are passed through the stack, rather than through registers (eg Java's virtual machine uses such a design).
x86 isn't anything like that. It's closer to an accumulator-based architecture, since a lot of instructions rely on (r/e)ax, and the 'a' actually stands for accumulator (and 'b' for base, 'c' for counter, 'd' for data).
:) Not really disagreeing, just trying to get at what degibson meant. You are certainly right, there are real stack based processors, they are a PITA to work with, and the x86 architecture really isn't one of them.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
:) Not really disagreeing, just trying to get at what degibson meant. You are certainly right, there are real stack based processors, they are a PITA to work with, and the x86 architecture really isn't one of them.

x86 has an accumulator. Ergo, it is a stack-based architecture. Modern x86 makes the accumulator less necessary, again, lesson learned from RISC.

By then, x86 was already about 15 years old, so 'grew up together'? Nah.
If it has to be a question about raw dates, then:
First compiler: 1952
First x86: 1971
Compilers win.

But it's silly not to understand that x86 and compilers learned from each other and evolved together.

So while technically a modern x86 has pretty much the same efficient lightweight RISC-backend, it takes a lot more effort to get the instructions there.
Sure it does. There have been hundreds of research papers on the subject. But the underlying trend remains: after inter-instruction data dependences are ironed out (in the x86 world, post-cracking, post-renaming), the true dependences are the limiting factor. It ends up not mattering at all how the ISA expresses those dependences in the first place.
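The point about true dependences can be made concrete with a small sketch: once each operation is reduced to its producer/consumer relations (post-cracking, post-renaming), the longest read-after-write chain bounds execution time no matter how the ISA spelled the instructions. The operation tuples below are invented, and unit latency is assumed.

```python
# Sketch: dataflow critical path of a short sequence after cracking and
# renaming. Each entry is (destination, source operands); the surface
# encoding (CISC or RISC) no longer appears anywhere in this picture.

ops = [
    ("t1", ()),            # t1 = load a       (no register inputs)
    ("t2", ()),            # t2 = load b
    ("t3", ("t1", "t2")),  # t3 = t1 + t2      (true dependence on t1, t2)
    ("t4", ("t3",)),       # t4 = t3 * 2       (true dependence on t3)
    ("t5", ()),            # t5 = load c       (independent, free ILP)
]

def critical_path(ops, latency=1):
    """Longest chain of read-after-write dependences, in cycles."""
    depth = {}
    for dst, srcs in ops:
        depth[dst] = latency + max((depth[s] for s in srcs), default=0)
    return max(depth.values())

print(critical_path(ops))  # 3: t1/t2 -> t3 -> t4
```

With infinite execution resources this sequence still takes three cycles, which is the sense in which the true dependences, not the ISA, are the limit.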

Modern x86s are pretty much the same as the RISC CPUs of yore
Of course they are. Again, lessons learned from RISC. They've been cracking uops for a long time since. That doesn't change the fact that the instructions are described as CISC ops.

x86 has never removed anything from hardware. They just changed it from a 'hardwired' implementation to a 'software' implementation
Semantics. x86 vendors never had the chance to make their emulated stuff full-software because they never controlled enough of their software stack. But it's an old trick... getting rid of dinosaur architectural features by making them even slower and less useful. Over time, software evolves away from them. In the short term, existing software still works.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
@degibson: So what exactly are the advantages of x86 for compiler writers? Last time I checked most compilers didn't use all those fancy, inconsistent CISC operations anyhow (well that's mostly because they're emulated in µops today and are extremely inefficient, but then the question is why Intel did it that way.. same result). I mean one reason for the whole RISC ISA was to create something that was easy for compiler writers and not human assembly programmers.
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
x86 has an accumulator. Ergo, it is a stack-based architecture. Modern x86 makes the accumulator less necessary, again, lesson learned from RISC.

I hate to say it, but having an accumulator does not define a stack-based architecture. Having a stack where you push data and operands instead of using registers and addressing pretty much defines it.

Even the x87 isn't really a stack-based architecture; in fact, it's actually an accumulator-based architecture, as st0 is the default register (i.e. the accumulator).
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
@degibson: So what exactly are the advantages of x86 for compiler writers? Last time I checked most compilers didn't use all those fancy, inconsistent CISC operations anyhow (well that's mostly because they're emulated in µops today and are extremely inefficient, but then the question is why Intel did it that way.. same result).

No advantages at all for a modern compiler. But lots of advantages for a primitive compiler of the 70s: built-in accumulator support, implicit arguments, native push, pop.

ISAs are static - once they're written, they don't change with compiler trends. You're right that compilers don't use those fancy CISC ops -- those were for assembly folks, and optimizing FORTRAN compilers in the best case.

I mean one reason for the whole RISC ISA was to create something that was easy for compiler writers and not human assembly programmers.

The biggest motivator for RISC was hardware simplicity. As a side note, it enabled a lot of compiler orthogonality arguments -- but as they say, compilers measure performance difference by the percentage point, hardware designers measure it by factors of speedup.

It was believed that once all the complicated decode logic was out of the way, the cleanliness of the RISC idea would enable better-performing designs. It worked -- but x86 vendors threw way more money at the same ideas. And with the market share already out there, x86 designers took the good from RISC and banked. Hence, e.g., micro-op cracking appearing in the Pentium era.

I hate to say it, but having an accumulator does not define a stack-based architecture. Having a stack where you push data and operands instead of using registers and addressing pretty much defines it.

I'm not sure why you seem to dislike the term. Stacks are great. x86 expects to have a stack. It's not as explicitly stack-oriented as some other ISAs, admittedly, but with an accumulator, an explicit stack pointer, an explicit stack segment, explicit push/pop, and control flow instructions that assume a stack, it does a fine job of organizing its own stack.
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
I'm not sure why you seem to dislike the term. Stacks are great. x86 expects to have a stack. It's not as explicitly stack-oriented as some other ISAs, admittedly, but with an accumulator, an explicit stack pointer, an explicit stack segment, explicit push/pop, and control flow instructions that assume a stack, it does a fine job of organizing its own stack.

Stacks are great, but I think you're equating the existence of stacks in a system with a stack based system. (also referred to as zero address machines)

A stack based computer is one where registers are not exposed to the system. To do any operation it looks like so.

c = a + b

push a
push b
add
pop c

In general if you have a default operand (accumulator) and instructions with one operand then you have an accumulator based system. (also referred to as one address machines)

I think once you get to 2 operands and above it's just called a modern computer.
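The zero-address vs one-address distinction above can be sketched as two toy interpreters, both evaluating c = a + b. The mnemonics and machine model are invented for illustration.

```python
# Toy zero-address (stack) machine vs one-address (accumulator) machine.
# Both compute c = a + b; the difference is where operands live.

def run_stack(program, mem):
    """Zero-address: operands are implicit, always the top of the stack."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(mem[arg])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":
            mem[arg] = stack.pop()

def run_accumulator(program, mem):
    """One-address: one explicit operand, destination is always the acc."""
    acc = 0
    for op, arg in program:
        if op == "load":
            acc = mem[arg]
        elif op == "add":
            acc += mem[arg]
        elif op == "store":
            mem[arg] = acc

mem = {"a": 2, "b": 3, "c": 0}
run_stack([("push", "a"), ("push", "b"), ("add", None), ("pop", "c")], mem)
assert mem["c"] == 5

mem = {"a": 2, "b": 3, "c": 0}
run_accumulator([("load", "a"), ("add", "b"), ("store", "c")], mem)
assert mem["c"] == 5
```

The "zero address" and "one address" labels refer to exactly this: the stack machine names no register in its add, the accumulator machine names one.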
 

Scali

Banned
Dec 3, 2004
2,495
1
0
If it has to be a question about raw dates, then:
First compiler: 1952
First x86: 1971
Compilers win.

The first x86 is the 8086 from 1978. Its earlier cousins can not be classified as 'x86', although they are similar.
And for the rest:
Scali said:
I grew up with 6502, x86 and 68k, and we rarely used compilers... most of the machines simply weren't powerful enough to run a compiler on... and even if you could run a compiler, the level of optimization was generally unacceptable, so you'd still whip out the ole assembler.
It wasn't until the early 90s that compilers started to play a larger role in the x86/PC world (partly because optimizations became better... partly because CPUs became so fast that it no longer mattered that much).
If you look at the source code of eg Wolf3D or Doom, you'll still find tons of hand-optimized assembly in there.
By then, x86 was already about 15 years old, so 'grew up together'? Nah.

I said that compilers existed already, but had not yet replaced assembly on small systems such as x86.

But it's silly not to understand that x86 and compilers learned from each other and evolved together.

I pointed that exact fact out, just in a slightly different way from you.



Sure it does. There have been hundreds of research papers on the subject. But the underlying trend remains: after inter-instruction data dependences are ironed out (in the x86 world, post-cracking, post-renaming), the true dependences are the limiting factor. It ends up not mattering at all how the ISA expresses those dependences in the first place.


Of course they are. Again, lessons learned from RISC. They've been cracking uops for a long time since. That doesn't change the fact that the instructions are described as CISC ops.

Semantics.

Nope, it's not semantics. It's a VERY different thing.

x86 vendors never had the chance to make their emulated stuff full-software because they never controlled enough of their software stack.

Motorola had possibly even less control than x86 vendors: while x86 went into every PC-compatible, 68k went into pretty much everything else. Macs, Ataris, Amigas, most arcade machines, etc.
All you'd need is the emulation routines hooked into the invalid opcode trap, either by the BIOS or by the OS. Since new CPUs generally need a new socket anyway, and as such a new motherboard, fixing this in the BIOS would have been as trivial for x86 as it was for the 68060.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
No advantages at all for a modern compiler. But lots of advantages for a primitive compiler of the 70s: built-in accumulator support, implicit arguments, native push, pop.

What exactly is the advantage for a compiler of native push/pop?
68k did not have a 'native' push/pop, but you could do a push/pop in a single instruction with its addressing modes using pre-decrement and post-increment addressing.
Does it really matter for a compiler whether it outputs 'pop reg' or 'move.l (sp)+, reg'? Pop is just an alias.
I don't think so. It just emits the bytecode, whatever it is. Even if it's multiple instructions, it doesn't change anything, really.

Likewise, an accumulator is not much of an advantage for a compiler. Any destination register will do (it needs to handle multiple source registers anyway).

It was believed that once all the complicated decode logic was out of the way, the cleanliness of the RISC idea would enable better-performing designs. It worked -- but x86 vendors threw way more money at the same ideas. And with the market share already out there, x86 designers took the good from RISC and banked. Hence, e.g., micro-op cracking appearing in the Pentium era.

Motorola also delivered a small micro-op cracking gem in the form of the 68060.
Of course the 68k had the advantage that its instructionset was designed to be orthogonal from the start, and supported full 32-bit (while being only a few months younger than the 8086).
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
What exactly is the advantage for a compiler of native push/pop?
68k did not have a 'native' push/pop, but you could do a push/pop in a single instruction with its addressing modes using pre-decrement and post-increment addressing.
{snip}
Likewise, an accumulator is not much of an advantage for a compiler. Any destination register will do (it needs to handle multiple source registers anyway).
Compilers didn't always have the sophistication, nor the time, they do today. Early compilers just traversed trees and generated stack code -- x86 was perfect for those two things, yielding small and compact compiled code.

Once things like peephole optimization started showing up, the difference between eight stack stores and eight subtracts vs eight explicit pops started going away, because the eight subtracts could become one subtract.

Does it really matter for a compiler whether it outputs 'pop reg' or 'move.l (sp)+, reg'? Pop is just an alias.
I don't think so. It just emits the bytecode, whatever it is. Even if it's multiple instructions, it doesn't change anything, really.
You said it yourself: variable-length instructions. push is a simple x86 opcode. If you want to do a move.l (sp)+, reg, that's a move with the appropriate decorations to use the particular register. Doing that once is fine. Doing that a lot hurts instruction cache locality. Instruction size mattered in the early years of x86 -- instruction memories were small in a lot of cases (still are in the embedded x86 space).
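To put numbers on the size argument: below are the encoded lengths as I recall them from the two architectures' manuals (treat the byte counts as assumptions to verify). The 68k MOVE.L (A7)+,D0 fits in one 16-bit word, while an x86 push/pop of a 32-bit register is a single opcode byte.

```python
# Encoded instruction sizes, in bytes, for equivalent stack operations.
# Figures recalled from the x86 and 68000 manuals; verify before relying.

size_bytes = {
    ("x86", "push eax"):          1,  # 0x50
    ("x86", "pop eax"):           1,  # 0x58
    ("68k", "move.l d0,-(sp)"):   2,  # 0x2F00 (one 16-bit word)
    ("68k", "move.l (sp)+,d0"):   2,  # 0x201F (one 16-bit word)
}

# Popping eight registers in sequence: 8 bytes on x86 vs 16 on 68k.
x86_total = 8 * size_bytes[("x86", "pop eax")]
m68k_total = 8 * size_bytes[("68k", "move.l (sp)+,d0")]
print(x86_total, m68k_total)  # 8 16
```

A factor of two on common prologue/epilogue code was real money when instruction memory was measured in kilobytes, which is the era being argued about here.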
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Compilers didn't always have the sophistication, nor the time, they do today. Early compilers just traversed trees and generated stack code -- x86 was perfect for those two things, yielding small and compact compiled code.

My recollection is that compilers had already evolved beyond that basic stage by the time the x86 arrived on the market.

You said it yourself: variable-length instructions. push is a simple x86 opcode. If you want to do a move.l (sp)+, reg, that's a move with the appropriate decorations to use the particular register. Doing that once is fine. Doing that a lot hurts instruction cache locality. Instruction size mattered in the early years of x86 -- instruction memories were small in a lot of cases (still are in the embedded x86 space).

Okay, this makes no sense to me. Weren't you trying to argue that a special pop instruction made it easier to write a compiler?
So I made this point:
In both cases, 'reg' is the only variable. Compilation is equally trivial in both cases. Just insert the proper bit encoding for 'reg' into the otherwise fixed pop opcode.

Now you are trying to argue about variable length instructions and instruction cache locality? What does that have to do with writing a compiler?
Aside from that... as I said, x86's early assumptions are no longer valid. Instruction size is one of them. A lot of extra time and logic has to be spent on decoding x86 instructions (it's a 2-pass scheme: first determine the instruction boundaries, then decode the actual instruction).
And for a compiler, fixed length instructions are actually easier.
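The two-pass decode problem can be sketched with a toy first-pass length decoder over a tiny subset of real x86 one-byte-opcode forms (push/pop r32, mov r32 with a 32-bit immediate, nop, ret). Everything fancier (prefixes, ModRM, SIB, displacements) is deliberately left out.

```python
# Toy first pass of x86 decode: determine instruction boundaries.
# Handles only a few simple real encodings; raises on anything else.

def insn_length(opcode):
    if 0x50 <= opcode <= 0x5F:   # push r32 (50+r) / pop r32 (58+r)
        return 1
    if 0xB8 <= opcode <= 0xBF:   # mov r32, imm32 (B8+r id)
        return 5
    if opcode in (0x90, 0xC3):   # nop, ret
        return 1
    raise NotImplementedError(hex(opcode))

def find_boundaries(code):
    """Walk the bytes sequentially; each length depends on the previous."""
    offsets, i = [], 0
    while i < len(code):
        offsets.append(i)
        i += insn_length(code[i])
    return offsets

# mov eax, 1; push eax; pop ecx; ret
code = bytes([0xB8, 0x01, 0x00, 0x00, 0x00, 0x50, 0x59, 0xC3])
print(find_boundaries(code))  # [0, 5, 6, 7]
```

Notice the boundaries can only be found serially: until the decoder has consumed the mov's immediate, it cannot know that offset 5 starts an instruction at all. With fixed-length instructions, this whole pass collapses to multiples of the instruction size.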
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Okay, this makes no sense to me. Weren't you trying to argue that a special pop instruction made it easier to write a compiler?
So I made this point:
In both cases, 'reg' is the only variable. Compilation is equally trivial in both cases. Just insert the proper bit encoding for 'reg' into the otherwise fixed pop opcode.

Now you are trying to argue about variable length instructions and instruction cache locality? What does that have to do with writing a compiler?
Aside from that... as I said, x86's early assumptions are no longer valid. Instruction size is one of them.
And for a compiler, fixed length instructions are actually easier.

Variable-length instructions make it easier for a compiler to make smaller code. I'm not arguing current trends, I'm arguing past ones. Small code used to be much more important than it is today.
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
(insert RISC/CISC debate reference here and note I foretold this!)

What exactly is the advantage for a compiler of native push/pop?

There is no advantage, especially in the case of x86; it's just a different methodology. The opcode is more complex (well duh, it's CISC); you're trading off registers/opcode size/addressability etc.

68k did not have a 'native' push/pop, but you could do a push/pop in a single instruction with its addressing modes using pre-decrement and post-increment addressing.
Does it really matter for a compiler whether it outputs 'pop reg' or 'move.l (sp)+, reg'? Pop is just an alias.
I don't think so. It just emits the bytecode, whatever it is. Even if it's multiple instructions, it doesn't change anything, really.

For one, you can't replace a push/pop instruction with a move instruction, even with the post-increment. In a byte/word/dword/quad-addressable machine you need to move the stack pointer by 1, 2, 4, or 8 respectively.

Likewise, an accumulator is not much of an advantage for a compiler. Any destination register will do (it needs to handle multiple source registers anyway).

It is in terms of the opcode map, in the days of stack/accumulator based systems a bit here or there was kind of important.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Variable-length instructions make it easier for a compiler to make smaller code. I'm not arguing current trends, I'm arguing past ones. Small code used to be much more important than it is today.

I don't think it makes it any easier. You just have fewer options when you don't have variable-length instructions.
But I'm not sure why you are arguing about this, as the 68k also has variable-length instructions.
Funnily enough, size optimization wasn't exactly a strong point of compilers, especially in those early days.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
For one you can't replace a push/pop instruction with a move instruction even with the post increment. In a byte/word/dword/quad addressable machine you need to move the stack pointer by 1,2,4,8 respectively.

Because of stack alignment, you never want to use anything other than the word size on stack anyway. A move.l (sp)+, reg is *exactly* the same as pop reg. And I mean *exactly*.
And on a 32-bit x86, you can do both 16-bit and 32-bit push/pops. Not that you should, but you can.

It is in terms of the opcode map, in the days of stack/accumulator based systems a bit here or there was kind of important.

Not in this case, because although x86 has an accumulator, it has more general-purpose registers than a TRUE accumulator-based CPU, such as the 6502, so the x86 doesn't reap potential benefits of smaller opcode maps.
In fact, as you just said, x86 is a mess of a CISC instructionset with a LOT of cruft. Not exactly the perfect example of keeping the opcode map small and simple. 68k did that a lot better.

But really, you are turning things around if you look at it this way.
In the early days, memory speed was not a bottleneck, so operations could work directly on memory. Therefore a stack-based CPU was good enough. This was also the most compact way to encode instructions (but really it was the only way they ever tried up to then).
As CPU speeds evolved faster than memory speeds, registers were introduced. They were an early form of caching ('level 0 cache').
At a later stage, additional levels of memory caching were added.
But really, I think trying to relate these developments to compilers is a bit backwards.
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
Because of stack alignment, you never want to use anything other than the word size on stack anyway. A move.l (sp)+, reg is *exactly* the same as pop reg. And I mean *exactly*.
And on a 32-bit x86, you can do both 16-bit and 32-bit push/pops. Not that you should, but you can.

What is advised and what is allowed are, well, not the same. For what push/pop is on the x86, the opcodes generated are not general purpose but very specific to certain operations. I believe on the 68k you can actually move memory to memory (i.e. move.l (sp)+, (reg)+ ). There is no way to do that on the x86. I don't know offhand, but I would bet the encoding of this is probably larger on average on the 68k than on the x86.

Not in this case, because although x86 has an accumulator, it has more general-purpose registers than a TRUE accumulator-based CPU, such as the 6502, so the x86 doesn't reap potential benefits of smaller opcode maps.

Actually it does not really have an accumulator; in a sort of balancing act, it has an opcode-optimized register. Operations on the A register can be done with smaller opcodes relative to the same operations on other registers. It does have some operations that require use of the A register, BCD for example, but for general operations you are not required to use it.
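The "opcode-optimized register" point in concrete terms, with byte counts for ADD with a full 32-bit immediate (as I recall the encodings; the sign-extended imm8 form 83 /0 ib is ignored here, and the figures should be checked against the manual):

```python
# Size of "add reg, imm32" on 32-bit x86: EAX gets a dedicated short form.
# Byte counts recalled from the x86 encoding; treat as an assumption.

def add_imm32_length(dest):
    if dest == "eax":
        return 1 + 4          # 05 id   (one-byte accumulator opcode)
    return 2 + 4              # 81 /0 id (opcode + ModRM, any r32)

print(add_imm32_length("eax"), add_imm32_length("ecx"))  # 5 6
```

One byte per instruction is exactly the kind of "bit here or there" being described: cheap for the architect to reserve, and a steady code-size win on accumulator-heavy code.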

Despite all its legacy instructions, the x86 produces a small code footprint relative to many other current architectures.

In fact, as you just said, x86 is a mess of a CISC instructionset with a LOT of cruft. Not exactly the perfect example of keeping the opcode map small and simple. 68k did that a lot better.

Did I say that? I don't think so! I said it was a trade off.

The 68k did have a very small code footprint. I would say, though, this may have led to its fall from favor. Its lack of virtual addressing and opcode extensibility slowed its development, keeping it behind the curve. Movement of its base to RISC architectures didn't help either.

It's easy to argue that there are unused instructions of the x86 that could easily be dropped. However there is equal ability to further extend the ISA as shown by the adoption of x64, mmx, SSE, etc.

But really, you are turning things around if you look at it this way.
In the early days, memory speed was not a bottleneck, so operations could work directly on memory. Therefore a stack-based CPU was good enough. This was also the most compact way to encode instructions (but really it was the only way they ever tried up to then).
As CPU speeds evolved faster than memory speeds, registers were introduced. They were an early form of caching ('level 0 cache').
At a later stage, additional levels of memory caching were added.
But really, I think trying to relate these developments to compilers is a bit backwards.

I don't think the use of a stack/accumulator has anything to do with memory speeds, but rather with the complexity of the internal units. It's a lot easier to latch to a known unit/location than to latch to one of many units/locations. It's really a way of limiting where operations take place.

Nothing is really backwards when you relate software to the hardware it executes on.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
What is advised and what is allowed are, well, not the same.

That's not the point, is it?
The compiler just needs to emit a 'pop'. Whether the instruction is actually called 'pop reg' or 'move.l (sp)+, reg' is irrelevant to a compiler. The difference is only in the mnemonic representation for humans.

For what push/pop is on the x86, the opcodes generated are not general purpose but very specific to certain operations. I believe on the 68k you can actually move memory to memory (i.e. move.l (sp)+, (a0)+ ). There is no way to do that on the x86. I don't know offhand, but I would bet the encoding of this is probably larger on average on the 68k than on the x86.

You can do memory-to-memory on x86, that's what movs is for. The beauty of the 68k addressing modes is that you can do push/pop and lods/stos/movs all with the same instruction and two addressing modes (and no need for a direction flag either).

Actually it does not really have an accumulator, in a sort of balancing act, it has an opcode optimized register. Operations on the A register can be done with smaller opcodes relative to the same operations in other registers. It does have some operations that require use of the A register, BCD for example, but for general operations you are not required to use it.

Depends on what you are talking about. 16-bit mode is more restricted in the use of registers than 32-bit mode. Things like mul/div are fixed to the accumulator. As are the above-mentioned lods and stos, to name but a few.

Despite all its legacy instructions, the x86 produces a small code footprint relative to many other current architectures.

That was never debated. The argument was that this somehow made a compiler's job easier. I think that's completely unrelated.
That would be like saying that a compiler has a harder job optimizing for size in x64 mode than in x86, simply because the average instruction size is slightly larger in x64 mode.
No, you can use the exact same algorithms, the results will just be slightly larger... Then again, even hand-tuned assembly will be slightly larger. That's just a side-effect of the instructionset, and has nothing to do with how easy or difficult it is to optimize for size.

Did I say that? I don't think so! I said it was a trade off.

You said CISC. The rest was implied.

The 68k did have a very small code footprint. I would say, though, that this may have led to its fall from favor. Its lack of virtual addressing and opcode extensibility slowed its development, keeping it behind the curve. The migration of its user base to RISC architectures didn't help either.

Lack of virtual addressing? Que?
The 68k had an MMU on board since the 030 if I'm not mistaken.
As for opcode extensibility... Motorola did plenty of that. It was a variable-length instruction encoding scheme, much like x86. The difference was that Motorola stuck to using 16-bit words. So each instruction was a multiple of 2 bytes.
As you yourself said, it still had a very small code footprint. Not very different from x86, while x86 has plenty of one-byte opcodes. That does not necessarily guarantee smaller code. 68k got a lot of benefit from its larger register file and its clever addressing modes. It often just needed fewer instructions to do the same job as its x86 competitor. 68k also generally had higher IPC at the same clockspeed than its x86 competitors.

It's easy to argue that there are unused instructions in the x86 that could easily be dropped. However, there is equal ability to further extend the ISA, as shown by the adoption of x64, MMX, SSE, etc.

I think you'll notice though that extensions such as x64 and SSE are not all that similar to most of the classic x86 instructionset. It's somewhere between 68k and RISC. A lot of the concepts of the classic x86 instructionset are not applied in these extensions.

I don't think the use of a stack/accumulator has anything to do with memory speeds, but rather with the complexity of the internal units. It's a lot easier to latch to a known unit/location than to one of many units/locations. It's really a way of limiting where operations take place.

I don't think that has anything to do with it. Why not? Because you can (and will) always use an internal register for that. There's no need to explicitly expose it to the programmer though, and let him use two instructions rather than one (one load and one store).

Nothing is really backwards when you relate software to the hardware it executes on.

That is my point, but perhaps you didn't fully understand it yet.
x86 was not a high-end CPU. It was meant for microcomputers.
Back in those days, there was a huge difference between what you had on your desktop and where the cutting-edge software and hardware-developments took place: mainframes, minicomputers... that sort of thing. The so-called Big Iron.
*THAT* is where compiler development took place. Not on simple PCs. Those PCs weren't powerful enough. In those days you didn't just open up a GUI with an IDE on your PC and compile some code while you wait. As I said, it wasn't until the early 90s that compilers and simple (text-based) IDEs became commonplace on PCs.
Heck, if you look at the most popular computer of the 80s, the C64... It didn't even *have* any compilers, because it was just physically incapable of running any. With only 64 KB of memory and a clock just short of 1 MHz, it just wasn't possible to try and write some C/C++ code (heck, the C64 didn't even have mul or div instructions in hardware. You had to implement those yourself, and optimize them for each purpose. Don't process more bits than you have to! Always a nice pop-quiz for the younger generation of programmers: How do you implement a mul or div with just adds, subs, shifts and compares?).
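For the curious: that pop-quiz can be sketched in C (the names mul8/div8 are mine, not from any actual C64 routine). Multiply is shift-and-add over the bits of one operand; divide is the classic restoring shift-and-subtract, one quotient bit per step. A 6502 version would do the same thing with adc/sbc/asl/rol.

```c
#include <stdint.h>

/* Shift-and-add multiply: for each set bit i of b, add a << i. */
static uint16_t mul8(uint8_t a, uint8_t b) {
    uint16_t result = 0;
    uint16_t addend = a;          /* 16-bit so the shifts can't overflow */
    while (b) {
        if (b & 1)
            result += addend;
        addend <<= 1;             /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return result;
}

/* Restoring shift-and-subtract divide (assumes divisor != 0). */
static uint8_t div8(uint8_t dividend, uint8_t divisor, uint8_t *remainder) {
    uint8_t quotient = 0;
    uint8_t rem = 0;
    for (int i = 7; i >= 0; i--) {
        rem = (uint8_t)((rem << 1) | ((dividend >> i) & 1)); /* bring down next bit */
        quotient <<= 1;
        if (rem >= divisor) {     /* does the divisor fit? */
            rem -= divisor;
            quotient |= 1;
        }
    }
    *remainder = rem;
    return quotient;
}
```

Note how both loops only touch as many bits as the operand width; on an 8-bit CPU you'd specialize these further (e.g. 8x8 vs 16x16) exactly because processing fewer bits is cheaper.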
At one point I found a simple Pascal compiler for the C64... but it could only handle a few hundred lines of code at best. It just wasn't possible to write anything meaningful. That's why pretty much everything on a C64 was written completely in assembly (even stuff like Geos).

So no, compilers were developed on stuff like PDP-8, PDP-11 and all that. The same place where unix came from, among other things. x86 didn't have a whole lot to do with any of that.
 
Last edited:

Cogman

Lifer
Sep 19, 2000
10,286
145
106
You can do memory-to-memory on x86, that's what movs is for. The beauty of the 68k addressing modes is that you can do push/pop and lods/stos/movs all with the same instruction and two addressing modes (and no need for a direction flag either).

Wait, you can? http://faydoc.tripod.com/cpu/mov.htm AFAIK you were only allowed register -> memory or immediate -> memory, but not memory -> memory.

Is there an instruction besides mov that does memory -> memory on the x86 architecture? Or am I just not understanding the documentation of the mov instruction.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Wait, you can? http://faydoc.tripod.com/cpu/mov.htm AFAIK you were only allowed register -> memory or immediate -> memory, but not memory -> memory.

Is there an instruction besides mov that does memory -> memory on the x86 architecture? Or am I just not understanding the documentation of the mov instruction.

I said 'movs', not 'mov': http://faydoc.tripod.com/cpu/movs.htm
The 's' suffix means 'string operation'. x86 has a bunch of those, lods, stos, scas, and movs. They can be combined with rep, to repeat the operation 'cx' times.
That's why si and di exist. Source index and Destination index.
movsb copies a byte from [si] to [di], and increases (or decreases, depending on the direction flag) si and di to advance in the string.
movsw and movsd do the same for words and dwords respectively.
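A C model of the incrementing case (direction flag clear) makes the semantics concrete; this sketch leaves out the decrementing mode entirely:

```c
#include <stddef.h>
#include <stdint.h>

/* Model of 'rep movsb' with the direction flag clear:
 * copy one byte from [si] to [di], advance both, repeat cx times. */
static void rep_movsb(uint8_t *di, const uint8_t *si, size_t cx) {
    while (cx--) {
        *di++ = *si++;   /* movsb: [di] = [si], then si++, di++ */
    }
}
```

Which is of course just memcpy in disguise; x86 compilers and C libraries have often emitted rep movs sequences to implement memcpy for exactly that reason.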
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
I said 'movs', not 'mov': http://faydoc.tripod.com/cpu/movs.htm
The 's' suffix means 'string operation'. x86 has a bunch of those, lods, stos, scas, and movs. They can be combined with rep, to repeat the operation 'cx' times.
That's why si and di exist. Source index and Destination index.
movsb copies a byte from [si] to [di], and increases (or decreases, depending on the direction flag) si and di to advance in the string.
movsw and movsd do the same for words and dwords respectively.

ah, gotcha. carry on then.