1x256bit CPU or 4x64bit, what's the impact?

HypNoTic · Sep 17, 2001

Which would be more efficient in a given environment at a given clock speed?

A CPU with 1 pipeline x 256-bit data bus, or one with 4 pipelines x 64-bit data bus?

I suppose with a 256-bit compiler the first one would be more efficient, but with 64-bit instructions, what would be the impact on the first CPU and on the other?

I'm not sure about this, but might having 4 pipelines of 64 bits be equivalent to having 4 CPUs in SMP when you run a multi-threaded application?

BurntKooshie · Sep 17, 2001

I think you have a few things mixed up.

CPUs can have varying "databus" widths and still be the same "bitness". A 32-bit CPU could have an 8-bit bus feeding it, but that'd be kinda slow. The "bitness" of a CPU refers, at least as I understand it, to the number of bits (places that can hold a 1 or a 0) it uses for integer values.

I think what you're thinking of is the size of the registers. A 32-bit CPU has 32-bit integer registers, meaning it can represent unsigned numbers all the way up to (2 ^ 32) - 1. Go past that and you get overflow: every bit, including the last available one (leftmost, if you're thinking of it going from right to left), is already a 1, and there would be a carry, except there are no more bits left to store the spill-over.
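To make the overflow concrete, here is a minimal sketch in Python that emulates an unsigned 32-bit register by masking. The names `MASK32` and `add32` are made up for illustration; a real CPU does this in hardware and typically reports the lost bit in a carry flag.

```python
# Sketch: emulate an unsigned 32-bit register with a bit mask.
# MASK32 and add32 are illustrative names, not part of any real API.
MASK32 = (1 << 32) - 1  # (2 ^ 32) - 1, the largest value 32 bits can hold


def add32(a, b):
    """Add like a 32-bit unsigned register: return (result, carry_out)."""
    total = a + b
    # Keep only the low 32 bits; whatever spilled past bit 31 is the carry.
    return total & MASK32, total >> 32


print(add32(1, 1))       # (2, 0): fits, no carry
print(add32(MASK32, 1))  # (0, 1): all bits were 1, so the result wraps to 0
```

The second call is the "no more bits left" case: the register ends up holding 0 and the spill-over only survives as the carry.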

Now with a 256-bit register, you could have integers that go all the way the heck up to (2 ^ 256) - 1. That's pretty damned big. Will we one day make use of that? Maybe. Not any time soon. Few people actually need more than 32-bit processing (more people are running into the problem of only 32-bit flat addressing, but that's another issue).

A 256-bit register doesn't inherently mean the CPU could process four data elements (each 64 bits in size) at the same time. Doing so is called SWAR computing (SIMD Within A Register). It requires all four data elements to have the exact same instruction applied to them (hence Single Instruction Multiple Data). That happens in a lot of code, but not all of it: many things need different instructions performed on them, so not everything can be done in SIMD easily.
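As a rough illustration of the SWAR idea, the sketch below packs four 64-bit lanes into one 256-bit Python integer and adds all four lanes with a single wide add. This is the usual software masking trick, not a description of how any particular chip does it, and `pack`, `unpack` and `swar_add` are hypothetical helper names.

```python
# Sketch: four 64-bit lanes inside one 256-bit word (SWAR-style add).
LANES = 4
LANE_BITS = 64
LANE_MASK = (1 << LANE_BITS) - 1
FULL = (1 << (LANES * LANE_BITS)) - 1
# HIGH has only the top bit of every lane set; LOW has every other bit set.
HIGH = sum(1 << (i * LANE_BITS + LANE_BITS - 1) for i in range(LANES))
LOW = FULL ^ HIGH


def pack(values):
    """Place each value in its own 64-bit lane of a 256-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 256-bit word back into its four 64-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def swar_add(a, b):
    """Add lane by lane; a carry never crosses into the neighbouring lane."""
    # Add the low 63 bits of each lane; any carry lands in that lane's top bit.
    low_sum = (a & LOW) + (b & LOW)
    # Fold in the original top bits with XOR, which drops the carry out.
    return low_sum ^ ((a ^ b) & HIGH)


x = pack([1, LANE_MASK, 5, 1 << 63])
y = pack([2, 1, 7, 1 << 63])
print(unpack(swar_add(x, y)))    # [3, 0, 12, 0]: each lane wraps on its own
print(unpack((x + y) & FULL))    # [3, 0, 13, 0]: a plain add leaks a carry into lane 2
```

Note that one operation (add) was applied to all four lanes. If lane 0 needed an add and lane 1 a multiply, this trick would not help, which is the limitation described above.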

So a 256-bit functional unit could handle non-SIMD data, I would think, but would only be able to operate on one data element at a time. This would, of course, be a waste.

Four 64-bit functional units would be better in most respects, assuming the data-element size is 64 bits. That way it could process four data elements at once, whether they all have the same operation performed on them (each would still need its own instruction, however) or they are four different data elements with four different instructions being performed on them.

Just to be clear, I went from registers to functional units (in our case, the functional unit would be an integer unit). This is because a 64-bit functional unit is the thing that actually does the work, and it stores and retrieves its information from registers.

Having four functional units is somewhat, kinda-ish-but-not-really like having SMP on a chip. Four functional units would mean a 4-way superscalar core (scalar meaning one instruction at a time, superscalar meaning more than one). The functional units can do their thing on data from one particular thread (currently), and nothing more. So a 4-way superscalar core can perform up to 4 instructions per cycle (assuming it is pipelined) from one thread. An 8-way superscalar core could perform up to 8 instructions per cycle (again, assuming it's pipelined) from one thread.
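A toy model may help show why issue width is only an upper bound. The sketch below assumes a heavily idealized core: every instruction takes one cycle, any ready instruction can go to any functional unit, and the only limits are the width and data dependencies. `cycles_needed` and the `deps` format are invented for this example.

```python
# Sketch: idealized superscalar issue model (single-cycle instructions,
# perfect scheduling). cycles_needed and deps are illustrative names.
def cycles_needed(deps, width):
    """deps[i] lists the earlier instructions that instruction i waits on."""
    finished = set()                 # instructions completed in earlier cycles
    pending = list(range(len(deps)))
    cycles = 0
    while pending:
        cycles += 1
        issued = []
        for i in pending:
            if len(issued) == width:
                break                # every functional unit is busy this cycle
            if all(d in finished for d in deps[i]):
                issued.append(i)     # all inputs are ready, so issue it
        finished.update(issued)
        pending = [i for i in pending if i not in finished]
    return cycles


independent = [[] for _ in range(8)]                 # 8 unrelated instructions
chain = [[]] + [[i] for i in range(7)]               # each needs the previous one

print(cycles_needed(independent, 1))  # 8
print(cycles_needed(independent, 4))  # 2
print(cycles_needed(independent, 8))  # 1
print(cycles_needed(chain, 4))        # 8: extra units sit idle
```

With independent work the wider core finishes sooner, but a dependency chain from a single thread runs no faster on four units than on one. That gap is what the multithreading approaches discussed next try to fill.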

To have the functional units work on more than one thread at once, you either have to break the die up into multiple CPUs (called CMP, or chip-level multiprocessing), or do something like SMT (simultaneous multithreading). CMP means that they are physically separate CPUs (probably identical), each with its resources cut in half compared to a regular one (or cut to a quarter if it's 4 CPUs on a die, etc). SMT is about sharing the resources of a chip (though some have to be duplicated). In an SMT machine, the CPU can either work on one thread, just like a regular superscalar, or work on as many threads as it has support for (an n-way SMT machine could work on n threads at the same time).

Um....if you have any other questions, or if I confused you, LMK

SinNisTeR · Sep 17, 2001

i was gonna say the same exact thing, he beat me to it.. hheehehe

Train · Sep 17, 2001

BK, your way, right away.

cookieman · Sep 20, 2001

HypNoTic: did BurntKooshie hypnotize you?

Cheers,

RaynorWolfcastle · Sep 20, 2001

Side question for this. When video consoles say they're 64-bit (N64) or 128-bit, what exactly does this "bitness" refer to? Why do they need 128 bits when the computer is only beginning to make the transition to 64-bit, and that's mainly to have access to more memory (and, in Intel's case, to get rid of x86)? Can someone shed some light on this for me please?

-Ice

MustPost · Sep 20, 2001

I have always wondered about that too.
I would guess consoles are referring to a different thing,
but few people actually know what it is.

Sohcan · Sep 20, 2001

<< What exactly does this "bitness" refer to? >>

Usually it's the width of the general-purpose registers or the Arithmetic Logic Unit, but it's also a bit of a marketing term for consoles. At one time, the transition from 8-bit to 16-bit to 32-bit CPUs was important for consoles, because it allowed for flat addressing of more memory. The increased memory as well as faster CPUs (and more advanced architectures) are what allowed for better graphics. Now Sony made a big deal about its Emotion Engine for the PS2, claiming it's a 128-bit CPU...that's not entirely true, at least by the standard definition. The Emotion Engine actually uses the standard 64-bit MIPS III core, and uses a 128-bit SIMD vector coprocessor...but guess what...the G4 and Athlon/P3/P4 already have 128-bit SIMD with AltiVec and SSE, respectively (though granted in a limited fashion with current implementations of SSE).

For non-SIMD integer math, having a 64-bit CPU vs. a 32-bit CPU doesn't imply that the 64-bit CPU is twice as fast. SISD arithmetic (single instruction, single data) operates on two operands and yields one result; if you do 1 + 1, it doesn't matter whether you express the numbers with 32 bits or 64, the instruction will still do only one add. Floating-point arithmetic does benefit from wider adders in terms of accuracy, but FP implementations already have a high degree of precision (x87 uses 80-bit internal precision). SIMD (MMX, 3DNow, SSE1/2) does grant increased throughput from wider units, since you can do more operations in parallel (i.e., a 128-bit SIMD add can do four 32-bit adds at once), but programming for SIMD instructions introduces a number of other issues.

But anyway, just like PCs, the bottleneck for video performance lies not in the CPU but in the graphics processor and memory and system bandwidth.
