k5's get 3.3KKeys/mhz?

toph99 · Oct 25, 2000

Why do they get such a high number, when the K6-2s and K6-3s are all at nearly half that and an Athlon is just barely faster?
I just read it on the speed page.

nickdakick · Oct 25, 2000

You're right, it should be near a Pentium Classic, so 1.3 should be the correct number. 😕

Crow · Oct 25, 2000

No, the rate of 3.3 kkeys/MHz is correct. The K5s used to be the fastest CPUs out per MHz until the Athlon arrived.

Not sure why, but I am sure someone will post the answer.

BurntKooshie · Oct 25, 2000

No, K5s were incredibly advanced for the time, in terms of the amount of work done per clock cycle. That's part of the reason why the chip was so late: it was so aggressive in design. It's also why it didn't ramp very well.

The K5, contrary to popular belief, was an incredible little chip. It was just late to market, and didn't scale well. That was back in the day when people still tried to make "brainiac" CPUs. Now everyone has joined the "speed demon" bandwagon, which isn't necessarily a bad thing.

Long story short? K5s do much more work per clock cycle, which is partially why they do so well. K6-x's do so poorly because RC5 relies heavily on one instruction (ROL), and it's slow on the K6-x. There was no reason for it to be fast (that instruction isn't used much), so they let it be slow (which allows the chip to scale better). The Athlon is both a brainiac in some sense and a speed demon. 'Nuff said.

BK

LeBlatt · Oct 25, 2000

Hence the K5 code name, which was supposed to be the "Pentium Killer".

toph99 · Oct 25, 2000

So if I could find one, would a K5 be a better upgrade for a Socket 7 board for cracking RC5?

JHutch · Oct 25, 2000

MHz for MHz, the K5 was great. However, its biggest failing was its inability to ramp to higher MHz ratings. Hence, a 100 MHz K5 could do 330 kkeys/sec, but a 200 MHz K6 could do 340 kkeys/sec. And most K6 chips go quite a bit higher than 200 MHz...

So, no, a K5 is not the best upgrade for a socket 7 board.
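To make the arithmetic concrete, here is a rough sketch in Python using only the two figures quoted above. The function name and the 400 MHz K6 clock are illustrative assumptions, not measured numbers, and the projection assumes the rate scales linearly with clock speed.

```python
def kkeys_per_mhz(kkeys_per_sec, clock_mhz):
    """Per-clock efficiency: RC5 key rate divided by clock speed."""
    return kkeys_per_sec / clock_mhz

# Figures quoted in the post above.
k5_rate = kkeys_per_mhz(330, 100)   # 100 MHz K5
k6_rate = kkeys_per_mhz(340, 200)   # 200 MHz K6

# Hypothetical faster K6 (400 MHz is an assumed example clock),
# assuming the key rate scales linearly with MHz.
assumed_k6_mhz = 400
projected_k6_kkeys = k6_rate * assumed_k6_mhz

print(f"K5: {k5_rate:.1f} kkeys/MHz, K6: {k6_rate:.1f} kkeys/MHz")
print(f"K6 at {assumed_k6_mhz} MHz (assumed): about {projected_k6_kkeys:.0f} kkeys/sec")
```

The K5 wins per clock, but a K6 that clocks high enough overtakes it on total throughput.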

JHutch

Fandu · Oct 25, 2000

Everyone is on the right track. There is exactly one main reason why the K5 is able to achieve such high RC5 rates (and a couple of other smaller ones 🙂 ). The K5 is able to process the ROTL instruction in 1 hardware step, whereas every other CPU (that I know of) must break down the instruction and execute a tight loop of simpler instructions. ROTL is a 32-bit rotate left, for those wondering. And this just happens to be a heavily used instruction for encryption using the RC5 algorithm. Check here for detailed information.
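For those who have not seen it, a 32-bit rotate left and its place in an RC5 round can be sketched as follows. This is a minimal Python illustration, not the actual client core; the function names and the round-key values in the usage line are made up for the example. The round itself follows the published RC5 definition, where each half is XORed with the other, rotated by a data-dependent amount, and added to a round key.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits

def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits (only the low 5 bits of n are used)."""
    x &= MASK32
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def rc5_round(a, b, s_even, s_odd):
    """One RC5-32 encryption round; s_even and s_odd are that round's two subkeys."""
    a = (rotl32(a ^ b, b) + s_even) & MASK32
    b = (rotl32(b ^ a, a) + s_odd) & MASK32
    return a, b

# Hypothetical inputs and round keys, for illustration only.
a, b = rc5_round(1, 1, 5, 7)
print(hex(rotl32(0x12345678, 8)), a, b)
```

Every round performs two rotates whose shift count depends on the data, so a CPU that handles a variable rotate slowly pays that cost twice per round for every key tested.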

Now that, combined with the K5's 8 stage pipeline, is how you get a 4 year old CPU having higher keys/MHz performance than today's CPUs. The K5 also has an excellent branch prediction unit (8,192-entry BHT with its own two-level GAs predictor). The excellent branch prediction and short pipeline make for very few mispredictions, and when there is a misprediction the penalty is not large: only 8 clocks for the K5 compared to 28 for the upcoming P4.

JHutch · Oct 25, 2000

Fandu,

Isn't that predictor you describe the K6 predictor? Nit-picky, I know... But, you were dead-on with the ROTL instruction stuff.

JHutch

Fandu · Oct 25, 2000

I thought they used the same predictor in both the K5 and K6... hmmm, maybe my mind is failing me, I should check up on that...

Edit:

Correct you are. The K5 uses a 16K, 1024 instruction dynamic branch prediction system, and does not maintain a BHT or use a prediction algorithm. Instead, it assigns a "branch bit" to each branch as it passes through the pipeline, and it changes this bit based on the instructions ahead of the branch in the pipeline.

<< Branch prediction is handled a little differently than in other advanced microprocessors. Instead of maintaining a separate branch target buffer to hold the addresses of predicted branches, the K5 appends the predicted address to the branch instruction during predecode. This 10-bit tag, called a successor index, points to a target within the I-cache.

At first, all predecoded branch instructions are predicted not taken. Later, if speculative execution reveals that the prediction was wrong, the prediction is reversed by writing a new successor index that points to the correct cache block. That prediction remains in effect until it's wrong again. In other words, the prediction is reversed every time it's wrong.

This is one reason why the cache blocks are only 16 bytes in size. The K5 can predict only one taken branch per block, so a smaller block reduces the chance that an instruction will branch to another branch in the same block. A 32-byte cache block would reduce performance, according to AMD's simulations.

Although the branch prediction is "dynamic" in the sense that it adapts to wrong predictions at run time, it does so merely by reversing its predictions in a binary flip-flop. In contrast, some of the latest RISC processors use algorithms that dynamically predict the outcome of branches by keeping track of how often a particular branch is actually taken. But RISC chips don't have to bother with complicated x86 decoding. By adopting a somewhat simpler form of branch prediction, the K5 keeps an already complicated decoder from becoming even more labyrinthine.

There is another advantage to the K5's approach: In effect, it predicts branches over a larger sample of the program than other methods. Branch target buffers have a limited number of entries, usually a few dozen. However, the K5 can theoretically predict a branch in every cache block. Since the block size is 16 bytes and the I-cache is 16 KB, that's potentially 1024 branches. This larger sample--coupled with the K5's flexible cache fetching--partly offsets its less sophisticated predictions. Of course, when the cache is flushed, all the prediction states are lost, too, because they're tagged to the instructions instead of being held in a branch target buffer.

To make this whole mechanism complete, the K5's byte queue can trigger a special signal called BQ confused. It waves this flag when the predecoded instructions don't appear to make sense because of a mispredicted branch or some other anomaly. The signal wipes out the incoherent cache blocks and reloads them with freshly predecoded instructions. Johnson says this rarely happens, but it is so reliable that it once masked a bug in the K5's critical logic path during the chip's early development. Even though not even AMD would claim the K5 is a fault-tolerant processor, it's comforting to know there's a mechanism of last resort that is robust enough to handle a logic glitch and confused code.
>>
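The flip-on-miss behaviour described in the quote can be sketched in a few lines of Python. This is a toy model of a single branch under the stated rules (start at not taken, reverse the prediction every time it is wrong), not a model of the real K5 hardware; the function name and the sample branch patterns are made up for illustration.

```python
def count_mispredictions(outcomes):
    """Simulate a one-bit predictor that starts at 'not taken' and
    reverses its prediction every time it is wrong."""
    predicted_taken = False
    misses = 0
    for taken in outcomes:
        if predicted_taken != taken:
            misses += 1
            predicted_taken = not predicted_taken  # reverse on a wrong guess
    return misses

# A loop branch: taken 9 times, then falls through once, run 3 times.
loop_pattern = ([True] * 9 + [False]) * 3
# A branch that alternates on every execution.
alternating = [True, False] * 5

loop_misses = count_mispredictions(loop_pattern)
alternating_misses = count_mispredictions(alternating)

# Upper bound on tracked branches: one taken branch per 16-byte block of a 16 KB I-cache.
predictable_branches = (16 * 1024) // 16

print(loop_misses, "of", len(loop_pattern), "loop branches mispredicted")
print(alternating_misses, "of", len(alternating), "alternating branches mispredicted")
print(predictable_branches, "branches predictable at most")
```

A simple loop costs about two misses per pass (one on entry, one on exit), while a strictly alternating branch is the worst case for this scheme, since the single bit is wrong every time.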

nickdakick · Oct 25, 2000

Damn, so I had my outing with my first post? 🙁 I'll downgrade right now.
