
K5s get 3.3 kkeys/MHz?

No, the rate of 3.3 kkeys/MHz is correct. The K5 used to be the fastest CPU available per MHz,
until the Athlon arrived.

Not sure why, but I am sure some will post the answer.
 
No, K5s were incredibly advanced for their time in terms of the amount of work done per clock cycle. That's part of the reason it was so late: the design was very aggressive. It's also why it didn't ramp very well.

The K5, contrary to popular belief, was an incredible little chip. It was just late to market and didn't scale well. That was back in the day when people still tried to make "brainiac" CPUs... now everyone has jumped on the "speed demon" bandwagon, which isn't necessarily a bad thing.

Long story short? K5s do much more work per clock cycle, which is partly why they do so well. K6-x chips do so poorly because RC5 relies heavily on one instruction (ROL), and it's slow on the K6-x. There was no reason for it to be fast (that instruction isn't used much), so they let it be slow (which lets the chip scale better). The Athlon is both a brainiac in some sense and a speed demon. 'Nuff said.

BK
 
MHz for MHz, the K5 was great. However, its biggest failing was its inability to ramp to higher MHz ratings. Hence, a 100 MHz K5 could do 330 kkeys/sec, but a 200 MHz K6 could do 340 kkeys/sec. And most K6 chips go quite a bit higher than 200 MHz...

So, no, a K5 is not the best upgrade for a Socket 7 board.
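To make the per-clock comparison concrete, here is a small Python sketch that divides the kkeys/sec figures quoted above by clock speed. The 300 MHz K6 line is a hypothetical extrapolation that assumes RC5 throughput scales linearly with clock, which real chips only roughly do.

```python
def kkeys_per_mhz(kkeys_per_sec: float, mhz: float) -> float:
    """Per-clock RC5 throughput: thousands of keys per second per MHz."""
    return kkeys_per_sec / mhz


def scaled_rate(kkeys_per_sec: float, mhz: float, target_mhz: float) -> float:
    """Hypothetical rate at another clock, assuming perfectly linear scaling."""
    return kkeys_per_sec * target_mhz / mhz


# Figures quoted in the thread: (kkeys/sec, clock in MHz).
chips = {
    "K5 @ 100 MHz": (330, 100),
    "K6 @ 200 MHz": (340, 200),
}

for name, (rate, mhz) in chips.items():
    print(f"{name}: {rate} kkeys/s, {kkeys_per_mhz(rate, mhz):.2f} kkeys/s per MHz")

# Assumed, not measured: a K6 pushed to 300 MHz under linear scaling.
print(f"K6 @ 300 MHz (extrapolated): {scaled_rate(340, 200, 300):.0f} kkeys/s")
```

The K5 gets roughly twice the work per clock (3.30 vs 1.70), but the K6's higher clock still wins on total throughput, which is the point being made above.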

JHutch
 
Everyone is on the right track. There is exactly one reason why the K5 is able to achieve such high RC5 rates (and a couple of other smaller ones 🙂). The K5 is able to process the ROTL instruction in 1 hardware step, whereas every other CPU (that I know of) must break down the instruction and execute a tight loop of simpler instructions. ROTL is a 32-bit rotate left, for those wondering, and it just happens to be a heavily used operation in the RC5 encryption algorithm. Check here for detailed information.
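For anyone wondering why a single rotate instruction matters so much, here is a minimal Python sketch of RC5 with 32-bit words and 12 rounds (the word size and round count are assumptions chosen for illustration). Every round uses data-dependent rotations, where the rotate count comes from the other half of the block, and the key schedule is built on rotates too. The sketch only shows where the rotates sit in the algorithm; it says nothing about how fast any particular CPU executes them, and it is not an audited implementation.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """32-bit rotate left (ROTL); only the low 5 bits of the count are used."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x: int, n: int) -> int:
    """32-bit rotate right, the inverse of rotl32."""
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK32


def rc5_expand_key(key: bytes, rounds: int = 12) -> list[int]:
    """RC5-32 key schedule: mixes the secret key into the table S using rotates."""
    P32, Q32 = 0xB7E15163, 0x9E3779B9          # RC5 magic constants for w = 32
    c = max(1, (len(key) + 3) // 4)            # key length in 32-bit words
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):      # load key bytes little-endian
        L[i // 4] = ((L[i // 4] << 8) + key[i]) & MASK32
    t = 2 * (rounds + 1)
    S = [(P32 + i * Q32) & MASK32 for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):
        A = S[i] = rotl32((S[i] + A + B) & MASK32, 3)
        B = L[j] = rotl32((L[j] + A + B) & MASK32, A + B)  # data-dependent rotate
        i = (i + 1) % t
        j = (j + 1) % c
    return S


def rc5_encrypt_block(A: int, B: int, S: list[int], rounds: int = 12) -> tuple[int, int]:
    """Encrypt one 64-bit block given as two 32-bit halves."""
    A = (A + S[0]) & MASK32
    B = (B + S[1]) & MASK32
    for i in range(1, rounds + 1):
        # Each half is rotated by an amount taken from the other half.
        A = (rotl32(A ^ B, B) + S[2 * i]) & MASK32
        B = (rotl32(B ^ A, A) + S[2 * i + 1]) & MASK32
    return A, B


def rc5_decrypt_block(A: int, B: int, S: list[int], rounds: int = 12) -> tuple[int, int]:
    """Undo rc5_encrypt_block by running the rounds backwards with rotr32."""
    for i in range(rounds, 0, -1):
        B = rotr32((B - S[2 * i + 1]) & MASK32, A) ^ A
        A = rotr32((A - S[2 * i]) & MASK32, B) ^ B
    B = (B - S[1]) & MASK32
    A = (A - S[0]) & MASK32
    return A, B
```

With 12 rounds the key-schedule loop runs 3 × max(26, c) = 78 times for any key up to 104 bytes, with two rotates per pass, and the encryption adds 24 more. So a brute-force key search issues a lot of rotates for every key it tries.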

Now, that combined with the K5's 8-stage pipeline is how you get a 4-year-old CPU with higher keys/MHz performance than today's chips. The K5 also has an excellent branch prediction unit (8,192-entry BHT with its own two-level GAs predictor). The excellent branch prediction and short pipeline make for very few mispredictions, and when there is a misprediction the penalty is not large: only 8 clocks for the K5, compared to 28 for the upcoming P4.
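The point about misprediction penalties can be put as simple arithmetic: extra cycles per instruction ≈ (fraction of instructions that are branches) × (misprediction rate) × (penalty in clocks). The 20% branch fraction and 10% misprediction rate below are hypothetical round numbers, and the same rate is used for both chips so that only the penalty (8 clocks for the K5, 28 for the P4, as given in the post) differs.

```python
def mispredict_cpi(branch_fraction: float, mispredict_rate: float, penalty_clocks: int) -> float:
    """Average stall cycles added per instruction by branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_clocks


# Hypothetical workload parameters (assumptions, not measurements).
BRANCH_FRACTION = 0.20   # 1 in 5 instructions is a branch
MISPREDICT_RATE = 0.10   # 1 in 10 branches is mispredicted

for chip, penalty in [("K5", 8), ("P4", 28)]:
    extra = mispredict_cpi(BRANCH_FRACTION, MISPREDICT_RATE, penalty)
    print(f"{chip}: {penalty}-clock penalty adds about {extra:.2f} cycles per instruction")
```

Because the cost is linear in the penalty, at the same miss rate the 28-clock pipeline pays 3.5 times as much per miss; a better predictor on the longer pipeline has to claw that back through a lower miss rate.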
 
Fandu,

Isn't that predictor you describe the K6 predictor? Nit-picky, I know... But, you were dead-on with the ROTL instruction stuff.

JHutch
 
I thought they used the same predictor in both the K5 and K6... hmmm, maybe my mind is failing me, I should check up on that...

Edit:

Correct you are. The K5 uses a 16K, 1024-instruction dynamic branch prediction system, and does not maintain a BHT or use a prediction algorithm. Instead, it assigns a "branch bit" to each branch as it passes through the pipeline, and changes this bit based on the instructions ahead of the branch in the pipeline.



<< Branch prediction is handled a little differently than in other advanced microprocessors. Instead of maintaining a separate branch target buffer to hold the addresses of predicted branches, the K5 appends the predicted address to the branch instruction during predecode. This 10-bit tag, called a successor index, points to a target within the I-cache.

At first, all predecoded branch instructions are predicted not taken. Later, if speculative execution reveals that the prediction was wrong, the prediction is reversed by writing a new successor index that points to the correct cache block. That prediction remains in effect until it's wrong again. In other words, the prediction is reversed every time it's wrong.

This is one reason why the cache blocks are only 16 bytes in size. The K5 can predict only one taken branch per block, so a smaller block reduces the chance that an instruction will branch to another branch in the same block. A 32-byte cache block would reduce performance, according to AMD's simulations.

Although the branch prediction is "dynamic" in the sense that it adapts to wrong predictions at run time, it does so merely by reversing its predictions in a binary flip-flop. In contrast, some of the latest RISC processors use algorithms that dynamically predict the outcome of branches by keeping track of how often a particular branch is actually taken. But RISC chips don't have to bother with complicated x86 decoding. By adopting a somewhat simpler form of branch prediction, the K5 keeps an already complicated decoder from becoming even more labyrinthine.

There is another advantage to the K5's approach: In effect, it predicts branches over a larger sample of the program than other methods. Branch target buffers have a limited number of entries, usually a few dozen. However, the K5 can theoretically predict a branch in every cache block. Since the block size is 16 bytes and the I-cache is 16 KB, that's potentially 1024 branches. This larger sample--coupled with the K5's flexible cache fetching--partly offsets its less sophisticated predictions. Of course, when the cache is flushed, all the prediction states are lost, too, because they're tagged to the instructions instead of being held in a branch target buffer.

To make this whole mechanism complete, the K5's byte queue can trigger a special signal called BQ confused. It waves this flag when the predecoded instructions don't appear to make sense because of a mispredicted branch or some other anomaly. The signal wipes out the incoherent cache blocks and reloads them with freshly predecoded instructions. Johnson says this rarely happens, but it is so reliable that it once masked a bug in the K5's critical logic path during the chip's early development. Even though not even AMD would claim the K5 is a fault-tolerant processor, it's comforting to know there's a mechanism of last resort that is robust enough to handle a logic glitch and confused code.
>>
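The quoted scheme can be modeled as a one-bit predictor: every branch starts out predicted not taken, and the prediction is simply reversed whenever it turns out wrong. The Python sketch below compares that with a two-bit saturating counter, used here only as one common example of the "track how often a branch is taken" approach the article attributes to some RISC chips (the article does not say which scheme those chips used). It also checks the article's arithmetic that a 16 KB I-cache with 16-byte blocks has 1024 blocks, which a 10-bit successor index can address. The simulation follows a single branch and ignores the successor-index and cache-flush details.

```python
ICACHE_BYTES = 16 * 1024
BLOCK_BYTES = 16
NUM_BLOCKS = ICACHE_BYTES // BLOCK_BYTES     # at most one predicted branch per block
INDEX_BITS = (NUM_BLOCKS - 1).bit_length()   # bits needed to name any block


def simulate_one_bit(outcomes: list[bool]) -> int:
    """Flip-on-wrong predictor: start at not taken, reverse after each miss."""
    predict_taken = False
    misses = 0
    for taken in outcomes:
        if taken != predict_taken:
            misses += 1
            predict_taken = taken   # reversing the prediction == adopting the last outcome
    return misses


def simulate_two_bit(outcomes: list[bool]) -> int:
    """Saturating 2-bit counter: states 0-1 predict not taken, 2-3 predict taken."""
    counter = 1   # assumed starting state: weakly not taken
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# A backward loop branch: taken 9 times, then falls through; the loop is entered 10 times.
loop_branch = ([True] * 9 + [False]) * 10

print(NUM_BLOCKS, INDEX_BITS)
print("one-bit misses:", simulate_one_bit(loop_branch))
print("two-bit misses:", simulate_two_bit(loop_branch))
```

On this loop the flip-on-wrong predictor misses twice per pass (on loop entry and on loop exit), 20 in total, while the two-bit counter, once warmed up, misses only the loop exit, 11 in total. That is the kind of accuracy the article says the K5 gives up in exchange for a simpler decoder and prediction state for up to 1024 branches.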

 