AMD Clockspeeds

quizzelsnatch

Senior member
Nov 12, 2004
Why is it that Intel is able to ramp up clock speeds on the Pentium 4 while the Athlon XP/64 can't reach clock speeds as high?

I know that in general AMD has short pipelines, so they get a higher IPC, and Intel has long pipelines, so they get a lower IPC, but I don't see how that lets Intel raise their clock speeds so much higher...


(maybe a repost? too lazy to search)
 

complacent

Banned
Dec 22, 2004
Originally posted by: itachi
you spent all that time typing up a message that's 100 words long, yet you're too lazy to hit search? cmon..

Arguably all questions asked in this forum could be found with a search and the forum would go away. In the time that you spent typing a smartass reply you could have answered his question:

AMD and Intel processors are built with two different paradigms. AMD thinks that IPC is more important, so they design their chips with that in mind. Intel thinks clock speed is more important, so they design their chips with that in mind. Intel's chips use less energy and generate less heat and are able to operate at a higher clock speed. AMD's chips use more energy and generate more heat. AMD could produce a 3 GHz processor, but they would need to shift their focus to efficiency and heat dissipation instead of IPC optimization. Note that with adequate cooling one can overclock expensive AMD processors (the FX series) to a 2.7 GHz clock.

AMD isn't the only company that does this. Sun's UltraSPARC processors haven't even breached the 1.5 GHz mark. The processors in a Cray X1 have a clock of 800 MHz. The list goes on.

A caveat to this is the Intel Pentium M. The Pentium M has a shorter pipeline than a P4 and has a higher IPC. The Pentium M only runs at about 2.0 GHz right now, and those parts are very expensive. Intel announced that they will be shifting their focus to IPC with the P4s and even dropped their bid to get a 4 GHz processor out.
 

harrkev

Senior member
May 10, 2004
The short and sweet answer...

Every CPU has registers, and logic.

A register has an input and an output. The output is held until it sees the rising edge of a clock, then it copies the input to the output.

Logic simply has a lot of inputs and a lot of outputs. The outputs depend only upon the inputs, and it takes a certain amount of time for the inputs to cause a change in the outputs. Note that this time depends on the process and materials used in making the part, the complexity of that particular section of logic, temperature, and voltage.

A clock pulse hits, and all registers latch their data. Then, the outputs of the registers have to go through the logic and make it to the inputs of the next set of registers before the next clock pulse. So, in a real sense, it is mostly the speed of the logic that determines the maximum speed of the part. (there is some delay associated with the registers too, but we will assume that this is not the case to simplify the analysis)

So, let's assume that the data has to go through X amount of logic in order to be processed. Let's also assume Intel splits the logic into 20 pieces, with 21 sets of registers. So, the maximum clock speed is set by the time in which X/20 can be done. This is called pipelining. If you have a 20-stage pipeline, then each section of the pipe can process one instruction at a time. This means that there are 20 different instructions all in various stages of completion.

Now, let's assume that AMD only splits the X into 10 pieces, so their maximum clock speed is X/10. So their clock speed on average is half that of Intel.

This is the extremely simple guide to why one company has lower clock speeds. But you also need to keep in mind that this is not the whole picture when it comes to performance. Intel has a deeper pipeline, but more work must be "thrown out" in the event of a branch misprediction. So, just increasing the pipeline depth brings diminishing returns. I suspect that Intel is already at the point where increasing the pipeline depth would not help at all.

So, the extra-short answer is that AMD has slower clock cycles, but they do more in each cycle to make up for it.
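
To put rough numbers on the diminishing-returns point, here is a small Python sketch. It is only a toy model: the total logic delay, the register overhead, the fraction of branches, and the misprediction rate are all invented values, not measurements of any real chip. It also assumes (as a simplification) that every mispredicted branch costs a full pipeline refill, and it adds back the register delay that was ignored above.

```python
def clock_period_ns(total_logic_ns, stages, register_overhead_ns):
    """Time per clock: one slice of the logic plus the register delay."""
    return total_logic_ns / stages + register_overhead_ns


def throughput_per_ns(total_logic_ns, stages, register_overhead_ns,
                      branch_fraction, mispredict_rate):
    """Average instructions finished per nanosecond in this toy model."""
    period = clock_period_ns(total_logic_ns, stages, register_overhead_ns)
    # Ideal pipeline: 1 cycle per instruction. Assume each mispredicted
    # branch throws away `stages` cycles of work (a full refill).
    cycles_per_instruction = 1.0 + branch_fraction * mispredict_rate * stages
    return 1.0 / (cycles_per_instruction * period)


if __name__ == "__main__":
    # All numbers below are invented for illustration only.
    for stages in (10, 20, 40, 80):
        rate = throughput_per_ns(10.0, stages, 0.1, 0.2, 0.05)
        print(f"{stages:3d} stages: {rate:.2f} instructions/ns")
```

With these made-up inputs, each doubling of the pipeline depth still helps, but by a smaller factor every time, because the fixed register delay and the growing flush penalty eat into the gain.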
 

itachi

Senior member
Aug 17, 2004
Originally posted by: complacent
Arguably all questions asked in this forum could be found with a search and the forum would go away. In the time that you spent typing a smartass reply you could have answered his question:
uh huh.. so i suppose it only took you a few seconds to write that post hm?
well... I wasn't sure what to search, I didn't want to go through a bunch of posts not pertaining to what I was looking for. And it's not like it's that hard of a question.
it isn't a hard question.. but it gets covered like 10 times every day. wouldn't care if i didn't read it 5 times before i came here.
 

frootbooter

Member
Dec 3, 2004
Originally posted by: harrkev
So, let's assume that the data has to go through X amount of logic in order to be processed. Let's also assume Intel splits the logic into 20 pieces, with 21 sets of registers. So, the maximum clock speed is set by the time in which X/20 can be done. This is called pipelining. If you have a 20-stage pipeline, then each section of the pipe can process one instruction at a time. This means that there are 20 different instructions all in various stages of completion.

Now, let's assume that AMD only splits the X into 10 pieces, so their maximum clock speed is X/10. So their clock speed on average is half that of Intel.

Right, but uhh... X/20 (intel) < X/10 (amd), so that would mean that AMD can reach faster clock speeds...

We know what you meant though ;)
 

CTho9305

Elite Member
Jul 26, 2000
Originally posted by: frootbooter
Right, but uhh... X/20 (intel) < X/10 (amd), so that would mean that AMD can reach faster clock speeds...

We know what you meant though ;)

What? He's correct. Intel divides the work into 20 steps, so if the work takes a total of 20 seconds, Intel's design can be clocked once per second, or 1000mHz (millihertz). The AMD design does 20 seconds of work in 10 steps, each step taking two seconds: 500mHz. Of course, in reality, it's nanoseconds/picoseconds of work instead of seconds.

edit: sorry, you're right, he said "clock speed" is X/10, when he means "clock period" - for speed, the fractions should be inverted.
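
The period-versus-speed mix-up is easy to see with the 20-second example written out. A minimal Python sketch, assuming (as in the example) that the work splits evenly across the stages and that the registers add no delay; the function names are just for illustration:

```python
def clock_period(total_work_time, stages):
    """Time per clock cycle: the total work divided evenly over the stages."""
    return total_work_time / stages


def clock_frequency(total_work_time, stages):
    """Cycles per unit time: the inverse of the period."""
    return stages / total_work_time


if __name__ == "__main__":
    total = 20.0  # seconds of work, as in the example
    for name, stages in (("20-stage design", 20), ("10-stage design", 10)):
        period = clock_period(total, stages)
        freq = clock_frequency(total, stages)
        print(f"{name}: period {period} s, frequency {freq} Hz")
```

The period shrinks as the stage count grows (X/20 is less than X/10), and the frequency, being the inverse, goes the other way.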
 

itachi

Senior member
Aug 17, 2004
Originally posted by: frootbooter
Right, but uhh... X/20 (intel) < X/10 (amd), so that would mean that AMD can reach faster clock speeds...

We know what you meant though ;)
what he was saying was based on theory.. tAMD / 10 ~= tINTEL / 20.

building off what harrkev said.. in a pipeline there are 4 primary stages: fetch, decode, execute, and write-back. say that those are the only stages and each one is equal in size. that means that an intel processor starts the execute phase at the 11th clock cycle edge, and an amd processor starts at the 6th clock cycle edge. that means that an intel processor has to wait 5 more clock cycles before it can execute an instruction with an empty pipeline. if every instruction processed was dependent on the write-back of the instruction ahead of it, then intel processors would have to wait 20 clock cycles while amd would have to wait 10 clock cycles. in this case, intel processors would have to be clocked twice as high to have equivalent pipeline performance to amd processors.. which is where the tAMD / 10 = tINTEL / 20 comes from.
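
a rough python sketch of that worst case, for anyone who wants to play with the numbers. it assumes exactly what's described above (four equal-sized phases, 20 vs 10 stages, nothing else going on) and the function names are made up for this post, so treat it as a toy and not a model of either chip.

```python
def execute_start_edge(stages):
    """Clock edge at which execute begins, with four equal-sized phases.

    Fetch and decode together take half the pipeline, so execute starts
    on the edge after stages / 2 cycles.
    """
    return stages // 2 + 1


def dependent_chain_time(instructions, stages, period):
    """Every instruction waits for the write-back of the one before it."""
    return instructions * stages * period


def independent_stream_time(instructions, stages, period):
    """No dependencies: fill the pipe once, then finish one per cycle."""
    return (stages + instructions - 1) * period


if __name__ == "__main__":
    # 20 stages at twice the clock (period 0.5) vs 10 stages (period 1.0).
    print(execute_start_edge(20), execute_start_edge(10))
    print(dependent_chain_time(100, 20, 0.5), dependent_chain_time(100, 10, 1.0))
    print(independent_stream_time(100, 20, 0.5), independent_stream_time(100, 10, 1.0))
```

in the fully dependent case the deeper pipe at double the clock only breaks even. with independent instructions it comes out well ahead, which is the case the long pipeline is built for.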
 

imgod2u

Senior member
Sep 16, 2000
There's also the fact that the K7/K8 design is wider. Everyone seems to think pipeline length is the definitive way to determine effective IPC. In reality, with modern branch predictors, it's actually a very small influence.
The K7/K8 design has 9 execution units with 9 issue ports. 3 for FP, 3 for Integer and 3 for Memory address computation. Netburst has 7 execution units with 4 issue ports (2 of which are double-pumped when operating on 16-bit operands). Port 0 can issue either to a double-pumped ALU (at twice the core frequency) or to an FP move operation (at normal frequency), port 1 can issue to a second double-pumped ALU or a normal speed ALU operation, or an FP operation (x87, MMX or SSE), port 2 can issue memory loads and port 3 issues memory stores.
The execution engine of Netburst is a lot less flexible than that of the K7/K8 and therefore it does require more optimization in software to get better performance. However, you'll note that you also have a lot less hardware (using up transistors and generating heat). This leaves room for more cache or other features (such as SSE2) and also allows you to clock the processor to higher frequencies within the same power budget. This does reduce effective IPC somewhat, but with software that is adjusted, it shouldn't be too dramatic a difference, and the increased headroom in clock speed should more than make up for it, and even surpass the wider design in many tasks. We see this with Netburst against Banias or Netburst against the K7. In multimedia (specifically SSE2) operations, Netburst is simply king. It was designed to be so.
 

frootbooter

Member
Dec 3, 2004
Right, I know what he's saying, just, a number divided by 10 is bigger than the same number divided by 20. His post just says that backwards, that's all I was pointing out. I'm not disagreeing with him on why amd is faster per clock.

I'm talking about the math, not the theory lol. He didn't have two different numbers (tAMD and tINTEL) divided by 10 and 20. What he said was that AMD's max clock speed is X/10 and Intel's max clock speed is X/20, which supposedly explains the higher max clock speed for Intel... not really though, because X/10 > X/20, which is completely backwards. That implies that AMD has a higher max clock speed than Intel (which we all know to be false for reasons laid out previously).
 

itachi

Senior member
Aug 17, 2004
ohhh.. whoops, my bad.

btw.. what the crap is with my less-than symbol?? on my computer.. it looks extremely stretched, spanning 2 spots rather than 1 like it should.. doesn't look that way for any other postings with that symbol tho.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
x/20 and x/10 are cycle times. That is, how long each clock cycle lasts. The higher the cycle time, the fewer cycles you can have in each second. For instance, if your cycle time is 1 ns, then you can have 1 billion cycles in a second. If your cycle time is 0.5 ns, then you can have 2 billion cycles in a second.
 

Philippine Mango

Diamond Member
Oct 29, 2004
Originally posted by: itachi
ohhh.. whoops, my bad.

btw.. what the crap is with my less-than symbol?? on my computer.. it looks extremely stretched, spanning 2 spots rather than 1 like it should.. doesn't look that way for any other postings with that symbol tho.

when you post (in the post window) things are skewed and look much different from how they actually appear (on atot).
 

frootbooter

Member
Dec 3, 2004
Originally posted by: itachi
btw.. what the crap is with my less-than symbol?? on my computer.. it looks extremely stretched, spanning 2 spots rather than 1 like it should.. doesn't look that way for any other postings with that symbol tho.

Haha, wow... in your post it shows "~=" where the < is. That's really weird.
 

ChuaChua

Member
Dec 20, 2002
Originally posted by: frootbooter
Haha, wow... in your post it shows "~=" where the < is. That's really weird.

Yeah. I got "~=" too... I thought you meant "approximately equal to" or something like that.
 

Vee

Senior member
Jun 18, 2004
Originally posted by: imgod2u
There's also the fact that the K7/K8 design is wider. Everyone seems to think pipeline length is the definitive way to determine effective IPC. In reality, with modern branch predictors, it's actually a very small influence.
The K7/K8 design has 9 execution units with 9 issue ports. 3 for FP, 3 for Integer and 3 for Memory address computation. Netburst has 7 execution units with 4 issue ports (2 of which are double-pumped when operating on 16-bit operands). Port 0 can issue either to a double-pumped ALU (at twice the core frequency) or to an FP move operation (at normal frequency), port 1 can issue to a second double-pumped ALU or a normal speed ALU operation, or an FP operation (x87, MMX or SSE), port 2 can issue memory loads and port 3 issues memory stores.
The execution engine of Netburst is a lot less flexible than that of the K7/K8 and therefore it does require more optimization in software to get better performance. However, you'll note that you also have a lot less hardware (using up transistors and generating heat). This leaves room for more cache or other features (such as SSE2) and also allows you to clock the processor to higher frequencies within the same power budget. This does reduce effective IPC somewhat, but with software that is adjusted, it shouldn't be too dramatic a difference, and the increased headroom in clock speed should more than make up for it, and even surpass the wider design in many tasks. We see this with Netburst against Banias or Netburst against the K7. In multimedia (specifically SSE2) operations, Netburst is simply king. It was designed to be so.

This (quoted) is the most nearly correct answer so far. Read it again.

************

The "longer pipe, less work per clock / shorter pipe, more work per clock" paradigm is mostly misunderstood and misused. Think about it for a while: even if less work is done per clock at each stage in the pipe, one finished instruction can still be dispatched from the end of the pipe each clock cycle! So by this reasoning, a longer pipe with a faster clock should be faster.

It isn't because all this is manure anyway. Fast clock is the only reason for a long pipe. You do a long pipe in order to make a fast clockrate possible. But that's just manure in so many ways. It wastes transistors and it wastes power. Contrary to an early answer in this thread, Intel does not at all make more power efficient CPUs than AMD. Way, way the other way round! Power consumption increases aggressively with clockrate, and the only Intel CPUs using less power than AMDs, are much lower clocked again.
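
For a feel of how steeply power climbs with clock rate, the usual first-order approximation for CMOS switching power is P = C * V^2 * f. Here is a small Python sketch of it. The capacitance, voltage and frequency values are placeholders I picked for illustration, not figures for any AMD or Intel part, and leakage is ignored entirely.

```python
def dynamic_power_watts(capacitance_farads, voltage_volts, frequency_hz):
    """First-order CMOS switching power: P = C * V^2 * f (leakage ignored)."""
    return capacitance_farads * voltage_volts ** 2 * frequency_hz


def power_ratio(freq_scale, voltage_scale):
    """How much switching power grows when f and V are both scaled."""
    return freq_scale * voltage_scale ** 2


if __name__ == "__main__":
    # Placeholder values, chosen only to show the shape of the curve.
    base = dynamic_power_watts(20e-9, 1.4, 2.0e9)
    faster = dynamic_power_watts(20e-9, 1.4 * 1.1, 2.0e9 * 1.2)
    print(f"baseline: {base:.1f} W, 20% faster at 10% more voltage: {faster:.1f} W")
    print(f"ratio: {power_ratio(1.2, 1.1):.3f}")
```

Because a higher clock usually also needs a higher voltage, and voltage enters squared, a 20% clock bump with 10% more voltage costs roughly 45% more switching power in this approximation.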

The true reason is width. AMD's K7 & K8 cores are roughly 50% wider than Intel's Northwood & Prescott.