Could a Pentium 4 reach 10 GHz on Intel's latest process node?

superrockie · Feb 3, 2014

We all know that Intel's Pentium 4s were designed for higher clock speeds than they reached. The max Intel could get out of the Pentium 4 was 3.2 GHz because of heat constraints. My question is: could these CPUs really reach 10 GHz if they're made on Intel's latest process node?

ninaholic37 · Feb 3, 2014

Intel says they will reach 10Ghz by 2005:

http://slashdot.org/story/00/12/11/101224/intel-says-10ghz-by-2005
http://www.theinquirer.net/inquirer/news/1007219/-20ghz-intel-nehalem-slated-2005

ShintaiDK · Feb 3, 2014

The P4 reached 3.8 GHz on 90 nm at 115 W.

10 GHz for the whole CPU? No. 10 GHz for the ALUs? Yes.

Abwx · Feb 3, 2014

Max was 3.73 GHz or so. 10 GHz is doable on the current process,
but its IPC would still leave it at the level of a 4 GHz Haswell.

Yuriman · Feb 3, 2014

As I understand it, no. It's not my field though.

Intel's latest node isn't designed around hitting 10GHz.

ElFenix · Feb 3, 2014

I thought silicon had reliability issues before you got to 10 GHz.

Abwx · Feb 3, 2014

Yuriman said:
As I understand it, no. It's not my field though.

Intel's latest node isn't designed around hitting 10GHz.

With a 30-50 million transistor CPU it's possible. Current transistors
have transition frequencies 3 times higher than their 90 nm siblings;
actually it's the inflation in transistor count that limits frequency.

TuxDave · Feb 3, 2014

ShintaiDK said:
The P4 reached 3.8Ghz on 90nm at 115W.

10Ghz for the whole CPU? No, 10Ghz for the ALUs? Yes.

Yeah I was about to say. A 10GHz P4 means a 20GHz ALU = 50ps cycle times. lol, that would really suck to deal with. All synchronous logic would go out the window.
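The arithmetic behind that 50 ps figure can be sketched in a few lines. This is only an illustration; the function name `cycle_time_ps` is made up here, and the 2x factor is the double-pumped ALU clock discussed in this thread.

```python
def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


core_ghz = 10.0
alu_ghz = 2 * core_ghz  # fast ALUs clocked at twice the core frequency

print(f"core: {cycle_time_ps(core_ghz):.0f} ps per cycle")
print(f"ALU:  {cycle_time_ps(alu_ghz):.0f} ps per cycle")
```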

ShintaiDK · Feb 3, 2014

Yuriman said:
As I understand it, no. It's not my field though.

Intel's latest node isn't designed around hitting 10GHz.

Yep, that's another issue, so to speak. Efficiency rules, not raw performance.

Abwx · Feb 3, 2014

TuxDave said:
Yeah I was about to say. A 10GHz P4 means a 20GHz ALU = 50ps cycle times. lol, that would really suck to deal with. All synchronous logic would go out the window.

Not sure the double-pumped ALUs were actually
implemented, IIRC.

videogames101 · Feb 3, 2014

If Intel targeted that frequency, probably.

It would be difficult.

lagokc · Feb 3, 2014

ShintaiDK said:
The P4 reached 3.8Ghz on 90nm at 115W.

10Ghz for the whole CPU? No, 10Ghz for the ALUs? Yes.

The Pentium 4 hit 8 GHz at 65nm ... on liquid nitrogen.

http://www.engadget.com/2007/01/24/pentium-4-overclocked-to-8ghz-lets-see-your-fancy-core-2-try-t/

Lepton87 · Feb 3, 2014

On air or with exotic cooling? With exotic cooling I think it shouldn't be difficult. The FX-8150 already reached past 8 GHz; maybe it would be enough to reach 10 GHz if made on an Intel process.

SPBHM · Feb 3, 2014

lagokc said:
The Pentium 4 hit 8 GHz at 65nm ... on liquid nitrogen.

http://www.engadget.com/2007/01/24/pentium-4-overclocked-to-8ghz-lets-see-your-fancy-core-2-try-t/

LN2...
I think there are some 7 GHz 22 nm Intel CPU overclocks around, but I don't think the OP was talking about LN2 overclocking?

Tristor · Feb 3, 2014

Pentium 4s were primarily thermally limited, and the efficiency gains Intel achieved following the failure of the Pentium 4 are not exclusively tied to die shrinks. The total rearchitecture that came with Core made as much if not more difference in getting them past their prior thermal limits and gaining a significant amount of IPC. Do I think you could maybe get a P4 on a 22 nm process to 10 GHz? Probably under LN2. Would it matter? Nope, because Haswell is about 90% more efficient clock for clock than the P4, and if you remove the process shrink efficiency gains, it's still 60% more efficient simply due to architecture. Under LN2 you can get Haswell to 7 GHz, which would still be much, much faster than the P4.
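As a rough sanity check on that comparison, here is a minimal sketch that treats throughput as clock times per-clock work. It assumes "90% more efficient clock for clock" means roughly 1.9x the P4's work per clock, which is an interpretation of the post, not a measured figure; the function name is hypothetical.

```python
def relative_throughput(freq_ghz: float, work_per_clock_vs_p4: float) -> float:
    """Throughput in 'P4-equivalent GHz': clock times per-clock work relative to a P4."""
    return freq_ghz * work_per_clock_vs_p4


# Assumption: "90% more efficient clock for clock" is read as a 1.9x factor.
p4_at_10ghz = relative_throughput(10.0, 1.0)
haswell_at_7ghz = relative_throughput(7.0, 1.9)

print(f"P4 @ 10 GHz:     {p4_at_10ghz:.1f}")
print(f"Haswell @ 7 GHz: {haswell_at_7ghz:.1f}")
```

Under that reading, a 7 GHz Haswell still comes out ahead of a hypothetical 10 GHz P4, which is the point being made above.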

Bateluer · Feb 3, 2014

Perhaps, but even if it did, it'd still perform worse than the current Haswell line.

TuxDave · Feb 4, 2014

Abwx said:
Not sure that the double pumped ALUs were effectively
implemented IIRC.

I don't know about you, but the LVS (low-voltage swing) circuits used to get the ALUs double pumped in the first place are goddamn genius to me.

VirtualLarry · Feb 4, 2014

Is there a power-efficiency reason why we don't see double-clocked ALUs and shaders these days?

For example, the P4 had double-pumped ALUs, the Core 2 didn't.

Another example: consider Fermi's shaders and their "hot clock" (double core freq), versus the greater number of slower shaders (running at core clock) in Kepler.

Is the slower, but wider, trend something that we will see more of, in the quest for power-efficiency?

Dresdenboy · Feb 4, 2014

TuxDave said:
I don't know about you but the lvs circuits to get the ALUs to get double pumped in the first place is goddamn genius to me.

It works, as does domino logic.

Here is one of the papers, describing Intel's tech:
http://ctho.org/toread/forclass/18-722/logicfamilies/Deleganes05.pdf

The shmoo plot goes up to 4.2 GHz base clock (8.4 GHz ALUs) at 1.4V.

I think, due to different effects of shrinks (lower power consumption, lower FO4 cycle times, shorter wire lengths) it might even be possible to push the P4 core to 10 GHz.

Just remember, today we're looking at 4C or more with SMT + GPU + IMC + SA etc. at far below 100W TDP. The P4 was just one core + FSB.

We might look at the Quark or MIC cores, which were derived from older x86 cores. E.g. the Quark is derived from the 486 pipeline but got a Pentium-class ISA. It achieved a 4x-10x clock increase over the original designs at different process nodes.

Dresdenboy · Feb 4, 2014

VirtualLarry said:
Is there a power-efficiency reason why we don't see double-clocked ALUs and shaders these days?

For example, the P4 had double-pumped ALUs, the Core2 didn't.

Another example, consider Fermi's shader's and their "hot clock" (double core freq), versus the greater number, but slower (running at core clock) shaders in Kepler.

Is the slower, but wider, trend something that we will see more of, in the quest for power-efficiency?

Yes, it is.

Switching clocks wastes energy, and storing intermediate results in latches does the same. And there is more to it. E.g. the gates evaluate the results during the low/high phases of the clock signal. The faster the clock has to switch, the lower the usable percentage of time gets. There's noise, skew, etc.
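To illustrate the "usable percentage" point, here is a small sketch under a loud assumption: a fixed per-cycle overhead (skew, jitter, latch setup) of 15 ps. That number is purely hypothetical and chosen only to show the trend; real overheads depend on the design and process.

```python
def usable_fraction(freq_ghz: float, overhead_ps: float) -> float:
    """Fraction of each clock period left for logic after a fixed overhead."""
    period_ps = 1000.0 / freq_ghz
    return max(0.0, (period_ps - overhead_ps) / period_ps)


OVERHEAD_PS = 15.0  # hypothetical fixed overhead per cycle, not a measured value

for freq_ghz in (3.8, 10.0, 20.0):
    share = usable_fraction(freq_ghz, OVERHEAD_PS)
    print(f"{freq_ghz:5.1f} GHz: {share:.0%} of the cycle usable for logic")
```

With the overhead held constant, the usable share shrinks as the clock rises, which is the effect described above.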

Abwx · Feb 4, 2014

TuxDave said:
I don't know about you but the lvs circuits to get the ALUs to get double pumped in the first place is goddamn genius to me.

The genius idea was to call it "double pumped" when a single ALU
could receive two micro-ops simultaneously, and to market that as
double the frequency. Of course, literally the frequency of micro-ops
is doubled, and indeed integer performance was all but doubled
compared to previous designs...

http://books.google.fr/books?id=gni...onepage&q=pentium 4 double pumped alu&f=false

Tuna-Fish · Feb 4, 2014

Yuriman said:
Intel's latest node isn't designed around hitting 10GHz.

Nodes are not designed to hit certain frequencies. The frequency of a chip is equally dependent on its design and the capability of the node.

To put it simply, the node makers push the transistor switching speed as high as they can -- modern Intel 22nm transistors have switching times of less than 10ps, that is, they can reach well over 100GHz speeds. However, the ability to switch a single transistor really, really fast isn't all that useful. Logic in chips is constructed of chains of transistors, and the clock speed of the chip is the inverse of the time it takes for all the transistors in the longest chain of the CPU to switch in sequence. So, a hypothetical CPU that runs on 10ps transistors and has a longest path of 20 transistors would be 5GHz.
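The relationship in that last example can be written out directly. This is a simplified sketch of the model as stated in the post (clock period equals switching time multiplied by the length of the longest chain); it ignores wire delay and clocking overhead, and the function name is made up for illustration.

```python
def max_clock_ghz(switch_time_ps: float, longest_chain: int) -> float:
    """Clock limit when the period must cover every transistor in the longest chain."""
    critical_path_ps = switch_time_ps * longest_chain
    return 1000.0 / critical_path_ps


print(max_clock_ghz(10.0, 1))   # a single 10 ps transistor: 100 GHz
print(max_clock_ghz(10.0, 20))  # the post's example chain of 20: 5 GHz
print(max_clock_ghz(10.0, 10))  # a chain of 10 would be needed for 10 GHz
```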

You *could* use Intel's latest node to build very high clock speed chips. Intel has chosen not to because chips built with lower clockspeeds but more logic per stage seem to outperform the dumber chips.

Abwx · Feb 4, 2014

Tuna-Fish said:
To put it simply, the node makers push the transistor switching speed as high as they can -- modern Intel 22nm transistors have switching times of less than 10ps, that is, they can reach well over 100GHz speeds.

Actually it's well over 300 GHz for CPUs that work at 3-5 GHz;
that's all the margin needed to make the rise times negligible
relative to a single cycle's duration.

Edit: I think the 10 ps is the transmission
delay, not the actual switching time.

Dresdenboy · Feb 4, 2014

Abwx said:
The genius idea was to call double pumped the fact that
a single ALU could receive two micro ops simultaneously
and market it as double the frequency , of course , literaly
the frequency of micro ops is doubled , indeed the integer
performance was all but doubled compared to previous designs...

http://books.google.fr/books?id=gni...onepage&q=pentium 4 double pumped alu&f=false

I had to search the keywords again to get the book text. But there is no clear statement, just that it's not certain how it's done. But as many other sources suggest (esp. the ISSCC paper or the one above), the fast ALUs can be used for simple ops (bit ops, narrow adds, single-bit shifts). And they're really that fast (4 FO4 delays?). The slow ALUs (known from day one of the Willamette presentations) do the remaining ops. Some ops can be executed in a staggered way, and back-to-back operations (one op using the result of the previous one) were possible too, IIRC. I don't know how flags were handled.

Abwx said:
Actualy it s well over 300GHz for CPUs that works at 3-5GHz,
that s all the margin needed to render the rising times negligibles
in respect of a single cycle duration.

Edit : i think that the 10ps is the transmission
delay not the actual switching time.

The transistor switching speed is just part of the equation. Pipeline stages are measured in FO4 delays, and each stage is itself multiple layers of gates. The P4 was ~16 FO4 per stage, its fast ALUs 8 FO4; K10 and similar designs were 20+ FO4.
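Those stage depths can be turned into a rough FO4 budget. The sketch below combines the ~16 FO4 figure with the 3.8 GHz at 90 nm quoted earlier in the thread, and assumes the whole clock period is spent on those FO4 delays (no separate clocking overhead), so treat the results as ballpark values only. The function name is hypothetical.

```python
def implied_fo4_ps(freq_ghz: float, fo4_per_stage: float) -> float:
    """FO4 delay in ps implied by a clock frequency and a stage depth in FO4s."""
    period_ps = 1000.0 / freq_ghz
    return period_ps / fo4_per_stage


# Assumption: the full cycle is consumed by ~16 FO4 of logic per stage.
p4_90nm = implied_fo4_ps(3.8, 16)       # what 3.8 GHz at 16 FO4 implies
needed_10ghz = implied_fo4_ps(10.0, 16)  # what 10 GHz at the same depth needs

print(f"P4 at 3.8 GHz: about {p4_90nm:.1f} ps per FO4")
print(f"10 GHz target: {needed_10ghz:.2f} ps per FO4")
print(f"required speedup per FO4: {p4_90nm / needed_10ghz:.2f}x")
```

In this simplified model, the same pipeline needs each FO4 delay to shrink by the ratio of the two clocks, about 2.6x.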

TuxDave · Feb 4, 2014

Dresdenboy said:
It works, as does domino logic.

Here is one of the papers, describing Intel's tech:

http://ctho.org/toread/forclass/18-722/logicfamilies/Deleganes05.pdf

The shmoo plot goes up to 4.2 GHz base clock (8.4 GHz ALUs) at 1.4V.

Thanks but yeah I know it works.

However, as you scale down the process you have a couple of problems. Transistor unity-gain frequency increases, but it's hard to say how that scales with the pass-gate logic used in LVS designs. On top of that, your RC delays don't get the same improvement.

I'm actually more concerned about clocking at 20 GHz. How much will skew and jitter take away from that 50 ps cycle time? At 7 GHz, domino circuits had to do a lot of tricks to get that done. At 20 GHz it will take another stroke of genius.

Abwx said:
The genius idea was to call double pumped the fact that
a single ALU could receive two micro ops simultaneously
and market it as double the frequency ,

If marketing calls the whole CPU double the frequency then yeah, that's a marketing bullet. But the ALU fast clocks were actually 2x.
