Question Intel had a 7 GHz CPU years ago

igor_kavinski · Oct 29, 2022

https://www.reddit.com/r/hardware/comments/wir8ay

213 mm2 die size, 150 Watts, 50 pipeline stages all for one core at >7 GHz.

I'm getting a strong feeling of lust for this CPU. 150W ain't that much. Imagine what this CPU could have been if it had been developed further, with modern branch predictors and huge caches.

IntelUser2000 · Oct 29, 2022

NostaSeronx said:
Prescott's Integer core operated at 2x core frequency => https://ieeexplore.ieee.org/document/1332640
Thus, 3.4 GHz => 6.8 GHz and 3.6 GHz => 7.2 GHz
Tejas's Integer core operated at 2x core frequency => https://ieeexplore.ieee.org/document/4039605
~4.5 GHz -> 9 GHz

Actually, the whole TeraHertz Transistor was specifically FDSOI. Since Intel dropped TeraHertz early on, they pulled the features meant for FDSOI across to bulk while preparing for FinFETs. This is the actual cause of Netburst getting killed off; no process -> no product.

Load of bollocks and you know it.

First, Prescott's Integer core was only for simple operations. It had a separate "real" ALU that could handle everything, but only at the measly base frequency. When analysis was done on performance they said the double pumped ALU's gain was less than people thought it would be.

And Intel never said the double pumped ALU would be the clock Netburst would reach, no, they said they want the entire chip to reach that!

The FinFET transistor achieves most of FD-SOI benefits while bringing other goodies such as much improved gate control. FD-SOI is obsolete. It's time to let go. I know it's hard, but you must.

The Netburst idea is gone because it made no sense. Any layman who had the faintest idea about overclocking knew Intel management was absolutely insane for claiming it would even reach 5GHz.

-Netburst ideology meant the performance per clock was atrocious.
-In reality the simple circuit design techniques didn't pan out. Netburst was a massive 217mm2 die. Same spirit went on to design the Itanium, which was also massive.
-But worst of all, it couldn't even meet the clockspeed targets, because some parts of Intel were braindead or in serious denial that even the 5GHz target was impossible using air cooling. We only achieved that in 2022, with building bricks as heatsinks and water cooling that has become commonplace.

igor_kavinski said:
If they can put out an anemic Atom CPU for the masses, why not a single core ultra high frequency CPU for the extra giggles?

Speaking of dead designs, the out of order Silvermont Atom achieved 50% higher performance per clock with no clock speed degradation and at the same die size as the in-order Atom!

So if they went with the out of order design in the first place, then Atom would have been, much, much faster. Silvermont made it usable for general purpose applications like browsing. The predecessor was slow even for basic things.

ARM did the same, going in-order with some cores. IBM did too with POWER. Looks like even the engineers need to experiment to find the better path.

>5GHz clocks were always meant for the mental asylum though.

NostaSeronx · Oct 29, 2022

IntelUser2000 said:
Load of bollocks and you know it.

First, Prescott's Integer core was only for simple operations. It had a separate "real" ALU that could handle everything, but only at the measly base frequency. When analysis was done on performance they said the double pumped ALU's gain was less than people thought it would be.

And Intel never said the double pumped ALU would be the clock Netburst would reach, no, they said they want the entire chip to reach that!

Uh? What?

FCLK = Fast Clock => 2x Core clock

Those are real ALUs, that can fuse to do one 64-bit or not fuse with two 32-bit.
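
To make the FCLK arithmetic concrete, here is a minimal sketch. The function name `fast_alu_clock_ghz` is made up for illustration; the only rule it encodes is the 2x ratio stated above, applied to the core clocks quoted earlier in the thread.

```python
def fast_alu_clock_ghz(core_clock_ghz: float, ratio: int = 2) -> float:
    """Return the fast-ALU (FCLK) frequency for a given core clock.

    The default ratio of 2 is the "FCLK = 2x core clock" rule from the post.
    """
    return core_clock_ghz * ratio


# Core clocks mentioned earlier in the thread (Prescott 3.4/3.6, Tejas ~4.5).
for core in (3.4, 3.6, 4.5):
    print(f"{core} GHz core -> {fast_alu_clock_ghz(core):.1f} GHz fast ALU")
```

Note this only describes the double-pumped ALUs; the rest of the core still runs at the 1x clock.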

IntelUser2000 said:
The FinFET transistor achieves most of FD-SOI benefits while bringing other goodies such as much improved gate control. FD-SOI is antiquated. It's time to let go. I know it's hard, but you must.

The modern iteration of FDSOI says otherwise. Maybe, not get so hard focused being a Fin-stan? Have fun losing scaling after 22nm, losing body biasing, losing variation control, good luck with all the doping in first-gen FinFETs, etc.

A/// · Oct 29, 2022

Dunno, I liked Prescott. It was cheaper to run than my space heater at the time and it lasted longer than the Chinese junk that space heater was.

IntelUser2000 · Oct 29, 2022

NostaSeronx said:
Uh? What?
FCLK = Fast Clock => 2x Core clock

Those are real ALUs, that can fuse to do one 64-bit or two 32-bit.
The modern iteration of FDSOI says otherwise. Maybe, not get so hard focused being a Fin-stan?

Do some actual reading please. The Pentium 4's Rapid Execution Engine, AKA the Double Pumped ALU, was limited in the instructions it could execute.

And yes TriGate/FinFET replaces FDSOI for the most part. The full effect will happen with RibbonFET when gate is fully surrounded.

igor_kavinski · Oct 29, 2022

Thunder 57 said:
That's interesting, considering GloFo didn't exist in 2005. And FDSOI is Nosta's fetish.

It was so long ago. Can't remember everything exactly. But I clearly remember the "moment of clarity" bit in that interview. I thought that was really cool. Maybe Intel/AMD should hand out fine wine to their engineers to sip throughout the day while they think about silicon engineering problems.

NostaSeronx · Oct 29, 2022

IntelUser2000 said:
Do some actual reading please. The Pentium 4's Rapid Execution Engine, AKA the Double Pumped ALU, was limited in the instructions it could execute.

Fast ALUs cover >60% of existing program code. It wasn't the double pump that was the issue but everything else.

You're kind of ignoring the low-hanging fruit: look at that cache hierarchy.

IntelUser2000 · Oct 29, 2022

Shall we recap what the Pentium 4 was?

-8KB L1 Data cache, no L1 Instruction cache. Pentium III had 16KB, since Pentium MMX.
-Went from Pentium III's 3 decoders to 1. Actually Pentium MMX had 2 decoders, so Pentium 4 had the same number of decoders as the 486 and its predecessors, before superscalar execution was a thing.
--Trace Cache replaced the L1i cache.
-Fast ALU that clocked at double the rest of the core and Slow ALU for the rest. The Fast ALU could only execute some instructions.
-20+ stage pipeline. I believe it was 20 stages AFTER Trace Cache hit.

So they tried to up the clock by two methods: one by increasing pipeline stages and two by "simplifying" the core. I put simplifying in quotes because the Trace Cache was almost 100KB in size, which is 4x as large as Sandy Bridge's uop cache with the same hit rate. Of course people learn, which is why SNB was much better.

The Netburst chips, as we found out later, went much further than enhanced branch prediction and the novel Trace Cache to make up for the "Hyper Pipeline". It had the replay feature, which replicated significant parts of the entire pipeline so it could "replay". I heard that was the primary reason for the big loss in performance when Hyperthreading was enabled in single threaded applications, and no doubt it ballooned die size and power consumption.

Thunder 57 · Oct 29, 2022

AdamK47 said:
This is an AMD heavy forum, so of course this is going to be another Intel vs AMD thread with a title to bait the majority.

Is it though? I think people are just glad that there is competition again, and that was AMD's doing.

igor_kavinski · Oct 29, 2022

IntelUser2000 said:
It had the replay feature that replicated significant parts of the entire pipeline so it can "replay".

So it could "playback" the executed instructions and take a different branch that wouldn't lead to pipeline flush? Sounds promising. I can also see why that would conflict with HT. The virtual thread would keep waiting while the main thread replayed. Seems different teams were working on different parts of the CPU and they weren't communicating.

NostaSeronx · Oct 29, 2022

IntelUser2000 said:
Shall we recap what the Pentium 4 was?

-8KB L1 Data cache, no L1 Instruction cache. Pentium III had 16KB, since Pentium MMX.
-Went from Pentium III's 3 decoders to 1. Actually Pentium MMX had 2 decoders, so Pentium 4 had the same number of decoders as the 486 and its predecessors, before superscalar execution was a thing.
--Trace Cache replaced the L1i cache.
-Fast ALU that clocked at double the rest of the core and Slow ALU for the rest. The Fast ALU could only execute some instructions.
-20+ stage pipeline. I believe it was 20 stages AFTER Trace Cache hit.

So they tried to up the clock by two methods: one by increasing pipeline stages and two by "simplifying" the core. I put simplifying in quotes because the Trace Cache was almost 100KB in size, which is 4x as large as Sandy Bridge's uop cache with the same hit rate. Of course people learn, which is why SNB was much better.

The Netburst chips, as we found out later, went much further than enhanced branch prediction and the novel Trace Cache to make up for the "Hyper Pipeline". It had the replay feature, which replicated significant parts of the entire pipeline so it could "replay". I heard that was the primary reason for the big loss in performance when Hyperthreading was enabled in single threaded applications, and no doubt it ballooned die size and power consumption.

- The Datacache was renamed to L0. => No L1 buffer for data
- Yep.
- Trace cache was after decode, while if there was a L1i it would be before decode. => No L1 buffer for instructions
- Fast ALUs don't do the following, which run at 1x core clock:
-- Some examples of integer execution hardware put elsewhere are the multiplier, shifts, flag logic, and branch processing.
-- Most integer shift or rotate operations go to the complex integer dispatch port. These shift operations have a latency of four clocks. Integer multiply and divide operations also have a long latency. Typical forms of multiply and divide have a latency of about 14 and 60 clocks, respectively.
- Yep

The hottest part of any Pentium 4 was everything around the integer core, since whenever something missed it went L2 -> Decode -> Trace -> Core -> L0 -> L2. Overall, the coolest part in temperature was the integer core itself, while everything else was a nuclear apocalypse.

Modern OoO non-x86 cores that do >5 GHz, high-frequency pumpin, all have L1 instruction and data caches. Specifically, to prevent L2 usage from ballooning power at high frequency.

The double-pumping is actually the only good thing about the Pentium 4. If they had kept Netburst close to the patented version, it probably wouldn't have been bad, since that version had an L1 cache while none of the production models had one. Based on the patents, the L1 caches would be near/inside the memory execution unit, next to the L0 caches and branch prediction unit.

Gen1:
19-cycle L2 -> Decode
19-cycle L2 -> 2-cycle L0d
No buffer for that.

Gen2:
23/27-cycle L2 -> Decode
23/27-cycle L2 -> 4-cycle L0d
Where is the buffer?!
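
A rough way to see what the missing buffer costs is the usual average-access-time formula. This is only a sketch: the L0d and L2 cycle counts are the ones listed above, but the 90% hit rate is an assumption picked for illustration, and treating the L2 latency as a penalty added on top of the L0d lookup is a simplification.

```python
def avg_access_cycles(l0_hit_cycles: float, l2_cycles: float, l0_hit_rate: float) -> float:
    """Average load latency for a small L0d backed directly by L2.

    Classic AMAT form: hit time + miss rate * miss penalty.
    """
    miss_rate = 1.0 - l0_hit_rate
    return l0_hit_cycles + miss_rate * l2_cycles


ASSUMED_L0_HIT_RATE = 0.90  # assumption for illustration, not a measured figure

gen1 = avg_access_cycles(2, 19, ASSUMED_L0_HIT_RATE)
gen2_best = avg_access_cycles(4, 23, ASSUMED_L0_HIT_RATE)
gen2_worst = avg_access_cycles(4, 27, ASSUMED_L0_HIT_RATE)

print(f"Gen1: {gen1:.1f} cycles")
print(f"Gen2: {gen2_best:.1f} to {gen2_worst:.1f} cycles")
```

With no level in between, every L0d miss pays the full L2 latency, so the average is very sensitive to the hit rate you plug in.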

Also, trace cache is as big as the L0 data cache. Definitely, not 100KB in size.

DAPUNISHER · Oct 29, 2022

Thunder 57 said:
Is it though? I think people are just glad that there is competition again, and that was AMD's doing.

I was going to respond to this myself. If this were 2017, We'd be characterizing this as an Intel heavy forum. AMD's comeback ended stagnation and boring CPU releases from both companies. The fact AM4 is not only still here over half a decade later, but also the best selling DIY CPUs, speaks for itself.

I will buy Intel again, when they stop changing platforms more often than a Hobo changes underwear.

Or AMD copies them on that too; whichever comes first.

Insert_Nickname · Oct 30, 2022

A/// said:
Dunno, I liked Prescott. It was cheaper to run than my space heater at the time and it lasted longer than the Chinese junk that space heater was.

Remember Asetek Vapochill?

Water cooling? Pffft...

DigDog · Oct 30, 2022

I remember back when Conroe launched I assembled my very own first PC, and because I didn't have any money I bought it piece by piece.
Because it was socketed on LGA775 I could fit in a Celeron, which sold at 1.8 GHz, but ran at 3.6 GHz STABLE. Just doubled the FSB, done.
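
For anyone who hasn't done an FSB overclock, the arithmetic is just core clock = FSB x multiplier, with the multiplier left alone. A minimal sketch; the 200 MHz stock FSB and 9x multiplier are assumptions for illustration, not figures stated in this post.

```python
def core_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Core clock is the front-side bus base clock times the CPU multiplier."""
    return fsb_mhz * multiplier


# Assumed values for illustration only (not stated in the thread).
ASSUMED_STOCK_FSB_MHZ = 200.0
ASSUMED_MULTIPLIER = 9.0

stock = core_clock_mhz(ASSUMED_STOCK_FSB_MHZ, ASSUMED_MULTIPLIER)
overclocked = core_clock_mhz(2 * ASSUMED_STOCK_FSB_MHZ, ASSUMED_MULTIPLIER)

print(f"stock: {stock / 1000:.1f} GHz, doubled FSB: {overclocked / 1000:.1f} GHz")
```

Whatever the real multiplier was, doubling the FSB with the multiplier unchanged doubles the core clock.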

igor_kavinski · Oct 30, 2022

DigDog said:
Because it was socketed on LGA775 I could fit in a Celeron, which sold at 1.8 GHz, but ran at 3.6 GHz STABLE. Just doubled the FSB, done.

Stock FSB was 66 MHz? How many years did it run OCed? Never any USB issues or HDD errors?

Insert_Nickname · Oct 30, 2022

DigDog said:
Because it was socketed on LGA775 I could fit in a Celeron, which sold at 1.8 GHz, but ran at 3.6 GHz STABLE. Just doubled the FSB, done.

Those Core 2 Celerons were monster overclockers. At nearly 100% success rate. Only downside was the limited L2 cache, but high frequency made somewhat up for that.

That was back when OC'ing was fun.

DigDog · Oct 30, 2022

Insert_Nickname said:
Those Core 2 Celerons were monster overclockers. At nearly 100% success rate. Only downside was the limited L2 cache, but high frequency made somewhat up for that.

That was back when OC'ing was fun.

Oh don't get me wrong, even at 3.6 it was trash, but, LESS trash. And it cost nothing.

igor_kavinski said:
Stock FSB was 66 MHz? How many years did it run OCed? Never any USB issues or HDD errors?

I don't remember what FSB it had.
I only used it for a couple of months until I could afford the C2D E6600, which was a beast compared to the Celeron 430. I didn't have a ton of luck with the E6600; I could get it to 3.2 GHz with a ton of voltage and tons of noise from the H212, but it was stable. Eventually I clocked it down to 3 GHz when the mobo started dropping voltages. Other people were pushing their E6600s to 3.4-3.6 on water.

I never had any errors of any kind.

I remember back in those days there were a few (seriously .. like.. two? maybe three) people online who said things like "if you were a professional and you did extended memory tests, you would see a ton of memory errors that you don't see because you are a casual user" but i had a strong feeling that it was pure BS. Advocating against overclocking = please make sure you register your account with your @Intel email.

No, that CPU was revolutionary. I used it for god .. until 2014? I think i upgraded from E6600 directly to a 4670k .. E6600 = bought early 2007 ? maybe late 2006, Haswell bought middle of 2014. 8 years of service.
TBH can't complain if the mobo caps start dropping voltage.

igor_kavinski · Oct 30, 2022

DigDog said:
No, that CPU was revolutionary. I used it for god .. until 2014? I think i upgraded from E6600 directly to a 4670k ..

So you are due for an upgrade. What do you have your mind set on? Zen 4? Or waiting for Zen4X3D?

Insert_Nickname · Oct 30, 2022

DigDog said:
Oh don't get me wrong, even at 3.6 it was trash, but, LESS trash.

Yeah, the early single core Conroe-L weren't really anything to write home about, mostly due to being single core. OC'd it was still better than any P4. The later Wolfdale-3M on the other hand were pretty decent.

DigDog said:
And it cost nothing.

That's the essence right there. They were fun to play with precisely because they didn't cost an arm or a leg. Blowing one up OC'ing it to high **** and back didn't really matter that much.

There is a bit more grief if your brand new 13900K goes up in magic smoke. Taking your brand new Z790 board and RTX 4090 with it.

DigDog · Oct 30, 2022

igor_kavinski said:
So you are due for an upgrade. What do you have your mind set on? Zen 4? Or waiting for Zen4X3D?

i upgraded recently, i got me a 5600X which should see me through the next 6/8/10 years.
Still running a cheap RX590 I bought super-discounted 2 Christmases ago from Amazon.

I used to be a fan of high end tech until the 4670, when I realized I was getting more power than I needed. I come from a Commodore 64 .. ermm .. I mean, an Intellivision .. and have lived through the years where if you opened 3 Explorer pages your whole PC would bluescreen. But these days, computers are like a Ferrari - more than I need to get the job done.

SSDs are awesome though.

kschendel · Oct 30, 2022

The zillion-stage pipeline killer was of course pipeline bubbles due to branching. Intel tried to bury it with GHz, but it was never going to work without a lot more cleverness about branch handling, and that introduces complexity that defeats the march to crazy GHz frequencies.

Netburst had real potential if you could only write your code to only go in straight lines. Alas, real code doesn't work that way. So even if Intel could have reached 10 or even 7 GHz, I don't think it would have performed worth the effort.
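
A back-of-the-envelope model of the bubble problem, as a sketch only. Every number here is an assumption for illustration (base CPI, branch fraction, mispredict rate, and a flush penalty equal to the pipeline depth); none of them are measurements of any real Netburst part.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty_cycles: float) -> float:
    """Base CPI plus the average cycles lost per instruction to branch flushes."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


def instructions_per_ns(clock_ghz: float, cpi: float) -> float:
    """Throughput in instructions per nanosecond."""
    return clock_ghz / cpi


# Assumed workload: 20% branches, 5% of them mispredicted, ideal CPI of 1.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Hypothetical shallow design: 20 stages at 3.5 GHz.
shallow = instructions_per_ns(3.5, effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 20))
# Hypothetical deep design: 50 stages at 7.0 GHz.
deep = instructions_per_ns(7.0, effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 50))

clock_gain = 7.0 / 3.5
throughput_gain = deep / shallow
print(f"clock gain: {clock_gain:.2f}x, throughput gain: {throughput_gain:.2f}x")
```

In this toy model the throughput gain always trails the clock gain, and the gap widens as the mispredict rate or the flush penalty goes up, which is the point about branchy code above.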

Doug S · Oct 30, 2022

TheELF said:
No it wouldn't, because those games are made for a specific range of FPS, if it gets faster the faster the core then it will be broken and unplayable because you wouldn't have a chance of controlling anything.

At least intel had the sense to not release this thing in the first place, unlike AMD that just went ahead and released the whole dozer gen that was a step back in cores.

My understanding from what I heard and read at the time was that Intel's plan was to expand the number of double pumped stages over a few generations (which would continue to increase frequency and pipeline stages in their own right) until they essentially 'exposed' those internal half stages, i.e. doubling the number of pipeline stages and the frequency. That was how Intel was going to reach 20 GHz - get to 10 GHz with 50+ pipeline stages or whatever where most of them were double pumped. Then everything would be made to run double pumped and they'd have a 20 GHz CPU to sell us!

This was IMHO driven as much by Intel's marketers as their engineers. Back then consumers had been trained to regard clock frequency as the way you tell how well a CPU performs, and even when AMD had better performing Athlon CPUs they were finding them hard to sell to consumers due to the lower clock rates. AMD was forced to give them product numbers to compete so e.g. a 2 GHz Athlon might be marketed as "Athlon XP 3000+" or something like that, indicating it was to be compared to a 3 GHz Pentium 4. Intel of course gave them a lot of crap about that, claiming they were trying to fool people with sleazy marketing.

Two things hurt Pentium 4 in the market IMHO. First was how initially it was tied to expensive RDRAM, which made it more expensive and especially unpopular in the business world where they wanted to be able to move RAM around between machines (something far fewer care about / do these days since operating systems and applications don't keep demanding more and more like they used to). Second were the power draw issues mentioned in the video. Not so much because consumers care about how much power they burn but because they care about how noisy their PCs are. The standard coolers Intel shipped were terribly loud at the speeds they had to run with a P4, while AMD's Athlons were much quieter. AMD was also helped by being first to market with 64 bit CPUs (which is ironic, because Intel had 64 bit capability built into a P4 that shipped well before AMD's 64 bit stuff did but did not enable it - Hans DeVries had a writeup about that many years ago), which helped them overcome their lower clock rates marketing-wise.

Ironically, once Intel backed down off the P4 power cliff and went with an evolution of the P3 design they had designed in Israel intended for laptops (because no sane person wanted a P4 in a laptop) they were faced with the same marketing problem AMD had trying to compete with P4s with a far higher clock. After spending 20 years training consumers that higher clock rates were better, how could they sell Core series CPUs clocked 1 GHz slower than the P4 they were trying to get people to replace? Thus Intel was forced to adopt the same model number scheme they had spewed so much bile at AMD over!

NostaSeronx · Oct 30, 2022

kschendel said:
Netburst had real potential if you could only write your code to only go in straight lines. Alas, real code doesn't work that way. So even if Intel could have reached 10 or even 7 GHz, I don't think it would have performed worth the effort.

Actually, you got it backwards. Netburst was built for branchy-code.

Problem:
"The job of the fetch unit is to feed the dynamic instruction stream to the decoder. A problem is that instructions are placed in the cache in their compiled order. Storing programs in static form favors fetching code that does not branch or code with large basic blocks. Neither of these cases is typical of integer code."

Solution:
"It is for this reason that several researchers have proposed a special instruction cache for capturing long dynamic instruction sequences. This structure is called a trace cache because each line stores a snapshot, or trace, of the dynamic instruction stream."

The trace cache was meant to fix the in-order-ness of the conventional instruction cache/decode pipeline. Heavy branchy integer code saw high hit rates with a trace cache.
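
Here is a toy sketch of the difference the quotes describe. It is not a model of the Pentium 4's actual trace cache: the fetch width, the instruction stream, and the function names are all made up for illustration, and a real trace cache would also tag lines with branch outcomes rather than just the start address.

```python
from typing import Dict, List, Tuple

FETCH_WIDTH = 6  # assumed instructions per fetch group, for illustration only

# Toy dynamic instruction stream: (address, is_taken_branch).
# The address jumps after each taken branch, as in branchy integer code.
DYNAMIC_STREAM: List[Tuple[int, bool]] = [
    (0, False), (1, True),     # branch at 1 jumps to 10
    (10, False), (11, True),   # branch at 11 jumps to 20
    (20, False), (21, False),
]


def icache_fetch(stream: List[Tuple[int, bool]], start: int) -> List[int]:
    """Conventional fetch: only contiguous code, so a taken branch ends the group."""
    group: List[int] = []
    for addr, taken in stream[start:start + FETCH_WIDTH]:
        group.append(addr)
        if taken:
            break
    return group


trace_cache: Dict[int, List[int]] = {}


def trace_fetch(stream: List[Tuple[int, bool]], start: int) -> Tuple[List[int], bool]:
    """Return (fetch group, hit). On a miss, record the dynamic trace for next time."""
    start_addr = stream[start][0]
    if start_addr in trace_cache:
        return trace_cache[start_addr], True
    # Miss: store a snapshot of the dynamic stream, taken branches included,
    # and fall back to the conventional fetch for this access.
    trace_cache[start_addr] = [addr for addr, _ in stream[start:start + FETCH_WIDTH]]
    return icache_fetch(stream, start), False


if __name__ == "__main__":
    print("icache fetch:", icache_fetch(DYNAMIC_STREAM, 0))
    print("trace fetch, first pass:", trace_fetch(DYNAMIC_STREAM, 0))
    print("trace fetch, second pass:", trace_fetch(DYNAMIC_STREAM, 0))
```

The conventional fetch stops at the first taken branch every time, while the second trace fetch returns the whole recorded sequence across both branches in one go.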

Pentium 4 only has a trace cache, and there is no L1i buffer, so there are huge slowdowns on conventional memory-heavy non-branchy code. That is important to note, since Intel Labs/R&D had the trace cache paired with the L1i to get the best of both worlds, and that setup is what AMD followed in their design. The first K9 tapeout didn't have a trace cache, but the second tapeout, which was canned, did. It didn't get rid of the L1i cache either, and its core frequency was higher than any of Netburst's core frequencies: it was launching at 5 GHz (iCore = 5 GHz), while Tejas, had it launched, would only have been around 4.5 GHz (iCore = 9 GHz).

The historic moment is that this thing killed AMD's K9:

Intel® Pentium® M Processor 755 (2M Cache, 2.00 GHz, 400 MHz FSB) - Product Specifications | Intel (www.intel.com)

This was around the time when growth in mobile low-TDP PCs was exceeding growth in desktop all-TDP PCs.

Insert_Nickname · Oct 30, 2022

Doug S said:
Second was the power draw issues mentioned in the video. Not so much because consumers care about how much power they burn but because they care about how noisy they are. The standard coolers Intel shipped were terribly loud at the speeds they had to run with a P4, while AMD's Athlons were much quieter.

Intel tried hard to push the BTX form factor for those reasons. As far as I know it was a complete flop outside OEM PCs.

A shame actually. It was a pretty good design compared to ATX. But then, newer ATX has borrowed a lot of the good points, so there is that at least.

zir_blazer · Oct 30, 2022

Doug S said:
After spending 20 years training consumers that higher clock rates were better, how could they sell Core series CPUs clocked 1 GHz slower than the P4 they were trying to get people to replace? Thus Intel was forced to adopt the same model number scheme they had spewed so much bile at AMD over!

False. AMD used Performance Ratings that bore some resemblance to the performance of a Pentium 4 of that generation running at the PR clock speed (which you actually mentioned). Around the mid-Prescott era, Intel decided to rename its processor models from plain clock speed to a model number, which it abused for consecutive Prescott re-releases with added features (the base models ended in 5x0, then Prescotts with XD Bit ended in 5x1, then Prescotts with EM64T, which I don't remember how you identified). It never adopted a PR system like AMD; it went straight to model numbers that you had to decode.
And by the time of Pentium D and Athlon 64 X2, which were released near simultaneously, AMD's PR had no true meaning since the values were completely distorted; they then fully dropped it with Phenom. By the time of Conroe's release, Intel had been using model numbers for a year and a half or so.

Insert_Nickname said:
Intel tried hard to push the BTX form factor for those reasons. As far as I know it was a complete flop outside OEM PCs.

A shame actually. It was a pretty good design compared to ATX. But then, newer ATX has borrowed a lot of the good points, so there is that at least.

I recall discussions about BTX mentioning that it was impossible for AMD to use because the DIMM slots were too far away from the processor, which was critical on K8 because it relied on the Integrated Memory Controller, whereas Intel had the chipset as a middle point.
And with Conroe going back to reasonable power levels, BTX's main reason for being ceased to exist.

Fanatical Meat · Oct 30, 2022

TheELF said:
No it wouldn't, because those games are made for a specific range of FPS, if it gets faster the faster the core then it will be broken and unplayable because you wouldn't have a chance of controlling anything.

At least intel had the sense to not release this thing in the first place, unlike AMD that just went ahead and released the whole dozer gen that was a step back in cores.

Don’t get me started on those loads of crap. My damn FX6300 almost made me vow to never buy anything AMD again. Glad I gave them a second chance, but man was I pissed off at the shoddy quality, unreliability and crappy performance of that machine.

Insert_Nickname said:
But, but, but Intel promised 10GHz eventually on the P4...

Fun fact; P4s run the ALUs double pumped, so a hypothetical 10GHz P4 would have ALUs running at an effective 20GHz.

Okay, so my memory is going far back, but I remember that around the time Intel was making claims of 10/20 GHz Pentiums (fours?) someone did the math and projected that Intel CPUs would be running close to the surface of the sun for heat. I forgot the specifics because it was so long ago, but I’m sure Google remembers it.
