Could a 12.8GHz CPU equal a quad-core 3.2GHz?


Rezist

Senior member
Jun 20, 2009
726
0
71
Dude, do you even know the difference between simple assembly code instructions and threads? I am talking about dependency of assembly code executing through a pipeline, and you are talking about dependency of multiple threads in a programming language executing through the operating system's scheduler. You are on a completely different page. Take some programming courses for the love of god.

A 12 GHz single-core CPU would need an 80-stage pipeline to get any work done, and would still fail miserably.

I thought we were talking about the same CPUs here? The reason we don't have 12 GHz is a silicon issue, not a software one. If you assume the same CPU at 12.8 GHz (same stages, cache running at processor speed), I think the single core would win. Time and again, a high-clocked dual core beats lower-clocked quads in most tests. This is mostly due to software, I understand. But in today's environment I'd take the really fast single core.

If everything starts running really efficiently on quads tomorrow, I'll change my mind.
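
To make the instruction-dependency point concrete, here is a minimal C++ sketch (not from the thread, purely illustrative). The first loop is one long serial dependency chain, so each multiply has to wait for the previous result; the second keeps four independent accumulators that a pipelined, superscalar core can overlap. Both loops do the same number of multiplies, yet on most machines the second finishes noticeably faster -- and a hypothetical 12.8 GHz core would still stall on the first kind of code. Build with something like g++ -O2; the absolute times don't matter, only the ratio.

Code:
#include <chrono>
#include <cstdio>

int main() {
    constexpr long N = 200000000;          // 200M multiplies in each loop
    volatile double seed = 1.0000001;      // volatile so the compiler can't pre-fold it

    // Loop 1: serial dependency chain -- every multiply waits on the previous result.
    auto t0 = std::chrono::steady_clock::now();
    double a = seed;
    for (long i = 0; i < N; ++i) a *= 1.0000001;
    auto t1 = std::chrono::steady_clock::now();

    // Loop 2: four independent chains -- the pipeline can overlap them.
    double b0 = seed, b1 = seed, b2 = seed, b3 = seed;
    for (long i = 0; i < N; i += 4) {
        b0 *= 1.0000001; b1 *= 1.0000001;
        b2 *= 1.0000001; b3 *= 1.0000001;
    }
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("dependent:   %lld ms (a=%g)\n",
                (long long)std::chrono::duration_cast<ms>(t1 - t0).count(), a);
    std::printf("independent: %lld ms (sum=%g)\n",
                (long long)std::chrono::duration_cast<ms>(t2 - t1).count(), b0 + b1 + b2 + b3);
}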
 

aka1nas

Diamond Member
Aug 30, 2001
4,335
1
0
At this point in time, you'd at least want the ability to handle more than one thread concurrently. While such a high-frequency processor would be great for individual process execution times, it would be inferior at multitasking. Some OSes might have better schedulers with additional latency-hiding techniques, but that's still not comparable to hardware multitasking.

Most of you guys are conflating two different performance metrics: raw execution time and perceived latency/response time. These are equally important.
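
A rough, Linux-only sketch of that throughput-vs-responsiveness distinction (again illustrative, not from the thread): a busy compute thread and a 1 ms "interactive" thread are both pinned to logical CPU 0 to mimic the single-core case, and the interactive thread reports how late its wake-ups actually arrive. The exact numbers depend heavily on the OS scheduler, so treat them as qualitative only; remove the pinning calls to see the multi-core case.

Code:
// Build: g++ -O2 -pthread latency_sketch.cpp   (Linux/glibc; uses pthread affinity)
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

static void pin_to_cpu0(std::thread& t) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
}

int main() {
    std::atomic<bool> stop{false};

    std::thread busy([&] {                        // throughput job: never yields
        volatile unsigned x = 1;
        while (!stop.load(std::memory_order_relaxed)) x = x * 1664525u + 1013904223u;
    });

    std::thread ui([&] {                          // "interactive" job: 1 ms ticks
        using namespace std::chrono;
        double worst = 0;
        for (int i = 0; i < 2000; ++i) {
            auto t0 = steady_clock::now();
            std::this_thread::sleep_for(milliseconds(1));
            double late = duration<double, std::milli>(steady_clock::now() - t0).count() - 1.0;
            if (late > worst) worst = late;
        }
        std::printf("worst wake-up lateness: %.2f ms\n", worst);
        stop = true;
    });

    pin_to_cpu0(busy);   // comment these two lines out to give each thread its own core
    pin_to_cpu0(ui);
    busy.join();
    ui.join();
}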
 

JAG87

Diamond Member
Jan 3, 2006
3,921
3
76
Dude, the 12 GHz single core would dominate.

I don't know wtf you guys are arguing about.

First off, GHz does not scale linearly.
Why doesn't it scale linearly?
Because 12 GHz would require either a massive multiplier or a massive FSB/QPI.

Having your FSB or QPI ramped up that high will lead to an insane boost in overall system speed.

If you ask me, I'd take the single 12 GHz core over 4 x 3 GHz cores ANY day of the week.
My Gulfy @ 4.4 GHz x 6 = 26.4 GHz w/o HT, and 26.4 + 13.2 (since HT is about 50% of real cores) = 39.6 GHz :O

Give me a single 39.6 GHz machine and I'll rule the world.


ROFL what obscene statements.

I think even my grandmother knows that FSB/QPI frequency has nearly no impact on performance. And FYI... HT has never, ever, ever accounted for more than a 30-35% performance improvement.
 

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
ROFL what obscene statements.

I think even my grandmother knows that FSB/QPI frequency has nearly no impact on performance. And FYI... HT has never, ever, ever accounted for more than a 30-35% performance improvement.

She must be the most well-informed grandmother ever :p
 

Tsavo

Platinum Member
Sep 29, 2009
2,645
37
91
My 286 is overclocked to 13 GHz.

I'm still waiting for it to finish booting Windows 7.

It's been 9 days, 17 hours and 37 minutes so far.
 
Dec 30, 2004
12,553
2
76
And why do you think frequency stopped scaling? It's almost as if you didn't read my post at all. Developers cannot program something to take advantage of all those clock cycles. If all executions were random then maybe you would be right. There are going to be a ton of wasted clock cycles because the code is just waiting on other executions to finish so that part of the processor is freed up.

Yes, and pipelining and OOO fail to provide any advantage past a certain clock speed, which is why I am bringing them up. HT is another very good example; thank you for bringing it up. HT fails to provide improvement in the majority of applications (because they are coded such that freeing up certain processor resources provides no gains until other executions finish). And it sucked especially when it was implemented on single-core P4s. All this should help you draw conclusions as to why a 12.8 GHz single-core CPU would suck hard.
Frequency stopped scaling because CPUs are limited by the underlying silicon and, in turn, waste heat output.

Ya, I don't have any clue what Jag is talking about. Apparently neither does he. IAACE (I am a computer engineer).
 
Dec 30, 2004
12,553
2
76
Yes, and pipelining and OOO fail to provide any advantage past a certain clock speed, which is why I am bringing them up. HT is another very good example; thank you for bringing it up. HT fails to provide improvement in the majority of applications (because they are coded such that freeing up certain processor resources provides no gains until other executions finish). And it sucked especially when it was implemented on single-core P4s. All this should help you draw conclusions as to why a 12.8 GHz single-core CPU would suck hard.

WHAT? OOO and Pipelining are all transistors, and scale with the transistor frequency.

HT functioned poorly on the P4 due to the limited cache to pull from and high memory latency relative to core frequency (hence why Core i7 does better-- latency to memory is much lower thanks to the IMC).
 
Dec 30, 2004
12,553
2
76
The 12.8Ghz single core would easily outperform the 3.2Ghz quad core.

But the quad core could potentially still be more responsive due to poor schedulers (though Linux has a much better scheduler than Windows when it comes to interactivity: a runaway process generally doesn't get to monopolize the processor, and I/O is often handled asynchronously instead of causing waits like on Windows).

Also, the quad core could be faster if the single core is cache starved. If the quad core has 4x as much cache (or more), that's a major advantage. A 12.8 GHz CPU with no L2 cache would be quite slow, and even a limited L2 cache (like the old Durons with 64KB) wouldn't fare well.

As the size of programs and the number of processes increases, does branch prediction accuracy decrease? I.e., the branch history would be changing more frequently and thus the prediction accuracy would begin to decline.
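
On the cache-starvation point above, here is a small C++ sketch (illustrative, not from the thread) of why raw clock speed stops mattering once you're bound by memory latency: a pointer chase through a working set that fits in cache runs at a few nanoseconds per hop, while the same chase through a working set far larger than any cache runs at roughly DRAM latency per hop -- and no 12.8 GHz core clock can hide that.

Code:
// Build: g++ -O2 chase_sketch.cpp   (the big run needs a few hundred MB of RAM)
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Walk a random single-cycle permutation of 'slots' indices; each load depends on the last.
static void chase(std::size_t slots, long hops) {
    std::vector<std::size_t> order(slots), next(slots);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    for (std::size_t i = 0; i < slots; ++i)
        next[order[i]] = order[(i + 1) % slots];   // one big cycle in random order

    std::size_t p = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < hops; ++i) p = next[p];
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("%9zu slots (%.1f MB): %.2f ns/hop (p=%zu)\n",
                slots, slots * sizeof(std::size_t) / 1e6, ns / hops, p);
}

int main() {
    chase(4000, 20000000);       // ~32 KB working set: stays in cache
    chase(16000000, 20000000);   // ~128 MB working set: mostly DRAM misses
}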
 
Dec 30, 2004
12,553
2
76
Dude, the 12 GHz single core would dominate.

I don't know wtf you guys are arguing about.

First off, GHz does not scale linearly.
Why doesn't it scale linearly?
Because 12 GHz would require either a massive multiplier or a massive FSB/QPI.

Having your FSB or QPI ramped up that high will lead to an insane boost in overall system speed.

If you ask me, I'd take the single 12 GHz core over 4 x 3 GHz cores ANY day of the week.
My Gulfy @ 4.4 GHz x 6 = 26.4 GHz w/o HT, and 26.4 + 13.2 (since HT is about 50% of real cores) = 39.6 GHz :O

Give me a single 39.6 GHz machine and I'll rule the world.



Only if he has Bugs Bunny on his team. Remember that movie?
Remember that movie?
ROFL what obscene statements.

I think even my grandmother knows that FSB/QPI frequency has nearly no impact on performance. And FYI... HT has never, ever, ever accounted for more than a 30-35% performance improvement.

That's because most of the units on the processor are already busy with the main thread. The lower the performance gain of HT (assuming the cache is large enough to hold excess instructions that the CPU could calculate OOO), the more "efficient" your code is at utilizing that architecture.

This is why I would take my Ph2 at 3.5 GHz any day over an i3 at 4.8 GHz-- I still have more ALUs than the i3 does, and as applications become more multithreaded, my processor's higher IPC will begin to shine through. This is usually why I recommend a Ph2 quad over the i3-- even if the i3 is clocked higher, because where it matters NOW the Ph2 quad can keep up no problem [i.e. >60fps], and in the future, true multithreaded code capable of stressing an entire core is going to become more ubiquitous--> the HT on the i3/i7 chips is going to benefit applications less and less.

HT takes advantage of ineffective compiler optimization for the architecture, or of single-threaded applications that only need to make use of 1 out of [hypothetically] 3 ALUs or FPUs in a given sequence of executions on the processor.
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
That's because most of the units on the processor are already busy with the main thread. The lower the performance gain of HT (assuming the cache is large enough to hold excess instructions that the CPU could calculate OOO), the more "efficient" your code is at utilizing that architecture.

This is why I would take my Ph2 at 3.5 GHz any day over an i3 at 4.8 GHz-- I still have more ALUs than the i3 does, and as applications become more multithreaded, my processor's higher IPC will begin to shine through. This is usually why I recommend a Ph2 quad over the i3-- even if the i3 is clocked higher, because where it matters NOW the Ph2 quad can keep up no problem [i.e. >60fps], and in the future, true multithreaded code capable of stressing an entire core is going to become more ubiquitous--> the HT on the i3/i7 chips is going to benefit applications less and less.

HT takes advantage of ineffective compiler optimization for the architecture, or of single-threaded applications that only need to make use of 1 out of [hypothetically] 3 ALUs or FPUs in a given sequence of executions on the processor.

It's pretty much impossible to write an application that fully utilizes every piece of the CPU. This isn't a compiler issue either. The fact is, you don't know when your code is going to be scheduled to run by the OS, and you can't assume that instruction x on thread 0 is going to be executing while instruction y on thread 1 is executing.

Thus, HT will benefit pretty much all multi-threaded apps and multiple apps running on the PC. The only exception to this is if the two threads of the apps are solely using one portion of the CPU (all the same ALU instructions in the thread), which I would hardly call an efficient application.
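
For anyone who wants to poke at the execution-unit argument, here is a rough Linux-only C++ sketch (illustrative, not from the thread). It assumes logical CPUs 0 and 1 are the two hyperthread siblings of one physical core, which is not true on every machine -- check lscpu or /proc/cpuinfo first. One thread hammers the integer ALUs with a dependent multiply-add chain while the other does the same with FP; because they mostly occupy different execution units, sharing one physical core via SMT usually costs far less than running two copies of the same workload would. Rerun with one thread pinned to a different physical core, or each workload alone, to compare.

Code:
// Build: g++ -O2 -pthread smt_sketch.cpp   (Linux/glibc; the CPU numbering is an assumption)
#include <pthread.h>
#include <sched.h>
#include <chrono>
#include <cstdio>
#include <thread>

static void pin(std::thread& t, int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
}

static void int_work(volatile unsigned* out) {   // dependent integer multiply-add chain
    unsigned x = 1;
    for (long i = 0; i < 800000000; ++i) x = x * 1664525u + 1013904223u;
    *out = x;
}

static void fp_work(volatile double* out) {      // dependent FP multiply-add chain
    double x = 1.0000001;
    for (long i = 0; i < 400000000; ++i) x = x * 1.0000001 + 1e-12;
    *out = x;
}

int main() {
    volatile unsigned iu = 0;
    volatile double fd = 0.0;

    auto t0 = std::chrono::steady_clock::now();
    std::thread a(int_work, &iu), b(fp_work, &fd);
    pin(a, 0);   // ASSUMPTION: CPUs 0 and 1 are SMT siblings of the same physical core
    pin(b, 1);
    a.join();
    b.join();
    auto t1 = std::chrono::steady_clock::now();

    std::printf("both on one physical core: %lld ms\n",
                (long long)std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
}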
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Oh goodie, a fun thread! :)

Here take a look at this:

http://forum.notebookreview.com/showthread.php?t=455347&highlight=i7+620M

The 620M, which is an Arrandale-based dual core that can run at 3.06 GHz with all cores active, is slightly faster than the 720QM, a Clarksfield-based quad core that can run at 1.73 GHz with all cores active.

So yes. A single 12.8GHz would be faster than a quad 3.2GHz. However, it wouldn't be ideal. The better comparison would be a dual 6.4GHz. Sure, a single 12.8GHz would run your single app faster, but since it'll be used 100%, it might not be so responsive. Duals on the other hand...

Pentium 4's SMT failed for a number of reasons:
-When the code isn't in the Trace Cache, it's effectively a 1-wide CPU. There's no way Hyperthreading could help in that case.
-The replay feature, which was an indirect consequence of a high-clock CPU, clogged up execution resources, which hurt SMT performance.

The Nehalem is faster per clock, but it is also wider and doesn't have replay. It also has plenty of memory bandwidth.
 
Dec 30, 2004
12,553
2
76
It's pretty much impossible to write an application that fully utilizes every piece of the CPU. This isn't a compiler issue either. The fact is, you don't know when your code is going to be scheduled to run by the OS, and you can't assume that instruction x on thread 0 is going to be executing while instruction y on thread 1 is executing.

Thus, HT will benefit pretty much all multi-threaded apps and multiple apps running on the PC. The only exception to this is if the two threads of the apps are solely using one portion of the CPU (all the same ALU instructions in the thread), which I would hardly call an efficient application.

Good point I did not consider that.
 
Dec 30, 2004
12,553
2
76
Oh goodie, a fun thread! :)

Here take a look at this:

http://forum.notebookreview.com/showthread.php?t=455347&highlight=i7+620M

The 620M, which is an Arrandale-based dual core that can run at 3.06 GHz with all cores active, is slightly faster than the 720QM, a Clarksfield-based quad core that can run at 1.73 GHz with all cores active.

So yes. A single 12.8GHz would be faster than a quad 3.2GHz. However, it wouldn't be ideal. The better comparison would be a dual 6.4GHz. Sure, a single 12.8GHz would run your single app faster, but since it'll be used 100%, it might not be so responsive. Duals on the other hand...

Pentium 4's SMT failed for a number of reasons:
-When the code isn't in the Trace Cache, it's effectively a 1-wide CPU. There's no way Hyperthreading could help in that case.
-The replay feature, which was an indirect consequence of a high-clock CPU, clogged up execution resources, which hurt SMT performance.

The Nehalem is faster per clock, but it is also wider and doesn't have replay. It also has plenty of memory bandwidth.

I agree, fun thread!

Trace cache-- we just had to read that paper (Rotenberg et al.) for an assignment due this past Tuesday.
I'm going to go run it again.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
As the size of programs and the number of processes increases, does branch prediction accuracy decrease? I.e., the branch history would be changing more frequently and thus the prediction accuracy would begin to decline.

I wouldn't think so.

But with less total cache, more swapping out to memory may occur.
The same may hold true for registers. With fewer cores you have fewer registers, but each running process or thread requires its own set of registers and local memory. Under the right circumstances, you might be able to cause a single-core CPU, with 1/4 the registers and cache, to thrash around so significantly that it wastes most of its time just accessing main memory.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,074
3,577
126
ROFL what obscene statements.

I think even my grandmother knows that FSB/QPI frequency has nearly no impact on performance.
Are you smoking crack?

200 x 20 IS NOT THE SAME as 25 x 160.

I have way more than enough screenshots to prove this.


Also I have enough proof on WCG to show you that HT can improve your work... my rough estimate was 50%.
You may be right that it's closer to 35%, but I was using 50% to add up my GHz so I could get a higher number...
We're not looking at HT, but raw cores.

But the first comment... you should retract it, or I will show you, JAG.
Experiment for yourself on your i7.
You will see I'm correct.

QPI/FSB > multiplier when overclocking.


And once again, a 12 GHz CPU would dominate anything and everything, as long as your cache speed ratios and RAM timings were in sync.
The reason we went multi-core and not faster clock speeds is because 12 GHz silicon, as someone stated, is close to impossible.
The heat output alone would tear apart an LN2 pot.

But if we ignore the heat and just do raw numbers... a 12 GHz clocked machine is not something to laugh at.
Even in 6 GHz+ territory we see insane numbers being pulled in.
 

Ben90

Platinum Member
Jun 14, 2009
2,866
3
0
Alright, I'm no scientist or computer expert, but I got some numbers for you guys to use as a base.

I'm on an i7-920, 3x2GB of RAM, HT off. Since I don't have an unlocked multiplier, I did the best I could without doing a lot of math to find numbers where the RAM matched up perfectly:

Single Core Setup:
168 Bclock
CPU- 3192 MHz
DDR- 1347 MHz
Unc- 2694 MHz
QPI- 6063 jiggahertz

Dual Core Setup:
133 Bclock
CPU- 1596 MHz
DDR- 1333 MHz
Unc- 2666 MHz
QPI- 5866 jiggahertz

Alright, so comparing these two setups you can see the single core has exactly twice the CPU clock; unfortunately its uncore/RAM/QPI operate slightly higher. I believe the uncore will be the only thing that gives anything more than a negligible benefit, so keep that in mind.

Single Core Benches (I'm lazy and limited):
IBT: 78.252 seconds
.......77.431 seconds

CineBench: 4561 points
................4517 points

Both at the same time: 169.29 IBT, 2518 CineBench, 5:57 total time

Dual Core Benches:
IBT: 76.335 seconds
.......75.356 seconds

Cinebench: 4604 points
................4625 points

Both at the same time: 187.84 IBT, 2547 CineBench, 5:49 total time

Alright, so I didn't have that much time to test, so these are going to have to do for now. The IBT runs faster on a single core and Cinebench runs faster on a dual. Something I noticed when testing both programs under the single core was that Cinebench basically did nothing until IBT was done, doing only a very small amount of work at the end of the IBT runs.

It's possible that under certain scenarios either setup could be faster. However, remembering back to the days of single cores where an application could hang a core, I think multicore is more desirable (I'm sure OS scheduling has improved since then, though).

Give me some more ideas to test and I can do a more in-depth comparison when I have more time. My frequencies go from about 1500 MHz to 4400 MHz.
 

JAG87

Diamond Member
Jan 3, 2006
3,921
3
76
Good comparison, Ben. At 3.2 GHz it doesn't show the weakness of the single core. Something else you can do is run a bench with a 1.6 GHz single core, and compare to the 3.2 GHz single core. If frequency scaling were perfect, the 3.2 GHz score should be double. Otherwise we can calculate the scaling from 1.6 to 3.2.

Aigo, I don't know what kind of horse you are riding; all things being the same, there is no improvement whatsoever with higher QPI. Feel free to disprove me.
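
The scaling arithmetic JAG87 describes is just a ratio; here is a tiny sketch with made-up placeholder scores (not real measurements) to show it:

Code:
#include <cstdio>

int main() {
    // Hypothetical Cinebench-style scores, purely placeholders for the arithmetic.
    double score_at_1_6GHz = 2350.0;
    double score_at_3_2GHz = 4480.0;

    double speedup    = score_at_3_2GHz / score_at_1_6GHz;  // observed speedup
    double efficiency = speedup / 2.0;                      // vs. the ideal 2.00x
    std::printf("speedup %.2fx, %.0f%% of perfect frequency scaling\n",
                speedup, efficiency * 100.0);
}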
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Ben90, you should lower the dual core Uncore clock to match CPU core speeds. We are trying to compare a theoretical dual core 1.6GHz vs a single core 3.2GHz and Uncore clock speeds are likely going to be lower. Remember, Uncore is also the L3 cache.

If you can do it even more accurately, keep the L3 cache speed tied to the core clock but keep the IMC clocks the same.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,074
3,577
126
Either he owns a liquid nitrogen processing plant or he loves cake. :D

Ruby is a she, not a he.

Aigo, I don't know what kind of horse you are riding; all things being the same, there is no improvement whatsoever with higher QPI. Feel free to disprove me.

I'll show ya when I get home.

However, it's faster if you try it.

Put your i7 @ 133x20 vs 186x15 and run a few benchmarks.
You'll see firsthand the 186 BCLK is faster.

Of course, the 186 BCLK will require more voltage, so don't expect it to be a direct comparison.

And Jag, in case you're wondering... it's RubyCon that taught me this...
I was originally on your side, till she showed me it's wrong.

So you're bashing heads with her as well. :X
 

Tsavo

Platinum Member
Sep 29, 2009
2,645
37
91
If they can build a single core doing 12 GHz they can build a twelve-core CPU running at 12 GHz. Sounds gross, but if you can have a cake you'd better be able to eat it too! :p

Wow, that'd be 144 GHz!!