Nehalem

Nemesis 1

Lifer
Dec 30, 2006
http://www.xtremesystems.org/f...howthread.php?t=171811

Only 10%-25% faster than Penryn at the same clock. I would say Intel is keeping things in perspective here. But look at the multithreaded improvement over Penryn: up to 2x. That is 20%-40% faster than K10 at the same clock. And this baby is going to scale big time.

So that says to me the 2-core, 4-thread version of Nehalem will beat a 4-core K10. If this holds up, kiss AMD goodbye in the CPU business, because Intel is going to have both the top end and the bottom end (Penryn).
 

Idontcare

Elite Member
Oct 10, 1999
I don't see where they state that the 15-25% performance increase is at the same clock.

In fact, I see them say single-threaded apps will perform better, but then they have a slide discussing the "turbo mode", where individual cores can be pushed to higher clock speeds to increase single-threaded app performance while remaining within the same power envelope.

They go further still and promise the same performance at lower power, or higher performance at the same power.

I read this to mean that thread-level performance (clock for clock) will be basically the same between Yorkfield and Bloomfield.

Because of hyperthreading you can get up to 2X more threads operating in parallel - so multi-threaded performance could potentially double on Bloomfield over Penryn simply because Bloomfield supports 8 threads and Penryn supports 4.

It's exciting, but misleading. You don't get ALL those benefits; you have to choose which ones you want (higher speed or lower power consumption, higher single-threaded or higher multi-threaded app performance).
 

Nemesis 1

Lifer
Dec 30, 2006
If you open the link, the second point says "1.1x-1.25x performance increase in single-threaded apps", i.e. a 10%-25% increase. Of course it's at the same clock. How else would Intel do it?

Then you wrote this:

Because of hyperthreading you can get up to 2X more threads operating in parallel - so multi-threaded performance could potentially double on Bloomfield over Penryn simply because Bloomfield supports 8 threads and Penryn supports 4.

Exactly, and the first point on slide 1 says the multithreaded architectural improvement scales from 1.2x to 2x, i.e. a 20%-100% improvement.

So a 2-core Nehalem with 4 threads is going to be very close to what AMD can put out with its high-end K10. I have heard from a very good source that Nehalem will run at 4GHz or above. Add in the performance increase from Nehalem's higher clocks and we're talking 40%-80% better single-threaded performance than Penryn, albeit at higher clock speeds.

I was also told Nehalem's number of pipeline stages. It's more than Penryn's but a little less than the P4C's. I will keep this one to myself, as it gives me an edge in the forum wars. I have a feeling this thing is going to clock very high. I always believed that Nehalem would have the same number of pipeline stages as the P4C. I believe the P4C was a 20-stage CPU. But what the hey, I was close.

We will see in a month or so. But I said it here not long ago: an overclocked Wolfdale will give K10 all it wants in multithreaded performance. So I have zero reason to doubt that a 2-core Nehalem will perform better than K10.

So AMD is not only going to have to lower the prices of its 3-core K10s; AMD will be forced to lower its 4-core prices to compete with 2-core Nehalems. I see no way out of the best mousetrap ever laid down to catch a rodent (rat).

When did Hector want to compare AMD's best against Intel's best? Oh! That time is history. I thought Intel handled that one rather well. But now, after the fact, it's clear they knew what they had coming.

For guys thinking about Nehalem: it is not the Israeli team that did the work on Nehalem.
Gesher is the Israeli team's CPU. Can't wait for this baby.

As a side note, I'm taking bets on this one. Here's the question: will the Israel 45/32nm fab produce any Penryns?

I am betting not one Penryn will come out of that fab. I am betting they will be Nehalems from day 1.

As we speak, AMD is reverse engineering Penryn. They did it before and they will do it again; a skunk can't change its stripes. As we already know, Intel's 4-issue core took people by surprise. But I remember a guy here (I can't recall his name) who said back in '04, or maybe '03, that Intel's next desktop core after the Prescotts would be a wider-issue core. I'm not sure, but I believe he said 4-issue. I remember it pretty well because he argued that Intel's Itanic was a 6-issue core, so Intel would have no problem releasing a wider-issue core. In the end he was banned because he caused a hell of an uproar in the forums here. Damn, I wish I could remember his name. But anyway, he was right. As it turns out, he was unjustly banned.

 

The-Noid

Diamond Member
Nov 16, 2005
10-25% is a big deal single-threaded. IPC gains are significantly harder to achieve than just adding more cores.

Nehalem should also help Intel keep up in heavily multithreaded environments.

Should be better in the end.
 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: Nemesis 1
If you open the link, the second point says "1.1x-1.25x performance increase in single-threaded apps", i.e. a 10%-25% increase. Of course it's at the same clock. How else would Intel do it?

You are missing the point I was attempting to communicate.

If Intel sells a 3GHz Nehalem and you run a single-threaded application, then the chip is going to automatically "turbo mode" the core running the single-threaded app (if you believe the marketing hype) to something >3GHz.

So if you compare the single-threaded performance of a "3GHz" Bloomfield chip (albeit running 1 core at 3.5GHz and the remaining 3 cores at 2GHz to fit into its power envelope) to a 3GHz Yorkfield, are you comparing "clock to clock"? No, you're not.

So my question is: how much of that 15-25% single-threaded performance boost comes from the CPU up-clocking the loaded core by 15-25%, versus how much actually comes from IPC improvements?

And if that is the case, what happens if I load a Bloomfield with 4 instances of a single-threaded application and turn them all on at once?

Because of TDP restrictions the chip won't operate any of the cores in turbo mode (as they are all fully loaded with single-threaded apps), so will I still see a 15-25% performance boost in my single-threaded apps on Bloomfield relative to loading a Yorkfield in similar fashion?

Have I clearly communicated my question now?
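Idontcare's question amounts to splitting a measured speedup into a clock part and an IPC part, using the rough rule that single-threaded performance scales as IPC × frequency. A minimal sketch, assuming that rule holds and reusing his hypothetical 3GHz base / 3.5GHz turbo figures (neither is a published Nehalem spec); `ipc_gain` is a hypothetical helper:

```python
# Rough model (an assumption): single-threaded performance ~ IPC x frequency.
# The 3.0 GHz base and 3.5 GHz turbo clocks are the hypothetical example
# from the post, not published Nehalem specifications.


def ipc_gain(observed_speedup: float, base_ghz: float, boosted_ghz: float) -> float:
    """Return the per-clock (IPC) gain implied by an observed speedup."""
    clock_gain = boosted_ghz / base_ghz
    return observed_speedup / clock_gain


# One core turbo-boosted from 3.0 to 3.5 GHz while a single-threaded app runs.
for observed in (1.15, 1.25):
    print(f"observed {observed:.2f}x -> implied IPC gain {ipc_gain(observed, 3.0, 3.5):.3f}x")

# All four cores loaded: no turbo headroom, so the clock ratio is 1.0 and
# the whole observed speedup would have to come from IPC.
print(f"all cores loaded, 1.25x -> implied IPC gain {ipc_gain(1.25, 3.0, 3.0):.3f}x")
```

With these numbers, a 1.15x result would imply slightly lower IPC than Yorkfield, a 1.25x result leaves only about 7% for IPC, and with all four cores loaded the full 1.25x would have to come from IPC.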

Originally posted by: Nemesis 1
Then you wrote this:

Because of hyperthreading you can get up to 2X more threads operating in parallel - so multi-threaded performance could potentially double on Bloomfield over Penryn simply because Bloomfield supports 8 threads and Penryn supports 4.

Exactly, and the first point on slide 1 says the multithreaded architectural improvement scales from 1.2x to 2x, i.e. a 20%-100% improvement.

If the multi-threaded performance increase is "at best" a linear extrapolation of the number of threads on the chip despite the chip architecture changing from Yorkfield to Bloomfield then that further suggests there is little to no IPC improvement per thread.

If they doubled the threads AND increased the IPC per thread AND integrated the memory controller then I would expect the upper end of the improvement range to be >2X and not just simply listed as "up to" 2X.
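Idontcare's reasoning can be written as a rough multiplicative model. This is a sketch under two assumptions that are not in the slides: every Bloomfield hardware thread delivers the same throughput as a full Yorkfield core (no SMT penalty), and the workload scales perfectly with thread count. Here N is the number of hardware threads, IPC is per-thread instructions per clock, f is the clock frequency, and the subscripts B and Y stand for Bloomfield and Yorkfield:

```latex
S_{\mathrm{MT}} \approx \frac{N_{\mathrm{B}}}{N_{\mathrm{Y}}}
  \cdot \frac{\mathrm{IPC}_{\mathrm{B}}}{\mathrm{IPC}_{\mathrm{Y}}}
  \cdot \frac{f_{\mathrm{B}}}{f_{\mathrm{Y}}}
\quad\Longrightarrow\quad
S_{\mathrm{MT}} \approx 2 \cdot \frac{\mathrm{IPC}_{\mathrm{B}}}{\mathrm{IPC}_{\mathrm{Y}}}
\qquad \text{for } N_{\mathrm{B}} = 8,\; N_{\mathrm{Y}} = 4,\; f_{\mathrm{B}} = f_{\mathrm{Y}}
```

Under those assumptions, an "up to 2X" ceiling at equal clocks would mean IPC_B / IPC_Y ≤ 1. If the no-penalty assumption is dropped, as BitByBit argues in reply, the same ceiling is compatible with a per-thread IPC gain.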
 

Nemesis 1

Lifer
Dec 30, 2006
OK, I see what you're saying. Really, it wouldn't make sense for Intel to rely on turbo mode to give us its performance increases. You're basically saying that Nehalem is no better than Penryn if turbo mode is off. I can't buy into that. No improvements from the on-die memory controller? No improvements from the multi-level shared cache, meaning more than the L2 is shared?

No architectural gains as far as logic and SSE4.2? Now I am thinking Intel is going to sandbag
right up until the release. No reason to do otherwise.

 

BitByBit

Senior member
Jan 2, 2005
Originally posted by: Idontcare


If the multi-threaded performance increase is "at best" a linear extrapolation of the number of threads on the chip despite the chip architecture changing from Yorkfield to Bloomfield then that further suggests there is little to no IPC improvement per thread.

If they doubled the threads AND increased the IPC per thread AND integrated the memory controller then I would expect the upper end of the improvement range to be >2X and not just simply listed as "up to" 2X.

Since when has multithreaded performance due to SMT been a linear extrapolation of the number of threads? Have I misunderstood you? If Nehalem's SMT can potentially result in a two-fold increase in performance over Penryn, then Intel really have done their homework; Hyperthreading resulted in around a 25% increase on Netburst at best. Assuming Nehalem's single-threaded performance is 20% greater than Penryn's, and multithreading results in double the performance in extreme cases, then the speedup due to SMT in this case would be 2 / 1.2 ≈ 1.67, i.e. roughly 67%. Intel did state that Nehalem's SMT would bear little resemblance to Hyperthreading, so such an increase in performance isn't entirely unbelievable.
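The arithmetic in the post above can be written out as a tiny sketch. It assumes, as the post does, that the overall gain factors cleanly into a per-core part and an SMT part; `smt_speedup` is a hypothetical helper name, and the 2.0x and 1.2x inputs are the post's assumed figures, not measurements:

```python
# Assumption from the post: total multithreaded gain = per-core gain x SMT gain,
# so the SMT share is the ratio of the two. Inputs are hypothetical figures.


def smt_speedup(total_gain: float, single_thread_gain: float) -> float:
    """SMT contribution implied by a total gain and a per-core gain."""
    return total_gain / single_thread_gain


s = smt_speedup(2.0, 1.2)
print(f"SMT speedup: {s:.3f}x, i.e. about {100 * (s - 1):.0f}% more throughput")
```

If the factorization does not hold (for example, if memory bandwidth rather than the core limits the multithreaded case), the ratio no longer isolates the SMT contribution.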
 

Nemesis 1

Lifer
Dec 30, 2006
I think we've been through this one, with AMD.



Intel Vice President Kirk Skaugen said
Quote:
that the CPU core performance jump from the same-process Core 2 (Penryn) to Nehalem would be higher than the jump from Netburst to Core 2 itself.

 

ArchAngel777

Diamond Member
Dec 24, 2000
Originally posted by: Nemesis 1
I think we've been through this one, with AMD.



Intel Vice President Kirk Skaugen said
Quote:
that the CPU core performance jump from the same-process Core 2 (Penryn) to Nehalem would be higher than the jump from Netburst to Core 2 itself.

VPs, CEOs, etc... All a bunch of marketing BS and hype... I'll believe it when I see it. Remember all the things Jerry Sanders and Hector have promised over the years?
 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: BitByBit
Originally posted by: Idontcare
If the multi-threaded performance increase is "at best" a linear extrapolation of the number of threads on the chip despite the chip architecture changing from Yorkfield to Bloomfield then that further suggests there is little to no IPC improvement per thread.

If they doubled the threads AND increased the IPC per thread AND integrated the memory controller then I would expect the upper end of the improvement range to be >2X and not just simply listed as "up to" 2X.

Since when has multithreaded performance due to SMT been a linear extrapolation of the number of threads? Have I misunderstood you? If Nehalem's SMT can potentially result in a two-fold increase in performance over Penryn, then Intel really have done their homework; Hyperthreading resulted in around a 25% increase on Netburst at best. Assuming Nehalem's single-threaded performance is 20% greater than Penryn's, and multithreading results in double the performance in extreme cases, then the speedup due to SMT in this case would be 2 / 1.2 ≈ 1.67, i.e. roughly 67%. Intel did state that Nehalem's SMT would bear little resemblance to Hyperthreading, so such an increase in performance isn't entirely unbelievable.

Increasing threads per core beyond one without inducing a per-thread performance penalty is not new. Niagara processors do it, and so does Power6.

I would expect POV-Ray (the multi-threaded beta) performance to scale linearly with the number of available threads.

So, if IPC per thread is not improved in Nehalem versus Penryn, then I'd expect a Bloomfield to perform 2X as fast as a Yorkfield clock for clock, unless the new and improved hyperthreading in Bloomfield is in fact really crappy and does turn out to introduce a performance penalty to the 2nd concurrent thread running on a given core...

I am not saying anything new or surprising here, right? I can never effectively tell when I am conversing with ignorant newbs who just like to express their unjustified opinions versus the few folks who do know a thing or two about processors at this forum. No offense intended.
 

Nemesis 1

Lifer
Dec 30, 2006
Originally posted by: ArchAngel777
Originally posted by: Nemesis 1
I think we've been through this one, with AMD.



Intel Vice President Kirk Skaugen said
Quote:
that the CPU core performance jump from the same-process Core 2 (Penryn) to Nehalem would be higher than the jump from Netburst to Core 2 itself.

VPs, CEOs, etc... All a bunch of marketing BS and hype... I'll believe it when I see it. Remember all the things Jerry Sanders and Hector have promised over the years?



Actually, Sanders did OK. It was Sanders who started development on Hammer. Hector has done nothing but piss Intel off, and they are pissed. Hector led AMD into this 60-billion-dollar lawsuit deal. It was Hector who challenged Intel to a CPU match. Hector should ask again. If I were Intel, I would go for the kill.
 

Idontcare

Elite Member
Oct 10, 1999
I would argue that Barrett pissed Intel off more than Hector ever did.

Intel didn't replace Hector in response to Dirk's K8 beating the crap out of Barrett's Netburst Prescott.

If you followed the saga that was Cyrix, then you probably get visions of déjà vu when you see where AMD is going, management change or no management change.
 

BitByBit

Senior member
Jan 2, 2005
Originally posted by: Idontcare

So, if IPC per thread is not improved in Nehalem versus Penryn, then I'd expect a Bloomfield to perform 2X as fast as a Yorkfield clock for clock, unless the new and improved hyperthreading in Bloomfield is in fact really crappy and does turn out to introduce a performance penalty to the 2nd concurrent thread running on a given core...

The performance penalty is due to concurrent threads attempting to access the same resources. Granted, Netburst was a less than ideal candidate for SMT given its tiny L1 caches and dismal decode rate, among other things, but Core 2 (and presumably Nehalem) doesn't have unlimited resources either. Add to that the fact that Core 2 does a pretty good job of keeping its execution units busy with a single thread, and the speedup from SMT is never going to be double. Netburst managed 25-30% at best, and I'd expect Nehalem to achieve about twice that.
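BitByBit's resource-contention argument can be illustrated with a deliberately crude toy model. This is a hypothetical sketch, not a model of any real Intel core: the `expected_issue` function, the 4-wide issue width and both ILP profiles are made up for illustration. It shows why the SMT gain shrinks when one thread already keeps most issue slots busy, and why a second thread on the same core is not the same as a second core:

```python
from itertools import product

# Toy model only: an illustrative assumption, not how Core 2 or Nehalem
# actually schedule instructions. Each cycle a thread has k ready
# instructions with probability ilp[k]; the core issues up to `width`
# of the combined ready pool. Cycles are treated as independent, which
# ignores caches, branch mispredicts and work carried over between cycles.


def expected_issue(width: int, ilp: dict[int, float], threads: int) -> float:
    """Expected instructions issued per cycle when `threads` identical threads share one core."""
    total = 0.0
    for combo in product(ilp.items(), repeat=threads):
        prob = 1.0
        ready = 0
        for k, p in combo:
            prob *= p
            ready += k
        total += prob * min(width, ready)
    return total


WIDTH = 4  # a 4-issue core, as discussed in the thread

# Two hypothetical per-thread ILP profiles (ready instructions per cycle).
profiles = {
    "low-ILP thread": {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1},
    "high-ILP thread": {2: 0.2, 3: 0.3, 4: 0.3, 5: 0.2},
}

for name, ilp in profiles.items():
    one = expected_issue(WIDTH, ilp, threads=1)
    two = expected_issue(WIDTH, ilp, threads=2)
    # In this model a second full core would exactly double throughput.
    print(f"{name}: SMT gain {two / one:.2f}x vs 2.00x for a second core")
```

With these made-up profiles, the low-ILP thread gets about 1.94x from SMT, while the high-ILP thread, which already fills most of the four issue slots on its own, gets only about 1.21x.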

I am not saying anything new or surprising here, right? I can never effectively tell when I am conversing with ignorant newbs who just like to express their unjustified opinions versus the few folks who do know a thing or two about processors at this forum. No offense intended.

I think the general consensus among reviewers and readers alike is that SMT does a good job of boosting performance, but that it is no substitute for another core.

Lastly, before you use terms such as 'ignorant newbs', I suggest you properly research the thread topic before posting. No offense intended.
 

Nemesis 1

Lifer
Dec 30, 2006
Actually, Barrett was right about Netburst. It was just too early for Netburst. If Intel had taken the P4C down to 90nm and then 65nm, I think a Netburst P4C on a smaller process would have shocked a lot of people. A P4C at 4GHz was FAST. The problem was that not many would reach that speed. But at 90nm it would have done 4GHz easily, and at 65nm, I won't even say it.
 

BitByBit

Senior member
Jan 2, 2005
Netburst was retired at 65nm. 'Cedar Mill' was a P4 with 2MB cache per core, which still didn't hit 4.0GHz.
 

v8envy

Platinum Member
Sep 7, 2002
Intel always has the option to dig up the rotting corpse of Netburst if it makes sense. You raise an interesting point: a 32nm Netburst design just might get close to the 10GHz Intel envisioned for the architecture. Of course, it'd probably belch forth about 200 watts per core even running at half a volt, but hey. 10 gigs!

Now wouldn't THAT be irony. "You don't want that crappy Core 2 architecture CPU; the new 16-core Netbursts are a no-brainer! You'll need a ten-kilowatt PSU to drive that and your 8 video cards, though, which means rewiring your house..."
 

dmens

Platinum Member
Mar 18, 2005
the netburst family may be discontinued, but core2 has demonstrated that aggressive pipelining and clocking is still the best way to achieve high performance. the key lesson is not to make uarch decisions purely to boost clock speed.
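dmens's point can be framed with the standard "iron law" of processor performance. As a sketch: deeper pipelining shortens the time per cycle, but if longer branch-mispredict and refill penalties raise the cycles per instruction by a larger factor, execution time goes up despite the higher clock:

```latex
\frac{\text{time}}{\text{program}}
  = \frac{\text{instructions}}{\text{program}}
  \times \frac{\text{cycles}}{\text{instruction}}
  \times \frac{\text{seconds}}{\text{cycle}}
```

For example, a hypothetical change that cuts the cycle time by 20% (factor 0.8) but raises CPI by 30% (factor 1.3) leaves execution time at 0.8 × 1.3 = 1.04 of the original, i.e. slower even though the clock is higher.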
 

lopri

Elite Member
Jul 27, 2002
To OP: The link you provided has a lot of precious information in it (if the source is correct) and I thank you for that. But your following comment (what does Mr. Hector have to do with Nehalem architecture anyway?) is kinda unproductive; as you can see, other members are more interested in Nehalem's technical merits/advancement than in what it means for Intel's business vs AMD's.

So on topic: Assuming the information is correct, I am surprised that Intel is integrating PCI-E on-die, not to mention the memory controller. That'd be big; I'd expect a significant boost in graphics performance. Also note that this move will practically obsolete the concept of the 'north bridge'. There will be no more north/south, only the 'south', whose main function would be taking care of I/O.

One worrisome thing (this one I indeed expected) is that Intel will attempt to separate the high-end/low-end CPUs with sockets. So this means there will be no more 'E6600' that performs like an 'X6800', or 'Q6600' that performs like a 'QX6800'. From the charts, it looks like the high-end Nehalem will incorporate a memory controller for DDR3, in contrast to one for DDR2 on the lower-end Nehalem. Whether this is a transitional thing (which will depend on DDR3 prices) remains to be seen, but I'm not holding my breath.
 

jones377

Senior member
May 2, 2004
If this info is correct, Bloomfield (LGA1366) won't have an integrated memory controller, but will instead use the memory controller on the Tylersburg chipset. Lynnfield and Havendale (LGA1160) will have the northbridge on the same package in an MCP (Multichip Package), similar to how the current quad cores are MCM and not "native".

I wonder if this is a result of an early decision by Intel to make the memory controller FBD2 instead of DDR3. Right now this is just a bit confusing.
 

Nemesis 1

Lifer
Dec 30, 2006
Originally posted by: BitByBit
Netburst was retired at 65nm. 'Cedar Mill' was a P4 with 2MB cache per core, which still didn't hit 4.0GHz.


Prescott was the real Netburst blunder: 30 pipeline stages. Not even close to the P4 Northwood (P4C) with 20 stages.

I just read a link over at XS. It seems to be a webmaster writing about the link I got at XS. The guy screws a lot of things up. The one I like best is that Nehalem's pipeline will be in the low 20s. I can say with a great deal of confidence that Nehalem's pipeline will be 20 stages or fewer, but more than 14.

 

Nemesis 1

Lifer
Dec 30, 2006
Yeah, I read that also, but knew it was a writing error. Here's another link based on this thread's link.

http://www.bit-tech.net/news/2...ll_through_the_tubes/1
There are some writing errors here also; you should know the basics by now. As I said, the author makes it sound like the number of pipeline stages is greater than 20. I wouldn't bother remembering that one; it's not true.

Five months ago people were arguing here over Intel's 4-issue core not being utilized. Peeps even said AMD's 3-issue core was not fully utilized. Then they went on to say Intel's 4-issue core made zero performance improvement. I hope it's true, because with 2/4/8 threads this 4-issue core is going to rock. Comparing Intel's new HT on this 4-issue core vs. the P4 Netburst 3-issue core, there is no way to make an apples-to-apples comparison. By the way, guys, saying Merom is based on the Pentium Pro core is a little misleading, as one is a 3-issue core and the other is 4-issue, with Merom also having more pipes. So it would be a core worked over to the max.
 

zach0624

Senior member
Jul 13, 2007
I think it will be interesting to see how much of a difference HT makes in Nehalem performance-wise. From what I understand, it won't help programs that are already multithreaded and max out all the cores, but what it would be good at is filling cores up with 2 smaller threads using the same resources. How much of a benefit would 8 threads be, though, when most programs only run one, i.e. eight different single-threaded programs? I don't see a 100% boost in overall performance between Nehalem and C2D/C2Q, though; I think 30-50% is a reasonable assumption. Let's just hope their plans for clock scaling don't end up like another Netburst.
 

xxceler8

Member
Dec 29, 2007
I thought Nehalem was supposed to be native 8-core? I wonder if that was all hype that won't happen. That really sux, especially for people like me who are waiting for it!
 

coldpower27

Golden Member
Jul 18, 2004
Originally posted by: note235
where's the 8-core/16-thread CPU?

A variant will be.

Originally posted by: xxceler8
I thought Nehalem was supposed to be native 8-core? I wonder if that was all hype that won't happen. That really sux, especially for people like me who are waiting for it!

The initial version is a native quad core, with an octo-core version coming down the pipe. I don't know the exact time frame; maybe mid to late 2009, if I were to hazard a guess.