Nehalem, aka Bloomfield, looks Impressive


tno

Senior member
Mar 17, 2007
815
0
76
It's really nice to see Intel really getting behind their products with this kind of enthusiasm; what I worry about is whether it's called for. I'm not trying to be a naysayer, but we heard this kind of talk during the glory days of Netburst (where's my 10GHz P4?). It makes me wish that AMD had some truly competitive products on sale right now! That would stoke the flames under the Intel engineers' bottoms and make sure the product they produce is ready to lay the smack down on Phenom II.

Then again, I'm the guy that postulated that the Netburst fiasco was a well-planned and intentional engineering exercise.

tno
 

Denithor

Diamond Member
Apr 11, 2004
6,298
23
81
Originally posted by: tno
It's really nice to see Intel really getting behind their products with this kind of enthusiasm; what I worry about is whether it's called for.

I see lots of parallels in the video card world with the launch of the 8800 GT today: nVidia is stepping up to the plate to introduce the next level of performance at a lower price, even when DAAMIT doesn't have anything close to their top-end cards.

Seems like DAAMIT is falling further and further behind on both fronts these days. It makes me wish that AMD hadn't bought ATI; they both seemed to do better before the merger.
 

Emission

Senior member
Mar 4, 2007
580
0
0
Originally posted by: ArchAngel777
I am going to make a prediction so I can come back and see if it came true.

Nehalem will be ~15% clock for clock faster than Penryn.

I'd be willing to say it'll be ~75% clock for clock faster than Penryn, with some nice improvements in efficiency.

As for DDR3, I think prices will drop to reasonable levels by the time Nehalem reaches store shelves.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: IntelUser2000
I'll play...

For desktop, I'd say ~5-10%...

Very conservative expectation, I think. 5-10% is what's expected of Penryn, which was thought to be merely a die shrink with a little more cache before Intel flooded us with all the information. I'd say double that is very likely, if not more. FP/SSE units are also supposed to get a significant overhaul, and some rumors say even stuff like branch prediction might be better.

That's a fair call...but as you say, all other improvements are just rumour at this point. Going by what's been released so far, I'd say 5-10% over Penryn for single socket sounds about right. When more info/verification comes on the rest, I would certainly re-evaluate.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Originally posted by: Viditor
Originally posted by: IntelUser2000
I'll play...

For desktop, I'd say ~5-10%...

Very conservative expectation, I think. 5-10% is what's expected of Penryn, which was thought to be merely a die shrink with a little more cache before Intel flooded us with all the information. I'd say double that is very likely, if not more. FP/SSE units are also supposed to get a significant overhaul, and some rumors say even stuff like branch prediction might be better.

That's a fair call...but as you say, all other improvements are just rumour at this point. Going by what's been released so far, I'd say 5-10% over Penryn for single socket sounds about right. When more info/verification comes on the rest, I would certainly re-evaluate.

Nehalem will be considerably more than a Penryn with an IMC and Quickpath.

 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Phynaz
Originally posted by: Viditor
Originally posted by: IntelUser2000
I'll play...

For desktop, I'd say ~5-10%...

Very conservative expectation, I think. 5-10% is what's expected of Penryn, which was thought to be merely a die shrink with a little more cache before Intel flooded us with all the information. I'd say double that is very likely, if not more. FP/SSE units are also supposed to get a significant overhaul, and some rumors say even stuff like branch prediction might be better.

That's a fair call...but as you say, all other improvements are just rumour at this point. Going by what's been released so far, I'd say 5-10% over Penryn for single socket sounds about right. When more info/verification comes on the rest, I would certainly re-evaluate.

Nehalem will be considerably more than a Penryn with an IMC and Quickpath.

Well, Quickpath shouldn't affect single socket (desktop) at all...
And the IMC shouldn't have much (if anything at all) performance effect over current Conroe/Penryn designs as the large cache negates the advantages of an IMC (comparisons to AMD have shown just those points).

Do you have any other published news on what else Intel is designing into Nehalem yet?
 

jones377

Senior member
May 2, 2004
463
64
91
Originally posted by: Viditor
Originally posted by: Phynaz
Originally posted by: Viditor
Originally posted by: IntelUser2000
I'll play...

For desktop, I'd say ~5-10%...

Very conservative expectation, I think. 5-10% is what's expected of Penryn, which was thought to be merely a die shrink with a little more cache before Intel flooded us with all the information. I'd say double that is very likely, if not more. FP/SSE units are also supposed to get a significant overhaul, and some rumors say even stuff like branch prediction might be better.

That's a fair call...but as you say, all other improvements are just rumour at this point. Going by what's been released so far, I'd say 5-10% over Penryn for single socket sounds about right. When more info/verification comes on the rest, I would certainly re-evaluate.

Nehalem will be considerably more than a Penryn with an IMC and Quickpath.

Well, Quickpath shouldn't affect single socket (desktop) at all...
And the IMC shouldn't have much (if anything at all) performance effect over current Conroe/Penryn designs as the large cache negates the advantages of an IMC (comparisons to AMD have shown just those points).

Do you have any other published news on what else Intel is designing into Nehalem yet?

Unlike HT3 over HT2 you mean?
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Originally posted by: Viditor
Originally posted by: Phynaz
Originally posted by: Viditor
Originally posted by: IntelUser2000
I'll play...

For desktop, I'd say ~5-10%...

Very conservative expectation, I think. 5-10% is what's expected of Penryn, which was thought to be merely a die shrink with a little more cache before Intel flooded us with all the information. I'd say double that is very likely, if not more. FP/SSE units are also supposed to get a significant overhaul, and some rumors say even stuff like branch prediction might be better.

That's a fair call...but as you say, all other improvements are just rumour at this point. Going by what's been released so far, I'd say 5-10% over Penryn for single socket sounds about right. When more info/verification comes on the rest, I would certainly re-evaluate.

Nehalem will be considerably more than a Penryn with an IMC and Quickpath.

Well, Quickpath shouldn't affect single socket (desktop) at all...
And the IMC shouldn't have much (if anything at all) performance effect over current Conroe/Penryn designs as the large cache negates the advantages of an IMC (comparisons to AMD have shown just those points).

Do you have any other published news on what else Intel is designing into Nehalem yet?

Published?

Nope.

 

HopJokey

Platinum Member
May 6, 2005
2,110
0
0
Originally posted by: Phynaz
Originally posted by: Viditor

Do you have any other published news on what else Intel is designing into Nehalem yet?

Published?

Nope.

SMT (simultaneous multithreading) is a publicly known Nehalem feature that you did not mention.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
And the IMC shouldn't have much (if anything at all) performance effect over current Conroe/Penryn designs as the large cache negates the advantages of an IMC (comparisons to AMD have shown just those points).

It doesn't really negate the advantages of an IMC; rather, it reduces them. A long time ago, when AMD made chipsets, they weren't known for high performance. With their integrated memory controllers, AMD managed to bring substantial improvements over the best chipset, the nForce 2.

http://www.anandtech.com/showdoc.aspx?i=2795&p=5

On the CPU-Z 1.35 test (8192KB, 128-byte stride), where it's a more realistic comparison, the Core 2 X6800 had 23% more memory latency than the Athlon 64 FX-62.

And there is a bandwidth advantage that can't be matched with an external memory controller: http://www.tomshardware.com/20...ir_cooling/page32.html

AMD can achieve 43% higher bandwidth than Intel. That's 3x what the 1333MHz FSB brought Intel over the 1066MHz FSB (link: http://www.anandtech.com/cpuch...owdoc.aspx?i=2993&p=9)

The roughly 19% lower latency (23% more on the Intel side) and 43% higher bandwidth alone will bring 5-10%. The memory controller enhancements will probably bring more than that though. 10-15% is possible.
 

bfdd

Lifer
Feb 3, 2007
13,312
1
0
Originally posted by: IntelUser2000
And the IMC shouldn't have much (if anything at all) performance effect over current Conroe/Penryn designs as the large cache negates the advantages of an IMC (comparisons to AMD have shown just those points).

It doesn't really negate the advantages of an IMC; rather, it reduces them. A long time ago, when AMD made chipsets, they weren't known for high performance. With their integrated memory controllers, AMD managed to bring substantial improvements over the best chipset, the nForce 2.

http://www.anandtech.com/showdoc.aspx?i=2795&p=5

On the CPU-Z 1.35 test (8192KB, 128-byte stride), where it's a more realistic comparison, the Core 2 X6800 had 23% more memory latency than the Athlon 64 FX-62.

And there is a bandwidth advantage that can't be matched with an external memory controller: http://www.tomshardware.com/20...ir_cooling/page32.html

AMD can achieve 43% higher bandwidth than Intel. That's 3x what the 1333MHz FSB brought Intel over the 1066MHz FSB (link: http://www.anandtech.com/cpuch...owdoc.aspx?i=2993&p=9)

The roughly 19% lower latency (23% more on the Intel side) and 43% higher bandwidth alone will bring 5-10%. The memory controller enhancements will probably bring more than that though. 10-15% is possible.

Isn't Viditor an AMD fanboy? I find it funny that so many people preached how IMC was better than FSB for so long, and now that Intel is getting it, some people actually have the nerve to say "well, it won't help that much".
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: IntelUser2000
And the IMC shouldn't have much (if anything at all) performance effect over current Conroe/Penryn designs as the large cache negates the advantages of an IMC (comparisons to AMD have shown just those points).

It doesn't really negate the advantages of an IMC; rather, it reduces them. A long time ago, when AMD made chipsets, they weren't known for high performance. With their integrated memory controllers, AMD managed to bring substantial improvements over the best chipset, the nForce 2.

http://www.anandtech.com/showdoc.aspx?i=2795&p=5

On the CPU-Z 1.35 test (8192KB, 128-byte stride), where it's a more realistic comparison, the Core 2 X6800 had 23% more memory latency than the Athlon 64 FX-62.

And there is a bandwidth advantage that can't be matched with an external memory controller: http://www.tomshardware.com/20...ir_cooling/page32.html

AMD can achieve 43% higher bandwidth than Intel. That's 3x what the 1333MHz FSB brought Intel over the 1066MHz FSB (link: http://www.anandtech.com/cpuch...owdoc.aspx?i=2993&p=9)

The roughly 19% lower latency (23% more on the Intel side) and 43% higher bandwidth alone will bring 5-10%. The memory controller enhancements will probably bring more than that though. 10-15% is possible.

I grant that Intel hasn't equalled AMD's IMC, but you're comparing it with only 4MB of L2 cache. Penryn has 6MB, which should decrease the memory latency advantage even more.
As to the 128-byte stride being more realistic, I have to disagree...from the article:

"The 128-byte stride numbers are indicative of what will happen if the pre-fetchers are not able to get the Core 2 the data it needs, when it needs it, while the 64-byte numbers show you what can happen when things go well"

The two are representative of different scenarios (unless you feel that the C2D prefetch is unable to get the data it needs in the majority of cases...).

But in any event, the point is moot. Keeping in mind that we are discussing desktop only here, my understanding is that these parts won't even have the IMC for the first iteration of Nehalems...

As to bandwidth, I certainly agree that bandwidth will be greatly increased...but for desktop parts, so what? With only a single socket and unless Intel has a super secret project to produce Torrenza-like interconnects, I don't see how any bandwidth increase will speed up desktop parts.

There is something that hasn't been mentioned that might increase performance though...Nehalem will be a native quad-core. I'm still trying to find the details of that (no luck so far), but it might indeed help a desktop part's performance if it's similar to C2D's shared cache structure.

Don't get me wrong, I think Nehalem is going to do wonders for Intel's Xeon line...but for the desktop I just don't see that much improvement yet.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: bfdd


Isn't Viditor an AMD fanboy? I find it funny that so many people preached how IMC was better than FSB for so long, and now that Intel is getting it, some people actually have the nerve to say "well, it won't help that much".

Then maybe you should actually read the whole post instead of just the parts you want to notice...
1. Having an IMC means you don't need a large cache to keep memory latency low.
For instance, in the article IntelUser linked, the memory latency was fairly close...but the C2D required twice the L2 cache to get it there. Remember that the L2 cache takes up almost half the chip (or more in some cases), so it's very expensive to use.
2. HT and CSI are both awesome for large numbers of chips/cores but have very little effect on single socket (with the possible exception of cHT, if and only if AMD is able to utilize it for GPUs). Many of the Intel fanboys have been shouting this at me for months, and I haven't disagreed...with the exception of the cHT interconnects that is.
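
To make point 1 concrete, here is a minimal sketch of the textbook average-memory-access-time estimate (hit time plus miss rate times miss penalty). Every number in it is a made-up placeholder chosen for illustration, not a measurement of any Core 2 or Athlon 64 part, and the function name `amat_ns` is hypothetical as well.

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus the expected cost of misses."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_ns + miss_rate * miss_penalty_ns


# Hypothetical design A: big cache (fewer misses), off-die controller (slow misses).
big_cache_fsb = amat_ns(hit_ns=5.0, miss_rate=0.02, miss_penalty_ns=80.0)

# Hypothetical design B: smaller cache (more misses), on-die controller (faster misses).
small_cache_imc = amat_ns(hit_ns=5.0, miss_rate=0.03, miss_penalty_ns=55.0)

print(f"big cache + FSB  : {big_cache_fsb:.2f} ns")
print(f"small cache + IMC: {small_cache_imc:.2f} ns")
```

With these placeholder inputs the two designs land within a fraction of a nanosecond of each other, which is the shape of the trade-off being argued: a cheaper miss can stand in for some cache, and vice versa. Real miss rates depend heavily on the workload, so this says nothing about which design wins in practice.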
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Having an IMC means you don't need a large cache to keep memory latency low.

the big cache is still highly desirable. the designers may opt for a smaller cache because of cost, or yield, or some other reasons, but saying the cache is not needed for performance is crap (i.e. marketing lies).
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: dmens
Having an IMC means you don't need a large cache to keep memory latency low.

the big cache is still highly desirable. the designers may opt for a smaller cache because of cost, or yield, or some other reasons, but saying the cache is not needed for performance is crap (i.e. marketing lies).

I agree that it's still desirable...but considering the amount of real estate it takes up, no other benefit is nearly as critical. I guess I probably should have bolded as above...
While Intel does have an advantage because of their cache density, it's still a LOT of space on the die...

Edit: The only situation I can think of to actually test how much cache means to performance beyond memory latency is to compare the X2s like the 4600 vs the 4800, or the 4400 vs the 4200. They all have the same IMC, and each pair runs at the same clockspeed...the only difference is that the 4400 and 4800 each have twice the cache of the other 2.
Sharky Extreme's review compares them, and the cache only seems to make a difference of 1-3% in most cases.

If we look at the C2D (where the cache affects memory latency so much), the percentages are much greater. AT's review shows improvement by as much as 10% for doubling the cache with the average being 3.5%...
 

piasabird

Lifer
Feb 6, 2002
17,168
60
91
Check out the info on this 45nm process.

http://www.tomshardware.com/20...air_cooling/index.html

Makes for an interesting read, considering AMD has a big research facility in Germany. Tom's Hardware is based in Germany, I do believe. I think there has been some push at Intel to move every processor line possible to the 45nm process. It may have more overclocking potential.

It also has many new instructions, called SSE4 (47 of them in Penryn's SSE4.1), to speed up video processing.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: Viditor
I agree that it's still desirable...but considering the amount of real estate it takes up, no other benefit is nearly as critical. I guess I probably should have bolded as above...
While Intel does have an advantage because of their cache density, it's still a LOT of space on the die...

Edit: The only situation I can think of to actually test how much cache means to performance beyond memory latency is to compare the X2s like the 4600 vs the 4800, or the 4400 vs the 4200. They all have the same IMC, and each pair runs at the same clockspeed...the only difference is that the 4400 and 4800 each have twice the cache of the other 2.
Sharky Extreme's review compares them, and the cache only seems to make a difference of 1-3% in most cases.

If we look at the C2D (where the cache affects memory latency so much), the percentages are much greater. AT's review shows improvement by as much as 10% for doubling the cache with the average being 3.5%...

if you do the math and match up the workloads, you'd see the % difference isn't all that different on similar jobs

i.e.
wme, 8.4% for c2d, 11% for x2
hl2, 1.9% for c2d, 3.1% for x2
cinebench, negligible for both

some are outliers, for example divx. but that's because the c2d is hungrier, so to speak. also, the sharkyextreme review is pretty crappy from a cache comparison pov, the workloads are not suitable.

can the imc reduce cache roi? probably in a couple workloads. but it is far from being true as a general statement. it is a minor factor.
 

tno

Senior member
Mar 17, 2007
815
0
76
re: cache

I feel like a good analogy for cache is actually compound interest. If your bank says they'll give you 5% of your savings annually, that's awesome, right? Now, let's say that your bank says it'll give you 4.95% instead, but 12 times a year it'll calculate how much interest you've earned so far (in one month, assuming that 4.95% annual rate) and go ahead and put that in your account. It's half of a tenth of a percentage point less a year, but throughout the year they'll add more money to your account and then count that amount the next month, so little by little you save more and more. And when you stretch that out across a long period of time, that difference becomes huge. I think the same goes for cache: the more you have, the more cycles your computer will save from having to wait for RAM, and we might not notice it in day-to-day tasks, but our computer notices.

The question is, if the on-chip memory controller really speeds up RAM access, will all this extra cache be a little pointless?
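
The interest analogy can be checked with a quick sketch. The 5% and 4.95% rates come from the post; the starting deposit, the time horizons, and the helper name `balance_after` are assumptions picked purely for illustration.

```python
def balance_after(principal: float, annual_rate: float,
                  periods_per_year: int, years: int) -> float:
    """Balance when interest is credited `periods_per_year` times per year."""
    per_period = annual_rate / periods_per_year
    return principal * (1.0 + per_period) ** (periods_per_year * years)


principal = 1000.0  # hypothetical starting deposit

for years in (1, 10, 30):
    yearly = balance_after(principal, 0.05, 1, years)      # 5% credited once a year
    monthly = balance_after(principal, 0.0495, 12, years)  # 4.95% credited monthly
    print(f"{years:>2} yr: 5% yearly = {yearly:9.2f}, 4.95% monthly = {monthly:9.2f}")
```

The monthly account comes out ahead at every horizon because 4.95% compounded monthly works out to an effective rate of a little over 5% a year. The per-year gap is small, and it widens as the horizon gets longer.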
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
As to bandwidth, I certainly agree that bandwidth will be greatly increased...but for desktop parts, so what? With only a single socket and unless Intel has a super secret project to produce Torrenza-like interconnects, I don't see how any bandwidth increase will speed up desktop parts.

You are still talking about I/O bandwidth, like connections to hard drives and USB. An IMC will increase memory bandwidth. http://www.anandtech.com/cpuch...howdoc.aspx?i=2991&p=3

P35 with its enhanced memory controller brought 6.5% with DDR2-800 and 1066MHz FSB, and 18% with the same memory and 1333MHz FSB. The actual bandwidth increase from the FSB increase alone is less than 12%. A 25% increase in theoretical bandwidth brought only a 12% increase in real-world bandwidth. AMD, in the link I posted, had a 43% advantage in memory bandwidth. That's like going from 1066MHz FSB to 1900MHz. Plus the 10 or 20% lower latency (you don't really think the IMC on Nehalem will have higher latency than Penryn, do you?) will bring 5-10% performance alone, at least. Latency is the reason that DDR3 isn't faster than DDR2 (and, previously, why DDR2 and DDR weren't faster than their predecessors).

You are saying that a 40% increase in BW and 20% lower latency will bring zero percent performance improvement. I'd say 5-10%. That's assuming Nehalem's IMC isn't any better than the Athlon X2's IMC. That isn't likely either; Intel has always made good memory controllers.
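
The FSB-equivalence arithmetic above can be reproduced with a short sketch. It assumes, purely as a simplification, that real-world bandwidth gains stay a fixed fraction of theoretical gains (the 12% measured for the 1066MHz to 1333MHz step); nothing in the linked reviews states that the relationship is linear.

```python
BASE_FSB_MHZ = 1066.0
FASTER_FSB_MHZ = 1333.0
MEASURED_GAIN = 0.12  # real-world gain quoted above for 1066 -> 1333 (an upper bound)
TARGET_GAIN = 0.43    # AMD's quoted memory bandwidth advantage

theoretical_gain = FASTER_FSB_MHZ / BASE_FSB_MHZ - 1.0  # about 0.25
efficiency = MEASURED_GAIN / theoretical_gain           # real gain per unit of theoretical gain

# Theoretical increase needed to see TARGET_GAIN in practice, under the linear assumption.
needed_theoretical = TARGET_GAIN / efficiency
equivalent_fsb_mhz = BASE_FSB_MHZ * (1.0 + needed_theoretical)

print(f"theoretical gain 1066->1333: {theoretical_gain:.1%}")
print(f"real/theoretical efficiency: {efficiency:.2f}")
print(f"equivalent FSB: {equivalent_fsb_mhz:.0f} MHz")
```

Under that assumption the equivalent FSB comes out at about 2000MHz, the same ballpark as the 1900MHz figure in the post. A different scaling model would move the number, so treat it as a sanity check rather than a prediction.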

But in any event, the point is moot. Keeping in mind that we are discussing desktop only here, my understanding is that these parts won't even have the IMC for the first iteration of Nehalems...

Which doesn't really matter because Extreme versions are supposed to have one. And remember how Extreme features have migrated to mainstream pretty fast. Quad core was "Extreme"; now we have $300 parts. IMC will migrate to mainstream parts pretty quickly. One day we are in awe of the Extreme version, and we see the same features in mainstream versions 2-3 months later. And besides, we are talking about the parts with IMC.

Since we are talking in competitive terms, it also depends on how AMD does. If AMD is still flopping as much as they are now by Nehalem's timeframe, we won't see 4GHz+ Nehalems and we won't see IMCs in mainstream versions.

Actually, it isn't only AMD flopping. Intel is also doing better. Most of Intel's products were always delayed and defeatured. In contrast, Merom-based cores came almost an entire quarter early. If Intel had skipped Netburst and gone down the Pentium M path, AMD would probably have been doing as badly back in 2004 as they are now.