Penryn detailed benchmark vs Core2Duo


IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Actually, the switch from Dothan to Yonah to Conroe brought smaller improvements than the move from K8 to Barcelona.

Actually, counting only core enhancements, I'd say the expected performance improvement from Yonah to Conroe is at least equal to the one from K8 to Barcelona. You can say Barcelona's faster HTT bus and better multi-CPU features shine in comparison, but those aren't core enhancements and their performance improvements won't be seen on the desktop side, while Conroe's enhancements all turned out to be useful for desktop performance.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Wow, the Source engine likes something about Penryn, that's for sure. After Episode 2 and the multi-threaded enhancements, the Source engine will really shine on Yorkfield.

I've read that the Half-Life 2 engine uses a decent number of shuffle instructions in its code. Considering how poor Conroe's current performance in shuffle instructions is, and how much the Super Shuffle Engine will improve it, it's not surprising that Half-Life 2 gains the most.
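For anyone unsure what a shuffle instruction actually does: it rearranges whole data lanes inside a register, it does not scramble individual bits. Here is a rough Python sketch of what SSE2 PSHUFD does to four 32-bit lanes. The function name and the list representation are only for illustration; the real instruction works on an xmm register.

```python
def pshufd(src, imm8):
    """Emulate SSE2 PSHUFD on four 32-bit lanes (lane 0 first)."""
    if len(src) != 4:
        raise ValueError("expected four 32-bit lanes")
    # Each 2-bit field of imm8 picks the source lane for one destination lane.
    return [src[(imm8 >> (2 * i)) & 0b11] for i in range(4)]


lanes = [10, 20, 30, 40]
print(pshufd(lanes, 0b00011011))  # reverses the lanes
print(pshufd(lanes, 0b00000000))  # broadcasts lane 0 to every lane
```

Vector math in game code tends to be full of this kind of lane reordering, which would fit the Half-Life 2 result.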

The latency of exactly the operations Intel is targeting with the Super Shuffle Engine is really poor on Conroe. It's worse than Yonah, and Athlon 64-based cores are almost twice as fast. Granted, Barcelona will get enhancements similar to the Super Shuffle Engine, but Penryn's shuffle engine will improve on Conroe more than Barcelona's will improve on the X2.

Conroe:

I331 SSE2 :pSHUFD xmm, xmm L: 1.50ns= 4.0c TP: 0.37ns= 1.00c

I332 SSE2 :pUNPCKHDQ xmm, xmm L: 1.50ns= 4.0c TP: 0.75ns= 2.00c

I333 SSE2 :pACKSSDW xmm, xmm L: 1.50ns= 4.0c TP: 0.75ns= 2.00c

You can see that the latency is 4 cycles. Yonah is around 3, and Athlon 64 cores have weird numbers, but lower than both Yonah and Conroe, at 2.25-2.75 cycles. Both Barcelona and Penryn will lower that to a single cycle.
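The ns and cycle figures in the Conroe lines are tied together by the core clock. A minimal sketch of the conversion, assuming the clock can be inferred from "1.50ns = 4.0c" (about 2.67 GHz); the post does not state the actual clock, and the names below are made up for illustration.

```python
# Latency (L) and throughput (TP) in ns, copied from the Conroe lines above.
CONROE_NS = {
    "pSHUFD": (1.50, 0.37),
    "pUNPCKHDQ": (1.50, 0.75),
    "pACKSSDW": (1.50, 0.75),
}

# Assumption: clock inferred from "1.50ns = 4.0c"; the post does not state it.
CLOCK_GHZ = 4.0 / 1.50  # about 2.67 GHz


def ns_to_cycles(ns, clock_ghz=CLOCK_GHZ):
    """Convert a time in nanoseconds to core clock cycles."""
    return ns * clock_ghz


for name, (lat_ns, tp_ns) in CONROE_NS.items():
    print(f"{name:10s} L: {ns_to_cycles(lat_ns):.1f}c  TP: {ns_to_cycles(tp_ns):.2f}c")
```

The 0.37ns throughput figure is rounded in the listing, so it comes out just under one cycle here.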


 

VirtualLarry

No Lifer
Aug 25, 2001
56,417
10,092
126
The part about the shuffle instructions: won't that have an impact on crypto code? Doesn't crypto do a lot of shuffling of bits, or am I misunderstanding what is being shuffled in each case?
Just wondering which CPU would be fastest for, say, SSL processing on servers.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: zephyrprime
Originally posted by: Nemesis 1
Originally posted by: zephyrprime
Originally posted by: Nemesis 1
Game developers will be among the first to target SSE4. It's really getting exciting in the gaming world. Can't wait to see how Intel's Larrabee project turns out. First generation will probably be a little iffy, but after that, look out NV/Dammit.
Looking at the SSE4 instructions, they don't really seem to be very useful for gaming, so I don't think SSE4 will have a big effect outside of codecs and select other applications.

refer to http://en.wikipedia.org/wiki/SSE4


Here, refer to this, as you don't seem to understand your own link.

http://www.dailytech.com/article.aspx?newsid=8313

Here's a white paper. Intel lists gaming among other benefits. Read the SSE4 section. If you want more info, just ask.

http://www.intel.com/technolog...m-core2_whitepaper.pdf
Having programmed programs that use SSE before, I stand by my comments.


Tell me, then, how many SSE4 programs have you coded? There seem to be quite a few vector instructions in the new SSE4 instruction set. Why would Intel say that games would be enhanced if they aren't? We won't have to wait long to find out. DivX benefited greatly from SSE4. I believe games will see a large improvement, but I am not going to argue the point. Intel has already released the compiler for programmers, so the wait shouldn't be long. Witness DivX.

 

Makaveli

Diamond Member
Feb 8, 2002
4,733
1,072
136
Originally posted by: GFORCE100
So it seems Penryn will be 8-10% faster on average in most apps, to put a conservative number on it. Games seem to love the extra L2 cache.

DirectX will be SSE4 optimized before long, so it does very much seem AMD's Agena will be facing stiff competition without SSE4.

I can't see AMD's 512K L2 cache and 2MB L3 cache approach doing too well in comparison. Its L2 is too small for today's clock speeds and data sets. An Agena X4 will only have 4MB of cache vs 12MB in a Penryn quad core; no matter how efficient your cache is, you can't make up for a lack of 8MB.

It seems the ball is very firmly in Intel's hands.

On average, a 10% speed bump when Conroe is already 20% ahead of AMD's X2s, giving 30%, is nothing to sniff at. Agena must be 30% faster per clock to be 1:1, let alone faster. It's not easy to design a chip that's 40% faster and cost-effective to make unless your previous design was lacking in various ways, like Netburst was.

I can imagine Intel will be in most people's Xmas lists soon.
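
A quick check on the arithmetic in the quote: relative gains compound rather than add, so a 10% bump on top of a 20% lead works out to about 32%, not 30%. A minimal sketch, taking both percentages from the quote at face value; the function name is only for illustration.

```python
def combined_lead(*gains):
    """Compound a series of fractional gains, e.g. 0.20 then 0.10."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


lead = combined_lead(0.20, 0.10)
print(f"{lead:.0%}")  # prints 32%
```

So on these figures Agena would need to be roughly 32% faster per clock than the X2 just to draw level.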


I thought everyone knew by now that with AMD's IMC they don't need larger caches as much as Intel does.

The difference in speed between a 512K-cache A64 and a 1MB one is small, a lot smaller than the difference from 2MB to 4MB on Core 2, or on those space-heater P4s.
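
One way to picture the IMC argument is the textbook average memory access time formula: hit time plus miss rate times miss penalty. If the integrated memory controller lowers the miss penalty, the same drop in miss rate from a bigger cache buys less. A rough sketch follows; every number in it is hypothetical and picked only for illustration, not measured on any of these chips.

```python
def amat_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time plus the expected miss cost."""
    return hit_ns + miss_rate * miss_penalty_ns


def gain_from_bigger_cache(hit_ns, miss_small, miss_big, miss_penalty_ns):
    """Nanoseconds saved per access when a bigger cache lowers the miss rate."""
    return (amat_ns(hit_ns, miss_small, miss_penalty_ns)
            - amat_ns(hit_ns, miss_big, miss_penalty_ns))


# Hypothetical numbers, for illustration only (not measurements).
HIT_NS = 5.0
MISS_SMALL, MISS_BIG = 0.04, 0.03  # miss rate before/after doubling the cache
IMC_PENALTY_NS = 50.0              # assumed memory latency with an on-die controller
FSB_PENALTY_NS = 80.0              # assumed memory latency through a northbridge

print(gain_from_bigger_cache(HIT_NS, MISS_SMALL, MISS_BIG, IMC_PENALTY_NS))
print(gain_from_bigger_cache(HIT_NS, MISS_SMALL, MISS_BIG, FSB_PENALTY_NS))
```

With these made-up inputs the bigger cache saves more time per access on the higher-latency path, which is the shape of the claim; the real size of the effect depends on the workload.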
 

zsdersw

Lifer
Oct 29, 2003
10,560
2
0
Originally posted by: Makaveli
I thought everyone knew by now that with AMD's IMC they don't need larger caches as much as Intel does.

Intel doesn't really "need" larger caches with Core 2 either, in the sense that Core 2's performance doesn't hinge upon how much cache it has. Core 2 only shows significant benefits from larger cache when the application can utilize it.

The difference in speed between a 512K-cache A64 and a 1MB one is small, a lot smaller than the difference from 2MB to 4MB on Core 2,

"A lot smaller"? I don't think so.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I thought everyone knew by now that with AMD's IMC they don't need larger caches as much as Intel does.

The difference in speed between a 512K-cache A64 and a 1MB one is small, a lot smaller than the difference from 2MB to 4MB on Core 2, or on those space-heater P4s.

Yeah, true. Cache size matters when looking at servers and multi-processing, which is why Xeons have lots of cache, but it matters much, much less on desktops. The Xeon 7100 gets pretty respectable performance because of its large shared L3 cache. The same goes for things like "native" dual/quad cores and the HyperTransport bus: relevant in servers, irrelevant on desktops.

I think the P4s probably like cache the most, especially the Northwood generation (some changes in Prescott made it rely on cache less), but it could have been a peculiarity of the architecture that allowed it to scale better. That "peculiarity" is what killed the Northwood Celerons.

"A lot smaller"? I don't think so.

Semprons sure performed much closer to the Athlon 64s with double the cache than the Core 2 Duo variants with half the cache do to their elder brothers. I'd say half the gain is a significant difference, but not a lot smaller.

http://www.anandtech.com/showdoc.aspx?i=2139&p=6

It made me wonder why they bothered to introduce two models when most gains are less than 2%. There isn't even a single benchmark that goes over 5%.

Compared to Core 2 Duo: http://www.anandtech.com/cpuch...howdoc.aspx?i=2795&p=4

Which gains at least 2x as much, even though 2MB to 4MB should be gaining a lot less than 256KB to 512KB.
 

coldpower27

Golden Member
Jul 18, 2004
1,677
0
76
Yeah it's interesting how the K8 architecture didn't gain all that much with additional cache.

http://xbitlabs.com/articles/c...ay/sempron-2600_5.html
 

zach0624

Senior member
Jul 13, 2007
535
0
0
I don't know where Makaveli got his numbers, but what I think he means is that the K8 experiences more severe (I think this is the right word) diminishing returns from extra cache. Please correct me if I am wrong, because I am a noob in this subject.
 

Makaveli

Diamond Member
Feb 8, 2002
4,733
1,072
136
Yes, that is what I was referring to: diminishing returns. I just didn't word it properly.

And even with that benchmark coldpower27 just posted from Xbit, which I've seen before:

The difference from 128K to 1MB might be big, but from 512K to 1MB I still consider it a small gain.

Obviously some things benefit a lot from more cache, like 3DMark2001 and Quake in that link, but the rest gain a lot less and sit a lot closer to the average gain.

And yes, I stand corrected: I should have said "small", not "a lot smaller", so relax, kiddies!
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: zephyrprime
Originally posted by: Nemesis 1
Originally posted by: zephyrprime
Originally posted by: Nemesis 1
Game developers will be among the first to target SSE4. It's really getting exciting in the gaming world. Can't wait to see how Intel's Larrabee project turns out. First generation will probably be a little iffy, but after that, look out NV/Dammit.
Looking at the SSE4 instructions, they don't really seem to be very useful for gaming, so I don't think SSE4 will have a big effect outside of codecs and select other applications.

refer to http://en.wikipedia.org/wiki/SSE4


Here, refer to this, as you don't seem to understand your own link.

http://www.dailytech.com/article.aspx?newsid=8313

Here's a white paper. Intel lists gaming among other benefits. Read the SSE4 section. If you want more info, just ask.

http://www.intel.com/technolog...m-core2_whitepaper.pdf
Having programmed programs that use SSE before, I stand by my comments.


I don't mean to be disrespectful, but I just hate it when people make claims that are just plain false. It leads me to believe they have an agenda. You're a programmer, as stated, so you should be aware of this.
So I made it easy for all to comprehend what SSE4 means to Penryn.

Follow this link.

Click on the blue text that reads "Introduction to 45nm Next generation Intel core 2 processor". Flash Player is required and can be downloaded right from that paragraph. It couldn't be laid out any better than this. As all will see, SSE4 is a very big deal.

http://softwarecommunity.intel.../articles/eng/1193.htm

Hoping to increase our knowledge base.
 

NoobyDoo

Senior member
Nov 13, 2006
463
0
71
There's a Wolfdale "benchmark" here.

Wolfdale vs E6850 at 333 x 7 = 2330MHz (DDR2 833 5-5-5-15) using an ASUS Commando (Intel P965).

Conclusion:
The 45nm Wolfdale may not have been fully exercised because of limited support in the test software. Luckily, the 45nm Wolfdale has 6MB of L2 cache, so it performed wonderfully in Super PI and Cinebench. For games, if you use a GeForce 8800GTX with a 24- or 30-inch display at 1920 x 1200 or 2560 x 1600, the 45nm Wolfdale will not improve performance much, because it is already hard to see a difference between the 65nm Conroe and the 45nm Wolfdale at 1280 x 1200.


 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Nice. Put it on an X38 chipset with low-latency DDR3 @ 1333, run games that use SSE4, and stand back, because the scores will balloon at whatever resolution.