Poulson Intel 50MB cache CPU


Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Since when were there 10-core processors? Obviously this is a niche product, but the price tag is pretty funny: "Recommended Channel Price $4227.00".

A system with 4 of those and 256GB of RAM will run you about $30k. It'd be worth every penny.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Anything highly parallel that scales poorly across systems.

Or if you pay per-server hosting fees instead of per-U or some other pricing model.
 

386DX

Member
Feb 11, 2010
197
0
0
Since when were there 10-core processors? Obviously this is a niche product, but the price tag is pretty funny: "Recommended Channel Price $4227.00". Granted, this chip probably rivals a lot of dual-socket systems.

To the home user, $4K on a CPU may seem like a lot, but in a corporate environment $4K on hardware is dirt cheap. For example, we have engineering workstations with software that costs upwards of $4K a year per machine. And a single SQL Server license can easily run you $80-$200K a year. So even if a CPU costs 3 times as much and only gives you a 20% performance boost, it's easily justified if you can reduce 5 machines down to 4. It's very different in a corporate environment... software and licensing are what cost money, not hardware.
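
A quick back-of-the-envelope sketch of that consolidation math as a small Python snippet. The $100K/yr license figure is just a placeholder picked from the middle of the range above, and a single-socket box is assumed; the point is only that the license term dominates the hardware term.

# Illustrative consolidation math; all figures are placeholders based on the post above.
CHEAP_CPU = 4_000             # the ~$4K CPU
PRICEY_CPU = 3 * CHEAP_CPU    # the CPU that "costs 3 times as much"
LICENSE_PER_SERVER = 100_000  # hypothetical yearly per-server license ($80-$200K range)

def yearly_cost(servers: int, cpu_price: int) -> int:
    """Yearly cost of a single-socket fleet: per-server license plus CPU."""
    return servers * (LICENSE_PER_SERVER + cpu_price)

five_cheap = yearly_cost(5, CHEAP_CPU)    # $520,000
four_pricey = yearly_cost(4, PRICEY_CPU)  # $448,000
print(f"5 servers, cheap CPUs:  ${five_cheap:,}")
print(f"4 servers, pricey CPUs: ${four_pricey:,}")
print(f"yearly difference:      ${five_cheap - four_pricey:,}")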
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
To the home user, $4K on a CPU may seem like a lot, but in a corporate environment $4K on hardware is dirt cheap. For example, we have engineering workstations with software that costs upwards of $4K a year per machine. And a single SQL Server license can easily run you $80-$200K a year. So even if a CPU costs 3 times as much and only gives you a 20% performance boost, it's easily justified if you can reduce 5 machines down to 4. It's very different in a corporate environment... software and licensing are what cost money, not hardware.

Just wanted to say I fully agree but I would add a third cost-adder of comparable impact and that is the human compensation component.

It doesn't take too many IT guys each with a $50k-75k compensation footprint before the corporate IT dept itself represents a sizable cost-adder to the management of those engineering workstations, software maintenance, etc.

I once had a $50k cluster budget for an internal project; $25k off the top went to the IT dept for personnel expenses (project support, computer room maintenance, etc.), leaving me $25k for both the hardware and the software expenses. :(

That's corporate life though; since none of it was actually my money, I really didn't have room to complain :D The customers paid for it all :thumbsup:
 

Smartazz

Diamond Member
Dec 29, 2005
6,128
0
76
To the home user, $4K on a CPU may seem like a lot, but in a corporate environment $4K on hardware is dirt cheap. For example, we have engineering workstations with software that costs upwards of $4K a year per machine. And a single SQL Server license can easily run you $80-$200K a year. So even if a CPU costs 3 times as much and only gives you a 20% performance boost, it's easily justified if you can reduce 5 machines down to 4. It's very different in a corporate environment... software and licensing are what cost money, not hardware.

Oh, I wasn't doubting that for their purposes $4K is a bargain. I'm sure even the power savings on very big server farms add up over time if you can cut down on the number of units.
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
What's amazing about Oracle is that given the choice of throwing their lot in with Itanium or with Sparc...they went Sparc D:

Larry is one smart dude, so what does that say about Itanium :confused:

Given Intel's process tech advantage, I really would have bet on Intel winning that horse race, and yet Itanium has been around for more than a decade now and it is nowhere near dominating the big-iron market.

That said...DEC managed to go bankrupt in the midst of having the industry's crown-jewel of microarchitectures, and Cray nearly did the same. So there's something to be said about poor business decisions I suppose.

Most of the HP dudes speculate this is coming straight from Mark Hurd himself. He is bitter about his termination from HP (pretty much the only provider of Itanium servers), and Oracle figures they can get HP customers to move over to Oracle's SPARC platform by discontinuing support.

That said, it is somewhat disappointing that Itanium never got its fair shake. It really is a pretty good architecture.
 

jones377

Senior member
May 2, 2004
462
64
91
Poulson is the first really major core architecture update in the Itanium line since 2002 or thereabouts, but it could also potentially be the last (apart from future process shrinks). This is because Oracle kind of pulled the rug out from under Itanium a while back when they stopped supporting it with their software, even though their own SPARC line of processors (acquired when they bought up Sun) has had lower sales than Itanium for some years now.

Itanium has always had lots of cache, especially compared with contemporary x86 processors (even the Xeon ones), but Poulson also has a pretty interesting and ambitious micro-architecture.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,677
2,560
136
That said, it is somewhat disappointing that Itanium never got its fair shake. It really is a pretty good architecture.

Are you serious? Itanium is one of the worst mass-produced CPU architectures ever. It's an arch where a single indirect memory access (one that depends on a calculation, and whose result the rest of the program depends on) takes a minimum of 16 bytes of code. And if you want *anything* more complex than simple register indirect, say hello to 32 bytes. Itanium might have been competitive for HPC before the GPGPU onslaught, but for common branchy business code full of memory ops, Itanium is, and always was, a very bad joke.

I have always found it funny how people say that putting the scheduling in software saves hardware and allows better performance from the same silicon, conveniently forgetting that the OoOE logic in sane processors lets them run on 4-cycle L1 caches without much of a problem, and that the huge, fast caches Itanium needs to be even remotely competitive with x86 take about a thousand times more transistors than the meager amount taken by OoOE.
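
For a sense of scale, here is the code-footprint arithmetic implied by that complaint, as a small Python sketch. The per-load byte count and the one-useful-op-per-bundle worst case are assumptions taken from the post, not measurements.

# Rough code-density arithmetic; assumptions are mine, not measured.
DEPENDENT_LOADS = 1_000_000    # a long chain of serially dependent loads
X86_BYTES_PER_LOAD = 4         # a 64-bit "mov reg, [reg+disp]" is roughly 3-4 bytes
IA64_BYTES_PER_BUNDLE = 16     # a bundle is always 128 bits
USEFUL_OPS_PER_BUNDLE = 1      # the worst case described above: one op, two nops

x86_footprint = DEPENDENT_LOADS * X86_BYTES_PER_LOAD
ia64_footprint = (DEPENDENT_LOADS // USEFUL_OPS_PER_BUNDLE) * IA64_BYTES_PER_BUNDLE

print(f"x86-64 code footprint: {x86_footprint / 2**20:.1f} MiB")
print(f"IA-64 code footprint:  {ia64_footprint / 2**20:.1f} MiB")
print(f"blow-up factor:        {ia64_footprint / x86_footprint:.1f}x")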
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
833
136
Are you serious? Itanium is one of the worst mass-produced CPU architectures ever. It's an arch where a single indirect memory access (one that depends on a calculation, and whose result the rest of the program depends on) takes a minimum of 16 bytes of code. And if you want *anything* more complex than simple register indirect, say hello to 32 bytes. Itanium might have been competitive for HPC before the GPGPU onslaught, but for common branchy business code full of memory ops, Itanium is, and always was, a very bad joke.

I have always found it funny how people say that putting the scheduling in software saves hardware and allows better performance from the same silicon, conveniently forgetting that the OoOE logic in sane processors lets them run on 4-cycle L1 caches without much of a problem, and that the huge, fast caches Itanium needs to be even remotely competitive with x86 take about a thousand times more transistors than the meager amount taken by OoOE.
For the same process node, Itanium has always had better SPEC numbers than x86.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
For the same process node, Itanium has always had better SPEC numbers than x86.

And how well does SPEC translate to real-world performance? Itanium supposedly has awesome FP capabilities according to SPEC, and this FP ability was touted from the very beginning. Because of this, I was actually considering getting one (Itanium 2s are all over the place on eBay). However, one of the tests that SPEC uses is POV-Ray. Looking at that program only, Itanium 2 lags behind even NetBurst-based Xeons! And this is using Intel's own icc compilers for the tests!

Here are some more recent benchmarks:
Opteron: 4 chips, 4 cores/chip, 16 threads, tested May 2009
Itanium 2: 4 chips, 4 cores/chip, 32 threads, tested April 2010

Test 453 (453.povray) is the only one I really care about, and Itanium gets smoked by the Opteron.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Are you serious? Itanium is one of the worst mass-produced CPU architectures ever. It's an arch where a single indirect memory access (one that depends on a calculation, and whose result the rest of the program depends on) takes a minimum of 16 bytes of code. And if you want *anything* more complex than simple register indirect, say hello to 32 bytes. Itanium might have been competitive for HPC before the GPGPU onslaught, but for common branchy business code full of memory ops, Itanium is, and always was, a very bad joke.

Could you write a quick example of what you are talking about? An IPF logic designer and I were trying to figure your comment out. In IPF, you can do a "ld r1 = [r2]". But if you are talking about something more like
{.mmi
ld r1 = [r2]
;;
mov r3 = r1
nop.i 0x0
} (which is 16 bytes)
then even on x86, it's not one op internally; microcode expands it out. You can code it up in one small line, but when it gets cached and executed, it will be larger. In my mind, the differences are not as big as you seem to be saying.
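
For anyone who hasn't seen the encoding, here is a minimal Python sketch of why that block is exactly 16 bytes, assuming the usual IA-64 layout of a 5-bit template (which names the slot types and stops, e.g. .mmi) plus three 41-bit instruction slots. The example bytes are arbitrary; real bundles would come from an actual code dump.

def split_bundle(raw: bytes):
    """Split a 16-byte IA-64 bundle into (template, slot0, slot1, slot2)."""
    assert len(raw) == 16, "a bundle is always exactly 128 bits"
    bits = int.from_bytes(raw, "little")    # bundles are little-endian in memory
    template = bits & 0x1F                  # bits 0..4: template field
    slot0 = (bits >> 5) & ((1 << 41) - 1)   # bits 5..45
    slot1 = (bits >> 46) & ((1 << 41) - 1)  # bits 46..86
    slot2 = (bits >> 87) & ((1 << 41) - 1)  # bits 87..127
    return template, slot0, slot1, slot2

template, s0, s1, s2 = split_bundle(bytes(range(16)))  # arbitrary example bytes
print(f"template=0x{template:02x} slots=({s0:#x}, {s1:#x}, {s2:#x})")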

Patrick Mahoney
Enterprise Processor Division
Intel Corp.
* Not a spokesperson for Intel Corp. *
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,677
2,560
136
Could you write a quick example of what you are talking about? An IPF logic designer and I were trying to figure your comment out. In IPF, you can do a "ld r1 = [r2]". But if you are talking about something more like
{.mmi
ld r1 = [r2]
;;
mov r3 = r1
nop.i 0x0
} (which is 16 bytes)
then even on x86, it's not one op internally; microcode expands it out. You can code it up in one small line, but when it gets cached and executed, it will be larger. In my mind, the differences are not as big as you seem to be saying.

Think of a situation where you are running server-side business logic code, where almost every instruction depends on the one directly preceding it, at least one in 4 is a jump (or conditional execution), and one in 4 is a memory op. And you have 30 megabytes of it (JIT-compiled from Java), where you rarely spend more than a few iterations in any subset. By LOC, that's probably what the majority of all code ever written looks like. And on it, the vast majority of the time an Itanium bundle contains just one op and two nops, the two additional slots only getting any use when you can put conditional paths into them.

I don't care how the CPU expands or executes them internally; I care that one architecture blows up the code size to the point where it is always thrashing all its caches. It would really help if Itanium had a bit in the bundle that just said "screw this, execute the members of this bundle sequentially".
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Think of a situation where you are running server-side business logic code, where almost every instruction depends on the one directly preceding it, at least one in 4 is a jump (or conditional execution), and one in 4 is a memory op. And you have 30 megabytes of it (JIT-compiled from Java), where you rarely spend more than a few iterations in any subset. By LOC, that's probably what the majority of all code ever written looks like. And on it, the vast majority of the time an Itanium bundle contains just one op and two nops, the two additional slots only getting any use when you can put conditional paths into them.

I don't care how the CPU expands or executes them internally; I care that one architecture blows up the code size to the point where it is always thrashing all its caches. It would really help if Itanium had a bit in the bundle that just said "screw this, execute the members of this bundle sequentially".

I've been working on Itanium products for over a decade, and before that I worked on x86. For the last 6 years my role has been in electrical debug - so I look at Itanium CPUs, mostly Poulsons recently, that do not pass electrical robustness checks, figure out why, and debug the issues. I've looked at a lot of IPF assembly code streams - in fact, for years I did this daily - and I can say that I very rarely see code that contains one op and two nops. Most code that I see is fairly tightly condensed... this could be some sort of Darwinian function (if it were coarse code, then it wouldn't have an issue), but I still see a lot of raw IPF code and it generally looks pretty good. If you really disagree, send me a chunk of code... Java works, and I'll compile it. I do this all the time, and then I can email it or post it. I think you'd be surprised at how compact things are, how well predication and branching hints really do work. I'm not saying it's perfect... every once in a while I see hints that are the exact opposite of what they should be, but generally the code looks very clean.

If it weren't so much work to set up a system, I'd almost be willing to boot an OS, dump the cache, and have a Perl script count nops... but that's a huge hassle just to convince a random guy on a forum. The situation you are describing is not what I see happening in real life. IPF is far from perfect, but it's not as bad as you are saying, and in fact it has several advantages over x86... among other things, the FP architecture is so much easier than x86's, there are lots of GP registers, and predication does help with branching.
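
For what it's worth, a rough stand-in for that nop-counting idea could look like the Python (rather than Perl) sketch below. It assumes textual IA-64 disassembly with one slot per line and nops spelled nop.i, nop.m, nop.b, and so on; the input format is a guess, so treat it as a sketch rather than a tool.

import re
import sys

NOP = re.compile(r"\bnop\.[a-z]\b")

def nop_density(lines):
    """Return (nop_slots, total_slots) for disassembly with one slot per line."""
    nops = total = 0
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):  # skip blank lines and labels
            continue
        total += 1
        if NOP.search(line):
            nops += 1
    return nops, total

if __name__ == "__main__":
    nops, total = nop_density(sys.stdin)
    if total:
        print(f"{nops}/{total} slots are nops ({100.0 * nops / total:.1f}%)")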
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
I've been working on Itanium products for over a decade, and before that I worked on x86. For the last 6 years my role has been in electrical debug - so I look at Itanium CPUs, mostly Poulsons recently, that do not pass electrical robustness checks, figure out why, and debug the issues. I've looked at a lot of IPF assembly code streams - in fact, for years I did this daily - and I can say that I very rarely see code that contains one op and two nops. Most code that I see is fairly tightly condensed... this could be some sort of Darwinian function (if it were coarse code, then it wouldn't have an issue), but I still see a lot of raw IPF code and it generally looks pretty good. If you really disagree, send me a chunk of code... Java works, and I'll compile it. I do this all the time, and then I can email it or post it. I think you'd be surprised at how compact things are, how well predication and branching hints really do work. I'm not saying it's perfect... every once in a while I see hints that are the exact opposite of what they should be, but generally the code looks very clean.

If it weren't so much work to set up a system, I'd almost be willing to boot an OS, dump the cache, and have a Perl script count nops... but that's a huge hassle just to convince a random guy on a forum. The situation you are describing is not what I see happening in real life. IPF is far from perfect, but it's not as bad as you are saying, and in fact it has several advantages over x86... among other things, the FP architecture is so much easier than x86's, there are lots of GP registers, and predication does help with branching.

Interesting... This is off topic, but if you fly out to Las Colinas (Irving), TX in the next few months, you should look me up.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
He does not understand the difference between x86 and Itanium. It is best to just nod and smile when he posts.

How are the current Itaniums stacking up against the current x86 offerings anyway, on the few apps that work on both?
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
They can if they are coded in Java. :D

(What kind of despicable idea is writing games in Java anyway...)

There are always open-source games that can be recompiled, such as Quake III and soon Doom III (whenever Carmack decides to open-source the engine, which is supposedly soon). Also console emulators.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
How are the current Itaniums stacking up against the current x86 offerings anyway, on the few apps that work on both?

I can tell you that performance (in absolute terms and in performance/price) is much worse with POV-Ray on Itanium than on x86.
 

lol123

Member
May 18, 2011
162
0
0
There are always open-source games that can be recompiled, such as Quake III and soon Doom III (whenever Carmack decides to open-source the engine, which is supposedly soon). Also console emulators.
But those games require OpenGL-accelerating hardware, which is hardly available for the Itanium platform, right? On second thought, so does Minecraft (although in that case it's through some Java library implementation of OpenGL). I wonder if there's any OpenGL software renderer for Itanium? :D

I can tell you that performance (in absolute terms and in performance/price) is much worse with POV-Ray on Itanium than on x86.
In fairness though, in the SPECfp numbers you linked to there are also several tests where the Itanium system is much faster than the x86 one.
 