
Thoughts on POWER4?

Sunner

Elite Member
I posted a thread similar to this one a while back, but the POWER4 was still rather new at the time, so I'm hoping for better luck now that it's been out for a while.

I'd like to know what people who work with chip design, CPU design in particular, but general design as well, think of it.

You people out there must have an opinion (I hope) about it, so let's hear it, on a purely technical basis, no company politics and such involved.

What's good? Bad? Dumb? Brilliant?

[edit] Oh, and the reason I'm asking is that I'm rather fascinated by this CPU. It seems like IBM really tried to do something extraordinary here, and I just wanna hear some professionals' opinions on whether they succeeded. [/edit]
 
Well, I'll give this one a bump.

Come on, one of you peeps out there (Sohcan, pm, etc.) gotta have an opinion?
I complain all the time about the way other server admins set up stuff, I'm sure you guys do the same 🙂

Oh, and yes, for the record, there are probably lots of people complaining about the way I set up stuff too. 😀
 
Late reply here. 🙂

Power4 is certainly very impressive from a chip-level multiprocessing, bandwidth, and cache size standpoint, but frankly I was a little disappointed in its single-processor performance and clock rate, especially considering the competition. The 1.3GHz Power4 set records for SPECint2K and SPECfp2K, but this will likely be surpassed by the 1.2GHz McKinley/Itanium (in SPECfp) and the 1.2GHz Alpha EV7 (in both SPECint and SPECfp).

What I found perplexing was its (IMHO) low clock rate. The Power4 has a 14-stage pipeline, which is extremely long for a RISC CPU; those normally have pipelines 4 to 7 stages long. It's fabbed on an advanced .18u process w/copper interconnects and SOI. Compare that to McKinley at 1.2GHz, which has an 8-stage pipeline and is fabbed on a bulk .18u aluminum process w/o SOI. McKinley probably gets a ~10% boost in clock rate as an in-order processor, yet at the same time it has to deal with slower GPR access (due to its huge 128-entry register files) and x86 hardware compatibility. Normally McKinley's wider issue (6-way fetch/issue/retire) would hurt its clock rate slightly w/respect to high-end RISC designs, but the Power4 is a very wide design as well: 8-way fetch, 5-way retire, up to 200 instructions in-flight.

Also consider the EV7. The 1.2GHz EV7 will very likely outperform the 1.3GHz Power4 in SPEC2K, which is impressive since it still basically uses the same five-year-old EV6 core (with on-die L2 cache, glueless SMP, and integrated RDRAM controllers). It will be fabbed on the same IBM process as Power4, but without SOI. While it's not as wide as the Power4 (4 fetch, 6 issue, 11 retire, 80 instructions in-flight), it still achieves a very impressive clock rate compared to Power4 for a wide RISC design with half the pipeline length (7 stages).
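
For anyone who wants to put rough numbers on the clock-rate argument above, here's a back-of-the-envelope sketch. It only uses the clock speeds and stage counts quoted in this post; the function names are made up for illustration, and "time through the pipe" (stages times cycle time) is a crude proxy that ignores latch overhead and the fact that the three pipelines do different amounts of work.

```python
# Clock (GHz) and pipeline depth as quoted in the post above.
CHIPS = {
    "Power4": (1.3, 14),
    "McKinley": (1.2, 8),
    "EV7": (1.2, 7),
}


def cycle_time_ps(clock_ghz):
    """Length of one clock cycle in picoseconds."""
    return 1000.0 / clock_ghz


def pipeline_traversal_ns(clock_ghz, stages):
    """Time for one instruction to pass through every stage, in nanoseconds.

    Crude proxy only: assumes every stage takes exactly one cycle.
    """
    return stages * cycle_time_ps(clock_ghz) / 1000.0


if __name__ == "__main__":
    for name, (ghz, stages) in CHIPS.items():
        print(
            f"{name:9s} {cycle_time_ps(ghz):4.0f} ps/cycle, "
            f"{pipeline_traversal_ns(ghz, stages):5.2f} ns through {stages} stages"
        )
```

If that proxy is anywhere near fair, the Power4 spends roughly twice as long getting through its pipe as the EV7 while having nearly the same cycle time, which is the puzzle: twice the stages would normally be expected to buy a noticeably shorter cycle.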
 
/me watches as thread FLIES over my head......

like this 🙄



hehe....


Good post Sohcan.

zs
 
But how is real-world performance? I would think it would truly shine in multiprocessor setups. Just think of the kind of DB server you could have with this thing.
 
Sohcan, how do you think it will scale with MP configs?

I'm thinking McKinley will have some problems scaling in large SMP configs (8+ CPUs) since it, from what I've heard, uses a shared memory bus, while for example the UltraSPARC III increases its memory bandwidth as CPUs are added.
Ace's posted a benchmark (some science application, I believe) a while back where a SunFire 15K achieved 80+ percent efficiency in a 72-CPU configuration, which is IMO very impressive compared to a 32-way Unisys box based on P3 Xeons, which achieved only the low 20s with 32 CPUs.
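
To make those efficiency figures a bit more concrete, here's a minimal sketch that turns them into an equivalent number of "perfectly used" CPUs. It assumes efficiency means speedup divided by CPU count, and it reads "low 20s" as roughly 22 percent; both are my assumptions, not something stated in the benchmark, and the function name is just for illustration.

```python
def effective_speedup(cpus, efficiency):
    """Speedup over one CPU, given parallel efficiency = speedup / cpus."""
    return cpus * efficiency


if __name__ == "__main__":
    # 80% at 72 CPUs is the figure quoted above (a lower bound, "80+").
    sunfire = effective_speedup(72, 0.80)
    # 0.22 is an assumed stand-in for "low 20s" at 32 CPUs.
    unisys = effective_speedup(32, 0.22)
    print(f"SunFire 15K, 72 CPUs: ~{sunfire:.1f}x a single CPU")
    print(f"Unisys box, 32 CPUs:  ~{unisys:.1f}x a single CPU")
```

Under those assumptions the 72-way box does the work of about 58 ideal CPUs, while the 32-way box does the work of about 7.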

I also seem to remember IBM stating that they expected a 32-way Regatta with 1.1 GHz CPUs to achieve ~550K TPC-C, which would certainly be very impressive (though the actual usefulness of TPC-C can definitely be questioned).

Thanks for the reply by the way, I had pretty much given up hope on hearing any opinions about POWER4 🙂
 
Actually large is 30+ CPUs, and monster starts at about 128+ CPUs. 8+ CPUs is still considered a baby SMP... I know people in the field who refer to 2-way SMP computers as time-sharing machines, because as far as they're concerned that's all they are.




 
I just found out something that I hadn't realized before: while the Power4 runs at 1.3GHz when the second on-die core is disabled, it runs at 1.1GHz when CMP is enabled. Also, I believe the official Power4 SPEC benchmarks were run with 7 of the cores on the 8-CPU MCM disabled (since SPEC is a single-threaded benchmark), which perhaps "unfairly" gives the single Power4 core complete access to all 128MB of L3 cache. For these reasons, the Power4 will likely be in third place in single-processor SPEC performance behind McKinley and the EV7. Note that single-processor performance isn't necessarily the most important factor for MPUs of this class (just look at Sun 😉). System bandwidth, MP scalability, and software & support are perhaps as important or more so (though single-processor performance is still interesting from an academic standpoint 🙂).

Also, the fact that the Power4 normally runs at "only" 1.1GHz lends more credence to my post above. Evidently the on-chip MP network fabric has an effect on clock rate, but it still seems that the Power4 should scale higher considering its process technology and pipeline length. Perhaps IBM used less full-custom design than the EV7 and McKinley?



<< I'm thinking McKinley will have some problems scaling in large SMP configs (8+ CPUs) since it, from what I've heard, uses a shared memory bus >>

IIRC McKinley uses a bus similar to the P4 ("quad-pumped"), but with a 128-bit width, yielding a peak 6.4 GB/sec. If it is indeed a shared bus, this does introduce a problem with MP scalability, though I suspect that McKinley/IA64 has yet to be targeted at 8+ CPU configurations. This is only the second IA64 microarchitecture, so there's still time to improve the bus architecture.
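
As a sanity check on that 6.4 GB/sec figure, here's a small sketch of the arithmetic: peak bandwidth is just bus width in bytes times transfers per second. The 400 million transfers/sec rate is my assumption, borrowed from the P4's 100MHz quad-pumped bus; the function name is only for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_sec):
    """Theoretical peak bus bandwidth in GB/sec (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


if __name__ == "__main__":
    # Assumed transfer rate: 400 MT/s, as on the P4's quad-pumped 100MHz bus.
    rate = 400e6
    print(f"64-bit bus (P4-style):   {peak_bandwidth_gb_s(64, rate):.1f} GB/sec")
    print(f"128-bit bus (McKinley?): {peak_bandwidth_gb_s(128, rate):.1f} GB/sec")
```

Keep in mind this is a peak number, and on a shared bus every CPU added splits it further, which is where the MP scalability worry comes from.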

Here's a good "grade-report" (at the bottom) of the remaining enterprise-class platforms (though it's a bit out-of-date, and uses Merced to judge IA64).
 
Locutus4657, I was talking about "general" servers, that is, non-specialized ones, like the SunFire series or the p690.

But I guess it's all about opinions, and everyone has one. To me, once the boxes reach 8 CPUs, it's a large box; after that it's various degrees of "larger". 😉
 