****CONFIRMED**** PIV slower in SPEC tests with HT enabled. Apple not lying about that.


Eug

Lifer
Mar 11, 2000
24,142
1,792
126
Originally posted by: gramboh
I don't own a Mac, but my friend has been a devout Ti PowerBook user since he got one last year and replaced his desktop Athlon system with it. He honestly does far, far more tweaking than I do on my XP machine (I spent about 2 hours, including installation and patching, to get my OS the way I wanted it and 100% stable). I think it's because of the large number of Unix programs he uses and his tweaking of the source before compiling. I still think a Unix-based system requires more tweaking than XP, depending on the software you want to use. This doesn't apply to native Apple software, of course.
Yeah, one thing I don't deal with is the underlying Unix, esp. since I'm a complete n00b at Unix. About the only terminal program I ever use is top. I tried installing X11 and GIMP just for the hell of it, but after playing with it I just erased it because Photoshop is much nicer. (I'd rather just buy OS X Office and Photoshop than deal with Unix libraries in the terminal, etc.)
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
Originally posted by: Eug

They DID admit that the Intel compiler would be faster, but went on to say it is because it is heavily optimized for the Xeon, whereas GCC isn't all that greatly optimized for either CPU (albeit probably better for the Xeon).

I would say the opposite (regarding code generation on x86 via GCC). I am quite sure that GCC internally uses a RISC model, which is going to make register allocation and code scheduling more difficult for x86.

On a slightly related note, Apple's claim that SPEC should be measured while keeping the compiler constant is bogus. It violates the spirit of SPEC CPU, which has always been a computationally intensive suite meant to test the microprocessor, the memory hierarchy, and the compiler. This is why all official submissions are always submitted with the best possible compiler and flag combinations.

I wonder how good the CodeWarrior compiler is on the Mac side, by the way. In the benches Apple itself showed, compile times with CodeWarrior simply blew away GCC. I don't know how this translates into performance of the compiled app, however.
I suspect the opposite ;). I've been told (anecdotally, by some Mac users I know) that CodeWarrior produces horrible code.

Sohcan, are you an Intel employee? Just got curious.
Yep, I'm taking a break from grad school to do a stint as a microprocessor engineer. Which reminds me, I need to get back to work. ;)
 

sandorski

No Lifer
Oct 10, 1999
70,784
6,343
126
I didn't read the article, but I thought I'd throw this in: Microsoft issued a comment recently saying that HT will experience performance problems in WinXP with SP1 installed. I'm sure the issue will be resolved, but so far HT seems to be somewhat of a hit-or-miss feature that needs some time to mature.
 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
This is typical Apple marketing hype.

Go to Aceshardware to see the actual SPEC results. Apple used their own compiler for the tests on the P4/Xeon platforms.
 

sharkeeper

Lifer
Jan 13, 2001
10,886
2
0
Did ANYONE notice that the article was written almost ONE YEAR AGO? That's when HT wasn't fully finished; even Anand wasn't excited about it then. Today, however, it's a much different story.

GOOD observation!

HT 1.0 was definitely MUCH worse than HT 2.0. HT 2.0 is on all C1-stepping Xeons, which is what we have.

I would still like the ability to toggle it on and off from the desktop for some of our servers where I don't have direct access to the machine's BIOS. I've tried this with workstations and I'm getting better results with it ON. We're a bunch of multitasking mofos here! :)

-DAK-
 

Rand

Lifer
Oct 11, 1999
11,071
1
81
Am I missing something, or does Dell not disclose the platform tested anywhere in the article or the associated PDF? If anyone knows where I might find that information, please post.
 

EdipisReks

Platinum Member
Sep 30, 2000
2,722
0
0
Originally posted by: dexvx
This is typical Apple marketing hype.

Go to Aceshardware to see the actual SPEC results. Apple used their own compiler for the tests on the P4/Xeon platforms.

Yeah, open-source compilers that aren't developed by Apple are Apple compilers? :rolleyes:
 

dullard

Elite Member
May 21, 2001
26,044
4,690
126
As partially pointed out above, this needs to be read with a few things in mind:
1) This test was done on the original Xeons with HT. HT has since been improved, and is continuing to be improved, on the P4s and the newer Xeons. All the benchmark sites I read originally said HT harmed performance on the original Xeons for most tests, but now those same sites have tests showing HT improves performance on the vast majority of tests.
2) This test was done on Windows 2000 Advanced Server and Windows .NET Enterprise Server. As far as I have read, these are not well written for HT (Windows 2000 Advanced Server came out well before HT). These operating systems, like most older OSes, generally get worse performance with HT enabled.

 

Pariah

Elite Member
Apr 16, 2000
7,357
20
81
It should also be added that some of the tests did have HT enabled, and all of them were single-CPU tests. So if Apple knew that HT hurt performance and tried to present the Intel platform in the best light, as they claim, why did they still enable it on some tests, and only on the single-CPU tests at that?
 

buleyb

Golden Member
Aug 12, 2002
1,301
0
0
Originally posted by: LukFilm
Did ANYONE notice that the article was written almost ONE YEAR AGO? That's when HT wasn't fully finished; even Anand wasn't excited about it then. Today, however, it's a much different story. :rolleyes:

Good eye; that was the first thing I looked for when I saw the graphs. Benchmarks of the P4 3.0 when it came out showed equal or better performance with HT enabled (not sure about SPEC specifically, but most other tests typically fared worse in the earlier HT days).

And personally, I think the compiler fiasco isn't nearly as bad as Apple performing hardware tweaks on the G5 test systems that not only aren't on shipping G5-powered Macs, but also aren't really system-friendly in typical Unix environments...
 

MistaTastyCakes

Golden Member
Oct 11, 2001
1,607
0
0
you are totally missing the point of why Apple makes computers.

To make money off of an uneducated buyer? ;)

:p

...I kid, I kid.

What people fail to see is that, in the big picture, Apples and PCs are almost impossible to fit into a particular niche. Both can be used for diverse apps, both can design, both can code, and both can run intense multimedia applications. A computer's purpose is up to the person using it, not the manufacturer. It's just like the Apple advocate's argument: "My Apple can do what your PC can do" - well, my PC can do what your Apple can do.
 

Eug

Lifer
Mar 11, 2000
24,142
1,792
126
Go to Aceshardware to see the actual SPEC results. Apple used their own compiler for the tests on the P4/Xeon platforms.
Why would Apple be making compilers for Linux on x86? That makes no sense whatsoever. As EdipisReks says, the compiler used for the PC is a very popular one: the GNU C Compiler (GCC) 3.3, on Linux. It's so popular because it's widely available and free, and it apparently produces reasonably good code.

Here's my cross-post (of info taken from a MacNN thread). I hope some of you can understand it, because I don't.

OK, I've scoured the net, trying to get the info needed to compare the optimizations Dell used in their own benchmarks and the ones Apple used in theirs.

Here's what I've discovered:

Compiler options used by Dell (based on their CINT2000 result file at spec.org): -Qipo -QxW -O3 +FDO. Breakdown for each option (as best I understand it, based on that information):

ipo - interprocedural optimization across multiple files and link objects
xW - optimize code to run exclusively on Pentium IV processors (uses SSE2)
O3 - aggressive optimization (prefetching, loop transformation, etc.) (same function as in gcc)
FDO - Feedback Directed Optimization: the code is compiled with profiling instrumentation, run on training data, and then recompiled using the collected profile to guide further optimization.

I think Apple's compile-time options are a little more widely known and understood than these, but just to be thorough (info from the VeriTest report):

Dell Precision 650 flags: -O3 -march=pentium4 -mfpmath=sse
Dell Dimension 8300 flags: -O3 -march=pentium4 -mfpmath=sse
Apple G5 flags: -fast -lstmalloc

The flags used for the Dell systems are somewhat analogous to the ICC ones above (SSE2 enabled, P4-specific optimizations, etc.). As for the Apple flags:

fast - enables G5 specific optimizations (implies -O3)
lstmalloc - links to faster malloc libraries

I'm not sure why a faster malloc library was used on the G5; it may have something to do with a difference between OS X and x86 Linux in memory management and/or its efficiency. The most important thing, I believe, is that the library is single-threaded, which may increase the benchmark speed.

In the end I'd guess that the various optimizations evened out across the two platforms (PPC and IA-32), though I'm not 100% sure. The customized malloc libs may be a sticking point, but I don't know how much of an effect on the result they'd have.
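
(For the curious, here's a rough sketch of the kind of C hot loop these flags act on, with the disclosed flag sets shown as comments. The file name, the loop itself, and the exact compiler invocations are my own guesses for illustration - they aren't taken from the reports.)

```c
/* saxpy.c -- a made-up floating-point kernel of the sort SPEC spends its time in.
 *
 * Compile lines roughly mirroring the disclosed flags (driver names are a guess):
 *   Dell / Intel compiler:  icl -Qipo -QxW -O3 saxpy.c        (plus the FDO train/recompile pass)
 *   Dell / GCC 3.3:         gcc -O3 -march=pentium4 -mfpmath=sse -c saxpy.c
 *   Apple / GCC 3.3:        gcc -fast -c saxpy.c              (then linked against the stmalloc library)
 *
 * -QxW and -mfpmath=sse ask for SSE2 code instead of x87 for loops like this one;
 * -O3 / -fast turn on prefetching, loop transformations, and so on.
 */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```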


The other interesting point is that Apple's GCC 3.3 SPEC numbers for the 2 GHz system were significantly lower than IBM's own SPEC numbers for a slower 1.8 GHz chip. I suspect the reason is that IBM used a highly optimized and tweaked compiler for the PPC 970 in their own tests. Thus it would seem that GCC 3.3 is inefficient on both Apple and Intel platforms and doesn't particularly favour Apple systems.
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
GCC isn't designed for speed, unfortunately. It's designed to be portable and produce reliable code. Plus it is Free and free. :)

It's cool how this computer works.

Each CPU has its own 500 MHz pathway to the chipset. At the typical overblown DDR rating it is supposed to be 1000 MHz, but we all know that 1000 MHz DDR speed is not equal to 1000 MHz real speed. The fastest bus for x86 is Intel's "quad-pumped" 800 MHz bus. It is really a 200 MHz bus that is supposed to be able to transfer information 4 times every clock, which seems doubtful. But I am not in a position to dispute this.
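
(Rough peak-bandwidth numbers, just to put the "quad-pumped" claim in perspective. I'm going from memory on the bus widths - a 64-bit-wide P4 FSB, and two 32-bit one-way links per G5 CPU - so treat these as ballpark figures:)

$$200\,\text{MHz} \times 4\ \text{transfers/clock} \times 8\ \text{bytes} = 6.4\ \text{GB/s for the P4's "800 MHz" FSB}$$

$$1000\,\text{MHz effective} \times 4\ \text{bytes per direction} = 4\ \text{GB/s each way, per G5 CPU}$$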

(In dual-CPU mode, do the Intel CPUs have to share this single bus, or does each CPU have its own?)

And look at it this way: PowerPCs can typically do 30-40% more work per CPU cycle than the Intel CPU can.

So if you look at it in terms of real bus speeds, the bus for the newest Pentium 4 can supply information on 1 out of every 16 CPU clock cycles; in quad-pumped MHz it can get information on 1 out of every 4. On a G5 this translates into 1 in 4 for real-world MHz and 1 in 2 for DDR MHz.

Plus you have one isolated channel for each cpu, so there is no bandwidth sharing.

So for s***s and giggles I think I'll try some math out. (I know this is probably, well, definitely, not accurate.)
I'll try to translate the Apple CPU into Intel CPU terms, as percentages of power per CPU cycle.

So Apple starts off with a 130% advantage per MHz and the much larger bandwidth available to it per CPU cycle. I am tempted to put that at 260% per CPU cycle because of the dual processors, but I know that dual processors are typically only 130-150% as effective as a single processor. Since the architecture is specifically designed for dual processors, has separate channels for each processor, and has a 128-bit pathway to the RAM, I will peg the dual-CPU bonus at 160%... I'll also throw in 140% for the 64-bit, but I will have to take away 30% since most of the programs will be half 64-bit and half 32-bit, having originally been designed for 32 bits.

(remember I am talking about PER CPU CYCLE right now)

So a 110% bonus for the 64-bit times a 130% bonus for efficiency gets you 143% per cycle.
143% combined with the 160% dual-CPU bonus gets you a grand total of
228.8% effectiveness...

But I will have to take away 10% because the world is not what it should be. So that's 218% effectiveness clock to clock.

So the dual G5 scores an Intel Pentium 4 rating of 4360 MHz relative effectiveness. That's 136% the oomph of the single Intel 3.2 GHz, or 145% compared to a 3 GHz Intel CPU.

SO if AMD was the ruler of the G5 world the cpu name would be the PowerPC 4500+. :p

Of course, if you look at the Apple benchmarks and average out the 194% for the FP rating and 166% for the other SPEC benchmark, you get 184% effectiveness, which is a much more aggressive rating than the Drag-o-Meter benchmark. I think that mine is more accurate at 145%. :D

And if you listen to Mac rumor land, Apple should have a 3 GHz G5 by next year, so by the time Intel gets around to releasing the Pentium 5 at 4/4.5/5 GHz, Mac users should have a PowerPC 6700+ available at 163/145/130% effectiveness... and Intel still won't have a 64-bit processor for the home market; right now their Itanium is flopping in terms of sales...

It will be nice when AMD gets AMD64 out. Then the G5 will have some real competition!

(And yes, I know this is a very bad way to compare processors and computers, but I just thought it was funny, so don't bother flaming me.)
 

Eug

Lifer
Mar 11, 2000
24,142
1,792
126
Low-level testing results are out. They don't consider the cache or memory subsystem or anything like that; it's purely a CPU bench, with results normalized to a 1 GHz G4. (The G4 results below are actually from my own TiBook.)

2 GHz G5 (single CPU) vs 1 GHz G4

Int: 172 vs 100
FP: 270 vs 100
Vec: 208 vs 100

Thus, it seems the G5 isn't so great MHz for MHz when dealing with integer, but it kicks @ss with floating point. In other words, in ideal theoretical situations, the dual 2 GHz G5 would do floating-point calcs at about 5.4 times the speed of my TiBook.
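
(That 5.4x is just the FP ratio above times two CPUs, assuming perfect dual-CPU scaling:)

$$2\ \text{CPUs} \times \frac{270}{100} = 5.4$$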
 

dullard

Elite Member
May 21, 2001
26,044
4,690
126
Originally posted by: drag
And if you listen to the Mac rumor land, Apple should be getting a 3ghz g5 by next year...
I keep seeing you brag about this as if it were something brag-worthy. Up until now I've kept my mouth shut, but I don't feel like it anymore. According to the rumors there will be a 3 GHz G5 within 12 months of the release of the 2 GHz G5. That is a 50% speed boost in 12 months.

Moore's law was originally about transistor count, but for a couple of decades now it has held really well for processor frequency as well. The typical variation of Moore's law is that processor speeds double every 18 months. Thus in 12 months processors are typically about 60% faster (the math isn't difficult to show).
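
(Spelled out: doubling every 18 months works out to

$$2^{12/18} = 2^{2/3} \approx 1.59$$

over 12 months, i.e. roughly a 60% speed increase.)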

Wait a minute, you are bragging that the G5 will get speed boosts that are slower than normal?

Note: Intel itself may not follow that 18-month doubling rule this year either (it will also be slightly behind the 18-month pace). However, Intel is focusing on efficiency improvements: new Prescott instructions, a faster FSB, double the cache, Hyper-Threading 2, etc. That, combined with its frequency boosts, will give far more than a 50% overall performance gain in 12 months - so the G5 will start falling behind again if it only hits the rumored 3 GHz.
 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
Originally posted by: EdipisReks
Originally posted by: dexvx
This is typical Apple marketing hype.

Go to Aceshardware to see the actual SPEC results. Apple used their own compiler for the tests on the P4/Xeon platforms.

Yeah, open-source compilers that aren't developed by Apple are Apple compilers? :rolleyes:

SPEC is a measurement of compiler capability as well as hardware capability. Since almost all software being made today (with the exception of legacy software) is built with these kinds of optimizing compilers, it's not unreasonable to assume so.

ZDNet Story

"It wasn't really a fair test," said Gartner analyst Martin Reynolds, who said that the Dell machines are capable of producing scores 30 percent to 40 percent higher than those produced under Apple's methodology. "The reason this happened is Apple had a third party go out and test a Dell under less than optimal conditions."
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
Originally posted by: Eug
Go to Aceshardware to see the actual SPEC results. Apple used their own compiler for the tests on the P4/Xeon platforms.
Why would Apple be making compilers for Linux on x86? That makes no sense whatsoever. As EdipisReks says, the compiler used for the PC is a very popular one: the GNU C Compiler (GCC) 3.3, on Linux. It's so popular because it's widely available and free, and it apparently produces reasonably good code.

It's widely used under Unix when "good enough" performance is required. It is not as widely used for HPC applications.

The real culprit for the SPECfp discrepancy is not GCC... 10 of the 14 SPECfp tests are written in Fortran, not C. For these, NAGware's Fortran compiler was used, and the results were not pretty (these are the geometric means calculated from the official SPEC results and VeriTest's disclosure):

C programs (177.mesa, 179.art, 183.equake, 188.ammp):
3 GHz P4:
* ICC: 1086
* GCC: 788
2 GHz G5 (GCC): 778

Fortran programs (168.wupwise, 171.swim, 172.mgrid, 173.applu, 178.galgel, 187.facerec, 189.lucas, 191.fma3d, 200.sixtrack, 301.apsi)
3 GHz P4:
* ICC: 1298
* NAGware: 658
2 GHz G5 (NAGware): 866

NAGware yields horrid performance for the P4. SPECfp is a computationally bound suite, and the sophistication and level of optimizations a compiler attempts is going to yield a significant difference.
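
(For anyone wondering how those per-group numbers are produced: SPEC rolls the individual test ratios up with a geometric mean. Here's a minimal sketch of that calculation; the ratios in it are made-up placeholders, not real sub-scores.)

```c
/* geomean.c -- geometric mean of SPEC-style ratios.
 * The values below are placeholders, not actual SPEC sub-scores.
 * Build with something like: cc -O2 geomean.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double ratios[] = { 900.0, 1100.0, 850.0, 1000.0 };  /* hypothetical per-test ratios */
    int n = sizeof ratios / sizeof ratios[0];
    double log_sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        log_sum += log(ratios[i]);   /* summing logs avoids overflowing the running product */

    printf("geometric mean: %.1f\n", exp(log_sum / n));
    return 0;
}
```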

There is only one source for official SPEC results: spec.org. I suggest waiting until IBM submits official scores for the PPC 970 before attempting to make any comparisons.

So a 110% bonus for the 64-bit times a 130% bonus for efficiency gets you 143% per cycle.
143% combined with the 160% dual-CPU bonus gets you a grand total of
228.8% effectiveness...

But I will have to take away 10% because the world is not what it should be. So that's 218% effectiveness clock to clock.
I only wish performance evaluation were so cut-and-dried. ;)
 

Eug

Lifer
Mar 11, 2000
24,142
1,792
126
Thanks Sohcan. It's nice to read stuff from people who seem to know what they're talking about for a change. ;)
 

Sohcan

Platinum Member
Oct 10, 1999
2,127
0
0
Originally posted by: Eug
Thanks Sohcan. It's nice to read stuff from people who seem to know what they're talking about for a change. ;)

Actually I'm just faking it....shhh....:Q
 

Eug

Lifer
Mar 11, 2000
24,142
1,792
126
Gaming benchmarks anyone? Another post from somebody at the show. Dunno what settings (besides the resolution). I'm not sure, but it's probably a 9800 Pro.

"I oversaw someone playing UT2K3 on a dual 2.0 G5. It was pulling frame rates in the 70's to 80's at 1920x1200 resolution on a 23" Cinema Display!

THAT's what I'm gonna play it on... They were saying it wasn't optimized yet either. That's about the same performance as I get at 1280x1024 on an Athlon 2800+.
"
 

Pariah

Elite Member
Apr 16, 2000
7,357
20
81
Anything running at that high a resolution would be video-card limited, not CPU limited. THG benchmarked a 9800 Pro on an Athlon 2700+, which is most likely slower than the Apple system, and got 84.6 fps at 1600x1200, giving more evidence that the CPU is not the limiting factor.
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
Originally posted by: LikeLinus
Originally posted by: Sideswipe001
you are totally missing the point of why Apple makes computers.

Why would that be?

It's to make money. You really think Apple cares about people? Don't flatter yourself. They only care about fattening their wallets. Defend them all you want, but at least know who you are REALLY defending.

Yeah, like having you pay THEM $1 for every song you download. :rolleyes:
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
Originally posted by: Eug
Gaming benchmarks anyone? Another post from somebody at the show. Dunno what settings (besides the resolution). I'm not sure, but it's probably a 9800 Pro.

"I oversaw someone playing UT2K3 on a dual 2.0 G5. It was pulling frame rates in the 70's to 80's at 1920x1200 resolution on a 23" Cinema Display!

THAT's what I'm gonna play it on... They were saying it wasn't optimized yet either. That's about the same performance as I get at 1280x1024 on an Athlon 2800+.
"

But that is NOT the average... trust me, those numbers dip below playable half the time :) He was just stating the highest he noticed, and let me tell you, it wasn't with more than 8 players in a small room with rockets and heavy geometry either.

*sarcasm*I can stare at a wall and get 300fps too*sarcasm*
 

Eug

Lifer
Mar 11, 2000
24,142
1,792
126
Gaming benchmarks anyone? Another post from somebody at the show. Dunno what settings (besides the resolution). I'm not sure, but it's probably a 9800 Pro.

"I oversaw someone playing UT2K3 on a dual 2.0 G5. It was pulling frame rates in the 70's to 80's at 1920x1200 resolution on a 23" Cinema Display!

THAT's what I'm gonna play it on... They were saying it wasn't optimized yet either. That's about the same performance as I get at 1280x1024 on an Athlon 2800+.
"

But that is NOT the average... trust me, those numbers dip below playable half the time :) He was just stating the highest he noticed, and let me tell you, it wasn't with more than 8 players in a small room with rockets and heavy geometry either.
Yeah, I asked him for the specifics but haven't had a response yet. But regardless of the details, it's likely a big improvement. On a dual 1.42 GHz G4 the performance sucks pretty badly with a 9700 Pro.

Yeah, like having you pay THEM $1 for every song you download.
Actually, I think Apple only gets about a third of the money, and has to host the files and do the advertising, etc. The rest of the money goes to the record companies.