Originally posted by: piesquared
On a side note, I wonder where companies like SiSoft earn revenue?
From stupid people who think synthetic benchmark scores matter? I've said before that I think it's irresponsible of review sites to post those results.
Originally posted by: Zstream
The question shouldn't be about bashing the poster, but rather: why can't common code take advantage of this? I have always thought, and always will, that game companies force their coders to make a quick attempt at something and call it a day.
I bet that if AMD paid a few developers to optimize for AMD, while keeping the same code structure we use now for Intel, AMD would be in the lead. The same could be said for Intel, but I bet they already do this.
It's just not that simple. Real programs do a lot more than straight arithmetic - for example, traversing a data structure (e.g. searching a linked list or a tree). Even when they're doing arithmetic, they may be using integers rather than floating point numbers, and FLOPS is only a measure of floating point performance. Slow JavaScript web pages are nearly 100% integer code. You could probably run Gmail about as fast on a CPU with no floating point unit as you could on one with a high-end FPU.
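To make that concrete, here's a minimal C sketch (the struct and function names are my own, purely for illustration) of searching a linked list. Every step is a pointer load and an integer compare; there isn't a single floating point operation for a big FPU to speed up.

#include <stddef.h>

/* Illustrative singly linked list node. */
struct node {
    int key;
    struct node *next;
};

/* Walking the list is one long chain of dependent pointer loads and
   integer compares - exactly the kind of work FLOPS says nothing about. */
struct node *list_find(struct node *head, int key)
{
    for (struct node *p = head; p != NULL; p = p->next) {
        if (p->key == key)
            return p;
    }
    return NULL;
}

Note that each iteration can't even start until the previous node's next pointer has been loaded, so this is limited by memory latency, not by any arithmetic unit.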
Things in real programs that keep them from hitting theoretical FP throughput:
1) integer operations
2) dependent chains of instructions (no available parallelism - see the sketch after this list)
3) memory access
4) branch mispredictions (every time a branch is reached in a program, the processor has to guess whether to follow the branch or not, and it's hard to guess right more than ~95% of the time; about 1 in 5 instructions is a branch, so even with a good predictor you still make a lot of mistakes and have to fix them up)
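Here's a rough C sketch of point 2 (my own example, not taken from any benchmark). Both functions do the same number of floating point adds, but the first is one serial dependency chain, while the second keeps four independent accumulators the hardware can overlap:

#include <stddef.h>

/* Serial chain: every add needs the previous sum, so the loop runs at
   the latency of one FP add per element, no matter how many FP units
   the chip has. */
double sum_serial(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the adds can overlap in the pipeline,
   so this gets much closer to the machine's peak FP throughput. */
double sum_unrolled(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)  /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}

A compiler generally can't make this transformation for you, because FP addition isn't associative and reordering the sums can change the result slightly (unless you explicitly allow it, e.g. with GCC's -ffast-math). That's one reason real code sits so far below theoretical FLOPS.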
The SPEC benchmarks have had people (grad students, PhDs) optimizing them for years, and they still don't reach the theoretical MFLOPS numbers. It's not just lazy programmers.
Discussing this further is a waste of time unless you understand programming. Do you know how to write a binary tree? Do you know how to sort numbers? (I'm not asking about using a library that does these things for you - I'm asking if you could do them from scratch).
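Just so we're on the same page about what "from scratch" means: something like this minimal C sketch of a binary search tree insert (illustrative only, not production code):

#include <stdlib.h>

/* Minimal binary search tree node. */
struct tnode {
    int key;
    struct tnode *left, *right;
};

/* Insert a key, returning the (possibly new) root of the subtree.
   Duplicates are ignored; malloc failure is not handled in this sketch. */
struct tnode *bst_insert(struct tnode *root, int key)
{
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

Notice that this, too, is all pointers, compares, and branches - exactly the kind of workload where FLOPS tells you nothing.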
Originally posted by: myocardia
Originally posted by: piesquared
So, the OP's compiler statement holds a fair bit of merit. When programs that do the same task vary so wildly in performance between Intel and AMD, it is irresponsible not to take notice.
I would venture that when two processors run at roughly the same clock speed, and the one that clearly has the superior design on paper (that would be the X2 & Phenom, BTW) performs somewhat poorly in almost every app, discounting all benchmarks, it would be irresponsible of us enthusiasts not to take notice. And yeah, FLOPS count for shit in the real world, where only performance matters.
(I assume you're comparing C2D vs Phenom) Which part of Phenom is better on paper? The load-store unit (Intel reorders loads past stores speculatively; as far as I know, Phenom will only let a load bypass a store if the store's address is ready)? The branch predictor? The L2 cache size? The narrower peak decoder throughput?
Try comparing Via's Nano to Phenom on paper.
Originally posted by: piesquared
I'll guarantee you FLOPS mean a hell of a lot more than the decade-old code used in SuperPi.
As far as I can tell, nobody in this thread was stupid enough to suggest SuperPI actually matters.