Looking for a new CPU mostly for MMOs

soccerballtux · Aug 9, 2011

GaiaHunter said:
soccerballtux said:

GaiaHunter said:

What you mean is Blizzard games are heavily optimized for core i architecture, probably making use of all instructions available for core i that are absent from both Phenom II and Core 2 architectures.

If a game is CPU power starved raising the clocks by 22% and getting 8% increase doesn't show much CPU dependence.

And fast IPC simply means at similar speed, more instructions are processed, increasing performance.

If processor X has 1 IPC and processor Y has 1.5 IPC, but processor X is clocked @4GHz and processor Y is clocked @2GHz, processor x still offers higher performance, although requiring much higher clocks.

Click to expand...

Do you konw what instructions does Intel have that AMD doesn't?

Click to expand...

For example SSE4.1 and 4.2 and SSSE3 (4.1 and SSSE3 is present in Core 2, but core 2 has no L3$).

If this is the cause or something else is causing a bottleneck on Phenom II, I don't know, but something doesn't add up and surely isn't IPC.

I believe much of it is this:
http://arstechnica.com/hardware/reviews/2008/07/atom-nano-review.ars/6

If they're doing it here they're probably doing it at every chance they get.
In Intel's defense, WoW is Microsoft's Visual C++, but it wouldn't surprise me if Microsoft Visual C++ were compiled with Intel's compiler.

Either that, or Intel's branch prediction is way better,
or it could also be their 12MB of cache...50% more than AMD's. That makes a big difference in games.

Cerb · Aug 10, 2011

More like Intel has that benchmark suite tailored to their CPUs, so it's all but useless comparing between brands.

Also, a nitpick, it wouldn't matter if Visual Studio were compiled with Intel's compiler; that wouldn't change any of what it outputs.

sm625 · Aug 10, 2011

RussianSensation said:
The reason Phenom II is uncompetitive in games is because that architecture is hardly improved over the Athlon 64 design. Once Bulldozer ships with much improved integer performance and better cache design, it will be far faster in Starcraft 2 than Phenom II is today.

I would expect a huge improvement in SC2 performance. But we have this clown Obrovsky continually posting benchmarks of "bulldy", the latest one showing an old quake game running at 64 fps when it runs at 87 fps on a X4 980. I still think he is benchmarking one that has cache writes disabled, but he will not post the AIDA64 write test results. (He did post the AIDA64 read results, which were very good at 70% better than the 980.)

ed29a · Aug 10, 2011

soccerballtux said:
WoW is Microsoft's Visual C++, but it wouldn't surprise me if Microsoft Visual C++ were compiled with Intel's compiler.

What?

You can use the slowest, least optimized compiler to compile another, if the other is good, the compiled code is going to fly.

soccerballtux · Aug 10, 2011

Cerb said:
More like Intel has that benchmark suite tailored to their CPUs, so it's all but useless comparing between brands.

Also, a nitpick, it wouldn't matter if Visual Studio were compiled with Intel's compiler; that wouldn't change any of what it outputs.

ed29a said:
What?

You can use the slowest, least optimized compiler to compile another, if the other is good, the compiled code is going to fly.

Yes one would think, but not according to this guy, who has been tracking it heavily:
http://www.agner.org/optimize/blog/read.php?i=49

see his post "AMD's Library Contains Intel's 'cripple-AMD' Function"-- http://www.agner.org/optimize/blog/read.php?i=49#115
AMD's core math library was compiled with Intel's Fortran compiler, and using this library with VS C++ 2010 results in un-optimized code.

An executable contains the processor identification for the purpose of determining which optimizations to enable. If you search WoW.exe in notepad for GENUINEINTEL or AUTHENTICAMD you can see this. The Intel optimized parts of the code run using the Intel Kernel Math Library linked functions, and the AMD optimized mathematics run using AMD's Core Math Library [still optimized for Intel], resulting in up to 15% performance loss with all the movement calculations going on for characters+their armor in cities, on top of whatever other losses there are from AMD not writing their CML as efficiently as Intel.

point being: we need to find a hack to return GENUINEINTEL when it's actually AMD, and see what happens in the benchmarks.

ed29a · Aug 10, 2011

Did you check the WoW source code yourself?

What does a compiler compiled by another one has to do with a code calling an external math library? WoW calls the Intel or AMD library to do stuff. For the compiler this is a dumb library call. WoW checks what processor it's running on and calls the corresponding library. How does this call has anything to do with Intel compiler compiling MS compiler?

Now, if the library called is crippled, not efficient, slow, it's not the compiler's fault. All it does is to blindly call it. If I call some library from my code and this library call is slow, the compiler that compiles my code can't do jack about it.

Also searching for 'AUTHENTICAMD' or 'GENUINEINTEL' can lead to a lot more questions and assumptions (like you assumed). You assume that WoW searching for those IDs directly relates to loading a specific library. Did you debug WoW? Did you see if OS loads the Intel vs AMD library right upon searching for the ID? Not saying it can't be the case. But there are many more things WoW can do with the knowledge of the underlying CPU's brand.

Cerb · Aug 10, 2011

soccerballtux said:
Yes one would think, but not according to this guy, who has been tracking it heavily:

That's irrelevant. Most libraries in are binaries, already compiled. When you use them, you aren't compiling them, just compiling in enough code to call the procedures in them.

A compiler takes input set of code, and creates an output binary to execute. That output will be identical, regardless of what compiled the compiler (assuming a bug-free compiler).

soccerballtux · Aug 10, 2011

Cerb said:
That's irrelevant. Most libraries in are binaries, already compiled. When you use them, you aren't compiling them, just compiling in enough code to call the procedures in them.

A compiler takes input set of code, and creates an output binary to execute. That output will be identical, regardless of what compiled the compiler (assuming a bug-free compiler).

That makes sense. So then the only solution would be to either
1. remove these special libraries from VS, or
2. somehow make Intel use full optimizations, if the CPU supports them, no matter the architecture.

soccerballtux · Aug 10, 2011

ed29a said:
Did you check the WoW source code yourself?

What does a compiler compiled by another one has to do with a code calling an external math library? WoW calls the Intel or AMD library to do stuff. For the compiler this is a dumb library call. WoW checks what processor it's running on and calls the corresponding library. How does this call has anything to do with Intel compiler compiling MS compiler?

Now, if the library called is crippled, not efficient, slow, it's not the compiler's fault. All it does is to blindly call it. If I call some library from my code and this library call is slow, the compiler that compiles my code can't do jack about it.

Also searching for 'AUTHENTICAMD' or 'GENUINEINTEL' can lead to a lot more questions and assumptions (like you assumed). You assume that WoW searching for those IDs directly relates to loading a specific library. Did you debug WoW? Did you see if OS loads the Intel vs AMD library right upon searching for the ID? Not saying it can't be the case. But there are many more things WoW can do with the knowledge of the underlying CPU's brand.

I'm not saying it IS the case, I was simply pointing out how it could be possible for this to be the case with WoW. And, if a library call is just execution of binary, then if we removed the 'variables' (those libraries, for example-- though they may or may not be in WoW's code), then we'd have an even match-- hardware vs. hardware, at which point we may (or may not) find the performance advantage is largely related to the amount of on-chip cache and less due to un-optimized code.

We still need to get AMD to implement changeable processor ID like Via's Nano does; so that we can change it to GENUINEINTEL and see how performance changes.

GaiaHunter · Aug 10, 2011

RussianSensation said:
Can you provide evidence to support this? Every time Intel CPUs outperform AMD in CPU intensive games, we always hear a claim that the game is "Intel optimized" or "Core i optimized".

Can it be that Intel designed its CPUs to excel in this type of code/workload (i.e.. games)? Their architectures are simply better designed to excel in games. Similarly, Athlon 64 was far better optimized for such workloads than the Pentium 4 architecture was. No developer sat there and specifically coded games to take advantage of Athlon 64's architecture over Pentium 4. Games just ran better on Athlon 64 out of the gate since AMD's architecture at the time was far more efficient per clock than Netburst was, and AMD had superior integer and cache performance. Simply said, Athlon 64's architecture was optimized to perform better at certain workloads, such as games; it was not that games were optimized to run faster on Athlon 64 architecture.

Intel simply continues to improve the IPC and cache performance of their processors. In constract, AMD has done nothing of the sort since they launched Phenom I. It is no surprise then that the gap in performance in games has grown so large - after all Intel is 2 full generations ahead. In other words, Core i7/SB architectures are simply much more efficient at handling integer operations than the previous architectures.

Take a look at the changes in the architecture which they have made each generation -> Core 2 Duo/Q --> Core i7 --> SB. All these changes resulted in SB being 15-20% faster per clock cycle than i5/i7 was and i7 itself was 20-25% faster per clock cycle than Core 2 Quad was. Overall, SB is now about 45-50% faster per clock (i.e., in IPC) vs. Core 2 Quad 65nm design from 2007. Usually games have a huge cache miss rate, but Intel worked on Nehalem and later on SB to ensure that their CPUs needed less clock cycles for some instructions, featured more execution/store units that helped for computing intensive applications (gaming included), and optimized those architectures to reduce cache misses as much as possible. During the same period of time, all AMD did was raise clock speeds and add more useless cores (for games) on what was already an outdated Phenom I architecture to begin with.

In fact, if you look at the performance of C2Q generation, it's severely lacking in Starcraft II.

In other words, the reason Intel's Nehalem/Lynnfield/Sandy Bridge CPUs perform much better than Phenom I/II and Core 2 Quad generation in games is because those CPUs have far superior cache performance/latency and far greater integer performance over those architectures. It has little to do with Blizzard games being "optimized" to run faster on Intel CPUs. It is Intel who optimized its architectures to run faster at these types of workloads.

The reason Phenom II is uncompetitive in games is because that architecture is hardly improved over the Athlon 64 design. Once Bulldozer ships with much improved integer performance and better cache design, it will be far faster in Starcraft 2 than Phenom II is today.

If SC2 running on Phenom II scaled with higher clocks there would be no problem.

What is illogical is a certain workload be CPU limited, which means it could use more instructions and then one increase the number of instructions that theoretically can be performed and the performance remains the same.

OCing a CPU 22% doesn't increase IPC but it increases the CPU performance. But then on a workload that is limited by CPU performance those 22% translate in a performance gain that is negligible and could be attributed to a combination of error margin and/or other areas of the system working better due to CPU performance increase, it doesn't seem to make sense to present that workload as the paradigm of IPC differences when clearly the faster clocked CPU of an architecture has a substantial decrease in IPC.

A good benchmark for IPC will be a game that shows similar increase in performance with increased clocks, otherwise, one will be measuring some kind of architectural bottleneck, that maybe or may not be present in another piece of software.

Likewise, one wouldn't use a game that is GPU bottlenecked to measure IPC.

soccerballtux · Aug 11, 2011

GaiaHunter said:
If SC2 running on Phenom II scaled with higher clocks there would be no problem.

What is illogical is a certain workload be CPU limited, which means it could use more instructions and then one increase the number of instructions that theoretically can be performed and the performance remains the same.

OCing a CPU 22% doesn't increase IPC but it increases the CPU performance. But then on a workload that is limited by CPU performance those 22% translate in a performance gain that is negligible and could be attributed to a combination of error margin and/or other areas of the system working better due to CPU performance increase, it doesn't seem to make sense to present that workload as the paradigm of IPC differences when clearly the faster clocked CPU of an architecture has a substantial decrease in IPC.

A good benchmark for IPC will be a game that shows similar increase in performance with increased clocks, otherwise, one will be measuring some kind of architectural bottleneck, that maybe or may not be present in another piece of software.

Likewise, one wouldn't use a game that is GPU bottlenecked to measure IPC.

They should have overclocked the CPU-NB/L3 cache. This would explain why performance increased so little, the thing's still running at 2.2ghz.

ed29a · Aug 11, 2011

soccerballtux said:
I'm not saying it IS the case, I was simply pointing out how it could be possible for this to be the case with WoW. And, if a library call is just execution of binary, then if we removed the 'variables' (those libraries, for example-- though they may or may not be in WoW's code), then we'd have an even match-- hardware vs. hardware, at which point we may (or may not) find the performance advantage is largely related to the amount of on-chip cache and less due to un-optimized code.

We still need to get AMD to implement changeable processor ID like Via's Nano does; so that we can change it to GENUINEINTEL and see how performance changes.

I can see you got no clue what you are talking about.

Hypothetical situation:
(1) Intel library makes use of SSE 4.1 and SSE 4.2.
(2) You hack WoW/CPU/something to force WoW to execute Intel library for AMD processors.
(3) AMD processor tries to execute SSE 4.1 and/or SSE 4.2.
(4) ????
(5) Hilarity ensues.

There are very good reasons why libraries made for intel CPUs should be used with Intel CPUs only. There are some instruction sets AMD doesn't support yet.

Cerb · Aug 11, 2011

soccerballtux said:
They should have overclocked the CPU-NB/L3 cache. This would explain why performance increased so little, the thing's still running at 2.2ghz.

They should have not used SC2 as a comparison benchmark. Anyone who's played the game at more than one graphics detail setting will know that the CPU needs change with detail settings, and that no replay benchmark quite captures actual jerkiness/stuttering in play at some given detail settings.

soccerballtux · Aug 11, 2011

Cerb said:
They should have not used SC2 as a comparison benchmark. Anyone who's played the game at more than one graphics detail setting will know that the CPU needs change with detail settings, and that no replay benchmark quite captures actual jerkiness/stuttering in play at some given detail settings.

were those just replays? That's silly, gives no real world indication of performance...

Cerb · Aug 11, 2011

soccerballtux said:
were those just replays? That's silly, gives no real world indication of performance...

Replays work better for repeatability. Some sites do play, but play is variable. Also, low, medium, high, and ultra change the performance demands of the CPU as much as the GPU, while not having much to do with keeping up performance with many units on screen, regardless of detail (IoW, if you play other people, use a lower detail than you can handle for SP, and there are A64 X2 users able to competitively play using low detail). It's a poor game to do performance comparisons with, beyond fairly narrow scope comparisons (such as comparing performance with and without AA, AF, AO, etc., for relative graphics performance).

Search

Looking for a new CPU mostly for MMOs

soccerballtux

Lifer

Cerb

Elite Member

sm625

Diamond Member

ed29a

Senior member

soccerballtux

Lifer

ed29a

Senior member

Cerb

Elite Member

soccerballtux

Lifer

soccerballtux

Lifer

GaiaHunter

Diamond Member

soccerballtux

Lifer

ed29a

Senior member

Cerb

Elite Member

soccerballtux

Lifer

Cerb

Elite Member

TRENDING THREADS