Discussion in 'CPUs and Overclocking' started by Dresdenboy, Mar 1, 2016.
Price wise I wouldn't be that sure...
Interesting, I remember reading rumors that there wouldn't be a 4-core variant initially, only 6/8-core parts.
3.5 base 4.0 turbo is all I was hoping for on initial release versions.
One thing I can tell you for sure: what he said is not fake, at all. Wishful thinking would have been a 4c/8t Summit Ridge already running at over 4 GHz clocks in a flawless AM4 motherboard and executing a bunch of benchmark software with zero issues.
Why should we trust you? Another low-post-count user coming from nowhere; anyone could post the same.
I didn't call the clock speeds wishful thinking because they are not impressive.
You do know that 32 cores without FMA/AVX on the server market is the old, failed AMD strategy of providing more cores to reach the same performance level, don't you? And it doesn't work, because the moment you add more cores you add to the leakage and you end up with worse performance/watt.
Plus AMD doesn't have anything close to the Purley platform today and they didn't announce anything for the future either, so I would not have high expectations on that market.
Btw, if Zen is another Polaris AMD is headed for a fiasco.
Then they wouldn't require multithreading performance in the first place.
You don't have to trust me. I've been in the PC hw/sw business for over 20 years, and I have some degree of credibility in certain circles. Most likely I will also receive a Summit Ridge ES CPU months before review sites like Anandtech, but that's a whole different issue.
Who said anything about lack of AVX or FMA support? Zen as a microarchitecture introduces a few compromises, and one of those is the inferior (to Skylake) AVX or FMA performance. It's still supported, just considerably slower than with e.g. SKL or even HSW.
Are you saying that Intel wouldn't be able to roll out a 32-core BDW-EP with a clock rate of around 2 GHz? Why would that be impossible to do? And keep in mind that a Zeppelin (Zen) core is simpler (leaner) than a BDW core, so piling them up might be a bit easier.
Depends on whether you have a workload that cares about vector FP throughput. Plenty of workloads focus on integer throughput.
And if you have a workload that scales well with 8-wide FP vectors over 32 cores, you should probably be looking at putting it on a GPU anyway.
Aren't those the features that will be remedied with Zen+ when it comes a year or two later?
Think harder, Intel had this choice and instead took the other route. Why?
It doesn't matter whether the workload can scale over 32 cores with decent throughput, but whether the workload can be processed in an efficient manner. This is a glass jaw in AMD's architecture, a trade-off they are making. Whether they are making it due to 1) lack of resources, 2) being delusional, or 3) a disruptive business strategy is open to question, but given that Intel has much better access to server market customers than AMD can dream of, I wouldn't put my bets on 3). I would go for 1) but wouldn't discard 2).
You mean two years later when Intel will have deployed something much better than what they have by the time Zen arrives?
They also have a 72-core CPU, so it's more about knowing their market IMHO and working in incremental steps. Evolution vs. revolution. AMD, meanwhile, is kinda forced into taking huge leaps now to get back in the game. Since AMD uses 8-core dies, they can make both 24-core and 32-core server parts. A 24-core server part wouldn't make a huge buzz; we've already seen that with BDW-EX. But making both a 24-core and a 32-core server CPU could work as breaking news and help AMD capture the attention of server folks.
Also, I don't think a 32-core BDW-EP/EX could work well with their current 2-ring architecture. It would most likely require a more complex solution to scale well. While AMD chose a different path and that one could work well with up to 32 cores. Otherwise, if that would be hugely inferior to a 24-core layout, AMD simply wouldn't bother making a 32-core part...
Usually in the server market it doesn't matter how much buzz you make; it's more about delivering the promised performance within the efficiency targets, something that Intel has been doing for the last 10 years.
Would you care to explain how Intel's dual-ring solution for 32 cores would be worse than the MCM solution AMD is going to field with Zen? Unless AMD discovered a magical pixie dust that will magically solve the problem, AMD's solution will be even worse than Intel's.
Yes, and no server guys would read any reviews, and no one in upper management has ever been influenced by the power of marketing. That's exactly why no one spends a dollar on advertising server hardware.
GMI and SDF will do the magic. I'm sure you know what they stand for and how they work.
As for 32 cores in BDW-EP/EX: there are certain hops that you can quite easily pile up and make things worse. The data packets have to travel around in those 2 rings, and the size of the rings (the number of hops) is quite an important factor. Yes, you could add 4 more cores to each ring, but I'm sure in many workloads it would make things worse, even at core clocks similar to those of the current top-of-the-line 24-core BDW-EX... AMD's solution doesn't use any rings BTW.
If those clocks stated by AMD Polaris are even remotely true, I cannot say I'm too surprised. They align perfectly with the latest estimate I made in post #1259 (2600MHz (±200MHz) base and 3200MHz (±200MHz) maximum boost).
As explained in post #2044 I assumed that AMD used 2:1 scale in their famous "Orochi vs. Summit" and "Excavator vs. Zen" charts.
Also, if the chart was in fact about Cinebench R15 (which is highly likely), then the "Orochi" CPU they used in the comparison is likely to be the FX-8350 / FX-8370 (and not the -E model I previously speculated). These two CPUs have the same base clock and TDP, meaning they will perform identically in multithreaded workloads. The FX-8350 / FX-8370 scores 643 points in Cinebench R15 MT. Multiply that by 1.5 and you'll end up with 965 points. Based on the expected IPC improvement and the clock estimation, Zeppelin should match this target at the 3050MHz "all core boost" if the SMT yield is ~11.7%. A pretty plausible SMT yield for a first implementation, IMO.
I ran the claimed figures through (with the expectation that the IPC is exactly 40% over Excavator).
Used Cinebench R15 only, since it is a properly threaded workload which doesn't use modern instructions.
If the sustained base clock during CB R15 MT is 2.8GHz, 8C/16T Zeppelin should score: 833 / 872 / 912 / 952 / 991 (5 - 25% SMT yield)
If the sustained base clock during CB R15 MT is 3.05GHz, 8C/16T Zeppelin should score: 907 / 950 / 994 / 1037 / 1080 (5 - 25% SMT yield)
The single threaded score should be around 113 points at 3.2GHz.
Meanwhile Intel 8C/16T parts would score as follows, at the same clocks:
Sandy Bridge-E = 971 / 1058 (2.8GHz / 3.05GHz)
Ivy Bridge-EP = 1002 / 1091 (2.8GHz / 3.05GHz)
Haswell-E = 1091 / 1189 (2.8GHz / 3.05GHz)
Broadwell-E = 1117 / 1217 (2.8GHz / 3.05GHz)
Intel ST at 3.2GHz:
Sandy Bridge = 114
Ivy Bridge = 117
Haswell = 128
Broadwell = 136
Skylake = 139
Based on these figures 8C/16T Zeppelin clocked to 2.8GHz / 3.05GHz would compete in multithreaded performance with i7-6700K (except in AVX2, most likely).
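For anyone who wants to re-run the Zeppelin numbers above, here is a rough Python sketch of the arithmetic. The per-core, per-GHz constant (about 35.4 Cinebench R15 points) is an assumption, back-derived from the posted scores rather than stated in the post, and the function names are made up for illustration. The sketch scales linearly with core count and clock and applies the SMT yield as a flat multiplier, which is a simplification.

```python
# Assumption: ~35.4 Cinebench R15 points per core per GHz without SMT.
# This constant is back-derived from the posted figures; the post does not state it.
POINTS_PER_CORE_GHZ = 35.4
CORES = 8
SMT_YIELDS = [0.05, 0.10, 0.15, 0.20, 0.25]


def mt_score(clock_ghz, smt_yield):
    """Estimated MT score: linear in cores and clock, flat SMT multiplier."""
    return CORES * clock_ghz * POINTS_PER_CORE_GHZ * (1.0 + smt_yield)


def required_smt_yield(target_score, clock_ghz):
    """SMT yield needed to reach target_score at the given sustained clock."""
    return target_score / (CORES * clock_ghz * POINTS_PER_CORE_GHZ) - 1.0


if __name__ == "__main__":
    # Target from the post: 1.5x the FX-8350 / FX-8370 score of 643 points.
    target = 643 * 1.5
    for clock in (2.8, 3.05):
        scores = [round(mt_score(clock, y)) for y in SMT_YIELDS]
        print(f"{clock} GHz, 5-25% SMT yield: {scores}")
    needed = required_smt_yield(target, 3.05)
    print(f"SMT yield needed for {target:.0f} points at 3.05 GHz: {needed:.1%}")
```

With this constant the estimates land within a point of the posted figures, and the required SMT yield at 3.05GHz comes out near the ~11.7% quoted earlier.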
Personally can't wait to see how this killer 4-way-MCM 32-core chip fares against the 28-core Skylake-EP/EX that will be available by the time of its release.
Trouble clocking above 3GHz?
I'm not interested in magic, I'm interested in how they are technically better than Intel's ring solution, and it seems you do not have an answer; you are expecting a miracle. I'm expecting either huge latencies or very poor efficiency from AMD's solution. It is a cheapskate solution for a market that has very little tolerance for it.
I'm not sure you are familiar with the current Xeon layout. They have four blocks of 6 cores each; each pair of blocks is connected by two rings, and there is a bridge between the two rings of each pair. So you don't add four cores to each ring to reach 32 cores, you add two. And FYI, the top Skylake-EP, the one that will compete with Zen, will have 28 cores.
Regarding the speculated clocks from Mr. Polaris, the real question is: do early Engineering Samples clock near final product clocks?
The way you talk it's like you are rooting for Intel. Why does it matter to you who makes the best product? Let's just see what happens and then buy what works best for the given application. Can we even agree on that? Maybe you work for/have stock in Intel.
I'm familiar, but the 2 rings I've mentioned are still just two 2-way rings, and 12 cores are connected to each other by a double ring.
You would have to add 33% more cores to each ring to squeeze 32 cores into the BDW-EP/EX die. That's not a small number in an architecture that's already sub-optimal (IMHO). No surprise that in SKL-EP/EX Intel will switch to a 3-ring (3 double-ring if you will) solution.
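To put a rough number on the hop-count argument, here is a toy Python sketch that computes the mean shortest-path hop count between stops on a single bidirectional ring. It assumes one ring stop per core and ignores the bridges between rings, LLC slices, and any non-core agents, so it only illustrates the trend and is not a model of the actual BDW-EP/EX interconnect. `mean_ring_hops` is a hypothetical helper name.

```python
def mean_ring_hops(stops):
    """Mean shortest-path hop count between distinct stops on a bidirectional ring.

    Toy model (assumption): every stop is equally likely to talk to every
    other stop, and traffic always takes the shorter direction.
    """
    total = 0
    for distance in range(1, stops):
        # On a bidirectional ring a packet can go either way around.
        total += min(distance, stops - distance)
    return total / (stops - 1)


if __name__ == "__main__":
    for stops in (12, 16):
        print(f"{stops} stops per ring: {mean_ring_hops(stops):.2f} mean hops")
```

Under these assumptions, going from 12 to 16 stops per ring raises the mean hop count from about 3.3 to about 4.3.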
My one and only guess was 2.5 / 3.2.
No surprises here IMO.
And I expect relatively weak SMT too.
Where I do disagree is the ST IPC assessment you make and build perf estimates on.
Based on some PPT and weak PR talk?
I think when they are gunning for Broadwell IPC on integer (and why shouldn't they, especially that wide), that's more or less what they will get: HSW integer ST IPC in CB, FP more like Ivy.
Efficiency will be the key. Then total system cost.
It is not a ring, it is a bridge. But since you said that Intel's solution is suboptimal, what would be the optimal solution for you? And how is AMD's solution better than Intel's suboptimal solution?
On all designs since Trinity, EVT parts have existed at final clocks. Orochi B (Bulldozer) reached its final clocks with its second stepping from the launch stepping (B0 vs. B2 release). A1 stepping already reached 3.6GHz.
If AMD plans to launch Zeppelin in 2016, they must have the final silicon available by now. Therefore most likely A0 (which was displayed by Lisa Su) is the final stepping for Zeppelin.