
Intel Skylake / Kaby Lake


tamz_msc

Platinum Member
Jan 5, 2017
2,837
2,587
136
Posting false information is against the forum rules and should be reported for moderation using the report button. This is an Intel topic, can you at least try to keep it to Intel?
Well my entire tirade started precisely because some people don't like to double-check what they post. And apparently, such incidents are simply glossed over or pushed under the carpet in this so-called Intel thread. Just look at my previous post just above yours.

I mean RIGHT NOW the AMD Q2 Earnings Call thread is being turned into an Intel Q2 thread, by the very same people who claim that I'm doing the opposite in this thread.

Oh the irony..:rolleyes:
 

JoeRambo

Golden Member
Jun 13, 2013
1,117
974
136
Do you have *actual* examples of AVX-512 giving the purported uplifts over 256b and 128b versions? Or did Intel invent it only to calculate digits of irrational numbers?
Strong words coming from the guy whose original claim was:

Epyc 'destroys' Skylake-SP in STREAM Triad. I wonder why. :rolleyes:

Exceed L1D with large data sets in real-world code and AVX2 lead vanishes into thin air. So much for being 3X faster.
It turns out that in applications more real-world than STREAM Triad, the advantage of AVX2 and AVX-512 does not vanish (not to mention that in the article behind the original claim, an 8-core Ryzen with 8x L1/L2 bandwidth resources is compared to 4 cores of Skylake and achieves only 1.6x the score, not your average "vanishing").
 
  • Like
Reactions: Sweepr

TheF34RChannel

Senior member
May 18, 2017
782
301
106
Posting false information is against the forum rules and should be reported for moderation using the report button. This is an Intel topic, can you at least try to keep it to Intel?
Agreed! This is my go to forum for first hand information and it just gets ruined, for me at least.

Back on topic :) we're hearing very little about Z370 putative release dates unfortunately (I presume they go hand in hand with the CPU releases) although admittedly this thread certainly isn't the place for it (I looked for one in the mobo section and found none though).

Anyway, I haven't been this excited since the 2600K release (currently excited for the 8700K I mean) :cool: it packs everything I'm gunning for; hyperthreading, more cores, excellent frequency, etc.
 

tamz_msc

Platinum Member
Jan 5, 2017
2,837
2,587
136
It turns out that in applications more real-world than STREAM Triad, the advantage of AVX2 and AVX-512 does not vanish (not to mention that in the article behind the original claim, an 8-core Ryzen with 8x L1/L2 bandwidth resources is compared to 4 cores of Skylake and achieves only 1.6x the score).
Do you not understand why that happens, or are you just pretending not to?

Show me real-world examples with perfect scaling. What do you have? y-cruncher? Note that I'm not doubting its capability as a stress-testing application. I just want examples of actual workloads. SiSoft Sandra Multimedia doesn't count either. Even Prime95 is more real-world than y-cruncher: at least there is a monetary incentive in calculating the next Mersenne prime. Calculating pi to the (insert very large number)+1th digit is about as far from the real world as it gets.

Have you read the conclusions of the blog? It says Zen can make up a lot of lost ground despite its shortcomings.
 

Zucker2k

Golden Member
Feb 15, 2006
1,398
738
136
Oh, you haven't heard about high frequency microarchitecture designs? Go compare Bulldozer to its predecessor, Thuban, (clock for clock) and you'll understand why Bulldozer failed. Hint: low clocks.

You think Sandy Bridge, and now Kaby Lake, being able to hit 5 GHz for daily operation is by accident, right? It has nothing to do with design, right?
 

tamz_msc

Platinum Member
Jan 5, 2017
2,837
2,587
136
Oh, you haven't heard about high frequency microarchitecture designs? Go compare Bulldozer to its predecessor, Thuban, (clock for clock) and you'll understand why Bulldozer failed. Hint: low clocks.

You think Sandy Bridge, and now Kaby Lake, being able to hit 5 GHz for daily operation is by accident, right? It has nothing to do with design, right?
IPC = Instructions per clock, meaning you take frequency out of the equation and only focus on what remains - the architecture. Thuban was faster than Bulldozer precisely because Bulldozer had lower IPC due to its weaker cores, CMT and shared FPU.

Of course there is a role of CPU design in this as well, like Pentium 4 having a very long pipeline which necessitated higher clocks.

Modern CPU architectures from Haswell onward, including both Zen and Haswell's successors, bear a passing resemblance to one another at a high level. In fact, it is no coincidence that compiler flags for Haswell sometimes result in better performance on Zen.
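
To make the IPC-versus-clock point concrete, here is a minimal Python sketch that treats single-thread throughput as IPC multiplied by clock frequency. The function names and all of the numbers are hypothetical illustrations, not measurements of any CPU mentioned in this thread.

```python
def throughput(ipc, clock_ghz):
    """Billions of instructions retired per second: IPC times clock in GHz."""
    return ipc * clock_ghz


def clock_needed(ipc_slow, ipc_fast, clock_fast_ghz):
    """Clock (GHz) the lower-IPC design needs to match the higher-IPC one."""
    return ipc_fast * clock_fast_ghz / ipc_slow


# Hypothetical designs: a "wide" core with higher IPC and a
# high-frequency core with lower IPC.
wide = throughput(ipc=2.0, clock_ghz=3.0)
speed_demon = throughput(ipc=1.5, clock_ghz=4.0)

print(wide, speed_demon)              # both designs retire the same work per second
print(clock_needed(1.5, 2.0, 3.0))    # clock the low-IPC design must reach to tie
```

With these made-up figures, a design with 25% lower IPC needs a 33% higher clock just to break even, which is the bet a high-frequency design makes.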
 

Zucker2k

Golden Member
Feb 15, 2006
1,398
738
136
IPC = Instructions per clock, meaning you take frequency out of the equation and only focus on what remains - the architecture. Thuban was faster than Bulldozer precisely because Bulldozer had lower IPC due to its weaker cores, CMT and shared FPU.
Haha! You still don't get it. Hint: Clocks, clocks, clocks!!! The 'I' and 'P' mean nothing without the 'C' (cycles.) So, no, you don't take frequency out of the equation, ever!

Bulldozer was a high frequency design. It was supposed to be a speed demon and would've made up for its relatively weak "IPC" had it managed to run at its intended clocks. So, knowing the trade-off very well, it would be disingenuous to do a clock for clock comparison between Bulldozer and Thuban.....

This is the last time I'm going to debate you on this, since it's off-topic. Sorry guys.
 

JoeRambo

Golden Member
Jun 13, 2013
1,117
974
136
So somehow we went from a guy excluding a bunch of benchmarks that did not fit his narrative that Ryzen has Skylake IPC to scrutinizing the real-worldliness of benchmarks and stress tools? This discussion is over for me. The basic facts do not change: Skylake has a >15% IPC advantage and up to a 25% clock advantage when overclocked. No goalpost shifting or fact twisting can change those facts.

One can always find "reasons" why some tests are synthetic, or "biased" or not "real world" enough, but if you remove true outliers like Cinebench or y-Cruncher, IPC advantage is still Intel's. "benchmark suites" like GB4 or simple tools like latest CPU-Z confirm substantial advantage in ST for Skylake.
 

dullard

Elite Member
May 21, 2001
22,802
1,031
126
Exceed L1D with large data sets in real-world code and AVX2 lead vanishes into thin air.
Large data set? Real-world code? Try Ansys Fluent. http://www.ansys.com/products/fluids It is used by just about any major manufacturer/designer for airplanes, car aerodynamics, chemical reactions, heat transfer (such as cooling fans in computers), oil pumping, etc.

What happens with AVX2 in probably the most common industrial use for number crunching of large data sets? http://www.ansys-blog.com/boost-ansys-performance-intel-technologies/
In Fluent, we’ve added support for Intel® Advanced Vector Extensions 2 (AVX2) optimized binary...Our benchmark results also show the Intel Xeon Gold 6148 processor boosts performance for ANSYS Fluent 18.1 by up to 41% versus a previous-generation processor — and provides up to 34% higher performance per core.
Ansys Fluent can reach, for example, 99.46% scaling efficiency on 144 cores:

http://www.ansys.com/Solutions/Solutions-by-Role/it-professionals/platform-support/benchmarks-overview/ansys-fluent-benchmarks/ansys-fluent-benchmarks-release-17/flow-through-a-combustor-12m

Running with hundreds/thousands of cores is fairly common. This isn't calculating pi. These are multi-million dollar simulations. AVX-512 data isn't out yet. But Ansys updates their software (and thus benchmarks) around October of most years, so maybe it will be available then.
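
For anyone unsure what a scaling efficiency figure like that means, here is a minimal Python sketch. It assumes the common definition (speedup divided by core count, measured against a single-core baseline); the linked benchmark may use a different baseline, and the function names are my own.

```python
def parallel_efficiency(t_baseline, t_parallel, cores):
    """Speedup over the single-core baseline divided by the number of cores."""
    return t_baseline / (t_parallel * cores)


def speedup_from_efficiency(efficiency, cores):
    """Invert the definition: how many times faster than one core."""
    return efficiency * cores


# 99.46% efficiency on 144 cores, the figure quoted above.
print(speedup_from_efficiency(0.9946, 144))   # roughly 143x a single core

# A perfectly scaling run: 144 cores finish in 1/144 of the baseline time.
print(parallel_efficiency(144.0, 1.0, 144))
```

Under that assumption, 99.46% on 144 cores means the job runs about 143 times faster than on one core, so almost nothing is lost to communication or serial sections.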
 
  • Like
Reactions: Phynaz

tamz_msc

Platinum Member
Jan 5, 2017
2,837
2,587
136
Haha! You still don't get it. Hint: Clocks, clocks, clocks!!! The 'I' and 'P' mean nothing without the 'C' (cycles.) So, no, you don't take frequency out of the equation, ever!

Bulldozer was a high frequency design. It was supposed to be a speed demon and would've made up for its relatively weak "IPC" had it managed to run at its intended clocks. So, knowing the trade-off very well, it would be disingenuous to do a clock for clock comparison between Bulldozer and Thuban.....

This is the last time I'm going to debate you on this, since it's off-topic. Sorry guys.
On 28nm the best it could do was 5GHz but at an eye watering TDP. In spite of that we all admit that it was a bad CPU architecture to begin with, and will always have lower IPC.

So somehow we went from a guy excluding a bunch of benchmarks that did not fit his narrative that Ryzen has Skylake IPC to scrutinizing the real-worldliness of benchmarks and stress tools? This discussion is over for me. The basic facts do not change: Skylake has a >15% IPC advantage and up to a 25% clock advantage when overclocked. No goalpost shifting or fact twisting can change those facts.

One can always find "reasons" why some tests are synthetic, or "biased" or not "real world" enough, but if you remove true outliers like Cinebench or y-Cruncher, IPC advantage is still Intel's. "benchmark suites" like GB4 or simple tools like latest CPU-Z confirm substantial advantage in ST for Skylake.
You can choose to go with whatever facts you like, but you can't get over the fact that you provided outright false information (basically lied) regarding y-cruncher. If you remove outliers like y-cruncher there are no gains to be made from AVX-512, and hence your 15 percent IPC advantage comes down to 10% or less.
Large data set? Real-world code? Try Ansys Fluent. http://www.ansys.com/products/fluids It is used by just about any major manufacturer/designer for airplanes, car aerodynamics, chemical reactions, heat transfer (such as cooling fans in computers), oil pumping, etc.

What happens with AVX2 in probably the most common industrial use for number crunching of large data sets? http://www.ansys-blog.com/boost-ansys-performance-intel-technologies/

Ansys Fluent can reach, for example, 99.46% scaling efficiency on 144 cores:

http://www.ansys.com/Solutions/Solutions-by-Role/it-professionals/platform-support/benchmarks-overview/ansys-fluent-benchmarks/ansys-fluent-benchmarks-release-17/flow-through-a-combustor-12m

Running with hundreds/thousands of cores is fairly common. This isn't calculating pi. These are multi-million dollar simulations. AVX-512 data isn't out yet. But Ansys updates their software (and thus benchmarks) around October of most years, so maybe it will be available then.
In discussion with people who write their own Lattice QCD and MHD/Plasma Physics codes, I know very well when it's about memory bottlenecks and when it isn't.

Interesting that your arguments always revolve around Pentium G4560s and cheap OEM builds on one end and hundreds of thousands of cores running weeks of simulations on the other with little in between.
 

dullard

Elite Member
May 21, 2001
22,802
1,031
126
In discussion with people who write their own Lattice QCD and MHD/Plasma Physics codes, I know very well when it's about memory bottlenecks and when it isn't.
Please share with us the AVX2, AVX-512 benchmarks of their code. More data is always better.
Interesting that your arguments always revolve around Pentium G4560s and cheap OEM builds on one end and hundreds of thousands of cores running weeks of simulations on the other with little in between.
Interesting that just today, while quoting you, I posted this about processors in between:
https://forums.anandtech.com/threads/intel-skylake-kaby-lake-coffee-lake-thread-skylake-x-reviews-out-page-501.2428363/page-549#post-39003239
I think the 1600X chip is the best up-front value of all chips at the moment. The 1600X has faster speed and 2 more cores than the 1500X, for only $40 more. But if you take total cost of ownership into account, the lower power 1600 is probably the best buy. To keep to the topic of the thread though, the upcoming Intel 8600 might win that best buy award from me if the rumored speeds are true
A little information about myself: personally, I'm a cheapskate; professionally, I ran ANSYS Fluent and coded my own mathematical simulation software for years (I'm no longer in that line of work though and do not have personal AVX-512 experience). So, yes, I do talk about both ends of the spectrum. But I'm certainly not limited to those extremes.
 

tamz_msc

Platinum Member
Jan 5, 2017
2,837
2,587
136
Please share with us the AVX2, AVX-512 benchmarks of their code. More data is always better.
Here: certain QCD kernels on Knights Landing are only 20% faster using AVX-512 over AVX2.
http://www.nersc.gov/users/computational-systems/cori/application-porting-and-performance/application-case-studies/qphix-case-study/

How about something from CMS@Cern:

It includes benchmarks that fit into the cache (SciMark), an actual library function for evaluating multivariate PDFs that depends heavily on memory bandwidth, and finally the CMSSW software framework, whose results show a performance regression:
CMSSW (Reco and HLT)
  • Standard build (-O2, -ftree-vectorize, SSE3)
      • HSW up to 25% faster than SB for selected modules
      • On average 15-20% faster than SB
  • Native build (-O2, -ftree-vectorize, -march=haswell)
      • Between 2% and 4% slower than Standard build
How about using explicit vectorization instead of compiler autovectorization for ease of use, avoiding verbose compiler-generated code, but with similar levels of performance uplift across a number of different use cases?
https://github.com/VcDevel/Vc

By comparison, your Ansys link was basically a marketing slide.
 
  • Like
Reactions: Drazick
Mar 10, 2006
11,719
2,010
126
Here: certain QCD kernels on Knights Landing are only 20% faster using AVX-512 over AVX2.
http://www.nersc.gov/users/computational-systems/cori/application-porting-and-performance/application-case-studies/qphix-case-study/

How about something from CMS@Cern:

It includes benchmarks that fit into the cache (SciMark), an actual library function for evaluating multivariate PDFs that depends heavily on memory bandwidth, and finally the CMSSW software framework, whose results show a performance regression:

How about using explicit vectorization instead of compiler autovectorization for ease of use, avoiding verbose compiler-generated code, but with similar levels of performance uplift across a number of different use cases?
https://github.com/VcDevel/Vc

By comparison, your Ansys link was basically a marketing slide.
Major data center customers want Skylake-SP so badly that ~30 companies collectively bought 500K pre-PRQ parts. I strongly suspect that the AVX-512 performance that you keep bashing ad nauseam was a significant part of the value prop.

You really need to let it go, dude.
 
  • Like
Reactions: Edrick and Sweepr

tamz_msc

Platinum Member
Jan 5, 2017
2,837
2,587
136
Major data center customers want Skylake-SP so badly that ~30 companies collectively bought 500K pre-PRQ parts. I strongly suspect that the AVX-512 performance that you keep bashing ad nauseam was a significant part of the value prop.

You really need to let it go, dude.
I'm not talking about the data center, nor do I care about it. HPC is mostly what I look at.

Your rebuttal to my technical response is some grapevine talk that is typical of Intel practices, which the concerned parties likely won't admit up front.
 
  • Like
Reactions: Drazick and raghu78

coercitiv

Diamond Member
Jan 24, 2014
4,352
5,662
136
If anyone will be so kind as to reply to this post when the s***storm is over, I'll be in their debt.
 

arandomguy

Senior member
Sep 3, 2013
532
149
116
They need to apply the same forum and rule overhaul done with Video Cards and Graphics to CPUs and Overclocking.
 

Timmah!

Senior member
Jul 24, 2010
759
65
91
https://videocardz.com/71269/intel-core-x-series-full-specs-revealed

Full Spec list for Skylake-X. The 18 core has a base clock of 2.6 with TB3 of 4.4 while having a TDP of 165.
So the 7920X has a 2.9 GHz base clock and the 7940X 3.1? Sounds logical.

I see it's a 140 W part, versus 165 W for the 7940X, but why, if it's based on the same die as the 7940X? I could understand it if it were LCC like the 7900X and thus grouped with those CPUs TDP-wise.

Anyway, not that it matters that much. Bring it on already!
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
106
Apple has already shown that they can beat Core's performance at the same power levels. It's only a matter of time.
Until what? Apple starts selling CPUs to third parties? iPads replace everyone's laptops? iPads weren't doing so well the last time I checked.
 

R0H1T

Platinum Member
Jan 12, 2013
2,567
142
106
Until what? Apple starts selling CPUs to third parties? iPads replace everyone's laptops? iPads weren't doing so well the last time I checked.
OR they move Macs to an Ax SoC, it's not like they haven't done this before.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
106
OR they move Macs to an Ax SoC, it's not like they haven't done this before.
Lots of "ARM everywhere" types think that is going to happen, but it really doesn't stand up to scrutiny.

It would be a pointless, expensive mess given the relatively small size of the Mac market and the wide breadth of CPU usage.

Macs use everything from 2 cores to 18 cores, across ultra mobile, mobile, desktop, HEDT spaces. You aren't going to replace them with one Arm SoC, you need to develop a whole family of them for a small captive niche. It makes no sense.
 

dullard

Elite Member
May 21, 2001
22,802
1,031
126
...
By comparison, your Ansys link was basically a marketing slide.
So, in defense of your statement that AVX2 leads "vanish into thin air" you post links that show:
  • At least 20% speed gains,
  • 2x speed gains on linear algebra calculations that fit in cache
  • Sometimes up to 4x speed gains on small data sets, 2x in large data sets
  • Haswell chips,
  • Data from the year 2013/2014,
  • and similar?
You have a point, but you didn't quite make it. AVX is power hungry, hot, and thus can down-throttle a CPU. AVX is a memory bandwidth hog due to the immense number of calculations being done much faster. If your code doesn't really need it, or your cooling/memory cannot sustain it, then it might not be worth using. But to say that its gains vanish into thin air is extremely misleading. In reality, if AVX helps, it helps tremendously. Otherwise, it doesn't help.

You just need to know if you need AVX and when to use AVX properly. The software is specialized, and thus is not applicable to many of us on Anandtech. But AVX is extremely valuable when it is used properly.
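
A rough way to see both sides of this argument is the roofline model: attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity. The Python sketch below uses that model with entirely hypothetical machine numbers (100 GB/s of bandwidth, 250 vs 500 GFLOP/s peak for a narrower vs wider vector unit); only the STREAM Triad operation count is standard, and it ignores write-allocate traffic.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# STREAM Triad, a[i] = b[i] + s * c[i] on doubles:
# 2 flops per element, 3 doubles (24 bytes) moved per element.
TRIAD_INTENSITY = 2 / 24

# Hypothetical kernel that reuses data from cache heavily.
CACHE_FRIENDLY_INTENSITY = 10.0

BANDWIDTH = 100.0                  # GB/s, hypothetical
NARROW, WIDE = 250.0, 500.0        # GFLOP/s peaks, hypothetical

for name, intensity in [("triad", TRIAD_INTENSITY),
                        ("cache-friendly", CACHE_FRIENDLY_INTENSITY)]:
    narrow = attainable_gflops(NARROW, BANDWIDTH, intensity)
    wide = attainable_gflops(WIDE, BANDWIDTH, intensity)
    print(f"{name}: narrow={narrow:.2f} wide={wide:.2f} ratio={wide / narrow:.2f}")
```

In this toy setup the wider vector unit gives no gain at all on Triad, because memory is the limit, and a full 2x on the cache-friendly kernel. Real code lands somewhere in between, depending on how much data reuse it has.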

Ansys posts software updates and benchmarks usually in October. You'll get marketing slides until then.
 
