Intel Skylake / Kaby Lake

Page 552 - AnandTech Forums

jpiniero

Lifer
Oct 1, 2010
14,585
5,208
136
Lots of "ARM everywhere" types think that is going to happen, but it really doesn't stand up to scrutiny.

It would be a pointless, expensive mess given the relatively small size of the Mac market and the wide breadth of CPU usage.

The (i)Mac Pro is a very small portion of Mac sales. They could easily dump it if they had to in order to make the move.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Haha! You still don't get it. Hint: Clocks, clocks, clocks!!! The 'I' and 'P' mean nothing without the 'C' (cycles). So, no, you don't take frequency out of the equation, ever!

Sorry, but you are incorrect. IPC = instructions per clock, as tamz_msc stated. It is used for "measuring" the architecture, not how fast it will clock. Too many people use that term incorrectly on these forums.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
OR they move Macs to an Ax SoC; it's not like they haven't done this before.

The day macOS is ported to their Ax SoC, and MacBooks are equipped with it, is the day Intel needs to worry.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Do you have *actual* examples of AVX-512 giving the purported uplifts over 256b and 128b versions? Or did Intel invent it only to calculate digits of irrational numbers?

AVX-512 on my CPU (7820X) is getting slightly over 600 GFLOPS running at 3.6 GHz. That is a huge increase over AVX2 in just last year's models. I have been able to write code that takes advantage of AVX-512, but it served no purpose other than testing throughput. It does, however, place a heavy load on the memory/cache subsystem. I suspect this is one of the reasons Xeon-SP went with 6-channel memory. But it is clear that AVX-512 does perform as Intel states.

With that said, I am struggling to find a real-world use for any AVX-512 code at the moment. Not one of the systems I am responsible for at my company can take full advantage of these new instructions. So the use cases may be very limited.
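A quick sanity check on that figure: the theoretical peak for a chip like the 7820X can be estimated with back-of-the-envelope arithmetic (a sketch; the core count, FMA-unit count, and all-core AVX-512 clock here are assumptions, not measurements):

```python
# Rough theoretical peak DP throughput for a Skylake-X part.
# Assumed figures: 8 cores, 3.6 GHz, 2 x 512-bit FMA units per core,
# 8 double-precision lanes per unit, 2 flops (mul + add) per FMA.
cores = 8
freq_ghz = 3.6
fma_units = 2
dp_lanes = 512 // 64      # doubles per 512-bit register
flops_per_fma = 2         # a fused multiply-add counts as 2 flops

peak_gflops = cores * freq_ghz * fma_units * dp_lanes * flops_per_fma
print(f"theoretical peak: {peak_gflops:.1f} GFLOPS")  # ~921.6
```

On those assumptions, ~600 GFLOPS measured against a ~920 GFLOPS peak is a believable efficiency for code that is not perfectly cache-resident.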
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
Lots of "ARM everywhere" types think that is going to happen, but it really doesn't stand up to scrutiny.

It would be a pointless, expensive mess given the relatively small size of the Mac market and the wide breadth of CPU usage.

Macs use everything from 2 cores to 18 cores, across the ultra-mobile, mobile, desktop, and HEDT spaces. You aren't going to replace them with one ARM SoC; you need to develop a whole family of them for a small captive niche. It makes no sense.
Pointless, hardly?

This has been debated to death, & as someone else said I'll just add to that ~ the day iPhone sales start to tank is the day Intel needs to worry about their big fat margin (Iris) payouts from Apple.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
Pointless, hardly?

This has been debated to death, & as someone else said I'll just add to that ~ the day iPhone sales start to tank is the day Intel needs to worry about their big fat margin (Iris) payouts from Apple.

Logic is not strong in that post. What do iPhone sales have to do with Intel Desktop CPUs?

Losing the entire Apple contract isn't as big a hit for Intel as losing 10% of the Consumer/Data Center markets to AMD would be.

Apple gets lots of press, but they are only ~5% of the Consumer PC market and 0% of the Data Center.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,773
3,596
136
AVX-512 on my CPU (7820X) is getting slightly over 600GFlops running at 3.6Ghz. That is a huge increase over AVX2 of just last years models. I have been able to write code to take advantage of AVX-512, but it served no purpose other than testing throughput. It does, however, place a heavy load of the memory/cache subsystem. I suspect that this is one of the reasons that Xeon-SP went with 6 channel memory. But it is clear that AVX-512 does perform as Intel states.

With that said, I am struggling to find a real world usage for any AVX-512 code at the moment. Not one of the systems I am responsible for at my company can take full advantage of these new instructions. So the use cases may be very limited.
Is it single or double precision? If it's double precision, then it's indeed impressive. Though you can clearly see why it stresses the memory/cache subsystem - especially when exceeding L2 limits. AIDA64 L3 bandwidth is what, 110-120 GB/s on Skylake-X? And that is with an uncore overclock. Then with 6-channel memory you achieve almost the same bandwidth on DDR4 alone. MCDRAM exists on Xeon Phi for a reason. Heck, GPUs have 4-8X that bandwidth and are still memory-bandwidth starved in DP.
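A roofline-style estimate makes the same point numerically (a sketch using figures quoted in this thread, not measurements of my own):

```python
# To sustain F GFLOPS from main memory at B GB/s, a kernel needs an
# arithmetic intensity of at least F/B flops per byte.
gflops = 600.0   # AVX-512 DP throughput reported above
bw_gbs = 128.0   # ~6-channel DDR4-2666 theoretical peak

required_intensity = gflops / bw_gbs   # flops per byte
print(round(required_intensity, 2))    # 4.69

# A streaming triad a[i] = b[i] + s*c[i] does 2 flops per 24 bytes moved,
# so fed from DRAM alone it sustains only a small fraction of peak:
triad_gflops = bw_gbs * (2 / 24)
print(round(triad_gflops, 1))          # 10.7
```

Anything much below ~5 flops per byte has to live in cache to get near the reported number, which is exactly the L2/L3 pressure being described.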
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
Logic is not strong in that post. What do iPhone sales have to do with Intel Desktop CPUs?

Losing the entire Apple contract isn't as big a hit for Intel as losing 10% of the Consumer/Data Center markets to AMD would be.

Apple gets lots of press, but they are only ~5% of the Consumer PC market and 0% of the Data Center.
Where do you think Apple makes their next billion in profit after iPhone sales go down or, worse, tank?

They already make pretty much everything they sell in house (alright, technically contract manufacturing), so where do they get the next round of profits? Their in-house SoC & (probably) in-house GPU are evidence enough that they're thinking big. Intel is obviously in their cross-hairs; when they pull the trigger is the only thing up for debate IMO.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
Where do you think Apple makes their next billion in profit after iPhone sales go down or, worse, tank?

I am not sure, and you aren't making it any more explicit. Do you think Apple is going to start selling CPUs/SoCs/GPUs in the open market to compete with Intel?
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
I am not sure, and you aren't making it any more explicit. Do you think Apple is going to start selling CPUs/SoCs/GPUs in the open market to compete with Intel?
I'm not sure where you got that implication from? I'm simply saying that Ax could very well replace Intel inside of Macs that we see today, soon if Apple feels the need for it.
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
I'm not sure where you got that implication from? I'm simply saying that Ax could very well replace Intel inside of Macs that we see today, soon if Apple feels the need for it.

That isn't going to make a dent in any iPhone losses. Do you understand the scale of difference between iPhone and Mac sales?

Switching Macs to ARM is a fool's errand, and Apple isn't run by fools.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
That isn't going to make a dent in any iPhone losses. Do you understand the scale of difference between iPhone and Mac sales?

Switching Macs to ARM is a fool's errand, and Apple isn't run by fools.
Right, so do you have the numbers to back that assertion up? How much does Apple pay Intel across all their desktop, laptop, and workstation products? Not to mention that you're counting iPhone sales & not overall profits, which would obviously fall, but relatively speaking to a lesser extent.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,773
3,596
136
So, in defense of your statement that AVX2 leads "vanish into thin air" you post links that show:
  • At least 20% speed gains,
  • 2x speed gains on linear algebra calculations that fit in cache
  • Sometimes up to 4x speed gains on small data sets, 2x in large data sets
  • Haswell chips,
  • Data from the year 2013/2014,
  • and similar?

Ansys posts software updates and benchmarks usually in October. You'll get marketing slides until then.
20% speed gains is on Xeon Phis with MCDRAM. LINPACK that fits into cache is basically a throughput benchmark - it does not correlate well with real-world applications in most cases. You can get a similar level of performance boost when you write vectorized code to begin with instead of relying on the compiler. Haswell and Broadwell are the most widespread x86 architectures that people use in HPC applications, along with Xeon Phi. They say that the beefed-up L2 and the AVX2 gather instructions that were improved upon in Broadwell are welcome additions, but the victim L3 and the lowered L3 per core will have significant consequences for AVX-512 performance.

AVX2 was slower in one of the instances from the CERN pdf. It's very good at array operations but not so much when invoking mathematical functions. It is a fact that growth in memory bandwidth has been slower than the growth in throughput in terms of pure GFLOP/s. I can show you papers where even overclocking the uncore has no effect on LINPACK beyond a certain point, and that point isn't much past the stock frequency - beyond it there is even a certain decrease in throughput.

These are the realities you face when you attempt to write real-world applications that utilize these ISAs to the fullest extent possible.

Even the marketing you provide is something a professional in HPC would laugh at:
http://www.ansys-blog.com/wp-content/uploads/2017/07/image005.png
 

dullard

Elite Member
May 21, 2001
25,055
3,408
126
Even the marketing you provide is something a professional in HPC would laugh at:
http://www.ansys-blog.com/wp-content/uploads/2017/07/image005.png
That is a blog, laugh all you want at it. Their actual marketing is this:
http://www.ansys.com/-/media/Ansys/...hbrief/ab-workstations-for-fea-simulation.pdf
One of the biggest advancements in the recently released Ansys 17.0 simulation software suite is a significantly optimised HPC solver architecture. It is specifically designed to take advantage of new generation Intel processor technologies and large numbers of CPU cores. The biggest benefits should be seen by those using Intel ‘Haswell’ Xeon E5-2600 v3 or Intel ‘Broadwell’ Xeon E5-2600 v4 CPU architectures. Both of these processor families feature new Intel AVX-2 compiler instructions and Intel Math Kernel Libraries that are supported in Ansys 17.0.
Like I said above, until version 18.0 is benchmarked thoroughly, you can go by the AVX-512 numbers in the blog. Benchmarks take weeks/months to run when you are doing real-world work. They don't just come out the minute a chip is launched or a new software version is out.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,773
3,596
136
That is a blog, laugh all you want at it. Their actual marketing is this:
http://www.ansys.com/-/media/Ansys/...hbrief/ab-workstations-for-fea-simulation.pdf

Like I said above, until version 18.0 is benchmarked thoroughly, you can go by the AVX-512 numbers in the blog. Benchmarks take weeks/months to run when you are doing real-world work. They don't just come out the minute a chip is launched or a new software version is out.
And where's the marketing that shows AVX2 performance uplift in actually readable graphs?
 

pj-

Senior member
May 5, 2015
481
249
116
Right, so do you have the numbers to back that assertion up? How much does Apple pay Intel across all their desktop, laptop, and workstation products? Not to mention that you're counting iPhone sales & not overall profits, which would obviously fall, but relatively speaking to a lesser extent.

Apple's total revenue from Macs was $7 billion last quarter. Even if you pretend a quarter of that goes to Intel, which is a gross exaggeration, they'd only reclaim a max of $1.75 billion by doing everything internally.

For comparison, the revenue from iPad/iPhone last quarter was around $60 billion. I don't think they give the profit margin on individual products, but for the company as a whole they are at around 38%, which means $23 billion in profit. Potential Mac chipset savings are peanuts in comparison. Even if iPhone/iPad sales drop by half, it's still a terrible idea to stop using x86 for Macs.

Apple dropping intel would only possibly begin to make sense if they basically stopped making money from ipad/iphone and needed to squeeze more margin out of macs. Even then, they'd probably be better off working with AMD to get something semi-custom on the cheap which wouldn't require reworking the entire operating system and every application that runs on it.
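The comparison above reduces to simple arithmetic (using the poster's own rough figures, which are not audited financials and are disputed later in the thread):

```python
# Hypothetical quarterly savings from dropping Intel vs. iPhone/iPad profit.
mac_revenue_b = 7.0            # Mac revenue, $B (poster's figure)
intel_share = 0.25             # deliberately exaggerated Intel cut
phone_tablet_revenue_b = 60.0  # iPhone/iPad revenue, $B (poster's figure)
company_margin = 0.38          # company-wide profit margin

max_savings_b = mac_revenue_b * intel_share
phone_profit_b = phone_tablet_revenue_b * company_margin
print(max_savings_b, round(phone_profit_b, 1))  # 1.75 vs 22.8
```

Even halving the phone/tablet figure leaves the hypothetical CPU savings an order of magnitude smaller.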
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
Apple's total revenue from Macs was $7 billion last quarter. Even if you pretend a quarter of that goes to Intel, which is a gross exaggeration, they'd only reclaim a max of $1.75 billion by doing everything internally.

For comparison, the revenue from iPad/iPhone last quarter was around $60 billion. I don't think they give the profit margin on individual products, but for the company as a whole they are at around 38%, which means $23 billion in profit. Potential Mac chipset savings are peanuts in comparison. Even if iPhone/iPad sales drop by half, it's still a terrible idea to stop using x86 for Macs.

Apple dropping intel would only possibly begin to make sense if they basically stopped making money from ipad/iphone and needed to squeeze more margin out of macs. Even then, they'd probably be better off working with AMD to get something semi-custom on the cheap which wouldn't require reworking the entire operating system and every application that runs on it.
I don't have to pretend & it wasn't $7 billion anyway ~ http://www.anandtech.com/show/11325/apple-announces-q2-fy-2017-earnings

As for iPhone+iPad, they made just short of $40 billion from those two. Where are you getting your numbers from?
Also, I said sales dropping or tanking, not falling off a cliff. The decline would be similar to PC sales, but perhaps a bit sharper once more smartphone markets saturate.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Is it single or double precision? If it's double precision, then it's indeed impressive. Though you can clearly see why it stresses the memory/cache subsystem - especially when exceeding L2 limits. AIDA64 L3 bandwidth is what, 110-120 GB/s on Skylake-X? And that is with an uncore overclock. Then with 6-channel memory you achieve almost the same bandwidth on DDR4 alone. MCDRAM exists on Xeon Phi for a reason. Heck, GPUs have 4-8X that bandwidth and are still memory-bandwidth starved in DP.

That is double precision. Even though I am running quad channel, my RAM is overclocked to 3200 MHz, a speed which Xeon-SP parts will be unable to match. So I am getting close to 6-channel bandwidth at 2666 MHz.

In my opinion, Xeon-SP is going to require 8-channel memory coupled with a larger and faster L3 cache (possibly MCDRAM, as you suggest) in order to see the true power of AVX-512. But the potential is there, and we should not discount these instructions just yet.
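The quad-channel-vs-six-channel claim checks out on paper (a rough sketch; theoretical peak DDR4 bandwidth is channels x transfer rate x 8 bytes, ignoring efficiency losses):

```python
# Theoretical peak DDR4 bandwidth in GB/s.
def peak_bw_gbs(channels, mts):
    # mts: mega-transfers per second (e.g. 2666 for DDR4-2666)
    return channels * mts * 8 / 1000

quad_3200 = peak_bw_gbs(4, 3200)   # ~102.4 GB/s
hexa_2666 = peak_bw_gbs(6, 2666)   # ~128.0 GB/s
print(quad_3200, hexa_2666)
```

About 102 vs 128 GB/s, so "close" is fair, though the 6-channel configuration still holds roughly a 25% edge.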
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
Sorry, but you are incorrect. IPC = instructions per clock, as tamz_msc stated. It is used for "measuring" the architecture, not how fast it will clock. Too many people use that term incorrectly on these forums.
Yes, I know what IPC means, thank you. The better term is instructions per cycle, or instructions per clock cycle, but I digress. Let me put it this way:

A chip architect must consider 3 important things: IPC, frequency, and power budget. But since the design must be married to a process, he has to take the characteristics of the process into account in his design as well. A low-power process may be more suitable to an IPC-heavy design, at the expense of clocks. A high-performance process would favor higher frequencies, at the expense of IPC. So what prevents an IPC-heavy design from being ported to a high-performance process? Nothing. But the power budget may need to be increased. No free lunch here. So, to compare different chips at a given frequency is to ignore the design decisions necessitated by all four factors at the time of design. This is why I said that, counterintuitive as it may sound, clock-for-clock comparisons are quite disingenuous: they totally ignore the strengths/weaknesses of each design. IMHO, therefore, IPC tests done in the vacuum of "clock for clock" comparisons need not be taken too seriously.
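The tradeoff being described reduces to performance = IPC x frequency; two very different designs can land in the same place (a toy illustration with made-up numbers):

```python
# Throughput in instructions per second for a hypothetical design point.
def perf_ips(ipc, freq_ghz):
    return ipc * freq_ghz * 1e9

wide_design = perf_ips(ipc=4.0, freq_ghz=3.0)  # IPC-heavy, lower clocks
fast_design = perf_ips(ipc=3.0, freq_ghz=4.0)  # clock-heavy, lower IPC
print(wide_design == fast_design)  # True: identical overall throughput
```

Which is why comparing either factor in isolation, clock-for-clock or otherwise, can mislead.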
 

pj-

Senior member
May 5, 2015
481
249
116
I don't have to pretend & it wasn't $7 billion anyway ~ http://www.anandtech.com/show/11325/apple-announces-q2-fy-2017-earnings

As for iPhones+iPad they made just short of $40 billion from these two. Where are you getting your numbers from?
Also I said drop in sales or tank, not fall from a cliff. The decline would be similar to the PC sales but perhaps a bit sharper after more smartphone markets saturate.

Whoops, I was looking at Q1 #s.

I don't understand your line of thinking at all. Maybe some made-up numbers would help. How much do you think it costs Apple to make a $1500 MacBook Pro? How much of that do you think goes to Intel? How much do you think they would save by making their own CPU? What is the profit margin increase on the average Mac?

I don't see any reasonable set of numbers resulting in a profit increase that makes the extreme amount of hardware and software engineering work make sense.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,773
3,596
136
Yes, I know what IPC means, thank you. The better term is instructions per cycle, or instructions per clock cycle, but I digress. Let me put it this way:

A chip architect must consider 3 important things: IPC, frequency, and power budget. But since the design must be married to a process, he has to take the characteristics of the process into account in his design as well. A low-power process may be more suitable to an IPC-heavy design, at the expense of clocks. A high-performance process would favor higher frequencies, at the expense of IPC. So what prevents an IPC-heavy design from being ported to a high-performance process? Nothing. But the power budget may need to be increased. No free lunch here. So, to compare different chips at a given frequency is to ignore the design decisions necessitated by all four factors at the time of design. This is why I said that, counterintuitive as it may sound, clock-for-clock comparisons are quite disingenuous: they totally ignore the strengths/weaknesses of each design. IMHO, therefore, IPC tests done in the vacuum of "clock for clock" comparisons need not be taken too seriously.
The actual term is instruction latency, measured in cycles, which is the analogue of time when it comes to computing. A faster CPU would have lower latency executing a given instruction, hence more instructions can be executed in a given number of cycles. Equalizing the clock frequency is one way of ensuring that you are considering the same number of cycles.
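One way to see why cycles are the natural unit: the same latency in cycles translates to different wall-clock times at different frequencies (a toy example, not figures for any real instruction):

```python
# Wall-clock latency of an instruction given its latency in cycles.
def latency_ns(cycles, freq_ghz):
    return cycles / freq_ghz

# A hypothetical 5-cycle instruction on two different clocks:
print(latency_ns(5, 4.0))  # 1.25 ns
print(latency_ns(5, 2.5))  # 2.0 ns
```

Equalizing frequency removes that second axis, leaving only the per-cycle behavior to compare.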
 

tamz_msc

Diamond Member
Jan 5, 2017
3,773
3,596
136
Whoops, I was looking at Q1 #s.

I don't understand your line of thinking at all. Maybe some made-up numbers would help. How much do you think it costs Apple to make a $1500 MacBook Pro? How much of that do you think goes to Intel? How much do you think they would save by making their own CPU? What is the profit margin increase on the average Mac?

I don't see any reasonable set of numbers resulting in a profit increase that makes the extreme amount of hardware and software engineering work make sense.
You're right. Perhaps Intel is more worried about Qualcomm emulating Windows than Apple ditching their CPUs.
 
Aug 11, 2008
10,451
642
126
Why does every one of these threads devolve into an Intel vs AMD debate?

We should have one thread for the Intel vs AMD stuff, then separate threads to talk about Intel stuff and AMD stuff, respectively.
Wouldn't help. Despite splitting it up, even the general VC & G forum is just as bad. It has just been quiet over there lately because there are no new products to promote/throw mud at.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
A chip architect must consider 3 important things: IPC, frequency, and power budget. But since the design must be married to a process, he has to take the characteristics of the process into account in his design as well. A low-power process may be more suitable to an IPC-heavy design, at the expense of clocks. A high-performance process would favor higher frequencies, at the expense of IPC. So what prevents an IPC-heavy design from being ported to a high-performance process? Nothing. But the power budget may need to be increased. No free lunch here. So, to compare different chips at a given frequency is to ignore the design decisions necessitated by all four factors at the time of design. This is why I said that, counterintuitive as it may sound, clock-for-clock comparisons are quite disingenuous: they totally ignore the strengths/weaknesses of each design. IMHO, therefore, IPC tests done in the vacuum of "clock for clock" comparisons need not be taken too seriously.

I agree with what you stated above. But that is not what you said before (maybe it is what you meant). IPC, frequency, and power budget are 3 separate areas that must be taken into account when evaluating a CPU design. And you are absolutely correct that clock-for-clock IPC comparisons are not the end-all be-all metric some claim them to be. But to some of us (especially developers), it is still an important metric.
 

Roger Wilco

Diamond Member
Mar 20, 2017
3,870
5,713
136
As far as Intel's actual customers (Dell, Lenovo, etc) go, they will use up their existing contracts for the 7xx0 chips and then get a contract for the 8xx0 chips.

Intel has no financial need nor desire to keep manufacturing the Kaby Lake chips. (Note: they will do so in limited quantities for legacy purposes). But, Intel only needs us to buy the Coffee Lake chips. As far as Intel is concerned it is Coffee Lake vs Coffee Lake vs Ryzen. Coffee Lake vs Kaby Lake is a problem only for the resellers who have to offload their stock of old chips.

The 7800X is the only chip in your list that is really in competition. But those should be completely different customers who need HEDT features. The 7800X will be a good buy if you need HEDT. If not, get Coffee Lake.

Ah ok. I thought Intel continued producing some of their more recent generations at a decent pace despite being superseded.