Mainstream Intel Core processors will not support AVX-512 from Skylake – only Xeon

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Indeed very old. It's actually been known since July 2013. But we seem to have an influx of those, especially if they can add some drama.

Wccftech is nothing but a clickbait site.
 
Last edited:

witeken

Diamond Member
Dec 25, 2013
3,899
193
106
Yeah, but I wonder what, if anything, will be different about SKL compared to HSW to be excited about.

Of course, when I say mainstream processors won't have AVX-512, what I really mean is that it will be disabled; it will only be enabled on the Skylake SKUs for the Xeon platform. So it looks like the new iteration of Intel’s offerings will not have any significant new instruction set, mostly all the old stuff. Yet we have word that it is going to be one hell of an architecture to look out for, because for the first time in many years, Intel is simply refusing to divulge the slightest information (even under NDA). This ‘above top secret’ attitude seems out of place, since the process was already introduced with Broadwell and Skylake is supposed to be just a ‘Haswell equivalent’ for Broadwell. Something that definitely appears not to be the case.
The level of secrecy Intel is maintaining makes it very clear that they are bringing something brand new with the Skylake uarch.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,648
4,590
75
If AVX-512 will only be on Xeon, my next question is whether there will be Skylake Xeons that fit in standard consumer mobos and support it, like the Xeon E3 series?
 

mikk

Diamond Member
May 15, 2012
4,292
2,382
136
Yes, this has been known for many, many months. Nothing new there. The only shock is that wccftech didn't know it; it just proves that they are noobish.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,648
4,590
75
Google answered my question, including [thread=2405871]in these forums[/thread] and in another wccftech article.

Xeon E3-1200 v5 will be based on the LGA 1151 socket and feature GT2 graphics with a RAM limit of 64 GB (both DDR3 and DDR4 are supported). This is twice as much as its Broadwell counterpart.
 

mikk

Diamond Member
May 15, 2012
4,292
2,382
136
High-end Xeon only AFAIK. I don't expect AVX-512 in Xeon E3 either.
 

III-V

Senior member
Oct 12, 2014
678
1
41
Yeah, but I wonder what, if anything, will be different about SKL compared to HSW to be excited about.
The level of secrecy Intel is maintaining makes it very clear that they are bringing something brand new with the Skylake uarch.
I don't think Skylake will be all that great, from what I've pieced together. If Geekbench is accurate, and cache sizes are staying the same, Skylake won't be that big of a deal for performance, at least compared to what it could be. We may have to wait until Icelake to get "modernized" cache sizes.

IBM and Apple have already moved to 64KB L1 caches. AMD had them quite some time ago, and I can't help but think that ditching them hurt their ST performance.

On the other hand, preliminary scores from Geekbench look favorable, particularly for MT. This may suggest that MorphCore actually did end up in Skylake, but I do not believe this to be the case. If MorphCore were implemented, and correctly reported, Skylake would be a 4 core, 32 thread device, not a 4 core, 8 thread device as reported by Geekbench. Again, this may just be a reporting error, but I think MT scores would be much higher. Another possibility is that the SKU on Geekbench is not a fully-enabled variant -- perhaps MorphCore is only enabled on i7s or Xeons -- just food for thought.

More likely, the boost in MT is a result of moving to a "tiled" architecture, where cores share their L2 in pairs, and of revamping inter-SoC communication (a 2D mesh instead of the ring bus, as reported by Knights Landing rumors and called "plausible" by David Kanter). Silvermont already does this, as do the Bulldozer variants and Bobcat/Jaguar/Puma.

I averaged together all of the scores reported by Geekbench, sans memory scores, and managed to get an average of 14% improvement for integer (both ST and MT), 9% for ST (INT + FP), and 22% for MT, comparing the Skylake core @ 2.6 GHz vs. Haswell @ 4.0 GHz.
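If anyone wants to check the math, this is roughly the normalization I did (the subtest numbers below are made up purely to show the mechanics; substitute the real Geekbench results):

```python
from statistics import geometric_mean  # Python 3.8+

def per_clock_gain(skylake_scores, haswell_scores, skl_ghz=2.6, hsw_ghz=4.0):
    """Average the subtest ratios after normalizing each score by clock speed."""
    ratios = [(s / skl_ghz) / (h / hsw_ghz)
              for s, h in zip(skylake_scores, haswell_scores)]
    return geometric_mean(ratios) - 1.0  # fractional per-clock improvement

# Made-up subtest scores, only to demonstrate the procedure:
skl = [2100, 1950, 2300]
hsw = [2800, 2700, 3100]
print(f"{per_clock_gain(skl, hsw):+.1%}")  # → +13.5%
```

A geometric mean is used rather than an arithmetic one since the subtests are ratios; with only three fake subtests the difference is small, but over a full Geekbench run it matters.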

There are an enormous number of caveats that apply though. We don't know what is enabled and disabled on the sample, we don't know what boost clocks the sample has, we don't know the TDP, the OS is not constant, BIOS is not constant, different motherboards, AES scores abnormally low on Skylake, Skylake will likely go through another stepping before it releases, Geekbench is not exactly applicable to desktop workloads (much better for tablet/smartphones, though)... the list goes on.

But if I had to guess, it'll be a bigger increase than Haswell was by a fair margin -- Haswell was about 10% better per-clock, Skylake will probably be about 10-15% ST, 15-30% MT.

I am still worried about 14 nm's performance at the higher end of the frequency spectrum, though. Intel's 14 nm has better subthreshold slopes, but significantly higher DIBL than their 22 nm process. They have better saturation currents at a given Ioff, but only at the 0.7 V they report; I suspect that at higher voltages, 14 nm will fall behind 22 nm, just as 22 nm fell behind 32 nm. But according to Intel's 14 nm paper, the 14 nm dielectric is more resilient than 22 nm's and shows less variation -- it seems 14 nm can be overvolted further than 22 nm can, which would be interesting if my interpretation is correct. It would need that extra voltage anyway, since it is less sensitive to voltage scaling, as the higher DIBL values suggest. I should probably ask Idontcare for his interpretation... I don't fully understand everything I'm looking at.
 
Mar 10, 2006
11,715
2,012
126
I don't think Skylake will be all that great, from what I've pieced together. If Geekbench is accurate, and cache sizes are staying the same, Skylake won't be that big of a deal for performance, at least compared to what it could be. We may have to wait until Icelake to get "modernized" cache sizes.

IBM and Apple have already moved to 64KB L1 caches. AMD had them quite some time ago, and I can't help but think that ditching them hurt their ST performance.

With caches, everything's a trade-off. A larger cache implies a higher latency cache, so it's not an automatic "win" to double the size of the cache.
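To put rough numbers on it, here is a back-of-the-envelope AMAT (average memory access time) calculation; every figure below is illustrative, not measured from any real chip:

```python
def amat(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time in cycles: hit cost plus expected miss cost."""
    return hit_cycles + miss_rate * miss_penalty_cycles

# Illustrative numbers: suppose doubling the L1 cuts the miss rate from
# 5% to 4% but costs an extra cycle of hit latency (4 -> 5 cycles),
# with a 12-cycle penalty to fall back to the L2.
small = amat(4, 0.05, 12)   # 4.6 cycles
large = amat(5, 0.04, 12)   # 5.48 cycles -- the bigger cache loses here
print(small, large)
```

With those made-up numbers the bigger-but-slower cache actually comes out behind; whether it wins in practice depends entirely on how much the miss rate really drops for the workload.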
 
Aug 11, 2008
10,451
642
126
I don't think Skylake will be all that great, from what I've pieced together. If Geekbench is accurate, and cache sizes are staying the same, Skylake won't be that big of a deal for performance, at least compared to what it could be. We may have to wait until Icelake to get "modernized" cache sizes.

IBM and Apple have already moved to 64KB L1 caches. AMD had them quite some time ago, and I can't help but think that ditching them hurt their ST performance.

On the other hand, preliminary scores from Geekbench look favorable, particularly for MT. This may suggest that MorphCore actually did end up in Skylake, but I do not believe this to be the case. If MorphCore were implemented, and correctly reported, Skylake would be a 4 core, 32 thread device, not a 4 core, 8 thread device as reported by Geekbench. Again, this may just be a reporting error, but I think MT scores would be much higher. Another possibility is that the SKU on Geekbench is not a fully-enabled variant -- perhaps MorphCore is only enabled on i7s or Xeons -- just food for thought.

More likely, the boost in MT is a result of moving to a "tiled" architecture, where cores share their L2 in pairs, and of revamping inter-SoC communication (a 2D mesh instead of the ring bus, as reported by Knights Landing rumors and called "plausible" by David Kanter). Silvermont already does this, as do the Bulldozer variants and Bobcat/Jaguar/Puma.

I averaged together all of the scores reported by Geekbench, sans memory scores, and managed to get an average of 14% improvement for integer (both ST and MT), 9% for ST (INT + FP), and 22% for MT, comparing the Skylake core @ 2.6 GHz vs. Haswell @ 4.0 GHz.

There are an enormous number of caveats that apply though. We don't know what is enabled and disabled on the sample, we don't know what boost clocks the sample has, we don't know the TDP, the OS is not constant, BIOS is not constant, different motherboards, AES scores abnormally low on Skylake, Skylake will likely go through another stepping before it releases, Geekbench is not exactly applicable to desktop workloads (much better for tablet/smartphones, though)... the list goes on.

But if I had to guess, it'll be a bigger increase than Haswell was by a fair margin -- Haswell was about 10% better per-clock, Skylake will probably be about 10-15% ST, 15-30% MT.

I am still worried about 14 nm's performance at the higher end of the frequency spectrum, though. Intel's 14 nm has better subthreshold slopes, but significantly higher DIBL than their 22 nm process. They have better saturation currents at a given Ioff, but only at the 0.7 V they report; I suspect that at higher voltages, 14 nm will fall behind 22 nm, just as 22 nm fell behind 32 nm. But according to Intel's 14 nm paper, the 14 nm dielectric is more resilient than 22 nm's and shows less variation -- it seems 14 nm can be overvolted further than 22 nm can, which would be interesting if my interpretation is correct. It would need that extra voltage anyway, since it is less sensitive to voltage scaling, as the higher DIBL values suggest. I should probably ask Idontcare for his interpretation... I don't fully understand everything I'm looking at.

Taking somewhat intermediate values of your ranges, 12% ST and 20% MT would be a very good improvement, since all the easy gains have already been made. I am talking about bottom-line performance, the combination of clockspeed and IPC. Hopefully we won't have a case like Kaveri, where IPC gains were pretty much negated by lower clockspeeds, but I fear this could be a possibility for Skylake.

I know some are going to argue that it is not needed, but *someday* Intel is going to have to break down and make a mainstream hex-core if they want to keep increasing the performance of anything but the iGPU. I think it actually depends on Zen. If it is the great equalizer AMD fans are touting, perhaps it will motivate Intel to make six cores mainstream, or at least make hyperthreading more available. If it is mediocre or just OK, and has to compete on price only like their current lineup, then Intel can continue as they are.
 

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
Well, I guess I am noobish, for I did not hear about this till now.

And this frustrates me to no end. Why? Because Intel is crippling what could be useful software tools before they become mainstream, and thus making it harder for software designers to justify the work of adding them in marginal cases.

Sure, most consumers do not use floating point heavily in their current software, but how do you expect them to use such software when you cripple it from the beginning, before such software is even made?
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Well, I guess I am noobish, for I did not hear about this till now.

And this frustrates me to no end. Why? Because Intel is crippling what could be useful software tools before they become mainstream, and thus making it harder for software designers to justify the work of adding them in marginal cases.

Sure, most consumers do not use floating point heavily in their current software, but how do you expect them to use such software when you cripple it from the beginning, before such software is even made?

There are two sides to it: crippling and TDP.

An example is Haswell when running AVX2. We already know Xeons run AVX2 at a lower clock to compensate for the higher power draw. Shifting to AVX-512 without a node shrink to compensate may not be the direction the 99% crowd, who want lower TDPs, is after. It's great that Haswell is ~80% faster than IB when running AVX2, but there is also a downside in the power draw. I would only be disappointed if Cannonlake, with its shrink, didn't introduce AVX-512 to the mainstream.

In terms of the Celerons and Pentiums, however, it's nothing but crippling.
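To put rough numbers on the clock-offset trade-off: a quick sketch with entirely hypothetical clocks and offsets (none of these figures come from a real SKU) shows that doubling the vector width can still come out well ahead even with a steeper AVX downclock:

```python
def effective_gflops(base_ghz, avx_offset_ghz, flops_per_cycle):
    """Peak vector throughput per core once the AVX clock offset is applied."""
    return (base_ghz - avx_offset_ghz) * flops_per_cycle

# Hypothetical chip: 3.0 GHz base clock.
# 256-bit AVX2 with a 0.3 GHz offset vs. a doubled-width
# 512-bit unit with a steeper 0.6 GHz offset:
avx2   = effective_gflops(3.0, 0.3, 16)  # 43.2 GFLOPS/core
avx512 = effective_gflops(3.0, 0.6, 32)  # 76.8 GFLOPS/core
print(avx512 / avx2)  # ~1.78x despite the lower clock
```

The catch, of course, is that this peak-rate arithmetic says nothing about the power needed to sustain it, which is exactly the TDP side of the argument.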
 
Last edited:

mikk

Diamond Member
May 15, 2012
4,292
2,382
136
Yes, AVX and AVX2 are widely used.

At least two games even got dedicated AVX executables that differ feature-wise from the regular ones: Grid 2 and Dirt Showdown.


Widely used? Not really. And even where it is used in consumer software, like x264 or the Grid 2 AVX executable, the gain can be very tiny: less than 5% in x264, because it is mostly non-SIMD assembly code. Same for the Grid 2 AVX build.
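That is just Amdahl's law at work: if only a small slice of the runtime is SIMD code, even doubling that slice's speed barely moves the total. A quick sketch (the 10% vectorizable fraction is a made-up figure, not a measurement of x264):

```python
def amdahl_speedup(vector_fraction, vector_speedup):
    """Overall speedup when only part of the runtime benefits (Amdahl's law)."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)

# If ~10% of the encoder's runtime is SIMD and AVX doubles that
# part's speed, the whole run only gets about 5% faster:
print(f"{amdahl_speedup(0.10, 2.0) - 1:.1%}")  # → 5.3%
```

Turning a sub-5% gain into something bigger would require vectorizing more of the hot path, not just widening the vectors.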
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Widely used? Not really. And even where it is used in consumer software, like x264 or the Grid 2 AVX executable, the gain can be very tiny: less than 5% in x264, because it is mostly non-SIMD assembly code. Same for the Grid 2 AVX build.

It doesn't matter if it only gives 0.1% or not. If it's used, it's used.

Less than 5%, for example, is still more than 0%.
 

SAAA

Senior member
May 14, 2014
541
126
116
AVX-512 being disabled on mainstream could be the reason why the Haifa team wasn't very happy with the finalization of this tock... working hard to get it done and then having it cut out of most chips just for marketing reasons sounds so silly.
At least the users who would benefit the most are going to have it in Xeons, but this really sounds like a trick to sell higher-margin chips.
 

III-V

Senior member
Oct 12, 2014
678
1
41
With caches, everything's a trade-off. A larger cache implies a higher latency cache, so it's not an automatic "win" to double the size of the cache.
Yeah, but I'd think that with modern workloads we'd benefit from larger cache sizes. Anand wrote an article on Nehalem highlighting that some on Intel's architecture team wanted a larger L3 on Nehalem, and Anand himself thought 256 KB was too small for the L2. Given that the cache sizes have stayed put over the last 6-7 years but software has not... I'd think that expanding them would be of good use at this point.

Of course, I don't have any actual data on the matter -- I could be completely wrong, and "general workloads" may not have shifted to benefit more from larger cache sizes.

You can make larger caches that have low latency, however -- IBM's POWER8 has a 64 KB L1D with a 3-cycle latency, compared to Intel's 32 KB L1D with a 4-cycle latency -- the trade-off being that the POWER8 L1D consumes an inordinate amount of power.
 
Last edited:

SAAA

Senior member
May 14, 2014
541
126
116
I bet Skylake-S WILL support AVX-512 ...

But not Skylake-K, of course. Dang, this sounds so Intel-ish that they might actually pull it off.

What else didn't they implement before -- TSX? Besides being borked, that feature was also disabled in the -K chips.

I imagine that after all the fuss when that extension was disabled, they would rather wait a bit longer and see if everything works this time than ship it in the mainstream, only for it to backfire when, say, someone's CPU burns up running a heavy load with it.
You know, twice the theoretical performance... wouldn't that make it a bit hotter than the already infernal AVX2? XD