Intel Skylake / Kaby Lake


blue11

Member
May 11, 2017
151
77
51
Up to doubling of throughput of AVX-512 will be reflected in real-world apps, or Cinebench, when the app's time-critical sections are updated to use AVX-512 instructions.
Cinebench doesn't even use AVX, let alone AVX-512. You can see this from all the Zen articles, which showed the 1800X above the 6900K in Cinebench.
 

dullard

Elite Member
May 21, 2001
25,054
3,408
126
If Intel starts segmenting CPUs via AVX-512, I would really be pissed off. So the 7800 series would have fewer cores, fewer pcie lanes and gimped avx-512?
Compared to what?

The 6800K vs 7800X comparison:
  • 7800X cores: 6, 6800K cores: 6. Same number, not fewer.
  • 7800X PCIe lanes: 28, 6800K PCIe lanes: 28. Same number, not fewer.
  • 7800X AVX-512: ?, 6800K AVX-512: no. Probably the same.
The 7700K vs 7800X comparison:
  • 7800X cores: 6, 7700K cores: 4. More, not fewer.
  • 7800X PCIe lanes: 28, 7700K PCIe lanes: 16. More, not fewer.
  • 7800X AVX-512: ?, 7700K AVX-512: no. Probably the same.
So on the three things you mention, the 7800X is the same as the chip it is replacing and the same as or better than the chip below it.
 
  • Like
Reactions: Drazick

Timmah!

Golden Member
Jul 24, 2010
1,417
630
136
Up to doubling of throughput of AVX-512 will be reflected in real-world apps, or Cinebench, when the app's time-critical sections are updated to use AVX-512 instructions.

Thank you. But how likely is that to happen within, say, the next year (before another Intel HEDT generation hits the market)? Is that update difficult to do? I have no clue about this stuff, so I have to wonder...
 

blue11

Member
May 11, 2017
151
77
51
AVX-512 instructions have been in existence since 2013 (Xeon Phi). If applications are not using them by now, they never will (because they do not need them). Basically, everyone that needs AVX-512 already knows they need it.
 
Last edited:

jpiniero

Lifer
Oct 1, 2010
14,583
5,204
136
AVX-512 instructions have been in existence since 2013 (Xeon Phi). If applications are not using them by now, they never will (because they do not need them). Basically, everyone that needs AVX-512 already knows they need it.

The Phi really hasn't gotten much traction beyond custom developed code for HPC. We'll have to see if Xeons getting it will get more usage out of it.
 

blue11

Member
May 11, 2017
151
77
51
The Phi really hasn't gotten much traction beyond custom developed code for HPC. We'll have to see if Xeons getting it will get more usage out of it.
AVX-512 reaching servers and HEDT (?) isn't exactly going to drive huge consumer demand for it, so it's still only going to be appearing in "custom developed code." The instruction set is likely there for customers running the same HPC applications, who aren't ready to deploy a Phi cluster. The use case Intel is targeting is probably either those whose volume doesn't justify a separate Phi installation, or those with legacy code. The idea is that when the user has enough HPC activity to need Phi, they can migrate their code without rewriting it.
 
Last edited:

Bouowmx

Golden Member
Nov 13, 2016
1,138
550
146
One pertinent application I know of is video encoding: x265 with AVX-512 soon™.

Difficulty could be low: compile again. Difficulty could be high: writing assembly.
Speedup could be none in the case of just compiling again, if the code can't be converted to properly use the higher width.
 

blue11

Member
May 11, 2017
151
77
51
One pertinent application I know of is video encoding: x265 with AVX-512 soon™.

Difficulty could be low: compile again. Difficulty could be high: writing assembly.
Speedup could be none in the case of just compiling again, if the code can't be converted to properly use the higher width.
For video codecs, the available vectorization is determined in large part by codec details. A 512-bit register is 32-wide in 16-bit elements and 64-wide in 8-bit elements, whereas the majority of the codec block activity is going to be 16x16 or smaller. HEVC presents a few more opportunities than AVC, since it has 32x32 transforms and 64x64 motion compensation, but those block sizes are going to be used infrequently. Even AVX only yields 20-30% performance gain in x265, so I would expect AVX-512 to make even less difference, probably under 10%.

In general, it's hard to find 64 independent things to do, which is what makes wider vector instructions difficult to apply to common workloads. It's particularly bad for video and image processing, since they operate on the smaller 8 or 16-bit word sizes, instead of the more manageable 64-bit words used in HPC.
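To make the lane arithmetic concrete, here is a small sketch in Python. The helper names `lanes` and `row_fill` are made up for this illustration, and treating a single block row as the unit of work is a simplification (hand-written assembly can pack several rows into one register).

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


def row_fill(block_width: int, element_bits: int, register_bits: int) -> float:
    """Fraction of one register occupied by a single row of a block."""
    return (block_width * element_bits) / register_bits


# Register widths: SSE = 128, AVX2 = 256, AVX-512 = 512 bits.
for reg in (128, 256, 512):
    row = ", ".join(f"{e}-bit: {lanes(reg, e)}" for e in (8, 16, 32, 64))
    print(f"{reg}-bit register -> {row}")

# One row of a 16-wide block with 16-bit samples versus a 512-bit register.
print(f"16x16 block row fills {row_fill(16, 16, 512):.0%} of a 512-bit register")
```

A 512-bit register holds only 8 lanes of 64-bit data but 64 lanes of 8-bit data, which is why small-element workloads need so many independent items per instruction.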
 
Last edited:
  • Like
Reactions: krumme and tamz_msc
Mar 10, 2006
11,715
2,012
126
Icelake would make sense since they could just fuse in a future revision to Alpine Ridge.

'Course this is going to make it much easier on Apple to use non-Intel processors.

Not really, they could have purchased discrete Thunderbolt controllers if they wanted.
 

Jan Olšan

Senior member
Jan 12, 2017
278
297
136
One pertinent application I know of is video encoding: x265 with AVX-512 soon™.

Difficulty could be low: compile again. Difficulty could be high: writing assembly.
Speedup could be none in the case of just compiling again, if the code can't be converted to properly use the higher width.

Video codecs are the second case: they need hand-written assembly, because autovectorization doesn't work for them and would likely produce bad code anyway (meaning the hand-written AVX2 code would likely beat it, particularly if AVX-512 reduces the clock of the core).
 
  • Like
Reactions: Drazick

mikk

Diamond Member
May 15, 2012
4,133
2,136
136
For video codecs, the available vectorization is determined in large part by codec details. A 512-bit register is 32-wide in 16-bit elements and 64-wide in 8-bit elements, whereas the majority of the codec block activity is going to be 16x16 or smaller. HEVC presents a few more opportunities than AVC, since it has 32x32 transforms and 64x64 motion compensation, but those block sizes are going to be used infrequently. Even AVX only yields 20-30% performance gain in x265, so I would expect AVX-512 to make even less difference, probably under 10%.

AVX gains can be very tiny even with x265, but this might depend on the video and settings. Not long ago I compared AVX2 and non-AVX x265 binaries, and the difference on my i7-7700K was no more than 2-3%. IMHO x265 is not a showcase for AVX performance.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,762
3,131
136
I hear LINPACK is bringing out a new game :D

Threadcrapping and trolling are not allowed.
Markfw
Anandtech Moderator
 
Last edited by a moderator:
  • Like
Reactions: tential

blue11

Member
May 11, 2017
151
77
51
AVX gains can be very tiny even with x265, but this might depend on the video and settings. Not long ago I compared AVX2 and non-AVX x265 binaries, and the difference on my i7-7700K was no more than 2-3%. IMHO x265 is not a showcase for AVX performance.
I downloaded today's build from builds.x265.eu and ran some tests on a Haswell processor. All encodes are at 1080p.

x265-10bit --preset veryslow:
x265 [info]: HEVC encoder version 2.4+26-355cf3582263
x265 [info]: build info [Windows][GCC 5.3.1][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
encoded 300 frames in 291.83s (1.03 fps), 1924.72 kb/s, Avg QP:30.30

x265-10bit --preset veryslow --asm SSE4.2,FMA3,FMA4,BMI:
x265 [info]: HEVC encoder version 2.4+26-355cf3582263
x265 [info]: build info [Windows][GCC 5.3.1][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA4 FMA3 BMI2
encoded 300 frames in 406.18s (0.74 fps), 1924.72 kb/s, Avg QP:30.30

AVX, i.e. the 256-bit vector unit, increased the performance by about 39% (291.83 s vs 406.18 s for the same 300 frames), which is more than reviewers showed (20%), since they only ever use the fast presets that do very little analysis. These results are consistent with findings on the Doom9 forum. Perhaps the "AVX2 and non AVX" labels on the binaries you downloaded referred to compiler optimizations, which would not be expected to affect the performance of the assembly code.
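For anyone who wants to check the percentage, the two encode times above can be turned into a speedup figure with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for illustration. Note that "percent faster" (throughput gain) and "percent less time" are different numbers for the same pair of runs.

```python
def throughput_gain_percent(slow_seconds: float, fast_seconds: float) -> float:
    """Percent increase in frames per second for the same amount of work."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


def time_saved_percent(slow_seconds: float, fast_seconds: float) -> float:
    """Percent reduction in wall-clock time for the same amount of work."""
    return (1.0 - fast_seconds / slow_seconds) * 100.0


# Times for 300 frames, taken from the two x265 runs quoted above.
restricted_asm = 406.18  # run with the --asm restriction
default_asm = 291.83     # run with all detected CPU capabilities

print(f"throughput gain: {throughput_gain_percent(restricted_asm, default_asm):.1f}%")
print(f"time saved:      {time_saved_percent(restricted_asm, default_asm):.1f}%")
```

The throughput gain comes out at roughly 39%, while the encode takes roughly 28% less time.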
 
Last edited:

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
Am I the only one a little put off by the 175W TDP in the Sandra listing for the 7900X? I mean, not that it wouldn't require a lot more power to push .5GHz more than the previous 10c20t CPU on the same process, but 175W? That's approaching the limit for what's possible to cool at all without water cooling. And then there's the 12c24t one ...
 

wildhorse2k

Member
May 12, 2017
180
83
71
Am I the only one a little put off by the 175W TDP in the Sandra listing for the 7900X? I mean, not that it wouldn't require a lot more power to push .5GHz more than the previous 10c20t CPU on the same process, but 175W? That's approaching the limit for what's possible to cool at all without water cooling. And then there's the 12c24t one ...

I, on the contrary, prefer high-TDP CPUs from the manufacturer, because I will use water cooling anyway and a high TDP means I don't really need to OC it. I get the same performance as if Intel had shipped it at 140W and I had OCed it to 175W. It also demonstrates Intel is confident these CPUs can run at that TDP for a long time without negative effects.
 

blue11

Member
May 11, 2017
151
77
51
Am I the only one a little put off by the 175W TDP in the Sandra listing for the 7900X? I mean, not that it wouldn't require a lot more power to push .5GHz more than the previous 10c20t CPU on the same process, but 175W? That's approaching the limit for what's possible to cool at all without water cooling. And then there's the 12c24t one ...
SiSoft power numbers are not measurements, so you can't infer anything about how much power will actually be drawn from them. If the SiSoft figures reflect the final SKU, the 7900X will actually be running more efficiently than the 6950X, since the frequency will be 33% higher (4.0 vs 3.0) for only 25% more power (175 vs 140). You could just underclock the CPU back to 3.0 GHz if your cooler can't handle 175 W, but any tower cooler should be good for up to 200 W.
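The efficiency comparison is simple ratio arithmetic, sketched below in Python. It rests on two loud assumptions that are not measurements: that performance scales linearly with clock speed, and that the listed TDP equals actual power draw. The function name `percent_change` is made up for illustration.

```python
def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


# Figures from the post: 6950X vs the SiSoft listing for the 7900X.
freq_old_ghz, freq_new_ghz = 3.0, 4.0
tdp_old_w, tdp_new_w = 140.0, 175.0

freq_gain = percent_change(freq_new_ghz, freq_old_ghz)
power_gain = percent_change(tdp_new_w, tdp_old_w)

# Assumption: clock speed stands in for performance, TDP for power.
ghz_per_watt_ratio = (freq_new_ghz / tdp_new_w) / (freq_old_ghz / tdp_old_w)

print(f"frequency: +{freq_gain:.1f}%")
print(f"power:     +{power_gain:.1f}%")
print(f"GHz per watt, new vs old: {ghz_per_watt_ratio:.3f}x")
```

Under those assumptions the listed part delivers about 1.07x the clock per watt, i.e. a modest efficiency gain rather than a loss.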
 
Last edited:

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
Am I the only one a little put off by the 175W TDP in the Sandra listing for the 7900X? I mean, not that it wouldn't require a lot more power to push .5GHz more than the previous 10c20t CPU on the same process, but 175W? That's approaching the limit for what's possible to cool at all without water cooling. And then there's the 12c24t one ...
What's surprising is that all of the leaks prior to the Sisoft listing have said that KL-X is 112W and SL-X is 140W. Never a mention of 175W.
 
  • Like
Reactions: Drazick

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
What's surprising is that all of the leaks prior to the Sisoft listing have said that KL-X is 112W and SL-X is 140W. Never a mention of 175W.
At this point you really can't tell the fakes from the real. And it really is rather pointless until the product is actually shipping and available to everyone. Some wild stuff flying around these threads.
 
  • Like
Reactions: Ajay

Jan Olšan

Senior member
Jan 12, 2017
278
297
136
I downloaded today's build from builds.x265.eu and ran some tests on a Haswell processor. All encodes are at 1080p. (...)
(...)
encoded 300 frames in 291.83s (1.03 fps), 1924.72 kb/s, Avg QP:30.30
(...)
encoded 300 frames in 406.18s (0.74 fps), 1924.72 kb/s, Avg QP:30.30
(...)
AVX, i.e. 256-bit vector unit, increased the performance by 50%, which is more than reviewers showed (20%), since they only ever use the fast presets that do very little analysis.(...)

That's a 39% increase, not 50%. Still a lot, though.
 
  • Like
Reactions: pcp7 and Drazick