Intel Skylake / Kaby Lake

blue11 · May 24, 2017

Bouowmx said:
Up to doubling of throughput of AVX-512 will be reflected in real-world apps, or Cinebench, when the app's time-critical sections are updated to use AVX-512 instructions.

Cinebench doesn't even use AVX, let alone AVX-512. You can see this from all the Zen articles, which showed the 1800X above the 6900K in Cinebench.

dullard · May 24, 2017

Edrick said:
If Intel starts segmenting CPUs via AVX-512, I would really be pissed off. So the 7800 series would have fewer cores, fewer pcie lanes and gimped avx-512?

Compared to what?

The 6800k vs 7800X comparison:

7800X Cores: 6, 6800k Cores: 6, Same number, not fewer
7800X PCI lanes: 28, 6800k PCI lanes: 28, Same number, not fewer
7800X AVX-512: ?, 6800k AVX-512: No, Probably the same

The 7700k vs 7800X comparison:

7800X Cores: 6, 7700k Cores: 4, More, not fewer
7800X PCI lanes: 28, 7700k PCI lanes: 16, More, not fewer
7800X AVX-512: ?, 7700k AVX-512: No, Probably the same

So on the three things you mention, the 7800 is the same as the chip it is replacing and better than the chip under it.

Timmah! · May 24, 2017

Bouowmx said:
Up to doubling of throughput of AVX-512 will be reflected in real-world apps, or Cinebench, when the app's time-critical sections are updated to use AVX-512 instructions.

Thank you. But how likely is that to happen within, say, the next year (before another Intel HEDT generation hits the market)? Is that update difficult to do? I have no clue about this stuff, so I have to wonder...

blue11 · May 24, 2017

AVX-512 instructions have been in existence since 2013 (Xeon Phi). If applications are not using them by now, they never will (because they do not need it). Basically, everyone that needs AVX-512 already knows they need it.

jpiniero · May 24, 2017

blue11 said:
AVX-512 instructions have been in existence since 2013 (Xeon Phi). If applications are not using them by now, they never will (because they do not need it). Basically, everyone that needs AVX-512 already knows they need it.

The Phi really hasn't gotten much traction beyond custom developed code for HPC. We'll have to see if Xeons getting it will get more usage out of it.

blue11 · May 24, 2017

jpiniero said:
The Phi really hasn't gotten much traction beyond custom developed code for HPC. We'll have to see if Xeons getting it will get more usage out of it.

AVX-512 reaching servers and HEDT (?) isn't exactly going to drive huge consumer demand for it, so it's still only going to be appearing in "custom developed code." The instruction set is likely there for customers running the same HPC applications, who aren't ready to deploy a Phi cluster. The use case Intel is targeting is probably either those whose volume doesn't justify a separate Phi installation, or those with legacy code. The idea is that when the user has enough HPC activity to need Phi, they can migrate their code without rewriting it.

Bouowmx · May 24, 2017

One pertinent application I know of is video encoding: x265 with AVX-512 soon™.

Difficulty could be low: compile again. Difficulty could be high: writing assembly.
Speedup could be none in the case of just compiling again, if the compiler can't convert the code to properly use the higher width.

blue11 · May 24, 2017

Bouowmx said:
One pertinent application I know of is video encoding: x265 with AVX-512 soon™.

Difficulty could be low: compile again. Difficulty could be high: writing assembly.
Speedup could be none in the case of just compiling again, if the compiler can't convert the code to properly use the higher width.

For video codecs, the available vectorization is determined in large part by codec details. A 512-bit register is 32-wide in 16-bit elements and 64-wide in 8-bit elements, whereas the majority of the codec block activity is going to be 16x16 or smaller. HEVC presents a few more opportunities than AVC, since it has 32x32 transforms and 64x64 motion compensation, but those block sizes are going to be used infrequently. Even AVX only yields 20-30% performance gain in x265, so I would expect AVX-512 to make even less difference, probably under 10%.

In general, it's hard to find 64 independent things to do, which is what makes wider vector instructions difficult to apply to common workloads. It's particularly bad for video and image processing, since they operate on the smaller 8 or 16-bit word sizes, instead of the more manageable 64-bit words used in HPC.
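
The lane counts above follow directly from dividing the register width by the element width. A minimal sketch of that arithmetic, assuming a 16-pixel-wide block as the typical case; the function names are made up for illustration and are not from x265:

```python
# Lanes per vector register = register width (bits) / element width (bits).
REGISTER_BITS = {"SSE (128-bit)": 128, "AVX2 (256-bit)": 256, "AVX-512": 512}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of independent elements one register holds."""
    return register_bits // element_bits


def rows_per_register(register_bits: int, element_bits: int, block_width: int) -> float:
    """How many rows of a block_width-wide pixel block fill one register.

    A value above 1 means a single row cannot fill the register, so the
    kernel has to pack several rows together to use the full width.
    """
    return lanes(register_bits, element_bits) / block_width


if __name__ == "__main__":
    for name, bits in REGISTER_BITS.items():
        for element_bits in (8, 16):
            n = lanes(bits, element_bits)
            rows = rows_per_register(bits, element_bits, block_width=16)
            print(f"{name:15s} {element_bits:2d}-bit elements: {n:2d} lanes, "
                  f"{rows:g} rows of a 16-wide block per register")
```

With 8-bit pixels, one 512-bit register spans four rows of a 16-wide block, which is why the wider registers are hard to fill for small blocks.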

Edrick · May 24, 2017

dullard said:
Compared to what?
So on the three things you mention, the 7800 is the same as the chip it is replacing and better than the chip under it.

Sorry, comparing the 7800 series to the 7900 series

Arachnotronic · May 24, 2017

jpiniero said:
Icelake would make sense since they could just fuse in a future revision to Alpine Ridge.

'Course this is going to make it much easier on Apple to use non-Intel processors.

Not really, they could have purchased discrete Thunderbolt controllers if they wanted.

jpiniero · May 25, 2017

Arachnotronic said:
Not really, they could have purchased discrete Thunderbolt controllers if they wanted.

I can't imagine (well at least until now) that Intel would have sold controllers for a non-Intel machine though.

mikk · May 25, 2017

Kabylake R i7-8550U: http://ranker.sisoftware.net/show_d...a19cae88e0dde8ceb68bba9cf99ca191b7c4f9c1&l=en

There is a small name change for the GPU: UHD Graphics 620

http://ranker.sisoftware.net/show_d...ccf1c0e68eb386a0d8e5d4f297f2cfffd9aa97af&l=en

Jan Olšan · May 25, 2017

Bouowmx said:
One pertinent application I know of is video encoding: x265 with AVX-512 soon™.

Difficulty could be low: compile again. Difficulty could be high: writing assembly.
Speedup could be none in the case of just compiling again, if the compiler can't convert the code to properly use the higher width.

Video codecs are the second case: they need hand-written assembly, because autovectorization doesn't work for them and would likely produce bad code anyway (meaning that the existing hand-written AVX2 code would likely beat it, particularly if AVX-512 reduces the core clock).

mikk · May 25, 2017

blue11 said:
For video codecs, the available vectorization is determined in large part by codec details. A 512-bit register is 32-wide in 16-bit elements and 64-wide in 8-bit elements, whereas the majority of the codec block activity is going to be 16x16 or smaller. HEVC presents a few more opportunities than AVC, since it has 32x32 transforms and 64x64 motion compensation, but those block sizes are going to be used infrequently. Even AVX only yields 20-30% performance gain in x265, so I would expect AVX-512 to make even less difference, probably under 10%.

AVX gains can be very tiny even with x265, but this might depend on the video and settings. Not long ago I compared AVX2 and non-AVX x265 binaries, and the difference on my i7-7700k was no more than 2-3%. Imho x265 is not a showcase for AVX performance.

itsmydamnation · May 25, 2017

I hear LINPACK is bringing out a new game

Threadcrapping and trolling are not allowed.
Markfw
Anandtech Moderator

blue11 · May 25, 2017

mikk said:
AVX gains can be very tiny even with x265, but this might depend on the video and settings. Not long ago I compared AVX2 and non-AVX x265 binaries, and the difference on my i7-7700k was no more than 2-3%. Imho x265 is not a showcase for AVX performance.

I downloaded today's build from builds.x265.eu and ran some tests on a Haswell processor. All encodes are at 1080p.

x265-10bit --preset veryslow:
x265 [info]: HEVC encoder version 2.4+26-355cf3582263
x265 [info]: build info [Windows][GCC 5.3.1][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
encoded 300 frames in 291.83s (1.03 fps), 1924.72 kb/s, Avg QP:30.30

x265-10bit --preset veryslow --asm SSE4.2,FMA3,FMA4,BMI:
x265 [info]: HEVC encoder version 2.4+26-355cf3582263
x265 [info]: build info [Windows][GCC 5.3.1][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA4 FMA3 BMI2
encoded 300 frames in 406.18s (0.74 fps), 1924.72 kb/s, Avg QP:30.30

AVX, i.e. 256-bit vector unit, increased the performance by 40%, which is more than reviewers showed (20%), since they only ever use the fast presets that do very little analysis. These results are consistent with findings on the Doom9 forum. Perhaps the "AVX2 and non AVX" binaries you downloaded were referring to compiler optimizations, which would not be expected to affect the performance of assembly code.
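
For reference, the gain can be recomputed from the two encode times in the logs above. This is only a sketch of the arithmetic; the constant and function names are illustrative, and the figures are the ones printed by the two runs:

```python
# Encode times for the same 300 frames, taken from the two x265 logs.
AVX2_ENABLED_SECONDS = 291.83   # default run, AVX2 kernels in use
AVX2_DISABLED_SECONDS = 406.18  # run with --asm restricting the kernels


def speedup_percent(baseline_seconds: float, optimized_seconds: float) -> float:
    """Throughput gain of the optimized run over the baseline, in percent."""
    return (baseline_seconds / optimized_seconds - 1.0) * 100.0


def time_saved_percent(baseline_seconds: float, optimized_seconds: float) -> float:
    """Reduction in wall-clock time relative to the baseline, in percent."""
    return (1.0 - optimized_seconds / baseline_seconds) * 100.0


if __name__ == "__main__":
    gain = speedup_percent(AVX2_DISABLED_SECONDS, AVX2_ENABLED_SECONDS)
    saved = time_saved_percent(AVX2_DISABLED_SECONDS, AVX2_ENABLED_SECONDS)
    print(f"throughput gain: {gain:.1f}%")
    print(f"encode time saved: {saved:.1f}%")
```

The throughput gain comes out just over 39%, i.e. roughly 40%. Note that the same pair of timings corresponds to only about 28% less encode time, so it matters which of the two figures a review is quoting.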

Valantar · May 25, 2017

Am I the only one a little put off by the 175W TDP in the Sandra listing for the 7900X? I mean, not that it wouldn't require a lot more power to push .5GHz more than the previous 10c20t CPU on the same process, but 175W? That's approaching the limit for what's possible to cool at all without water cooling. And then there's the 12c24t one ...

wildhorse2k · May 25, 2017

Valantar said:
Am I the only one a little put off by the 175W TDP in the Sandra listing for the 7900X? I mean, not that it wouldn't require a lot more power to push .5GHz more than the previous 10c20t CPU on the same process, but 175W? That's approaching the limit for what's possible to cool at all without water cooling. And then there's the 12c24t one ...

I, on the contrary, prefer high-TDP CPUs from the manufacturer, because I will use water cooling anyway and a high TDP means I don't really need to OC it. I get the same performance as if Intel shipped it as 140W and I OCed to 175W. It also demonstrates Intel is confident these CPUs can run at that TDP for a long time without negative effects.

blue11 · May 25, 2017

Valantar said:
Am I the only one a little put off by the 175W TDP in the Sandra listing for the 7900X? I mean, not that it wouldn't require a lot more power to push .5GHz more than the previous 10c20t CPU on the same process, but 175W? That's approaching the limit for what's possible to cool at all without water cooling. And then there's the 12c24t one ...

SiSoft power numbers are not measurements, so you can't infer anything about how much power will actually be drawn from them. If the SiSoft figures reflect the final SKU, the 7900X will actually be running more efficiently than the 6950X, since the frequency will be 33% higher (4.0 vs 3.0) for only 25% more power (175 vs 140). You could just underclock the CPU back to 3.0 GHz if your cooler can't handle 175 W, but any tower cooler should be good for up to 200 W.
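
The percentages can be checked with a few lines. This is a rough sketch that treats clock frequency as a stand-in for performance and the listed TDP as a stand-in for power, both of which are assumptions rather than measurements:

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


# Figures from the SiSoft listing as quoted in this thread (not measured).
FREQ_7900X_GHZ, FREQ_6950X_GHZ = 4.0, 3.0
TDP_7900X_W, TDP_6950X_W = 175.0, 140.0

freq_gain = percent_increase(FREQ_7900X_GHZ, FREQ_6950X_GHZ)
power_gain = percent_increase(TDP_7900X_W, TDP_6950X_W)

# Frequency per watt of the 7900X relative to the 6950X (1.0 = equal).
freq_per_watt_ratio = (FREQ_7900X_GHZ / TDP_7900X_W) / (FREQ_6950X_GHZ / TDP_6950X_W)

if __name__ == "__main__":
    print(f"frequency: +{freq_gain:.1f}%")
    print(f"power:     +{power_gain:.1f}%")
    print(f"frequency per watt: x{freq_per_watt_ratio:.3f}")
```

Under those assumptions the 7900X would deliver about 7% more frequency per watt than the 6950X.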

LTC8K6 · May 25, 2017

Valantar said:
Am I the only one a little put off by the 175W TDP in the Sandra listing for the 7900X? I mean, not that it wouldn't require a lot more power to push .5GHz more than the previous 10c20t CPU on the same process, but 175W? That's approaching the limit for what's possible to cool at all without water cooling. And then there's the 12c24t one ...

What's surprising is that all of the leaks prior to the Sisoft listing have said that KL-X is 112W and SL-X is 140W. Never a mention of 175W.

scannall · May 25, 2017

LTC8K6 said:
What's surprising is that all of the leaks prior to the Sisoft listing have said that KL-X is 112W and SL-X is 140W. Never a mention of 175W.

At this point you really can't tell the fakes from the real. And it really is rather pointless until the product is actually shipping and available to everyone. Some wild stuff flying around these threads.

LTC8K6 · May 25, 2017

https://twitter.com/WikiChip/status/863874789622022145

Shows 160W for the 12C chip.

Jan Olšan · May 25, 2017

blue11 said:
I downloaded today's build from builds.x265.eu and ran some tests on a Haswell processor. All encodes are at 1080p. (...)
(...)
encoded 300 frames in 291.83s (1.03 fps), 1924.72 kb/s, Avg QP:30.30
(...)
encoded 300 frames in 406.18s (0.74 fps), 1924.72 kb/s, Avg QP:30.30
(...)
AVX, i.e. 256-bit vector unit, increased the performance by 50%, which is more than reviewers showed (20%), since they only ever use the fast presets that do very little analysis.(...)

That's 39% increase, not 50 %. Still a lot though.

crashtech · May 25, 2017

Jan Olšan said:
That's 39% increase, not 50 %. Still a lot though.

I noticed that too, but I think the delta from Non-AVX to AVX2 is being referenced, although the non-AVX result is not shown.

blue11 · May 25, 2017

Jan Olšan said:
That's 39% increase, not 50 %. Still a lot though.

I corrected the statement. In other news, AVX-512 kernels were added to x264 a few days ago, but I wouldn't expect any miracles.
