Intel Skylake / Kaby Lake

shortylickens · Apr 30, 2017

OK, so my 7700k is doing what I want, which of course is improving framerates in some games that previously lagged a bit.
Fallout 4 is much smoother in the downtown area, as is Subnautica anywhere on the map, and a few others that were too much for the i3. Desktop stuff like large photo work has not improved noticeably, but then I never benchmarked those kinds of things.
It's running very hot even with a Cooler Master Hyper TX3, and the fans are always loud. I may need to look at water-cooled solutions.

Markfw · May 1, 2017

shortylickens said:
OK, so my 7700k is doing what I want, which of course is improving framerates in some games that previously lagged a bit.
Fallout 4 is much smoother in the downtown area, as is Subnautica anywhere on the map, and a few others that were too much for the i3. Desktop stuff like large photo work has not improved noticeably, but then I never benchmarked those kinds of things.
It's running very hot even with a Cooler Master Hyper TX3, and the fans are always loud. I may need to look at water-cooled solutions.

Why are you posting in this thread? It's about Skylake / Kaby Lake. I am confused.

imported_ats · May 1, 2017

xdfg said:
Cost determines the lower bound of price, and based on superior MCM scaling, we can already see that floor is extremely low. On the other hand, how much room does Intel have to play with prices on their ridiculous monolithic dies?

Pretty far, realistically you are looking at a production cost in the range of ~$100 just based on napkin math. Honestly, the production cost is likely significantly less than that now with the fabs pretty much fully amortized.

You can go only a few pages back to see the evidence from Markfw, or just one thread over for the Stilts power efficiency numbers. Besides that, you seem to not understand arithmetic. If AMD quadruples everything with Naples, that would be 4x the power, which is an upper bound, so we're not even accounting for any optimizations that would make the actual power less than the 4x figure.

Markfw provided absolutely ZERO evidence that is relevant to our discussion. And no, 4x Naples in an MCM doesn't have an upper bound of 4x power. Significant portions of die and I/O aren't even used in Naples. For I/O, Naples really isn't using even 1/4th of the I/O required for a 4x MCM.

imported_ats · May 1, 2017

IEC said:
Re: the pricing and the TDP, both are clearly mentioned in the TweakTown article. And the poster above transposed the last two digits on the TDP; it is supposedly 205W not 250W.

Unfortunately we don't know what that rumored 205W figure represents. For example, Intel already sells custom spec ~200W E5v4's to some vendors to use in water cooled systems (allowing them to in effect run full time at max turbo). Nor do we know what all is included in that processor (rumors of both HMC and integrated dual 100g OPA abound with dual 100g chips taking 20-30W of power alone).

imported_ats · May 1, 2017

DrMrLordX said:
Also, I do not think we'll have to worry that much about Naples' performance due to fabric speeds, but I could be wrong. Again, look to the application itself and ask: how often are you going to come under the total thread count of the CPU? How often are you going to get inter-thread communication? Is that inter-thread communication already a problem on multi-socket systems? In other words, I would expect that the CCX strategy will work better with multisocket-friendly applications than it will . . . games, for example.

You'll get some form of inter-thread communication on quite literally every memory access. And yes, inter-thread communication has been an issue going back to essentially the beginning of multi-context processors. There are two types of inter-thread communication: implicit and explicit. Explicit inter-thread communication is caused by the program actively exchanging data between threads, while implicit inter-thread communication is the burden put on all memory activity in order to maintain a coherent view of memory (aka cache coherency and MOM - Memory Order Model).

On-chip interconnection networks are rarely a bottleneck because you can throw a LOT of wires at the problem, but unfortunately once you have to go off chip, you are at the mercy of bumps and pads, and the number of wires and the performance of those wires quickly becomes an issue. It quickly becomes impossible to put in enough interconnect for worst cases, and in addition you have extra latency getting on and off chip as well. So instead you end up designing around the average case and the practical worst case.

tamz_msc · May 1, 2017

imported_ats said:
Unfortunately we don't know what that rumored 205W figure represents. For example, Intel already sells custom spec ~200W E5v4's to some vendors to use in water cooled systems (allowing them to in effect run full time at max turbo). Nor do we know what all is included in that processor (rumors of both HMC and integrated dual 100g OPA abound with dual 100g chips taking 20-30W of power alone).

The 205W TDP is more or less confirmed from the leaked OEM board featuring the C621 chipset.

LTC8K6 · May 1, 2017

Markfw said:
Why are you posting in this thread? It's about Skylake / Kaby Lake. I am confused.

Last time I checked, the 7700K was a Kaby Lake chip.

itsmydamnation · May 1, 2017

this will be my last OT post in this thread,

https://tu-dresden.de/zih/forschung...benchit/2014_MSPC_authors_version.pdf?lang=en

Code:

location latency in ns
                          L2      L3      RAM
local                     9.1
2nd core                  19.5    27.3    82.3
on-chip                   88.6
2nd die in MCM            129     116     133
other socket, 1 hop       136     123     146
other socket, 2 hops      178     164     187
including probe (max)     198     185     -

Even on Bulldozer, the difference between an on-chip, non-local L2 access and an MCM L2 access is only 33%. But then go and look at the different SoC uarchs to see just how much better Zeppelin's is than Bulldozer's.

Bulldozer used the same HT interfaces on MCM and inter socket. Bulldozer had the SRQ which was one giant bottleneck even accessing the directory cache. To get to the other chip in the package you had to go SRQ-> crossbar->HT->crossbar->SRQ->directory/L3/L2.

Zen is much better: cache directories are attached directly to the UMCs, the UMCs/GMIs/IO hub/Core Complexes are attached to the fabric, and the L3 holds tags for the L2 within a CCX, so it is a much more scalable solution.

Then just look at the size and number of PHYs for the GMI interfaces. Sure, they are going to cost you some power to go over the interposer, but nothing like PCIe or memory accesses. Fudzilla's leaks (everything else in them is correct) had each of those GMI interfaces at 25GB/s, which is twice the bandwidth of BD's HT (12.8), so we are looking at around 200 GB/s of GMI bandwidth for each SoC. What we don't know is if it is full mesh or ring (I think ring is more likely, with 4 controls and 8 PHYs with only 4 stops).
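For anyone who wants to check the napkin math, here is a rough sketch of the bandwidth arithmetic. The 25 GB/s and 12.8 GB/s figures are the ones quoted above; treating each of the 8 PHYs as one full-rate link is an assumption made purely for illustration, not something the leak states.

```python
# Figures quoted in the post (GB/s per link).
GMI_LINK_GB_S = 25.0   # leaked per-interface GMI bandwidth
HT_LINK_GB_S = 12.8    # Bulldozer HyperTransport, per the post

# ASSUMPTION: one full-rate link per PHY, 8 PHYs per SoC.
ASSUMED_LINKS = 8

# How much faster one GMI link is than one HT link.
gmi_vs_ht = GMI_LINK_GB_S / HT_LINK_GB_S

# Aggregate GMI bandwidth per SoC under the assumption above.
total_gmi_gb_s = GMI_LINK_GB_S * ASSUMED_LINKS

print(f"GMI vs HT per link: {gmi_vs_ht:.2f}x")
print(f"Aggregate GMI per SoC: {total_gmi_gb_s:.0f} GB/s")
```

Under that assumption the total lands on the "around 200 GB/s" figure, and the per-link ratio is a bit under 2x.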

So then it comes down to latency, and we don't know what it is, but I'll bet you it's no more than the 40ns of BD; I reckon it will be around the 20ns mark per hop, just like inter-CCX on the same chip. Just look at BD: the difference for inter-proc vs inter-MCM is only 13ns, so distance isn't a big contributor, and we know Zeppelin has been designed from the ground up to have a distributed memory hierarchy.
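The 40ns and 13ns figures fall straight out of the Bulldozer table earlier in this post. A small sketch of the subtraction, with the dict layout and names made up for illustration and only the rows needed here copied in:

```python
# Bulldozer latencies in ns, copied from the table in this post.
# The dict structure and key names are illustrative only.
latency_ns = {
    "on-chip":             {"L2": 88.6},
    "2nd die in MCM":      {"L2": 129.0, "L3": 116.0, "RAM": 133.0},
    "other socket, 1 hop": {"L2": 136.0, "L3": 123.0, "RAM": 146.0},
}

def delta(near, far, level):
    """Extra latency in ns of the `far` location over the `near` one."""
    return latency_ns[far][level] - latency_ns[near][level]

# Cost of leaving the die but staying in the package (L2 access).
mcm_penalty = delta("on-chip", "2nd die in MCM", "L2")

# Extra cost of going to the other socket instead of the other die.
socket_vs_mcm = {
    level: delta("2nd die in MCM", "other socket, 1 hop", level)
    for level in ("L2", "L3", "RAM")
}

print(f"on-chip -> MCM (L2): {mcm_penalty:.1f} ns")
for level, extra in socket_vs_mcm.items():
    print(f"MCM -> other socket ({level}): {extra:.0f} ns")
```

Going by those rows, the 13ns gap is the RAM column; for L2 and L3 the gap between the other die and the other socket is only 7ns.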

Last but not least, I'll leave you with this tidbit from an OEM when we were talking about 32-core Naples:

https://forum.beyond3d.com/threads/amd-ryzen-cpu-architecture-for-2017.56187/

Terms and conditions apply here. I can't say the same about DP, but let's say it looks really good in a single CPU configuration.

We know why DP doesn't look as good: GMIx needs to use PCIe lanes, so in 2P you lose 1/2 your lanes and have nowhere near the bandwidth of the on-package fabric.

imported_ats · May 1, 2017

itsmydamnation said:
this will be my last OT post in this thread,

https://tu-dresden.de/zih/forschung...benchit/2014_MSPC_authors_version.pdf?lang=en

Code:

location latency in ns
                          L2      L3      RAM
local                     9.1
2nd core                  19.5    27.3    82.3
on-chip                   88.6
2nd die in MCM            129     116     133
other socket, 1 hop       136     123     146
other socket, 2 hops      178     164     187
including probe (max)     198     185     -

Even on Bulldozer, the difference between an on-chip, non-local L2 access and an MCM L2 access is only 33%. But then go and look at the different SoC uarchs to see just how much better Zeppelin's is than Bulldozer's.

What numbers are you comparing there to get 33%? From my reading there it is 45% (129 vs 88.6). And it is only that low because the uncore is so incredibly bad. Intel's L2 latencies are like 1/2 or less depending on chip.

Then just look at the size and number of PHYs for the GMI interfaces. Sure, they are going to cost you some power to go over the interposer, but nothing like PCIe or memory accesses. Fudzilla's leaks (everything else in them is correct) had each of those GMI interfaces at 25GB/s, which is twice the bandwidth of BD's HT (12.8), so we are looking at around 200 GB/s of GMI bandwidth for each SoC. What we don't know is if it is full mesh or ring (I think ring is more likely, with 4 controls and 8 PHYs with only 4 stops).

If those GMI are running at high frequency, they'll still take a decent amount of power.

Last but not least, I'll leave you with this tidbit from an OEM when we were talking about 32-core Naples:

https://forum.beyond3d.com/threads/amd-ryzen-cpu-architecture-for-2017.56187/

Um, care to link to a post rather than an entire thread?

tamz_msc · May 1, 2017

imported_ats said:
What numbers are you comparing there to get 33%? From my reading there it is 45% (129 vs 88.6).

Uh semantics? 88ns is 33% faster than 129ns or 129ns is 45% slower than 88ns.
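Both percentages come from the same pair of latencies and differ only in which number is used as the baseline. A minimal sketch of that arithmetic, using the 88.6 ns and 129 ns figures from the quoted table (function names are made up for illustration):

```python
def pct_slower(fast_ns, slow_ns):
    """Extra latency of the slow path, relative to the fast path."""
    return (slow_ns - fast_ns) / fast_ns * 100.0

def pct_faster(fast_ns, slow_ns):
    """Latency saved by the fast path, relative to the slow path."""
    return (slow_ns - fast_ns) / slow_ns * 100.0

on_chip_l2_ns = 88.6   # on-chip, non-local L2
mcm_l2_ns = 129.0      # L2 on the 2nd die in the MCM

print(f"MCM is {pct_slower(on_chip_l2_ns, mcm_l2_ns):.1f}% slower than on-chip")
print(f"on-chip is {pct_faster(on_chip_l2_ns, mcm_l2_ns):.1f}% faster than MCM")
```

With the unrounded 88.6 ns the "faster" figure works out closer to 31% than 33%, but the point about baselines stands either way.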

crashtech · May 1, 2017

Any idea WHEN we will know whether at least some Z270s will support 6c Coffee Lake?

Bouowmx · May 1, 2017

When motherboard manufacturers post "next-generation ready" articles some weeks prior to launch.

Bouowmx · May 1, 2017

Edit: Made a mistake in this post

LTC8K6 · May 1, 2017

crashtech said:
Any idea WHEN we will know whether at least some Z270s will support 6c Coffee Lake?

BIOS updates...

Sweepr · May 1, 2017

mikk said:
Any clue on this Geekbench result? It's an Ice Lake model according to Geekbench with some abnormal entries. It's running with 1C only which is explainable assuming it is a very early ES. The strange thing is there is a M7-6Y75 entry which is Skylake, but 2.00 Ghz doesn't match to base or turbo clock. L1 Data Cache is different to current mainstream SKUs, upped from 32 KB to 48 KB. L3 Cache much bigger. Processor ID doesn't match either, here is a direct comparison: https://browser.primatelabs.com/v4/cpu/compare/2400363?baseline=901997

Actually the scores are very low, only the memory scores are way better. They are way faster than my i7-7700k running with DDR4-3200 CL14.

The 'Ice Lake' entry mysteriously disappeared from their search, though I can (still) open the comparison link provided here.

IntelUser2000 · May 1, 2017

imported_ats said:
Nor do we know what all is included in that processor (rumors of both HMC and integrated dual 100g OPA abound with dual 100g chips taking 20-30W of power alone).

Skylake server has one 100G OPA. Xeon Phi has 2 and adds 15W. Also HMC has never been on the official roadmaps and confidential slides. FPGAs and iGPUs are though.

On the topic of TDP, there's too much speculation based on leaks. They can actually be wrong, you know. GPU-Z, for example, is nearly always wrong on the TMU/ROP stats for Intel iGPUs. The information in that case is limited to what the program author knows about the product. So the arguments are completely pointless. Take a breather and stop acting like kids?

IMO 205W is realistic for 28 cores and 2.5GHz. 250W may be for a rumored even higher 32 core part with everything enabled.

mikk · May 1, 2017

Sweepr said:
The 'Ice Lake' entry mysteriously disappeared from their search, though I can (still) open the comparison link provided here.

It disappeared from the official ranking, the entry still exists under the old link. Most likely removed from Geekbench because of the oddly fast memory scores, it says: "This result has been flagged as inaccurate"

jpiniero · May 1, 2017

The people at ServeTheHome made a good point... final Skylake-SP has been available to the cloud providers for some time. It's just you that have to wait until May 22nd.

frozentundra123456 · May 1, 2017

LTC8K6 said:
Last time I checked, the 7700K was a Kaby Lake chip.

Yea, it's more on topic than all the posts about Ryzen.

ehume · May 1, 2017

MCM?

Ajay · May 1, 2017

ehume said:
MCM?

Multi Chip Module

coercitiv · May 2, 2017

LTC8K6 said:
Last time I checked, the 7700K was a Kaby Lake chip.

The /s was strong in that post...

thepaleobiker · May 3, 2017

Not sure if this helps with Skylake-X, but a Skylake-EP CPU popped up on eBay with some interesting specs:

"
Some of Intel's new Skylake-EP processors have appeared on eBay, with the seller claiming that this model is a 28-core 2GHz unit called the E5-2600 V5.

The images of this new CPU match what we have already seen in LGA 3647 CPU leaks, though at this time it is difficult to tell if this listing is genuine. Regardless, whoever buys one of these two matching CPUs will need an LGA 3647 motherboard to use it, and such motherboards are not publicly available at this time.

While there is nothing illegal about this listing, it is certainly one that Intel will need to investigate, as at this time Skylake-EP is only available to select early partners, which should leave the company wondering who is selling these samples to the public.

This Intel Xeon CPU is listed with the P 8136 name, which does not match any known Intel naming scheme, though this CPU is listed as a 28-core, 56-thread Xeon with 38.75MB of cache and a TDP of 165W. "

Regards,
Vish

shortylickens · May 3, 2017

It's official, liquid cooling is my fave.

Before: the 7700k was idling at 70 and maxing at 100.
After: the 7700k idles at 35 and maxes at 69.

If anyone wants to try it:

https://www.amazon.com/gp/product/B015E14J9M/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1

I picked up a damaged-box version for 63 dollars. It was easy to install except for mounting a little bracket on the underside of my motherboard. And of course, it's quiet.
Now it's time for OVERCLOCKING!

richierich1212 · May 3, 2017

shortylickens said:
It's official, liquid cooling is my fave.

Before: the 7700k was idling at 70 and maxing at 100.
After: the 7700k idles at 35 and maxes at 69.

What is your overclock/voltage @?
