Intel Skylake / Kaby Lake

shortylickens

No Lifer
Jul 15, 2003
82,854
17,365
136
OK, so my 7700k is doing what I want, which of course is improving framerates in some games that previously lagged a bit.
Fallout 4 is much smoother in the downtown area, as is Subnautica anywhere on the map, and a few others that were too much for the i3. Desktop stuff like large photo work has not improved noticeably, but then I never benchmarked those kinds of things.
It's running very hot even with a Cooler Master Hyper TX3, and the fans are always loud. I may need to look at water-cooled solutions.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,564
14,518
136
OK, so my 7700k is doing what I want, which of course is improving framerates in some games that previously lagged a bit.
Fallout 4 is much smoother in the downtown area, as is Subnautica anywhere on the map, and a few others that were too much for the i3. Desktop stuff like large photo work has not improved noticeably, but then I never benchmarked those kinds of things.
It's running very hot even with a Cooler Master Hyper TX3, and the fans are always loud. I may need to look at water-cooled solutions.
Why are you posting in this thread? It's about Skylake / Kaby Lake. I am confused.
 

imported_ats

Senior member
Mar 21, 2008
422
63
86
Cost determines the lower bound of price, and based on superior MCM scaling, we can already see that floor is extremely low. On the other hand, how much room does Intel have to play with prices on their ridiculous monolithic dies?

Pretty far. Realistically, you are looking at a production cost in the range of ~$100 just based on napkin math. Honestly, the production cost is likely significantly less than that now, with the fabs pretty much fully amortized.



You can go only a few pages back to see the evidence from Markfw, or just one thread over for the Stilts power efficiency numbers. Besides that, you seem to not understand arithmetic. If AMD quadruples everything with Naples, that would be 4x the power, which is an upper bound, so we're not even accounting for any optimizations that would make the actual power less than the 4x figure.

Markfw provided absolutely ZERO evidence that is relevant to our discussion. And no, 4x Naples in an MCM doesn't have an upper bound of 4x power. Significant portions of die and I/O aren't even used in Naples. For I/O, Naples really isn't using even 1/4th of the I/O required for a 4x MCM.
 

imported_ats

Senior member
Mar 21, 2008
422
63
86
Re: the pricing and the TDP, both are clearly mentioned in the TweakTown article. And the poster above transposed the last two digits on the TDP; it is supposedly 205W not 250W.

Unfortunately we don't know what that rumored 205W figure represents. For example, Intel already sells custom spec ~200W E5v4's to some vendors to use in water cooled systems (allowing them to in effect run full time at max turbo). Nor do we know what all is included in that processor (rumors of both HMC and integrated dual 100g OPA abound with dual 100g chips taking 20-30W of power alone).
 

imported_ats

Senior member
Mar 21, 2008
422
63
86
Also, I do not think we'll have to worry that much about Naples' performance due to fabric speeds, but I could be wrong. Again, look to the application itself and ask: how often are you going to come under the total thread count of the CPU? How often are you going to get inter-thread communication? Is that inter-thread communication already a problem on multi-socket systems? In other words, I would expect that the CCX strategy will work better with multisocket-friendly applications than it will . . . games, for example.

You'll get some form of inter-thread communication on quite literally every memory access. And yes, inter-thread communication has been an issue going back to essentially the beginning of multi-context processors. There are two types of inter-thread communication: implicit and explicit. Explicit inter-thread communication is caused by the program actively exchanging data between threads, while implicit inter-thread communication is the burden put on all memory activity in order to maintain a coherent view of memory (aka cache coherency and MOM - Memory Order Model).

On-chip interconnection networks are rarely a bottleneck because you can throw a LOT of wires at the problem. Unfortunately, once you have to go off chip, you are at the mercy of bumps and pads, and the number of wires and the performance of those wires quickly become an issue. It quickly becomes impossible to put in enough interconnect for worst cases, and in addition you have extra latency getting on and off chip as well. So instead you end up designing around the average case and the practical worst case.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,821
3,641
136
Unfortunately we don't know what that rumored 205W figure represents. For example, Intel already sells custom spec ~200W E5v4's to some vendors to use in water cooled systems (allowing them to in effect run full time at max turbo). Nor do we know what all is included in that processor (rumors of both HMC and integrated dual 100g OPA abound with dual 100g chips taking 20-30W of power alone).
The 205W TDP is more or less confirmed from the leaked OEM board featuring the C621 chipset.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,773
3,151
136
This will be my last OT post in this thread.

https://tu-dresden.de/zih/forschung...benchit/2014_MSPC_authors_version.pdf?lang=en

Code:
location latency in ns
                       L2    L3     RAM
local                   9.1
2nd core                19.5 27.3 82.3
on-chip                88.6
2nd die in MCM         129 116 133
other socket, 1 hop     136 123 146
other socket, 2 hops     178 164 187
including probe (max)    198 185 -

Even on Bulldozer, the difference between an on-chip non-local L2 access and an MCM L2 access is only 33%. But then go and look at the different SoC uarchs to see just how much better Zeppelin's is than Bulldozer's.

Bulldozer used the same HT interfaces on MCM and inter-socket. Bulldozer had the SRQ, which was one giant bottleneck even when accessing the directory cache. To get to the other chip in the package you had to go SRQ -> crossbar -> HT -> crossbar -> SRQ -> directory/L3/L2.

Zen is much better: cache directories are attached directly to the UMCs; the UMCs, GMIs, IO hub, and core complexes are attached to the fabric; and the L3 holds tags for the L2 within a CCX, so it is a much more scalable solution.

Then just look at the size and number of PHYs for the GMI interfaces. Sure, they are going to cost you some power to go over the interposer, but nothing like PCIe or memory accesses. Fudzilla's leaks (everything else in them is correct) had each of those GMI interfaces at 25 GB/s, which is twice the bandwidth of BD's HT (12.8), so we are looking at around 200 GB/s of GMI bandwidth for each SoC. What we don't know is whether it is full mesh or ring (I think ring is more likely, with 4 controllers and 8 PHYs with only 4 stops).

So then it comes down to latency, and we don't know what it is, but I'll bet you it's no more than the 40 ns of BD. I reckon it will be around the 20 ns mark per hop, just like inter-CCX on the same chip. Just look at BD: the difference for inter-proc vs. inter-MCM is only 13 ns, so distance isn't a big contributor, and we know Zeppelin has been designed from the ground up to have a distributed memory hierarchy.
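For anyone who wants to check the bandwidth arithmetic in the paragraph above, here is a minimal Python sketch. It assumes the ~200 GB/s figure comes from multiplying the leaked 25 GB/s per link by the 8 PHYs mentioned in the post; that mapping is an assumption, and the constant and function names are made up for illustration.

```python
# Figures quoted in the post (leaked and unconfirmed at the time).
GMI_LINK_GBPS = 25.0     # per-GMI-link bandwidth from the leak, GB/s
BD_HT_LINK_GBPS = 12.8   # Bulldozer HyperTransport link bandwidth, GB/s
PHYS_PER_SOC = 8         # GMI PHY count per SoC, as stated in the post


def aggregate_bandwidth(link_gbps: float, links: int) -> float:
    """Total bandwidth if every link runs at full rate (assumption)."""
    return link_gbps * links


total_gmi = aggregate_bandwidth(GMI_LINK_GBPS, PHYS_PER_SOC)
ratio_vs_ht = GMI_LINK_GBPS / BD_HT_LINK_GBPS

print(f"aggregate GMI bandwidth per SoC: {total_gmi:.0f} GB/s")
print(f"GMI link vs. BD HT link: {ratio_vs_ht:.2f}x")
```

Under those assumptions the total works out to 200 GB/s, and the per-link ratio is a little under 2x, which matches the "twice the bandwidth" wording only approximately.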
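The 13 ns figure can be checked against the table posted above. This is a minimal Python sketch that only uses the two rows with all three columns filled in ("2nd die in MCM" and "other socket, 1 hop"); the dictionary names are invented for illustration.

```python
# Latencies in ns, copied from the table in this post (L2, L3, RAM columns).
second_die_in_mcm = {"L2": 129, "L3": 116, "RAM": 133}
other_socket_1_hop = {"L2": 136, "L3": 123, "RAM": 146}

# Extra latency paid for leaving the package instead of staying inside the MCM.
extra_ns = {
    level: other_socket_1_hop[level] - second_die_in_mcm[level]
    for level in second_die_in_mcm
}

for level, delta in extra_ns.items():
    print(f"{level}: +{delta} ns going to the other socket")
```

On these numbers the gap is 7 ns for L2 and L3 and 13 ns for RAM, so the 13 ns quoted above appears to be the RAM column.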

Last but not least, I'll leave you with this tidbit from an OEM from when we were talking about 32-core Naples:

https://forum.beyond3d.com/threads/amd-ryzen-cpu-architecture-for-2017.56187/
Terms and conditions apply here. I can't say the same about DP, but let's say it looks really good in a single CPU configuration.
We know why DP doesn't look as good: GMIx needs to use PCIe lanes, so in 2P you lose 1/2 your lanes and have nowhere near the bandwidth of the on-package fabric.
 

imported_ats

Senior member
Mar 21, 2008
422
63
86
This will be my last OT post in this thread.

https://tu-dresden.de/zih/forschung...benchit/2014_MSPC_authors_version.pdf?lang=en

Code:
location latency in ns
                       L2    L3     RAM
local                   9.1
2nd core                19.5 27.3 82.3
on-chip                88.6
2nd die in MCM         129 116 133
other socket, 1 hop     136 123 146
other socket, 2 hops     178 164 187
including probe (max)    198 185 -

Even on Bulldozer, the difference between an on-chip non-local L2 access and an MCM L2 access is only 33%. But then go and look at the different SoC uarchs to see just how much better Zeppelin's is than Bulldozer's.

What numbers are you comparing there to get 33%? From my reading it is 45% (129 vs 88.6). And it is only that low because the uncore is so incredibly bad. Intel's L2 latencies are like 1/2 or less, depending on the chip.
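The 45% figure is easy to reproduce from the table. A minimal Python sketch, using the 88.6 ns on-chip and 129 ns MCM values quoted above (the helper name is hypothetical):

```python
def pct_increase(base: float, new: float) -> float:
    """Percentage by which `new` exceeds `base`."""
    return (new - base) / base * 100


ON_CHIP_L2_NS = 88.6   # "on-chip" row of the table
MCM_L2_NS = 129.0      # "2nd die in MCM" row, L2 column

increase = pct_increase(ON_CHIP_L2_NS, MCM_L2_NS)

# Same gap expressed relative to the larger (MCM) value instead.
share_of_mcm = (MCM_L2_NS - ON_CHIP_L2_NS) / MCM_L2_NS * 100

print(f"MCM L2 is {increase:.1f}% slower than on-chip L2")
print(f"the gap is {share_of_mcm:.1f}% of the MCM latency")
```

Measured against the on-chip latency the increase is about 45.6%. Measured against the MCM latency the same gap is about 31.3%, which is closer to the 33% quoted, though the post does not say which baseline was used.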


Then just look at the size and number of PHYs for the GMI interfaces. Sure, they are going to cost you some power to go over the interposer, but nothing like PCIe or memory accesses. Fudzilla's leaks (everything else in them is correct) had each of those GMI interfaces at 25 GB/s, which is twice the bandwidth of BD's HT (12.8), so we are looking at around 200 GB/s of GMI bandwidth for each SoC. What we don't know is whether it is full mesh or ring (I think ring is more likely, with 4 controllers and 8 PHYs with only 4 stops).

If those GMI links are running at high frequency, they'll still take a decent amount of power.

Last but not least, I'll leave you with this tidbit from an OEM from when we were talking about 32-core Naples:

https://forum.beyond3d.com/threads/amd-ryzen-cpu-architecture-for-2017.56187/
Um, care to link to a post rather than an entire thread?
 

Bouowmx

Golden Member
Nov 13, 2016
1,138
550
146
When motherboard manufacturers post "next-generation ready" articles some weeks prior to launch.
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
Any clue on this Geekbench result? It's an Ice Lake model according to Geekbench, with some abnormal entries. It's running with 1C only, which is explainable assuming it is a very early ES. The strange thing is there is an M7-6Y75 entry, which is Skylake, but 2.00 GHz doesn't match its base or turbo clock. The L1 data cache differs from current mainstream SKUs, upped from 32 KB to 48 KB. The L3 cache is much bigger. The processor ID doesn't match either. Here is a direct comparison: https://browser.primatelabs.com/v4/cpu/compare/2400363?baseline=901997

Actually, the scores are very low; only the memory scores are way better. They are way faster than my i7-7700K running with DDR4-3200 CL14.


The 'Ice Lake' entry mysteriously disappeared from their search, though I can (still) open the comparison link provided here.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Nor do we know what all is included in that processor (rumors of both HMC and integrated dual 100g OPA abound with dual 100g chips taking 20-30W of power alone).

Skylake server has one 100G OPA. Xeon Phi has 2 and adds 15W. Also HMC has never been on the official roadmaps and confidential slides. FPGAs and iGPUs are though.

On the topic of TDP, there's too much speculation based on leaks. Leaks can actually be wrong, you know. GPU-Z, for example, is nearly always wrong on the TMU/ROP stats for Intel iGPUs. The information in that case is limited to what the program author knows about the product. So the arguments are completely pointless. Take a breather and stop acting like kids?

IMO 205W is realistic for 28 cores at 2.5GHz. 250W may be for a rumored, even higher 32-core part with everything enabled.
 

mikk

Diamond Member
May 15, 2012
4,141
2,154
136
The 'Ice Lake' entry mysteriously disappeared from their search, though I can (still) open the comparison link provided here.


It disappeared from the official ranking, but the entry still exists under the old link. It was most likely removed from Geekbench because of the oddly fast memory scores; it says: "This result has been flagged as inaccurate"
 

jpiniero

Lifer
Oct 1, 2010
14,605
5,223
136
The people at ServeTheHome made a good point... final Skylake-SP has been available to the cloud providers for some time. It's just you who has to wait until May 22nd.
 

thepaleobiker

Member
Feb 22, 2017
149
45
61
Not sure if this helps with Skylake-X, but a Skylake-EP CPU popped up on eBay with some interesting specs:

"
Some of Intel's new Skylake-EP processors have appeared on eBay, with the seller claiming that this model is a 28-core 2GHz unit called the E5-2600 V5.

The images of this new CPU match what we have already seen in LGA 3647 CPU leaks, though at this time it is difficult to tell if this listing is genuine. Regardless, whoever buys one of these two matching CPUs will need an LGA 3647 motherboard to use it, and such motherboards are not publicly available at this time.

While there is nothing illegal about this listing, it is certainly one that Intel will need to investigate, as at this time Skylake-EP is only available to select early partners, which should leave the company wondering who is selling these samples to the public.

This Intel Xeon CPU is listed with the P 8136 name, which does not match any known Intel naming scheme, though this CPU is listed as a 28-core, 56-thread Xeon with 38.75MB of cache and a TDP of 165W.
"

Regards,
Vish
 

shortylickens

No Lifer
Jul 15, 2003
82,854
17,365
136
It's official, liquid cooling is my fave.

Before: the 7700k was idling at 70 and maxing at 100.
After: the 7700k idles at 35 and maxes at 69.

If anyone wants to try it:

https://www.amazon.com/gp/product/B015E14J9M/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1

I picked up a damaged-box version for 63 dollars. It was easy to install except for mounting a little bracket on the underside of my motherboard. And of course, it's quiet.
Now it's time for OVERCLOCKING!