[Ashraf] 10nm "Lakefield" SoC with Intel big + little cores


vigilant007

Junior Member
Dec 7, 2014
11
3
81
As I already mentioned elsewhere, Windows on big.LITTLE ARM cores, as on the Surface Pro X, works really well. You see the small cores utilized either on very light workloads like background processes, or when an application requests 8 threads at full utilization, in which case all 8 cores are loaded to 100%.
Of course the application can still make stupid decisions, like distributing the work equally across all 8 threads, but this is rarely the case in applications where MT performance counts. For example, Blender distributes one tile at a time to each core. Once a core is finished with a tile, it gets the next tile assigned. You can literally see which tiles are assigned to the small cores :)
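
To illustrate the difference between handing out tiles on demand and splitting them equally up front, here is a minimal sketch. It is not Blender's actual scheduler; the core speeds (a little core at half the speed of a big one) and the tile count are made-up assumptions.

```python
import heapq


def dynamic_schedule(num_tiles, core_speeds):
    """Each core takes the next tile as soon as it is free.

    Returns (finish_time, tiles_per_core). A tile costs 1 / speed time units.
    """
    # Min-heap of (time at which the core becomes free, core index).
    free_at = [(0.0, i) for i in range(len(core_speeds))]
    heapq.heapify(free_at)
    tiles_per_core = [0] * len(core_speeds)
    finish_time = 0.0
    for _ in range(num_tiles):
        t, i = heapq.heappop(free_at)
        done = t + 1.0 / core_speeds[i]
        tiles_per_core[i] += 1
        finish_time = max(finish_time, done)
        heapq.heappush(free_at, (done, i))
    return finish_time, tiles_per_core


def static_schedule(num_tiles, core_speeds):
    """Tiles are split equally up front; the slowest core sets the finish time."""
    n = len(core_speeds)
    tiles_per_core = [num_tiles // n + (1 if i < num_tiles % n else 0) for i in range(n)]
    finish_time = max(c / s for c, s in zip(tiles_per_core, core_speeds))
    return finish_time, tiles_per_core


# Hypothetical 4 big + 4 little configuration; speeds are illustrative only.
speeds = [1.0] * 4 + [0.5] * 4
dyn_time, dyn_tiles = dynamic_schedule(96, speeds)
sta_time, sta_tiles = static_schedule(96, speeds)
print("dynamic:", dyn_time, dyn_tiles)
print("static: ", sta_time, sta_tiles)
```

With on-demand assignment the little cores simply end up with fewer tiles, while the equal split leaves the big cores idle waiting for the little ones to finish.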

PS: I still have the issue that under Linux the CPU does not raise its clock when all cores are requested at full utilization on my Surface Pro X. As a workaround, I start a program on the Windows side that pulls the clock speed up while running the Linux app under virtualization. Even then I am beating Lakefield in Blender by miles, even with a relatively ancient Cortex-A76 system, all while using less battery power.
Thanks! That is perfect!

Out of curiosity, how do you like the Surface X for day to day use? I am tempted to pick one up this week.


 

DrMrLordX

Lifer
Apr 27, 2000
16,020
4,970
136
I don't think it's fair to slag Tremont as being inferior. Goldmont+ -> Tremont seems like a substantial jump in performance at isoclocks. The problem is that Tremont got caught up in the node mess. We still haven't seen proper replacements for Goldmont/Goldmont+ consumer devices featuring Tremont, nor have we seen any movement in Intel's automotive lineup.
 

Thala

Senior member
Nov 12, 2014
998
326
136
Thanks! That is perfect!

Out of curiosity, how do you like the Surface X for day to day use? I am tempted to pick one up this week.
I'd say it's the perfect device for me. Fast and snappy, combined with a gorgeous screen and form factor plus excellent battery life. Most software I am running is native. In some cases you might need to look for alternatives. For example, instead of Photoshop, why not run GIMP natively under WSL2? Be aware that while most 32-bit programs work under emulation, some just do not and crash. GPU performance is excellent. For example, Diablo 3 runs at around 30 fps at 2880x1920 resolution, which is about 5.5 million pixels. Other games like Dust: An Elysian Tail even achieve a stable 60 fps at the same resolution with all graphics options on. Ori and the Blind Forest: stable 60 fps, at a lower resolution though.

Some other limitations, like the missing OpenGL, look set to get better. Microsoft recently announced that they are working with Collabora on an OpenGL driver based on the Mesa project, which has a D3D12 backend. Since the effort is open source, I cloned the sources along with the Quake 3 sources, compiled everything, and here is the result:

q3arm_SurfaceProX.PNG

Accelerated OpenGL on my Surface Pro X :)

ps: sorry for off-topic - but it is somehow related to lakefield.
 

vigilant007

Junior Member
Dec 7, 2014
11
3
81
I'd say it's the perfect device for me. Fast and snappy, combined with a gorgeous screen and form factor plus excellent battery life. Most software I am running is native. In some cases you might need to look for alternatives. For example, instead of Photoshop, why not run GIMP natively under WSL2? Be aware that while most 32-bit programs work under emulation, some just do not and crash. GPU performance is excellent. For example, Diablo 3 runs at around 30 fps at 2880x1920 resolution, which is about 5.5 million pixels. Other games like Dust: An Elysian Tail even achieve a stable 60 fps at the same resolution with all graphics options on. Ori and the Blind Forest: stable 60 fps, at a lower resolution though.

Some other limitations, like the missing OpenGL, look set to get better. Microsoft recently announced that they are working with Collabora on an OpenGL driver based on the Mesa project, which has a D3D12 backend. Since the effort is open source, I cloned the sources along with the Quake 3 sources, compiled everything, and here is the result:

View attachment 26983

Accelerated OpenGL on my Surface Pro X :)

ps: sorry for off-topic - but it is somehow related to lakefield.
It’s not going to be a primary device of mine. I mostly use Macs for work, but I occasionally need to do a few things in Microsoft Office using features that exist ONLY on Windows. I’m looking at it as a fun experiment.

I find the move to ARM on every platform incredibly interesting, because for a good while Intel just hasn’t been interesting. Qualcomm makes brilliant hardware, and I’m glad Microsoft is using that hardware to hold Intel’s feet to the fire.

As a tinkerer at heart, I want to see how far I can get with it by being creative. I know I could wait a few months but seeing as it’s not my primary device, I feel compelled to take advantage of the discounting that is currently being offered.

It would also be nice to have a Windows tablet with native LTE connectivity, as when I’m traveling I’m normally working from the side of the road.
 

ondma

Golden Member
Mar 18, 2018
1,279
324
106

Thala

Senior member
Nov 12, 2014
998
326
136
Performance per watt is exceptional at least at 5w. If it scales up proportionately to 7w with the same efficiency, it could be competitive.
Not sure what you consider exceptional, but the Galaxy Book S with the Snapdragon 8cx has 50-100% higher performance while using much less battery power. It plays in a completely different league. A turd stays a turd, even if you give it 2 W more power.
 

ondma

Golden Member
Mar 18, 2018
1,279
324
106
Not sure what you consider exceptional, but the Galaxy Book S with the Snapdragon 8cx has 50-100% higher performance while using much less battery power. It plays in a completely different league.
I was looking at the "Multi-threaded performance per watt" graph at the bottom of page 2 in the hardware luxx article linked a couple of posts above.
link
 

Hitman928

Platinum Member
Apr 15, 2012
2,588
1,813
136
I was looking at the "Multi-threaded performance per watt" graph at the bottom of page 2 in the hardware luxx article linked a couple of posts above.
link
There seems to be something wrong with a lot of their calculations for perf/W. If I am understanding the translation correctly, they took the multi-threaded Cinebench score and divided it by the CPU's PL1 power limit. But if you do that for the 4900HS (for example), you get 4142/35 = 118.3 pts/W. Then take the i5-L16G7 and you get 607/5 = 121.4 pts/W. That's only a 2.6% improvement over the 4900HS.
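
The arithmetic above can be reproduced with a few lines. This is only a sketch of the "score divided by PL1" method as described in the post; the scores and power limits are the ones quoted there, and the helper name is made up.

```python
def points_per_watt(score, watts):
    """Cinebench points divided by a power figure in watts."""
    return score / watts


# (multi-threaded score, PL1 in watts) as quoted in the post above.
chips = {
    "Ryzen 9 4900HS": (4142, 35),
    "i5-L16G7": (607, 5),
}

ppw = {name: points_per_watt(score, pl1) for name, (score, pl1) in chips.items()}
advantage = ppw["i5-L16G7"] / ppw["Ryzen 9 4900HS"] - 1.0

for name, value in ppw.items():
    print(f"{name}: {value:.1f} pts/W")
print(f"Lakefield advantage: {advantage * 100:.1f}%")
```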

Am I missing something in the translation here?
 

Thala

Senior member
Nov 12, 2014
998
326
136
I was looking at the "Multi-threaded performance per watt" graph at the bottom of page 2 in the hardware luxx article linked a couple of posts above.
link
Saw this graph and have no idea how it was derived. The claim was Cinebench score divided by PL1. Let's do this:

i5-L16G7: 607 pts / 5 W = 121.4 pts/W
i5-10210U: 1203 pts / 15 W = 80.2 pts/W

Very different from what the graph shows. In addition, increasing the TDP from 5 W to 7 W will not improve efficiency, as frequency does not increase linearly with power.
Besides, given the meager battery life, I don't trust in the slightest that the device stays within 5 W of SoC power.
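
A rough sketch of the scaling argument, under a loudly stated assumption: dynamic power grows roughly with the cube of frequency (power is proportional to f times V squared, with voltage rising about in step with frequency). This is a textbook rule of thumb, not a measurement of Lakefield, and the exponent is a free parameter here.

```python
def relative_frequency(power_ratio, exponent=3.0):
    """Frequency gain for a given power increase, assuming P ~ f ** exponent."""
    return power_ratio ** (1.0 / exponent)


power_ratio = 7 / 5                               # 5 W -> 7 W
freq_gain = relative_frequency(power_ratio)       # assumed cubic relationship
efficiency_ratio = freq_gain / power_ratio        # perf/W relative to the 5 W point

print(f"frequency gain:  {freq_gain:.3f}x")
print(f"perf/W vs. 5 W:  {efficiency_ratio:.3f}x")
```

Under that assumption, 40% more power buys only about 12% more frequency, so perf/W drops to roughly 0.8x of the 5 W figure rather than staying constant.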

PS. Hitman928 beat me :p
 

mikk

Platinum Member
May 15, 2012
2,733
603
136
There seems to be something wrong with a lot of their calculations for perf/W. If I am understanding the translation correctly, they took the multi-threaded Cinebench score and divided it by the CPU's PL1 power limit. But if you do that for the 4900HS (for example), you get 4142/35 = 118.3 pts/W. Then take the i5-L16G7 and you get 607/5 = 121.4 pts/W. That's only a 2.6% improvement over the 4900HS.

Am I missing something in the translation here?

This is default TDP though, real power usage differs. 4900HS uses more than 35W at this score: https://www.hardwareluxx.de/index.php/artikel/hardware/prozessoren/52811-amd-ryzen-9-4900hs-im-vergleichstest-intel-muss-sich-abermals-warm-anziehen.html?start=2

First run 65W, second run 50W. At 35W it scores 3300 points.
 

Thala

Senior member
Nov 12, 2014
998
326
136
This is default TDP though, real power usage differs. 4900HS uses more than 35W at this score: https://www.hardwareluxx.de/index.php/artikel/hardware/prozessoren/52811-amd-ryzen-9-4900hs-im-vergleichstest-intel-muss-sich-abermals-warm-anziehen.html?start=2

First run 65W, second run 50W. At 35W it scores 3300 points.
Does not explain the graph, neither the 4900HS numbers nor the Lakefield numbers. According to your link the frequency comes down to 3.1-3.2 GHz, which is where the score stabilizes at around 3700 points at 35 W. Quick calculation: 3700/35 ≈ 106 pts/W.
Even better at 25 W: 3300 pts / 25 W = 132 pts/W ... easily beating Lakefield.
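
For comparison, here is a small sketch that ranks the operating points mentioned in this post by efficiency. The 4900HS points use the scores and power levels stated above; the Lakefield entry is still score divided by PL1 (607 pts at 5 W), since no measured power figure for it appears in the thread.

```python
# (label, Cinebench score, power in watts) taken from the figures in the thread.
operating_points = [
    ("4900HS @ 35 W", 3700, 35),
    ("4900HS @ 25 W", 3300, 25),
    ("i5-L16G7 @ 5 W PL1", 607, 5),
]

# Sort by points per watt, most efficient first.
ranked = sorted(
    ((label, score / watts) for label, score, watts in operating_points),
    key=lambda item: item[1],
    reverse=True,
)

for label, pts_per_watt in ranked:
    print(f"{label:20s} {pts_per_watt:6.1f} pts/W")
```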
 

mikk

Platinum Member
May 15, 2012
2,733
603
136
Does not explain the graph, neither the 4900HS numbers nor the Lakefield numbers. According to your link the frequency comes down to 3.1-3.2 GHz, which is where the score stabilizes at around 3700 points at 35 W. Quick calculation: 3700/35 ≈ 106 pts/W.

Yes I know, I'm just saying they are using the default TDP, which is flawed. They need real power usage for efficiency tests.
 

Hitman928

Platinum Member
Apr 15, 2012
2,588
1,813
136
According to this video, setting a 4700U to a cTDP-down of 10 W and disabling turbo gives a score of 1670 points in Cinebench R20, which works out to 167 pts/W. Seems like the 10-15 W range is the sweet spot in terms of efficiency for Renoir. I wonder how much performance they'd have to sacrifice to get it down to the 5-7 W range. It will also be interesting to see how it compares to a 4C Tiger Lake with cTDP-down.

 

HurleyBird

Platinum Member
Apr 22, 2003
2,097
481
136
Yeah, there's no way to make that graph work, they need to check their numbers.
Or at least expand on their methodology. Maybe it's total power consumption of the entire laptop (which itself would introduce a slew of confounds). At a glance, it looks pretty nonsensical.
 

CluelessOne

Member
Jun 19, 2015
50
29
91
One thing I don't really understand: why go big.LITTLE style? Take a hypothetical 1 big + 1 little core design and compare it to a dual core of the same architecture, with one core using transistors optimized for, say, a 1.6 GHz max frequency and the other using transistors that can run up to 4 GHz. Are the power savings from an in-order architecture really that large compared to an out-of-order architecture?
 

DrMrLordX

Lifer
Apr 27, 2000
16,020
4,970
136
One thing I don't really understand: why go big.LITTLE style? Take a hypothetical 1 big + 1 little core design and compare it to a dual core of the same architecture, with one core using transistors optimized for, say, a 1.6 GHz max frequency and the other using transistors that can run up to 4 GHz. Are the power savings from an in-order architecture really that large compared to an out-of-order architecture?
Being able to power down the big core saves power versus relying on a wider core that you bring into a low-power state via sophisticated P-states and power gating. Going with the multicore config does seem to consume more die area though.
 

NTMBK

Diamond Member
Nov 14, 2011
8,707
1,680
126
According to this video, setting a 4700U to a cTDP-down of 10 W and disabling turbo gives a score of 1670 points in Cinebench R20, which works out to 167 pts/W. Seems like the 10-15 W range is the sweet spot in terms of efficiency for Renoir. I wonder how much performance they'd have to sacrifice to get it down to the 5-7 W range. It will also be interesting to see how it compares to a 4C Tiger Lake with cTDP-down.

How much I/O is integrated into Lakefield that is not in Renoir, but rather in a separate southbridge chip? You're not necessarily comparing apples to apples. The southbridge isn't generally very power hungry, but when you get to TDPs this low it starts to matter.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,019
1,310
136
How much I/O is integrated into Lakefield that is not in Renoir, but rather in a separate southbridge chip? You're not necessarily comparing apples to apples. The southbridge isn't generally very power hungry, but when you get to TDPs this low it starts to matter.
None?
The entire right hand side of the soc is IO
 

mikk

Platinum Member
May 15, 2012
2,733
603
136
How much I/O is integrated into Lakefield that is not in Renoir, but rather in a separate southbridge chip? You're not necessarily comparing apples to apples. The southbridge isn't generally very power hungry, but when you get to TDPs this low it starts to matter.

Is that package or core power usage? I'm asking because AMD's TDP is a core-only TDP, so you can have a TDP of 25 W while the SoC draws 28-29 W overall. Intel's TDP, on the other hand, covers the entire SoC; it's a package TDP. I would be surprised if it's different on Lakefield.
 

Hitman928

Platinum Member
Apr 15, 2012
2,588
1,813
136
Is that package or core power usage? I'm asking because AMD's TDP is a core-only TDP, so you can have a TDP of 25 W while the SoC draws 28-29 W overall. Intel's TDP, on the other hand, covers the entire SoC; it's a package TDP. I would be surprised if it's different on Lakefield.
I don't think AMD's TDP is core only; it's an SoC TDP. Having a core-only TDP doesn't really make sense.
 

mikk

Platinum Member
May 15, 2012
2,733
603
136
I don't think AMD's TDP is core only, it's SOC TDP. Having a core only TDP doesn't really make sense.

It's a core TDP, you can see it in the tests. The CPU power usage is exactly 15W or 25W under full load, package power is like 3-4W more than this.
 

Hitman928

Platinum Member
Apr 15, 2012
2,588
1,813
136
It's a core TDP, you can see it in the tests. The CPU power usage is exactly 15W or 25W under full load, package power is like 3-4W more than this.
According to Techspot (Hardware Unboxed), the 4900HS (35 W TDP) sits at exactly 35 W package power after turbo duration is exceeded.



If TDP was core only, how would system builders/OEMs know how much cooling to apply to the SOC?

 
