[Ashraf] 10nm "Lakefield" SoC with Intel big + little cores

Markfw · Jun 10, 2020

piokos said:
So you felt the urge to jump on an Intel product in yet another thread...

You jump on any AMD product in every thread, so what's your point? Just trolling?

piokos · Jun 10, 2020

IntelUser2000 said:
This is dead. Probably why recent Macbooks are so subpar. Make the previous generation subpar, and the next generation looks even better.

I'm not sure I understand this sentence.
Recent MacBook Airs (those with Ice Lake U) are probably the best ones ever. Very good stuff.

We know Apple will start making ARM laptops.
It's not obvious whether ARM and x86 will overlap (i.e. different MacBook Airs offered side by side) or whether there will be strong segmentation (MB Air on ARM, MB Pro on x86).

Multi-threaded Turbo is 1.8GHz for the L16G7.

Yeah, I've found it here:

www.anandtech.com

I guess this makes sense.
Those 4 Tremont cores only need around 2W, so everything else goes to the big core.
I guess Intel implemented some balancing, i.e. small cores will be held back during Sunny Cove boosts.
Really looking forward to reviews of the Samsung Book S.

IntelUser2000 · Jun 10, 2020

Tabalan said:
Any link to prove those 18W PL2? thanks in advance.

I normally wouldn't answer this but you are being polite.

Notebookcheck does thorough enough reviews to show that data. Their stress test section shows screenshots when running HWInfo, a monitoring application. It shows PL2 data.

I'm not sure I understand this sentence.
Recent MacBook Airs (those with Ice Lake U) are probably the best ones ever. Very good stuff.

They waited this long to use Icelake when they could have used it a few months earlier. And I'm aware Apple tends to be late. This time they are extra late.

Performance shows that the "28W" parts are nothing exceptional. So much for their exclusivity.

Those 4 Tremont cores only need around 2W, so everything else goes to the big core.
I guess Intel implemented some balancing, i.e. small cores will be held back during Sunny Cove boosts.

I hope all 5 cores can work together. Otherwise it'll look pretty bad in MT benchmarks.

At least it's outperforming the 8500Y in ST, despite the 8500Y having a 4.2GHz Turbo. I knew the 8500Y was underperforming!

DrMrLordX · Jun 10, 2020

Is Apple going to use these chips? I would think Lakefield would be a product replaceable by Kalamata.

IntelUser2000 · Jun 10, 2020

DrMrLordX said:
Is Apple going to use these chips? I would think Lakefield would be a product replaceable by Kalamata.

Definitely not.

Especially if they are announcing the transition to their own cores at the next WWDC.

jpiniero · Jun 10, 2020

IntelUser2000 said:
I hope all 5 cores can work together. Otherwise it'll look pretty bad in MT benchmarks.

Working together, I have my doubts, at least on Windows. It's probably more like a process can run on one or the other but not both.

RetroZombie · Jun 10, 2020

piokos said:
So you felt the urge to jump on an Intel product in yet another thread...

I'm a computer enthusiast, I like new stuff, especially things that take the industry forward.

piokos said:
Like on all Atom-based CPUs.

And all the Celerons and Pentiums. Why limit this in 2020? How can software evolve?
It's not even about single vs multi-thread performance or IPC comparisons, it's the missing key features.
How can apps take full advantage of AVX if some x86 parts lack it?

piokos said:
No. But we've been over this.

Yes, and we will be again, especially once the products can be tested and compared to existing ones.

IntelUser2000 · Jun 10, 2020

RetroZombie said:
I'm a computer enthusiast, I like new stuff, especially things that take the industry forward.
And all the Celerons and Pentiums. Why limit this in 2020? How can software evolve?

This is to maintain the same ISA between both cores. The generation that uses Gracemont should enable AVX.

jpiniero said:
Working together, I have my doubts, at least on Windows. It's probably more like a process can run on one or the other but not both.

It may be able to. They are saying that enabling Sunny Cove results in a 33% increase in WebXPRT 3 performance. That benchmark is somewhat multi-threaded (I'm trying to find out to what degree).

But it may also depend on the application, if it's indeed possible.

Thala · Jun 10, 2020

jpiniero said:
Working together, I have my doubts, at least on Windows. It's probably more like a process can run on one or the other but not both.

Why not? On SQ1/8CX CPUs all 8 cores work together as well on Windows, at max frequency by the way...

IntelUser2000 · Jun 10, 2020

Thala said:
Why not? On SQ1/8CX CPUs all 8 cores work together as well on Windows, at max frequency by the way...

And you are sure it works in 100% of the working applications and (the most important part) that they benefit from it?

Like can we see Cinebench scores improve? x264?

Thala · Jun 10, 2020

IntelUser2000 said:
And you are sure it works in 100% of the working applications and (the most important part) that they benefit from it?

Like can we see Cinebench scores improve? x264?

Yes to both. If you start a benchmark (say, the 7-zip benchmark), it allocates 8 threads - one on each core. CPU load rises immediately to 100% on each core. The 8 cores are also exposed to Linux (WSL) - so when I run Blender under WSL/Ubuntu it will distribute the load to all 8 cores.

IntelUser2000 · Jun 10, 2020

Thala said:
Yes to both. If you start a benchmark (say, the 7-zip benchmark), it allocates 8 threads - one on each core. CPU load rises immediately to 100% on each core.

But does it perform like one? That's the really important question. I know in Geekbench the small cores add ~15%, similar to HT in Intel chips.

Thala · Jun 10, 2020

IntelUser2000 said:
But does it perform like one? That's the really important question. I know in Geekbench the small cores add ~15%, similar to HT in Intel chips.

Of course a small core performs like a small core - so it is slower than the bigger cores. So by going from 4 to 8 cores you will not see double the performance when we are assuming a big.LITTLE architecture.

PS. Just checked 7-zip performance: when going from 4 big cores to 4+4, I gain roughly 28% performance. The cores do not throttle yet - I guess there is power headroom for the GPU until we reach 7W.
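
To put that 28% in per-core terms, here is a rough back-of-the-envelope sketch. It assumes the benchmark scales linearly across cores and that nothing throttles, neither of which is verified here, and `small_core_ratio` is just a made-up helper name.

```python
def small_core_ratio(big_cores, small_cores, gain):
    """Throughput of one small core relative to one big core.

    Assumes linear scaling:
        (big_cores + small_cores * ratio) / big_cores == 1 + gain
    where `gain` is the fractional speedup from enabling the small cores.
    """
    return gain * big_cores / small_cores


# 7-zip figure quoted above: 4 big -> 4+4 gives roughly +28%
seven_zip = small_core_ratio(4, 4, 0.28)
# Geekbench figure quoted earlier in the thread: roughly +15%
geekbench = small_core_ratio(4, 4, 0.15)

print(f"7-zip:     one small core ~ {seven_zip:.2f} of a big core")
print(f"Geekbench: one small core ~ {geekbench:.2f} of a big core")
```

Under those assumptions the ratio equals the gain itself for a 4+4 layout, so the spread between the two benchmarks is really a spread in how well each workload uses the small cores.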

IntelUser2000 · Jun 10, 2020

Thala said:
PS. Just checked 7-zip performance: when going from 4 big cores to 4+4, I gain roughly 28% performance.

Interesting.

Reviewers need to test for things like these. I think people will be very interested in it.

HurleyBird · Jun 10, 2020

To be honest, the only thing that doesn't sound completely underwhelming to me here is the package size. An 865 might lose a bit in int ST but will spank this in MT and any kind of fp while consuming less power. On the other end of the spectrum, Renoir spanks across all performance metrics and might even have better perf/watt.

And it would be one thing if this were a budget chip, but it looks like they're treating it as a premium platform for some reason.

piokos · Jun 10, 2020

HurleyBird said:
To be honest, the only thing that doesn't sound completely underwhelming to me here is the package size. An 865 might lose a bit in ST but will spank this in MT while consuming less power.

Seriously, how long will we see this argument...
It's x86, not ARM. That's the advantage. ARM will be more efficient.

On the other end of the spectrum, Renoir spanks across all performance metrics and might even have better perf/watt.

Yeah, the choice of 7W Zen mobile APUs that idle on a fraction of a Watt is just enormous. Especially those fitting in a similar form factor.
:/

RetroZombie said:
I'm a computer enthusiast, I like new stuff, especially things that take the industry forward.

But you don't like this product...
I bet that would be different if it had a different logo...

And all the Celerons and Pentiums. Why limit this in 2020?

Intel decided to differentiate their product lineup that way. Nothing wrong with that.

How can software evolve?

Not sure what you mean by "evolve"...

It's not even about single vs multi-thread performance or IPC comparisons, it's the missing key features.
How can apps take full advantage of AVX if some x86 parts lack it?

AVX is an extension to the x86 ISA. It doesn't have to be supported by the CPU and software should not require it.
If software requires AVX to run, it's just badly written (or compiled).

Basically: you have AVX (or AVX2, AVX-512), you may get some performance boost. That's all.
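
A minimal sketch of what "should not require it" looks like in practice: check the feature flags at runtime and fall back to a baseline path. This assumes Linux-style `/proc/cpuinfo` text as input; `parse_cpu_flags`, `pick_kernel` and the kernel names are hypothetical, not any real library's API.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text (Linux format)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found: assume nothing beyond the baseline


def pick_kernel(flags):
    """Choose the widest vector path the CPU reports, else the plain x86 one."""
    for ext in ("avx512f", "avx2", "avx"):
        if ext in flags:
            return ext
    return "baseline"  # must always exist, so the program runs on any x86 part


# Illustrative inputs only, not dumps from real machines.
with_avx = "model name : Example CPU\nflags : fpu sse2 ssse3 avx avx2\n"
without_avx = "model name : Example CPU\nflags : fpu sse2 ssse3 sse4_2\n"

print(pick_kernel(parse_cpu_flags(with_avx)))
print(pick_kernel(parse_cpu_flags(without_avx)))
```

Compiled code usually does the same thing through CPUID-based dispatch rather than by parsing text, but the shape is identical: the AVX path is an optional fast path, never the only path.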

piokos · Jun 10, 2020

Thala said:
Why not? On SQ1/8CX CPUs all 8 cores work together as well on Windows, at max frequency by the way...

And they will all work in Lakefield. Intel stated during development that all big and small cores will be available at the same time (this is called heterogeneous multi-processing / HMP).
It was different with ARM. HMP was implemented a few years after big.LITTLE. Before that only half of the CPU was working at any one time.

The problem is not in running software on all cores. They share the same ISA.
The hard part is efficient scheduling.
In a homogeneous CPU you can just throw a job to any free core. They're all the same.
In a heterogeneous CPU you have to try to give the "big" tasks to the "big" cores.

The 7-zip example is actually easy, because the threads are independent (much like with rendering). Each one calculates part of the problem, puts it on a stack and gets another one.
The issue is when threads have to be synchronized, i.e. they have to wait for someone else to finish.

Let's say you have a program that runs 2 threads on 2 identical "big" cores:
- A - big job - takes 1s
- B - small job - takes a lot less, like 0.1s
After each round they have to "talk" and then another round starts.
In this case cores can be assigned randomly - it doesn't matter.

Now you replace one "big" core with 4 "small" ones. Let's say the total throughput is the same, i.e. one "small" core is 4 times slower.
What we're after is optimal assigning:
- A -> big
- B -> small
because then the whole round still takes 1s.

If task A goes to a small core, performance drops 4 times.
Which means that when assigning randomly on a 1+4 architecture, it'll be optimal in just 20% of rounds.
So, on average, each round will take 3.4s. And we just lost 2/3 of performance.
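
For anyone who wants to check the 3.4s figure, here is a small sketch that enumerates every random placement. It assumes A and B always land on two different cores, each placement is equally likely, and a small core is exactly `slowdown` times slower; `expected_round_time` is a made-up name.

```python
from fractions import Fraction


def expected_round_time(n_big, n_small, slowdown, t_a, t_b):
    """Mean round time when tasks A and B land on two distinct random cores.

    t_a and t_b are the task times on a big core; a small core multiplies
    them by `slowdown`. A round ends when the slower of the two finishes.
    """
    cores = [1] * n_big + [slowdown] * n_small  # time multiplier per core
    total, count = Fraction(0), 0
    for i, core_a in enumerate(cores):
        for j, core_b in enumerate(cores):
            if i == j:
                continue  # A and B never share a core
            total += max(core_a * t_a, core_b * t_b)
            count += 1
    return total / count


avg = expected_round_time(1, 4, 4, 1, Fraction(1, 10))
print(float(avg))      # mean round time in seconds on the 1+4 layout
print(float(1 / avg))  # throughput relative to the optimal 1s round
```

A lands on the big core in 4 of the 20 placements (1s rounds) and on a small core in the other 16 (4s rounds), which gives the 3.4s average. Measured by mean round time, that leaves about 29% of the original throughput.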

IntelUser2000 · Jun 10, 2020

piokos said:
Seriously, how long will we see this argument...
It's x86, not ARM. That's the advantage. ARM will be more efficient.

I know they can get in the ballpark. They have with the Tablet Atoms. I have an 8-inch Venue 8 Pro with Bay Trail Atom. 15WHr and you get 6-8 hours of battery life.

I also have a device where they couldn't do it. A 45nm Menlow Atom for UMPCs. It's a 4.8-inch device with a 24WHr battery, and it gets 5 hours of battery life. I know it has little to do with uarch, since 32nm didn't do much (maybe 10% better), but the 32nm real Tablet/Smartphone oriented version did. All three used the same uarch. Yet the Tablet oriented Atom got twice the battery life per WHr!

Intel's current Core devices are somewhere between the two. I know it's not just the screen. You can find Icelake 8-inch devices not doing any better per WHr.

insertcarehere · Jun 10, 2020

piokos said:
Seriously, how long will we see this argument...
It's x86, not ARM. That's the advantage. ARM will be more efficient.

x86 almost certainly isn't an advantage when power efficiency takes precedence over full Windows program compatibility. Luckily Samsung has done us a favor by putting both Lakefield and the Qualcomm 8cx in the exact same form factor, so it won't be long before we get a good sense of the former's performance against (somewhat outdated) ARM.

Thala · Jun 10, 2020

IntelUser2000 said:
I know they can get in the ballpark. They have with the Tablet Atoms.

You are talking about times when Intel had a big process advantage (e.g. FinFET vs. planar). That time is over - today we are talking more or less iso-process.

Thala · Jun 10, 2020

piokos said:
Let's say you have a program that runs 2 threads on 2 identical "big" cores:
- A - big job - takes 1s
- B - small job - takes a lot less, like 0.1s
After each round they have to "talk" and then another round starts.
In this case cores can be assigned randomly - it doesn't matter.

Now you replace one "big" core with 4 "small" ones. Let's say the total throughput is the same, i.e. one "small" core is 4 times slower.
What we're after is optimal assigning:
- A -> big
- B -> small
because then the whole round still takes 1s.

If task A goes to a small core, performance drops 4 times.
Which means that when assigning randomly on a 1+4 architecture, it'll be optimal in just 20% of rounds.
So, on average, each round will take 3.4s. And we just lost 2/3 of performance.

The scheduler will start assigning the larger task to the bigger core even if the initial assignment is not optimal, because it will observe that, with this assignment, the small core is at 100% load while the big core is not.
I have never had the issue of the scheduler assigning the more demanding thread to the small cores.

Zucker2k · Jun 11, 2020

Lakefield die shot

https://twitter.com/x/status/1270741606975639552

IntelUser2000 · Jun 11, 2020

Thala said:
You are talking about times when Intel had a big process advantage (e.g. FinFET vs. planar). That time is over - today we are talking more or less iso-process.

Refer back to my comment above. That wouldn't be true if it was due to the process.

Modern Intel platforms still suffer from being behind. The 32nm Clover Trail platform with an "old" planar process does way better, Atom or otherwise.

piokos · Jun 11, 2020

Thala said:
The scheduler will start assigning the larger task to the bigger core even if the initial assignment is not optimal, because it will observe that, with this assignment, the small core is at 100% load while the big core is not.

No. After each "round" the CPU is empty - nothing is running. That's why I used that example specifically.
And the CPU really doesn't know whether the new pair of jobs is the same as before, so it can't assume that the previous load levels will repeat.

For really good optimization, this will have to be done at the compiler level. Which is probably where oneAPI comes into play. But it'll still be interesting to see what Intel manages.

insertcarehere said:
x86 almost certainly isn't an advantage when power efficiency takes precedence over full Windows program compatibility.

But it doesn't have to take precedence. That's the point.

The whole Lakefield exercise is about pushing efficiency as far as possible without losing functionality.

Honestly, I don't understand why this is so underestimated by people on enthusiast forums.
Most of the software you use today on Windows won't work on that ARM-powered Samsung Book S. It will on the Lakefield one. Simple as that.

coercitiv · Jun 11, 2020

jpiniero said:
Also no HT on the big core.

The lack of HT further complicates a 2xSNC vs 1xSNC + 4xTNT comparison. Original performance charts claimed 1 SNC core offered ~50% throughput at ~60% power versus 4 TNT cores, but these numbers were tailored for Lakefield, so in hindsight they might have lacked SMT for the SNC core.

If so, then a dual-core SNC may not look so bad in perf/watt compared to Lakefield, except maybe in idle power consumption which is arguably just as important for tablets and other small devices.
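
A quick sketch of that comparison, using only the ~50% throughput / ~60% power figures above. It assumes two SNC cores scale linearly in both throughput and power and ignores uncore and idle power, so treat it as rough arithmetic rather than a prediction; the variable names are made up.

```python
def perf_per_watt(throughput, power):
    """Throughput per unit power, both normalised to the 4x TNT cluster."""
    return throughput / power


tnt_cluster = perf_per_watt(1.0, 1.0)       # 4x TNT, the baseline
one_snc = perf_per_watt(0.5, 0.6)           # ~50% throughput at ~60% power
two_snc = perf_per_watt(2 * 0.5, 2 * 0.6)   # linear-scaling assumption

print(f"4x TNT: {tnt_cluster:.2f}")
print(f"1x SNC: {one_snc:.2f}")
print(f"2x SNC: {two_snc:.2f} (same throughput as 4x TNT at ~120% power)")
```

On those numbers a dual-core SNC without SMT would match the four TNT cores in throughput for roughly 20% more power, and any SMT gain would narrow that gap further.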
