
Samsung Exynos Thread (big.LITTLE Octa-core)

My guesses:

1. Something related to Qualcomm security for Samsung Pay? Both the US and China are supposed to get Qualcomm models. Then again, Korea gets Exynos!
2. Radios for CDMA, where Qualcomm clearly has the upper hand. China probably also has some bands where Qualcomm radios work better.
3. Better GPU.
 
I ran VFP Benchmark on my Exynos-based S7 Edge.

Looking at the results, one can deduce the AdvSIMD engine is 128-bit wide and it can run fmul/fadd/fmla every cycle (both SP and DP). This means >20 GFLOPS for SP (mul+add) and >10 GFLOPS for DP. Not bad.

What I find odd is that the MT score reaches 147 SP GFLOPS and 73 DP GFLOPS, which means a 7x speedup, while I would have expected less than 6x: with 1-2 cores loaded the M1 run @2.6 GHz, while with all 8 cores loaded the 4 M1 run @2.3 GHz and the 4 Cortex-A53 @1.6 GHz, so 4*2.3 + 4*1.6 = 15.6 = 6*2.6.
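The back-of-the-envelope peak numbers above can be sketched in Python. This is a toy model assuming one 128-bit FMA issued per cycle per core and the clocks quoted above; `peak_gflops` is a made-up helper, and the A53s almost certainly don't match the M1's per-clock throughput, so the MT figure is only the clock-scaling estimate:

```python
# Toy peak-FLOPS model: 128-bit SIMD, one fused multiply-add (2 FLOPs/lane) per cycle.
def peak_gflops(ghz, simd_bits, elem_bits, flops_per_fma=2):
    lanes = simd_bits // elem_bits
    return ghz * lanes * flops_per_fma

# One M1 core boosting to 2.6 GHz:
print(peak_gflops(2.6, 128, 32))  # ~20.8 SP GFLOPS -> matches ">20"
print(peak_gflops(2.6, 128, 64))  # ~10.4 DP GFLOPS -> matches ">10"

# All 8 cores, scaling purely with clock: 4x M1 @ 2.3 GHz + 4x A53 @ 1.6 GHz
# gives a 15.6 GHz "clock budget" = 6x a single 2.6 GHz core.
mt_sp = 4 * peak_gflops(2.3, 128, 32) + 4 * peak_gflops(1.6, 128, 32)
print(mt_sp / peak_gflops(2.6, 128, 32))  # ~6.0x expected speedup
```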

Can someone run the benchmark on an S820?
 
I ran VFP Benchmark on my Exynos-based S7 Edge.

Looking at the results, one can deduce the AdvSIMD engine is 128-bit wide and it can run fmul/fadd/fmla every cycle (both SP and DP). This means >20 GFLOPS for SP (mul+add) and >10 GFLOPS for DP. Not bad.

What I find odd is that the MT score reaches 147 SP GFLOPS and 73 DP GFLOPS, which means a 7x speedup, while I would have expected less than 6x: with 1-2 cores loaded the M1 run @2.6 GHz, while with all 8 cores loaded the 4 M1 run @2.3 GHz and the 4 Cortex-A53 @1.6 GHz, so 4*2.3 + 4*1.6 = 15.6 = 6*2.6.

Can someone run the benchmark on an S820?
[attached screenshot: VFP Benchmark results]

My scores seem kinda low.
 
What will be interesting to see is a detailed perf/power analysis of both these chips in comparison to an A72-based SoC like the one used in the Kirin 950.
The mid-range A72-based chip, the SD650 (28nm) used in the Xiaomi Redmi Note 3 (~$150-180), easily outpaces the SD808 (20nm) and does quite well against the SD810 (20nm) while not having as many thermal issues. Finally the big core has arrived in lower price ranges in some markets. SD650-based phones could make a lot of SD616/808/810-based phones irrelevant if priced right. The A53 wasn't a big enough improvement over the A7, but the A72 has definitely delivered over the A57.
 
What will be interesting to see is a detailed perf/power analysis of both these chips in comparison to an A72-based SoC like the one used in the Kirin 950.
The mid-range A72-based chip, the SD650 (28nm) used in the Xiaomi Redmi Note 3 (~$150-180), easily outpaces the SD808 (20nm) and does quite well against the SD820 (20nm) while not having as many thermal issues. Finally the big core has arrived in lower price ranges in some markets. SD650-based phones could make a lot of SD616/808/810-based phones irrelevant if priced right. The A53 wasn't a big enough improvement over the A7, but the A72 has definitely delivered over the A57.
They're already irrelevant in this part of the world. As for premium phones, I'd say only Apple atm has a death grip on that segment, and the rest (including Sammy) are falling by the wayside. I have the 2GB model of the Redmi Note 3, and for the price (closer to $140 including all the discounts and cashback one can muster) there's no better alternative out there.
 
Oops, meant SD810, not 820.
Sammy is still doing fine, sort of, with ads and other means. Chinese OEMs might make an even more significant dent in the bigger OEMs' share if they get into more markets and as awareness grows in other Asian countries apart from China, though they are already doing well in India and the like. Anyway, went way off-topic.
 
Looking at the results, one can deduce the AdvSIMD engine is 128-bit wide and it can run fmul/fadd/fmla every cycle (both SP and DP). This means >20 GFLOPS for SP (mul+add) and >10 GFLOPS for DP. Not bad.

1:2 SP to DP ratio on fmul/fmadd seems like an odd choice to me. I wonder how many are actually running FP64 code on these CPUs.

Maybe it'll help somewhere with PS2 emulation one day (not that the PS2 has FP64, but it might be useful for emulating its weird FP32 behavior).
 
1:2 SP to DP ratio on fmul/fmadd seems like an odd choice to me. I wonder how many are actually running FP64 code on these CPUs.

Maybe it'll help somewhere with PS2 emulation one day (not that the PS2 has FP64, but it might be useful for emulating its weird FP32 behavior).
Perhaps the CPU, or parts of it, are to be reused (or were to be reused, if the project was cancelled) on a desktop/server chip.
 
1:2 SP to DP ratio on fmul/fmadd seems like an odd choice to me.

How is this a choice? It's essentially a given, since you can pack double the number of SP values into a single 128-bit NEON register. A NEON operation completes 2 DP operations or 4 SP operations per cycle, best case. For Kryo and Mongoose this seems to be the case.
On the Cortex-A5, for example, even the integer datatypes need 2 cycles, but you still have the 2:1 ratio between 16-bit and 32-bit.
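The packing argument can be shown with a one-liner (lane counts only; this says nothing about how many pipelines a given core actually has):

```python
# How many elements fit in one 128-bit NEON register, per element width.
def lanes(reg_bits, elem_bits):
    return reg_bits // elem_bits

sp, dp = lanes(128, 32), lanes(128, 64)
print(sp, dp)          # 4 SP lanes vs 2 DP lanes -> the 2:1 ratio
print(lanes(128, 16))  # 8 lanes of 16-bit ints -> same 2:1 vs 32-bit
```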
 
How is this a choice? It's essentially a given, since you can pack double the number of SP values into a single 128-bit NEON register. A NEON operation completes 2 DP operations or 4 SP operations per cycle, best case. For Kryo and Mongoose this seems to be the case.
On the Cortex-A5, for example, even the integer datatypes need 2 cycles, but you still have the 2:1 ratio between 16-bit and 32-bit.

The ratio is in cycles per operation (on a total-throughput basis); it has nothing to do with data sizes. Hence why it was in response to throughput figures.

The amount of work needed for a multiplication scales quadratically with input width. You can see this in Cortex-A8 and A9 with integer NEON multiplications: 4x16-bit issues in one cycle but 2x32-bit issues in two cycles. Going from single precision to double precision floating point is even worse, because the multiplication part goes from 23 bits to 52 bits, meaning there's over five times as much work needed. But some other work is needed for FP calculations that doesn't scale as much, so it's not quite that bad in terms of overall logic increase.

The point is, if you want to support a 1:2 SP to DP work ratio, you need a lot of extra computational logic that's only for the benefit of double precision. If you go with a 1:4 ratio, you can use ~26-bit multipliers instead. Jaguar, for example, has the 1:4 ratio.

It is, however, worth noting that the Cortex-A57 and A72 are 1:2, so the precedent has already been set.
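The quadratic-scaling point can be made concrete with a toy model (partial-product bits ~ n^2, using the stored mantissa widths cited above; `mul_work` is an illustration, not a real area estimate):

```python
# Rough "multiplier work" model: an n-bit array multiplier needs ~n^2 partial-product bits.
def mul_work(bits):
    return bits * bits

sp_mant, dp_mant = 23, 52  # stored mantissa bits, SP vs DP
print(mul_work(dp_mant) / mul_work(sp_mant))  # ~5.1x more work per DP multiply

# A 1:2 SP:DP throughput ratio means 4 SP or 2 DP multiplies per cycle:
print(2 * mul_work(dp_mant) / (4 * mul_work(sp_mant)))  # ~2.6x the SP-only multiplier logic
```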
 
The amount of work needed for a multiplication scales quadratically with input width. You can see this in Cortex-A8 and A9 with integer NEON multiplications: 4x16-bit issues in one cycle but 2x32-bit issues in two cycles. Going from single precision to double precision floating point is even worse, because the multiplication part goes from 23 bits to 52 bits, meaning there's over five times as much work needed. But some other work is needed for FP calculations that doesn't scale as much, so it's not quite that bad in terms of overall logic increase.
When you only have to add two DP muladd units, the extra area is not that large compared to the rest of the CPU, especially since these DP units can also be used for SP computations with some tweaks.
 
The amount of work needed for a multiplication scales quadratically with input width.

The size of an ALU is not an issue for the cores we are talking about. What matters is circuit depth, and depth = O(log n) for multiplication and addition. So it's not bad if you are shooting for single-cycle DP multiplication and can afford the depth increase. With such an ALU you also get single-cycle 2xSP multiplication.

Of course, for smaller cores like the Cortex-A5, size is an issue.
I'm not convinced the savings in gate count for Jaguar really pay off.
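The depth-versus-area trade can also be sketched numerically (a toy O(log n) depth model for a tree multiplier, ignoring constant factors and wiring):

```python
import math

# Tree (Wallace/Dadda-style) multipliers reduce partial products in ~log2(n) levels,
# so widening from a 24-bit to a 53-bit significand barely deepens the critical path.
def tree_depth(bits):
    return math.log2(bits)

print(tree_depth(24))  # ~4.6 levels for SP (24-bit significand incl. hidden bit)
print(tree_depth(53))  # ~5.7 levels for DP -> ~25% deeper, while area grows ~5x
```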
 
I watched only the first video, and the Exynos is way more fluid but scores lower in benchmarks. Does the NAND speed make it feel faster?

It's strange but Samsung has somehow messed up Snapdragon GPU performance in their phones for a while now.

For example, the Galaxy S4 had a faster version of the SD600's CPU and GPU than the M7, but despite that it experiences stuttering when scrolling in Google Maps that doesn't exist on the M7.

When you open up Opera Browser and compare the GPU flags, you'll also see that there are GPU acceleration features that are turned off because of broken drivers only on Samsung phones.


So it doesn't surprise me that the Galaxy S7 SD820 version has issues that no other SD820 phone does.
 
Not for another month minimum. Sometimes I wish we could get devices faster but that's life.
 
Meh. Since last year Samsung has been treating the Note line like an oversized S product with a stylus. This is your real flagship, so give me a new/updated SoC, state-of-the-art 4K AMOLED, and a different design/features.
 
Meh. Since last year Samsung has been treating the Note line like an oversized S product with a stylus. This is your real flagship, so give me a new/updated SoC, state-of-the-art 4K AMOLED, and a different design/features.
Yeaa. Or get the product on the market this month.
 
Meh. Since last year Samsung has been treating the Note line like an oversized S product with a stylus. This is your real flagship, so give me a new/updated SoC, state-of-the-art 4K AMOLED, and a different design/features.
That indeed is disappointing. The only interesting part is the increased RAM 🙁

As far as the SoC goes, the Galaxy Note has had the same one as the same-generation Galaxy S: the 4210 for GN1/S2, the 4412 for GN2/S3.

We can hope for increased frequency...
 