
Samsung Exynos Thread (big.LITTLE Octa-core)

My guesses:

1. Something related to Qualcomm security for Samsung Pay? Both the US and China are supposed to get Qualcomm models. Then again, Korea gets Exynos!
2. Radios for CDMA, where Qualcomm clearly has the upper hand. China probably also has some bands where Qualcomm radios work better.
3. Better GPU.
 
I ran VFP Benchmark on my Exynos-based S7 Edge.

Looking at the results, one can deduce the AdvSIMD engine is 128-bit wide and it can run fmul/fadd/fmla every cycle (both SP and DP). This means >20 GFLOPS for SP (mul+add) and >10 GFLOPS for DP. Not bad.

What I find odd is that the MT score reaches 147 SP GFLOPS and 73 DP GFLOPS, which means a 7x speedup, while I would have expected less than 6x: with 1-2 cores loaded the M1 run @2.6 GHz, while with all 8 cores loaded the 4 M1 run @2.3 GHz and the 4 Cortex-A53 @1.6 GHz, so 4*2.3 + 4*1.6 = 15.6 = 6*2.6.
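The back-of-the-envelope peak numbers above can be sketched in Python. This is a toy model assuming one 128-bit FMA issued per cycle per core and the clocks quoted above; `peak_gflops` is a made-up helper, and the A53s almost certainly don't match the M1's per-clock throughput, so the MT figure is only the clock-scaling estimate:

```python
# Toy peak-FLOPS model: 128-bit SIMD, one fused multiply-add (2 FLOPs/lane) per cycle.
def peak_gflops(ghz, simd_bits, elem_bits, flops_per_fma=2):
    lanes = simd_bits // elem_bits
    return ghz * lanes * flops_per_fma

# One M1 core boosting to 2.6 GHz:
print(peak_gflops(2.6, 128, 32))  # ~20.8 SP GFLOPS -> matches ">20"
print(peak_gflops(2.6, 128, 64))  # ~10.4 DP GFLOPS -> matches ">10"

# All 8 cores, scaling purely with clock: 4x M1 @ 2.3 GHz + 4x A53 @ 1.6 GHz
# gives a 15.6 GHz "clock budget" = 6x a single 2.6 GHz core.
mt_sp = 4 * peak_gflops(2.3, 128, 32) + 4 * peak_gflops(1.6, 128, 32)
print(mt_sp / peak_gflops(2.6, 128, 32))  # ~6.0x expected speedup
```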

Can someone run the benchmark on an S820?
 
I ran VFP Benchmark on my Exynos-based S7 Edge.

Looking at the results, one can deduce the AdvSIMD engine is 128-bit wide and it can run fmul/fadd/fmla every cycle (both SP and DP). This means >20 GFLOPS for SP (mul+add) and >10 GFLOPS for DP. Not bad.

What I find odd is that the MT score reaches 147 SP GFLOPS and 73 DP GFLOPS, which means a 7x speedup, while I would have expected less than 6x: with 1-2 cores loaded the M1 run @2.6 GHz, while with all 8 cores loaded the 4 M1 run @2.3 GHz and the 4 Cortex-A53 @1.6 GHz, so 4*2.3 + 4*1.6 = 15.6 = 6*2.6.

Can someone run the benchmark on an S820?
[attached screenshot: VFP Benchmark results]

My scores seem kinda low.
 
What will be interesting to see is a detailed perf/power analysis of both these chips in comparison to an A72-based SoC like the one used in the Kirin 950.
The mid-range A72-based chip, the SD650 (28nm) used in the Xiaomi Redmi Note 3 (~$150-180), easily outpaces the SD808 (20nm) and does quite well against the SD810 (20nm) while not having as many thermal issues. Finally the big core has arrived in lower price ranges in some markets. SD650-based phones could make a lot of SD616/808/810-based phones irrelevant if priced right. The A53 wasn't a big enough improvement over the A7, but the A72 has definitely delivered over the A57.
 
What will be interesting to see is a detailed perf/power analysis of both these chips in comparison to an A72-based SoC like the one used in the Kirin 950.
The mid-range A72-based chip, the SD650 (28nm) used in the Xiaomi Redmi Note 3 (~$150-180), easily outpaces the SD808 (20nm) and does quite well against the SD820 (20nm) while not having as many thermal issues. Finally the big core has arrived in lower price ranges in some markets. SD650-based phones could make a lot of SD616/808/810-based phones irrelevant if priced right. The A53 wasn't a big enough improvement over the A7, but the A72 has definitely delivered over the A57.
They're already irrelevant in this part of the world. As for premium phones, I'd say only Apple atm has a death grip on that segment, and the rest (including Sammy) are falling by the wayside. I have the 2GB model of the Redmi Note 3, and for the price (closer to $140 including all the discounts and cashback one can muster) there's no better alternative out there.
 
Oops, meant SD810, not 820.
Sammy is still doing fine, sort of, with ads and other means. Chinese OEMs might make an even more significant dent in the bigger OEMs' share if they get into more markets and as awareness grows in other Asian countries apart from China, though they are already doing well in India and the like. Anyway, went way off-topic.
 
Looking at the results, one can deduce the AdvSIMD engine is 128-bit wide and it can run fmul/fadd/fmla every cycle (both SP and DP). This means >20 GFLOPS for SP (mul+add) and >10 GFLOPS for DP. Not bad.

1:2 SP to DP ratio on fmul/fmadd seems like an odd choice to me. I wonder how many are actually running FP64 code on these CPUs.

Maybe it'll help somewhere with PS2 emulation one day (not that the PS2 has FP64, but it might be useful for emulating its weird FP32 behavior).
 
1:2 SP to DP ratio on fmul/fmadd seems like an odd choice to me. I wonder how many are actually running FP64 code on these CPUs.

Maybe it'll help somewhere with PS2 emulation one day (not that the PS2 has FP64, but it might be useful for emulating its weird FP32 behavior).
Perhaps the CPU, or parts of it, are to be reused (or were to be reused, if the project was cancelled) on a desktop/server chip.
 
1:2 SP to DP ratio on fmul/fmadd seems like an odd choice to me.

How is this a choice? It's essentially a given, since you can pack double the number of SP values into a single 128-bit NEON register. A NEON operation completes 2 DP operations or 4 SP operations per cycle, best case. For Kryo and Mongoose this seems to be the case.
On the Cortex-A5, for example, even the integer datatypes need 2 cycles, but you still have the 2:1 ratio between 16-bit and 32-bit.
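The packing argument can be shown with a one-liner (lane counts only; this says nothing about how many pipelines a given core actually has):

```python
# How many elements fit in one 128-bit NEON register, per element width.
def lanes(reg_bits, elem_bits):
    return reg_bits // elem_bits

sp, dp = lanes(128, 32), lanes(128, 64)
print(sp, dp)          # 4 SP lanes vs 2 DP lanes -> the 2:1 ratio
print(lanes(128, 16))  # 8 lanes of 16-bit ints -> same 2:1 vs 32-bit
```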
 
How is this a choice? It's essentially a given, since you can pack double the number of SP values into a single 128-bit NEON register. A NEON operation completes 2 DP operations or 4 SP operations per cycle, best case. For Kryo and Mongoose this seems to be the case.
On the Cortex-A5, for example, even the integer datatypes need 2 cycles, but you still have the 2:1 ratio between 16-bit and 32-bit.

The ratio is in cycles per operation (on a total-throughput basis); it has nothing to do with data sizes. Hence why it was in response to throughput figures.

The amount of work needed for a multiplication scales quadratically with input width. You can see this in Cortex-A8 and A9 with integer NEON multiplications: 4x16-bit issues in one cycle but 2x32-bit issues in two cycles. Going from single precision to double precision floating point is even worse, because the multiplication part goes from 23 bits to 52 bits, meaning there's over five times as much work needed. But some other work is needed for FP calculations that doesn't scale as much, so it's not quite that bad in terms of overall logic increase.

The point is, if you want to support a 1:2 SP to DP work ratio, you need a lot of extra computational logic that's only for the benefit of double precision. If you go with a 1:4 ratio, you can use ~26-bit multipliers instead. Jaguar, for example, has the 1:4 ratio.

It is, however, worth noting that the Cortex-A57 and A72 are 1:2, so the precedent has already been set.
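The quadratic-scaling point can be made concrete with a toy model (partial-product bits ~ n^2, using the stored mantissa widths cited above; `mul_work` is an illustration, not a real area estimate):

```python
# Rough "multiplier work" model: an n-bit array multiplier needs ~n^2 partial-product bits.
def mul_work(bits):
    return bits * bits

sp_mant, dp_mant = 23, 52  # stored mantissa bits, SP vs DP
print(mul_work(dp_mant) / mul_work(sp_mant))  # ~5.1x more work per DP multiply

# A 1:2 SP:DP throughput ratio means 4 SP or 2 DP multiplies per cycle:
print(2 * mul_work(dp_mant) / (4 * mul_work(sp_mant)))  # ~2.6x the SP-only multiplier logic
```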
 
The amount of work needed for a multiplication scales quadratically with input width. You can see this in Cortex-A8 and A9 with integer NEON multiplications: 4x16-bit issues in one cycle but 2x32-bit issues in two cycles. Going from single precision to double precision floating point is even worse, because the multiplication part goes from 23 bits to 52 bits, meaning there's over five times as much work needed. But some other work is needed for FP calculations that doesn't scale as much, so it's not quite that bad in terms of overall logic increase.
When you only have to add two DP muladd units, the extra area is not that large compared to the rest of the CPU, especially since these DP units can also be used for SP computations with some tweaks.
 
The amount of work needed for a multiplication scales quadratically with input width.

The size of an ALU is not an issue for the cores we are talking about. What matters is circuit depth, and depth = O(log n) for multiplication and addition. So it's not bad if you are shooting for single-cycle DP multiplication and can afford the depth increase. With such an ALU you also get single-cycle 2xSP multiplication.

Of course, for smaller cores like the Cortex-A5, size is an issue.
I'm not convinced the savings in gate count for Jaguar really pay off.
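The depth-versus-area trade can also be sketched numerically (a toy O(log n) depth model for a tree multiplier, ignoring constant factors and wiring):

```python
import math

# Tree (Wallace/Dadda-style) multipliers reduce partial products in ~log2(n) levels,
# so widening from a 24-bit to a 53-bit significand barely deepens the critical path.
def tree_depth(bits):
    return math.log2(bits)

print(tree_depth(24))  # ~4.6 levels for SP (24-bit significand incl. hidden bit)
print(tree_depth(53))  # ~5.7 levels for DP -> ~25% deeper, while area grows ~5x
```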
 
I watched only the first video, and the Exynos is way more fluid but scores lower in benchmarks. Does the NAND speed make it feel faster?

It's strange but Samsung has somehow messed up Snapdragon GPU performance in their phones for a while now.

For example, the Galaxy S4 had a faster version of the SD600's CPU and GPU than the M7, but despite that it experiences stuttering when scrolling in Google Maps that doesn't exist on the M7.

When you open up Opera Browser and compare the GPU flags, you'll also see that there are GPU acceleration features that are turned off because of broken drivers only on Samsung phones.


So it doesn't surprise me that the Galaxy S7 SD820 version has issues that no other SD820 phone does.
 
Not for another month minimum. Sometimes I wish we could get devices faster but that's life.
 
Meh. Since last year Samsung has been treating the Note line like an oversized S product with a stylus. This is your real flagship, so give me a new/updated SoC, state-of-the-art 4K AMOLED, and a different design/features.
 
Meh. Since last year Samsung has been treating the Note line like an oversized S product with a stylus. This is your real flagship, so give me a new/updated SoC, state-of-the-art 4K AMOLED, and a different design/features.
Yeaa. Or get the product on the market this month.
 
Meh. Since last year Samsung has been treating the Note line like an oversized S product with a stylus. This is your real flagship, so give me a new/updated SoC, state-of-the-art 4K AMOLED, and a different design/features.
That indeed is disappointing. The only interesting part is the increased RAM 🙁

As far as the SoC goes, the Galaxy Note has had the same one as the same-generation Galaxy S: the 4210 for GN1/S2, the 4412 for GN2/S3.

We can hope for increased frequency...
 