The disappointing slowdown CPU progress in last 6 years vs 4 years before (10 yrs) ago

Thala · Jun 12, 2019

ondma said:
That seems like a gross oversimplification. Would be easy to change the TDP and take over the desktop market as well as the mobile one. As another poster said, it is basically an apples to oranges comparison, since they dont run x86.

I never said it is trivial to take over the desktop market, as the single biggest issue is to overcome is the ecosystem barrier. If we ignore this ecosystem issue for a second it is very much apples to apples.
And while it is technically easy to increase the TDP target, noone would buy such a product at the moment. Anyone buying a desktop class computer are expecting to run desktop class applications, which simply are not exisiting outside x86 - at least commercial ones. For open source its much easier, as you can just re-compile.

In addition, the ARM architectures achieving higher IPC at much lower gate count. A Cortex A76 for example - while having similar performance per clock as Skylake - does this using roughly 1/3 of chip area (comparing core vs core).

IntelUser2000 · Jun 12, 2019

Thala said:
Yeah sure, why not?. All the mainstream ARM architectures (ARM Cortex A77, Exynos 9820?, Apple A12) have surpassed Intel and AMD with respect to IPC.

You don't really believe that other than for the A12 do you?

Thala · Jun 12, 2019

IntelUser2000 said:
You don't really believe that other than for the A12 do you?

Hmm? You can run the numbers yourself and come to the same conclusion, there is nothing secret here. A12 is quite a bit ahead though.

IntelUser2000 · Jun 12, 2019

Thala said:
Hmm? You can run the numbers yourself and come to the same conclusion, there is nothing secret here. A12 is quite a bit ahead though.

Exynos 9820 is roughly equal to Cortex A76. It does better in Geekbench but falls back down to A76 levels in most other benchmarks. Skylake parts are still few % ahead and A77 is still a part that can't be bought.

I guess the A76 is ahead of Zen/Zen+.

Yeroon · Jun 12, 2019

A C2D wasn't the highest core count cpu you could get on lga775 at that time, so that kinda makes your comparison incorrect.
Compare a C2Q to the 3770k and see how those numbers stack up.

JasonLD · Jun 12, 2019

For comparison's sake. Had to find clockspeed as close as possible to match those SoCs.

Nothingness · Jun 13, 2019

PotatoWithEarsOnSide said:
I meant, the CPUs from 10 years ago were also of lower performance ergo the performance gains were easier to make.
Asking for exponential growth is simply unrealistic. Going from 1 to 2 is a 100% improvement, yet going from 2 to 3 is only a 50% improvement despite the fact that the gain is exactly the same amount in real terms.

Ha then we completely agree. The exponential growth is unsustainable and I just meant to say Intel reached the plateau earlier (at some point their improvement each year was more than what the competition was achieving) than others which are now catching up.

Nothingness · Jun 13, 2019

IntelUser2000 said:
Yea, you can just look at Intel side and see this. 18% with Sunny Cove? What about 30-40% with Goldmont Plus? And 30-40% with Goldmont?

I was talking about high-end, and so didn't consider Goldmont. Goldmont is the perfect example of starting low and getting higher fast.

So Intel achieved 18% with Sunny Cove. Against Skylake which was released in 2015. Do I really need to remind you what ARM achieved during that time? And what AMD achieved?

OTOH, as I have already said multiple times during the last years, I'm impressed Intel still finds that many things to improve. And they may prove me wrong about the plateau effect in the coming years

Flayed · Jun 13, 2019

IntelUser2000 said:
There's no way a 3770K is 110% faster in single thread over E8400. At best, maybe 50% faster in average.

I couldn't help myself I had to look it up in bench. It does look a whole hell of a lot faster, see for yourself.
https://www.anandtech.com/bench/product/551?vs=54

Thala · Jun 13, 2019

IntelUser2000 said:
Exynos 9820 is roughly equal to Cortex A76. It does better in Geekbench but falls back down to A76 levels in most other benchmarks. Skylake parts are still few % ahead and A77 is still a part that can't be bought.

I guess the A76 is ahead of Zen/Zen+.

Now thats not different from my original statement. I did not mention Cortex A76 as i would as well estimate it somewhere between Zen/Zen+ and Skylake but for Cortex A77 we are on the safe side given the huge reported uplifts.

B-Riz · Jun 13, 2019

unseengundam101 said:
I have been comparing my speed increase from the Core2 Duo E8400 in 2008 to fasted CPU I got, a 3770K from 2012.

In UserBenchmark, I see a 440% increase in Mixed Multi Core, 110% in Single Core, and 275% in average user bench, 209% Effective.

How compare 6 years ahead (to CPUs in 2018) my 3770K vs i9-9000KF, I see a 182% MC increase, only 27% Single Core, 92% Average User, and 62% Effective

3770k vs 2018's Ryzen 2700x is even worse with 141% MC, measly 12% Single Core, 60% user, and 33% effective.

Ryzen Zen 2 (3950x) just might be able to get to 300% MC, but I doubt it will hit my 440%. Also, that Single core I am excepting only a 30% bump, no where close to jaw dropping 110% increase from 08 to 12, in just 4 years.

I am expecting such a slowdown in CPU progress that after my next CPU upgrade in 1-2 years I bet I could stick with for 1 or 2 decades (10-20 yrs)...

You do realize with the release of R7 1700 that AMD gave us affordable mini servers on the desktop?

Look at how modern OS's run, with a lot of stuff virtualized for security. (And needing lots of threads)

The future was cast 2 years ago at Zen launch, more cores, cheaper.

Threads are the currency of computing now.

IntelUser2000 · Jun 13, 2019

Nothingness said:
I was talking about high-end, and so didn't consider Goldmont. Goldmont is the perfect example of starting low and getting higher fast.

So Intel achieved 18% with Sunny Cove. Against Skylake which was released in 2015. Do I really need to remind you what ARM achieved during that time? And what AMD achieved?

I didn't disagree with you. In the big picture I believe there's little difference. If you are at top, anything you find from there you are breaking new ground. So up to a certain point, if you are behind, it can be said you are following the path taken by the leader. Whether its Goldmont, or ARM cores, it matters little.

I'm not sure if its right saying 18% is all they could have got out in the 4 years. They were so messed up during that time, that absolutely no contingency plan was in place in case the 10nm failed. The original Icelake should have been late 2017, and we should have been seeing a core after Tigerlake this year. Using historical execution of a company to judge limits of scalar performance does not seem right.

Even Skylake's gain may have been hindered by the problems they were having with 14nm process. 10nm was such a disaster, everyone forgot that it started with 14nm.

I couldn't help myself I had to look it up in bench. It does look a whole hell of a lot faster, see for yourself.
https://www.anandtech.com/bench/product/551?vs=54

Out of all the benchmarks there, only Sysmark and Cinebench 1T counts as ST tests(actually Sysmark takes advantage of 2 cores, but no more) that can be see as uarch gains. The rest is due to double the cores plus hyperthreading.

PotatoWithEarsOnSide · Jun 13, 2019

I don't think it is wrong to compare a horse-drawn cart with an internal combustion engine. Even if the earliest cars couldn't move much faster than their counterparts, they certainly get you further throughout the day.
Just because the gains aren't in single thread it doesn't mean that the gains aren't worthwhile. We're at a crossroads at which the future will look back more favourably at the move to multithread performance.
As I said previously, going 16c on the mainstream is a big thing.

amrnuke · Jun 13, 2019

Thala said:
Yeah sure, why not?. All the mainstream ARM architectures (ARM Cortex A77, Exynos 9820?, Apple A12) have surpassed Intel and AMD with respect to IPC. The only thing holding actual implementation of these architectures back in terms of desktop computing performance are the moderate clock frequencies as direct consequence of low TDP and hence low Voltage and high-density/high-Vth cell library mix.

That's oversimplifying things a great deal.

First, bi-endian mobile processor at low clocks and low TdP and small die has a far different development strategy from a little-endian desktop/prosumer 4/6/8/12/16 core processor that requires high base and boost clocks for heavy processing work, etc. ARM has been focusing on power efficiency. x86 has been focusing on processing power. So the two processes are fairly far diverged, and while the IPC are similar, good luck finding a current ARM processor that can overclock to 4 GHz to make those IPC similarities useful in the real world of desktop computing.

Second, there's a very good reason why multi-tasking is so limited even on the iPad - it's because engineering for multi-threaded performance is a lot different from the process for single-threaded performance. If it were so easy to toss an A12 into a laptop, Apple would have already done it. Instead, they might be doing it (not with the A12, but with another ARM-based solution) in 2020, because it will require a reworking of macOS, Xcode, etc and they'll have to bake in backward compatibility like they were doing with Rosetta. It's not a small feat moving from x86 to ARM, though it's nice that they've had a lot of experience moving iOS toward simple multitasking on the iPad, so they can use that experience on the laptops. But it's going to be a lot of work. When I'm driving, my wife's iPhone with an A12 can play music, give me driving directions, and have a browser or Facebook open or Candy Crush. But the amount of background work that can be done while doing anything in the active forefront is really limited. On my home PC I can be transcoding Plex movies my kids are watching, have Youtube videos playing on one screen, and doing photo processing on the other screen with tutorials for photo editing open in the background in the browser, without a hitch. I am not sure an A12 can handle that at this point. But we don't know, because there's no real way to do these things on the A12.

Third, regarding performance on desktops, which are far more multitasking oriented, even if you look at AMD vs Intel multithreaded performance, you can see quite large differences in performance. Scaling up performance with cores is not easy. It's not like they can easily just ramp up the clock speeds and cores and it'll just work. So while the IPC are on-par with x86, the ability to increase clock speeds means they need to start redesigning their chip topology for power delivery to support those frequencies, take into consideration heat dissipation, develop a good architecture for inter-core communication which is going to be different for a 4-core or higher architectures than their A77x2 + A55x6 designs even. Their ISAs are entirely different, and optimizations for speed with x86 ISAs like SSE3 and so on mean x86 has a huge advantage for real-world tasks, and for ARM to reach on-par performance will take a lot more than simply gluing together cores and adding a beefy cooling solution.

Finally, there has to be software to run on the ARM processor. Windows on ARM is still a big issue. Even though chromium OS is compatible with ARM, ARM processors are still slow enough that few manufacturers are even attempting to use them in their Chrombooks or Chromeblets, and even when they do, they're clearly laggards in performance compared to similarly priced Intel offerings. Ubuntu has ARM-ready OS, but that's such a narrow use-case that I doubt it makes much of a difference. If/when Apple move macOS to ARM, that'll be a defining moment.

tl;dr: This is so much more complex than just increasing the TDP and clock frequencies.

Topweasel · Jun 13, 2019

Nothingness said:
Starting from lower perf made their task easier

Intel had such a lead... I wonder if they can still find a lot of perf without killing efficiency. I guess conventional out of order CPU are reaching a plateau; most of them now have very similar micro-architectures.

We can only hope there will be a breakthrough but I'd be surprised if it didn't require heavy changes in software and programming languages.

Well on the other end. Let's say AMD started Zen at SB levels. They might have just hopped over Skylake in just one arch change and 26 months after release. Intel is on their third revision 7+ years later.

The jump from BD to Zen was helped by their poor starting point. This one is pretty legitimate.

DrMrLordX · Jun 13, 2019

JasonLD said:
For comparison's sake. Had to find clockspeed as close as possible to match those SoCs.

Those A12 results still scare me a little bit. But I digress.

It's impressive to see ARM's gains. Sadly, it's still difficult to get the latest ARM cores (such as A76) in anything but a phone. Maybe 8cx laptops will change that. With proper SVE support, I think ARM could make a splash. Currently it is difficult to find anything faster than A72 in an SBC or any other form factor that can run a proper desktop OS even for tinkering.

Thala · Jun 13, 2019

amrnuke said:
That's oversimplifying things a great deal.

First, bi-endian mobile processor at low clocks and low TdP and small die has a far different development strategy from a little-endian desktop/prosumer 4/6/8/12/16 core processor that requires high base and boost clocks for heavy processing work, etc. ARM has been focusing on power efficiency. x86 has been focusing on processing power. So the two processes are fairly far diverged, and while the IPC are similar, good luck finding a current ARM processor that can overclock to 4 GHz to make those IPC similarities useful in the real world of desktop computing.

First - I assume you have no idea what you are talking about. In any case any modern ARM architecture can clock north of 4GHz.
Reason: Given same cell library mix and two architectures if you are matching the frequency at one voltage point you are going to match frequency at other voltage points. So unless you can show that Skylake reaches 2.5GHz+ @ 0.75V your points are moot. Last time i checked the achievable frequencies at 0.75V are rather trending below 2Ghz.

Second - not sure what your claim here is. Are you somehow implying that architecturally ARM is less suited to multitasking? Thats absurd.

Third - gets even more confusing. Are you saying you need to take the scaling of interconnect and communication network into consideration when designing a many-core CPU? That is indeed the case, who would have guessed that! ARM has coherent mesh network IP with CHI/ACE interfaces for this readily available for its OEMs to use. Not sure if you meant this by "glueing together"... Typically with ARM you "glue together" up to 8 cores in a DSU cluster and then you "glue together" these clusters with the coherent mesh network (e.g. CMN-600), optionally adding a last level cache of up to 128MByte and finally you "glue" this with the memory controllers for up to 8 DDR channels.

VirtualLarry · Jun 13, 2019

Thala said:
First - I assume you have no idea what you are talking about. In any case any modern ARM architecture can clock north of 4GHz.

"architecture"... maybe. But are there any actual real-world 4Ghz ARM implementations? I didn't think so. So at those point, your supposition is merely hypothetical to me. (Granted, I'm not a chip engineer, and if you are, then your comments would hold more weight. But if ARM CAN clock to 4Ghz, why haven't we seen it?)

Edit: Forgot to mention, remember, "architecturally", the Pentium 4 was able to go to 10Ghz. We all know how that ended up.

scannall · Jun 13, 2019

VirtualLarry said:
"architecture"... maybe. But are there any actual real-world 4Ghz ARM implementations? I didn't think so. So at those point, your supposition is merely hypothetical to me. (Granted, I'm not a chip engineer, and if you are, then your comments would hold more weight. But if ARM CAN clock to 4Ghz, why haven't we seen it?)

Don't confuse ISA with architecture. In x86 for example, we have Core, Bulldozer, Ryzen, Atom, Nano etc. Those are all architectures in the x86 ISA. You can make an architecture for any ISA that clocks high, provided the process will allow it. I'm fairly certain that Apple's design would clock plenty high, with enough power applied to it for example. All the parts are there for a high performance CPU. As a guess, we'll see the A series chips in low end laptops from them in 2021ish.

JasonLD · Jun 13, 2019

VirtualLarry said:
"architecture"... maybe. But are there any actual real-world 4Ghz ARM implementations? I didn't think so. So at those point, your supposition is merely hypothetical to me. (Granted, I'm not a chip engineer, and if you are, then your comments would hold more weight. But if ARM CAN clock to 4Ghz, why haven't we seen it?)

I think it might be more of a financial than architecture limitation. It would be huge financial risk to invest on a desktop ARM architecture without the guarantee that they would ever recoup their investment back. I assume Arm server chips wouldn't need a major departure in terms architecture from its mobile siblings since high number of cores are more desirable than high single core performance, so just need to work on cache and interconnect without making significant architecture adjustment.

With Desktop market shrinking every year, I don't think any company would take a risk of making Arm desktop chip.

Carfax83 · Jun 13, 2019

VirtualLarry said:
"architecture"... maybe. But are there any actual real-world 4Ghz ARM implementations? I didn't think so. So at those point, your supposition is merely hypothetical to me. (Granted, I'm not a chip engineer, and if you are, then your comments would hold more weight. But if ARM CAN clock to 4Ghz, why haven't we seen it?)

Yeah I have to agree. I'm tired of hearing about how much more efficient and performant ARM is compared to x86. If it's so much better, put their damn money where their mouth is and develop solutions for desktop/workstation class applications!

IntelUser2000 · Jun 13, 2019

JasonLD said:
With Desktop market shrinking every year, I don't think any company would take a risk of making Arm desktop chip.

The biggest issue of having non-x86 desktop is that you end up having Windows on it. Even Windows 10 isn't up to par with how good Windows 7 was. Why they thought having massive tiles replace icons is completely beyond me. It's being different for the sake of being different. It isn't just Intel with Atom that screwed up. WinTel screwed up.

If you want a mobile consumption oriented device, its iOS or Android, period.

Tup3x · Jun 14, 2019

IntelUser2000 said:
The biggest issue of having non-x86 desktop is that you end up having Windows on it. Even Windows 10 isn't up to par with how good Windows 7 was. Why they thought having massive tiles replace icons is completely beyond me. It's being different for the sake of being different. It isn't just Intel with Atom that screwed up. WinTel screwed up.

If you want a mobile consumption oriented device, its iOS or Android, period.

Firstly, Win 7 being superior to Win 10 is a matter of personal preference. The bolded part shows that you are oblivious how Windows 10 start menu works. You should take a second look since things have changed quite a bit since launch.

Thala · Jun 14, 2019

VirtualLarry said:
"architecture"... maybe. But are there any actual real-world 4Ghz ARM implementations? I didn't think so. So at those point, your supposition is merely hypothetical to me. (Granted, I'm not a chip engineer, and if you are, then your comments would hold more weight. But if ARM CAN clock to 4Ghz, why haven't we seen it?)
Edit: Forgot to mention, remember, "architecturally", the Pentium 4 was able to go to 10Ghz. We all know how that ended up.

Just to clarify a few things. With (micro)-architecture i mean the concrete soft IP provided by ARM as RTL. An implementation of this architecture is for instance the concrete implementation of the Cortex A76 in the Snapdragon 855.
So implementation is mostly the back-end work starting with the design-compiler, which maps the RTL to the actual cell library. At this point you already have quite a few options, with TSMC N7 for instance you have the option of 3 different Vths (low Vth = fast but high leakage) in combination with different cell-sizes (larger cell = higher drive current - faster but bad for power). Then you do the layout and sign-off the timing. Thats an iterative process where you might need to add buffers or need to re-layout based on timing analysis - which tells you the slack in ps for all setup and hold violations in your design. When doing timing-signoff you are choosing a corner and a voltage. If you dont do any binning etc. you would choose worst conditions and 0.75V for instance - worst conditions (for setup-time) are for example 90 degree, slowest cells, 10% reduced Vmin. For hold-violation you would choose a higher voltage and lower temperature of course. I doubt that any mobile design is tested against hold violations above 1V anyway - why should it?

To make a long story short - current real world implementation wont reach frequencies 4-5GHz due to the implementation decisions described above. You most likely even running in hold violations when trying to increase the voltage, which would be necessary to reach 4+Ghz. Nominal voltage for TSMC N7 is 0.75V - anything above 0.85 is considerd overdrive. For mobile design you pretty much stay around nominal voltage levels and even go below for low load use-cases - while trying to keep leakage in place by chosing mostly slow cells.

However it is possible to just take the architecture from ARM and make a 4+Ghz implementation. There is nothing you would have to change at micro-architecture level. Just make your back-end work, targetting something like 1.4V and there you go.

Now regarding the question, why havent we seen it? Because what would you do with an ARM desktop processor at the moment, whom would you want to sell such a processor to? The desktop community ís more sensitive to the applications they want to run than to power or even number of cores. You would somehow need to justify the investment. Do you expect anyone to tape-out an 4GHz+ ARM CPU just for demonstration purpose?

For ARM OEMs targetting the server market is much more promising, as you wont have the big ecosystem/application barrier. And for serves targetting a high single-core turbo frequency is mostly irrelevant.

Carfax83 · Jun 14, 2019

Thala said:
However it is possible to just take the architecture from ARM and make a 4+Ghz implementation. There is nothing you would have to change at micro-architecture level. Just make your back-end work, targetting something like 1.4V and there you go.

But would the performance continue to scale linearly up to such a high frequency without any architectural changes? I'm no chip engineer or industry professional, but I thought that clockspeed was just one factor of performance, and that no architecture can continue to scale indefinitely regardless of whether it's a CPU or GPU as they run into other problems like memory bandwidth limitations for instance and start to taper off regardless of how high the frequency is. I know NVidia's Pascal has a sweet spot in terms of optimal operating frequency, and I assume it's the same for CPUs.

The disappointing slowdown CPU progress in last 6 years vs 4 years before (10 yrs) ago

Golden Member

Elite Member

Golden Member

Elite Member

Member

Senior member

Diamond Member

Diamond Member

Senior member

Golden Member

Golden Member

Elite Member

Senior member

Golden Member

Diamond Member

Lifer

Golden Member

No Lifer

Golden Member

Senior member

Diamond Member

Elite Member

Golden Member

Golden Member

Diamond Member