The disappointing slowdown CPU progress in last 6 years vs 4 years before (10 yrs) ago

Mr Evil · Jun 14, 2019

cellarnoise said:
...2 bit / on / off switches are also holding us back..

As possible improvements in performance from fabrication advances dry up, maybe we will see a reappearance of ternary computers, which are potentially faster and more efficient than our simple binary machines.

DrMrLordX · Jun 14, 2019

Mr Evil said:
As possible improvements in performance from fabrication advances dry up, maybe we will see a reappearance of ternary computers, which are potentially faster and more efficient than our simple binary machines.

Psst, the Virtual Adepts have been using those since the 1990s. Not that any of you Sleepers would know about that stuff.

IntelUser2000 · Jun 14, 2019

Tup3x said:
Firstly, Win 7 being superior to Win 10 is a matter of personal preference. The bolded part shows that you are oblivious how Windows 10 start menu works.

Windows 10 still has annoying issues. Control Panel/Settings means needing two different things for doing essentially the same thing. Latest version updates require going through unnecessary steps to allow simple file transfer over LAN.

I have a touch laptop with Windows 10 on it. It's still the same, way too large Tile system. Existence of Android, iOS, and Chrome negates the need for Windows versions that sacrifice compatibility, because they do a better job elsewhere.

10 has advantages, but it comes with disadvantages. 3 steps forward, 2 steps back.

PotatoWithEarsOnSide · Jun 14, 2019

The tile system only exists in one place, and you can remove it.
Personally, not a fan of app-style layouts, but Windows 10 layout is hardly annoying.

IntelUser2000 · Jun 14, 2019

PotatoWithEarsOnSide said:
The tile system only exists in one place, and you can remove it.
Personally, not a fan of app-style layouts, but Windows 10 layout is hardly annoying.

The Tile system is what you'll use if you have a touchscreen device, because its better than not having one.

However its far from optimal. I don't see why they had to use the Tile interface when the two dominant mobile OSes don't do this and theirs is basically an icon based system(used in Windows for decades) optimized for touch.

Companies are run by people, and they shadow how the leaders think. It has to do with ego. They wanted it unique and dominate with it, but giving no thought to it.

Thala · Jun 14, 2019

Carfax83 said:
But would the performance continue to scale linearly up to such a high frequency without any architectural changes? I'm no chip engineer or industry professional, but I thought that clockspeed was just one factor of performance, and that no architecture can continue to scale indefinitely regardless of whether it's a CPU or GPU as they run into other problems like memory bandwidth limitations for instance and start to taper off regardless of how high the frequency is. I know NVidia's Pascal has a sweet spot in terms of optimal operating frequency, and I assume it's the same for CPUs.

All the parts, which you linearly scaling up in frequency will scale linearly, obviously. For example a benchmark like Dhrystone, which running completely from L1$ scales linearly - always. More complex code, where you have misses in the last level caches will not scale linearly anymore because memory latency is not going down with higher frequency. To compensate you decrease miss rate by increasing cache size. In effect you are waiting more clock cycles for each miss but having less misses.

GPUs are little different here, as they have relatively small caches but working mostly on streaming data with very small temporal locality. For streaming data increasing caches wont help anyway...you need to compensate by increasing bandwidth.

Tup3x · Jun 14, 2019

IntelUser2000 said:
The Tile system is what you'll use if you have a touchscreen device, because its better than not having one.

However its far from optimal. I don't see why they had to use the Tile interface when the two dominant mobile OSes don't do this and theirs is basically an icon based system(used in Windows for decades) optimized for touch.

Companies are run by people, and they shadow how the leaders think. It has to do with ego. They wanted it unique and dominate with it, but giving no thought to it.

I don't get it what so bad about those tiles. They're just extra info, that's it. My workflow is the same as always: press windows key and start typing what I want. The app list is instantly there when I open start menu too. It functions the same as always.

It is not great that you can do the same thing in two different places. They have been moving settings slowly to new settings application but there are still quite a few things that you can't do there. It's weird how long it takes to just move all those old things to new place. There are inconsistency issues (old and new mixed) which is annoying. Granted, those are mostly "change few times a year" stuff but still.

IntelUser2000 · Jun 14, 2019

Tup3x said:
I don't get it what so bad about those tiles.

It is not great that you can do the same thing in two different places. They have been moving settings slowly to new settings application but there are still quite a few things that you can't do there.

I hate the Settings application. It's a panicked response to Android and iOS. Edge has the same problem. They cut features so much that you need to do more to do the same task. That's needed in screen-space constrained devices like a 5-inch screen Smartphone. You don't want that in a 20-inch monitor.

Tiles are not terrible by itself but things are always relative and icon-based systems are better.

Especially in Windows systems where you might have so many of them, the Tiles take way too much space, and all the information is very distracting. You can't easily tell what's what without reading it because they are all brightly colored squares. It was a very lazy way of doing things as if it was a 3rd party extension on top of Windows 7.

I had a Windows 95 system with Plus! pack installed. The Plus pack had "internet" mode where you can change all icons so it opens in a single click.

The Tablet mode should have been single-click icons, and the gestures they introduced with Windows 8 could have been to ease multi-tasking on touch mode.

A streamlined version of http://osxdaily.com/2010/02/28/the-ipad-evolves-into-the-iboard-and-the-imat/

is what Windows 8-10 should have been.

Microsoft managers should have swallowed their pride and adopted Android/iOS UI because by the time they realized they were late, they were too late all you can do by that point is follow. Mobile OSes took the best of Windows called Icons so its not really copying either.

Carfax83 · Jun 14, 2019

Thala said:
All the parts, which you linearly scaling up in frequency will scale linearly, obviously. For example a benchmark like Dhrystone, which running completely from L1$ scales linearly - always. More complex code, where you have misses in the last level caches will not scale linearly anymore because memory latency is not going down with higher frequency. To compensate you decrease miss rate by increasing cache size. In effect you are waiting more clock cycles for each miss but having less misses.

But what about pipeline depth and all that? From what I recall, the A12x is a very wide CPU, like 7 or 8 issue wide or some such. Does that not make scaling to higher clock speeds much more difficult or outright impossible?

If very wide designs like the Apple A12 series are so successful and performant, what's to stop Intel or AMD from making similar designs? I think Skylake is like 4 wide issue, and Icelake is supposed to bump it up to 5 wide. Zen is 4 issue wide, and Zen 2 is still at 4 I think.

But anyway, it seems that Intel and AMD prefer more balanced designs that can also hit higher clock speeds with ease.

Thala · Jun 14, 2019

Carfax83 said:
But what about pipeline depth and all that? From what I recall, the A12x is a very wide CPU, like 7 or 8 issue wide or some such. Does that not make scaling to higher clock speeds much more difficult or outright impossible?

An architecture which runs at low voltage at a certain frequency will just scale - as this is a process property and not an architecture property. Its not like A12 need 1.4V in order to achieve its frequency of 2.5GHz. Corollary if you reduce voltage of Skylake you will end up at very similar frequencies.

You notice this, when your PCs CPU is reducing voltage due to power or thermal throttling - it also needs to reduce frequency (it does not reduce frequency just for fun so to say) - the power controller typically choses the highest feasible frequency for that voltage.

From this you can conclude, that the critical logic path (in terms of gate depth and length) between 2 registers is pretty similar between both architectures.

If very wide designs like the Apple A12 series are so successful and performant, what's to stop Intel or AMD from making similar designs? I think Skylake is like 4 wide issue, and Icelake is supposed to bump it up to 5 wide. Zen is 4 issue wide, and Zen 2 is still at 4 I think.

Thats simply an architectural decision. Making a CPU wider increases area and power - and here Intel already went borderline huge for 14nm. Thats why its always interesting how much area is needed to achieve a certain IPC - here ARM shows huge advantages. As i have written elsewhere, Cortex A76 has IPC somewhere between Zen and Skylake but is 3 times smaller (normalized to the same process node).

naukkis · Jun 15, 2019

Carfax83 said:
But what about pipeline depth and all that? From what I recall, the A12x is a very wide CPU, like 7 or 8 issue wide or some such. Does that not make scaling to higher clock speeds much more difficult or outright impossible?

Different ISA's. Skylake instruction fetch is actually 320 bits vs 224 bits in A12, but in instruction that's 7 for A12 and up to 5 with Skylake as ARM instructions are fixed 32bits and x86 instructions are varying lengths least 64 bits. ISA's do matter....

IntelUser2000 · Jun 16, 2019

Thala said:
As i have written elsewhere, Cortex A76 has IPC somewhere between Zen and Skylake but is 3 times smaller (normalized to the same process node).

Apple made some very good choices, and their execution is stellar. Lower pipeline stages that are wider may be inevitable for everyone though.

Why?

Consider that the air cooled limit is ~5GHz. This is where it needs absolute bleeding edge optimization techniques to achieve the frequencies. It would result in leakage which would worsen the localized heating caused by the high frequency.

So even if the next generation processes were able to offer faster clocks, you won't get faster than 5GHz anyway. If(and its a big one) you are able to create a CPU that clocks 3GHz but performs 50% better per clock, you might lose 10% of performance, but you get potential for future scaling, until you reach 5GHz again.

Skylake core seems exceptionally large. This could be done to reduce the effect above to reach the high frequency. If it were to retarget it to 3-4GHz range, the core could be much more dense.

We know Intel can make more dense cores, because their Atom cores are very dense.

As possible improvements in performance from fabrication advances dry up, maybe we will see a reappearance of ternary computers, which are potentially faster and more efficient than our simple binary machines.

The problem is we are asking for traditional computers to be faster. I think, this is the heart of the problem. If you include Smartphones, they've been improving in a rapid way. So when the scaling showed in desktops, the answer was not "faster traditional computers", but Smartphones and Tablets.

There are no near-term replacements because it doesn't need to be. Machines dedicated to the task, fit with accelerators are the future. Even quantum computers(long term thing) are looked to be the same thing. More accelerators.

pmv · Jun 16, 2019

I for one welcome the reduction in the tiresome pressure to upgrade! No longer do I feel my PC 'obsoleting' in real time!

Had current one for 5 years now, far longer than I've ever gone before without upgrading or replacing a PC. Doubly fortunate in that I don't have the discretionary spending capability I used to.

I suppose the market is now more focussed on lower-power consumption for mobile devices?

gipper53 · Jun 16, 2019

It's not just CPUs, but electronics development in general seems to have really slowed down the last decade. My main hobby is digital photography. The decade between 2002-2012 saw MASSIVE improvement every couple of years on all fronts. If it was 2008 and you were shooting a camera made in 2003... you were waaaay behind, almost embarrassingly so! Between 2001 and 2007 alone, megapixels jumped from 5 to 24 on high end bodies, with prices dropping in half. Performance in low light also improved by leaps and bounds in that time frame.

Since about 2013, progress has been slow in regards to pure image quality improvements. The pixel counts still go up, dynamic range improves and ISO 6400 is nothing today, but it's it's a slow march now. Many other things have advanced regarding operation of the camera and especially video specs, but a high end camera from 2013-ish is still a highly capable imaging device, and many pros are still using them. Heck, many pros still use bodies from 2007/2008 as cheap backup bodies or for "don't care if it breaks" assignments. They still produce great images.

DrMrLordX · Jun 16, 2019

@gipper53

You can blame cell phone cameras for some of that. The incentive to produce better DSLRs etc. has been tamped down by the excellent (and often expensive) cameras that people are buying in their phones. Professionals still need the features of a DSLR, but regular consumers don't really need to buy a standalone camera for anything anymore. Taking that money out of the low-end camera market affects the entire digital camera market as a whole. We are seeing the same problem in the CPU/desktop world.

gipper53 · Jun 16, 2019

@DrMrLordX

You can blame cell phones for most of it. Sales of point and shoot and low end DSLRs have been in the toilet and sinking further for a number of years now. As you said, nobody buys a DSLR now just to get daily life snapshots. So it's only the pros and enthusiasts buying 'real cameras' now. As a result, the high end is where all the development is at and the prices are starting to get absurd. Most new lenses are obnoxiously large and expensive in the newest quest for optical perfection. It's a fierce fight vying for the wallets of dentists and lawyers. The industry is in a state of transition and contraction at the same time. The next few years will be interesting.

IntelUser2000 · Jun 17, 2019

gipper53 said:
You can blame cell phones for most of it. Sales of point and shoot and low end DSLRs have been in the toilet and sinking further for a number of years now.

That, and for most people quality of cameras on a Smartphone is good enough.

Yes, I know some people keep quoting the "640KB ought to be enough" when saying we can always be better.

But we really are in the good enough stage. Add to that improvements are getting more expensive and harder, there's little reason to focus on high end stuff. Smartphones have been rapidly improving because they've been that much behind, and because there was a reason to aim for that good enough point reached by other consumer electronics.

Even Smartphones are reaching the point where its slowing down.

PotatoWithEarsOnSide · Jun 17, 2019

Smartphone makers have to keep adding an extra camera each generation in order to justify the £250 price increases that each new generation brings.

JoeRambo · Jun 17, 2019

naukkis said:
Different ISA's. Skylake instruction fetch is actually 320 bits vs 224 bits in A12, but in instruction that's 7 for A12 and up to 5 with Skylake as ARM instructions are fixed 32bits and x86 instructions are varying lengths least 64 bits. ISA's do matter....

x86 instructions are complex, but encode more information ( like op + wide operand + mods + lock whatever ). It does not matter if someone designs architecture with 16bit instructions, if common "operations" take 2-3 such 16bit instructions to express.

These arguments have been rage in 1990, it maybe mattered more back then, but now even ARM has uOp caches, that translate architectural instructions into what cores actually execute. More complex decoders obviuosly still burn power and that is not good, but not all is black or white either.

naukkis · Jun 17, 2019

JoeRambo said:
but now even ARM has uOp caches, that translate architectural instructions into what cores actually execute

After Meltdown and Spectre everybody know that they don't need to have full address range in core. Intel seems to have their core address range about 20 bits. ARM needs also uOP cache to get most out of truncated address range in core and first level caches.

Thala · Jun 17, 2019

naukkis said:
After Meltdown and Spectre everybody know that they don't need to have full address range in core. Intel seems to have their core address range about 20 bits. ARM needs also uOP cache to get most out of truncated address range in core and first level caches.

ARMv8 using for all intents and purposes 64 bit virtual addresses at the core. The VA output from the core need to have at least the 16 upper bits set to either 0 or 1 in EL0/1 to avoid triggering an MMU fault - but the core always works with a 64 bit effective address.
In short, i am not sure what you are talking about...

JoeRambo · Jun 17, 2019

Thala said:
i am not sure what you are talking about...

Yeah, me neither, it's not only addressing width that is relevant to instruction size, but also operands, immediates, modifiers etc. There is simply no way for ARM to add lets say 40 bit constant value to register, yet x86 can do that with simple add reg, [mem]. Try that on ARM where max you get is 8 bits of constant value with some mods.

Nothingness · Jun 17, 2019

JoeRambo said:
Yeah, me neither, it's not only addressing width that is relevant to instruction size, but also operands, immediates, modifiers etc. There is simply no way for ARM to add lets say 40 bit constant value to register, yet x86 can do that with simple add reg, [mem]. Try that on ARM where max you get is 8 bits of constant value with some mods.

That's 9 bits, not 8. Or if you insist on being partially correct that sign + 8 bits.

I measured dynamic code size (that is add size of executed instructions) on SPECint 2000 176.gcc some years ago and AArch64 was less than 10% larger than x86-64. That was back when the compiler used (gcc) was not mature for AArch64; it might be better now... or not

Anyway x86-64 is not that denser than AArch64 contrary to some claims I read.

IntelUser2000 · Jun 17, 2019

JoeRambo said:
These arguments have been rage in 1990, it maybe mattered more back then, but now even ARM has uOp caches, that translate architectural instructions into what cores actually execute. More complex decoders obviuosly still burn power and that is not good, but not all is black or white either.

I think what matters more is how dominant the market is based on that instruction set.

If its big enough, then there will be enough people working on it to address the deficiencies.

Micron, when talking about the so-called DRAM killers has said DRAM has surpassed them all just because the market around them was so large. So a theoretical deficiency has been overcome by people putting effort into improving it.

Nothingness · Jun 18, 2019

IntelUser2000 said:
I think what matters more is how dominant the market is based on that instruction set.

If its big enough, then there will be enough people working on it to address the deficiencies.

Micron, when talking about the so-called DRAM killers has said DRAM has surpassed them all just because the market around them was so large. So a theoretical deficiency has been overcome by people putting effort into improving it.

The issue is that, in the case of encoding and instruction sets in general, you're bound by compatibility with decades of old software. x86 is particularly encumbered by that heritage, it is its strength and its weakness. ARM has the same issue now, though AArch64 was a fresh start and one that is much saner than 32-bit ARM and x86.

The disappointing slowdown CPU progress in last 6 years vs 4 years before (10 yrs) ago

Senior member

Lifer

Elite Member

Senior member

Elite Member

Golden Member

Golden Member

Elite Member

Diamond Member

Golden Member

Golden Member

Elite Member

Lifer

Member

Lifer

Member

Elite Member

Senior member

Golden Member

Golden Member

Golden Member

Golden Member

Diamond Member

Elite Member

Diamond Member