Linus Torvalds: Too many cores = too much BS


Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
That still doesn't invalidate anything he said. If anything, it supports it.

The supercomputer market leads the way; the rest of us follow. How many cores does your average supercomputer have? If we want more processing power we will have to have lots of little cores; that's what supercomputer makers worked out a long time ago. It's just a tough transition getting the software to support that sort of hardware.
 

Essence_of_War

Platinum Member
Feb 21, 2013
2,650
4
81
The supercomputer market leads the way; the rest of us follow.
Because opening Excel spreadsheets and adjusting photo blurs are exactly the same as doing CFD, PIC, Vlasov, and QCD simulations? :confused:

The supercomputer market isn't playing the same game as desktop software, so it isn't clear to me why it's leading anyone except for itself.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
The supercomputer market leads the way; the rest of us follow. How many cores does your average supercomputer have? If we want more processing power we will have to have lots of little cores; that's what supercomputer makers worked out a long time ago. It's just a tough transition getting the software to support that sort of hardware.

Supercomputers handle an entirely different application scope, one that is parallel in nature. Always has been, always will be. Same reason HPC cards got popular.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
To be fair, on the flip side the hardware guys have a conflict of interest when it comes to pursuing multi-core MPU development versus increasingly larger core development.

Much less costly and less time-consuming to task your R&D team with making a single tiny core with some modest IPC improvements, then cutting and pasting 6 or 18 copies of that small core next to each other on the die layout, versus allocating the exact same silicon footprint and xtor budget those 6 cores consume and tasking your R&D team to come up with a single or dual large-core design.

Hardware guys are like "meh, let's just make it easier on ourselves, design a single small core and just cookie-cutter the hell out of our SoC and tell the world we have no choice because the software guys keep claiming parallelism is a chicken-and-egg problem, well we did our part to solve that problem, hard part done, now the software guys just need to get off their lazy asses and deliver on the programming"


True. On the downside, especially for Intel, would be the extra time spent on design and verification - that would lengthen the 'tick-tock' cycle. Even with that, I don't think we could expect large gains. Higher clocks would help - but that would present larger problems for process development and would not be helpful in mobile. I guess, as you mentioned earlier, it comes down to economics - both for the MPU manufacturers and the software developers. This is the best we get for now without either side blowing up their development budgets and having to charge much more for their products.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
While Cell failed, current-gen console parts have essentially done the same thing all over again, only worse: instead of one "general purpose" core with several (6-8, depending on how many were fused off) SPEs, now we have 8 "general purpose" cores with, what, 1024 shader units? Those shader units can be used for computation just as easily as they can be used for graphics, too. Whatever lesson Cell was meant to teach the industry as a whole, the one that firms like AMD and Intel don't seem to have learned is "parallelism in consumer hardware is bad, mmmkay".

That being said, those parts aren't necessarily the defining hardware in what will be the next generation of processors. Cell was meant to be used in darn near everything, while those custom AMD parts are pretty niche.
Those shader units are the GPU, same as in a PC, and already do useful work even if not used for special computations. You can buy PCs with similar setups to what's in the consoles, just not made for gaming (i.e., less RAM and weaker graphics). Jaguars with Radeons are in billboards, firewalls, PBXes, desktops, notebooks, etc. The specific SoC is niche, but those are MS' and Sony's respective decisions. The internals that power it are already all over the place (Intel's Atoms even more so).

They did nothing like the Cell. They went with low-power parts based on designs with proven performance and an existing software development ecosystem. The Cell was an "if you build it, they will come" chip. Jaguar with Radeon HD is a "here's something that will work, because it's an improvement on something that already works well enough for your needs" chip.

Skylake is going to have Gen9 graphics, which will be just another step in the evolution towards APU-like behavior by mainstream Intel products. Sure, it'll have "fat cores" for the same reason that Cell and AMD's custom Jaguar chip (among others) have them.
The Cell's CPU core was narrow, in-order, and cache-starved from day one, and had no graphics capability at all.

It remains to be seen what Intel can do to goose up "fat core" performance as they move forward. Judging by the early Broadwell results (yeah, I know, it's still early and that's a low-power part) I'm not holding my breath waiting for +20% IPC over Haswell or anything like that.
+20% is long dead. Broadwell looks to be like IB, with little to no gains on that front. Skylake should bring some, but 20% is, again, dead, at least short of some major disruption.
See, there's the rub: "offloading it to special hardware" is exactly what AMD and (eventually) Intel are going to want people to do.
But that's not the rub. That's the expected reality, based on the current trends.
Jouni Osmala said:
There is enough scaling left to turn somewhat better than current high-end Xeons into client CPUs
Linus Torvalds said:
Maybe. And probably not. Given the choice between 16 cores and 4, I suspect most people will take 4, and prefer more cache, graphics, and integrated networking etc.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Doesn't going multicore just make the interconnect design more difficult? Keeping 18 cache-coherent cores efficiently fed with data can't be an easy job!

Yes, and place and route can get very difficult depending on the topology of your interconnect.

I would say that was part of the problem AMD had with Zambezi/Piledriver. They needed a better interconnect to cut down on a lot of the excess cache; because each module had 2 MB of exclusive cache, a shared L3 was needed for performance, bloating the die.

NTMBK, you are right to say it "can't be easy".

But the self-evident truth of the matter is that despite it not being easy, it is easier than spending 2x or 3x the R&D budget to make the core 2x or 3x larger with 1.5-2x the IPC.

The hardware market everywhere, literally everywhere, copped out and went multi-core as soon as the xtors became cheap enough and the silicon area became cheap enough relative to the design costs for the core itself.

If there are aliens traveling among the stars, I doubt they got there by creating sub-nanometer process technology just to enable processors with thirteen bajillion cores so they could sit back and expect the software lackeys to do something with them all.

Our present day "more cores than our software knows what to do with them" conundrum here on earth was created for one reason only - profit motivated economics of the hardware companies.

If multicore was the battlecry of the software industry or the consumer then Intel's Pentium CPU would have looked a lot more like a dual-core 486.
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
NTMBK, you are right to say it "can't be easy".

But the self-evident truth of the matter is that despite it not being easy, it is easier than spending 2x or 3x the R&D budget to make the core 2x or 3x larger with 1.5-2x the IPC.

The hardware market everywhere, literally everywhere, copped out and went multi-core as soon as the xtors became cheap enough and the silicon area became cheap enough relative to the design costs for the core itself.

If there are aliens traveling among the stars, I doubt they got there by creating sub-nanometer process technology just to enable processors with thirteen bajillion cores so they could sit back and expect the software lackeys to do something with them all.

Our present day "more cores than our software knows what to do with them" conundrum here on earth was created for one reason only - profit motivated economics of the hardware companies.

If multicore was the battlecry of the software industry or the consumer then Intel's Pentium CPU would have looked a lot more like a dual-core 486.

That rings true.

Do you think the multi-core route was taken because it was just that much more profitable and still counts as progress, or because it would have been completely unprofitable/unsellable to do otherwise? CPUs seem to work about like TVs to me sales-wise: they have to come up with something new every year or two to keep selling them, or nothing gets done.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
NTMBK, you are right to say it "can't be easy".

But the self-evident truth of the matter is that despite it not being easy, it is easier than spending 2x or 3x the R&D budget to make the core 2x or 3x larger with 1.5-2x the IPC.

The hardware market everywhere, literally everywhere, copped out and went multi-core as soon as the xtors became cheap enough and the silicon area became cheap enough relative to the design costs for the core itself.

If there are aliens traveling among the stars, I doubt they got there by creating sub-nanometer process technology just to enable processors with thirteen bajillion cores so they could sit back and expect the software lackeys to do something with them all.

Our present day "more cores than our software knows what to do with them" conundrum here on earth was created for one reason only - profit motivated economics of the hardware companies.

If multicore was the battlecry of the software industry or the consumer then Intel's Pentium CPU would have looked a lot more like a dual-core 486.
You say 1.5-2x the IPC, but you didn't mention clock speed. Do you think it's even possible to produce an x86 CPU with 2x the single-threaded performance of a 4790K/5930K? Somehow I seriously doubt that, at least using conventional cooling and not sub-zero cooling. I think it's gonna take 7 years for that to happen.
 

imported_ats

Senior member
Mar 21, 2008
422
64
86
Much less costly and less time-consuming to task your R&D team with making a single tiny core with some modest IPC improvements, then cutting and pasting 6 or 18 copies of that small core next to each other on the die layout, versus allocating the exact same silicon footprint and xtor budget those 6 cores consume and tasking your R&D team to come up with a single or dual large-core design.

This is actually quite false. One of the significant issues with numerically large multicore and manycore designs is the interconnection network required for communication. As you increase the core count, the complexity required to deliver the same uncore performance increases substantially. You can of course take shortcuts, but that invariably results in sub-par performance.
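To put rough numbers on how interconnect cost grows with core count, here's a minimal C sketch; the core counts and the crossbar/ring/mesh topologies are textbook examples chosen for illustration, not any shipping uncore design.

#include <stdio.h>

/* Rough sketch: how the number of point-to-point links grows with core
 * count for a few textbook interconnect topologies. Illustrative only --
 * real uncore designs mix topologies and add routing/coherence logic on top. */
int main(void)
{
    int cores[] = {2, 4, 8, 18, 64};
    printf("%6s %10s %6s %10s\n", "cores", "crossbar", "ring", "2D mesh");
    for (int i = 0; i < 5; i++) {
        int n = cores[i];
        int crossbar = n * (n - 1) / 2;   /* every core pair directly linked */
        int ring     = n;                 /* one link per hop around the ring */
        /* 2D mesh on the most square grid: links = rows*(cols-1) + cols*(rows-1) */
        int rows = 1, cols = n;
        for (int r = 2; r * r <= n; r++)
            if (n % r == 0) { rows = r; cols = n / r; }
        int mesh = rows * (cols - 1) + cols * (rows - 1);
        printf("%6d %10d %6d %10d\n", n, crossbar, ring, mesh);
    }
    return 0;
}

The crossbar column grows quadratically while the cheaper topologies trade link count for latency and shared bandwidth, which is where the "shortcuts cost performance" trade-off shows up.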

Likewise, ST performance has been low because people are currently concentrating on absolute power instead of perf/power. This has been the overriding trend of the last couple of generations of mainstream processors. For example, if Intel were concentrating on perf/power, Broadwell would likely be significantly faster than it is, probably on the order of another 5-10%. But instead the concentration is much more on absolute power, with a 2x perf/power ratio given as a bare minimum for a design change. This is because the focus has largely shifted to both mobile and server instead of the desktop. The absolute power focus for mobile is obvious, but it also has a significant impact in server, where the goal is to add ever more cores as the easiest path to performance increases in that space. This has largely left the desktop market as a leftover segment which actually has significant thermal margin in most cases (mainstream 60-70W parts currently vs 100-130W just a few years ago).

Hardware guys are like "meh, let's just make it easier on ourselves, design a single small core and just cookie-cutter the hell out of our SoC and tell the world we have no choice because the software guys keep claiming parallelism is a chicken-and-egg problem, well we did our part to solve that problem, hard part done, now the software guys just need to get off their lazy asses and deliver on the programming"

The reality is that there are plenty of high-core-count processors out there to develop on. The reason most applications can't take advantage of more than a couple of actual hardware contexts has basically nothing to do with hardware and everything to do with software. Making an application that can take advantage of a significant number of hardware contexts with a single inputting user is very hard outside of some rather small and specialized sub-fields. And the majority of those fields get more advantage out of wider and more flexible SIMD than out of more cores.
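A quick Amdahl's-law sketch in C makes the same point with numbers; the serial fractions below are made-up illustrative values, not measurements of any real application.

#include <stdio.h>

/* Amdahl's law sketch: even a modest serial fraction caps the speedup you
 * can get from piling on cores. The fractions are illustrative examples. */
int main(void)
{
    double serial_fraction[] = {0.05, 0.20, 0.50};
    int    cores[]           = {2, 4, 8, 16, 64};

    for (int s = 0; s < 3; s++) {
        double f = serial_fraction[s];
        printf("serial fraction %.0f%%:", f * 100.0);
        for (int c = 0; c < 5; c++) {
            int n = cores[c];
            double speedup = 1.0 / (f + (1.0 - f) / n);  /* ideal-case bound */
            printf("  %d cores -> %.2fx", n, speedup);
        }
        printf("\n");
    }
    return 0;
}

With a 20% serial portion, 64 cores top out below 5x, which is why the parallel-software problem is the hard part rather than the core count.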
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
You say 1.5-2x the IPC, but you didn't mention clock speed. Do you think it's even possible to produce an x86 CPU with 2x the single-threaded performance of a 4790K/5930K? Somehow I seriously doubt that, at least using conventional cooling and not sub-zero cooling. I think it's gonna take 7 years for that to happen.

Yes of course it is possible. However, it is not possible if you require "1% IPC gain is only acceptable if it increases power consumption by 0.5%" ;)

Engineering is about trade offs, but those trade offs are made within the constraints of your time-zero boundary conditions (of which budget and delivery timeline are defined).

If you direct your team to build a core within a specific timeline, with a specific dollar budget, xtor budget, silicon area budget, power consumption window, etc., then you will get what you asked for. Be it a Pentium 4 or an Athlon 64 X2.

With today's technology, getting to the moon is possible for anyone or any nation with enough capital to make it happen. The same is true for large fat cores that will have wonderfully high IPC. But don't expect them to sip power, speculative computing requires electrons.
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
So, fewer, faster cores would cost a lot and be less power/heat efficient.

So, multi-core sounds like the right decision given the required constraints, no?
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
Because opening Excel spreadsheets and adjusting photo blurs are exactly the same as doing CFD, PIC, Vlasov, and QCD simulations? :confused:

The supercomputer market isn't playing the same game as desktop software, so it isn't clear to me why it's leading anyone except for itself.

If it were possible, supercomputers would use fewer cores too - I don't know why you think it's magically simpler for them to deal with massive parallelism. Supercomputers used to have fewer cores, and in the '90s the massive, very fast single cores started getting blown away by the multi-core designs. At that crossover point they gave up trying to make super-powerful cores (what they had always done) and went parallel.

The same is true in the home consumer market. The bits of it not held back by historical apps and OSes (i.e. Windows) have already gone massively parallel (e.g. GPU/GPU compute, which was built from the ground up to work with many cores). It's just a matter of time until the rest catches up.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
So, fewer, faster cores would cost a lot and be less power/heat efficient.

So, multi-core sounds like the right decision given the required constraints, no?

If you can't use more cores....

The fragmentation is quite obvious when you look at the end-user and server spaces.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
Yes of course it is possible. However, it is not possible if you require "1% IPC gain is only acceptable if it increases power consumption by 0.5%" ;)

Engineering is about trade offs, but those trade offs are made within the constraints of your time-zero boundary conditions (of which budget and delivery timeline are defined).

If you direct your team to build a core within a specific timeline, with a specific dollar budget, xtor budget, silicon area budget, power consumption window, etc., then you will get what you asked for. Be it a Pentium 4 or an Athlon 64 X2.

With today's technology, getting to the moon is possible for anyone or any nation with enough capital to make it happen. The same is true for large fat cores that will have wonderfully high IPC. But don't expect them to sip power, speculative computing requires electrons.
But would it even be possible to cool such a core with just air? If the rule of thumb is that each step up in IPC costs a 3x increase in power, how much would power density shoot up? Hell, even my overclocked HW-E is hard to cool with a custom water loop that cost me $400.
 

DrMrLordX

Lifer
Apr 27, 2000
22,921
12,993
136
Those shader units are the GPU, same as in a PC, and already do useful work even if not used for special computations.

They do useful work so long as there's 3d graphics work to be done. They go mostly dark unless you can do GPGPU work on them as well. Which is exactly the sort of thing Torvalds is complaining about. At least in the case of the consoles, most console software revolves around graphics-intensive games, so it's not a big problem for them, usually . . .

Also, do recall that Sony originally wanted Cell to use its SPEs for graphics and threw in a dGPU later when they realized that such a strategy would not work.

They did nothing like the Cell. They went with low-power parts based on designs with proven performance and an existing software development ecosystem. The Cell was an "if you build it, they will come" chip. Jaguar with Radeon HD is a "here's something that will work, because it's an improvement on something that already works well enough for your needs" chip.

Jaguar looks conventional today, but by 2006 standards it would have been revolutionary (or criminally insane). Eight cores and 1024 shaders on one die is a pretty far-out design even compared to Cell, which seemed pretty wacky back in the day.

+20% is long dead. Broadwell looks to be like IB, with little to no gains on that front. Skylake should bring some, but 20% is, again, dead, at least short of some major disruption.

And so you reinforce my point. Torvalds is barking up the wrong tree here. He wants the IPC gains that you have said are dead.

But that's not the rub. That's the expected reality, based on the current trends.

Right, and Torvalds doesn't seem to like it very much.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
They do useful work so long as there's 3d graphics work to be done. They go mostly dark unless you can do GPGPU work on them as well. Which is exactly the sort of thing Torvalds is complaining about. At least in the case of the consoles, most console software revolves around graphics-intensive games, so it's not a big problem for them, usually . . .
No, they go dark when nothing is to be done. If you so much as alt-tab, they will be live, with no GPGPU work needed. So, they are generally useful. And, please show where it is being complained about. Show one single quote for it.

Also, do recall that Sony originally wanted Cell to use its SPEs for graphics and threw in a dGPU later when they realized that such a strategy would not work.
Yup. They didn't have to do that this time, did they? Because this time they chose something nominally proven.

And so you reinforce my point. Torvalds is barking up the wrong tree here. He wants the IPC gains that you have said are dead.
No, 20% is dead. Gains are not dead. We've been getting gains even with 20% being gone. So far, you haven't said that many weak CPU cores are the answer, either (many strong cores would mean nothing else left for the space or power).

Right, and Torvalds doesn't seem to like it very much.
You say the same thing, as if it's different. I'm thinking that, like ShintaiDK already commented, you only read the one comment, not anything else.

So, are you now not for GPGPU, DSP, etc.? On one hand you say it's the future. Then I comment that it's a non-issue, because we know and accept it, and somehow that's not liking it.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
But would it even be possible to cool such a core with just air? If the rule of thumb is that each step up in IPC costs a 3x increase in power, how much would power density shoot up? Hell, even my overclocked HW-E is hard to cool with a custom water loop that cost me $400.

You can buy GPUs that use over 300W, and they are air-cooled with coolers that are less massive than an NH-D14.

Air cooling doesn't seem to be the challenge per se, but consumers don't want 300W CPUs - heck, they don't even want 140W CPUs - so it is a moot discussion.
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
Pretty much everyone complains incessantly about the noise/heat too, it seems.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
You can buy GPUs that use over 300W, and they are air-cooled with coolers that are less massive than an NH-D14.

Air cooling doesn't seem to be the challenge per se, but consumers don't want 300W CPUs - heck, they don't even want 140W CPUs - so it is a moot discussion.

Most don't, but we enthusiasts do. My computer draws 110W at idle and 425W under an extreme CPU-only load, so that's over 300W just for the CPU, and my $400 custom loop barely manages to cool that. I guess heat density is key in all of this. Those GPUs are 420mm2 for AMD and over 550mm2 for NV, while the 8-core HW-E is just 355mm2; I'm pretty sure that if I had that 18-core 662mm2 die I could dissipate even 600W from it. The 300W figure is for the whole card, not just the GPU die. Even the heat density of my overclocked CPU would be too much for any CLC solution.
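For a rough sense of the power-density argument, here's a small C sketch using the ballpark wattage and die-area figures from this post, treated as rough inputs rather than measurements:

#include <stdio.h>

/* Quick power-density arithmetic with the ballpark figures from the post.
 * The labels and numbers are rough inputs for illustration only. */
int main(void)
{
    struct { const char *part; double watts; double area_mm2; } parts[] = {
        { "big GPU die, ~300W over ~550 mm2",              300.0, 550.0 },
        { "overclocked 8-core HW-E, ~300W over 355 mm2",   300.0, 355.0 },
        { "hypothetical 18-core die, 600W over 662 mm2",   600.0, 662.0 },
    };
    for (int i = 0; i < 3; i++)
        printf("%-46s %.2f W/mm2\n",
               parts[i].part, parts[i].watts / parts[i].area_mm2);
    return 0;
}

The big GPU die comes out around 0.55 W/mm2 versus roughly 0.85 W/mm2 for the overclocked CPU, which is the heat-density point being made.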
Pretty much everyone complains incessantly about the noise/heat too, it seems.
WC takes care of that. I can turn off the fans on my radiator and it barely makes a difference to the temperature. All of the noise is made by my air-cooled cards, but that won't be the case for much longer.
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
WC takes care of that. I can turn off the fans on my radiator and it barely makes a difference to the temperature. All of the noise is made by my air-cooled cards, but that won't be the case for much longer.

I love WC, but it's not a mainstream/consumer solution, even if Antec sells a crappy one at Best Buy. I haven't seen anything to suggest it will be, either, beyond Alienware-grade stuff, which I believe is priced to absorb the related warranty/support costs.
 

imported_ats

Senior member
Mar 21, 2008
422
64
86
If it were possible, supercomputers would use fewer cores too - I don't know why you think it's magically simpler for them to deal with massive parallelism. Supercomputers used to have fewer cores, and in the '90s the massive, very fast single cores started getting blown away by the multi-core designs. At that crossover point they gave up trying to make super-powerful cores (what they had always done) and went parallel.

Because it is simpler for supercomputers to deal with massive parallelism. Most modern supercomputers are pretty horrible if you either need significant data sharing or don't have datasets large enough to easily distribute reasonable workloads to each core. Modern supercomputers are practically built around workloads with both massive data sets and minimal interaction between data subsets. This works for a decent number of supercomputing applications because by their nature they have massive data sets and minimal communication, and the calculations can easily be broken into subsets. An example of this is numerical weather forecasting, where you can go to both finer timeslices and smaller grid sizes, allowing you to make use of more hardware contexts. However, even for many supercomputing applications there are limits to this, primarily driven by communication bandwidths and latencies. That is why a supercomputer's interconnection network these days costs as much as, if not more than, the actual computation and storage, and why supercomputers are basically the driving force in advancing networks these days.
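A toy C sketch of that kind of decomposition, with made-up grid and worker counts, shows why the communication can stay small next to the local work:

#include <stdio.h>

/* Toy domain decomposition in the spirit of a weather-style grid code:
 * split the rows of a grid across workers; each worker only exchanges a
 * single boundary ("halo") row with each neighbour per step, so the
 * communication stays tiny compared with the local computation. */
int main(void)
{
    const int grid_rows = 1000, grid_cols = 1000;   /* illustrative sizes */
    const int workers   = 8;

    for (int w = 0; w < workers; w++) {
        int first = w * grid_rows / workers;
        int last  = (w + 1) * grid_rows / workers - 1;
        long local_cells = (long)(last - first + 1) * grid_cols;
        long halo_cells  = 2L * grid_cols;          /* one row up, one row down */
        printf("worker %d: rows %d-%d, %ld local cells, ~%ld halo cells/step\n",
               w, first, last, local_cells, halo_cells);
    }
    return 0;
}

Each worker touches 125,000 local cells per step but only exchanges about 2,000 boundary cells, which is why these workloads scale across huge machines while chatty workloads do not.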

The same is true in the home consumer market. The bits of it not held back by historical apps and OSes (i.e. Windows) have already gone massively parallel (e.g. GPU/GPU compute, which was built from the ground up to work with many cores). It's just a matter of time until the rest catches up.

Which is great and all, except for the fact that the number of applications that can be sped up by compute offload is so minuscule as to be in most respects pointless. Basically, if the problem doesn't look like a graphics problem, GPGPU really does jack all. And that basically means the problem needs fixed data sets, practically zero interaction between data, pretty much no control flow, etc. That's a not-surprisingly small portion of the application market out there.
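As an illustration of that distinction (the functions below are hypothetical examples, not from any real codebase): the first C routine is the fixed-size, branch-free, element-independent kind of work that offloads well, while the second is serial pointer chasing that a thousand shader lanes can't help with.

#include <stddef.h>
#include <stdio.h>

/* Every element is independent: this is the shape of work that maps to shader lanes. */
void brighten(float *pixels, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        pixels[i] *= gain;
}

struct node { int value; struct node *next; };

/* Each step depends on the previous pointer load: inherently one step at a time. */
int sum_list(const struct node *head)
{
    int sum = 0;
    while (head) {
        sum += head->value;
        head = head->next;
    }
    return sum;
}

int main(void)
{
    float pixels[4] = {0.1f, 0.2f, 0.3f, 0.4f};
    brighten(pixels, 4, 2.0f);

    struct node c = {3, NULL}, b = {2, &c}, a = {1, &b};
    printf("%.2f %d\n", pixels[0], sum_list(&a));
    return 0;
}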


So, fewer, faster cores would cost a lot and be less power/heat efficient.

So, multi-core sounds like the right decision given the required constraints, no?

Well, maybe, if you want to ignore all the actual stuff you want to get done with a computer, sure. The reality is that applications are somewhat important, and making applications parallel is still a very, very hard, very, very complex problem. If it were so easy, companies like Oracle wouldn't be multi-billion-dollar corporations.
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
But it seems to be what we have to work with for the foreseeable future on the desktop(ish). There was just a long discussion recently on how upgrade cycles have slowed down for CPUs, since there isn't nearly as much need to upgrade quickly as there once was - partly because existing stuff is "fast enough" and partly because new hardware isn't fast-er enough to justify itself. That seems contrary to there needing to be fewer cores that are faster.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
That seems contrary to there needing to be fewer cores that are faster.

Nobody said that. What people said was that more cores don't give the end user any benefit if single-core speed doesn't get a boost too.
It became painfully obvious with the new Haswell-E CPUs:
http://www.techspot.com/review/875-intel-core-i7-5960x-haswell-e/page8.html
If a program has to synchronize its threads, then speed depends on single-core speed.
http://www.legitreviews.com/powercolor-radeon-r7-250x-1gb-video-card-review_137172/5
It's the same story with all the games in this review (and any other review that shows per-thread load): one thread hits its limit and thus limits all the other threads.
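A minimal C/pthreads sketch of that synchronization effect (the thread count and workloads are simulated with sleeps and are purely illustrative): four workers meet at a barrier each "frame", so the frame time is set by the slowest thread no matter how many cores the others leave idle.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define THREADS 4
#define FRAMES  3

static pthread_barrier_t frame_barrier;
/* Simulated per-thread work per frame, in microseconds (illustrative values). */
static const useconds_t work_us[THREADS] = {2000, 3000, 4000, 16000};

static void *worker(void *arg)
{
    int id = (int)(long)arg;
    for (int frame = 0; frame < FRAMES; frame++) {
        usleep(work_us[id]);                    /* this thread's share of the frame */
        pthread_barrier_wait(&frame_barrier);   /* everyone waits for the slowest */
        if (id == 0)
            printf("frame %d done (paced by the 16ms thread)\n", frame);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[THREADS];
    pthread_barrier_init(&frame_barrier, NULL, THREADS);
    for (long i = 0; i < THREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < THREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&frame_barrier);
    return 0;
}

Three of the four threads finish in a few milliseconds and then just sit at the barrier; adding more of them would not make the 16 ms thread, and therefore the frame, any faster.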
 
Dec 30, 2004
12,553
2
76
I can't help but wish Itanium had taken off. Part of me thinks that with enough funding Intel's PhDs could have figured out how to compile efficiently for it.
 

SlickR12345

Senior member
Jan 9, 2010
542
44
91
The future is tons of cores. I mean, if I were to predict CPUs in 10 years, I'd say it's 64+ cores.

2020 is probably when we reach 16 cores and 2025 when we reach 64+ cores.

2030 is when I expect quantum computers to be available to a wider market; I think we see quantum computing breakthroughs around 2025, with use in big servers and supercomputers, and by 2030 it reaches a more mainstream audience.