I agree completely. Bring on the cores, and a distributed app / OS model! Linus is IMHO too steeped in existing software architecture - e.g. monolithic apps and kernels. Once you break those up into small pieces of code that all call each other, you open up far more opportunities for effective multi-processing.
Linux typically scales out better than any other OS except FreeBSD (unless that's changed very recently), and a lot of work has gone into better performance for parallel and concurrent programs, with more to come. In general, the Unix way of getting work done is to call other processes, and outside of the history-repeating Systemd, that approach is alive and well.
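To make that concrete, here's roughly what the "call other processes" style looks like under the hood on a POSIX system: the by-hand version of the shell pipeline 'ls | wc -l'. Treat it as a minimal sketch (the two commands are just examples, error handling is kept short), but notice there's no concurrency code on our side; the kernel schedules both children on whatever cores are free.

    /* Minimal sketch: build "ls | wc -l" by hand with pipe/fork/exec.
     * The commands are just examples; error handling is kept short. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];
        if (pipe(fd) == -1) { perror("pipe"); return 1; }

        pid_t producer = fork();
        if (producer == 0) {                /* child 1: ls */
            dup2(fd[1], STDOUT_FILENO);     /* its stdout feeds the pipe */
            close(fd[0]); close(fd[1]);
            execlp("ls", "ls", (char *)NULL);
            _exit(127);                     /* exec failed */
        }

        pid_t consumer = fork();
        if (consumer == 0) {                /* child 2: wc -l */
            dup2(fd[0], STDIN_FILENO);      /* its stdin drains the pipe */
            close(fd[0]); close(fd[1]);
            execlp("wc", "wc", "-l", (char *)NULL);
            _exit(127);
        }

        close(fd[0]); close(fd[1]);         /* parent holds no pipe ends */
        waitpid(producer, NULL, 0);
        waitpid(consumer, NULL, 0);
        return 0;
    }

Both children run concurrently, and the OS handles the scheduling and the pipe's flow control, which is the whole point.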
It's also not new. Doing it for higher throughput with SMP is new, but doing it for saner software development, and to let the OS handle the concurrency, is old. What is new is remaking software with the mainframe mindset, but on modern commodity software and hardware, along with the assumption that everything is more efficient if it runs inside a single service/runtime. Windows can't help it, to a degree, but userland on Linux is, by and large, repeating history out of ignorance of it.
Monolithic vs. micro really has nothing to do with your end result, so long as it works (though a microkernel OS has no place in a general-purpose computer, due to IPC issues that have yet to be resolved in a suitable fashion). Different tasks have different scheduling needs, and may have different programs and libraries to make use of. Sometimes a pipe is as easy and good as anything else (if it can work, it's usually the easiest way). Sometimes it can't be used. Sometimes it's too slow. And so on. A single C/C++ program managing its own memory and scheduling will generally be the hardest way to do any particular job, but sometimes it might be the best or only way (depending on the libraries available, it could actually be the easy way, such as when building a GUI-based program with Qt). And sometimes it either can't be done, or will take enough time/money to not be worth doing.
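For contrast with the pipeline sketch above, here's the do-it-yourself end of that spectrum: the same kind of fan-out, but managed by hand with pthreads inside one program. It's only a sketch, and the worker function is a made-up placeholder; the point is that even the trivial case makes you own thread creation, joining, and result memory yourself.

    /* Sketch of the "manage it yourself" approach: fan work out to threads
     * inside one program. The worker's job (square its slot number) is a
     * made-up placeholder. Build with -pthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NWORKERS 4

    static void *worker(void *arg)
    {
        long slot = (long)arg;
        long *result = malloc(sizeof *result);  /* we own this memory now */
        *result = slot * slot;
        return result;
    }

    int main(void)
    {
        pthread_t threads[NWORKERS];

        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);

        for (long i = 0; i < NWORKERS; i++) {
            void *result;
            pthread_join(threads[i], &result);  /* wait and collect */
            printf("worker %ld -> %ld\n", i, *(long *)result);
            free(result);
        }
        return 0;
    }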
Game engines have tens of millions of dollars, sometimes more, going into development, in the knowledge or hope that licensing will make that cost back, so anything that may provide tangible performance benefits is worth it. Plenty of software is built on a much smaller scale, or with little to gain from massive rewriting.
If you're building new software, and there are places where it could scale out, even on a single PC, it would be stupid not to at least build in provisions for doing so, even if you leave the actual work for a later date. Not doing so will put you in a place like Firefox: able to use multiple cores somewhat, because some concurrency was hacked on after the fact, but whenever it's slow, it's always that one main thread blocking the rest. But even then, your program might not need it enough to be worth doing the work to actually make it scale out.
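One cheap way to "build in provisions" is to keep a narrow seam in the code: independent work items go through a single function, run serially today, so its insides can later be swapped for threads or worker processes without touching any callers. The names here (task, run_tasks) are mine, not from any library; it's just a sketch of the shape.

    /* Sketch: a narrow seam for future scaling. Names (task, run_tasks) are
     * made up for illustration; today it runs serially on the calling thread. */
    #include <stddef.h>
    #include <stdio.h>

    typedef struct {
        void (*fn)(void *ctx);   /* the work itself */
        void *ctx;               /* its private data: no shared mutable state */
    } task;

    /* Today: run everything in order, on the calling thread. Later: swap this
     * body for a thread pool or worker processes; callers never change. */
    void run_tasks(task *tasks, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            tasks[i].fn(tasks[i].ctx);
    }

    static void greet(void *ctx) { printf("hello from task %s\n", (char *)ctx); }

    int main(void)
    {
        task jobs[] = { { greet, "A" }, { greet, "B" } };
        run_tasks(jobs, sizeof jobs / sizeof jobs[0]);
        return 0;
    }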