Linus Torvalds: Too many cores = too much BS

Dec 30, 2004
12,553
2
76
You're talking about possibly waiting thousands to hundreds of thousands of cycles for another thread to catch up, even before considering any other performance issues, compared to being able to run through hundreds or thousands more iterations in that time if left completely serial. If running them all in parallel, the only trivial way to serialize is to yield and wait, which is going to require more time than just doing it all in one thread, probably up to many thousands of entities at once (more than most games will fit in memory, anyway). Any other way is not at all trivial, and would involve some kind of partitioning beforehand based on possibility/probability of interaction, then cleanup and merging after.

huh? same as every other frame, you update coordinates of all units, then check proximity, then handle collisions. all of that is trivial to thread and crunch through in parallel.

Physics calculations are necessary, and cannot be simplified away. Games built around physics cannot be played without them, or with more simplified physics--they are integral real-time game behavior rules. Developers need to do just the opposite. We need soft body and fluid simulation to go from specialized demos to being integral game mechanics. Studios being lazy is partly true (in that publishers want to spend $30M on marketing, but not on the game, and need it done yesterday).

they don't look any better than they did 5 years ago, so...

deformable bodies and fluids are not what's preventing belief suspension, it's the facial expressions in Source's HL2 that nobody has since even come close to touching.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
huh? same as every other frame, you update coordinates of all units, then check proximity, then handle collisions. all of that is trivial to thread and crunch through in parallel.
And then there's the serial pass that handles those collisions, which, depending on the situation, could account for a large portion of the decisions needed, and may not be reducible until many have been decided. If it were so easy, why aren't the games that get bogged down by it doing it? Maybe it's because, aside from pathing, it's not so trivial. How do you crunch through those "collisions" (interactions) in parallel? For any given time slice, they each serialize every entity they "collide" with. As their number grows, the usefulness of parallelism diminishes (and the number that need handling that way will be greater than the actual count of interactions within any given time slice, but they have to be handled before that is known).

I.e., if 80% of the entities might have some effect on one another, the overhead of running it in parallel will surely have been wasted.
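The "partition beforehand, then cleanup and merging after" approach mentioned above can be sketched roughly like this. This is a toy illustration, not anything from an actual engine: the grid-cell idea, cell size, and entity layout are all made-up assumptions. Cells with no entities in common are independent, so the per-cell checks are the parallel phase; pairs that straddle neighboring cells are the leftover work that still has to be merged afterwards.

```python
# Toy spatial-grid collision pass: parallel per-cell phase + cross-cell merge.
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

CELL = 10.0  # cell size >= interaction radius, so pairs only span adjacent cells

def cell_of(pos):
    return (int(pos[0] // CELL), int(pos[1] // CELL))

def build_grid(entities):
    grid = {}
    for eid, pos in entities.items():
        grid.setdefault(cell_of(pos), []).append(eid)
    return grid

def close(a, b, r=2.0):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= r * r

def pairs_in_cell(args):
    entities, ids = args
    return [(i, j) for i, j in combinations(sorted(ids), 2)
            if close(entities[i], entities[j])]

def collisions(entities):
    grid = build_grid(entities)
    # Parallel phase: cells share no entities, so they can be checked concurrently.
    with ThreadPoolExecutor() as ex:
        hits = ex.map(pairs_in_cell, ((entities, ids) for ids in grid.values()))
    found = [p for cell_hits in hits for p in cell_hits]
    # Cleanup/merge phase: pairs that straddle neighbouring cells.
    for (cx, cy), ids in grid.items():
        for dx, dy in ((1, 0), (0, 1), (1, 1), (1, -1)):
            for j in grid.get((cx + dx, cy + dy), []):
                for i in ids:
                    if close(entities[i], entities[j]):
                        found.append(tuple(sorted((i, j))))
    return sorted(set(found))
```

Note that the cleanup phase here is still serial, which is exactly the objection in the post: the more entities interact across partitions, the more of the work falls out of the parallel phase.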

they don't look any better than they did 5 years ago, so...
Only in demos. That's what should change. But even today's consoles don't have enough CPU power for that, I don't think, so I doubt it will. Instead, we'll see more demos of better, and it will be around the next corner, yet again, and we'll continue to have hacks on rigid body Havok-like (or genuine Havok) physics; though midrange PCs can, based on those demos, do a pretty decent job. I'm patiently awaiting the game that does a decent job of modeling moving in mud, or trying to cross a river, or cave-ins, or ramming a vehicle into a building. Those sorts of things are either not done at all, or are done no better than not merely 5, but 15-20 years ago. Only a handful of racing games have gone any farther than the day we first saw Havok in action (~10 yrs), and other games not even that far.

deformable bodies and fluids are not what's preventing belief suspension, it's the facial expressions in Source's HL2 that nobody has since even come close to touching.
Even those suck. It's going to be a long time before they get faces done well, I think. And it's not suspension of disbelief itself, but usefulness of the immersive world. Why can't we yet have a game that acts like Red Faction, but with decent physics backing up the buildings, for example? I'm fine and dandy with it looking like Borderlands (actually, I'd prefer that, over the massive amount of work that goes into failed "photorealism").
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Sure...super uber enthusiast stuff is going to look huge...but do those cards NEED to be that huge? Nah.

http://media.bestofmicro.com/S/O/459816/gallery/Picture3_w_600.png

That's a GTX 970...

High end cards are only this huge (currently) because Nvidia, AMD and also third party companies WANT them this huge. (And sales, always sales).

Enthusiast versions of high end cards will obviously still need their 2-3 fans and extra length... but look at how much integrated graphics advanced on Nvidia's and AMD's side in the past few years.

Tegra K1 and AMD Kaveri are quite capable.

Regarding Kaveri, IMO it is a very expensive way to go about fixing a cheap problem.

After a person factors in both the CPU throttling issue (under iGPU load) and the memory bandwidth problem, the chip turns out to be a very poor value on the desktop.

Having the CPU and GPU (even if it is just a low end video card) separate results in greater value.

Now mobile, well I believe that is a different story.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I guess at least for gamers too many cores are BS at some point.


I mean sure you can try and optimize your game to use 6, 8, 10, 12 cores...but is that worth it?

I'd rather just see a "close to the metal" API that makes applications on 4 cores perform so well that CPU parallelism is just not an issue anymore.

If I had a quadcore CPU that runs at 4 GHz but is as fast as an Intel i7 8 core at 4 GHz...then my choice is clear...fewer cores means less trouble if they deliver the performance you need. Just look at the poor FX8 and FX9 chips from AMD. Not many games utilize 8 cores...and even when they do the performance is crap (okay, AMD is to blame for that part...but still).
More cores also always means more power draw and less space for other things. (At 14 nm or so that won't matter anymore I guess)

I for one wouldn't mind a move toward capable SoCs that even have the RAM on the chip already...but I don't see that happening with 35324535 CPU cores. Per core power for me will always be more important than number of cores.

4 cores just seems to make the most sense for gamers...and that is what I care about the most.

I agree having four very powerful cores is a very attractive thing, but on the slower end of things I should point out that six 4.7 GHz Piledriver cores appear to beat four 4.5 GHz Steamroller cores 14 out of 15 times according to the benchmarks in these links:

http://forums.anandtech.com/showpost.php?p=36998949&postcount=10
http://forums.anandtech.com/showpost.php?p=36998951&postcount=11

P.S. Here was the only game the 4.5 GHz Athlon x4 860K won:

[chart: assassin_1920n.png - Assassin's Creed IV benchmark]


^^^^ 1.3 FPS margin of victory (slightly less than 3% faster)

(Although with this mentioned, I should point out there were a few games where the OC FX-6300 only won by a few percent. Most of the time, though, the six older Piledrivers at a 200 MHz faster clock won by a much better margin.)

So hexcore scaling appears to be getting a lot better.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
I agree having four very powerful cores is a very attractive thing, but on the slower end of things I should point out that six 4.7 Ghz Piledriver cores appears to beat four 4.5 Ghz Steamroller cores 14 out of 15 times according to the benchmarks in these links:


(Although with this mentioned, I should point out there were a few games where the OC FX-6300 only won by a few percent. Most of the time though the six older Piledrivers at a 200 Mhz faster clock won by a much better margin.)

So hexcore scaling appears to be getting a lot better.

It's all about the L3 cache... PD has it, SR doesn't, so it's not a fair comparison of the cores.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I suspect part of the challenge is that programmers and game developers can't count on their customers having a minimum core count of say 4 or 6 (or higher) cores.

So they sort of have to pander to the middle of the hardware distribution curve which is going to entail a lot of crappy low-to-mid end systems with dual core and weak quad core cpus.

In that environment, no one is going to sink serious capital into developing a highly threaded game for which maybe 10% of their customer base will have the hardware to maximize.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
So hexcore scaling appears to be getting a lot better.

Be careful not to oversimplify the situation. The module penalty of the construction cores decreases when the core count goes up for lightly threaded programs.

Kaveri is two cores at ~105% PD IPC + 2 cores at ~80%, for a total of approximately 370% of PD single-thread performance.

6-core Vishera is three cores at 100% PD IPC + 3 cores at ~75% PD IPC, for a total of approximately 525% of PD single-thread performance.

However, for a three or four core game, the Vishera has the edge as well, getting a full module rather than the other half of an already-used module.

3 Threads
290% Kaveri vs. 300% Vishera

4 Threads
370% Kaveri vs. 375% Vishera

It's a small difference (my numbers may not be exact) but it will be noticeable. Vishera also has L3. The clocks (4.7 Vishera vs. 4.5 Kaveri) knock a bit of SR's IPC increase away.
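The arithmetic in the post above can be written out explicitly. The per-core percentages are the poster's rough IPC estimates (relative to a single Piledriver core), not measurements, and the scheduler is assumed to fill one core per module before doubling up, since the second core of a busy module pays the sharing penalty:

```python
# Reproducing the post's best-case throughput estimates for CMT chips.
def effective_throughput(modules, n_threads):
    """modules: list of (first_core, penalised_sibling) relative percentages.
    Fill one core per module first, then start doubling up on siblings."""
    first = sorted((m[0] for m in modules), reverse=True)
    sibling = sorted((m[1] for m in modules), reverse=True)
    return sum((first + sibling)[:n_threads])

KAVERI = [(105, 80), (105, 80)]               # 2 Steamroller modules
VISHERA = [(100, 75), (100, 75), (100, 75)]   # 3 Piledriver modules

assert effective_throughput(KAVERI, 3) == 290    # post's "3 Threads" numbers
assert effective_throughput(VISHERA, 3) == 300
assert effective_throughput(KAVERI, 4) == 370    # post's "4 Threads" numbers
assert effective_throughput(VISHERA, 4) == 375
assert effective_throughput(KAVERI, 2) == 210    # lightly threaded: SR wins
assert effective_throughput(VISHERA, 2) == 200
```

The 2-thread case also shows the earlier point about the module penalty mattering less as module count rises: with one thread per module, Kaveri's higher IPC wins; once modules fill up, Vishera's extra module pulls ahead.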
 

hackmole

Senior member
Dec 17, 2000
250
3
81
Let's get real. We all need the fastest processors we can get.

Every freaking camera today takes video and video takes forever to process. I don't know one person who hasn't used his computer for processing video at some time or another. Software companies are utilizing all four cores for rendering and they will do so for 8 cores. When you are talking about cutting off 30 minutes or an hour or more off of rendering time then those processors become CRITICALLY IMPORTANT because often times in video editing you have to edit and re-organize videos dozens of times to get it right.
Think about how long it takes to OCR books. Many books I get off the net are not OCRd. That makes them almost useless to me. I need to OCR them. Think about how much time it takes if you do something like OCR 100 books. I have OCRd about 400 books in the last year alone.
Processing speed is not something that we don't need, it is an absolute necessity if we are going to take full advantage of the potential that is offered us with the use of a computer. If some people only want to do email, word processing and Facebook, that's fine. But there are millions upon millions who are doing a lot more than that and absolutely need that speed desperately.

I can hardly wait for Skylake unless something faster comes along.
 

njdevilsfan87

Platinum Member
Apr 19, 2007
2,342
265
126
I never saw parallelism as a way of making something faster. I just saw it as a way of running more of that something with less performance loss.
 
Aug 11, 2008
10,451
642
126
Let's get real. We all need the fastest processors we can get.

Every freaking camera today takes video and video takes forever to process. I don't know one person who hasn't used his computer for processing video at some time or another. Software companies are utilizing all four cores for rendering and they will do so for 8 cores. When you are talking about cutting off 30 minutes or an hour or more off of rendering time then those processors become CRITICALLY IMPORTANT because often times in video editing you have to edit and re-organize videos dozens of times to get it right.
Think about how long it takes to OCR books. Many books I get off the net are not OCRd. That makes them almost useless to me. I need to OCR them. Think about how much time it takes if you do something like OCR 100 books. I have OCRd about 400 books in the last year alone.
Processing speed is not something that we don't need, it is an absolute necessity if we are going to take full advantage of the potential that is offered us with the use of a computer. If some people only want to do email, word processing and Facebook, that's fine. But there are millions upon millions who are doing a lot more than that and absolutely need that speed desperately.

I can hardly wait for Skylake unless something faster comes along.

You must have a very video intensive circle of friends. In my family I know about ten people well enough to know their computing habits and *none* I repeat *none* of them have ever done any video editing. The most intensive video they might do is posting photos or clips onto Facebook. You can do this with a low end laptop or even a tablet or smartphone.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
You must have a very video intensive circle of friends. In my family I know about ten people well enough to know their computing habits and *none* I repeat *none* of them have ever done any video editing. The most intensive video they might do is posting photos or clips onto Facebook. You can do this with a low end laptop or even a tablet or smartphone.

Same here. In fact I have never, not once, met a single other person in real life that I know of who edits video of any kind, whatsoever.

Kinda eye opening now that I think about it, I've known hundreds of people well enough to know if they edit videos in any way and not a one of them had even considered it, let alone attempted it.

Only people I have ever known who edit videos are forum friends met here on Anandtech. We must be a rather limited segment of the market, scary.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
It's all about the L3 cache... PD has it, SR doesn't, so it's not a fair comparison of the cores.

Vishera does have the L3, but Steamroller has less module penalty.

As far as the 200 MHz clockspeed difference goes, Steamroller does have higher IPC than Piledriver, so I am not sure how much of a factor it really is.
 

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
I pulled up procexp while running Skyrim and found 12 main threads from Skyrim. 3 of them were at 1-8% CPU usage, one was at 0.1-1%, and the others were mostly 0.1%. I ran around a bit and I don't think I had a single thread use more than 10% for more than a second or so.

Did the same thing with Neverwinter Online - which was known for its high CPU requirement and low GPU requirement. This one had maybe 20-25 threads. Of those, about 15 were actually around 0.1-0.2%. Again, one thread was running 4-8% occasionally spiking as high as 12%, and two other threads were ~1-4%. Overall it took more CPU than Skyrim, not surprising as there are a heck of a lot more NPCs.

In both cases, there were a bunch of threads from the GPU drivers (nvidia .dlls) and a couple from the OS - DirectX and something else. These really didn't take any noticeable CPU though.

In both sessions, the max CPU was at 28%. CPU spikes over 20% happened 3 times in Neverwinter, for a grand total of 4s above 20% after about 10mins of play.

Point being, these games are already heavily threaded. They just aren't balanced. And it makes no difference anyway, all these threads could run on a Haswell i3 without any issue. Only way I was able to get some decent CPU action going was to start up a couple of youtube videos below my game window.
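The procexp experiment above has a rough Linux analogue: sample each thread's cumulative CPU time from /proc twice and take the delta. This is only a sketch (procexp on Windows does the same kind of delta sampling against different OS counters), but it's enough to see the same pattern of one or two busy threads among many idle ones:

```python
# Per-thread CPU% by delta-sampling /proc/[pid]/task/[tid]/stat (Linux only).
import os
import time

def thread_cpu_ticks(pid):
    """Cumulative CPU ticks (user + system) for every thread of a process."""
    ticks = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            # Split after the ")" closing the comm field; utime and stime
            # are then at indices 11 and 12 of the remaining fields.
            fields = f.read().rsplit(")", 1)[1].split()
        ticks[int(tid)] = int(fields[11]) + int(fields[12])
    return ticks

def thread_cpu_percent(pid, interval=0.5):
    """Approximate per-thread CPU% over the sampling interval."""
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    hz = os.sysconf("SC_CLK_TCK")  # ticks per second, usually 100
    return {tid: 100.0 * (after.get(tid, t) - t) / hz / interval
            for tid, t in before.items()}
```

Run against a game's PID, this gives the same picture as the post: lots of threads exist, but the load is concentrated in a few of them.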
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
I suspect part of the challenge is that programmers and game developers can't count on their customers having a minimum core count of say 4 or 6 (or higher) cores.

So they sort of have to pander to the middle of the hardware distribution curve which is going to entail a lot of crappy low-to-mid end systems with dual core and weak quad core cpus.

In that environment, no one is going to sink serious capital into developing a highly threaded game for which maybe 10% of their customer base will have the hardware to maximize.

But with new consoles, the lowest common denominator will be 6-8 slower cores.
There are games already that refuse to start if 4 cores are not present.

I agree having four very powerful cores is a very attractive thing, but on the slower end of things I should point out that six 4.7 Ghz Piledriver cores appears to beat four 4.5 Ghz Steamroller cores 14 out of 15 times according to the benchmarks in these links:

http://forums.anandtech.com/showpost.php?p=36998949&postcount=10
http://forums.anandtech.com/showpost.php?p=36998951&postcount=11

P.S. Here was the only game the 4.5 GHz Athlon x4 860K won:

[chart: assassin_1920n.png - Assassin's Creed IV benchmark]


^^^^ 1.3 FPS margin of victory (slightly less than 3% faster)

(Although with this mentioned, I should point out there were a few games where the OC FX-6300 only won by a few percent. Most of the time though the six older Piledrivers at a 200 Mhz faster clock won by a much better margin.)

So hexcore scaling appears to be getting a lot better.

1. If you want to compare fewer fast cores to more slow cores, compare an i3 to an FX-6300. The FX is faster when all threads are used. In games, however, the i3 wins most of the time, with the exception of the best-threaded games of 2013.

2. To see how well games scale beyond 4 cores, just compare the FX-4300 to the FX-6300. Most games from the link you provided show very little advantage. Most likely that's the impact of background tasks running on the spare module of the FX-6300.
There are a few games that scale with 6 threads. Some better than others:
BF4, Watch Dogs :awe:, Crysis 3.
Some, like Far Cry 3 or Max Payne 3, show some benefit, but the scaling is not amazing.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
[charts: skyrim_1920n.png, w2_1920n.png, sc2_1920n.png, tw_1920n.png, fsx_1920n.png - Skyrim, Witcher 2, StarCraft 2, Total War: Rome 2, and Flight Simulator X benchmarks]

4.7 GHz FX-6300 beats 4.7 GHz FX-4300 in every game.

Sometimes the advantage is not much, but other times the advantage is substantial.

One thing to keep in mind though (As Enigmoid pointed out earlier) is that FX-6300 will have less module penalty in lightly threaded games (owing to the fact it has one more module).
 

dacostafilipe

Senior member
Oct 10, 2013
805
309
136
Concurrency versus parallelism all over again.

In today's hardware, there's a lot of unused processing power that could be exploited by more concurrency-based programming.

People should stop aiming for parallelism where it's not needed, only because it sounds cool. In most cases, concurrency-based programming is more effective and faster to implement.

So, yes, I think he's right!
 

BSim500

Golden Member
Jun 5, 2013
1,480
216
106
There are games already that refuse to start if 4 cores are not present.

Only the badly written ones that appear to force-assign threads to cores 0, 1, 2 & 3 based on their core integer number (whereas that should be the job of the Windows scheduler). The same also appears to be the reason why some games (eg, Thief) can run slower on i7's vs i5's: they end up unwittingly trying to force some threads onto a certain core number, ignoring the fact that the core may be a Hyper-Threaded sibling and not a real core (eg, core 1 may be the HT "pair" of core 0, not the second real core, which could be core 2). It's simply a reflection of the programmers' incompetence, not a hardware fault. I'm not going to post any links, but a certain scene cracker's release of "Far.Cry.4.Dual.Core-Fix.rar" speaks for itself as to how consistently useless Ubisoft are at coding / porting / optimizing for the PC...
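The numbering pitfall described above can be illustrated with a toy model. Both sibling layouts below are made-up examples of real-world conventions; actual numbering varies by CPU and OS (Linux exposes it per logical CPU in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list):

```python
# How many distinct physical cores does hard-pinning to logical CPUs 0-3 hit?
def distinct_physical_cores(siblings, chosen):
    """siblings maps each logical CPU to the set of logical CPUs on its core;
    count the distinct physical cores covered by the chosen logical CPUs."""
    return len({min(siblings[c]) for c in chosen})

# Layout A: HT pairs are adjacent -- (0,1), (2,3), (4,5), (6,7).
paired = {i: frozenset({i - i % 2, i - i % 2 + 1}) for i in range(8)}
# Layout B: HT pairs are split -- (0,4), (1,5), (2,6), (3,7).
split = {i: frozenset({i % 4, i % 4 + 4}) for i in range(8)}

# A game that force-assigns its threads to logical CPUs 0-3:
assert distinct_physical_cores(paired, [0, 1, 2, 3]) == 2  # only 2 real cores
assert distinct_physical_cores(split, [0, 1, 2, 3]) == 4   # got lucky here
```

Under layout A the game's four "cores" are really two physical cores fighting over shared execution resources, which is exactly how an i7 can end up slower than an i5 for such a title.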
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
But with new consoles, the lowest common denominator will be 6-8 slower cores.
There are games already that refuse to start if 4 cores are not present.

By games you mean 1 game? And that may be fixed as well. It's simply due to sloppy coding.

The lowest common denominator from consoles will never be 8. Sony is 6, and MS may be 6½.

But again, it's going to be ugly above 4 on the consoles due to their design.

And we still see i3s and dual cores run circles around console CPUs. So I wouldn't worry a bit. This generation of consoles is not going to make those CPUs obsolete.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,220
583
126
I suspect part of the challenge is that programmers and game developers can't count on their customers having a minimum core count of say 4 or 6 (or higher) cores.

So they sort of have to pander to the middle of the hardware distribution curve which is going to entail a lot of crappy low-to-mid end systems with dual core and weak quad core cpus.

In that environment, no one is going to sink serious capital into developing a highly threaded game for which maybe 10% of their customer base will have the hardware to maximize.

In that case we have a chicken and egg scenario. I.e. if Intel provided 8-core mainstream CPUs, then the SW companies would adapt. On 14 nm (and perhaps even 22 nm) Intel surely can if they want to.

If Intel only provided 1-core mainstream CPUs, and sold 2+ core CPUs at much higher "enthusiast/workstation" prices, then you can bet that most applications would be single threaded. Just think about how crappy that would be, to not even make use of 2 cores.
 

SunnyD

Belgian Waffler
Jan 2, 2001
32,675
146
106
www.neftastic.com
I mean, stupidly parallel computation is a route that only goes so far.

But we're seeing a lot of hardware advancements that take out the difficulty of parallel computation. Synchronization costs are what kill a lot of parallelization, and our hardware is getting better at communicating.
The shared caches and atomic operations of current Intel CPUs are fantastic and your synchronization cost is very low. Even a naive locking algorithm can probably scale to ~64 processors without a problem.
AMD keeps making strides with HSA, reducing the distance between your highly parallel resources and your CPUs. This is important, since it will allow for lambda-style functions where you just feed a computation off to the GPU mid-program.
Intel's TSX instructions will also make things easier.

Now that said, there's still some pretty hard limits on what parallel computation can do for most problems. Most parallel implementations of algorithms are eventually going to fall into a series of reductions that serialize the problem. Each stage, your dataset will collapse by half, so as the number of processors approaches the size of your dataset, your speed improvement collapses pretty quickly. Of course, you can always just scale up the size of the problem.

Something like the Linux kernel may actually be better off being tuned for a single processor.

All good points, but again you, like almost everybody (and implicitly Torvalds), seem to be fixated, mostly because of the technology at hand, on homogeneous workloads where a single task can be broken down and scaled nicely across N cores. That's nice and all, but it's such a limited subset of parallel computing.

What I'm talking about is the holy grail of parallel computing: being able to parallelize multiple heterogeneous workflows in a meaningful way. This is the hardest part for any given process, given the nature of branching, out-of-order outcomes, etc. At some point you ask what would be better: high parallelism, or simply a high degree of multithreading/multiprocessing. I think far too much investment is being made right now in trying to parallelize everything, and not enough investment is being made into utilizing more general purpose processing capability to parallelize entire processes and OSes on a macroscopic scale. These special purpose APUs are basically not helping, and that's where the problem is.
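The quoted post's point about reductions can be sketched concretely. In a parallel sum, each stage halves the live dataset, so there are about log2(n) dependent stages regardless of core count, and the later stages leave most processors idle (this is a toy single-threaded model of the stage structure, not a parallel implementation):

```python
# Each loop iteration is one "stage": every pairwise add inside it is
# independent, so with enough cores a whole stage could run concurrently,
# but the stages themselves must run one after another.
import math

def parallel_sum_stages(data):
    stages = 0
    while len(data) > 1:
        data = [data[i] + data[i + 1] for i in range(0, len(data) - 1, 2)] \
               + ([data[-1]] if len(data) % 2 else [])
        stages += 1
    return data[0], stages

total, stages = parallel_sum_stages(list(range(1024)))
assert total == sum(range(1024))
assert stages == math.ceil(math.log2(1024))  # 10 dependent stages
```

With 1024 elements and 512 cores, stage one keeps every core busy, but the final stage uses exactly one, which is the speedup collapse the quote describes as processor count approaches dataset size.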
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
4.7 GHz FX-6300 beats 4.7 GHz FX-4300 in every game.

Sometimes the advantage is not much, but other times the advantage is substantial.

One thing to keep in mind though (As Enigmoid pointed out earlier) is that FX-6300 will have less module penalty in lightly threaded games (owing to the fact it has one more module).

That's not a good comparison, because a hexacore FX will also be faster at a 4-threaded workload due to a smaller module penalty.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,585
10,225
126
not enough investment is being made into utilizing more general purpose processing capability to parallelize entire processes and OS's on a macroscopic scale. These special purpose APUs are basically not helping, and that's where the problem is.

I agree completely. Bring on the cores, and a distributed app / OS model! Linus is IMHO too steeped in existing software architecture - eg. monolithic apps and kernels. Once you break them up into little minute bits of code, that all call each other, you open up more spaces for effective multi-processing.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
I agree completely. Bring on the cores, and a distributed app / OS model! Linus is IMHO too steeped in existing software architecture - eg. monolithic apps and kernels. Once you break them up into little minute bits of code, that all call each other, you open up more spaces for effective multi-processing.

Open up Windows Task Manager right now, go to the performance tab, and have a look at how many threads Windows (and all your software) is running; what you are saying has been done years ago.

Thing is that a problem that needs to be solved in a serial manner will still need a lot of processor speed to be solved quickly.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
4.7 GHz FX-6300 beats 4.7 GHz FX-4300 in every game.

Sometimes the advantage is not much, but other times the advantage is substantial.

One thing to keep in mind though (As Enigmoid pointed out earlier) is that FX-6300 will have less module penalty in lightly threaded games (owing to the fact it has one more module).

That's not a good comparison, because a hexacore FX will also be faster at a 4-threaded workload due to a smaller module penalty.

Here was the % FPS difference between 4.7 GHz FX-6300 vs. 4.7 GHz FX-4300 in each game:

Assassin's Creed IV: Black Flag: ~4%
Arma III: ~5%
Battlefield 4 MP: ~39%
CS:GO : ~11%
Crysis 3, Root of All Evil: ~24%
Crysis 3, Welcome to The Jungle: ~33%
Far Cry 3: ~11%
Max Payne 3: ~23%
Watch Dogs: ~19%
Civilization V: ~17%
Skyrim: ~5%
Witcher 2: ~13%
StarCraft 2: ~4%
Total War: Rome 2: ~9%
Flight Simulator X: ~4%

Regarding how much of that gain is reduced module penalty vs. genuine hexacore scaling, that is a good question. One way to find out true quad to hexacore scaling (in a practical way) would be to compare a SB-E or IB-E quad core and hexacore at the same clocks.
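Summarizing the per-game deltas listed above (the values are as posted, read off benchmark charts, so treat them as approximate):

```python
# Average and median FX-6300 vs FX-4300 gain across the 15 listed games.
from statistics import mean, median

gains = {
    "Assassin's Creed IV": 4, "Arma III": 5, "Battlefield 4 MP": 39,
    "CS:GO": 11, "Crysis 3 (Root of All Evil)": 24,
    "Crysis 3 (Welcome to The Jungle)": 33, "Far Cry 3": 11,
    "Max Payne 3": 23, "Watch Dogs": 19, "Civilization V": 17,
    "Skyrim": 5, "Witcher 2": 13, "StarCraft 2": 4,
    "Total War: Rome 2": 9, "Flight Simulator X": 4,
}

assert len(gains) == 15
avg = mean(gains.values())     # ~14.7% on average
mid = median(gains.values())   # median 11%: half the games gain 11% or less
```

The spread (4% up to 39%) is the interesting part: the big winners are exactly the well-threaded titles named earlier in the thread (BF4, Crysis 3), while most games sit near the median.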