Maximizing single/multithread performance with hybrid CPU

Status
Not open for further replies.

lagokc

Senior member
Mar 27, 2013
Given that single-threaded and multithreaded performance are both important in modern processors, has anyone proposed a CPU that uses 2-4 big, fast cores optimized for single-threaded performance, combined with as many smaller cores optimized for performance per die area as possible?

Just as an example (and probably not a very good one), AMD could fit 2 Vishera modules (4 cores) plus something like 32 Jaguar cores into the same number of transistors as an 8-core FX-8350. Since the big, fast cores would be seen as core 0 through core 3, applications would use them first, and when multithreading got really heavy the smaller cores would start to come into play.

I don't think it's impossible, given that the Exynos 5 Octa is capable of running on all 8 cores (though software generally doesn't support it). It seems similar to the benefit of integrating the GPU and running a lot of OpenCL code, except that in this case the big and small cores would actually share an instruction set, so threads could be balanced back and forth between them.
 

Ancalagon44

Diamond Member
Feb 17, 2010
Well, the first problem would be exactly what happened with Bulldozer: the scheduler didn't know what the best thread allocation strategy was. So the scheduler would need to be aware of which cores were strong and which were weak. Not insurmountable.
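To make the scheduler point concrete, here is a minimal Python sketch; it is not how any real OS scheduler works. The core speeds (1.0 for a big core, 0.4 for a small one), the core numbering, and the helper names `place_threads` and `per_thread_speed` are all hypothetical. It compares a speed-aware placement with one that simply walks core IDs in order.

```python
from collections import Counter

def place_threads(core_speeds, n_threads, aware=True):
    """Map thread index -> core index.

    aware=True : try the fastest idle core first (big cores before small).
    aware=False: use cores in plain ID order, ignoring their speed.
    Once every core has a thread, placement wraps around in the same order.
    """
    ids = range(len(core_speeds))
    order = sorted(ids, key=lambda i: (-core_speeds[i], i)) if aware else list(ids)
    return {t: order[t % len(order)] for t in range(n_threads)}

def per_thread_speed(core_speeds, placement):
    """Each thread gets its core's speed divided by how many threads share it."""
    load = Counter(placement.values())
    return {t: core_speeds[c] / load[c] for t, c in placement.items()}

# Hypothetical chip: cores 0-7 are small (0.4x), cores 8-11 are big (1.0x).
speeds = [0.4] * 8 + [1.0] * 4

for aware in (True, False):
    p = place_threads(speeds, 2, aware)
    print("aware" if aware else "unaware", per_thread_speed(speeds, p))
```

With two threads, the speed-aware placement puts both on big cores (1.0 each), while the unaware one lands both on small cores (0.4 each), which is the kind of misallocation the scheduler has to avoid.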

Anyway, the problem with the idea in general is that it would be limited to particular workloads. For those workloads it would do very well; for the rest, not so much.

I mean, it works in graphics cards because rendering is an embarrassingly parallel problem. In other workloads, you might not get so lucky.

Also, I think Intel has proven that in some cases it's better to have fewer, stronger cores than more, weaker cores (Bulldozer), even in highly multithreaded cases. Not in all cases, I grant you.
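Whether the hybrid wins depends heavily on how many threads the workload actually has. The sketch below is a toy model under stated assumptions: the job is split into equal chunks, one per thread; each small core runs at 0.4x a big core (an assumed figure, not a measured Jaguar vs. Piledriver ratio); and placement fills the fastest cores first. It compares 8 big cores against 4 big + 32 small. The name `finish_time` is hypothetical.

```python
from collections import Counter

def finish_time(core_speeds, n_threads, total_work=1.0):
    """Job split into n equal chunks, one per thread, placed fastest-core-first
    (wrapping around once every core is busy). The job ends when the most
    heavily loaded core finishes its chunks."""
    order = sorted(range(len(core_speeds)), key=lambda i: -core_speeds[i])
    load = Counter(order[t % len(order)] for t in range(n_threads))
    chunk = total_work / n_threads
    return max(load[c] * chunk / core_speeds[c] for c in load)

big8 = [1.0] * 8                  # e.g. an 8-core chip of big cores
hybrid = [1.0] * 4 + [0.4] * 32   # 0.4x per small core is an assumption

for n in (1, 6, 36):
    print(n, round(finish_time(big8, n), 3), round(finish_time(hybrid, n), 3))
```

With these assumed numbers the two chips tie on 1 thread, the 8-big-core chip finishes a 6-thread job about 2.5x sooner (0.167 vs 0.417, because two threads land on slow cores), and the hybrid only pulls ahead once there are enough threads to occupy its small cores (0.069 vs 0.139 at 36 threads).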
 

lagokc

Senior member
Mar 27, 2013
That makes a lot of sense. I guess I was only really thinking of the sorts of workloads that enthusiast computers are likely to need: a pair of fast cores for games, or a lot of cores for embarrassingly parallel problems like video conversion. But if the latter is already coded to run on the GPU, I suppose there wouldn't be much point.
 

indy2878

Member
Apr 9, 2013
How about a multi-CPU platform that can accept any combination of CPU socket types, rather than just the one type for that particular platform? For instance, a motherboard that has an AM3+ socket AND an Intel Xeon socket, so you get the best of both worlds: for everyday computing the AM3+ CPU would be used, and in highly threaded situations it would switch over to the Xeon CPU, etc.
That right there MIGHT be the answer to the Bulldozer scheduler issue...
I don't know, I just thought of this strange idea.... :)
 
Jan 31, 2013
Why would you give up 6 strong threads for 7 weak ones? Jaguar is built on a 28nm manufacturing process; you'll never fit 30 of them on the same die as 2 Piledriver cores. You would be lucky to fit 7 of them. Jaguar is a power-efficient x86 core; that's what it was designed to be, and it does it well. To give you an idea of how "not so strong" they are, the PlayStation 4 is going to pack 8 of these Jaguar cores. Keep in mind that console game programmers always program the game to use all of the resources the console has to offer (every PS4 game will use 8 cores). This is why Sony can get away with using x86 cores in their next console. It's not like a PC, which can change every year. Stick to the desktop cores: 8 strong threads will be more beneficial than a few strong and a few weak.

AMD plans on giving every one of Steamroller's cores its own decoder unit, and is claiming 30% more ops per cycle than the previous generation (we don't know whether that is measured against Bulldozer or Piledriver). Though I can tell you this much: if it's 30% over Piledriver, then the FX-8450 will be faster than a 2500K clock for clock. If it's 30% over Bulldozer, which would only work out to around 10-15% over Piledriver given Piledriver's gains over Bulldozer, then AMD may still very well finally reach Nehalem levels of per-core performance (first-generation Core i series).
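The "10-15%" figure comes from compounding gains rather than subtracting percentages. Here is a small Python sketch of that conversion; the Piledriver-over-Bulldozer inputs (13% and 18%) are assumptions back-solved from the 10-15% range in the post, not measured numbers, and `gain_over_piledriver` is a hypothetical helper.

```python
def gain_over_piledriver(gain_over_bulldozer, piledriver_over_bulldozer):
    """Convert a per-core gain measured against Bulldozer into a gain
    measured against Piledriver. Gains compound multiplicatively, so
    divide the ratios instead of subtracting the percentages."""
    return (1 + gain_over_bulldozer) / (1 + piledriver_over_bulldozer) - 1

# Assumed range for Piledriver's gain over Bulldozer (not measured data).
for pd in (0.13, 0.18):
    print(f"{pd:.0%} -> {gain_over_piledriver(0.30, pd):.1%}")
```

Under those assumptions, a 30% gain over Bulldozer becomes roughly 15.0% or 10.2% over Piledriver.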
 

lagokc

Senior member
Mar 27, 2013
Each pair of Bulldozer cores (one module) is 213 million transistors and 30.9mm^2.
Each Jaguar core is 3.1mm^2.

In the same transistor budget as 8 Bulldozer cores, you really could fit something like 32 Jaguar cores (comparing 32nm vs 28nm at those sizes). For single-threaded performance the big cores would win, but in cases where your application loved lots of threads I'm not so certain.
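A quick check of the area arithmetic, using only the two figures quoted above. This compares raw core area only; it ignores what each figure does or does not include (such as L2 cache), the uncore, and the 32nm vs 28nm process difference. The helper `jaguar_cores_in` is hypothetical.

```python
MODULE_MM2 = 30.9   # one Bulldozer module (2 cores), as quoted in the thread
JAGUAR_MM2 = 3.1    # one Jaguar core, as quoted in the thread

def jaguar_cores_in(area_mm2):
    """How many whole Jaguar cores fit in a given area (raw core area only)."""
    return int(area_mm2 // JAGUAR_MM2)

budget = 4 * MODULE_MM2   # area of the 4 modules in an 8-core FX

print(jaguar_cores_in(budget))                    # all-Jaguar layout
print(jaguar_cores_in(budget - 2 * MODULE_MM2))   # keep 2 modules, fill the rest
```

By these raw numbers, the area of four modules holds about 39 Jaguar cores, and keeping two modules leaves room for about 19.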
 
Jan 31, 2013
In short, no: having that many cores, even if it were plausible (I doubt you can put desktop and mobile cores side by side and be efficient), wouldn't be beneficial. I also think you're misunderstanding the difference between logical (hardware) threads and software threads. You can spawn as many software threads as you want, and even 16 threads on an 8-core processor (8 logical threads) would be faster than 8 threads (one thread per logical thread). At the end of the day, if you can even find applications that utilize 8 cores, then you'd understand there really isn't any barrier there, as the sheer raw performance of 8 desktop cores will execute any task you want almost instantly. If hardware threading were a concern, we wouldn't be moving toward a ridiculous number of weak cores. You could opt out right now, build a G34 machine, pop in two 16-core Opterons, and run any extreme number-crunching software you'd like on its 32 cores. If you'd like, I can code up a quick example of how threaded software doesn't necessarily rely on logical threads.
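A minimal Python sketch of the software-thread vs. logical-thread distinction: it starts a fixed number of OS threads regardless of how many hardware threads the machine reports, and the OS multiplexes them onto whatever cores exist. The function name `run_software_threads` and the busy-work size are arbitrary choices for illustration. Note that in CPython the GIL keeps pure-Python busy work from running on several cores at once, so this only shows that the thread count is independent of the core count; it says nothing about whether more threads are faster.

```python
import os
import threading

def run_software_threads(n_threads):
    """Start n_threads OS threads, each doing a little busy work,
    and return a dict of thread index -> result once all have joined."""
    results = {}
    lock = threading.Lock()

    def worker(i):
        value = sum(range(100_000))   # small CPU-bound task
        with lock:                    # protect the shared dict
            results[i] = value

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

hw = os.cpu_count() or 1   # cpu_count() may return None
print(f"hardware threads reported: {hw}")
print(len(run_software_threads(16)), "software threads completed")
```

The same call works with 4, 16, or 64 threads on any machine; the hardware thread count only limits how many can run at the same instant.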
 