I guess multi-core processors are the only way, eh?

FalseChristian

Diamond Member
Jan 7, 2002
3,322
0
71
I imagine we'll be seeing 8-, 16-, 32-, 64-core (etc.) CPUs in the future, since clock speed can't really increase that much more... or can it, with smaller and smaller processes? :)
 

DSF

Diamond Member
Oct 6, 2007
4,902
0
71
In terms of silicon we're going to hit the limit pretty soon as far as process size goes.
 

Foxery

Golden Member
Jan 24, 2008
1,709
0
0
That's the plan. Both Intel's and AMD's roadmaps for the next several years show 4- and 8-core processors becoming the norm. Also note that Moore's Law is sometimes misquoted as "CPUs double in speed" every 2 years, but it's actually "CPUs double in transistors," so it still holds when you choose to increase the # of cores instead of clock speed. As programs become complex enough for multithreading to make more and more sense every year, the overall speed seen by users continues to rise as expected.
 

heyheybooboo

Diamond Member
Jun 29, 2007
6,278
0
0
Too bad the overwhelming majority of software can only run a single major program thread on a core. That hasn't changed in 10 years (much less the last 2). It's easier to optimize code for specific instruction sets (like SSE4 / SSE4a) than to completely rewrite a program's code for multicore thread parallelism.
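To make that contrast concrete, here's a toy sketch (illustrative Python, not any real application's code): the serial loop stays a single thread no matter what instruction set it gets compiled for, while the multicore version forces the programmer to partition the work, coordinate the workers, and merge the results by hand.

```python
# Toy illustration: restructuring for cores is a code-structure problem,
# not a recompile. Names here are hypothetical.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker gets an independent slice of the problem.
    return sum(x * x for x in chunk)

def serial(data):
    # Single-threaded: one core does everything.
    return sum(x * x for x in data)

def parallel(data, workers=4):
    # Multicore: the data must be partitioned, the workers coordinated,
    # and the partial results merged by hand.
    chunks = [data[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    data = list(range(1_000_000))
    assert serial(data) == parallel(data)
```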
 

Shortass

Senior member
May 13, 2004
908
0
76
Originally posted by: heyheybooboo
Too bad the overwhelming majority of software can only run a single major program thread on a core. That hasn't changed in 10 years (much less the last 2). It's easier to optimize code for specific instruction sets (like SSE4 / SSE4a) than to completely rewrite a program's code for multicore thread parallelism.

For now. This will likely change sooner than later.
 

heyheybooboo

Diamond Member
Jun 29, 2007
6,278
0
0
Originally posted by: Shortass
Originally posted by: heyheybooboo
Too bad the overwhelming majority of software can only run a single major program thread on a core. That hasn't changed in 10 years (much less the last 2). It's easier to optimize code for specific instruction sets (like SSE4 / SSE4a) than to completely rewrite a program's code for multicore thread parallelism.

For now. This will likely change sooner than later.

You are deluding yourself if you think so.

There is no compelling reason to do it. It's not necessary. There is no specific need for it. The overwhelming state of your cpu is 'Idle'.
 

dakels

Platinum Member
Nov 20, 2002
2,809
2
0
Originally posted by: heyheybooboo
Originally posted by: Shortass
Originally posted by: heyheybooboo
Too bad the overwhelming majority of software can only run a single major program thread on a core. That hasn't changed in 10 years (much less the last 2). It's easier to optimize code for specific instruction sets (like SSE4 / SSE4a) than to completely rewrite a program's code for multicore thread parallelism.

For now. This will likely change sooner than later.

You are deluding yourself if you think so.

There is no compelling reason to do it. It's not necessary. There is no specific need for it. The overwhelming state of your cpu is 'Idle'.
Not trying to be sarcastic, I'm really curious. Why would you say it's not necessary when processes push a single core to 100% while my other 3 cores do nothing? Is making apps multithreaded really that difficult? Why can't the OS split tasks to different cores?

I'm probably asking questions with pages-long answers, but I was looking for a simple one. Obviously my knowledge of CPU-level instruction sets/coding is pretty non-existent.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: DSF
In terms of silicon we're going to hit the limit pretty soon as far as process size goes.

I'd argue it's not quite like that. There is a physical limit to scaling process technology (it's impractical to do what it takes to keep shrinking toward atomic dimensions just for the desktop market segment), but the limit we are approaching even faster is the financial one.

The cost of developing successive process technology nodes becomes prohibitive as you go below 45nm. For one, the materials of choice become more and more exotic (as far as the industry is concerned), which means elevated risk, which in turn means elevated costs to quantify and characterize that risk, and so on.

This is why you saw consolidation of R&D efforts, in the form of the Crolles Alliance and the IBM Ecosystem (aka the fab club), develop at the 90nm and 65nm nodes. The situation gets even more dire at 45nm and beyond.

Intel has the revenue stream to justify the R&D cost structure necessary to fund 22nm and 16nm node development. But do AMD and the associated IBM Ecosystem? Yes, but not at a cadence of 2 yrs/node... they will be forced either to throw in the towel (à la Texas Instruments) or to slow their process technology cadence until the annual R&D commitment is something their revenue stream can cost-justify.

Going forward (beyond 45nm), the economic limitations will dominate process technology cadence for everyone but Intel, more so than the technical challenges of scaling toward atomic dimensions.

That's not to say the technology isn't a challenge; the money is what buys the tools required to solve those challenges. EUV at $180M per tool is a barrier to entry to 16nm process development for any company whose annual sales volume is <$10B.

Originally posted by: Foxery
Also note that Moore's Law is sometimes misquoted as "CPUs double in speed" every 2 years, but it's actually "CPUs double in transistors,"

And even that is an interpretation adopted some time after the seminal paper was published.

Gordon Moore wrote in his original article that "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year."

What Moore was talking about is the cost structure of ICs: the number of integrated components per IC has a cost minimum (too few components and the cost per IC is high due to the fixed overhead costs of the business itself; too many components and the cost per IC is high due to reduced yields).
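To make the cost-minimum idea concrete, here's a toy model (the constants are completely made up; it's only meant to show the shape of the curve Moore was describing): cost per component is the fixed overhead spread across n components, divided by a yield that falls as n grows.

```python
# Toy model of Moore's original point: cost per component has a minimum
# as a function of integration level n. Constants are invented purely to
# show the shape of the curve, not to match any real process.
def cost_per_component(n, fixed_overhead=1000.0, defect_rate=0.001):
    yield_fraction = (1.0 - defect_rate) ** n      # yield falls as complexity rises
    return (fixed_overhead / n + 1.0) / yield_fraction

best = min(range(100, 20000, 100), key=cost_per_component)
print(best, round(cost_per_component(best), 3))
# Too few components: overhead dominates. Too many: yield losses dominate.
```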

Mainstream media subsequently re-interpreted Moore's Law, replacing the definition's "components" with similar (but not identical) terms like transistors, clock speed, or performance "doubling every X months," where X has drifted from 12 to 18 to 24 months over the decades since the paper's publication.

Edit: figured I'd run spell check for once :eek:
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
Increasing the number of cores is the way the industry is heading. How far that will continue is the question. Increasing cores is a good (and power efficient) way to improve performance, but as you get above 4-8 processing cores scaling starts to get inefficient even in applications with solid multicore support. Eventually the increase in cores will become the same thing as the increase in frequency: it will be unsustainable and inefficient when it comes to power usage.

AMD will have 12-core CPUs by Q1 2010... with Intel you will see up to 8 cores with Beckton in 2009. Current rumors say 32nm Westmere will have 6 processing cores, so I wouldn't be surprised if an MP variant of it had 12 cores. On 32nm that should certainly be possible.

The current plan with Intel is to increase both IPC and the number of cores and find a balance between the two. So as you see Intel moving to 4-8 cores from the current focus on 2-4, they are increasing IPC at the same time (via the new Nehalem architecture).

You are going to see more and more cores, but I don't think you will be seeing 32 or 64 current-type cores on a single CPU anytime soon. What we might see in 2010 with Sandy Bridge or perhaps in the next new architecture in 2012 is a heterogeneous CPU where you see several large OoO cores and a larger number of simple in-order cores, like the Cell processor. With certain calculations at least, the performance of such a CPU would be many times what you see now. We can see that already with Cell excelling in certain types of code... Roadrunner is an example of the power of such a CPU, and so is the F@H performance we see on the PS3. That's especially true given the kind of pressure Intel/AMD will see from GPGPU, where the available power looks attractive in contrast to CPUs, which cannot get close to GPUs in pure processing power.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Extelleron
Increasing the number of cores is the way the industry is heading. How far that will continue is the question. Increasing cores is a good (and power efficient) way to improve performance, but as you get above 4-8 processing cores scaling starts to get inefficient even in applications with solid multicore support.

In my simplistic view it is no different than the evolution of L2 and L3 cache designs from off-die to on-package (MCM) to on-die... which, once accomplished, became an iterative march to larger and larger caches.

Cache size has always been a tradeoff against cost and power consumption, and it's no different with the size, complexity, and power consumption of the cores (where the evolution was multi-socket mobos, MCM dies on package, on-die multicore, etc.).

Originally posted by: Extelleron
Eventually the increase in cores will become the same thing as the increase in frequency: it will be unsustainable and inefficient when it comes to power usage.

This part I don't follow. It's no less sustainable than clockspeed ramping: at any given time, if the design is intentionally crazy for the node at hand, then sure, it will be unsustainable.

An 8GHz Netburst chip was not a problem unless you were silly enough to attempt it on a 90nm node; at 32nm it would probably be just fine. Likewise, you could argue that a "native" quad-core chip is proper for the 45nm node and beyond but ludicrous for 65nm and earlier.

Sustainability and efficiency come down to implementation and timing, not some inherent fundamental limitation in CMOS itself (unless you are talking about clocking chips in the THz region, where silicon has fundamental physics-based limitations).

Originally posted by: Extelleron
What we might see in 2010 with Sandy Bridge or perhaps in the next new architecture in 2012 is a heterogeneous CPU where you see several large OoO cores and a larger number of simple in-order cores, like the Cell processor.

Being an old-school Beowulf cluster builder, I agree with the concept: it is superior to have your serial code run on a beefier, more complex core while the parallelized code (the multi-threaded portions) is farmed out to more numerous but simpler cores.

However, the complexity saved at the hardware level is merely transferred to the software. The software must grow in complexity so that it can manage where threads are allocated, migrated, and spawned as hardware resources are utilized.

In Beowulf-style applications (on Roadrunner, Blue Gene, and just about every HPC system out there) this is an accepted part of the system... applications are intentionally coded to manage their own threads.

But will Microsoft do what is needed to make this feasible on the desktop by 2010? I won't hold my breath, not for a second.
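For the curious, here's a minimal sketch of what "applications intentionally coded to manage their own threads" can look like (Python, Linux-only because of os.sched_setaffinity, and it assumes at least 4 cores; the serial/parallel split is purely illustrative): the program itself decides which work stays on the "big" core and which gets pinned to the other cores, exactly the kind of bookkeeping HPC codes accept and typical desktop software does not.

```python
# Minimal sketch: the application, not the OS, decides where its work runs.
# Assumes Linux (os.sched_setaffinity) and a machine with >= 4 cores.
import os
from multiprocessing import Pool

def pin_to_core(core_id):
    # Restrict the calling process to a single core.
    os.sched_setaffinity(0, {core_id})

def parallel_kernel(args):
    # Runs in a worker process, pinned to its assigned core.
    core_id, chunk = args
    pin_to_core(core_id)
    return sum(x * x for x in chunk)      # the parallelizable portion

def serial_phase(data):
    # The non-parallelizable portion stays on the main core.
    return max(data)

def run(data, workers=3):
    # Hand each worker an explicit core and an explicit slice of the data.
    chunks = [(i + 1, data[i::workers]) for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(parallel_kernel, chunks)
    return serial_phase(data), sum(partials)

if __name__ == "__main__":
    pin_to_core(0)                        # keep the serial work on core 0
    print(run(list(range(100_000))))
```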
 

Foxery

Golden Member
Jan 24, 2008
1,709
0
0
Originally posted by: Extelleron
Increasing cores is a good (and power efficient) way to improve performance, but as you get above 4-8 processing cores scaling starts to get inefficient even in applications with solid multicore support. ... it will be unsustainable and inefficient when it comes to power usage.

That's a good point. For all but the most specialized applications, multi-CPU scaling is a story of diminishing returns. Programmers find it harder and harder to feed the extra cores meaningful work once you add a ridiculous number of them. ;)
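The usual way to put numbers on that diminishing-returns curve is Amdahl's law (not anything measured in this thread, just the textbook formula): if a fraction p of the work parallelizes and the rest stays serial, n cores give a speedup of 1 / ((1 - p) + p/n). A quick illustration with an assumed parallel fraction:

```python
# Amdahl's law: speedup on n cores when only a fraction p of the work
# can run in parallel. The 0.95 below is assumed, not measured.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 8, 16, 32):
    print(f"{n:2d} cores -> {amdahl_speedup(0.95, n):.2f}x")
# Even with 95% parallel work, 32 cores only buy ~12.5x, and each
# doubling of the core count gains less than the one before it.
```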

Quads show a lot of promise for gamers and workstations, and large enterprise/server-type machines can eat all the power you can throw at them. But as far as jumping to 8/16/32 cores for a home machine, I'd have to say "maybe, but not any time soon." Even if they become available/affordable a few years down the road, you will certainly still be able to buy a 2-core "budget" version to put into generic office machines that don't need all that power.

Originally posted by: heyheybooboo
There is no compelling reason to do it. It's not necessary. There is no specific need for it. The overwhelming state of your cpu is 'Idle'.

Mine isn't :)

Anyway, nobody will force you to buy an overpowered CPU. It's only good business sense for Intel/AMD to sell inexpensive products with a small number of cores to Joe Average. If I recall correctly, Nehalem is planned to come in 2-, 4-, and 8-core varieties. They'll be priced accordingly, so the market will work things out naturally.

edit: We may see the gap between enthusiast and budget parts widen in the long run, which suits me just fine!
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
Originally posted by: Idontcare
Originally posted by: Extelleron
Increasing the number of cores is the way the industry is heading. How far that will continue is the question. Increasing cores is a good (and power efficient) way to improve performance, but as you get above 4-8 processing cores scaling starts to get inefficient even in applications with solid multicore support.

In my simplistic view it is no different than the evolution of L2 and L3 cache designs from off-die to on-package (MCM) to on-die... which, once accomplished, became an iterative march to larger and larger caches.

Cache size has always been a tradeoff against cost and power consumption, and it's no different with the size, complexity, and power consumption of the cores (where the evolution was multi-socket mobos, MCM dies on package, on-die multicore, etc.).

Originally posted by: Extelleron
Eventually the increase in cores will become the same thing as the increase in frequency: it will be unsustainable and inefficient when it comes to power usage.

This part I don't follow. It's no less sustainable than clockspeed ramping: at any given time, if the design is intentionally crazy for the node at hand, then sure, it will be unsustainable.

An 8GHz Netburst chip was not a problem unless you were silly enough to attempt it on a 90nm node; at 32nm it would probably be just fine. Likewise, you could argue that a "native" quad-core chip is proper for the 45nm node and beyond but ludicrous for 65nm and earlier.

Sustainability and efficiency come down to implementation and timing, not some inherent fundamental limitation in CMOS itself (unless you are talking about clocking chips in the THz region, where silicon has fundamental physics-based limitations).

Originally posted by: Extelleron
What we might see in 2010 with Sandy Bridge or perhaps in the next new architecture in 2012 is a heterogeneous CPU where you see several large OoO cores and a larger number of simple in-order cores, like the Cell processor.

Being an old-school Beowulf cluster builder, I agree with the concept: it is superior to have your serial code run on a beefier, more complex core while the parallelized code (the multi-threaded portions) is farmed out to more numerous but simpler cores.

However, the complexity saved at the hardware level is merely transferred to the software. The software must grow in complexity so that it can manage where threads are allocated, migrated, and spawned as hardware resources are utilized.

In Beowulf-style applications (on Roadrunner, Blue Gene, and just about every HPC system out there) this is an accepted part of the system... applications are intentionally coded to manage their own threads.

But will Microsoft do what is needed to make this feasible on the desktop by 2010? I won't hold my breath, not for a second.

An 8GHz Netburst is ridiculously inefficient though, regardless of what process it is on. An 8GHz Netburst @ 32nm would be possible, sure, but it wouldn't beat an E8400 @ 3GHz on 45nm, which would certainly have lower power consumption.

I'm not saying it is impossible to build a CPU made up of 64 Nehalem cores; I'm saying it would be incredibly inefficient. Of course, with smaller and smaller processes, what is impossible now will be possible 5 years from now, but that doesn't mean a 10GHz Netburst or a 64-core Nehalem is advisable just because it is feasible. CPU power consumption nearly doubles when you double the number of cores, yet even now you see only a ~90% jump in performance moving to dual core. Then you see a 70-80% jump going from 2->4 cores. Then an even smaller jump going from 4->8. And that just continues. If you look at scaling with a program like Cinebench, which clearly shows single/multi-threaded performance, you see the scaling going down each time. With 2 cores, you can get 1.95x the performance of one. With 16 cores, you get ~10x the performance of one. But even though the efficiency is not as good, the power consumption still increases by nearly 2x with each doubling.

It's just like with frequency... once you reach a certain point, increasing it further requires a jump in voltage and thus a disproportionate jump in power consumption. A 200MHz jump or so can raise power consumption by 50% in certain cases. Look at Phenom, the 9650 vs the 9950: for a 13% jump in frequency, you see a 47% jump in TDP. That is why the frequency race died. You can increase frequency, but once you reach a threshold the increase in performance is not worth the increase in power consumption. It's the same thing with cores, although not quite as bad.
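As a rough sanity check on that Phenom comparison, using the standard dynamic-power relation P ~ C*V^2*f and assuming (purely for illustration) that the voltage had to rise roughly in step with the clock:

```python
# Back-of-the-envelope check: dynamic power scales roughly with C * V^2 * f.
# If a ~13% clock bump also needs a ~13% voltage bump (an assumption, not a
# measured figure), power rises by roughly 1.13^3.
f_ratio = 1.13                          # ~13% frequency increase cited above
v_ratio = 1.13                          # assumed proportional voltage increase
p_ratio = f_ratio * v_ratio ** 2
print(f"power ratio ~ {p_ratio:.2f}x")  # ~1.44x, in the ballpark of the ~47% TDP jump
```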

Of course, a big part of the problem is software, which is just not coded well for multicore CPUs right now. It has only been about 3 years since CPUs with more than one core became available to the average consumer, and only around 2 years since they were affordable and seen in mainstream systems. As the user base continues to widen and gains from frequency/IPC begin to diminish, programmers are going to have no choice but to program for multicore.

As for such a heterogeneous CPU being possible in Windows by 2010... I doubt it as well. But it seems likely such a CPU is in the future. Intel didn't build the TeraScale CPU for fun; they built it to work out the right infrastructure for keeping a massive number of simple cores fed and communicating quickly.

As Foxery is saying, though, the need for such power just isn't there for the average consumer. There will always be those who need more power, and others who don't need more than what is available now. So while such architectures may be built for the high end and for servers/supercomputers, a significant market is not going to need more than 2 or 4 cores at most. The big things for that market are lower cost and lower power consumption. Atom is a step in the right direction for meeting that market's needs, and I suspect a lot of development will go in that direction rather than being focused purely on raw power.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
This brings to mind an idea I had: buying a PC more powerful than you need in order to rent out computing power.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Idontcare
Originally posted by: DSF
In terms of silicon we're going to hit the limit pretty soon as far as process size goes.

I'd argue it's not quite like that. There is a physical limit to scaling process technology (it's impractical to do what it takes to keep shrinking toward atomic dimensions just for the desktop market segment), but the limit we are approaching even faster is the financial one.

The cost of developing successive process technology nodes becomes prohibitive as you go below 45nm. For one, the materials of choice become more and more exotic (as far as the industry is concerned), which means elevated risk, which in turn means elevated costs to quantify and characterize that risk, and so on.

This is why you saw consolidation of R&D efforts, in the form of the Crolles Alliance and the IBM Ecosystem (aka the fab club), develop at the 90nm and 65nm nodes. The situation gets even more dire at 45nm and beyond.

Intel has the revenue stream to justify the R&D cost structure necessary to fund 22nm and 16nm node development. But do AMD and the associated IBM Ecosystem? Yes, but not at a cadence of 2 yrs/node... they will be forced either to throw in the towel (à la Texas Instruments) or to slow their process technology cadence until the annual R&D commitment is something their revenue stream can cost-justify.

Going forward (beyond 45nm), the economic limitations will dominate process technology cadence for everyone but Intel, more so than the technical challenges of scaling toward atomic dimensions.

That's not to say the technology isn't a challenge; the money is what buys the tools required to solve those challenges. EUV at $180M per tool is a barrier to entry to 16nm process development for any company whose annual sales volume is <$10B.

I hate replying to my own post as I feel like I'm beating a dead horse, but this relevant info came out today:

MIT: Optical lithography good to 12 nanometers

Optical lithography can be extended to 12 nanometers, according to Massachusetts Institute of Technology researchers who have so far demonstrated 25-nm lines using a new technique called scanning beam interference lithography.

http://www.eetimes.com/news/se...HA?articleID=209400807

Now, 25nm-wide lines (that would be a 50nm pitch, appropriate for the 16nm node) are pretty impressive, but 12nm-wide lines (which is what they claim to be capable of) would be appropriate for the 7nm node (i.e. beyond the 11nm node where the ITRS roadmap ends, at a circa-2022 production timeframe).

Lots of gas can be put into this tank, provided you can afford to fill'er up.