Yes, another Haswell thread. Let's have a look at tock-to-tock IPC.


crashtech

Lifer
Jan 4, 2013
10,683
2,281
146
Not true, the platform still has an SMPS that switches the full CPU power down to 2.4V, a value low enough not to damage the CPU's IVRs, which are unable to work with a 12V rail as you're wrongly assuming.

This is like two regulators in cascade, which makes me think that efficiency or parts count was not the goal, but rather moving ultimate control of voltage away from the motherboard.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Rest assured that we're a long way from hitting the "inherent scaling issues" of multi-core and vectorization.
What? We have been hitting inherent scaling issues ever since the Pentium D! Synchronization always adds a cost, not to mention that most tasks are simply not easily parallelizable. Yes, scaling issues are lower with vectorization, but with multi-core CPUs, scaling is and always has been an issue, except in ideal situations.
Wrong. You're right that synchronization adds cost, but TSX lowers that cost. Hence applications where the overhead was larger than the gains were not hitting any "inherent scaling issues", they were simply held back by a lack of efficient synchronization primitives. That's not inherent to multi-core itself, that's an implementation aspect.

Secondly, most tasks really are quite parallelizable. You're right that they were not "easily" parallelizable, but that's also something TSX addresses. So once again the low amount of multi-threaded software isn't due to inherent scaling issues, but due to implementation issues that can be improved on.

So I stand by my claim; we're a long way from hitting the inherent scaling issues of multi-core. Inherent means it can't be improved on, but TSX does exactly that. You can expect to see more multi-threaded software in the future.
 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Wrong. You're right that synchronization adds cost, but TSX lowers that cost. Hence applications where the overhead was larger than the gains were not hitting any "inherent scaling issues", they were simply held back by a lack of efficient synchronization primitives. That's not inherent to multi-core itself, that's an implementation aspect.

Secondly, most tasks really are quite parallelizable. You're right that they were not "easily" parallelizable, but that's also something TSX addresses. So once again the low amount of multi-threaded software isn't due to inherent scaling issues, but due to implementation issues that can be improved on.

So I stand by my claim; we're a long way from hitting the inherent scaling issues of multi-core. Inherent means it can't be improved on, but TSX does exactly that. You can expect to see more multi-threaded software in the future.

The thing that holds TSX back from being useful is the fact that transactional memory isn't very well done in most languages (I think Haskell is the exception to this).

You COULD use TSX to replace locking altogether (the OS would have to make this change); however, transactional memory isn't always faster than straight-up locking. In situations where the locks span highly contended resources, transactional memory tends to be quite a bit slower. TSX won't solve this problem.

Either way, this should at least make transactional memory a relevant discussion in many programming languages; now that the hardware itself supports transactional constructs, there is less of a reason to avoid it.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
Wrong. You're right that synchronization adds cost, but TSX lowers that cost. Hence applications where the overhead was larger than the gains were not hitting any "inherent scaling issues", they were simply held back by a lack of efficient synchronization primitives. That's not inherent to multi-core itself, that's an implementation aspect.

Does TSX lower the cost or remove it entirely? So, you are admitting that there is a cost - because of synchronization - and TSX may lower it but cannot completely remove it.

Also, there was and still is a cost, even with Haswell, that would not be present under a single threaded workload. That is a scaling problem.

Let's quote the AnandTech article on the subject:
There is a reason why getting good multi-core scalability requires a large investment of time and money.
If another thread overwrites one of those shared variables anyway, the whole process will be aborted by the CPU, and the transaction will be re-executed but with a traditional lock.
So, if the transaction fails, it has to be run again with a traditional lock - does this sound like a magic bullet that is an end to scaling issues? No! The scaling issue is still there; it won't go away that easily.

Secondly, most tasks really are quite parallelizable. You're right that they were not "easily" parallelizable, but that's also something TSX addresses. So once again the low amount of multi-threaded software isn't due to inherent scaling issues, but due to implementation issues that can be improved on.
Uh no, most tasks are not. Here is a Wikipedia article for you

There is a term for nice, easily parallelizable tasks - they call them embarrassingly parallel problems. What you are saying is that this is a misnomer, because most tasks are quite parallelizable.

What you are saying is just not true! How much programming experience do you have? You can't just throw moar cores at a problem and expect it to go away! Developing good multi-threaded software takes a lot of skill, time and effort, which is why so many people have invested so much time into making libraries that make parallelization easy!

Can you give any evidence for the non-existence of scaling issues?
Here is another link:
And here is a nice quote from that article:
Additional cores typically don’t scale anywhere near to linearly, meaning that going from two to four cores will not result in doubled computing performance, unless your applications are really thread-optimized and are not bottlenecked by other system components.
You are the only person that I have ever "met" online who claims that scaling is not an issue. Everyone else says it is a huge issue - hence the reason for TSX's existence!

EDIT: Here is another one for you. Why do you think Amdahl's Law exists?

"Oh I know multiple cores always scale perfectly, but you know what, I'm bored, so I'm just going to create a law and have it named after me for fun."
 
Last edited:

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Does TSX lower the cost or remove it entirely? So, you are admitting that there is a cost - because of synchronization - and TSX may lower it but cannot completely remove it.

Before you and BenchPress descend too much farther into your self-destructive rage cycle (;)) I'm just going to say that I think the two of you would make for an awesome team if you were both sitting across from each other in a pub drinking a couple beers :D

Clearly you both know your stuff, but you are squaring off in a battle royale over what I perceive to be a simple difference of perspective.

FWIW I tend to see core scaling from your (Ancalagon44) position, but I can appreciate that from BenchPress' vantage things look different.

Think about prefetchers and caching. DRAM sucks for latency, so CPUs have cache... but even cache isn't good enough. The CPU needs prefetchers and branch predictors to minimize the serious performance issues of misses (both in branch and cache).

Amdahl's law and the impact of inter-processor overhead is unavoidable. You can improve the latency and bandwidth but you can't make it zero-latency or infinite bandwidth, so the intrinsic bottleneck is unavoidable.

But you can hide the impact of it, mask it, the same as cache prefetchers mask and hide the horrid access latency of going to the ram.

I see TSX like that. It makes today's terrible inter-processor communications latency and bandwidth a little less problematic, and in doing so it enables a whole new level of somewhat-fine-grained parallelized software to be able to masquerade as coarse-grained software with great scalability.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
The thing that holds TSX back from being useful is the fact that transactional memory isn't very well done in most languages (I think Haskell is the exception to this).
Why would it have to be a language feature? It can be done perfectly through some library functions, or better yet, intrinsics. Don't get me wrong, I certainly agree that it could be very useful to integrate it more tightly with the programming languages, but I really wouldn't say it's held back from being "useful" without that.
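For what it's worth, here's a rough sketch of lock elision with the RTM intrinsics (purely illustrative: the names are made up, you need a TSX-capable CPU, a compiler with -mrtm, and real code would add retry heuristics):

#include <immintrin.h>
#include <atomic>

std::atomic<bool> fallback_lock{false};   // simple test-and-set fallback lock
long shared_counter = 0;                  // data the critical section protects

void increment()
{
    unsigned status = _xbegin();          // try to start a hardware transaction
    if (status == _XBEGIN_STARTED) {
        // Read the fallback lock inside the transaction so that another thread
        // taking it aborts us instead of racing with the locked path.
        if (fallback_lock.load(std::memory_order_relaxed))
            _xabort(0xff);
        ++shared_counter;                 // speculative update, no lock taken
        _xend();                          // commit atomically
        return;
    }
    // Transaction aborted (conflict, capacity, interrupt, ...): take the real lock.
    while (fallback_lock.exchange(true, std::memory_order_acquire))
        ;                                 // spin
    ++shared_counter;
    fallback_lock.store(false, std::memory_order_release);
}

The point is that the common, uncontended path only ever reads the lock, so the cache-line ping-pong that makes fine-grained locking expensive mostly disappears.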
You COULD use TSX to replace locking altogether (the OS would have to make this change); however, transactional memory isn't always faster than straight-up locking. In situations where the locks span highly contended resources, transactional memory tends to be quite a bit slower. TSX won't solve this problem.
I never said it's a silver bullet. There are problems which will inherently always be slow, such as highly contended resources. All I'm saying is that the majority of today's multi-threaded scaling issues are not inherent, but are instead caused by the high overhead of fine-grained locking of uncontended resources, or idling threads with coarse-grained locking. TSX reduces these main causes of overhead.
Either way, this should at least make transactional memory a relevant discussion in many programming languages; now that the hardware itself supports transactional constructs, there is less of a reason to avoid it.
Agreed. These are exciting times where new programming paradigms will be born. I think a lot can be borrowed from hardware design languages, where in theory everything happens in parallel unless specified otherwise. For instance, SystemC is quite intriguing. Of course in practice it's quite slow due to actually being a C++ framework, and the lack of fast synchronization. TSX could change that and spawn standalone programming languages based on SystemC's principles.
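To illustrate what I mean by "parallel unless specified otherwise", here's a tiny hedged SystemC sketch (the module and signal names are made up; it needs the SystemC library to build). Every process conceptually runs concurrently and only synchronizes where you say so:

#include <systemc.h>

SC_MODULE(Counter) {
    sc_out<int> value;

    void run() {
        for (int i = 0; i < 5; ++i) {
            value.write(i);        // drive the output signal
            wait(10, SC_NS);       // yield; other processes run "in parallel"
        }
    }

    SC_CTOR(Counter) { SC_THREAD(run); }   // register run() as a concurrent process
};

int sc_main(int, char*[]) {
    sc_signal<int> sig;
    Counter counter("counter");
    counter.value(sig);            // bind port to signal
    sc_start(100, SC_NS);          // run the simulation
    return 0;
}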
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Does TSX lower the cost or remove it entirely? So, you are admitting that there is a cost - because of synchronization - and TSX may lower it but cannot completely remove it.
Absolutely. There's nothing to "admit" here. I never said it doesn't have a cost. What's relevant is that the cost can be and is being lowered, thereby proving that we previously weren't hitting inherent multi-core scaling issues. And to be perfectly clear, by hitting them I mean they would prevent any significant performance improvement from increasing the core count for the majority of software.
Also, there was and still is a cost, even with Haswell, that would not be present under a single threaded workload. That is a scaling problem.
Yes, it is a scaling problem, but not an inherent one. So don't attack a straw man. My only point was that it is still very useful to increase the core count. It just requires addressing the synchronization overhead, which TSX makes a very serious attempt at. There would only be an inherent scaling issue if the locking overhead were zero and adding cores still wouldn't increase performance. At that point the dependencies between the tasks dictate the performance, and not the synchronization itself. But we're a long way from that becoming the general case.
Let's quote the AnandTech article on the subject:
So, if the transaction fails, it has to be run again with a traditional lock - does this sound like a magic bullet that is an end to scaling issues? No! The scaling issue is still there; it won't go away that easily.
Straw man.
Secondly, most tasks really are quite parallelizable. You're right that they were not "easily" parallelizable, but that's also something TSX addresses. So once again the low amount of multi-threaded software isn't due to inherent scaling issues, but due to implementation issues that can be improved on.
Uh no, most tasks are not. Here is a Wikipedia article for you

There is a term for nice, easily parallelizable tasks - they call them embarrassingly parallel problems. What you are saying is that this is a misnomer, because most tasks are quite parallelizable.
You're greatly exaggerating what constitutes a parallelizable task. Just because something isn't embarrassingly parallel doesn't mean it can't benefit from more cores than what we have today. You also don't seem to get my point about TSX making it easier. A lot more software could have been taking advantage of multi-threading already, in theory, but hasn't been in practice due to the engineering cost involved. Transactions are way easier to reason about than atomic scalar operations.
What you are saying is just not true! How much programming experience do you have?
Sigh. I have 15 years of experience in C++ and x86 assembly. Do you use the Chrome browser? Then you're using some of my software.

So do yourself a favor and stop assuming that anyone who doesn't fully agree with you must be inexperienced.
You can't just throw moar cores at a problem and expect it to go away! Developing good multi-threaded software takes a lot of skill, time and effort, which is why so many people have invested so much time into making libraries that make parallelization easy!
Tell me something I don't know. Even with TSX it will require lots more work to put those cores to good use. But that doesn't take away that it's a game changer. It turned something infeasible into something attainable.
Can you give any evidence for the non-existence of scaling issues?
Straw man.
Here is another link:
And here is a nice quote from that article:
Additional cores typically don’t scale anywhere near to linearly, meaning that going from two to four cores will not result in doubled computing performance, unless your applications are really thread-optimized and are not bottlenecked by other system components.
You are the only person that I have ever "met" online who claims that scaling is not an issue. Everyone else says it is a huge issue - hence the reason for TSX's existence!
Straw man.
EDIT: Here is another one for you. Why do you think Amdahl's Law exists?

"Oh I know multiple cores always scale perfectly, but you know what, I'm bored, so I'm just going to create a law and have it named after me for fun."
The locking overhead is part of the sequential portion in Amdahl's law. Since TSX lowers it, the law dictates that a higher speedup can be expected from using more cores. Q.E.D.
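To put some purely illustrative numbers on that: Amdahl's law says speedup = 1 / (s + (1 - s)/N), where s is the serial fraction and N the core count. If locking overhead inflates s to 0.20, four cores give 1 / (0.20 + 0.80/4) = 2.5x. If better primitives shrink s to 0.05, the same four cores give 1 / (0.05 + 0.95/4) ≈ 3.5x. Same law, same cores, better scaling.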
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
You can improve the latency and bandwidth but you can't make it zero-latency or infinite bandwidth, so the intrinsic bottleneck is unavoidable.

But you can hide the impact of it, mask it, the same as cache prefetchers mask and hide the horrid access latency of going to the ram.

I see TSX like that.
I like your analogy! Both prefetchers and TSX optimize for the common case.

Namely, a prefetcher assumes that your memory accesses are linear, so it can predict what data you need next and cache it in advance. In theory this may thrash the cache with unwanted data, but in practice that's rare, so it provides a very substantial net speedup. Likewise, TSX optimistically assumes low contention, so it starts executing a critical section before any lock is actually acquired. Again, in theory this could make things worse by having to roll back in case of a conflict, but in practice that should be rare, and so TSX could result in a nice net speedup.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
I think we're arguing about what counts as inherent multi-core scaling issues. Perhaps we are misunderstanding each other.

My thinking is, inherent multi-core scaling issues exist because scaling will never be 100% perfect. We may get it to 99% for certain problems, but never 100%. Thus, there are scaling issues. If we had an OS that did not even do timeslicing, and just ran one problem on one core, there would be no losses due to scaling. Any time another core is added, there will be losses due to scaling, and that is unavoidable. That is my point. I'm not saying the magnitude of those losses is constant or that it cannot be lowered - clearly it can.

To me, that is what inherent multi-core scaling issues are - an unavoidable consequence of using more than 1 CPU. There is a loss of work, no matter how small, and that is my point. It's like how a lot of eBay sellers used to advertise 5.6 GHz computers, which obviously were 2.8 GHz dual cores. It is imperfect scaling which makes this inaccurate (amongst other things!).

You said, we are a long way from hitting inherent scaling issues - then what did you mean by that? To me, we hit them the instant Intel added a second CPU core, because of synchronization issues. Heck, we hit them the instant somebody made a superscalar CPU that could have one thread waiting for a load to complete while another did integer work. As I said, there is a non zero unavoidable cost - inherent scaling issues.

EDIT: I'm not saying multi-core CPUs and multi-threaded programming are not worth the hassle - obviously they are. I'm just saying it's not a perfect world out there.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
The Haswell core is 14.5 mm² and on bulk.
Intel's cost is tied to fixed costs: a) process development, b) fab capacity and development, c) design.
They seem to have excess capacity.

What is the marginal production cost of adding 4 extra cores to Haswell today / in 2014?

- Going forward, the 14nm node goes to the mobile platform, so the capacity on 22nm seems plenty, e.g. for the desktop. Are we talking 20 or 60 USD per CPU?

The beauty of Haswell is that it addresses the scaling issue for performance on a technical level. At the same time there seem to be very good scaling opportunities on the production and economic level.

I think it calls for a push strategy to defend the market:
Get more cores, TSX and AVX2 onto the desktop market, at more or less the same user cost, while keeping the current mobile strategy because mobile takes the new nodes.
It makes sense when you practically have a monopoly, at least for the profitable end.

Downside:
This is added production cost in the short term.
But this cost is marginal and easily controlled and altered.

The upside could be:
Development of software that takes advantage of more cores, TSX and AVX2, giving unique user experiences, e.g. games with physics and animations that are like reality...
The risk is that this does not happen.

But if it happens, Intel would have a larger market than would otherwise be the case - and for years. If they could expand the market by say 20% using that strategy, the marginal production cost would be paid back many times over. The desktop market is still, and will be for years, a classic cash cow, albeit getting smaller with each year - as it looks now.

Intel has instead chosen a pull strategy which tries to get as much money as possible out of the market for the next year or two, with classic segmentation strategies (e.g. no TSX for enthusiasts so they are forced to upgrade). It looks to me like a move decided more by executive bonuses than by long-term profitability.

They have a very forward-looking product in their hands, and they react like they are scared.

If this were a family-owned company, I am sure they would safeguard their long-term profitability more.

Is there anyone here who believes that, e.g., taking TSX away from the K models is a good strategy - from an economic perspective?

For the first time ever I chose a model that could not be overclocked (in the old 386 days I used to solder a new crystal onto the motherboard). It's a 65W TDP model (the 4570S, I think) that could fit in a 100W envelope with double the cores and the same GPU, and be small at the same time. Slap a good gfx on it, and with good programming for it, it could do things we can hardly imagine.

I think it's in Intel's interest to defend and develop the market. I want the cores and TSX, and I urge you to voice up for the same.

That would take this desktop market to a new level.
 

grimpr

Golden Member
Aug 21, 2007
1,095
7
81
I think it's in Intel's interest to defend and develop the market. I want the cores and TSX, and I urge you to voice up for the same.

That would take this desktop market to a new level.

Amen.

 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Why would it have to be a language feature? It can be done perfectly through some library functions, or better yet, intrinsics. Don't get me wrong, I certainly agree that it could be very useful to integrate it more tightly with the programming languages, but I really wouldn't say it's held back from being "useful" without that.

I guess it somewhat depends on the language being used. For example, Java might throw a ConcurrentModificationException on some data structures with TSX in place.

Any data structure that makes any sort of concurrency guarantees like that could cause issues. That is where I'm mostly coming from with the "language level support".

It would be extra nice, but not necessary, if we also got something like a transactional scope in the language itself.
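For the record, GCC already has an experimental flavour of that (a hedged sketch, not production code: compile with -fgnu-tm, and whether it maps to TSX or a software fallback is up to the runtime, not this source):

#include <cstdio>

long accounts[2] = {100, 0};       // toy shared state

// The whole block commits atomically with respect to other transactions;
// note there is no explicit lock object anywhere in the source.
void transfer(int from, int to, long amount)
{
    __transaction_atomic {
        accounts[from] -= amount;
        accounts[to]   += amount;
    }
}

int main()
{
    transfer(0, 1, 25);
    std::printf("%ld %ld\n", accounts[0], accounts[1]);   // prints "75 25"
    return 0;
}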

I never said it's a silver bullet. There are problems which will inherently always be slow, such as highly contended resources. All I'm saying is that the majority of today's multi-threaded scaling issues are not inherent, but are instead caused by the high overhead of fine-grained locking of uncontended resources, or idling threads with coarse-grained locking. TSX reduces these main causes of overhead.
Agreed - I was just pointing out an issue for the sake of pointing one out :). Highly contended resources are better handled by locking than by transactional memory. That was more my point.
 

Torn Mind

Lifer
Nov 25, 2012
12,051
2,765
136
The Haswell core is 14.5 mm² and on bulk.
Intel's cost is tied to fixed costs: a) process development, b) fab capacity and development, c) design.
They seem to have excess capacity.

What is the marginal production cost of adding 4 extra cores to Haswell today / in 2014?

- Going forward, the 14nm node goes to the mobile platform, so the capacity on 22nm seems plenty, e.g. for the desktop. Are we talking 20 or 60 USD per CPU?

The beauty of Haswell is that it addresses the scaling issue for performance on a technical level. At the same time there seem to be very good scaling opportunities on the production and economic level.

I think it calls for a push strategy to defend the market:
Get more cores, TSX and AVX2 onto the desktop market, at more or less the same user cost, while keeping the current mobile strategy because mobile takes the new nodes.
It makes sense when you practically have a monopoly, at least for the profitable end.

Downside:
This is added production cost in the short term.
But this cost is marginal and easily controlled and altered.

The upside could be:
Development of software that takes advantage of more cores, TSX and AVX2, giving unique user experiences, e.g. games with physics and animations that are like reality...
The risk is that this does not happen.

But if it happens, Intel would have a larger market than would otherwise be the case - and for years. If they could expand the market by say 20% using that strategy, the marginal production cost would be paid back many times over. The desktop market is still, and will be for years, a classic cash cow, albeit getting smaller with each year - as it looks now.

Intel has instead chosen a pull strategy which tries to get as much money as possible out of the market for the next year or two, with classic segmentation strategies (e.g. no TSX for enthusiasts so they are forced to upgrade). It looks to me like a move decided more by executive bonuses than by long-term profitability.

They have a very forward-looking product in their hands, and they react like they are scared.

If this were a family-owned company, I am sure they would safeguard their long-term profitability more.

Is there anyone here who believes that, e.g., taking TSX away from the K models is a good strategy - from an economic perspective?

For the first time ever I chose a model that could not be overclocked (in the old 386 days I used to solder a new crystal onto the motherboard). It's a 65W TDP model (the 4570S, I think) that could fit in a 100W envelope with double the cores and the same GPU, and be small at the same time. Slap a good gfx on it, and with good programming for it, it could do things we can hardly imagine.

I think it's in Intel's interest to defend and develop the market. I want the cores and TSX, and I urge you to voice up for the same.

That would take this desktop market to a new level.

Most people don't care when buying desktops. So long as it is "fast" for them, they won't care: Mr. Average Joe doesn't know anything about how the desktop works and certainly has no clue what an instruction set is; he just wants it to work well. If the desktop user is not doing serious production work for a company and is not a gamer, they won't care what's in there as long as the box is "fast".

It is no small effort to first educate and then motivate a huge mass of people to protest something, especially if the matter is meaningless to them and not worth their time.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
My thinking is, inherent multi-core scaling issues exist because scaling will never be 100% perfect. We may get it to 99% for certain problems, but never 100%. Thus, there are scaling issues. There is a loss of work, no matter how small, and that is my point.
That certainly couldn't have been your original point:
IPC should be the first thing anyone ever optimizes, due to A) inherent scaling problems with multiple cores...
Note that this is what started our discussion. If "inherent scaling problems" refers to "a loss of work, no matter how small", that implies you expect IPC to scale much more easily.

In reality IPC only scales by roughly 10% every generation before it becomes too expensive, so that's not setting a very high bar for multi-core to beat!

So that brings us back to my point. The reason multi-core has been unable to convincingly beat the IPC scaling is not because most applications "inherently" don't have multi-threading potential. It's because (A) the synchronization overhead has been too high, and (B) the synchronization primitives were too hard to use correctly by most developers to bother. TSX addresses both issues simultaneously. And thus it demands re-evaluating whether the next generation should put most effort into marginally increasing IPC, or offer more cores.
You said, we are a long way from hitting inherent scaling issues - then what did you mean by that?
What the dictionary says it means. Permanent. Fixed. And quantitatively it means the scaling would have to be worse than 10% on average, always, unfixable.
To me, we hit them the instant Intel added a second CPU core, because of synchronization issues. Heck, we hit them the instant somebody made a superscalar CPU that could have one thread waiting for a load to complete while another did integer work.
That's not superscalar, that's SMT (such as Hyper-Threading).
As I said, there is a non zero unavoidable cost - inherent scaling issues.
In that sense IPC would also have inherent scaling issues, which doesn't fit your original statement. So you're going to have to make up your mind...
EDIT: I'm not saying multi-core CPUs and multi-threaded programming are not worth the hassle - obviously they are. I'm just saying it's not a perfect world out there.
Which confirms that you've actually changed your mind about things. There's nothing wrong with that since we can now agree on things, but please show some integrity and admit it instead of trying to defend your original use of the word "inherent" which is really no longer fitting.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
That certainly couldn't have been your original point:

It was and still is.

Note that this is what started our discussion. If "inherent scaling problems" refers to "a loss of work, no matter how small", that implies you expect IPC to scale much more easily.
I do - increasing IPC requires nothing from the programmer, perhaps only the compiler.

In reality IPC only scales by roughly 10% every generation before it becomes too expensive, so that's not setting a very high bar for multi-core to beat!
Yes, probably true. However, look at the Bulldozer fiasco - can you honestly say it was a good idea to shave off execution resources and increase the pipeline length in order to add extra cores and hit higher clockspeeds?

My point is, Intel has the right approach by NOT adding moar cores like AMD does. They increase IPC as much as is possible in a given generation, and add cores only later. Witness Core 2 Quad 7 years ago, with 6-core CPUs only available to enthusiasts or on workstations.

So that brings us back to my point. The reason multi-core has been unable to convincingly beat the IPC scaling is not because most applications "inherently" don't have multi-threading potential. It's because (A) the synchronization overhead has been too high, and (B) the synchronization primitives were too hard to use correctly by most developers to bother. TSX addresses both issues simultaneously. And thus it demands re-evaluating whether the next generation should put most effort into marginally increasing IPC, or offer more cores.
I'm less interested in WHY multi core scaling has not been as good as it could have been and more interested in the FACT that it is the case.

What the dictionary says it means. Permanent. Fixed. And quantitatively it means the scaling would have to be worse than 10% on average, always, unfixable.
Actually that is not what inherent means - look at this online definition:
Existing as an essential constituent or characteristic; intrinsic.
THAT is what I meant - that scaling issues are an inherent problem with multi core CPUs.

That's not superscalar, that's SMT (such as Hyper-Threading).
Yes, I used the wrong example.

In that sense IPC would also have inherent scaling issues, which doesn't fit your original statement. So you're going to have to make up your mind...
Well, let's read my original statement again, shall we?

Look at K8 vs P4 - AMD went after IPC, Intel went after clockspeed. We all know which was faster.

Look at Bulldozer and derivatives vs Intel Core anything - again, Intel is faster at just about everything, despite "only" having 4 cores and "stupidly" going after higher IPC.

IPC should be the first thing anyone ever optimizes, due to A) inherent scaling problems with multiple cores, except in ideal situations like graphics, and B) power consumption and heat problems when trying to boost clock speed at the expense of IPC, a la both P4 and Bulldozer.
Do I say that IPC does not have its own issues? No. Do I say IPC is perfect? No.

I said multi-core programming has inherent scaling issues, which is true. Not fixed, but then that is not what inherent means, is it? Unless you care to correct the dictionary.

My actual point, which we have deviated from completely, is that IPC is the low-hanging fruit compared to adding extra cores. Rather do that than add extra cores - if possible. Obviously having a CPU with 20 decoders and 20 integer execution units is not much good to most people; however, IPC should still be the first thing optimized in any generation. My point was particularly made concerning AMD's core race vs Intel's tick-tock. Witness how Intel has consistently improved IPC every generation, whereas AMD has actually slid backwards. Who makes better CPUs? That is my point.

Which confirms that you've actually changed your mind about things. There's nothing wrong with that since we can now agree on things, but please show some integrity and admit it instead of trying to defend your original use of the word "inherent" which is really no longer fitting.
I'm really not sure how you arrived at this - I have not changed my mind. As I said, inherent does not mean fixed or permanent; it means an essential part of a thing. It is not I who misunderstood that. You apparently assumed that I was saying that multi-core scaling carries a fixed penalty of 10 doodads. Again, the key is here:

IPC should be the first thing anyone ever optimizes,
I do not see the word "only" there - do you? I do not see a recommendation to do away with multiple cores altogether - do you?
 
Last edited:

cytg111

Lifer
Mar 17, 2008
25,712
15,188
136
Wrong. You're right that synchronization adds cost, but TSX lowers that cost. Hence applications where the overhead was larger than the gains were not hitting any "inherent scaling issues", they were simply held back by a lack of efficient synchronization primitives. That's not inherent to multi-core itself, that's an implementation aspect.

Secondly, most tasks really are quite parallelizable. You're right that they were not "easily" parallelizable, but that's also something TSX addresses. So once again the low amount of multi-threaded software isn't due to inherent scaling issues, but due to implementation issues that can be improved on.

So I stand by my claim; we're a long way from hitting the inherent scaling issues of multi-core. Inherent means it can't be improved on, but TSX does exactly that. You can expect to see more multi-threaded software in the future.

To add to this, take Anand's article on 'making sense of TSX':

http://www.anandtech.com/show/6290/...well-transactional-synchronization-extensions

graph

http://images.anandtech.com/graphs/graph6290/49955.png

Assuming that TSX will come close to eliminating that bottleneck (another Anand article shows this for coarse-grained scenarios), a proper Haswell TSX implementation would perform almost like a 5-core Haswell without TSX. Whether to call that crippled or not is perhaps up for debate, but it is not *nothing*.
 

cytg111

Lifer
Mar 17, 2008
25,712
15,188
136
My actual point, which we have deviated from completely, is that IPC is the low-hanging fruit compared to adding extra cores. Rather do that than add extra cores - if possible. Obviously having a CPU with 20 decoders and 20 integer execution units is not much good to most people; however, IPC should still be the first thing optimized in any generation. My point was particularly made concerning AMD's core race vs Intel's tick-tock. Witness how Intel has consistently improved IPC every generation, whereas AMD has actually slid backwards. Who makes better CPUs? That is my point.

- That is looking kind of deep into the crystal ball, is it not? Given the time it takes from the PowerPoint slide (moar coars) to shipping silicon, AMD could easily have predicted (wrongly) that multithreaded programming models would have taken off by now (take Crysis 3, only off by a few years)... and in a world where that is true, choosing 10% more IPC over 50 or 100% more cores would be nuts. As I see it, when the players are deciding on new archs, they're making pretty big guesstimates as to what the future of software will bring, and sometimes they have to push that change because the alternative is worse.

Edit: and as for the concept of IPC++ being low-hanging fruit, I am afraid it isn't so low-hanging anymore... I will try to dig up an interview with an Intel engineer I read years back that stated just that: "all the low-hanging fruit has been picked". (I'll be back)
 
Last edited:

BenchPress

Senior member
Nov 8, 2011
392
0
0
Note that this is what started our discussion. If "inherent scaling problems" refers to "a loss of work, no matter how small", that implies you expect IPC to scale much more easily.
I do - increasing IPC requires nothing from the programmer, perhaps only the compiler.
But it requires a lot from the hardware! You can't claim IPC scales much more easily by just looking at what it means to developers. That's only half the story, the good part. The bad part is that beyond modest increments it takes a very large amount of hardware to extract more ILP, and worse, power.

I'm not sure if you fully grasp that. We're really approaching the limits of IPC. Haswell increased the execution port count by 33% over Ivy Bridge, but single-threaded IPC only went up by about 12%. This is hard to justify, even at Intel itself. The engineers had to go to great lengths to turn it into a net gain. Part of the justification is that it improves Hyper-Threaded IPC by 20%, but that brings us back to multi-threaded scaling.
Yes, probably true. However, look at the Bulldozer fiasco - can you honestly say it was a good idea to shave off execution resources and increase the pipeline length in order to add extra cores and hit higher clockspeeds?
I already answered that very early on:
That said, IPC should indeed not be sacrificed for more cores like AMD did with Bulldozer.
But the disaster that is Bulldozer does not imply that IPC is all that matters. AMD could have had a very successful 8-core by keeping the IPC at the same level as Phenom (i.e. still lower than Intel's), especially if they added technology to lower the thread synchronization overhead (which ironically they've researched but failed to deliver).
My point is, Intel has the right approach by NOT adding moar cores like AMD does. They increase IPC as much as is possible in a given generation, and add cores only later.
That is simply not true. Core 2 could have been a single-core with higher IPC, but instead they designed it from the ground up as a dual-core.
What the dictionary says it means. Permanent. Fixed. And quantitatively it means the scaling would have to be worse than 10% on average, always, unfixable.
Actually that is not what inherent means - look at this online definition: Existing as an essential constituent or characteristic; intrinsic.
THAT is what I meant - that scaling issues are an inherent problem with multi core CPUs.
"inherent means inborn or fixed from the beginning as a permanent quality or constituent of a thing" - same site

But much more importantly you're still ignoring the quantitative aspect of it. Nobody cares that by your definition you'd label 99% scaling as a scaling issue for not being 100%. The real bar that is set for labeling it as an issue is when it would scale worse than IPC.

So with IPC reaching its limits and multi-core not hitting the inherent/fixed scaling issues because the overhead is substantially reduced with TSX, I have to once again conclude that it's probably not a wise idea to focus primarily on IPC and consider multi-core an afterthought.
My actual point, which we have deviated from completely, is that IPC is the low-hanging fruit compared to adding extra cores. Rather do that than add extra cores - if possible. Obviously having a CPU with 20 decoders and 20 integer execution units is not much good to most people; however, IPC should still be the first thing optimized in any generation. My point was particularly made concerning AMD's core race vs Intel's tick-tock. Witness how Intel has consistently improved IPC every generation, whereas AMD has actually slid backwards. Who makes better CPUs? That is my point.
That's two different points. One is looking at the past and correctly concludes that AMD should have put more effort into IPC, but the other is concluding from this that IPC is low hanging fruit and it should be the first thing optimized in every generation. The latter is a pretty bold claim. Just because something has worked in the past doesn't mean it will work in the future. IPC scaling is experiencing diminishing returns while TSX represents a breakthrough in multi-core scaling.
 

Melina42

Member
Dec 18, 2012
28
0
50
Perhaps this is a naive entry, but hasn't the Internet already proven the value of massive, many-core parallelism over sheer processor speed? Compute problems handled across a massively parallel network already get solved faster, period.

The problem is really whether it's worth trying to compete with/optimize around that principle with locally coherent silicon. :)
 

Torn Mind

Lifer
Nov 25, 2012
12,051
2,765
136
Perhaps this is a naive entry, but hasn't the Internet already proven the value of massive, many-core parallelism over sheer processor speed? Compute problems handled across a massively parallel network already get solved faster, period.

The problem is really whether it's worth trying to compete with/optimize around that principle with locally coherent silicon. :)
I am not very well versed in computer science, but there are obstacles that programmers have to deal with when it comes to utilizing multiple cores on chips.

The Internet is primarily about storing data and accessing that stored data, although something like Bitcoin mining can utilize it to get more processing power. Bitcoin mining is a better example of using disparate resources to actually do some processing (a.k.a. doing math).

However, the reason apps are not always written to scale across multiple cores is that it is still currently a pain in the ass to program multithreaded code. Debugging is a more difficult task, and other issues my ignorant self is not aware of are probably also present. Programmers must specifically write their code to utilize multiple cores, and they have a difficult time weeding out bugs in multi-threaded code.
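A classic toy example of why it's such a pain (deliberately broken code, names made up): two threads bump the same counter with no synchronization, and the result changes from run to run because the increments interleave. Bugs like this don't show up reliably, which is what makes them so hard to weed out.

#include <iostream>
#include <thread>

long counter = 0;                  // shared, deliberately NOT atomic and NOT locked

void bump() {
    for (int i = 0; i < 1000000; ++i)
        ++counter;                 // read-modify-write race between the two threads
}

int main() {
    std::thread a(bump), b(bump);
    a.join();
    b.join();
    // Usually prints less than 2000000, and a different number on each run.
    std::cout << counter << "\n";
    return 0;
}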

http://en.wikipedia.org/wiki/Multithreading_(computer_architecture)