
Heard a rumor, want to know if it's true

May 11, 2008
22,557
1,471
126
In reaction to: Idontcare,

The fact you don't see such things happening at the top (Intel/Microsoft) might be taken as proof that the problem is not as much of a problem as you perceive it to be. Other than your handful of video conversion software programs there isn't much out there that taxes a modern quad-core computer system. For the vast majority of consumers there are plenty of spare CPU cycles to go around.



Well, if such a thing were implemented, it would not appear in an x86 from Intel any time soon.

They need and want support from software vendors, and Microsoft, with its big share of the desktop OS market, would be a significant factor.

But unfortunately Microsoft has a history of not being a visionary, rather copying (or however one likes to put it) the good ideas of other people and companies.

I think that a company that is too marketing-driven will not come up with novel ideas, or will milk the cash cow as long as possible before implementing them.

Therefore innovation moves much slower than it could.

If it happened, it would happen in the server world first, where the Linux community is always eager to use every feature for more performance. I do agree with you on that.

To come back to Intel:
Take for example the integrated memory controller (IMC) in AMD's K8 and K10; AMD and many others have had one for years. Now Intel is finally taking the step to an IMC too. Progress is there.


Your posts are now coming full circle and sounding a lot like my original posts on your speculation near the top of the thread:

It is always good to discuss these things. If we just accepted the "scraps" that are thrown to us, the world would not be a fun place. I would rather be an active person than wait.

In reaction to: taltamir,


patents are not a wrong concept, they just completely went out of control and are now applied wrongly and stifle innovation instead of encouraging it (their original goal)

I agree, that is what I wrote, but in different words. :thumbsup:


In reaction to: CTho9305,

I agree that some points in the text are a bit exaggerated.

On other points I will take your word for it. :laugh:

I have knowledge of a CPU's inner workings, but not of every detail.

Nice to learn something. :thumbsup:

I read something in the past about those bottlenecks: that the functional units inside the core can never be kept busy all the time (even in ideal circumstances, with branch prediction being right all the time).

But doesn't that have to do with the limits of the x86 architecture?

The guy doesn't seem to know as much as he thinks he knows. He talks about complexities in x86 instruction decoding and how decode limitations cost a lot of performance (e.g. on Intel chips, instructions need to occur in certain patterns for maximum decode throughput) but misses the fact that there are other bottlenecks that limit performance more. Even if decoder limitations never caused idle cycles, performance wouldn't go up drastically. For what it's worth, he completely ignores the fact that the AMD CPUs can decode multiple complex instructions in a single cycle and just focuses on Intel's decoder's limitations.

What are the limitations of the AMD architecture, then, that cause this problem?
Or of the Intel CPUs, for that matter?
I am really interested!

I always like to read the articles on these sites:

http://www.realworldtech.com/

and here

http://arstechnica.com/articles/paedia/cpu.ars

and this

http://www.lostcircuits.com/


I find it very interesting.



 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: William Gaatjes
In reaction to: CTho9305,

{snip}

I read something in the past about those bottlenecks: that the functional units inside the core can never be kept busy all the time (even in ideal circumstances, with branch prediction being right all the time).

But doesn't that have to do with the limits of the x86 architecture?

Correct, even non-x86 chips (e.g. PowerPC, MIPS, etc) experience the same limitations. One of the main limitations (probably the biggest) is memory accesses.

The guy doesn't seem to know as much as he thinks he knows. He talks about complexities in x86 instruction decoding and how decode limitations cost a lot of performance (e.g. on Intel chips, instructions need to occur in certain patterns for maximum decode throughput) but misses the fact that there are other bottlenecks that limit performance more. Even if decoder limitations never caused idle cycles, performance wouldn't go up drastically. For what it's worth, he completely ignores the fact that the AMD CPUs can decode multiple complex instructions in a single cycle and just focuses on Intel's decoder's limitations.

What are the limitations of the AMD architecture, then, that cause this problem?
Or of the Intel CPUs, for that matter?
I am really interested!

My educated guess: memory accesses, mispredicted branches, long-latency instructions with dependent operations.

Memory access is really nasty. If you hit in the L1, there's still a 3 cycle latency for most modern processors (4 cycles for some) during which you have to find other work to do. If you miss in the L1, it usually goes up to at least 10 cycles. If you miss in all your caches and go to DRAM, it's a loooong wait.
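Those latencies can be dropped into a toy average-memory-access-time (AMAT) model to see how much the rare slow accesses cost on average. The cycle counts below are illustrative placeholders, not any specific chip's numbers:

```python
# Toy AMAT (average memory access time) model.
# Latencies in cycles are illustrative, not from any particular CPU.
L1_HIT = 3      # L1 hit latency
L2_HIT = 12     # latency when the L1 misses but the L2 hits
DRAM = 200      # latency when all caches miss and we go to DRAM

def amat(l1_hit_rate, l2_hit_rate):
    """Expected cycles per memory access for the given hit rates."""
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return (l1_hit_rate * L1_HIT
            + l1_miss * (l2_hit_rate * L2_HIT + l2_miss * DRAM))

# Even with 95% L1 and 90% L2 hit rates, the occasional DRAM trip
# adds roughly a full cycle per access on average:
print(round(amat(0.95, 0.90), 2))  # 4.39
```

The point of the model: the DRAM term is tiny in probability but huge in latency, so it never stops mattering.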

On top of that, if you have an instruction that writes memory that hasn't figured out its address yet, a subsequent instruction can't read memory safely because its value might change:
mul ebx            ; slow instruction: writes eax (low half of eax*ebx)
mov dword [eax], 5 ; write 5 to the memory location now held in eax
mov ecx, [12]      ; read memory location 12 into register ecx

Until the multiply finishes, the CPU doesn't know whether the second instruction is going to write to address 12 or a different address. If it doesn't write to 12, the third instruction can execute at any time. If it does write to 12, the third instruction has to wait. There are ways to make this go faster, but until the most recent couple generations from Intel and AMD memory accesses were pretty much always serialized. That puts a big limitation on the available pool of instructions to run when you're waiting for something.
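The two scheduling policies described above can be sketched as a toy check, one function per policy (a big simplification; real hardware tracks many in-flight stores at once):

```python
# Toy model of how a load behind an unresolved store may be handled.

def conservative_load_may_issue(older_store_addrs_resolved):
    # Serialized memory accesses: the load may only issue once every
    # older store has resolved its address.
    return all(older_store_addrs_resolved)

def speculative_load_must_replay(load_addr, resolved_store_addr):
    # Memory disambiguation: the load issues early; if an older store
    # later resolves to the same address, the load must be replayed.
    return resolved_store_addr == load_addr

# In the asm example, the store's address (eax) is unknown until the
# multiply finishes, while the load reads address 12.
print(conservative_load_may_issue([False]))   # load stalls
print(speculative_load_must_replay(12, 40))   # guess was safe
print(speculative_load_must_replay(12, 12))   # aliased: replay needed
```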
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: brett1
This one guy recently told me he read about some upcoming Intel technology that would trick older programs that only support one core into supporting all cores of the processor. I asked him to give me a link to what he read or an Intel codename or some such but conveniently he could not find it.

Does anyone know if this is true? Is such a technology coming out anytime soon? I play several old games that are very CPU dependent and do not take advantage of multicore processors.

thanks

Going back to the OP's question, Mitosis will not affect anything there unless you re-compile those "older programs" with a Mitosis compiler to induce aggressive speculative threading (and that's assuming they can overcome the hardware challenges as well).
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: Viditor
Originally posted by: brett1
This one guy recently told me he read about some upcoming Intel technology that would trick older programs that only support one core into supporting all cores of the processor. I asked him to give me a link to what he read or an Intel codename or some such but conveniently he could not find it.

Does anyone know if this is true? Is such a technology coming out anytime soon? I play several old games that are very CPU dependent and do not take advantage of multicore processors.

thanks

Going back to the OP's question, Mitosis will not affect anything there unless you re-compile those "older programs" with a Mitosis compiler to induce aggressive speculative threading (and that's assuming they can overcome the hardware challenges as well).

Viditor and CTho9305, do either of you guys see any reason why Mitosis would or could be an Intel-only feature? (Outside of the compiler being engineered to intentionally generate code that exclusively runs on Intel CPUs.)

Seems to me that the concept itself is actually generic enough that it should function on any x86-compatible hardware. (Hardware-wise you just need the capability to spawn >1 thread in a system; you aren't even required to have >1 thread of processing capability, although obviously performance would degrade if you didn't.)
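The spawn-and-validate idea is generic enough to sketch in plain software. The following is a hypothetical simplification of Mitosis-style speculative threading (the real proposal has the compiler pick spawn points and the hardware validate or squash speculative state); `expensive` and the predicted live-in value are made up for illustration:

```python
# Toy sketch of speculative threading: run ahead on a *predicted*
# input, then commit the result only if the prediction held.
from concurrent.futures import ThreadPoolExecutor

def expensive(x):
    # Stand-in for a chunk of work with a single live-in value x.
    return sum(i * x for i in range(100_000))

def run_with_speculation(predicted_live_in, compute_live_in):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Spawn the speculative thread with a guessed input value.
        future = pool.submit(expensive, predicted_live_in)
        # Meanwhile the main thread produces the real live-in value.
        actual = compute_live_in()
        if actual == predicted_live_in:
            return future.result()   # prediction held: commit
        future.result()              # mispredicted: discard the work...
        return expensive(actual)     # ...and redo it with the real value

result = run_with_speculation(3, lambda: 3)
```

When the prediction holds, the expensive work overlapped with computing the live-in; when it misses, you pay for the work twice, which is why the hardware cost of cheap squashing matters.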

It would also seem like Sun would be even more interested in getting such a compiler into the hands of software developers, given their 64-thread Niagara2 processors.
 

The-Noid

Diamond Member
Nov 16, 2005
3,117
4
76
By the time this comes out, coding will have already solved the problem. People creating software are getting better at multi-core and threaded environments. If anything this was a play by Intel to buy some time until the majority of software is already multi-threaded.

"Oh wait, we have this great product, however it isn't needed anymore."

Intel also doesn't want to compete with their software side competition. IBM/SUN/(Companies with large contracts) create their own software and would be thoroughly disappointed should INTC create a program that multi-threads all software, thus pricing them out of the market.
 
May 11, 2008
22,557
1,471
126
To CTho9305:

Until the multiply finishes, the CPU doesn't know whether the second instruction is going to write to address 12 or a different address. If it doesn't write to 12, the third instruction can execute at any time. If it does write to 12, the third instruction has to wait. There are ways to make this go faster, but until the most recent couple generations from Intel and AMD memory accesses were pretty much always serialized. That puts a big limitation on the available pool of instructions to run when you're waiting for something.

Indeed.

I know what you mean, the loads/stores. Intel calls it memory disambiguation.

For anyone interested: here is a good graphical explanation.

http://arstechnica.com/articles/paedia/cpu/core.ars/8


The AMD K10 has out-of-order loads and stores too, but in a more conservative manner.

As I understand it, it all comes back to memory still being slow. That's why I came up with the posts above about having the software manage the on-CPU RAM as a local storage/buffer and as a scratchpad, to hide the memory latencies as much as possible.
We can spend enough transistors now, and having too many cores will put too much burden on the memory bus.
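A software-managed scratchpad hides latency mainly through double buffering: the core works on one buffer while the next block of data is being fetched into the other. A rough sketch, with a plain list copy standing in for the asynchronous DMA/prefetch that real hardware would do:

```python
# Double-buffering sketch: "prefetch" chunk N+1 into one half of a
# two-buffer scratchpad while working on chunk N in the other half.
def process_with_scratchpad(data, chunk):
    buffers = [None, None]
    total = 0
    n_chunks = (len(data) + chunk - 1) // chunk
    buffers[0] = data[0:chunk]               # prime the first buffer
    for i in range(n_chunks):
        nxt = (i + 1) * chunk
        if nxt < len(data):                  # prefetch the next chunk...
            buffers[(i + 1) % 2] = data[nxt:nxt + chunk]
        total += sum(buffers[i % 2])         # ...while working on this one
    return total

print(process_with_scratchpad(list(range(10)), 4))  # 45
```

In software the "prefetch" and the work run back to back; the point of the hardware version is that they overlap, so the memory latency disappears behind the compute.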

Nehalem will have 3 DDR3 channels with a theoretical peak bandwidth of more than 30 GB/s, but the cache will still be necessary to hide the latencies.

I am curious what the future will bring, because sooner or later (A) the sockets, (B) the package, and (C) the die itself and the available pads will reach the physical limits of practical use. Maybe by then optical connections will be a reality...