Heard a rumor, want to know if it's true


Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: William Gaatjes
One possible way to solve it would be:

I would think that, since all Intel chips overclock so well, the processor could shut down its unused cores completely and automatically clock one core higher in order to stay within the thermal envelope. All the shared resources between the cores would also be dedicated to that single core. But if you have a modern OS that already takes advantage of multiple cores, this would not work, unless there can be some cooperation between the OS and the processor through a driver to let the processor know how the program is threaded. The OS would have easy knowledge of this and could force the processor into the "single core" state for as long as the thread is running.

Would that be viable?

Viable? Yes. Will it be done? Unlikely.

Microsoft has zero motivation to implement features that increase the value of the underlying hardware for the sake of simply increasing the value of the underlying hardware.

Thread migration destroys the prospects of these techniques: saving power and "TDP budget overclocking" a single core while shutting down the unused cores.

Check out the latest AnandTech review of the Phenom. IMO their speculation on the funky performance/power results is spot-on: thread migration is "freaking out" the Phenom's power-saving logic.
 

PeteRoy

Senior member
Jun 28, 2004
958
2
91
www.youtube.com
It is funny how people who have little to no knowledge of software architecture or CPU engineering go and say "I don't see how it is possible".

Humanity wouldn't go very far if people said "I don't see how it is possible" all the time.

Here are a few things people thought weren't possible less than 200 years ago:
"I don't see how it is possible for a man to fly"
"I don't see how it is possible for people to see and talk to each other live using a display"
"I don't see how it is possible for a cart to move without horses"
"I don't see how it is possible to lower the temperature of a home in the summer"
"I don't see how it is possible to listen to recorded music"
 
May 11, 2008
22,669
1,482
126
I have read that and found it interesting. I am waiting on the follow-up and the explanation for it.

To talk about something else:
I personally agree that the CPU is getting too smart in the wrong way. But I am afraid it will not change soon because of backwards compatibility.

I have read some interesting articles about taking all the thread logic out of the CPU, and even turning the cache into nothing more than a dual-ported scratchpad RAM: local RAM with some flags to make certain parts read-only, or to let the MMU know that a part of the RAM can be filled with data from external DRAM or that a part of the RAM has to be written back to DRAM. Let the software control the thread switches and the memory throughput.

This would free die space for more cores and more local RAM. But unfortunately it would demand a lot of skill from the software engineer, because he/she would have to learn how the hardware works, just like 25 years ago. For example, the memory has to be managed so that the MMU is already loading data into the local on-CPU RAM before it is needed, and parts of the local RAM have to be written back to external DRAM and released before the local on-CPU RAM is all used up.
The software writer really has to look ahead at what he wants to accomplish. However, I would think that having 8 MB to run your program from and to do all your calculations in would be a good start. :) I have seen figures of 32 MB of cache. If that number could be even higher without all the cache logic, boy, that is a lot of local on-CPU RAM to work with; enough to buffer anything with a smart program.
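To make the idea concrete, here is a rough sketch of the programming style this implies: the program streams tiles of data through a software-managed local RAM, double-buffered so the next tile is being fetched while the current one is processed. The DMA helpers are stand-ins I made up (on real hardware they would be asynchronous copy intrinsics, Cell-SPE style); here they are plain synchronous memcpy calls so the sketch at least compiles.

Code:
/* Double-buffered processing out of a software-managed local RAM.
   The "DMA" helpers below are synchronous stand-ins (plain memcpy) so the
   sketch compiles; on real hardware they would be asynchronous copies that
   overlap with the compute loop. */
#include <stddef.h>
#include <string.h>

#define TILE 4096                      /* elements per local-RAM tile */

static float local_ram[2][TILE];       /* stands in for the on-chip scratchpad */

static void dma_start_read(float *dst, const float *src_dram, size_t n)  { memcpy(dst, src_dram, n * sizeof(float)); }
static void dma_start_write(float *dst_dram, const float *src, size_t n) { memcpy(dst_dram, src, n * sizeof(float)); }
static void dma_wait(void) { /* real hardware: block until outstanding copies finish */ }

static void compute(float *tile, size_t n) {
    for (size_t i = 0; i < n; i++)
        tile[i] = tile[i] * 2.0f + 1.0f;           /* whatever the kernel does */
}

/* dram must hold n_tiles * TILE floats. */
void process(float *dram, size_t n_tiles) {
    int cur = 0;
    dma_start_read(local_ram[cur], dram, TILE);    /* prefetch tile 0 */

    for (size_t t = 0; t < n_tiles; t++) {
        dma_wait();                                /* tile t is now in local RAM */
        int next = cur ^ 1;
        if (t + 1 < n_tiles)                       /* fetch tile t+1 while we work on t */
            dma_start_read(local_ram[next], dram + (t + 1) * TILE, TILE);

        compute(local_ram[cur], TILE);
        dma_start_write(dram + t * TILE, local_ram[cur], TILE);   /* drain tile t */
        cur = next;
    }
    dma_wait();                                    /* final write-back */
}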

But since it seems that even the Cell processor is too much for most software writers, I doubt this will ever happen.


 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: PeteRoy
It is funny how people who have little to no knowledge of software architecture or CPU engineering go and say "I don't see how it is possible".

Someone has to be the ignorant consumer. A lot of someones.

Originally posted by: PeteRoy
Humanity wouldn't go very far if people said "I don't see how it is possible" all the time.

Humanity wouldn't go very far if it spent resources entertaining every far-fetched idea and concept either.

Somewhere in the sequence of imagination to implementation there has to be a downselection process where only the most obviously viable paths forward are funded for implementation.

Humanity is not endowed with infinite resources to proof-out every idea that comes across someone's mind.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Idontcare
Originally posted by: William Gaatjes
One possible way to solve it would be:

I would think that, since all Intel chips overclock so well, the processor could shut down its unused cores completely and automatically clock one core higher in order to stay within the thermal envelope. All the shared resources between the cores would also be dedicated to that single core. But if you have a modern OS that already takes advantage of multiple cores, this would not work, unless there can be some cooperation between the OS and the processor through a driver to let the processor know how the program is threaded. The OS would have easy knowledge of this and could force the processor into the "single core" state for as long as the thread is running.

Would that be viable?

Viable? Yes. Will it be done? Unlikely.

Microsoft has zero motivation to implement features that increase the value of the underlying hardware for the sake of simply increasing the value of the underlying hardware.

I think MS would take advantage of it if it were offered. FWIW, I also think it doesn't really matter if Microsoft does it so long as Linux-in-the-datacenter and Linux-on-the-UMPC do it.

Thread migration destroys the prospects of these techniques: saving power and "TDP budget overclocking" a single core while shutting down the unused cores.

Check out the latest AnandTech review of the Phenom. IMO their speculation on the funky performance/power results is spot-on: thread migration is "freaking out" the Phenom's power-saving logic.

I missed that article when it came out. That was interesting.
 
May 11, 2008
22,669
1,482
126
I don't know that Microsoft would have no motivation.

Back in the days when games on Windows were starting to emerge, Microsoft needed the help of Nvidia, ATI, Voodoo, Matrox and S3 to make it happen. Microsoft and the graphics manufacturers worked together on a common games API we now all know as DirectX.

A performance / power management / single-thread-versus-multi-thread API would be needed now. I know the compiler takes most of the stress away, but the compiler can only do so much.

To come back to my post about letting the OS have full control of the processor's thread scheduling and its MMU, I want to go one step further.

I was also thinking that maybe it is better to let the OS take care of power management in software. The OS knows best when there is nothing to do, and all that logic could then be left out of the CPU, freeing up space for more cores, wider execution units, and more local on-CPU RAM.

I think we are coming to a point where software and hardware have to be united more. The OS is the intermediary between the software and the hardware; it knows best how the software behaves and how the hardware behaves. The OS is the perfect candidate to track the software running on it and adjust the hardware for maximum performance per watt.

I may be wrong, but I think that, for example, the Nvidia and ATI graphics drivers already do this to some extent.
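For what it's worth, a crude version of this already exists in userspace today: a frequency governor that watches utilization and picks a clock. A minimal sketch is below, assuming Linux with the legacy "userspace" cpufreq governor enabled; the sysfs path and the two frequencies are illustrative (real values come from scaling_available_frequencies), and this only shows the "OS adjusts the hardware for performance per watt" idea, not anything Nvidia's or ATI's drivers actually do.

Code:
/* Toy userspace "governor": sample utilization from /proc/stat once per
   second and pick a clock, roughly the policy the post imagines the kernel
   applying with full knowledge of the workload. */
#include <stdio.h>
#include <unistd.h>

static int sample(unsigned long long *busy, unsigned long long *total) {
    unsigned long long user, nicev, sys, idle;
    FILE *f = fopen("/proc/stat", "r");
    if (!f) return -1;
    int n = fscanf(f, "cpu %llu %llu %llu %llu", &user, &nicev, &sys, &idle);
    fclose(f);
    if (n != 4) return -1;
    *busy = user + nicev + sys;
    *total = *busy + idle;
    return 0;
}

int main(void) {
    /* Requires the legacy "userspace" cpufreq governor; the path and the
       two frequencies (in kHz) are illustrative, not universal. */
    const char *knob = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed";
    unsigned long long b0, t0, b1, t1;

    while (sample(&b0, &t0) == 0) {
        sleep(1);
        if (sample(&b1, &t1) != 0) break;
        double load = (double)(b1 - b0) / (double)((t1 - t0) ? (t1 - t0) : 1);

        unsigned long khz = (load > 0.5) ? 2400000UL : 800000UL;  /* crude policy */
        FILE *f = fopen(knob, "w");
        if (f) { fprintf(f, "%lu\n", khz); fclose(f); }
        fprintf(stderr, "load %3.0f%% -> %lu kHz\n", load * 100.0, khz);
    }
    return 0;
}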
 

Lorne

Senior member
Feb 5, 2001
873
1
76
Actually, it's easier for the programmers to go back and re-optimize the software for multithreading, but it costs money, and no software producer is willing to take the chance of spending a few bucks on something that won't resell.

Idea: what if a programmer came out with a source-code rejuvenator? It could run through an installed folder, examine all the files, alter the code and re-save it (I'm sure AV programs would love this). The same idea could be applied to CDs, saving a new image ready to be burned.
Yes, this can be done. A friend and I did this back in the 6502 days, porting games from the C64 and Apple for online BBS games. Infringement would be the only limit (like Ahead hasn't caught crap for their products).

This might even work well enough that you could bring games from the Win95/98 era up to date for XP/Vista use.
 

Tencntraze

Senior member
Aug 7, 2006
570
0
0
Originally posted by: PeteRoy
It is funny how people who have little to no knowledge of software architecture or CPU engineering go and say "I don't see how it is possible".

Well, this isn't really as silly as you make it out to be. I never said it was impossible or that no one would be able to do it. "I don't see how it is possible" means "I don't understand how this would work" or "given my current knowledge of what I've worked with, this seems way out there." I haven't had the time to read up on this anyway. I certainly don't know about CPU engineering, though I do know about software engineering (but not game development), so my thoughts were based on those assumptions. That is a far cry from "it's impossible," and if I read up on this more (or read more posts in this thread, but I'm insanely tired right now), perhaps I will begin to see how it is possible.

 
May 11, 2008
22,669
1,482
126
I had another idea. Since we are getting into the multi-core era, how about optimizing some cores for the OS alone? The OS would have its own dedicated core(s) to do all the housekeeping of switching threads on the other cores. Less data would have to be moved back and forth or backed up. This would give a speed improvement too.

You could even optimize these "OS" cores for OS functions: leave out, for example, SSE and other instructions that would never be used, and instead add instructions that would greatly improve the control the OS has over the other cores and the threads those cores are running.
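There is no dedicated "OS core" in today's x86 parts, but you can approximate the separation in software by reserving a core for kernel/housekeeping work and pinning your compute threads elsewhere. A minimal sketch, assuming Linux and glibc's sched_setaffinity; the core numbers are just illustrative.

Code:
/* Reserve core 0 for kernel/housekeeping work by confining this process's
   compute work to core 1. Linux + glibc; core numbers are illustrative. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);                              /* allow only core 1 */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("compute work confined to core 1; core 0 left for the kernel\n");

    /* ... heavy computation would run here ... */
    return 0;
}

Linux also offers the isolcpus= boot parameter and per-IRQ affinity masks for steering scheduler and interrupt work away from chosen cores.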

Combine that with the other ideas I wrote down and I would think we would have an enormous speed bump.

I truly think there will come a point where a certain number of cores will create more overhead and actually slow the entire system down; the event horizon, so to say :). There will be more software running to control the overhead than there will be actual calculations.

I think specialized cores are truly the future. But I think the OS (or should I say the kernel, because with Windows that is not so obvious) needs to be more integrated with the hardware.



 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: William Gaatjes
I had another idea. Since we are getting into the multi-core era, how about optimizing some cores for the OS alone? The OS would have its own dedicated core(s) to do all the housekeeping of switching threads on the other cores. Less data would have to be moved back and forth or backed up. This would give a speed improvement too.

The thing is, the vast majority of that stuff doesn't really cost that much performance. Even on a single core processor, you can get maybe 99% of the CPU time used for the thread you're interested in; the background tasks just don't need that much computational power and don't cost that much. The whole "download a video while the antivirus runs and you chat on AIM" is BS. The only time you get a really worthwhile win (say, >=20%) is when you have multiple tasks that actually need significant computation. Sure, having context switches interrupting your compute-heavy task does cost some performance, but not enough to warrant another core.

You could even optimize these "OS" cores for OS functions: leave out, for example, SSE and other instructions that would never be used, and instead add instructions that would greatly improve the control the OS has over the other cores and the threads those cores are running.

Most features like SSE just don't take that much die area. Removing them doesn't save much, but costs you a LOT of performance in the cases where you could have taken advantage of them. If you look at a "small" x86 processor like the Via C7 (Esther), it's 30mm^2 in 90nm technology with 256KB total cache. An Athlon 64 with 256KB total cache in 90nm would be in the 80mm^2 range (this is a guess based on sandpile.org's numbers and pictures)... and it'd probably be twice as fast. Given the choice between 2.5 slow cores and 1 fast core, the fast core is a much better choice since it's faster on everything, while the slow cores can only win if they're not too slow AND you have a very well-threaded workload.

I think the specialized cores are truly the future. But i think the OS (or should i say KERNEL because with windows that is not so obvious) needs to be more integrated with the hardware.

The kernel really doesn't use much CPU time unless you're asking it to do work. If you open task manager, there's a check box that lets you show user vs. system time. Run Prime95 and see where the CPU time goes... it's probably 99-100% Prime95, and 0-1% kernel/background tasks.
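You can see the same split programmatically. A minimal sketch, assuming a POSIX system with getrusage; the busy loop stands in for Prime95, and the system time it reports should come out near zero.

Code:
/* Show where the CPU time of a compute-heavy task actually goes. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    volatile double x = 0.0;
    for (long i = 0; i < 200000000L; i++)   /* pure user-mode work */
        x += i * 0.5;

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);            /* user vs. kernel time for this process */
    printf("user   %ld.%06ld s\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    printf("system %ld.%06ld s\n", (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    return 0;
}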

I'm not saying fixed-function processors are bad (compare mobile battery life with and without hardware assistance for watching an H.264 video or HD-DVD/BluRay) but the kind of things you're thinking about just aren't the real problem.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: soccerballtux
I heard something about this.
Would definitely be the holy grail of multi-core processing.
But that's because it's basically impossible.

No, it wouldn't.

It takes one thread and then runs clones of it, each with a guessed result, on other cores...

Example:

A single thread is calculating a function on core 1; it should result in either a true or a false answer.
Core 2 starts running a clone of this thread assuming the answer is true.
Core 3 starts running a clone of this thread assuming the answer is false.
Core 1 finishes the calculation and the result is true, so the threads on core 1 and core 3 are terminated, and core 2 is elevated from "theoretical clone" to "active thread" status.

This only works for calculations with a VERY small list of possible results, and it absolutely annihilates performance per watt.
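A toy illustration of that scheme (not Intel's, or anyone's, actual implementation): the main thread computes the expensive predicate while two clones each run ahead under one assumed outcome, and only the matching clone's result is kept. Written with POSIX threads; slow_predicate and continue_with are made-up stand-ins for the real work.

Code:
/* Toy "run both outcomes speculatively" scheme with POSIX threads. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static bool slow_predicate(long n) {          /* the expensive true/false calculation */
    long acc = 0;
    for (long i = 0; i < 100000000L; i++)
        acc += i % 7;
    return (acc + n) % 2 == 0;
}

static long continue_with(bool assumed) {     /* work that depends on the outcome */
    return assumed ? 42 : -42;
}

struct clone_arg { bool assumption; long result; };

static void *speculative_clone(void *p) {     /* "core 2" / "core 3" */
    struct clone_arg *a = p;
    a->result = continue_with(a->assumption); /* run ahead under the guessed answer */
    return NULL;
}

int main(void) {
    struct clone_arg guess_true = { true, 0 }, guess_false = { false, 0 };
    pthread_t t_true, t_false;

    pthread_create(&t_true,  NULL, speculative_clone, &guess_true);
    pthread_create(&t_false, NULL, speculative_clone, &guess_false);

    bool outcome = slow_predicate(3);         /* "core 1": the real calculation */

    pthread_join(t_true, NULL);
    pthread_join(t_false, NULL);
    long kept = outcome ? guess_true.result : guess_false.result;  /* discard the loser */

    printf("predicate = %s, kept result = %ld\n", outcome ? "true" : "false", kept);
    return 0;
}

In real speculative multithreading the losing clone would be killed as soon as the outcome is known; here both simply run to completion, which is exactly why the trick burns so much power for so little gain.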

The "hole grail of multi CPU computing" is for programmers to start programming everything to begin with to scale to infinite cores.
 
May 11, 2008
22,669
1,482
126
The kernel really doesn't use much CPU time unless you're asking it to do work.


You are hitting it right on the spot. With current kernels you would not notice it. But I wonder whether, with the proposed ideas I wrote down, the kernel would have a lot more to do.

And I was proposing those ideas to get the highest efficiency. After all, I am a tech freak :D


Most features like SSE just don't take that much die area. Removing them doesn't save much, but costs you a LOT of performance in the cases where you could have taken advantage of them. If you look at a "small" x86 processor like the Via C7 (Esther), it's 30mm^2 in 90nm technology with 256KB total cache. An Athlon 64 with 256KB total cache in 90nm would be in the 80mm^2 range (this is a guess based on sandpile.org's numbers and pictures)... and it'd probably be twice as fast. Given the choice between 2.5 slow cores and 1 fast core, the fast core is a much better choice since it's faster on everything, while the slow cores can only win if they're not too slow AND you have a very well-threaded workload.


I did not mean to throw it all away. I meant one core dedicated to the OS tasks and the other cores dedicated to calculations, like, for example, what SSE is used for.

It is what Intel and AMD are proposing; I just thought of going one step further: one core dedicated to and optimized for the kernel tasks.
That core would have no use for FP units or SSE. Instead, give it more resources to keep track of threads in flight.

I really think that a CPU with cores optimized for certain tasks will shine.

This would greatly improve performance and lower power use.

Every CPU manufacturer acknowledges that all the fancy tricks for keeping the cores busy cost more power than the actual ALUs themselves use.

That is why I believe it is time the kernel had more control over the local on-CPU RAM (no longer used as a cache, because the software must take care of that now), the MMU, power management, and how the cores are used.

It's a feeling. I cannot describe it. :sun::thumbsup:




 
May 11, 2008
22,669
1,482
126
Yep, but I pushed it a bit further with what I have read in the past about other CPU architectures, plus a suggestion of my own.

There are some MIPS chips out there where you can reserve part of the cache for exclusive use as a scratchpad, i.e. use it as your general on-CPU RAM. Since the static RAM the cache is built from is so much faster than external memory, it saves time and speeds up performance. I am not sure, but if I remember correctly the CPU in the N64 game console could do these kinds of things, for example. Please feel free to correct me; I am working from dusty memories here.

But I see that the kernel of an OS does thread scheduling while the CPU does it too (Hyper-Threading, for example). So why not let the kernel take care of it all exclusively? Normally a CPU already runs from the cache, but when it has cache misses it still has to load from "slow" external memory. When the software has total control over the cache (using it as ordinary RAM), cache misses are less likely.
Of course the cache has intelligent logic to minimize cache misses, but I am sure that everything in the CPU is a trade-off. You don't need that logic if you let the software take care of it. That would save power, and you could put a little more RAM on the die. The biggest problem nowadays is keeping those incredibly fast cores busy, and the more cores you have, the bigger that problem gets.


 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
Originally posted by: William Gaatjes
I had another idea. Since we are getting into the multi-core era, how about optimizing some cores for the OS alone? The OS would have its own dedicated core(s) to do all the housekeeping of switching threads on the other cores. Less data would have to be moved back and forth or backed up. This would give a speed improvement too.

You could even optimize these "OS" cores for OS functions: leave out, for example, SSE and other instructions that would never be used, and instead add instructions that would greatly improve the control the OS has over the other cores and the threads those cores are running.

Combine that with the other ideas I wrote down and I would think we would have an enormous speed bump.

I truly think there will come a point where a certain number of cores will create more overhead and actually slow the entire system down; the event horizon, so to say :). There will be more software running to control the overhead than there will be actual calculations.

I think specialized cores are truly the future. But I think the OS (or should I say the kernel, because with Windows that is not so obvious) needs to be more integrated with the hardware.

I think what you're really trying to suggest is not a single core for the OS (inefficient, and a step backwards), but rather moving the OS's thread scheduling into the hardware itself. There may be a gain there. The VAX implemented a couple of instructions in hardware that were only found in the scheduler code, so in effect, the VAX had hardware-accelerated thread scheduling.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Originally posted by: Idontcare
Originally posted by: William Gaatjes
One possible way to solve it would be:

I would think that, since all Intel chips overclock so well, the processor could shut down its unused cores completely and automatically clock one core higher in order to stay within the thermal envelope. All the shared resources between the cores would also be dedicated to that single core. But if you have a modern OS that already takes advantage of multiple cores, this would not work, unless there can be some cooperation between the OS and the processor through a driver to let the processor know how the program is threaded. The OS would have easy knowledge of this and could force the processor into the "single core" state for as long as the thread is running.

Would that be viable?

Viable? Yes. Will it be done? Unlikely.

Microsoft has zero motivation to implement features that increase the value of the underlying hardware for the sake of simply increasing the value of the underlying hardware.

Thread migration destroys the prospects of these techniques: saving power and "TDP budget overclocking" a single core while shutting down the unused cores.

Check out the latest AnandTech review of the Phenom. IMO their speculation on the funky performance/power results is spot-on: thread migration is "freaking out" the Phenom's power-saving logic.

Remember when I tried to use the automatic overclocking feature of my P965 laptop? It went from 10x to 11x for a second one time and I could never repeat it.
 
May 11, 2008
22,669
1,482
126
I think what you're really trying to suggest is not a single core for the OS (inefficient, and a step backwards), but rather moving the OS's thread scheduling into the hardware itself. There may be a gain there. The VAX implemented a couple of instructions in hardware that were only found in the scheduler code, so in effect, the VAX had hardware-accelerated thread scheduling.

Indeed. The hardware is just plain faster at taking care of tasks, but the hardware can only do so much; it does not have future knowledge. Of course they build in all these functions to track instructions, but that is what I meant to say: so much logic for keeping track of instructions and deciding what to do with them makes the CPU burn through the watts.

The kernel of an OS (I am being more specific now) knows what's going on in detail and can adjust the hardware for maximum performance. Combine that with huge on-chip RAM and power management control, and we would see faster performance while saving more power. Add the intelligent compiler too, and the future software experience would jump forward.


If I think, for example, about Larrabee and Atom from Intel, or look at the recent POWER cores from IBM, I may see a pattern there...

I am going to read about that VAX hardware thread scheduling. That sounds interesting.

Thank you. :)



I found this article to be very interesting:


http://www.emulators.com/docs/nx05_vx64.htm

According to the writer, a lot of improvements can still be made.
Since I am daydreaming anyway, we can add this too for increased performance and lower power use :D



 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: William Gaatjes
The kernel of an OS (I am being more specific now) knows what's going on in detail and can adjust the hardware for maximum performance. Combine that with huge on-chip RAM and power management control, and we would see faster performance while saving more power. Add the intelligent compiler too, and the future software experience would jump forward.

The fact that you don't see such things happening at the top (Intel/Microsoft) might be taken as proof that the problem is not as much of a problem as you perceive it to be.

Other than your handful of video-conversion programs, there isn't much out there that taxes a modern quad-core system. For the vast majority of consumers there are plenty of spare CPU cycles to go around.
 
May 11, 2008
22,669
1,482
126
I believe it has more to do with money than with the problems not existing.

For this to work, everything would have to be built from the ground up. After that, emulation can take care of running old software at at least the same speed.

We all know what happened to Microsoft's last attempt, Longhorn, for example.

There are many variables that we must take into account, and backwards compatibility is the biggest.

To do something like I wrote in the posts above would mean a lot of effort, and unless a competitor comes along it is not going to happen soon. Make as much money with as little effort as possible to maximize your profit.

Now, if a customer came along with a huge sum of money, they would be willing to try.

And don't forget that there are patents too. Justified or not, patents are a reality, and patents also stand in the way of innovation. I am not saying patents are a wrong concept; I just feel the way patents are abused now is slowing progress down.

It all has to do with economics.

Example:

The environment and being green were never a hot topic. Now that big sums of money can be made off the environment, every company in every sector is jumping on the bandwagon.




 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: William Gaatjes
I believe it has more to do with money than with the problems not existing.

For this to work, everything would have to be built from the ground up. After that, emulation can take care of running old software at at least the same speed.

We all know what happened to Microsoft's last attempt, Longhorn, for example.

There are many variables that we must take into account, and backwards compatibility is the biggest.

To do something like I wrote in the posts above would mean a lot of effort, and unless a competitor comes along it is not going to happen soon. Make as much money with as little effort as possible to maximize your profit.

Your posts are now coming full circle and sounding a lot like my original posts on your speculation near the top of the thread:

Originally posted by: Idontcare
Originally posted by: William Gaatjes
Would that be viable?
Viable? Yes. Will it be done? Unlikely.

Microsoft has zero motivation to implement features that increase the value of the underlying hardware for the sake of simply increasing the value of the underlying hardware.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: William Gaatjes
I believe it has more to do with money than with the problems not existing.

For this to work, everything would have to be built from the ground up. After that, emulation can take care of running old software at at least the same speed.

We all know what happened to Microsoft's last attempt, Longhorn, for example.

There are many variables that we must take into account, and backwards compatibility is the biggest.

To do something like I wrote in the posts above would mean a lot of effort, and unless a competitor comes along it is not going to happen soon. Make as much money with as little effort as possible to maximize your profit.

Now, if a customer came along with a huge sum of money, they would be willing to try.

And don't forget that there are patents too. Justified or not, patents are a reality, and patents also stand in the way of innovation. I am not saying patents are a wrong concept; I just feel the way patents are abused now is slowing progress down.

It all has to do with economics.

Example:

The environment and being green were never a hot topic. Now that big sums of money can be made off the environment, every company in every sector is jumping on the bandwagon.

Patents are not a wrong concept; they just went completely out of control and are now applied wrongly, stifling innovation instead of encouraging it (their original goal).
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: William Gaatjes
I found this article to be very interesting:


http://www.emulators.com/docs/nx05_vx64.htm

According to the writer, a lot of improvements can still be made.
Since I am daydreaming anyway, we can add this too for increased performance and lower power use :D

The guy doesn't seem to know as much as he thinks he knows. He talks about complexities in x86 instruction decoding and how decode limitations cost a lot of performance (e.g. on Intel chips, instructions need to occur in certain patterns for maximum decode throughput) but misses the fact that there are other bottlenecks that limit performance more. Even if decoder limitations never caused idle cycles, performance wouldn't go up drastically. For what it's worth, he completely ignores the fact that the AMD CPUs can decode multiple complex instructions in a single cycle and just focuses on Intel's decoder's limitations.

He's solving a problem that just isn't that big of a problem. Is x86 obnoxious to decode? Yes. Does it cost much? Not really. Some die area, some power, a few more engineers when designing a processor, but there are ways to make it go fast and not waste too much power, so x86 processors remain competitive.

His thoughts on ModRM/SIB encoding are interesting, but I don't know enough about real 64-bit binaries to comment intelligently.

edit: The guy's other writeups also demonstrate that he knows just enough to be dangerous. He believes large pages (4MB) are used to reduce the overhead of the page table size. That's not true. Even with 4KB pages, the page table for a 32-bit address space is something like 4MB... negligible nowadays, and that's only if a process actually uses most of its 4GB address space. The real issue is the TLB. If you put your data in a few big pages instead of many small pages, you'll hit in the TLB more often and performance goes up. He complains about the ridiculous case (64-bit mode) of a 1 terabyte allocation being slow because it takes 2GB of page table space, but what kind of idiot allocates 1TB of memory while having less than 4GB of physical memory? 2GB is 0.2% of 1TB. That's not a bad overhead at all!
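For reference, the arithmetic behind those numbers, assuming 4 KB pages with 4-byte entries for the 32-bit case and 8-byte entries in 64-bit mode (counting only the last-level tables):

Code:
#include <stdio.h>

int main(void) {
    const unsigned long long KB = 1024ULL, MB = KB * KB, GB = MB * KB, TB = GB * KB;
    const unsigned long long page = 4 * KB;

    /* 32-bit address space: 4 GB of 4 KB pages, 4-byte entries. */
    unsigned long long entries32 = (4 * GB) / page;
    printf("32-bit page table: %llu MB\n", entries32 * 4 / MB);            /* 4 MB */

    /* 64-bit mode, 1 TB mapped: 4 KB pages, 8-byte entries. */
    unsigned long long entries64 = TB / page;
    unsigned long long bytes64 = entries64 * 8;
    printf("1 TB mapping: %llu GB of page tables (%.1f%% overhead)\n",
           bytes64 / GB, 100.0 * bytes64 / TB);                            /* 2 GB, 0.2% */
    return 0;
}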

Here's another good one: "16 thousand nanoseconds may not sound like much, but that's over 40,000 clock cycles per memory allocation in the worst case. Imagine the extra slowdown if your computer is low on physical memory and swapping a lot..." If you're swapping, 16 microseconds of CPU time is the least of your worries - you're being whacked repeatedly with milliseconds of latency from the actual disk drive.
 

pmv

Lifer
May 30, 2008
15,142
10,040
136
I seem to remember 'mitosis' is something to do with cell division in biology.

(Of course, relying on hazy memories from school is utterly pointless when I've got the internet in front of me. Makes me wonder why I bothered paying attention in lessons at all.)
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: pmv
I seem to remember 'mitosis' is something to do with cell division in biology.

(Of course, relying on hazy memories from school is utterly pointless when I've got the internet in front of me. Makes me wonder why I bothered paying attention in lessons at all.)

Pretty good for a hazy memory: mitosis