How much electricity would be saved worldwide if Windows was written in Assembly?

Status
Not open for further replies.

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
Use QueryPerformanceCounter (Windows) if you are near the resolution of your timer.

GetTickCount has a resolution of ~15ms. QueryPerformanceCounter often uses the CPU frequency for its resolution. Of course, since we are working with assembly, you could use RDTSC as well.

Yeah well this is just a simple console app. Then I'd have to query the frequency and deal with the high and low values. I like my x10 solution.
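For anyone wondering what "query the frequency and deal with the high and low values" would involve, here is a minimal sketch in portable C. On Windows the tick counts and frequency would come from QueryPerformanceCounter and QueryPerformanceFrequency, which fill a LARGE_INTEGER (its QuadPart member already exposes the full 64-bit value). The function names `combine_ticks` and `ticks_to_seconds` are made up for illustration and are not from the posted source.

```c
#include <stdint.h>

/* Join the two 32-bit halves of a counter reading into one 64-bit tick count.
   (Hypothetical helper; with LARGE_INTEGER you could read QuadPart instead.) */
uint64_t combine_ticks(uint32_t high, uint32_t low) {
    return ((uint64_t)high << 32) | low;
}

/* Convert a tick interval to seconds, given the counter frequency
   in ticks per second. */
double ticks_to_seconds(uint64_t start, uint64_t end, uint64_t frequency) {
    return (double)(end - start) / (double)frequency;
}
```

So the "hard part" is one shift, one OR and one division; the rest is just calling the two API functions before and after the routine being timed.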
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
Ok I ran it on my Q9550 and although the scores do fluctuate a bit, it was quite a bit faster, especially with the standard code. I think I've reached the limits of the timer, so I made each routine run 10 times to get past the timer resolution. All times are 10x, i.e. just a shift of the decimal point. Results:

Q9550 2.83GHz

Code:
SchmideSSE 1.328 seconds
simplesol 1.64 seconds
simplesolASM 1.344 seconds
SchmideSSEASM 0.453 seconds
Hit any key to continue (Where's the ANY key?)
Athlon II 620 2.6GHz

Code:
SchmideSSE 1.685 seconds
simplesol 2.87 seconds
simplesolASM 1.342 seconds
SchmideSSEASM 0.827 seconds
Hit any key to continue (Where's the ANY key?)
WTH is up with the Athlon II and the straight up x87 code?

If someone with an I7 would run this, I'd like to see the results.

EXE pythsse2.zip

SRC and VCproj pythsse2src.zip

Here ya go (i7 860 @ 2.8 (stock) with turbo):
Code:
SchmideSSE 1.498 seconds
simplesol 1.482 seconds
simplesolASM 1.31 seconds
SchmideSSEASM 0.608 seconds
Hit any key to continue (Where's the ANY key?)
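The "run each routine 10 times" trick from the quoted post can be sketched roughly like this. It is only an illustration under assumptions: it uses the standard C `clock()` rather than whatever timer the posted exe uses, and `time_repeated`, `sample_routine` and `call_count` are hypothetical names standing in for the real benchmark routines.

```c
#include <time.h>

int call_count = 0;                 /* how many times the routine has run */
static volatile unsigned long sink; /* keeps the busy loop from being optimized away */

/* Stand-in for one of the benchmark routines (hypothetical workload). */
void sample_routine(void) {
    call_count++;
    for (unsigned long i = 0; i < 100000UL; i++)
        sink += i;
}

/* Run `routine` `repeats` times and return the total elapsed CPU time in
   seconds. With a coarse timer, the total is measured far more reliably
   than a single short run would be. */
double time_repeated(void (*routine)(void), int repeats) {
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        routine();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_repeated(sample_routine, 10)` gives a total that is ten times a single run; dividing by the repeat count recovers the per-run time, which is the "shift of the decimal point" mentioned above.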
 

jchu14

Senior member
Jul 5, 2001
613
0
0
Here's a Phenom X6 1055t at 3.36GHz (no turbo)

SchmideSSE 1.331 seconds
simplesol 2.184 seconds
simplesolASM 1.029 seconds
SchmideSSEASM 0.624 seconds
Hit any key to continue (Where's the ANY key?)

Scaled almost perfectly linearly with frequency compared to the Athlon II. The 29.2% higher core frequency translated to 28.5% to 32.5% faster execution. I wonder what's different between the AMD and Intel implementations.
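The scaling comparison is just a ratio turned into a percentage. A small sketch of that arithmetic, where `percent_gain` is a made-up helper name:

```c
/* Percentage by which `bigger` exceeds `smaller`.
   For clock speeds pass (new_ghz, old_ghz);
   for run times pass (old_seconds, new_seconds). */
double percent_gain(double bigger, double smaller) {
    return (bigger / smaller - 1.0) * 100.0;
}
```

With the numbers posted in this thread, `percent_gain(3.36, 2.6)` comes out at about 29.2 for the clock speeds, and `percent_gain(0.827, 0.624)` at about 32.5 for the SchmideSSEASM times (Athlon II vs. Phenom X6). The other routines can be compared the same way.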
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Yeah well this is just a simple console app. Then I'd have to query the frequency and deal with the high and low values. I like my x10 solution.

Psshh. Lazy :p You can rewrite the whole thing in assembly, but can't be bothered to make a good timer :awe:
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
Ok I fixed the exe. The defaults I was compiling the release build with were set to strict math, so the routine was calling a very complicated sqrt routine.

My Athlon II 620

Code:
SchmideSSE 1.779 seconds
simplesol 1.887 seconds
simplesolASM 1.373 seconds
SchmideSSEASM 0.827 seconds
Hit any key to continue (Where's the ANY key?)
 

EarthwormJim

Diamond Member
Oct 15, 2003
3,239
0
76
Ran on a Core i7 860 at 4GHz.

Code:
SchmideSSE 1.17 seconds
simplesol 1.155 seconds
simplesolASM 1.076 seconds
SchmideSSEASM 0.499 seconds
Hit any key to continue (Where's the ANY key?)

I wonder why the asm version runs faster on your Core 2 Quad, Schmide, than on my Core i7.
 

Ben90

Platinum Member
Jun 14, 2009
2,866
3
0
i7 920 @ 4GHz / 3054MHz uncore
3x2GB @ 572.7MHz / 7s

Code:
SchmideSSE 1.201 seconds
simplesol 1.202 seconds
simplesolASM 1.138 seconds
SchmideSSEASM 0.515 seconds
Hit any key to continue (Where's the ANY key?)
 

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
Hi,
I was just wondering if it is possible to guesstimate a number for this question. :\
I chose the CPU forum because I thought it comes down to work being done by processors.

I don't think that the energy usage of a typical PC has anything to do with the language the OS is written in. (By the way, I used to code in assembler back in the day.)

You could also write "stupid code" in assembly, e.g. some loop, which would stress the CPU just as much as (or even more than) if you had used a higher-level language.

If the PC is idle (or quasi-idle), like right now while I am writing this, what does it matter whether it's in C++ or assembly? The concept is the same.

If you want to look at power savings, look at how many people run stupid screen-savers while they DON'T EVEN USE THEIR PC, or just keep their PCs running while not in use.

"No big deal" <-- :) in the States, with utilities about 3x cheaper than here in the EU... but I can tell you it's an absolute no-no to keep my (heavily overclocked) PC on here 24/7 with the FVCKING INSANE utility costs we have here. Let alone using my 52" Plasma as a 15hr/day PC monitor. You would literally pay yourself right into the looney-bin.

If you want to make people start saving energy, check your workplace and see how many people leave their computers running WHILE NOT EVEN AT WORK, displaying stupid stuff like "web shots" images, which not only keeps the PC running but ALSO uses network resources and so on...
 

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
I don't think you guys see the bigger picture, ASM can do so much more than that crappy C code.

My crappy C compiled windows install uses ~500 watts on full load. Using the savings from ASM, I calculate my computer will use:
500*177=70.8KW

Now, this is a lot of heat. By placing a geothermal powerplant over my computer at 80% efficiency I generate ~56.6KW of pure power.

I then resell this to the grid at a rate of .12/KWH for a profit of ~ $6.79 an hour 24/7.

I now make $60k a year by doing nothing. Thank you ASM, I can now live out my dream of being a freelance lumberjack.

What do you mean by "windows install on full load"? Windows is not what runs "under full load"; it's whatever app/game you run.

Also, let's take the hypothetical example that Windows (and DirectX and whatever belongs to it) were written in 100% optimized ASM. I can tell you right now that the "unused cycles" would not simply be kept "for saving energy"; more functionality would be added to the OS/software UNTIL THE H/W is maxed out again, IMO.

If the OS were in optimized ASM, maybe we'd have a real-time 3D rendered desktop right now and whatever other fancy-schmancy Matrix-style stuff, simply because the optimized code WOULD be able to do it with the given hardware.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
If the OS were in optimized ASM, maybe we'd have a real-time 3D rendered desktop right now and whatever other fancy-schmancy Matrix-style stuff, simply because the optimized code WOULD be able to do it with the given hardware.

I doubt it would make much of a difference really.
We already use a 3D accelerator for all the fancy effects.
Which means that:
1) Custom-designed hardware is way more power-efficient than a CPU performing the same task. Most savings are because of the hardware here, not software.
A fine example of this is HD video playback. A regular Atom-based system is not capable of it. You need a pretty high-end CPU to decode HD video in realtime, which would give you a LOT more power consumption than the Atom.
However, pair an Atom with a decent IGP, such as the nVidia Ion, and suddenly you can do HD video perfectly without even taxing the CPU all that much, and overall power consumption is still relatively low. You can still make devices with passive cooling and long battery life with an Atom+Ion solution. There's your massive power savings.

2) The 3D accelerators use a programming model that is pretty much custom-made for D3D/OpenGL. Although in theory you can still program them in asm, in practice there is little to gain, because the HLSL/GLSL are designed to match the hardware so closely, and compilers are optimized so well, that there is little or no difference in performance.
In fact, in D3D10 Microsoft abandoned the use of their assembly shader language altogether (which is not exactly the same as the underlying hardware, granted... but the underlying hardware is optimized to run this language as efficiently as possible, not much more than that). Only HLSL is supported now.
 

Any_Name_Does

Member
Jul 13, 2010
143
0
0
What do you mean by "windows install on full load"? Windows is not what runs "under full load"; it's whatever app/game you run.

Also, let's take the hypothetical example that Windows (and DirectX and whatever belongs to it) were written in 100% optimized ASM. I can tell you right now that the "unused cycles" would not simply be kept "for saving energy"; more functionality would be added to the OS/software UNTIL THE H/W is maxed out again, IMO.

If the OS were in optimized ASM, maybe we'd have a real-time 3D rendered desktop right now and whatever other fancy-schmancy Matrix-style stuff, simply because the optimized code WOULD be able to do it with the given hardware.

You can't do without asm, that's for sure. Even Schmide applied some asm to prove that asm doesn't matter all that much. The problem with asm is that it is no fun, and when something is no fun, you don't learn it as well. In the course of learning it, I found myself working for peanuts, saving a byte of code here and there, which really is a moot point considering the terabyte HDDs these days. Algorithm is the way. I wasn't so sure about it, but I am now. With a high-level language you have more freedom of mind to concentrate on your algorithm, so I'd say you need them both.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
The problem with asm is that it is no fun, and when something is no fun, you don't learn it as well.

I think that's personal.
I've always enjoyed programming assembly.

But yes, obviously you should not waste your time writing asm where it doesn't matter, and you shouldn't think of asm as a magic wand.
High level algorithm design and optimizations are far more important for the overall performance.
And even if you're going to optimize with assembly, you'd better know EXACTLY what you're doing, because compilers can easily beat naive assembly code.

You need to know where, when and how to use assembly if you want top performance.
Funnily enough, the people at MS know that as well. If you look through some of the code for their libc, D3D/D3DX and other generally performance-intensive stuff, you'll find a lot of high-quality assembly optimizations where it matters.
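To make the "algorithm first" point concrete, here is a hedged sketch. The workload (counting Pythagorean triples with hypotenuse up to n) is an assumption chosen for illustration; it is not the benchmark source posted in this thread, and both function names are hypothetical. Both versions are plain C, yet the second does far less work because of how it searches, not because of how it is coded.

```c
/* Brute force: try every (a, b, c) with a <= b < c <= n. O(n^3) work. */
int count_triples_cubic(int n) {
    int count = 0;
    for (int a = 1; a <= n; a++)
        for (int b = a; b <= n; b++)
            for (int c = b + 1; c <= n; c++)
                if (a * a + b * b == c * c)
                    count++;
    return count;
}

/* Smarter search: for a fixed a, the candidate c only ever grows as b
   grows, so c is advanced instead of restarted. O(n^2) work. */
int count_triples_quadratic(int n) {
    int count = 0;
    for (int a = 1; a <= n; a++) {
        int c = a + 1;
        for (int b = a; b <= n; b++) {
            int target = a * a + b * b;
            while (c <= n && c * c < target)
                c++;
            if (c > n)
                break;          /* larger b only needs a larger c */
            if (c * c == target)
                count++;
        }
    }
    return count;
}
```

Both functions return the same count for the same n. Hand-tuning the inner loop of the cubic version in assembly would still leave it doing an order of magnitude more iterations than the quadratic one for large n.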
 
Feb 14, 2010
78
0
0
Does it actually matter if Windows is written in Assembly or not? Also, I thought all compilers are supposed to convert code written by programmers into assembly or machine code?
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Does it actually matter if Windows is written in Assembly or not? Also, I thought all compilers are supposed to convert code written by programmers into assembly or machine code?

They do, but there are cases where humans can do it better than the machine.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
Many props to Schmide, because of his effort, this thread didn't turn out the way it easily could have. It's threads like these that make me glad that AT exists. :)
 

Any_Name_Does

Member
Jul 13, 2010
143
0
0
Many props to Schmide, because of his effort, this thread didn't turn out the way it easily could have. It's threads like these that make me glad that AT exists. :)

If schmide didn't do it, someone else would have. My code was screaming to be beaten. It was optimized for size, and you still can't beat it in that field.
I knew a couple of instructions were bottlenecks, but I wasn't aware of how severe those bottlenecks were.
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
:D

I explained how I used you for my purpose. :sneaky: . so you'd be on the same direction as me.

Dude? I am so not in your direction other than the wafting of my farts. I have 30 years of programming behind me. You may think you used me, but I know how to check my ego at the door.

:D
I just got GCC. Any good?

GCC is good. The key to development is the IDE, and I have currently fallen behind on the Linux development platform. I'm sure someone here can point you in the right direction.

If schmide didn't do it, someone else would have. My code was screaming to be beaten.

I don't think you realize how rare it is for a programmer to step up and code up some assembly in an afternoon. Please don't belittle what I did. Then again if your ego wasn't oozing all over this thread, I wouldn't have had the motivation to do it.

It was optimized for size, and you still can't beat it in that field.
I knew a couple of instructions were bottlenecks, but I wasn't aware of how severe those bottlenecks were.

It is sad that we get to this point and you still don't realize it was your algorithm holding your code back and not the implementation. Code smart, Code S-Mart.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
I don't think you realize how rare it is for a programmer to step up and code up some assembly in an afternoon.

Yeah, even at http://asmcommunity.net/board we don't have these sorts of 'challenges' all that often anymore.
But it can be fun when it happens.
Might be a good place if you want to start this sort of thread. I usually throw in an ANSI C version for good measure, to see where we stand exactly :)
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
Scali - you do have to admit the results of the coding do shed light on the physics SSE issue we discussed before. Intrinsics/straight math/ASM x87 all about equal in the end, hand coded SSE up to twice as fast.
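For readers who haven't used them: "intrinsics" are C-callable functions that map more or less one-to-one onto SSE instructions, so the compiler still handles register allocation and scheduling, unlike hand-coded ASM. A minimal sketch, assuming the kernel is a hypotenuse-style sqrt(a*a + b*b) (an assumption; the posted source is not reproduced here) and using the made-up name `hypot4_sse`. It needs an x86/x86-64 compiler.

```c
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Compute sqrt(a[i]^2 + b[i]^2) for four floats in one pass. */
void hypot4_sse(const float *a, const float *b, float *out) {
    __m128 va  = _mm_loadu_ps(a);                  /* load 4 floats, unaligned OK */
    __m128 vb  = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(_mm_mul_ps(va, va),    /* a*a + b*b, 4 lanes at once */
                            _mm_mul_ps(vb, vb));
    _mm_storeu_ps(out, _mm_sqrt_ps(sum));          /* packed square root */
}
```

Whether a version like this matches hand-scheduled SSE assembly depends on the compiler and the surrounding loop, which is exactly what the numbers in this thread were probing.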
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Scali - you do have to admit the results of the coding do shed light on the physics SSE issue we discussed before. Intrinsics/straight math/ASM x87 all about equal in the end, hand coded SSE up to twice as fast.

I'm not going into that discussion...
Thing is, you are testing a single routine here.
A full physics lib's performance doesn't just depend on a single routine.
You may be able to make SOME parts twice as fast, or more... but that's no guarantee that the physics workload in a real-world scenario will execute twice as fast.

That reminds me a bit of the Intel promotion of SSE4 in the Penryn... They could make one part of H264 encoding considerably faster, and promoted that... However, when encoding a complete H264 movie, the results were far less impressive.
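The point about a single routine versus the whole workload is what Amdahl's law describes. A small sketch of the formula; the fractions used in the note below are hypothetical and not measurements from any physics library or encoder:

```c
/* Amdahl's law: overall speedup when a fraction `optimized_fraction`
   (0.0 to 1.0) of the run time is made `local_speedup` times faster. */
double overall_speedup(double optimized_fraction, double local_speedup) {
    return 1.0 / ((1.0 - optimized_fraction)
                  + optimized_fraction / local_speedup);
}
```

For example, if the routine that got twice as fast accounted for half of the total run time, `overall_speedup(0.5, 2.0)` is about 1.33, not 2. Only when the routine is the entire workload, `overall_speedup(1.0, 2.0)`, do you get the full 2x.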
 

Modelworks

Lifer
Feb 22, 2007
16,240
7
76
ASM can accomplish anything that any other language can and faster, it just depends on how good the programmers are. Some examples:

Menuet, fits on 1.44MB disk
http://www.menuetos.net/

- Pre-emptive multitasking with 1000hz scheduler, multithreading, multiprocessor, ring-3 protection
- Responsive GUI with resolutions up to 1280x1024, 16 million colours
- Free-form, transparent and skinnable application windows, drag'n drop
- IDE: Editor/Assembler for applications
- USB 2.0 Hi-speed storage, webcam, printer class and tv/radio support
- TCP/IP stack with Loopback & Ethernet drivers
- Email/ftp/http/chess clients and ftp/mp3/http servers
- Hard real-time data fetch


Another OS written in ASM is BareMetal; they just released version 0.4.8. Its approach is a bit different, going back to the way DOS did things. Instead of having the OS provide all access to hardware, programs can interact directly with the hardware if they want, without going through the OS, or they can use calls through the OS. Both the OS and programs run at Ring 0.
http://www.returninfinity.com/baremetal.html

Starting with a clean slate we can say goodbye to bloated code and feature creep! As of the current version with the full CLI and internal functions, the operating system binary is only 16384 bytes. A standard "Hello, World!" example compiles to a file of only 31 bytes.


Would it save power if Windows was all assembly? No way to tell, but I doubt it. It might save hard drive space though :)
 

Any_Name_Does

Member
Jul 13, 2010
143
0
0
Dude? I am so not in your direction other than the wafting of my farts. I have 30 years of programming behind me. You may think you used me, but I know how to check my ego at the door.



GCC is good. The key to development is the IDE, and I have currently fallen behind on the Linux development platform. I'm sure someone here can point you in the right direction.



I don't think you realize how rare it is for a programmer to step up and code up some assembly in an afternoon. Please don't belittle what I did. Then again if your ego wasn't oozing all over this thread, I wouldn't have had the motivation to do it.



It is sad that we get to this point and you still don't realize it was your algorithm holding your code back and not the implementation. Code smart, Code S-Mart.

I made it clear from the very beginning of this thread that I am by no means an advanced programmer. Being a good programmer and being smart are two different things. If someone goes to a forum where a lot of top programmers are likely to be hanging around and says "I am not good at programming, but my code is much better than all of yours"... then... smart people...

We have a saying. Don't fart in a church.
 