How much electricity would be saved worldwide if Windows was written in Assembly?

Scali · Jul 18, 2010

Modelworks said:
Would it save power if Windows was all assembly? No way to tell, but I doubt it. It might save hard drive space, though.

I think if you want to save harddisk space, you'd be better off by trimming the fat first.
Windows is mostly so large because it tries to be everything to everyone, in the past, present and future.
A 64-bit OS needs full 32-bit support as well, meaning a lot of redundancy.
More redundancy comes from the same kind of functionality being offered by multiple libraries, etc.
For example, you get a full .NET environment installed by default, which is hundreds of megs... Basically it does nothing that the old Win32 API can't do; it just does the same things slightly differently. You could do without it just fine, as long as you stick to Win32 applications.
Likewise, you get all versions of DirectX, and OpenGL alongside that.
You get full media player/media center capabilities out-of-the-box.
The list goes on and on.

Schmide · Jul 18, 2010

Any_Name_Does said:
I made it clear from the very beginning of this thread that I am by no means an advanced programmer. Being a good programmer and being smart are two different things. If someone goes to a forum where a lot of top programmers are likely to be hanging around and says "I am not good at programming, but my code is much better than all of yours"... then... smart people...

We have a saying. Don't fart in a church.

I and many others took you seriously, yet everything we explained to you was met with opposition. No matter how clear you were about your abilities, you were always adversarial.

I even went so far as to defend your code for what it was, a good implementation of a poor algorithm.

No one likes to be used by someone, yet you seem to think it's cool to claim manipulation of someone for your own benefit. Good luck with that if you ever make it into the industry. To survive on a coding team, you have to be a lot better than any of us here to get away with that kind of attitude. You may have some talent and drive, but without a dose of humility you will never make it in the engineering field.

Edit: PS in the Church of the SubGenius we fart all the time.

Voo · Jul 18, 2010

Schmide said:
You made a deal with the devil???

Yeah and something seems to not like that.. my poor little 8800 just kicked the bucket.

So this is completely untested and unoptimized code, and I only wrote a little CUDA some time ago, so it's hardly the best way to do it. But I thought it would be interesting (next time, don't write code before checking whether the HW works *cough*) and it could give another data point, since we already have so many.

Also, as usual, you'd probably have to play a fair bit with the parameters (i.e. threads per block and the leaf size). The leaf size is essential: we want enough parallelism to keep the GPU busy (there's not much memory access going on, so that's not too important, and the GPU can only handle around 12k threads at the same time anyway), but the row/column computation for a triangular matrix is rather complex (I just played a bit with pen and paper and came up with those formulas; I couldn't find anything better in a quick search), so we have to amortize it over LEAF_SIZE elements per thread.

So if anyone with some CUDA experience AND a working GPU would polish that up and run it, here's the code:

Code:

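#include <stdlib.h>
#include <stdio.h>

#include <cuda_runtime.h>

#define LEAF_SIZE (50)
#define MATRIX_SIZE (5000)
#define THREADS_PER_BLOCK (256)
#define MAX_NR_BLOCKS (65535)
#define MAX_NR_THREADS (MAX_NR_BLOCKS * THREADS_PER_BLOCK)
// number of (row, col) pairs with col >= row: the upper triangle incl. diagonal
#define NR_ELEMENTS (MATRIX_SIZE * (MATRIX_SIZE + 1) / 2)

// Row of linear index i in the row-major upper triangle.
__device__ int ComputeRow(int i) {
	int ii = NR_ELEMENTS - 1 - i; // index counted from the end
	int k = (int) ((sqrtf(8.0f * ii + 1.0f) - 1.0f) / 2.0f);
	// float can't represent 8 * ii + 1 exactly for large ii, so fix up k until
	// k * (k + 1) / 2 <= ii < (k + 1) * (k + 2) / 2
	while (k > 0 && k * (k + 1) / 2 > ii) k--;
	while ((k + 1) * (k + 2) / 2 <= ii) k++;
	return MATRIX_SIZE - 1 - k;
}

// Column of linear index i, given its row.
__device__ int ComputeCol(int row, int i) {
	return i - MATRIX_SIZE * row + row * (row + 1) / 2;
}

// 1 if a2 + b^2 is a perfect square. The float root is only an estimate
// (c2 can exceed 2^24), so the result is verified in integer arithmetic.
__device__ int IsTriple(int a2, int b) {
	int c2 = a2 + b * b;
	int c = (int) (sqrtf((float) c2) + 0.5f);
	return c * c == c2;
}

__global__ void ComputeTriples(int *erg) {
	int row, row2, col, i, j, sum = 0;
	i = (blockIdx.x * THREADS_PER_BLOCK + threadIdx.x) * LEAF_SIZE;
	if (i >= NR_ELEMENTS) return; // the last block overshoots the triangle
	row = ComputeRow(i);
	col = ComputeCol(row, i);
	row2 = row * row;
	for(j = 0; j < LEAF_SIZE; j++) {
		sum += IsTriple(row2, col);
		col++;
		if (col == MATRIX_SIZE) {
			row++;
			if (row == MATRIX_SIZE) break; // walked past the last element
			row2 = row * row;
			col = row;
		}
	}
	// make sure nothing is optimized away; atomicAdd avoids the race of a plain
	// "*erg += sum" (global atomics need compute capability 1.1 or later)
	atomicAdd(erg, sum);
}

int main() {
	unsigned int nr_blocks;
	int *erg, result = 0;
	float ms = 0.0f;
	cudaEvent_t start, stop;
	// don't care about off-by-one errors here; the kernel skips indices past the end.
	nr_blocks = NR_ELEMENTS / (THREADS_PER_BLOCK * LEAF_SIZE) + 1;
	if (nr_blocks >= MAX_NR_BLOCKS) {
		printf("Too many blocks for one kernel invocation.\n");
		return 1;
	}
	cudaMalloc((void**) &erg, sizeof(int));
	cudaMemset(erg, 0, sizeof(int)); // cudaMalloc doesn't zero the memory
	cudaEventCreate(&start);
	cudaEventCreate(&stop);
	cudaEventRecord(start, 0);

	ComputeTriples <<< nr_blocks, THREADS_PER_BLOCK >>> (erg);

	// kernel launches are asynchronous: wait for the stop event before reading the time
	cudaEventRecord(stop, 0);
	cudaEventSynchronize(stop);
	cudaEventElapsedTime(&ms, start, stop);
	cudaMemcpy(&result, erg, sizeof(int), cudaMemcpyDeviceToHost);
	printf("%d triples, done in: %fms\n", result, ms);
	cudaEventDestroy(start);
	cudaEventDestroy(stop);
	cudaFree(erg);
	return 0;
}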
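The index arithmetic in ComputeRow/ComputeCol is the part that was worked out with pen and paper, so it may be worth checking on the CPU before trusting the kernel. Below is a minimal host-side sketch, assuming the same layout (row-major upper triangle of an n x n matrix, diagonal included). `TriangleRowCol` is a hypothetical helper name, and the integer fix-up after the `sqrt` is an addition that guards against floating-point rounding at row boundaries.

```cpp
#include <cmath>
#include <utility>

// Hypothetical host-side mirror of ComputeRow/ComputeCol: maps a linear index i
// into the row-major upper triangle (col >= row) of an n x n matrix.
std::pair<int, int> TriangleRowCol(long long i, int n) {
    long long total = (long long)n * (n + 1) / 2;
    long long ii = total - 1 - i;  // index counted backwards from the last element
    // Estimate k = floor((sqrt(8*ii + 1) - 1) / 2), the row number counted from the bottom.
    long long k = (long long)((std::sqrt(8.0 * (double)ii + 1.0) - 1.0) / 2.0);
    // Fix up rounding so that k*(k+1)/2 <= ii < (k+1)*(k+2)/2 holds exactly.
    while (k > 0 && k * (k + 1) / 2 > ii) --k;
    while ((k + 1) * (k + 2) / 2 <= ii) ++k;
    int row = (int)(n - 1 - k);
    int col = (int)(i - (long long)n * row + (long long)row * (row + 1) / 2);
    return {row, col};
}
```

Looping i over 0..n(n+1)/2-1 for a small n and comparing against a plain nested row/col loop is a cheap way to validate the formulas before running them on the GPU.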

Any_Name_Does · Jul 18, 2010

Schmide said:
I and many others took you seriously, yet everything we explained to you was met with opposition. No matter how clear you were about your abilities, you were always adversarial.

Edit: PS in the Church of the SubGenius we fart all the time.

I didn't mean to hurt you. I'll be milder next time. What else can I say when you fart in my general direction?

evolucion8 · Jul 19, 2010

I don't know a thing about this, but I have a question. Is it possible to boost the performance of the code posted here by using more than SSE2, such as SSE3/SSSE3, SSE4/SSE4.1 or SSE4A? Also, I never understood AMD's odd performance with x87; is there a reason? I would love to know why. Plus, it seems to me that Penryn can perform slightly better than the i7 used here thanks to its very low-latency cache; maybe code that stresses the execution resources more, or that is heavily multithreaded, will shine on the i7.

About the 8800 card, bake it!! You don't have anything to lose!! There's a thread around here about fixing that card.

Scali · Jul 19, 2010

evolucion8 said:
I never understood the AMD's odd performance using x87, is there a reason?

Odd in what way?
Back in the days of Athlon classic/XP/64 vs Pentium 3/4, it was like this:
The Pentium 3 had a relatively simple x87 unit. Intel was moving towards SSE anyway, and didn't really care about x87 performance anymore.
AMD on the other hand used the technology from the Alpha processor to implement a very advanced pipelined x87 implementation.
This gave AMD the upper hand in x87, but Intel was usually faster with floating point when SSE was used (which AMD didn't support yet). AMD did have 3DNow!, but on an Athlon it didn't make as much sense as on the K6 anymore, now that the x87 was so powerful.

With the Pentium 4, Intel put the final nail in x87's coffin. SSE2 was going to replace the x87 completely for floating point operations (aside from some legacy stuff), and x87 was implemented in micro-code macros using the SSE execution units.
The result was that Pentium 4 had very good floating point performance when SSE2 was used, but x87 was absolutely atrocious.

AMD slowly caught up with SSE support, but their implementations weren't as strong as Intel's... Combined with AMD's superior x87 design, situations could arise where x87 was faster than SSE on Athlons.

When AMD introduced their 64-bit extensions, they took Intel's hint from the Pentium 4: AMD deprecated x87 for 64-bit code, and SSE2 became the preferred path for all floating-point operations.

AMD later moved to a better, full 128-bit implementation of SSE, so their SSE wasn't a weak point anymore. Conversely, with the Core2, Intel improved legacy x87 performance again, since it turned out to be a weak point in the Pentium 4, and SSE2 adoption was much slower than anticipated.

Currently, Intel and AMD are pretty reasonably matched in x87 and SSE (with Intel having the advantage of having new SSE extensions first, for those programs that adopt them early)... but with 64-bit adoption going the way it does with Windows 7, x87 will probably be nothing but a bad memory soon.
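To make the SSE2 point concrete for this thread's problem: with SSE2, the perfect-square test can run on two candidates at once in an XMM register instead of on the x87 stack. This is a minimal sketch using the standard `<emmintrin.h>` intrinsics; `PerfectSquareMask2` is a hypothetical name, and the scalar fallback exists only so the sketch also compiles on non-x86 targets.

```cpp
#include <cmath>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#endif

// Returns a 2-bit mask: bit 0 is set if a is a perfect square, bit 1 if b is.
// Assumes 0 <= a, b < 2^31, so the values and their roots fit in int32.
int PerfectSquareMask2(int a, int b) {
#ifdef HAVE_SSE2_SKETCH
    __m128d v = _mm_set_pd((double)b, (double)a);   // lane 0 = a, lane 1 = b
    __m128d r = _mm_sqrt_pd(v);                     // both square roots in one instruction
    __m128i t = _mm_cvttpd_epi32(r);                // truncate both roots to int32
    __m128d back = _mm_cvtepi32_pd(t);              // and convert them back to double
    return _mm_movemask_pd(_mm_cmpeq_pd(r, back));  // per lane: "root is an integer"
#else
    double ra = std::sqrt((double)a), rb = std::sqrt((double)b);
    return (ra == std::floor(ra) ? 1 : 0) | (rb == std::floor(rb) ? 2 : 0);
#endif
}
```

Because IEEE square root is correctly rounded, an exact perfect square below 2^31 yields an exactly integral double, so the equality test is reliable in this range.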

evolucion8 said:
I would love to know why, plus seems to me that Penryn can perform slightly better than the i7 used thanks to its very low latency cache, may be code that can challenge more the execution resources or are heavily multithreaded will shine in the i7.

Penryn has the advantage that two cores share a relatively large, low-latency L2 cache. In some cases (if you need to share data, but only between two threads at a time) you can take advantage of that and do things the i7 simply cannot do.

flexy · Jul 19, 2010

Scali said:
I think that's personal.
I've always enjoyed programming assembly.

But yes, obviously you should not waste your time writing asm where it doesn't matter, and you shouldn't think of asm as a magic wand.
High level algorithm design and optimizations are far more important for the overall performance.
And even if you're going to optimize with assembly, you'd better know EXACTLY what you're doing, because compilers can easily beat naive assembly code.

You need to know where, when and how to use assembly if you want top performance.
Funnily enough, the people at MS know that as well. If you look through some of the code for their libc, D3D/D3DX and other generally performance-intensive stuff, you'll find a lot of high-quality assembly optimizations where it matters.

ASM was a lot of fun on the 68x00, but it's simply not the same anymore, now that gigabytes of data get shuffled around instead of kilobytes.

If someone had told me 20 years ago that a graphics driver alone would be a 100MB+ download, I would not have believed him. Back then we had 20MB hard disks.

Furthermore, it's still common, and nothing speaks against it, to use ASM for extremely time-critical subroutines WITHIN your higher-level language code.

I agree that today compilers are so good that a focus on ASM would be a waste; the benefit just wouldn't be there, IMHO.

Scali · Jul 19, 2010

flexy said:
I agree that today compilers are so good that a focus on ASM would be a waste..the benefit just wouldnt be there, IMHO.

Yup, if you look through this thread for example:
http://www.asmcommunity.net/board/index.php?topic=29696.0

Now these are assembly programmers, some of them programming in assembly ALL of the time.
As you can see, with the proper choice of algorithm, the C compiler can reach a result that beats most assembly attempts (biggest problem of many assembly programmers is that they don't know ANYTHING about the underlying architecture, and use all sorts of archaic instructions that are emulated in micro-code, and thus are very slow. Compilers won't fall for that sort of thing).
With VS2008, I could still tweak the code a bit to make it faster. When I got the VS2010 beta and recompiled the code, it came up with identical code to my hand-tuned one.

flexy · Jul 19, 2010

hiddensniper11 said:
Does it actually matter if Windows is written in Assembly or not? Also, I thought all compilers are supposed to convert code written by programmers into assembly or machine code?

You are basically right. But ASM would give a programmer more control, since it is the lowest level to code in; you can "hand optimize" more, so to speak.

But IMO it just isn't practical anymore, not in a case where an OS alone is Gigabytes and Gigabytes of code.

I think the benefit of writing an OS in a high-level language, and thus making the code "easier to read", modular and/or OO etc., would far outweigh any alleged benefit of attempting to write a whole OS in ASM.

Power savings due to ASM? I don't think so, not AT ALL; I can't even see WHY ASM code would contribute to power savings.

Cogman · Jul 19, 2010

Scali said:
As you can see, with the proper choice of algorithm, the C compiler can reach a result that beats most assembly attempts (biggest problem of many assembly programmers is that they don't know ANYTHING about the underlying architecture, and use all sorts of archaic instructions that are emulated in micro-code, and thus are very slow. Compilers won't fall for that sort of thing).

I don't know that I would call them good assembly programmers if they don't know anything about computer architecture. Though you are right, choosing a good algorithm to begin with is key to making a good assembly program.

Scali · Jul 19, 2010

Cogman said:
I don't know that I would call them good assembly programmers if they don't know anything about computer architecture. Though you are right, choosing a good algorithm to begin with is key to making a good assembly program.

I didn't say they were good assembly programmers, did I?
I was just trying to point out that "writing it in assembly" isn't going to get you faster code, unless you're actually good at optimizing in assembly (and obviously also at the high level, because you have to pick the proper algorithm to optimize first), which is only a small minority of all assembly programmers.

Markbnj · Jul 19, 2010

flexy said:
You are basically right. But ASM would give a programmer more control, since it is the lowest level to code in; you can "hand optimize" more, so to speak.

It's actually not the lowest level. Assembler is a set of instructions that translate to opcodes. You could program in opcodes, and then save enough energy to allow Apple to boost all iPhone antenna gain by +2 dB! The world is saved!

Or something.

Scali · Jul 19, 2010

Markbnj said:
It's actually not the lowest level. Assembler is a set of instructions that translate to opcodes.

Since there is a 1:1 mapping between assembly mnemonics and opcodes, it is effectively the same level of programming. It's just a more human-friendly notation of the same thing.
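To illustrate the mnemonic-to-opcode relationship: for a given instruction form, the assembler simply emits a fixed byte pattern. Below is a minimal sketch for two x86 instructions, `mov eax, imm32` (opcode B8 followed by the 32-bit immediate in little-endian order) and `ret` (C3). The helper names are hypothetical, and a real assembler of course handles many more instructions and addressing forms.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical mini-assembler for two x86 instructions.
// "mov eax, imm32" encodes as B8 followed by the immediate, least significant byte first.
std::vector<std::uint8_t> EncodeMovEaxImm32(std::uint32_t imm) {
    std::vector<std::uint8_t> bytes{0xB8};
    for (int shift = 0; shift < 32; shift += 8)
        bytes.push_back(static_cast<std::uint8_t>((imm >> shift) & 0xFF));
    return bytes;
}

// "ret" (near return) is the single byte C3.
std::vector<std::uint8_t> EncodeRet() { return {0xC3}; }
```

So `mov eax, 42` followed by `ret` is the six bytes B8 2A 00 00 00 C3; typing those bytes in by hand is "programming in opcodes", just in a less readable notation.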

Cogman · Jul 19, 2010

BTW Schmide, just to further emphasize the importance of a good algorithm, I blew your best out of the freaking water, without using any assembly!

Code:

#include <cmath>
#include <iostream>
#include <windows.h> // GetTickCount
using namespace std;

bool CogmanSimpleSolution()
{
	bool solution = false;
	DWORD iStart = GetTickCount();
	for (int iRuncount = 0; iRuncount < 10; iRuncount++)
	{
		// table of squares; the pairs (1, i) are tested while it is filled
		int mulTable[5001];
		for (int i = 1; i < 5001; ++i)
		{
			solution = false;
			mulTable[i] = i * i;
			int c = 1 + mulTable[i];
			double d = sqrt((double)c);
			int iD = (int)d;
			if (d == (double)iD)
			{
				solution = true;
			}
		}

		for (int i = 2; i < 5001; ++i)
		{
			for (int j = i; j < 5001; ++j)
			{
				solution = false;
				int c = mulTable[i] + mulTable[j];
				double d = sqrt((double)c);
				int iD = (int)d;
				if (d == (double)iD); // as posted: this ';' is the (empty) body of the if
				{
					solution = true; // without this it would deadcode the solution.
#ifdef _OUTPUTENABLE
					cout << i << ' ' << j << ' ' << iD << endl;
#endif
				}
			}
		}
	}
	DWORD iEnd = GetTickCount();
	double dTotalTick = ((double)iEnd - (double)iStart) / 1000.0;
	cout << "cogmanSimplesol " << dTotalTick << " seconds" << endl;
	return solution;
}

This was using GCC, so the times are actually a little slower (good job, Microsoft; GCC used to be faster...). However, I got 0.25 seconds for my solution and 1.987 seconds for your simple solution. I didn't do the asm versions just because inline asm is a PITA to do with GCC.

KIAman · Jul 19, 2010

If your top concern was power consumption, code in random blue screens of death to force users to

1. Stop using CPU cycles
2. Reboot or shut off

Oh wait, that feature is already implemented. GG Microsoft!

Any_Name_Does · Jul 19, 2010

Cogman said:

BTW Schmide, just to further emphasize the importance of a good algorithm, I blew your best out of the freaking water, without using any assembly!

This was using GCC, so the times are actually a little slower (good job, Microsoft; GCC used to be faster...). However, I got 0.25 seconds for my solution and 1.987 seconds for your simple solution. I didn't do the asm versions just because inline asm is a PITA to do with GCC.

So the MS compiler is faster and has better assembly support?

Scali · Jul 19, 2010

Any_Name_Does said:
So the MS compiler is faster and has better assembly support?

Yes and yes/no.
MS uses a very simple inline assembly syntax. gcc uses a hack where you have to write your assembly in C strings, which are then extracted from your source code and fed to the assembler.
However, MS has abandoned inline assembly support for 64-bit. gcc still supports asm in 64-bit.
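For comparison, this is roughly what the GCC form looks like: the instruction text is a C string in AT&T syntax (source operand first), and constraint strings tell the compiler which C variables go into which registers. A minimal sketch for x86/x86-64 with GCC or Clang; `AddAsm` is a hypothetical name, and the plain C++ fallback covers every other compiler/target, including MSVC x64, which has no inline assembly at all.

```cpp
// Adds two ints with a single x86 "add" instruction written as GCC extended inline asm.
int AddAsm(int a, int b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int result = a;
    __asm__("addl %1, %0"   // AT&T syntax: add source %1 into destination %0
            : "+r"(result)  // output: result, read and written, kept in a register
            : "r"(b)        // input: b, in a register
            : "cc");        // the instruction modifies the flags
    return result;
#else
    return a + b;  // no GCC-style x86 inline asm available on this compiler/target
#endif
}
```

The MSVC 32-bit equivalent would be an `__asm { ... }` block in Intel syntax, using C variable names directly instead of constraints.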

Cogman · Jul 19, 2010

Any_Name_Does said:
So the MS compiler is faster and has better assembly support?

Somewhat. GCC's inline ASM is convoluted and complex (plus it uses funky AT&T syntax).

As for speed, the difference wasn't extreme, like 10ms really. And as I said earlier, it used to produce faster code, so there's nothing stopping it from swapping roles again.

Any_Name_Does · Jul 19, 2010

Cogman said:
Somewhat. Gcc's ASM insertion is convoluted and complex. (plus it uses funky AT&T syntax)

As for speed, the difference wasn't extreme, like 10ms really. And as I said earlier, it used to produce faster code, so there's nothing stopping it from swapping roles again.

Is the syntax the same?

Are there license restrictions on the MS compiler?
Any recommendations for a good IDE?

Any_Name_Does · Jul 19, 2010

Scali said:
However, MS has abandoned inline assembly support for 64-bit. gcc still supports asm in 64-bit.

But I take it that they both support 64-bit code. How do you tell them whether to generate 64-bit or 32-bit code?

Scali · Jul 19, 2010

Any_Name_Does said:
But I get it that they both support 64 bit coding. How do you tell them to write 64 or 32 bit code?

Microsoft just installs three different sets of binaries:
1) Regular 32-bit compiler + tools
2) Regular 64-bit compiler + tools
3) Cross-compiler + tools, allowing you to compile 64-bit code on a 32-bit system.

I suppose something similar goes for gcc. I have a 64-bit FreeBSD system with gcc, but never tried to compile 32-bit code on it. I'd probably need to install some sort of cross-compiler.
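One quick way to confirm which kind of code a given compiler setup produced is to check the pointer size. A small sketch; with GCC on x86 the target is normally selected with the `-m32` or `-m64` flag (building with `-m32` on a 64-bit system also needs the 32-bit libraries installed), while with Visual C++ it depends on which of the compiler sets listed above you invoke. `PointerBits` is a hypothetical name.

```cpp
#include <climits>

// Number of bits in a data pointer for the current build:
// 32 for 32-bit x86 code, 64 for x86-64 code.
constexpr int PointerBits() { return static_cast<int>(sizeof(void*) * CHAR_BIT); }

// Compile-time check: this sketch only expects 32- or 64-bit targets.
static_assert(PointerBits() == 32 || PointerBits() == 64, "unexpected pointer size");
```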

Any_Name_Does · Jul 19, 2010

Scali said:
Microsoft just installs three different sets of binaries:
1) Regular 32-bit compiler + tools
2) Regular 64-bit compiler + tools
3) Cross-compiler + tools, allowing you to compile 64-bit code on a 32-bit system.

I suppose something similar goes for gcc. I have a 64-bit FreeBSD system with gcc, but never tried to compile 32-bit code on it. I'd probably need to install some sort of cross-compiler.

Would you say that 64-bit code runs better on a 64-bit OS than 32-bit code does?

Scali · Jul 19, 2010

Any_Name_Does said:
Would you say that 64 bit code runs better on 64 bit OS compared to 32 bit code?

Not necessarily.
There are a few performance pitfalls in 64-bit mode. For example, the stack is now 64-bit rather than 32-bit, so every stack push/pop uses twice the cache/memory storage and bandwidth.
If you rely heavily on stack (eg recursive routines), you could see a decrease in performance when recompiling code from 32-bit to 64-bit.
Valve ported their Source engine to 64-bit, but it ran considerably slower than the 32-bit one. Eventually they abandoned it.
The Far Cry and Crysis engines on the other hand, run faster in 64-bit mode than in 32-bit mode. So the developers knew what they were doing, and managed to side-step the pitfalls of 64-bit, and used the extra features to their advantage.
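A rough way to look at the stack-cost point on your own machine is to measure how far the stack moves per level of recursion, then build the same file as 32-bit and as 64-bit code and compare. This is a sketch under assumptions: it relies on the compiler not inlining or flattening the recursion (hence the compiler-specific noinline hints and the volatile local), and the numbers depend on compiler, flags and calling convention, so treat them as indicative only. `DeepestLocal` and `BytesPerFrame` are hypothetical names.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Returns the address of a local variable at the bottom of a recursion of the given depth.
SKETCH_NOINLINE std::uintptr_t DeepestLocal(int depth) {
    volatile int marker = depth;  // volatile: forces a real stack slot in every frame
    if (depth == 0) return reinterpret_cast<std::uintptr_t>(&marker);
    std::uintptr_t deepest = DeepestLocal(depth - 1);
    marker = marker + 1;          // touched after the call, so this is not a tail call
    return deepest;
}

// Average number of stack bytes used per recursion level (depth must be positive).
long BytesPerFrame(int depth) {
    volatile int zero = 0;  // opaque value, so neither call can be specialized on a constant
    std::uintptr_t top = DeepestLocal(zero);
    std::uintptr_t bottom = DeepestLocal(zero + depth);
    std::uintptr_t diff = top > bottom ? top - bottom : bottom - top;
    return static_cast<long>(diff / static_cast<std::uintptr_t>(depth));
}
```

Calling `BytesPerFrame(1000)` in a 32-bit and a 64-bit build shows how the per-call stack cost of this particular function changes between the two modes with your compiler.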

Any_Name_Does · Jul 19, 2010

Scali said:
Not necessarily.

Thanks.
Now, would you recommend a compiler, an IDE, and some learning material?

Schmide · Jul 19, 2010

Cogman said:

BTW Schmide, just to further emphasize the importance of a good algorithm, I blew your best out of the freaking water, without using any assembly!

I just did a quick check and I think your foiling is foiled?

Code:

SchmideSSE 1.716 seconds
simplesol 1.857 seconds
simplesolASM 1.357 seconds
SchmideSSEASM 0.827 seconds
cogmanSimplesol 1.716 seconds
Hit any key to continue (Where's the ANY key?)

I think what happened is that you put a semicolon here in the second loop. The assignment to the solution variable is important to prevent dead code. VC caught it, though I would have dug into it anyway, as it just didn't make sense.

Code:

			if (d == (double)iD); // <- this stray semicolon

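The semicolon ends the if statement, so the block after it always runs and always sets solution to true. The final value of solution therefore doesn't depend on anything computed in the second loop, which caused the compiler to see that loop as dead code (i.e. no need to execute it).

If you look at the disassembly: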

Code:

	for(int i=2;i<5001;++i)
01361068  inc         eax  
01361069  cmp         eax,1389h 
0136106E  jl          CogmanSimpleSolution+66h (1361066h) 
for(int iRuncount=0;iRuncount<10;iRuncount++)
01361070  sub         edi,1 
01361073  jne         CogmanSimpleSolution+20h (1361020h) 
			int c;
			double d;
			c = mulTable[i] + mulTable[j];
			d=sqrt((double)c);
			int iD=(int)d;
			if (d == (double)iD)
			{
			}
		}
	}
}

EDIT: This is the assembly with the solution variable assignment.

Code:

	for(int i=2;i<5001;++i)
	{
		for(int j=i;j<5001;++j)
		{
			solution=false;
			int c;
			double d;
			c = mulTable[i] + mulTable[j];
00DD15B1  mov         eax,dword ptr [esp+esi*4+20h] 
00DD15B5  add         eax,ebp 
			d=sqrt((double)c);
00DD15B7  mov         dword ptr [esp+10h],eax 
00DD15BB  fild        dword ptr [esp+10h] 
00DD15BF  xor         bl,bl 
00DD15C1  fsqrt            
			int iD=(int)d;
			if (d == (double)iD)
00DD15C3  fld         st(0) 
00DD15C5  call        _ftol2_sse (0DD21E0h) 
00DD15CA  mov         dword ptr [esp+10h],eax 
00DD15CE  fild        dword ptr [esp+10h] 
00DD15D2  fucompp          
00DD15D4  fnstsw      ax   
00DD15D6  test        ah,44h 
00DD15D9  jp          CogmanSimpleSolution+9Dh (0DD15DDh) 
			{
				solution=true; // without this it would deadcode the solution.
00DD15DB  mov         bl,1

Nice try, though. Multiplying 2 numbers locally should only save you 3 ticks at most, nowhere near the amount you saved there, especially with the extra access to the lookup table.
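For reference, here is a sketch of the same brute-force search written so the work cannot be thrown away: the result is a count that depends on every iteration, the `if` has no stray semicolon, and the perfect-square test is verified in integer arithmetic rather than by comparing doubles. `CountTriples` is a hypothetical name, and this is not one of the versions timed above.

```cpp
#include <cmath>
#include <vector>

// Counts pairs (a, b) with 1 <= a <= b <= limit such that a*a + b*b is a perfect square.
// Returning the count makes every iteration contribute to the result, so the
// compiler cannot treat the loops as dead code.
long long CountTriples(int limit) {
    std::vector<long long> squares(limit + 1);  // lookup table of i*i, as in the loops above
    for (int i = 0; i <= limit; ++i) squares[i] = (long long)i * i;

    long long count = 0;
    for (int a = 1; a <= limit; ++a) {
        for (int b = a; b <= limit; ++b) {
            long long c2 = squares[a] + squares[b];
            long long c = std::llround(std::sqrt((double)c2));  // nearest integer root
            if (c * c == c2)  // exact integer check; no stray ';' after the condition
                ++count;
        }
    }
    return count;
}
```

`CountTriples(5000)` covers the same 1..5000 range as the loops above; for a small limit such as 20 it finds the seven pairs (3,4), (6,8), (5,12), (9,12), (8,15), (12,16) and (15,20).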
