How much electricity would be saved worldwide if Windows was written in Assembly?

Status
Not open for further replies.

Cogman

Lifer
Sep 19, 2000
10,286
147
106
I was actually aware of that. I realized that when I increased the number of runs, the number of solutions also increased, so I normalized everything to the original 10 runs. Btw, all times reported are also per 10 runs, not per 1 run :)

As for the MT version, I actually planned to do that, perhaps you can try with Scali's suggestion, I won't be able to until later today.
I was going to divide the work in a slightly better way, though. You split it by n, e.g. for 2 cores, it'd be 1-25 and 26-50. The thing is that the inner m and k loops are executed far more often for smaller n's, so it would be better to have something like 1-15/16-50 (and say 7, 10, 16, 17 for 4 cores), as the first thread will have a lot more to do than the last. It would need some hardcoding and perhaps some experimenting, but the overhead makes it quite unlikely that much can be gained, unless Interlocked add helps.
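
A minimal sketch of the uneven split described above, assuming a two-level loop like the one posted later in this thread, where row i costs roughly (n - i) inner iterations. `balancedBounds` is a hypothetical helper, not code from any poster, and a three-level loop would need a different cost model.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: split i in [1, n] into `parts` half-open ranges
// [bounds[k], bounds[k + 1]) that each cover roughly the same number of
// inner-loop iterations. Assumed cost model: row i costs (n - i) iterations.
std::vector<int> balancedBounds(int n, int parts)
{
    assert(n >= 1 && parts >= 1);
    // Total work: sum of (n - i) for i = 1..n
    const std::int64_t total = static_cast<std::int64_t>(n) * (n - 1) / 2;

    std::vector<int> bounds;
    bounds.push_back(1);

    std::int64_t done = 0; // work covered by rows 1..i
    int next = 1;          // index of the next cut to place
    for (int i = 1; i <= n && next < parts; ++i)
    {
        done += n - i;
        // Place a cut once the covered work reaches next/parts of the total.
        // A single heavy row may satisfy several cuts (yielding empty ranges).
        while (next < parts && done * parts >= total * next)
        {
            bounds.push_back(i + 1);
            ++next;
        }
    }
    bounds.push_back(n + 1); // the last range always runs to the end
    return bounds;
}
```

Under that assumed cost model, `balancedBounds(50, 2)` yields the boundaries 1, 16, 51, i.e. the ranges 1-15 and 16-50. Whether the better balance outweighs thread overhead would still have to be measured.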

Meh, the critical section is only used once per thread, and critical sections are really only slow if there is a collision (really not all that likely). I did an interlocked add just to make sure and got similar results (Ok, it was with wonky gcc asm as mingw doesn't support the InterlockedAdd function).

Since the speeds are ~2x that of the single-threaded version, I'm going to say that thread overhead is the big killer here. Threading will probably only benefit this if we are finding more solutions, or using a crappier algorithm. Perhaps if I'm bored, I'll thread my original solution for the heck of it.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Meh, the critical section is only used once per thread, and critical sections are really only slow if there is a collision (really not all that likely). I did an interlocked add just to make sure and got similar results (Ok, it was with wonky gcc asm as mingw doesn't support the InterlockedAdd function).

gcc has an equivalent, __sync_add_and_fetch();
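
A minimal sketch of that builtin in use, assuming GCC or Clang. Each worker keeps a local tally and publishes it once with `__sync_add_and_fetch`, mirroring the once-per-thread critical section discussed above. `countRange`, `countPairsThreaded`, and the use of `std::thread` instead of the Win32 thread API are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <thread>
#include <vector>

// Hypothetical worker: for i in [begin, end), counts pairs i < j <= limit
// where i*i + j*j is a perfect square, then publishes the tally once.
static void countRange(int begin, int end, int limit, int* total)
{
    int local = 0;
    for (int i = begin; i < end; ++i)
    {
        for (int j = i + 1; j <= limit; ++j)
        {
            int c = i * i + j * j;
            int d = static_cast<int>(std::sqrt(static_cast<double>(c)));
            if (d * d == c)
                ++local;
        }
    }
    // GCC/Clang builtin: atomic add with a full memory barrier
    __sync_add_and_fetch(total, local);
}

// Hypothetical driver: splits [1, limit] into equal-width chunks.
int countPairsThreaded(int limit, int numThreads)
{
    assert(limit >= 1 && numThreads >= 1);
    int total = 0;
    const int chunk = (limit + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t)
    {
        int begin = 1 + t * chunk;
        int end = std::min(begin + chunk, limit + 1);
        if (begin >= end)
            break; // more threads than rows: nothing left to hand out
        workers.emplace_back(countRange, begin, end, limit, &total);
    }
    for (auto& w : workers)
        w.join(); // join makes the final value of total visible here
    return total;
}
```

Because each thread touches the shared counter only once, the choice between a critical section and an atomic add is unlikely to matter much here, which is consistent with the similar timings reported above.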
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Thanks Markbnj,

Well, I actually didn't mind his remarks. I saw it more as provocation than insult, and it takes much more than that to provoke me, but you are right. Rules are rules.

If you really want to thank me, then don't ever actually create a thread this dumb in the Programming forum. I have a family. Think of the children. As long as it stays on some sort of optimized assembler discussion, which is where it seems to have landed, I'll leave it. But my finger has been hovering over the lock button since Virge copied it in.
 

Any_Name_Does

Member
Jul 13, 2010
143
0
0
If you really want to thank me, then don't ever actually create a thread this dumb in the Programming forum. I have a family. Think of the children. As long as it stays on some sort of optimized assembler discussion, which is where it seems to have landed, I'll leave it. But my finger has been hovering over the lock button since Virge copied it in.

Do you mean troll invasion?
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Do you mean troll invasion?

Come on, stop pushing it.
Apparently your opinion was an unpopular one, and met with a lot of opposition.
You could have known beforehand.

And even if you didn't... by now you should have realized that these 'trolls' were telling the truth, as crude as they may have been in delivering the message.
After all, plenty of code was produced to demonstrate that pretty much everything that was claimed before is actually true (e.g., that choosing the right algorithm is more important than the language, or that a modern compiler will easily outperform the average programmer trying to write some assembly).
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Do you mean troll invasion?

There has no doubt been trolling on both sides. In this forum we talk about programming, and topics closely related to programming, and in general try to do so as adults would if they were in each other's physical presence. This thread violates pretty much all of those standards in one aspect or another.
 

Any_Name_Does

Member
Jul 13, 2010
143
0
0
Come on, stop pushing it.
Apparently your opinion was an unpopular one, and met with a lot of opposition.
You could have known beforehand.

And even if you didn't... by now you should have realized that these 'trolls' were telling the truth, as crude as they may have been in delivering the message.
After all, plenty of code was produced to demonstrate that pretty much everything that was claimed before is actually true (e.g., that choosing the right algorithm is more important than the language, or that a modern compiler will easily outperform the average programmer trying to write some assembly).

You don't need to lock truth tellers. Schmidi told a truth and it was fine. Some other guys optimized on his truth and it got better.
I already explained why I opened this thread. Here is the summary.
Ladies and gentlemen and dear trolls,
I just got fed up with assembly and wanted some proof that high-level languages hold their own, so I would be motivated to start learning one. And the proof was delivered. Go thank the moderator for locking you out, otherwise I would be doing things with you that you wouldn't want.

Why not lock the thread?
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Why don't we leave the thread open, as various people are apparently still playing with the different algorithms.
 

BoberFett

Lifer
Oct 9, 1999
37,562
9
81
You don't need to lock truth tellers. Schmidi told a truth and it was fine. Some other guys optimized on his truth and it got better.
I already explained why I opened this thread. Here is the summary.
Ladies and gentlemen and dear trolls,
I just got fed up with assembly and wanted some proof that high-level languages hold their own, so I would be motivated to start learning one. And the proof was delivered. Go thank the moderator for locking you out, otherwise I would be doing things with you that you wouldn't want.

Why not lock the thread?

Things we wouldn't want? Like what, more crappy code from a terrible programmer?

I can understand your moderation Mark, but I'd say my insults for this troll were fairly well deserved.
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
Speaking of which....

Code:
#include <windows.h>
#include <cmath>
#include <iostream>
using namespace std;

struct cogmanTData
{
    int start;
    int end;
    int* mulTable;
    int* sol;
    CRITICAL_SECTION* CS;
};

DWORD WINAPI CogmanSSThread(void* data)
{
    cogmanTData* tData = (cogmanTData*)data;
    int* mulTable = tData->mulTable;
    int solutions = 0;
    for(int i=tData->start; i<tData->end; ++i)
    {
        for(int j=i+1; j<5001; ++j)
        {
            int c;
            int d;
            c = mulTable[i] + mulTable[j];
            d=sqrt(c);
            if (mulTable[d] == c)
            {
                ++solutions; // without this the compiler would eliminate the search as dead code
#ifdef _OUTPUTENABLE
                cout<<i;
                cout<<' ';
                cout<<j;
                cout<<' ';
                cout<<d;
                cout<<endl;
#endif
            }
        }
    }
    EnterCriticalSection(tData->CS);
    *tData->sol += solutions;
    LeaveCriticalSection(tData->CS);
    return 0;
}

bool CogmanSimpleSolutionThreaded()
{
    const int NUM_RUNS = 100;
    LARGE_INTEGER start, end, freq;
    int solutions = 0;
    QueryPerformanceCounter(&start);

    int mulTable[7074];
    mulTable[2] = 7;

    for(int iRuncount=0; iRuncount<NUM_RUNS; iRuncount++)
    {
        // Generate the multiplication table
        int start = 1;
        if (mulTable[2] != 4)
        {
            for (int i = 1; i < 5001; ++i)
            {
                mulTable[i] = i * i;
                int c;
                int d;
                c = 1 + mulTable[i];
                d = sqrt(c);
                if (d * d == c)
                {
                    ++solutions;
                }
            }
            start = 2;
            for (int i = 5001; i < 7074; ++i)
            {
                mulTable[i] = i * i;
            }
        }
        #define CNUMTHREADS 32
        CRITICAL_SECTION CS;
        InitializeCriticalSection(&CS);
        HANDLE threads[CNUMTHREADS];
        cogmanTData tData[CNUMTHREADS];
        int stopPos = 1;
        for (int j = 0; j < CNUMTHREADS; ++j)
        {
            tData[j].start = stopPos;
            stopPos += 5001 / CNUMTHREADS;
            if (j == CNUMTHREADS - 1) // Last thread takes the remainder, so the end is never missed
                stopPos = 5001;
            tData[j].end = stopPos;
            tData[j].sol = &solutions;
            tData[j].mulTable = mulTable;
            tData[j].CS = &CS;
            threads[j] = CreateThread(NULL, 0, CogmanSSThread, (void*)&tData[j], 0, NULL);
        }
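        WaitForMultipleObjects(CNUMTHREADS, threads, TRUE, INFINITE);
        // Release this run's thread handles and the critical section
        for (int j = 0; j < CNUMTHREADS; ++j)
            CloseHandle(threads[j]);
        DeleteCriticalSection(&CS);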
    }
    QueryPerformanceCounter(&end);
    QueryPerformanceFrequency(&freq);
    double dEnd=(double) end.QuadPart / (double)freq.QuadPart;
    double dStart=(double) start.QuadPart / (double)freq.QuadPart;
    double dTotalTick=((dEnd-dStart)/(double) NUM_RUNS);
    cout<<"CogmansSimpleSolT ";
    cout<<dTotalTick;
    cout<<" seconds, solutions found: "<< solutions/NUM_RUNS << endl;
    return solutions;
}

This shows a CLEAR speedup from using more than one thread, about 4x vs. the regular code. (I took the 10x multiplier out, so multiply by ten to see the improvement...)
 