Originally posted by: goku
If more programs were optimized and possibly written in assembly language, wouldn't games like Far Cry run on a PIII 600 MHz system? What makes C++ less efficient than assembly if it all gets converted into machine code?
Ah, my favorite topic
There are many things that prevent a compiler from taking C/C++ code and turning it into the most efficient set of asm instructions. But for the most part, it's not just that the translation isn't (and can't be) perfect; it's also that asm written for one machine can perform very differently on another, even if both are the same platform. Just one thing like the cache size can have a dramatic effect on how fast a program executes.
Last semester I actually took a course at my university dedicated to teaching assembly specifically for the purpose of optimizing your C/C++ programs. In that course, any time we had to optimize code, we were told the exact specs of the hardware it would be tested on. Also, after compiling the program, we would often disassemble it and then modify the actual assembly code by hand to take care of optimizations that a compiler just can't make.
One of the key things to remember about compilers is that while optimizations are allowed, they can't in any way alter the result (the output of the program). As a human, you can look at a piece of code and see that if you move some things around, the program will run faster. For a compiler, however, if moving things around makes one program run correctly and faster while in another it alters the outcome, that optimization isn't allowed.
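Here's a minimal sketch of that constraint (the function is made up for illustration). A human can see that uppercasing a string never changes its length, so the strlen() call could be hoisted out of the loop; the compiler generally can't prove that, because the loop writes through the very pointer strlen() reads:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

void shout(char *s) {
    /* strlen(s) is re-evaluated on every pass: the loop body writes
       through s, so the compiler must assume the length could change.
       Hoisting the call out would risk altering the program's output. */
    for (size_t i = 0; i < strlen(s); i++)
        s[i] = (char) toupper((unsigned char) s[i]);
}

int main(void) {
    char msg[] = "optimize me";
    shout(msg);
    printf("%s\n", msg);
    return 0;
}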
Here are a few things to keep in mind:
Greater efficiency usually makes code more difficult to understand and maintain. High-level languages were created to make it easier to program machines, so there will always be a price to pay for that convenience. There are also various other reasons why C/C++ code is difficult for a compiler to optimize.
Take this for example:
int x = 0;
int y = 0;
for (int i = 0; i < 10000; i++) {
    x++;
    y++;
}
This code will run at a certain speed, but if you do the following, it will run quite a bit faster:
int x = 0;
int y = 0;
for (int i = 0; i < 10000; i += 4) {
    x++;
    y++;
    x++;
    y++;
    x++;
    y++;
    x++;
    y++;
}
And the code above will also run faster than this code:
int x = 0;
int y = 0;
for (int i = 0; i < 10000; i += 4) {
    x++;
    x++;
    x++;
    x++;
    y++;
    y++;
    y++;
    y++;
}
Each one does the same thing. In the second example, I used what is called "loop unrolling" to minimize the number of times the i < 10000 comparison is made. It also eliminates three out of every four jumps, which can stall the CPU pipeline (this is called a control hazard: the pipeline doesn't know whether the jump will be taken until it's resolved). The third example looks just like the second, except I first increment all the x's and then all the y's. So why would it run slower? Once again, because of the pipeline. In the second case, the CPU can begin executing a y++ even before the x++ before it has finished. In the third case, because all the x++'s are together, each one has to wait for the previous one to finish so it can get the new value of x (this is called a data hazard).
While it seems like things like this should be optimizable by the compiler, I could just as easily come up with a situation in which reordering would drastically change the outcome of the program (especially once pointers are involved), and so the compiler can't do it. When it comes to pipelining, it is often very difficult to predict exactly how the compiler will schedule instructions when writing in C/C++, especially with compiler optimizations turned on. Usually compiler optimizations help, but there were many times in my own experience when code I had optimized by hand actually ran much faster with the compiler's own optimizations turned off. Sometimes what looks like a good idea in C/C++ turns out pretty badly once it's converted into asm (data hazards, control hazards, etc.).
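Here's a minimal sketch of the pointer problem (the function is made up for illustration). The two increments look just as independent as the interleaved x++/y++ above, but the compiler can't schedule them that way, because both pointers might refer to the same variable:

#include <stdio.h>

void bump_both(int *a, int *b) {
    /* These two statements can't safely be reordered or overlapped:
       if a and b point to the same int, the second increment depends
       on the result of the first. */
    *a += 1;
    *b += 1;
}

int main(void) {
    int x = 5;
    bump_both(&x, &x);   /* a and b alias - both point at x */
    printf("%d\n", x);   /* prints 7; loading both values up front and
                            then storing both would have produced 6 */
    return 0;
}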
There's also the cache, which is pretty much transparent in both C/C++ and asm, but consider the following:
You have a matrix in which you have to process each element. The most obvious solution is a for loop inside another for loop, but do you process the elements row by row, or column by column? Same question, but now imagine you're coding in Fortran.
In C/C++, the row-by-row approach is MUCH faster than column by column. Also, if you process two rows at a time and increment the outer loop by 2, it runs faster still. This is because C/C++ stores matrices in row-major order. Any time you access a memory location, a block of memory around it (a cache line, typically 32-128 bytes) is transferred to the cache. If you then process the elements next to the first one, no further trip to memory is needed: the data is already cached and can be accessed in just a few clock cycles. But if you walk down a single column first, then each access has to grab data from memory and pull it into the cache, overwriting what was in there before.
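Here's a minimal sketch of the two traversal orders (the matrix size and names are made up for illustration):

#include <stdio.h>

#define N 1024

static double m[N][N];   /* C stores this row-major: m[row][col] */

/* Row by row: consecutive iterations touch adjacent addresses, so almost
   every access hits data that is already sitting in the cache. */
double sum_by_rows(void) {
    double sum = 0.0;
    for (int row = 0; row < N; row++)
        for (int col = 0; col < N; col++)
            sum += m[row][col];
    return sum;
}

/* Column by column: consecutive iterations jump N * sizeof(double) bytes,
   so each access can land outside the cached block and force another trip
   to memory. */
double sum_by_cols(void) {
    double sum = 0.0;
    for (int col = 0; col < N; col++)
        for (int row = 0; row < N; row++)
            sum += m[row][col];
    return sum;
}

int main(void) {
    printf("%f %f\n", sum_by_rows(), sum_by_cols());
    return 0;
}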
In Fortran, however, it's the complete opposite. Because matrices there are stored in column-major order, the row-by-row approach would be a lot slower. Now tell me where in the language itself you would have been able to figure that out (without doing a bit of tinkering).
For all of the reasons above (and many more), assembly will always be the faster approach, because it brings you closer to the hardware. As with the cache example, in assembly there's no question about the most optimal way of doing things, because you're the one who decides how the matrix is stored. A compiler has a predetermined way of doing things, and if the programmer isn't aware of such little details and goes the other route (which is perfectly legal, and logically shouldn't have any effect), the compiled program often ends up with problems that can only be resolved by hand.
Hope that helps
