Compiler optimized for distributed applications?

kevinthenerd

Platinum Member
Jun 27, 2002
2,908
0
76
Are there compilers out there that will intelligently optimize code for multiple processors? You might say that code HAS to be written for multiple processors, but a lot of code that could be written that way isn't.


Consider a simple little app like this, written in crappy pseudocode. I put line numbers so you can see the steps involved.

1. Initialize a variable "a"
2. Initialize a variable "b"
3. Initialize a variable "c"
4. Initialize a variable "d"
5. Store 10 into "a"
6. Store 15 into "b"
7. Store 20 into "c"
8. Store 25 into "d"
9. Add "a" and "b" and store it to "c"
10. Multiply "d" with "c" and store it to "a"

Now, what if a compiler optimized this code for multiple processors? You see, the problem with multithreading is dependency on previous operations. What if the compiler skimmed through the code and realized that, for example, variable "a" isn't used in lines 2, 3, 4, 6, 7, or 8? Code optimized for two processors could look something like this:

1. Initialize "a" and "b"
2. Initialize "c" and "d"
3. Store 10 into "a" and 15 into "b"
4. Add "a" and "b" and store it into "c"
5. Multiply "d" with "c" and store it into "a"

The whole idea is based on data dependencies between previous and future operations.

Let's take a more extreme example:

1. Initialize "a"
2. Initialize "b"
3. Initialize "c"
4. Initialize "d"
5. Initialize "e"
6. Initialize "f"
7. Initialize "g"
8. Initialize "h"
9. Store 1.414 into "a"
10. Store 1.732 into "b"
11. Store 2.236 into "c"
12. Store 2.646 into "d"
13. Store 3.317 into "e"
14. Store 3.606 into "f"
15. Store 4.123 into "g"
16. Store 4.359 into "h"
17. Add "e" and "f" and store it to "g"
18. Add "a" and "b" and store it to "c"
19. Multiply "d" and "c" and store it to "a"
20. Multiply "h" and "g" and store it to "e"


This can be rewritten for two threads:

1. Initialize "a" ; Initialize "b"
2. Initialize "c" ; Initialize "d"
3. Initialize "e" ; Initialize "f"
4. Initialize "g" ; Initialize "h"
5. Store 1.414 into "a" ; Store 1.732 into "b"
6. Store 2.236 into "c" ; Store 2.646 into "d"
7. Store 3.317 into "e" ; Store 3.606 into "f"
8. Store 4.123 into "g" ; Store 4.359 into "h"
9. Add "e" and "f" and store it to "g" ; Add "a" and "b" and store it to "c"
10. Multiply "d" and "c" and store it to "a" ; Multiply "h" and "g" and store it to "e"


What do you think?
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
42,589
5
0
One would have to develop a smart compiler.

Also, such a compiler might then destroy code logic that the developer designed deliberately.

Anything within a single source file might be optimizable; however, very few applications are a single source file anymore.
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
So you've streamlined memory allocation. Big deal, that's not going to save you a ton of time. Memory allocation is not something that tends to be particularly processor intensive.

Write your code to do something that actually requires some processing power. Try to think of a problem that fits that criterion but DOESN'T use any previously stored data.

Good luck finding a useful application for that compiler.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
gcc has a hard enough time keeping keywords like inline consistent across versions...
 

kevinthenerd

Platinum Member
Jun 27, 2002
2,908
0
76
Originally posted by: notfred
So you've streamlined memory allocation. Big deal, that's not going to save you a ton of time. Memory allocation is not something that tends to be particularly processor intensive.

Write your code to do something that actually requires some processing power. Try to think of a problem that fits that criterion but DOESN'T use any previously stored data.

Good luck finding a useful application for that compiler.

1. Dim "a"
2. Dim "b"
3. Dim "c"
4. Dim "d"
5. Let "b" = 3.142
6. Let "d" = 2.718
5. "a" = Function_that_takes_a_long_time(b)
6. "c" = Function_that_takes_a_long_time(d)
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: kevinthenerd
Originally posted by: notfred
So you've streamlined memory allocation. Big deal, that's not going to save you a ton of time. Memory allocation is not something that tends to be particularly processor intensive.

Write your code to do something that actually requires some processing power. Try to think of a problem that fits that criterion but DOESN'T use any previously stored data.

Good luck finding a useful application for that compiler.

1. Dim "a"
2. Dim "b"
3. Dim "c"
4. Dim "d"
5. Let "b" = 3.142
6. Let "d" = 2.718
5. "a" = Function_that_takes_a_long_time(b)
6. "c" = Function_that_takes_a_long_time(d)

One major problem here is that it's actually extremely hard for the compiler to figure out that Function_that_takes_a_long_time can run in another thread the way you want it to. In languages that people actually use (e.g. C and C++) it can be practically impossible to determine that. Two big issues are pointer aliasing (the compiler doesn't know that two pointers are guaranteed to point to different locations) and side effects. If Function_that_takes_a_long_time has any side effects at all (e.g. modifying any global/shared variables, or making system calls that have effects like displaying things or writing to files), you can't just run it multiple times in parallel.
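
To illustrate the aliasing point with a contrived example of my own (expensive_update is hypothetical):

#include <cstdio>

// The compiler sees only pointers; it must assume *p and *q may refer
// to the same memory, so two calls like the ones below can't be proven
// independent and can't safely be run in parallel.
void expensive_update(double* p, const double* q) {
    *p += *q * 2.0;
}

int main() {
    double x = 1.0;
    expensive_update(&x, &x);  // here p and q really do alias
    expensive_update(&x, &x);  // overlapping these two calls would race
    std::printf("x = %f\n", x);
    return 0;
}

C99's restrict (and compiler extensions like __restrict) exist precisely because of this: the programmer promises no aliasing, since the compiler usually can't prove it.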

In your initial example, spawning the two threads would have cost more than just executing everything on one processor. In your later example, it's simple enough for the programmer to parallelize by hand. Real-world examples are much more complicated.

Consider a super-simple hypothetical game engine, with a couple threads:
1. AI
2. Physics

Assume your AI for each character is "look around, pick up food, shoot other people", and physics handles stuff like people stepping on things that can move, dropping stuff, arrows flying, etc. You want the world to update continuously, so you have something like this:

do forever {
AI();
Physics();
}

Running AI() and Physics() in separate threads isn't going to be something the compiler can do! Why? Well, they both need to look at the world state. The AI engine needs to see where everything is, and potentially move stuff around. The physics engine also needs to look at all the objects in the world and move them around. A really smart compiler that understands parallelism will see that the two functions, which at first glance seem independent, are not. It's also not entirely trivial for a human to parallelize this well, since you'll need to protect the shared data structures (i.e. the state of the world and things in it) with locks to avoid problems; a minimal locking sketch follows the sequence below. Consider this sequence of events:

1. AI sees an apple on somebody's head; the physics engine begins computing the interaction between the apple and an arrow flying toward it
2. AI eats the apple; the physics engine sees that the arrow hit the apple and decides the apple should get knocked off the person's head
3. AI removes the apple from the world; the physics engine attempts to move the apple based on its calculation results
4. Game crashes (or some other bad thing happens)
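
Here's that minimal locking sketch, assuming one coarse lock over the whole world (World, AI, and Physics are stand-ins, not real engine code):

#include <mutex>
#include <thread>

struct World { int tick = 0; /* objects, positions, arrows, apples... */ };

std::mutex world_mutex;  // one coarse lock guarding all shared state
World world;

// Stand-ins for the real engines; both read and modify the world.
void AI(World& w)      { ++w.tick; /* look around, eat apples, shoot */ }
void Physics(World& w) { ++w.tick; /* fly arrows, knock apples off heads */ }

int main() {
    auto run = [](void (*step)(World&)) {
        for (int i = 0; i < 1000; ++i) {
            // Neither engine can see the world mid-update by the other.
            std::lock_guard<std::mutex> lock(world_mutex);
            step(world);
        }
    };
    std::thread ai(run, AI);
    std::thread physics(run, Physics);
    ai.join();
    physics.join();
    return 0;
}

Note the catch: a single coarse lock restores correctness by serializing the two engines, so most of the parallelism is gone; finer-grained locks win it back, and that's exactly where the complexity explodes.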

In real programs, you have many complex interactions, so it's hard for the programmer to make sure all data structures are well-protected, and it's hard for the compiler to see when two operations are guaranteed under all circumstances to be independent.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Well, first you need to distinguish between parallel applications and distributed applications. When you say "distributed application," that implies (to me at least) separate computers working in concert on a single problem. This could be loosely coupled, like SETI, or tightly coupled, as in a program run on a dedicated Beowulf-style cluster.

What you seem to be describing is not a distributed program, but a multi-threaded program on a shared-memory system. In which case there are some tools and compilers that offer some level of automatic parallelization, though often at a very low level. One example is automatic vectorization using SIMD processor features. At a higher level you have things like the auto-parallelizing compilers from the Portland Group (PGI). Take a look at the OpenMP project as well; it provides additional syntax that lets the programmer tell the compiler what can be parallelized and what can't.
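
To give a flavor of OpenMP (an illustrative sketch; the arrays and loop are mine, and with gcc you'd compile with -fopenmp): the pragma is the programmer asserting that the iterations are independent, which is precisely the fact the compiler usually can't prove on its own.

#include <cstdio>

int main() {
    const int n = 1000000;
    static double a[1000000], b[1000000];

    // The directive asserts the iterations are independent; the runtime
    // splits them across however many cores are available.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] * 2.0 + 1.0;
    }

    std::printf("a[0] = %f\n", a[0]);
    return 0;
}

If the iterations actually shared state or called functions with side effects, the directive would be a lie and you'd get races, which is why these hints have to come from the programmer.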

The PGI compilers offer some automatic parallelization for distributed programs as well.

My guess is that what can be auto-parallelized will be pretty low-level, simple stuff, and it will probably also need to be profile-guided for the compiler to have any hope of figuring out what is worth parallelizing.
 

notfred

Lifer
Feb 12, 2001
38,241
4
0
Originally posted by: kevinthenerd
Originally posted by: notfred
So you've streamlined memory allocation. Big deal, that's not going to save you a ton of time. Memory allocation is not something that tends to be particularly processor intensive.

Write your code to do something that actually requires some processing power. Try to think of a problem that fits that criterion but DOESN'T use any previously stored data.

Good luck finding a useful application for that compiler.

1. Dim "a"
2. Dim "b"
3. Dim "c"
4. Dim "d"
5. Let "b" = 3.142
6. Let "d" = 2.718
5. "a" = Function_that_takes_a_long_time(b)
6. "c" = Function_that_takes_a_long_time(d)

Like CTho9305 said, that's absolutely trivial to do by hand, and it doesn't work if b or d are reference variables. Aside from a prime number tester or something, what kind of program is going to need to do that?
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Here's something else that may be of interest:

http://www.research.ibm.com/journal/sj/451/eichenberger.html
http://arstechnica.com/news.ars/post/20060225-6265.html

An interesting quote:
The compiler provides user-guided parallelization and compiler management of the underlying memories for code and data. When the user directives are applied in a thoughtful manner by a competent user, the compiler provides significant ease of use without significantly compromising performance.
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Originally posted by: Armitage
Here's something else that may be of interest:

http://www.research.ibm.com/journal/sj/451/eichenberger.html
http://arstechnica.com/news.ars/post/20060225-6265.html

An interesting quote:
The compiler provides user-guided parallelization and compiler management of the underlying memories for code and data. When the user directives are applied in a thoughtful manner by a competent user, the compiler provides significant ease of use without significantly compromising performance.
Now, when can we get that technology ported to embedded systems and x86-based systems? :wine:

 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: EagleKeeper
Originally posted by: Armitage
Here's something else that may be of interest:

http://www.research.ibm.com/journal/sj/451/eichenberger.html
http://arstechnica.com/news.ars/post/20060225-6265.html

An interesting quote:
The compiler provides user-guided parallelization and compiler management of the underlying memories for code and data. When the user directives are applied in a thoughtful manner by a competent user, the compiler provides significant ease of use without significantly compromising performance.
Now, when can we get that technology ported to embedded systems and x86-based systems? :wine:

I've heard good things about the Portland Group tools combined with OpenMP, as I mentioned above, but I haven't used them myself.