Functional Programming

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
Slashdot linked a 2-page piece by Michael Swaine this morning on Dr. Dobb's portal site, asking the following question: with recent statements out of Intel to the effect that "Moore's Law is now the responsibility of software" (i.e. we can't make cores faster, so we're putting more of them on a die), is it time to get good at functional programming?

What do you guys think? FP is supposed to be a major-league paradigm shift for people like me steeped in years of imperative languages. Is it time to start looking at FP? If so, what's the best place to start? F#? Scala? Or something pure like Haskell or Erlang?
 

presidentender

Golden Member
Jan 23, 2008
1,167
0
76
I agree that it may be time to start looking at functional programming, especially given the support that Microsoft seems to be giving F#. As a .NET developer, that's where I'd start.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
IMHO, not for the 99% of application development for client software that is almost entirely single-threaded and procedural, maybe using a worker thread now and then (that doesn't use parallelism either) to maintain a responsive UI during long or blocking operations.

As Swaine says:
FP won't make your word processing program faster or better.

An order-entry form does not benefit from parallel programming, and the back-end will likely be parallelized by the database engine developer, not you.

I had fun with Lisp and Prolog in college, but I've yet to encounter a real-world application where it made sense to use them. Even the simple rule-based AI we wrote at my last job was done in C++.

Is it worth playing with to stretch your mind a bit? Sure.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,043
3,529
65
That article started me looking for ways to use Python's FP functions in parallel. I didn't exactly find that, but I did find something called Parallel Python that looks very interesting.

Basically, if you have any process that can be contained within a function (i.e. doesn't use global variables), you can make it a parallel job. I haven't tried it yet, but it seems a lot less harsh than full functional programming, as most parts are still imperative.
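
Here's roughly what that looks like, going from memory of the pp docs -- treat it as an untested sketch, and the partial_sum payload is just something I made up:

import math
import pp

def partial_sum(lo, hi):
    # Pure worker: everything it needs arrives as arguments, no globals.
    return sum(math.sqrt(i) for i in range(lo, hi))

job_server = pp.Server()  # autodetects the number of cores
jobs = [job_server.submit(partial_sum,                      # function to run
                          (i * 250000, (i + 1) * 250000),   # its arguments
                          (),                               # helper functions it calls
                          ("math",))                        # modules it needs
        for i in range(4)]
print(sum(job() for job in jobs))  # calling job() blocks until that job finishes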

Here's some 1.5-year-old praise from Bruce Eckel.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
Yeah I agree with that, Dave. What makes that observation interesting is that when you do find a payload that will benefit from parallelism, unless it is some piece of pure analysis code, it's likely to be embedded in an otherwise fairly standard application framework. Meaning it will need all the usual UI, I/O, event handling, etc. So do you integrate FP with traditional languages for the parts that can benefit from divide-and-conquer strategies? Or do you turn to libraries of functions that perform some of the usual services, i.e. you evaluate a function that opens a window, and it in turn evaluates other functions that perform the stuff the window needs to do? What is an event in a functional programming language?
 

Cogman

Lifer
Sep 19, 2000
10,277
125
106
IDK, from what I have read about FP, it doesn't seem very practical for much more than scientific analysis. Multicores are useful, but I really don't see them going much further past 16 or 32 processors (if they even hit that).

FP just doesn't seem to cut it. It's just too hard to program with. A new paradigm that takes advantage of multithreading might be needed, however, FP isn't that paradigm.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Originally posted by: Markbnj
So do you integrate FP with traditional languages for the parts that can benefit from divide-and-conquer strategies? Or do you turn to libraries of functions that perform some of the usual services, i.e. you evaluate a function that opens a window, and it in turn evaluates other functions that perform the stuff the window needs to do? What is an event in a functional programming language?

Good questions. I can see hybrid applications being more practical in .NET, since under the hood they all compile down together. Maybe something like game programming, where you have much of the app in your "normal" language and then switch to GPU shader programming in the graphics engine code. Or so I gather; the last time I did game programming was in 6502 assembly on a C=64 :)

In other words, treat the AI or natural language or math or whatever part of the program as being handled by an FP-based "processor" that you set up an initial state for and call out to from your C++ or C# program. That part might be written by an FP specialist who doesn't work on the UI or main program logic.
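
In Python-flavored pseudocode (purely illustrative -- the names and the toy "AI" are invented), the shape I'm picturing is something like:

# The "FP processor": pure functions only. State goes in, new state comes
# out; nothing in here touches the screen, the disk, or a global.
def step_ai(world, dt):
    return {"npcs": [move(npc, dt) for npc in world["npcs"]]}

def move(npc, dt):
    x, y = npc["pos"]
    vx, vy = npc["vel"]
    return dict(npc, pos=(x + vx * dt, y + vy * dt))  # new npc, old one untouched

# The imperative shell: owns the loop, the clock, and the UI.
world = {"npcs": [{"pos": (0.0, 0.0), "vel": (1.0, 2.0)}]}
for frame in range(3):
    world = step_ai(world, dt=0.016)   # call out to the "processor"
    print(world["npcs"][0]["pos"])     # side effects stay out here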

I don't see this being needed for many applications, but it might be an interesting area to specialize in for someone who is bored with normal programming.

i.e. you evaluate a function that opens a window, and it in turn evaluates other functions that perform the stuff the window needs to do?
That feels like the wrong approach to me. I can see it existing in the FP language for applications that need just a little UI and I/O, batch FP-type programs where you want to be able to write results or put up an error alert, but for more than that I'd guess it would turn into a torturous exercise in coding to convert what are really procedures into math-style (not code) functions.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
That feels like the wrong approach to me. I can see it existing in the FP language for applications that need just a little UI and I/O, batch FP-type programs where you want to be able to write results or put up an error alert, but for more than that I'd guess it would turn into a torturous exercise in coding to convert what are really procedures into math-style (not code) functions.

Exactly, although I wonder if not seeing this means I am missing something. I've only just started to look into it. It's hard to get around the differences between function evaluation and an event-driven message-processing model. It almost seems like you would be forced back into polling the UI within a function you are evaluating.

I like the black-box processor approach. Seems intuitively right, and the game programming analogy is a good one.
 

nordloewelabs

Senior member
Mar 18, 2005
542
0
0
it's probably easier to adapt the VMs (of languages that have one) so that they make more efficient use of the cores. making programmers skilled in a different paradigm would probably generate too much "overhead" for the industry. also, trying to make all new programmers skilled in FP could result in too few skilled individuals being graduated/trained....

i hope AMD thinks differently from Intel and eventually pulls another "Athlon" out of their hat to give Intel some extra incentive.... they might be getting lazy again.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
The underlying hardware is imperative von Neumann -- I think from a performance perspective that adapting FP models to VN HW is going to start off at a huge disadvantage. What we need to do, barring some major breakthrough in PL expressions of parallelism, is teach programmers how to use what they have: threads.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
Originally posted by: degibson
The underlying hardware is imperative von Neumann -- I think from a performance perspective that adapting FP models to VN HW is going to start off at a huge disadvantage. What we need to do, barring some major breakthrough in PL expressions of parallelism, is teach programmers how to use what they have: threads.

I was hoping you'd weigh in here, as I know you have some background in this. If I understand it correctly, one of the advantages of FP with regards to parallelism is the encapsulation of state as function parameters on the stack. You mention teaching programmers how to use threads, but it seems to me the real question is how to architect the application environment in which the threads are run. Nevertheless, even from my still mostly-ignorant perspective it seems to me that it would be easier to teach someone to create a thread-safe data structure and use it properly than it would be to teach them FP. I'm at that stage where I am so steeped in the imperative model that I simply have a hard time visualizing the flow of a complex program as a series of declarative function evaluations.
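
To be concrete about what I mean, here's a toy Python version -- with the caveat that real concurrent containers are far subtler than one big lock:

import threading

class SafeCounter:
    # Toy thread-safe structure: every access goes through one lock.
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def add(self, key):
        with self._lock:  # the read-modify-write must be atomic, hence the lock
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)

counter = SafeCounter()

def hammer():
    for _ in range(10000):
        counter.add("hits")

threads = [threading.Thread(target=hammer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get("hits"))  # reliably 40000; without the lock, updates can get lost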
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,043
3,529
65
Originally posted by: Markbnj...it seems to me that it would be easier to teach someone to create a thread-safe data structure and use it properly than it would be to teach them FP.
I personally think almost anything is better than threads. Programming with threads can get you "tangled up" in issues like deadlock or race conditions, which often can't be reliably reproduced. FP, the semi-FP model of Parallel Python, and/or pre-made, (supposedly-)thread-safe data structures from libraries are all better than going it alone.

Originally posted by: degibson
The underlying hardware is imperative von Neumann -- I think from a performance perspective that adapting FP models to VN HW is going to start off at a huge disadvantage.
This is possible; but I imagine in most cases it's not as bad as going single-threaded. People program in high-level languages like Python, even though they're slower than assembly, because the ease of programming in Python outweighs the performance disadvantage.

Of course, there are times when you want threads, just like there are times when you want assembly code. But you also want a higher-level language you can use most of the time.

I should add that I don't expect FP to take off for performance-irrelevant code; that also means that imperative languages need the ability to plug in FP, just like plugging in assembly code (but hopefully easier.)
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
I should add that I don't expect FP to take off for performance-irrelevant code; that also means that imperative languages need the ability to plug in FP, just like plugging in assembly code (but hopefully easier.)

Well that part, at least, F# appears to have nailed for the .NET world. You can code up a set of functions and then wrap them in a class type that can be used in any .NET language.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Originally posted by: Ken g6
Originally posted by: Markbnj...it seems to me that it would be easier to teach someone to create a thread-safe data structure and use it properly than it would be to teach them FP.
I personally think almost anything is better than threads. Programming with threads can get you "tangled up" in issues like deadlock or race conditions, which often can't be reliably reproduced. FP, the semi-FP model of Parallel Python, and/or pre-made, (supposedly-)thread-safe data structures from libraries are all better than going it alone.

Originally posted by: degibson
The underlying hardware is imperative von Neumann -- I think from a performance perspective that adapting FP models to VN HW is going to start off at a huge disadvantage.
This is possible; but I imagine in most cases it's not as bad as going single-threaded. People program in high-level languages like Python, even though they're slower than assembly, because the ease of programming in Python outweighs the performance disadvantage.

Of course, there are times when you want threads, just like there are times when you want assembly code. But you also want a higher-level language you can use most of the time.

I should add that I don't expect FP to take off for performance-irrelevant code; that also means that imperative languages need the ability to plug in FP, just like plugging in assembly code (but hopefully easier.)

If you have the right tools and are diligent with your coding you can avoid most of those problems with threading. 95% of my programming now involves multiple threads and I don't even have to worry about deadlocks or race conditions (it's just part of how I code now). Sure, I actively write code to prevent those issues, but with proper software design doing so is trivial :)

 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
LONG POST WARNING.

Originally posted by: Markbnj
If I understand it correctly, one of the advantages of FP with regards to parallelism is the encapsulation of state as function parameters on the stack.
It is true that functional languages tend to have better encapsulation than imperative ones -- mostly because pure functional languages, by definition, don't have mutable state in the first place. There are disciplined approaches to von Neumann computation that have both mutable state and encapsulation.

You mention teaching programmers how to use threads, but it seems to me the real question is how to architect the application environment in which the threads are run.
Naturally, it would be a lot better to just provide a high-performance parallel environment that is easy to program. If somebody figures out exactly how to do that, it isn't just a PhD thesis -- it's a Nobel Prize.

Nevertheless, even from my still mostly-ignorant perspective it seems to me that it would be easier to teach someone to create a thread-safe data structure and use it properly than it would be to teach them FP.

Creating a thread-safe data structure is really very, very hard. I've taught students to do it -- and, in my opinion, only the top ~2.5% truly understand it well enough to go off and make a new thread-safe data structure. I've had a fairly bright body of students.

On the other hand, teaching students to use thread-safe data structures isn't all that hard. Unfortunately, thread safety isn't the really hard part of threading -- it's finding the parallelism in the first place that is hard. Too many programmers have no idea where the performance goes, so they have no idea what to parallelize.

I'm at that stage where I am so steeped in the imperative model that I simply have a hard time visualizing the flow of a complex program as a series of declarative function evaluations.

That's OK -- because on nearly every commercial system ever built, it is not a series of declarative function evaluations. ;) It's an interwoven sequence of discrete instructions, loading from, manipulating, and storing to a discrete, flat memory.

Originally posted by: Ken g6
I personally think almost anything is better than threads. Programming with threads can get you "tangled up" in issues like deadlock or race conditions, which often can't be reliably reproduced. FP, the semi-FP model of Parallel Python, and/or pre-made, (supposedly-)thread-safe data structures from libraries are all better than going it alone.

As I stated above, I agree that thread safety can be provided through third-party libraries. A parallel STL, if you will. But thread safety isn't that hard -- it can be taught, and determinism can be reached once you've been trained to write deterministic, threaded code. The hard part is finding the parallelism in the first place. It's not clear to me that FP does that.

This is possible; but I imagine in most cases it's not as bad as going single-threaded. People program in high-level languages like Python, even though they're slower than assembly, because the ease of programming in Python outweighs the performance disadvantage.
And as long as performance doesn't matter, parallelism won't matter. The only good reason to go parallel is for performance -- if your programming model sucks enough to force multiple threads (I'm looking at you, Windows), you should switch to something else.
 

imported_Dhaval00

Senior member
Jul 23, 2004
573
0
0
As far as I can tell, M$ has already embedded concepts of FP in Workflow Foundation at a higher level. People who have coded a WF project will know that there are 'functions' associated with each code path. All these functions - logical steps - will be executed in parallel, provided the code paths are mutually exclusive. This is not explicit FP, but you have finer control over your class structure with the ease of modularizing the logic. WF is quite new, but once you have consumed it and 'designed' it, you'll see some obvious resemblances to FP.

The same concepts apply to Integration Services (Workflow Foundation evolved from SSIS). Most folks don't think about FP when using WF, SSIS, etc. because of the way FP is taught - I have seen a bunch of white papers, and none of them come back to the technologies discussed above to show how FP is already being consumed - maybe not in its pristine form, but in a somewhat tainted form wherein it is mixed with regular OOP concepts and programming.

Also, someone mentioned not needing 16-32 cores. We already have servers running on specs like six quad cores and 128GB RAM that undergo 50-60% load consistently. With things like DB mirroring, constant backups, state management, and numerous other things, the need for faster systems will be upon us before we even know it.

As a side note, I think FP can be easier to comprehend if it is marketed to us along with tools and languages that we all know.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,043
3,529
65
Originally posted by: degibson
The hard part is finding the parallelism in the first place. Its not clear to me that FP does that.
You may have found the nail in the coffin of FP here. In theory, every execution of every function in a functional program should be its own thread. That's a lot of threads! I imagine the threading overhead is huge on our current 2-4 core processors. I seem to recall one of the Slashdot posters saying that most FP languages don't do threading unless you explicitly enable it. Maybe this is why?
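
A crude way to see that overhead in Python (spawning and joining serially is the worst case, but it makes the point):

import threading
import time

def f(x):
    return x * x

N = 10000

start = time.time()
for i in range(N):
    f(i)                      # ordinary function call
direct = time.time() - start

start = time.time()
for i in range(N):
    t = threading.Thread(target=f, args=(i,))  # one thread per evaluation
    t.start()
    t.join()
threaded = time.time() - start

print("plain calls: %.4fs, thread per call: %.4fs" % (direct, threaded))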

I wonder whether the threading overhead would be such a problem on a hypothetical 128-core processor of the future?
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
I don't think it's going to matter how many cores you have, there will always be way more threads executing than cores.
 

imported_Dhaval00

Senior member
Jul 23, 2004
573
0
0
Originally posted by: Ken g6
Originally posted by: degibson
The hard part is finding the parallelism in the first place. Its not clear to me that FP does that.
You may have found the nail in the coffin of FP here. In theory, every execution of every function in a functional program should be its own thread. That's a lot of threads! I imagine the threading overhead is huge on our current 2-4 core processors. I seem to recall one of the Slashdot posters saying that most FP languages don't do threading unless you explicitly enable it. Maybe this is why?

I wonder whether the threading overhead would be such a problem on a hypothetical 128-core processor of the future?

I am not sure about Java, but .NET uses a ThreadPool that it manages internally. The entire .NET event-delegate concept revolves around the ThreadPool. With every iteration, the logic of how the ThreadPool is managed has only improved.

There was a blog post on MSDN a few months ago regarding F# getting its own ThreadPool. Essentially, when your program initializes, the Framework scans for the number of processors available before it allocates resources to the Pool. Beyond that, all the requests are queued, and threads will service your requests as resources become available.
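
The general pattern is simple enough to sketch in Python -- this shows the mechanism only, not how the .NET or F# pools are actually implemented:

import multiprocessing
import queue
import threading

tasks = queue.Queue()

def worker():
    while True:
        func, args = tasks.get()   # block until a request is queued
        try:
            func(*args)
        finally:
            tasks.task_done()

# Scan for the number of processors, then size the pool to match.
for _ in range(multiprocessing.cpu_count()):
    threading.Thread(target=worker, daemon=True).start()

# Requests queue up and get serviced as workers become available.
for i in range(20):
    tasks.put((print, ("job %d" % i,)))
tasks.join()                       # wait for the queue to drain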
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
And as long as performance doesn't matter, parallelism won't matter. The only good reason to go parallel is for performance -- if your programming model sucks enough to force multiple threads (I'm looking at you, Windows), you should switch to something else.

Various Windows frameworks may use multiple threads in message processing code, or wherever, but I don't know that the platform itself forces you into multiple threads. Most of my Winapps have been single threaded.

Anyway, you got down to the heart of the matter, imo. Where does performance really matter? And where it does matter, how can the workload be partitioned such that threads make a difference?

There is a lot of talk about multicore hardware architectures and how gains in throughput are going to have to come from increased parallelism, but the fact is we have coarse-grained parallelism now, in preemptive multitasking of processes. Every modern OS can take advantage of multiple cores immediately. There are 56 processes and several hundred threads running on my WinXP system as I type this.

Most of those apps are idle, waiting for me to punch a button or move the mouse. In the event-driven paradigm where most things happen in response to user input, performance is not an issue for the vast majority of applications. Most users will perceive performance issues not in terms of how fast an individual app responds to an event, but in the overall speed and efficiency of the OS and desktop environment (driven largely by I/O and graphics constraints).

So where are we really hurting for cycles? Games are almost always hardware bound somewhere: I/O, GPU, or CPU. The opportunities and challenges of parallelism in areas like processing polys and textures or applying behavior algorithms to bot state are obvious.

Few other consumer applications present such obvious opportunities, though, and many of the ones that do (DVD encoding/decoding, for example) can often work with the process model efficiently (i.e. spawn a process to decode a stream and write it to a D3D surface). Non-consumer applications in engineering, financial analysis, modelling, etc., offer a lot of opportunity, but many of these are already highly parallel.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
This is a great discussion. Let's keep it going!

Originally posted by: Ken g6
In theory, every execution of every function in a functional program should be its own thread. That's a lot of threads! I imagine the threading overhead is huge on our current 2-4 core processors. I seem to recall one of the Slashdot posters saying that most FP languages don't do threading unless you explicitly enable it.

Yes, that is a boatload of threads. But the problem isn't finding threads, it's finding parallelism. They aren't the same, because executing every function as a separate thread is enormously wasteful. That's why we have variables -- to cache the results of computations. FP is fine from the perspective of a model, but it breaks down in practice. In short: executing FP functions in parallel is fine, and will speed up nicely, but that's simply because the baseline is bad to begin with.

Originally posted by: Crusty
I don't think it's going to matter how many cores you have, there will always be way more threads executing than cores.

I disagree. Tilera has 80 cores already. I'm only running 42 processes atm, 523 threads in all. As Markbnj has pointed out, most of them are sleeping. Give Moore's law 2-3 generations: hundreds of on-chip threads. Will they actually build it? Probably not. Why? Because nobody knows how to write code for it (except a few very specialized codes).

Originally posted by: Markbnj
Various Windows frameworks may use multiple threads in message processing code, or wherever, but I don't know that the platform itself forces you into multiple threads. Most of my Winapps have been single threaded.
This is a good point: the API was purposely 'threaded', for some definition of 'threaded'. Windows didn't invent the event-driven interface, of course, but they sure made it a fact of life for hundreds of thousands of programmers. Arguably, it's a good thing. Event-driven programming is akin to threading -- it has a lot of the same horrible problems.

And now to what I agree is the heart of the matter:
Where does performance really matter? And where it does matter, how can the workload be partitioned such that threads make a difference?
Exactly. Let us consider three kinds of boundedness when it comes to computation:

*I/O bound computation: Threads are reading files, waiting for network traffic, responding to mouse movements, etc. How can parallelism help here? Actually, it can help a lot, under the right circumstances. Sending many file requests to a network file system can actually get them serviced in parallel, thereby increasing throughput. But there's a limit -- one cannot improve a network link's fundamental bandwidth by beating it with threads. Nor is it useful to process mouse movement at a faster fundamental rate than the rate at which it arrives from the user.
A special case of I/O boundedness is graphics computation. Some may call this computation bounded, but since it happens across the I/O bus, I'm lumping it in with I/O. Explicit threading isn't going to help here -- it's a good thing the GPU guys already have such a good handle on parallelism in their architectures, APIs, and JITs. Parallelism helps a LOT in that kind of extremely regular data-parallel raw computation.

*Memory bound computation: The limiting factor for these computations is either:
1) The round-trip time to memory (memory latency): e.g.: linked list traversal
2) The rate at which values can be fetched from memory (memory bandwidth): e.g.: array traversal.
In either of the above cases, Little's law puts hard performance bounds on single threads for case 1 and an arbitrary number of threads for case 2.

*Computation bound: These are actually exceedingly rare. In order to truly be computationally bound, an application has to have a small enough working set that it resides almost entirely in on-chip cache (thereby suffering from neither memory latency nor bandwidth constraints), and must furthermore saturate the computational capabilities of its host (Nehalem, for instance, is no slouch). Threads can help a lot here as well: partitioning computation among multiple threads in this category can lead to linear speedup, as the sketch below shows.
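
A minimal Python illustration of that last case (the kernel is contrived, and the actual speedup depends on your machine):

import multiprocessing
import time

def kernel(chunk):
    # Tiny, cache-resident compute kernel: no memory traffic to speak of.
    lo, hi = chunk
    total = 0
    for i in range(lo, hi):
        total += i * i
    return total

if __name__ == "__main__":
    n = 4000000
    cores = multiprocessing.cpu_count()

    start = time.time()
    serial = kernel((0, n))
    t_serial = time.time() - start

    # Partition the same computation across the cores.
    bounds = [(i * n // cores, (i + 1) * n // cores) for i in range(cores)]
    start = time.time()
    with multiprocessing.Pool(cores) as pool:
        parallel = sum(pool.map(kernel, bounds))
    t_parallel = time.time() - start

    assert serial == parallel
    print("1 core: %.2fs, %d cores: %.2fs" % (t_serial, cores, t_parallel))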

Something we haven't considered much so far is communication and synchronization. I think we'll probably all agree that the latter is pretty d*mn hard, and the former rather complicates simple models of how computation works -- but they're there, and both affect how well an application parallelizes.

So, in closing: to find the parallelism, you must understand your bottlenecks and parallelize the bottlenecks accordingly. The biggest challenge is: how do we go about marrying a deep understanding of these fundamental performance bounds with a convenient high-level abstraction for everyday programmers?
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
All excellent points and questions. Sometimes it is not a matter of performance, i.e. simply wanting to execute some operation faster. It can also be a matter of responsiveness, or perceived responsiveness. I remember reading a piece by an author some months ago who slammed the .NET framework for a bunch of reasons. Among them was that it forced you into multiple threads if you wanted to keep the UI responsive (he didn't explain why this was a .NET issue and not endemic to any event-driven model).

I bring this up because the UI and the way it interacts with the application and user offers a whole set of its own opportunities and challenges for parallelism. In most of the cases that author complained about there was actually nothing useful that the user could be doing while an operation completed. That's a key design question. It may actually be easier to run semantically disconnected operations in parallel, simply because they might not be touching the same data at the same time. For example maybe I can repaginate a document and continue editing it. Or I can repaginate a document and open another one. Keeping the UI responsive in those cases requires initiating the operation on a worker thread and returning control to the message pump (or using messier techniques that pollute the implementation of the operation itself with interrupt-and-dispatch code).
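
That pattern is worth sketching. Here it is with Python's tkinter, just because it's compact -- the invented long_operation stands in for repagination or whatever, and any event-driven toolkit looks about the same:

import queue
import threading
import time
import tkinter as tk

results = queue.Queue()

def long_operation():
    time.sleep(3)                        # stand-in for repagination, etc.
    results.put("done")

def start_work():
    threading.Thread(target=long_operation, daemon=True).start()
    poll()                               # return to the message pump at once

def poll():
    try:
        label.config(text=results.get_nowait())
    except queue.Empty:
        root.after(100, poll)            # check again on the next tick

root = tk.Tk()
label = tk.Label(root, text="idle")
label.pack()
tk.Button(root, text="Go", command=start_work).pack()
root.mainloop()                          # the UI stays responsive throughout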

But not all applications, perhaps not even most, offer clear opportunities where the user can kick off an operation and continue working. In some cases the progress bar approach is the one that is appropriate. This may have more to do with humans than system design.

So there is performance, and there is flexibility, and both can be an argument for multiple threads of execution. I think the latter is more frequently encountered, actually, than the former. As degibson points out: finding actual parallelism in workloads is hard, and you need that before it makes sense to even consider implementing a parallel approach. Many of the things that take a long time are just really hard to break into pieces. Decoding a stream of bits, compressing a file, and defragmenting a volume are all examples.

We've gotten a little off track from the initial topic of FP, which I guess was to be expected. I think we've taken the right branch line, though :). What needs to be examined is not whether FP is the right model to express parallelism, but whether parallelism is the silver-bullet solution to future performance gains that everyone in the technical media seems to think it is. My conclusion is that it can't be, for the reasons degibson and others have mentioned here. Most problems are not inherently parallel. People are not inherently parallel. We're very much like Windows: lots of internal threads working all the time, but we reason and act serially.

If parallelism is not the silver bullet, then we accept that future performance gains will be incremental at best, or we need some breakthrough processor architecture that executes serial logic much faster than current architectures do.
 

Cogman

Lifer
Sep 19, 2000
10,277
125
106
Mark, the question becomes: what? What can we do to increase performance at a somewhat constant rate?

The first attempt (and it is still somewhat an attempt) was to add instruction sets galore. The more the better. But the problem we begin to face is that making compilers that can ultimately make good use of those instructions is a nightmare, one that is not easily solved. So then we entered the GHz race to see who could make it to infinity first. That obviously didn't work; we found that making anything faster than about 3.6 GHz becomes economically unfeasible because of the manufacturing problems.

Now the current push is to try and make everything parallel. But as you discussed, this already has the inherent problem that not everything can be made to execute in parallel. Perhaps that is just the instruction-set problem all over again: we just aren't able to make compilers smart enough to perfectly optimize our code for us.

This may sound too sci-fi, or too crazy, but perhaps the focus for software developers really needs to be on AI right now. Perhaps our goal should be to create an AI smart enough to recognize the optimal ways to arrange code for us to take advantage of all our processors' nooks and crannies.

Barring that, I would guess the best place to focus on making code run faster isn't so much in the actual writing of the code, but rather in the compiling of the code. I imagine that if we improved our compiler technology sufficiently, we wouldn't have to worry so much about what paradigm we use for coding; rather, we'd worry about getting the job done and leave the rest to the compiler to take care of.

However, I'm a novice at this stuff, so don't shoot me if my suggestions are too far out of whack.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,043
3,529
65
There's a lot more history to CPU design than just adding instructions or the GHz race. Back in the days of the 386, things were very simple. The processor got one instruction, executed it, and moved on to the next instruction.

The 486 introduced pipelining. Pipelining is an assembly line: one instruction moves through several stages to completion, and there are usually several instructions working their way through the processor at any given time. The problem is that if there is a conditional statement, then the pipeline might have instructions in it that shouldn't be executed. The Pentium 4 was the ultimate pipelining CPU, but it went a little too far, and Intel had to back off from that level of pipelining. So pipelines are "tapped out", so to speak. :p

The Pentium introduced a second pipeline that worked in parallel with the first. In some ways it was the first x86 dual-core processor. But the second "core" (actually called an "execution unit") can only work on the instruction following the one sent to the first execution unit, and only if the second instruction doesn't work with any data that the first one works with. So everything works like a single-core processor, but faster.

The Pentium Pro architecture (also PII and PIII) introduced out-of-order execution. Like the original Pentium, the PPro had more than one execution unit; but it also had logic to look for instructions that could be processed, anywhere in the next few instructions, so any section of a few parallelizable instructions was executed in parallel, while still acting like it was serial. Virtually all modern processors, including multicore processors, do this - a notable exception being the Intel Atom. This means that (unless Intel makes a chip with dozens of Atoms, which they very well might) each core is already running small sections of code in parallel, without even touching any other core.

So, chipmakers have already embraced your "AI" that rearranges code for optimum execution. Compilers also try to arrange dissimilar instructions together so that a single core can run them in parallel. But that also means that small-scale parallelism is pretty much tapped out; practically anything to do with multiple cores needs to be on a larger scale.

Certain compiler optimization techniques, particularly those involving simple loops, may lend themselves to creating parallel code automatically. But I don't see a whole lot of room for compilers to parallelize code otherwise -- except in the case of functional programming, the original topic of this thread.
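
To make that concrete: a loop whose body is a pure function is already a map, and once it's written as a map, swapping in a parallel map is mechanical. A sketch (whether it pays off depends on how much work each element does):

from multiprocessing import Pool

def shade(pixel):
    # Pure per-element work: no shared state, so evaluation order is free.
    return min(255, pixel * 2)

pixels = list(range(1000))

# The imperative loop...
out = []
for p in pixels:
    out.append(shade(p))

# ...is the same computation written as a map...
out = list(map(shade, pixels))

# ...and a parallel map can then farm it out across cores unchanged.
if __name__ == "__main__":
    with Pool() as pool:
        out = pool.map(shade, pixels)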
 

Modelworks

Lifer
Feb 22, 2007
16,240
6
76
I work primarily in graphics now. Between creating CGI and scripting for it, I am still amazed at how little the applications make use of multiple cores, except during render times. Even there, there are often plugins and things that don't use all cores. I would have thought that this would be one field that would make use of all cores, since it isn't real-time related and you can execute threads out of order and still get the same result.

The answer I got a lot of the time was that the program was designed with single cores in mind and it is too hard to go back and break up the task to target multiple cores. I think the holy grail will be a processor that can divide a single task among multiple cores without the aid of the programmer. Until then I am really starting to look more at the GPU and things like OpenCL, which just released its spec. http://www.khronos.org/registr...pecs/opencl-1.0.29.pdf

 
