I feel there are various issues here that weren't really touched upon. First off, remember the words of God (a.k.a. Donald Knuth): "Premature optimization is the root of all evil."
Usually, code bloat is pretty simple to work around.
0. Design just the features you need.
1. Write the code in your favorite language (one that can support the solution).
2. Prefer standard libraries to home-made solutions.
3. Profile code to find performance-critical sections.
4. Optimize wherever it's necessary or particularly beneficial. This can entail replacing uses of the standard library, or even rewriting in a higher-performance language (e.g., assembly where appropriate).
Admittedly, #3 and #4 aren't really well taught in academia. They're usually acquired the "hard way," through hands-on experience. Also, profiling tools tilt heavily toward the languages and platforms with broad commercial support. If you're not coding on such a platform, you (or the resident guru) may have to develop your own profiling mechanisms.
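To make that last point concrete, here's a minimal sketch of a homemade profiling mechanism in Java (the SectionTimer class and its methods are my own invention, not any standard API). It accumulates wall-clock time per named section; it's crude and single-threaded, but it's enough to locate hotspots when no real tooling is available:

```java
import java.util.HashMap;
import java.util.Map;

// A hand-rolled profiler: accumulates wall-clock time per named section.
// Hypothetical helper, not a standard API; crude but workable where no
// real profiler exists.
public final class SectionTimer {
    private static final Map<String, Long> totals = new HashMap<>();
    private static final Map<String, Long> starts = new HashMap<>();

    public static void start(String section) {
        starts.put(section, System.nanoTime());
    }

    // Assumes start() was called for this section first.
    public static void stop(String section) {
        long elapsed = System.nanoTime() - starts.remove(section);
        totals.merge(section, elapsed, Long::sum);
    }

    public static void report() {
        totals.forEach((name, nanos) ->
                System.out.printf("%-24s %10.3f ms%n", name, nanos / 1e6));
    }
}
```

Wrap the suspect sections in start()/stop() calls and dump report() at exit. Note it measures wall time rather than CPU time, so retire it the moment a real profiler is available.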
If you adhere to guidelines like these and the resulting application is still too slow, then either the language chosen was inappropriate, the design was too rich, the code is of poor quality, it wasn't optimized where crucial, or the algorithms are simply too demanding. For example, MPEG-4 encoding just isn't cheap, and you can't make it "fast" no matter how good you are.
On the topic of standard libraries and higher-level languages, it's a double-edged sword. You definitely should use higher-level languages with rich standard libraries: you're more productive, and standard libraries are well implemented (relative to what the average developer can do in limited time). However, standard libraries usually implement general algorithms that support a variety of usage patterns. There will be cases where you'll need to hand-roll a more specific solution to wring out more performance.
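As a toy illustration of that trade-off (my own contrived example, not from any real codebase): summing a generic List&lt;Integer&gt; pays for a heap-allocated box per element, while a hand-rolled primitive array on a profiled hot path does not:

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingDemo {
    // Generic and convenient: every element is a heap-allocated Integer,
    // unboxed on each iteration.
    static long sumGeneric(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    // Specialized for one hot path: a flat primitive array, no boxing,
    // cache-friendly sequential access.
    static long sumSpecialized(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<>();
        int[] raw = new int[1_000_000];
        for (int i = 0; i < raw.length; i++) {
            boxed.add(i);
            raw[i] = i;
        }
        System.out.println(sumGeneric(boxed) == sumSpecialized(raw)); // true
    }
}
```

The generic version is the right default; the specialized one is what you reach for only after profiling fingers it.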
The problem with 3GLs arises when the developer doesn't conceptualize what one line of source code really expands to (potentially hundreds or thousands of machine instructions). We trade performance for increased features and convenience. At least 80% of the time, that trade-off is highly worthwhile.
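The classic Java case of that hidden expansion (a sketch, not anyone's production code): one line of string concatenation in a loop quietly becomes an allocation, several copies, and a conversion on every pass:

```java
public class ConcatDemo {
    public static void main(String[] args) {
        String[] fields = { "id", "name", "price" };

        // One innocuous-looking line of source...
        String csv = "";
        for (String field : fields) {
            // ...compiles to a fresh StringBuilder, several appends, and a
            // toString() each iteration; the accumulated string is recopied
            // every pass, O(n^2) overall.
            csv += field + ",";
        }

        // Making the hidden machinery explicit keeps one buffer: O(n).
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            sb.append(field).append(',');
        }
        String csv2 = sb.toString();

        System.out.println(csv.equals(csv2)); // true
    }
}
```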
On the topic of compilers and optimizations, there really are two different issues. Compilers (and JITs) optimize low-level constructs better than most programmers could; it's silly for me to hand-tune a loop unless I have direct knowledge that the loop needs to run faster. However, where a programmer can usefully optimize is usually at a higher level than where compilers do. For example, the low-level design might be a little naive or too heavy, or the libraries used too generic. Those are the kinds of optimizations a skilled developer knows how to deal with, not micro-level instruction tweaks. And again, until you profile, you don't know where to optimize.
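To pick one concrete case (another contrived sketch of mine): no compiler or JIT will turn a linear scan into a hash lookup. That's exactly the kind of higher-level, design-side optimization that falls to the developer:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupDemo {
    // Slightly naive design: List.contains() is a linear scan, so the
    // loop is O(n^2). No optimizer will rescue this.
    static int countDistinctNaive(List<String> words) {
        List<String> seen = new ArrayList<>();
        for (String w : words) {
            if (!seen.contains(w)) {
                seen.add(w);
            }
        }
        return seen.size();
    }

    // The developer-level fix: choose a data structure with O(1)
    // lookups. Same result, O(n) overall.
    static int countDistinct(List<String> words) {
        Set<String> seen = new HashSet<>(words);
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(countDistinctNaive(words)); // 4
        System.out.println(countDistinct(words));      // 4
    }
}
```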
Again, I'd say that most trained developers write decent code, especially if they use standard libraries. Performance problems arise because the design itself is flawed, or because they haven't been trained to profile and optimize the code to find the performance hotspots. If I had to guess, I'd say that for the typical trained programmer, writing good comments and error handling are bigger problems than low-performing code. With higher-level languages and libraries, we're really just writing a lot of glue code and business logic.
If anyone has good book references on debugging and profiling, preferably for Java, then I'm all ears.