It really does depend on the language. If you compile a Scala project with scalac it's single threaded, so as the project gets larger it just gets slower. Java uses a few threads but doesn't scale out to 8 or so cores. Certain Maven plugins can help a little with parallel tests and such, but it's not a language that deals with parallel compilation in general; anything targeting the JVM struggles to get parallel behaviour. C++ compiles do multithread well because individual files can be compiled separately: each file declares (via header files) what it expects to be available. Then at link time the compiled files are actually combined and that availability is checked fully. This works for C/C++, but it doesn't work for many other languages. Erlang, for example, is another language where I saw almost no multithreading.
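To make the compile-separately-then-link idea concrete, here is a rough toy model in Python. It is only a sketch: the file names, the `UNITS` table and the `compile_unit`/`link`/`build` helpers are all made up for illustration, and no real compiler is involved. Each "unit" only knows which symbols it defines and which it uses, so every unit can be processed without looking at any other, and the full check is deferred to the link step.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical translation units: each defines some symbols and uses others
# that it only knows about through declarations (the role headers play).
UNITS = {
    "main.cpp": {"defines": {"main"}, "uses": {"parse", "emit"}},
    "parse.cpp": {"defines": {"parse"}, "uses": {"emit"}},
    "emit.cpp": {"defines": {"emit"}, "uses": set()},
}


def compile_unit(name):
    """'Compile' one unit in isolation; no other unit's body is consulted."""
    unit = UNITS[name]
    return {
        "name": name,
        "defines": set(unit["defines"]),
        # Symbols used here but defined elsewhere stay unresolved until link.
        "undefined": set(unit["uses"]) - unit["defines"],
    }


def link(objects):
    """Combine all objects and check every used symbol is defined somewhere."""
    defined = set().union(*(o["defines"] for o in objects))
    missing = set().union(*(o["undefined"] for o in objects)) - defined
    if missing:
        raise RuntimeError("undefined symbols: " + ", ".join(sorted(missing)))
    return sorted(defined)


def build(jobs=4):
    # The per-unit step has no cross-unit dependency, so it can be farmed out.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        objects = list(pool.map(compile_unit, sorted(UNITS)))
    return link(objects)


if __name__ == "__main__":
    print(build())
```

Python threads won't give real CPU parallelism here; the point is only the shape of the dependency graph. The per-file step is independent, and the link step is the only one that needs to see everything.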
Have you ever written a compiler? It's almost all branches. Your lexer takes a character/word and compares it against a list of possibilities to determine its type, then emits that token. The parser takes the token, compares it against all the possibilities it has, and chooses which direction to take based on the token's type. Basically every character goes through multiple branches to determine what it is so that syntax and language semantics can be checked. I have written quite a few compilers and it's a lot of branch code. Optimisers are especially branch heavy: they walk the abstract syntax tree looking for a match for their optimising pattern, which means a lot of comparisons that come back no and a few that come back yes.
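For anyone who hasn't written one, a minimal lexer looks something like the sketch below. The token kinds, the `KEYWORDS` set and the `lex` function are my own illustrative choices, not taken from any real compiler. Notice that every single character falls through a chain of comparisons before anything is emitted.

```python
# Hypothetical keyword list for a tiny C-like language.
KEYWORDS = {"if", "else", "while", "return"}


def lex(source):
    """Split source into (kind, text) tokens via a chain of per-character branches."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            # Whitespace separates tokens and is otherwise ignored.
            i += 1
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("NUMBER", source[start:i]))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            # One more branch: is this word reserved or a plain identifier?
            kind = "KEYWORD" if word in KEYWORDS else "IDENT"
            tokens.append((kind, word))
        elif ch in "+-*/=(){};<>":
            tokens.append(("PUNCT", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


if __name__ == "__main__":
    for token in lex("if (x1 < 42) return y;"):
        print(token)
```

A parser then does the same kind of thing one level up, branching on the token kind instead of the character, so the branchiness compounds.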
Really good branch prediction and speculative execution really help compilers along. One of the big things Haswell beefed up was its branch predictor and its cache, two things we would seriously expect to improve a program with a lot of branches. Yet in most games and calculation-heavy apps the benefits from Haswell are disappointingly low, <10% the great majority of the time even with the additional execution port. For C++ compilation it's +25%. If you profile the gcc compiler you'll see the reason: huge numbers of branch mispredictions from the CPU. That is mostly what compilers do, choose between all the options to work out which byte to emit.
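To show why data-dependent branches hurt so much, here is a toy simulation of a single 2-bit saturating counter, the textbook branch predictor. This is purely an illustration under my own assumptions: real predictors such as Haswell's are far more sophisticated, and the `mispredict_rate` function and the two input streams are invented for the example.

```python
import random


def mispredict_rate(outcomes):
    """Fraction of branches mispredicted by one 2-bit saturating counter."""
    counter = 2  # states 0-1 predict "not taken", states 2-3 predict "taken"
    misses = 0
    for taken in outcomes:
        predicted = counter >= 2
        if predicted != taken:
            misses += 1
        # Move one step towards the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses / len(outcomes)


# A loop-style branch: taken 99 times, then falls through once, repeated.
loop_like = [i % 100 != 99 for i in range(10_000)]

# A branch whose outcome depends on the input data, modelled as a coin flip.
rng = random.Random(0)
data_dependent = [rng.random() < 0.5 for _ in range(10_000)]

if __name__ == "__main__":
    print("loop-like:     ", mispredict_rate(loop_like))
    print("data-dependent:", mispredict_rate(data_dependent))
```

The loop-style branch is missed only when the loop exits, while the coin-flip branch is missed about half the time. Compiler code sits much closer to the second case than number-crunching loops do, which fits with a better predictor paying off more for compiles than for games.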
This is the reason we see a 4 core/8 thread Intel CPU convincingly beating a 4 module/8 core AMD chip. The AMD ought to win: it's got a lot more integer performance with those 8 integer cores, substantially more than the Intel CPU. It's just not as good at branch prediction and cache performance, and it loses out heavily despite its apparent advantage. Compiling is nasty for CPUs. It's one of the worst cases of calculation I know of, because in and amongst all those branches you also have some really complicated algorithms going across large amounts of data in RAM. It really uses a very specific set of resources, and right now Intel's design is a lot better at those, basically twice as good as AMD's.