I am trying to understand this, so please do not hold it against me if I am totally wrong.

I read in the GCN white paper from AMD that the scalar unit is mostly for control flow and for address calculation and generation, not for the actual graphics or shader calculations.
Wouldn't the scalar unit instructions that control the flow of instructions for the vector units be better suited to out-of-order execution? I can imagine that pipelined arithmetic instructions in the vector unit would indeed be difficult to execute out of order, because I assume each instruction would depend heavily on the results of the previous ones. That would mean a lot of instructions would need to be tracked before any independent ones were found, and that may reduce efficiency. That is something a compiler may be better at.
That would agree with what you wrote.
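
To make my assumption concrete, here is a toy sketch in Python of what I mean by tracking dependencies inside a window of instructions. It is only an illustration: the `(dest, sources)` instruction format, the register names, and the function `independent_in_window` are all made up by me and are not real GCN ISA. It also only looks at read-after-write dependencies and ignores everything else real hardware would have to check.

```python
# Hypothetical toy model: each instruction is (dest, sources).
# Register names are invented and do not reflect real GCN encoding.

def independent_in_window(instrs, window):
    """For each window of instructions, count those that do not read a
    register written by an earlier instruction in the same window
    (read-after-write dependencies only)."""
    counts = []
    for start in range(0, len(instrs), window):
        written = set()  # registers written so far in this window
        free = 0         # instructions that could issue without waiting
        for dest, srcs in instrs[start:start + window]:
            if not (set(srcs) & written):
                free += 1
            written.add(dest)
        counts.append(free)
    return counts

# A dependent chain: each step reads the result of the previous one.
chain = [("v1", ["v0"]), ("v2", ["v1"]), ("v3", ["v2"]), ("v4", ["v3"])]

# Independent calculations: each one reads only its own input.
independent = [("s4", ["s0"]), ("s5", ["s1"]), ("s6", ["s2"]), ("s7", ["s3"])]

print(independent_in_window(chain, 4))        # [1]
print(independent_in_window(independent, 4))  # [4]
```

In the chain only the first instruction is free to issue, so looking at a window of four instructions gains nothing, while in the independent case all four could be reordered freely. That is the kind of difference I am guessing at between the vector arithmetic and the scalar control code, but I may well be wrong about which one looks like which.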