...From this point of view a vertex processor reminds any other general-purpose processor. But what's about programmability? A shader is a program which controls a vector ALU processing 4D vectors. A shader's program can be 256 ops long but it can contain loops and transitions. For organization of loops there are 16 integer registers of counters I which are accessible from the shader only for reading, i.e. they are constants assigned outside in an application. For conditional jumps there are 16 logic (one-bit) registers B. Again, they can't be changed from the shader. As a result, all jumps and loops are predetermined and can be controlled only from outside, from an application. Remember that this is a basic model declared by the DX9.
Besides, the overall number of instructions which can be processed within the shader with all loops and branches/jumps taken into account is limited by 65536. What for such strict limitations? Actually, to meet such requirements the chip can do without any logic controlling execution of loops and jumps. It's enough to organize successive execution of shaders up to 665536 instructions and unroll all conditions and loops in advance in the driver. Actually, every time the program has its constants controlling branch and jump parameters changed, we have to load into the chip a new shader. The R300 uses exactly such approach. Exactly this approach lets us have only one set of control logic and a copy of vertex program shared by all vertex processors. And this approach doesn't make the vertex processor normal - we can't make on-the-fly decisions unique for each vertex taking into account criteria calculated right in the shader. Moreover, such unrolling of jumps and loops can make a process of replacement of the shader or its parameters controlling jumps quite demanding in terms of CPU resources. That is why ATI recommends to change vertex shaders as seldom as possible - the cost of such replacement is comparable to change of an active texture...