So motherboard RAID has the hardware for RAID, but all of the work is done by the CPU?
And a good RAID card (over $200) has an onboard CPU that does all of the work, correct?
That is correct. However, RAID 0, RAID 1 and RAID 10 are trivially simple algorithms, and the CPU usage is barely measurable even on something like an Atom. On something like a multi-core Ivy Bridge, the CPU usage is negligible (less than 2% of one core).
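To give a feel for how little work this is, here is a minimal sketch of the address arithmetic behind RAID 0 and RAID 1. The function names and the `stripe_blocks` parameter are made up for illustration, and real implementations handle much more (metadata, degraded arrays, alignment), but the core mapping is roughly this:

```python
def raid0_locate(lba, n_disks, stripe_blocks):
    """Map a logical block number to (disk index, block number on that disk)."""
    # Which stripe unit the block falls in, and its offset inside that unit.
    stripe, offset = divmod(lba, stripe_blocks)
    # Stripe units rotate across the disks; 'row' counts full passes over all disks.
    row, disk = divmod(stripe, n_disks)
    return disk, row * stripe_blocks + offset


def raid1_targets(lba, n_disks):
    """RAID 1: the same block goes to the same position on every mirror."""
    return [(disk, lba) for disk in range(n_disks)]


# Two disks, 4 blocks per stripe unit (hypothetical values).
print(raid0_locate(9, n_disks=2, stripe_blocks=4))
print(raid1_targets(9, n_disks=2))
```

RAID 10 is just the two combined: pick the mirror pair with the RAID 0 mapping, then write to both members of the pair. It is a couple of integer divisions per request, which is why the CPU cost is so hard to measure.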
Good quality RAID cards do have on-board CPUs. These are more useful when you are using a more complex RAID configuration - like RAID 5 and RAID 6. These specialist cards use CPUs which are tuned to be very fast at the RAID 5 and RAID 6 parity calculations.
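For comparison, the extra work in RAID 5 is the parity calculation, which is a byte-wise XOR across the data blocks of a stripe. A rough sketch follows; `xor_blocks` is an illustrative helper, not any controller's actual code, and the block contents are made up:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 4-disk RAID 5: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose the second disk: XOR of the survivors and the parity gives it back.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

RAID 6 adds a second, independent parity block, typically computed with Reed-Solomon style arithmetic over a Galois field, which is noticeably heavier than plain XOR. That part is not shown here.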
However, compared with a modern general purpose CPU, these RAID CPUs are very weak and can become the bottleneck if you are running RAID 6 with 8 or more fast hard drives. In contrast, running the same test with software RAID on a Sandy Bridge CPU maxed out the drives at 10% CPU usage of one core. In practice, this is a very contrived scenario, and the bottleneck will only be relevant in the very unusual situation of long sequential writes.
Where hardware RAID cards win (and what cannot be replicated in software) is in the use of battery (or flash) backed cache RAM. This cache has a dramatic effect on RAID 5/6 performance for random writes, and drastically improves data integrity. Even with a weak on-board RAID processor, the benefits of the cache typically make it a more desirable solution than software RAID for datacenter use.
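One way to see why the cache matters so much for random writes: a small write on RAID 5 normally has to read the old data and old parity before it can write the new data and new parity. The sketch below counts disk operations under that read-modify-write assumption and compares it with the ideal case where the cache has gathered enough writes to fill a whole stripe. The function names and the 8-disk example are hypothetical, and a real controller will rarely hit the ideal case exactly:

```python
def update_parity(old_data, new_data, old_parity):
    """Read-modify-write: new parity = old parity XOR old data XOR new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


def small_write_ios(blocks_written):
    """Each isolated block write costs 2 reads (data, parity) + 2 writes."""
    return 4 * blocks_written


def full_stripe_ios(n_disks):
    """A full-stripe write needs no reads: n-1 data writes + 1 parity write."""
    return n_disks


# Hypothetical 8-disk RAID 5, so 7 data blocks per stripe.
print(small_write_ios(7))   # 7 blocks written one at a time
print(full_stripe_ios(8))   # the same 7 blocks written as one full stripe
```

The cache also lets the card acknowledge a write as soon as it is in protected RAM, so the application is not waiting on those extra disk operations, and the battery or flash backing keeps a half-finished stripe update from being lost on power failure.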