If they wanted to, they could put a graphics processor together with the main processor, but it would be too little for some users and equal to or more than enough for others, so it is better to leave that choice in the user's hands. And once they do start putting one in, there WILL definitely be an option to use another (better) one.
Because of demand and innovation, primary processors keep increasing their multimedia processing power (MMX, SSE, 3DNow!, AltiVec, VIS, MAX, MDMX, SH4, etc.).
By adding video Level 1 / Level 2 / Level 3 caches and incorporating more graphics power, they could minimize the need for an external graphics processor/card. But then we would end up with a lot of valuable CPU cycles wasted on fetching, or waiting for, video data.
All of that was mainly about OUTPUT. Now, about INPUT ...
This is where we are very far behind, or KEPT behind. NOT ONLY IN VIDEO, BUT ALSO IN AUDIO. Of course there are many reasons; some of them are less research and limited availability (and another might be ... "it's too much power for regular people").
If that had been the chosen path (incorporating a video/audio processing unit),
or if it WILL BE,
then
we could (or will) use the computer's assistance to SEE, look at, and hear our surroundings: the environment, objects, people and their movements, recognition of the tiniest change in facial expression, etc. It would do this through its eyes (CCD camera / ?) and ears (microphone / ?), process that data effectively and quickly, and respond properly (like a computer-driven plane/car, a personal assistant, etc.).
Of course, THEN some of us will have to introduce a video/audio (recognition / tracking / tracing) processing unit (or an ISA, Instruction Set Architecture) that is directly coupled to its video/audio input interface (CCD / (? / GaAs) / microphone) and ALSO to the main/primary processor through a HyperTransport / Hyper-Threading / multi-PARALLEL processing bus. If not on a multi/parallel bus, then at least on a separate card with a video/audio recognition chip on a shared bus.
Yeah ... yeah ... I know ... it's like fiction ...
but that is the dream life of some (a lot) of us. So, not impossible.