There is never any interlacing at 320x240 because of the way that 640x480 is set up to double certain lines sometimes.
Yes, that's it.
NTSC interlaced video is made up of full frames (480 lines), and each frame is composed of 2 "fields" broadcast 1/60th of a second apart (30 fps, 60 Hz; strictly 29.97 fps and 59.94 Hz). Each field contains every other scan line (240 lines): one field holds the odd scan lines, the other holds the even scan lines.
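To make the field layout concrete, here is a minimal Python sketch of my own, not taken from any capture tool. It treats a frame as a list of scan lines with line 1 at the top, and counts "odd" lines from 1. `split_fields` and `weave` are hypothetical helper names.

```python
# Minimal sketch: a frame is a list of scan lines, index 0 = top line.
# Assumption: "odd" scan lines are lines 1, 3, 5, ... counted from 1,
# which are list indices 0, 2, 4, ...

NTSC_FIELD_RATE = 60000 / 1001          # about 59.94 fields per second
NTSC_FRAME_RATE = NTSC_FIELD_RATE / 2   # about 29.97 frames per second

def split_fields(frame):
    """Split a full frame into its two fields (every other scan line)."""
    odd_field = frame[0::2]   # scan lines 1, 3, 5, ...
    even_field = frame[1::2]  # scan lines 2, 4, 6, ...
    return odd_field, even_field

def weave(odd_field, even_field):
    """Interleave two fields back into one full frame."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    return frame

if __name__ == "__main__":
    frame = [f"line {n}" for n in range(1, 481)]
    odd, even = split_fields(frame)
    print(len(odd), len(even))        # 240 240
    print(odd[:2], even[:2])          # ['line 1', 'line 3'] ['line 2', 'line 4']
    print(weave(odd, even) == frame)  # True
    print(f"{1000 / NTSC_FIELD_RATE:.2f} ms between fields")  # 16.68 ms between fields
```

The two fields are never on screen at the same instant on a TV; they arrive about 16.7 ms apart, which is what matters for the artifacts described below.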
So:
When you capture at 320x240, you are actually capturing only one field of each frame, at 30 fps.
When you capture at 720x480 or 640x480, you are capturing both fields.
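Here is a rough model of the two capture modes, assuming the card sees a stream of fields in broadcast order (odd, even, odd, even, ...) and that a 240-line capture simply keeps the first field of each pair. Real cards may keep either field or scale differently; `capture_480` and `capture_240` are hypothetical names for illustration.

```python
# Hypothetical model: the source is a list of fields in broadcast order,
# one every 1/60 s (nominal). Each field is a list of 240 scan lines.

FIELDS_PER_SECOND = 60  # nominal NTSC field rate

def capture_480(fields):
    """Full-height capture: weave each odd/even pair into one 480-line frame."""
    frames = []
    for i in range(0, len(fields) - 1, 2):
        odd_field, even_field = fields[i], fields[i + 1]
        frame = []
        for odd_line, even_line in zip(odd_field, even_field):
            frame.extend([odd_line, even_line])
        frames.append(frame)
    return frames

def capture_240(fields):
    """Half-height capture: keep only the first (odd) field of each pair."""
    return [fields[i] for i in range(0, len(fields) - 1, 2)]

if __name__ == "__main__":
    # One second of video: 60 fields, each line tagged with its field number.
    fields = [[f"field {f}, line {n}" for n in range(240)]
              for f in range(FIELDS_PER_SECOND)]
    full = capture_480(fields)
    half = capture_240(fields)
    print(len(full), len(full[0]))  # 30 480
    print(len(half), len(half[0]))  # 30 240
```

Both modes give 30 frames per second, but the 240-line capture only ever keeps 30 of the 60 moments in time; the other 30 fields are dropped.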
Because the computer displays both fields at the same time rather than one every 1/60th of a second, you will see artifacts in full 480-line captures during motion (it looks like a combing effect) unless you deinterlace the video. You don't get interlacing artifacts with 240-line captures because the computer is simply displaying a single field. The trade-off is that 240-line captures look jerkier during motion, since the other field, and the half of the motion it recorded, is thrown away. You can really see this during pans.
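If you want to see where the combing comes from, this small Python sketch renders a vertical bar moving right, samples each field at its own moment in time, and then weaves the two fields into one frame. The frame size, speed and ASCII rendering are arbitrary choices for illustration, not real capture data.

```python
# Sketch of the combing artifact: each field is sampled at a different
# time (1/60 s apart), so a moving object lands in different places on
# alternate lines once the two fields are woven together.

WIDTH = 24
HEIGHT = 8   # tiny frame for display; a real one has 480 lines
SPEED = 4    # bar moves 4 columns per field period (arbitrary)

def render_line(bar_x):
    """One scan line with a 3-pixel-wide bar starting at column bar_x."""
    return "".join("#" if bar_x <= x < bar_x + 3 else "." for x in range(WIDTH))

def render_field(field_index, lines):
    """All lines of one field, with the bar where it is at that field's time."""
    bar_x = 2 + SPEED * field_index
    return [render_line(bar_x) for _ in range(lines)]

def weave(odd_field, even_field):
    """Interleave two fields into one frame, as a 480-line capture does."""
    frame = []
    for a, b in zip(odd_field, even_field):
        frame.extend([a, b])
    return frame

if __name__ == "__main__":
    odd = render_field(0, HEIGHT // 2)   # sampled at t = 0
    even = render_field(1, HEIGHT // 2)  # sampled at t = 1/60 s
    print("Woven frame (combing):")
    print("\n".join(weave(odd, even)))
    print("Single field (no combing, half the lines):")
    print("\n".join(odd))
```

In the woven frame the bar jumps back and forth between alternate lines, which is the comb; the single field shows a clean bar because every line in it was sampled at the same instant.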
You can process your 480-line captures in VirtualDub with a deinterlace filter, which can use a number of techniques to eliminate or reduce the interlacing artifacts.
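For a rough idea of what such a filter does, here are two simple, generic deinterlacing methods in Python: discarding one field and doubling the remaining lines, and blending each line with the one below it. These are textbook techniques shown as a sketch; they are not the code of any particular VirtualDub filter, and the function names are hypothetical.

```python
# Two generic deinterlacing methods on a woven frame: a list of rows,
# each row a list of luma values 0-255.

def deinterlace_line_double(frame):
    """Keep the top field (rows 0, 2, 4, ...) and repeat each row once.
    Removes combing completely but halves vertical resolution."""
    out = []
    for row in frame[0::2]:
        out.append(list(row))
        out.append(list(row))
    return out

def deinterlace_blend(frame):
    """Average each row with the row below it; the last row is kept as-is.
    Keeps full height, but moving edges become a soft double image."""
    out = []
    for y, row in enumerate(frame):
        if y + 1 < len(frame):
            below = frame[y + 1]
            out.append([(a + b) // 2 for a, b in zip(row, below)])
        else:
            out.append(list(row))
    return out

if __name__ == "__main__":
    # A combed 4x4 frame: top-field rows are bright on the left,
    # bottom-field rows bright on the right (the object moved between fields).
    combed = [
        [255, 255, 0, 0],
        [0, 0, 255, 255],
        [255, 255, 0, 0],
        [0, 0, 255, 255],
    ]
    for row in deinterlace_line_double(combed):
        print(row)  # [255, 255, 0, 0] on every row
    for row in deinterlace_blend(combed):
        print(row)  # [127, 127, 127, 127] x3, then [0, 0, 255, 255]
```

Line doubling is essentially what a 240-line capture gives you anyway, while blending keeps the full 480 lines at the cost of ghosting on fast motion.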