This is probably a pretty broad question (and the answer may get too technical), but I don't understand how a program can take binary data (CD-quality stereo sound) as input and decide what can be "heard" and what can't, just on the basis of 1s and 0s.
