It's a common explanation, but it isn't accurate. There are several ways to formulate quantum mechanics, and one of the more popular is the path integral. The path integral sums the probability amplitudes associated with each path through the configuration space (more accurately, through the space of states). For example, say you have a photon source at A and a detector at B, and a photon is created at time t=a. The path integral can then be used to calculate the probability amplitude (whose squared magnitude gives the probability) that a photon created at (A, a) is detected at location and time (B, b). Mathematically, the probability amplitude of this event, K(A,a; B,b), is the superposition of the products K(A,a; C,c) K(C,c; B,b) over all intermediate positions C, where c is an arbitrary intermediate time between a and b.
In essence, you can think of the amplitude for A->B as the sum of all the amplitudes for A->C->B. You can generalize this further to A->C->D->B and so forth. If you allow infinitesimal differences between the positions and times, the summation becomes an integral over every possible path between A and B. But these are not physical paths: we are not saying that the particle actually travels along them. In addition, the intermediate points are really quantum states, not statements that the photon is at point C at time c.
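Written out, the chaining described above might look like the following sketch. The number of time slices N, the labels (C_j, c_j) for the intermediate points, and the integration measure dC are notation introduced here for illustration.

```latex
% One intermediate time c, with a < c < b: amplitudes for the two legs
% multiply, and the products are summed over every intermediate position C.
K(A,a;\,B,b) = \int dC \; K(A,a;\,C,c)\, K(C,c;\,B,b)

% Repeating the step with N-1 intermediate times a < c_1 < \dots < c_{N-1} < b,
% where (C_0,c_0) = (A,a) and (C_N,c_N) = (B,b):
K(A,a;\,B,b) = \int dC_1 \cdots dC_{N-1} \;
  \prod_{j=0}^{N-1} K(C_j,c_j;\,C_{j+1},c_{j+1})

% Letting N \to \infty with ever finer time slices turns the multiple
% integral into an integral over all paths from (A,a) to (B,b).
```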
What happens, though, is that the paths that would seemingly break special relativity generally contribute negligibly to the probability amplitude. (The path integral is widely used for quantum theories that obey special relativity, e.g. Quantum Field Theory and QED.) The "faster than light" path G has a contribution, but its neighboring path G+\delta largely cancels that contribution out. This is along the lines of integrating a rapidly oscillating complex function far from its stationary-phase point: the classical paths are generally the stationary-phase (steepest descent) paths, and they contribute the most to the probability amplitude.
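In the standard formulation, the cancellation can be stated compactly. The symbols below (the path x(t), its classical action S[x], the reduced Planck constant \hbar, and the path measure \mathcal{D}[x]) do not appear in the explanation above and are introduced here as the usual textbook notation.

```latex
% Every path contributes a pure phase fixed by its action:
K(A,a;\,B,b) = \int \mathcal{D}[x] \; e^{\, i S[x] / \hbar}

% Neighboring paths x and x + \delta x add coherently only where the phase
% is stationary to first order, which is the classical-path condition:
\delta S[x_{\mathrm{cl}}] = 0

% Away from x_cl, S changes by many multiples of \hbar between neighbors,
% so the phases e^{iS/\hbar} point in all directions and sum to nearly zero.
```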
So, long story short, this is another victim of trying to conceptualize the mathematics for a lay audience. Feynman describes the exact same phenomenon in his book QED using his stopwatch explanation. The stopwatch's progression is the phase progression of the path integral's integrand over the path in the configuration space. For paths that do not contribute, like the "faster than light" paths, the stopwatch hand advances very quickly over short movements in the configuration space. This indicates that the phase changes wildly between neighboring paths, which causes their contributions to cancel out. What remains are the contributions from groups of paths whose phase varies slowly (which, again, usually lie around the classical paths).
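The stopwatch picture can be checked numerically with a toy model. This is only a sketch under stated assumptions: the paths are reduced to a one-parameter family labeled by a deviation s from the classical path (s = 0), and the phase is taken to be k*s**2, which is the form you get for small deviations around a stationary point. The function name window_amplitude and the values of k, the window centers, and the step count are all hypothetical choices made for illustration.

```python
import cmath

def window_amplitude(center, half_width, k=1.0, n=20001):
    """Riemann sum of exp(i*k*s**2) over s in [center - half_width, center + half_width].

    Each term is one 'stopwatch hand': a unit arrow whose angle is the
    phase k*s**2 of the path labeled by s (toy model, not a real action).
    """
    ds = 2.0 * half_width / (n - 1)
    total = 0j
    for j in range(n):
        s = center - half_width + j * ds
        total += cmath.exp(1j * k * s * s) * ds
    return total

# Bundle of paths around the classical path: the phase barely changes,
# so the arrows line up and add coherently.
near = abs(window_amplitude(center=0.0, half_width=1.0))

# Bundle of equal width far from the classical path: the phase spins
# quickly from one neighbor to the next, so the arrows largely cancel.
far = abs(window_amplitude(center=20.0, half_width=1.0))

print("near classical path:", near)
print("far from classical path:", far)
```

Both windows contain the same number of equally weighted paths, yet the bundle around s = 0 yields a much larger net amplitude than the bundle far away. The far bundle's contribution is small but not exactly zero, which matches the statement that such paths contribute negligibly rather than not at all.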