Eh, it's not all that easy to describe precisely without the math.
FM signals vary in frequency from moment to moment, since the carrier frequency is shifted up and down in response to your audio amplitude. The frequency is allowed to swing a long way, which is what lets high fidelity music be carried.
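To make "the carrier frequency follows the audio amplitude" concrete, here is a minimal sketch. The function names are made up for illustration, the audio is assumed to be normalized to the range -1 to 1, and the sample rate in the usage line is arbitrary.

```python
import math

def fm_instantaneous_freq(carrier_hz, deviation_hz, audio_sample):
    """Frequency being transmitted right now, for an audio sample in [-1, 1]."""
    return carrier_hz + deviation_hz * audio_sample

def fm_modulate(audio, carrier_hz, deviation_hz, sample_rate_hz):
    """Build the FM waveform by accumulating phase at the instantaneous frequency."""
    phase = 0.0
    out = []
    for a in audio:
        out.append(math.cos(phase))
        f = fm_instantaneous_freq(carrier_hz, deviation_hz, a)
        phase += 2.0 * math.pi * f / sample_rate_hz
    return out

# Loudest positive audio peak pushes a 100 MHz carrier up by the full 75 kHz.
peak_freq = fm_instantaneous_freq(100_000_000.0, 75_000.0, 1.0)
# Toy-sized numbers so the waveform can be sampled at an ordinary rate.
wave = fm_modulate([0.0, 0.5, -0.5, 1.0], 1_000.0, 100.0, 48_000.0)
```

Note that the audio never changes the amplitude of the wave, only how fast its phase advances.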
For commercial FM the carrier is allowed to swing up to 75 kHz either side of its center frequency, which is several times the range of frequencies your ears can easily hear (maybe 20 kHz).
So to lock on to an FM signal, a classic FM radio (semi-old but not ancient) has a PLL that generates a local carrier somewhere around the middle of the channel you're tuned to. That local carrier is flexible, so it can easily move anywhere within the 75 kHz or so around the frequency you're approximately tuned to.
To demodulate the FM signal, even in the normal case, the PLL in the radio must 'capture' the incoming signal. That is basically like finding the strongest frequency among the nearby frequencies (over the 75 kHz or so of bandwidth) and then tuning the local PLL frequency to 'chase' that incoming signal around as it moves up and down in frequency.
If there were a large signal within the PLL's capture bandwidth, but at a different frequency than the PLL is currently tuned to, the PLL would just quickly re-tune itself to whatever that other frequency was, chasing it.
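The 'chasing' can be sketched with a toy software PLL. This is only an illustration, not how any particular radio does it: the signal is modeled as complex baseband samples, frequencies are in radians per sample, and the loop gains kp and ki are made-up values chosen so the loop settles quickly.

```python
import cmath

def pll_track(samples, kp=0.2, ki=0.02):
    """Toy second-order PLL on complex baseband samples.

    Returns the loop's frequency estimate (radians per sample) after each
    input sample. kp and ki are illustrative gains, not receiver values.
    """
    theta = 0.0  # phase of the local oscillator
    freq = 0.0   # local oscillator frequency, radians per sample
    estimates = []
    for x in samples:
        # Phase detector: angle between the input and the local oscillator.
        err = cmath.phase(x * cmath.exp(-1j * theta))
        # Loop filter: the integral path steers the frequency,
        # the proportional path nudges the phase directly.
        freq += ki * err
        theta += freq + kp * err
        estimates.append(freq)
    return estimates

def tone(freqs):
    """Phase-continuous complex tone whose frequency follows the given list."""
    phase = 0.0
    out = []
    for w in freqs:
        out.append(cmath.exp(1j * phase))
        phase += w
    return out

# The dominant signal sits at 0.05 rad/sample, then shows up at 0.12 instead.
signal = tone([0.05] * 1000 + [0.12] * 1000)
est = pll_track(signal)
print(round(est[999], 3), round(est[-1], 3))
```

The frequency estimate settles on the first tone, then slides over to the second one once that is what the loop sees. In a real FM receiver that same frequency estimate, following the carrier up and down, is the recovered audio.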
If the PLL is successfully chasing a large signal (i.e. it has locked on) and there's a smaller signal at some other frequency, the smaller one just looks like "noise" to the PLL. PLLs naturally lock on to the average frequency of the stronger signal, and they are somewhat insensitive to noise that isn't correlated with the frequency they've captured.
Since the FM frequency swing is so large (75 kHz) relative to audible frequencies, a distant station's weak FM signal in the same frequency band as a local station's strong FM signal will appear as sort of uncorrelated broadband noise relative to the locked-in strong signal. Since the PLL loop filter and the audio filters diminish high frequency noise, the distant station's noise-like signal gets attenuated on its way through them.
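Here is a rough simulation of that capture behavior, under stated assumptions: two co-channel FM stations are modeled at complex baseband, the distant one at 0.3 times the local one's amplitude, each carrying a single test tone. The sample rate, the scaled-down 5 kHz deviation, the tone frequencies, and the loop gains are all made up to keep the demo small.

```python
import cmath
import math

FS = 200_000.0       # sample rate in Hz (assumed for the demo)
DEVIATION = 5_000.0  # peak deviation in Hz (scaled down from broadcast FM)
N = 20_000           # 0.1 s of signal
SKIP = 2_000         # ignore the PLL's initial lock-in transient

def audio_tone(freq_hz):
    """Test 'audio': a plain cosine normalized to [-1, 1]."""
    return [math.cos(2.0 * math.pi * freq_hz * n / FS) for n in range(N)]

def fm_baseband(audio, amplitude):
    """Complex-baseband FM: the audio sets the frequency offset from center."""
    phase = 0.0
    out = []
    for a in audio:
        out.append(amplitude * cmath.exp(1j * phase))
        phase += 2.0 * math.pi * DEVIATION * a / FS
    return out

def pll_demod(samples, kp=0.2, ki=0.02):
    """Toy second-order PLL; its tracked frequency is the demodulated audio."""
    theta, freq, out = 0.0, 0.0, []
    for x in samples:
        err = cmath.phase(x * cmath.exp(-1j * theta))
        freq += ki * err
        theta += freq + kp * err
        out.append(freq)
    return out

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

local_audio = audio_tone(400.0)    # strong local station
distant_audio = audio_tone(700.0)  # weak distant station on the same channel
received = [s + w for s, w in zip(fm_baseband(local_audio, 1.0),
                                  fm_baseband(distant_audio, 0.3))]
demod = pll_demod(received)

corr_local = correlation(demod[SKIP:], local_audio[SKIP:])
corr_distant = correlation(demod[SKIP:], distant_audio[SKIP:])
print(f"local: {corr_local:.2f}  distant: {corr_distant:.2f}")
```

The demodulated output should track the local station's tone closely while showing very little of the distant station's tone, even though the distant signal is only about 10 dB weaker going in. Swapping the two amplitudes should flip which station comes out.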
I guess it is sort of like relativity -- if you're in a stationary car it is easy to read a sign on the road because relative to you it isn't moving. But in a stationary car it is hard to get a good look at the license plate of a car driving by because relative to you it is moving rapidly.
But if you're in a moving car, it is harder to see stationary road signs since they're moving by quickly, but it is easy to see details of cars going in the same direction as you since relative to you they're barely moving.
On the other hand, in a car moving in one direction it is much harder still to see the details of a car moving in the opposite direction, since relative to you it is going by at an extremely high speed.
So a PLL is like being on a playground swing: it can oscillate at any frequency. If you have neighboring swings to your left and right, you could choose to change your motion to be more in sync with the motion of ONE of the neighboring swings, until after a bit you're essentially perfectly synchronized with it. But if you then look over at the other neighboring swing, it will just be a confusing blur. Once you're oscillating in sync with the swing you're chasing, the other swing's motion will probably look even more "chaotic" relative to you than it would if you were just standing still.
So if you imagine having a conversation with your swing partners, it would be easy and perfect with the one you're synchronized with, and quite difficult with the one you're not at all synchronized with.
Thus you have "captured" the signal that most prominently caught your attention and the other one has been seemingly attenuated.