Historical reasons, mostly. When the NTSC standard was first developed, the frame rate was set to match the 60 Hz mains frequency, because that reduced the visible interference AC power caused on analogue sets.
Analogue TV works by interlacing images: a single frame is made up of two fields of alternating lines. Fields are drawn at 60 Hz, so complete frames appear at 30 Hz. The same rates were carried into the digital age for backwards compatibility with old video.
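If it helps, here's a rough Python sketch of the idea (the frame size and line text are made up): two fields drawn about 1/60 s apart, each filling alternating rows, add up to one complete frame every 1/30 s.

```python
LINES = 8  # toy frame height

def draw_field(frame, field_lines, start_row):
    """Write one field's lines into every other row of the frame."""
    for row, line in zip(range(start_row, LINES, 2), field_lines):
        frame[row] = line

frame = [None] * LINES
field_1 = [f"field-1 line {n}" for n in range(LINES // 2)]  # drawn in ~1/60 s
field_2 = [f"field-2 line {n}" for n in range(LINES // 2)]  # drawn in the next ~1/60 s

draw_field(frame, field_1, start_row=0)  # rows 0, 2, 4, ...
draw_field(frame, field_2, start_row=1)  # rows 1, 3, 5, ...
for row in frame:
    print(row)  # the full frame only exists after both fields: 2 x 1/60 s = 1/30 s
```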
Displays can only natively show frame rates that divide evenly into their refresh rate, which they do by repeating each frame a whole number of times. So a 60 Hz monitor can natively display video at 5, 15, 30, or 60 fps. 120 and 240 Hz displays came about partly so that film's 24 fps rate could be shown natively (each frame repeated 5 or 10 times).
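Here's a rough sketch of that "repeat frames" arithmetic (the rates in the loop are just examples): a source only fits natively when the refresh rate is an integer multiple of it.

```python
def repeats_per_frame(refresh_hz, source_fps):
    """How many refreshes each source frame is held for, or None if it doesn't fit."""
    return refresh_hz // source_fps if refresh_hz % source_fps == 0 else None

for refresh in (60, 120, 240):
    for fps in (24, 25, 30, 60):
        r = repeats_per_frame(refresh, fps)
        note = f"each frame shown {r}x" if r else "doesn't divide evenly, needs pulldown"
        print(f"{fps:>2} fps on a {refresh} Hz display: {note}")
```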
For frame rates that don't divide evenly, displays employ pulldown. Back in the old days, films converted for TV used 3:2 pulldown for NTSC: one film frame is held for three fields, the next for two, alternating. I'm not 100% sure how digital displays handle this, but I assume it's something similar, probably repeating frames or holding certain frames longer. Either way, the uneven timing introduces judder into the image.
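A toy sketch of the 3:2 cadence (frame labels and timings are just illustrative): four film frames get spread across ten fields, so frames alternate between roughly 50 ms and 33 ms on screen, and that unevenness is the judder.

```python
# 3:2 pulldown sketch: 24 fps film to ~60 fields/s NTSC.
FIELD_DURATION_MS = 1000 / 60  # one NTSC field is about 16.7 ms

def pulldown_32(film_frames):
    """Expand film frames into a field sequence using the 3:2 cadence."""
    fields = []
    for i, frame in enumerate(film_frames):
        hold = 3 if i % 2 == 0 else 2  # 3 fields, then 2 fields, alternating
        fields.extend([frame] * hold)
    return fields

film = ["A", "B", "C", "D"]            # four consecutive film frames
fields = pulldown_32(film)
print(fields)                          # ['A','A','A','B','B','C','C','C','D','D']
for frame in film:
    on_screen = fields.count(frame) * FIELD_DURATION_MS
    print(f"frame {frame} is on screen for {on_screen:.1f} ms")  # ~50 ms vs ~33 ms
```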
If a game's frame rate is faster than the native refresh rate of the display, you can get screen tearing. The GPU swaps in a new frame while the display is still partway through drawing the old one, so the top and bottom of the screen show pieces of different frames. Using V-Sync clears that up by only swapping frames between refreshes.
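Here's a toy simulation of where the tear comes from, with made-up timings (7 ms GPU frames on a 60 Hz display) and ignoring real swap-chain details: without V-Sync the front buffer changes mid-scanout, with V-Sync it only changes on refresh boundaries.

```python
REFRESH_MS = 1000 / 60   # one 60 Hz refresh takes ~16.7 ms to scan out
RENDER_MS = 7            # hypothetical GPU frame time, faster than the refresh

def front_buffer_frame(t_ms):
    """Index of the newest rendered frame (and thus the front buffer) at time t."""
    return int(t_ms // RENDER_MS)

for vsync in (False, True):
    tears = 0
    for n in range(60):                    # one second of 60 Hz refreshes
        scan_start = n * REFRESH_MS
        scan_end = scan_start + REFRESH_MS - 0.01
        if vsync:
            # Swap waits for the refresh boundary: the whole scanout shows
            # whatever frame was newest when the refresh began.
            top = bottom = front_buffer_frame(scan_start)
        else:
            # Swap happens whenever a frame finishes, even mid-scanout.
            top = front_buffer_frame(scan_start)
            bottom = front_buffer_frame(scan_end)
        if top != bottom:
            tears += 1                     # the image changed mid-scanout: a tear
    print(f"V-Sync {'on' if vsync else 'off'}: {tears} torn refreshes out of 60")
```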
IIRC, a lot of modern digital TVs are compatible with PAL signals as well, so they can also display video at 25 and 50 fps.