The reason we "test" our overclock is to gain some idea of its "time to failure" (i.e. how long before it becomes unstable).
Is it stable for 5 minutes? 5 hours? 5 days? 5 weeks?
The reason we add the word "stress" in front of "test", creating a more rigorous "stability test", is that we don't want to wait 5 days or 5 weeks to gather enough data to comfortably determine the typical time-to-failure for our overclock.
Playing games is not a "stress test"; it is simply a "test". There are no acceleration effects, in the engineering sense, entering into the picture.
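To make the "acceleration" idea concrete, here is a rough sketch in Python. The function name and the acceleration factor of 10 are purely hypothetical illustrations, not measured values for any real stress tester; the only assumption is the usual reliability-engineering one that surviving N hours under accelerated stress suggests roughly N times the acceleration factor under normal use.

```python
def estimated_normal_use_hours(stress_hours_survived: float,
                               acceleration_factor: float) -> float:
    """Scale hours survived under stress to an estimate for normal use.

    acceleration_factor is hypothetical: how many times faster the stress
    load provokes a failure than ordinary use would. A plain "test" such
    as gaming has a factor of 1, i.e. no acceleration at all.
    """
    if stress_hours_survived < 0:
        raise ValueError("stress_hours_survived must be non-negative")
    if acceleration_factor < 1:
        raise ValueError("acceleration_factor must be at least 1")
    return stress_hours_survived * acceleration_factor


# 12 hours of a stress test with a made-up acceleration factor of 10
print(estimated_normal_use_hours(12, 10))   # 120
# 12 hours of gaming: no acceleration, so it only tells you about 12 hours
print(estimated_normal_use_hours(12, 1))    # 12
```

With a factor of 1 you have to wait the full 5 days or 5 weeks to learn anything about 5 days or 5 weeks, which is exactly the problem with using games as the only test.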
The OP's issue is not with stress testing per se; rather, it is that the suite of stress testers available on the consumer scene is very limited in the scope of what it tests.
Our modern CPUs have nearly 2000 instructions in the ISA, and any one of them can become unstable when overclocking.
A stress-test program like Prime95 or IBT might stress-test, say, 200 of those 2000 instructions at best, easily leaving 90% of the ISA completely untested (let alone stress-tested) for stability.
Games will use different instructions, and they also test the stability of other components in the system that are not part of the processing core itself.
So it should come as no surprise that a system which passes Prime95 can still be unstable in a game, or vice versa for that matter.
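The coverage arithmetic above is simple enough to write down. A small Python sketch, using the rough figures from this post (2000 and 200 are ballpark estimates, not measured counts, and the function name is just for illustration):

```python
# Ballpark figures quoted in the post, not measured values.
ISA_INSTRUCTIONS = 2000   # "nearly 2000 instructions in the ISA"
STRESS_TESTED = 200       # "say, 200 of those", at best


def untested_fraction(total: int, tested: int) -> float:
    """Fraction of the instruction set that a tester never exercises."""
    if total <= 0 or not 0 <= tested <= total:
        raise ValueError("need total > 0 and 0 <= tested <= total")
    return (total - tested) / total


covered = STRESS_TESTED / ISA_INSTRUCTIONS
untested = untested_fraction(ISA_INSTRUCTIONS, STRESS_TESTED)
print(f"covered: {covered:.0%}, untested: {untested:.0%}")
# covered: 10%, untested: 90%
```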
The bigger concern when overclocking without a way to detect errors and instability, the way a bona fide stress tester detects them, is that people leave their systems wide open to silent data corruption.
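The general idea behind detecting errors rather than just waiting for a crash is to run a computation whose answer is known and compare the result. Below is a minimal Python sketch of that idea only. The `workload` and `check_consistency` names are made up for illustration, this is not how Prime95 or IBT are implemented, and a Python loop like this will not meaningfully stress an overclocked CPU.

```python
import hashlib


def workload(seed: int, rounds: int = 10_000) -> str:
    """Deterministic computation: hash a seed repeatedly with SHA-256."""
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def check_consistency(seed: int, passes: int = 5) -> bool:
    """Repeat the same workload and flag any result that differs.

    On correctly working hardware every pass yields the same digest,
    so a mismatch means a computation error occurred silently.
    """
    reference = workload(seed)
    for _ in range(passes - 1):
        if workload(seed) != reference:
            return False
    return True


if __name__ == "__main__":
    print("consistent" if check_consistency(seed=42) else "MISMATCH DETECTED")
```

One caveat: the reference here is computed on the same machine, so it could itself be wrong. A more careful approach would compare against an answer precomputed on hardware known to be stable.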
[Image: photo data corruption; in this case, the result of a failed data recovery from a hard disk drive]
Your OS can become unstable, and you start getting more video-driver-related crashes to the desktop while playing games or doing things as simple as browsing the web. It's a real pain because you don't really have a stable OC, and now you have to redo your entire install.
Stability is a multi-faceted issue because it involves the interaction of multiple components in the compute topology. Relying on just one program, or one particular suite of programs, is a recipe for guaranteed data corruption and eventual disaster.
No one approach or program is superior to the others; you absolutely must do "all of the above".
Intel and AMD do this too, and there is a reason they do it despite the cost of validating each and every instruction in the CPU's ISA.
If you think about it, it is a silly sort of ignorance-fueled arrogance on our part to think we can get away with declaring our overclocked CPU "stable" just by running a freeware program that at best checks and validates maybe 10% of the ISA (be it from testing with Prime95 or a couple of games).