That social test is (in my eyes) useless, because static images give you no reference point, so you cannot properly judge how the face has changed compared to a baseline.
Also, I do wonder what they use as ground truth to decide whether someone is right or wrong.
...yeah, I only got 26 right 😉