If this were true then I wouldn't have created this thread. I don't trust that anything on that chart with a better battery life than the Nexus 4 will actually last longer.
How can you say that? All the phones were tested the same way. Just because you don't like the N4 results doesn't mean they aren't true. The fact that the phones are tested the same way means that the relative performance of the phones stands. You might not agree with the testing methodology, and you might think the N4 should last 10 hours instead of 6, but if your use gets 10 hours instead of Anand's 6, then the other phones such as the SGS3, One X, and iPhone 5 should also see a proportional increase. The point is the numbers are a relative performance indicator.
It's like saying I can't get Anand's exact framerates in a GPU test, but I use Anand's GPU tests to understand that upgrading to a Radeon 7850 will give me huge benefits over my Radeon 4870.
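To put rough numbers on the relative-performance argument, here's a quick sketch. Only the Nexus 4 figures (6 hours in the benchmark, 10 hours in your own use) come from this thread. "Phone A" and "Phone B" and their hours are made-up placeholders, and the whole thing assumes your usage scales every phone by the same factor, which is my claim and not something the benchmark proves.

```python
def scale_estimates(benchmark_hours, reference_phone, observed_hours):
    """Scale every benchmark result by the ratio observed/benchmark
    for one phone you actually own (the reference phone)."""
    factor = observed_hours / benchmark_hours[reference_phone]
    return {phone: hours * factor for phone, hours in benchmark_hours.items()}


# Nexus 4: 6 h is the benchmark figure discussed in this thread.
# "Phone A" and "Phone B" are hypothetical placeholders, not real results.
benchmark = {"Nexus 4": 6.0, "Phone A": 9.0, "Phone B": 7.5}

# Suppose your own use gets 10 h out of the Nexus 4 instead of 6 h.
estimates = scale_estimates(benchmark, "Nexus 4", 10.0)

# Print the estimates from longest-lasting to shortest.
for phone, hours in sorted(estimates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phone}: ~{hours:.1f} h")
```

The absolute hours change, but the ordering of the phones doesn't, which is all I'm using the chart for.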
The iPhone 5 on LTE has the best battery life on that chart and my experience has forced me to question that result.
My iPhone 5 does ridiculously well against my SGS2 in a day of use. Granted, I've only had a few days with my iPhone 5. My gf's SGS3, which does better than the N4 according to Anand, sits somewhere between my iPhone and SGS2 in terms of standard use. I'd say his results make sense to me.
I'm not even sure if all the devices on that chart were measured using the new test. I don't think AT's test provides a good way to gauge performance. AT's test has its own arbitrary definition of normal use which, for example, apparently doesn't include the usage of anyone who has to commute each day.
But Anand doesn't replicate real world daily use. He doesn't test standby battery. One thing I'd like to see him test is mobile data with a certain amount of sync that's representative of a typical user. Email, Facebook, Twitter? Go for a day and see the 24 hour battery drop. I think this is where the iPhone does very well.
Talk time is no longer a meaningful way to rate the total battery performance for a modern smartphone just like music playback hours wouldn't be used to rate the smartphone. Placing phone calls and listening to music are just two of many functions that the smartphone is used for each day.
Some people forget the primary purpose of a phone is to make calls. A smartphone may have changed things, but many people still make calls. I make conference calls a lot. Talk time is still a figure quoted on phones, so I think that's fine.
None of Anand's tests really represent something you'll be doing on a daily basis. I don't see people tethering their phones til they die. I don't see people surfing consistently for 6 hours straight or whatever Anand simulates, but I think people WILL surf for maybe an hour spread through the day.
The point is Anand's benches show a systematic way of benchmarking phones. How they perform in standby is a different story, but most likely you can use the relative performance of phones to judge how battery life will be. The iPhone in my experience has been great to me on battery, and my 3 Android phones have never performed spectacularly, though each one has gotten better, so I expect the N4 to do better than my SGS2.
I think we're going into a different discussion now. At first you were talking about the 3G vs 4G issue, which I find to be the biggest issue with the new benchmarks. The fact that 3G and 4G flipped from previous benches is still weird to me. I really want an explanation of how the tests were done before and how they're done now. More importantly, I want to see how the new tests are more representative of smartphone use and how the old tests are not. Because the issue for me is that the general population still believes 3G battery > 4G battery. It's not because of Anand's old reviews either. It's an industry-wide accepted belief. Thus when Anand challenges that industry-wide belief with a new set of numbers, I'd really do some digging.
This is almost like when HardOCP introduced real world benchmarks or whatever the hell, where your GPU tests were bottlenecked by the CPU. I'm not saying this is as big a foul-up as that, but it's a potentially controversial new way of testing that produces very different results.