Article 'CPUOverload Project: Testing every x86 desktop processor since 2010' - Anandtech Bench

UsandThem

Elite Member
May 4, 2000
16,068
7,380
146
#CPUOverload Project

From the article:
Is testing 900 CPUs ultimately realistic? Based on the hardware I have today, if I had access to Narnia, I could provide data for about 350 of the CPUs. In reality, with our new suite, each CPU takes 20-30 hours to test on the CPU benchmarks, and another 10 hours for the gaming tests. Going for 50-100 CPUs/month might be a tough ask, but let’s see how we get on. We have these dozen or so CPUs in the graphs here to start.
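Back of the envelope: 900 CPUs at roughly 30-40 hours each works out to around 30,000 machine-hours. Even at the 100 CPUs/month pace, that is roughly 3,500 testing hours per month against the ~720 hours a month actually contains, so it would take something like five test beds running around the clock for the better part of a year.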

I'm glad I'm not the one having to test each of the CPUs. That sounds like pure torture... but it will be nice to have something to quickly compare newer processors to what has come before.

Many of us used to go to UserBenchmark until they ruined their site with their boneheaded decisions over the last 18 months or so.
 

MrGuvernment

Junior Member
Feb 20, 2005
6
0
66
Pretty insane. I mean, set up as many test benches as you can, boot 'em all up, script it all to run, walk away, and come back 30 hours later. Hopefully all the results get auto-dumped into a DB or something, and the data gets parsed from there.
 

blckgrffn

Diamond Member
May 1, 2003
9,127
3,069
136
www.teamjuchems.com
I was waiting until someone else started a thread. My primary reaction: the traffic potential is huge for Anandtech, and that helps keep the lights on around here. I am all for that. Quality content is king.

Having data collected in as consistent a way as possible is very admirable too, and hopefully it will benefit many of our future discussions here as well.
 

borandi

Member
Feb 27, 2011
138
117
116
MrGuvernment said:
Pretty insane. I mean, set up as many test benches as you can, boot 'em all up, script it all to run, walk away, and come back 30 hours later. Hopefully all the results get auto-dumped into a DB or something, and the data gets parsed from there.

The data gets dumped onto a local NAS, gets a sanity check, and is then entered into the public database, which has been going for over a decade: www.anandtech.com/Bench
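Conceptually the ingestion step is nothing fancier than the sketch below. To be clear, this is purely illustrative: the directory layout, JSON fields, and table schema are invented for the example, not the actual Bench pipeline.

[CODE]
# Illustrative only: sweep benchmark result dumps from a NAS share
# into a SQLite database, with a basic sanity check on the way in.
# Paths, JSON fields, and schema are hypothetical.
import json
import sqlite3
from pathlib import Path

RESULTS_DIR = Path("/mnt/nas/cpuoverload/results")  # hypothetical NAS mount
DB_PATH = "bench.sqlite"

def ingest(results_dir: Path = RESULTS_DIR, db_path: str = DB_PATH) -> int:
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS results (
                       cpu TEXT, benchmark TEXT, score REAL, run_date TEXT,
                       PRIMARY KEY (cpu, benchmark, run_date))""")
    added = 0
    for dump in results_dir.glob("*.json"):  # one dump per test run
        run = json.loads(dump.read_text())
        for bench, score in run["scores"].items():
            # Sanity check before anything reaches the public database
            if not isinstance(score, (int, float)) or score < 0:
                print(f"suspect result in {dump.name}: {bench}={score!r}")
                continue
            con.execute("INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)",
                        (run["cpu"], bench, float(score), run["date"]))
            added += 1
    con.commit()
    con.close()
    return added

if __name__ == "__main__":
    print(f"ingested {ingest()} results")
[/CODE]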
 
Reactions: Grazick (Wow)

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
I don't see any mention in the article of how the whole issue of "stock" behavior is being handled. I'd consider that the trickiest part of any benchmarking, considering all the recent efforts to redefine what stock does and doesn't entail (see the evolving definition of TDP, hidden board optimizations, etc.).
 

dullard

Elite Member
May 21, 2001
25,069
3,419
126
moinmoin said:
I don't see any mention in the article of how the whole issue of "stock" behavior is being handled. I'd consider that the trickiest part of any benchmarking, considering all the recent efforts to redefine what stock does and doesn't entail (see the evolving definition of TDP, hidden board optimizations, etc.).
TDP's definition hasn't really evolved. The bigger problem is that people simply misunderstood TDP from the beginning. I can't count the number of posts claiming things along the lines of "Intel desktop chips always run at full turbo at the rated TDP." That is NOT how Intel ever defined TDP. Yes, older Intel chips could temporarily run at full turbo while staying at TDP, but that was a short blip in time for a few chips. For Intel, TDP has always been defined as the typical maximum power used at base clocks. Turbo gives higher speeds, but also higher wattage. That has always been Intel's definition.

"Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload."

The rest of your post makes sense, though. My related concern, which is rarely addressed: CPU speed will differ between benchmarks run manually (probably with breaks in between to let the system cool down) and benchmarks run scripted (probably with no breaks between software benches). That is, after the system has exhausted its ability to turbo, how does it really perform? This matters because many programs where CPU speed really counts are long-running (possibly hours, days, or weeks), and that is not captured by a benchmark run lasting a few seconds while the system is cool.
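Anyone can see this effect at home with a loop along these lines. Purely illustrative: the hashing kernel is just a stand-in workload, and a multi-threaded load would show the heat-soak gap more strongly.

[CODE]
# Illustrative sketch: run the same CPU-bound kernel back-to-back and
# log every iteration, so turbo decay / thermal throttling shows up as
# rising iteration times. The kernel is a stand-in, not a real benchmark.
import hashlib
import time

def workload() -> None:
    # ~a few seconds of single-core hashing; keeps the buffer at ~1 MB
    data = b"x" * 1_000_000
    for _ in range(2_000):
        data = hashlib.sha256(data).digest() * 31_250

def sustained_run(iterations: int = 50) -> list[float]:
    times = []
    for i in range(iterations):
        t0 = time.perf_counter()
        workload()
        dt = time.perf_counter() - t0
        times.append(dt)
        print(f"iter {i:3d}: {dt:.2f} s")  # cool vs. heat-soaked gap shows here
    return times

if __name__ == "__main__":
    t = sustained_run()
    print(f"first 5 avg: {sum(t[:5]) / 5:.2f} s, last 5 avg: {sum(t[-5:]) / 5:.2f} s")
[/CODE]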
 

blckgrffn

Diamond Member
May 1, 2003
9,127
3,069
136
www.teamjuchems.com
dullard said:
TDP's definition hasn't really evolved. The bigger problem is that people simply misunderstood TDP from the beginning. I can't count the number of posts claiming things along the lines of "Intel desktop chips always run at full turbo at the rated TDP." That is NOT how Intel ever defined TDP. Yes, older Intel chips could temporarily run at full turbo while staying at TDP, but that was a short blip in time for a few chips. For Intel, TDP has always been defined as the typical maximum power used at base clocks. Turbo gives higher speeds, but also higher wattage. That has always been Intel's definition.

"Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload."

The rest of your post makes sense, though. My related concern, which is rarely addressed: CPU speed will differ between benchmarks run manually (probably with breaks in between to let the system cool down) and benchmarks run scripted (probably with no breaks between software benches). That is, after the system has exhausted its ability to turbo, how does it really perform? This matters because many programs where CPU speed really counts are long-running (possibly hours, days, or weeks), and that is not captured by a benchmark run lasting a few seconds while the system is cool.

As long as the testing methodology is consistent and repeatable, we'll just have to take it for what it is - another data point. I am going to trust that they are going to attempt to keep ambient temps roughly consistent regardless of what time of year the tests are running.

I mean, we could argue that all of these setups should be running under a big WC loop to provide optimal CPU cooling conditions, argue about which TIM is used, dictate the upper dB limit that would be acceptable for air cooling, or debate any number of other factors that need to be controlled for, until we are all blue in the face. We have lots of practice :D

We could choose to be constructively critical of their methods now, and then accept it as a good-faith effort once the results start rolling in. That's my plan.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,821
3,642
136
dullard said:
My related concern, which is rarely addressed: CPU speed will differ between benchmarks run manually (probably with breaks in between to let the system cool down) and benchmarks run scripted (probably with no breaks between software benches). That is, after the system has exhausted its ability to turbo, how does it really perform? This matters because many programs where CPU speed really counts are long-running (possibly hours, days, or weeks), and that is not captured by a benchmark run lasting a few seconds while the system is cool.
How do you expect workloads that take hours, if not days, to finish to be used realistically as benchmarks by a review website? The only candidate that fits the criterion of requiring hours to finish is SPEC2017, and that is included in the suite.
 

dullard

Elite Member
May 21, 2001
25,069
3,419
126
tamz_msc said:
How do you expect workloads that take hours, if not days, to finish to be used realistically as benchmarks by a review website? The only candidate that fits the criterion of requiring hours to finish is SPEC2017, and that is included in the suite.
It is not realistic for websites that need to rush out a review the instant an NDA expires, having had only a brief time with the CPU in the first place. That said, it means virtually all CPU review websites are already testing a scenario that doesn't really matter to the potential CPU buyer either.

This is an opportunity to do it right. There is no strict deadline for getting an old CPU reviewed. For this project, the test suites could include much longer benchmarks to see how CPUs perform when actually stressed over extended periods.
 
Reactions: AnitaPeterson (Like)

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
dullard said:
TDP's definition hasn't really evolved. The bigger problem is that people simply misunderstood TDP from the beginning. I can't count the number of posts claiming things along the lines of "Intel desktop chips always run at full turbo at the rated TDP." That is NOT how Intel ever defined TDP. Yes, older Intel chips could temporarily run at full turbo while staying at TDP, but that was a short blip in time for a few chips. For Intel, TDP has always been defined as the typical maximum power used at base clocks. Turbo gives higher speeds, but also higher wattage. That has always been Intel's definition.

"Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload."
Point taken. I didn't mean the definition of TDP per se, but rather the application of its rules. The rules may still be valid, but with Intel extending Tau every generation, and board manufacturers manipulating that value as well at "stock", it becomes increasingly harder to take benchmark results at face value.

I guess for me the best solution would be to always include the amount of energy, in joules, expended to reach a given benchmark result. That would make results comparable again, regardless of how stock or tuned the system settings are.
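On Linux that is straightforward to approximate with the kernel's RAPL powercap counters. A minimal sketch (the benchmark command is a placeholder, reading the counter usually needs root, and RAPL only covers the CPU package, not the whole system):

[CODE]
# Illustrative sketch: joules consumed by the CPU package while a
# benchmark runs, read from the standard Linux RAPL powercap counter.
# The benchmark command is a placeholder; reading energy_uj may need root.
import subprocess
from pathlib import Path

RAPL = Path("/sys/class/powercap/intel-rapl:0")  # package 0 domain

def read_uj() -> int:
    return int((RAPL / "energy_uj").read_text())

def joules_for(cmd: list[str]) -> float:
    max_uj = int((RAPL / "max_energy_range_uj").read_text())
    start = read_uj()
    subprocess.run(cmd, check=True)  # the benchmark itself
    end = read_uj()
    delta = end - start if end >= start else end + max_uj - start  # counter wrap
    return delta / 1_000_000  # microjoules -> joules

if __name__ == "__main__":
    # Placeholder command; substitute the real benchmark binary
    print(f"{joules_for(['sleep', '10']):.1f} J")
[/CODE]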