Steam's data collection - GPU and more "by the numbers"

Hauk · Dec 31, 2009

I was poking around on Steam and found they collect data on hardware usage. That they do may be of little surprise to most; but I thought the data was interesting. Granted it's one source and not everyone uses Steam.

But Steam has a significant worldwide presence, so their data should prove to be an accurate indicator of what hardware we're using. (?)

nVidia's market share remains strong when all GPU generations are weighted. Interesting though is the percentage captured by ATI's 4800 series, which steadily closes in on nV's 8800 series percentage. Both series make up the largest percentages according to Steam.

It's interesting to note the number of older GPUs in use; many are holding on to older cards. Not many multi-GPU setups according to Steam. On the processor side, Intel dominates.

Again, nothing earth shattering; but it's a slow news day so I figured what the heck..

GPU:
nVidia - 63.46%
ATI - 27.12%
http://store.steampowered.com/hwsurvey/

Top series:
nVidia 8800 series - 9.28%
ATI 4800 series - 8.49%
http://store.steampowered.com/hwsurvey/videocard/

Multi-GPU setups:
nVidia SLI - 2.40%
ATI Crossfire - 0.18%
http://store.steampowered.com/hwsurvey/

Processor overall:
Intel - 68.97%
AMD - 31.03%
http://store.steampowered.com/hwsurvey/processormfg/
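
A quick sanity check on the figures above, as a minimal Python sketch. It only re-uses the percentages quoted in this post; the variable names are made up for illustration, and the post does not say what makes up any remainder.

```python
from decimal import Decimal

# Percentages as quoted in the post above.
gpu_share = {"nVidia": Decimal("63.46"), "ATI": Decimal("27.12")}
cpu_share = {"Intel": Decimal("68.97"), "AMD": Decimal("31.03")}
multi_gpu = {"SLI": Decimal("2.40"), "Crossfire": Decimal("0.18")}

# Whatever is not nVidia or ATI (the post does not break this down).
other_gpu = Decimal("100") - sum(gpu_share.values())
other_cpu = Decimal("100") - sum(cpu_share.values())
sli_per_crossfire = multi_gpu["SLI"] / multi_gpu["Crossfire"]

print(f"GPU share not listed: {other_gpu}%")
print(f"CPU share not listed: {other_cpu}%")
print(f"SLI setups per Crossfire setup: {sli_per_crossfire:.1f}")
```

The two GPU vendors do not add up to 100%, while the two CPU vendors do, which is worth keeping in mind when reading the GPU numbers.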

Interitus · Dec 31, 2009

Would be interesting to see this again right before the release of Fermi to see how big of a shift the 5800 series and Fermi's release woes have made. Not that I expect it to be huge, but it would be interesting to compare the differences.

Keysplayr · Dec 31, 2009

We've already discussed Steam's survey data. Even had a poll to show how many people actually use and allow their systems to be scanned. I forget the exact results, but it has shown that nowhere near everybody uses Steam. Steam cannot be used as an accurate indicator of every gaming rig out there. Not even close. All it can tell you about are the people who actually use Steam. Which are many, but nowhere near the majority.

Lonyo · Dec 31, 2009

The interesting thing from that is that it's horribly misrepresentative when it comes to the HD 5xxx series.
From those figures you would think that the HD58xx has sold more than the HD57xx series, since one is at 0.52% and the other at 0.30% (based on DX10 systems with DX10 cards, since they don't both show up in any other section...?).
The actual figures are the reverse of that though, with 5:3 in favour of the HD57xx series (based on ATI actual sales numbers).

How that all translates to the rest of the cards is anyone's guess, but taking too much notice of it isn't a very wise idea.
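
To put a rough number on that mismatch, here is a minimal Python sketch. It assumes the 0.52% figure belongs to the HD58xx and the 0.30% figure to the HD57xx (the post implies this ordering but does not spell it out), and it takes the 5:3 sales ratio at face value. The function name is hypothetical.

```python
from fractions import Fraction

def overrepresentation(survey_a, survey_b, sales_a, sales_b):
    """How many times more common A is relative to B in the survey
    than in the sales figures."""
    survey_ratio = Fraction(survey_a) / Fraction(survey_b)
    sales_ratio = Fraction(sales_a) / Fraction(sales_b)
    return survey_ratio / sales_ratio

# A = HD58xx, B = HD57xx (assumed ordering of the two survey figures).
# Sales are 5:3 in favour of the HD57xx, so 58xx:57xx is 3:5.
factor = overrepresentation("0.52", "0.30", 3, 5)
print(f"58xx is over-represented by a factor of about {float(factor):.1f}")
```

Under those assumptions the survey shows the HD58xx nearly three times as often, relative to the HD57xx, as the sales numbers would suggest.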

Avalon · Dec 31, 2009

Keysplayr said:
We've already discussed Steam's survey data. Even had a poll to show how many people actually use and allow their systems to be scanned. I forget the exact results, but it has shown that nowhere near everybody uses Steam. Steam cannot be used as an accurate indicator of every gaming rig out there. Not even close. All it can tell you about are the people who actually use Steam. Which are many, but nowhere near the majority.

Yeah, but it's essentially a random sampling of casual, moderate, and hardcore gamers. If the sample is high enough, it can be inferred that the results would generally fall within a margin of error to be determined, for avid gamers.

Granted, this isn't necessarily going to match up with shipping numbers for graphics cards, but I don't think that's what it's trying to do.

ArchAngel777 · Dec 31, 2009

Avalon said:
Yeah, but it's essentially a random sampling of casual, moderate, and hardcore gamers. If the sample is high enough, it can be inferred that the results would generally fall within a margin of error to be determined, for avid gamers.

Agreed. If this data is to be discarded, then so should absolutely every poll ever done, aside from a census.

Keysplayr · Dec 31, 2009

Avalon said:
Yeah, but it's essentially a random sampling of casual, moderate, and hardcore gamers. If the sample is high enough, it can be inferred that the results would generally fall within a margin of error to be determined, for avid gamers.

Granted, this isn't necessarily going to match up with shipping numbers for graphics cards, but I don't think that's what it's trying to do.

Whatever you wish to get from it is up to you. However, the Steam survey isn't something I'd call comprehensive, nor would I use it as an indicative reference. That's up to me.

Ben90 · Dec 31, 2009

ArchAngel777 said:
Agreed. If this data is to be discarded, then so should absolutely every poll ever done, aside from a census.

Quoted for truth

Hey Zeus · Dec 31, 2009

Interitus said:
Would be interesting to see this again right before the release of Fermi to see how big of a shift the 5800 series and Fermi's release woes have made. Not that I expect it to be huge, but it would be interesting to compare the differences.

Main reason I went with 5750s instead of 5850s. Want to see what Nvidia counters with.

jvroig · Dec 31, 2009

Keysplayr has already addressed the fact that this "Is Steam reliable/indicative or not" issue has been beaten to death on a previous thread.

I do not wish to quote anybody who participated in this current thread so that my post here does not come off as a personal attack or as a post that "jumped" on a mistake.

Now that the formalities are over, please understand that Steam's data collection is not indicative of anything because of the methodology. Somebody who may have read a few things about statistics and/or probability, and perhaps thinks of polls and surveys and "random samples" as simpler than they are, is prone to assume a few things wrongly and thus conclude that a sample is "random" when it simply isn't.

Let's tackle "random sample". If you understand it merely as a lay person would and think of it as a "sample that is random" and think "random" here means anything as long as it was not pre-determined, or as long as it is unpredictable, then you are wrong.

You see, the thing about random sampling and a "random sample" is that the actual sample population involved (I mean the composition of the sample; for example, the thousands of Steam users that did participate) does not determine whether the sample is a random sample or not. Even if you review each element in the sample and think "Yeah, looks varied enough based on [criteria/s]", that won't tell you that the sample is a random sample.

What makes a sample a random sample is the methodology involved in obtaining the sample. If the methodology involved meets the criterion of randomness, then the sample obtained is a random sample (it helps if you think of it as one word instead of two separate words). The criterion of randomness is rather straightforward: each element in the population you are targeting to derive a random sample from must have an equal chance to be picked in the sample.

If a sample is truly a random sample, we can then expect it to reasonably reflect the reality had we asked the entire population instead of just a sample. Of course, this is not magic so there's still a margin of error that the statistician/pollster takes into account, sampling error / margin of sampling error. For lay people, it means "that's what you get for talking to a sample instead of the whole population". (If you have managed to get this far, I hope it is also clear that every time I say "population" here, I do not mean the human or world population)
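
For a sense of scale, here is a minimal Python sketch of the usual margin-of-sampling-error calculation for a proportion (normal approximation, 95% level). It only applies if the sample really is a random sample. The thread does not give the survey's sample size, so the values of `n` below are purely hypothetical; the 63.46% figure is the nVidia share from the first post.

```python
import math

def margin_of_sampling_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.6346  # nVidia share quoted in the first post
for n in (1_000, 10_000, 100_000):  # hypothetical sample sizes
    moe = margin_of_sampling_error(p, n)
    print(f"n = {n:>7}: {p:.2%} +/- {moe:.2%}")
```

The margin shrinks as the sample grows, but it only describes sampling error. It says nothing about error introduced by a methodology that is not random in the first place.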

Now to the meat. Is every poll valid or invalid? Naturally, saying "all are invalid" is nonsense. But using that to then say "then that means all of them are valid, and so is Steam!" is also a bit off.

It depends on the methodology. If a pollster for an election uses random digit dialling (and only random digit dialling) to obtain a sample, that's not a random sample. That's because each household may have several adults. If you phone a household only once, and you immediately pick as a sample the adult that answered it, then your sample is biased to "phone-answering adults".

"But that's random!" you might say. No, it's not. It's not about whether you can determine it or not. Here, randomness means everybody must have an equal chance to be picked. Those who loathe answering phones and let other members of the household pick it up are immediately disqualified.

So what do good pollsters do? They introduce another element of randomness. When they call up a household for a poll, they may, for example, ask for the adult who's had the most recent birthday, or perhaps the earliest upcoming birthday, or something related to birthdays. Why? Because unlike "phone answerers" versus "phone let-it-ring-until-someone-else-picks-it-uppers", birthdays are actually randomly distributed. It's not perfect, mind you, but far better than the previous case. Also, there's the fact to deal with that households have different numbers of adults, which also skews the probability for each element. Instead of using an additional "randomizer" like a randomly-distributed factor such as birthdays, I have heard of some polls before that simply ask a household for all adults to participate, separately, so as to mitigate the difference in probability based on number of adults in a household (which isn't randomly-distributed). Anyway...
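
The household-size effect can be worked out exactly. The following is a minimal Python sketch under simplifying assumptions: three hypothetical households with 1, 2 and 4 adults, each household equally likely to be dialled, and (in the first scheme) one adult then chosen at random, for example by the birthday rule. The function names are made up for illustration.

```python
from fractions import Fraction

def one_random_adult(household_sizes):
    """Chance that a given adult in each household is sampled when one
    household is dialled at random and one adult in it is picked at random."""
    p_household = Fraction(1, len(household_sizes))
    return [p_household * Fraction(1, size) for size in household_sizes]

def every_adult(household_sizes):
    """Chance that a given adult is sampled when every adult in the
    dialled household is interviewed."""
    p_household = Fraction(1, len(household_sizes))
    return [p_household for _ in household_sizes]

sizes = [1, 2, 4]  # hypothetical households
for size, p_one, p_all in zip(sizes, one_random_adult(sizes), every_adult(sizes)):
    print(f"{size} adult(s): one random adult -> {p_one}, all adults -> {p_all}")
```

With one adult per household, someone living alone is four times as likely to be sampled as someone in a four-adult household. Interviewing every adult makes the chance per adult equal again.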

So which polls are valid and which are invalid? Check the methodology.

How about Steam?

Doesn't pass the criterion of randomness. Remember, we don't mean "random" here as you may have it in your head, such as "unpredictable" or "non-determined". The criterion of randomness means every element in the population (here, the population is gamers, and each element is each one of us) must have an equal chance to be picked.

If you are a gamer and never buy games online? Your chances of being included are now lower than those who do. You don't have internet connection or don't bother with it? You are certainly out. You don't buy stuff online, but have internet connection, but never really play Valve games or any game using the Steam platform? Never really encountered Steam before in your life? Out and out.

Doesn't pass the criterion of randomness. Note that if the population were to be reduced from "gamers" to "Steam gamers", it might pass as a random sample.
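
A small simulation can illustrate the difference. This is only a sketch in Python, and every number in it is made up: a population of 100,000 gamers, 40% of whom use Steam, with Steam users assumed to be more likely to own a high-end GPU. None of these figures come from the survey.

```python
import random

rng = random.Random(2009)  # fixed seed so the run is repeatable

# Hypothetical population: (on_steam, has_high_end_gpu) for each gamer.
population = []
for _ in range(100_000):
    on_steam = rng.random() < 0.40
    high_end = rng.random() < (0.50 if on_steam else 0.20)
    population.append((on_steam, high_end))

def high_end_share(gamers):
    """Fraction of the given gamers who own a high-end GPU."""
    return sum(high_end for _, high_end in gamers) / len(gamers)

# Random sample: every gamer has the same chance of being picked.
random_sample = rng.sample(population, 5_000)

# Steam-style sample: only gamers who are on Steam can be picked.
steam_users = [gamer for gamer in population if gamer[0]]
steam_sample = rng.sample(steam_users, 5_000)

print(f"All gamers:    {high_end_share(population):.1%}")
print(f"Random sample: {high_end_share(random_sample):.1%}")
print(f"Steam sample:  {high_end_share(steam_sample):.1%}")
```

In this toy setup the random sample lands close to the figure for all gamers, while the Steam-only sample tracks Steam users instead. That matches the point above: the same data can be a fair picture of "Steam gamers" and a poor one of "gamers".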

Please understand that, despite this, Steam is probably one of the best data sources available for game devs anyway. Yes, it is not a random sample, and yes, for all we know people who don't buy stuff online and those who don't have a net connection may comprise a big chunk of the population (gamers) and may have hardware different enough to skew the results very much. But there is currently nothing else out there (to my knowledge) that is Steam-like. Using Steam, as well as sales results for CPUs and GPUs (similarly, these are unreliable just like the Steam survey), helps game developers get a feel of where things are headed. How right or how accurate is almost impossible to determine without any MOSE (margin of sampling error), and I doubt there is any MOSE figured in here.

So while it might be the best out there to get a feel of the hardware in the wild, calling it "indicative" or in the same league as valid polls is just not right, especially if you use it as a determinant for a population simply categorized as "gamers" that is too broad and undefined.

jvroig · Dec 31, 2009

First post too long, I will just add thoughts here, sorry.

Steam itself (I mean Valve) actually seems to know what it is doing, after all. They don't take the survey to mean "representative of gamers worldwide" or some such grand claim.

Instead, all they claim is "data about what kinds of computer hardware our customers are using". They have it right. All Steam can hope to be indicative of is what hardware THEIR customers are using, not "gamers in general", "gamers as a whole", or "gamers worldwide". And at the end of the day, that's all Valve cares about anyway, their customers. Sure, maybe Bethesda customers might have radically different hardware. I can't imagine Valve would lose any sleep about it.

There we go.

Schmide · Dec 31, 2009

I laugh again at those who rail against the system to discredit these results. We ended up agreeing it was a "Biased Sample" that, based on our own sampling here, probably encompasses around 60% of AnandTech VC&G readers. So of their desired sample, it actually is a majority.

You could easily say any sample is not a majority including the census if you declare it only a subset of the greater world population.

jvroig · Dec 31, 2009

Schmide said:
I laugh again at those who rail against the system to discredit these results.

It's not so much as "discrediting" as it is "limiting". Taken for what it is, and what Valve seems to take it for, the Steam survey can very well be spot on: Valve's customers.

But when it is taken to a higher level, and it is stated almost as fact that Steam's results will reflect "all gamers" (a pretty huge population), it doesn't pass the test of being a random sample.

Schmide said:
You could easily say any sample is not a majority including the census if you declare it only a subset of the greater world population.

Who's talking about "majority"? There's no majority involved. That's not how samples work. A sample is either a random sample or not, and that determines its validity in as much as it can be expected to be reflective of the entire population, taking into account the margin of sampling error. I don't believe I have even mentioned the word "majority" at all, since it is not a factor.

Schmide · Dec 31, 2009

jvroig said:
Who's talking about "majority"? There's no majority involved. That's not how samples work. A sample is either a random sample or not, and that determines its validity in as much as it can be expected to be reflective of the entire population, taking into account the margin of sampling error. I don't believe I have even mentioned the word "majority" at all, since it is not a factor.

Keys mentioned majority. Duh. Did I specifically address you?

I was talking about those who use Steam; our poll implied that a majority of Steam users do participate in the survey.

Even a random sample has its limitations, in certain cases a biased sample could be considered advantageous.

Hauk · Dec 31, 2009

Looks like I missed an interesting thread when it was first covered.

@ jvroig, thanks for the comprehensive breakdown. This is the stuff that makes anand forums a cut above the rest.

dguy6789 · Dec 31, 2009

Steam is absolutely a reasonable resource for data. Aside from World of Warcraft, the top two or three most played PC games right now all require Steam to run. I recall that the poll done on this forum showed more than 50% of everyone who voted to use Steam.

http://forums.anandtech.com/showthread.php?t=2029165&highlight=

Voo · Dec 31, 2009

@jvroig: Interesting read, but that means that every voluntary poll doesn't use random samples, because you can only interview those people who want to, right?
So this is a rather hard requirement, isn't it? I agree with your conclusion, but it seems like no usual poll could ever satisfy these requirements..

Nemesis 1 · Dec 31, 2009

Keysplayr said:
We've already discussed Steam's survey data. Even had a poll to show how many people actually use and allow their systems to be scanned. I forget the exact results, but it has shown that nowhere near everybody uses Steam. Steam cannot be used as an accurate indicator of every gaming rig out there. Not even close. All it can tell you about are the people who actually use Steam. Which are many, but nowhere near the majority.

Keys is pretty sharp. According to that survey Intel is not a monopoly.

Schmide · Dec 31, 2009

Nemesis 1 said:
Keys is pretty sharp. According to that survey Intel is not a monopoly.

You don't even need a majority of the segment to be declared a monopoly, you need to read up on monopoly law.

Ross Ridge · Dec 31, 2009

While statistically flawed, the Steam Survey is the *best* source of data we have on what kind of hardware mainstream PC gamers are actually using. The only other sources of data are other companies' surveys which are even more statistically flawed, and almost always kept confidential. You can't even buy better statistically "valid" data from market research companies like NPD. It's just not available.

Because it is the best source of information, game companies, companies that are putting their money on the line, do in fact make business decisions based on it. According to unbiased and statistically accurate reports, Intel has something like 50% of the GPU market. You'd have to be a fool to release a game that doesn't support Intel graphics, wouldn't you? However, the majority of games released today don't support Intel graphics. Game companies put more trust in the Steam Survey or their own flawed surveys which show that very few people that actually buy games have Intel GPUs.

(Of course the above only applies to mainstream games. For the casual game market, a company would be looking at the similarly flawed Unity 3D hardware statistics and come up with a different conclusion.)

The problem here is that people often assume if something is bad it can't be the best. That imperfect information is worse than no information. However, if it's good enough when real money is on the line, then as far as I'm concerned it's more than good enough for the sort of casual discussions that happen on web forums like this one. It's not like anyone can actually refute the original poster with better data.

cbn · Dec 31, 2009

dguy6789 said:
Steam is absolutely a reasonable resource for data. Aside from World of Warcraft, the top two or three most played PC games right now all require Steam to run. I recall that the poll done on this forum showed more than 50% of everyone who voted to use Steam.

http://forums.anandtech.com/showthread.php?t=2029165&highlight=

That is a pretty high number of people from Anandtech reporting results.

I've never responded to a steam hardware survey myself. How does someone do this? (See that right there is potential for bias. Maybe only the hardest core steam users know how to respond to the survey. This might imply higher than normal hardware is what is getting reported)
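
That kind of opt-in bias can be put into numbers with a short Python sketch. The figures are purely hypothetical (20% of Steam users on high-end hardware, who answer the survey 80% of the time, against 40% for everyone else), and the function name is made up for illustration.

```python
from fractions import Fraction

def observed_share(true_share, response_rate_group, response_rate_rest):
    """Share of a group among survey respondents when the group and
    everyone else respond at different rates."""
    responders_group = true_share * response_rate_group
    responders_rest = (1 - true_share) * response_rate_rest
    return responders_group / (responders_group + responders_rest)

# Hypothetical: 20% own high-end hardware; they respond at 80%, others at 40%.
share = observed_share(Fraction(1, 5), Fraction(4, 5), Fraction(2, 5))
print(f"True share: 20.0%, share among respondents: {float(share):.1%}")
```

If both groups responded at the same rate, the share among respondents would equal the true share. The skew comes entirely from the difference in response rates.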

Schmide · Dec 31, 2009

When you sign up for steam, it asks you if you wish to allow it to share your HW data with steam. I think it rechecks if you want to share it every year or so.

sandorski · Dec 31, 2009

If Steam just took the info and didn't offer the choice, it would be a much better indicator. Because of the Choice it becomes far less reliable, but is still an interesting bit of information and probably accurate enough to at least get a taste of what the trends are.

Got a kick out of the OP being surprised by this survey though. I think most of us have known about this for years now.

jvroig · Jan 1, 2010

Schmide said:
Keys mentioned majority. Duh. Did I specifically address you?

My apologies then. Keysplayr's post was three posts above my last, and four posts above yours. It got me confused into thinking you were then addressing me. Honest mistake, sorry.

Hauk said:
@ jvroig, thanks for the comprehensive breakdown.

Now that I read it, it is too long. Long enough that it might be misconstrued as an attack against Steam. So while it is indeed not a random sample, let me highlight one of my concluding sentences there:

jvroig said:
Please understand that, despite this, Steam is probably one of the best data sources available for game devs anyway.

So yes, I did try to explain how the Steam survey is not a random sample, therefore we can't expect it to be as reflective of the whole "gamer population" (if you mean it as broad as that to be able to say that "most gamers prefer x over y" based on Steam alone). Still, it's the best we've got out there right now, and although the exact figures won't mean anything (and surprise, no MOSE information, because MOSE is impossible here), the trends it presents should be true if you take it for what it is, and don't try to use it for what it is not.

What could be better than Steam? Another "Steam-like" HW profile gatherer from a different developer, perhaps BioWare or Bethesda, who also release major titles but not FPS games. Let them collect data as well. That won't be "better than Steam", but having Steam survey data AND that new Bethesda/BioWare/Whatever survey will allow us to see more data. Of course, this is all up to the developers. For all we know, whenever we register our copies electronically, they might already be taking our hardware profiles and mining the data; they just don't release results because they never asked for permission in the first place and they know they'll get in trouble from people who will feel "violated" about it.

Obsoleet · Jan 1, 2010

I installed Steam on my laptop and desktop.. and I don't think I've ever gotten the popup to submit for my desktop. Is there any way to force it to check on that machine? My laptop is Intel integrated! My desktop is a 5870 / quadcore. Be nice to have that reporting instead
