Originally posted by: Smilin
Wait a second. What the hell does this mean:
<blockquote>quote:
The 174770 virus samples were chosen using VS2000 according to Kaspersky, F-Prot, Nod32, Dr.Web, BitDefender and McAfee antivirus programs.</blockquote>
???
The biggest problem in holding a large-scale malware test is ensuring that your sample set consists of valid, functioning malware.
Sure, you can download large amounts of malware from various sources, or even use honeypots, but a large number of the samples will be non-functioning.
Supposedly, to do well in tests, some antiviruses will detect everything, even junk files that are not really a threat. All they do is create a dumb MD5 hash of the file and presto, it's detected! Never mind if it's so buggy that it would never run at all.
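To make the "dumb hash" point concrete, here is a minimal sketch in Python. It is not any vendor's actual engine; the blocklist, the sample bytes and the function names are all made up for illustration. It just shows that a pure hash lookup says nothing about whether the file can actually run.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


# Hypothetical corrupted sample: truncated, so it would never execute.
junk_sample = b"MZ\x00\x00 truncated junk that would never execute"

# Hypothetical blocklist built by hashing everything in a collected archive,
# with no check that the files work.
blocklist = {md5_of(junk_sample)}


def hash_detects(data: bytes) -> bool:
    """'Detect' a file purely because its hash is on the blocklist."""
    return md5_of(data) in blocklist


print(hash_detects(junk_sample))            # the junk file is "detected"
print(hash_detects(junk_sample + b"\x00"))  # one extra byte and it is missed
```

So the scanner scores a hit on the junk file in a test, while a trivially altered copy sails through.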
To solve this problem of junk files, some testers scan their large virus archive with X number of reference antiviruses, and any sample not detected by any of them is dropped from the test set as non-functioning.
This is of course pretty silly (just because no antivirus detects a sample doesn't mean it's not a threat), but worse, it obviously gives these reference antiviruses a HUGE advantage in the tests compared to the others.
It's almost like letting some of the examinees set the exam questions, and then having everyone, including those who wrote the questions, sit the exam.
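A toy illustration of that bias, in Python. The scanner names, sample names and detection results below are invented, and this is only a sketch of the filtering idea as I understand it, not the actual procedure used in the test being discussed.

```python
def filter_by_reference(samples, detections, reference):
    """Keep only samples flagged by at least one reference scanner."""
    return [s for s in samples
            if any(s in detections[av] for av in reference)]


def detection_rate(av, test_set, detections):
    """Fraction of the test set that scanner `av` flags."""
    if not test_set:
        return 0.0
    return sum(s in detections[av] for s in test_set) / len(test_set)


# Hypothetical archive of six samples and what each scanner flags.
samples = ["a", "b", "c", "d", "e", "f"]
detections = {
    "RefAV": {"a", "b", "c"},    # used as the reference scanner
    "OtherAV": {"a", "d", "e"},  # not a reference scanner
}

# On the unfiltered archive both scanners flag three of six samples.
full_ref = detection_rate("RefAV", samples, detections)
full_other = detection_rate("OtherAV", samples, detections)

# After filtering, only samples RefAV already detects are left.
test_set = filter_by_reference(samples, detections, ["RefAV"])
ref_rate = detection_rate("RefAV", test_set, detections)
other_rate = detection_rate("OtherAV", test_set, detections)

print(full_ref, full_other)  # equal before filtering
print(ref_rate, other_rate)  # RefAV is perfect, OtherAV drops
```

Both scanners are equally good on the raw archive, but once the set is filtered the reference scanner scores 100% by construction, and everything only OtherAV caught has been thrown away as "non-functioning".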
Other methodologies, both automated and semi-automated, have of course been used to weed out the chaff (e.g. labelling a file as malware only if x number of antiviruses detect it). This surprisingly isn't as good at filtering junk files as you might think, because some antiviruses mirror each other: if they see antivirus X detecting file A, they will just follow suit without bothering to analyse whether it actually works. In general, though, you need humans to examine the code to be sure.
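Here is a rough Python sketch of why the "x antiviruses must agree" rule gets fooled by mirroring. The scanner names and verdicts are hypothetical; the point is only that a copied verdict counts as an independent vote.

```python
def vote_filter(samples, detections, min_votes):
    """Keep samples flagged by at least `min_votes` scanners."""
    return [s for s in samples
            if sum(s in flagged for flagged in detections.values()) >= min_votes]


samples = ["junk", "real"]

# Hypothetical verdicts: X flags a broken junk file by hash, Y only flags
# the sample that actually works.
independent = {
    "X": {"junk", "real"},
    "Y": {"real"},
}

# Same scanners plus one that simply copies whatever X detects.
with_mirror = dict(independent, MirrorOfX=set(independent["X"]))

print(vote_filter(samples, independent, 2))  # junk has only one vote
print(vote_filter(samples, with_mirror, 2))  # junk now has two votes
```

With two genuinely independent scanners the junk file is filtered out; add one scanner that mirrors X and the same junk file clears the two-vote threshold without anyone having analysed it.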
Sorry if I stated the obvious...