Handling Small Files

FrankSaucedo

Member
Mar 3, 2004
104
0
71
I use Windows XP Professional. Does anyone know of any way to optimize the way Windows handles small files? Doing a simple copy and paste, it takes 31 seconds to copy a single 300MB file, but 3 minutes 58 seconds to copy 300MB spread across 24,380 files.

I ask because at work we use a 3D modeling application that stores jobs as tons of small files. One job can easily have 25,000 files averaging about 13 kilobytes each. I'm wondering if there are any tweaks that would improve performance.

I don't compress the files in any way.
I have Auto-Protect disabled in Norton Antivirus.
 

Smilin

Diamond Member
Mar 4, 2002
7,357
0
0
There will be a certain amount of overhead for performing a copy on a file. Each file touched will need to make an alteration to the MFT/FAT. This overhead does not decrease for a small file. A fast disk with a low seek time is your best bet.

Also keep in mind that 25,000 files cannot take up only 13k. With a 4k cluster size it would be 100MB minimum.

It sounds like the application needs to be optimized. That's a terrible way to do things. Is SolidWorks your 3D package?


Edit:
I read that wrong. 25,000 files that average 13k in size would take up roughly 400MB.
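
A quick back-of-the-envelope check (a small C sketch; the 25,000 file count, ~13k average size and 4k cluster size come from the posts above):

#include <stdio.h>

int main(void)
{
    const long file_count   = 25000;      /* files in one job, from the OP */
    const long avg_size     = 13 * 1024;  /* ~13k average file size */
    const long cluster_size = 4 * 1024;   /* default NTFS cluster size */

    /* each file is rounded up to a whole number of clusters on disk */
    long clusters_per_file = (avg_size + cluster_size - 1) / cluster_size;
    long on_disk = file_count * clusters_per_file * cluster_size;

    printf("data size   : %ld MB\n", file_count * avg_size / 1000000);
    printf("on-disk size: %ld MB\n", on_disk / 1000000);
    return 0;
}

A 13k file needs four 4k clusters (16k allocated), so the job occupies roughly 400MB on disk even though it only holds around 330MB of data.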
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Think about it: to copy one file you do open(source), open(destination), read(), write(), read(), write(), etc., close(source), close(destination). To copy many files you have to repeat the same operations 25,000 times, and the initial opens are what take the longest, since for the source security checks have to be made and for the destination the MFT/FAT needs to be updated with the new file's information.
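
Roughly what that sequence looks like per file (a minimal POSIX sketch; the buffer size is arbitrary, error handling is trimmed, and it's illustrative rather than how Explorer actually copies):

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy one file: the open()/close() pair is fixed overhead paid per file,
 * no matter how little data the read()/write() loop actually moves. */
static int copy_file(const char *src, const char *dst)
{
    char buf[64 * 1024];
    ssize_t n;

    int in = open(src, O_RDONLY);                             /* lookup + security checks */
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* new MFT/FAT entry */
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0)               /* a 13k file needs one pass here */
        write(out, buf, (size_t)n);

    close(in);
    close(out);
    return 0;
}

For a single 300MB file the read/write loop dominates; for 25,000 small files it's the 25,000 open/close pairs, and the metadata updates behind them, that eat the time.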
 

DnetMHZ

Diamond Member
Apr 10, 2001
9,826
1
81
Imagine placing $300 in one envelope and then placing 30,000 pennies one at a time in separate envelopes... which is going to take longer?
 

FrankSaucedo

Member
Mar 3, 2004
104
0
71
I see (said the blind man)


That's beginning to make more sense. I hadn't taken the cluster size into consideration.

The application we're using is Design Data's SDS2. It's a very expensive steel modeling application. I agree that it's a very bad way of storing the data. I think part of the reason is that the application used to run only on Unix and was ported to Windows a few years ago. I think this file approach ran better back in the Unix days.

The performance is even worse when the files are stored on the file server. Since most of the time only one person works on a job at a given time, I end up copying the entire job to my computer, working on it, and copying it back to the file server.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
I think this file approach ran better back in the Unix days

Back in the Unix days? Why don't you see if they have a Linux port? Then you could possibly run it on the same hardware but with a filesystem that handles lots of small files better.

The performance is even worse when the files are stored on the file server

Yup, because now instead of a local open() you're doing a network request, which has higher base latency, and if you hit any other latency, like disk access from other clients, it only goes downhill from there.
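
A rough number to go with that (the 1 ms round trip is an assumed LAN figure, not something measured in this thread):

#include <stdio.h>

int main(void)
{
    const long file_count = 25000;   /* files in one job, from the OP */
    const double rtt_ms   = 1.0;     /* assumed LAN round-trip time per remote open */

    /* every remote open costs at least one network round trip before any data moves */
    printf("minimum added latency: %.1f seconds\n", file_count * rtt_ms / 1000.0);
    return 0;
}

Even at an optimistic 1 ms per open that's about 25 seconds of pure latency across 25,000 files, and real SMB traffic needs several round trips per file.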
 

VirtualLarry

No Lifer
Aug 25, 2001
56,570
10,202
126
Originally posted by: Smilin
There will be a certain amount of overhead for performing a copy on a file. Each file touched will need to make an alteration to the MFT/FAT. This overhead does not decrease for a small file. A fast disk with a low seek time is your best bet.

It sounds like the application needs to be optimized. That's a terrible way to do things. Is SolidWorks your 3D package?

Totally agreed.

Btw, do you happen to know offhand the registry tweak for increasing the number of files that W2K or XP will cache on a FAT32 filesystem on a per-directory basis?

I heard something about a roughly 2,000-file limit in W2K at least, after which each FindFirst() call has to scan through the entire directory for entries rather than having the lookups cached. I can attest firsthand to behavior like that. I have some directories of old bookmarks with close to 10,000 small files in a single directory, and the slowdown is an order-of-magnitude difference.
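
For reference, this is the enumeration pattern being described (a minimal Win32 sketch; C:\bookmarks is just a placeholder directory, and whether each pass hits the cache or rescans the directory depends on the filesystem and the cache limit mentioned above):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    WIN32_FIND_DATAA fd;
    long count = 0;

    /* walk every entry in one large directory; on FAT32 past the cache
     * limit each pass can mean re-reading the directory from disk */
    HANDLE h = FindFirstFileA("C:\\bookmarks\\*", &fd);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    do {
        count++;
    } while (FindNextFileA(h, &fd));

    FindClose(h);
    printf("%ld entries\n", count);
    return 0;
}

NTFS keeps directories in a B-tree index, so lookups stay fast even in big folders; a FAT32 directory is a flat list, which is why a 10,000-file folder hurts so much more.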

If the OP is running FAT32, that could well be a significant part of the slowdown, and increasing that limit (along with adding more RAM) could significantly alleviate the problem. This is one case where FAT32 is a real drag on performance.