Program or script for Linux that can search for duplicate files on HD?

Electrode

Diamond Member
May 4, 2001
6,063
2
81
Hey all. I was just running my monthly backup on my file server, and I noticed that the hard drive I back everything up to is 98% full, according to df. Until I get some money for a nice 160GB drive, the only way to free up some space for my ever-expanding collection of stuff is to delete some unneeded files.

Now, I know that there are many, many duplicates scattered around the drive. But there are thousands upon thousands of files, strewn across hundreds of directories. Searching by hand is not feasible, so I need a program to do it. Can anyone recommend a program or script that does this?

Thanks in advance!
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Here, try this. It only compares filenames, though, and isn't blazingly fast.
Put this in a file (from the line '#!/usr/bin/python' on).
Make it executable.
Call it with one argument for each of the paths you want to look in.


#!/usr/bin/python

import os
import sys

def add_file(file_list, curr_dir, files):
    for file in files:
        file = curr_dir + '/' + file
        if os.path.isfile(file):
            file_list.append(file)

def compare_by_basename(file1, file2):
    return cmp(os.path.basename(file1), os.path.basename(file2))

files = []

for j in range(1, len(sys.argv)):
    path = os.path.abspath(sys.argv[j])
    os.path.walk(path, add_file, files)

files.sort(compare_by_basename)

test = 0
testnm = os.path.basename(files[test])
first = 1
for j in range(1, len(files)):
    if os.path.basename(files[j]) == testnm and files[j] != files[test]:
        if first:
            print "\nMATCH **************************\n\t", files[test]
            first = 0
        print "\t", files[j]
    else:
        test = j
        testnm = os.path.basename(files[test])
        first = 1

#End

edit: changed subscript from i to j to avoid italics tag problem
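
Since the script above only compares filenames, it will miss copies that were renamed and flag unrelated files that happen to share a name. Below is a rough sketch of a content-based variant. It is not part of the original script, and it assumes Python 3 (os.walk and hashlib) rather than the Python 2 used above. The names file_digest and find_duplicates are made up for illustration. It groups files by size first, then hashes only the files whose size is shared, and it does not handle unreadable files (an OSError would stop it).

```python
#!/usr/bin/env python3
# Sketch only: assumes Python 3; function names are illustrative.
import hashlib
import os
import sys
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return sorted groups of paths whose contents share a SHA-256 digest."""
    by_size = defaultdict(list)
    seen = set()  # real paths already recorded, in case roots overlap
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symlinks and special files
                real = os.path.realpath(path)
                if real in seen:
                    continue
                seen.add(real)
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


if __name__ == "__main__":
    # Same calling convention as the script above: one argument per path.
    for group in find_duplicates(sys.argv[1:]):
        print("\nMATCH **************************")
        for path in group:
            print("\t" + path)
```

Grouping by size first keeps the hashing work down, since most files on a big drive have a size nobody else shares. Hard links to the same file will still show up as duplicates here, so check before deleting anything.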
 

Electrode

Thanks!

edit: I ran it, but got this error:

File "./compare.py", line 7
for file in files:
^

IndentationError: expected an indented block


I guess Python is sensitive to whitespace, but the forums throw it out. Perhaps you could e-mail me the script? ROT13 ryrpgebqr@tnenaqarg.arg
 

Armitage

Oops, I guess Python isn't the best choice for forum postings :(
Is there an easy way to decode that ROT13 address? Or just send it to me via PM.
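
For what it's worth, ROT13 can be decoded from Python itself. This is a minimal sketch assuming Python 3, where the standard codecs module provides a "rot_13" text transform. The rot13 helper name is just for illustration, and the sample string is a stand-in rather than the address above.

```python
import codecs


def rot13(text):
    """Rotate each ASCII letter by 13 places; other characters pass through."""
    return codecs.encode(text, "rot_13")


# ROT13 is its own inverse, so the same call both encodes and decodes.
print(rot13("Uryyb, Jbeyq!"))  # prints "Hello, World!"
```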
 

Armitage

Originally posted by: Electrode
If you have a UNIX-like system, you probably have a /usr/bin/rot13 or /usr/games/rot13. If not, here's a web-based one.

Huh, interesting. That's the first place I looked, but it isn't in my path on three different Unix boxes (Red Hat, SuSE, IRIX).
Anyway, it's on its way.