Linux file copy with renaming

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
I'm trying to do a large one-time copy, from JFS to NTFS.

Are there any file managers, or basic command-line copy utilities, that, when faced with a file they can't write, will either (a) remove offending characters (autorename), or (b) prompt for renaming?

So far, I've tried cp, rsync, PCManFM, Double Commander, Nautilus, Dolphin, and Konqueror.

Or...is there a utility that will find and list all of the files that will fail to copy due to 'bad' characters in their names, so that I can proactively rename them?
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Which characters are allowed in a filename varies from filesystem to filesystem, although I believe most Linux filesystems are fine with anything except / and NUL, so it's not something that comes up often. You're probably better off just using tar or cpio to put all of the files in one archive; that'll preserve the permissions, timestamps, etc., and the filenames won't matter.
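
A minimal sketch of the archive approach, assuming a tar that supports -C (GNU tar does). The helper name archive_tree and both paths are made up for illustration:

```shell
# archive_tree SRC_DIR ARCHIVE  (hypothetical helper, not from the thread)
# Packs everything under SRC_DIR into a single tar file. The original
# names, permissions and timestamps are stored inside the archive, so the
# destination filesystem only ever sees the archive's own name.
archive_tree() {
    tar -cf "$2" -C "$1" .
}

# Example (placeholder paths):
#   archive_tree /mnt/jfs /mnt/ntfs/backup.tar
#   tar -tf /mnt/ntfs/backup.tar    # list what went in
```

The catch is that the names come back unchanged on extraction, so this only helps if the files can stay archived on the NTFS side.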
 

LCTSI

Member
Aug 17, 2010
93
0
66
Or...is there a utility that will find and list all of the files that will fail to copy due to 'bad' characters in their names, so that I can proactively rename them?

GNU find.
You can specify the bad characters as part of the search criteria, using either the -name option or the -regex option.

for question marks:
find /mnt/drive -name '*\?*'

-regex gives you a lot more flexibility, though, since it can search for several characters at once.

This searches for question marks, equals signs and plus signs:
find /mnt/drive -regex '.*[?=+].*'

Realistically you could just dump all the invalid characters in that bracket.

If you have perl's `rename` command you could also use a similar regex to just rename everything automatically.

find /mnt/drive -depth -name '*[?=+]*' -exec rename -n 's/[?=+](?=[^\/]*$)/_/g' {} +

This will tell you what perl's `rename` command would have renamed with the regex I provided. I have it set to change the ? = and + to underscores. You can change that up some if you like, or have it just delete those characters entirely.

Lots of flexibility there.
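
If perl's `rename` isn't installed, roughly the same thing can be sketched in plain POSIX shell. This is only a sketch under assumptions: sanitize_names and its "apply" argument are made-up names, and it handles just the ? = + set from the example above.

```shell
# sanitize_names DIR [apply]  (hypothetical helper, not from the thread)
# Without "apply" it only prints what would be renamed (a dry run).
# -depth visits a directory's contents before the directory itself, so a
# renamed parent never invalidates paths that are still waiting in the queue.
sanitize_names() {
    find "$1" -depth -name '*[?=+]*' -exec sh -c '
        mode=$1; shift
        for p do
            dir=${p%/*}      # everything before the last slash
            base=${p##*/}    # the last path component only
            new=$(printf "%s" "$base" | tr "?=+" "___")
            if [ "$mode" = apply ]; then
                mv -i -- "$p" "$dir/$new"
            else
                printf "%s -> %s\n" "$p" "$dir/$new"
            fi
        done' sh "${2:-}" {} +
}

# Example (placeholder path):
#   sanitize_names /mnt/drive          # dry run
#   sanitize_names /mnt/drive apply    # really rename
```

mv -i is there so that two names collapsing to the same result prompt instead of silently overwriting each other.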
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
The thing there is that I'd need a list of valid characters to find values that aren't in them (I don't care what the invalid characters are, I just want them gone). But any valid characters not typable in my locale/keyboard layout, I'd still want left alone. The invalid characters all show up as stylized question marks, or question marks in diamonds. I don't know what they are, or how they came to exist, given that all the files had originally been copied from NTFS (maybe if I'd forced a locale on my FSes when mounting the original NTFS partition, a few years ago, they'd have worked fine?).

It honestly surprises me that this is such a hard thing to do, given that usually, Linux/FOSS options are far superior with any kind of file management work. A simple auto-rename dialog, instead of an error canceling the whole operation, would more than suffice.

I ended up just deciding to give up a few hours, and do dir by dir copying, fixing as it happened. Being AHCI SATA->AHCI SATA, even in small files, it stayed above the 40MB/s mark, w/ my WD Green as the source, so it ended up not being that bad.
 

LCTSI

Member
Aug 17, 2010
93
0
66
The thing there is that I'd need a list of valid characters to find values that aren't in them (I don't care what the invalid characters are, I just want them gone).

Well no, you'd just need a list of the characters that are invalid on NTFS to rename them.

Those are U+0000 (NUL), / (slash), \ (backslash), : (colon), * (asterisk), ? (question mark), " (double quote), < (less than), > (greater than), and | (pipe).
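
That list drops straight into a find bracket. A small sketch, where list_ntfs_unsafe is a made-up helper name; NUL and / are left out because a Linux filename can't contain them in the first place:

```shell
# list_ntfs_unsafe DIR  (hypothetical helper, not from the thread)
# Prints every path under DIR whose own name contains one of
#   \ : * ? " < > |
# Inside the brackets * and ? are literal; the backslash is doubled
# because find's glob matching treats a single one as an escape.
list_ntfs_unsafe() {
    find "$1" -name '*[\\:*?"<>|]*'
}

# Example (placeholder path):
#   list_ntfs_unsafe /mnt/drive
```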
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Then, that itself would have been handy to know, even though the failure modes are rather poor (since if that's the case, both FSes are unicode, and Windows->*n*x->Windows should offer a perfect conversion, since *n*x FSes have invalid characters within the same set).
 

LCTSI

Member
Aug 17, 2010
93
0
66
Then, that itself would have been handy to know, even though the failure modes are rather poor (since if that's the case, both FSes are unicode, and Windows->*n*x->Windows should offer a perfect conversion, since *n*x FSes have invalid characters within the same set).

Well, there's an abstraction layer, the VFS, sitting above ext*/btrfs. So what may be valid in the VFS could sometimes not be valid in the filesystem, and vice versa.
Of course, ext3/4 just store names as bytes, so every Unicode character is valid (all bytes except NUL and '/'), and some of them still might not display properly (Unicode defines well over 100,000 characters, and no single font covers them all).
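
Names that show up as a question mark in a diamond are usually bytes that don't decode as valid UTF-8, for example names written in an older 8-bit encoding. A rough sketch for listing them, assuming iconv is installed and that the names are supposed to be UTF-8; list_invalid_utf8 is a made-up helper name:

```shell
# list_invalid_utf8 DIR  (hypothetical helper, not from the thread)
# Feeds each path through a UTF-8 to UTF-8 conversion. iconv exits with a
# non-zero status when the input is not valid UTF-8, and only those paths
# are printed.
list_invalid_utf8() {
    find "$1" -exec sh -c '
        for p do
            printf "%s" "$p" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 ||
                printf "%s\n" "$p"
        done' sh {} +
}

# Example (placeholder path):
#   list_invalid_utf8 /mnt/drive
```

Valid non-English names pass the check untouched, so only the genuinely broken ones get listed.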

I'm assuming (perhaps incorrectly) that you're in a predominantly English-speaking country. Maybe you could have come up with a regex to rename every file whose name has characters outside the POSIX C locale's character set.
POSIX C list:
http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/nls/rbagsposixexample.htm

Of course that really depends on the number of files you notice with bogus characters... the tradeoff of time involved in constructing the regex might eclipse the time you would spend just manually fixing everything.
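
For the English-only case, a cheaper stand-in for the full POSIX C list is "anything outside printable ASCII". A sketch under that narrower assumption (it is not the POSIX C list itself); list_non_ascii is a made-up helper name:

```shell
# list_non_ascii DIR  (hypothetical helper, not from the thread)
# LC_ALL=C makes find compare names byte by byte, and [! -~] matches any
# byte outside the range from space to tilde, i.e. control characters and
# everything above 0x7E.
list_non_ascii() {
    LC_ALL=C find "$1" -name '*[! -~]*'
}

# Example (placeholder path):
#   list_non_ascii /mnt/drive
```

This also flags perfectly valid accented or non-Latin names, so it works better as a review list than as input to an automatic rename.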

Anyway, glad you got it going though.