File name character encoding in Unix

Daverino

Platinum Member
Mar 15, 2007
I have an FTP site that receives files from time to time, and I have a Perl script that runs over the directory and moves things around automatically for me. Trouble is, lately I've seen some files with Windows-1252 and Big-5 characters in the file names. Perl can't move those files because it expects UTF-8 and scrambles the characters when it reads them from the directory. I could easily convert things correctly using Perl's Encode *if* I knew what encoding each file name was in. Most of the files do use proper UTF-8 file names, so I can't just switch encodings without breaking what I usually do. The files aren't text, so I can't use 'file' to get the encoding.

I've tried things like guessEncoding before, but it's been unreliable. Is there a way to get Unix to be consistent about the encoding of its file names? Maybe get FTP to do it somehow?
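
For reference, this is roughly the shape of the fallback I've been playing with, using Encode and Encode::Guess. It's only a rough sketch: the directory path is a placeholder, it assumes the only stray encodings are Windows-1252 and Big-5, and the guess can still be wrong on short names, which is the unreliability I mentioned.

#!/usr/bin/perl
# Rough sketch: normalize incoming file names to UTF-8.
# Assumes non-UTF-8 names are either Windows-1252 (cp1252) or Big5.
use strict;
use warnings;
use Encode qw(decode encode);
use Encode::Guess;

my $dir = '/srv/ftp/incoming';   # placeholder path

opendir my $dh, $dir or die "Can't open $dir: $!";
for my $name (readdir $dh) {
    next if $name eq '.' or $name eq '..';

    # readdir gives raw bytes; see if they are already valid UTF-8.
    my $chars = eval { decode('UTF-8', $name, Encode::FB_CROAK) };
    next if defined $chars;      # valid UTF-8, leave it alone

    # Not valid UTF-8: guess between the encodings we expect to see.
    my $enc = guess_encoding($name, qw(cp1252 big5));
    unless (ref $enc) {
        warn "Can't guess encoding for '$name': $enc\n";
        next;
    }

    # Rename the file to the UTF-8 form of its guessed name.
    my $utf8_name = encode('UTF-8', $enc->decode($name));
    rename "$dir/$name", "$dir/$utf8_name"
        or warn "rename failed for '$name': $!\n";
}
closedir $dh;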
 

Nothinman

Elite Member
Sep 14, 2001
That's a strange problem. I would have guessed that Perl's normal opendir, rename, etc. functions would work fine as long as the system's ls, mv, etc. commands handle the names fine too.

I just ran a quick test: I created a file with Japanese characters in the name, walked over the directory, and opened everything, and it worked fine. But mine were probably all UTF-8, and I don't know how to create file names with a different encoding.
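
Maybe something along these lines would create one for testing, though I haven't actually tried it, and the characters are just an example:

#!/usr/bin/perl
# Untested sketch: create a file whose name is Big5-encoded bytes instead
# of UTF-8, to reproduce a mixed-encoding directory for testing.
use strict;
use warnings;
use Encode qw(encode);

# "\x{4E2D}\x{6587}" is just an example pair of Chinese characters,
# encoded to Big5 bytes and used directly as the file name.
my $big5_name = encode('big5', "\x{4E2D}\x{6587}");

open my $fh, '>', $big5_name or die "Can't create test file: $!";
close $fh;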