Unicode/UTF-8 problems in Java

jman19

Lifer
Nov 3, 2000
I'm trying to pass around some Arabic strings encoded in UTF-8, but I'm getting some strange symbols (specifically a "box" character) that are messing things up. I've tried to make sure the charset for every stream is UTF-8, but I'm still having problems. Any ideas why?
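A minimal sketch (Java 7+, for `StandardCharsets`) of why the charset has to match at every step: the same UTF-8 bytes only turn back into the original Arabic text when they are decoded as UTF-8. ISO-8859-1 is used here purely as a hypothetical stand-in for "some other charset", such as a platform default that isn't UTF-8. A mismatch like this usually produces garbled Latin characters rather than boxes, so it is one possibility to rule out, not a definitive diagnosis.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Encode s as UTF-8 bytes, then decode those bytes with the given charset.
    static String roundTrip(String s, Charset decodeAs) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeAs);
    }

    public static void main(String[] args) {
        // The Arabic word "salam": U+0633 U+0644 U+0627 U+0645 (2 bytes each in UTF-8)
        String arabic = "\u0633\u0644\u0627\u0645";
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));      // true
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1).equals(arabic)); // false
        // ISO-8859-1 maps each of the 8 bytes to its own char
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1).length());       // 8
    }
}
```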
 

znaps

Senior member
Jan 15, 2004
414
0
0
Yah, more info please. You basically posted "UTF-8, Java, box characters, help"
 

jman19

Lifer
Nov 3, 2000
Sorry.

Anyway, I think I've pinpointed the problem. Basically, I have to read in a file of Arabic sentences, store them in a structure, send copies of the sentences to server programs, get their translations, etc. The problem occurs when reading the sentences in from their original files: when I write the String "text" to a file, I get the messed-up box character I mentioned above. The code used to read them and save them is as follows:
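Independently of the reading code, one way to find out what the box actually is: print the code points of the suspect string. A box is often just a character the font can't draw, so knowing its code point narrows things down. Two possibilities worth ruling out are U+FEFF (a byte-order mark some editors write at the start of UTF-8 files, which Java's UTF-8 decoder does not strip) and U+FFFD (the replacement character produced when bytes aren't valid in the charset used to decode them). This is a minimal sketch assuming Java 8+ (`String.codePoints()`); the input string is hypothetical.

```java
public class CodePointDump {
    // Return each code point of s as "U+XXXX", separated by spaces,
    // so an invisible or "box" character can be identified.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical input: a line that starts with a byte-order mark
        String line = "\uFEFF\u0633\u0644\u0627\u0645";
        System.out.println(dump(line)); // U+FEFF U+0633 U+0644 U+0627 U+0645
    }
}
```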
 

kamper

Diamond Member
Mar 18, 2003
Two suggestions:
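-Are you sure you have the correct fonts installed? Maybe the characters are correct and just aren't being displayed.
-Have you tried reading through a Reader instead of a raw FileInputStream? The Reader/Writer classes have character encoding support built in. Note that a plain FileReader uses the platform's default encoding, so for UTF-8 wrap the FileInputStream in an InputStreamReader and pass the charset explicitly.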
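A minimal sketch of the Reader/Writer approach, assuming Java 7+ (try-with-resources, `StandardCharsets`): the file is read through an `InputStreamReader` and written through an `OutputStreamWriter`, each given UTF-8 explicitly rather than relying on the platform default. The class and method names are hypothetical. On Java 11+, `FileReader` and `FileWriter` also have constructors that take a `Charset`.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Utf8FileIO {
    // Read every line of a file, decoding the bytes explicitly as UTF-8.
    static List<String> readLines(File f) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Write lines to a file, encoding them explicitly as UTF-8.
    static void writeLines(File f, List<String> lines) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }

    // Write the lines to a temporary file and read them back; true if nothing was lost.
    static boolean roundTrip(List<String> lines) {
        try {
            File tmp = File.createTempFile("arabic", ".txt");
            tmp.deleteOnExit();
            writeLines(tmp, lines);
            return readLines(tmp).equals(lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("\u0633\u0644\u0627\u0645"))); // true
    }
}
```

Because both directions name the charset, the result no longer depends on the machine's default encoding; if the box still appears after this, the font suggestion above becomes the more likely explanation.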