Why would a text editor have trouble opening a file? It's just text!

Red Squirrel

No Lifer
May 24, 2003
70,166
13,573
126
www.anyf.ca
I get an error on one specific file (it's just text) saying the editor can't open it. Something about the character encoding? Huh? It's plain text! Just open it! It opens in other programs, but I don't want to have to use a different program just for that one file. It's a script that's part of a program. Is the editor trying to detect the programming language so it can parse it or something? Is there a way to bypass that if it's really a problem?

The text editor is called Xed. It's just the one that's built into Mint. This is the exact error I get:

Code:
xed has not been able to detect the character encoding.
Please check that you are not trying to open a binary file.
Select a character encoding from the menu and try again.

I know for a fact that it's a text file because it opens fine in Windows Notepad. It's the same format as about 20 other files in the same folder, which all open fine.

Ok so this file is really weird. I opened it in another program in Linux and it shows up as Chinese. But it's just plain text! How is it not "decoding" it right, when it's plain text? There's no decoding to do, just display it byte per byte!
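One common way for plain ASCII to "show up as Chinese" is a program guessing UTF-16, where every two bytes become one character. Whether that is what happened with this particular file is an assumption; the Python sketch below only shows the effect on a header line taken from the script, and the helper name cjk_count is made up for illustration.

```python
# A header line from the script, as ASCII bytes on disk (24 bytes).
raw = b"RunUO AoV C# script file"

as_ascii = raw.decode("ascii")      # one byte  -> one character
as_utf16 = raw.decode("utf-16-le")  # two bytes -> one character (the wrong guess)

def cjk_count(text):
    """Count characters in the main CJK Unified Ideographs block (hypothetical helper)."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")

print(as_ascii)
print(as_utf16)
print(len(as_ascii), len(as_utf16), cjk_count(as_utf16))
```

The bytes never change; only the rule for turning them into characters does, which is why "just display it byte per byte" is itself a choice of encoding.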

Made an interesting discovery: it seems it does not like anything that has the word "Checksum:" in it.

This file causes certain programs to puke (Notepad and Vim open it fine though; this is copied from Vim):

Code:
/**********************************************************
RunUO AoV C# script file
Official Age of Valor Script :: www.uovalor.com
Last modified by Red Squirrel on Jan-24-2016 09:06:43pm
Checksum: 8B2E68E2285230F813FC9B3AD3C59F6
Filepath: scripts/Multis/BaseHouse.cs
Lines of code: 3039

Description: 

***********************************************************/
^M


using System;
using System.Collections;
using System.Collections.Generic;
using Server;
using Server.Items;
using Server.Mobiles;
using Server.Multis.Deeds;
using Server.Regions;
using Server.Network;

Not sure what that ^M is about, but wondering if that's playing a role somehow?

While this file is ok:


Code:
/**********************************************************
RunUO AoV C# script file
Official Age of Valor Script :: www.uovalor.com
Last modified by 
Checksum: C017F922BFCD865C71A395CBBB262B1
Filepath: scripts/Multis/Deeds.cs
Lines of code: 693

Description: 

***********************************************************/



using Server;
using System;
using System.Collections;
using Server.Multis;
using Server.Targeting;
using Server.Items;



(cut off the rest just to keep it short)
 

Essence_of_War

Platinum Member
Feb 21, 2013
2,650
4
81
But it's just plain text! How is it not "decoding" it right, when it's plain text? There's no decoding to do, just display it byte per byte!

There is no such thing as plain text. Everything has an encoding.

https://www.joelonsoftware.com/2003...-about-unicode-and-character-sets-no-excuses/

My guess is that the "^M" is actually your problem, or at least a symptom of it. "^M" is how a carriage return character gets displayed. DOS/Windows ends each line with a carriage return followed by a newline character, while Unix-likes use the newline character alone. These carriage returns could be confusing xed into thinking the file has a different character encoding than it actually has. If the file was created on DOS/Windows and you're trying to use it on Linux, you could try passing it through dos2unix.
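To check the line-ending theory without trusting how an editor renders the file, the raw bytes can be counted directly. This is a minimal Python sketch, not the dos2unix implementation: the function names and the sample bytes are made up for illustration, and the sample just imitates a file that is mostly LF with one stray CRLF line, like the lone ^M in the Vim paste.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, CRs not followed by LF, and LFs not preceded by CR."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lone_cr": data.count(b"\r") - crlf,
        "lone_lf": data.count(b"\n") - crlf,
    }

def to_unix(data: bytes) -> bytes:
    """Rewrite CRLF pairs as LF; lone CR bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")

# Hypothetical sample: LF line endings with one CRLF line mixed in.
sample = b"***/\n\r\n\nusing System;\n"

print(count_line_endings(sample))
print(to_unix(sample))

# For a real file, read it in binary mode so nothing gets translated:
# with open("BaseHouse.cs", "rb") as fh:
#     print(count_line_endings(fh.read()))
```

If the counts show only LF (or only CRLF) throughout, line endings are probably not what the editor is tripping over.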
 

Red Squirrel



That was a rather interesting read. TBH I always kinda wondered how Unicode even works, and how you store so many characters in 1 byte. Although I'm still kinda confused. From what I gather, even for a plain text file you need some kind of header? But won't that throw off the compiler, in the case of code? What header do I need for UTF-8, which I presume is what code would use? And why is it only affecting certain files?

As a side note, what is the proper way of doing carriage returns, is it \r\n or just \n? Or just \r? I usually do \r\n.


Edit: If anyone is bored, here are two files: one opens, one does not. As far as I can tell they're both basically the same style.

http://www.uovalor.com/misc/textissue.tar.gz
 

Red Squirrel

Still confused as hell with this. I even opened the file with a hex editor to look for anything that seems off, like some kind of Unicode header, but found nothing. All returns seem to be 0x0A, which I think is correct, right? As a test, I opened the file in an editor that can open it, copied and pasted the content into a NEW file and saved it, and now I can actually open it. So what kind of sorcery is this encoding stuff? Where is that stored? Is it stored as a file attribute? How can the exact same text copied into another file work, but the other one not?

But what is baffling to me is how it's only the oddball file that does it. I also tried to pipe the file to another file but that didn't work. But maybe that keeps the attributes?
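For what it's worth, the encoding of a file like this is normally not stored anywhere: not in a header (unless there is a byte-order mark at the very start) and not as a file attribute. The editor has to guess it from the bytes, and pasting into a new file makes the editor write fresh bytes in its own default encoding, which would fit with the copy opening fine. Here is a minimal Python sketch of the checks a hex-editor pass is after. The function name inspect_bytes and the sample bytes are made up for illustration, and a stray non-UTF-8 byte is only one guess at what is wrong with the real file.

```python
import codecs

def inspect_bytes(data: bytes) -> dict:
    """Report byte-level clues an editor might use when guessing an encoding."""
    report = {
        "utf8_bom": data.startswith(codecs.BOM_UTF8),
        "utf16_bom": data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)),
        "nul_bytes": data.count(b"\x00"),
        # Offsets and values of the first few bytes outside 7-bit ASCII.
        "non_ascii": [(i, hex(b)) for i, b in enumerate(data) if b >= 0x80][:10],
        "utf8_error": None,
    }
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Offset of the first byte that is not valid UTF-8, plus the reason.
        report["utf8_error"] = (exc.start, exc.reason)
    return report

# Hypothetical sample: ASCII text with one Latin-1 byte (0xE9) in the middle.
sample = b"Last modified by Red Squirrel\nDescription: caf\xe9\n"
print(inspect_bytes(sample))

# For the real file:
# with open("BaseHouse.cs", "rb") as fh:
#     print(inspect_bytes(fh.read()))
```

If "non_ascii" comes back empty and there are no NUL bytes, the file really is pure ASCII and the problem lies elsewhere; if "utf8_error" is set, the reported offset is the place to look in the hex editor.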
 
Feb 25, 2011
16,991
1,620
126
1) Did you run the file through the dos2unix utility?
2) Are there maybe some libraries you're missing? On Mint 17, I tried to install/use Kate, which I had used previously on Ubuntu. It was all weird, missing features, and crashed whenever I saved files. Turned out there were 4-5 packages it needed installed first that weren't listed as dependencies.

Try running/invoking the editor from a command line, preferably with a verbose option, and then search for the (hopefully) resulting error output - I bet you'll find somebody with a fix.
 

Red Squirrel

Yeah, I tried the utility and it did not change anything. Copying and pasting the text from one editor to the other worked though, but it's still kinda baffling.

I'll have to try to find another text file that fails, then I will try with the command line and see if there is a proper error.
 

Red Squirrel

Actually is there a way to get leafpad on Mint? I use a more advanced editor for coding but sometimes I want a separate small text editor too for separate use like notes, reference etc, and Mint does not seem to have Leafpad. Could not find it in repo either.

Had to google catdoc, that sounds interesting. I wonder if it will do ODF and other types of documents too... could be interesting for a file upload site as it could generate a text preview of the document. Could also be a great way to check out a potentially harmful document without opening it.

I will also have to try it next time I run into an unopenable text file.
 

ControlD

Diamond Member
Apr 25, 2005
5,440
44
91
I have leafpad running on Mint. It is available in the Software Manager. If not, just Google "Leafpad Mint" and install from the community website.