[html] How do I stop question mark diamonds from showing up on pages that I code?

Red Squirrel

No Lifer
May 24, 2003
68,643
12,705
126
www.anyf.ca
I must be doing something wrong in how I set up my HTML, as I often see diamond question marks show up for certain characters like accents. This is how I typically start a page:

Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

Is that the right doctype I should be using, or is there something else? I presume that's what affects the way it renders certain characters?

It seems to be more noticeable when the page has to render data that comes from another source, like importing stuff into a database and then displaying it. Even characters like quotes often have it happen, but if I type the quote manually then it's OK.
 

Cogman

Lifer
Sep 19, 2000
10,283
134
106
Use the following

Code:
<!DOCTYPE html>

You are using the old HTML 4 doctype declaration, which is by and large abandoned at this point. The doctype itself does not select the character encoding, though.

Beyond that, what encoding are you using?

You probably want UTF-8, and if that is the case, put the following tag in your <head>:
Code:
<meta charset="UTF-8">

But if not, you can read up on the other available encodings.
 

Red Squirrel

No Lifer
May 24, 2003
68,643
12,705
126
www.anyf.ca
Yeah, still no go. Oddly, if I change encoding in firefox to "western" it works, but I obviously don't want to make people do that. It seems to get hung up on oddball characters too, like quotes, but not all of them. If I type quotes directly in the HTML it's fine, but if the text was submitted by an external source it might mess up.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,385
4,123
75
Oddly, if I change encoding in firefox to "western" it works
Aha! Your documents are encoded in Windows-1252.

Wikipedia said:
It is very common to mislabel Windows-1252 text with the charset label ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Most modern web browsers and e-mail clients treat the media type charset ISO-8859-1 as Windows-1252 to accommodate such mislabeling. This is now standard behavior in the HTML5 specification, which requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.[3]
So you can either use:

Code:
<!DOCTYPE html>
with
Code:
<meta charset="ISO-8859-1">

or maybe:
Code:
<meta charset="Windows-1252">

or you could translate your documents to a newer standard like UTF-8.
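
To make the mismatch concrete, here is a minimal Python sketch (Python is used only for illustration, and the sample string is made up). It decodes the same Windows-1252 bytes two ways: as UTF-8, which is roughly what a browser does when the page is labelled UTF-8, and as Windows-1252, which is what the "Western" setting does.

```python
# Bytes 0x92 and 0x94 are curly quotes in Windows-1252, but on their own
# they are not valid UTF-8, so a UTF-8 decoder substitutes U+FFFD for each
# (U+FFFD is the "question mark diamond").
raw = b"Let\x92s see the 17\x94 screen"  # hypothetical sample text

as_utf8 = raw.decode("utf-8", errors="replace")  # page labelled UTF-8
as_cp1252 = raw.decode("cp1252")                 # browser set to "Western"

print(as_utf8)
print(as_cp1252)
```

The bytes never change between the two calls. Only the label used to interpret them does, which is why either relabelling the page or re-encoding the text can make the diamonds go away.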
 

Red Squirrel

No Lifer
May 24, 2003
68,643
12,705
126
www.anyf.ca
When you say document, what do you mean, like the source code? It's just text, as I type all my pages mostly by hand in a text editor. Though I have run into a weird issue where one text document acts differently than another, and I can't really explain or understand what's going on there. Text encoding seems like a weird thing. As far as I know I'm using UTF-8.

Though here is a sample web page where it does not work correctly: http://www.iceteks.com.

Notice that some of the words that have single quotes show a question mark diamond instead. The doctype changes I made are only local, so that site still uses the old doctype, but the change made no difference locally either.
 

Red Squirrel

No Lifer
May 24, 2003
68,643
12,705
126
www.anyf.ca
I just discovered something interesting, it really does seem to only be data that comes from a database that does it (mysql in this case).

This character is one of them: ”

If I type it directly into the html page, it's fine. But if I have a SQL select statement then display the data and that character is there, then it does not like it.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
It might be a "smart quote" from a Word document. Word changes straight single ' and double " quotation marks into curly angled ones, and dashes into long dashes (en dash, em dash).
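
If plain ASCII punctuation is acceptable, one option is to straighten these characters before the text is stored. The following is only a sketch in Python. The mapping table and the `straighten` name are illustrative assumptions, not something from this thread, and the table covers only the most common cases.

```python
# Map common "smart" punctuation to plain ASCII equivalents.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
})


def straighten(text: str) -> str:
    """Return text with smart quotes and long dashes replaced by ASCII."""
    return text.translate(SMART_TO_ASCII)


print(straighten("\u201cLet\u2019s go\u201d \u2014 17\u201d"))
```

This works on already-decoded text, so it assumes the input was read with the correct encoding in the first place.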
 

Murloc

Diamond Member
Jun 24, 2008
5,382
65
91
I guess that your editor is saving files with the correct encoding since they appear normally in your browser.

Then the problem is just the database encoding, right? Can't you handle the encoding issues after you get the data, or just re-encode the whole database?

PS: I don't know anything about databases but I found questions on stackexchange relating to converting them or having stuff have a certain encoding.
Back-up first.
 

beginner99

Diamond Member
Jun 2, 2009
5,234
1,611
136
I just discovered something interesting, it really does seem to only be data that comes from a database that does it (mysql in this case).

This character is one of them: ”

If I type it directly into the html page, it's fine. But if I have a SQL select statement then display the data and that character is there, then it does not like it.

Then your MySQL is probably set to Windows-1252 encoding. Set the client connection to utf8:

https://stackoverflow.com/questions/4361459/php-pdo-charset-set-names
 

Red Squirrel

No Lifer
May 24, 2003
68,643
12,705
126
www.anyf.ca
Will that do anything weird to the existing data? Is that something I just need to do once when I create the DB or does it need to be called each time I open a connection?

Is there a reason utf8 is not default?
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,102
126
I saved the web page and opened it in PSPad's hex view. The characters in 17" and Let's that show up as question mark diamonds aren't actually the ASCII double quote (decimal 34) and single quote (decimal 39); they are hex 94 and hex 92.

Hex 94 (decimal 148) is displayed as a curly closing double quote and hex 92 (decimal 146) as a curly closing single quote if you open the HTML file in Windows Notepad.

Don't even know how to type them out with a U.S. layout keyboard.

===

Since you said it comes from MySQL database, you might just have to massage the text before displaying it.
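
One way to "massage" the text is sketched below in Python, assuming any bytes that are not valid UTF-8 were originally Windows-1252. That assumption fits the hex 92 and hex 94 bytes found above but is still a guess for arbitrary data, and the `to_utf8` helper is hypothetical.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw as UTF-8 bytes, treating invalid UTF-8 as Windows-1252."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave it untouched
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined; errors="replace"
        # turns those into U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace").encode("utf-8")


print(to_utf8(b"Let\x92s"))
print(to_utf8(b"17\x94"))
```

Text that is already valid UTF-8 passes through unchanged, so running the conversion twice should not double-encode anything.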
 

Red Squirrel

No Lifer
May 24, 2003
68,643
12,705
126
www.anyf.ca
Yeah I kinda suspected it may have been different code. Though what I find odd is if I copy and paste it directly into the source code it shows up right.

I guess I just need to do some str_ireplace() to clean up garbage characters like that before they go in the database.

Though I've also seen accents cause those question mark diamonds and those are valid ascii characters. So that's kind of odd.
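
For what it's worth, accented letters fall outside ASCII, which only covers code points 0 to 127, so they may be hitting the same mismatch as the quotes. A small Python sketch, with a made-up sample word:

```python
word = "caf\u00e9"  # "café", hypothetical sample

latin1_bytes = word.encode("latin-1")  # é is the single byte 0xE9
utf8_bytes = word.encode("utf-8")      # é is the two bytes 0xC3 0xA9

try:
    word.encode("ascii")
    is_ascii = True
except UnicodeEncodeError:
    is_ascii = False  # é cannot be represented in ASCII

# Single-byte text shown on a page labelled UTF-8:
shown = latin1_bytes.decode("utf-8", errors="replace")

print(is_ascii, latin1_bytes, utf8_bytes, shown)
```

The lone 0xE9 byte is an incomplete UTF-8 sequence, so it gets replaced with U+FFFD just like the curly quotes.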
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,102
126
Yeah, cleaning up the database might be the better way.

Those 2 characters show up as CCH and PU2 in Notepad++, don't even know what that means.

It only displays correctly if I switch to ANSI encoding.

Well, every code editor is different.
 

Red Squirrel

No Lifer
May 24, 2003
68,643
12,705
126
www.anyf.ca
Everything I find about encoding functions in php/mysql seems to indicate that it's deprecated. I want to avoid using deprecated code.

But what does collation mean in SQL terms? MySQL seems to default to "Latin" but there are tons of options including UTF8, though they all have weird names like utf8_bin etc. Is this maybe something to change or is that something completely different?
 

TempAcc99

Member
Aug 30, 2017
60
13
51
But what does collation mean in SQL terms? MySQL seems to default to "Latin" but there are tons of options including UTF8, though they all have weird names like utf8_bin etc. Is this maybe something to change or is that something completely different?

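It's a mess, simply said. A collation is the set of rules for comparing and sorting strings within a character set, which is where names like utf8_bin come from. The bigger catch is the character set itself: MySQL's utf8 stores at most three bytes per character, so it cannot hold every Unicode character. Use utf8mb4 (and one of its collations) instead.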

You don't necessarily need to change the database, but that would be the best way to go. Another option is to set the connection character set accordingly.

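SET NAMES charset_name;

This sets the client, connection, and result character sets for the current connection, so results come back in your desired charset (setting character_set_connection alone does not affect the results). Use utf8mb4 here as well.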
https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434
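
To see which text would be affected by the three-byte limit, here is a rough Python sketch that checks how many UTF-8 bytes each character needs. The helper names are made up for illustration, and this only measures byte lengths; it does not talk to MySQL.

```python
def max_utf8_bytes(text: str) -> int:
    """Largest number of UTF-8 bytes needed by any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_three_byte_utf8(text: str) -> bool:
    """True if every character fits in at most three UTF-8 bytes."""
    return max_utf8_bytes(text) <= 3


print(max_utf8_bytes("\u201d"))      # curly closing double quote
print(max_utf8_bytes("\U0001F600"))  # an emoji
print(fits_three_byte_utf8("ok \U0001F600"))
```

Curly quotes and accented letters fit within three bytes, so the older utf8 would store them. Characters such as emoji need four bytes, which is the case utf8mb4 exists for.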