2/19/2015

PHP: Special Character Encoding & those awful Black Diamonds with Question Marks or Blank Squares

So, you get a weird character in your web output that might look like this:

   The nation�s first extraterrestrial governor

You know it's a special character: maybe an angled apostrophe or curly quote, maybe some type of accented character.

This is likely caused by a mismatch between two character encodings: the one the text was saved in and the one the web page says it is using.
The above text is a headline from a WordPress database that I have.

The WP database says it is encoded as "utf8_general_ci".
The head of my HTML5 web page specifies this:

   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

So, why doesn't it come out correctly? I don't know.

However, there is a PHP function that you can use to test the character encoding of your string:

   echo mb_detect_encoding ($testHead, 'UTF-8,Windows-1252,ASCII,ISO-8859-1,ISO-8859-15');

The string being tested is $testHead.
The comma-separated parameter is a list of character sets to test on the string, and they are in a specific order.

I had an array of test headlines, and most of the output was "UTF-8". However, the test headlines that had the mystery characters reported their encoding as "ISO-8859-1".
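
Detection is only a best guess, so it can help to ask a stricter question: is the string valid UTF-8 at all, and which bytes are the odd ones out? Here is a minimal sketch, assuming the mbstring extension is loaded. The helper name describeEncoding() is hypothetical, and the 0x92 byte in the sample is an assumption about what the stray character might be (it is the curly apostrophe in Windows-1252).

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8 and lists
// the raw bytes of anything outside plain ASCII.
function describeEncoding(string $text): array
{
    // mb_check_encoding() returns true only if the whole string is valid UTF-8.
    $isUtf8 = mb_check_encoding($text, 'UTF-8');

    // Collect the hex value of every non-ASCII byte so the culprit is visible.
    $suspects = [];
    foreach (str_split($text) as $byte) {
        if (ord($byte) > 0x7F) {
            $suspects[] = strtoupper(bin2hex($byte));
        }
    }

    return ['valid_utf8' => $isUtf8, 'non_ascii_bytes' => $suspects];
}

// Assumed sample: the headline with a single Windows-1252 apostrophe byte (0x92).
$testHead = "The nation\x92s first extraterrestrial governor";
print_r(describeEncoding($testHead));
```

A lone byte such as 92 is not valid UTF-8 by itself, which would explain why a page declared as UTF-8 shows a replacement symbol in its place.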

Solution #1 (WordPress specific):


Don't pull the post titles directly from the database.
Instead, use:
    get_the_title($myPostID)

Solution #2:

Check your database structure. The way data has been saved to your database may have been incorrect. 

If you have phpMyAdmin or know some other way to view the structure of your database, make sure the collation of your text fields is "utf8_general_ci". With phpMyAdmin, you can use drop-down menus to change the collation. Important! Make sure you have a backup of your database before doing this. In my case, the default database table setup was "latin1_swedish_ci", which was fine back in the days when "ISO-8859-1" was the default encoding. Of course, your situation could be different. This is just a suggestion.

After you have changed your database structure, you may need to run a command to update the database. In phpMyAdmin, you can check the box next to the table name and then select "Repair Table" to make the changes stick.
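
If you would rather run the change as SQL than click through drop-down menus, the statement can be built in PHP. This is only a sketch: buildConvertTableSql() is a hypothetical helper, and "wp_posts" is assumed to be your posts table (it is the WordPress default name, but your prefix may differ). Note that MySQL's CONVERT TO re-encodes the stored text based on the character set the column currently claims to have, so if the stored bytes never matched that label, the result may still be wrong. Back up first.

```php
<?php
// Hypothetical helper: builds the SQL that converts one table to a new
// character set and collation.
function buildConvertTableSql(
    string $table,
    string $charset = 'utf8',
    string $collation = 'utf8_general_ci'
): string {
    // Names cannot be bound as query parameters, so allow only simple identifiers.
    foreach ([$table, $charset, $collation] as $name) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $name)) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }

    return "ALTER TABLE `$table` CONVERT TO CHARACTER SET $charset COLLATE $collation";
}

// Assumed table name; print the statement so you can review it before running it.
echo buildConvertTableSql('wp_posts'), "\n";
```

Printing the statement instead of executing it is deliberate here: you can read it over and run it by hand once the backup is in place.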

Solution #3:


There are several PHP functions that will translate a string from "ISO-8859-1" to "UTF-8", but it's all very confusing because the results may not be what you are expecting. The characters might disappear, the entire string might disappear, or the oddball characters might not change at all. Here are a couple of examples:

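     // Both functions return the converted string; $myTestString itself is unchanged.
     $converted = utf8_encode($myTestString);
     $converted = iconv("ISO-8859-1", "UTF-8", $myTestString);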
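
One possible reason for the surprises: curly quotes saved by Windows software usually sit at bytes like 0x92, which means "curly apostrophe" in Windows-1252 but an invisible control character in ISO-8859-1. Converting with the wrong source encoding therefore makes the character seem to vanish, and converting a string that is already UTF-8 garbles it further. The sketch below assumes the mbstring extension is loaded and that the source really is Windows-1252; toUtf8() is a hypothetical helper, not a built-in. It uses mb_convert_encoding() rather than utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already
// valid UTF-8. The source encoding is an assumption you must supply.
function toUtf8(string $text, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8; converting again would double-encode it
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// Read as ISO-8859-1, byte 0x92 becomes the invisible control character U+0092.
echo bin2hex(mb_convert_encoding("\x92", 'UTF-8', 'ISO-8859-1')), "\n";   // c292
// Read as Windows-1252, the same byte becomes the curly apostrophe U+2019.
echo bin2hex(mb_convert_encoding("\x92", 'UTF-8', 'Windows-1252')), "\n"; // e28099

// Assumed sample: the headline with a single Windows-1252 apostrophe byte.
$raw = "The nation\x92s first extraterrestrial governor";
echo toUtf8($raw), "\n";
```

Checking for valid UTF-8 first means the helper can be run over a mixed array of headlines without damaging the ones that were fine to begin with.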

-----

The best solution is probably to save the data in your database as UTF-8 to begin with. Then, when you retrieve the data, you should be able to output it on a UTF-8 web page without any translation. The web page still has to declare UTF-8 as its encoding.
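
On the output side, that could look something like the sketch below. The renderHeadline() helper is hypothetical, and the sample headline is assumed to be valid UTF-8 already. The ENT_SUBSTITUTE flag is passed explicitly so that a stray invalid byte turns into a visible replacement character rather than making htmlspecialchars() return an empty string.

```php
<?php
// Hypothetical helper: escapes a headline for output on a UTF-8 page.
function renderHeadline(string $title): string
{
    // ENT_SUBSTITUTE swaps invalid byte sequences for U+FFFD instead of
    // letting htmlspecialchars() return an empty string.
    $safe = htmlspecialchars($title, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return "<h1>$safe</h1>";
}

// Declare the encoding in the HTTP header as well as in the <meta> tag.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Assumed sample: the headline stored correctly as UTF-8.
echo renderHeadline("The nation\u{2019}s first extraterrestrial governor"), "\n";
```

If you query MySQL yourself with mysqli, calling set_charset() on the connection is the usual way to make the connection use UTF-8 as well.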

If you do any of that wrong, the black diamonds or blank squares are likely to show up in your output.
