Showing posts with label character encoding. Show all posts

2/19/2015

PHP: Special Character Encoding & those awful Black Diamonds with Question Marks or Blank Squares

So, you get a weird character in your web output that might look like this:

   The nation�s first extraterrestrial governor

You know it's a special character -- maybe an angled apostrophe or curly quote, maybe some type of accented character.

This is likely caused by a mismatch between two character encodings: the one the text was actually saved in, and the one the browser is told to expect.
The above text is a headline from a WordPress database that I have.

The WP database says its collation is "utf8_general_ci", which implies the utf8 character set.
The head of my HTML5 web page specifies this:
"<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />"

So, why doesn't it come out correctly? I don't know.
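One plausible explanation, consistent with the "ISO-8859-1" detection result described further down, is that the bytes stored for the headline are not really UTF-8. The sketch below uses a made-up copy of the headline in which the apostrophe is the single Windows-1252 byte 0x92 (an assumption about what the database holds, not something confirmed here) and compares it with the correct three-byte UTF-8 form.

```php
<?php
// Hypothetical sample: the headline as a Windows-1252 byte string.
// "\x92" is the Windows-1252 byte for the curly apostrophe.
$cp1252Head = "The nation\x92s first extraterrestrial governor";

// The same headline correctly encoded as UTF-8: the apostrophe takes 3 bytes.
$utf8Head = "The nation\xE2\x80\x99s first extraterrestrial governor";

// A lone 0x92 byte is not valid UTF-8, so a browser that was told to
// expect UTF-8 shows its replacement character instead.
var_dump(mb_check_encoding($cp1252Head, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8Head, 'UTF-8'));   // bool(true)

// Raw bytes at the apostrophe position (offset 10, right after "The nation").
echo bin2hex(substr($cp1252Head, 10, 1)), "\n"; // 92
echo bin2hex(substr($utf8Head, 10, 3)), "\n";   // e28099
```

If your own mystery strings show single bytes in the 0x80-0x9F range where the odd characters are, that points to Windows-1252 data being served as UTF-8.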

However, there is a PHP function that you can use to test the character encoding of your string:

   echo mb_detect_encoding ($testHead, 'UTF-8,Windows-1252,ASCII,ISO-8859-1,ISO-8859-15');

The string being tested is $testHead.
The comma-separated parameter is a list of character sets to test the string against. The order matters: earlier entries are preferred when more than one encoding fits, so UTF-8 should come first.

I had an array of test headlines, and most of the output was "UTF-8". However, the test headlines that had the mystery characters reported the encoding as "ISO-8859-1".
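Because mb_detect_encoding() can only guess, a yes/no validity check may be a simpler way to sort an array of headlines. This is a minimal sketch: the function name findSuspectHeadlines() and the sample data are hypothetical, and the guess it records for each suspect string can vary between PHP versions.

```php
<?php
/**
 * Return the headlines that are NOT valid UTF-8, keyed as in the input,
 * each mapped to mb_detect_encoding()'s guess at the real encoding.
 * (Hypothetical helper, not part of PHP or WordPress.)
 */
function findSuspectHeadlines(array $headlines): array
{
    $suspect = [];
    foreach ($headlines as $key => $head) {
        if (!mb_check_encoding($head, 'UTF-8')) {
            // Third argument true = strict mode. The result is a guess,
            // or false if none of the listed encodings fit.
            $suspect[$key] = mb_detect_encoding($head, 'Windows-1252,ISO-8859-1', true);
        }
    }
    return $suspect;
}

// Made-up test data: the second entry carries a raw Windows-1252 apostrophe.
$testHeads = [
    "Plain ASCII headline",
    "The nation\x92s first extraterrestrial governor",
    "Caf\xC3\xA9 opens downtown", // valid UTF-8 e-acute
];

print_r(findSuspectHeadlines($testHeads));
```

Only the second headline should appear in the output, since it is the only one whose bytes are not valid UTF-8.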

Solution #1 (WordPress specific):


Don't pull the post titles directly from the database.
Instead, use:
    get_the_title($myPostID)

Solution #2:

Check your database structure. The way data has been saved to your database may have been incorrect. 

If you have phpMyAdmin or know some other way to view the structure of your database, make sure the "collation" of your fields is "utf8_general_ci". With phpMyAdmin, you can use drop-down menus to change the collation. Important! -- Make sure you have a backup of your database before doing this. In my case, the default database table setup was "latin1_swedish_ci", which was fine back in the days when "ISO-8859-1" was the default encoding. Of course, your situation could be different. This is just a suggestion.

After you have changed your database structure, you may need to implement the changes with a command to update the database. In phpMyAdmin, you can check the box next to the table name and then select "Repair Table" to make the changes stick.

Solution #3:


There are several PHP functions that will translate a string from "ISO-8859-1" to "UTF-8", but it's all very confusing because the results may not be what you are expecting. The characters might disappear, the entire string might disappear, or the oddball characters might not change at all. Here are a couple of examples:

     $fixed = utf8_encode($myTestString);   // deprecated as of PHP 8.2
     $fixed = iconv("ISO-8859-1", "UTF-8", $myTestString);
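One reason these conversions can disappoint is that curly quotes live at bytes 0x80-0x9F in Windows-1252, while strict ISO-8859-1 treats those same bytes as invisible control characters. The sketch below assumes the data is really Windows-1252 (that is an assumption; check your own bytes first) and skips strings that are already valid UTF-8 so they are not encoded twice. The helper name toUtf8() is hypothetical.

```php
<?php
/**
 * Convert a string to UTF-8 only when it is not already valid UTF-8.
 * $assumedSource is a guess about the original encoding, not a detected fact.
 * (Hypothetical helper for illustration.)
 */
function toUtf8(string $text, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        // Already valid UTF-8: converting again would double-encode it.
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

$myTestString = "The nation\x92s first extraterrestrial governor";
$fixed = toUtf8($myTestString);

// The single byte 0x92 becomes the 3-byte UTF-8 curly apostrophe.
echo bin2hex(substr($fixed, 10, 3)), "\n"; // e28099

// Running it a second time leaves the string unchanged.
var_dump(toUtf8($fixed) === $fixed); // bool(true)
```

Swapping the default for 'ISO-8859-1' would turn 0x92 into the control character U+0092 instead, which matches the "oddball characters might not change at all" symptom.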

-----

The best solution is probably to have the database input be saved in your database as UTF-8 to begin with. Then when you retrieve the data, you should be able to use the UTF-8 charset without translation in your web page. The web page has to have UTF-8 encoding specified.

If you do any of that wrong,

4/22/2013

Odd character codes showing up in MovableType posts

The most likely cause of character encoding problems is a mismatch of instructions: the charset MovableType publishes in differs from the charset the browser is told to expect. Most often this shows up with extended characters like smart quotes, apostrophes, ellipses, en-dashes, foreign characters, etc.

In the MovableType configuration file -- mt-config.cgi -- there may or may not be a line that looks like this:
  • PublishCharset iso-8859-1
Also, there may or may not be a line in your HTML (5) <head> code that looks like this: 
  • <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
The important part of these lines is the value of the charset -- in this case it is "iso-8859-1."

If you make sure that both of these lines exist, there should be more consistency between the character encoding that MovableType spits out and the character set that the browser is expecting to receive.

Also, "iso-8859-1" is a legacy charset (browsers treat it as Windows-1252), and UTF-8 is the encoding recommended for HTML5, so you are probably better off with "UTF-8".

So, the mt-config.cgi file should have a line under "#======== REQUIRED SETTINGS ==========" that looks like this:

  • PublishCharset utf-8
And the <head> code of your templates should contain a line that looks like this:
  • <meta http-equiv="content-type" content="text/html; charset=utf-8" />
When you are done updating these settings, you need to republish your whole blog by clicking on the publish button (which looks like two arrows chasing one another in a circle), and hitting Publish for All Files.
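After republishing, one way to spot-check a page is to compare the charset it declares with the bytes it actually contains. This is a rough sketch in PHP, not a MovableType feature: the function names declaredCharset() and publishedAsUtf8() are hypothetical, and the regular expression only handles simple meta tags like the ones shown above.

```php
<?php
/**
 * Pull the charset value out of the first <meta ... charset=...> tag.
 * Returns it lowercased, or null when no charset is declared.
 * (Hypothetical helper; a real HTML parser would be more robust.)
 */
function declaredCharset(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

/**
 * True only when the page declares utf-8 AND its bytes are valid UTF-8.
 */
function publishedAsUtf8(string $html): bool
{
    return declaredCharset($html) === 'utf-8'
        && mb_check_encoding($html, 'UTF-8');
}

// Made-up page fragments for illustration.
$head = '<head><meta http-equiv="content-type" content="text/html; charset=utf-8" /></head>';
$good = $head . "<p>The nation\xE2\x80\x99s first</p>"; // UTF-8 apostrophe
$bad  = $head . "<p>The nation\x92s first</p>";         // Windows-1252 apostrophe

var_dump(publishedAsUtf8($good)); // bool(true)
var_dump(publishedAsUtf8($bad));  // bool(false)
```

To run it against a real published file, you could pass the result of file_get_contents() on that file's path to publishedAsUtf8().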