Until PHP handles UTF-8 natively, we have to pay particular attention to the way we handle characters outside the 7-bit ASCII range. For example, I just fixed a simple issue for a client who was pasting content from Word into their CMS. In the browser, certain quote characters and the copyright and trademark symbols appeared as question marks.

This happens when you try to render non-UTF-8 text as UTF-8. UTF-8 shares the 7-bit ASCII range with common single-byte encodings such as Windows-1252 (cp1252) and ISO-8859-1. That range covers the basic characters: a-z, A-Z, digits, hyphens, commas and so on. Extended characters such as accented letters, umlauts and symbols are encoded differently. A cp1252 trademark symbol is a single byte, while UTF-8 encodes the same symbol as a multi-byte sequence. If you send a cp1252 copyright symbol to a page declared as UTF-8, you will just see a question mark in the browser, because that single byte is not a valid UTF-8 sequence.
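To make the difference concrete, here is a minimal sketch that prints the raw bytes of the copyright and trademark symbols once they are re-encoded. It assumes the iconv extension is loaded, and the variable names are purely illustrative.

```php
<?php
// Single bytes as Windows-1252 (cp1252) stores them.
$cp1252Copyright = "\xA9"; // copyright sign
$cp1252Trademark = "\x99"; // trademark sign

// The same characters re-encoded as UTF-8 take two and three bytes.
$utf8Copyright = iconv('Windows-1252', 'UTF-8', $cp1252Copyright);
$utf8Trademark = iconv('Windows-1252', 'UTF-8', $cp1252Trademark);

echo bin2hex($utf8Copyright), "\n"; // c2a9
echo bin2hex($utf8Trademark), "\n"; // e284a2

// A lone 0xA9 byte is not a valid UTF-8 sequence. preg_match() with the
// /u modifier returns false when its subject is not valid UTF-8.
var_dump(preg_match('//u', $cp1252Copyright)); // bool(false)
var_dump(preg_match('//u', $utf8Copyright));   // int(1)
```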

To fix this you need to normalise your content. If it's going into MySQL, ensure your tables use the same character set as your normalised form. I recommend UTF-8 as that normal form.

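To normalise your text, use iconv, which is available as a PHP extension. The first argument is the source encoding and the second is the target encoding. Content pasted from Word is typically Windows-1252 rather than strict ISO-8859-1 (curly quotes and the trademark symbol only exist in the former), so that is the source encoding to name here.

$utf8Text = iconv('Windows-1252', 'UTF-8', $cp1252Text);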
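In practice you often don't know whether a given string has already been converted. The sketch below is one way to make the normalisation step safe to run twice. The function name toUtf8 is hypothetical, and it rests on a loud assumption: anything that is not already valid UTF-8 is treated as Windows-1252. Adjust that if your input comes from elsewhere.

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8.
 * Assumes anything that is not already valid UTF-8 is Windows-1252.
 */
function toUtf8(string $text): string
{
    // Valid UTF-8 (including plain ASCII) passes through untouched.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    // iconv() returns false (and raises a notice, silenced here) when it
    // meets a byte that is undefined in the source encoding.
    $converted = @iconv('Windows-1252', 'UTF-8', $text);
    if ($converted === false) {
        throw new InvalidArgumentException('Input is neither UTF-8 nor Windows-1252');
    }

    return $converted;
}

// cp1252 bytes for the trademark sign and curly double quotes.
echo toUtf8("Acme\x99 \x93quoted\x94"), "\n";

// Already UTF-8 (copyright sign as C2 A9): returned unchanged.
echo toUtf8("already \xC2\xA9 utf-8"), "\n";
```

Checking for valid UTF-8 first means text is never converted twice, which is what otherwise produces the familiar "Â©" style of garbage.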

When outputting to the browser, run your text through htmlentities or htmlspecialchars. In older PHP versions (before 5.4) both functions assume ISO-8859-1 input by default. To correctly prepare your UTF-8 text for output, supply the third parameter to these functions, which names your text's encoding, in this case UTF-8.

htmlentities($text, ENT_QUOTES, 'UTF-8');

Failing to supply this parameter on those versions means your text will be broken, with extended characters displayed as garbage or question marks.
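To avoid forgetting the encoding argument, one option is to wrap the call in a small helper and use it everywhere you output text. This is only a sketch: escapeHtml is a hypothetical name, and it assumes your text has already been normalised to UTF-8.

```php
<?php
/**
 * Hypothetical output helper: escape UTF-8 text for an HTML page.
 */
function escapeHtml(string $text): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces invalid
    // byte sequences with U+FFFD instead of returning an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// UTF-8 bytes for the trademark (E2 84 A2) and copyright (C2 A9) signs.
$text = "Acme\xE2\x84\xA2 <b>\xC2\xA9</b>";

// htmlspecialchars only touches the HTML-special characters.
echo escapeHtml($text), "\n";

// htmlentities also turns the symbols into named entities.
echo htmlentities($text, ENT_QUOTES, 'UTF-8'), "\n"; // Acme&trade; &lt;b&gt;&copy;&lt;/b&gt;
```

If the page itself is served as UTF-8, htmlspecialchars is usually enough, since the symbols can be sent as they are. htmlentities is the safer choice when you cannot be sure how the page will be decoded.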

PHP provides all the tools you need to properly handle many different text encodings. As a developer you just need to normalise your input, and properly encode your output.