Debugging the byte value of PHP strings
PHP 5.2 and earlier versions are not very character-set aware. With the help of the multibyte (mbstring) and iconv extensions it is possible to convert a string from one character set to another, but to PHP a string remains just a sequence of bytes. Viewing these bytes is hard: just echoing a string won’t get you very far. The bytes you send to the browser are interpreted by the browser and displayed as characters, and which characters you see depends on the charset declared in your Content-Type header. Many byte values have no printable character in ISO-8859-1, which makes it even harder to find out which bytes a string contains.
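To make the "sequence of bytes" point concrete, here is a minimal sketch that compares the byte count with the character count of the same word. It assumes the mbstring extension is loaded, and the string is written with explicit byte escapes so the result does not depend on how the source file is saved.

```php
<?php
// "Café" written as explicit UTF-8 bytes (é is C3 A9 in UTF-8).
$utf8 = "Caf\xC3\xA9";

// strlen() counts bytes, not characters.
$byteCount = strlen($utf8);

// mb_strlen() counts characters once it is told the encoding.
$charCount = mb_strlen($utf8, 'UTF-8');

echo $byteCount, ' bytes, ', $charCount, ' characters', "\n";
// Prints "5 bytes, 4 characters"
```

The one-byte difference is the second byte of the é, which strlen() counts and mb_strlen() does not.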
A handy way to view the bytes in a string is the built-in urlencode() function. It converts every non-alphanumeric byte to a % sign followed by the value of the byte in hex, with a few exceptions: -, _ and . are left as they are, and a space becomes a + sign. UTF-8 strings will show two to four bytes for each non-ASCII character, while ISO-8859-1 strings will show only one byte per character.
$string = "Caf\xE9"; // "Café" as ISO-8859-1 bytes, whatever the encoding of this file
echo urlencode($string);
// This will echo "Caf%E9"
$string = iconv('ISO-8859-1', 'UTF-8', $string);
echo urlencode($string);
// This will echo "Caf%C3%A9"
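urlencode() leaves alphanumeric bytes as they are, so when you want every byte shown in hex, a variant built on bin2hex() may be handy. The sketch below is one possible approach and not part of the original example: hex_bytes() is a hypothetical helper name, and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical helper: returns the bytes of $string as space-separated,
// two-digit uppercase hex values.
function hex_bytes($string)
{
    // bin2hex() yields two hex digits per byte; chunk_split() appends a
    // space after each pair, and trim() drops the trailing one.
    return strtoupper(trim(chunk_split(bin2hex($string), 2, ' ')));
}

$latin1 = "Caf\xE9"; // "Café" as ISO-8859-1 bytes
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo hex_bytes($latin1), "\n"; // 43 61 66 E9
echo hex_bytes($utf8), "\n";   // 43 61 66 C3 A9
```

The output is less compact than that of urlencode(), but bytes such as spaces and letters can no longer be mistaken for anything else.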
I hope this will help you debug your UTF-8 applications.
About this entry
You’re currently reading “Debugging the byte value of PHP strings,” an entry on Willem Stuursma
- Published: January 11, 2009 / 10:22
- Category: php
- Tags: character set, php, utf8
