With Ruby 1.9 out there and all the multibyte string goodness it brings, it's a good time to think about your character encodings. Here are a few pointers on getting everything synched up.
1. MySQL encoding.
How to check it: I use Sequel Pro. Just click on a table name to see its metadata, including the encoding.
How to change it if it's wrong:
I had a bunch of UTF-8 content living in a latin1 table (ISO-8859-1, which MySQL calls latin1). To fix this, export your database and re-import it as UTF-8. Details are here, but the gist of it is:
mysqldump -uUSER -pPASSWORD --default-character-set=latin1 DB_NAME | sed 's/latin1/utf8/' > temp.sql
mysql -uUSER -pPASSWORD DB_NAME < temp.sql
Why not just run the conversion in place (
ALTER TABLE table_XXX CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;)? That's a different operation. CONVERT TO CHARACTER SET is appropriate when your content and your DB's encoding already match and you want to convert both to another encoding. If your content and the declared encoding don't match, the export/import trick is just what you need. Discussion of additional techniques is here.
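To make the difference concrete, here is a minimal Ruby 1.9 sketch of the two operations on a single string. It is only an analogy, not what MySQL runs internally: `force_encoding` stands in for the dump-and-reload trick (same bytes, new label), and `encode` stands in for CONVERT TO CHARACTER SET (bytes rewritten according to the current label). The sample word and variable names are made up for illustration.

```ruby
# "cafe" with an accented e, written with an escape so the source stays ASCII.
original   = "caf\u00e9"
utf8_bytes = original.encode("UTF-8").bytes.to_a   # 5 bytes: 63 61 66 C3 A9

# The mismatch: UTF-8 bytes sitting in a column labeled latin1 (ISO-8859-1).
stored = utf8_bytes.pack("C*").force_encoding("ISO-8859-1")

# CONVERT TO-style transcoding trusts the label, so each byte of the
# two-byte accented character is converted separately (double encoding).
transcoded = stored.encode("UTF-8")

# Dump/reload-style relabeling keeps the bytes and only fixes the label.
relabeled = stored.dup.force_encoding("UTF-8")

puts transcoded.bytes.to_a.length   # 7 bytes: the text is now mangled
puts relabeled == original          # true: the text is intact
```

If the content and the label had matched to begin with, `encode` would have been the right call, which is the case CONVERT TO CHARACTER SET is meant for.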
2. The charset defined in your HTTP headers
How to check it: use curl
~ $ curl -I http://hotspotr.com
HTTP/1.1 302 Found
...
Content-Type: text/html; charset=utf-8
Rails uses utf-8 by default, so unless you've consciously changed it you should be good.
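If you'd rather script the check than eyeball curl output, something like the following should work. This is a rough sketch: `charset_from_content_type` and `header_charset` are hypothetical helper names, not part of Rails or Net::HTTP, and redirects are not followed.

```ruby
require "net/http"
require "uri"

# Pull the charset parameter out of a Content-Type header value.
# Returns nil when no charset is declared.
def charset_from_content_type(value)
  match = value.to_s.match(/charset\s*=\s*"?([^\s;"]+)/i)
  match && match[1].downcase
end

# Issue a HEAD request (the equivalent of curl -I) and return the charset
# declared in the response headers. Hypothetical helper; no redirect handling.
def header_charset(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  charset_from_content_type(response["Content-Type"])
end

puts charset_from_content_type("text/html; charset=utf-8")
```

Calling `header_charset("http://hotspotr.com")` would make a live request, so it is left out of the example run.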
3. The charset specified in your HTML meta tags.
How to check it: just view your source and look for something along the lines of
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />. Yes, this is different from the charset in the headers. Mine didn't match on one site. Fortunately, browsers generally give the value in the HTTP header precedence over the meta tag. Still, if you've got a mismatch, it's trivial to fix.
How to fix: open up your application.html.erb and make the change.
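For a quick automated comparison, a regex is usually enough to pull the charset out of the meta tag. The sketch below assumes the tag looks roughly like the one above (or the shorter `<meta charset="...">` form); `meta_charset` is a hypothetical helper, and a real HTML parser would be more robust.

```ruby
# Extract the charset declared in the first matching <meta> tag.
# Returns nil if no charset is found.
def meta_charset(html)
  match = html.match(/<meta[^>]*charset\s*=\s*["']?([^\s"';\/>]+)/i)
  match && match[1].downcase
end

page = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>'

# Illustrative value; in practice this comes from the HTTP response header.
header = "utf-8"

puts meta_charset(page)
puts(meta_charset(page) == header ? "charsets match" : "charset mismatch")
```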
4. Your database.yml
Just make sure you have the line
encoding: utf8 in your database configuration blocks in database.yml.
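With several environments it's easy to miss one block. Here is a small sketch that lists the blocks lacking the setting. It assumes a plain database.yml with no ERB and no YAML aliases; `blocks_missing_utf8` and the sample configuration are made up for illustration.

```ruby
require "yaml"

# Return the names of configuration blocks whose encoding is not utf8.
def blocks_missing_utf8(yaml_text)
  config = YAML.load(yaml_text) || {}
  config.select { |_env, settings| settings.is_a?(Hash) && settings["encoding"] != "utf8" }.keys
end

# Hypothetical configuration: production is missing the encoding line.
sample = <<-YAML
development:
  adapter: mysql
  encoding: utf8
  database: app_development
production:
  adapter: mysql
  database: app_production
YAML

puts blocks_missing_utf8(sample).inspect
```

To check a real app, pass in `File.read("config/database.yml")` instead of the sample string.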
5. One more thing to look at...
If you're doing static HTML caching, then Apache (or whatever web server you're using) probably controls the charset when it serves up the cached page. Make sure it's setting the right charset. More details here.