Changing MySQL Database Character Set & Collation Defaults (or not)

| | Comments (2) | TrackBacks (2)

Feliz año nuevo. Most Content Management Systems (such as Drupal & MODx) work in conjunction with (require) a database. So do blogs (such as Movable Type) & forums (such as SMF).

MySQL is the world's most popular database software for web-based applications (.. such as CMS's & blogs & forums). It's what I use. It's the only database system I've ever used.

cPanel / Web Host ManagerThe Rad VPS (« physically located downtown Chicago) uses the cPanel/WHM control panel to help administer the site. cPanel makes it stupid-easy to create a new MySQL database.

You simply » enter a descriptive name for your new DATABASE (such as » rad_modx) and click the button labeled 'Create Database.' cPanel talks to MySQL and creates the database for you. Viola! Done. Too easy.

To use your new DATABASE, you'll also need to create a USER (such as » rad_modx, conveniently same as the database_name). Give this USER a password and click the button labeled » 'Create User.' Voila! Done creating new USER.

Lastly you need to assign a particular USER to a particular DATABASE. You do this by selecting both from their respective drop-down menus (one lists all available DATABASES, the other all possible USERS) and clicking the button labeled 'Add.'

Then you assign to this USER the appropriate PRIVILEDGES (normally ALL) necessary to perform the database functions. Click the button labeled 'Make Changes' and you're done. Voila! As if this weren't easy enough, cPanel even has a wizard to walk you thru these steps.

MySQL Database ManagementSo, in order to configure a database to work with a particular web application (such as Drupal or MODx), you need 3 pieces of info:

  1. database_name
  2. user_name
  3. user_password

Couldn't be easier. Tho I sometimes forget the final step of actually assigning the USER to the DATABASE (after creating both). Creating new DATABASES is not something I do on a regular basis.

RADIFIED currently uses 5 databases. Three for the blogs » 1-each for the 3 different versions of Movable Type I have installed .. based on v263 (installed 2003), v335 (installed 2007) & v432 (installed 2008). Another for Drupal (2008). And 1 for MODx Revolution (beta5), which I installed a few days ago.

The Rad forum, which uses YaBB, doesn't use a database. (At least not yet.) Might be worth noting here that Movable Type, Drupal, MODx & YaBB are all OPEN SOURCE (better than free) .. as is MySQL.

While installing MODx a few days ago, I noticed they include an option to test your database connection & credentials .. to see if MODx can access it okay.

••• today's entry continues here below •••

In addition, this feature also displays the CHARACTER SET and COLLATION settings used by your database .. something I've never seen before.

Character SetFor my MODx database these values were listed as:

  • Character Set » latin1
  • Collation » latin1_swedish_ci (ci = case insensitive)

Uh, Swedish? Say what? Something must be wrong, right?

In researching character sets and collation, I learned there are many possible character sets to choose from. A character set is sort of like an alphabet. Different languages use different alphabets.

Each character set has many collation options (tho only one default). But the default character set for MySQL seems to be » latin1. And the default collation setting for the latin1 character set is » latin1_swedish_ci.

Note that each char_set has a limited number of possible collation options. Using collation options from a different char_set is not allowed and will generate errors that will cause mysql to shut down (as you'll see).

Surprisingly, the default 'swedish' collation setting works fine in most English-based applications. But there are better options, especially since the web is an international venue.

The character set » utf8 would be best (from an international standpoint). UTF-8 is the charset I use on the home page, and for all my new web pages. The default collation setting for the utf8 character set is » utf8_general_ci (which Drupal uses). Another good collation setting for utf8 would be » utf8_unicode_ci .. which is probably the ultimate international option.

So let's change our newly created database (for MODx Revolution) to the utf8 character set. And we'll use either utf8_general_ci or utf8_unicode_ci collation. Good idea, right? (Before we actually install MODx or create any tables in the database.)

Tower of BabelBad Juju

Be afraid. Be very afraid. You're about to enter the world of database black magic. Our whole aim here is to adopt an international posture (via the web).

And you know what happened to the developers at Shinar when they tried to build their project with an international character set. Wasn't pretty. Management put the kibosh on that project and sent the enterprising programmers packing.

Now I enjoy doing stuff like this. Changing. Tinkering. Tweaking. Cuz that's how I learn. But sometimes you get lumped-up in the process (as the Shinarites will confirm), and walk away with nothing but lumps to show for your efforts.

Let me clarify. I'm NOT talking about CHANGING the char_set and collation of a database that's already in-use, which already contains tables with data. That's a whole 'nuther animal. And good luck to you if that's what you want. I've read many pages about doing just that. (May the database gods have mercy on you.)

I'm talking about something much simpler » creating a NEW database (MySQL) with the utf8 char_set (not latin1) .. and collation set to either utf8_general_ci (the default for utf8) OR utf8_unicode_ci.

Actually, I'd prefer utf8_unicode_ci for its maximum international appeal, but I'd probably opt for utf8_general_ci, since that's the default for the utf8 char_set. I stumbled across several pages talking about BUGS when using something other than default collation for a given char_set.

Here's what I want » When I click the 'Create Database' button in cPanel's MySQL module, I want it to spit out a database configured for/with the utf8 char_set and utf8_general_ci collation. And of course, I want all my previously created databases (latin1) to work with whatever changes are required to make that happen.

I don't want to have to edit the database settings AFTER it has already been created (with latin1_swedish_ci collation). That would be asking for trouble.

Observations with latin1_swedish_ci Database Collation

phpMyAdminNow, I told ya all that to tell you this. My observations. I noticed, while snooping around in phpMyAdmin (a front-end GUI to administer your MySQL databases) .. that all my databases use latin1_swedish_ci collation (as you might expect) EXCEPT the one for Drupal .. which is set to utf8_general_ci.

Now, I didn't do anything to change that (from the default of latin1_swedish_ci, like all the others). Heck, I didn't even KNOW about collation until installing MODx a few days ago. This tells me the application itself has the ability (somehow) during installation to dictate the char_set & collation.

I also noticed, on the 'Operations' tab in phpMyAdmin, for the Drupal database, that the Collation setting specified at the bottom of the page .. is set to (reads) » latin1_swedish_ci.

Yes, the Drupal database, which phpMyAdmin says (on the page where you actually view the tables) is using utf8_general_ci collation .. the setting for collation on the 'Operations' page specifies latin1_swedish_ci.

Make whatever determination you want from this apparent contradiction. But it tells me the 'Collation' setting located at the bottom of the 'Operations' page (which you could theoretically use to change/set the collation of a selected database) is more-or-less worthless.

Here's another data-point that leads me to believe that this 'Collation' setting on the 'Operations' page in phpMyAdmin is krap. I deleted my whole MODx install and started over. (Cuz I was having weird problems.) New database. New install directory. New everything. BUT! I changed/set this Collation setting (in phpMyAdmin) for my new database to utf8_general_ci (from its previous setting of latin1_swedish_ci).

MODx Content Management SystemWhen I installed MODx the next time, it AGAIN detected latin1_swedish_ci. "No!" I said (talking to the MODx installer). I set that database to utf8_general_ci. I tried to configure the MODx installer to use the utf8 char_set & utf8_general_ci collation. But every time it gave me an error and refused to continue.

So. I tried (ever so sheepishly) changing the settings in the MODx installer to latin1 and latin1_swedish_ci. And sure 'nuf, the MODx installer said, "Bingo! Now ya got it right!" and let me continue with the installation.

So this tells me the 'Collation' setting located at the bottom of the 'Operations' page in phpMyAdmin .. is worthless. It also tells me the MODx installer will not configure these settings for me (as Drupal did), as some have suggested it might.

All the while, I was trading emails frantically with the tech support guys at WiredTree (who are normally wizardly), asking them if they know anything about how to change the DEFAULT char_set and collation settings for MySQL.

One guy says he found some info located in a file called my.cnf. Says he changed the setting there to utf8_unicode_ci and that I should let him know if it works. So I log back into cPanel and create another database. Listed however, under 'Collation' on the 'Operations' page in phpMyAdmin, the newly created databse says » latin1_swedish_ci. Rats!

But now I know that setting is more-or-less bunk. Cuz my Drupal database ALSO says latin1_swedish_ci here, but it's not. So maybe the same thing applies to my new MODx database. Let's hope.

So I go ahead and install ANOTHER version of MODx (and point it to my new-new database) .. to see what the installer says. Ugh. It says also » latin1_swedish_ci. Now I'm starting to feel beat-up, like the boys at Shinar.

So I put aside MODx and databases and take a breather. I decide to blog about my experience (which is like geek therapy) while it's still fresh in my mind .. cuz it might help some other poor sap who finds himself in a similar situation. Good karma. When I fire up Movable Type however, I get a big fat error (in bright-RED letters, no-less) .. saying MySQL is toast.

So I write back to the tech at WiredTree and tell him. He changes the my.cnf file back to the way it was before and Movable Type magically starts working again. He wrote:

Your MySQL service was not running. This was caused by the last entry in the mysql configuration file: default-collation=utf8_unicode_ci

This was causing MySQL to fail, as described in the MySQL error log:
091231 01:04:56  mysqld started
091231 1:04:56 [ERROR] COLLATION 'utf8_unicode_ci' is not valid for CHARACTER SET 'latin1'
091231 1:04:56 [ERROR] Aborting
091231 1:04:56 [Note] /usr/sbin/mysqld: Shutdown complete

So, that's where I'm at. Lumped up, but loving the stability of latin1_swedish_ci database collation.

I love all Swedes. I'm going to have Swedish pancakes for breakfast. Swedish meatballs for dinner. Heck, I might even vacation in Sweden this summer. Swedish collation might not be very international, but it works.

Tower of BabelIn closing, if we all spoke the same language, none of this would be necessary.

And the Lord came down to see the city and the tower which the sons of men had built.

And the Lord said, Behold, they are one people and they have all one language; and this is only the beginning of what they will do, and now nothing they have imagined they can do will be impossible for them.

Come, let Us go down and there confound their language, that they may not understand one another.

Later I might offer my first impressions of the MODx Revolution CMS (beta5). UPDATE. I've had a few requests for a list of versions for the software involved. See here:

  • Linux CentOS 5.4 (i686 on Virtuozzo VPS)
  • cPanel/WHM 11.25 (VPS Optimized)
  • MySQL 5.0.87-Community
  • MySQL Client 4.1.22
  • phpMyAdmin 3.2.4
  • PHP 5.3.1
  • Apache 2.2.14
  • MODx 2.0 Revolution beta5
  • Drupal 6

For more along these lines, here's a Google search preconfigured for the query » change mysql database character set collation default settings latin1 swedish utf8 unicode phpmyadmin

« Previous Rad entry ||| NEXT Rad entry »

2 TrackBacks

Listed below are links to blogs that reference this entry: Changing MySQL Database Character Set & Collation Defaults (or not).

TrackBack URL for this entry: http://radified.com/cgi-bin/xmt4/mt-tb.cgi/191

PHP Script Installation Service from PHP Script Installation Service on December 22, 2012 6:45 PM

Our team can install any PHP/CGI/Perl script e.g. Blogs, CMS, Forums, Directories etc. Installation charge starts at $10 per script. Read More

Script Installation from Script Installation on February 11, 2013 3:28 AM

Here is my weblog : http://mt4.radified.com/2010/01/mysql-database-character-set-collation-default.html Read More

2 Comments

So, did you get it sorted? I want to use International (non-Swedish chef) characters on my blog and have a similar problem...

About this Entry

This page contains a single entry by Rad published on January 1, 2010 1:01 AM.

A 'Guides' Page & Installing MODx Revolution CMS (beta5) was the previous entry in this blog.

Sexy Shot of MySQL Database Tables Sporting UTF-8 Collation is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.