Understanding character set and collations

An important yet most ignored part in software concepts is Character Set and Collations. It is a vital thing that every developer must familiar with.

The objective of this article is to make you understand what is meant by character set and collations. I will take help of MySQL in the process.

Please remember that character set and collation are valid for text data like CHAR, VARCHAR, TEXT etc. It has no significance for data of type INT, BINARY etc.

Character Set decides which characters you are allowed to use while storing information in your application.

Collation decides how the two piece of information will be compared or sort in case you do an operation on the data in your application.

mysql> CREATE TABLE `person`
    -> (
    ->   `id` int not null auto_increment,
    ->   `name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
    ->   PRIMARY KEY (`id`),
    ->   UNIQUE KEY (`name`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    -> ;
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO person (`name`) VALUES ('resumé');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO person (`name`) VALUES ('resume');
ERROR 1062 (23000): Duplicate entry 'resume' for key 'name'

mysql> select * from person;
+----+--------+
| id | name   |
+----+--------+
|  1 | resumé |
+----+--------+
1 row in set (0.00 sec)

This happens because the character set utf8 allowed us both the e and é character whereas collation utf8_unicode_ci decides that e and é will be compared as similar character.

Let us see what happens if we take name as a Binary type column. The same effect will be observed with utf8_bin for above case. But this will also make your column case sensitive during sort and compare.

mysql> CREATE TABLE `person` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varbinary(100) NOT NULL,
    ->   PRIMARY KEY (`id`),
    ->   UNIQUE KEY `name` (`name`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    -> ;
Query OK, 0 rows affected (0.02 sec)

mysql>
mysql> INSERT INTO person (`name`) VALUES ('resumé');
Query OK, 1 row affected (0.00 sec)

mysql>
mysql> INSERT INTO person (`name`) VALUES ('resume');
Query OK, 1 row affected (0.00 sec)

mysql>
mysql> select * from person;
+----+--------+
| id | name   |
+----+--------+
|  2 | resume |
|  1 | resumé |
+----+--------+
2 rows in set (0.00 sec)

 

Comments

Back To Top

Popular posts from this blog

error 18 at 0 depth lookup: self signed certificate

How to check fragmentation in MySQL tables

How to Drop or Remove or Decommission a Database in Oracle