Optimize your MySQL queries to fine tune your application
by Glyn Mooney
As applications are getting more complex and manipulating data in more complex ways it’s becoming more important to optimise our database queries to be as efficient as possible. I’m not going to cover caching in this article, instead I’m going to cover a more fundamental rule that should be followed when querying any database and follows this Python rule:
“Explicit is better than implicit.”
Lets start by looking at MySQL query performance from the command line. Lets imagine we had a user table with the following fields:
`id` INT( 6 ) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`group_id` INT( 6 ) UNSIGNED NOT NULL ,
`username` VARCHAR( 30 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ,
`email` VARCHAR( 100 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ,
`password` VARCHAR( 36 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ,
`created` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`active` TINYINT( 1 ) NOT NULL DEFAULT '0',
UNIQUE (
`username` ,
`email`
)
) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_general_ci;
Normally to retrieve data from this table you would execute a query like the following. please note that I am using a remote database to create some lag. If you use a local database on your development machine you will probably retrieve the data in 0 seconds.
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
| id | group_id | username | email | password | created | updated | active |
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
| 1 | 1 | admin | admin@example.com | 21232f297a57a5a743894a0e4a801fc3 | 2010-04-11 23:08:51 | 2010-04-11 23:08:51 | 1 |
| 2 | 2 | USER | USER@example.com | ee11cbb19052e40b07aac0ca060c23ee | 2010-04-11 23:08:51 | 2010-04-11 23:08:51 | 1 |
| 3 | 3 | test | test@example.com | 098f6bcd4621d373cade4e832627b4f6 | 2010-04-11 23:13:50 | 2010-04-11 23:13:50 | 0 |
| 4 | 4 | temp | temp@example.com | 3d801aa532c1cec3ee82d87a99fdf63f | 2010-04-11 23:13:50 | 2010-04-11 23:13:50 | 1 |
| 5 | 2 | me | me@example.com | ab86a1e1ef70dff97959067b723c5c24 | 2010-04-11 23:15:50 | 2010-04-11 23:15:50 | 1 |
| 6 | 2 | you | you@example.com | 639bae9ac6b3e1a84cebb7b403297b79 | 2010-04-11 23:15:50 | 2010-04-11 23:15:50 | 0 |
| 7 | 1 | john | john@example.com | 527bd5b5d689e2c32ae974c6229ff785 | 2010-04-11 23:17:04 | 2010-04-11 23:17:04 | 1 |
| 8 | 1 | master | master@example.com | eb0a191797624dd3a48fa681d3061212 | 2010-04-11 23:17:04 | 2010-04-11 23:17:04 | 1 |
| 9 | 1 | theboss | theboss@testing.com | b248e08d5c23541514558eea059c08cf | 2010-04-11 23:23:43 | 2010-04-11 23:23:43 | 1 |
| 10 | 2 | susan | susan@localdomain.com | ac575e3eecf0fa410518c2d3a2e7209f | 2010-04-11 23:23:43 | 2010-04-11 23:23:43 | 1 |
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
10 ROWS IN SET (0.05 sec)
In the previous query we used the wild-card selector ‘*’ to return all columns from the ‘users’ table. This in turn queried the database and returned the results as above. What I’m going to do next is query the database for the same data but this time I’m going to name the fields that I require in my returned dataset:
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
| id | group_id | username | email | password | created | updated | active |
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
| 1 | 1 | admin | admin@example.com | 21232f297a57a5a743894a0e4a801fc3 | 2010-04-11 23:08:51 | 2010-04-11 23:08:51 | 1 |
| 2 | 2 | USER | USER@example.com | ee11cbb19052e40b07aac0ca060c23ee | 2010-04-11 23:08:51 | 2010-04-11 23:08:51 | 1 |
| 3 | 3 | test | test@example.com | 098f6bcd4621d373cade4e832627b4f6 | 2010-04-11 23:13:50 | 2010-04-11 23:13:50 | 0 |
| 4 | 4 | temp | temp@example.com | 3d801aa532c1cec3ee82d87a99fdf63f | 2010-04-11 23:13:50 | 2010-04-11 23:13:50 | 1 |
| 5 | 2 | me | me@example.com | ab86a1e1ef70dff97959067b723c5c24 | 2010-04-11 23:15:50 | 2010-04-11 23:15:50 | 1 |
| 6 | 2 | you | you@example.com | 639bae9ac6b3e1a84cebb7b403297b79 | 2010-04-11 23:15:50 | 2010-04-11 23:15:50 | 0 |
| 7 | 1 | john | john@example.com | 527bd5b5d689e2c32ae974c6229ff785 | 2010-04-11 23:17:04 | 2010-04-11 23:17:04 | 1 |
| 8 | 1 | master | master@example.com | eb0a191797624dd3a48fa681d3061212 | 2010-04-11 23:17:04 | 2010-04-11 23:17:04 | 1 |
| 9 | 1 | theboss | theboss@testing.com | b248e08d5c23541514558eea059c08cf | 2010-04-11 23:23:43 | 2010-04-11 23:23:43 | 1 |
| 10 | 2 | susan | susan@localdomain.com | ac575e3eecf0fa410518c2d3a2e7209f | 2010-04-11 23:23:43 | 2010-04-11 23:23:43 | 1 |
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
10 ROWS IN SET (0.01 sec)
As you can see, by defining a more explicit query you have reduced the execution time from 0.05 to 0.01 seconds. This may not be too impressive, but imagine the “now” common application queries a database. During the life cycle of a page execution in a content management system you could be looking at around 10 – 40 (this is just an example figure) database queries being executed during this time. In turn each of these queries take .05 seconds to execute (40 * .05) = 2 seconds. So you go through your application and modify it to select one the columns you need. If we take the examples above we would have (40 * .01) = .4, that’s a massive 1.6 (thanks ekozan) seconds faster!
Hope this helps!