Optimize your MySQL queries to fine tune your application

by Glyn Mooney

As applications are getting more complex and manipulating data in more complex ways it’s becoming more important to optimise our database queries to be as efficient as possible. I’m not going to cover caching in this article, instead I’m going to cover a more fundamental rule that should be followed when querying any database and follows this Python rule:

“Explicit is better than implicit.”

Lets start by looking at MySQL query performance from the command line. Lets imagine we had a user table with the following fields:

CREATE  TABLE  `article_db`.`users` (
`id` INT( 6  )  UNSIGNED NOT  NULL  AUTO_INCREMENT  PRIMARY  KEY ,
`group_id` INT( 6  )  UNSIGNED NOT  NULL ,
`username` VARCHAR( 30  )  CHARACTER  SET utf8 COLLATE utf8_general_ci NOT  NULL ,
`email` VARCHAR( 100  )  CHARACTER  SET utf8 COLLATE utf8_general_ci NOT  NULL ,
`password` VARCHAR( 36  )  CHARACTER  SET utf8 COLLATE utf8_general_ci NOT  NULL ,
`created` DATETIME NOT  NULL DEFAULT  '0000-00-00 00:00:00',
`updated` DATETIME NOT  NULL DEFAULT  '0000-00-00 00:00:00',
`active` TINYINT( 1  )  NOT  NULL DEFAULT  '0',
UNIQUE (
`username` ,
`email`
)
)  ENGINE  =  MYISAM  CHARACTER  SET utf8 COLLATE utf8_general_ci;

Normally to retrieve data from this table you would execute a query like the following. please note that I am using a remote database to create some lag. If you use a local database on your development machine you will probably retrieve the data in 0 seconds.

mysql> SELECT * FROM users;
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
| id | group_id | username | email                 | password                         | created             | updated             | active |
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
|  1 |        1 | admin    | admin@example.com     | 21232f297a57a5a743894a0e4a801fc3 | 2010-04-11 23:08:51 | 2010-04-11 23:08:51 |      1 |
|  2 |        2 | USER     | USER@example.com      | ee11cbb19052e40b07aac0ca060c23ee | 2010-04-11 23:08:51 | 2010-04-11 23:08:51 |      1 |
|  3 |        3 | test     | test@example.com      | 098f6bcd4621d373cade4e832627b4f6 | 2010-04-11 23:13:50 | 2010-04-11 23:13:50 |      0 |
|  4 |        4 | temp     | temp@example.com      | 3d801aa532c1cec3ee82d87a99fdf63f | 2010-04-11 23:13:50 | 2010-04-11 23:13:50 |      1 |
|  5 |        2 | me       | me@example.com        | ab86a1e1ef70dff97959067b723c5c24 | 2010-04-11 23:15:50 | 2010-04-11 23:15:50 |      1 |
|  6 |        2 | you      | you@example.com       | 639bae9ac6b3e1a84cebb7b403297b79 | 2010-04-11 23:15:50 | 2010-04-11 23:15:50 |      0 |
|  7 |        1 | john     | john@example.com      | 527bd5b5d689e2c32ae974c6229ff785 | 2010-04-11 23:17:04 | 2010-04-11 23:17:04 |      1 |
|  8 |        1 | master   | master@example.com    | eb0a191797624dd3a48fa681d3061212 | 2010-04-11 23:17:04 | 2010-04-11 23:17:04 |      1 |
|  9 |        1 | theboss  | theboss@testing.com   | b248e08d5c23541514558eea059c08cf | 2010-04-11 23:23:43 | 2010-04-11 23:23:43 |      1 |
| 10 |        2 | susan    | susan@localdomain.com | ac575e3eecf0fa410518c2d3a2e7209f | 2010-04-11 23:23:43 | 2010-04-11 23:23:43 |      1 |
+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+
10 ROWS IN SET (0.05 sec)

In the previous query we used the wild-card selector ‘*’ to return all columns from the ‘users’ table. This in turn queried the database and returned the results as above. What I’m going to do next is query the database for the same data but this time I’m going to name the fields that I require in my returned dataset:

mysql> SELECT id,group_id,username,email,password,created,updated,active FROM users;

+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+

| id | group_id | username | email                 | password                         | created             | updated             | active |

+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+

|  1 |        1 | admin    | admin@example.com     | 21232f297a57a5a743894a0e4a801fc3 | 2010-04-11 23:08:51 | 2010-04-11 23:08:51 |      1 |

|  2 |        2 | USER     | USER@example.com      | ee11cbb19052e40b07aac0ca060c23ee | 2010-04-11 23:08:51 | 2010-04-11 23:08:51 |      1 |

|  3 |        3 | test     | test@example.com      | 098f6bcd4621d373cade4e832627b4f6 | 2010-04-11 23:13:50 | 2010-04-11 23:13:50 |      0 |

|  4 |        4 | temp     | temp@example.com      | 3d801aa532c1cec3ee82d87a99fdf63f | 2010-04-11 23:13:50 | 2010-04-11 23:13:50 |      1 |

|  5 |        2 | me       | me@example.com        | ab86a1e1ef70dff97959067b723c5c24 | 2010-04-11 23:15:50 | 2010-04-11 23:15:50 |      1 |

|  6 |        2 | you      | you@example.com       | 639bae9ac6b3e1a84cebb7b403297b79 | 2010-04-11 23:15:50 | 2010-04-11 23:15:50 |      0 |

|  7 |        1 | john     | john@example.com      | 527bd5b5d689e2c32ae974c6229ff785 | 2010-04-11 23:17:04 | 2010-04-11 23:17:04 |      1 |

|  8 |        1 | master   | master@example.com    | eb0a191797624dd3a48fa681d3061212 | 2010-04-11 23:17:04 | 2010-04-11 23:17:04 |      1 |

|  9 |        1 | theboss  | theboss@testing.com   | b248e08d5c23541514558eea059c08cf | 2010-04-11 23:23:43 | 2010-04-11 23:23:43 |      1 |

| 10 |        2 | susan    | susan@localdomain.com | ac575e3eecf0fa410518c2d3a2e7209f | 2010-04-11 23:23:43 | 2010-04-11 23:23:43 |      1 |

+----+----------+----------+-----------------------+----------------------------------+---------------------+---------------------+--------+

10 ROWS IN SET (0.01 sec)

As you can see, by defining a more explicit query you have reduced the execution time from 0.05 to 0.01 seconds. This may not be too impressive, but imagine the “now” common application queries a database. During the life cycle of a page execution in a content management system you could be looking at around 10 – 40 (this is just an example figure) database queries being executed during this time. In turn each of these queries take .05 seconds to execute (40 * .05) = 2 seconds. So you go through your application and modify it to select one the columns you need. If we take the examples above we would have (40 * .01) = .4, that’s a massive 1.6 (thanks ekozan) seconds faster!

Hope this helps!