I discovered a simple optimization to a troublesome slow MySQL query the other day and wanted to share the tip.
I have a very large table of users — around 1.5M records. When joining the users table with the email_list table, I get a list of subscribers. My email service provider’s API accepts batches of 1000 email addresses to which I send via worker scripts.
The obvious query to get batches of 1000 email addresses is to join the tables and LIMIT results to 1000 records and progressively increase the OFFSET by 1000.
mysql> SELECT `users`.`id`, `users`.`email` FROM `users`, `email_list` WHERE `users`.`id` = `email_list`.`user_id` AND `email_list`.`type` = 1 LIMIT 1000 OFFSET 1000000; ... 1000 rows in set (4.47 sec)
This query represents the 1000th iteration starting at the 1 millionth record. We only need 1000 rows, so why is this query taking a whopping 4.47 seconds?!
My understanding is though we only need 1000 records, we still need to count off the first 1,000,000. Even optimized tables with properly indexed keys may require memory, temp tables and swap file to fetch the rows at a speed impediment.
Take 2…
I have the luxury of not needing exactly 1000 email addresses (rows) for my workers, so I tried limiting the examined rows with a WHERE on the user ID. The BETWEEN comparison operator will limit the need to count off the previous records. Check it out:
mysql> SELECT `users`.`id`, 'users`.`email` FROM `users`, `email_list` WHERE `users`.`id` = `email_list`.`user_id` AND `email_list`.`type` = 1 AND `users`.`id` BETWEEN 1000000 AND 1001000; ... 971 rows in set (0.00 sec)
The above query does same thing as its predecessor, with the exception that the count of results might vary because not every user is subscribed to a list or users might have been deleted. I’ve effectively reduced the speed of the query to milliseconds because it only has to examine 1000s of rows instead of a million.
Here are the EXPLAINs on the queries to further illustrate the number of rows being examined by MySQL. See how the BETWEEN is the only example that takes advantage of the primary key.
-- uses email index and scans all rows mysql> EXPLAIN SELECT `id`, `email` FROM `users`, `email_list` WHERE `users`.`id` = `email_list`.`user_id` AND `email_list`.`type` = 1 LIMIT 1000 OFFSET 1000000; +----+-------------+------------+--------+--------------------+---------+---------+-------------------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+--------------------+---------+---------+-------------------+---------+-------------+ | 1 | SIMPLE | users | index | PRIMARY | email | 767 | NULL | 1472873 | Using index | | 1 | SIMPLE | email_list | eq_ref | PRIMARY,users,type | PRIMARY | 5 | db.users.id,const | 1 | Using index | +----+-------------+------------+--------+--------------------+---------+---------+-------------------+---------+-------------+ -- even forcing the primary key, it still scans all rows mysql> EXPLAIN SELECT `id`, `email` FROM `users` JOIN `email_list` force INDEX (primary) on `users`.`id` = `email_list`.`user_id` WHERE `email_list`.`type` = 1 LIMIT 1000 OFFSET 1000000; +----+-------------+------------+--------+---------------+---------+---------+-------------------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+---------------+---------+---------+-------------------+---------+-------------+ | 1 | SIMPLE | users | index | PRIMARY | email | 767 | NULL | 1472873 | Using index | | 1 | SIMPLE | email_list | eq_ref | PRIMARY | PRIMARY | 5 | db.users.id,const | 1 | Using index | +----+-------------+------------+--------+---------------+---------+---------+-------------------+---------+-------------+ -- BETWEEN uses the primary key mysql> EXPLAIN SELECT `users`.`id`, `users`.`email` FROM `users`, `email_list` WHERE `users`.`id` = `email_list`.`user_id` AND `email_list`.`type` = 1 AND `users`.`id` BETWEEN 1000000 AND 1001000; +----+-------------+------------+--------+--------------------+---------+---------+-------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+--------------------+---------+---------+-------------------+------+-------------+ | 1 | SIMPLE | users | range | PRIMARY | PRIMARY | 4 | NULL | 2016 | Using where | | 1 | SIMPLE | email_list | eq_ref | PRIMARY,users,type | PRIMARY | 5 | db.users.id,const | 1 | Using index | +----+-------------+------------+--------+--------------------+---------+---------+-------------------+------+-------------+
Do you have any MySQL query optimization tips?