In our MySQL trainings and consulting engagements we tell our customers always to use the smallest possible data type to get better query performance. Especially for the JOIN
columns. This advice is supported as well by the MySQL documentation in the chapter Optimizing Data Types:
Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT
is often a better choice than INT
because a MEDIUMINT
column uses 25% less space.
I remember somewhere the JOIN
columns where explicitly mentioned but I cannot find it any more.
Test set-up
To get numbers we have created a little test set-up:
CREATE TABLE `a` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT , `data` varchar(64) DEFAULT NULL , `ts` timestamp NOT NULL , PRIMARY KEY (`id`) ) ENGINE=InnoDB CHARSET=latin1 | CREATE TABLE `b` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT , `data` varchar(64) DEFAULT NULL , `ts` timestamp NOT NULL , `a_id` int(10) unsigned DEFAULT NULL , PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
1048576 rows | 16777216 rows |
The following query was used for the test:
EXPLAIN SELECT * FROM a JOIN b ON b.a_id = a.id WHERE a.id BETWEEN 10000 AND 15000; +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+ | 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 16322446 | Using where | | 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.a_id | 1 | NULL | +----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+
And yes: I know this query could be more optimal by setting an index on b.a_id
.
Results
The whole workload was executed completely in memory and thus CPU bound (we did not want to measure the speed of our I/O system).
SE | JOIN column | bytes | query time | Gain | Space | Character set | |
---|---|---|---|---|---|---|---|
InnoDB | MEDIUMINT | 3 | 5.28 s | 96% | 4% faster | 75% | |
InnoDB | INT | 4 | 5.48 s | 100% | 100% | 100% | |
InnoDB | BIGINT | 8 | 5.65 s | 107% | 7% slower | 200% | |
InnoDB | NUMERIC(7, 2) | ~4 | 6.77 s | 124% | 24% slower | ~100% | |
InnoDB | VARCHAR(7) | 7-8 | 6.44 s | 118% | 18% slower | ~200% | latin1 |
InnoDB | VARCHAR(16) | 7-8 | 6.44 s | 118% | 18% slower | ~200% | latin1 |
InnoDB | VARCHAR(32) | 7-8 | 6.42 s | 118% | 18% slower | ~200% | latin1 |
InnoDB | VARCHAR(128) | 7-8 | 6.46 s | 118% | 18% slower | ~200% | latin1 |
InnoDB | VARCHAR(256) | 8-9 | 6.17 s | 114% | 14% slower | ~225% | latin1 |
InnoDB | VARCHAR(16) | 7-8 | 6.96 s | 127% | 27% slower | ~200% | utf8 |
InnoDB | VARCHAR(128) | 7-8 | 6.82 s | 124% | 24% slower | ~200% | utf8 |
InnoDB | CHAR(16) | 16 | 6.85 s | 125% | 25% slower | 400% | latin1 |
InnoDB | CHAR(128) | 128 | 9.68 s | 177% | 77% slower | 3200% | latin1 |
InnoDB | TEXT | 8-9 | 10.7 s | 195% | 95% slower | ~225% | latin1 |
MyISAM | INT | 4 | 3.16 s | 58% | 42% faster | ||
TokuDB | INT | 4 | 4.52 s | 82% | 18% faster |
Some comments to the tests:
- MySQL 5.6.13 was used for most of the tests.
- TokuDB v7.1.0 was tested with MySQL 5.5.30.
- As results the optimistic cases were taken. In reality the results can be slightly worse.
- We did not take into consideration that bigger data types will eventually cause more I/O which is very slow!
Commands
ALTER TABLE a CONVERT TO CHARACTER SET latin1; ALTER TABLE b CONVERT TO CHARACTER SET latin1; ALTER TABLE a MODIFY COLUMN id INT UNSIGNED NOT NULL; ALTER TABLE b MODIFY COLUMN a_id INT UNSIGNED NOT NULL;