
MySQL backup to file, gzip and load in one step


When a MySQL Slave is set up with mysqldump you have two possibilities:

  • You dump into a file and then load the data into the Slave with the mysql client utility.
  • You dump directly into the mysql client utility.

The first possibility has the advantage that you can restart the load if it fails, and you can look into the file (and make some changes if needed).
The second possibility has the advantage that you do not need disk space and that it is possibly faster. But when the load fails you have to start again from the very beginning.

What I was looking for is a way to combine everything in one step: dump to a compressed file and load the database into the Slave at the same time. This is what I found to meet these requirements:

mysqldump --user=root --all-databases --flush-privileges --single-transaction --master-data=1 --quick \
--flush-logs --triggers --routines --events | tee >(gzip > /tmp/full_backup.sql.gz) | mysql --user=root --host=192.168.1.60 --port 3306

With this command you can even spread the load over several CPUs of the system:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24747 mysql     20   0  534m  56m 5504 S 36.1  0.7   4:12.35 mysqld
 4967 mysql     20   0  402m  33m 5236 S  7.0  0.4   0:02.06 mysqld
 4982 mysql     20   0 23348 2112 1216 S  6.6  0.0   0:01.64 mysqldump
 4984 mysql     20   0 28608 3856 1372 S  5.6  0.0   0:01.58 mysql
 4986 mysql     20   0  4296  688  304 S  5.3  0.0   0:02.10 gzip
 4983 mysql     20   0 98.5m  628  544 S  0.7  0.0   0:00.13 tee

If gzip becomes the bottleneck you can try pigz instead.
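
A minimal variation of the command above, assuming pigz is installed and using the same paths and hosts as in the example:

mysqldump --user=root --all-databases --flush-privileges --single-transaction --master-data=1 --quick \
--flush-logs --triggers --routines --events | tee >(pigz > /tmp/full_backup.sql.gz) | mysql --user=root --host=192.168.1.60 --port 3306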


Last login of MySQL database users


MySQL hosting providers can easily lose the overview of their customers and of which users or schemas are still in use and which are not.

The MySQL database becomes bigger and bigger, uses more and more RAM and disk space and the backup takes longer and longer.

In this situation it would be nice to know which MySQL database users have logged in within the last 6 months, for example. MySQL database users who did not log in within a defined period can be backed up and removed from the production MySQL database.

The following MySQL login trigger helps to track the logins of all non-SUPER-privileged MySQL users.

First we need a table where to log the login of the users:

-- DROP DATABASE tracking;
CREATE DATABASE tracking;

use tracking;

-- DROP TABLE IF EXISTS login_tracking;
CREATE TABLE login_tracking (
  user VARCHAR(16)
, host VARCHAR(60)
, ts TIMESTAMP
, PRIMARY KEY (user, host)
) engine = MyISAM;

Then we need a MySQL stored procedure which does the logging of the login:

-- DROP PROCEDURE IF EXISTS login_trigger;

DELIMITER //

CREATE PROCEDURE login_trigger()
SQL SECURITY DEFINER
BEGIN
  INSERT INTO login_tracking (user, host, ts)
  VALUES (SUBSTR(USER(), 1, instr(USER(), '@')-1), substr(USER(), instr(USER(), '@')+1), NOW())
  ON DUPLICATE KEY UPDATE ts = NOW();
END;

//
DELIMITER ;

Then we have to grant the EXECUTE privilege to all users of the database which do not have the SUPER privilege. MySQL users with the SUPER privilege are not logged with the init_connect login trigger hook:

-- REVOKE EXECUTE ON PROCEDURE tracking.login_trigger FROM 'oli'@'%';
GRANT EXECUTE ON PROCEDURE tracking.login_trigger TO 'oli'@'%';

Those GRANTs can be generated with the following query:

tee /tmp/grants.sql
SELECT CONCAT("GRANT EXECUTE ON PROCEDURE tracking.login_trigger TO '", user, "'@'", host, "';") AS query
  FROM mysql.user
 WHERE Super_priv = 'N';
notee

+---------------------------------------------------------------------------------+
| query                                                                           |
+---------------------------------------------------------------------------------+
| GRANT EXECUTE ON PROCEDURE tracking.login_trigger TO 'oli'@'localhost';         |
| GRANT EXECUTE ON PROCEDURE tracking.login_trigger TO 'replication'@'127.0.0.1'; |
| GRANT EXECUTE ON PROCEDURE tracking.login_trigger TO 'oli'@'%';                 |
| GRANT EXECUTE ON PROCEDURE tracking.login_trigger TO ''@'localhost';            |
+---------------------------------------------------------------------------------+

As the last step we have to activate the stored procedure by hooking it into the login trigger hook:

-- SET GLOBAL init_connect="";
SET GLOBAL init_connect="CALL tracking.login_trigger()";
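
Note that SET GLOBAL does not survive a server restart. To make the login trigger persistent you can additionally put the setting into the MySQL configuration file, for example:

#
# my.cnf
#
[mysqld]
init_connect = "CALL tracking.login_trigger()"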

If something goes wrong with the login trigger you will find the needed information in the MySQL error log.

Reporting

To find out which users have logged in we can run the following query:

SELECT * FROM tracking.login_tracking;
+------+-----------+---------------------+
| user | host      | ts                  |
+------+-----------+---------------------+
| oli  | localhost | 2012-11-30 15:36:39 |
+------+-----------+---------------------+

To find out when a user last logged in you can run:

SELECT u.user, u.host, l.ts
  FROM mysql.user AS u
  LEFT JOIN tracking.login_tracking AS l ON u.user = l.user AND l.host = u.host
 WHERE u.Super_priv = 'N';

+-------------+-----------+---------------------+
| user        | host      | ts                  |
+-------------+-----------+---------------------+
| oli         | localhost | 2012-12-01 09:55:33 |
| replication | 127.0.0.1 | NULL                |
| crm         | 127.0.0.1 | NULL                |
+-------------+-----------+---------------------+

And to find users which were logged but can no longer be found in the mysql.user table you can run:

SELECT l.user, l.host
  FROM tracking.login_tracking AS l
  LEFT JOIN mysql.user AS u ON u.user = l.user AND l.host = u.host
 WHERE u.user IS NULL;

Shrinking InnoDB system tablespace file ibdata1 PoC


In this week's MySQL workshop we were discussing, beside other things, the innodb_file_per_table parameter and the advantages of enabling it. In addition there was a discussion whether the InnoDB system tablespace file can be shrunk again once it has grown very large or not. We all know the answer: The InnoDB system tablespace file never shrinks again.

But why should it not be possible? Other databases, for example Oracle, can shrink or even get rid of tablespace files... After some philosophising about it we came to the conclusion that we should give it a try and see if this is possible with InnoDB as well.

The scenario we considered was the following: You inherit a MySQL database with InnoDB tables but innodb_file_per_table was set to 0. So all the tables are located in the InnoDB tablespace file. And only a small amount of space is left on the device and there is a lot of free space in the InnoDB system tablespace file. The database itself is much too big to dump and restore and we want to get rid of the one big InnoDB system tablespace file and have many small tablespace files as we get them with innodb_file_per_table = 1.

So what we did is the following: We created InnoDB tables inside the InnoDB system tablespace (ibdata1) and bloat them up. Then we altered them to be placed in their own tablespace files by OPTIMIZE TABLE. And now the tricky part starts: How can we shrink the InnoDB system tablespace file to free the disk space again?

CAUTION: This is a proof of concept and should never be used on a production system!!!

First we move all tables out of the InnoDB system tablespace (with innodb_file_per_table = 1):

mysqlcheck --optimize --all-databases --user=root
...
note     : Table does not support optimize, doing recreate + analyze instead
status   : OK
...

Now all tables have been moved out of the system tablespace, but the file is still about 674 Mbyte in size:

ll ibdata1
-rw-rw----. 1 mysql mysql 706740224 Dec  6 23:37 ibdata1

Then we search for empty blocks at the end of the InnoDB data files:

innochecksum -v -d ibdata1

file ibdata1 = 706740224 bytes (43136 pages)...
checking pages in range 0 to 43135
page 0: log sequence number: first = 3558400819; second = 3558400819
page 0: old style: calculated = 148443420; recorded = 148443420
page 0: new style: calculated = 4252778336; recorded = 4252778336
...
page 42508: log sequence number: first = 0; second = 0
page 42508: old style: calculated = 1371122432; recorded = 0
page 42508: new style: calculated = 1575996416; recorded = 0
...
page 43135: log sequence number: first = 0; second = 0
page 43135: old style: calculated = 1371122432; recorded = 0
page 43135: new style: calculated = 1575996416; recorded = 0

In the ideal case we should also find blocks which are not used any more but not blanked out. These 627 blocks (of 16 kbyte each = about 10 Mbyte) can easily be removed...

Next we shrink the InnoDB system tablespace file after stopping the mysqld:

printf '' | dd of=ibdata1 bs=16384 seek=42508
ll ibdata1
-rw-rw----. 1 mysql mysql 696451072 Dec  6 23:42 ibdata1

As a next step we have to change the number of blocks in the header of the InnoDB system tablespace file. This can be done with a tool like hexedit (aptitude install hexedit). We have to change the value at position 0x0030 from 43136 (0xA880) to 42508 (0xA60C):

hexdump -C -n 256 ibdata1
00000000  fd 7c 3f 60 00 00 00 00  00 00 00 00 00 00 00 00  |.|?`............|
00000010  00 00 00 00 d4 18 e3 33  00 08 00 00 00 00 d4 18  |.......3........|
00000020  e4 13 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  a8 80 00 00 a6 c0 00 00  00 00 00 00 01 21 00 00  |.............!..|

Otherwise we would get an error like:

InnoDB: Error: tablespace size stored in header is 43146 pages, but
InnoDB: the sum of data file sizes is only 42508 pages

It looks like InnoDB itself later somehow corrects the block number to a 0x100 pages boundary (4 Mbyte).

As the next step we have to fix the new style checksum (at position 0x0000) and the old style checksum (at position 0x3FFC). You have to do this until innochecksum does not complain any more:

innochecksum -d -p 0 ibdata1
file ibdata1 = 696451072 bytes (42508 pages)...
checking pages in range 0 to 0
page 0: log sequence number: first = 3558400819; second = 3558400819
page 0: old style: calculated = 2354503790; recorded = 2354503790
page 0: new style: calculated = 3427457314; recorded = 3587772574

When you have done this the database should be ready to start.

The tables can possibly be transferred later on with the transportable tablespace feature which comes with MySQL 5.6.
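
As a rough sketch (assuming MySQL 5.6 on both sides, innodb_file_per_table = 1, a table test.t1 and default datadir locations), copying a table with transportable tablespaces could look like this:

-- On the source instance, in one client session (keep the session open while the files are copied):
FLUSH TABLES t1 FOR EXPORT;

-- In a second shell copy the tablespace and the metadata file:
shell> cp /var/lib/mysql/test/t1.ibd /var/lib/mysql/test/t1.cfg /tmp/export/

-- Back in the first session release the locks again:
UNLOCK TABLES;

-- On the target instance (the same CREATE TABLE statement must exist there already):
ALTER TABLE t1 DISCARD TABLESPACE;

-- copy t1.ibd and t1.cfg into the schema directory of the target datadir, then:
ALTER TABLE t1 IMPORT TABLESPACE;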

I have not found a good way yet to find the highest used block in the tablespace file. So it is a wild guess, which is dangerous. Especially because some InnoDB UNDO LOG blocks seem to be located there at very high positions:

SELECT page_type, MAX(page_number) AS max_page_number
  FROM information_schema.innodb_buffer_page
 WHERE space = 0
   AND page_number != 0
 GROUP BY page_type
 ORDER BY max_page_number;

+-------------------+-----------------+
| page_type         | max_page_number |
+-------------------+-----------------+
| TRX_SYSTEM        |               5 |
| SYSTEM            |             300 |
| BLOB              |            9366 |
| EXTENT_DESCRIPTOR |           32768 |
| IBUF_BITMAP       |           32769 |
| INODE             |           42123 |
| INDEX             |           45229 |
| ALLOCATED         |           45247 |
| UNDO_LOG          |           45503 |
+-------------------+-----------------+

It would be good if we had a method to relocate those blocks somehow...

To verify that everything works I have tried to increase the system tablespace again. This seems to work if the number of blocks is divisible by 256 (4 Mbyte; or 128 = 2 Mbyte?). But growing the system tablespace again should not be the intention.

Further, according to our tests this method of shrinking the InnoDB system tablespace seems to work with MySQL 5.1, 5.5 and 5.6.

Thanks to Ralf, Torsten and Stefan for assistance!

It would be nice to get some feedback from the InnoDB and Percona guys about how this feature could be implemented correctly...

And finally: Do not blame and beat me. I know that this is an evil hack, but I like to play in my sandbox as I want!

Privileges of MySQL backup user for mysqldump


Some MySQL customers do not want to use the root user for mysqldump backups. For this user you have to grant the following minimal MySQL privileges:

mysqldump --single-transaction (InnoDB)

CREATE USER 'backup'@'localhost' IDENTIFIED BY 'secret';
GRANT SELECT, SHOW VIEW, RELOAD, REPLICATION CLIENT, EVENT, TRIGGER ON *.* TO 'backup'@'localhost';

mysqldump --lock-all-tables (MyISAM)

GRANT LOCK TABLES ON *.* TO 'backup'@'localhost';
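
A backup with this user could then look like this (a sketch; file name and option set chosen as an example, --master-data requires the binary log to be enabled):

shell> mysqldump --user=backup --password=secret --single-transaction --master-data=2 \
--triggers --routines --events --all-databases > /backup/full_dump.sql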

If we missed a privilege please let us know.

Bootstrapping Galera Cluster the new way


A while ago it was pretty inconvenient to start a complete Galera Cluster from scratch. Rolling restarts and such things were already working well, but bootstrapping was a pain.

With Galera v2.2 new functionality came in. We tried it out and it did not work as documented. :-( Thanks to Teemu's help we found out that there was a bug in the Galera documentation.

The settings which were working for us are:

wsrep_cluster_address = "gcomm://192.168.1.2,192.168.1.3?pc.wait_prim=no"

And when all 3 nodes of the Galera Cluster are started and ready to join you can run:

SET GLOBAL wsrep_provider_options="pc.bootstrap=1";

I hope we can go live on Thursday with the new Telco VoIP Cluster for 2500 employees...

Have fun and enjoy an even better Galera Cluster for MySQL!

Block MySQL traffic for maintenance windows


From time to time some maintenance work on the MySQL database has to be done. During the maintenance window we do not want to have application traffic on the database.

Sometimes it is hard to shut down all applications spread over the whole company. Or we want to allow only some specific hosts to access MySQL remotely (for example the monitoring system or the backup server).

For this purpose we can use Linux packet filtering (iptables).

To see what packet filtering rules are available we can run the following command:

iptables -L INPUT -v

To close the MySQL port on all interfaces we use:

iptables -A INPUT -p tcp --dport mysql -j DROP

and to open the MySQL port again after the maintenance window:

iptables -D INPUT -p tcp --dport mysql -j DROP

With the -i option we can restrict the rule to a specific interface, for example eth0, and with the -s option we can specify a specific source only. With ! -s we can implement an inverse rule (all but the given source).
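
For example, to block MySQL traffic coming in on eth0 for everybody except the monitoring server (IP address chosen as an example), something like the following should work:

iptables -A INPUT -i eth0 -p tcp --dport mysql ! -s 192.168.1.33 -j DROP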

Switching from MySQL/MyISAM to Galera Cluster


Switching from MySQL/MyISAM to Galera Cluster requires that all tables (except those from the mysql, information_schema and performance_schema) are using the InnoDB Storage Engine.

For altering the Storage Engine of the tables we wrote a script (alter_engine.pl) a long time ago already. Because we have made many of those switches recently we have extended its functionality.

New features

  • Recognizes VIEWs and does NOT try to alter their Storage Engine (bug).
  • The script is MySQL version aware and complains if a too old MySQL version is used.
  • Finds tables without a Primary Key.
  • Checks for too long InnoDB Primary Keys.
  • Checks for FULLTEXT indexes in MySQL 5.1 and 5.5 and writes a note if the version is older than 5.6.

Example

./alter_engine.pl
User                              [root] : 
Password                              [] : secret
Schema from (or all)              [test] : all
Engine to                       [InnoDB] : 

Version is   : 5.6.10
MR Version is: 050610

The following tables might not have a Primary Key:
+--------------+----------------------+
| table_schema | table_name           |
+--------------+----------------------+
| test         | innodb_table_monitor |
| test         | log_event            |
| test         | parent               |
| test         | t                    |
+--------------+----------------------+
The tables above not having a Primary Key will negatively affect perfor-
mance and data consistency in MySQL Master/Slave replication and Galera
Cluster replication.

The following tables might have a too long Primary Key for InnoDB (> 767 bytes):
+--------------+------------+-------------+
| table_schema | table_name | column_name |
+--------------+------------+-------------+
| test         | test       | data        |
+--------------+------------+-------------+

The following tables might have a FULLTEXT index (which is only supported
in MySQL 5.6 and newer):
+--------------+------------+-------------+
| table_schema | table_name | column_name |
+--------------+------------+-------------+
| test         | test       | data        |
+--------------+------------+-------------+

Output written to /tmp/alter_table_all.sql
After reviewing it you can apply it with mysql --user=root --password=secret 

We need you: MySQL DBA for FromDual Support line


FromDual is looking for professional, enthusiastic and experienced people who:

  • Know MySQL, Percona Server or MariaDB extensively
  • Are familiar with the open source eco-system
  • Know how to operate database systems, as a DBA or a DevOps
  • Understand what can go wrong in operating a database
  • Are happy to work autonomously, remotely and to communicate with IRC, Skype, Mail and Phone
  • Are comfortable on Linux systems
  • Are team players, keen to contribute to the growth of the company
  • Are comfortable dealing directly with clients and
  • Look for new challenges

Job description

We are looking for full-time MySQL support engineers (female or male) to primarily take care of our MySQL support services and help our customers operating their MySQL databases (remote-DBA and emergency interventions).

You are well trained and have good experience in:

  • Operating critical highly available MySQL production databases mostly on Linux.
  • Running MySQL-Replication in all variants is your daily business.
  • The workings of the most used MySQL HA set-ups and how to fix them efficiently if problems occur (if you are already experienced in running Galera Cluster this would be a plus!).
  • Open Source Technologies (LAMP stack, etc.)
  • Bash scripting and you can do some simple programs in at least one popular programming/scripting language (Perl, PHP, ...).

You will be in direct contact with the customers and you need good antennae to listen to them, know how to respond and get the answers to their real problems. You also have to be proactive when something goes wrong and direct the customer back to the right track.

You need to have good communication skills and be an active team player.

To fulfil your job you have to work in European time zones. You can organize your working time flexibly within certain ranges. Participating in the on-call duty is expected. FromDual is a completely virtual company and relocation is not needed (home office). Good English, verbally and in writing, is a must. Most of our current customers speak German, so having German skills is a plus.

Beside being our support engineer we expect you to improve your knowledge and skills and to contribute to improving our monitoring solution, our database controlling solution and our other tools. Further, we expect that you write regular technical articles and give help wherever it is needed or requested...

You should be prepared to work, think and act autonomously most of the time and to teach yourself (using Google, MySQL documentation, testing etc.). If you are ever stuck, your colleagues at FromDual will assist you.

If you need somebody holding your hand all the time, FromDual is not a good choice for you.

Who is FromDual?

FromDual is the leading independent and professional MySQL database consulting and service company in Europe with its Headquarters in Switzerland.

Our customers are mostly located in Europe and range from small start-up companies to some of the top-500 companies of Europe.

You will be joining us at an exciting time. We are growing and we need like-minded people to grow with us, individually and collectively. As our horizons expand, we need our team to expand in its skills, knowledge and expertise.

Applying to join FromDual could be the best decision you make.

How to continue

If you are interested in this opportunity and if you feel you are a good "fit" (we know that there will not be a 100% match!) we would be glad to hear from you.

Please send your true CV with your salary expectation and a list of your open source involvements, blog articles, slides, tweets etc. to jobs@fromdual.com. If you want to know more about this job opportunity or if you want to speak with me, please call me at +41 79 830 09 33 (Oli Sennhauser, CTO). Only candidates, NO head hunters please!

After we have received and screened your CV we will invite you to prove your technical skills by taking an exam in operating MySQL. If you pass the exam you will be invited to the final interviews.

This job opportunity is open until May 31st 2013.


Unbreakable MySQL Cluster with Galera and Linux Virtual Server (LVS)


Recently we had to set up a 3-node Galera Cluster with a Load Balancer in front of it. Because Galera Cluster nodes (mysqld) still reply to TCP requests on port 3306 when they are expelled from the Cluster, it is not sufficient to let the Load Balancer just check the port to decide whether a Galera node is properly running or not.

We used the wsrep_notify_cmd variable to hook our own script into the Galera Cluster which disables each node on the Load Balancer when its state changes.

# my.cnf
#
[mysqld]
wsrep_notify_cmd = /usr/local/bin/lvs_control.sh

The whole Galera Cluster Architecture looks as follows:

lvs_galera_architecture.png

As Load Balancer we used the IPVS Load Balancer from the Linux Virtual Server (LVS) Project. This Load Balancer was made highly available with keepalived.
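
For completeness, the virtual service and the real servers on the IPVS Load Balancer could be set up roughly like this (VIP as in the script below; real server IP addresses, scheduler and forwarding method are just an example):

shell> ipvsadm -A -t 192.168.0.99:3306 -s wrr
shell> ipvsadm -a -t 192.168.0.99:3306 -r 192.168.0.1:3306 -m -w 100
shell> ipvsadm -a -t 192.168.0.99:3306 -r 192.168.0.2:3306 -m -w 100
shell> ipvsadm -a -t 192.168.0.99:3306 -r 192.168.0.3:3306 -m -w 100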

Our script to take a Galera Node out of the Load Balancer was the following:

#!/bin/bash -eu

#
# /etc/mysql/conf.d/wsrep.cnf
#
# [mysqld]
# wsrep_notify_cmd = /usr/local/bin/lvs_control.sh
#

LOG="/tmp/lvs_control.log"
LBIP="192.168.0.89"
VIP="192.168.0.99"
PORT="3306"
LBUSER="galera"
LBUSER="root"
ETC="/etc/mysql/conf.d/wsrep.cnf"
ETC="/home/mysql/data/mysql-5.5-wsrep-23.7-a/my.cnf"
MYIP=''
WEIGHT="100"
DATE=$(date '+%Y-%m-%d %H:%M:%S')

echo $DATE >>$LOG

regex='^.*=\s*([0-9]+.[0-9]+.[0-9]+.[0-9]+).*'
str=$(grep "^wsrep_node_incoming_address" $ETC 2>>$LOG)

if [[ $str =~ $regex ]] ; then
  MYIP=${BASH_REMATCH[1]}
else
  echo "Cannot find IP address in $str">>$LOG
  exit 1
fi

while [ $# -gt 0 ] ; do

  case $1 in
  --status)
    STATUS=$2
    shift
    ;;
  --uuid)
    CLUSTER_UUID=$2
    shift
    ;;
  --primary)
    PRIMARY=$2
    shift
    ;;
  --index)
    INDEX=$2
    shift
    ;;
  --members)
    MEMBERS=$2
    shift
    ;;
  esac
  shift
done

# echo $* >> $LOG
echo $STATUS >> $LOG

# Undefined means node is shutting down
# Synced means node is ready again
if [ "$STATUS" != "Synced" ] ; then
  cmd="ssh $LBUSER@$LBIP 'sudo /sbin/ipvsadm -e -t $VIP:$PORT -r $MYIP -w 0'"
else
  cmd="ssh $LBUSER@$LBIP 'sudo /sbin/ipvsadm -e -t $VIP:$PORT -r $MYIP -w $WEIGHT'"
fi

echo $cmd >>$LOG
eval $cmd >>$LOG 2>&1
echo "ret=$?">>$LOG

exit 0

We assume that the same script can be used with little modifications for the Galera Load Balancer as well.

MySQL and Secure Linux (SELinux)


Maybe you experienced some strange behaviour with MySQL: Everything is installed correctly and should work. But it does not.

Symptoms we have seen:

  • MySQL starts/stops properly when started/stopped with service mysqld restart but MySQL does not start when a server is rebooted.
  • Or after upgrading MySQL binaries mysqld will not start at all any more.
  • Or after relocating MySQL datadir or changing default port MySQL does not start any more.

shell> service mysqld start
MySQL Daemon failed to start.
Starting mysqld:                                           [FAILED]

shell> grep mysqld /var/log/boot.log 
Starting mysqld:  [FAILED]

If you are lucky you get some error message like: ERROR! The server quit without updating PID file (/data/mysql/server.pid). or:

130620  9:49:14 [ERROR] Can't start server : Bind on unix socket: Permission denied
130620  9:49:14 [ERROR] Do you already have another mysqld server running on socket: /var/lib/mysql/mysql.sock ?
130620  9:49:14 [ERROR] Aborting

This typically happens when you relocate the MySQL data files (datadir), change port, socket, log file, pid file or similar.

The reason for this problem is not too easy to find. You see some traces in /var/log/boot.log. And if you know where to look you will find something in /var/log/audit/audit.log. But without knowing where to look and what to look for it is quite hard.
If you are lucky the setroubleshoot utility is installed. It will report problems in the syslog (/var/log/messages).

The cause of this problem might be the Secure Linux (SELinux) feature!

SELinux [1], [2], [3] is typically used in Red Hat, CentOS and Fedora Linux. On Debian, Ubuntu and SuSE you have a similar solution called AppArmor.

To see if SELinux is enabled just run the following command:

shell> sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted

To disable SELinux enforcement temporarily (switch to permissive mode) you just have to run the following command:

shell> setenforce 0

And to make this change persistent you have to change it in the following configuration file:

#
# /etc/selinux/config 
#
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=permissive
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted 

But possibly you want to move the MySQL datadir to another location without disabling SELinux? To achieve this proceed with the following steps:

The simple way

If you have just moved the datadir or changed the MySQL port, the blog article SELinux and MySQL by Jeremy Smyth is a good starting point.
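
In the simple case this boils down to telling SELinux about the new locations, roughly like this (paths and port are examples; the semanage utility is provided by the policycoreutils-python package on Red Hat based systems):

shell> semanage fcontext -a -t mysqld_db_t "/data/mysql(/.*)?"
shell> restorecon -Rv /data/mysql
shell> semanage port -a -t mysqld_port_t -p tcp 3307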

Complicated way

If you want to create another or a new MySQL instance or do some other stuff you have to do some more things manually (possibly there is also an automated way?):

First it is recommended to install the setroubleshoot utility. Then with the command:

shell> tail /var/log/messages
Jun 20 09:38:53 ip-10-39-25-184 setroubleshoot: SELinux is preventing /bin/mkdir from write access on the directory /var/lib. For complete SELinux messages. run sealert -l ef8eae63-7ec3-4b22-87e0-5774120726c3

You will find what is going wrong. Follow the instructions:

shell> sealert -l ef8eae63-7ec3-4b22-87e0-5774120726c3
SELinux is preventing /bin/mkdir from write access on the directory /var/lib.

*****  Plugin catchall_labels (83.8 confidence) suggests  ********************

If you want to allow mkdir to have write access on the lib directory
Then you need to change the label on /var/lib
Do
# semanage fcontext -a -t FILE_TYPE '/var/lib'
where FILE_TYPE is one of the following: var_log_t, mysqld_var_run_t, mysqld_db_t, root_t. 
Then execute: 
restorecon -v '/var/lib'


*****  Plugin catchall (17.1 confidence) suggests  ***************************

If you believe that mkdir should be allowed write access on the lib directory by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# grep mkdir /var/log/audit/audit.log | audit2allow -M mypol
# semodule -i mypol.pp

until MySQL starts properly. And also test a reboot of the machine!

To UNION or not to UNION...


Recently a forum question [ 1 ] got my attention:

Is there any performance issue with Union?

I used union all sometime back and it was performance issue just to make an opinion that we should used union in query.

The question itself was not too interesting because the answer is easy: It depends. But I wanted to see if there was an improvement in this common problem over time in MySQL.

Test set-up

So I prepared a little test to simulate some of the possible scenarios:

CREATE TABLE `u` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `a` int(10) unsigned DEFAULT NULL,
  `b` int(10) unsigned DEFAULT NULL,
  `c` int(10) unsigned DEFAULT NULL,
  `d` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`),
  KEY `c` (`c`),
  KEY `d` (`d`)
) ENGINE=InnoDB
;

INSERT INTO u SELECT NULL, ROUND(RAND()*10, 0), ROUND(RAND()*10, 0), ROUND(RAND()*1000000, 0), ROUND(RAND()*1000000, 0);
INSERT INTO u SELECT NULL, ROUND(RAND()*10, 0), ROUND(RAND()*10, 0), ROUND(RAND()*1000000, 0), ROUND(RAND()*1000000, 0) FROM u;
... 1 mio rows

ANALYZE TABLE u;

With this table we can simulate the OR problem with low and high selectivity.

Running the tests

We did the tests with MySQL (5.0 - 5.7), Percona Server (5.6) and MariaDB (5.5, 10.0) for the following queries:

EXPLAIN SELECT * FROM u WHERE a = 5 OR b = 5;
EXPLAIN SELECT * FROM u WHERE a = 5 OR c = 500001;
EXPLAIN SELECT * FROM u WHERE c = 500001 OR d = 500001;
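
For reference, the UNION rewrite of the first query that the forum question alludes to would look something like this (because every row of u is unique thanks to the PRIMARY KEY, plain UNION returns the same result set as the OR variant):

EXPLAIN SELECT * FROM u WHERE a = 5
UNION
SELECT * FROM u WHERE b = 5;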

We are interested in what the optimizer is doing and what the performance of the queries is. The following results came out:

                        Query 1                 Query 2                 Query 3
Database version        rows    avg. time  QEP  rows    avg. time  QEP  rows  avg. time  QEP
MySQL 5.0.92            194402  390 ms     1    104876  230 ms     2    6     < 10 ms    3
MySQL 5.1.66            194402  410 ms     1    104876  240 ms     2    6     < 10 ms    3
MySQL 5.5.24            194402  420 ms     1    104876  370 ms     1    6     < 10 ms    3
MariaDB 5.5.32          194402  460 ms     1    104876  420 ms     1    6     < 10 ms    3
MySQL 5.6.12            194402  440 ms     2    104876  240 ms     2    6     < 10 ms    3
Percona 5.6.12-60.40    194402  450 ms     2    104876  240 ms     2    6     < 10 ms    3
MySQL 5.7.1             194402  420 ms     2    104876  220 ms     2    6     < 10 ms    3
MariaDB 10.0.3          194402  450 ms     1    104876  400 ms     1    6     < 10 ms    3

Different Query Execution Plans (QEP)

  • QEP 1:
    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
    |  1 | SIMPLE      | u     | ALL  | a,b           | NULL | NULL    | NULL | 1049134 | Using where | 
    +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

  • QEP 2:
    +----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
    | id | select_type | table | type        | possible_keys | key  | key_len | ref  | rows   | Extra                         |
    +----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+
    |  1 | SIMPLE      | u     | index_merge | a,c           | a,c  | 5,5     | NULL | nnnnnn | Using union(a,c); Using where | 
    +----+-------------+-------+-------------+---------------+------+---------+------+--------+-------------------------------+

  • QEP 3:
    +----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
    | id | select_type | table | type        | possible_keys | key  | key_len | ref  | rows | Extra                         |
    +----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+
    |  1 | SIMPLE      | u     | index_merge | c,d           | c,d  | 5,5     | NULL |    n | Using union(c,d); Using where | 
    +----+-------------+-------+-------------+---------------+------+---------+------+------+-------------------------------+

  • Conclusion

    • Single query performance went down by 5 - 50% over time (newer MySQL releases); in one case it increased by 5%. But we can also see some impact of optimizer improvements.
    • Newer MySQL releases are not necessarily faster for single-query performance than older ones. Most MySQL users are not running more than 1 or 2 concurrent queries. For them scalability improvements are not really an issue.
    • There seem to be some changes in the Optimizer, some for good, some for bad, depending on the release or branch/fork you are using. So test carefully when you change the release or branch/fork.
    • And: Do not believe all the marketing yelling but do your own testing...

Galera Cluster for MySQL and hardware load balancer


Our bigger customers, whom we help to deploy Galera Cluster for MySQL set-ups, often have commercial hardware load balancers (e.g. F5 or Cisco) instead of software load balancers.

For those hardware load balancers it is not possible to see whether a Galera node is available or not, because the MySQL daemon is still running and responding on port 3306 even though the service is not available.
So the load balancer still serves the Galera node while it is, for example, feeding a joiner node with an SST. This would lead to application errors, which is unpleasant.

One can try somehow to teach the load balancer to find out if a Galera Cluster node is really available or not. But this requires a more sophisticated load balancer, know-how to teach the load balancer the new behaviour, and possibly interaction between the MySQL node and the load balancer. See our other discussion on this matter.

Another concept we hit on this week is that we could also block port 3306 of the MySQL node with firewall rules (iptables). Then the hardware load balancer does not see anybody listening on port 3306 any more and assumes that this IP address should not be served any more.

We also learned this week that the REJECT rule is better than the DROP rule when we want fast response times and immediate elimination of traffic.

The script block_galera_node.sh has to be hooked as before into the wsrep_notify_cmd variable and an additional sudoers rule has to be added for the mysql user.

#
# /etc/sudoers.d/mysql
# chmod 0440
#
mysql ALL = (root) NOPASSWD: /sbin/iptables
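
We do not reproduce the whole script here, but a minimal sketch of what block_galera_node.sh could roughly look like (status handling analogous to the LVS script from our earlier article; iptables rule and log file are examples):

#!/bin/bash

LOG="/tmp/block_galera_node.log"

# extract the --status value passed by wsrep_notify_cmd
while [ $# -gt 0 ] ; do
  case $1 in
  --status) STATUS=$2; shift ;;
  esac
  shift
done

echo "$(date '+%Y-%m-%d %H:%M:%S') status=$STATUS" >> $LOG

# remove a possibly existing blocking rule, ignore the error if there is none
sudo /sbin/iptables -D INPUT -p tcp --dport 3306 -j REJECT 2>>$LOG

# block the MySQL port again as long as the node is not synced
if [ "$STATUS" != "Synced" ] ; then
  sudo /sbin/iptables -A INPUT -p tcp --dport 3306 -j REJECT 2>>$LOG
fi

exit 0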

We are interested to hear your experience and your opinion about this approach.

Galera Arbitrator (garbd)


It took me quite a while to find out how the beast Galera Arbitrator (garbd) works. To save your time, here is a short summary:

How to start Galera Arbitrator (garbd)

shell> ./garbd --address gcomm://192.168.13.1,192.168.13.2 --group "Our Galera Cluster" --log /tmp/garbd.log --daemon

How to stop Galera Arbitrator (garbd)

shell> killall garbd

How to start Galera Arbitrator (garbd) with a configuration file

shell> ./garbd --cfg /tmp/garb.cnf --daemon

The configuration file looks as follows:

#
# /etc/mysql/garb.cnf
#
address = gcomm://127.0.0.1:5671,127.0.0.1:5672,127.0.0.1:5673
group = Our Galera Cluster
options = gmcast.listen_addr=tcp://127.0.0.1:5674
log = /tmp/garbd.log

A service start/stop script can be found at: galera-src/garb/files/garb.sh and galera-src/garb/files/garb.cnf

Huge amount of TIME_WAIT connections


In MySQL we have the typical behaviour that connections are opened and closed very often and rapidly. So we have very short-lived connections to the server. In extreme cases this can lead to the situation that the available TCP ports are exhausted.

The range of local TCP ports can be found with:

# cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000

In this example we can have at most (61000 - 32768 = 28232) connections open concurrently.

When a TCP connection closes, the port cannot be reused immediately afterwards because the Operating System has to wait for the duration of the TIME_WAIT interval (maximum segment lifetime, MSL). This we can see with the command:

# netstat -nat

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN
tcp        0      0 0.0.0.0:10051               0.0.0.0:*                   LISTEN
tcp        0      0 127.0.0.1:10051             127.0.0.1:60756             TIME_WAIT
tcp        0      0 127.0.0.1:10050             127.0.0.1:50191             TIME_WAIT
tcp        0      0 127.0.0.1:10050             127.0.0.1:52186             ESTABLISHED
tcp        0      0 127.0.0.1:10051             127.0.0.1:34445             TIME_WAIT

The reason for waiting is that packets may arrive out of order or be retransmitted after the connection has been closed. CLOSE_WAIT indicates that the other side of the connection has closed the connection. TIME_WAIT indicates that this side has closed the connection. The connection is being kept around so that any delayed packets can be matched to the connection and handled appropriately.

The Maximum Segment Lifetime can be found as follows:

# cat /proc/sys/net/ipv4/tcp_fin_timeout
60

This basically means your system cannot sustain more than ((61000 - 32768) / 60 = about 470) new connections per second over a longer period.

Solutions

There are several strategies out of this problem:

  • Open less frequently connections to your MySQL database. Put more payload into one connection. Often Connection Pooling is used to achieve this.
  • Increasing the port range. Setting the range to 15000 - 61000 is pretty common these days (extreme tuning: 1024 - 65535).
  • Increase the availability by decreasing the FIN timeout.

Those values can be changed online with:

# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
# echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range

Or permanently by adding them to /etc/sysctl.conf:
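
A corresponding /etc/sysctl.conf snippet (activated with sysctl -p) would look like this:

#
# /etc/sysctl.conf
#
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 15000 65000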

Another possibility to change this behaviour is to use tcp_tw_recycle and tcp_tw_reuse. By default they are disabled:

# cat /proc/sys/net/ipv4/tcp_tw_recycle
0
# cat /proc/sys/net/ipv4/tcp_tw_reuse
0

These parameters allow fast cycling of sockets in TIME_WAIT state and re-using them. But before you make this change, make sure that it does not conflict with the protocols used by the applications that need these ports.

The tcp_tw_recycle setting can cause problems when load balancers are used:

tcp_tw_reuse:   Allow to reuse TIME_WAIT sockets for new connections when it is safe from the protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.
tcp_tw_recycle: Enable fast recycling of TIME_WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.


Murphy’s Law is also valid for Galera Cluster for MySQL


We had a Galera Cluster support case recently. The customer was drenched in tears because his Galera Cluster did not work any more and he could not get it working again.

Upsss! What has happened?

A bit of background on this case: The customer wanted to do a rolling restart of the Galera Cluster under load because of an Operating System upgrade which required a reboot of the system.

Let's have a look at the MySQL error log to see what was going on. The customer started the rolling restart with NodeC:

12:20:42 NodeC: normal shutdown --> Group 2/2
12:20:46 NodeC: shutdown complete
12:22:09 NodeC: started
12:22:15 NodeC: start replication
12:22:16 NodeC: CLOSED -> OPEN
12:22:16 all  : Group 2/3 component all PRIMARY
12:22:17 NodeC: Gap in state sequence. Need state transfer.
12:22:18 all  : Node 1 (NodeC) requested state transfer from '*any*'. Selected 0 (NodeB)(SYNCED) as donor.
12:22:18 NodeB: Shifting SYNCED -> DONOR/DESYNCED (TO: 660966498)
12:22:19 NodeC: Shifting PRIMARY -> JOINER (TO: 660966498)
12:22:19 NodeC: Receiving IST: 14761 writesets, seqnos 660951493-660966254
12:22:21 NodeC: 0 (NodeB): State transfer to 1 (NodeC) complete.

Everything went fine so far: NodeC came up again and did an IST as expected. But then the first operational error happened: The customer did not wait until NodeC had completely recovered before rebooting NodeB. It seems NodeC took some time for the IST recovery. This should be checked on all nodes with SHOW GLOBAL STATUS LIKE 'wsrep%';...

12:22:21 NodeC: Member 0 (NodeB) synced with group.
12:22:21 NodeB: Shifting JOINED -> SYNCED (TO: 660966845)
12:22:21 NodeB: Synchronized with group, ready for connections
                --> NodeC seems not to be ready yet!
12:23:21 NodeB: Normal shutdown
12:23:21 all  : Group 1/2
12:23:21 NodeC: Aborted (core dumped)

And now Murphy was acting already for the first time: We hit a situation in the Galera Cluster which is not covered as expected. Now we have 2 nodes out of 3 not working. As a result the Cluster gets a quorum loss (non-Primary, more than 50% of the nodes disappeared) and does not reply to any SQL queries any more. This is a bug because both nodes left the cluster gracefully. The third node should have stayed Primary:

12:23:21 NodeB: Received SELF-LEAVE. Closing connection.
12:23:23 NodeB: Shifting CLOSED -> DESTROYED (TO: 660973981)
12:23:25 NodeB: Shutdown complete
12:23:29 NodeC: mysqld_safe WSREP: sleeping 15 seconds before restart
12:23:37 NodeA: Received NON-PRIMARY.
12:23:44 NodeC: mysqld_safe mysqld restarted
12:23:48 NodeC: Shifting CLOSED -> OPEN
12:23:48 NodeC: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
12:23:48 NodeC: Received NON-PRIMARY.
12:23:48 NodeA: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
12:23:48 NodeA: Received NON-PRIMARY.
12:24:30 NodeB: mysqld_safe Starting mysqld daemon
12:24:36 NodeB: Start replication
12:24:37 NodeB: Received NON-PRIMARY.

As a result the customer decided to shut down the whole cluster. This was not necessary but is an acceptable approach:

12:27:55 NodeB: /usr/sbin/mysqld: Normal shutdown
12:27:58 NodeB: /usr/sbin/mysqld: Shutdown complete
12:28:14 NodeA: /usr/sbin/mysqld: Normal shutdown
12:28:19 NodeA: /usr/sbin/mysqld: Shutdown complete
12:31:45 NodeC: /usr/sbin/mysqld: Normal shutdown
12:31:49 NodeC: /usr/sbin/mysqld: Shutdown complete

We now experience a complete cluster outage. And then the next operational error happened: The customer chose the node (NodeC) with the worst (= oldest) data as the starting node for the new Cluster:

12:31:55 NodeC: Starting mysqld daemon
12:31:58 NodeC: PRIMARY, 1/1
12:31:58 NodeC: /usr/sbin/mysqld: ready for connections.
12:33:29 NodeB: mysqld_safe Starting mysqld daemon
12:33:33 NodeB: PRIMARY, 1/2

An alternative approach would have been to run the command SET GLOBAL wsrep_provider_options='pc.bootstrap=yes'; on the node (NodeA) with the most recent data...
After connecting, NodeB (with the newer state) requested a state transfer from the older NodeC:

12:33:35 all  : Node 1 (NodeB) requested state transfer from '*any*'. Selected 0 (NodeC)(SYNCED) as donor.
12:33:35 NodeC: Shifting SYNCED -> DONOR/DESYNCED (TO: 660982149)
12:33:35 NodeC: IST request
                --> Should be SST, why IST?
12:33:35 NodeB: Shifting PRIMARY -> JOINER (TO: 660982149)
12:33:35 NodeB: Receiving IST: 7914 writesets, seqnos 660973981-660981895
12:33:36 NodeB: Slave SQL: Could not execute Write_rows event on table test.test; Duplicate entry '8994678' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 102, Error_code: 1062

And now Mister Murphy is acting a second time: We hit another situation: The newer node requests an IST from the older node, which has in the meanwhile progressed to an even newer state. So the newer joiner node receives data from the older donor node, which causes an AUTO_INCREMENT Primary Key violation. As a consequence the node crashes:

12:33:36 NodeB: receiving IST failed, node restart required: Failed to apply app buffer: äJR#, seqno: 660974010, status: WSREP_FATAL
12:33:36 NodeB: Closed send monitor.
12:33:37 NodeB: Closing slave action queue.
12:33:37 NodeB: Aborted (core dumped)
12:33:37 NodeC: PRIMARY 1/1
12:33:44 NodeC: Shifting DONOR/DESYNCED -> JOINED (TO: 660983204)
12:33:59 NodeB: mysqld_safe mysqld restarted
12:34:04 NodeB: Shifting CLOSED -> OPEN
12:34:07 NodeB: Aborted (core dumped)
... Loop

This situation now keeps node NodeB in a crashing loop: it is restarted by the mysqld_safe process and requests an IST again. This is another bug, which is fixed in a newer Galera MySQL (5.5.33). And now the next operational error happened: Instead of killing NodeB and forcing an SST by deleting the grastate.dat file, they started the third node as well...

12:37:12 NodeA: mysqld_safe Starting mysqld daemon
...
--> code dumped
... Loop

NodeB and NodeA both have the same problem now...

As a result: Nodes NodeA and NodeB are now looping in a crash. But at least node NodeC was up and running all the time.

Learnings

  • Most important: Have an ntpd service running on all Cluster nodes so the times on the different nodes do not get messed up while investigating errors. This makes problem solving much easier...
  • In case of split-brain or quorum loss choose the node with the most recent data as your initial Cluster node.
  • If you have a Cluster in split-brain you do not have to restart it. You can bring a node out of split-brain with the pc.bootstrap=yes option once you have found out which node is the most recent one.
  • Analyse error log files carefully to understand what went wrong. Forcing an SST only takes a few seconds (see the sketch after this list).
  • Upgrade your software regularly to not hit old known bugs. The rule "Do not touch a running system!" does not apply here because we are already touching the running system! So a regular upgrade from time to time can be very helpful!
  • Be familiar with operational issues of your Cluster software. A Cluster does not only mean high-availability. It means also you have to train your staff to handle it.
  • It is always valuable to have support for your business critical systems.
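
Forcing an SST on a broken node is a matter of a few commands (a sketch, assuming the default datadir /var/lib/mysql and an init script named mysql):

shell> /etc/init.d/mysql stop
shell> mv /var/lib/mysql/grastate.dat /var/lib/mysql/grastate.dat.bak
shell> /etc/init.d/mysql start   # the node now requests a full SST from a donor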

Galera Cluster 3.1 GA is out!

Impact of column types on MySQL JOIN performance


In our MySQL trainings and consulting engagements we always tell our customers to use the smallest possible data type to get better query performance, especially for the JOIN columns. This advice is also supported by the MySQL documentation in the chapter Optimizing Data Types:

Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.

I remember that somewhere the JOIN columns were explicitly mentioned but I cannot find it any more.

Test set-up

To get numbers we have created a little test set-up:

CREATE TABLE `a` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT
, `data` varchar(64) DEFAULT NULL
, `ts` timestamp NOT NULL
, PRIMARY KEY (`id`)
) ENGINE=InnoDB CHARSET=latin1;

CREATE TABLE `b` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT
, `data` varchar(64) DEFAULT NULL
, `ts` timestamp NOT NULL
, `a_id` int(10) unsigned DEFAULT NULL
, PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Table a was filled with 1048576 rows and table b with 16777216 rows.

The following query was used for the test:

EXPLAIN SELECT * FROM a JOIN b ON b.a_id = a.id WHERE a.id BETWEEN 10000 AND 15000;
+----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref         | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+
|  1 | SIMPLE      | b     | ALL    | NULL          | NULL    | NULL    | NULL        | 16322446 | Using where |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY       | PRIMARY | 4       | test.b.a_id |        1 | NULL        |
+----+-------------+-------+--------+---------------+---------+---------+-------------+----------+-------------+

And yes: I know this query could be more optimal by setting an index on b.a_id.
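
For completeness, adding that index would be a one-liner, for example:

ALTER TABLE b ADD INDEX (a_id);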

Results

The whole workload was executed completely in memory and thus CPU bound (we did not want to measure the speed of our I/O system).

SE      JOIN column    bytes  query time  relative  Gain        Space  Character set
InnoDB  MEDIUMINT      3      5.28 s      96%       4% faster   75%
InnoDB  INT            4      5.48 s      100%      100%        100%
InnoDB  BIGINT         8      5.65 s      107%      7% slower   200%
InnoDB  NUMERIC(7, 2)  ~4     6.77 s      124%      24% slower  ~100%
InnoDB  VARCHAR(7)     7-8    6.44 s      118%      18% slower  ~200%  latin1
InnoDB  VARCHAR(16)    7-8    6.44 s      118%      18% slower  ~200%  latin1
InnoDB  VARCHAR(32)    7-8    6.42 s      118%      18% slower  ~200%  latin1
InnoDB  VARCHAR(128)   7-8    6.46 s      118%      18% slower  ~200%  latin1
InnoDB  VARCHAR(256)   8-9    6.17 s      114%      14% slower  ~225%  latin1
InnoDB  VARCHAR(16)    7-8    6.96 s      127%      27% slower  ~200%  utf8
InnoDB  VARCHAR(128)   7-8    6.82 s      124%      24% slower  ~200%  utf8
InnoDB  CHAR(16)       16     6.85 s      125%      25% slower  400%   latin1
InnoDB  CHAR(128)      128    9.68 s      177%      77% slower  3200%  latin1
InnoDB  TEXT           8-9    10.7 s      195%      95% slower  ~225%  latin1
MyISAM  INT            4      3.16 s      58%       42% faster
TokuDB  INT            4      4.52 s      82%       18% faster

Some comments to the tests:

  • MySQL 5.6.13 was used for most of the tests.
  • TokuDB v7.1.0 was tested with MySQL 5.5.30.
  • As results the optimistic cases were taken. In reality the results can be slightly worse.
  • We did not take into consideration that bigger data types will eventually cause more I/O which is very slow!

Commands

ALTER TABLE a CONVERT TO CHARACTER SET latin1;
ALTER TABLE b CONVERT TO CHARACTER SET latin1;

ALTER TABLE a MODIFY COLUMN id INT UNSIGNED NOT NULL;
ALTER TABLE b MODIFY COLUMN a_id INT UNSIGNED NOT NULL;

MySQL single query performance - the truth!



As suggested by morgo I did a little test with the same query and the same data set mentioned in Impact of column types on MySQL JOIN performance, but looking into another dimension: the time (aka MySQL versions).

The answer

To make it short. As a good consultant the answer must be: "It depends!" :-)

The test

The query was again the following:

SELECT *
  FROM a
  JOIN b ON b.a_id = a.id
 WHERE a.id BETWEEN 10000 AND 15000
;

The Query Execution Plan was the same for all tested releases.

The relevant MySQL variables were used as follows where possible. Should I have considered the join buffer or any other of those local per-session buffers (read_buffer_size, read_rnd_buffer_size, join_buffer_size)?

innodb_buffer_pool_size        = 768M
innodb_buffer_pool_instances   = 1
innodb_file_per_table          = 1

The results

(AVG, MEDIAN, STDEV, MIN and MAX are query times in seconds over COUNT runs.)

          mysql-4.0.30  mysql-4.1.25  mysql-5.0.96  mysql-5.1.73  mysql-5.5.35  mysql-5.6.15  mysql-5.7.3
AVG              40.86         38.68          3.71          4.69          4.64          7.22         6.05
MEDIAN           41.07         38.13          3.69          4.46          4.65          6.32         6.05
STDEV             1.51          2.26          0.06          0.34          0.03          2.21         0.03
MIN              39.27         36.99          3.67          4.40          4.59          6.26         6.02
MAX              44.11         44.45          3.86          5.23          4.67         13.16         6.10
COUNT            10.00         10.00         10.00         10.00         10.00         10.00        10.00

          mariadb-5.1.44  mariadb-5.2.10  mariadb-5.3.3  mariadb-5.5.34  mariadb-10.0.6
AVG                 4.58            8.63           8.34            5.02            6.12
MEDIAN              4.58            7.97           8.01            5.02            6.01
STDEV               0.01            1.45           1.10            0.02            0.25
MIN                 4.55            7.86           7.90            4.99            5.97
MAX                 4.60           11.38          11.46            5.06            6.75
COUNT              10.00           10.00          10.00           10.00           10.00

          percona-5.0.92-23.85  percona-5.1.72-14.10  percona-5.5.34-32.0  percona-5.6.14-62.0
AVG                       3.79                  4.70                 4.94                10.53
MEDIAN                    3.79                  4.70                 4.89                12.41
STDEV                     0.02                  0.03                 0.14                 3.35
MIN                       3.76                  4.67                 4.86                 5.68
MAX                       3.83                  4.75                 5.34                12.93
COUNT                    10.00                 10.00                10.00                10.00

          galera-5.5.33-23.7.6 / 2.7
AVG                             4.31
MEDIAN                          3.98
STDEV                           1.18
MIN                             3.76
MAX                             8.54
COUNT                          30.00

The Graph

the_truth.png

Conclusion

Do not trust benchmarks. They are mostly worthless for your specific workload and pure marketing buzz... Including the one above! ;-)

Database vendors (Oracle/MySQL, Percona, MariaDB) are primarily focusing on throughput and features. In general this comes at the cost of single query performance.
MySQL users like Facebook, LinkedIn, Google, Wikipedia, Booking.com, Yahoo! etc. are more interested in throughput than in single query performance (so I assume). But most MySQL users (95%) do not have a throughput problem but a single query performance problem (I assume here that this is also true for Oracle, MS-SQL Server, DB2, PostgreSQL, etc.).

So database vendors are not primarily producing for the masses but for some specific users/customers (which possibly pay a hell of a lot of money for this).

Back to the data:

My first hypothesis, "The old times were always better", is definitely not true. MySQL 4.0 and 4.1 sucked with this specific query. But since MySQL 5.0 the rough trend is: single query performance becomes worse over time (newer versions). I assume this is also true for other databases...

Claims like "We have the fastest MySQL" or "We have hired the whole optimizer team" do not necessarily translate into better single query performance. At least not for this specific query.

So in short: If you upgrade or side-grade (MySQL <-> Percona <-> MariaDB), always test very carefully! It is not predictable where the traps are. A newer MySQL release may or may not increase the performance of your application. Do not trust marketing buzz!

Artefacts

Some artefacts we have already found during this tiny test:

  • In MySQL 5.0 an optimization was introduced (not in the Optimizer!?!) to speed up this specific query dramatically.
  • MariaDB 5.2 and 5.3 were bad for this specific query.
  • I have no clue why Galera Cluster has shown the best results for 5.5. It is no intention or manipulation! It is poor luck. But I like it! :-)
  • MySQL 5.6 seems to have some problems with this query. Too much improvement done by Oracle/MySQL?
  • Percona 5.6 sometimes behaves much better with this query than normal MySQL, but from time to time something kicks in which makes Percona dramatically slower. Thus the bad results. I have no clue why. I first thought about an external influence. But I was able to reproduce this behaviour (once). So I assume it must be something Percona-internal (the AHI, for example?).

Finally

Do not shoot the messenger!

If you want to reproduce the results, most of the information about them is already published. If something is missing please let me know.

Please let me know when you do not agree with the results. So I can expand my universe a bit...

It was fun doing these tests today! And MyEnv was a great assistance for doing this kind of tests!

If you want us to do such test for you, please let us know. Our consulting team would be happy to assist you with upgrading or side-grading problems.

Backup Manager for MySQL, MariaDB and Percona Server (mysql_bman)


About

The MySQL Backup Manager (mysql_bman) is a wrapper script for standard MySQL backup tools. The problem with MySQL backup tools is that they have many options, are thus overcomplicated, and errors are easily made.

The intention of mysql_bman is to make backups for MySQL easier and technically correct. This means that by default it should not allow inconsistent backups and should complain if some functions or parameters are used in the wrong way, to guarantee proper backups.

In addition it has some nice features which are missing in standard MySQL backup tools or which are only known from Enterprise backup solutions.

Where to download mysql_bman

The Backup Manager for MySQL (mysql_bman) can be downloaded from our website.

What mysql_bman users say about it

Mathias Brem DBA@DBAOnline on LinkedIn:

Ow! Nice!
mysql backup manager is a very nice tool! Congratulations for FromDual! I made a shell script for catalog and maintained backups by xtrabackup, but mysql_bman is the best!

Xtrabackup + mysql_bman!!!!

Where can mysql_bman help you

The intention of mysql_bman is to assist you in bigger MySQL set-ups where you have to follow some backup policies and where you need a serious backup concept.

mysql_bman example

To give you an impression of the power of the MySQL Backup Manager let us have a look at a little example:

shell> mysql_bman --target=bman:secret@192.168.1.42 --type=full --mode=logical --policy=daily \
--no-compress --backupdir=/mnt/slowdisk \
--archive --archivedir=/mnt/nfsmount

With this backup method we do a logical full backup (mysqldump is triggered in the background). The backup is stored in the location for backups with the daily policy and is NOT compressed, to speed up the backup by saving CPU power AND because the backup device is a de-duplicating drive. Then the backup is archived to an NFS mount.

Backup types

To achieve this we have defined different backup types:

Type       Description
full       full logical backup (mysqldump) of all schemas
binlog     binary-log backup
config     configuration file backup (my.cnf)
structure  structure backup
cleanup    clean-up of backup pieces older than n days
schema     backup of one or more schemas
privilege  privilege dump (SHOW GRANTS FOR)

A backup type is specified with the option --type=<backup_type>.

Backup modes

A backup can either be logical or physical. A logical backup is typically what you do with mysqldump. A physical backup is typically a physical file copy without looking into the data. That is what for example xtrabackup does.

The backup mode is specified with the option --mode=<backup_mode>. The following backup modes are available:

Mode        Description
logical     do a logical backup (mysqldump)
physical    do a physical backup (mysqlbackup/innobackup/xtrabackup)

Backup policies

Further we have introduced different backup policies. Policies are there to distinguish how different backups should be treated.

The following backup policies exist:

Policy      Description
daily       directory to store daily backups
weekly      directory to store weekly backups
monthly     directory to store monthly backups
quarterly   directory to store quarterly backups
yearly      directory to store yearly backups

For example, you could plan a daily MySQL backup including the binary logs with a retention policy of 7 days. Once a week you want to do a weekly backup consisting of a full backup, a configuration backup and a structure dump, and you want to keep this weekly backup for 6 months. And for legal reasons you want to do a yearly backup with a retention policy of 10 years. A sketch of how such a schedule could look is shown below.
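Such a schedule could, for example, be implemented with cron. The following is only a sketch: the credentials, times and retention values are assumptions and have to be adapted to your environment and your mysql_bman configuration:

# /etc/cron.d/mysql_bman (sketch only, adapt to your environment)
# daily full and binary-log backups, cleaned up after 7 days
30 1 * * *   mysql   mysql_bman --target=bman/secret@127.0.0.1:3306 --type=full --policy=daily
45 1 * * *   mysql   mysql_bman --target=bman/secret@127.0.0.1:3306 --type=binlog --policy=daily
0 5 * * *    mysql   mysql_bman --type=cleanup --policy=daily --retention=7
# weekly full, config and structure backups, kept for about 6 months
30 2 * * 0   mysql   mysql_bman --target=bman/secret@127.0.0.1:3306 --type=full --policy=weekly
40 2 * * 0   mysql   mysql_bman --target=bman/secret@127.0.0.1:3306 --type=config --policy=weekly
50 2 * * 0   mysql   mysql_bman --target=bman/secret@127.0.0.1:3306 --type=structure --policy=weekly
0 6 * * 0    mysql   mysql_bman --type=cleanup --policy=weekly --retention=180
# yearly full backup on January 1st, kept for 10 years
30 3 1 1 *   mysql   mysql_bman --target=bman/secret@127.0.0.1:3306 --type=full --policy=yearly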

A backup policy is specified with the --policy=<backup_policy> option. This leads us to the retention time:

Options

The retention time that should be applied to a specific backup policy can be specified with the option --retention=<period_in_days>. The retention option means that a backup is not deleted before this number of days has passed when you run a clean-up job with mysql_bman.

Let us do an example:

shell> mysql_bman --type=cleanup --policy=daily --retention=30

This means that all backups in the daily policy should be deleted when they are older than 30 days.

Target

With the --target option you specify the connect string to the database to back up. This database can be located either locally (all backup types can be used) or remotely (only client/server backup types can be used).

A target looks as follows: user/password@host:port (similar to a URI specification), whereas password and port can be omitted.
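For illustration, the following targets would all be valid (the host names and credentials are made up):

--target=bman/secret@db1.example.com:3306    # user, password, host and port
--target=bman/secret@db1.example.com         # port omitted
--target=bman@192.168.1.42:3307              # password omitted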

Backup location, archiving, compressing and clean-up

The --backupdir option controls the location of the backup files. The policy folders are automatically created under this --backupdir location.
If you have a second layer of backup stores (e.g. tapes, slow backup drives, de-duplicating drives or NFS drives) you can use the --archive option to copy your backup files to this second-layer storage, which is specified with the --archivedir option. For restore performance reasons it is recommended to always keep one or two generations of backups on your fast local drive. If you want to remove the backed-up files from the --backupdir destination after the archive job, use the --cleanup option.
If you want to skip compressing the backups, either to save time or because your backup location uses de-duplicating drives, you can use the --no-compress option.
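As a sketch, combining these options could look as follows (the paths are assumptions):

shell> mysql_bman --target=bman/secret@127.0.0.1:3306 --type=full --policy=weekly \
--backupdir=/var/backup --archive --archivedir=/mnt/nfsmount --cleanup --no-compress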

Per schema backup

Especially for hosting companies a full database backup is typically not the right backup strategy because a restore of one specific customer (= schema) is very complicated. For this case we have the --per-schema option. mysql_bman will do a backup of the whole database schema by schema. Keep in mind: This breaks consistency among schemas!

Sometimes you want to do a schema backup only for some specific schemas. For this you can use the --schema option. This option allows you to specify which schemas to back up or not to back up: --schema=+a,+b means back up schemas a and b; --schema=-a,-b means back up all schemas except a and b.
The second variant is less error prone because you cannot forget to back up a newly added schema.

Instance name

MySQL does not know the concept of naming an instance (mysqld). But for bigger environments it can be useful to name each instance uniquely. For this purpose we have introduced the option --instance-name=<give_it_a_name>. This instance name should be unique within your whole company, but we do not enforce that at the moment. The instance name is used to name backup files and later to identify the backup history of an instance in our backup catalog and to track the files for a restore.

mysql_bman configuration file

Specifying everything on the command line is cumbersome. Thus mysql_bman also considers a configuration file, specified with the --config=<config_file> option.
A mysql_bman configuration file looks for example as follows:

policy      = daily
target      = root/secret@127.0.0.1:3306
type        = schema
schema      = -mysql
archive     = on
archivedir  = /mnt/tape
per-schema  = on
no-compress = on
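With such a file in place the command line shrinks to something like the following (the file name is an assumption; the remaining settings are taken from the configuration file):

shell> mysql_bman --config=/etc/mysql_bman.conf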

Simulate what happens

For the sissies among us (me, for example) we have the --simulate option. This option simulates nearly all steps as far as possible without really executing anything. It is useful either for testing some features or for debugging purposes.

Logging

If you want to track your backup history, you can specify with the --log option where the mysql_bman log file should be located.
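For example, a dry run that is written to a log file could be invoked like this (the log file path is an assumption):

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=full --policy=daily \
--simulate --log=/var/log/mysql_bman.log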

Using Catalog

It is very useful to store your backup metadata in a database so you can check it later and find out the backup criteria (type, mode, instance-name, etc.) of specific backup runs. This can be achieved by using the catalog feature.

To activate this feature you first have to create a database for the catalog (the default name is bman_catalog) and then create its tables by using the --create option in a special mysql_bman command (see the examples below).
Finally, to store your backup metadata in the catalog, all you have to do is add the option --catalog=<catalog_connection_string> to the normal mysql_bman command.
Check the examples below for using the catalog with mysql_bman.

More help

A little more help can be obtained with the following command:

shell> mysql_bman --help

Examples

Do a full (logical = default) backup and store it in the daily policy folder:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=full --policy=daily

Do a full physical backup and store it in the weekly policy folder:

shell> mysql_bman --target=root/secret@127.0.0.1 --type=full --mode=physical --policy=weekly

Do a binary-log backup omitting the password in the target and store it in the daily policy folder:

shell> mysql_bman --target=bman@192.168.1.42:3307 --type=binlog --policy=daily

Do a MySQL configuration backup and store it in the weekly policy folder:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=config --policy=weekly

Do a structure backup and store it in the monthly policy folder and name the file with the instance name:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=structure --policy=monthly --instance-name=prod-db

Do a weekly structure backup and archive it to another backup location:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=structure --policy=weekly --archive --archivedir=/mnt/tape

Do a schema backup omitting the mysql schema:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=schema --schema=-mysql --policy=daily --archive --archivedir=/mnt/tape

Do a schema backup of only foodmart and world and write each schema to its own file. Omit compressing these backups because they are located, for example, on de-duplicating drives:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=schema --schema=+foodmart,+world --per-schema --policy=daily --no-compress

Creation of a backup catalog (assuming you have already created a catalog database with the default name "bman_catalog"):

shell> mysql_bman --catalog=root/secret@127.0.0.1:3306 --create

Backups against catalog:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --catalog=root/secret@127.0.0.1:3306 --instance-name=test --type=full --policy=daily

Privilege backup:

shell> mysql_bman --target=root/secret@127.0.0.1:3306 --type=privilege --policy=daily --mode=logical

Replication channel fail-over with Galera Cluster for MySQL


Sometimes it can be desirable to replicate from a Galera Cluster to a single MySQL slave or to another Galera Cluster. Reasons for this measure could be:

  • An unstable network between two Galera Cluster locations.
  • A separation of a reporting slave and the Galera Cluster so that heavy reports on the slave do not affect the Galera Cluster performance.
  • Mixing different sources in a slave or a Galera Cluster (fan-in replication).

This article is based on earlier research work (see MySQL Cluster - Cluster circular replication with 2 replication channels) and uses the old MySQL replication style (without MySQL GTID).

Preconditions

  • Enable the binary logs on 2 nodes of a Galera Cluster (we call them channel masters) with the log_bin variable.
  • Set log_slave_updates = 1 on ALL Galera nodes.
  • It is recommended to keep binary logs and relay logs small in such a situation to reduce the overhead of scanning the files (max_binlog_size = 100M). A configuration sketch is shown after this list.
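The following my.cnf fragment is only a sketch of these preconditions; the server_id value and the binary log base name are placeholders:

# my.cnf fragment on the two Galera nodes acting as channel masters (placeholders)
[mysqld]
server_id         = 5201       # must be unique per node
log_bin           = bin-log    # enable the binary log (channel masters only)
log_slave_updates = 1          # set this on ALL Galera nodes
max_binlog_size   = 100M       # keep binary logs small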

Scenarios

[Figures: replication channel fail-over to a single slave (galera_channel_failover_slave.png) and to a second Galera Cluster (galera_channel_failover_galera.png)]

Let us assume that for some reason the current channel master of channel 1 breaks. As a consequence the slave of channel 1 does not receive any replication events any more. But we have to keep the replication stream up and running, so we have to switch the replication channel to channel master 2.

Switching replication channel

First, for safety reasons, we stop the slave of replication channel 1:

mysql> STOP SLAVE;

Then we have to find the actual relay log on the slave:

mysql> pager grep Relay_Log_File
mysql> SHOW SLAVE STATUS\G
mysql> nopager

Relay_Log_File: slave-relay-bin.000019

Next we have to find the last applied transaction on the slave:

mysql> SHOW RELAYLOG EVENTS IN 'slave-relay-bin.000019';

| slave-relay-bin.000019 | 3386717 | Query       |      5201 |    53745015 | BEGIN                          |
| slave-relay-bin.000019 | 3386794 | Table_map   |      5201 |    53745067 | table_id: 72 (test.test)       |
| slave-relay-bin.000019 | 3386846 | Write_rows  |      5201 |    53745142 | table_id: 72 flags: STMT_END_F |
| slave-relay-bin.000019 | 3386921 | Xid         |      5201 |    53745173 | COMMIT /* xid=1457451 */       |
+------------------------+---------+-------------+-----------+-------------+--------------------------------+

This is transaction 1457451, which is the same on all Galera nodes.
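If you want to double-check what this transaction contains, the relay log can also be decoded with mysqlbinlog (a sketch; the start position is taken from the output above):

shell> mysqlbinlog --verbose --start-position=3386717 slave-relay-bin.000019 | less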

On the new channel master of channel 2 we now have to find the matching binary log. This is best done by matching times between the relay log on the slave and the binary log on the master of channel 2 (consider different time zones and make sure the server clocks are synced with ntpd).

On slave:

shell> ll *relay-bin*
-rw-rw---- 1 mysql mysql     336 Mai 22 20:32 slave-relay-bin.000018
-rw-rw---- 1 mysql mysql 3387029 Mai 22 20:37 slave-relay-bin.000019

On master of channel 2:

shell> ll *bin-log*
-rw-rw---- 1 mysql mysql  2518737 Mai 22 19:57 bin-log.000072
-rw-rw---- 1 mysql mysql      143 Mai 22 19:57 bin-log.000073
-rw-rw---- 1 mysql mysql      165 Mai 22 20:01 bin-log.000074
-rw-rw---- 1 mysql mysql 62953648 Mai 22 20:40 bin-log.000075

It looks like binary log 75 of master 2 matches the relay log of our slave.

Now we have to find the same transaction on the master of channel 2:

mysql> pager grep -B 6 1457451
mysql> SHOW BINLOG EVENTS IN 'bin-log.000075';
mysql> nopager

| bin-log.000075 | 53744832 | Write_rows  |      5201 |    53744907 | table_id: 72 flags: STMT_END_F        |
| bin-log.000075 | 53744907 | Xid         |      5201 |    53744938 | COMMIT /* xid=1457450 */              |
| bin-log.000075 | 53744938 | Query       |      5201 |    53745015 | BEGIN                                 |
| bin-log.000075 | 53745015 | Table_map   |      5201 |    53745067 | table_id: 72 (test.test)              |
| bin-log.000075 | 53745067 | Write_rows  |      5201 |    53745142 | table_id: 72 flags: STMT_END_F        |
| bin-log.000075 | 53745142 | Xid         |      5201 |    53745173 | COMMIT /* xid=1457451 */              |
+----------------+----------+-------------+-----------+-------------+---------------------------------------+

We successfully found the transaction, and the position where the next transaction starts, 53745173, is where we should continue replicating.

As a last step we have to set the slave to the master of replication channel 2:

mysql> CHANGE MASTER TO master_host='master2', master_port=3306, master_log_file='bin-log.000075', master_log_pos=53745173;
mysql> START SLAVE;

After a while the slave has caught up and is ready for the next fail-over back to channel master 1.
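To verify that the slave is really applying events from the new channel master, you can check the slave status again with the same pager trick as above:

mysql> pager egrep 'Master_Host|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
mysql> SHOW SLAVE STATUS\G
mysql> nopager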

Discussion

We found during our experiments that an IST of a channel master does not lead to a gap or loss of events in the replication stream. So restarting a channel master does not require a channel fail-over as long as an IST can be used for resyncing the channel master with the Galera Cluster.

The increase of wsrep_cluster_conf_id is NOT an indication that a channel fail-over is required.

An SST resets the binary logs, so after an SST the slave will not replicate any more and a channel fail-over becomes necessary. So this method should be safe to use. If you find any situation where you experience trouble with channel fail-over, please let us know.
