mydumper Archives - Percona Database Performance Blog

New mydumper 0.6.1 release offers performance and usability features


One of the tasks within Percona Remote DBA is to ensure we have reliable backups with minimal impact. To accomplish this, one of the tools our team uses is called mydumper. We use mydumper for logical backups because of several nice features. Some of them are:

  • multithreaded, producing very fast backups compared to mysqldump
  • almost no locking, provided you are not using non-InnoDB tables
  • built-in compression
  • separate files for each table, making it easy to restore single tables or schemas, and the possibility to hardlink files, reducing the space needed to keep a history of backups. This also makes it possible to restore with more than one thread.

The mydumper project was started at the beginning of 2009 by Domas Mituzas, and a few months ago we started collaborating and adding new features to improve performance and usability.

And now we are glad to announce our second release: mydumper 0.6.1. You can download the source code here. It is highly recommended to upgrade it if you are using 0.6.0 as we fixed and improved the new less-locking feature.

New features in the mydumper 0.6 series:

  • Consistent backups with less locking
    This new feature has the dumping threads lock all non-InnoDB tables at the beginning, so the FLUSH TABLES WITH READ LOCK can be released earlier, with no need to wait until all non-InnoDB tables have been dumped. You can take advantage of this feature when you have large ARCHIVE or MyISAM tables.
  • File size chunks
    Now you can split a table dump into multiple files of a fixed size. This is useful to reduce the storage capacity needed to keep a history of backups by using hardlinks. Think of big “log” tables or tables where old data doesn’t change: now you will be able to hardlink those chunks back.
  • Metadata Locking
    Added the new option --use-savepoints to reduce metadata locking issues while the backup is running (see the example invocation right after this list).
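
For reference, here is a minimal sketch of how these options can be combined on the command line (flag names as in mydumper 0.6; check mydumper --help for your build, and the output directory is only an example):

mydumper --less-locking --chunk-filesize=100 --use-savepoints \
         --compress --threads=4 --outputdir=/backups/mydumper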

Fixed Bugs in the mydumper 0.6 series:

mydumper 0.6.0

  • #1250269 ensure statement is not bigger than statement_size
  • #827328 myloader to set UNIQUE_CHECKS = 0 when importing
  • #993714 Reducing the time spent with a global read lock
  • #1250271 make it more obvious when mydumper is not successful
  • #1250274 table doesnt exist should not be an error
  • #987344 Primary key name not quoted in showed_nulls test
  • #1075611 error when restoring tables with dashes in name
  • #1124106 Mydumper/myloader does not care for 0-s in AUTO_INCREMENT fields
  • #1125997 Timestamp data not portable between servers on differnt timezones

mydumper 0.6.1

  • #1273441 less-locking breaks consistent snapshot
  • #1267501 mydumper erroneously always attempts a dummy read
  • #1272310 main_connection keep an useless transaction opened and permit infinite metadata table lock
  • #1269376 mydumper 0.6.0 fails to compile “cast from pointer to integer of different size”
  • #1265155 create_main_connection use detected_server before setting it
  • #1267483 Build with MariaDB 10.x
  • #1272443 The dumping threads will hold metadata locks progressively while are dumping data.

Note: the #1267501 fix is important for any Galera cluster deployment because of bug #1265656, which was fixed and released in Percona XtraDB Cluster 5.6.15-25.3.

The post New mydumper 0.6.1 release offers performance and usability features appeared first on MySQL Performance Blog.


Introducing backup locks in Percona Server


TL;DR version: The backup locks feature introduced in Percona Server 5.6.16-64.0 is a lightweight alternative to FLUSH TABLES WITH READ LOCK and can be used to take both physical and logical backups with less downtime on busy servers. To employ the feature with mysqldump, use mysqldump --lock-for-backup --single-transaction. The next release of Percona XtraBackup will also be using backup locks automatically if the target server supports the feature.

Now on to the gory details, but let’s start with some history.

In the beginning…

In the beginning there was FLUSH TABLES, and users messed with their MyISAM tables under a live server and were not ashamed. Users could do nice things like:

mysql> FLUSH TABLES;
# execute myisamchk, myisampack, backup / restore some tables, etc.

And users were happy until someone realized that tables must be protected against concurrent access by queries in other connections. So Monty gave them FLUSH TABLES WITH READ LOCK, and users were enlightened.

Online backups

Users then started dreaming about online backups, i.e. creating consistent snapshots of a live MySQL server. mysqldump --lock-all-tables had been a viable option for a while. To provide consistency it used FLUSH TABLES WITH READ LOCK which was not quite the right tool for the job, but was “good enough”. Who cares if a mom-and-pop shop becomes unavailable for a few seconds required to dump ~100 MB of data, right?

With InnoDB gaining popularity, users realized that one could employ MVCC to guarantee consistency, and FLUSH TABLES WITH READ LOCK doesn’t make much sense for InnoDB tables anyway (you cannot modify InnoDB tables under a live server even if the server is read-only). So Peter gave mysqldump the --single-transaction option, and users were enlightened. mysqldump --single-transaction made it possible to avoid FTWRL, but there were a few catches:

  • one cannot perform any schema modifications or updates to non-InnoDB tables while mysqldump --single-transaction is in progress, because those operations are not transactional and thus would ignore the data snapshot created by --single-transaction;
  • one cannot get binary log coordinates with --master-data or
    --dump-slave, because in that case FTWRL would still be used to ensure that the binary log coordinates are consistent with the data dump;

Which makes --single-transaction similar to the --no-lock option in Percona XtraBackup: it shifts the responsibility for backup consistency to the user. Any change in the workload violating the prerequisites for those options may result in a broken backup without any signs for the user to take action.

Present

Fast forward to present day. MySQL is capable of handling over a million queries per second, MyISAM is certainly not a popular choice to store data, and there are many backup solutions to choose from. Yet all of them still rely on FLUSH TABLES WITH READ LOCK in one way or another to guarantee consistency of .frm files, non-transactional tables and binary log coordinates.

To some extent, the problem with concurrent DDL + mysqldump --single-transaction has been alleviated with metadata locks in MySQL 5.5, which however made some users unhappy, and that behavior was partially reverted in MySQL 5.6.16 with the fix for bug #71017. But the fundamental problem is still there: mysqldump --single-transaction does not guarantee consistency with concurrent DDL statements and updates to non-transactional tables.

So the fact that FTWRL is overkill for backups has become increasingly obvious, for the reasons described below.

What’s the problem with FTWRL anyway?

A lot has been written on what FLUSH TABLES WITH READ LOCK really does. Here’s yet another walk-through in a bit more detail than described elsewhere:

  1. It first invalidates the Query Cache.
  2. It then waits for all in-flight updates to complete and at the same time it blocks all incoming updates. This is one problem for busy servers.
  3. It then closes all open tables (the FLUSH part) and expels them from the table cache. This is also when FTWRL has to wait for all SELECT queries to complete. And this is another, even bigger problem for busy servers, because that wait happens to occur with all updates blocked. What’s even worse, the server at this stage is essentially offline, because even incoming SELECT queries will get blocked.
  4. Finally, it blocks COMMITs.

Action #4 is not required for the original purpose of FTWRL, but is rather a kludge implemented due to the fact that FTWRL is (mis)used by backup utilities.

Actions #1-3 make perfect sense for the original reasons why FTWRL has been implemented. If we are going to access and possibly modify tables outside of the server, we want the server to forget everything it knows about both schema and data for all tables, and flush all in-memory buffers to make the on-disk data representation consistent.

And that’s what makes it overkill for MySQL database backup utilities: they don’t require #1, because they never modify data. #2 is only required for non-InnoDB tables, because InnoDB provides other ways to ensure consistency for both logical and physical backups. And #3 is certainly not a problem for logical backup utilities like mysqldump or mydumper, because they don’t even access on-disk data directly. As we will see, it is not a big problem for physical backup solutions either.

To FLUSH or not to FLUSH?

So what exactly is flushed by FLUSH TABLES WITH READ LOCK?

Nothing for InnoDB tables, and no physical backup solution requires it to flush anything.

For MyISAM it is more complicated. MyISAM key caches are normally write-through, i.e. by the time each update to a MyISAM table completes, all index updates are written to disk. The only exception is the delayed key writing feature, which you should not be using anyway if you care about your data. MyISAM may also do data buffering for bulk inserts, e.g. while executing multi-row INSERTs or LOAD DATA statements. Those buffers, however, are flushed between statements, so they have no effect on physical backups as long as we block all statements updating MyISAM tables.
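
If you want to confirm that delayed key writes are not in play on a particular server, a couple of standard checks look like this (the table name is just an example):

mysql> SHOW GLOBAL VARIABLES LIKE 'delay_key_write';
# and, per table, look for DELAY_KEY_WRITE=1 in the table options:
mysql> SHOW CREATE TABLE mydb.mytable\G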

The point is that without flushing each storage engine is not any less backup-safe as it is crash-safe, with the only difference that backups are guaranteed to wait for all currently executing INSERT/REPLACE/DELETE/UPDATE statements to complete.

Backup locks

Enter the backup locks feature. The following 3 new SQL statements have been introduced in Percona Server:

  • LOCK TABLES FOR BACKUP
  • LOCK BINLOG FOR BACKUP
  • UNLOCK BINLOG

LOCK TABLES FOR BACKUP

Quoting the documentation page from the manual:

LOCK TABLES FOR BACKUP uses a new MDL lock type to block updates to non-transactional tables and DDL statements for all tables. More specifically, if there’s an active LOCK TABLES FOR BACKUP lock, all DDL statements and updates to MyISAM, CSV, MEMORY and ARCHIVE tables will be blocked in the “Waiting for backup lock” status as visible in PERFORMANCE_SCHEMA or PROCESSLIST. SELECT queries for all tables and INSERT/REPLACE/UPDATE/DELETE against InnoDB, Blackhole and Federated tables are not affected by LOCK TABLES FOR BACKUP. Blackhole tables obviously have no relevance for backups, and Federated tables are ignored by both logical and physical backup tools.

Like FTWRL, the LOCK TABLES FOR BACKUP statement:

  • blocks updates to MyISAM, MEMORY, CSV and ARCHIVE tables;
  • blocks DDL against any tables;
  • does not block updates to temporary and log tables.

Unlike FTWRL, the LOCK TABLES FOR BACKUP statement:

  • does not invalidate the Query Cache;
  • never waits for SELECT queries to complete regardless of the storage engines involved;
  • never blocks SELECTs, or updates to InnoDB, Blackhole and Federated tables.
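
To see the difference in action, here is a hypothetical sketch on a Percona Server 5.6 instance with backup locks (the table names are made up):

session1> LOCK TABLES FOR BACKUP;
session2> SELECT COUNT(*) FROM myisam_logs;            # not blocked
session2> INSERT INTO innodb_orders VALUES (1);        # not blocked, InnoDB DML is allowed
session3> INSERT INTO myisam_logs VALUES (1);          # blocks in "Waiting for backup lock"
session4> ALTER TABLE innodb_orders ADD COLUMN c INT;  # DDL blocks too
session1> UNLOCK TABLES;                               # the blocked statements now proceed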

In other words, it does exactly what backup utilities need: block non-transactional changes that are included into the backup, and leave everything else to InnoDB MVCC and crash recovery.

With the only exception of binary log coordinates obtained with SHOW MASTER STATUS and SHOW SLAVE STATUS.

LOCK BINLOG FOR BACKUP

This is when LOCK BINLOG FOR BACKUP comes in handy. It blocks all updates to binary log coordinates as reported by SHOW MASTER/SLAVE STATUS and used by backup utilities. It has no effect when all of the following conditions apply:

  • the binary log is disabled. If it is disabled globally, then no connections are affected by LOCK BINLOG FOR BACKUP. If it is enabled globally, but disabled for specific connections via sql_log_bin, only those connections are allowed to commit;
  • the server is not a replication slave;

Even if binary logging is used, LOCK BINLOG FOR BACKUP will allow DDL and updates to any tables to proceed until they are written to the binary log (i.e. until they commit), and/or advance Exec_Master_Log_* / Exec_Gtid_Set when executed by a replication thread, provided that no other global locks are acquired.

To release the lock acquired by LOCK TABLES FOR BACKUP there’s already UNLOCK TABLES. And the LOCK BINLOG FOR BACKUP lock is released with UNLOCK BINLOG.

Let’s look at how these statements can be used by MySQL backup utilities.

mysqldump

mysqldump got a new option, --lock-for-backup, which along with --single-transaction essentially obsoletes --lock-all-tables (i.e. FLUSH TABLES WITH READ LOCK). It makes mysqldump use LOCK TABLES FOR BACKUP before it starts dumping tables to block all “unsafe” statements that might otherwise interfere with backup consistency.

Of course, that requires backup locks support by the target server, so mysqldump checks if they are indeed supported and fails with an error if they are not.

However, at the moment, if binary log coordinates are requested with --master-data, FTWRL is still used even if --lock-for-backup is specified. mysqldump could use LOCK BINLOG FOR BACKUP, but there’s a better solution for logical backups implemented in MariaDB, which has already been ported to Percona Server and queued for the next release.

There is also another important difference between plain mysqldump --single-transaction and mysqldump --lock-for-backup --single-transaction. As of MySQL 5.5, mysqldump --single-transaction acquires shared metadata locks on all tables processed within the transaction. These will also block DDL statements on those tables when they try to acquire an exclusive lock. So far, so good. The problems start when there’s an incoming SELECT query against a table that already has a pending DDL statement. It will also be blocked on the pending exclusive MDL request for no apparent reason. Which was one of the complaints in bug #71017.

It’s better illustrated with an example. Suppose there are 3 sessions: one created by mysqldump, and 2 user sessions.

user1> CREATE TABLE t1 (a INT);
mysqldump> START TRANSACTION WITH CONSISTENT SNAPSHOT;
mysqldump> SELECT * FROM t1; # this acquires a table MDL
user1> ALTER TABLE t1 ADD COLUMN b INT; # this blocks on the MDL created by mysqldump
user2> SET lock_wait_timeout=1;
user2> SELECT * FROM t1; # this blocks on a pending MDL request by user1
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

This is what would happen with mysqldump --lock-for-backup --single-transaction:

user1> CREATE TABLE t1 (a INT);
mysqldump> LOCK TABLES FOR BACKUP;
mysqldump> START TRANSACTION WITH CONSISTENT SNAPSHOT;
mysqldump> SELECT * FROM t1; # this acquires a table MDL
user1> ALTER TABLE t1 ADD COLUMN b INT; # this blocks on the backup MDL lock
user2> SET lock_wait_timeout=1;
user2> SELECT * FROM t1; # this one is not blocked

This immediate problem was partially fixed in MySQL 5.6.16 by releasing metadata locks after processing each table with the help of savepoints. There are a couple of issues with this approach:

  • there is still a table metadata lock for the duration of the SELECT executed by mysqldump, which, as before, blocks DDL. So there is still a chance that mysqldump --single-transaction may eventually block SELECT queries.
  • after the table is processed and the metadata lock is released, there is now an opportunity for RENAME to break the backup, see bug #71214.

Both issues above along with bug #71215 and bug #71216 do not exist with mysqldump --lock-for-backup --single-transaction as all kinds of DDL statements are properly isolated by backup locks, which do not block SELECT queries at the same time.

Percona XtraBackup

Percona XtraBackup 2.2 will support backup locks and use them automatically if supported by the server being backed up.

The current locking used by XtraBackup is:

# copy InnoDB data
FLUSH TABLES WITH READ LOCK;
# copy .frm, MyISAM, etc.
# get the binary log coordinates
# finalize the background copy of REDO log
UNLOCK TABLES;

With backup locks it becomes:

# copy InnoDB data
LOCK TABLES FOR BACKUP;
# copy .frm, MyISAM, etc
LOCK BINLOG FOR BACKUP;
# finalize the background copy of REDO log
UNLOCK TABLES;
# get the binary log coordinates
UNLOCK BINLOG;

Note that under the following conditions, no blocking occurs at any stage in the server:

  • no updates to non-transactional tables;
  • no DDL;
  • binary log is disabled;

They may look familiar, because they are essentially prerequisites for the --no-lock option. Except that with backup locks, you don’t have to take chances and take responsibility for backup consistency. All the locking will be handled automatically by the server, if and when it is necessary.

mylvmbackup

mylvmbackup takes the server read-only with FLUSH TABLES WITH READ LOCK while the snapshot is being created for two reasons:

  • flush non-transactional tables
  • ensure consistency with the binary log coordinates

For exactly the same reasons as with XtraBackup, it can use backup locks instead of FTWRL.
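
A rough sketch of that flow (hypothetical: the volume name and lvcreate parameters are examples, and the mysql session holding the locks must stay open while the snapshot is taken):

mysql> LOCK TABLES FOR BACKUP;
mysql> LOCK BINLOG FOR BACKUP;
mysql> SHOW MASTER STATUS;  # record the binary log coordinates
# in another shell, while the locks are held:
#   lvcreate --snapshot --size 10G --name mysql-snap /dev/vg0/mysql-data
mysql> UNLOCK BINLOG;
mysql> UNLOCK TABLES;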

mydumper

mydumper developers may want to add support for backup locks as well. mydumper relies on START TRANSACTION WITH CONSISTENT SNAPSHOT to ensure InnoDB consistency, but has to resort to FLUSH TABLES WITH READ LOCK to ensure consistency of non-InnoDB tables and binary log coordinates.

Another problem is that START TRANSACTION WITH CONSISTENT SNAPSHOT is not supposed to be used by multi-threaded logical backup utilities. But that is an opportunity for another server-side improvement and probably a separate blog post.

The post Introducing backup locks in Percona Server appeared first on MySQL Performance Blog.

mydumper [less] locking

In this post I would like to review how mydumper for MySQL works from the point of view of locks. Since the 0.6 series we have different options, so I will try to explain how they work.

As you may know, mydumper is multithreaded, and this adds a lot of complexity compared with other logical backup tools, as it also needs to coordinate all threads on the same snapshot to be consistent. So let’s review how mydumper does this with the default settings.

By default mydumper uses 4 threads to dump data and 1 main thread:

Main Thread
  • FLUSH TABLES WITH READ LOCK
Dump Thread X
  • START TRANSACTION WITH CONSISTENT SNAPSHOT;
  • dump non-InnoDB tables
Main Thread
  • UNLOCK TABLES
Dump Thread X
  • dump InnoDB tables

As you can see, in this case we need FTWRL for two things: to coordinate the transactions’ snapshots and to dump non-InnoDB tables in a consistent way. So we hold the global read lock until all non-InnoDB tables have been dumped.

What less-locking mode does is this:
Main Thread
  • FLUSH TABLES WITH READ LOCK
Dump Thread X
  • START TRANSACTION WITH CONSISTENT SNAPSHOT;
 LL Dump Thread X
  • LOCK TABLES non-InnoDB
Main Thread
  • UNLOCK TABLES
 LL Dump Thread X
  • dump non-InnoDB tables
  • UNLOCK non-InnoDB
Dump Thread X
  • dump InnoDB tables

So now the global read lock is in place only until the less-locking threads lock the non-InnoDB tables, and this is really fast. The only downside is that it uses double the number of threads, so with the default (4 threads) we end up with 9 connections, but only 4 will be running at the same time.

Less-locking really helps when you have MyISAM or ARCHIVE tables that are not heavily updated by the production workload. You should also know that LOCK TABLES … READ LOCAL allows non-conflicting INSERTs on MyISAM, so if you use those tables to keep logs (append only) you will not notice the lock at all.
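
For example, on an append-only MyISAM log table the behavior looks roughly like this (a hypothetical sketch; the table and column names are made up):

session1> LOCK TABLES logs READ LOCAL;
session2> INSERT INTO logs (msg) VALUES ('new row');  # not blocked: concurrent insert at the end of the table
session3> UPDATE logs SET msg = 'edit' WHERE id = 1;  # blocked until session1 releases the lock
session1> UNLOCK TABLES;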

For the next release we will implement backup locks, which will allow us to avoid running FTWRL altogether.

The post mydumper [less] locking appeared first on MySQL Performance Blog.


MySQL performance implications of InnoDB isolation modes


Over the past few months I’ve written a couple of posts about the dangerous debt of InnoDB transactional history and about the fact that MVCC can be the cause of severe MySQL performance issues. In this post I will cover a related topic – InnoDB transaction isolation modes, their relationship with MVCC (multi-version concurrency control) and how they impact MySQL performance.

The MySQL Manual provides a decent description of transaction isolation modes supported by MySQL – I will not repeat it here but rather focus on performance implications.

SERIALIZABLE – This is the strongest isolation mode, to the point that it essentially defeats multi-versioning by making all SELECTs locking, which causes significant overhead both in terms of lock management (setting locks is expensive) and in the concurrency you can get. This mode is only used in very special circumstances among MySQL applications.

REPEATABLE READ – This is the default isolation level and generally it is quite nice and convenient for the application. It sees all data as of the time of the first read (assuming standard non-locking reads). This however comes at a high cost – InnoDB needs to maintain transaction history back to the point of the transaction start, which can be very expensive. The worst-case scenario is applications with a high level of updates and hot rows – you really do not want InnoDB to deal with rows that have hundreds of thousands of versions.

In terms of performance, both reads and writes can be impacted. For SELECT queries, traversing multiple row versions is very expensive, but so it is for updates, especially as row version maintenance seems to cause severe contention issues in MySQL 5.6.

Here is an example: I ran sysbench against a completely in-memory data set, then started a transaction and ran a full table scan query a couple of times while keeping the transaction open:

sysbench  --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=80000000 --mysql-user=root --mysql-password= --mysql-db=sbinnodb  --test=/usr/share/doc/sysbench/tests/db/update_index.lua run

As you can see, the write throughput drops drastically and stays low the whole time the transaction is open, not only while the query is running. This is perhaps the worst-case scenario I could find; it happens when you have a SELECT outside of a transaction followed by a long transaction in REPEATABLE READ isolation mode. Though you can see a regression in other cases, too.

Here is the set of queries I used if someone wants to repeat the test:

select avg(length(c)) from sbtest1;
begin;
select avg(length(c)) from sbtest1;
select sleep(300);
commit;

Not only is REPEATABLE READ the default isolation level, it is also used for logical backups with InnoDB – think mydumper or mysqldump --single-transaction.
These results show that such backup methods not only can’t be used with large data sets due to the long recovery time, but also can’t be used in some high-write environments due to their performance impact.

READ COMMITTED mode is similar to REPEATABLE READ, with the essential difference being that versions are not kept back to the start of the first read in the transaction, but only to the start of the current statement. As such, using this mode allows InnoDB to maintain a lot fewer versions, especially if you do not have very long running statements. If you have some long running SELECTs – such as reporting queries – the performance impact can still be severe.

In general I think a good practice is to use READ COMMITTED isolation mode as the default and change to REPEATABLE READ for those applications or transactions which require it.
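
For instance, a minimal sketch of how that could look with standard MySQL settings (adjust to your environment):

# my.cnf: make READ COMMITTED the server-wide default
[mysqld]
transaction-isolation = READ-COMMITTED

# and per session, for those transactions that really need it:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;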

READ UNCOMMITTED – I think this is the least understood isolation mode (not a surprise, as there are only 2 lines of documentation about it, which only describe it from a logical point of view). If you’re using this isolation mode you will see all changes done in the database as they happen, even those made by transactions that have not yet been committed. One nice use case for this isolation mode is that you can “watch” large scale UPDATE statements as they happen, with dirty reads showing which rows have already been changed and which have not.
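
As an illustration, here is a hypothetical way to watch a long-running UPDATE from another session (the table and column names are made up):

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
# while UPDATE orders SET status = 'archived' WHERE created < '2013-01-01' runs elsewhere:
SELECT COUNT(*) FROM orders WHERE status = 'archived';  # dirty read: counts rows already changed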

Keep in mind that such dirty reads show changes that have not been committed yet, and that might never be committed if the transaction making them runs into an error, so this mode should be used with extreme care. There are a number of cases, though, when we do not need 100% accurate data, and in those cases this mode becomes very handy.

So how does READ UNCOMMITTED behave from a performance point of view? In theory, InnoDB could purge row versions even if they were created after the start of the statement in READ UNCOMMITTED mode. In practice, due to a bug or some intricate implementation detail, it does not do that – row versions will still be kept back to the start of the statement. So if you run a very long SELECT in READ UNCOMMITTED mode you will get a large number of row versions created, just as if you were using READ COMMITTED. No win here.

There is an important win on the SELECT side, though – READ UNCOMMITTED isolation mode means InnoDB never needs to go and examine older row versions – the latest row version is always the right one, which can yield a dramatic performance improvement, especially if the undo space has spilled over to disk, where finding old row versions can cause a lot of IO.

Perhaps the best illustration I discovered was with the query select avg(k) from sbtest1; run in parallel with the same update-heavy workload stated above. In READ COMMITTED isolation mode it never completes – I assume because new index entries are inserted faster than they are scanned – while in READ UNCOMMITTED isolation mode it completes in a minute or so.

Final Thoughts: Using InnoDB isolation modes correctly can help your application get the best possible performance. Your mileage may vary, and in some cases you will see no difference; in others it will be absolutely dramatic. There also seems to be a lot of work to be done in relation to InnoDB performance with a long version history. I hope it will be tackled in a future MySQL version.

The post MySQL performance implications of InnoDB isolation modes appeared first on MySQL Performance Blog.

Importing big tables with large indexes with Myloader MySQL tool


Mydumper is known as the faster (much faster) mysqldump alternative. So, if you take a logical backup you will choose Mydumper instead of mysqldump. But what about the restore? Well, who needs to restore a logical backup? It takes ages! Even with Myloader. But this could change just a bit if we are able to take advantage of Fast Index Creation.

As you probably know, Mydumper and mysqldump export the structure of a table, with all the indexes and the constraints, and of course, the data. Then, Myloader and MySQL import the structure of the table and import the data. The most important difference is that Myloader can import the data using a configurable number of threads. The import steps are:

  1. Create the complete struct of the table
  2. Import the data

When you execute Myloader, it first creates the tables by executing the “-schema.sql” files, and then takes all the filenames without “schema.sql” and puts them in a task queue. Every thread takes a filename from the queue, which is actually a chunk of a table, and executes it. When finished, it takes another chunk from the queue; if the queue is empty, the thread simply ends.
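
As a point of reference, a typical mydumper/Myloader round trip looks something like this (a sketch; flag names as in the 0.6 tools, and the paths are only examples):

mydumper --outputdir=/backups/export --threads=4 --compress
myloader --directory=/backups/export --threads=4 --overwrite-tables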

This import procedure works fast for small tables, but for big tables with large indexes the inserts get slower and slower, caused by the overhead of inserting the new values into the secondary indexes. Another way to import the data is:

  1. Split the table structure into table creation with primary key, indexes creation and constraint creation
  2. Create tables with primary key
  3. Per table do:
    1. Load the data
    2. Create index
  4. Create constraints

This import procedure is implemented in a branch of Myloader that can be downloaded from here, or by branching the repository directly with bzr:

bzr branch lp:~david-ducos/mydumper/mydumper

The tool reads the schema files and splits them into three separate statements: one that creates the table with the primary key, one that adds the indexes and one that adds the constraints. The primary key is kept in the table creation in order to avoid rebuilding the table when a primary key is added later; the “KEY” and “CONSTRAINT” lines are removed from it and added to the index and constraint statements, respectively.
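
To make this concrete, here is a hypothetical example of the split for a small table (the table, column and index names are made up):

# 1) table creation, keeping the primary key:
CREATE TABLE t1 (
  id INT NOT NULL,
  parent_id INT,
  k INT,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
# 2) index creation, executed after the data is loaded (fast index creation):
ALTER TABLE t1 ADD INDEX k_idx (k);
# 3) constraint creation, executed last:
ALTER TABLE t1 ADD CONSTRAINT fk_parent FOREIGN KEY (parent_id) REFERENCES t1 (id);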

It processes tables according to their size, starting with the largest, because creating the indexes of a big table can take hours and is single-threaded. While we cannot build other indexes on that table at the same time, we can potentially create other tables with the remaining threads.

It has a new thread (monitor_process) that decides which chunk of data will be put in the task queue and a communication queue which is used by the task processes to tell the monitor_process which chunk has been completed.

I ran multiple imports on an AWS m1.xlarge machine with one table, comparing Myloader and this branch, and I found that with large indexes the times were:

[Chart: import times for Myloader vs. the new branch, by number of rows]

As you can see, when you have fewer than 150M rows, importing the data and then creating the indexes takes longer than importing the table with the indexes all at once. But everything changes after 150M rows: importing 200M rows takes 64 minutes more with Myloader but just 24 minutes more with the new branch.

On a table of 200M rows with an integer primary key and 9 integer columns, you will see how the time increases as the indexes get larger:

[Chart: import time on the 200M-row table as the index definitions grow]

Where:

  • 2-2-0: two 1-column and two 2-column indexes
  • 2-2-1: two 1-column, two 2-column and one 3-column index
  • 2-3-1: two 1-column, three 2-column and one 3-column index
  • 2-3-2: two 1-column, three 2-column and two 3-column indexes

Conclusion

This branch can only import all tables with this same strategy, but with this new logic in Myloader, a future version could be able to import each table with the best strategy, reducing the restore time considerably.

The post Importing big tables with large indexes with Myloader MySQL tool appeared first on MySQL Performance Blog.


New mydumper 0.6.1 release offers performance and usability features

$
0
0

One of the tasks within Percona Remote DBA is to ensure we have reliable backups with minimal impact. To accomplish this, one of the tools our team uses is called mydumper. We use mydumper for logical backups because of several nice features. Some of them are:

  • multithreaded, producing very fast backups compared to mysqldump
  • almost no locking, if not using non innodb tables
  • built-in compression
  • separate files for each table, making it easy to restore single tables or schema, and the possibility to hardlink files reducing the space needed to keep history of backups. Also this feature give you the possibility to restore with more than one thread.

The mydumper project was started at the beginning of 2009 by Domas Mituzas, and a few months ago we started collaborating and adding new features to improve performance and usability.

And now we are glad to announce our second release: mydumper 0.6.1. You can download the source code here. It is highly recommended to upgrade it if you are using 0.6.0 as we fixed and improved the new less-locking feature.

New features in the mydumper 0.6 series:

  • Consistent backups with less locking
    This new feature consists of locking all non-innodb tables with the dumping threads at the beginning so in this way we can unlock the flush tables with read lock earlier and no need to wait until all non-innodb tables were dumped. You can take advantage of this feature when you have large archive or myisam tables.
  • File size chunks
    Now you can split tables dump into different files with fixed size. This is usefull to reduce storage capacity needed to keep history backups by using hardlinks. Think on big “log” tables or tables where old data didnt change, now you will be able to hardlink back those chunks.
  • Metadata Locking
    Added new option –use-savepoints to reduce metadata locking issues while backup is running.

Fixed Bugs in the mydumper 0.6 series:

mydumper 0.6.0

  • #1250269 ensure statement is not bigger than statement_size
  • #827328 myloader to set UNIQUE_CHECKS = 0 when importing
  • #993714 Reducing the time spent with a global read lock
  • #1250271 make it more obvious when mydumper is not successful
  • #1250274 table doesnt exist should not be an error
  • #987344 Primary key name not quoted in showed_nulls test
  • #1075611 error when restoring tables with dashes in name
  • #1124106 Mydumper/myloader does not care for 0-s in AUTO_INCREMENT fields
  • #1125997 Timestamp data not portable between servers on differnt timezones

mydumper 0.6.1

  • #1273441 less-locking breaks consistent snapshot
  • #1267501 mydumper erroneously always attempts a dummy read
  • #1272310 main_connection keep an useless transaction opened and permit infinite metadata table lock
  • #1269376 mydumper 0.6.0 fails to compile “cast from pointer to integer of different size”
  • #1265155 create_main_connection use detected_server before setting it
  • #1267483 Build with MariaDB 10.x
  • #1272443 The dumping threads will hold metadata locks progressively while are dumping data.

Note: #1267501 fix is important for any galera cluster deployment, because of this bug #1265656 that was fixed and released in Percona XtraDB Cluster 5.6.15-25.3.

The post New mydumper 0.6.1 release offers performance and usability features appeared first on MySQL Performance Blog.

Introducing backup locks in Percona Server

$
0
0

TL;DR version: The backup locks feature introduced in Percona Server 5.6.16-64.0 is a lightweight alternative to FLUSH TABLES WITH READ LOCK and can be used to take both physical and logical backups with less downtime on busy servers. To employ the feature with mysqldump, use mysqldump --lock-for-backup --single-transaction. The next release of Percona XtraBackup will also be using backup locks automatically if the target server supports the feature.

Now on to the gory details, but let’s start with some history.

In the beginning…

In the beginning there was FLUSH TABLES, and users messed with their MyISAM tables under a live server and were not ashamed. Users could do nice things like:

mysql> FLUSH TABLES;
# execute myisamchk, myisampack, backup / restore some tables, etc.

And users were happy until someone realized that tables must be protected against concurrent access by queries in other connections. So Monty gave them FLUSH TABLES WITH READ LOCK, and users were enlightened.

Online backups

Users then started dreaming about online backups, i.e. creating consistent snapshots of a live MySQL server. mysqldump --lock-all-tables had been a viable option for a while. To provide consistency it used FLUSH TABLES WITH READ LOCK which was not quite the right tool for the job, but was “good enough”. Who cares if a mom-and-pop shop becomes unavailable for a few seconds required to dump ~100 MB of data, right?

With InnoDB gaining popularity users had realized that one could employ MVCC to guarantee consistency and FLUSH TABLES WITH READ LOCK doesn’t make much sense for InnoDB tables anyway (you cannot modify InnoDB tables under a live server even if the server is read-only). So Peter gave mysqldump the --single-transaction option, and users were enlightened. mysqldump --single-transaction allowed to avoid FTWRL, but there was a few catches:

  • one cannot perform any schema modifications or updates to non-InnoDB tables while mysqldump --single-transaction is in progress, because those operations are not transactional and thus would ignore the data snapshot created by --single-transaction;
  • one cannot get binary log coordinates with --master-data or
    --dump-slave, because in that case FTWRL would still be used to ensure that the binary log coordinates are consistent with the data dump;

Which makes --single-transaction similar to the --no-lock option in Percona XtraBackup: it shifts the responsibility for backup consistency to the user. Any change in the workload violating the prerequisites for those options may result in a broken backup without any signs for the user to take action.

Present

Fast forward to present day. MySQL is capable of handling over a million queries per second, MyISAM is certainly not a popular choice to store data, and there are many backup solutions to choose from. Yet all of them still rely on FLUSH TABLES WITH READ LOCK in one way or another to guarantee consistency of .frm files, non-transactional tables and binary log coordinates.

To some extent, the problem with concurrent DDL + mysqldump --single-transaction has been alleviated with metadata locks in MySQL 5.5, which however made some users unhappy, and that behavior was partially reverted in MySQL 5.6.16 with the fix for bug #71017. But the fundamental problem is still there: mysqldump --single-transaction does not guarantee consistency with concurrent DDL statements and updates to non-transactional tables.

So the fact that FTWRL is overkill for backups has become increasingly obvious, for the reasons described below.

What’s the problem with FTWRL anyway?

A lot has been written on what FLUSH TABLES WITH READ LOCK really does. Here’s yet another walk-through in a bit more detail than described elsewhere:

  1. It first invalidates the Query Cache.
  2. It then waits for all in-flight updates to complete and at the same time it blocks all incoming updates. This is one problem for busy servers.
  3. It then closes all open tables (the FLUSH part) and expels them from the table cache. This is also when FTWRL has to wait for all SELECT queries to complete. And this is another, even bigger problem for busy servers, because that wait happens to occur with all updates blocked. What’s even worse, the server at this stage is essentially offline, because even incoming SELECT queries will get blocked.
  4. Finally, it blocks COMMITs.

Action #4 is not required for the original purpose of FTWRL, but is rather a kludge implemented due to the fact that FTWRL is (mis)used by backup utilities.

Actions #1-3 make perfect sense for the original reasons why FTWRL has been implemented. If we are going to access and possibly modify tables outside of the server, we want the server to forget everything it knows about both schema and data for all tables, and flush all in-memory buffers to make the on-disk data representation consistent.

And that’s what makes it an overkill for MySQL database backup utilities: they don’t require #1, because they never modify data. #2 is only required for non-InnoDB tables, because InnoDB provides other ways to ensure consistency for both logical and physical backups. And #3 is certainly not a problem for logical backup utilities like mysqldump or mydumper, because they don’t even access on-disk data directly. As we will see, it is not a big problem for physical backup solutions either.
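
To make the stall concrete, here is a minimal three-session illustration (the table name sbtest1 is just an example):

session1> SELECT SLEEP(60) FROM sbtest1; # a long-running read is in flight
session2> FLUSH TABLES WITH READ LOCK; # blocks at step #3, waiting for session1 to finish
session3> SELECT * FROM sbtest1 LIMIT 1; # blocked behind the pending flush, shown as "Waiting for table flush"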

To FLUSH or not to FLUSH?

So what exactly is flushed by FLUSH TABLES WITH READ LOCK?

Nothing for InnoDB tables, and no physical backup solution requires it to flush anything.

For MyISAM it is more complicated. MyISAM key caches are normally write-through, i.e. by the time each update to a MyISAM table completes, all index updates are written to disk. The only exception is the delayed key writing feature, which you should not be using anyway if you care about your data. MyISAM may also do data buffering for bulk inserts, e.g. while executing multi-row INSERTs or LOAD DATA statements. Those buffers, however, are flushed between statements, so they have no effect on physical backups as long as we block all statements updating MyISAM tables.

The point is that without flushing, each storage engine is no less backup-safe than it is crash-safe, the only difference being that backups are guaranteed to wait for all currently executing INSERT/REPLACE/DELETE/UPDATE statements to complete.

Backup locks

Enter the backup locks feature. The following 3 new SQL statements have been introduced in Percona Server:

  • LOCK TABLES FOR BACKUP
  • LOCK BINLOG FOR BACKUP
  • UNLOCK BINLOG

LOCK TABLES FOR BACKUP

Quoting the documentation page from the manual:

LOCK TABLES FOR BACKUP uses a new MDL lock type to block updates to non-transactional tables and DDL statements for all tables. More specifically, if there’s an active LOCK TABLES FOR BACKUP lock, all DDL statements and updates to MyISAM, CSV, MEMORY and ARCHIVE tables will be blocked in the “Waiting for backup lock” status as visible in PERFORMANCE_SCHEMA or PROCESSLIST. SELECT queries for all tables and INSERT/REPLACE/UPDATE/DELETE against InnoDB, Blackhole and Federated tables are not affected by LOCK TABLES FOR BACKUP. Blackhole tables obviously have no relevance for backups, and Federated tables are ignored by both logical and physical backup tools.

Like FTWRL, the LOCK TABLES FOR BACKUP statement:

  • blocks updates to MyISAM, MEMORY, CSV and ARCHIVE tables;
  • blocks DDL against any tables;
  • does not block updates to temporary and log tables.

Unlike FTWRL, the LOCK TABLES FOR BACKUP statement:

  • does not invalidate the Query Cache;
  • never waits for SELECT queries to complete regardless of the storage engines involved;
  • never blocks SELECTs, or updates to InnoDB, Blackhole and Federated tables.

In other words, it does exactly what backup utilities need: block non-transactional changes that are included into the backup, and leave everything else to InnoDB MVCC and crash recovery.

With the only exception of binary log coordinates obtained with SHOW MASTER STATUS and SHOW SLAVE STATUS.

LOCK BINLOG FOR BACKUP

This is when LOCK BINLOG FOR BACKUP comes in handy. It blocks all updates to binary log coordinates as reported by SHOW MASTER/SLAVE STATUS and used by backup utilities. It has no effect when all of the following conditions apply:

  • the binary log is disabled. If it is disabled globally, then no connections are affected by LOCK BINLOG FOR BACKUP. If it is enabled globally, but disabled for specific connections via sql_log_bin, only those connections are allowed to commit;
  • the server is not a replication slave;

Even if binary logging is used, LOCK BINLOG FOR BACKUP will allow DDL and updates to any tables to proceed until they need to be written to the binary log (i.e. at commit), and/or to advance Exec_Master_Log_* / Exec_Gtid_Set when executed by a replication thread, provided that no other global locks are acquired.

To release the lock acquired by LOCK TABLES FOR BACKUP there’s already UNLOCK TABLES. And the LOCK BINLOG FOR BACKUP lock is released with UNLOCK BINLOG.
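
As an illustration of the binlog lock (assuming binary logging is enabled and t1 is an InnoDB table):

backup> LOCK BINLOG FOR BACKUP;
backup> SHOW MASTER STATUS; # the coordinates are now frozen
user1> INSERT INTO t1 VALUES (1); # executes, but waits at the commit stage
backup> UNLOCK BINLOG; # user1's commit now completes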

Let’s look how these statements can be used by MySQL backup utilities.

mysqldump

mysqldump got a new option, --lock-for-backup, which along with --single-transaction essentially obsoletes --lock-all-tables (i.e. FLUSH TABLES WITH READ LOCK). It makes mysqldump use LOCK TABLES FOR BACKUP before it starts dumping tables, to block all "unsafe" statements that might otherwise interfere with backup consistency.

Of course, that requires backup locks support by the target server, so mysqldump checks if they are indeed supported and fails with an error if they are not.

However, at the moment, if binary log coordinates are requested with --master-data, FTWRL is still used even if --lock-for-backup is specified. mysqldump could use LOCK BINLOG FOR BACKUP, but there's a better solution for logical backups implemented in MariaDB, which has already been ported to Percona Server and queued for the next release.

There is also another important difference between just mysqldump --single-transaction and mysqldump --lock-for-backup --single-transaction. As of MySQL 5.5, mysqldump --single-transaction acquires shared metadata locks on all tables processed within the transaction, which will also block DDL statements on those tables when they try to acquire an exclusive lock. So far, so good. The problems start when there's an incoming SELECT query against a table that already has a pending DDL statement: it will also be blocked on the pending exclusive MDL request for no apparent reason, which was one of the complaints in bug #71017.

It’s better illustrated with an example. Suppose there are 3 sessions: one created by mysqldump, and 2 user sessions.

user1> CREATE TABLE t1 (a INT);
mysqldump> START TRANSACTION WITH CONSISTENT SNAPSHOT;
mysqldump> SELECT * FROM t1; # this acquires a table MDL
user1> ALTER TABLE t1 ADD COLUMN b INT; # this blocks on the MDL created by mysqldump
user2> SET lock_wait_timeout=1;
user2> SELECT * FROM t1; # this blocks on a pending MDL request by user1
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

This is what would happen with mysqldump --lock-for-backup --single-transaction:

user1> CREATE TABLE t1 (a INT);
mysqldump> LOCK TABLES FOR BACKUP;
mysqldump> START TRANSACTION WITH CONSISTENT SNAPSHOT;
mysqldump> SELECT * FROM t1; # this acquires a table MDL
user1> ALTER TABLE t1 ADD COLUMN b INT; # this blocks on the backup MDL lock
user2> SET lock_wait_timeout=1;
user2> SELECT * FROM t1; # this one is not blocked

This immediate problem was partially fixed in MySQL 5.6.16 by releasing metadata locks after processing each table with the help of savepoints. There are a couple of issues with this approach:

  • there is still a table metadata lock for the duration of the SELECT executed by mysqldump, which, as before, blocks DDL. So there is still a chance that mysqldump --single-transaction may eventually block SELECT queries.
  • after the table is processed and the metadata lock is released, there is now an opportunity for RENAME to break the backup, see bug #71214.

Both issues above along with bug #71215 and bug #71216 do not exist with mysqldump --lock-for-backup --single-transaction as all kinds of DDL statements are properly isolated by backup locks, which do not block SELECT queries at the same time.

Percona XtraBackup

Percona XtraBackup 2.2 will support backup locks and use them automatically if supported by the server being backed up.

The current locking used by XtraBackup is:

# copy InnoDB data
FLUSH TABLES WITH READ LOCK;
# copy .frm, MyISAM, etc.
# get the binary log coordinates
# finalize the background copy of REDO log
UNLOCK TABLES;

With backup locks it becomes:

# copy InnoDB data
LOCK TABLES FOR BACKUP;
# copy .frm, MyISAM, etc
LOCK BINLOG FOR BACKUP;
# finalize the background copy of REDO log
UNLOCK TABLES;
# get the binary log coordinates
UNLOCK BINLOG;

Note that under the following conditions, no blocking occurs at any stage in the server:

  • no updates to non-transactional tables;
  • no DDL;
  • binary log is disabled;

They may look familiar, because they are essentially the prerequisites for the --no-lock option. Except that with backup locks, you don't have to take chances or take responsibility for backup consistency yourself. All the locking will be handled automatically by the server, if and when it is necessary.

mylvmbackup

mylvmbackup takes the server read-only with FLUSH TABLES WITH READ LOCK while the snapshot is being created for two reasons:

  • flush non-transactional tables
  • ensure consistency with the binary log coordinates

For exactly the same reasons as with XtraBackup, it can use backup locks instead of FTWRL.
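
A possible sequence, sketched in the same comment style as above (mylvmbackup does not do this yet, so treat it as an assumption of how it could look):

LOCK TABLES FOR BACKUP;
LOCK BINLOG FOR BACKUP;
# get the binary log coordinates
# create the LVM snapshot (lvcreate --snapshot ...)
UNLOCK BINLOG;
UNLOCK TABLES;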

mydumper

mydumper developers may want to add support for backup locks as well. mydumper relies on START TRANSACTION WITH CONSISTENT SNAPSHOT to ensure InnoDB consistency, but has to resort to FLUSH TABLES WITH READ LOCK to ensure consistency of non-InnoDB tables and binary log coordinates.

Another problem is that START TRANSACTION WITH CONSISTENT SNAPSHOT is not supposed to be used by multi-threaded logical backup utilities. But that is an opportunity for another server-side improvement and probably a separate blog post.

The post Introducing backup locks in Percona Server appeared first on MySQL Performance Blog.

mydumper [less] locking

In this post I would like to review how mydumper for MySQL works from the point of view of locks. Since the 0.6 series we have different options, so I will try to explain how they work.

As you may know, mydumper is multithreaded, and this adds a lot of complexity compared with other logical backup tools, as it also needs to coordinate all threads on the same snapshot to be consistent. So let's review how mydumper does this with the default settings.

By default mydumper uses 4 threads to dump data and 1 main thread:

Main Thread
  • FLUSH TABLES WITH READ LOCK
Dump Thread X
  • START TRANSACTION WITH CONSISTENT SNAPSHOT;
  • dump non-InnoDB tables
Main Thread
  • UNLOCK TABLES
Dump Thread X
  • dump InnoDB tables
As you can see, in this case we need FTWRL for two things: to coordinate the transactions' snapshots and to dump non-InnoDB tables in a consistent way. So we hold the global read lock until all non-InnoDB tables have been dumped.
What less locking does is this:
Main Thread
  • FLUSH TABLES WITH READ LOCK
Dump Thread X
  • START TRANSACTION WITH CONSISTENT SNAPSHOT;
 LL Dump Thread X
  • LOCK TABLES non-InnoDB
Main Thread
  • UNLOCK TABLES
 LL Dump Thread X
  • dump non-InnoDB tables
  • UNLOCK non-InnoDB
Dump Thread X
  • dump InnoDB tables

So now the global read lock is only in place until the less-locking threads lock the non-InnoDB tables, and this is really fast. The only downside is that it uses double the amount of threads, so for the default (4 threads) we will end up with 9 connections, but only 4 will be running at the same time.
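
Enabling it is just a matter of passing the flag; a minimal sketch (connection options omitted, output path is arbitrary):

mydumper --less-locking --threads 4 --compress \
         --outputdir /backups/$(date +%F)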

Less-locking really helps when you have MyISAM or ARCHIVE tables that are not heavily updated by the production workload. Also, you should know that LOCK TABLES … READ LOCAL allows non-conflicting INSERTs on MyISAM, so if you use those tables to keep logs (append only) you will not notice that lock at all.

For the next release we will implement support for backup locks, which will let us avoid running FTWRL altogether. A possible flow is sketched below.
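
Here is one possible flow with backup locks, in the same layout as above (a sketch of the plan, not released behavior):

Main Thread
  • LOCK TABLES FOR BACKUP
  • LOCK BINLOG FOR BACKUP
Dump Thread X
  • START TRANSACTION WITH CONSISTENT SNAPSHOT;
Main Thread
  • record binary log coordinates
  • UNLOCK BINLOG
Dump Thread X
  • dump non-InnoDB tables
Main Thread
  • UNLOCK TABLES
Dump Thread X
  • dump InnoDB tables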

The post mydumper [less] locking appeared first on MySQL Performance Blog.

MySQL performance implications of InnoDB isolation modes


Over the past few months I’ve written a couple of posts about dangerous debt of InnoDB Transactional History and about the fact MVCC can be the cause of severe MySQL performance issues. In this post I will cover a related topic – InnoDB Transaction Isolation Modes, their relationship with MVCC (multi-version concurrency control) and how they impact MySQL performance.

The MySQL Manual provides a decent description of transaction isolation modes supported by MySQL – I will not repeat it here but rather focus on performance implications.

SERIALIZABLE – This is the strongest isolation mode, to the point that it essentially defeats multi-versioning by making all SELECTs locking reads, causing significant overhead both in terms of lock management (setting locks is expensive) and in the concurrency you can get. This mode is only used in very special circumstances among MySQL applications.

REPEATABLE READ – This is the default isolation level, and generally it is quite nice and convenient for the application. It sees all the data as of the time of the first read (assuming standard non-locking reads). This however comes at a high cost – InnoDB needs to maintain transaction history back to the point of the transaction start, which can be very expensive. The worst-case scenario is applications with a high level of updates and hot rows – you really do not want InnoDB to deal with rows which have hundreds of thousands of versions.

In terms of performance impact, both reads and writes can be affected. For SELECT queries, traversing multiple row versions is very expensive, but so it is for updates, especially as version management seems to cause severe contention issues in MySQL 5.6.

Here is an example: I ran sysbench on a completely in-memory data set, then started a transaction and ran a full table scan query a couple of times while keeping the transaction open:

sysbench  --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=80000000 --mysql-user=root --mysql-password= --mysql-db=sbinnodb  --test=/usr/share/doc/sysbench/tests/db/update_index.lua run

As you can see, the write throughput drops drastically and stays low the whole time the transaction is open, not only while the query is running. This is perhaps the worst-case scenario I could find: it happens when you have a select outside of a transaction followed by a long transaction in REPEATABLE READ isolation mode. Though you can see regression in other cases, too.

Here is the set of queries I used if someone wants to repeat the test:

select avg(length(c)) from sbtest1;
begin;
select avg(length(c)) from sbtest1;
select sleep(300);
commit;

Not only is REPEATABLE READ the default isolation level, it is also what is used for logical backups with InnoDB – think mydumper or mysqldump --single-transaction.
These results show that such backup methods not only can't be used with large data sets due to long recovery times, but also can't be used in some high-write environments due to their performance impact.

READ COMMITTED mode is similar to REPEATABLE READ, with the essential difference being that versions are not kept back to the start of the first read in the transaction, but only to the start of the current statement. As such, using this mode allows InnoDB to maintain a lot fewer versions, especially if you do not have very long running statements. If you have some long running selects – such as reporting queries – the performance impact can still be severe.

In general I think good practice is to use READ COMMITTED isolation mode as the default and change to REPEATABLE READ for those applications or transactions which require it.
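
For example, the server-wide default can go into my.cnf and be overridden per session where the application really needs it (a minimal sketch):

# my.cnf – make READ COMMITTED the server-wide default
[mysqld]
transaction-isolation = READ-COMMITTED

-- per session, switch back only where it is required
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;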

READ UNCOMMITTED – I think this is the least understood isolation mode (not a surprise, as it only has 2 lines of documentation about it), and those only describe it from the logical point of view. If you're using this isolation mode you will see all the changes done in the database as they happen, even those from transactions which have not yet been committed. One nice use case for this isolation mode is that you can “watch” large scale UPDATE statements as they happen, with dirty reads showing which rows have already been changed and which have not.

Note that such a statement shows changes that have not been committed yet and might never be committed if the transaction making them runs into an error, so this mode should be used with extreme care. There are a number of cases, though, when we do not need 100% accurate data, and in those cases this mode becomes very handy.
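
A quick sketch of that “watch the progress” pattern (table and column names are made up):

-- session 1: a large update in progress
UPDATE orders SET status = 'archived' WHERE created < '2014-01-01';

-- session 2: follow its progress with dirty reads (the numbers are approximate!)
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM orders WHERE status = 'archived';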

So how does READ UNCOMMITTED behave from a performance point of view? In theory, InnoDB could purge row versions even if they were created after the start of the statement in READ UNCOMMITTED mode. In practice, due to a bug or some intricate implementation detail, it does not do that – row versions will still be kept back to the start of the statement. So if you run a very long SELECT in READ UNCOMMITTED mode, you will get a large number of row versions created, just as if you were using READ COMMITTED. No win here.

There is an important win on the SELECT side, though – READ UNCOMMITTED isolation mode means InnoDB never needs to go and examine older row versions – the last row version is always the correct one. This can produce a dramatic performance improvement, especially if the undo space has spilled over to disk, where finding old row versions can cause a lot of IO.

Perhaps the best illustration I found was the query select avg(k) from sbtest1; run in parallel with the same update-heavy workload described above. In READ COMMITTED isolation mode it never completes – I assume because new index entries are inserted faster than they are scanned – while in READ UNCOMMITTED isolation mode it completes in a minute or so.

Final Thoughts: Using InnoDB isolation modes correctly can help your application get the best possible performance. Your mileage may vary, and in some cases you will see no difference; in others it will be absolutely dramatic. There also seems to be a lot of work to be done in relation to InnoDB performance with a long version history. I hope it will be tackled in future MySQL versions.

The post MySQL performance implications of InnoDB isolation modes appeared first on MySQL Performance Blog.

Importing big tables with large indexes with Myloader MySQL tool


Mydumper is known as the faster (much faster) mysqldump alternative. So, if you take a logical backup you will choose Mydumper instead of mysqldump. But what about the restore? Well, who needs to restore a logical backup? It takes ages! Even with Myloader. But this could change just a bit if we are able to take advantage of Fast Index Creation.

As you probably know, Mydumper and mysqldump export the structure of a table, with all the indexes and the constraints, and of course, the data. Then, Myloader and MySQL import the structure of the table and import the data. The most important difference is that you can configure Myloader to import the data using a certain number of threads. The import steps are:

  1. Create the complete struct of the table
  2. Import the data

When you execute Myloader, internally it first creates the tables by executing the “-schema.sql” files, and then takes all the filenames without “schema.sql” and puts them in a task queue. Every thread takes a filename from the queue (each file is actually a chunk of a table) and executes it. When it finishes, it takes another chunk from the queue; if the queue is empty, it just ends.

This import procedure works fast for small tables, but with big tables with large indexes the inserts get slower and slower, due to the overhead of inserting the new values into the secondary indexes. Another way to import the data is the following (a rough SQL sketch follows the list below):

  1. Split the table structure into table creation with primary key, indexes creation and constraint creation
  2. Create tables with primary key
  3. Per table do:
    1. Load the data
    2. Create index
  4. Create constraints
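
For illustration only, here is roughly what that split could look like for a hypothetical table t1 (the real statements are generated from the “-schema.sql” files):

-- 1. create the table with only the primary key
CREATE TABLE t1 (id INT NOT NULL, a INT, b INT, PRIMARY KEY (id));

-- 2. load the data (all the chunks of t1, possibly in parallel)
-- INSERT INTO t1 VALUES ... ;

-- 3. add the secondary indexes in a single ALTER (fast index creation)
ALTER TABLE t1 ADD KEY idx_a (a), ADD KEY idx_b (b);

-- 4. add the constraints last
ALTER TABLE t1 ADD CONSTRAINT fk_b FOREIGN KEY (b) REFERENCES t2 (id);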

This import procedure is implemented in a branch of Myloader that can be downloaded from here or directly executing bzr with the repository:

bzr branch lp:~david-ducos/mydumper/mydumper

The tool reads the schema files and splits them into three separate statements which create the tables with the primary key, the indexes and the constraints. The primary key is kept in the table creation in order to avoid the recreation of the table when a primary key is added and the “KEY” and “CONSTRAINT” lines are removed. These lines are added to the index and constraint statements, respectively.

It processes tables according to their size, starting with the largest, because creating the indexes of a big table could take hours and is single-threaded. While we cannot build other indexes on that table at the same time, we are potentially able to load other tables with the remaining threads.

It has a new thread (monitor_process) that decides which chunk of data will be put in the task queue and a communication queue which is used by the task processes to tell the monitor_process which chunk has been completed.

I ran multiple imports on an AWS m1.xlarge machine with one table, comparing Myloader and this branch, and I found that with large indexes the times were:

[Chart: restore time for Myloader vs. the fast-index-creation branch, by number of rows]

As you can see, when you have less than 150M rows, importing the data and then creating the indexes takes longer than importing the table with the indexes already in place. But everything changes after 150M rows: importing 200M rows takes 64 minutes more with Myloader, but just 24 minutes more with the new branch.

On a table of 200M rows with an integer primary key and 9 integer columns, you can see how the time increases as the indexes get larger:

[Chart: restore time on the 200M-row table as the number and size of secondary indexes grows]

Where:

2-2-0: two 1-column and two 2-column indexes
2-2-1: two 1-column, two 2-column and one 3-column index
2-3-1: two 1-column, three 2-column and one 3-column index
2-3-2: two 1-column, three 2-column and two 3-column indexes

Conclusion

This branch can only import all the tables using this same strategy, but with this new logic in Myloader, a future version could import each table with the best strategy, reducing the restore time considerably.

The post Importing big tables with large indexes with Myloader MySQL tool appeared first on MySQL Performance Blog.

Containing your logical backups: mydumper in docker


Even with software like Percona Xtrabackup, logical backups remain an important component of a thorough backup strategy. To gather a logical backup in a timely fashion we rely on a tool called mydumper. In the Percona Managed Services department we're lucky enough to have one of the project's most active contributors, and many of the latest features and bug fixes have been released to satisfy some of our own use cases. Compiling mydumper remains one of my personal bugbears and undoubtedly the highest barrier to entry for many potential adopters. There are a few conditions that make compiling it difficult, tiring or even prohibitive. The open source logical backup tool does not supply any official packages; however, our friends over at TwinDB are currently shipping a package for CentOS/RHEL. So what if you're running something else, like Debian, Ubuntu, Arch? Well, recently I had a flash of inspiration.

Since I’ve been turning some ideas into docker containers, it dawned on me that it would be a trivial image to create and would add instant portability to the binaries. With a docker environment you can take an image and run a logical backup through to a mapped volume. It’s almost as easy as that.


So let me show you what I mean. I have built a docker image with the mydumper, myloader and mysql client libraries installed and it’s available for you on docker hub. This means that we can call a new container to make our backup without technically installing mydumper anywhere. This can get you from zero to mydumper very fast if there are hurdles in your way to deploying the open source backup tool into production.

With the grand assumption that you've got Docker installed somewhere, let's pull the image from the Docker Hub:

docker pull mysqlboy/mydumper

Once all the layers download (it's < 200MB) you're all set to launch a backup using the mydumper binary. You can roll your own command, but it could look similar to this:

docker run \
--name mydumper \
--rm \
-v {backup_parent_dir}:/backups \
mysqlboy/mydumper \
mydumper \
--host={host_ipaddr} \
--user={mysql_username} \
--password={mysql_password} \
--outputdir=/backups \
--less-locking \
--compress \
--use-savepoints

If you're unfamiliar with Docker itself, here is a very high level summary: Docker is a product intended to facilitate process isolation, or 'microservices', known as containerization. It intends to be lighter and more efficient than Virtual Machines as we traditionally know them. There's much more to this workflow than I intend to explain here, but please see the further learning section in the footer.

Let me explain a little of the above call. We want to launch a mydumper run isolated to a container. We are giving the docker daemon the instruction to remove the container after it finishes its run (--rm), we are running the container as an 'instance' of the mysqlboy/mydumper image, and we are passing a traditional mydumper command as the container's instruction. We have mapped a location on the local filesystem into the container to ensure that the backup persists after the container is stopped and removed. The mydumper command itself will make a full backup of the instance you point it to (mydumper can make remote backups, pulling the data locally) and will use the less-locking, savepoints and compression features.

What's more, the beauty of containerizing the mydumper/myloader binaries means that you can use this image in conjunction with docker machine to source logical backups from Mac and Windows, where this process is typically difficult.
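
For completeness, a restore through the same image could look something like this (an untested sketch, using the same placeholder convention as above):

docker run \
--name myloader \
--rm \
-v {backup_parent_dir}:/backups \
mysqlboy/mydumper \
myloader \
--host={host_ipaddr} \
--user={mysql_username} \
--password={mysql_password} \
--directory=/backups \
--overwrite-tables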

I’m going to be filling in for my colleague Max Bubenick this week at Percona Live in Amsterdam talking about Logical Backups using Mydumper and if you’re planning to attend, the mydumper docker image will provide you with a quick path to trial. Thanks for reading and if you’re in Amsterdam this week don’t forget to say hello!

Further learning about docker:

https://www.docker.com/whatisdocker

https://www.youtube.com/user/dockerrun

http://europe-2015.dockercon.com/

The post Containing your logical backups: mydumper in docker appeared first on MySQL Performance Blog.

Logical MySQL backup tool Mydumper 0.9.1 now available


The new Mydumper 0.9.1 version, which includes many new features and bug fixes, is now available. You can download the code from here.

A significant change included in this version now enables Mydumper to handle all schema objects!!  So there is no longer a dependency on using mysqldump to ensure complex schemas are backed up alongside the data.

Let’s review some of the new features:

Full schema support for Mydumper/Myloader

Mydumper now takes care of backing up the schema, including Views and Merged tables. As a result, we now have these new associated options:

-d, --no-data Do not dump table data
-G, --triggers Dump triggers
-E, --events Dump events
-R, --routines Dump stored procedures and functions

These options are not enabled by default, to keep backward compatibility with existing mixed solutions that use mysqldump for DDLs.

Locking reduction options

--trx-consistency-only      Transactional consistency only

You can think of this as --single-transaction for mysqldump, but still with the binlog position. Obviously this position only applies to transactional tables (TokuDB included). One of the advantages of using this option is that the global read lock is only held for the thread coordination, so it's released as soon as the transactions are started.
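
For example, a dump using it could look like this (connection options and the output path are illustrative):

mydumper --trx-consistency-only --triggers --events --routines \
         --outputdir /backups/$(date +%F)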

GTIDs and Multisource Slave 

GTIDs are now recorded in the metadata file. Also, Mydumper is now able to detect a multisource slave (MariaDB 10.1.x) and will record all of the slaves' coordinates.

Myloader single database restore

Until now the only option was to copy the database files to a different directory and restore from it. However, we now have a new option available:

-s, --source-db                   Database to restore

It can also be used in combination with -B, --database to restore to a different database name.
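
For example, to restore only the sakila schema from a full backup into a differently named database (names and paths are illustrative):

myloader --directory /backups/full \
         --source-db sakila \
         --database sakila_restored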

Full list of Bug Fixes:

#1431410 innodb stats tables
#1440403 *-post and *-triggers compressed files corrupt
#1470891 functions may be needed by SP and views
#1390437 segmentation fault against Percona MySQL 5.6.15-63.0
#1446280 Segmentation fault on Debian Wheezy
#1399715 Typo in –tables-list option in manpage
#1428608 missing -K option in mydumper manpage
#1440437 myloader: wrong database name in message when -B used
#1457091 tokudb detection doesn’t work
#1481747 Unable to compile r179 WITH_BINLOG=ON (undeclared ‘bj’)
#1507574 Assertion when broken mrg tables
#841651 dump view definitions
#1485688 make compile error myloader.c:209

 

The post Logical MySQL backup tool Mydumper 0.9.1 now available appeared first on MySQL Performance Blog.


Creating an External Replica of AWS Aurora MySQL with Mydumper


Oftentimes, we need to replicate between Amazon Aurora and an external MySQL server. The idea is to start by taking a point-in-time copy of the dataset. Next, we can configure MySQL replication to roll it forward and keep the data up-to-date.

This process is documented by Amazon, however, it relies on the mysqldump method to create the initial copy of the data. If the dataset is in the high GB/TB range, this single-threaded method could take a very long time. Similarly, there are ways to improve the import phase (which can easily take 2x the time of the export).

Let’s explore some tricks to significantly improve the speed of this process.

Preparation Steps

The first step is to enable binary logs in Aurora. Go to the Cluster-level parameter group and make sure binlog_format is set to ROW. There is no log_bin option in Aurora (in case you are wondering), simply setting binlog_format is enough. The change requires a restart of the writer instance, so it, unfortunately, means a few minutes of downtime.

We can check if a server is generating binary logs as follows:

mysql> SHOW MASTER LOGS;

+----------------------------+-----------+
| Log_name                   | File_size |
+----------------------------+-----------+
| mysql-bin-changelog.034148 | 134219307 |
| mysql-bin-changelog.034149 | 134218251 |
...

Otherwise, you will get an error:

ERROR 1381 (HY000): You are not using binary logging

We also need to ensure a proper binary log retention period. For example, if we expect the initial data export/import to take one day, we can set the retention period to something like three days to be on the safe side. This will help ensure we can roll forward the restored data.

mysql> call mysql.rds_set_configuration('binlog retention hours', 72);
Query OK, 0 rows affected (0.27 sec)

mysql> CALL mysql.rds_show_configuration;
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| name                   | value | description                                                                                          |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| binlog retention hours | 72    | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
1 row in set (0.25 sec)

The next step is creating a temporary cluster to take the export. We need to do this for a number of reasons: first to avoid overloading the actual production cluster by our export process, also because mydumper relies on FLUSH TABLES WITH READ LOCK to get a consistent backup, which in Aurora is not possible (due to the lack of SUPER privilege).

Go to the RDS console and restore a snapshot that was created AFTER the date/time where you enabled the binary logs. The restored cluster should also have binlog_format set, so select the correct Cluster parameter group.

Next, capture the binary log position for replication. This is done by inspecting the Recent events section in the console. After highlighting your new temporary writer instance in the console, you should see something like this:

Binlog position from crash recovery is mysql-bin-changelog.034259 32068147

So now we have the information to prepare the CHANGE MASTER command to use at the end of the process.

Exporting the Data

To get the data out of the temporary instance, follow these steps:

  1. Backup the schema
  2. Save the user privileges
  3. Backup the data

This gives us added flexibility; we can do some schema changes, add indexes, or extract only a subset of the data.

Let’s create a configuration file with the login details, for example:

tee /backup/aurora.cnf <<EOF
[client]
user=percona
password=percona
host=percona-tmp.cgutr97lnli6.us-west-1.rds.amazonaws.com
EOF

For the schema backup, use mydumper to do a no-rows export:

mydumper --no-data \
--triggers \
--routines \
--events \
-v 3 \
--no-locks \
--outputdir /backup/schema \
--logfile /backup/mydumper.log \
--regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
--defaults-file /backup/aurora.cnf

To get the user privileges I normally like to use pt-show-grants. Aurora is, however, hiding the password hashes when you run the SHOW GRANTS statement, so pt-show-grants will print incomplete statements, e.g.:

mysql> SHOW GRANTS FOR 'user'@'%';
+---------------------------------------------------------+
| Grants for user@%                                       |
+---------------------------------------------------------+
| GRANT USAGE ON *.* TO 'user'@'%' IDENTIFIED BY PASSWORD |
| GRANT SELECT ON `db`.* TO 'user'@'%'                    |
+---------------------------------------------------------+

We can still gather the hashes and replace them manually in the pt-show-grants output if there is a small-ish number of users.

pt-show-grants --user=percona -ppercona -hpercona-tmp.cgutr97lnli6.us-west-1.rds.amazonaws.com  > grants.sql

mysql> select user, password from mysql.user;
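
For example, if that query shows the hash for 'user'@'%', the incomplete line in grants.sql can be completed by hand (the hash below is purely illustrative):

GRANT USAGE ON *.* TO 'user'@'%' IDENTIFIED BY PASSWORD '*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19';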

Finally, run mydumper to export the data:

mydumper -t 8 \
--compress \
--triggers \
--routines \
--events \
--rows=10000000 \
-v 3 \
--long-query-guard 999999 \
--no-locks \
--outputdir /backup/export \
--logfile /backup/mydumper.log \
--regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
-O skip.txt \
--defaults-file /backup/aurora.cnf

The number of threads should match the number of CPUs of the instance running mydumper. In the skip.txt file, you can include any tables that you don't want to copy. The --rows argument gives you the ability to split tables into chunks of X number of rows. Each chunk can run in parallel, so it is a huge speed boost for big tables.

Importing the Data

We need to stand up a MySQL instance to do the data import. In order to speed up the process as much as possible, I suggest doing a number of optimizations to my.cnf as follows:

[mysqld]
pid-file=/var/run/mysqld/mysqld.pid
log-error=/var/log/mysqld.log
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log_slave_updates
innodb_buffer_pool_size=16G
binlog_format=ROW
innodb_log_file_size=1G
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=0
server-id=1000
log-bin=/log/mysql-bin
sync_binlog=0
master_info_repository=TABLE
relay_log_info_repository=TABLE
query_cache_type=0
query_cache_size=0
innodb_flush_neighbors=0
innodb_io_capacity_max=10000
innodb_stats_on_metadata=off
max_allowed_packet=1G
net_read_timeout=60
performance_schema=off
innodb_adaptive_hash_index=off
expire_logs_days=3
sql_mode=NO_ENGINE_SUBSTITUTION
innodb_doublewrite=off

Note that mydumper is smart enough to turn off the binary log for the importer threads.

After the import is complete, it is important to revert these settings to “safer” values: innodb_doublewrite, innodb_flush_log_at_trx_commit, sync_binlog, and also enable performance_schema again.
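
As a reminder, these are the kinds of values to put back once the load is finished (some of them, like innodb_doublewrite and performance_schema, require a restart to take effect):

innodb_doublewrite=1
innodb_flush_log_at_trx_commit=1
sync_binlog=1
performance_schema=on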

The next step is to create an empty schema by running myloader:

myloader \
-d /backup/schema \
-v 3 \
-h localhost \
-u root \
-p percona

At this point, we can easily introduce modifications like adding indexes, since the tables are empty. We can also restore the users at this time:

(echo "SET SQL_LOG_BIN=0;" ; cat grants.sql ) | mysql -uroot -ppercona -f

Now we are ready to restore the actual data using myloader. It is recommended to run this inside a screen session:

myloader -t 4 \
-d /backup/export \
-q 100 \
-v 3 \
-h localhost \
-u root \
-p percona

The rule of thumb here is to use half the number of vCPUs as threads. I also normally like to reduce myloader's default transaction size (1,000 queries, the -q option) to avoid long transactions, but your mileage may vary.

After the import process is done, we can leverage faster methods (like snapshots or Percona Xtrabackup) to seed any remaining external replicas.

Setting Up Replication

The final step is setting up replication from the actual production cluster (not the temporary one!) to your external instance.

It is a good idea to create a dedicated user for this process in the source instance, as follows:

CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

Now we can start replication, using the binary log coordinates that we captured before:

CHANGE MASTER TO MASTER_HOST='aurora-cluster-gh5s6lnli6.us-west-1.rds.amazonaws.com', MASTER_USER='repl', MASTER_PASSWORD='percona', MASTER_LOG_FILE='mysql-bin-changelog.034259', MASTER_LOG_POS=32068147;
START SLAVE;

Final Words

Unfortunately, there is no quick and easy method to get a large dataset out of an Aurora cluster. We have seen how mydumper and myloader can save a lot of time when creating external replicas, by introducing parallel operations. We also reviewed some good practices and configuration tricks for speeding up the data loading phase as much as possible.



Back From a Long Sleep, MyDumper Lives!


MySQL databases keep getting larger and larger. And the larger the databases get, the harder it is to back up and restore them. MyDumper has changed the way that we perform logical backups to enable you to restore tables or objects from large databases. Over the years it has evolved into a tool that we use at Percona to back up petabytes of data every day. It has several features, but the most important one, from my point of view, is how it speeds up the entire process of export and import.

Until the beginning of this year, the latest release was from 2018; yes, more than two years without any release. However, we started 2021 with release v0.10.1 in January, with all the merges up to that point, and we committed ourselves to releasing every two months… and we delivered! Release v0.10.3 came out in March with some old pull requests that had been sleeping for a long time. The next release is planned for May, with some of the newest features.

Just to clarify, mydumper/myloader are not officially-supported Percona products. They are open source, community-managed tools for handling logical backups and restores with all flavors of MySQL.

What Has Changed?

The principal maintainer remains Max Bubenick, and I’ve been helping out with reviewing issues and pull requests to give better support to the community.

Better planning means more than just releasing on time; we also need to decide which new features we are going to package in the next release, and the level of quality we are aiming for.


The releases were in 2021, but this effort started in 2020 and I had been working on the release repository as it wasn’t being maintained.

What’s Next?

Exciting times! There are three features that I would like to mention, not because the others are not important, but rather, because they will speed up the import stage.

  • Fast index creation is finally arriving! This was one of the requested features that even mysqldump implemented.
  • Not long ago I realized that two requests could be merged into one: the community was asking for CSV export and LOAD DATA support.
  • Finally, this request had been waiting for a long time: is it possible for MyDumper to stream backups? We found a way, and we are going to be working on it for v0.10.9.

I was able to measure the speed-up of the first two, and we could get up to 40% on large tables with multiple secondary indexes. The last one is not implemented yet, but taking into account that the import will be started alongside the export, we can expect a huge reduction in the total time.

Conclusion

We still have a lot of opportunities to make MyDumper a major league player. Feel free to download it, play with it, and if you want to contribute, we need your help writing code, testing, or asking for new features.

MyDumper 0.10.5 is Now Available


The new MyDumper 0.10.5 version, which includes many new features and bug fixes, is now available. You can download the code from here.

For this release, we focused on fixing some old issues and testing old pull requests to get higher quality code. For releases 0.10.1, 0.10.3, and 0.10.5, we released packages compiled against the MySQL 5.7 libraries, but from now on we are also compiling against the MySQL 8 libraries, for testing purposes rather than for release, as we think that more people in the community will start compiling against the latest version, and we should be prepared.

New Features:

  • Password obfuscation #312
  • Using dynamic parameter for SET NAMES #154
  • Refactor logging and enable –logfile in myloader #233
  • Adding the purge-mode option (NONE/TRUNCATE/DROP/DELETE) to choose the preferred mode #91 #25 (see the example after this list)
  • Avoid sending COMMIT when commit_count equals 1 #234
  • Check if directory exists #294
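
For example, purge-mode could be combined with a regular restore like this (paths are illustrative; check the documentation for how it interacts with --overwrite-tables):

myloader --directory /backups/full \
         --overwrite-tables \
         --purge-mode TRUNCATE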

Bug Fixes:

  • Adding homebrew build support #313
  • Removing MyISAM dependency in temporary tables for VIEWS #261
  • Fix warnings in sphinx document generation #287
  • Fix endless loop when processlist couldn’t be checked #295
  • Fix issues when daemon is used on glist #317

Documentation:

  • Correcting ubuntu/debian packages dependencies #310
  • Provide better CentOS 7 build instructions #232
  • –defaults-file usage for section mydumper and client #306
  • INSERT IGNORE documentation added #195

Refactoring:

  • Adding function new_table_job #315
  • Adding IF NOT EXISTS to SHOW CREATE DATABASE #314
  • Update FindMySQL.cmake #149

 

MyDumper 0.10.7 is Now Available


The new MyDumper 0.10.7 version, which includes many new features and bug fixes, is now available. You can download the code from here.

For this release, we have added several features, like WHERE support, which is required for partial backups. We also added CHECKSUM support for tables, and the restore of large tables can now take advantage of fast index creation to speed it up, among other improvements.
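
As a quick illustration of the new WHERE support (database, table, and column names are made up):

mydumper -B sales -T orders \
         --where "created_at >= '2021-01-01'" \
         --outputdir /backups/orders_2021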

New Features:

  • Adding metadata file per table that contains the number of rows #353
  • Adding –where support #347 #223
  • Option to automatically disable/enable REDO_LOG #305 #334
  • Adding wsrep_sync_wait support #327
  • Adding fast index creation functionality #286
  • Adding ORDER BY Primary Key functionality #227
  • Added support for dumping checksums #141 #194
  • Dump strings using single quote instead of double quotes #191
  • Specify the number of snapshots #118

Bug Fixes:

  • Fixed the create database section when creating with –source-db enabled #213
  • Escaping quotes on detect_generated_fields as it caused segfault #349
  • [usingfastindex] Indexes on AUTO_INCREMENT column should not be selected for fast index creation #322
  • Fixed checksum compression #355
  • Fixed as constraint was ignored #351
  • Fixed int declaration to comply with C99 #346

Documentation:

  • Added s to install #360
  • Added libatomic1 dependency reference on ubuntu #358 #359
  • Release signatures #28

Refactoring:

  • Moving detected_generated_fields as parameter in table_job #331 #333
  • Abstract function for io #332

Won’t-fix:

  • Cannot dump Clustrix tables #87
  • Extending support of WITH_BINLOG to MySQL 8.0 #341

MyDumper 0.11.1 is Now Available


The new MyDumper 0.11.1 version, which includes many new features and bug fixes, is now available. You can download the code from here.

For this release, there are three main changes: 1) we added config file functionality, which allows users to set session-level variables (one of the most requested features!), 2) we developed a better and more robust import mechanism, and 3) we fixed all the filename-related issues. Those changes, and mostly the last one, forced us to jump the version number from 0.10.9 to 0.11.1, as a backup taken with 0.10.x will not work with 0.11.x and vice versa.
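
A minimal sketch of what such a config file can look like, using the [mydumper] and [client] sections (keys follow the long option names; the exact sections used for session-level variables are described in the project documentation):

[client]
host = 127.0.0.1
user = backup
password = secret

[mydumper]
threads = 8
rows = 1000000
outputdir = /backups/daily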

New Features:

  • Adding order by part functionality #388
  • improve estimated_step #376 #373
  • Adding config file functionality #370
    • Use only a config file to load the parameters #318
    • We hope to add parameters and support custom SQL mode and character configuration #274
    • [myloader] Add option to disable Galera Cluster (Wsrep) replication before importing #159
    • mydumper can’t recreate mysql.proc + mysql.events due to 5.7 sql_mode #142
    • trouble with global sql_big_selects=0 #50
    • Enabling MAX_EXECUTION_TIME option #368
    • mydumper dump big table failed, because of dumping time longer than max_execution_time #257
  • Adding sync mechanism and constraint split #352
    • [fast index] Deadlock found when trying to get lock #342
    • usingFastIndex: indexes may be created while restore is in progress #292
    • [enhancement] optimized data loading order #285
    • Enhancement request: myloader option to prioritize size #78
  • Table and database names changed to fix several issues #399
    • bug for overwrite-tables option #272
    • Problems related to log output when backing up table name with #179 #210
    • parallel schema restore? #169 #311
    • bug with table name #41 #56
    • Export with table folders #212
    • [BUG] table name make mydumper fail #391
    • [BUG] table name starts with checksum breaks myloader #382
    • Fix issue #182 for tables with quotes #258
  • Dump procedures and functions using case sensitive database name #69

Bug Closed:

  • [BUG] Error occurs when using –innodb-optimize-keys and there is a column having a foreign key and part of a generated column #395
  • Tables without indexes can cause segfault #386

Fixes:

  • [ Fix ] on table name segfault and add free functions #401
  • Adding the set session execution #398
  • [ Fix ] When no index, alter_table_statement must be NULL #397
  • Avoid logging SQL data on error #392
  • [ fix ] The count was incorrect #390
  • Fix on Debian version #387
  • Fix package URLs #384
  • Change quotes to backticks in SQL statements #350
  • bug for overwrite-tables option #272
  • mydumper: fix an ‘SQL injection’ issue when table name contains a ‘ or #168

Refactoring

  • Reading file list only one time #394
  • Export with table folders #212

Question Addressed:

  • [QUESTION] Increase CPU usage Tuning restore #380
  • [Question] how to use mydumper to backup 2 tables with different where conditions? #377
  • column charset is ignored once it’s different from table charset. #156
  • mydumper/myloader feature requests #84
  • load Generated Columns error #175

Download MyDumper 0.11.1 Today!
