Monday, February 16, 2009

Advanced MySQL Replication - Improving Performance

Introduction

MySQL replication is a sophisticated and flexible technology. As we discussed in our recent article "Fixing MySQL Replication", it can be made quite reliable and robust if the right tools are used to keep it running smoothly.

But what about sites experiencing enormous load on the primary server, load that in turn overloads the slave? Are there ways to improve performance so the slave can keep up? As it turns out, there are quite a few, and we'll discuss several of them.

Architecture

First, a quick discussion of the architecture. As you'll recall, there are two threads on the slave performing work. One is the IO thread, which copies transactions from the master server's binary log to the slave's relay log. The other is the SQL thread, and it's the important one: it reads the relay log and applies the SQL queries against the slave database in a serial fashion. Once again, that means all writes are serialized on the slave database, which can become a performance bottleneck in high-transaction environments.
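Both threads can be inspected from the MySQL client on the slave; the fields below are from the standard SHOW SLAVE STATUS output:

```sql
-- Run on the slave to check both replication threads.
SHOW SLAVE STATUS\G
-- Key fields in the output:
--   Slave_IO_Running: Yes      -- IO thread copying the master's binlog to the relay log
--   Slave_SQL_Running: Yes     -- SQL thread applying relay log events serially
--   Seconds_Behind_Master: 0   -- how far the serialized SQL thread has fallen behind
```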

Use Multiple Slaves

One method is to break the data up across multiple slaves. This could be done database by database (schema by schema), creating one slave per schema. Each slave points at the same master and uses the replicate_do_db and replicate_ignore_db options to specify which data it should capture. However, your applications may not break up cleanly that way, or load may favor certain tables rather than certain schemas. In that case there may be reason to split on a table-by-table basis instead, using replicate_do_table and replicate_ignore_table. Keep in mind that more slaves mean more threads on the master database, which can add load there; if you run into that problem, consider creating a distribution master and slaving off of that. As a further consideration, realize that if you are running backups off of slave databases, your backup scripts will obviously be more complicated in this arrangement, so document them well.
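As a sketch, the per-schema split might look like this in each slave's my.cnf (the schema names here are hypothetical):

```ini
# Slave 1: capture only the "sales" schema
[mysqld]
server-id       = 11
replicate-do-db = sales

# Slave 2: capture everything except the "logs" schema
# [mysqld]
# server-id           = 12
# replicate-ignore-db = logs
```

One caveat: with statement-based replication, replicate-do-db filters on the current default database (the one named in USE), so cross-schema updates may not replicate the way you expect.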

Priming the Cache

A second method is what's called "priming the cache". Recall that each SQL statement executed on the master is re-executed on the slave database. All the work involved in reading blocks of data, sorting, and caching query execution details has to happen again on the slave. But what if we could do all of that BEFORE the slave's SQL thread gets to the query? That's what the maatkit mk-slave-prefetch tool was built to do. Some sites have had great success with this method, so it is worth investigating. As with anything, test, tweak, and evaluate before rolling it out in production.
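A hedged sketch of running the tool (the DSN and option names vary between maatkit releases, so check mk-slave-prefetch --help for your version):

```shell
# Run on or near the slave. The tool reads ahead in the relay log and
# rewrites upcoming statements as SELECTs, warming the caches before
# the SQL thread executes the real writes.
mk-slave-prefetch --print h=localhost,u=prefetch,p=XXXXXX
```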

Move Writes Outside of Replication

A third method is to move writes outside of replication. What does that mean? Suppose you have a process that loads apache logs into tables in your primary database. You can use the replicate_ignore_table option to skip these heavy archive tables, then run a separate data load process on the slave to get the same data into your slave database. Since the two loads can run in parallel, this may provide surprising speedups.
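A sketch of the arrangement, assuming a hypothetical weblogs.apache_logs table:

```shell
# In the slave's my.cnf, exclude the heavy archive table from replication:
#   replicate-ignore-table = weblogs.apache_logs
#
# Then run a separate load directly on the slave, in parallel with the
# master's own load job:
mysql weblogs -e "LOAD DATA LOCAL INFILE '/var/log/apache/access.tsv' INTO TABLE apache_logs"
```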

Tune Disk IO Subsystem

A fourth method is to look at the underlying disk subsystem. Are you using RAID? If not, consider it. Can you get faster disks, or add more spindles to your RAID array? If so, these options can speed up overall IO throughput to the volume where your datafiles sit.
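Before buying hardware, it's worth getting a baseline number. A rough sequential-write check with dd (block size and count are arbitrary):

```shell
# Write 64MB and fdatasync at the end, so the reported rate includes the flush.
dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=64 conv=fdatasync
# dd prints the elapsed time and throughput on stderr.
```

Run it against the volume that actually holds your datafiles, not /tmp, to measure the right disks.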

Consider the MyISAM Storage Engine

A fifth option is to use the MyISAM storage engine on the slave side for write-heavy tables. Recall that MyISAM is not transactional: a huge amount of code in InnoDB and related transactional storage engines goes into providing row-level locking and ACID compliance. MyISAM is blazingly fast precisely because it doesn't have to check any of that. It simply writes.

You might ask: what about transactional integrity? Remember, though, that all transactions are serialized on the slave, so there is no worry about other sessions reading or writing the same data. You only need to worry about the SQL thread.
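Converting a table on the slave only is a one-line ALTER (the table name here is hypothetical):

```sql
-- On the slave only; the master keeps its transactional engine.
ALTER TABLE page_hits ENGINE=MyISAM;
-- Confirm the engine actually changed:
SHOW TABLE STATUS LIKE 'page_hits'\G
```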

Give Up Some Safety

The sixth option we'll mention involves giving up some safety and recoverability on the slave. Since your slave database is presumably a copy of data held elsewhere, less-safe options can be considered in that context. Obviously, be sure all your data has been sufficiently backed up in multiple places.

Firstly, you can disable the binary log on the slave. This will reduce the amount of data that needs to be written while the slave is executing queries from the SQL thread.

Secondly, you can configure InnoDB to flush its changes less frequently by setting innodb_flush_log_at_trx_commit=2. In addition, you can set innodb_locks_unsafe_for_binlog=1. For MyISAM, the setting delay_key_write=ALL may help.
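Pulled together in the slave's my.cnf, the relaxed settings might look like this (a sketch; remove them if the slave is ever promoted):

```ini
[mysqld]
# No log-bin line: leaving log-bin unset disables the slave's binary log.
innodb_flush_log_at_trx_commit = 2   # flush the InnoDB log roughly once a second, not per commit
innodb_locks_unsafe_for_binlog = 1
delay_key_write                = ALL # delay flushing MyISAM index blocks until table close
```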

Since these options make your database less recoverable, you want to be VERY SURE to disable them if this slave becomes the master at some point.

If the above options don't work for you, consider looking at MySQL 5.1. The new version includes a feature called row-based replication. In contrast with the existing statement-based replication, row-based replication can often avoid re-executing statements on the slave by passing along the change vectors of the actual rows that changed on the primary. Time will tell whether overall performance is noticeably faster, but it's worth investigating.
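In 5.1 the switch is made on the master; a minimal sketch:

```ini
# master's my.cnf (MySQL 5.1+)
[mysqld]
log-bin       = mysql-bin
binlog_format = ROW    # or MIXED, which falls back to statements where safe
```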

Conclusion

When looking at ways to speed up the slave, keep in mind that lag on the slave server is normal. MySQL's out-of-the-box slave technology is not meant to be synchronous. If your application has that requirement, we recommend looking at the Google patches to provide semi-synchronous replication. We'll discuss that in more detail in next month's article as we continue our investigation of advanced MySQL replication techniques.

How to Recover MySQL Root Password

Do you want to recover the MySQL root password? It is by no means difficult; in fact, it's quite simple if you follow this step-by-step procedure.

  1. Stop the MySQL server process.
  2. Start the MySQL (mysqld) server/daemon process with the --skip-grant-tables option so that it will not prompt for a password.
  3. Connect to the MySQL server as the root user.
  4. Set a new root password.
  5. Exit and restart the MySQL server.

Here are the commands you need to type for each step (log in as the root user):

Step 1 : Stop the MySQL service:

# /etc/init.d/mysql stop

Output:

Stopping MySQL database server: mysqld.

Step 2: Start the MySQL server w/o password:

# mysqld_safe --skip-grant-tables &

Output:

[1] 5988
Starting mysqld daemon with databases from /var/lib/mysql
mysqld_safe[6025]: started

Step 3: Connect to the MySQL server using the MySQL client:

# mysql -u root

Output:

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.15-Debian_1-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql>

Step 4: Set a new MySQL root user password:

mysql> use mysql;
mysql> update user set password=PASSWORD("NEW-ROOT-PASSWORD") where User='root';
mysql> flush privileges;
mysql> quit

Step 5: Stop the MySQL server:

# /etc/init.d/mysql stop

Output:

Stopping MySQL database server: mysqld
STOPPING server from pid file /var/run/mysqld/mysqld.pid
mysqld_safe[6186]: ended

[1]+ Done mysqld_safe --skip-grant-tables

Now Start the MySQL server and test it:

# /etc/init.d/mysql start
# mysql -u root -p

Tuesday, January 6, 2009

Dumping Table Structure and Data

mysqldump

A utility to dump a database or a collection of databases for backup, or for transferring the data to another SQL server (not necessarily a MySQL server). The dump will contain SQL statements to create the table and/or populate the table.

If you are doing a backup on the server, you should consider using mysqlhotcopy instead. See section 4.8.6 mysqlhotcopy, Copying MySQL Databases and Tables.

shell> mysqldump [OPTIONS] database [tables]
OR mysqldump [OPTIONS] --databases [OPTIONS] DB1 [DB2 DB3...]
OR mysqldump [OPTIONS] --all-databases [OPTIONS]

If you don't name any tables, or if you use the --databases or --all-databases option, entire databases will be dumped.

You can get a list of the options your version of mysqldump supports by executing mysqldump --help.

Note that if you run mysqldump without --quick or --opt, mysqldump will load the whole result set into memory before dumping the result. This will probably be a problem if you are dumping a big database.

Note that if you are using a new copy of the mysqldump program and you are going to do a dump that will be read into a very old MySQL server, you should not use the --opt or -e options.

mysqldump supports the following options:

--add-locks
Add LOCK TABLES before and UNLOCK TABLES after each table dump. (To get faster inserts into MySQL.)
--add-drop-table
Add a DROP TABLE statement before each CREATE TABLE statement.
-A, --all-databases
Dump all the databases. This will be same as --databases with all databases selected.
-a, --all
Include all MySQL-specific create options.
--allow-keywords
Allow creation of column names that are keywords. This works by prefixing each column name with the table name.
-c, --complete-insert
Use complete insert statements (with column names).
-C, --compress
Compress all information between the client and the server if both support compression.
-B, --databases
To dump several databases. Note the difference in usage. In this case no tables are given. All name arguments are regarded as database names. USE db_name; will be included in the output before each new database.
--delayed
Insert rows with the INSERT DELAYED command.
-e, --extended-insert
Use the new multiline INSERT syntax. (Gives more compact and faster insert statements.)
-#, --debug[=option_string]
Trace usage of the program (for debugging).
--help
Display a help message and exit.
--fields-terminated-by=...

--fields-enclosed-by=...

--fields-optionally-enclosed-by=...

--fields-escaped-by=...

--lines-terminated-by=...
These options are used with the -T option and have the same meaning as the corresponding clauses for LOAD DATA INFILE. See section 6.4.9 LOAD DATA INFILE Syntax.
-F, --flush-logs
Flush log file in the MySQL server before starting the dump.
-f, --force,
Continue even if we get a SQL error during a table dump.
-h, --host=..
Dump data from the MySQL server on the named host. The default host is localhost.
-l, --lock-tables
Lock all tables before starting the dump. The tables are locked with READ LOCAL to allow concurrent inserts in the case of MyISAM tables. Please note that when dumping multiple databases, --lock-tables will lock tables for each database separately. So using this option will not guarantee your tables will be logically consistent between databases. Tables in different databases may be dumped in completely different states.
-K, --disable-keys
/*!40000 ALTER TABLE tb_name DISABLE KEYS */; and /*!40000 ALTER TABLE tb_name ENABLE KEYS */; will be put in the output. This will make loading the data into a MySQL 4.0 server faster as the indexes are created after all data are inserted.
-n, --no-create-db
CREATE DATABASE /*!32312 IF NOT EXISTS*/ db_name; will not be put in the output. The above line will be added otherwise, if a --databases or --all-databases option was given.
-t, --no-create-info
Don't write table creation information (the CREATE TABLE statement).
-d, --no-data
Don't write any row information for the table. This is very useful if you just want to get a dump of the structure for a table!
--opt
Same as --quick --add-drop-table --add-locks --extended-insert --lock-tables. Should give you the fastest possible dump for reading into a MySQL server.
-pyour_pass, --password[=your_pass]
The password to use when connecting to the server. If you specify no `=your_pass' part, you will be prompted for a password.
-P, --port=...
Port number to use for TCP/IP connections.
--protocol=(TCP | SOCKET | PIPE | MEMORY)
To specify the connect protocol to use. New in MySQL 4.1.
-q, --quick
Don't buffer the query; dump directly to stdout. Uses mysql_use_result() to do this.
-Q, --quote-names
Quote table and column names within backtick (`) characters.
-r, --result-file=...
Direct output to a given file. This option should be used on MSDOS, because it prevents the newline `\n' from being converted to `\r\n' (carriage return + newline).
--single-transaction
This option issues a BEGIN SQL command before dumping data from the server. It is mostly useful with InnoDB tables and the READ_COMMITTED transaction isolation level, as in this mode it will dump the consistent state of the database at the time BEGIN was issued, without blocking any applications. When using this option you should keep in mind that only transactional tables will be dumped in a consistent state; e.g., any MyISAM or HEAP tables dumped while using this option may still change state. The --single-transaction option was added in version 4.0.2. This option is mutually exclusive with the --lock-tables option, as LOCK TABLES already commits a previous transaction internally.
-S /path/to/socket, --socket=/path/to/socket
The socket file to use when connecting to localhost (which is the default host).
--tables
Overrides option --databases (-B).
-T, --tab=path-to-some-directory
Creates a table_name.sql file containing the SQL CREATE commands, and a table_name.txt file containing the data, for each given table. The format of the `.txt' file is made according to the --fields-xxx and --lines-xxx options. Note: this option only works if mysqldump is run on the same machine as the mysqld daemon, and the user/group that mysqld runs as (normally user mysql, group mysql) must have permission to create/write files at the location you specify.
-u user_name, --user=user_name
The MySQL user name to use when connecting to the server. The default value is your Unix login name.
-O var=option, --set-variable var=option
Set the value of a variable. The possible variables are listed below. Please note that --set-variable has been deprecated since MySQL 4.0; just use --var=option on its own.
-v, --verbose
Verbose mode. Print out more information on what the program does.
-V, --version
Print version information and exit.
-w, --where='where-condition'
Dump only selected records. Note that quotes are mandatory:
"--where=user='jimf'" "-wuserid>1" "-wuserid<1"
-X, --xml
Dumps a database as well-formed XML.
-x, --first-slave
Locks all tables across all databases.
--master-data
Like --first-slave, but also prints some CHANGE MASTER TO commands which will later make your slave start from the right position in the master's binlogs, if you have set up your slave using this SQL dump of the master.
-O net_buffer_length=#, where # < 16M
When creating multi-row-insert statements (as with option --extended-insert or --opt), mysqldump will create rows up to net_buffer_length length. If you increase this variable, you should also ensure that the max_allowed_packet variable in the MySQL server is bigger than the net_buffer_length.

The most normal use of mysqldump is probably for making a backup of whole databases. See section 4.4.1 Database Backups.

mysqldump --opt database > backup-file.sql

You can read this back into MySQL with:

mysql database < backup-file.sql

or

mysql -e "source /path-to-backup/backup-file.sql" database

However, it's also very useful to populate another MySQL server with information from a database:

mysqldump --opt database | mysql --host=remote-host -C database

It is possible to dump several databases with one command:

mysqldump --databases database1 [database2 ...] > my_databases.sql

If all the databases are wanted, one can use:

mysqldump --all-databases > all_databases.sql

Selectively dumping data with mysqldump

mysqldump is a command-line tool for outputting table structures and data, and can be used for backups etc. By default mysqldump dumps all data from a table, but it is possible to select which data is exported. This post looks at how to do this.

The examples in this post have a table called "mytable" in a database called "test". mytable has three columns: mytable_id, category_id and name, and we will be selectively exporting data that matches a specific category_id.

Using mysqldump to dump all data from the table would look like this, substituting [username] for your username (the -t flag suppresses the table-creation SQL from the dump):

mysqldump -t -u [username] -p test mytable

The output from my example table looks like this, once we remove all the extra SQL commands (I've added linebreaks to make it more legible):

INSERT INTO `mytable` VALUES
(1,1,'Lorem ipsum dolor sit amet'),
(2,1,'Ut purus est'),
(3,2,'Leo sed condimentum semper'),
(4,2,'Donec velit neque'),
(5,3,'Maecenas ullamcorper');

If we only wanted to dump data from mytable in category_id 1, we would do this:

mysqldump -t -u [username] -p test mytable --where=category_id=1

which would output this:

INSERT INTO `mytable` VALUES
(1,1,'Lorem ipsum dolor sit amet'),
(2,1,'Ut purus est');

You can also abbreviate --where as -w like so:

mysqldump -t -u [username] -p test mytable -wcategory_id=1

If you need to have spaces in the where query or other special shell characters (such as > and <) then you need to put quotes around the where clause like so:

mysqldump -t -u [username] -p test mytable --where="category_id = 1"
OR
mysqldump -t -u [username] -p test mytable -w"category_id = 1"

You can also use the --where flag to selectively dump data from more than one table, but obviously the columns specified in the where clause need to be in both tables.

An example of dumping data from two tables using the same where clause could look like this, where we are selecting category_id from tables "mytable" and "anothertable":

mysqldump -t -u [username] -p test mytable anothertable --where="category_id = 1"

If category_id exists in both tables then the dump will run without error. If the column doesn't exist, you'll see an error like this:

mysqldump: mysqldump: Couldn't execute 'SELECT /*!40001 SQL_NO_CACHE */ * FROM `anothertable` WHERE category_id=1': Unknown column 'category_id' in 'where clause' (1054)
mysqldump: Got error: 1054: Unknown column 'category_id' in 'where clause' when retrieving data from server

mysqldump is an excellent tool for exporting data from MySQL databases. Using the --where or -w flags allows you to selectively export data from one or more tables, which saves you having to export all data from a table if you only need a specific subset.