Beginner's Guide to MySQL Performance Tuning
The performance of MySQL is affected by the performance characteristics of the host. Various factors can affect the performance of the host: CPU speed and number, disk throughput and access time, network throughput, and competing services on the operating system all have some bearing on the performance of a MySQL instance.
The database contents and its configuration also have an effect on MySQL’s performance.
- Databases that have frequent small updates benefit from being well designed and normalized.
- Database throughput improves when you use the smallest appropriate data types to store data.
- Queries that request only a subset of table data benefit from well-designed indexes.
- Applications that request only specific rows and columns reduce the overhead that redundant requests create.
- Shorter transactions result in fewer locks and delays in other transactions.
- Well-tuned server variables optimize the allocation of MySQL’s buffers, caches, and other resources for a specific workload and data set.
Monitoring
To tune your server’s performance, you must understand its performance characteristics. You can do this by benchmarking its overall performance, and by profiling events either individually with logs and EXPLAIN or as a group using PERFORMANCE_SCHEMA. MySQL installations provide the following benchmarking tools:
- mysqlslap is part of the standard MySQL distribution. It is a diagnostic program that emulates client load on a MySQL Server instance and displays timing information of each stage.
- sql-bench is part of the MySQL source distribution, and is a series of Perl scripts used to perform multiple statements and gather status timing data.
The following is an example of using mysqlslap to set up a schema from a SQL script and run queries from another script:
$ mysqlslap --iterations=5000 --concurrency=50 --query=workload.sql --create=schema.sql --delimiter=";"
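The two script files referenced above are ordinary SQL files. A minimal sketch of what they might contain (the table and queries here are hypothetical, not part of the original example):

```sql
-- schema.sql: executed by mysqlslap to build the test schema
CREATE TABLE t1 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(100) NOT NULL
);
INSERT INTO t1 (payload) VALUES ('a'), ('b'), ('c');

-- workload.sql: statements separated by the delimiter passed to mysqlslap
SELECT COUNT(*) FROM t1;
SELECT payload FROM t1 WHERE id = 2;
```

mysqlslap creates its own temporary schema, runs the workload with the requested concurrency, and drops the schema when it finishes.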
The slow query log records statements that exceed limits set by the long_query_time and min_examined_row_limit variables. Use mysqldumpslow to view a summary of the slow query log's contents. The general query log records all client connections and requests received by MySQL. Use it to record all SQL statements received during a period of time (for example, to generate a workload for use with mysqlslap or other benchmarking tools). Third-party benchmarking suites are also available.
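As a sketch, the slow query log can be enabled at runtime with statements such as the following (the thresholds shown are arbitrary example values):

```sql
SET GLOBAL slow_query_log = ON;          -- start recording slow statements
SET GLOBAL long_query_time = 2;          -- log statements running longer than 2 seconds
SET GLOBAL min_examined_row_limit = 100; -- ignore statements examining fewer than 100 rows
```

You can then summarize the log with mysqldumpslow, for example `mysqldumpslow -s t /path/to/host-slow.log` to sort captured statements by total time (the log path varies by installation).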
Performance Schema
Performance Schema is a feature for monitoring MySQL Server execution at a low level. It is implemented using the PERFORMANCE_SCHEMA storage engine and the performance_schema database. Performance Schema monitors and allows you to inspect performance characteristics of instrumented code in MySQL Server.
Performance Schema is available in all binary versions of MySQL downloaded from Oracle. Performance Schema is enabled by default, and controlled at server startup with the performance_schema variable. Verify that Performance Schema is enabled with the following statement:
mysql> SHOW VARIABLES LIKE 'performance_schema';
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| performance_schema | ON |
+--------------------+-------+
Performance Schema is designed to run with minimal overhead. The actual impact on server performance depends on how it is configured. The information Performance Schema exposes can be used to identify low-level bottlenecks. This is extremely useful for developers of the MySQL Server product family when debugging performance problems, and for system architects and performance consultants when tuning InnoDB data and log file storage hardware.
Instruments, Instances, Events, and Consumers
Instruments in Performance Schema are points within the server source code from which MySQL raises events. Instruments have a hierarchical naming convention. For example, the following is a short list of some of the hundreds of instruments in Performance Schema:
stage/sql/statistics
statement/com/Binlog Dump
wait/io/file/innodb/innodb_data_file
wait/io/file/sql/binlog
wait/io/socket/sql/server_unix_socket
Each instrument consists of its type, the module it belongs to, and the variable or class of the particular instrument. View all available instruments by querying the performance_schema.setup_instruments table. Performance Schema records each instrumented instance in an instance table. For example, the following query shows that the instrument wait/io/file/sql/FRM records events on the file instance /var/lib/mysql/mem/tags.frm.
mysql> SELECT file_name, event_name FROM file_instances LIMIT 1\G
*************************** 1. row ***************************
FILE_NAME: /var/lib/mysql/mem/tags.frm
EVENT_NAME: wait/io/file/sql/FRM
The following output shows the contents of the setup_consumers table:
mysql> SELECT * FROM setup_consumers;
+--------------------------------+---------+
| NAME | ENABLED |
+--------------------------------+---------+
| events_stages_current | NO |
| events_stages_history | YES |
| events_stages_history_long | NO |
| events_statements_current | YES |
| events_statements_history | NO |
| events_statements_history_long | NO |
| events_waits_current | YES |
| events_waits_history | YES |
| events_waits_history_long | NO |
| global_instrumentation | YES |
| thread_instrumentation | YES |
| statements_digest | YES |
+--------------------------------+---------+
12 rows in set (0.00 sec)
Most consumer names correspond to the names of Performance Schema tables used to query events and summaries; others, such as global_instrumentation and thread_instrumentation, act as higher-level switches. A disabled consumer does not record information, which saves system resources. As MySQL identifies events that occur in the instrumented instances, it records them in event tables.
- The primary event table is events_waits_current, which stores the most recent event for each thread.
- events_waits_history stores the 10 most recent events for each thread.
- events_waits_history_long stores the 10,000 most recent events in total.
The events_waits_* tables all have the same schema. When using Performance Schema to identify a bottleneck or other problem, do the following:
- Ensure that you have enabled Performance Schema with a range of instruments and consumers applicable to the type of problem that you are experiencing. For example, use the wait/io/file/* instruments if you are sure the problem is I/O bound, or a wider range if you are unsure of the root cause.
- Run the test case that produces the problem.
- Query consumers such as the events_waits_* tables, in particular events_waits_history_long with suitable WHERE clause filters to further narrow the cause of the problem.
- Disable instruments that measure problems you have ruled out.
- Retry the test case.
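The steps above can be sketched as a session, assuming an I/O-bound problem (the WHERE filters are illustrative):

```sql
-- Step 1: enable file I/O instruments and the long history consumer
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'wait/io/file/%';
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME = 'events_waits_history_long';

-- Step 2: run the test case, then:

-- Step 3: find the instruments that accumulated the most wait time
SELECT EVENT_NAME, COUNT(*) AS occurrences, SUM(TIMER_WAIT) AS total_wait
  FROM performance_schema.events_waits_history_long
 GROUP BY EVENT_NAME
 ORDER BY total_wait DESC
 LIMIT 10;
```

Changes made with UPDATE against the setup tables take effect immediately and last until the server restarts.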
General Database Optimizations
Normalization
Normalization is the act of removing redundancies and improper dependencies in the database to avoid storing the same data in multiple places and risking anomalies. Normalization generally results in more tables with fewer columns, lower overall storage requirements, lower I/O requirements, and faster individual inserts, updates, and deletes. This improves the performance of transactional workloads that perform frequent small updates, but can complicate queries that retrieve large amounts of data.
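For example, a hypothetical orders table that repeats customer details in every row can be split so that each customer's details are stored exactly once:

```sql
-- Unnormalized: customer data repeated in every order row
CREATE TABLE orders_flat (
  order_id         INT PRIMARY KEY,
  customer_name    VARCHAR(100),
  customer_address VARCHAR(200),
  order_total      DECIMAL(10,2)
);

-- Normalized: customer data stored once, referenced by key
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name        VARCHAR(100),
  address     VARCHAR(200)
);
CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT NOT NULL,
  order_total DECIMAL(10,2),
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
```

Updating a customer's address now touches one row in customers instead of every matching row in orders_flat, at the cost of a join when a query needs both order and customer data.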
Data Types and Size
Choosing the correct data type is an important but often overlooked part of table design, yet the size of a data type has a large potential effect on table operations. For example, choosing to store a SMALLINT number as an INT doubles the space required by that column. In a table of a million rows, that decision results in an extra 2 MB in storage wasted, along with slower disk operations, and more memory used by buffers and caches.
Use the COMPRESS() function when storing string data and UNCOMPRESS() when retrieving it, as in INSERT … COMPRESS(string_value) … and SELECT … UNCOMPRESS(column_name) …. Although you can use CHAR or VARCHAR columns for this purpose, avoid problems with character set conversion by using VARBINARY or BLOB columns to store compressed data.
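A minimal sketch, using a hypothetical articles table with a BLOB column for the compressed payload:

```sql
CREATE TABLE articles (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body BLOB  -- compressed data is binary; avoid CHAR/VARCHAR here
);

-- Compress on the way in...
INSERT INTO articles (body)
VALUES (COMPRESS(REPEAT('MySQL performance tuning. ', 500)));

-- ...and uncompress on the way out
SELECT UNCOMPRESS(body) AS body FROM articles WHERE id = 1;
```

COMPRESS() pays off for long, repetitive strings; very short strings can actually grow, because the compressed value carries a four-byte length header.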
Also consider how columns are compared in queries. If the MySQL server must convert data types to compare them, there is a performance overhead. For example, when you compare or join character fields that use different collations, MySQL must coerce one of them to match the other, which slows down the query. Similarly, compare numeric values with numeric types rather than with strings; comparing a numeric column to a string value forces a conversion on every row and can prevent MySQL from using an index.
Efficient Indexes
When you query a table for specific rows by specifying a field in the WHERE clause, and that table does not have an index on that field, MySQL reads every row in that table to find each matching row. This results in a lot of unnecessary disk access and can greatly slow down the performance of large tables. Indexes are ordered sets of data that make it easier for MySQL to find the correct location of a queried row. InnoDB orders the table according to the primary key by default; this ordered table is called a clustered index. Every additional or secondary index on an InnoDB table takes up extra space on the file system because the index contains an extra copy of the fields it indexes, along with a copy of the primary key.
Every time you modify data with an INSERT, UPDATE, REPLACE, or DELETE operation, MySQL must also update all indexes that contain the modified fields. As a result, adding many indexes to a table reduces the performance of data modifications that affect that table. However, properly designed indexes produce large gains in the performance of queries that rely on the indexed fields. A query that cannot use an index to find a particular row must perform a full table scan; that is, it must read the entire table to find that row. A query that uses an index can read the row directly without reading other rows, which greatly speeds up the performance of that type of query.
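For example, adding a secondary index on a frequently filtered column lets MySQL avoid a full table scan (the table and column names here are illustrative):

```sql
-- Without an index on last_name, this query scans every row
SELECT * FROM employees WHERE last_name = 'Smith';

-- Add a secondary index on the filtered column
ALTER TABLE employees ADD INDEX idx_last_name (last_name);

-- EXPLAIN should now report the idx_last_name key instead of type: ALL
EXPLAIN SELECT * FROM employees WHERE last_name = 'Smith';
```

The trade-off described above applies: every INSERT, UPDATE, REPLACE, or DELETE that touches last_name must now also maintain idx_last_name.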
To easily identify queries that expect to retrieve all rows and could therefore potentially benefit from adding indexes to the tables involved, enable the slow query log and the --log-queries-not-using-indexes server option. Note that this option also logs any query that uses an index but performs a full index scan.
PROCEDURE ANALYSE
PROCEDURE ANALYSE analyzes the columns in a given query and provides tuning feedback on each field:
mysql> SELECT CountryCode, District, Population
-> FROM City PROCEDURE ANALYSE(250,1024)\G
With default settings, PROCEDURE ANALYSE() often suggests ENUM types to optimize a table’s design. If you do not want an ENUM suggestion for a column, pass non-default arguments.
- The first argument is the number of distinct elements to consider when analyzing to see whether ENUM values are appropriate. This argument has a default value of 256.
- The second argument is the maximum amount of memory used to collect distinct values for analysis. This argument has a default of 8192, representing 8 KB. If you set a value of 0 for this argument, PROCEDURE ANALYSE() cannot examine distinct values to suggest an ENUM type.
If PROCEDURE ANALYSE() cannot store an acceptable range of candidate ENUM values within the limits set by its arguments, it does not suggest an ENUM type for that column. The example above suggests a CHAR(3) type for the City.CountryCode column. On the other hand, if you use the default arguments, PROCEDURE ANALYSE() suggests ENUM('ABW','AFG',…,'ZMB','ZWE'), an ENUM type with over 200 elements containing a distinct value for each corresponding CountryCode value.
EXPLAIN
The EXPLAIN command:
- Describes how MySQL plans to execute your particular SQL statement
- Does not execute the statement or return any data from the tables involved
- Shows the order in which tables are read and which indexes the optimizer considers
Use EXPLAIN to examine SELECT, INSERT, REPLACE, UPDATE, and DELETE statements. EXPLAIN produces a row of output for each table used in the statement. The output contains the following columns:
- table: The table for the output row
- select_type: The type of select used in the query. SIMPLE means the query does not use UNION or subqueries.
- key: The index chosen by the optimizer
- ref: The columns compared to the index
- rows: Estimated number of rows that the optimizer examines
- Extra: Additional information provided per-query by the optimizer
Use EXPLAIN EXTENDED … to view additional information provided by the optimizer. For example, the following query joins fields from two tables and performs an aggregation:
mysql> SELECT COUNT(*) as 'Cities', SUM(Country.Population) AS Population,
> Continent FROM Country JOIN City ON CountryCode = Code
> GROUP BY Continent ORDER BY Population DESC;
+--------+--------------+---------------+
| Cities | Population | Continent |
+--------+--------------+---------------+
| 1765 | 900934498400 | Asia |
| 580 | 95052481000 | North America |
| 842 | 55127805400 | Europe |
| 470 | 48533025000 | South America |
| 366 | 16179610000 | Africa |
| 55 | 307500750 | Oceania |
+--------+--------------+---------------+
6 rows in set (0.01 sec)
The following output shows the result of using EXPLAIN with the preceding query:
mysql> EXPLAIN SELECT COUNT(*) as 'Cities', SUM(Country.Population) AS Population,
> Continent FROM Country JOIN City ON CountryCode = Code
> GROUP BY Continent ORDER BY Population DESC\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Country
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 239
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: City
type: ref
possible_keys: CountryCode
key: CountryCode
key_len: 3
ref: world_innodb.Country.Code
rows: 9
Extra: Using index
2 rows in set (0.00 sec)
The EXPLAIN output lists the joined tables in the order they are read. EXPLAIN approximates the total number of rows that must be read from each table. The total approximate number of rows that this SELECT query must examine is the product of the rows examined in each joined table: 239 x 9 = 2151 rows.
EXPLAIN Formats
EXPLAIN output is available in other formats:
Visual EXPLAIN:
- Outputs in a graphical format
- Is available in MySQL Workbench
EXPLAIN FORMAT=JSON:
Outputs in JSON format, which is useful when passing EXPLAIN output to other programs. JSON (JavaScript Object Notation) is a simple data-interchange format. The following output shows the result of using FORMAT=JSON in an EXPLAIN statement:
mysql> EXPLAIN FORMAT=JSON SELECT COUNT(*) as 'Cities',
SUM(Country.Population) AS Population, Continent FROM Country JOIN City
ON CountryCode = Code GROUP BY Continent ORDER BY Population DESC\G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"ordering_operation": {
"using_filesort": true,
"grouping_operation": {
"using_temporary_table": true,
"using_filesort": false,
…
1 row in set, 1 warning (0.00 sec)
Examining Server Status
MySQL provides several ways to view server status variables:
1. At the mysql prompt:
   - STATUS;
   - SHOW STATUS;
2. At the terminal:
   - mysqladmin --login-path=login-path status
   - mysqladmin -u user -p extended-status
MySQL provides a short status message with the mysql command STATUS and the mysqladmin command status. The longer-form status output, shown with the mysql command SHOW STATUS and the mysqladmin command extended-status, contains values for many system status variables.
Use options with mysqladmin to provide additional functionality. For example, the --sleep (or -i) option specifies the number of seconds to wait between iterations, and automatically re-executes the command after that time. The --relative (or -r) option displays the difference in each variable since the last iteration, rather than its value. Use command-line tools such as grep to extend how you use mysqladmin. For example, use the following command to show only variables that contain the string cache_hits:
shell> mysqladmin --login-path=admin extended-status | grep cache_hits
| Qcache_hits | 0 |
| Ssl_callback_cache_hits | 0 |
| Ssl_session_cache_hits | 0 |
| Table_open_cache_hits | 280 |
Top Status Variables
- Created_tmp_disk_tables: The number of temporary tables the server created on disk while executing statements. If this number is high, the server has created many temporary tables on disk rather than in memory, resulting in slower query executions.
- Handler_read_first: If this number is high, the server has performed many full index scans to complete query requests.
- Innodb_buffer_pool_wait_free: The number of times a request had to wait for a page to be flushed from the InnoDB buffer pool before it could complete. If this number is high, the InnoDB buffer pool size is not correctly set and query performance suffers.
- Max_used_connections: This variable provides valuable information for determining the number of concurrent connections that your server must support.
- Open_tables: When compared with the table_open_cache server system variable, this provides valuable information about the amount of memory that you should set aside for the table cache. If the value of the Open_tables status variable is usually low, reduce the size of the table_open_cache server system variable. If its value is high (approaching the value of the table_open_cache server system variable), increase the amount of memory assigned to the table cache to improve query response times.
- Select_full_join: If this value is not 0, you should carefully check the indexes of your tables.
- Slow_queries: Interpreting this status variable requires knowing the setting of the long_query_time variable, which defaults to 10 seconds. If the Slow_queries status variable is not 0, check the value of long_query_time, examine the slow query log, and improve the queries that it captures.
- Sort_merge_passes: Sorting operations require a buffer in memory. This status variable counts the passes through sort buffers that sort operations require. If this value is high, it can indicate that the sort buffer size is not sufficient to perform one-pass sorting of queries; consider increasing the value of the sort_buffer_size system variable.
- Threads_connected: Capturing this value on a regular basis provides you with valuable information about when your server is most active. Use this variable to determine the best time to perform maintenance on the server, or as a justification for allocating more resources to the server.
- Uptime: This value can provide valuable information about the health of the server, such as how often the server needs to be restarted.
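Many of the variables above can be checked together from the mysql prompt; for example:

```sql
-- Ratio of disk-based to total temporary tables
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';

-- Full index scans, full joins, and slow queries in one result
SHOW GLOBAL STATUS
 WHERE Variable_name IN
       ('Handler_read_first', 'Select_full_join', 'Slow_queries');

-- Connection load
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
```

SHOW STATUS accepts either a LIKE pattern or a WHERE clause over the Variable_name and Value columns, so related counters can be grouped into a single query.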
Tuning System Variables
There is a common misconception that server variable configuration is the most important part of server tuning. In fact, in terms of effort spent, more benefits come from optimizing the schema, common queries, and indexes of a typical database than from tuning variables.
Default Settings
The default settings have been chosen by Oracle’s MySQL engineers to suit a majority of production systems, which tend to have frequent small transactions, many updates and few large, slow queries such as those used to generate reports. However, because MySQL is used on systems from small devices (such as point-of-sale systems and routers) to large web servers with huge amounts of memory and fast disk arrays, you may find that your particular environment and workload benefits from changing some server settings from the default.
InnoDB Settings
For example, on a server that is dedicated to MySQL using only InnoDB user tables, increase the value of innodb_buffer_pool_size to a large percentage of total server memory (70%–85%), bearing in mind the needs of the operating system such as cron jobs, backups, virus scans, and administrative connections and tasks. If you have several gigabytes of RAM, you might also benefit from having several innodb_buffer_pool_instances, a setting that enables multiple buffer pools to avoid contention.
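As a sketch, such settings usually go in the [mysqld] section of the option file. The values below assume a hypothetical dedicated 16 GB InnoDB-only server and must be adjusted to your hardware:

```ini
[mysqld]
# Roughly 75% of 16 GB on a dedicated InnoDB-only server,
# leaving headroom for the operating system and admin tasks
innodb_buffer_pool_size = 12G

# Multiple pool instances reduce contention on large buffer pools
innodb_buffer_pool_instances = 8
```

After restarting the server, compare the Innodb_buffer_pool_wait_free and Innodb_buffer_pool_reads status variables before and after the change to confirm the larger pool is being used effectively.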
Reducing MyISAM Settings
On a system that does not use MyISAM for user tables, reduce the value of MyISAM-only options such as key_buffer_size to a small value such as 16 MB, bearing in mind that some internal MySQL operations use MyISAM.
Reporting Systems
On servers used for running a few large, slow queries, such as those used in business intelligence reports, increase the amount of memory dedicated to buffers with settings such as join_buffer_size and sort_buffer_size. Although the default server settings are better suited to transactional systems, the default my.cnf file contains alternate values for these variables, suited to reporting servers.
Transactional Systems
On servers used to support many fast concurrent transactions that disconnect and reconnect repeatedly, set the value of thread_cache_size to a large enough number so that most new connections use cached threads; this avoids the server overhead of creating and tearing down threads for each connection.
On servers that support many write operations, increase log settings such as innodb_log_file_size and innodb_log_buffer_size, because the performance of data modifications relies heavily on the performance of the InnoDB log. Consider changing the value of innodb_flush_log_at_trx_commit to increase the performance of each commit at the risk of some data loss if the server fails.
If your application executes the same query (or many identical queries) repeatedly, consider enabling the query cache and sizing it according to the results of the common queries by setting appropriate values for query_cache_type and query_cache_size.
Balancing Memory Use
When you set larger values for per-query or per-connection caches and buffers, you reduce the available size for the buffer pool. Tuning a server’s configuration variables is a balancing process where you start with the defaults, give as much memory as you can to the buffer pool, and then tune the variables that are most closely associated with your goals for tuning, problems that you identify from examining the server status, and bottlenecks that you identify by querying Performance Schema.
Top Server System Variables
innodb_buffer_pool_size: For best performance, set this value as large as possible, bearing in mind that too high a value causes the operating system to swap pages, which slows down performance considerably. If you use only InnoDB user tables on a dedicated database server, consider setting this variable to a value from 70% to 85% of physical RAM.
innodb_flush_log_at_trx_commit: There are three possible settings for this variable:
- 0: Write the log buffer to the log file and flush it to disk once per second; nothing is written at commit.
- 1: Write and flush the log to disk at each commit (the default, and the only setting that guarantees full durability).
- 2: Write the log to the operating system cache at each commit, and flush it to disk every innodb_flush_log_at_timeout seconds (with a default of one second).
innodb_log_buffer_size: This variable has a default value of 8 MB. Transactions that exceed this size cause InnoDB to flush the log to disk before the transaction commits, which can degrade performance. For applications that use a large number of BLOBs or that have a large spike in update activity, improve transaction performance by increasing this value.
innodb_log_file_size: For write-intensive workloads on large data sets, set this variable so that the total maximum size of all log files (as set by innodb_log_files_in_group) is less than or equal to the size of the buffer pool. Larger log files slow down crash recovery, but improve overall performance by reducing checkpoint flush activity.
join_buffer_size: Increase this value from its default (256 KB) for queries that have joins that cannot use indexes. Change the per-session value when running such queries to avoid setting the global setting and wasting memory for queries that do not need such a large value.
query_cache_size: Improve the performance of applications that issue repeated queries on data that rarely changes by using the query cache. As a baseline, set this variable to a value from 32 MB to 512 MB based on the number of repeated queries and the size of data they return. Monitor the cache hit ratio to determine the effectiveness of this variable, and tune its value based on your observations.
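The hit ratio can be estimated from status counters; as a sketch:

```sql
-- Current query cache activity (hits, inserts, free memory, prunes)
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- Hit ratio is approximately Qcache_hits / (Qcache_hits + Com_select);
-- a consistently low ratio suggests the cache does not suit this workload
SHOW GLOBAL STATUS LIKE 'Com_select';
```

A high Qcache_lowmem_prunes value alongside a reasonable hit ratio suggests the cache is effective but undersized; a low hit ratio suggests shrinking or disabling the cache instead.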
sort_buffer_size: Increase this value to improve the performance of ORDER BY and GROUP BY operations if the value of the Sort_merge_passes status variable is high.
table_open_cache: Set this value so that it is larger than N * max_connections, where N is the largest number of tables used in any query in your application. Too high a value results in the error “Too many open files.” High values of the Opened_tables status variable indicate that MySQL frequently opens and closes tables, and that you should increase table_open_cache.
thread_cache_size: By default, this variable is sized automatically. Evaluate the Threads_created status variable to determine whether you need to change the value of thread_cache_size.
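As a sketch, compare Threads_created with the total number of connection attempts; if a large share of connections creates a new thread, cached threads are rarely reused:

```sql
SHOW GLOBAL STATUS LIKE 'Threads_created';
SHOW GLOBAL STATUS LIKE 'Connections';

-- If Threads_created / Connections is high, try a larger cache
-- (64 is an arbitrary example value)
SET GLOBAL thread_cache_size = 64;
```

Because thread_cache_size is dynamic, you can adjust it at runtime and watch whether Threads_created stops growing under a steady connection load.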