Mysql Monitoring
Mysql Monitoring
www.monitis.com
White Paper
Monitis provides an internal agent for monitoring MySQL health. It currently monitors most MySQL status variables (about 245). The whole set of collected variables is divided on 8 categories that cannot be selected simultaneously. Every collection of variables is customizable so users can build their own subset of a variable for every category. Table 1: List of MySQL variables categories in Monitis MySQL monitoring Category Com_xxx Handler Query Cache Ratio of monitored to total number of variables 100 / 147 15 / 16 8/8 Description Counts number of specific (e.g. INSERT, DELETE, SELECT, etc.) SQL statements that have been executed. Counts handler operations, such as the number of times MySQL internally asks a storage engine to read the next row from an index. The query cache behavior (e.g. number of times query was added and query result was found in the cache, count of free memory blocks in the cache, etc.). Inspects the threads cache (e.g. how many threads were created to handle connections, how many threads are not sleeping, etc.). Counts how many times MySQL has created temporary tables and files. The InnoDB storage state (e.g. InnoDB buffer pool, log buffer behavior, number of data reads/writes, etc.) Presents how the Secure Sockets Layer (SSL) is configured on a server if applicable. Inspects some other status variables (e.g. the number of open tables, files, streams, joins, slow queries, etc.).
Of course, not all of the above mentioned MySQL variables monitored by the agent are equally important. Many are very important but some do not give clear information about MySQL health status. Tracking such data doesnt help much and makes MySQL monitoring more cumbersome. Monitoring a large variety of conditions and metrics of a database server is a common mistake that causes many false-positive alerts. This leads to situations where a user cannot understand what is happening with his/her application. It is therefore very important to reduce the "noise" and select the minimal-working set of monitoring metrics. Ideally, when something goes wrong, you should only get one alert, not many alerts from several different health checks. In particularly, you should not monitor things that are unreliable in indicating a problem. Finally, supplying the raw data does not always allow the user to get a clear understanding of the current situation. Fortunately the Monitis agent allows you to customize all the categories and select to monitor only important variables.
www.monitis.com
White Paper
HEALTH EVALUATION
The MySQL server maintains system variables (340 for version 5.5.9) that indicate how it is configured and status variables (338 for version 5.5.9) that provide raw information about the state of the database. The status variables can be divided into two categories GLOBAL, which collects the values over all connections, and SESSION, which shows the values for the current connection. Most status variables mindlessly accumulate the count of all occurred events like commands, queries, connections, caches and buffers filling, etc. This can give a rough assessment of overall status, but usually filtering, preprocessing and simple calculations are required to get more informative metrics that better evaluate MySQL server health status. Please remember that most MySQL status variables are counters that may just show an average estimation of database server status, which is generally calculated by tracking processes since the last restart of the server. Very often, shortterm evaluations and dynamic behaviors of the server are also required. The dynamic evaluation is required to follow the behavior of problematic processes and for real-time detection of dangerous situations. A series of average values for permanently increasing counters can be prepared for periodical (e.g. per minute) system requests. Therefore the differential behavior of any variable (vdyn) can be calculated by the following simple formula which calculates the average differential value (vn vn-1) of the variable at the end (vn) and beginning (vn-1) of a defined time period (dt). dyn = 1 /dt The average, long-term estimations are usually used to see general trends and detect the inaccuracies and mistakes in server configuration and problems cause by external events. These metrics have well defined states and can be shown as green/yellow/red light signals. On the other hand, the dynamic behavior used in movement analysis and real-time detection of alarming situations are usually shown in graphs. In the next chapter, the selected MySQL metrics have been described and the way to obtain them from raw MySQL data has been formulated. Moreover, suggested threshold values have been defined. Of course, users may change the threshold values and set new desired ones. The metrics defined below are the recommended base for monitoring MySQL status and are used in the Monitis monitoring agent. To make use of the monitoring agent easier, the most important information has been repeated in the tables at the end of this document. Appendix A presents the importance (value) of the defined metrics along the related area and possibility of their graphical presentation. Suggested threshold values have been collected in the second appendix. Possible user actions (recommendations) to improve the situation if metric values cross the threshold are presented in Appendix C.
White Paper
Usually, it is necessary to set the key_buffer_size to at least a quarter, but no more than half, of the total amount of memory on a server. Ideally, it will be large enough to contain all indexes (the total size of all .MYI files on the server). If it is impossible for some reason, the best way to evaluate sufficiency of the current key_buffer_size is to compare the ratio of the key_reads (number of physical reads index blocks from disk) to the key_read_requests (number of requests to read a key block from the cache). This ratio should normally be less than 0.01. If this value is not close to zero, it indicates that MySQL key cache is overloaded and key_buffer_size variable should be reconfigured. FORMULA: Normally, the usage of the MyISAM key buffers is evaluated by calculating: key cache hit rate = 1 THRESHOLDS: Warning Less than 99% Critical Less than 95% key_reads 100% key_read_requests
DESCRIPTION:
Key cache writes is another extremely important variable, because MySQL internally uses MyISAM storage for temporary tables (join, sort, order, etc.) and most INFORMATION_SCHEMA tables. Normally the effectiveness of MyISAM key buffer writes will depend on the kind of queries the MySQL server primarily executes. But it should be configured correctly to avoid mass misses while putting keys into the cache.
FORMULA:
The efficiency of MyISAM key cache can be evaluated by key writes to the cache rate: key cache write rate = key_writes 100% key_write_requests
THRESHOLDS:
Warning Less than 90% Critical Less than 75%
White Paper
If the query cache hit rates goes to low please checks MySQL configuration parameters: query_cache_type (enables or disables query-caching mechanism, and specifies how the cache should work if it's enabled), query_cache_limit (maximum size of a result set that MySQL can cache) query_cache_size (total amount of system memory that can be allocated to the query cache).
FORMULA:
query cache hit rate = qcache_hits 100% Qcache_hits + com_select
THRESHOLDS:
Warning Less than 90% Critical Less than 80%
SLOW QUERIES
EFFECT: Performance DESCRIPTION: The server can detect and count of queries that have taken more than long_query_time seconds (slow_queries). Generally, the high value indicates that many queries are not being optimally executed. The percentage of slow queries can be evaluated by using the following formula. Its value should be as low as possible. Try to find and reconstruct the slow executed queries (in most cases they are not optimally created) using the following technique: switch on the logging of slow queries (log_slow_queries ON) and analyze the log results by using e.g. www.monitis.com
White Paper
mysqldumpslow tool. Next, run EXPLAIN on the slow query and then, based on the particular query, take the appropriate actions to fix it. Note that, ideally, there should not be any slow queries, but sometimes there are a few. FORMULA: slow queries rate = THRESHOLDS: Warning Less than 95% Critical Less than 90% slow_queries 100% queries
White Paper
so that the optimization of problematic queries can decrease the odds of table locks. Next, optimize tables (try to split the problematic tables). Finally, switch to the InnoDB engine which uses row level locking instead of table and is therefore much less susceptible to lock contention. FORMULA: table lock contention = 1 THRESHOLDS: Warning Less than 70% Critical Less than 40% table_lock_waited 100% table_lock_waited + table_lock_immediate
www.monitis.com
White Paper
FORMULA: innoDB buffer pool hit ratio = 1 THRESHOLDS: Warning Less than 99% Critical Less than 95% innodb_buffer_pool_reads 100% innodb_buffer_pool_requests
White Paper
EFFECT: Performance DESCRIPTION: Join operations are often quite a heavy process, especially when indexes are not used correctly. The InnoDB heavy join hit rate, as define below, should have a low value. If it does not, it's a good idea to check indexes in the most commonly used tables. They might be not optimized for popular queries. FORMULA: For evaluating the rate of heavy join operations the following formula can be used: heavy join hit rate = THRESHOLDS: Warning Less than 95% Critical Less than 90% select_full_join + select_range_check + select_scan 100% com_select
INDEXES USAGE
EFFECT: Performance DESCRIPTION: When MySQL does a lot of table scans it might mean that the server does not use indexes efficiently. A good indicator of this problem is when the values of handler_read_rnd_next and handler_read_rnd together (number of rows read via full table scans) - are high compared to the sum of handler variables which denote all row accesses - such as handler_read_key, handler_read_next etc. It is advised to examine tables and queries for proper use of indexes. Turn on the slow query log, identify the queries using a full table scan and tune them. It might also be necessary to reorganize and tune the database schema, indexing or queries. FORMULA: innoDB full table scan rate = 1 where: handler sum = handler_read_rnd_next + handler_read_rnd + handler_read_first + handler_read_key + handler_read_next + handler+read_prev + handler_read_last THRESHOLDS: Warning Less than 90% Critical Less than 80% handler_read_rnd_next + handler_read_rnd 100% handlers sum
White Paper
For optimal performance, InnoDB shouldnt have to wait before writing DML activity to the InnoDB log buffer. Therefore, a high number of innodb_log_waits (times per second that waiting was required for writing activities to be flushed before continuing) indicates that the log buffer is too small. A large log buffer enables large transactions to run without a need to write the log to disk before the transactions commit and saves disk I/O. The InnoDB log cache hit rate will show how close to optimal the usage of log cache is (should be close to 100%). When innoDB log cache values are low try to increase the innodb_log_buffer_size. (Normally, the sensible values range from 1MB to 8MB). Notice that optimally the innodb_log_buffer_size should be a multiple of innodb_log_file_size. FORMULA: innoDB log cache = 1 THRESHOLDS: Warning Less than 90% Critical Less than 80% innodb_log_writes 100% innodb_log_write_requests
10
White Paper
performance and the backups on disk should be done from time to time. The binary log cache hit rate can show how close to optimal the usage of the binlog cache is (It should be close to 100%). With a small value of binlog_cache_size, it is easy to overrun this cache, hence a temporary file will be created/used to store this information until the transaction completes. To avoid this, you can try to increase binlog_cache_size and/or to change overall binary log format to MIXED mode. Notice that beginning with MySQL 5.5.9, binlog_cache_size sets the size for the transaction cache only, and the size of the statement cache is governed by the binlog_stmt_cache_size system variable. FORMULA: bin log cache hit rate = 1 THRESHOLDS: Warning Less then 90% Critical Less than 80% binlog_cache_disk_use 100% binlog_cache_use
11
White Paper
The server can detect and count the threads that have taken more than slow_launch_time seconds to create (slow_launch_threads). If it occurs often it can lead to general system overload and may be caused by the existence of non-optimal queries. The percentage of slow threads can be evaluated by using the following formula FORMULA: slow threads rate = 1 THRESHOLDS: Warning Less than 95% Critical Less than 90% slow_launch_threads 100% connections
CONNECTIONS USAGE
EFFECT: Availability DESCRIPTION: Once the maximum connection limit for the MySQL server has been reached, no other user connections can be established and errors occur on the client side of the application. So the connections usage shouldn't normally exceed 90 95%. Otherwise you may try to increase connections limit (max_connections). This situation can occur also when the process failed to release the database connection resources for some reason or simply because of unclosed connections. Anyway, this situation requires careful analysis to identify the concrete reason and fix it. FORMULA: connections usage = 1 THRESHOLDS: Warning More than 85% Critical More than 95% threads_connected 100% max_connections
www.monitis.com
12
White Paper
The pure (excluding other background processes on host machine) CPU usage by the MySQL process can be evaluated by the following formula: CPU usage = process_cpu_usage 100% process_cpu_usage + cpu_idle
Where process_cpu_usage is the process usage CPU time, expressed as a percentage of total CPU time. THRESHOLDS: Warning More than 85% Critical More than 95%
13
White Paper
There are several other global buffers that MySQL creates depending on which table engines you are using. The formula below assumes you have a mix of MyISAM and InnoDB tables and you are using the query cache: Based MySQL memory usage = key_buffer_size + max_head_table_size + innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + query_cache_size Memory usage by the mysqld process should be quite low on a properly configured and well-tuned system. This metric can be evaluated in manner similar to MySQL CPU usage, i.e. using top: top -b -n 1 -p $MYSQL_PID > test.txt Where $MYSQL_PID is the mysql process id and test.txt is the name of the file where you want to save the output of the command. FORMULA: The pure (excluding other background processes on host machine) memory usage by the MySQL process can be evaluated with the following formula: memory usage = 1 100% 1 + free_memory/process_memory_usage_percent * total_memory
Where, process_memory_usage_percent is the process usage memory, expressed as a percentage of total memory. THRESHOLDS: Warning More than 85% Critical More than 95%
www.monitis.com
14
White Paper
This will output data_length and index_length for each table in the selected database (mydb). If you add them all together you can get the size used for your particular database. Alternatively, the following query can be used: SELECT SUM(DATA_LENGTH) + SUM(INDEX_LENGTH) AS SIZE \ FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'MYDB'; If you have direct access to the host machine and root permissions then the physically occupied disk space of MySQL data can be obtained by using following command: du -sh /var/lib/mysql and the free disk space: df h /var/lib/mysql where /var/lib/mysql is the default location of MySQL data (you should replace it if you change the default location of data during MySQL setup). FORMULA: So, the evaluation of pure Disk Usage can be done with the following formula: disk usage = used_disk_space 100% used_disk_space + disk_free_space
APPENDIX A. SELECTED METRICS WITH SUGGESTED WEIGHT , INDICATOR OF GRAPHICAL 1, 2 PRESENTATION DESIRABILITY, AND AREA OF IMPACT
Metric name Key cache hit rate Key cache writes rate Query cache hit rate Query cache prunes rate Slow queries rate Table cache hit rate Table Lock contention InnoDB buffer pool hit rate InnoDB buffer pool usage
5 5
Effect Performance Performance Performance Performance Performance Performance Performance Performance Performance
Graph Y
Weight 3 1
3 2 1
3 2
3 2
www.monitis.com
15
White Paper
InnoDB cache write wait rate Heavy join hit rate Indexes usage
5 5 5
1 2 1 2 3 1 2 3 3 Y 2 2 1
Tmp cache hit rate Binary log cache hit rate Slow threads rate Thread cache hit rate Connections usage MySQL process CPU usage
6 6
1. 2. 3. 4. 5. 6.
NOTES: The suggested list of metrics does not yet cover the MySQL cluster health status monitoring. This will be defined later. The metric calculation formulas are described in chapter 4. Weight shows the importance of metric while evaluating MySQL health status (3 high, 2 middle, 1 low), Graph column indicates desirability of calculating and graphically displaying the dynamic behavior of metric. InnoDB storage metric. Host machine resource-usage.
Metric key cache hit rate Key cache writes rate Query cache hit rate Query cache prunes rate Slow queries rate Table cache hit rate Table Lock contention 2 InnoDB buffer pool hit rate 2 InnoDB buffer pool usage 2 InnoDB cache write wait rate 2 Heavy join hit rate 2 Indexes usage 2 Log cache hit rate Tmp cache hit rate Binary log cache hit rate Thread cache hit rate www.monitis.com
Warning <99% <90% <90% <95% <95% <90% <70% <99% <90% <95% <95% <90% <90% <90% <90% <95%
Critical <95% <75% <80% <90% <90% <85% <40% <95% <80% <90% <90% <80% <80% <80% <80% <90%
16
White Paper
Slow threads rate Connections usage 3 MySQL process CPU usage 3 MySQL process RAM usage 3 MySQL process Disk space usage
1. 2. 3.
NOTES: The default values of WARNING and CRITICAL criteria for every metric have been approximately defined. User may have to/want to change this. InnoDB storage metric. Host machine resource-usage.
Query Cache engine removes queries from cache to make room for other queries because of not enough cache size. Some queries exceed long_query_time seconds (default 1 sec). Table cache size is too small and cache overloaded, so MySQL opens part of tables from disk (not memory). MySQL have to wait for a lock table because of: incorrect database design, suboptimal complex queries, using MyISAM storage instead of InnoDB, etc. InnoDB buffer size is too small and MySQL have to access part of data from disk (not memory). InnoDB buffer is probably incorrectly configured. For optimal performance, InnoDB should not have to wait before writing
Check MySQL configuration parameters: query_cache_type, query_cache_limit, query_cache_size. Usually increasing the query_cache_size resolves a situation. Increase the query-cache-size slightly to keep most queries inside cache (often it's not possible to keep all query results in the cache). Try to find and reconstruct slow executed queries. Carefully (not to exceed the limitation of operating system for open file descriptors) increase the table_cache value. This is complex problem that require careful analysis, but try to: firstly optimize queries, next optimize (split) tables, finally, and switch to the InnoDB engine. Check the correctness of the InnoDB buffer configuration, especially: innodb_buffer_pool_instances, innodb_buffer_pool_size. Checks the configuration of MySQL memory buffer. The InnoDB buffer pool should be a bit (~10%) larger than your data (total size of InnoDB TableSpaces).
InnoDB buffer pool 2 usage InnoDB cache write wait 2 rate www.monitis.com
17
White Paper
Indexes usage
pages into the InnoDB buffer pool. Select queries cannot use tables indexes (index might not exist, or by not optimize for the query). MySQL are doing a lot of table scans which is suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have. InnoDB log buffer size is too small and MySQL too often puts down the buffer content to the disk file. This occurs if an internal temporary table is created initially as an in-memory table but becomes too large and MySQL have to convert it to an on-disk table. This occurs when part of transactions is too large for the binary log cache and MySQL have to store such statements into a temporary file. The threads cache size is too small and cached connections cannot be reused. It occurs if something is delaying the new threads creation for connection. It occurs if connections count is close or has reached current setting of maximum connections limit. There are many different reasons that can affect for overloading CPU usage by MySQL server.
Tables and queries should be examined for proper use of indexes: identify and tune the queries using full table scan, reorganization and tune a database schema, indexing or queries. Try to increase the Innodb_log_buffer_size, usually should be between 1 and 8 MB, optimally the Innodb_log_buffer_size should be multiple of innodb_log_file_size. Try to increase one of these (or both) values: tmp_table_size, max_heap_table_size values. Try to increase binlog_cache_size and/or to change overall binary log format to MIXED mode. For MySQL 5.5.9 and newer look also at binlog_stmt_cache_size system variable. The ideal situation is to get Threads_created as close as possible to Thread_cache_size, so any new connections doesn't waits for new thread allocation. Check existence of suboptimal queries. Look for of unclosed connections and later increase connections limit (max_connections). Check connection to MySQL (especially from remote hosts). Look for slow queries. Check what currently running queries are doing (SHOW PROCESSLIST) Also check other things like: buffer sizes, table cache, query cache and innodb_buffer_pool_size. Check the number of allowed connections (max_connections). Ensure that total size of global buffers (key_buffer_size, max_heap_table_size, innodb_buffer_pool_size, innodb_additional_mem_pool_size, innodb_log_buffer_size, query_cache_size) is not too big. Optimize the database design to normalize DB and reduce unnecessary data and indexes. Remove outdated binary logs (PURGE BINARY LOGS command). Regularly use OPTIMIZE TABLE command to reclaim the unused space and to defragment the data files.
There are many different reasons that can affect for overloading RAM usage by MySQL server.
Normally, MySQL uses the Disk space to store the tables data and indexes. Besides, the disk space is used to store binary log files. In addition, MySQL require temporary disk space for temporary tables.
1.
NOTES: Establishing of caches may take a while to reach a state that is representative of normal operations. So it is usually necessary to wait some time until the system is stabilized to get correct results.
www.monitis.com
18
White Paper
2. 3.
www.monitis.com
19
White Paper
ABOUT MONITIS
Monitis believes that the Cloud is the biggest thing to happen in IT management since IT management. Having seen this vision early, Monitis is now the global leader in developing this market. It is the first affordable network and systems monitoring solution based 100% in the Cloud. Besides Monitis enthusiastic and loyal user base of 70,000 customers from small businesses to Fortune 500 companies to government agencies and educational institutions, Monitis has won rave reviews from the technology analyst community, such as Most Innovative Start-Up from The 451 Group, a listing in OnDemand 100, a ranking by Morgan Stanley, KPMG, and AlwaysOn, of 100 top private companies globally. Monitis was founded in 2005 by a team of seasoned IT developers fed-up and tired of the limits of software-based tools, while inspired by the promise of the Cloud. Headquartered in San Jose, CA, Monitiss team of IT professionals has extensive experience running enterprise-grade IT businesses, as well as starting and selling several IT start-ups. Monitis employs a global workforce and enjoys a robust average month-on-month revenue growth of over 10%. For more information, contact: Monitis Inc. Sales & Marketing Department sales@monitis.com http://www.monitis.com US & Canada Toll Free: +1-800-657-7949
www.monitis.com
20