Query Optimization

Query optimization involves examining multiple query plans to identify an efficient plan for satisfying a query. Cost-based query optimizers evaluate the resource requirements of various plans and use this as the basis for selection. Choosing appropriate data types, adding relevant indexes, limiting results, and allowing MySQL to perform operations can help improve query efficiency and reduce load times.

Uploaded by

Sahil Mahajan
Copyright
© Attribution Non-Commercial (BY-NC)

Query optimization is a function of many relational database management systems in which multiple query plans for satisfying a query are examined and a good query plan is identified. This may or may not be the absolute best strategy, because there are many possible plans. There is a trade-off between the time spent figuring out the best plan and the time spent running it. Different database management systems balance these two in different ways.

Cost-based query optimizers evaluate the resource footprint of various query plans and use this as the basis for plan selection. Typically the resources that are costed are CPU path length, amount of disk buffer space, disk storage service time, and interconnect usage between units of parallelism. The set of query plans examined is formed by examining possible access paths (e.g., primary index access, secondary index access, full file scan) and various relational table join techniques (e.g., merge join, hash join, product join). The search space can become quite large depending on the complexity of the SQL query.

There are two types of optimization: logical optimization, which generates a sequence of relational algebra operations to solve the query, and physical optimization, which determines the means of carrying out each operation.

MySQL Query Optimization


I heard a comment from a developer the other day: "You don't need indexes on small tables." So I asked what the definition of a small table was. He said, "Anything with a few hundred rows." So I said, "2300 rows?" "Well..." "24000 rows?" "Well..." "292000 rows?" "That's large." I then showed him unindexed queries in his application dealing with tables that had 2300, 24000 and 292000 rows.

Avoid tablescans
When MySQL deals with a query that is unindexed, it does a full tablescan to check whether each record in the table meets the specified criteria. On a small table, if the query is executed frequently, the MySQL query cache (on versions that still have one; it was removed in MySQL 8.0) might be able to serve the query. However, on a larger table, or a table with large rows, MySQL must read every row, check the fields, possibly create a temporary table in RAM or on disk, and return the results. On a small site you might not notice it, but on a large system, forcing tablescans on tables with even a few thousand rows will slow things down considerably:

Uptime: 60016  Threads: 11  Questions: 105460332  Slow queries: 197769  Opens: 5819  Flush tables: 1  Open tables: 1320  Queries per second avg: 1757.204

Slow queries are sometimes unavoidable, but often a slow query is simply missing an index.

Use the slow-query log to find potential issues


When analyzing a system to find problems, adding log-queries-not-using-indexes to the my.cnf file and restarting MySQL will log unindexed queries to the slow query log.
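On servers where these settings are dynamic system variables (MySQL 5.1 and later, as far as I know), the same logging can also be switched on at runtime. The following is a minimal sketch, not the author's procedure; the threshold value is purely illustrative.

```sql
-- Enable the slow query log and include queries that use no index.
-- Needs a privileged account; the settings revert at the next restart
-- unless they are also written to my.cnf.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Illustrative threshold in seconds (an assumption, tune for your workload).
-- The new value applies to connections opened after this statement.
SET GLOBAL long_query_time = 1;

-- Confirm the settings and see where the log file is written.
SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
SHOW GLOBAL VARIABLES LIKE 'log_queries_not_using_indexes';
```

On a busy server this option can make the log grow quickly, so it is usually worth turning it off again once the unindexed queries have been collected.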

What can be indexed?


The rule of thumb when writing indexes is to write your query in such a way that you reduce the result set as quickly as possible, with the highest cardinality possible. What does this mean?

If you are collecting the IP address and the date of each hit, an index on (date, ip) will actually serve you worse than one on (ip, date). Imagine receiving 40000 hits to your site on the same date. If you were looking for the number of hits that a particular IP had, with (ip, date) you would search the 41 hits that IP has made over time, and then the 8 it made today. With (date, ip), you would search that day's 40000 rows and then receive the 8 made today.

Each index you have adds extra overhead, and an index file should be as small as possible. IP addresses can be represented as an unsigned int, which takes much less space than the varchar(15) usually used. Remember that when you index a varchar field, indexing will space-pad the key to the full length. If you have a variable-length field you want indexed, you might be able to work out the significant portion of that field by finding the average length, adding a few characters for good measure, and indexing fieldname(15) rather than the entire field. If a queried value is longer than 15 characters, you have still significantly reduced the number of rows that must be checked.

Cardinality refers to the uniqueness of the data. The more unique the data, the lower the chance that you'll have thousands of records that match the first criterion. When the data is very similar, the index is less selective and lookups get slower. MyISAM and InnoDB use B-tree indexes (or R-tree if you use a spatial index), and heavy insertion of similar data can leave the tree poorly organized on disk, which also slows lookups. An OPTIMIZE TABLE can re-sort and reindex the table to eliminate this, but you can't do that on an extremely large, active table without impacting response times.

# Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 3323
SELECT * FROM websites_geo where (zoneid = 5135) LIMIT 1;

In this case, zoneid is not indexed on the table websites_geo. Adding an index on zoneid eliminates the tablescan on this query.
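The ideas in this section can be sketched in SQL roughly as follows. The hits table, its column names and the sample address are assumptions made for illustration; only websites_geo and zoneid come from the text, and the last two statements assume that table already exists.

```sql
-- Hypothetical hit-tracking table (names are assumptions, not from the article).
CREATE TABLE hits (
    ip       INT UNSIGNED NOT NULL,   -- IPv4 address stored via INET_ATON()
    hit_date DATE         NOT NULL,
    KEY idx_ip_date (ip, hit_date)    -- the more selective column leads
);

-- Hits from one address today; both conditions can be resolved in the index.
SELECT COUNT(*)
FROM hits
WHERE ip = INET_ATON('192.0.2.10')
  AND hit_date = CURDATE();

-- Convert back to dotted notation when displaying results.
SELECT INET_NTOA(ip) AS ip_text, hit_date FROM hits LIMIT 10;

-- The index suggested for the slow-log entry, run against the existing table.
ALTER TABLE websites_geo ADD INDEX idx_zoneid (zoneid);
EXPLAIN SELECT * FROM websites_geo WHERE zoneid = 5135 LIMIT 1;
```

After the index is added, the EXPLAIN output should name idx_zoneid in its key column instead of showing a full scan (type ALL).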

Check for equality, not inequality.


An index works best for equality checks. A condition that tests whether values are not equal generally cannot use an index.

# Query_time: 0 Lock_time: 0 Rows_sent: 5 Rows_examined: 2548
SELECT * FROM websites where (id = 1056692 && status != 'd' && status != 'n') order by rand() LIMIT 5;

# Query_time: 0 Lock_time: 0 Rows_sent: 10 Rows_examined: 2544
SELECT * FROM websites where (status != 'n' && status != 'd' && traffic > 3000) order by added desc LIMIT 10;

These two queries show different symptoms of the same fundamental issue. In the first, id is not indexed; an index would have at least limited the result set to 9 records rather than 2548, and the status check isn't able to use an index. In the second, status is checked, followed by traffic. There are other queries issued that check status, traffic, clicks_high. When we look at status (which should be an enum or char(1) rather than varchar(1)), we find that there are only 4 values used. By indexing on (id, status) and (status, traffic, clicks_high), we could alter the queries as such:

SELECT * FROM websites where (id = 1056692 && status in ('g', )) order by rand() LIMIT 5;
SELECT * FROM websites where (status in ('g', ) && traffic > 3000) order by added desc LIMIT 10;

which would result in both queries using an index.
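As a rough sketch of the second rewrite, the composite index and the reworked query might look like this. The index name is an assumption, and the value 'a' in the IN list is a placeholder for whichever other status values are acceptable; only 'g' is named in the text. The statements assume the websites table already exists.

```sql
-- Composite index on the columns the status/traffic queries filter on.
ALTER TABLE websites
    ADD INDEX idx_status_traffic_clicks (status, traffic, clicks_high);

-- Positive IN list instead of two != checks.
-- 'g' is from the article; 'a' is a placeholder value (assumption).
EXPLAIN
SELECT *
FROM websites
WHERE status IN ('g', 'a')
  AND traffic > 3000
ORDER BY added DESC
LIMIT 10;
```

In the EXPLAIN output, the key column shows whether the optimizer picked the new index. Because the ordering column, added, is not part of the index, the sort will likely still show up as "Using filesort" in the Extra column.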

Choose your data types intelligently.


As a secondary point, id (though it is numeric) happens to be a text field. If you index id in this case, you would have to specify a key length.

mysql> select max(length(id)) from websites;
+-----------------+
| max(length(id)) |
+-----------------+
|              22 |
+-----------------+
1 row in set (0.02 sec)

mysql> select avg(length(id)) from websites;
+-----------------+
| avg(length(id)) |
+-----------------+
|          8.3315 |
+-----------------+
1 row in set (0.00 sec)

Based on this, we might decide to set the key length to 22, as it is a relatively small number and allows room to grow. Personally, I would have opted to have the id be an unsigned int, which would be much smaller, but the application developer uses alphanumeric ids which are exposed externally. With sharding, you could use the id throughout the various tables, or you could map the text id to a numeric id internally for all of the various tables. There are a number of possible solutions to help any SQL engine perform better, and your data set will dictate some of the things that you can do to make data access quicker.
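A prefix index with the key length discussed above, and one possible shape for the text-to-numeric mapping, could be sketched as follows. The index name and the website_ids table are hypothetical, and the first statement assumes websites.id is a string column at least 22 characters wide.

```sql
-- Index the first 22 characters of the text id together with status.
-- Assumes websites.id is a string column of 22 or more characters.
ALTER TABLE websites ADD INDEX idx_id_status (id(22), status);

-- Alternative sketch: map external alphanumeric ids to compact internal
-- integers, so other tables can join on a 4-byte key instead of a string.
-- Table and column names are hypothetical.
CREATE TABLE website_ids (
    internal_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    external_id VARCHAR(22)  NOT NULL,
    UNIQUE KEY uk_external_id (external_id)
);
```

The mapping table trades one extra lookup at the application boundary for smaller keys and indexes everywhere else.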

Helping MySQL Help You


If you do select * from table where condition_a=1 and condition_b=2 in one place, and select * from table where condition_b=2 and condition_a=1 in another, a single index on (condition_a, condition_b) can serve both. Rewriting the second query so that its conditions appear in the same order as the index columns keeps the two queries consistent, although the optimizer matches conditions to index columns regardless of the order in which they are written in the WHERE clause.
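One way to see how each form of the query is executed is to run EXPLAIN on both and compare the key column. This is a sketch only; the table name index_order_demo is hypothetical and stands in for the article's placeholder table.

```sql
-- Hypothetical table standing in for the article's "table".
CREATE TABLE index_order_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    condition_a INT NOT NULL,
    condition_b INT NOT NULL,
    KEY idx_a_b (condition_a, condition_b)
);

-- Conditions written in index order.
EXPLAIN SELECT * FROM index_order_demo
WHERE condition_a = 1 AND condition_b = 2;

-- Same conditions, written in the opposite order.
EXPLAIN SELECT * FROM index_order_demo
WHERE condition_b = 2 AND condition_a = 1;
```

The comparison is most meaningful on a populated table, since plans for an empty table are not representative.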

Limit your results

Another thing that will help considerably is using a limit clause. Many times a programmer will do select * from table where condition_a=1, which returns 2300 rows, when only the first few rows are used. A limit clause prevents a lot of data from being fetched by MySQL and buffered while waiting for the response: select * from table where condition_a=1 limit 20 would hand you the first 20 records.
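A small sketch of the limit clause in use, with a hypothetical table limit_demo. The ORDER BY is my addition: without one, which rows count as "the first 20" is not guaranteed.

```sql
-- Hypothetical table (names are assumptions).
CREATE TABLE limit_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    condition_a INT NOT NULL,
    KEY idx_condition_a (condition_a)
);

-- First 20 matching rows, in a defined order.
SELECT * FROM limit_demo
WHERE condition_a = 1
ORDER BY id
LIMIT 20;

-- Next 20 rows, if the application pages through the results.
SELECT * FROM limit_demo
WHERE condition_a = 1
ORDER BY id
LIMIT 20 OFFSET 20;
```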

Avoid reading the data file, do all your work from the Index
Additionally, if you have a table and only need three of the columns from the result, select fielda,fieldb,fieldc from table where condition_a=1 will return only the three fields. As an added boost, if the fields you are selecting and checking are all contained in the index, the query never hits the actual data file and is answered entirely from the index. Many times I've added a field to an index that wasn't otherwise needed there, just to eliminate the lookup of the key in the index followed by the corresponding read of the data file.
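An index that contains every column a query touches is often called a covering index. The following sketch uses a hypothetical table cover_demo; the column names follow the placeholders in the text.

```sql
-- Hypothetical table; the index holds every column the query needs.
CREATE TABLE cover_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    condition_a INT NOT NULL,
    fielda      INT NOT NULL,
    fieldb      INT NOT NULL,
    fieldc      INT NOT NULL,
    notes       TEXT,                 -- wide column the query never reads
    KEY idx_cover (condition_a, fielda, fieldb, fieldc)
);

-- The filter column and all selected columns are in idx_cover.
EXPLAIN
SELECT fielda, fieldb, fieldc
FROM cover_demo
WHERE condition_a = 1;
```

When the query is answered from the index alone, the Extra column of the EXPLAIN output reports "Using index". Switching the select list to * would bring the notes column back in and force reads of the row data.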

Let MySQL do the work


MySQL reads tables, filters results, and can do some calculations. Going through 40000 records to pick the best 100 is still faster in MySQL than having PHP fetch 40000 rows and do the calculations and sorts needed to come up with those 100 rows. Index, optimize, and allow MySQL to do the database work.
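As an illustration of pushing the work to the server, a query of the following shape computes and ranks in MySQL and sends back only the rows needed. The columns come from the websites table described earlier, but the scoring formula and the status value are invented for this sketch.

```sql
-- Rank inside MySQL and return only the top 100 rows to the application.
-- The score expression is a made-up example, not the author's formula.
SELECT id,
       traffic,
       clicks_high,
       clicks_high * 2 + traffic AS score
FROM websites
WHERE status = 'g'
ORDER BY score DESC
LIMIT 100;
```

The application then receives 100 rows over the wire instead of fetching the whole candidate set and sorting it itself.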

Summary
Making MySQL work more efficiently goes a long way towards making your database-driven site work better. Adding six indexes to the system resulted in quicker response times and an increase in the transactions per second.

Uptime: 32405  Threads: 1  Questions: 58729705  Slow queries: 64122  Opens: 2911  Flush tables: 1  Open tables: 295  Queries per second avg: 1812.366

Previously, MySQL was generating 3.26 slow queries per second. Now we're just beneath 2 slow queries per second and our system is processing 55 more transactions per second. There is still a bit more analysis to do to identify the slow queries that are still running and to alter the queries to reverse the inequality checks, but even just adding indexes to a few tables has helped noticeably. Once the developer is able to make some changes to the application, I'm sure we'll see an additional speedup.

Indexes are used to find rows with specific column values fast. Without an index, MySQL has to start with the first record and then read through the whole table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. Note that if you need to access almost all 1,000 rows, it is faster to read sequentially, because that minimizes disk seeks.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions are that indexes on spatial column types use R-trees, and MEMORY (HEAP) tables support hash indexes. Strings are automatically prefix- and end-space compressed.

In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in MEMORY tables) are described at the end of this section.

To quickly find the rows that match a WHERE clause.

To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows.

To retrieve rows from other tables when performing joins.

To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_# = constant on all key parts that occur before key_col in the index. In this case, MySQL will do a single key lookup for each MIN() or MAX() expression and replace it with a constant. If all expressions are replaced with constants, the query will return at once. For example:
SELECT MIN(key_part2),MAX(key_part2) FROM tbl_name WHERE key_part1=10;

To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order.

In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed.

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. If you need to access most of the rows, it is faster to read sequentially, because this minimizes disk seeks.

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions are that indexes on spatial data types use R-trees, and that MEMORY tables also support hash indexes.

In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in MEMORY tables) are described at the end of this section. MySQL uses indexes for these operations:

To find the rows matching a WHERE clause quickly.

To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows.

To retrieve rows from other tables when performing joins. MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. For example, VARCHAR(10) and CHAR(10) are the same size, but VARCHAR(10) and CHAR(15) are not.

Comparison of dissimilar columns may prevent use of indexes if values cannot be compared directly without conversion. Suppose that a numeric column is compared to a string column. For a given value such as 1 in the numeric column, it might compare equal to any number of values in the string column such as '1', ' 1', '00001', or '01.e1'. This rules out use of any indexes for the string column.

To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant. If all expressions are replaced with constants, the query returns at once. For example:
SELECT MIN(key_part2),MAX(key_part2) FROM tbl_name WHERE key_part1=10;
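To try the MIN()/MAX() case end to end, a small table can be built around the manual's placeholder column names. The table name minmax_demo, the index name and the sample rows are assumptions made for this sketch.

```sql
-- Hypothetical table with a two-part key, mirroring the manual's column names.
CREATE TABLE minmax_demo (
    key_part1 INT NOT NULL,
    key_part2 INT NOT NULL,
    KEY idx_parts (key_part1, key_part2)
);

INSERT INTO minmax_demo (key_part1, key_part2)
VALUES (10, 3), (10, 7), (10, 5), (20, 1);

-- key_part1 is fixed to a constant, so the smallest and largest key_part2
-- values sit at the two ends of one contiguous range in the index.
SELECT MIN(key_part2), MAX(key_part2)
FROM minmax_demo
WHERE key_part1 = 10;    -- returns 3 and 7 for the sample rows

-- Inspect the plan for the same query.
EXPLAIN
SELECT MIN(key_part2), MAX(key_part2)
FROM minmax_demo
WHERE key_part1 = 10;
```

When the optimization applies, the EXPLAIN output typically reports "Select tables optimized away" in the Extra column, meaning the result was resolved during optimization from the index alone.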
