Create Efficient Indexes
Comparison between “Index Merge” and “Composite Index”
The “Index Merge” works well when the indexes that will be merged have similar density. And “Composite Index” works well when the query condition (predicates in the WHERE clause) uses “=” operator.
When the query condition doesn’t use the first column in the composite index, the composite index will generally perform badly.
The characteristics of the “Composite Index”
When the leading column (the first column in the index) isn’t used in the query condition, the composite index will most likely not be used. Even under some circumstances the “index skip scan” can use the composite index, the performance is not very sound.
To create a composite index, two factors should be considered. One is which columns should be included in the index, the other one is the order of the columns in the index. These two factors have great influence on the performance of the index.
- The relationship between the density and the order of the columns
If the indexed columns will be only used using “=” operator, the density of the columns has little impact on the order of the columns. - The impact of “=” operation on the order of the columns
If the query condition doesn’t use “=” operator for the first column in the composite index, the index will not perform well even if other columns in the index are used with “=” operator in the query condition.
“=” operation is more important than the density of the column when deciding the order of columns in the composite index. So to make the best use of the composite index, we need to take both the density and the column usage (“=” or not) into consideration. - IN list iterator
Sometimes, if the leading column of the composite index is used in “BETWEEN...AND” or “LIKE” operation, we can take advantage of “IN list” to rewrite the SQL to improve the performance.
For example, suppose the there is one index idx_tab1 (col1, col2) on the table TAB1…
SELECT * FROM TAB1 WHERE col1 between 10 and 20 AND col2 = 'A';
If there are only limited values that meets the predicate (col1 between 10 and 20), we can rewrite the SQL as follows..
SELECT * FROM TAB1
WHERE col1 IN (10, 15, 20) AND col2 = 'A';
The SQL statement above is equal to …
SELECT * FORM TAB1 WHERE (col1=10 AND col2=’A’) OR (col1=15 AND col2=’A’) OR (col1=20 AND col2=’A’)
This way, the composite index idx_tab1 can be used well because the SQL engine can scan less of the index entries.
Another example, suppose there is one index idx_tab1 (col1, col3, col2) on table TAB1,
SELECT * FROM TAB1 WHERE col1 = 'A' and col2='222';
This time, even the leading column col1 is used in the “=” operator, the second column col3 is not used the WHERE clause. This way, the col2=’222’ can only be used as the “filter” to check the index entry which is not very efficient.
If we know the column col3 only have several values, like 1, 2, 3, and then the SQL statement above can be rewritten as follows…
SELECT * FROM TAB1 WHERE col1 = 'A' and col2='222' and col3 in (1, 2, 3);
It is equal to the following SQL statement…
SELECT * FROM TAB1 WHERE (col1= 'A' and col3=1 and col2='222') OR (col1= ‘A’ and col3=2 and col2='222') OR (col1= ‘A’ and col3=3 and col2='222')
This way, the column col2, col3 can be used for index entry access which is much efficient than being a data “filter”.