Need of Indexes
Image of you have a table of user info, if the table contains 50 million of rows. Without index, running a query like below will need a full table scan. Clearly it is not efficient as it is O(n) problem.
SELECT * FROM user_info WHERE last_name = “Tom”
But if we index it on last_name column, the last_name field will be sorted alphabetically. Now if you look up the last_name = ‘Tom’, you can go directly to the one starts with ‘T’. Internally, index table contains the fields you are indexed and the position of the matching records (ie. pointer).
Cost of Indexes
- Because database needs to maintain a separate list of indexes’ values, there is cost to keep them updated as your data changes
- Indexes cost space. It is a trade-off between space and time. Lets say the user_info table has 2 billions rows and last_name is 8 bytes long, you are looking at roughly 16 GB of space for the data portion of the index. Plus 4-8 bytes for each row pointer, it will go up to 32 GB of space.
With these cost, you don’t want to index every column in a table.
Type of Indexes
- Multicolumn indexes - reduce the set that matches single column only (ie. more selective). For example, if you have each field as index separately, database may not use them all at once at the same time. Like MySQL, it will only ever use one index per table per query. To choose which index to use, MySQL will make a decision about which index will return fewer rows via index statistics.
- Partial indexes - if you don’t have too much space for your index, you can specify a subset of bytes from your index value be used.
- Clustered indexes - In MySQL, for MyISAM, the indexes are kept completely separate from the row data. With clustered indexes, the index and the record itself are “clustered” together. InnoDB uses clustered indexes. In Oracle, clustered indexes are known as “index-organized tables”. The type of index will reduce two lookups (index and record data) to one. Internally, clustered index reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
Deep look in MySQL: The InnoDB storage engine creates a clustered index for every table. If the table has a primary key, that is the clustered index. If not, InnoDB internally assigns a six-byte unique ID to every row and uses that as the clustered index. (don’t let it generate a useless one for you). All indexes are B-trees. In InnoDB, the primary key’s leaf nodes are the data. Secondary indexes have a pointer to the data at their leaf nodes.
Index Structure
- B-Tree Indexes - (balanced tree indexes, a tree structure that will never become lopsided as new nodes are added and removed. It gives us O(log n) performance for single-record lookup). Unlike binary tree, B-trees have many keys per node and don’t grow tall or deep as quick as a binary tree. It is very good for range-based queries as well. For example, for quey below, server simply finds the first Ray record and last Robert record. It then knows everything in between are also matches. The same is true of virtually any query that involves understanding the range of values (>, <, MIN, MAX, BETWEEN xx AND yy)
SELECT * FROM user_info WHERE last_name BETWEEN ‘Ray’ AND ‘Robert’
- Hash Indexes - It gives very fast lookups O(1) but it is less flexible and predictable than other indexes. First, the key is hashed and compare. So, the range-based queries cannot use it. Uniform distribution is the key here.
Index Limitation
- Index doesn’t work together with wildcard and regular expression search.
- If index selectively is like > 30% of rows (very low), table scan may be better.






































(4.75 out of 5)
No Comment Received
Sorry the comment area are closed for non registered users