Indexing, what’s the fuss ?!

While I was working as an intern, I was very much intrigued to see how queries would turn out so efficiently just by adding indexes to a table. Today in this article, I’m just going to share my knowledge about why we use indexes, how they work and the available indexing techniques in the relational databases. I’ve used indexes in relational databases (SQL) and non-relational databases (MongoDB) so far and I’ve seen query time getting reduced phenomenally.

Akshit Bansal
5 min readMar 16, 2022

Why do we use Indexing ?

Indexing is exactly what it seems. The very first time you would have heard about it, must be in your school textbooks. Say your teacher had asked you to open chapter number five and read some particular sub-topic from that particular chapter. You obviously didn’t scroll page by page to reach that particular sub-topic (unless you really wanted to pass time in that boring lecture). You would go to the ‘index’ page, find page number for that particular sub-topic from chapter number five and directly open that page number.

Indexing page improved your ‘search time’ to find that particular topic!

Same applies to databases as well. Imagine, you went to Myntra’s site to buy a shirt of some particular brand and checked that particular brand under the filters section and now the website takes around 1–2 minutes just to filter out all the shirts of that particular brand from the database, row by row. Sounds daunting right? You would probably shift to some other platform (which uses indexing xD).

That’s when indexing comes into picture. All these big e-commerce platforms use indexing in their databases intensively. It provides a way to search through the table without iterating through every row of a table (with possibly millions of rows).

How indexing in databases works?

There are multiple ways to achieve indexing in databases. I’m going to explain two very popular ones in this article. First being, clustered indexing.

Clustered Indexing

Clustered index helps in enhancing the search operations by optimising the way data is stored in a table. Let’s say you want to filter out items of the ‘H&M’ brand among all the products visible on Myntra’s product listing page. You hit the filter button and a select query where the brand name is H&M gets triggered at the backend.

Will it reduce the query time if all the rows in the backend are present in a sorted fashion w.r.t. Brand name? Of Course it would! Just like binary search is more efficient than linear search, if the data is present in a sorted fashion.

So that’s what clustered indexing does. It stores the data in a sorted fashion w.r.t the provided column. By default, SQL does the clustered indexing w.r.t primary key column. If you wish to change the column for indexing, you would need to delete the primary column index and trigger a query for your preference of column indexing.

But why do we need to delete the primary key index to add our own index?

Clustered indexing works on the physical table itself without creating any copy of it. In case of a primary key, it has already sorted the data w.r.t primary key. At a time, only one sorting can be applied on a table. Hence adding a new clustered index would require changing the order of rows in the entire table.

Non-clustered indexing

Unlike clustered indexing, non-clustered indexing doesn’t make any changes to the way physical data is kept in tables. Instead, it stores a mapping of all the rows and their addresses on the basis of the provided key, separately. Just like we discussed the ‘index page in a textbook’ analogy at the start.

So this time, if you hit the filter button, the SQL query will be triggered in a way that it will first find the address of the row with brand name as ‘H&M’ and bring row data from that particular row address. This search operation will be slightly expensive relative to the clustered index, both memory and time wise due to an extra lookup. (i.e. finding the row address from the index mapping and then getting the data from that particular row address).

However, it is still very very efficient rather than performing a linear search over every row.

This also means, we can achieve multiple indexes on the same table unlike using a clustered index since we are not making any changes to the physical table. We can maintain multiple index mappings on the memory w.r.t different columns (or combination of columns).

Non-clustered indexing takes some extra space to store the row addresses mapping, however it is so small that it can actually be stored in the cache memory.

When to use clustered and non-clustered indexing?

In my opinion, every table should have at least one clustered index to optimise basic search operations preferably on columns with maximum possibility of search operations. For example, the brand name in our example.

On the other side, for tables with higher number of insert/update operations, non-clustered indexing should be preferred, since it would not require extra time and CPU computations required for sorting the rows after every insert/update on the actual table.

--

--

Akshit Bansal

As a kid I was scared of bugs, then I became a software engineer. Now I’m even more scared of ’em. Engineer @Milkbasket