SQL Server Index basics in 10 minutes.

SQL Server index can be in intimidating subject. Pretty much everything you read out there is confusing, complicated and you aren’t even sure if it applies to your situation.

As always, we’ll try ang shed a little light on the situation, at the risk of oversimplifying it. It’s far better to get the big ideas right than to get caught up in minor detail. So, without further ado…

There are two kinds of tables: heaps and clustered tables

Heap tables are like diaries. You always append new information after the last entry – you fill the available space from top to bottom. On the other hand, clustered tables are like English dictionaries. In a dictionary, there is a physical sorting of the word – you can essentially count of all the “B..” words coming after all the “A…” words.

english_dictionary

So, just by enforcing all the entries are physically sorted by one piece of information – something wonderful happens. When you want to find something, you can SEEK the word instead of SCANning the whole thing from page one. Want to find the word “Dish”? You’d look at the side of dictionary, look for the part with all the “D” words, and then find your word, right? That, in SQL Server world, is called a SEEK. The word is used in contrast to SCAN, which refers to the idea of taking some bulk data, and look for pattern match from very top to very bottom, a brute-force approach if you will. As you can imagine, SEEK is better.

Now, before we leave the conversation on heap table and clustered table, let’s think if there ever is a time you want to choose heap table over clustered table. The answer is practically never. The only advantage heap may have is that it’s easier to INSERT because you always append at the end. However, if that is what you are after – you can always get an identity column (You may know this as AutoNumber – where the ID of a table would automatically increment by one), make a clustered index based on it, and you are guaranteed a heap-like write performance.

 Clustered Index and Nonclustered Index

Now that you’ve figured Clustered table, Now’s the time to figure out Non-Clustered Index. THe presence of a clustered index is what makes a table a clustered table. It is a mechanism by which SQL server can enforce the sorting by given criteria. So, if you have a Customer table, you can do several things. One idea is to physically sort the table by CustomerID – so that it’s very easy to find particular row by ID. Another idae is to physically sort the table by (“cluster on”) geography columns, such as State and then City, so that range-selecting everyone in Buffalo, NY is really easy. Yet another idea is to cluster it on customer date of birth, and so on…

If you are catching on, yes, you can’t have more than one clustered index. You can physically sort the rows only one way. And that is precisely why there can be only one clustered index.

So what do you when you want more indexes?

That’s where the Non-Clustered Index (NCI) comes in to the play. An NCI is like the index at the end of college textbook. A bio 101 book may cluster on “ease of understanding” (chapter 1 = easy stuff, last chapter = pure unmiti gated hell), but there’s gotta be a way you can look up the word you don’t know, and be taken to the right page.

Most likely, the textbook would contain an index in the appendix, allowing you to SEEK the work you want to find and then look up corresponding page by an ID.  That index right there, is called the Non-Clustered Index.

Naturally, you can have as many NCI’s as you want. But keep in mind, if you do – while you might make the READ access go faster, there are two pitfalls: one – it penalizes the WRITE performance. two – it increases the storage size. We’ve seen customer databases wtih 20GB of actual data, but 55GB of non-clustered index on one table.

Hope this helps!

Leave a comment