Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
103 views4 pages

Indexing

Database indexes are auxiliary data structures that allow for faster retrieval of data from database tables. The most common type of index is the B-tree index, which stores data in a tree structure to allow logarithmic retrieval times. B-tree indexes allow both equality and inequality queries. While indexes improve retrieval performance, they also increase storage usage and slow down data modifications since the indexes also need updating. The optimal index choice depends on the data type and query patterns.

Uploaded by

Sarita Sharma
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
103 views4 pages

Indexing

Database indexes are auxiliary data structures that allow for faster retrieval of data from database tables. The most common type of index is the B-tree index, which stores data in a tree structure to allow logarithmic retrieval times. B-tree indexes allow both equality and inequality queries. While indexes improve retrieval performance, they also increase storage usage and slow down data modifications since the indexes also need updating. The optimal index choice depends on the data type and query patterns.

Uploaded by

Sarita Sharma
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 4

The Question

Most consumer-facing web startups these days use one of the major open source databases,
either MySQL or PostgreSQL, to some degree. If you want to prove your worth it’s a good idea
to get down to the nitty gritty and gain some understanding about these databases’ internals.

So, the question: “Explain to me what databases indexes are and how they work.”

The Answer

In a nutshell a database index is an auxiliary data structure which allows for faster retrieval of
data stored in the database. They are keyed off of a specific column so that queries like “Give me
all people with a last name of ‘Smith’” are fast.

The Theory

Database tables, at least conceptually, look something like this:

id age last_name hometown


-- -- -- --
1 10 Johnson San Francisco, CA
2 27 Smith San Joe, CA
3 15 Rose Palo Alto, CA
4 64 Farmer Mill Valley, CA
5 55 Pauling San Francisco, CA
6 17 Smith Oakland, CA
... ... ... ...
100 49 Meyer Berkeley, CA
101 30 Wayne Monterey, CA
102 18 Schwartz San Francisco, CA
104 6 Johnson San Francisco, CA
... ... ... ...
10000 41 Fetterman Mountain View, CA
10001 25 Breyer Redwood City, CA

That is, a table is a collection of tuples1. If we have a file like this sitting on disk how do we get
all records that have a last name of ‘Smith?’

The code would wind up looking something like this:

results = []
for row in rows:
if row[2] == ‘Smith’:
results.append[row]

Finding the appropriate records requires checking the conditions (here, having a last name of
‘Smith’) for each row. This is linear in the number of rows which, for many databases, could be
millions or billions of rows. Bad news.
How can we make it faster?

Database Indexes

Any type of data structure that allows for (potentially) faster access can be considered an index.
Let’s look at some.

Hash Indexes

Take the same example from above, finding all people with a last name of ‘Smith.’ One solution
would be to create a hash table. The keys of the hash would be based off of the last_name field
and the values would be pointers to the database row.

This type of index is called, unsurprisingly, a “hash index.” Most databases support them but
they’re generally not the default type. Why?

Well, consider a query like this: “Find all people who are younger than 45.” Hashes can deal
with equality but not inequality. That is, given the hashes of two fields, there’s just no way for
me to tell which is greater than the other, only whether they’re equal or not.

B-tree Indexes

The data structure most commonly used for database indexes are B-trees, a specific kind of self-
balancing tree. A picture’s worth a thousand words, so here’s an example.

The main benefit of a B-tree is that it allows logarithmic selections, insertions, and deletions in
the worst case scenario. And unlike hash indexes it stores the data in an ordered way, allowing
for faster row retrieval when the selection conditions include things like inequalities or prefixes.

For example, using the tree above, to get the records for all people younger than 13 requires
looking at only the left branch of the tree root.

Other Indexes
Hash indexes and B-tree indexes are the most common types of database indexes, but there are
others, too. MySQL supports R-tree indexes, which are used to query spatial data, e.g., “Show
me all cities within ten miles of San Francisco, CA.”

There are also bitmap indexes, which allow for almost instantaneous read operations but are
expensive to change and take up a lot of space. They are best for columns which have only a few
possible values.

Subtleties

Performance

Indexes don’t come for free. What you gain for in retrieval speed you lose in insertion and
deletion speed because every time you alter a table the indexes must be updated accordingly. If
your table is updating frequently it’s possible that having indexes will cause overall performance
of your database to suffer.

There is also a space penalty, as the indexes take up space in memory or on disk. A single index
is smaller than the table because it doesn’t contain all the data, only pointers to the data, but in
general the larger the table the larger the index2.

Design

Nodes in a B-tree contain a value and a number of pointers to children nodes. For database
indexes the “value” is really a pair of values: the indexed field and a pointer to a database row.
That is, rather than storing the row data right in the index, you store a pointer to the row on disk.

For example, if we have an index on an age column, the value in the B-tree might be something
like (34, 0×875900). 34 is the age and 0×875900 is a reference to the location of the data, rather
than the data itself.

This often allows indexes to be stored in memory even for tables that are so large they can only
be stored on disk.

Furthermore, B-tree indexes are typically designed so that each node takes up one disk block.
This allows each node to be read in with a single disk operation.

Also, for the pedants among us, many databases use B+ trees rather than classic B-trees for
generic database indexes. InnoDB’s BTREE index type is closer to a B+ tree than a B-tree, for
example.

Summary

Database indexes are auxiliary data structures that allow for quicker retrieval of data. The most
common type of index is a B-tree index because it has very good general performance
characteristics and allows a wide range of comparisons, including both equality and inequalities.
The penalty for having a database index is the cost required to update the index, which must
happen any time the table is altered. There is also a certain about of space overhead, although
indexes will be smaller than the table they index.

For specific data types different indexes might be better suited than a B-tree. R-trees, for
example, allow for quicker retrieval of spatial data. For fields with only a few possible values
bitmap indexes might be appropriate.

You might also like