Indexing

Database indexes are auxiliary data structures that allow for faster retrieval of data from database tables. The most common type of index is the B-tree index, which stores data in a tree structure to allow logarithmic retrieval times. B-tree indexes allow both equality and inequality queries. While indexes improve retrieval performance, they also increase storage usage and slow down data modifications since the indexes also need updating. The optimal index choice depends on the data type and query patterns.

Uploaded by

Sarita Sharma

The Question

Most consumer-facing web startups these days use one of the major open source databases,
either MySQL or PostgreSQL, to some degree. If you want to prove your worth it’s a good idea
to get down to the nitty gritty and gain some understanding about these databases’ internals.

So, the question: “Explain to me what database indexes are and how they work.”

The Answer

In a nutshell, a database index is an auxiliary data structure which allows for faster retrieval of
data stored in the database. An index is keyed off of a specific column so that queries like “Give me
all people with a last name of ‘Smith’” are fast.
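
As a rough illustration of that claim, the sketch below uses Python's built-in sqlite3 module (SQLite rather than MySQL or PostgreSQL, purely because it needs no server) and asks the database how it plans to run the ‘Smith’ query before and after an index exists. The table name people and the index name idx_people_last_name are made up for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# In-memory database with a hypothetical `people` table (all names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER, last_name TEXT, hometown TEXT)"
)
conn.executemany(
    "INSERT INTO people (age, last_name, hometown) VALUES (?, ?, ?)",
    [
        (10, "Johnson", "San Francisco, CA"),
        (27, "Smith", "San Jose, CA"),
        (17, "Smith", "Oakland, CA"),
    ],
)

query = "SELECT * FROM people WHERE last_name = 'Smith'"


def query_plan(sql):
    """Return SQLite's own description of how it would execute `sql`."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in plan_rows)


plan_before = query_plan(query)  # no index yet, so the plan is a table scan
conn.execute("CREATE INDEX idx_people_last_name ON people (last_name)")
plan_after = query_plan(query)   # the plan should now name the new index

smiths = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
```

The query returns the same rows either way; only the access path changes.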

The Theory

Database tables, at least conceptually, look something like this:

id     age  last_name  hometown
--     --   --         --
1      10   Johnson    San Francisco, CA
2      27   Smith      San Jose, CA
3      15   Rose       Palo Alto, CA
4      64   Farmer     Mill Valley, CA
5      55   Pauling    San Francisco, CA
6      17   Smith      Oakland, CA
...    ...  ...        ...
100    49   Meyer      Berkeley, CA
101    30   Wayne      Monterey, CA
102    18   Schwartz   San Francisco, CA
104    6    Johnson    San Francisco, CA
...    ...  ...        ...
10000  41   Fetterman  Mountain View, CA
10001  25   Breyer     Redwood City, CA

That is, a table is a collection of tuples. If we have a file like this sitting on disk, how do we get
all records that have a last name of ‘Smith’?

The code would wind up looking something like this:

    results = []
    for row in rows:
        # row[2] is the last_name column
        if row[2] == 'Smith':
            results.append(row)

Finding the appropriate records requires checking the condition (here, having a last name of
‘Smith’) for each row. This takes time linear in the number of rows which, for many databases,
could be in the millions or billions. Bad news.

How can we make it faster?

Database Indexes

Any type of data structure that allows for (potentially) faster access can be considered an index.
Let’s look at some.

Hash Indexes

Take the same example from above, finding all people with a last name of ‘Smith.’ One solution
would be to create a hash table. The keys of the hash would be based off of the last_name field
and the values would be pointers to the database row.

This type of index is called, unsurprisingly, a “hash index.” Most databases support them but
they’re generally not the default type. Why?

Well, consider a query like this: “Find all people who are younger than 45.” Hashes can deal
with equality but not inequality. That is, given the hashes of two fields, there’s just no way for
me to tell which is greater than the other, only whether they’re equal or not.
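
A minimal sketch of the idea, using a Python dict as the hash table and list positions as a stand-in for row pointers (the sample rows come from the table above; the variable names are illustrative):

```python
from collections import defaultdict

# Toy rows as (id, age, last_name, hometown) tuples.
rows = [
    (1, 10, "Johnson", "San Francisco, CA"),
    (2, 27, "Smith", "San Jose, CA"),
    (3, 15, "Rose", "Palo Alto, CA"),
    (6, 17, "Smith", "Oakland, CA"),
]

# Hash index: last_name -> positions in `rows` (our stand-in for row pointers).
last_name_index = defaultdict(list)
for position, row in enumerate(rows):
    last_name_index[row[2]].append(position)

# Equality lookup: a single hash probe, no scan over the table.
smiths = [rows[p] for p in last_name_index.get("Smith", [])]

# Inequality: hashing discards ordering, so a hash index on age would not
# help here and we are back to checking every row.
younger_than_45 = [row for row in rows if row[1] < 45]

print(smiths)
```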

B-tree Indexes

The data structure most commonly used for database indexes is the B-tree, a specific kind of
self-balancing tree. A picture’s worth a thousand words, so here’s an example.

[Figure: example B-tree]

The main benefit of a B-tree is that it allows selections, insertions, and deletions in logarithmic
time, even in the worst case. And unlike a hash index it stores the data in sorted order, allowing
for faster row retrieval when the selection conditions include things like inequalities or prefixes.

For example, using the tree above, to get the records for all people younger than 13 requires
looking at only the left branch of the tree root.
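
To see why ordering matters, here is a small sketch that uses a sorted list plus binary search (Python's bisect module) in place of a real B-tree. This is a deliberate simplification: a sorted list gives the same logarithmic lookups and range scans, but not a B-tree's cheap insertions or block-oriented layout. The function names are made up for the example.

```python
import bisect

# Toy rows as (id, age, last_name) tuples.
rows = [
    (1, 10, "Johnson"), (2, 27, "Smith"), (3, 15, "Rose"),
    (4, 64, "Farmer"), (5, 55, "Pauling"), (6, 17, "Smith"),
]

# Ordered index on age: (age, position) pairs kept in sorted order.
age_index = sorted((row[1], pos) for pos, row in enumerate(rows))
ages = [age for age, _ in age_index]


def rows_younger_than(limit):
    """Binary-search for the first age >= limit, then take everything before it."""
    cut = bisect.bisect_left(ages, limit)
    return [rows[pos] for _, pos in age_index[:cut]]


def rows_with_age_between(low, high):
    """Inclusive range query: two binary searches, then one contiguous slice."""
    start = bisect.bisect_left(ages, low)
    end = bisect.bisect_right(ages, high)
    return [rows[pos] for _, pos in age_index[start:end]]


print(rows_younger_than(13))
print(rows_with_age_between(15, 27))
```

Because matching entries sit next to each other in the ordered structure, a range query touches only the part of the index it needs.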

Other Indexes

Hash indexes and B-tree indexes are the most common types of database indexes, but there are
others, too. MySQL supports R-tree indexes, which are used to query spatial data, e.g., “Show
me all cities within ten miles of San Francisco, CA.”

There are also bitmap indexes, which allow for very fast read operations but are expensive to
change and can take up a lot of space when a column has many distinct values. They are best for
columns which have only a few possible values.
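
A rough sketch of how a bitmap index answers queries, using Python integers as bit vectors: there is one bitmap per distinct value, and bit i is set when row i holds that value. The is_adult column is a hypothetical addition for this example, and real implementations compress their bitmaps, which this sketch does not attempt.

```python
# Two low-cardinality columns, one entry per row.
hometowns = ["San Francisco", "Oakland", "San Francisco", "Berkeley", "Oakland", "San Francisco"]
is_adult = [False, True, False, True, True, False]  # hypothetical column


def build_bitmaps(values):
    """Map each distinct value to an integer whose set bits mark the matching rows."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def positions(bitmap):
    """Decode a bitmap back into a list of row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


hometown_bitmaps = build_bitmaps(hometowns)
adult_bitmaps = build_bitmaps(is_adult)

# WHERE hometown = 'Oakland' AND is_adult: a single bitwise AND.
oakland_adults = positions(hometown_bitmaps["Oakland"] & adult_bitmaps[True])

# WHERE hometown IN ('Oakland', 'Berkeley'): a single bitwise OR.
east_bay = positions(hometown_bitmaps["Oakland"] | hometown_bitmaps["Berkeley"])

print(oakland_adults, east_bay)
```

Combining conditions is cheap, but every new distinct value needs another bitmap as long as the table, which is why the approach suits columns with few possible values.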

Subtleties

Performance

Indexes don’t come for free. What you gain in retrieval speed you lose in insertion and
deletion speed, because every time you alter a table the indexes must be updated accordingly. If
your table is updated frequently, it’s possible that having indexes will cause the overall
performance of your database to suffer.
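
The bookkeeping can be sketched with a toy table class (IndexedTable and its index_writes counter are invented for illustration; they are not any database's API). Every insert has to touch each index in addition to the table itself:

```python
from collections import defaultdict


class IndexedTable:
    """Toy table that keeps one dict-based index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: defaultdict(list) for col in indexed_columns}
        self.index_writes = 0  # counts the extra work done on behalf of indexes

    def insert(self, row):
        self.rows.append(row)
        position = len(self.rows) - 1
        # Each index must learn about the new row: one extra write per index.
        for col, index in self.indexes.items():
            index[row[col]].append(position)
            self.index_writes += 1

    def find(self, col, value):
        return [self.rows[p] for p in self.indexes[col].get(value, [])]


table = IndexedTable(["last_name", "age"])
table.insert({"id": 1, "age": 10, "last_name": "Johnson"})
table.insert({"id": 2, "age": 27, "last_name": "Smith"})

print(table.index_writes)  # two inserts times two indexes
print(table.find("last_name", "Smith"))
```

With k indexes, each insert performs k index updates on top of the row write, so a write-heavy table pays for every index it carries.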

There is also a space penalty, as the indexes take up space in memory or on disk. A single index
is smaller than the table because it doesn’t contain all the data, only pointers to the data, but in
general the larger the table the larger the index.

Design

Nodes in a B-tree contain a value and a number of pointers to child nodes. For database
indexes the “value” is really a pair of values: the indexed field and a pointer to a database row.
That is, rather than storing the row data right in the index, you store a pointer to the row on disk.

For example, if we have an index on an age column, the value in the B-tree might be something
like (34, 0x875900). 34 is the age and 0x875900 is a reference to the location of the data, rather
than the data itself.
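
A small sketch of the (field, pointer) idea, with an in-memory io.BytesIO buffer standing in for the table file on disk and byte offsets playing the role of row pointers. The comma-separated row format and the fetch_row helper are assumptions made for the example.

```python
import io

records = [(1, 34, "Johnson"), (2, 27, "Smith"), (3, 34, "Rose")]

# Stand-in for the table file on disk: one comma-separated line per row.
table_file = io.BytesIO()
age_index = []  # (age, byte offset of the row) pairs

for record_id, age, last_name in records:
    offset = table_file.tell()  # where this row starts in the "file"
    table_file.write(f"{record_id},{age},{last_name}\n".encode("utf-8"))
    age_index.append((age, offset))

age_index.sort()  # the index is ordered by age; the table file is not


def fetch_row(offset):
    """Follow a pointer: seek to the offset and read exactly one row."""
    table_file.seek(offset)
    return table_file.readline().decode("utf-8").rstrip("\n").split(",")


# A real index would locate age 34 with a tree search; a linear pass over
# this tiny index keeps the sketch short.
matches = [fetch_row(offset) for age, offset in age_index if age == 34]

print(age_index)
print(matches)
```

The index holds only small (age, offset) pairs, however wide the rows themselves are.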

This often allows indexes to be stored in memory even for tables that are so large they can only
be stored on disk.

Furthermore, B-tree indexes are typically designed so that each node takes up one disk block.
This allows each node to be read in with a single disk operation.
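
Some back-of-the-envelope arithmetic shows why this matters. The numbers below (4096-byte blocks, 16 bytes per index entry) are assumptions chosen for illustration, not figures for any particular database, and the estimate ignores node headers and partially filled nodes.

```python
def estimated_height(num_rows, block_size_bytes=4096, entry_size_bytes=16):
    """Estimate fan-out and tree height when each node fills one disk block."""
    fanout = block_size_bytes // entry_size_bytes  # entries that fit in one node
    height = 1
    capacity = fanout  # entries reachable with a tree of the current height
    while capacity < num_rows:
        capacity *= fanout
        height += 1
    return fanout, height


for n in (1_000_000, 1_000_000_000):
    fanout, height = estimated_height(n)
    print(f"{n} rows: fan-out {fanout}, about {height} block reads per lookup")
```

Under these assumptions a billion-row index is only about four levels deep, so a lookup costs a handful of disk reads rather than a scan of the whole table.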

Also, for the pedants among us, many databases use B+ trees rather than classic B-trees for
generic database indexes. InnoDB’s BTREE index type is closer to a B+ tree than a B-tree, for
example.

Summary

Database indexes are auxiliary data structures that allow for quicker retrieval of data. The most
common type of index is a B-tree index because it has very good general performance
characteristics and allows a wide range of comparisons, including both equality and inequalities.
The penalty for having a database index is the cost required to update the index, which must
happen any time the table is altered. There is also a certain amount of space overhead, although
indexes will be smaller than the table they index.

For specific data types different indexes might be better suited than a B-tree. R-trees, for
example, allow for quicker retrieval of spatial data. For fields with only a few possible values
bitmap indexes might be appropriate.
