Analytic Top-N Queries
One of the more advanced tricks I like to exploit is the analytic
Top-N query. Although I have been using them for quite a while, I
recently discovered a “limitation” that I was not aware of.
Actually—to be honest—it’s not a limitation; it is a missing
optimization in a rarely used feature that can easily be worked
around. I must admit that I am asking for quite a lot in that case.
The article starts with a general introduction to Top-N
queries, applies that technique to analytic queries, and
explains the case where I miss an optimization. But is it
really worth all that effort? The article concludes with my
answer to that question.
Please find the CREATE and INSERT statements at the end of
the article.
Top-N Queries
Top-N queries are queries for the first N rows according to a
specific sort order—e.g., the first three rows like that:
select * from (
  select start_key, group_key, junk
  from demo
  where start_key = 'St'
  order by group_key
) where rownum <= 3;
That’s well known and very straightforward. However, the
interesting part is performance—as usual. A naïve
implementation executes the inner SQL first—that is, it fetches
and sorts all the matching records—before limiting the result
set to the first three rows. In the absence of a useful index, that
is exactly what happens:
START_KEY GROUP_KEY JUNK
---------- --------- ----------
St 1 junk
St 3 junk
St 10 junk
3 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 142682949

---------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost |
---------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |     3 |  1032 | 8240 |
|*  1 |  COUNT STOPKEY           |      |       |       |      |
|   2 |   VIEW                   |      |   370 |   124K| 8240 |
|*  3 |    SORT ORDER BY STOPKEY |      |   370 | 76960 | 8240 |
|*  4 |     TABLE ACCESS FULL    | DEMO |   370 | 76960 | 8239 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=3)
   3 - filter(ROWNUM<=3)
   4 - filter("START_KEY"='St')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
30370 consistent gets
30365 physical reads
0 redo size
998 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
3 rows processed
A full table scan is performed—more on that in a few
seconds—to retrieve all the rows that match the where
clause; about 370 according to the optimizer’s estimate. The
next step sorts the entire result set. Finally, the limit is applied—
the COUNT STOPKEY step—and the number of rows is reduced
to three.
The performance problem of this query is obviously the full
table scan. Let’s create an index to make it go away:
create index demo_idx on demo (start_key);
exec dbms_stats.gather_index_stats(null, 'DEMO_IDX');
That’s much better:
START_KEY GROUP_KEY JUNK
---------- --------- ----------
St 1 junk
St 3 junk
St 10 junk
3 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1129354520

--------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     3 |  1032 |  372 |
|*  1 |  COUNT STOPKEY                 |          |       |       |      |
|   2 |   VIEW                         |          |   370 |   124K|  372 |
|*  3 |    SORT ORDER BY STOPKEY       |          |   370 | 76960 |  372 |
|   4 |     TABLE ACCESS BY INDEX ROWID| DEMO     |   370 | 76960 |  371 |
|*  5 |      INDEX RANGE SCAN          | DEMO_IDX |   370 |       |    3 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=3)
   3 - filter(ROWNUM<=3)
   5 - access("START_KEY"='St')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
360 consistent gets
201 physical reads
0 redo size
998 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
3 rows processed
You can see that the full table scan was replaced by an index
lookup and the corresponding table access. The other steps
remain unchanged.
However, this is still a bad execution plan because all
matching records are fetched and sorted just to throw most
of them away. The following index allows a much better
execution plan:
drop index demo_idx;
create index demo_idx on demo (start_key, group_key);
exec dbms_stats.gather_index_stats(null, 'DEMO_IDX');
The new execution plan looks like this:
ID START_KEY GROUP_KEY
---------- ---------- ---------
936196 St 1
232303 St 3
759212 St 10
3 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1891928015

--------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     3 |   465 |    7 |
|*  1 |  COUNT STOPKEY                 |          |       |       |      |
|   2 |   VIEW                         |          |     3 |   465 |    7 |
|   3 |    TABLE ACCESS BY INDEX ROWID | DEMO     |     3 |    36 |    7 |
|*  4 |     INDEX RANGE SCAN           | DEMO_IDX |   370 |       |    3 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=3)
   4 - access("START_KEY"='St')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
609 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
Well, that is efficient. The sort operation has vanished entirely
because the index definition supports the ORDER BY clause.
Even more powerful: the STOPKEY takes effect down to
the index range scan. You can see the reduced number of
table accesses in the plan. Although not visible in the
execution plan, the index range scan is also aborted after
fetching the first three records.
Well, that optimization has been in the Oracle database for quite a
while—at least since 8i, I guess. After that preparation, I can
demonstrate what 10R2 has to offer on top of that.
Analytic Top-N Queries
It is actually the very same story with a small extension: I
don’t want to retrieve the first N rows, but all the rows
where the group_key value is at its minimum for the
respective start_key. A very straightforward solution is this:
select id, start_key, group_key
from demo
where start_key = 'St'
and group_key = (select min(group_key)
from demo
where start_key = 'St'
);
That statement is perfectly legal—even performance-wise:
ID START_KEY GROUP_KEY
---------- ---------- ---------
936196 St 1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1142136980

--------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |    12 |    8 |
|   1 |  TABLE ACCESS BY INDEX ROWID   | DEMO     |     1 |    12 |    5 |
|*  2 |   INDEX RANGE SCAN             | DEMO_IDX |     1 |       |    3 |
|   3 |    SORT AGGREGATE              |          |     1 |     7 |      |
|   4 |     FIRST ROW                  |          |     1 |     7 |    3 |
|*  5 |      INDEX RANGE SCAN (MIN/MAX)| DEMO_IDX |     1 |     7 |    3 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("START_KEY"='St' AND "GROUP_KEY"=
              (SELECT MIN("GROUP_KEY") FROM "DEMO" "DEMO"
               WHERE "START_KEY"='St'))
   5 - access("START_KEY"='St')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
550 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Analytic functions
Analytic functions can perform calculations on the basis of multiple rows.
They are, however, not to be confused with aggregate functions: analytic
functions work without GROUP BY. A very typical use for analytic functions
is a running balance; that is, the sum of all rows up to and including the
current row.
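For illustration only, a running balance over the demo table of this article
could look like the following sketch (the alias running_balance is made up):

select id, group_key,
       -- running balance: sum of group_key over all rows up to and
       -- including the current row, in order of id
       sum(group_key) over (order by id) running_balance
  from demo
 where start_key = 'St';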
The function used in the example (dense_rank) returns the rank of the
current row according to the supplied OVER(ORDER BY) clause—that is, in
turn, not to be confused with a regular ORDER BY.
orafaq.com has a nice intro to Oracle analytic functions.
The first step is to fetch the smallest group_key. Because of
the min/max optimization in combination with a well-suited
index, the database doesn’t need to sort the data—it just picks
the first record from the index, which must be the smallest
anyway. The second step is to perform a regular index lookup
for the start_key and the group_key that was just retrieved
from the sub-query.
Another possible implementation for that is to use
an analytic function:
select * from (
  select id, start_key, group_key,
         dense_rank() OVER (order by group_key) rnk
  from demo
  where start_key = 'St'
) where rnk <= 1;
Do you recognize the pattern? It is very similar to the
traditional Top-N query that was described at the beginning
of this article. Instead of limiting on
the rownum pseudocolumn we use an analytic function. The
execution plan reveals the performance characteristic of
that statement:
ID START_KEY GROUP_KEY RNK
---------- ---------- --------- ----------
936196 St 1 1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3221234897

--------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |   370 | 62160 |  374 |
|*  1 |  VIEW                          |          |   370 | 62160 |  374 |
|*  2 |   WINDOW NOSORT STOPKEY        |          |   370 |  4440 |  374 |
|   3 |    TABLE ACCESS BY INDEX ROWID | DEMO     |   370 |  4440 |  373 |
|*  4 |     INDEX RANGE SCAN           | DEMO_IDX |   370 |       |    3 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=1)
   2 - filter(DENSE_RANK() OVER ( ORDER BY "GROUP_KEY")<=1)
   4 - access("START_KEY"='St')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
610 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
What was the COUNT STOPKEY operation for the traditional
Top-N query has become the WINDOW NOSORT
STOPKEY operation for the analytic function. However, the
expected number of rows is not known to the optimizer—any
number of rows could have the lowest group_key value. Still,
the index range scan is aborted once the required rows have
been fetched. On the one hand, the consistent gets are even
better than with the sub-query statement. On the other hand,
the cost value is higher. Whenever you use analytic
functions, run a benchmark to know the actual
performance.
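For example, a quick comparison in SQL*Plus (presumably the tool that
produced the statistics shown above) can be as simple as this sketch:

-- show elapsed time and the execution statistics for each statement
set timing on
set autotrace traceonly statistics

-- then run the sub-query variant and the analytic variant back to back
-- and compare elapsed time, consistent gets and physical reads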
Let’s spend a few thoughts on this optimization. The
database knows that the index order corresponds to
the OVER (ORDER BY) clause and avoids the sort
operation. But even more impressive is that it can abort
the range scan as soon as the first value that doesn’t match
the rnk <= 1 expression is fetched. That is only possible
because the dense_rank() function cannot decrease if the
rows are fetched in the order of the OVER (ORDER BY) clause.
That’s impressive, isn’t it?
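To see why, the ranks can be listed in fetch order; a small sketch against
the demo table:

select group_key,
       dense_rank() over (order by group_key) rnk
  from demo
 where start_key = 'St'
 order by group_key;

-- rnk can only stay the same or grow from one row to the next,
-- so once it exceeds 1, no later row can qualify for rnk <= 1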
Mass Top-N Queries
The next step towards the issue that made me write this
article is a mass Top-N query. With the previous
statement as a basis, it is actually quite simple: just remove
the inner where clause to get the result for
all start_key values and add a partition clause to make sure
the rank is built individually for each start_key:
select * from (
  select start_key, group_key, junk,
         dense_rank() OVER (partition by start_key
                            order by group_key) rnk
  from demo
) where rnk <= 1;
Declaring the partition is required to make sure that
start_keys that don’t have a group_key of one still
show up, with their lowest group_key value.
With that query, we have reached the end of the optimizer’s
smartness—as of release 11r2. At first sight, the plan is
not surprising:
3260 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1766530486

-----------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |  1000K|   340M| 8239 |
|*  1 |  VIEW                     |      |  1000K|   340M| 8239 |
|*  2 |   WINDOW SORT PUSHED RANK |      |  1000K|   198M| 8239 |
|   3 |    TABLE ACCESS FULL      | DEMO |  1000K|   198M| 8239 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=1)
   2 - filter(DENSE_RANK() OVER ( PARTITION BY "START_KEY"
              ORDER BY "GROUP_KEY")<=1)
Statistics
----------------------------------------------------------
22 recursive calls
20 db block gets
30370 consistent gets
33163 physical reads
0 redo size
59215 bytes sent via SQL*Net to client
2806 bytes received via SQL*Net from client
219 SQL*Net roundtrips to/from client
0 sorts (memory)
1 sorts (disk)
3260 rows processed
It’s a full table scan. However, a “mass” query performs a
full table scan for a good reason—that did not catch my
attention. What did catch my attention is the following:
select * from (
  select * from (
    select start_key, group_key, junk,
           dense_rank() OVER (partition by start_key
                              order by group_key) rnk
    from demo
  ) where rnk <= 1
) where start_key = 'St';
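Side note: the inner two query blocks can also be wrapped in a view so
that one definition serves both the mass query and the individual query;
a sketch (the view name demo_top is made up and does not appear in the
original statements):

create view demo_top as
select start_key, group_key, junk,
       dense_rank() over (partition by start_key
                          order by group_key) rnk
  from demo;

-- mass query:        select * from demo_top where rnk <= 1;
-- individual query:  select * from demo_top
--                     where rnk <= 1 and start_key = 'St';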
It is actually the individual Top-N query again. This time it is
built on the basis of the mass Top-N query—the part that could
be set up as a view, as sketched above. That way, a single
database view can be used for any mass query as well as for
individual Top-N queries—that’s a maintainability benefit. If the
advanced magic that aborts the index range scan were still
working, it would be extremely efficient as well. The execution
plan proves the opposite:
START_KEY GROUP_KEY JUNK RNK
---------- --------- ---------- ----------
St 1 junk 1
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1309566133

--------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |   370 |   128K|  373 |
|*  1 |  VIEW                          |          |   370 |   128K|  373 |
|*  2 |   WINDOW NOSORT                |          |   370 | 76960 |  373 |
|   3 |    TABLE ACCESS BY INDEX ROWID | DEMO     |   370 | 76960 |  373 |
|*  4 |     INDEX RANGE SCAN           | DEMO_IDX |   370 |       |    3 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=1)
   2 - filter(DENSE_RANK() OVER ( PARTITION BY "START_KEY"
              ORDER BY "GROUP_KEY")<=1)
   4 - access("START_KEY"='St')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
363 consistent gets
0 physical reads
0 redo size
808 bytes sent via SQL*Net to client
419 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Although no sort is required, the STOPKEY has disappeared
from the WINDOW NOSORT operation. That means that the full
index range scan is performed, for all 359 rows where
start_key='St'. On top of that, the number of consistent
gets is rather high. A closer look at the execution plan
reveals that the entire row is fetched from the
table before the filter on the analytic expression is applied.
The junk column that is fetched from the table is not
required to evaluate this predicate; it would be
possible to fetch that column only for those rows that pass
the filter.
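A possible manual workaround (just a sketch of the idea, not something the
optimizer does on its own) is to rank on the indexed columns only and join
back to the table for the few surviving rows. Whether the inner query block
is really answered from the index alone is up to the optimizer:

select d.start_key, d.group_key, d.junk, t.rnk
  from (
        -- the ranking needs only start_key, group_key and the rowid,
        -- all of which could be taken from demo_idx
        select rowid rid,
               dense_rank() over (partition by start_key
                                  order by group_key) rnk
          from demo
         where start_key = 'St'
       ) t
  join demo d
    on d.rowid = t.rid
 where t.rnk <= 1;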
The “premature table access” is the reason why the full
table scan is more efficient for the mass query than an index
full scan. Have a look at the (hinted) index full scan
execution plan for the mass query:
3260 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1402975529

---------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |  1000K|   340M| 1002K |
|*  1 |  VIEW                          |          |  1000K|   340M| 1002K |
|*  2 |   WINDOW NOSORT                |          |  1000K|   198M| 1002K |
|   3 |    TABLE ACCESS BY INDEX ROWID | DEMO     |  1000K|   198M| 1002K |
|   4 |     INDEX FULL SCAN            | DEMO_IDX |  1000K|       |  2504 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=1)
   2 - filter(DENSE_RANK() OVER ( PARTITION BY "START_KEY"
              ORDER BY "GROUP_KEY")<=1)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1002692 consistent gets
817172 physical reads
0 redo size
59215 bytes sent via SQL*Net to client
2806 bytes received via SQL*Net from client
219 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3260 rows processed
The expensive step in this execution plan is the table access.
If the table access were moved up to take place after the
window filter, the cost for this step would be 3260 (one for
each fetched row). The total cost for the plan would
probably stay below 7000; that is, lower than the cost of the
full table scan plan.
Conclusion
Just to re-emphasize the motivation behind the view that can
serve both needs: it is about multidimensional optimization—
and that has nothing to do with OLAP!
Typically, performance optimization takes only one
dimension into account; that is, performance. So far so good,
but what about long-term maintenance? Very often,
performance optimization reduces the maintainability of the
software. That’s not a coincidence; it’s because
maintainability is the only degree of freedom left during
optimization. Unfortunately, reduced maintainability is very
hard to notice. If it is noticed at all, it is probably years later.
I have been in both worlds for some years—operations and
development—and try to optimize for all dimensions
whenever possible because all of them are important for the
business.
Create and Insert Statements
To try it yourself:
create table demo (
id number not null,
start_key varchar2(255) not null,
group_key number not null,
junk char(200),
primary key (id)
);
insert into demo (
select level,
dbms_random.string('A', 2) start_key,
trunc(dbms_random.value(0,1000)),
'junk'
from dual
connect by level <= 1000000
);
commit;
exec DBMS_STATS.GATHER_TABLE_STATS(null, 'DEMO');
My tests were conducted on 11R2.
Finding the Best Match With a Top-N Query
In Performance on 2010-09-29 at 11:16
There was an interesting index related performance problem
on Stack Overflow recently. The problem was to check an
input string against a table that holds about 2000 prefix
patterns (e.g., LIKE 'xyz%'). A fast select is needed that
returns one row if any pattern matches the input string, or
no row otherwise.
I believe my solution is worth a few extra words to explain it
in more detail. Even though it’s a perfect fit for Use The
Index, Luke, it’s a little early to put it there as an exercise. It
is, however, a very good complement to my previous
article, Analytic Top-N Queries—so I put it here.
Although the problem was raised for a MySQL database, my
solution applies to all databases that can properly optimize
Top-N queries.
The original SQL statement in the question was like that:
select 1
from T1
where 'fxg87698x84' like concat (C1, '%')
T1.C1 is the column that holds the prefix patterns—one per
row. Although a prefixed LIKE filter can use an index range
scan, the problem is that it is the wrong way around: it’s not
searching for a string that matches the pattern, it’s
searching for a pattern that matches the string.
The query, as written, must check all the patterns against
the string, e.g., by a full table scan or a (fast) full index scan.
Either way, it’s always a full scan. Can that be improved?
Let’s start step-by-step. The simplest case is that the exact
input string is a pattern in the table. A SQL statement to
check for the exact pattern is very simple:
select C1
from T1
where C1 = 'fxg87698x84'
The next case is that the exact pattern doesn’t exist in the
table, but a prefix pattern that matches the input string does.
That pattern must be shorter than the input string—
otherwise it cannot match. Because we aim to solve the
problem with an index, let’s imagine the patterns as they
would be stored in an index:
axt3
fxg
<- place where 'fxg87698x84' would be
tru56
If the exact pattern doesn’t exist, the preceding index entry
is the best possible match (precondition: no overlapping
patterns exist). That’s because shorter strings are
considered “smaller” when sorted. So, let’s extend the select
to find the preceding record if the exact pattern is not in the
table:
select C1
from T1
where C1 <= 'fxg87698x84'
order by c1 desc
limit 1
The less than or equals condition will match the exact
pattern, if it exists, and all entries that precede it. The
reverse ORDER BY clause makes sure that the index is
traversed upwards. In conjunction with the where clause, it
means that the tree traversal is done to find the input string,
and the leaf node scan continues upwards from there.
The LIMIT 1 clause is the MySQL way to make a Top-N
query so that the leaf node scan aborts after the first record.
Voilà, this statement will return the best candidate pattern
(or none at all) by performing a very small index range scan.
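This, of course, assumes an index on the pattern column; something along
these lines (the index name is made up):

-- the range condition (C1 <= ...) and the ORDER BY C1 DESC ... LIMIT 1
-- can both be served from this index
create index t1_c1_idx on T1 (C1);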
The final case we need to take care of is that no pattern
matches the input string. There are two sub-variants:
(a) the input string would sort before the very first entry in
the index; in that case the Top-N query will not return any
row and we are done; (b) the Top-N query returns a pattern
that is not a prefix of the input string. That can be handled
by wrapping the Top-N query and filtering its result through
the original LIKE expression:
select 1
from (
select C1
from T1
where C1 <= 'fxg87698x84'
order by C1 desc
limit 1
) tmp
where 'fxg87698x84' like concat (C1, '%')
Done.
Simple? With a good understanding of index fundamentals,
it is simple! That’s why I am writing a Web-Book about
indexing basics: Use The Index, Luke!. Funny enough, the
basics are the same for all databases—we all put our pants
on one leg at a time.
Closing Note
The precondition for all that is that there are no overlapping
patterns in the table. E.g., the statement doesn’t work with
the following patterns:
axt3
fxg
fxg1
<- place where 'fxg87698x84' would be
tru56
In that case, the closest entry doesn’t match although there
is a matching entry. However, the FXG entry matches
everything that FXG1 can possibly match—the two patterns
are overlapping.
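Whether that precondition holds can be verified with a self-join over the
pattern table; a sketch in the same MySQL flavor as the rest of the example:

-- list all overlapping pattern pairs; an empty result means
-- the precondition holds
select shorter.C1 as covering_pattern,
       longer.C1  as covered_pattern
  from T1 shorter
  join T1 longer
    on longer.C1 like concat(shorter.C1, '%')
   and longer.C1 <> shorter.C1;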
Second Closing Note
The original problem posted on Stack Overflow mentioned
that this lookup must be performed 1 million times—within
half an hour. The author did not mention if that target was
reached, nor if the process is single-threaded.
However, considering the overall problem, the most
efficient solution in terms of computing resources would
probably be to sort both sets—the patterns and the input
strings—and implement a manual merge. But that’s probably
much more effort to implement. The index solution is very
efficient in terms of human resources. Which solution is best
for the business is up to the company to decide.
Choosing NoSQL For The Right Reason
In Performance, Reliability, Scalability on 2011-05-13 at 09:42
Observing the NoSQL hype through the eyes of an SQL
performance consultant is an interesting experience. It is,
however, very hard to write about NoSQL because there are
so many forms of it. After all, NoSQL is nothing more than a
marketing term. A marketing term that works pretty well
because it goes to the heart of many developers that
struggle with SQL every day.
My unrepresentative observation is that NoSQL is often chosen
for performance reasons, probably because SQL performance
problems are an everyday experience. NoSQL, on the other
hand, is known to “scale well”. However, performance is
often a bad reason to choose NoSQL—especially if the side
effects, like eventual consistency, are poorly understood.
Most SQL performance problems result out of improper
indexing. Again, my unrepresentative observation. But I
believe it so strongly that I am writing a book about SQL
indexing. But indexing is not only a SQL topic, it applies to
NoSQL as well. MongoDB, for example, claims to support
“Index[es] on any attribute, just like you’re used to“. Seems
like there is no way around proper indexing—no matter if you
use SQL or NoSQL. The latest release of my book, “Response
Time, Throughput and Horizontal Scalability“, describes that
in more detail.
Performance is—almost always—the wrong reason for NoSQL.
Still there are cases where NoSQL is a better fit than SQL. As
an example, I’ll describe a NoSQL system that I use almost
every day. It is the distributed revision control system Git.
Wait! Git is not NoSQL? Well, let’s have a closer look.
Git doesn’t have an SQL front end
Git has specialized interfaces to interact with the
repository, either on the command line or integrated
into an IDE. There isn’t anything that remotely compares
to SQL or a relational model. I never missed it.
Git doesn’t use an SQL back-end
Honestly, if I had to develop a revision control
system, I wouldn’t take an SQL database as the back-end.
There is no benefit in putting BLOBs into a relational
model, and handling BLOBs all the time is just too
awkward.
Git is distributed
That’s my favourite Git feature. Working offline is
exactly what is meant by ‘partition tolerance’
in Brewer’s CAP Theorem. I can use all Git features
without Internet connection. Others can, of course, still
use the server if they can connect to it. Full functionality
on either end. It is partition tolerant.
Conflicts happen anyway
If there is one thing we learned in the 25 years
since Larry Wall introduced patch, it is that conflicts
happen. No matter what. Software development has a
very long “transaction time” and we are mostly using
optimistic locking—conflicts are inevitable. But here
comes the famous CAP Theorem again. If we cannot
have consistency anyway, let’s focus on the other two
CAP properties: availability and partition tolerance.
Acknowledging inconsistencies means to take care of
methods and tools to find and resolve them. That
involves the software (e.g., Git) as well as the user. But
here comes one last unrepresentative observation from
my side: most NoSQL users just ignore that. They
assume that the system magically resolves
contradicting writes. It’s like using a CVS
workflow with Git—it works for a while, but you’ll end up
in trouble soon.
I’m not aware of a minimum feature set for NoSQL datastores
—it’s therefore hard to tell whether Git fulfils it or not. However,
Git feels to me like using NoSQL for the right reason.
It’s about choosing the right tool for the job. But I can’t get
rid of the feeling that NoSQL is too often taken for the wrong
reasons—query response time, in particular. No doubt,
NoSQL is a better fit for some applications. However,
an index review would often solve the performance problems
within a few days. SQL is no better than NoSQL, nor vice-
versa. Because the question is not what’s better. The
question is what is a better fit for a particular problem.
6 Responses
1. “Most SQL performance problems result out of improper indexing.”
I’m sorry, but that is simply not true. The main reason you run into
performance problems with relational databases is because the data
model has problems. Indices are more like “inherent opportunities”
to speed things up here and there, but the main performance comes
from understanding the data, understanding the access patterns
and understanding how the SQL engine will calculate a plan. And
then designing a relational model that will balance these things out.
Designing a relational model that performs well is hard. Which is
why there are so few database professionals available who seem
able to do this well. It requires deep understanding of both the
domain that needs to be modeled as well as the technology it will
run on.
Just tweaking indices is going to work for a small subset of problems,
but the real performance gains are made during the design of the
relational model you will use. And unfortunately, they do not teach
this in school.
Also, the whole NoSQL vs SQL debate is artificial. There is no real
debate: they address entirely different classes of problems. It can be
boiled down to this: SQL is about consistency and flexibility during
querying at the expense of scalability and performance.
NoSQL is about “performance at scale” and flexibility when writing
data at the expense of consistency and querying flexibility. Note
that I said “performance at scale”, because many of the NoSQL
databases do not have particularly impressive performance for small
datasets.
Sacrificing absolute consistency is hard, but as it turns out, for a lot
of “new” problems lack of consistency is less of a problem than long
response times. The “new” problems here are online systems with
massive numbers of users. For some classes of companies you can
feel this directly. Most online banks are still pretty slow. Sites like
Amazon, on the other hand are quite snappy given that they have a
lot more web traffic than any bank. (And when it comes to online
commerce, every millisecond counts).
A system can be said to be scalable when the cost of increasing its
size is sublinear with respect to the dimension you need to scale.
Since the relational model is inherently expensive to apply in a
distributed manner, it is relatively easy to show that you cannot get
sublinear cost for arbitrary scale along any dimension.
However, with certain sacrifices you can get sublinear cost. For
instance by breaking the relational model somewhat and
partitioning the data into independent instances that have no
dependencies on other instances.
Note that when we say “cost” we mostly talk in terms of latency and
processing power. Not dollars. Although it will end up costing dollars.
SQL has its place and NoSQL has its place, but it is important to
understand that they address different types of problems. I have
worked at companies that have naively used SQL databases for
NoSQL type problems and vice versa. It is unhelpful that people
keep comparing them directly instead of trying to develop and
disseminate the kind of knowledge needed to reason about this.
Also it doesn’t help that Stonebraker et al., to draw attention to
themselves, muddy the waters and confuse the issues by planting
the idea that NoSQL is somehow the antithesis of SQL. In fact the
label “NoSQL” has been incredibly unhelpful because it suggests
that there is a problem with SQL and that NoSQL is the magic
solution. This is, at best, naive. And unfortunately leads people to
get hung up on the wrong ideas.
Gruntle Grüber, 14 May 2011 at 10am
“The main reason you run into performance problems with
relational databases is because the data model has problems.”
Well, I made another observation. In fact, everybody knows that
database design is important and must be done carefully. There
are many books covering that in more or less detail. I do not say
that database design is not important, but I say it’s usually done
carefully anyway because everybody knows it’s important.
What I find at client sites is that developers are not aware how to
index properly and how to write queries that can benefit from
indexing. The DBAs, on the other hand, know about indexing but
don’t have the deep domain knowledge to know how the data is
queried.
Nobody ignores schema design but indexing is almost always
ignored until it’s too late. Adding some more or less random
indexes might improve the situation, but it is exactly what I refer
to as “improper” indexing. Indexing without a plan. In fact, my
position is that indexing must be designed with the same care as
the schema.
I pretty much agree with your statements about NoSQL.
Markus Winand, 14 May 2011 at 5pm
“No doubt, NoSQL is a better fit for some applications.”
Other than performance, could you provide examples of
applications that are more suitable to NoSQL semantics? SQL has
all other advantages, like persistence frameworks, consistency,
and other tools. NoSQL solutions like Cassandra are targeted at
scaling writes. Other than sharding, how would you scale
writes on an RDBMS?
Deniz Oguz, 16 May 2011 at 12pm
Well, in the absence of a definition for “NoSQL semantics”, I like to
see the CAP Theorem as the central star that NoSQL systems orbit
around.
That said, I believe that any application where partition tolerance
is more important than consistency is a good fit for NoSQL.
Partition tolerance seems to be poorly understood in the field. I
took the Git example because many developers know what
“distributed” means in the context of Git. I could have taken any
other distributed, partition-tolerant revision control system for
that purpose. Source code repositories are a particularly good fit
because they hardly ever reach consistency anyway.
The question—what is more important, partition tolerance or
consistency—depends on the data. Huge social networks have to
cope with tons of data that has very little value. Strict
consistency doesn’t pay off for that. The damage caused by
conflicts is little compared to the costs to establish strict
consistency. That argument is, however, nonexistent for small
sites because consistency is easy to achieve there.
I also mentioned BLOBs in the article. As a software architect, I
have been involved in many discussions where to store low
value binary data like user uploads. I remember a meeting
where the DBA smashed the proposal to use BLOBs for vast
amounts of user data by proclaiming that “BLOBs have no
business in my database”. BLOBs have, quite often, little value—
even if connected to high value relational data. From that angle,
I believe that some NoSQL systems make a great distributed
BLOB store which can coexist with a relational database.
Scaling writes is subject to Brewer’s CAP Theorem—take two out
of three. Sharding and similar methods bypass it by not
distributing the data at large—that is, only a small subset of the
nodes is responsible for maintaining a particular subset of the data.
I feel, however, that the need to scale out is constantly
decreasing for most applications. I have observed a multi-
national banking system over the past decade. It was initially
running on a huge two node active-active cluster to distribute
load. A few years later, it was moved to a hot-standby cluster—
just one node active. Today, it’s being migrated to a virtualized
server running other databases on the same hardware. “Scale-
out” is not the trend—virtualization is, at least in enterprise
environments. Huge social sites being the obvious exception.
Markus Winand, 16 May 2011 at 3pm
2. Nice post, Git being close to a NoSQL system shows how little the
word “NoSQL” actually means.
Giorgio, 16 May 2011 at 1pm
3. […] Winand made the case earlier this year that the version control
system Git is actually a NoSQL datastore. The blog […]