CH 12

The document discusses various concepts related to database indexing, including the reasons for not keeping multiple indices, the distinction between clustering and secondary indices, and the implications of using dense versus sparse indices. It also covers hashing techniques, bucket overflow causes, and the efficiency of B+-trees for range queries. Additionally, it addresses methods for optimizing B+-tree structures and computing existence bitmaps while considering null values.

// vim: spl=en spell tw=80 encoding=utf-8:

12.1 Since indices speed query processing, why might they not be kept on
several search keys? List as many reasons as possible.

----

* indices take up space, and keeping them up to date adds overhead on every
insert, update, and delete, so an index may not be worth maintaining
* on search keys with many entries but few distinct values, an index is of
little help: a full table scan is often more efficient, since it reads
sequentially with far fewer seeks
* when several indices could answer a query, it may not be clear to the
optimizer which one is the best choice

12.2 Is it possible in general to have two clustering indices on the same
relation for different search keys? Explain your answer.

----

No. A clustering index defines the on-disk ordering of the relation. Since
different search keys generally impose different orderings, it is in general
not possible to have two clustering indices on the same relation.

12.13 When is it preferable to use a dense index rather than a sparse index?
Explain your answer.

----

A dense index is preferable when most queries are equality queries; when we
want to combine several indices in an "in-memory bitmap scan", which needs one
index entry per record; and whenever the index is a secondary index, since a
secondary index cannot be sparse (the file is not ordered on its search key).
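The in-memory bitmap scan mentioned above can be sketched as follows. This is an illustrative toy, not a real access method: the record positions and the `to_bitmap` helper are hypothetical, and a real system would use compressed bitmaps over record IDs.

```python
def to_bitmap(matching_positions, n_records):
    """Build a bitmap with a 1 at each matching record position."""
    bm = [0] * n_records
    for p in matching_positions:
        bm[p] = 1
    return bm

# Hypothetical: a dense index on A says records {0, 2, 5} match its
# condition, and a dense index on B says records {2, 3, 5} match its.
a = to_bitmap([0, 2, 5], 8)
b = to_bitmap([2, 3, 5], 8)

# AND the bitmaps in memory; only the surviving records are fetched.
both = [x & y for x, y in zip(a, b)]
print([i for i, bit in enumerate(both) if bit])  # -> [2, 5]
```

This only works because both indices are dense: a sparse index has no entry for most records, so it cannot produce a per-record bitmap.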

12.14 What is the difference between a clustering index and a secondary index?

----

A clustering index is closely tied to the ordering of the tuples on disk; in
fact, it defines that ordering. A secondary index is a separate side structure
that gives us pointers to the tuples we want, without constraining their
physical order.

12.16 The solution presented in Section 12.5.3 to deal with non-unique search
keys added an extra attribute to the search key. What effect does this change
have on the height of the B+-tree?

----

A larger search key means that fewer keys fit into each node, so the fanout
decreases and the height of the tree (which is logarithmic in the number of
records, with the fanout as the base) can increase. In practice the effect is
often small: the extra attribute is generally short compared to the rest of
the key, so the overhead may not be significant.
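A rough back-of-the-envelope check of the claim above, using the standard height estimate h ≈ ceil(log_fanout(n)). The entry count and fanout values are hypothetical; the point is that a modest drop in fanout often leaves the height unchanged, and only a large drop adds a level.

```python
import math

def bptree_height(n_entries, fanout):
    """Approximate B+-tree height: ceil(log base fanout of n_entries)."""
    return math.ceil(math.log(n_entries, fanout))

N = 10_000_000  # hypothetical number of index entries

print(bptree_height(N, 100))  # original fanout        -> 4
print(bptree_height(N, 90))   # slightly wider keys    -> 4 (unchanged)
print(bptree_height(N, 50))   # much wider keys        -> 5 (one more level)
```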

12.17 Explain the distinction between closed and open hashing. Discuss the
relative merits of each technique in database applications.

----

In closed hashing, when a bucket is full, new tuples that hash to it are
placed in overflow buckets linked off the full bucket (chaining: a linked list
of overflow buckets is created). In open hashing, when a bucket is full,
tuples that hash to it are placed in other, already existing buckets, found by
some kind of probing for a free slot.

Open hashing has the principal advantage that even when the hash function
distributes values poorly across the buckets, no space beyond the fixed hash
table is required. On the other hand, it does not allow deletes or updates in
an easy manner, since removing an entry can break a probe chain.

Closed hashing allows deletes and updates, but requires extra space for
overflow buckets even while some primary buckets are not full, or are empty.
It also has the disadvantage that following an overflow chain usually costs an
additional disk access per overflow bucket, whereas with linear-probing open
hashing one can fetch two or three adjacent buckets in a single sequential
read, which almost guarantees the required entry is found in that one access.
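The two schemes can be sketched in a few lines. This is an in-memory Python toy, not the disk-based bucket-of-tuples structure the answer describes; the class names and fixed table sizes are my own assumptions.

```python
class ChainedHash:
    """Closed hashing: each bucket keeps an overflow chain (here, a list)."""
    def __init__(self, n_buckets):
        self.buckets = [[] for _ in range(n_buckets)]

    def insert(self, key):
        # a full primary bucket simply grows its chain
        self.buckets[hash(key) % len(self.buckets)].append(key)

    def contains(self, key):
        return key in self.buckets[hash(key) % len(self.buckets)]


class ProbedHash:
    """Open hashing with linear probing: on collision, scan for a free slot."""
    def __init__(self, n_slots):
        self.slots = [None] * n_slots

    def insert(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None:   # assumes the table is not full
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def contains(self, key):
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            if self.slots[i] == key:
                return True
            if self.slots[i] is None:      # a hole: key cannot be past here
                return False
            i = (i + 1) % len(self.slots)
        return False
```

Note how `ProbedHash.contains` stops at the first empty slot: this is exactly why naive deletion breaks open hashing, since it can create a hole in the middle of a probe chain.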

12.18 What are the causes of bucket overflow in a hash file organization? What
can be done to reduce the occurrence of bucket overflows?

----

(this question is a bit silly...) Bucket overflows can occur because the
number of buckets is insufficient, or because the hash function is badly
chosen for the expected data, producing skew. To reduce them, one can size the
hash table properly (with some headroom) or use dynamic hashing.

With dynamic hashing, the hash table grows its number of buckets as needed,
without rehashing the entire file.

12.19 Why is a hash structure not the best choice for a search key on which
range queries are likely?

----

While B+-trees keep similar key values clustered together (in order), hash
functions, by design, spread values across the buckets. A range query on a
hash table would therefore touch buckets scattered over different areas of
the disk, requiring many random I/Os to process.
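The scatter is easy to see. A small toy, using md5 only so the result is deterministic across runs; the particular bucket numbers are meaningless, the point is that consecutive keys do not land in consecutive buckets.

```python
import hashlib

def bucket(key, n_buckets=8):
    """Deterministic toy hash: map a key to one of n_buckets buckets."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets

# Keys 20..25 form a tiny range query; their buckets are scattered,
# so each one may sit in a different area of the disk.
print([bucket(k) for k in range(20, 26)])
```

A B+-tree would store these six keys in one or two adjacent leaf pages instead.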

12.20 Suppose there is a relation R(A,B,C), with a B+-Tree index with search key
(A,B).

a. What is the worst case cost of finding records satisfying 10 < A < 50 using
this index, in terms of the number of records retrieved n1 and the height h of
the tree?

b. What is the worst case cost of finding records satisfying 10 < A < 50 AND
5 < B < 10 using this index, in terms of the number of records n2 that satisfy
this selection, as well as n1 and h defined above?

c. Under what conditions on n1 and n2 would the index be an efficient way of
finding records satisfying 10 < A < 50 AND 5 < B < 10?

----

a. h + 1 + n1 seeks and block transfers: h to descend the tree, one for the
first leaf, and one random access per qualifying record.

b. The same traversal as in (a); all n1 index entries with 10 < A < 50 must
still be scanned, but in the worst case the n2 records satisfying both
conditions are also fetched, so roughly h + 1 + n1 + n2.

c. When the index is a clustering index: then the n1 + n2 accesses become
sequential block transfers instead of seeks, which is far cheaper.
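The arithmetic above can be written out explicitly. This is a hypothetical cost model of my own in which seeks and transfers are collapsed into a single access count, as in the answers; the numbers plugged in are made up.

```python
def worst_case_a(h, n1):
    # descend the tree (h), read the first leaf (+1), then fetch
    # each of the n1 records satisfying 10 < A < 50 individually
    return h + 1 + n1

def worst_case_b(h, n1, n2):
    # all n1 index entries with 10 < A < 50 are still scanned, and the
    # n2 records also satisfying 5 < B < 10 are fetched on top of that
    return h + 1 + n1 + n2

# hypothetical tree of height 3, with n1 = 100 and n2 = 10
print(worst_case_a(3, 100))      # -> 104
print(worst_case_b(3, 100, 10))  # -> 114
```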

12.21 Suppose that you have to create a B+-tree index on a large number of
names, where the maximum size of a name may be quite large (say 40 characters)
and the average name is itself long (say 10 characters). Explain how prefix
compression can be used to maximize the average fanout of internal nodes.

----

Since a B+-tree is ordered, it is expected that, at least in lower-level
internal nodes, most keys share a common prefix. An internal node does not
need to store full names: it only needs enough of each key to separate the
subtrees on either side of it. Storing just such a shortest distinguishing
prefix makes each key smaller, leaving room for more keys per node; the larger
fanout in turn leads to shorter trees, which are more efficient to query.
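One common form of this idea is to pick, as the internal-node key, the shortest prefix of the right neighbour that still separates it from the left neighbour. A minimal sketch (the function name and example names are my own):

```python
def shortest_separator(low, high):
    """Shortest prefix `sep` of `high` with low < sep <= high.
    Any such sep can replace the full key in an internal node."""
    assert low < high
    for i in range(1, len(high) + 1):
        sep = high[:i]
        if low < sep <= high:
            return sep
    return high

# instead of storing the 9-character "silverman", store 4 characters:
print(shortest_separator("silberschatz", "silverman"))  # -> 'silv'
print(shortest_separator("brandt", "califieri"))        # -> 'c'
```

With 10-character average names, separators of one to four characters can easily double or triple the number of keys that fit in a node.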

12.22 Why might the leaf nodes of a B+-tree file organization lose
sequentiality? Suggest how the file organization may be reorganized to restore
sequentiality.

---- (yes, my answer to this question is somewhat dubious)

When leaf nodes split, new pages must be allocated, and a newly allocated page
may not be adjacent to the existing leaves. Over many splits and deletions,
leaf pages end up scattered across the file, interleaved with internal-node
pages, so following the leaf chain is no longer a sequential scan.
Sequentiality can be restored by rebuilding the file organization: scan the
leaves in key order and rewrite the records into a fresh, sequentially
allocated run of pages.

12.24 Show how to compute existence bitmaps from other bitmaps. Make sure that
your technique works even in the presence of null values, by using a bitmap for
the value null.

---- (that simple? where is the trick up the sleeve?)

Just OR together the bitmaps for every value of the attribute, plus the bitmap
for the value null, and the result has a 1 for every existing tuple. The null
bitmap is what makes this correct: a tuple whose attribute is null has a 0 in
every per-value bitmap and would otherwise be missed.
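As a sanity check, the computation in full. The attribute values, tuple positions, and function name below are hypothetical.

```python
def existence_bitmap(value_bitmaps, null_bitmap):
    """OR one bitmap per attribute value plus the null-value bitmap:
    a tuple exists iff some bitmap has a 1 in its position."""
    exists = [0] * len(null_bitmap)
    for bm in list(value_bitmaps) + [null_bitmap]:
        exists = [e | b for e, b in zip(exists, bm)]
    return exists

# Hypothetical attribute with values {1, 2} over 4 tuple slots:
# tuples 0-2 exist (tuple 2 has a null attribute), slot 3 is unused.
b1   = [1, 0, 0, 0]   # tuples where the attribute = 1
b2   = [0, 1, 0, 0]   # tuples where the attribute = 2
null = [0, 0, 1, 0]   # tuples where the attribute is null

print(existence_bitmap([b1, b2], null))  # -> [1, 1, 1, 0]
```

Leaving out the `null` bitmap would wrongly mark tuple 2 as nonexistent, which is exactly the trap the exercise warns about.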
