Hash Tables
Now we're going to start a new lecture about hash tables. We'll talk about hash codes, compression functions, and how to resolve collisions. We will discuss three ways of resolving them: linear probing, quadratic probing and double hashing. And we'll talk a little bit about the basic operations. This lecture is not difficult, but the content could be new; that's why you might need to review it later at home. So before starting this lecture, the idea is the following. How can we implement the basic operations in constant time in some structure: looking for an element, deleting an element, or inserting a new element? The idea is that you don't compare the keys with each other. Otherwise the running time could possibly be linear, or, if the keys are sorted, it could be up to log n. But how can we do it in constant time?
The idea is to compute a mathematical function of the key. This mathematical function will evaluate to some index, and then we can access the record directly. This is the ideal case. Okay. So I have a key, I apply a function to this key, and then I get a result. This is the intuition behind this idea.

All right, so let's look at it in more detail. What is a hash table? Some of you will use it in the coursework. It is an effective data structure for implementing a map. Review the last lecture, where we saw a very basic one. What we are interested in here is to store key-value pairs, and we gave several examples, like DNS records. I also gave you the example of a sports centre, where we have sports and members. So the key might not have an integer value, but in a hash table we're going to store them as keys; you'll see how we can do this transformation. Okay. So we have a key and a value. Given a key, I would like to access the value in constant time. How? We can do this using a hash table. Here's an example of a map of size N equal to 11 that contains four items and where the keys are integers. So assume here that the keys are integers, and we simply included in this hash table values or records that are characters, for example; you could include other things. Now, if the keys are not integers in the range zero to N minus one, we use what we call a hash function to keep the values, the result h(k) of applying it to the key k, between zero and N minus one. One of the simplest ways to achieve our aim is to use a mathematical function with the modulus operation. If you compute h(x) = x mod N, we are sure that the result x mod N will be between zero and N minus one. In this case you say that h is your hash function, and h(x) is the hash value of the key x. Now, an ideal hash function is going to map the keys to integers in a random-like manner, so the keys will be evenly distributed over the buckets. We'll see several examples of how to design your hash functions.

All right. So we have two steps here. First, you map your key to an integer. Second step, you map the key into a bucket. And remember that a bucket might have more than one key; we call this a collision, and we'll see how to resolve it.
Because we don't want to have two keys or two elements in the same bucket. A hash function is usually specified as the composition of two functions: you apply h1 to the key, and then another hash function, h2, to the result. Okay, so the first step is the hash code: h1 is going to map keys to integers, because we need to store our key-value pairs in a hash table, and the hash table is accessed using subscripts or indices, which are integers. So this is what we can do: you transform your keys into integers, and then you use your compression function, where you map the integers that correspond to the keys to a bounded set from zero to N minus one, where N is the size of your hash table. Usually N is a prime number; for practical implementations, use a prime N. Okay. So, repeating ourselves: step one, you compute the hash code of an item that you would like to store in the hash table. Then you apply the compression function h2 to the result, and in this case the pair (k, v) will be stored at index i = h(k). We'll see an example. Now let's have a look at hash codes.
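As a quick sketch of this two-step process (the function names are our own, and Python's built-in `hash` stands in for the hash-code step):

```python
# Two-step hashing: a hash code maps the key to an integer, and a
# compression function maps that integer into the range 0..N-1.
N = 11  # table size from the example (in practice, choose a prime)

def hash_code(key):
    """Step 1: map an arbitrary key to an integer."""
    return hash(key)  # Python's built-in hash code

def compress(i, n=N):
    """Step 2: map an integer into 0..n-1 (division method)."""
    return i % n

def h(key):
    return compress(hash_code(key))

print(h(54))            # an index in 0..10
print(h("basketball"))  # string keys also land in 0..10
```

Whatever the key type, the result of `h` is always a valid index into a table of size N.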
We would like our hash code to minimise collisions. Once we compute h(k), I would like to have kind of a unique value, a new index where we can insert our record or key-value pair. If we have a lot of collisions, this means our hash function is not good, and maybe insertion, search or deletion could take up to linear time. Now, there isn't a unique way to choose a hash code for a key. Here are some examples. Assume that you want to store a data type such as a byte or a char; then you can use at most 32 bits for it, and this works well in Java, which uses a 32-bit representation. But if you are using a 64-bit representation, then you need to do some extra work. For example, if your keys are of type long or double, then you could combine the high-order and low-order portions of the key to get a 32-bit key, to be able to compute h2 of the result. There are several ways to do it. You could, for example, XOR the two halves: you split the 64-bit key into two blocks of 32 bits each, XOR them, and then you compute h2 of this result.

What if you have characters, or a word as a key? Then you can use a different approach. One common way is using what is known as a polynomial hash code. Let's explain what this is about. Say your key is a character string, for example a kind of sport, let's say basketball, and the value is the number of members who are playing basketball. The word here is a string of characters, so how do we transform it into a key? Remember the steps: we have h1(k), which should give us an integer. Step one, we transform our key into an integer; then, once we have an integer, we apply a mathematical function that uses the modulus operator. For the first step you can use, as we said, a polynomial hash code, and this is what we are going to describe now. You just need to understand how it works. First of all, we need to talk about polynomial accumulation. You start by partitioning the bits of the key into a sequence of components of fixed length; it could be 8, 16 or 32 bits. So we take the bits 8 at a time, or 16 at a time. Then you are going to evaluate the polynomial p(z) = a0 + a1 z + a2 z^2 + ... + a(n-1) z^(n-1), because you already have a0, a1, up to a(n-1), depending on the length of the string, and z is a constant of your choice. And when we say "ignore overflows": if we are working with a fixed size overall, let's say 32 bits, and we have an overflow when we do an addition, then we ignore it. Now, we evaluate this polynomial at a fixed value: you decide what the value of z is, and you are given a0, a1, up to a(n-1); you simply compute this expression. Remember, we are given a key; I would like to compute its related integer value, and there is a nice trick.
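A sketch of the polynomial hash code under one common convention, p(z) = a0 + a1·z + … + a(n−1)·z^(n−1), with the coefficients taken as character codes. The second function uses the linear-time trick explained next; both compute the same value:

```python
def naive_poly(codes, z=33):
    # p(z) = a0 + a1*z + ... + a(n-1)*z^(n-1), evaluated term by term
    # (each z**i is recomputed, hence the slow straightforward approach)
    return sum(a * z ** i for i, a in enumerate(codes))

def horner_poly(codes, z=33):
    # Horner's rule: start from a(n-1), repeatedly multiply by z and
    # add the next lower coefficient -- one multiply and add per step
    result = 0
    for a in reversed(codes):
        result = result * z + a
    return result

codes = [ord(c) for c in "basketball"]
assert naive_poly(codes) == horner_poly(codes)
```

Here z = 33 is the constant suggested in the lecture for word keys; overflow is not an issue in Python, but in a fixed-width language the additions would simply wrap around.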
So definitely you can do it in quadratic time using a straightforward approach. You
compute a0, then a1 times z, then a2 times, z times, etc.. Okay. Or you could use
Horner's rule that computes this polynomial in linear time. And this is how we can
proceed. This is a nice trick for those maybe who don't know how to do it. We start
by computing z times a n minus one. Then you add to it a and minus two. Then you
multiply the result by z. And you keep doing this again over and over and over. And
then eventually after n big O of n steps or big O of n operations that combine
multiplication and additions, then you get the value of p of z. And this method is
suitable for strings. And if Um. Try several values of that. You can, um, check
that z equals 33 is quite efficient for data sets that contain about 50,000
records. Okay. So you could use this value. So what does it mean. It means that if
you want to build your hash table where the keys are, uh, words, then you can and
you have around 50,000 records would like to minimise the number of collisions. So
more on this later. That equal to 33 works very well. Now we'll talk about
compression functions. So this is step two. First step we had our keys. Now we
transform them into integers. And now we have integers. We'd like to store our
value. Uh and the key value pair k v, one of the simplest methods. So this is part
two okay or. Step two. And our process is simply to use what what is known as the
division method. And we have the guarantee that y is an integer based on what we
said previously. Let's say it's kind of a 32 bit integer, and h two of y is going
to be equal to y mod n. And n is something that you also choose. So the size of the
hash table you should also choose it. And this is going to be about the trade off
between time and space. I mean you can choose n to be very very large, but if the
number of items you want to store in your hash table is very low, then this would
be a waste of space. Although the number of collisions will be minimised. The idea
is to find the kind of a trade off. An optimal trade off between the size of
the hash table and the and the elements you want to store, or the records you want
to store in your hash table, plus number of collisions. Does it make sense to you
how large it is? How many numbers or records I would like to store and would like
to minimise the collisions? And you choose it. Let's say if a depends, it depends
on the way you would like to build your hash table. We have two possible ways we'll
discuss them. But in any case in practice n should be the size of the hash table
should be prime. And this has to deal with uh or to do with number theory. Um, and
then for example you could also use another uh hash function such as the following
one Add and divide. So you take a. B are also constants that you can choose. Then
you multiply a by y, you add to a b and then you compute modulus modulus and take
the result. Okay. And yeah there shouldn't be a negative values. So how you can
build your hash table. We have two strategies. The first one is referred to as
separate chaining. Collisions are going to occur when different elements are mapped
to the same cell. And how would you resolve. Do we resolve collisions by chaining
and chaining. Here is kind of building a linked list. I will show you how okay this
is this is an example. So again assume that to start over assume that you would
like to store some records key value pairs in a hash table. In this case, we have
three records to store. We say, okay, I'm going to do the mapping of my keys to
integers. And then um, so this is step one and step two. I'm going to choose maybe
a hash function, the division method. Although the division method might not be a
good choice. But that's fine. And you decided to build a hash table of size five.
In this case we see the indices from 0 to 4, and the values here are A, B, C; let's say these are characters. They are not going to be stored inside the hash table. Can you see it? They're going to be stored outside the hash table. Assume that the first key you would like to store corresponds to A. You compute h of the key of A, and you find out that it is four, so you store A in a node of a linked list attached to cell four. Then you compute, for example, h of the key that corresponds to the value C, and you find out that it's also four: there was a collision at four, so you add the value C to this chain. And so on: you add B, let's say, at index zero. All the other indices, for the time being, are empty; this is represented by the small symbol. So you get the idea of how we can insert records in this hash table using separate chaining to resolve collisions. Now, what if I would like to add a record D, or rather a key-value pair, let's say key k4 and value D? When you hash k4 you also get four, so we have to add D after C. And this might not be good. If your collision resolution strategy or the hash function you used is not good enough, then you might end up having a lot of collisions, so inserting a record, finding a record or deleting a record could take up to linear time. That's why we would like to minimise it; the analysis is on the next slide. If you want to use this strategy, you should choose the size of the hash table to be approximately equal to the number of records you would like to insert. For example, if you have 100 records, then you choose uppercase N to be approximately 100 (it should be a prime number), and then your hash function should distribute all the keys equally likely: the probability for h(k) to be equal to 0, 1, 2, up to 99 is the same. So more or less you will have one, maybe two, records per index, and in this case accessing a record (insertion, deletion, search) would be done very quickly, in almost constant time. Okay. So here we have the analysis. It is usually performed using the load factor of our hash table T, denoted alpha, which is equal to lowercase n over uppercase N. Remember, lowercase n represents the number of records we would like to store in our hash table, and uppercase N is the size of your hash table.
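A minimal separate-chaining sketch of the example above (our own illustration; Python lists stand in for the linked-list chains):

```python
# Separate chaining: each cell of the table holds a chain of (key, value)
# pairs; colliding keys are appended to the same chain.
class ChainedHashTable:
    def __init__(self, n=5):
        self.n = n
        self.buckets = [[] for _ in range(n)]  # one chain per cell

    def _h(self, key):
        return key % self.n  # division method, as in the example

    def put(self, key, value):
        chain = self.buckets[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: update it
                chain[i] = (key, value)
                return
        chain.append((key, value))     # empty cell or collision: append

    def get(self, key):
        for k, v in self.buckets[self._h(key)]:
            if k == key:
                return v
        return None                    # key not in the table

t = ChainedHashTable()
t.put(4, "A"); t.put(9, "C"); t.put(0, "B")  # 4 and 9 collide at cell 4
```

Here the keys 4 and 9 both map to cell 4 (since 9 mod 5 = 4), so "A" and "C" end up in the same chain, mirroring the example.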
And if your hash function performs well, then the average length of a list is going to be equal to alpha, the load factor. And if you take uppercase N proportional to lowercase n, then all the basic operations, we can show, can be done in constant time if you are using separate chaining. So, practically: use a good hash function that distributes all the keys equally likely between the values zero and uppercase N minus one, and choose uppercase N approximately equal to n; actually, take a prime number very close to n. Okay.

Now there is another strategy that you can also adopt to resolve collisions. The first one was separate chaining, where you stored the keys outside of the hash table. Here we are going to store them inside the hash table, and we'll see why. Open addressing is going to resolve collisions in a different way. The idea is: if you have a collision, we are not going to overwrite a value; we're going to find another cell that is empty, following some strategy, and each table cell inspected is referred to as a probe. Let's go back to the previous example. Assume that I compute h of the key of A: we got four, so we store A inside the table at position four. And now, if you compute h of the key of the record C and it is also equal to four, then we cannot store it at position four; we have to find another empty cell within this hash table. We don't have the right to use any linked list. Okay. And every time we need to do a new search to find whether there is another available cell, we call it a probe. Ideally speaking, we would like to minimise the number of probes. That's why we also need to define a ratio between lowercase n and uppercase N; they shouldn't be equal in this case, for open addressing. So bear in mind the trade-off between space and time: don't choose a very large hash table, because this is going to be a waste of space, but not a too-small one either, otherwise this will lead to a lot of collisions and it will impact the running times of the basic operations. Okay. During inspection, which could be for inserting an element, finding an element (search), or deleting, the goal is also to minimise these collisions, and the cells we check form what is referred to as the probe sequence. For example, if this cell is not available, then I have to check another one, but we should follow a specific strategy; we cannot just insert a key-value pair randomly in a hash table. So we need to introduce some notation here. It's not very complicated, don't worry about it; this is just formal notation. Beta zero is equal to h(k). Then, if beta zero is unavailable, we compute beta one; if beta one is unavailable, we compute beta two, and so on. Hopefully, if your implementation is good, the probe sequence should be very short. It shouldn't depend on the number of items you would like to insert into the hash table; the probe sequence could have length two, three, four. It should be a constant with respect to the size of your hash table. Does this make sense to you?
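The probe sequence can be sketched generically (our own helper; `f` is the collision-resolution strategy, and the two strategies shown are the ones covered next):

```python
# General open-addressing model: beta_i = (h'(k) + f(i)) mod N, where
# h'(k) = k mod N is the division method and f is the strategy.
def probe_sequence(key, n, f, max_probes):
    """Yield beta_0, beta_1, ... = (key mod n + f(i)) mod n."""
    h_prime = key % n
    for i in range(max_probes):
        yield (h_prime + f(i)) % n

def linear(i):       # linear probing (covered next)
    return i

def quadratic(i):    # quadratic probing
    return i * i

print(list(probe_sequence(12, 7, linear, 4)))     # [5, 6, 0, 1]
print(list(probe_sequence(12, 7, quadratic, 4)))  # [5, 6, 2, 0]
```

For key 12 and N = 7, both sequences start at 12 mod 7 = 5 but then diverge, which is exactly what the worked examples below show.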
Let's see how we can do it. We'll see three strategies; this is the general model. What I advise you is to check the examples that we are going to work through together, and then you will see that it is very easy to memorise this form of the probe sequence. Remember, we try with beta zero; if it doesn't work, we try with beta one to see if there is an empty space, and so on. Your trials depend on an index: we start at index zero every time we would like to insert a key, then index one, then index two, and so on. So your hash function here has a modified form: h(k, i) = (h'(k) + f(i)) mod N, where h'(k) is the straightforward division method and f(i) is going to be your collision resolution strategy. Okay, so this is very simple. Instead of using the simple or straightforward division method h'(k) = k mod N, I'm going to say, okay, let's use k mod N plus our function, and everything should be computed modulo N, of course, because we cannot exceed the boundary between zero and N minus one; we would like to store our key-value pair within this range. Usually three techniques are covered: linear probing, quadratic probing and double hashing. And why do we study them? It's like when we studied sorting algorithms: you see which ones are not good and why, and which one is going to be a better choice for you; you'll also be able to justify your answer. Okay. We start with linear probing, and all these choices concern f(i), shown in bold; h'(k) is the division method, and you have the choice of selecting f(i). And now everything should be clear for you. Linear probing: what should we take for f(i)? Very simple. We take f(i) = i, simply the identity. Quite straightforward.
And let's put it in practice and see what happens on an example; it will allow you to understand how this works in practice. So here's the exercise. Let's say I ask you in the exam: I give you some key-value pairs, and to simplify things the values are going to be integers (not letters or whatever), just for your understanding. And I tell you, okay, we have these keys and we'd like to insert them into a hash table of size seven. So we know that uppercase N is equal to seven, and the indices go from 0 to 6. Pay attention to this fact: in some rare cases I see 0 to 7. Don't make this mistake. The size is seven, but indices start from zero, so it's going to be 0 to 7 minus one, which is six. Let's start inserting 54. Remember, the collision resolution strategy is the identity; this is linear probing. So this is going to be our expression: I compute beta zero, which is h(54, 0), which is simply the division method since f(0) is equal to zero, and the result is five. So we simply insert 54 at index five. What about five now? On purpose I chose these values. We want to insert this key, so we compute h(5, 0), and the result is also five. We have a collision here; we cannot overwrite this value. Now we have the probe sequence to compute. We have beta zero already; now we compute beta one (I highlighted it in red, you see here). Since we are using linear probing, f(i) = i, so you add one here: 5 mod 7 is five, plus one becomes six, and 6 mod 7 is six. Cell six is empty, so we store 5 at location six. What about nine? We do the same for nine: 9 mod 7 is two, which is empty, so we insert it. What about 12? 12 mod 7 is five, but five is taken: we have a collision. Then we compute beta one: h(12, 1) is 6 mod 7, which is also six, so we have another collision. Eventually we find an empty slot at index zero. Okay.
Any questions on how this works?
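Replaying this worked example in code (a sketch with our own function name):

```python
# Linear probing: insert the keys 54, 5, 9, 12 into a table of size
# N = 7 with f(i) = i, probing cells (k mod N + i) mod N in turn.
N = 7
table = [None] * N

def insert_linear(table, key):
    n = len(table)
    for i in range(n):
        beta = (key % n + i) % n   # beta_i = (h'(k) + i) mod N
        if table[beta] is None:
            table[beta] = key
            return beta
    raise RuntimeError("table is full")

for k in (54, 5, 9, 12):
    print(k, "->", insert_linear(table, k))
# 54 -> 5, then 5 -> 6 (collision at 5), 9 -> 2,
# and 12 -> 0 (collisions at 5 and 6)
```

The final layout is [12, None, 9, None, None, 54, 5], exactly as derived on the slide.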
All right. We were talking about linear probing, and what you need to understand about this method is that it is not effective, so you shouldn't be using it, for a simple reason: it suffers from primary clustering. We saw it in the previous example. Intuitively, what happens is that when we have a collision, we need to look for the next available spot, and the next available spot will be the one at index i plus one if we have a collision at index i. Eventually, if you try this method, you'll see that your hash table will be formed of clusters: consecutive cells that are unavailable, followed by a lot of empty spaces. This is not good, because it will affect the speed, or the time required, to perform the basic operations, such as looking for a value, inserting a value, or deleting a value. Okay. Next we will see another strategy to resolve collisions. It is also a very straightforward one, called quadratic probing. As its name indicates, this time, instead of f(i) = i, you consider i squared (i times i) for the function f. You will also see that this method is not very efficient, because it suffers from secondary clustering, and you should be able to guess why. So first let's see how you can insert these keys, using the same example, by the way: given the keys 54, 5, 9, 12 with some values, and we assume here that the values are, let's say, the same. And the size of the hash table is also seven.
Initially you start with an empty table, and don't forget to use the probe sequence we talked about. Inserting 54 is going to be the same as before, because this is the first element we insert in our table: the hash of 54 at step zero is equal to five, so insert it there. Then h of 5: we have a collision, and f(i) = i squared, which for i equal to one is one squared, so it's the same as the second step of linear probing. So far we have a structure similar to linear probing. What about inserting nine? We don't have any problems, because for this key h(9, 0) is equal to two, and two is available. What about 12? Let's insert 12. What happens when we insert 12? 12 mod 7 is five: we have a collision. At beta one we also have a collision: notice that beta one is equal to (12 mod 7) plus one squared, which is six, so we still have a collision. Now we try beta two, which is h(12, 2): you have two squared, so you compute 12 mod 7 plus four, and everything should also be computed modulo seven. What do we notice? We also have a collision, at two, you see here. Then we have to try a fourth time, by computing beta three. So who can guess the next slot? Any idea? You can compute it. Yes? Sorry, I couldn't hear you.
SPEAKER 1
I'm sorry.
SPEAKER 0
So, what should we compute? Two? But we've already computed h(12, 2), which is two. What should we compute here? h of 12 with which value? Three. Very good. So technically it's going to be equal to 12 modulo 7 plus, what, i squared: yes, three squared, which is nine. And 12 modulo 7 is five, we know the answer from before: five plus nine is 14, and 14 modulo 7 is equal to, yes, zero. Okay, so you should familiarise yourself with the modulus operation: for 14 modulo 7 you divide 14 by seven and take the remainder, which is zero. Now, without doing all these calculations, you can guess from the start. Assume that we tried h(12, 0); it is five. For the next steps we add one squared, then two squared, then three squared, then four squared. So we can predict the index of the next cell we try, before checking whether it is empty or not.
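The quadratic version of the same insertion sequence can be sketched as follows (note, as a hedge, that quadratic probing can fail to find a free cell even when one exists, so a real implementation needs a fallback):

```python
# Quadratic probing: f(i) = i*i, so for key 12 the probes are
# (5 + 0), (5 + 1), (5 + 4), (5 + 9) mod 7 = 5, 6, 2, 0.
def insert_quadratic(table, key):
    n = len(table)
    for i in range(n):
        beta = (key % n + i * i) % n   # beta_i = (h'(k) + i^2) mod N
        if table[beta] is None:
            table[beta] = key
            return beta
    # quadratic probing does not visit every cell in general
    raise RuntimeError("no free cell found along the probe sequence")

table = [None] * 7
for k in (54, 5, 9, 12):
    insert_quadratic(table, k)
print(table)  # [12, None, 9, None, None, 54, 5]
```

For this particular data set the final layout happens to match the linear-probing one, but key 12 reached cell 0 through the offsets +1, +4, +9 rather than +1, +2.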
So the sequence of offsets, instead of being plus one, plus one, plus one as for linear probing, for quadratic probing is going to be equal to what? I mean, I just gave you the answer, but just to make sure that you understood it: what would be the sequence here? Assume that the first cell is unavailable. I want to insert the next key, and this key hashes to position i. Then at the next step I should look for a cell at which index? Can we guess it ahead of time? Is there a pattern for all the indices? So index i, then i plus how much, if I use this quadratic probing function? Let's give an example with something you are already familiar with. Let's say position five is not available. Where should we look to insert our key? Which cell? Who can tell us, if five is busy or unavailable, what we do, without doing any calculations? Look at the pattern: it's 12 modulo 7 plus zero. Now, if this is not available, we replace zero with one squared, then two squared. So it's going to be plus one, then plus four (can you see it?), then plus nine, then plus 16, and so on. So the offsets are of the form i squared, and the pattern is the same for all the indices. Okay. You may take an example here; this is the answer. And for this reason quadratic probing also suffers from secondary clustering: we still have a lot of empty slots and a lot of slots that are unavailable, because the pattern repeats itself.

A better strategy is to use double hashing, though it comes with a trade-off. Notice the function here: instead of having i or i squared, we are going to have i times another hash function. The trade-off is more computation, because computing i times another hash function is slower than just computing i: we have a function call here. But the advantage is that this function call is going to randomise the location where we are going to insert another record. Usually we choose this second hash function h''(k) to be equal to q minus (k mod q), for some prime number q that is less than N. This should be given to you, and then you simply proceed as usual. The possible values for h''(k) are going to be between 1 and q; this is very easy to verify. And again, if we consider the same example, see where the keys are going to be stored whenever we have a collision; there is no specific pattern here. Okay. For the first key, 54, we have the same case as before: first key at index five. For the second one, notice that beta zero of 5 is five, so we have a collision, but we cannot guess the next slot: it is going to depend on the second hash function. Assume that for the second hash function we select q equal to five (q is a prime number). So what do we get? We get h''(5) = 5 minus (5 mod 5), which is five, and we simply evaluate the probe expression: (5 + 5) mod 7, and we get three. You see, it was a bit unexpected. And what about nine? Beta zero of 9 is two. And what about beta zero of 12? We know that we have a collision with the key 54. The next slot is also going to depend on h''(12), which is 5 minus (12 mod 5); 12 mod 5 is two, five minus two is three, and the resulting cell is (5 + 3) mod 7, which is one. And that's it. You see here how we can distribute these keys evenly across the slots. Okay. And a word on open addressing analysis.
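A sketch of the same insertions with double hashing, using q = 5 as in the example:

```python
# Double hashing: f(i) = i * h''(k), with the secondary hash
# h''(k) = q - (k mod q) for a prime q < N. Here q = 5, N = 7.
def h2(key, q=5):
    return q - (key % q)   # always in 1..q, never zero

def insert_double(table, key, q=5):
    n = len(table)
    step = h2(key, q)
    for i in range(n):
        beta = (key % n + i * step) % n  # beta_i = (h'(k) + i*h''(k)) mod N
        if table[beta] is None:
            table[beta] = key
            return beta
    raise RuntimeError("no free cell found along the probe sequence")

table = [None] * 7
for k in (54, 5, 9, 12):
    insert_double(table, k)
print(table)  # key 5 lands at index 3 and key 12 at index 1
```

Unlike linear and quadratic probing, the colliding keys 5 and 12 take different step sizes (5 and 3 respectively), so they scatter instead of piling up next to 54.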
So this is the second technique or approach that you could use to create your hash table. The load factor here is defined the same way: alpha is equal to (or denoted by) n over uppercase N, and it should remain below one half, for efficiency reasons; this is based on practice. What does it mean? This is what you need to remember: if you want, let's say, to store 100 records in a hash table using open addressing, then the size of your hash table should be around 200, so double the number of elements you would like to store in it. Okay. This is what we recommend. And notice that this is something that you can do dynamically: initially you start with an empty hash table, you think that you might store more or less 100 keys, and the size of your hash table is going to be approximately 200; not exactly 200, but a prime number that's close to 200. And what happens, so this is a question, I will address it later, what happens if we need to insert more elements? Maybe instead of 100 we'd like to store up to 150. We will answer this question later.

But here we are going to talk first about deletion: how can you delete an element from a hash table? The best way to do it is to replace the element with a special "defunct" sentinel object, a kind of marker for "deleted". Why? Any idea why? So let's look at this table here, this simple example. We inserted, in order, 54, 5, 9 and 12, and the collisions we had were 5 with 54, and also 12 with 54. What if I would like to delete 54? Assume that I want to delete 54. We say, okay, we are going to replace 54 with a kind of special value, a kind of marker. Why not simply remove 54? What happens if I simply delete 54? Anyone? Yes.
SPEAKER 2
You will have to search the entire list if you're going to find something that
collided afterwards.
SPEAKER 0
Yes, very good. Any other remark? So the problem is this: assume we deleted 54 and now we would like to search for 12. The first time you try to search for 12, you would compute h(12, 0), and you'll see that h(12, 0) is equal to 5. Cell 5 is empty, so you would conclude that 12 is not there. This is going to be a problem. To solve it, you would have to move all the elements that had a collision at index 5, where 54 used to be, which is going to be very costly. It doesn't make sense. So the fastest and most efficient way is to use a kind of lazy deletion: we mark this item or element as deleted. Now again, with this strategy, if we would like to search for 12 and we encounter such a cell, we know that we have to keep looking. Maybe we had a collision, and we might find 12 in a different cell or at a different location.
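A rough sketch of this probing logic, assuming (as reconstructed from the lecture's example) a size-7 table, h(k) = k mod 7 and linear probing; the names DEFUNCT, EMPTY, and the helper functions are illustrative, not from the lecture:

```python
DEFUNCT = object()  # sentinel marking a lazily deleted cell
EMPTY = None

table = [EMPTY] * 7

def probe(key, i):
    # linear probing: h(k, i) = (k mod 7 + i) mod 7
    return (key % 7 + i) % 7

def insert(key):
    for i in range(len(table)):
        j = probe(key, i)
        if table[j] is EMPTY or table[j] is DEFUNCT:
            table[j] = key
            return j
    raise RuntimeError("table is full")

def search(key):
    for i in range(len(table)):
        j = probe(key, i)
        if table[j] is EMPTY:    # truly empty: key cannot be further along
            return None
        if table[j] is DEFUNCT:  # deleted marker: keep probing
            continue
        if table[j] == key:
            return j
    return None

def remove(key):
    j = search(key)
    if j is not None:
        table[j] = DEFUNCT       # lazy deletion: mark, don't clear
    return j

# The lecture's scenario: insert 54, 5, 9, 12, then delete 54.
for k in (54, 5, 9, 12):
    insert(k)
remove(54)
# Searching for 12 now skips the DEFUNCT cell at index 5 and still finds it.
```

With a plain deletion (setting the cell back to EMPTY), the search for 12 would stop at index 5 and wrongly report the key as absent, which is exactly the failure discussed above.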
Okay, so keep this in mind if you want to implement a hash table. We don't remove it; we simply mark it as deleted. In some books this is referred to as lazy deletion. And with this strategy, if while looking for an element you reach a truly empty cell, then you know that the element is not in the hash table. Okay. So what are
the basic operations? We discussed that we can remove a key, we can search for a key, and we can insert a key. How can we remove a key? Very simple. First we have to look for it. If the key is found, we simply replace it with this special defunct value or item. Otherwise we return null, because the key is not there. And this is a kind of pseudocode, or algorithm, for how to put an element, to add an element. We throw an exception if the table is full. Note that this only applies to open addressing, not to separate chaining, because in separate chaining we use a linked list to add elements to the table. So we keep probing for a position, and once we have an empty slot, we insert the key-value record given. The point is that we have to keep looking for a free cell. Okay. Um, what about
the analysis, the expected running time? If you implement your hash table in an efficient way, all the basic operations can be performed in expected constant time: search, deletion (lazy deletion) and insertion. And in practice, hashing is very fast provided the load factor is not close to 100%. You can test it. So again, use the recommended values for the load factor. As a quick comparison between separate chaining and open addressing: with separate chaining you can use the same size for the hash table as the number of items or records you would like to insert into it, but the drawback is that you need another data structure, a linked list. With open addressing you use a simple array-based data structure, and you just need to deal with collisions. Okay, so
maybe I didn't talk about rehashing here, so I'm going to talk about it here. If we
go back a little bit, maybe I should add a few slides about rehashing. Just a note.
So what if you have this scenario: we have, let's say, four elements to store, and the size of the hash table is seven. What if we need to store more elements than the ones I provided you with? Any idea? What should we do? Let's say you know at this point that you need to store two more elements. What should we do? Should we keep the same hash table? Yes or no? I don't want any explanation, just tell me yes or no. Who thinks that we can? Is it feasible to keep the same hash table if you'd like to insert two additional records? Yes, we can. What about five? Can we keep it if I want to insert five more elements, given the fact that we have only three available slots left? No, we cannot. Okay, so let's go
back to the case where we want to add two more elements. Technically we can, but this might lead to a lot of collisions. Any idea? If I want to store, let's say, key 15 and key 22, without doing any calculations we know that after a while we will end up storing them at index 0, 4 or 6. But this will affect what? If we keep trying, maybe we need to try more than once. Insertion: can it still be done in constant time, or might it take more than constant time? You have to think of how many collisions this is going to lead to. If I want to insert, let's say, a key k, maybe I have a collision with 12, then a collision with 5, then maybe a collision with 54. You see what I mean? That's why we need to maintain this ratio of one half. Always remember it. Okay. So in practice we rehash the table. We don't do it often.
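One hedged sketch of how an insertion could maintain that one-half ratio, growing the table just before the load factor crosses it. The class, the growth rule, and the `_rehash` helper are illustrative (deletion markers are ignored here for simplicity), not the lecture's exact pseudocode:

```python
class ProbingTable:
    """Open addressing with linear probing and a load-factor cap of 1/2."""
    MAX_LOAD = 0.5

    def __init__(self, capacity=7):
        self.capacity = capacity
        self.n = 0
        self.slots = [None] * capacity

    def put(self, key):
        # Rehash *before* the insertion would push n/capacity above 1/2.
        if (self.n + 1) / self.capacity > self.MAX_LOAD:
            self._rehash()
        i = key % self.capacity
        while self.slots[i] is not None:  # linear probing for a free cell
            i = (i + 1) % self.capacity
        self.slots[i] = key
        self.n += 1

    def _rehash(self):
        # Simplified growth rule: 2*c + 3 happens to give 7 -> 17 -> 37.
        # A real implementation would pick the next prime near 2*capacity.
        old = [k for k in self.slots if k is not None]
        self.capacity = 2 * self.capacity + 3
        self.slots = [None] * self.capacity
        self.n = 0
        for k in old:
            self.put(k)
```

With the lecture's keys, inserting the fourth key (12) into a size-7 table would bring the load factor to 4/7 > 1/2, so this version rehashes into a size-17 table first.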
Instead of working with the hash table of size 7, for example, you could build a new hash table of size maybe 17 this time. So, more space, and then you rehash all the values again: 54, 5, 9, 12, plus the 2 or 3 new values you would like to insert into this table. This is definitely going to take more time, but you don't do it often. Rehashing takes linear time, but all the basic operations will still run in expected constant time. Okay. Any questions? Good. So maybe I could add, for next year, a couple of slides about rehashing. But this is good to know.
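The rehash just described could look like the sketch below, using the lecture's numbers (a size-7 table grown to size 17, the keys 54, 5, 9, 12, plus new keys 15 and 22) and assuming h(k) = k mod capacity with linear probing; the `build` helper is illustrative:

```python
def build(keys, capacity):
    """Insert keys into a fresh table of the given size via linear probing."""
    slots = [None] * capacity
    for k in keys:
        i = k % capacity
        while slots[i] is not None:  # probe until a free cell (table never full here)
            i = (i + 1) % capacity
        slots[i] = k
    return slots

old_keys = [54, 5, 9, 12]
small = build(old_keys, 7)                # load factor 4/7: already past 1/2

# Rehash: one O(n) rebuild into a bigger (ideally prime) table, after which
# the expected cost of each basic operation is constant again.
big = build(old_keys + [15, 22], 17)      # load factor 6/17, below 1/2
```

Note that every key generally lands at a new index after rehashing (54 moves from index 5 mod 7 to index 54 mod 17 = 3), which is why all values must be reinserted rather than copied across.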