Hashing : Hashing The process of mapping a key value to a position in a table.
A hash function maps key values to positions. It is denoted by h.
A hash table is an array that holds the records. It is denoted by T.
The hash table has M slots, indexed from 0 to M - 1
Hashing : Hashing
For any value K in the key range and some hash
function h,
h(K) =i; 0 i < M, such that key(T[i]) = K.
Load factor
The ratio of the number of items in the table to the table size
Load factor = Number of items / Table size
Hash Tables : Hash Tables Direct Address Tables
If we have a collection of n elements whose
keys are unique integers in (1 ... m), where
m >= n, then we can store the items in a
direct address table, T[m], where Ti is either
empty or contains one of the elements of our
collection.
Hash Tables : Hash Tables
Hash Tables : Hash Tables Direct Address Tables
Searching a direct address table operation for
a key, k, we access Tk,
if it contains an element, return it,
if it doesn't then return a NULL.
There are two constraints here:
the keys must be unique, and
the range of the key must be severely bounded.
Hash Tables : Hash Tables Direct Address Tables
If the keys are not
unique, then we can
simply construct a set of
m lists and store the
heads of these lists in
the direct address table.
Hash Tables : Hash Tables Direct Address Tables
The range of the key determines the size of
the direct address table and may be too large
to be practical.
Hash Tables : Hash Tables Direct Address Tables
Direct addressing is easily generalized to the
case where there is a function,
h(k) => (1,m)
which maps each value of the key, k, to the
range (1,m). In this case, we place the
element in T[h(k)] rather than T[k].
Hash Tables : Hash Tables Mapping functions
The direct address approach requires that the
function, h(k), is a one-to-one mapping from
each k to integers in (1,m). Such a function is
known as a perfect hashing function.
The PHF maps each key to a distinct integer within
some manageable range.
Hash Tables : Hash Tables Mapping functions
Finding a perfect hashing function is not
always possible.
More realistically, a hash function, h(k),
maps most of the keys onto unique integers,
but maps some other keys on to the same
integer.
Hash Tables : Hash Tables Mapping functions
This clashing is called Collision.
If the number of collisions is sufficiently
small, then the hash tables work quite well.
Hash Tables : Hash Tables Collision Resolution
Chaining
Overflow areas
Re-hashing
Using neighbouring slots (linear probing)
Quadratic probing
Random probing.
Collision Resolution : Collision Resolution
Chaining
Chain all collisions in lists attached to the appropriate slot.
This allows an unlimited number of collisions to be handled
and doesn't require a priori knowledge of how many
elements are contained in the collection.
The tradeoff is the same as with linked lists versus array
implementations of collections: linked list overhead in space
and, to a lesser extent, in time.
Collision Resolution : Collision Resolution 2. Overflow areas
Divide the pre-allocated table into two
sections:
the primary area to which keys are mapped
an area for collisions, normally termed the overflow area.
Collision Resolution : Collision Resolution 2. Overflow areas
When a collision occurs, a
slot in the overflow area is
used for the new element
and a link from the primary
slot established as in a
chained system.
Collision Resolution : Collision Resolution Re-hashing
Use a second hashing operation when there is a collision. If there is a further collision, re-hash until an empty “slot” in the table is found.
The re-hashing function can either be a new
function or a re-application of the original one. As
long as the functions are applied to a key in the
same order, then a sought key can always be
located.
Collision Resolution : Collision Resolution Re-hashing
Collision Resolution : Collision Resolution 4. Using neighbouring slots - linear probing
One of the simplest re-hashing functions is +1 (or -1)
i.e. on a collision, look in the neighbouring slot in the table.
This calculates the new address extremely quickly..
Collision Resolution : Collision Resolution 4. Using neighbouring slots - linear probing
Linear probing is subject to a clustering phenomenon.
Re-hashes from one location occupy a block of slots in the table which “grows” towards slots to which other keys hash.
This exacerbates the collision problem and the number of re-hashed can become large.
Collision Resolution : Collision Resolution Quadratic probing
Better behaviour is usually obtained with quadratic probing, where the secondary hash function depends on the re-hash index:
address = h(key) + c i2
on the tth re-hash. (A more complex function of i may also be used.)
Collision Resolution : Collision Resolution Quadratic probing
Since keys which are mapped to the same value by the primary hash function follow the same sequence of addresses, quadratic probing shows secondary clustering. However, secondary clustering is not nearly as severe as the clustering shown by linear probes.
Collision Resolution : Collision Resolution Quadratic probing
Re-hashing schemes use the originally allocated table space and thus avoid linked list overhead, but require advance knowledge of the number of items to be stored.
However, the collision elements are stored in slots to which other key values map directly, thus the potential for multiple collisions increases as the table becomes full.
Collision Resolution : Collision Resolution 6. (Pseudo) Random probing
Collision Resolution : Collision Resolution Example:
Using the hash function:
h (k) = 2k mod 10
Insert the numbers: 2, 20, 7, 15, 3, 0, 4, 14
into a one-dimensional array of 10
elements using linear probing to resolve
collisions.
Collision Resolution : Collision Resolution h (k) = 2k mod 10
2, 20, 7, 15, 3, 0, 4, 14