CSC 378: Data Structures and Algorithm Analysis

Hashing

Definitions

All elements are stored in the hash table. The table is implemented as an array with m slots. On inserting, we search through the table until an empty slot is found - a process called ``probing.'' The number of probes is at most m.

The hash function, h, maps elements from the universe, U, onto slots in the range 0 ... m-1:

index1.gif

In probing, the probe number (in 0 ... m-1) also determines the slot:

index2.gif

For a key, k, the probe sequence is index3.gif This is a permutation of index4.gif Hashing is said to be uniform if each permutation is equally likely.

Algorithms

Insertion

Insert( k ) i = 0 while (i < m && A[ h(k,i) ].occupied) i = i+1 if i == m error( "Table full" ) else A[ h(k,i) ].data = k A[ h(k,i) ].occupied = true

Searching

Search( k ) i = 0 while (i < m && A[ h(k,i) ].occupied && A[ h(k,i) ].data != k) i = i+1 return (i != m && A[ h(k,i) ].occupied) // true if found, false otherwise

Analysis

Let n be the number of elements in a hash table of m slots, which uses open addressing.

The load factor is index5.gif .

Theorem

[CLR theorem 12.5]

With open addressing and uniform hashing, the expected number of probes in an unsuccessful search is index6.gif .

Corollary

Expected insertion time is index7.gif . The graph looks like this for m = 1000:

This means that we can expect really fast insertion as long as the table is less than about 90 % full. At 90 % full, an insertion is expected to require 10 probes. This increases quickly as the table fills even more.

Proof

index8.gif

Let index9.gif be the probability that exactly i slots are found to be occupied before an empty slot is found ( index10.gif for index11.gif ).

Let index12.gif be the number of probes.

Then index13.gif .

Let index14.gif be the probability that at least i slots are found to be occupied. Note that index15.gif , since index16.gif is the probability that at least i, but not i+1 slots are found to be occupied. Then

index17.gif

The first probe has an index18.gif chance of hitting an occupied slot, assuming simple uniform hashing, so

index19.gif

If the first slot is occupied, then index20.gif of the remaining index21.gif slots are occupied, and

index22.gif

In general,

index23.gif

Thus

index24.gif

So the expected number of probes in an unsuccessful search is index25.gif .




Back to the lecture index

Copyright 1998 James Stewart