Introduction to Algorithms/Hashing

April 11, 2023 2 minute read

Video Link

Symbol-table problem

Table $S$ holding $n$ records.

Operations:

Insert(S,x): $S \leftarrow S \cup {x}$ ($S$ is a dynamic set)
Delete(S,x): $S \leftarrow S - {x}$
Search(S,k): return $x$ s.t. $key[x]=k$ or nil if no such $x$

Direct-access table

Suppose keys are drawn from set $u=\{0,1,\dots,m-1\}$. Assume keys are distinct.

Set up array $T[0,\dots,m-1]$ to represents dyn set $S$.

\[T[k]=\begin{cases} x, &\text{if} \ x \in S \ \text{and} \ key[x]=k \newline \mathrm{nil}, &\text{otherwise} \end{cases}\]

Ops take $\Theta(1)$ time.

Hashing

Hash function $h$ maps keys “randomly” into slots of table $T$.

When a record to be inserted maps to an already occupied slot, a collision occurs.

Resolving collision by chaining

Idea: link records in same slot into a list.

Analysis:

Worst-case: every key hashes to the same slot. Access take $\Theta(n)$ if $ S =n$.
Average-case: assumption of simple uniform hashing.
- each key $k \in S$ is equally likely to be hashed to any slot in $T$, independent of where other keys are hashed.
- Def. the load factor of a hash table with $n$ keys and $m$ slots is $\alpha=\frac{n}{m}$.
- Expected unsuccessful search time is $\Theta (1+ \alpha)$(1 is doing a hash and $\alpha$ is searching the list), is $\Theta(1)$ if $\alpha=\Theta(1)$, i.e. $n=\Theta(m)$.

Choosing a hash function

Should distribute keys uniformly into slots
Regularity $m$ keys distribution should not affect uniformly

Division method:

\[h(k)=k\mod m\]

Don’t pick $m$ with small divisor $d$. Ex. if $d=2$ and all keys even $\implies$ odds slots are never used. Ex. if $m=2^r$ $\implies$ hash doesn’t depend on all bits of $k$(binary).
Pick $m=$ prime not close to power of 2 or 10.

Multiplication method:

$m=2^r$, cimputer has $w$-bits words.

\[h(k)=(A\cdot k \mod 2^w) >> (w-r)\]

$A$ is an odd number , $2^{w-1}<A<2^{w}$.

Don’t pick $A$ close to $2^{w-1}$ and $2^{w}$.
Fast method: mult $\mathrm{mod}$ $2^{w}$ faster than div and rsh is also fast.

Resolving collision by open addressing

Idea: probe the tabel systematically until an empty slot is found.
$h$: $U \times \{0,1,\dots,m-1\}$(probe seq) $\to \{0,1,\dots,m-1\}$
Probe seq should be permutation of $0,1,\dots,m-1$.
Table may fill up, s.t. $n\leqslant m$.
Deletion is difficult.

Probing strategies:

Linear: $h(k,i)=\left( h(k,0) +i \right)\mod m$.
- primary clusting: long runs of filled slots.
Double-hashing: $h(k,i)=(h_{1}(k)+ih_{2}(k))\mod m$.
- excellent method, usually pick $m=2^{r}$ and $h_{2}(k)$ odd.

Analysis of open addressing: Assumption of uniform hashing: each key equally likely to have any one of the $m!$ permutations as its probe seq, independent of other keys.

Theorem:

\[E[ \\# \mathrm{probes}] \leqslant \frac{1}{1-\alpha}, \alpha <1\]

Proof:(unsuccessful search) 1 probe always neccessary. With probability $\frac{n}{m}$, collision $\implies$ 2nd prob. $\frac{n-1}{m-1} \implies$ 3rd prob.

\[\begin{align} E[\\# \mathrm{probs}] &= 1+ \frac{n}{m}\left( 1+\frac{n-1}{m-1}(1+\dots) \right) \newline &\leqslant 1+\alpha(1+\alpha(\dots)) \newline &\leqslant \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1-\alpha} \end{align}\]

$a<1$ const $\implies$ $\Theta(1)$ probs.

Twitter Facebook LinkedIn

Zhancun Mu

Introduction to Algorithms/Hashing

Symbol-table problem

Direct-access table

Hashing

Resolving collision by chaining

Choosing a hash function

Resolving collision by open addressing

Comments

You May Also Enjoy

Introduction to Algorithms/Introduction

Introduction to Algorithms/Sorting

Introduction to Algorithms/Skip List

Introduction to Algorithms/Shortest Paths 3