Introduction to Algorithms/Hashing
Symbol-table problem
Table $S$ holding $n$ records.
Operations:
Insert(S,x)
: $S \leftarrow S \cup {x}$ ($S$ is a dynamic set)Delete(S,x)
: $S \leftarrow S - {x}$Search(S,k)
: return $x$ s.t. $key[x]=k$ or nil if no such $x$
Direct-access table
Suppose keys are drawn from set $u=\{0,1,\dots,m-1\}$. Assume keys are distinct.
Set up array $T[0,\dots,m-1]$ to represents dyn set $S$.
\[T[k]=\begin{cases} x, &\text{if} \ x \in S \ \text{and} \ key[x]=k \newline \mathrm{nil}, &\text{otherwise} \end{cases}\]Ops take $\Theta(1)$ time.
Hashing
Hash function $h$ maps keys “randomly” into slots of table $T$.
When a record to be inserted maps to an already occupied slot, a collision occurs.
Resolving collision by chaining
Idea: link records in same slot into a list.
Analysis:
-
Worst-case: every key hashes to the same slot. Access take $\Theta(n)$ if $ S =n$. - Average-case: assumption of simple uniform hashing.
- each key $k \in S$ is equally likely to be hashed to any slot in $T$, independent of where other keys are hashed.
- Def. the load factor of a hash table with $n$ keys and $m$ slots is $\alpha=\frac{n}{m}$.
- Expected unsuccessful search time is $\Theta (1+ \alpha)$(1 is doing a hash and $\alpha$ is searching the list), is $\Theta(1)$ if $\alpha=\Theta(1)$, i.e. $n=\Theta(m)$.
Choosing a hash function
- Should distribute keys uniformly into slots
- Regularity $m$ keys distribution should not affect uniformly
Division method:
\[h(k)=k\mod m\]- Don’t pick $m$ with small divisor $d$. Ex. if $d=2$ and all keys even $\implies$ odds slots are never used. Ex. if $m=2^r$ $\implies$ hash doesn’t depend on all bits of $k$(binary).
- Pick $m=$ prime not close to power of 2 or 10.
Multiplication method:
$m=2^r$, cimputer has $w$-bits words.
\[h(k)=(A\cdot k \mod 2^w) >> (w-r)\]$A$ is an odd number , $2^{w-1}<A<2^{w}$.
- Don’t pick $A$ close to $2^{w-1}$ and $2^{w}$.
- Fast method: mult $\mathrm{mod}$ $2^{w}$ faster than div and rsh is also fast.
Resolving collision by open addressing
- Idea: probe the tabel systematically until an empty slot is found.
- $h$: $U \times \{0,1,\dots,m-1\}$(probe seq) $\to \{0,1,\dots,m-1\}$
- Probe seq should be permutation of $0,1,\dots,m-1$.
- Table may fill up, s.t. $n\leqslant m$.
- Deletion is difficult.
Probing strategies:
- Linear: $h(k,i)=\left( h(k,0) +i \right)\mod m$.
- primary clusting: long runs of filled slots.
- Double-hashing: $h(k,i)=(h_{1}(k)+ih_{2}(k))\mod m$.
- excellent method, usually pick $m=2^{r}$ and $h_{2}(k)$ odd.
Analysis of open addressing: Assumption of uniform hashing: each key equally likely to have any one of the $m!$ permutations as its probe seq, independent of other keys.
Theorem:
\[E[ \\# \mathrm{probes}] \leqslant \frac{1}{1-\alpha}, \alpha <1\]Proof:(unsuccessful search) 1 probe always neccessary. With probability $\frac{n}{m}$, collision $\implies$ 2nd prob. $\frac{n-1}{m-1} \implies$ 3rd prob.
\[\begin{align} E[\\# \mathrm{probs}] &= 1+ \frac{n}{m}\left( 1+\frac{n-1}{m-1}(1+\dots) \right) \newline &\leqslant 1+\alpha(1+\alpha(\dots)) \newline &\leqslant \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1-\alpha} \end{align}\]- $a<1$ const $\implies$ $\Theta(1)$ probs.
Comments