2 minute read

Video Link

Symbol-table problem

Table $S$ holding $n$ records.

Operations:

  • Insert(S,x): $S \leftarrow S \cup {x}$ ($S$ is a dynamic set)
  • Delete(S,x): $S \leftarrow S - {x}$
  • Search(S,k): return $x$ s.t. $key[x]=k$ or nil if no such $x$

Direct-access table

Suppose keys are drawn from set $u=\{0,1,\dots,m-1\}$. Assume keys are distinct.

Set up array $T[0,\dots,m-1]$ to represents dyn set $S$.

\[T[k]=\begin{cases} x, &\text{if} \ x \in S \ \text{and} \ key[x]=k \newline \mathrm{nil}, &\text{otherwise} \end{cases}\]

Ops take $\Theta(1)$ time.

Hashing

Hash function $h$ maps keys “randomly” into slots of table $T$.

When a record to be inserted maps to an already occupied slot, a collision occurs.

Resolving collision by chaining

Idea: link records in same slot into a list.

Analysis:

  • Worst-case: every key hashes to the same slot. Access take $\Theta(n)$ if $ S =n$.
  • Average-case: assumption of simple uniform hashing.
    • each key $k \in S$ is equally likely to be hashed to any slot in $T$, independent of where other keys are hashed.
    • Def. the load factor of a hash table with $n$ keys and $m$ slots is $\alpha=\frac{n}{m}$.
    • Expected unsuccessful search time is $\Theta (1+ \alpha)$(1 is doing a hash and $\alpha$ is searching the list), is $\Theta(1)$ if $\alpha=\Theta(1)$, i.e. $n=\Theta(m)$.

Choosing a hash function

  • Should distribute keys uniformly into slots
  • Regularity $m$ keys distribution should not affect uniformly

Division method:

\[h(k)=k\mod m\]
  • Don’t pick $m$ with small divisor $d$. Ex. if $d=2$ and all keys even $\implies$ odds slots are never used. Ex. if $m=2^r$ $\implies$ hash doesn’t depend on all bits of $k$(binary).
  • Pick $m=$ prime not close to power of 2 or 10.

Multiplication method:

$m=2^r$, cimputer has $w$-bits words.

\[h(k)=(A\cdot k \mod 2^w) >> (w-r)\]

$A$ is an odd number , $2^{w-1}<A<2^{w}$.

  • Don’t pick $A$ close to $2^{w-1}$ and $2^{w}$.
  • Fast method: mult $\mathrm{mod}$ $2^{w}$ faster than div and rsh is also fast.

Resolving collision by open addressing

  • Idea: probe the tabel systematically until an empty slot is found.
  • $h$: $U \times \{0,1,\dots,m-1\}$(probe seq) $\to \{0,1,\dots,m-1\}$
  • Probe seq should be permutation of $0,1,\dots,m-1$.
  • Table may fill up, s.t. $n\leqslant m$.
  • Deletion is difficult.

Probing strategies:

  • Linear: $h(k,i)=\left( h(k,0) +i \right)\mod m$.
    • primary clusting: long runs of filled slots.
  • Double-hashing: $h(k,i)=(h_{1}(k)+ih_{2}(k))\mod m$.
    • excellent method, usually pick $m=2^{r}$ and $h_{2}(k)$ odd.

Analysis of open addressing: Assumption of uniform hashing: each key equally likely to have any one of the $m!$ permutations as its probe seq, independent of other keys.

Theorem:

\[E[ \\# \mathrm{probes}] \leqslant \frac{1}{1-\alpha}, \alpha <1\]

Proof:(unsuccessful search) 1 probe always neccessary. With probability $\frac{n}{m}$, collision $\implies$ 2nd prob. $\frac{n-1}{m-1} \implies$ 3rd prob.

\[\begin{align} E[\\# \mathrm{probs}] &= 1+ \frac{n}{m}\left( 1+\frac{n-1}{m-1}(1+\dots) \right) \newline &\leqslant 1+\alpha(1+\alpha(\dots)) \newline &\leqslant \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1-\alpha} \end{align}\]
  • $a<1$ const $\implies$ $\Theta(1)$ probs.

Comments