2 minute read

Video Link

How large should a hash table be?

  • As large as possible (time)
  • As small as possible (space)
  • $\Theta(n)$ for $n$ items

Problem What if we don’t know $n$ in advance?

Soluition Dynamic tables.

Dynamic tables

Idea: Whenever the table gets too full (overflows), we “grow” it.

  1. Allocate (malloc or new) a larger table.
  2. Move items from old to new.
  3. Free old table.

Analysis Seq of $n$ insert: ops worst-case cost of 1 insert is $\Theta(n)$. $\implies$ w-c cost of $n$ inserts = $n \cdot \Theta(n)=\Theta(n^2)$.

Wrong! $n$ Inserts take $\Theta(n)$ time.

Let $c_{i}$ be the cost of the $i$th insert.

\[c_{i}=\begin{cases} i, &\text{if }i-1 \text{ is power of }2 \newline 1, &\text{otherwise} \end{cases}\] \[\begin{align} \text{cost}&= \sum_{i=1}^{n}c_{i} \newline &= n+\sum_{j=0}^{\lfloor \lg (n-1) \rfloor} 2^{j} \newline &\leqslant 3n \end{align}\]

Thus, average cost of insert is $\Theta(1)$.

Amortized analysis

Analyze a seq of ops to show that ave. cost per op is small, even though 1 op may be expensive.No probability - average performance in worst-case.

Types of amortized arguments

  • aggregate analysis (just saw)
  • accounting argument
  • potential argument

Accounting and potential arguments are more precise, allocating specific amortized costs to each op.

Accounting method

  • Charge $i$th op a fictitious amortized cost $\hat{c_{i}}$ ($1 pays for 1 unit of work)
  • Fee is consumed to perform the op.
  • Unused amount stored in the “bank” for later use.
  • Bank balance must not go negative. Must have $\sum_{i=1}^{n}c_{i}\leqslant \sum_{i=1}^{n}\hat{c_{i}},\ \forall n$.

Dynamic Table:

  • Charge $\hat{c_{i}} =$ 3 dollars for $i$th insert
    • $1 pays for immediate insert.
    • $2 stored for table doubling.
  • When table doubles
    • $1 moves recent item.
    • $1 moves old item.

Invarient: Bank balance $\geqslant 0$. $\implies \sum_{i=1}^{n}c_{i}\leqslant 3n$.

Difference schemes for charging can also work (ex. charge 2 for the first iterm and 3 for the rest).

Potential method

“Bank account” viewed as potential energy of dynamic set.

Framework:

  • Start with data structure $D_{0}$.
  • Op $i$ transforms $D_{i-1} \to D_{i}$.
  • Cost of op $i$ is $c_{i}$.
  • Define potential function $\Phi: \{D_{i}\} \to \mathbb{R}$, s.t. $\Phi (D_{0})=0$ and $\Phi (D_{i}) \geqslant 0 \ \forall i$.
  • Define the amortized cost $\hat{c_{i}}$ w.r.t. $\Phi (D_{i})$ is \(\hat{c_{i}} = c_{i}+\Phi (D_{i}) - \Phi (D_{i-1}) = c_{i}+\Delta \Phi_{i}\) If $\Delta_{i} \geqslant 0$, then $\hat{c_{i}}>c_{i}$. Op $i$ stores work in data structure for later. Vice versa.

Totla amortized cost of $n$ ops is \(\sum_{i=1}^{n}\hat{c_{i}}=\sum_{i=1}^{n}c_{i}+\Phi(D_{i})-\Phi(D_{0})\geqslant \sum_{i=1}^{n}c_{i}.\) Table doubling: Define $\Phi (D_{i})=2i-2^{\lceil \lg i \rceil}$.(Assume $2^{\lceil \lg 0 \rceil}=0$).

  • $\Phi (D_{0})=0, \Phi (D_{i})\geqslant 0, \forall i$.
\[\begin{align} \hat{c_{i}} &= c_{i} + \Phi (D_{i}) - \Phi (D_{i-1}) \newline &= c_{i} + 2 - 2^{\lceil \lg i\rceil} + 2^{\lceil \lg(i-1) \rceil} \end{align}\]

case 1: $i-1$ is exactly power of 2, $\hat{c_i}=i+2- 2^{\lceil \lg i\rceil} + 2^{\lceil \lg(i-1) \rceil} = 3$.

case 2: $\hat{c_{i}}=1+2- 2^{\lceil \lg i\rceil} + 2^{\lceil \lg(i-1) \rceil}=3$.

$\implies \sum_{i=1}^{n} c_{i}< 3n$.

Conclusions. Amortized costs provide a clean abstraction for data structure performance.

  • Any method can be used, but each has situations where it’s arguably simplist or most precise.
  • In fact different potential functions or accounting costs may yield different bounds.

Comments