Introduction to Algorithms/Amortized Analysis
How large should a hash table be?
- As large as possible (time)
- As small as possible (space)
- $\Theta(n)$ for $n$ items
Problem What if we don’t know $n$ in advance?
Soluition Dynamic tables.
Dynamic tables
Idea: Whenever the table gets too full (overflows), we “grow” it.
- Allocate (malloc or new) a larger table.
- Move items from old to new.
- Free old table.
Analysis Seq of $n$ insert: ops worst-case cost of 1 insert is $\Theta(n)$. $\implies$ w-c cost of $n$ inserts = $n \cdot \Theta(n)=\Theta(n^2)$.
Wrong! $n$ Inserts take $\Theta(n)$ time.
Let $c_{i}$ be the cost of the $i$th insert.
\[c_{i}=\begin{cases} i, &\text{if }i-1 \text{ is power of }2 \newline 1, &\text{otherwise} \end{cases}\] \[\begin{align} \text{cost}&= \sum_{i=1}^{n}c_{i} \newline &= n+\sum_{j=0}^{\lfloor \lg (n-1) \rfloor} 2^{j} \newline &\leqslant 3n \end{align}\]Thus, average cost of insert is $\Theta(1)$.
Amortized analysis
Analyze a seq of ops to show that ave. cost per op is small, even though 1 op may be expensive.No probability - average performance in worst-case.
Types of amortized arguments
- aggregate analysis (just saw)
- accounting argument
- potential argument
Accounting and potential arguments are more precise, allocating specific amortized costs to each op.
Accounting method
- Charge $i$th op a fictitious amortized cost $\hat{c_{i}}$ ($1 pays for 1 unit of work)
- Fee is consumed to perform the op.
- Unused amount stored in the “bank” for later use.
- Bank balance must not go negative. Must have $\sum_{i=1}^{n}c_{i}\leqslant \sum_{i=1}^{n}\hat{c_{i}},\ \forall n$.
Dynamic Table:
- Charge $\hat{c_{i}} =$ 3 dollars for $i$th insert
- $1 pays for immediate insert.
- $2 stored for table doubling.
- When table doubles
- $1 moves recent item.
- $1 moves old item.
Invarient: Bank balance $\geqslant 0$. $\implies \sum_{i=1}^{n}c_{i}\leqslant 3n$.
Difference schemes for charging can also work (ex. charge 2 for the first iterm and 3 for the rest).
Potential method
“Bank account” viewed as potential energy of dynamic set.
Framework:
- Start with data structure $D_{0}$.
- Op $i$ transforms $D_{i-1} \to D_{i}$.
- Cost of op $i$ is $c_{i}$.
- Define potential function $\Phi: \{D_{i}\} \to \mathbb{R}$, s.t. $\Phi (D_{0})=0$ and $\Phi (D_{i}) \geqslant 0 \ \forall i$.
- Define the amortized cost $\hat{c_{i}}$ w.r.t. $\Phi (D_{i})$ is \(\hat{c_{i}} = c_{i}+\Phi (D_{i}) - \Phi (D_{i-1}) = c_{i}+\Delta \Phi_{i}\) If $\Delta_{i} \geqslant 0$, then $\hat{c_{i}}>c_{i}$. Op $i$ stores work in data structure for later. Vice versa.
Totla amortized cost of $n$ ops is \(\sum_{i=1}^{n}\hat{c_{i}}=\sum_{i=1}^{n}c_{i}+\Phi(D_{i})-\Phi(D_{0})\geqslant \sum_{i=1}^{n}c_{i}.\) Table doubling: Define $\Phi (D_{i})=2i-2^{\lceil \lg i \rceil}$.(Assume $2^{\lceil \lg 0 \rceil}=0$).
- $\Phi (D_{0})=0, \Phi (D_{i})\geqslant 0, \forall i$.
case 1: $i-1$ is exactly power of 2, $\hat{c_i}=i+2- 2^{\lceil \lg i\rceil} + 2^{\lceil \lg(i-1) \rceil} = 3$.
case 2: $\hat{c_{i}}=1+2- 2^{\lceil \lg i\rceil} + 2^{\lceil \lg(i-1) \rceil}=3$.
$\implies \sum_{i=1}^{n} c_{i}< 3n$.
Conclusions. Amortized costs provide a clean abstraction for data structure performance.
- Any method can be used, but each has situations where it’s arguably simplist or most precise.
- In fact different potential functions or accounting costs may yield different bounds.
Comments