Introduction to Algorithms/Binary Search Tree
BST sort
for i <- to n
do Tree-Insert(T,A[i])
Inorder-Tree-Walk(root[T])
Time: $O(n)$ for walk, $\Omega(n\lg n)$ and $O(n^2)$ for $n$ Tree-Inserts. Lucy: balanced $O(\lg n)$ height, unlucky: sorted.
Relation to quicksort: BST sort & quick sort (basic version) make same comparisons but in a different order.
Randomized BST sort
- Randomly permute $A$
BST-sort(A)
Time equals to randomized quick-sort.
Randomly-built BST
tree resulting from randomized BST sort.
Time $=\displaystyle \sum_{x \in T} d(x)$. $E[T(n)]=n\lg n \implies$
\[E\left[ \frac{1}{n}\sum_{x\in T} d(x) \right] = \Theta(\lg n)\]Thm. $E[\text{height of randomly-built BST}] = O(\lg n)$.
Proof outline:
- Prove Jensen’s inequality: \(f(E[X])\leqslant E[f(X)]\) for convex function $f$.
- Instead of analyzing $X_{n}=$ r.v. of height of BST $n$ trees , analyze $Y_{n} =2^{X_{n}}$.
- Prove that $E[Y_{n}]=O(n^3)$.
- Conclude that \(O(n^3)=E[2^{X_{n}}]\geqslant 2^{E[X_{n}]}\) \(\implies E[X_{n}] \leqslant O(\lg n)\) Proof:
$f:\mathbb R \to \mathbb R$ is convex if
\[f(tx+(1-y)t)\leqslant tf(x)+(1-t)f(y), 0\leqslant t\leqslant 1\]Lemma: if $f: \mathbb R \to \mathbb R$ is convex and $x_{1},x_{2},\dots,x_{n} \in \mathbb R, t_{1},t_{2},\dots, t_{n} \geqslant 0,\sum_{i=1}^{n}t_{i}=1$,
\[f\left( \sum_{i=1}^{n}t_{i}x_{i} \right)\leqslant \sum_{i=1}^{n}t_{i} f(x_{i})\]Proof: Induction:
- Base $n=1$: trival.
Jensen’s inequation:
\[f(E[X])\leqslant E[f(X)],\text{if convex and }X\text{ is integer r.v.}\]Proof:
\[E[X]=\sum_{-\infty}^{+\infty}P(X=i)\cdot i\]According to the lamma and $\sum_{-\infty}^{+\infty}P(X=i)=1$,
\[f(E[X])\leqslant \sum_{-\infty}^{+\infty}P(X=i)f(i)\]And
\[\sum_{-\infty}^{+\infty}yP(f(X)=y)=\sum_{-\infty}^{+\infty}y \sum_{k:f(k)=y} P(X=k) =E[f(X)] .\square\]Expected BST height analysis: $X_{n}=$ r.v. of height of randomly built BST on $n$ nodes, $Y_{n}=2^{X_{n}}$.
If root $r$ has rank $k$, then
\[X_{n} =1+\max(X_{k-1},X_{n-k})\] \[Y_{n}=2\max(Y_{k-1},Y_{n-k})\]define indicator r.v.’s $Z_{nk}$;
\[Z_{nk}=\begin{cases} 1, &\text{if root has rank }k \newline 0, &\text{otherwise} \end{cases}\] \[P(Z_{nk}=1)=E[Z_{nk}]=\frac{1}{n}\] \[\begin{align} E(Y_{n})&= E\left(\sum_{k=1}^{n}Z_{nk}(2\max (Y_{k-1},Y_{n-k}))\right) \newline &= \sum_{k=1}^{n}E(2Z_{nk}\max (Y_{k-1},Y_{n-k})) \newline &= \frac{2}{n}\sum_{k=1}^{n}E[\max (Y_{k-1},Y_{n-k})] \newline &\leqslant \frac{2}{n}\sum_{k=1}^{n}E[Y_{k-1}+Y_{n-k}] \newline &= \frac{4}{n} \sum_{k=1}^{n}Y_{k-1} \end{align}\]Claim: $E[Y_{n}]\leqslant cn^3$. Proof: substitution method.
- Base: $n=\Theta(1)$, true for $c$ sufficiently large.
- Step: \(\begin{align} E[Y_{k}] &\leqslant \frac{4}{n} \sum_{k=0}^{n-1}Y_{k} \newline &\leqslant \frac{4}{n} \sum_{k=0}^{n-1}ck^3 \newline &\leqslant \frac{4c}{n}\int_{0}^{n}x^3 \mathrm{d}x \newline &= cn^3 \quad \square \end{align}\)
Comments