1 minute read

Viedo Link

BST sort

for i <- to n
  do Tree-Insert(T,A[i])
Inorder-Tree-Walk(root[T])

Time: $O(n)$ for walk, $\Omega(n\lg n)$ and $O(n^2)$ for $n$ Tree-Inserts. Lucy: balanced $O(\lg n)$ height, unlucky: sorted.

Relation to quicksort: BST sort & quick sort (basic version) make same comparisons but in a different order.

Randomized BST sort

  1. Randomly permute $A$
  2. BST-sort(A)

Time equals to randomized quick-sort.

Randomly-built BST

tree resulting from randomized BST sort.

Time $=\displaystyle \sum_{x \in T} d(x)$. $E[T(n)]=n\lg n \implies$

\[E\left[ \frac{1}{n}\sum_{x\in T} d(x) \right] = \Theta(\lg n)\]

Thm. $E[\text{height of randomly-built BST}] = O(\lg n)$.

Proof outline:

  1. Prove Jensen’s inequality: \(f(E[X])\leqslant E[f(X)]\) for convex function $f$.
  2. Instead of analyzing $X_{n}=$ r.v. of height of BST $n$ trees , analyze $Y_{n} =2^{X_{n}}$.
  3. Prove that $E[Y_{n}]=O(n^3)$.
  4. Conclude that \(O(n^3)=E[2^{X_{n}}]\geqslant 2^{E[X_{n}]}\) \(\implies E[X_{n}] \leqslant O(\lg n)\) Proof:

$f:\mathbb R \to \mathbb R$ is convex if

\[f(tx+(1-y)t)\leqslant tf(x)+(1-t)f(y), 0\leqslant t\leqslant 1\]

Lemma: if $f: \mathbb R \to \mathbb R$ is convex and $x_{1},x_{2},\dots,x_{n} \in \mathbb R, t_{1},t_{2},\dots, t_{n} \geqslant 0,\sum_{i=1}^{n}t_{i}=1$,

\[f\left( \sum_{i=1}^{n}t_{i}x_{i} \right)\leqslant \sum_{i=1}^{n}t_{i} f(x_{i})\]

Proof: Induction:

  • Base $n=1$: trival.
\[\begin{align} f\left( \sum_{k=1}^{n}t_{i}x_{i} \right)&=f\left( t_{n}x_{n}+(1-t_{n})\sum_{i=1}^{n-1} \frac{t_{i}}{1-t_{n}}x_{k} \right) \newline & \leqslant t_{n}f(x_{n})+(1-t_{n})f\left(\sum_{i=1}^{n-1} \frac{t_{i}}{1-t_{n}}x_{k}\right) \newline & \leqslant t_{n} f(x_{n})+(1-t_{n})\sum_{i=1}^{n-1} \frac{t_{i}}{1-t_{n}}x_{k} \newline &= \sum_{i=1}^{n}t_{i} f(x_{i}) \end{align}\]

Jensen’s inequation:

\[f(E[X])\leqslant E[f(X)],\text{if convex and }X\text{ is integer r.v.}\]

Proof:

\[E[X]=\sum_{-\infty}^{+\infty}P(X=i)\cdot i\]

According to the lamma and $\sum_{-\infty}^{+\infty}P(X=i)=1$,

\[f(E[X])\leqslant \sum_{-\infty}^{+\infty}P(X=i)f(i)\]

And

\[\sum_{-\infty}^{+\infty}yP(f(X)=y)=\sum_{-\infty}^{+\infty}y \sum_{k:f(k)=y} P(X=k) =E[f(X)] .\square\]

Expected BST height analysis: $X_{n}=$ r.v. of height of randomly built BST on $n$ nodes, $Y_{n}=2^{X_{n}}$.

If root $r$ has rank $k$, then

\[X_{n} =1+\max(X_{k-1},X_{n-k})\] \[Y_{n}=2\max(Y_{k-1},Y_{n-k})\]

define indicator r.v.’s $Z_{nk}$;

\[Z_{nk}=\begin{cases} 1, &\text{if root has rank }k \newline 0, &\text{otherwise} \end{cases}\] \[P(Z_{nk}=1)=E[Z_{nk}]=\frac{1}{n}\] \[\begin{align} E(Y_{n})&= E\left(\sum_{k=1}^{n}Z_{nk}(2\max (Y_{k-1},Y_{n-k}))\right) \newline &= \sum_{k=1}^{n}E(2Z_{nk}\max (Y_{k-1},Y_{n-k})) \newline &= \frac{2}{n}\sum_{k=1}^{n}E[\max (Y_{k-1},Y_{n-k})] \newline &\leqslant \frac{2}{n}\sum_{k=1}^{n}E[Y_{k-1}+Y_{n-k}] \newline &= \frac{4}{n} \sum_{k=1}^{n}Y_{k-1} \end{align}\]

Claim: $E[Y_{n}]\leqslant cn^3$. Proof: substitution method.

  • Base: $n=\Theta(1)$, true for $c$ sufficiently large.
  • Step: \(\begin{align} E[Y_{k}] &\leqslant \frac{4}{n} \sum_{k=0}^{n-1}Y_{k} \newline &\leqslant \frac{4}{n} \sum_{k=0}^{n-1}ck^3 \newline &\leqslant \frac{4c}{n}\int_{0}^{n}x^3 \mathrm{d}x \newline &= cn^3 \quad \square \end{align}\)

Comments