Introduction to Algorithms/Order Statistics, Median
given $n$ elems in array, find the $k$th smallest element (elem of rank $k$).
- $k=1$ : mininum
- $k=n$ : maximum
- $k=\lfloor \frac{n+1}{2} \rfloor$ or $\lceil \frac{n+1}{2} \rceil$: median
naive: sort and return the $k$th elem. $O(n\lg n)$
Randomized divide and conquer
def RandSelect(A, p, q, i):
"""
ith smallest in A[p,...,q]
"""
if p == q:
return A[p]
r = RandPartition(A, p, q)
k = r - p + 1
if i == k:
return A[r]
elif i < k:
return RandSelect(A, p, r - 1, i)
else
return RandSelect(A, r + 1, q, i - k)
Intuition for analysis (today assume elems are distinct):
- lucky case: \(T(n) = T(\alpha n)+\Theta(n)\) if $\alpha <1$, case 3: $T(n)=\Theta(n)$.
- unlucky case: split $0:n-1$. \(T(n)=T(n-1)+\Theta(n)\) $T(n)=\Theta(n^2)$.
Analysis of expected time:
- assume random partition be chosen independently \(X_{k} =\begin{cases} 1 &, \text{if partition generates} \ k:n-k-1 \ \text{split} \\ 0 &, \text{otherwise} \end{cases}\)
Claim: $E[T(n)]\leqslant c\cdot n$ for some $c>0$. Proof: substitution method,assume true for $<n$.
\[\begin{align} E[T(n)] &\leqslant \frac{2}{n} \sum_{k=\lfloor \frac{n}{2} \rfloor}^{n-1}E[T(k)]+\Theta(n) \newline &\leqslant \frac{2}{n}\sum_{k=\lfloor \frac{n}{2} \rfloor}^{n-1}ck+\Theta(n) \newline &\leqslant \frac{3c}{4}n +\Theta(n) \end{align}\]need $\frac{c}{4}n > \Theta(n)$. that’s true for $c$ sufficiently large.
Worst-case linear-time order statistics
- Blum, Floyd, Pratt, Rivest and Tarjan (BFPRT algorithm)
- idea: generate good pivot recursively
Select($1,\dots,n$):
- Divide the $n$ elems into $\lfloor \frac{n}{5} \rfloor$ groups of 5 elems each. Find the median of each group($\Theta(n)$).
- Recursively select the median $x$ of the $\lfloor \frac{n}{5} \rfloor$ group medians(using BFPRT) ($T(\frac{n}{5})$)
- Partition with $x$ as pivot and let $k=\mathrm{rank}(x)$.
- if $i=k$ then return $x$; if $i<k$ then return …(same as randomized select).
according to steps 1, 2, at least $3 \lfloor \lfloor \frac{n}{5} \rfloor /2 \rfloor$ elems $\leqslant x$ ($\geqslant x$).
- we have $\lfloor \lfloor \frac{n}{5} \rfloor /2 \rfloor$ group medians $\leqslant x$ and for each group median 3 elems equal or larger. simplification: for $n\geqslant 50,3 \lfloor \lfloor \frac{n}{5} \rfloor /2 \rfloor \geqslant 3\lfloor \frac{n}{10} \rfloor \geqslant \frac{3n}{4}$.(thus $\frac{3}{4}n+\frac{1}{5}n <n$).
Proof: assume $T(n)\leqslant cn$.
\[T(n)\leqslant \frac{19c}{20}n+\Theta(n)\]$\implies T(n)\leqslant cn$ if $\frac{1}{20}cn\geqslant \Theta(n)$. And that’s true if $c$ is sufficiently large.
Ex: why group of 5?
int InsertSort(int array[], int left, int right);
int GetPivotIndex(int array[], int left, int right);
int Partition(int array[], int left, int right, int pivot_index);
int BFPRT(int array[], int left, int right, int k);
int InsertSort(int array[], int left, int right)
{
int temp;
int j;
for (int i = left + 1; i <= right; i++)
{
temp = array[i];
j = i - 1;
while (j >= left && array[j] > temp)
{
array[j + 1] = array[j];
j--;
}
array[j + 1] = temp;
}
return ((right - left) >> 1) + left; // median
}
int GetPivotIndex(int array[], int left, int right)
{
if (right - left < 5)
return InsertSort(array, left, right);
int sub_right = left - 1;
for (int i = left; i + 4 <= right; i += 5)
{
int index = InsertSort(array, i, i + 4);
swap(array[++sub_right], array[index]);
}
return BFPRT(array, left, sub_right, ((sub_right - left + 1) >> 1) + 1);
}
int Partition(int array[], int left, int right, int pivot_index)
{
swap(array[pivot_index], array[right]);
int partition_index = left;
for (int i = left; i < right; i++)
{
if (array[i] < array[right])
{
swap(array[partition_index++], array[i]);
}
}
swap(array[partition_index], array[right]);
return partition_index;
}
int BFPRT(int array[], int left, int right, int k)
{
int pivot_index = GetPivotIndex(array, left, right);
int partition_index = Partition(array, left, right, pivot_index);
int num = partition_index - left + 1;
if (num == k)
return partition_index;
else if (num > k)
return BFPRT(array, left, partition_index - 1, k);
else
return BFPRT(array, partition_index + 1, right, k - num);
}
Comments