3 minute read

Video Link

given $n$ elems in array, find the $k$th smallest element (elem of rank $k$).

  • $k=1$ : mininum
  • $k=n$ : maximum
  • $k=\lfloor \frac{n+1}{2} \rfloor$ or $\lceil \frac{n+1}{2} \rceil$: median

naive: sort and return the $k$th elem. $O(n\lg n)$

Randomized divide and conquer

def RandSelect(A, p, q, i):
  """
  ith smallest in A[p,...,q]
  """
  if p == q:
     return A[p]
  r = RandPartition(A, p, q)
  k = r - p + 1
  if i == k:
     return A[r]
  elif i < k:
     return RandSelect(A, p, r - 1, i)
  else
     return RandSelect(A, r + 1, q, i - k)

Intuition for analysis (today assume elems are distinct):

  • lucky case: \(T(n) = T(\alpha n)+\Theta(n)\) if $\alpha <1$, case 3: $T(n)=\Theta(n)$.
  • unlucky case: split $0:n-1$. \(T(n)=T(n-1)+\Theta(n)\) $T(n)=\Theta(n^2)$.

Analysis of expected time:

  • assume random partition be chosen independently \(X_{k} =\begin{cases} 1 &, \text{if partition generates} \ k:n-k-1 \ \text{split} \\ 0 &, \text{otherwise} \end{cases}\)
\[\begin{align} T(n) &=T(\max(k,n-1-k))+\Theta(n), \text{if} \ k:n-1-k \ \text{split} \newline &= \sum_{k=0}^{n-1} X_{k}(T(\max(k,n-1-k))+\Theta(n)) \end{align}\] \[\begin{align} E[T(n)] &= E\left[\sum_{k=0}^{n-1}X_{k}(T(\max(k,n-1-k))+\Theta(n))\right] \newline &= \sum_{k=0}^{n-1}E\left[X_{k}(T(\max(k,n-1-k))+\Theta(n))\right] \newline &= \sum_{k=0}^{n-1}E[X_{k}]\cdot E[T(\max(k,n-1-k))+\Theta(n)] \ (\text{assumption}) \newline &= \frac{1}{n}\sum_{k=0}^{n-1}E[T(\max(k,n-1-k))+\Theta(n)] \newline &\leqslant \frac{2}{n} \sum_{k=\lfloor \frac{n}{2} \rfloor}^{n-1}E[T(k)]+\Theta(n) \end{align}\]

Claim: $E[T(n)]\leqslant c\cdot n$ for some $c>0$. Proof: substitution method,assume true for $<n$.

\[\begin{align} E[T(n)] &\leqslant \frac{2}{n} \sum_{k=\lfloor \frac{n}{2} \rfloor}^{n-1}E[T(k)]+\Theta(n) \newline &\leqslant \frac{2}{n}\sum_{k=\lfloor \frac{n}{2} \rfloor}^{n-1}ck+\Theta(n) \newline &\leqslant \frac{3c}{4}n +\Theta(n) \end{align}\]

need $\frac{c}{4}n > \Theta(n)$. that’s true for $c$ sufficiently large.

Worst-case linear-time order statistics

  • Blum, Floyd, Pratt, Rivest and Tarjan (BFPRT algorithm)
  • idea: generate good pivot recursively

Select($1,\dots,n$):

  1. Divide the $n$ elems into $\lfloor \frac{n}{5} \rfloor$ groups of 5 elems each. Find the median of each group($\Theta(n)$).
  2. Recursively select the median $x$ of the $\lfloor \frac{n}{5} \rfloor$ group medians(using BFPRT) ($T(\frac{n}{5})$)
  3. Partition with $x$ as pivot and let $k=\mathrm{rank}(x)$.
  4. if $i=k$ then return $x$; if $i<k$ then return …(same as randomized select).

according to steps 1, 2, at least $3 \lfloor \lfloor \frac{n}{5} \rfloor /2 \rfloor$ elems $\leqslant x$ ($\geqslant x$).

  • we have $\lfloor \lfloor \frac{n}{5} \rfloor /2 \rfloor$ group medians $\leqslant x$ and for each group median 3 elems equal or larger. simplification: for $n\geqslant 50,3 \lfloor \lfloor \frac{n}{5} \rfloor /2 \rfloor \geqslant 3\lfloor \frac{n}{10} \rfloor \geqslant \frac{3n}{4}$.(thus $\frac{3}{4}n+\frac{1}{5}n <n$).
\[T(n)\leqslant T\left( \frac{n}{5} \right)+T\left( \frac{3}{4}n \right)+\Theta(n)\]

Proof: assume $T(n)\leqslant cn$.

\[T(n)\leqslant \frac{19c}{20}n+\Theta(n)\]

$\implies T(n)\leqslant cn$ if $\frac{1}{20}cn\geqslant \Theta(n)$. And that’s true if $c$ is sufficiently large.

Ex: why group of 5?

int InsertSort(int array[], int left, int right);
int GetPivotIndex(int array[], int left, int right);
int Partition(int array[], int left, int right, int pivot_index);
int BFPRT(int array[], int left, int right, int k);

int InsertSort(int array[], int left, int right)
{
    int temp;
    int j;
    for (int i = left + 1; i <= right; i++)
    {
        temp = array[i];
        j = i - 1;
        while (j >= left && array[j] > temp)
        {
            array[j + 1] = array[j];
            j--;
        }
        array[j + 1] = temp;
    }
    return ((right - left) >> 1) + left; // median
}

int GetPivotIndex(int array[], int left, int right)
{
    if (right - left < 5)
        return InsertSort(array, left, right);
    int sub_right = left - 1;
    for (int i = left; i + 4 <= right; i += 5)
    {
        int index = InsertSort(array, i, i + 4);
        swap(array[++sub_right], array[index]);
    }
    return BFPRT(array, left, sub_right, ((sub_right - left + 1) >> 1) + 1);
}

int Partition(int array[], int left, int right, int pivot_index)
{
    swap(array[pivot_index], array[right]);
    int partition_index = left;
    for (int i = left; i < right; i++)
    {
        if (array[i] < array[right])
        {
            swap(array[partition_index++], array[i]);
        }
    }
    swap(array[partition_index], array[right]);
    return partition_index;
}

int BFPRT(int array[], int left, int right, int k)
{
    int pivot_index = GetPivotIndex(array, left, right);
    int partition_index = Partition(array, left, right, pivot_index);
    int num = partition_index - left + 1;
    if (num == k)
        return partition_index;
    else if (num > k)
        return BFPRT(array, left, partition_index - 1, k);
    else
        return BFPRT(array, partition_index + 1, right, k - num);
}

Comments