1 minute read

Video Link

Dynamic multithreading.

Ex. $Fib(n)$.

if n < 2
  then return n
x<-spawn Fib(n-1)
y<-spawn Fib(n-2)
sync
return (x+y)

Spawn - subroutine can execute at the same time as its parent.

Sync - wait until all children are done.

Logical parellelism, not actual. A scheduler determines hwo ti map dynamically unfolding execution to onto whatever processors.

Multi-threaded computation

Parallel instruction stream = dag. Vertices are threads - maximal seq of instructions not containing parallel control (spawn,sync,return).

Performance measures: $T_{p}=$ running time on $p$ processors. $T_{1}=$work = serial time.$T_{\infty}=$ critical-path length = longest path.

Ex. $Fib(4)$ (unit-time threads)

  • $T_{1}=17$
  • $T_{\infty}=8$

Lower bounds on $T_{p}$:

  • $T_{p}\geqslant \frac{T_{1}}{p}$ - $p$ processors can do at most $p$ work in one step.
  • $T_{p}\geqslant T_{\infty}$ - $p$ processors can’t do more work than $\infty$ processors .

Speedup: $\frac{T_{1}}{T_{P}}=$ speed up on $p$ processors. $\frac{T_{1}}{T_{p}}=\Theta(p)\implies$ linear speedup. If $\frac{T_{1}}{T_{p}}>p\implies$ super linear speedup, not possible in our model.

Max possible speedup gice $T_{1},T_{\infty}$ is $\frac{T_{1}}{T_{\infty}}$ = parallelism = average amount of work can be done in parellel along each step of the critical path = $\bar{p}$.

Scheduling

Map computation to $p$ processors. Done by runtime system. On-line schedulers are complex. Illustrate ideas using off-line scheduler.

Greedy scheduler: Do as much as possible on every step.

  • Complete step: at least $p$ threads ready to run.Execute any $p$.
  • Incomplete step: $<p$ threads ready to run. Execute all of them.

Theorem (Graham,Brent) A greedy schedule executes any computation $G$ with work $T_{1}$ and critical path length $T_{\infty}$.

\[T_{p}\leqslant \frac{T_{1}}{p}+T_{\infty}\]

on a acomputer with $p$ processors.

Proof: # Complete steps $\leqslant \frac{T_{1}}{p}$. Consider an incomplete step and let $G’$ be subgraph of $G$ that remains to be executed.

Threads with in-degree 0 in $G’$ are ready to be executed. The critical path len of $G’$ is reduced by one $\implies$ # incomplete steps is almost $T_{\infty}$.

Corollary: Linear speadup when $p=O(\bar{p})$.

\[\bar{p}=\frac{T_{1}}{T_{\infty}}\implies p=O\left( \frac{T_{1}}{T_{p}} \right)\implies T_{\infty}=O\left( \frac{T_{1}}{p} \right)\]

Comments