dot \(= (atom(V_0)\ \&\ atom(W_0))\) | \(?\ \ V_0*W_0 \) |
\(;\ \ +\ :\ !\ \text{dot} \ :\ d^2_{lr}\) |
dot = PDC( | divide = \(d_{lr}\), combine = \(+\), |
pre\( = id\), post\(\ = id \) | |
\(p_b = (atom(V)\ \&\ atom(W) ?,\ \ f_b =\ V\ *\ W\ \) | |
) |
reverse \( X\ =\ (atom?\) \(id\) \(c_{lr}\ :\ (!\ other\ :\ \#\ !\ corr)\ :\ d_{lr}\ X \)
and in Divacon code:
reverse = PDC( \( \text{divide}=d_{lr}, \text{combine}=c_{lr},\) \(\text{preadjust}=id, \text{postadjust}=(!\ \text{other}:\ \#\ !\ corr)\) \(\text{basepred}= \text{atom}?,\text{ basefunc}=id\ )\)
This is George's algorithm: if you keep dividing the inputs eventually you get singleton vectors. Then reversing is done on the way back up the tree: having combined two vectors together at the lower level, collect on each side the corresponding opposite partner from the other side and on that side choose the other of the two, thus swapping each side across wholesale. Then not only have you reversed the innermost pairs, but as you go up the corresponding larger vectors are reversed.
Interestingly the "adjustment" can be done as either postadjust or preadjust, reversing outside in or inside out, but in either case thoroughly.
and in Divacon code:
\((scan\ +) X\ =\ (atom?\) \(id\) \(c_{lr} : (id,!+)\ :\ \#\ (null,last\ 0)\ :\ !\ (scan\ +)\ :\ d_{lr}\ X \)
In English: keep dividing the input array in half forming a binary tree down to singletons. Go up the tree handling each pair of vectors by communicating the last of the 0'th vector into tuples with each member of the 1'th (second) vector, then summing the tuple to form a subtree scan result. The first vector receives no communications, indicated by \(nil\) in the communication process, and experiences no local operations, indicated by \(id\) in the local operation process.
(scan +) = PDC( \( \text{divide}=d_{lr}, \text{combine}=c_{lr},\) \(\text{preadjust}=id, \text{postadjust}=(id,! +):\#(nil,last\ 0)\) \(\text{basepred}= \text{atom}?,\text{ basefunc}=id\ )\)
George also offers an alternative PDC definition for scan in the thesis on p84 to cut down the O(n) communication implicit in the last 0. However its central step hscan + comprises a cascade of eight component subprocesses, and I'm setting it to be my homework to ask him to work through them with me. Result here.
\(N=[N'\ N'']^T\) and \(K=[K'\ K'']^T\) whereand define
\(N'=f(n-1)\) and \(N''=f(n-2)\),
$$ \begin{align} N\ \#\ K\ \ &\text{≜} \begin{bmatrix} N \cdot (K+1)\\ N \cdot K \end{bmatrix} =\begin{bmatrix} (N+K)'\\ (N+K)'' \end{bmatrix} \end{align}$$(Derivation here).
The Fibonacci recurrence
\(F(n+k) = F(k)F(n-2) + F(k+1)F(n-1)\)enables doubling the index of results by calculating in the domain F(n+n), or in the codomain \(N\# N = (2N) = [F(2n-1),F(2n-2)]^T.\)
Define an even/odd division based not on the position of an element in the input vector being in an even or odd point in the order, but on the value of the element which is itself an even or odd number: \(d_{01}\).
$$\begin{align} d_{01} X\ =\ Y\ Z\ |\ & \forall\ x=(b,k,N,K) \in X\ \\ \text{if}\ & x.k = (x.k\gg 1\ll 1)\ \ \text{// last bit is 0}\\ \text{then}\ & x\ \in\ Y,\\ \text{else}\ & x\ \in\ Z \end{align}$$Then we can build a Parallel Divide and Conquer or PDC, higher level function:
Given an array of indexes \(k\) this PDC returns a shuffled array of tuples \((k,f(k))\).
! f = PDC( divide = \(d_{01}\), combine = \(c_{lr},\)
pre = \((b=k\&\text{0x1}, k:=k\gg 1, N:=N\# N,\ K:=(b\ ?\ N\# K;K)\ )\),
post = \( \forall\ v\ \in Y \cup Z\ \ \ v:=( v.b; v.k\ll 1+v.b, v.K )\)
basepred = \((k=0)\),
basefunc = \((b:=0,\ k:=k,\ f(k):=K'+K'')\)
)
Further discussion here.
\(F(b,N,X) = \large\sum_{s=0}^{N-1} X_s * e^{isTb/N}\)This makes use of several elements:
\(F_{b,N} X\) | \( = \large\sum_{s=0}^{N-1} X_s * (S^b)^s \) | |
\( = \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s} * (S^b)^{2s}\) | \(\ \ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{(2s+1)} * (S^b)^{(2s+1)}\) | |
\( =\ \ \ \ \ "\ \ \) | \(\ \ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{(2s+1)} * S^b * (S^b)^{(2s)}\) | |
\( = \large F_{2b,\frac{N}{2}} X_{evens} \) | \(\ \ +\ \large S^b * F_{2b,\frac{N}{2}} X_{odds}\) |
Since \(F_b\) is inside and outside, this can be transcribed as a PDC parallelization of \(F_b\).
In Divacon notation:
and in Divacon code:
\(F_{b,N}\ X\ = \) atom ? (Re=id, Im=0, b=0) \(c_{eo} : (\text{self}+S^b*\text{other, self}-S^b*\text{other}, 2b)\ :\ \#\ !\ corr\ :\ !\ F_{2b2,\frac{N}{2}}\ :\ d_{eo}\ X \)
One way to check a PDC is whether the inputs and outputs of each phase of the chain are of matching structure, size, and type.
pft = PDC( \( \text{divide}=d_{eo}, \text{combine}=c_{eo},\) \(\text{preadjust}=id, \text{postadjust}=!(\text{self}+\text{other}*e^{-iTb/N}, \text{self}+\text{other}*e^{iTb/N}, 2b) : \# ! \text{corr}, \) \(\text{basepred}=\text{atom}?,\text{ basefunc}=( (Re=id, Im=0), (Re=id,Im=0), b=0)\ )\)
Further discussion of FFT algorithm, interpretation, use, and history is here.
A Divacon tutorial is here.