Divacon homework algorithms

including reverse, Fibonacci, and FFT in time O(log(N))

On Mou
on Divide and Conquer

Dot product

Dot product, a sum of corresponding element products between two n-length vectors, should be a slam-dunk O(log(n)) time cost process in parallel. Divide each of the two input vectors in half and put them pairwise back into correspondence with each other, and at the bottom of the division tree take their products all in parallel in the base predicate, and then just add up partial sums in the post adjustment.
dot \(= (atom(V_0)\ \&\ atom(W_0))\)\(?\ \ V_0*W_0 \)
\(;\ \ +\ :\ !\ \text{dot} \ :\ d^2_{lr}\)

dot = PDC(divide = \(d_{lr}\), combine = \(+\),
pre\( = id\), post\(\ = id \)
\(p_b = (atom(V)\ \&\ atom(W) ?,\ \ f_b =\ V\ *\ W\ \)
)
Any questions?

Reverse

reverse \( X\ =\ (atom?\)    \(id\)
otherwise   \(c_{lr}\ :\ (!\ other\ :\ \#\ !\ corr)\ :\ d_{lr}\ X \)

and in Divacon code:

reverse = PDC(\( \text{divide}=d_{lr}, \text{combine}=c_{lr},\)
\(\text{preadjust}=id, \text{postadjust}=(!\ \text{other}:\ \#\ !\ corr)\)
\(\text{basepred}= \text{atom}?,\text{ basefunc}=id\ )\)

This is George's algorithm: if you keep dividing the inputs eventually you get singleton vectors. Then reversing is done on the way back up the tree: having combined two vectors together at the lower level, collect on each side the corresponding opposite partner from the other side and on that side choose the other of the two, thus swapping each side across wholesale. Then not only have you reversed the innermost pairs, but as you go up the corresponding larger vectors are reversed.

Interestingly the "adjustment" can be done as either postadjust or preadjust, reversing outside in or inside out, but in either case thoroughly.

scan

\((scan\ +) X\ =\ (atom?\)    \(id\)
otherwise   \(c_{lr} : (id,!+)\ :\ \#\ (null,last\ 0)\ :\ !\ (scan\ +)\ :\ d_{lr}\ X \)
and in Divacon code:

(scan +) = PDC(\( \text{divide}=d_{lr}, \text{combine}=c_{lr},\)
\(\text{preadjust}=id, \text{postadjust}=(id,! +):\#(nil,last\ 0)\)
\(\text{basepred}= \text{atom}?,\text{ basefunc}=id\ )\)

In English: keep dividing the input array in half forming a binary tree down to singletons. Go up the tree handling each pair of vectors by communicating the last of the 0'th vector into tuples with each member of the 1'th (second) vector, then summing the tuple to form a subtree scan result. The first vector receives no communications, indicated by \(nil\) in the communication process, and experiences no local operations, indicated by \(id\) in the local operation process.

George also offers an alternative PDC definition for scan in the thesis on p84 to cut down the O(n) communication implicit in the last 0. However its central step hscan + comprises a cascade of eight component subprocesses, and I'm setting it to be my homework to ask him to work through them with me. Result here.

Fibonacci in O(log(N)) time

Let the domain of Fibonacci indexes such as \(n\) and \(k\) be mapped to a codomain of 2-vectors of Fibonacci summands
\(N=[N'\ N'']^T\) and \(K=[K'\ K'']^T\) where
\(N'=f(n-1)\) and \(N''=f(n-2)\),
and define
$$ \begin{align} N\ \#\ K\ \ &\text{≜} \begin{bmatrix} N \cdot (K+1)\\ N \cdot K \end{bmatrix} =\begin{bmatrix} (N+K)'\\ (N+K)'' \end{bmatrix} \end{align}$$
(Derivation here).

The Fibonacci recurrence

\(F(n+k) = F(k)F(n-2) + F(k+1)F(n-1)\)
enables doubling the index of results by calculating in the domain F(n+n), or in the codomain \(N\# N = (2N) = [F(2n-1),F(2n-2)]^T.\)

Define an even/odd division based not on the position of an element in the input vector being in an even or odd point in the order, but on the value of the element which is itself an even or odd number: \(d_{01}\).

$$\begin{align} d_{01} X\ =\ Y\ Z\ |\ & \forall\ x=(b,k,N,K) \in X\ \\ \text{if}\ & x.k = (x.k\gg 1\ll 1)\ \ \text{// last bit is 0}\\ \text{then}\ & x\ \in\ Y,\\ \text{else}\ & x\ \in\ Z \end{align}$$
Then we can build a Parallel Divide and Conquer or PDC, higher level function:
! f = PDC(divide = \(d_{01}\), combine = \(c_{lr},\)
pre = \((b=k\&\text{0x1}, k:=k\gg 1, N:=N\# N,\ K:=(b\ ?\ N\# K;K)\ )\),
post = \( \forall\ v\ \in Y \cup Z\ \ \ v:=( v.b; v.k\ll 1+v.b, v.K )\)
basepred = \((k=0)\),
basefunc = \((b:=0,\ k:=k,\ f(k):=K'+K'')\)
)
Given an array of indexes \(k\) this PDC returns a shuffled array of tuples \((k,f(k))\).

Further discussion here.

Parallel Fourier Transform

For \(b\ in\ 0..(N/2)-1 \), let the discrete Fourier transform be

\(F(b,N,X) = \large\sum_{s=0}^{N-1} X_s * e^{isTb/N}\)

This makes use of several elements: The nub of the problem is to separate even and odd terms, which can be calculated separately and added, and then to notice the odd sum can be factored, pulling out our base step angle \(S^b = e^{iTb/N}\) leaving the identical evens' formula, or calculation process, but using the odds' data. Since the subproblems are half the size, the step angle doubles from \(S^b\) to \((S^b)^2\), to reach the same number of rotations after only half the steps.

\(F_{b,N} X\) \( = \large\sum_{s=0}^{N-1} X_s * (S^b)^s \)
\( = \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s} * (S^b)^{2s}\)\(\ \ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{(2s+1)} * (S^b)^{(2s+1)}\)
\( =\ \ \ \ \ "\ \ \)\(\ \ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{(2s+1)} * S^b * (S^b)^{(2s)}\)
\( = \large F_{2b,\frac{N}{2}} X_{evens} \)\(\ \ +\ \large S^b * F_{2b,\frac{N}{2}} X_{odds}\)

Since \(F_b\) is inside and outside, this can be transcribed as a PDC parallelization of \(F_b\).

In Divacon notation:

\(F_{b,N}\ X\ = \) atom ?   (Re=id, Im=0, b=0)
otherwise  \(c_{eo} : (\text{self}+S^b*\text{other, self}-S^b*\text{other}, 2b)\ :\ \#\ !\ corr\ :\ !\ F_{2b2,\frac{N}{2}}\ :\ d_{eo}\ X \)
and in Divacon code:

pft = PDC(\( \text{divide}=d_{eo}, \text{combine}=c_{eo},\)
\(\text{preadjust}=id, \text{postadjust}=!(\text{self}+\text{other}*e^{-iTb/N}, \text{self}+\text{other}*e^{iTb/N}, 2b) : \# ! \text{corr}, \)
\(\text{basepred}=\text{atom}?,\text{ basefunc}=( (Re=id, Im=0), (Re=id,Im=0), b=0)\ )\)

One way to check a PDC is whether the inputs and outputs of each phase of the chain are of matching structure, size, and type.
Let X, C, and I represent Real numbers, Complex numbers, and Integers. Then: Check: do the inputs and outputs of each processing element match up? Looks okay to me, but I'm sleepy.
Implicit, except in basefunc, is that the values in postadjust are complex numbers.
The result array is not just a single-frequency FT, dependent on \(b\), but an array of imaginary values each representing the magnitude and phase of a frequency component for \(b \in 0..\frac{N}{2}-1\).

Further discussion of FFT algorithm, interpretation, use, and history is here.

A Divacon tutorial is here.

Your thoughts?
(will not be shared or abused)
Comment:
                                          Feedback is welcome.
Copyright © 2024 Thomas C. Veatch. All rights reserved.
Created: June 13, 2024; Modified June 17, 2024