<i>Divacon</i> homework algorithms

Dot product

Dot product, a sum of corresponding element products between two n-length vectors, should be a slam-dunk O(log(n)) time cost process in parallel. Divide each of the two input vectors in half and put them pairwise back into correspondence with each other, and at the bottom of the division tree take their products all in parallel in the base predicate, and then just add up partial sums in the post adjustment.


dot $= (atom(V_0)\ \&\ atom(W_0))$	$?\ \ V_0*W_0 $
	$;\ \ +\ :\ !\ \text{dot} \ :\ d^2_{lr}$

dot = PDC(	divide = $d_{lr}$, combine = $+$,
	pre$ = id$, post$\ = id $
	$p_b = (atom(V)\ \&\ atom(W) ?,\ \ f_b =\ V\ *\ W\ $
	)

Any questions?

Reverse

reverse $ X\ =\ (atom?$ $id$

$c_{lr}\ :\ (!\ other\ :\ \#\ !\ corr)\ :\ d_{lr}\ X $

and in Divacon code:

reverse = PDC( $ \text{divide}=d_{lr}, \text{combine}=c_{lr},$

$\text{preadjust}=id, \text{postadjust}=(!\ \text{other}:\ \#\ !\ corr)$

$\text{basepred}= \text{atom}?,\text{ basefunc}=id\ )$

This is George's algorithm: if you keep dividing the inputs eventually you get singleton vectors. Then reversing is done on the way back up the tree: having combined two vectors together at the lower level, collect on each side the corresponding opposite partner from the other side and on that side choose the other of the two, thus swapping each side across wholesale. Then not only have you reversed the innermost pairs, but as you go up the corresponding larger vectors are reversed.

Interestingly the "adjustment" can be done as either postadjust or preadjust, reversing outside in or inside out, but in either case thoroughly.

scan

$(scan\ +) X\ =\ (atom?$ $id$

$c_{lr} : (id,!+)\ :\ \#\ (null,last\ 0)\ :\ !\ (scan\ +)\ :\ d_{lr}\ X $

and in Divacon code:

(scan +) = PDC( $ \text{divide}=d_{lr}, \text{combine}=c_{lr},$

$\text{preadjust}=id, \text{postadjust}=(id,! +):\#(nil,last\ 0)$

$\text{basepred}= \text{atom}?,\text{ basefunc}=id\ )$

In English: keep dividing the input array in half forming a binary tree down to singletons. Go up the tree handling each pair of vectors by communicating the last of the 0'th vector into tuples with each member of the 1'th (second) vector, then summing the tuple to form a subtree scan result. The first vector receives no communications, indicated by $nil$ in the communication process, and experiences no local operations, indicated by $id$ in the local operation process.

George also offers an alternative PDC definition for scan in the thesis on p84 to cut down the O(n) communication implicit in the last 0. However its central step hscan + comprises a cascade of eight component subprocesses, and I'm setting it to be my homework to ask him to work through them with me. Result here.

Fibonacci in O(log(N)) time

Let the domain of Fibonacci indexes such as $n$ and $k$ be mapped to a codomain of 2-vectors of Fibonacci summands

$N=[N'\ N'']^T$ and $K=[K'\ K'']^T$ where
$N'=f(n-1)$ and $N''=f(n-2)$,

and define

$$ \begin{align} N\ \#\ K\ \ &\text{≜} \begin{bmatrix} N \cdot (K+1)\\ N \cdot K \end{bmatrix} =\begin{bmatrix} (N+K)'\\ (N+K)'' \end{bmatrix} \end{align}$$

(Derivation here).

The Fibonacci recurrence

$F(n+k) = F(k)F(n-2) + F(k+1)F(n-1)$

enables doubling the index of results by calculating in the domain F(n+n), or in the codomain $N\# N = (2N) = [F(2n-1),F(2n-2)]^T.$

Define an even/odd division based not on the position of an element in the input vector being in an even or odd point in the order, but on the value of the element which is itself an even or odd number: $d_{01}$.

$$\begin{align} d_{01} X\ =\ Y\ Z\ |\ & \forall\ x=(b,k,N,K) \in X\ \\ \text{if}\ & x.k = (x.k\gg 1\ll 1)\ \ \text{// last bit is 0}\\ \text{then}\ & x\ \in\ Y,\\ \text{else}\ & x\ \in\ Z \end{align}$$

Then we can build a Parallel Divide and Conquer or PDC, higher level function:

! f = PDC( divide = $d_{01}$, combine = $c_{lr},$
pre = $(b=k\&\text{0x1}, k:=k\gg 1, N:=N\# N,\ K:=(b\ ?\ N\# K;K)\ )$,
post = $ \forall\ v\ \in Y \cup Z\ \ \ v:=( v.b; v.k\ll 1+v.b, v.K )$
basepred = $(k=0)$,
basefunc = $(b:=0,\ k:=k,\ f(k):=K'+K'')$
)

Given an array of indexes $k$ this PDC returns a shuffled array of tuples $(k,f(k))$.

Further discussion here.

Parallel Fourier Transform

For $b\ in\ 0..(N/2)-1 $, let the discrete Fourier transform be

$F(b,N,X) = \large\sum_{s=0}^{N-1} X_s * e^{isTb/N}$

This makes use of several elements:

$s \in 0..N-1$ counts the samples in our index-normalized data array $X$ which has length N.
$X_s$ is the measured sample value at index $s$.
To multiply by $e^{i\theta}$ means to rotate by angle $\theta$ in the {Re,Im} plane.
"Tau" or $T=2\pi$ radians = 1 bend = 360 degrees
$T/N$ divides the circle into $N$ equal fractions (one for each sample),
$bT$ is $b$ full circles.
$s/N$ is a fraction ranging from 0 to 1 as $s$ ranges from 0 to N.
$sTb/N$ ranges from 0 to Tb (that is, b bends) as $s$ ranges from 0 to N
$Tb/N$ is our base step angle, and $e^{iTb/N}$ a point on the unit circle encompassing that angle. Then $(e^{iTb/N})^s=e^{isTb/N}$ counts $s$ base steps around the unit circle.
$e^{i\theta}=cos(\theta)+i\ sin(\theta)$ means increasing $\theta$ by one can also be done by multiplying by $e^{i1}$ instead of working inside the cosine and sine functions' arguments.
This allows factoring out a base step $S^b = e^{iTb/N}$ from the odd summands ($s$ odd) in DFT above. Which means the odds sub-function is just the evens sub-function times that power of $e^{iTb/N}$.

The nub of the problem is to separate even and odd terms, which can be calculated separately and added, and then to notice the odd sum can be factored, pulling out our base step angle $S^b = e^{iTb/N}$ leaving the identical evens' formula, or calculation process, but using the odds' data. Since the subproblems are half the size, the step angle doubles from $S^b$ to $(S^b)^2$, to reach the same number of rotations after only half the steps.

$F_{b,N} X$	$ = \large\sum_{s=0}^{N-1} X_s * (S^b)^s $
	$ = \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s} * (S^b)^{2s}$	$\ \ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{(2s+1)} * (S^b)^{(2s+1)}$
	$ =\ \ \ \ \ "\ \ $	$\ \ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{(2s+1)} * S^b * (S^b)^{(2s)}$
	$ = \large F_{2b,\frac{N}{2}} X_{evens} $	$\ \ +\ \large S^b * F_{2b,\frac{N}{2}} X_{odds}$

Since $F_b$ is inside and outside, this can be transcribed as a PDC parallelization of $F_b$.

In Divacon notation:

$F_{b,N}\ X\ = $ atom ? (Re=id, Im=0, b=0)

$c_{eo} : (\text{self}+S^b*\text{other, self}-S^b*\text{other}, 2b)\ :\ \#\ !\ corr\ :\ !\ F_{2b2,\frac{N}{2}}\ :\ d_{eo}\ X $

and in Divacon code:

pft = PDC( $ \text{divide}=d_{eo}, \text{combine}=c_{eo},$

$\text{preadjust}=id, \text{postadjust}=!(\text{self}+\text{other}*e^{-iTb/N}, \text{self}+\text{other}*e^{iTb/N}, 2b) : \# ! \text{corr}, $

$\text{basepred}=\text{atom}?,\text{ basefunc}=( (Re=id, Im=0), (Re=id,Im=0), b=0)\ )$

One way to check a PDC is whether the inputs and outputs of each phase of the chain are of matching structure, size, and type.
Let X, C, and I represent Real numbers, Complex numbers, and Integers. Then:

divide: $X^N → X^{\frac{N}{2}}, X^{\frac{N}{2}}$
combine: $C^{\frac{N}{2}}, C^{\frac{N}{2}}$→$C^N$
preadjust: $id$ maps anything to itself.
postadjust: This is a chain of operations starting from
     $corr:\ \ corr\ i = i$. As simple as can be, this communication generator says my index is your index.
     $\# ! corr $ then actually communicates between corresponding elements in a pair of arrays
      by making for each element (!: in each array) a tuple containing that corresponding element from the other array.
     Thus $\# ! corr ((a,b),(c,d)) = ( ((a,c),(b,d)), ((c,a),(d,b)) )$, (so far communication alone, no calculations)
     The remainder of the postadjust takes each tuple and does the famous butterfly calculation, yielding in this case four 3-tuples,
      each containing rotations right and left of the other result-so-far from the self result-so-far,
      along with the bend count $b$ controlling the next rotation size.
      Summarizing: $(C^N,C^N,I)$→$(C^N,C^N,I)$
basepred: $X^N$→ Boolean
basefunc: $X^1$→ $(C^1,C^1,I)$

Check: do the inputs and outputs of each processing element match up? Looks okay to me, but I'm sleepy.
Implicit, except in basefunc, is that the values in postadjust are complex numbers.
The result array is not just a single-frequency FT, dependent on $b$, but an array of imaginary values each representing the magnitude and phase of a frequency component for $b \in 0..\frac{N}{2}-1$.

Further discussion of FFT algorithm, interpretation, use, and history is here.

A Divacon tutorial is here.

dot = PDC(	divide = \(d_{lr}\), combine = \(+\),
	pre\( = id\), post\(\ = id \)
	\(p_b = (atom(V)\ \&\ atom(W) ?,\ \ f_b =\ V\ *\ W\ \)
	)

reverse = PDC(	\( \text{divide}=d_{lr}, \text{combine}=c_{lr},\)
	\(\text{preadjust}=id, \text{postadjust}=(!\ \text{other}:\ \#\ !\ corr)\)
	\(\text{basepred}= \text{atom}?,\text{ basefunc}=id\ )\)

(scan +) = PDC(	\( \text{divide}=d_{lr}, \text{combine}=c_{lr},\)
	\(\text{preadjust}=id, \text{postadjust}=(id,! +):\#(nil,last\ 0)\)
	\(\text{basepred}= \text{atom}?,\text{ basefunc}=id\ )\)

\(F_{b,N} X\)	\( = \large\sum_{s=0}^{N-1} X_s * (S^b)^s \)
	\( = \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s} * (S^b)^{2s}\)	\(\ \ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{(2s+1)} * (S^b)^{(2s+1)}\)
	\( =\ \ \ \ \ "\ \ \)	\(\ \ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{(2s+1)} * S^b * (S^b)^{(2s)}\)
	\( = \large F_{2b,\frac{N}{2}} X_{evens} \)	\(\ \ +\ \large S^b * F_{2b,\frac{N}{2}} X_{odds}\)

pft = PDC(	\( \text{divide}=d_{eo}, \text{combine}=c_{eo},\)
	\(\text{preadjust}=id, \text{postadjust}=!(\text{self}+\text{other}e^{-iTb/N}, \text{self}+\text{other}e^{iTb/N}, 2b) : \# ! \text{corr}, \)
	\(\text{basepred}=\text{atom}?,\text{ basefunc}=( (Re=id, Im=0), (Re=id,Im=0), b=0)\ )\)

Divacon homework algorithms

Dot product

Reverse

scan

Fibonacci in O(log(N)) time

Parallel Fourier Transform

Copyright © 2024 Thomas C. Veatch. All rights reserved.
Created: June 13, 2024; Modified June 17, 2024


dot \(= (atom(V_0)\ \&\ atom(W_0))\)	\(?\ \ V_0*W_0 \)
	\(;\ \ +\ :\ !\ \text{dot} \ :\ d^2_{lr}\)

reverse \( X\ =\ (atom?\)	\(id\)
	\(c_{lr}\ :\ (!\ other\ :\ \#\ !\ corr)\ :\ d_{lr}\ X \)

\((scan\ +) X\ =\ (atom?\)	\(id\)
	\(c_{lr} : (id,!+)\ :\ \#\ (null,last\ 0)\ :\ !\ (scan\ +)\ :\ d_{lr}\ X \)

\(F_{b,N}\ X\ = \) atom ?	(Re=id, Im=0, b=0)
	\(c_{eo} : (\text{self}+S^b\text{other, self}-S^b\text{other}, 2b)\ :\ \#\ !\ corr\ :\ !\ F_{2b2,\frac{N}{2}}\ :\ d_{eo}\ X \)

	Your thoughts?
Name⁺:
Email^*:	(will not be shared or abused)
Comment:
	Feedback is welcome.

Dot product

Reverse

scan

Fibonacci in O(log(N)) time

Parallel Fourier Transform

Copyright © 2024 Thomas C. Veatch. All rights reserved. Created: June 13, 2024; Modified June 17, 2024

Copyright © 2024 Thomas C. Veatch. All rights reserved.
Created: June 13, 2024; Modified June 17, 2024