George says his parallelized version of the FFT is written in three lines of Divacon code, whereas when I wrote my FFT programs in grad school, and again later in integer-only code for Sprex, each took me several pages. So I am taking this as a challenge: can I follow in George's footsteps?
Let's see if we can invent a PDC-FFT together. First the discrete Fourier Transform (O(n^2)), then the Fast FT (O(n*log(n))), and finally a Parallel FT (O(log(n))).
For \(b \in 0..(N/2)-1 \), let
\(DFT(b,N,X) = \large\sum_{s=0}^{N-1} X_s * e^{isTb/N}\)

Let's take this apart:
First, this is just truly weird. How you get from an exponential, \(e\) raised to an imaginary power, to a circle in the imaginary plane, I don't know. Still, you can prove it, like this:
First, let \(f(\theta) = \large \frac{cos(\theta)+i*sin(\theta)}{e^{i\theta}}.\)
Second, show that the derivative is zero, \(f'(\theta)=0\), meaning that \(f(\theta)\) is flat, unchanging, constant.
Third, show the value at zero is one: \(f(0)=1\). That specifies the constant value which \(f\) has everywhere.
So the ratio \(f(\theta)\) is always \(1\), irrespective of \(\theta\); the top and bottom parts of \(f\) must therefore have the same value as each other everywhere, so \(cos(\theta)+i*sin(\theta) = e^{i\theta}.\)
That's just a complex number with real part \(cos(\theta)\) and imaginary part \(sin(\theta)\), and if you draw it you'll see cos and sin are the coordinates of points on a circle, in fact on a unit circle centered on the origin. Therefore \(e^{i\theta}\) is a point on the unit circle in the imaginary plane. Now when \(\theta\) exceeds \(T = 2*\pi\) it has rotated around past its original starting point of \((1+i0)\).
Oh, sorry, this will also blow your mind. Notice: suppose you go some angle \(\theta\) around the imaginary unit circle, call it \(e^{i\theta}\). Then suppose you want to double that angle. Well, in the \(cos+i\,sin\) interpretation you just use \(2\theta\) and get \(cos(2\theta) + i\ sin(2\theta)\), but that's the same as \(e^{i(2\theta)}=(e^{i\theta})^2\) by Euler's Formula. So stepping around the circle means taking powers of the step \(e^{i\theta}\). This is very weird and very amazing, and you have to check it to see if you believe it's true, because otherwise you wouldn't believe it.
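If you do want to check it numerically before believing it, here is a two-assert sketch (mine, not from the text; it only assumes Python's standard cmath and math modules):

```python
import cmath, math

theta = 0.7                               # any angle
step = cmath.exp(1j * theta)              # Euler: should equal cos(theta) + i*sin(theta)
assert abs(step - complex(math.cos(theta), math.sin(theta))) < 1e-12
# doubling the angle is the same as squaring the step
assert abs(cmath.exp(1j * 2 * theta) - step**2) < 1e-12
```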
Done? Satisfied? No? Go back and think about it, this should be in your toolbelt. We will be going back and forth between powers of \(e^{i\theta}\) and increments of angles around the unit circle, all the time, starting now.
Now we pick one value of \(b\), and that value will determine both the step size and the number of bends which will be rotated through after \(N\) samples, which could be \(0, 1, 2\) or any integer up to \(\frac{N}{2}-1\).
Next we align our samples \(X_s\) with the positions of the hour hand on a counterclockwise-rotating clock zeroed at the 3 o'clock position. The "clock hand" \(e^{i\theta}\) is a unit vector based at the origin that rotates around the unit circle by a fixed angular step size, one step for each sample. Euler showed us that, above.
We will go back and forth between an angle, \(\theta\) and a point on a unit circle, \(e^{i\theta}\), each encodes the other very nicely, so I might say one when I mean the other. You can figure out what I mean, I think.
To go around the unit circle one bend over the course of all the samples, we use a step \(\large e^{i\theta_{step}}\) which rotates 1/N of a circle per sample. That is, \(\theta_{step} = T/N\), so the s'th sample reaches angle \(Ts/N\), and by the end of all N samples (counted 0 to N-1) we are all the way back to the beginning, or actually one step short of it, because the loop is exhaustive but not redundant: we already touched the 0'th step when we started at 0.
To summarize so far, this sum gives us a complex (real + imaginary) number, \(x_b+iy_b\), called the Fourier transform of X for a certain bend count \(b\), which is to say a logical frequency component. That point in polar coordinates tells us the magnitude \(r_b\) and angle-shift ("phase") \(\theta_b\) of that summed frequency component in the sampled signal values in our data array. As a point in the imaginary plane, it is typically not on the unit circle. But its angle from \(\theta=0\) at position \((1,0)\) to \(\theta_b\) is what we will call the phase shift of the sinusoid component that the formula has detected, and the length of the vector from the origin to that point, \(r_b\), is the magnitude of that sinusoid component.
So one thing is for you to understand how to calculate it, and by now it should be feeling friendly to you, and the other thing is, it's also good to develop some intuitions of what the results mean.
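To pin down the "how to calculate it" half, here is a minimal direct O(N^2) sketch of the formula in Python, including the polar read-out of magnitude \(r_b\) and phase \(\theta_b\). The names (`dft`, `polar`, `T`) are mine, not from the text; it is a sketch of the definition above, not an optimized transform.

```python
import cmath

T = 2 * cmath.pi   # a full turn, written T in the text

def dft(X):
    """Direct O(N^2) transform: DFT(b, N, X) = sum_{s=0}^{N-1} X[s] * e^{i*s*T*b/N},
    for bend counts b = 0 .. N/2 - 1."""
    N = len(X)
    return [sum(X[s] * cmath.exp(1j * s * T * b / N) for s in range(N))
            for b in range(N // 2)]

def polar(F_b):
    """Magnitude r_b and phase theta_b of one summed frequency component."""
    return abs(F_b), cmath.phase(F_b)
```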
So a given output of the formula will be based on a particular value of \(b\), and to think geometrically, it is created by wrapping the samples around the unit circle at a certain pace, faster in order to finish more bends by the end, whatever \(b\) is, and then given that alignment of samples to angles (or you could say scaling of angles by samples), adding them up.
If at 0 bends this sum is large then the signal has a constant bias, a.k.a. Direct-Current or DC offset. Oftentimes signal processing algorithms will subtract the DC component from all the samples considering it an artifact of the voltage settings of the machinery and uninformative of the sound wave or other tracked signal, which if it were interesting would be changing.
If at 1 bend the sum is large, then the signal contains an ultra-bass, a lowest-measurable-frequency signal component. Why? If the samples, now pointing in all directions, add up to zero, then they just cancel each other and yield nothing for that bend count, neither in total nor on average. There is nothing there, at that \(b\)-specific pace.
But if they don't cancel out, if in some general direction they add up the way positive numbers do, only here they are vectors pointing in that general direction, then the samples pointing the opposite way enter the sum as negatives. Think about those negatives: they are negatives in the anti-positive direction, a double negative, which means they reinforce the positive-direction parts of the sum. If the total thus accumulated is far from zero, it reveals a substantial signal component at the particular bend count you were using to step around the circle.
That's the idea of wave analysis, waves have peaks and troughs, and when you wrap them around the circle the troughs being negatives on the other side of the circle they add to the peaks, but only at that bend count at which they line up together. The correlation of both peaks and valleys gives a nice big sum when they are lined up at the right pace around the unit circle. And that means the signal has a significant sinusoidal signal component at that frequency. Cool and amazing, right? I think so.
Finally if you go \((N/2) -1\) bends around the unit circle for each sample step, aligning samples X[0] to X[N-1] to their corresponding s'th-step unit vector and scaling them and adding them all together, that is a way to detect if there is a very high frequency component in your signal. If every even sample you go positive and every odd sample you go negative, or vice versa, at least enough to affect the total sum of them all, then you will end up with a large, high frequency, signal component. Curiously, if you went beyond N/2 - 1 bends to even higher frequencies, the system stops making sense because the alignments start to wrap around and overlap with previous frequencies and you don't get new information. (The similar Nyquist frequency concept says you need two points for every sine wave's peak and trough, here we have half as many frequencies as points. So we will stop at \(b \le \large \frac{N}{2} - 1\).)
I can't stop saying how this frequency analysis thing is super cool and amazing:
Did I mention, Fourier's (1807) Transform seems to have been predated by Gauss' (est. 1805) unpublished work, which tripled Fourier's achievement, including not only (1) the idea of this samples-in-time to amplitudes-in-frequency transformation but also (2) the discrete version of it and (3) an efficient method for calculating it, equivalent to what is today called decimation-in-frequency FFT on real samples. Poor Gauss: he of course invented plenty of stuff, but this particular genius idea of his was forgotten and reinvented and finally only made popular by Cooley and Tukey (1965) when it was needed for detecting underground nuclear testing in the Cold War, so I've read. But it's way, Way, WAY worse: at least 11 other authors found ways of doing the same basic thing, whether restricted to sample counts that are powers of 2, or prime-factorable, etc., including the we-only-ignored-you-for-23-years Danielson and Lanczos (1942), whose lemma is used to derive the following (after Press et al., Numerical Recipes):
First, we need to know that Mou's beautiful even-odd vector division notation, \(d_{eo} X\), can be clumsily written in traditional math form, dividing the vector X into evens and odds by letting \(s = 0..\frac{N}{2}-1\), \(even = 2s\), \(odd = 2s+1\).
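In ordinary code the even-odd division is just two slices. A one-function Python sketch of \(d_{eo}\) (my rendering, not Mou's):

```python
def d_eo(X):
    """Even-odd division: X -> (X[0], X[2], ...), (X[1], X[3], ...).
    After the split, each half is re-indexed from 0, as Divacon assumes."""
    return X[0::2], X[1::2]
```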
Second, let "step" \(S = S_b = e^{iTb/N}\); so \(S^n\) rotates n steps and \(S^N\) rotates \(b\) bends. (If N is very very large like millions or billions, then maybe \(S_1\) is almost indistinguishable from 1, yielding numerical instability in calculating its powers. In that case, keep a bigger S around with its built in exponent, say 10^5 or whatever, for accurate calculations of \((e^{iT/N})^{sb}\).)
Then follow this derivation:
\(F_{b,N} X = \large\sum_{s=0}^{N-1} X_s * S^s \)
\(\ \ \ \ \ \ \ \ \ = \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s} * S^{2s}\ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s+1} * S^{2s+1}\)
\(\ \ \ \ \ \ \ \ \ = \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s} * S^{2s}\ +\ S * \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s+1} * S^{2s}\)
\(\ \ \ \ \ \ \ \ \ = \large F_{b,\frac{N}{2}} X_{even}\ +\ S * F_{b,\frac{N}{2}} X_{odd}\)
But if you prefer using \(e^{i\theta}\) notation to \(S\), you can follow through this derivation of the same thing:
\(F_{b,N} X = \large\sum_{s=0}^{N-1} X_s * e^{isTb/N} \)
\(\ \ \ \ \ \ \ \ \ = \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s} * e^{i(2s)Tb/N}\ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s+1} * e^{i(2s+1)Tb/N}\)
\(\ \ \ \ \ \ \ \ \ = F_{b,\frac{N}{2}} X_{even}\ +\ \large\sum_{s=0}^{\frac{N}{2}-1} X_{2s+1} * e^{iTb/N} * e^{isTb/(N/2)}\)
\(\ \ \ \ \ \ \ \ \ = F_{b,\frac{N}{2}} X_{even}\ +\ e^{iTb/N} * F_{b,\frac{N}{2}} X_{odd}\)
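That last line is the whole Danielson-Lanczos trick, and it translates almost word for word into a recursive O(N log N) FFT. Here is a hedged Python sketch (my code, assuming N is a power of two and the same \(e^{+isTb/N}\) sign convention as above; it returns all N bins, of which the text keeps only the first N/2):

```python
import cmath

T = 2 * cmath.pi

def fft(X):
    """Recursive FFT built from the split derived above:
    F_{b,N} X = F_{b,N/2} X_even + e^{iTb/N} * F_{b,N/2} X_odd."""
    N = len(X)
    if N == 1:                          # a 1-point transform is the sample itself
        return list(X)
    evens = fft(X[0::2])                # F_{b,N/2} X_even for b = 0..N/2-1
    odds = fft(X[1::2])                 # F_{b,N/2} X_odd  for b = 0..N/2-1
    out = [0j] * N
    for b in range(N // 2):
        t = cmath.exp(1j * T * b / N) * odds[b]
        out[b] = evens[b] + t           # the combination from the derivation
        out[b + N // 2] = evens[b] - t  # e^{iT(b+N/2)/N} = -e^{iTb/N}, so the upper bins reuse the same halves
    return out
```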
\(PFT_b = PDC(divide=d_{eo},\ combine=id,\ preadjust=(self,b=2b),\ postadjust=(other*e^{iTb/N}\ +\ self),\ basepred=atom?,\ basefunc=id)\)

Meaning:
This is the essence of parallel decomposition. More than recursive decomposition where a function's definition includes a reference to itself, in parallel decomposition a function's definition includes a distribution of itself. \(!\ F\) says separately apply \(F\) to each of the tuples in the following array. Since they are done separately, they can be done in parallel. Hence the power of parallel, more than mere recursive, analysis.
\(!\) is the whole trick, and everything else revolves around enabling our formalism to use it: dividing arrays to be distributed over; implicitly normalizing their indexes so that after any number of divisions each piece counts from 0 to length-1 and F can apply to the divided arrays just as to the original; and communicating data from other parts of the array so that the local computation really is local. The design problem in creating a parallel algorithm with Divacon is to figure out the minimal nub of local computation and to place it properly in the expanding and collapsing of the computation tree; then to move data around so it's ready for that nub of local computation to do its work at that point. By then you should know whether the divisions should be even/odd or first-half/second-half (LR). It is a 3D dance of perspectives on your problem, and until you get the idea of this approach it seems impossible; but once you get it, any algorithm's coding is almost trivial. George said he wrote his dissertation's benchmarked reference algorithms in less than a half hour each, and each attained optimal or "sub-optimal" speeds. I think George's perfect English finally failed him there: by "sub-optimal" I think he meant less than the so-far-best times, i.e. faster than the best results so far, not slower, or at least he meant "near-optimal". I hope readers didn't read that and misunderstand.
This means the computation graph gets built all the way down to the leaves of the tree by first dividing, as part of \(PFT\), then dividing again as part of recursive calls to \(PFT\), and no other kinds of operations on the way down, because preadjust is almost a no-op. The work, the actual calculations, will be done slightly at the bottom, and almost entirely on the way back up the tree.
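To make the shape of that expression concrete, here is a loose Python sketch of a PDC-style combinator and the \(PFT_b\) instance. This is my reading of the notation, not Mou's implementation: `pdc`, `env`, and the lambdas are my names, the recursion here is sequential, and the list comprehension marks where the \(!\) distribution would fan out across processors.

```python
import cmath

T = 2 * cmath.pi

def pdc(divide, combine, preadjust, postadjust, basepred, basefunc):
    """Skeleton of a parallel divide-and-conquer combinator: basefunc at atoms,
    otherwise divide, recurse over the parts (conceptually in parallel),
    then postadjust and combine on the way back up."""
    def f(x, env):
        if basepred(x):
            return basefunc(x, env)
        parts = divide(x)
        sub_envs = preadjust(env, len(parts))
        results = [f(p, e) for p, e in zip(parts, sub_envs)]  # the '!' step: independent sub-applications
        return combine(postadjust(results, env))
    return f

def pft(b, N, X):
    """PFT_b as I read the Divacon expression: divide = d_eo, combine = id,
    preadjust doubles b for the halves (their step relative to the original N),
    postadjust forms self + e^{iTb/N} * other, and the base case is the atom itself."""
    f = pdc(
        divide=lambda x: (x[0::2], x[1::2]),                  # d_eo
        combine=lambda r: r,                                  # id
        preadjust=lambda env, k: [(2 * env[0], env[1])] * k,  # (b, N) -> (2b, N), i.e. b = 2b
        postadjust=lambda rs, env: rs[0] + cmath.exp(1j * T * env[0] / env[1]) * rs[1],
        basepred=lambda x: len(x) == 1,                       # atom?
        basefunc=lambda x, env: x[0],                         # id
    )
    return f(X, (b, N))
```

As a sanity check, `pft(b, len(X), X)` agrees with the direct sum \(\sum_s X_s e^{isTb/N}\) for power-of-two lengths, and the recursion depth, hence the parallel time per bin, is \(log_2(N)\).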
"I have mentioned that procedure identifiers are initialized through declaration. Then the attachment of procedure to identifier can be changed by assignment. I have already mentioned how this can be done by means of [function] pointers. There are, of course, other ways. The simplest is not to change the identifier at all, but rather to have a selection index that picks a procedure out of a set. The initialization now defines an array of forms, e.g., procedure array P [1:j] (f1, f2,... ,fj); ... The call P [i] (al, a2, . .., aj) would select the ith procedure body for execution. Or one could define a procedure switch P:= A, B, C and procedure designational expressions so that the above call would select the ith procedure designational expression for execution. The above approaches are too static for some applications and they lack an important property of assignment: the ability to determine when an assigned form is no longer accessible, so that its storage may be otherwise used. A possible application for such procedures, i.e., ones that are dynamically assigned, is as generators. Suppose we have a procedure computing (a) \(large \sum_{k=0}{N} C_k(N) X^k\) as an approximation to some function (b) f(x)= \(large \sum_{k=0}{\inf} C_k X^k\), when the integer N is specified. Now once having found the \(C_k(N)\), we are merely interested in evaluating (a) for different values of \(x\). We might then wish to define a procedure which prepares (a) from (b). This procedure, on its initial execution, assigns, either to itself or to some other identifier, the procedure which computes (a). Subsequent calls on that identifier will only yield this created computation." (p 5-6)Perlis' suggestion seems very close to the incremented-base-step adjustment of the functions \(F_b\) of a multi-frequency optimized parallel Fourier Transform. The parallel breakdown of the function creation seems obvious: a log(N) decomposition of the calculation of base steps.
Let's try generating all base steps \(S_b = e^{iTb/N}\) in \(log(N/2)\) steps:
\(S_1 = e^{iT/N}\)
\(S[] = \) doubleshift \(Ss\) // calculate the base steps Ss in log N time
where doubleshift \(Ss = \) atom\(?\ \ (\ 1\ S\ )\) // base case
else \(\ (\ Ss\ \ !(\ S_1\ *\ ((last\ 0)\ Ss))\ Ss\ )\) // retain all Ss on the left half and increment the copies on the right
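Here is how I read that doubleshift in Python (my names, a sketch only; the point is that each round is one parallel map, so the whole table of base steps exists after \(log_2(N/2)\) rounds):

```python
import cmath

T = 2 * cmath.pi

def base_steps(N):
    """Table of base steps S_b = e^{iTb/N} for b = 0 .. N/2 - 1, built by doubling:
    keep the current list on the left, append a copy shifted by one more power
    of S_1 on the right. log2(N/2) rounds for a power-of-two N."""
    S1 = cmath.exp(1j * T / N)
    Ss = [1 + 0j]                          # S_0 = 1
    while len(Ss) < N // 2:
        shift = S1 * Ss[-1]                # S_1 * (last Ss): the next power of S_1
        Ss = Ss + [shift * s for s in Ss]  # right half is a (parallelizable) map over the left half
    return Ss
```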
\(PDCFT[b] X = PDC(d_{eo},c_{eo},id,ma(Ss[b]),atom?,id) X\ \ \ \ \ \ \ \) // calculate \(F_{S_b}\) in \(log(N)\) time, in parallel for each \(b\).
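Putting the pieces together, here is a hedged end-to-end check against the direct O(N^2) `dft` sketched earlier (sequential here; the per-b loop is where the parallelism would live, and in the author's version the precomputed Ss[b] feeds each postadjust, whereas my simplified `pft` recomputes its step inline):

```python
X = [0.0, 1.0, 0.0, -1.0, 0.5, 1.0, -0.5, -1.0]     # any power-of-two-length sample frame
N = len(X)
Ss = base_steps(N)                                   # log-time table of S_b (see above)
bins = [pft(b, N, X) for b in range(N // 2)]         # independent per b: the parallel loop
assert all(abs(p - d) < 1e-9 for p, d in zip(bins, dft(X)))
```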
So you could choose to accept my claim \(PFT\ \sim\ O(log(N))\) because \(S_b\)'s are precalculated, or you could choose to believe it because in the curious math of computational complexity, \(O(2*log(N)) = O(log(N))\), as they say, "within a constant factor". In either case I was shocked to discover I had reduced a 1024 point DFT (to take a typical audio analysis frame size) from the well-known FFT time of \(O(N * log(N)) = O(1024*10)\) to this amazing PFT time of \(O(log(N)) = O(10)\). A 1000X speedup is shocking. In fact I didn't believe it, before I forced myself to imagine 1000 processors working on it together. Edible piranhas indeed. Typical of George, he said laconically of this Divacon expression, "Looks right to me", and of the speed result, "Of course, order log N in parallel". He knew it already and was not the least surprised, yet my Google searches only find NVidia itself, and some Israeli computer scientists in 2023, as saying that this is even an imaginable goal. So I'm confused if I deserve a Satisfactory for my homework, or a Turing Award for out-Gaussing Gauss himself. Does anyone else find this amusing?
A few more speedups might be worth mentioning. If we do calculate the \(S_b\)'s, then each application of PFT[b] may be started as soon as its corresponding \(S_b\) is available, so only the last PFT[b] has to wait for the final base step; a staggered early start might save some parallel-computation nodes' idle time while the last steps come in. Also, if you only care about getting the initial (lower-frequency) results from the analysis, you could stop at any specified discard threshold and save a fraction of the time.
To reiterate, if fully parallelized, then the N/2 functions \(PFT_b\) are, first, themselves generated in log(N) time if not precalculated entirely, and second, applied to the sampled data vector X in log(N) time. Hence the entire vector of frequencies F is calculated in O(2*log(N)) = O(log(N)) time. There are a lot of multiply-adds going on, but they are done in parallel in a small number, log(N), of levels, and each level takes O(1) time.
From what little I know, I believe this Parallel Divacon Fourier Transform algorithm is the fastest Fast Fourier Transform algorithm. My academic friends gently correct me, saying no, FFT is O(N*log(N)); but that is the serial result, not the parallel one. I found a 2023 ArXiv reference that also claims O(log(N)) speed for an FFT, but that is a pretty complex paper, and was my 12-symbol PFT homework really that fancy? No. That reference in turn refers to NVidia's efforts to beat O(N*log(N)) with its GPUs. Apparently O(log(N)) is actually newsworthy. I'll take it.
On the other hand, this analysis ignores any cost for plenty of processors, and assumes zero time cost for communications, which actually becomes a dominant factor in highly parallel computing.
H. BURKHARDT,"Trigonometrische Interpolation," in Chapter 9, Encyklopiidie der Mathematischen Wissenschaften, vol. 2, part 1, 1st half, 1899-1916.
H. BURKHARDT,"Trigonometrische Reihenund Integrale(bis etwa 1850)," in Chap- ter 12, Encyklopiidie der Mathematischen Wissenschaften,vol. 2, part 1, 2nd half, 1904-1916.
J. W. COOLEY & J. W. TUKEY, "An Algorithm for the Machine Calculation of Complex Fourier Series," Math. Comput., Vol. 19, no. 2, pp. 297-301, Reprinted in Digital SignalProcessing,ed.L. R.RABINER& C. M. RADER,pp.223-227,New York: IEEE Press,1972,Apr. 1965.
C. F. Gauss, "Nachlass, Theoria InterpolationisMethodo Nova Tractata," in Carl Friedrich Gauss Werke, Band 3, Koniglichen Gesellschaft der Wissenschaften: Gottingen, pp. 265-330, 1866.
MICHAEL T. HEIDEMAN, DON H. JOHNSON, C. SIDNEY BURRUS "Gauss and the History of the Fast Fourier Transform", 1985. Rice U Eng. Dept.
Orian Leitersdorf, Yahav Boneh, Gonen Gazit, Ronny Ronen, Shahar Kvatinsky, FourierPIM: High-Throughput In-Memory Fast Fourier Transform and Polynomial Multiplication Viterbi Faculty of Electrical and Computer Engineering, Technion – Israel Institute of Technology, Haifa, 3200003, Israel. ArXiv 2304.02336v1 [cs.AR] 5 Apr 2023. Alan Perlis, The Synthesis of Algorithmic Systems, The 1st Turing Lecture. (ACM V 14 No 1, 1967).
Zhijing George Mou, A Formal Model for Divide-and-Conquer and Its Parallel Realization, May 1990, Research Report YALEU/DCS/RR-795.