Below are my unedited notes from conversations with Zhijing George Mou in 2024, at his apartment, at Thanh Brothers Vietnamese Pho restaurant, and elsewhere, sometimes over the phone.
Near the end I wrote down some...
5/27/24 phone call with George Mou
Regarding my question, which asked: what is communication in the GM system, and how does a function between index sets express it? On communication as indexes: any computation consists of two parts, the calculations and the communication that gets the data there for the calculations. Maybe you're on a parallel processor, maybe not; don't assume irrelevancies, just get the logic right. So suppose what you need for a calculation is all in a vector; then the data is found under indexes. To calculate a lot of sums, like V[i]+V[j], you need indexes. So indexing is a form of communication, but normally it is singular, moving one piece of data, not a group. Whereas consider logically parallel computing, say LR or EO, the half-and-half splittings, which can do everything in Numerical Recipes. E.g. FFT: it can be proven that odd-even splitting is equivalent to LR splitting (refer to the theorem).
For any i you can communicate with i+1 in a one-at-a-time approach. But instead, open your eyes and go with a group of numbers. In any (stupid, current) programming language only the adjacent next is available. Not so in Divacon.
I asked a question: does communication refer to moving data among levels of the caching hierarchy? I offered the thought, perhaps let the code handle it explicitly; then you would have communication, concretely. George says no, that would be dumb, let your compiler do it. On the other hand, no other system offers group communication, which under Divacon is offered and works.
For this reason in Divacon, the FFT algorithm is just two lines. The bitonic sort algorithm is 3 lines. Not 5-10 pages like others use. (I can confirm this because I have written FFT code and that’s about right.)
I asked: Is div LR group communication?
In chemistry there are only proton, electron, neutron: 3 things make the universe.
In Computer Science there are 3 orthogonal dimensions: 1) structure manipulation, like Div LR or Div EO, etc.; in structure manipulation there is no communication and no operation. 2) Communication, as in: what information is needed for one's work (getting it there by machine copy commands is low level; the right way to think of it is what information is needed, not how to move it there). Communication contains no structure manipulation and no calculations/operations. Finally 3) operations: these assume the data is right there; no communication is necessary or (to encourage properly factoring your work) even allowed. Hence each of them, SM, Comm, Ops, is orthogonal, independent of the others. And yet with the three you can have all of CS. Orthogonal dimensions are so important: then you avoid muddle, especially muddled thinking.
Communication is data movement: move data from one location to another. Correspondent communication moves data from the left side to the right side. Think of it like this: you cut a stick in half, then move the halves end to end to align; each i'th element is adjacent, aligned, to its counterpart on the other stick. This is called correspondence. This is all you need to achieve every algorithm in Numerical Recipes.
In stupid current modern programming languages what you look at is one item, but in this you can look at K items at once. In the von Neumann bottleneck you are stuck handling one thing at a time; hence we use loops instead of higher-order functions. Instead, in logic, move two, which breaks the bottleneck, even though in logic you don't impose time constraints; you are not saying when to do the moving. Don't add any constraint which is not logically necessary.
George asks on Zoom: What is Computer Science? What is the most important thing? Tom replies: Abstraction. Yes. And all current CS is counter-CS, contrary to CS.
After 70 years the community still hasn't figured it out. Looking at a maximum of one number is all they can do. To look at two numbers is considered too horrible. But if you can do one, then you can do 2, or any number K, by induction.
An allegory: in ancient times they had limited arithmetic and limited concepts of number. Two primitive people were challenged to a mental contest: what is the biggest number you can think of? One scratches his chin, thinks for a long time, and announces his answer is 1. The other thinks even longer, and finally says, I give up, you win. That is the whole field today.
The great miracle algorithm, the FFT, can use correspondent or odd-even either way; they are equivalent. One is premorphism-plus-odd-even vs postmorphism-plus-LR.
Each reduces to the other.
Tom says: in CS departments there is a tiny minority that does theoretical CS, proving stuff about code, and they think they are better than everyone else. After these conversations I think I am becoming one of them.
George says: Theoretical CS is mostly about computability, complexity and automata, not really programming language theory. Still that is the most respectable work in CS.
But Gates and Musk often say stupid things, and people have no idea.
Science should theoretically progress: even negative answers are progress in a world of unsolved problems. For any problem given infinite time there should be progress, but is there?
Tom offers the analogy with rational and real numbers, rational (say decimal, or binary) numbers are a finite sequence of choices from a finite vocabulary, whereas the reals also include unending unpredictable sequences. In the same way every proof or algorithm is a sequence of statements from a finite vocabulary: finitely long if a halting problem, never-endingly unpredictable if non-halting. Also in the same way that in practically computable applications we can use only rationals and have no need of real numbers (even floats are rational and finite), similarly to say anything interesting or useful we may not need any infinitely-unpredictable-proof based theorems. Hence I am not suffering because of Gödel.
George replies: in the halting-problem literature it is shown that no algorithm can finitely decide whether an arbitrary program will ever actually end. To see this from abstract algebra: given two domains, infinite but countable, the set of functions between them is uncountably infinite. The algorithms, being finite strings over an alphabet, are infinite but only countably so. So there are uncountably many functions (problems) but only countably many algorithms; most problems, the halting problem among them, have no algorithm.
George says that in the last lecture of his algorithms and computability class he says: we must know, in front of God, how tiny we are. Our tiny speck, the finitely solvable problems, is infinitesimal in the face of the uncountable infinity of problems. Therefore fear God. Musk and Gates are devils thinking they can solve everything. Tom says, I'm sorry, I must be a devil!
George offers his CS students a problem on day one. Write a program to write all current literature. There is an easy answer: you can easily write all 1-letter literature. Therefore you can also write all two letter novels. Etc. (Kleene Star.) By Induction.
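The day-one induction can be made concrete in a few lines of Python (my own sketch, not George's assignment code): all length-n strings are the length-(n−1) strings extended by each letter, and the Kleene star is the union over all n.

```python
def all_strings(alphabet, n):
    """All length-n strings over the alphabet, built by induction:
    extend every length-(n-1) string by each letter in turn."""
    if n == 0:
        return [""]
    return [s + c for s in all_strings(alphabet, n - 1) for c in alphabet]

all_strings("ab", 2)  # ['aa', 'ab', 'ba', 'bb']
```

Enumerating n = 0, 1, 2, ... gives the Kleene star of the alphabet, hence (eventually) all current literature.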
DC programs are all running, Tom, it’s not a joke. Divacon is implemented on 8 machines. Tom says, well I just can’t download it from GitHub and run stuff that I write. Not yet anyway.
George says, let’s talk about the Recursive Array.
What is a data structure? Data plus operation!
A vector is a group of data and an operation of indexing. With the monofocal-eye problem of modern computer scientists (the von Neumann bottleneck) they cannot look at more than one thing, and even then only from the end of an array. But a Recursive Array is an object that splits into (balanced) subsets. If you can look at K elements at a time then you get group communication, not just indexing. You also get relative indexing, as when, after splitting a (e.g. 16-element) vector in half, you have a+1..8 and b+1..8 in the split subgroups of the data. Correspondent communication between two subgroups is a generalization of an array with multifocal, multi-target eyeballs.
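The split-and-align picture can be sketched in a few lines of Python (the function names are mine, not Mou's actual API): the two balanced divisions of a Recursive Array, and correspondent communication pairing element i of one half with element i of the other.

```python
def div_lr(v):
    """Left-right division: cut the stick in the middle."""
    n = len(v) // 2
    return v[:n], v[n:]

def div_eo(v):
    """Even-odd division: evens in one half, odds in the other."""
    return v[0::2], v[1::2]

def comm_corr(left, right):
    """Correspondent communication: align the halves end to end, so
    each i'th element of one half is paired with the i'th of the other."""
    return list(zip(left, right))

v = list(range(8))
l, r = div_lr(v)         # ([0, 1, 2, 3], [4, 5, 6, 7])
pairs = comm_corr(l, r)  # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

Note the relative indexing: inside each half, positions run 0..3 again; position 0 on the right *means* 4 on the parent.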
GM says it's time to write a book, although everything is in the thesis. Tom agrees, it needs to become a book. George: it needs not the details but the lessons: orthogonality, the story of the one-eyed primitives, the higher-level lessons, etc.
Tom says: yes, do it!
FFT =
Divide: Div_eo
Conquer (butterfly): {out i=0} = {in j=0} + e^(i·depth²) {in j=1},
                     {out i=1} = {in j=0} − e^(i·depth²) {in j=1},
M.
Given A’ A = I show A A’ = I:
Then A’ A A’ = I A’ = A’ post multiplying by A’
and A A’ A A’ = A I A’ = A A’ pre multiply by A.
Also A A’ A = A I = A, and A’ A A’ A = I I = I.
So A A’ A = A = I A = A’ A A. No help.
Discover the O(log n) PDC for FFT
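A minimal executable sketch of that homework, in plain Python rather than Divacon notation (the code and its names are mine): EO division, correspondent (butterfly) communication, and a multiply-add local op give the whole radix-2 FFT in a few lines, with O(log n) parallel depth since both halves can recurse at once.

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT as a PDC sketch:
    divide = even-odd split, communication = correspondent (butterfly),
    local op = twiddle multiply-add. Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])   # div_eo, then recurse on both halves
    out = [0j] * n
    for k in range(n // 2):                  # corr comm + local butterfly
        w = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out
```

Run on [1, 2, 3, 4] this gives [10, -2+2j, -2, -2-2j], matching the naive DFT.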
———————
Phone call with George Mou June 16, 2024.
Kidneys struggling with myeloma, reduced chemo now but not improving, and CAR-T to start in August.
GM went from Yale to Brandeis to WaveTrace
Then he went to give a talk at IBM, where his first slide said,
“Why the IBM parallel computer is completely Wrong”.
He gave the talk, then asked: any questions? None. Then they asked, is there a paper? Here's the paper. They said, we will show you why you are wrong. He waited for their feedback. Instead he got a job offer from IBM.
Later he went to the Johns Hopkins high-performance computing lab and group, where he was supported to build a computer and run his stuff, so he bought chips and built a parallel computer with 32 nodes, and ran many applications. He had many people working under him, so he would try to give them something to do. Explain, wait 2 weeks, they don't get it at all, they are completely confused. Explain again. Do you understand? "Yes." Two weeks later they still didn't understand. "After a number of iterations I gave up. I went to the lab and said, you watch me. After five days Divacon was running on the parallel computer."
Later he was invited to consult on the Connection Machine by Guy Steele, who later was a Turing Award decider. George was the first speaker to speak at the Connection Machine company.
But no matter who, no one really gets it.
Like at Yale: my adviser Perlis, after the talk, says, George, what you talked about looks beautiful, like a crystal palace. Only one problem: we don't understand it. Please wait in the hall for a while. They came out and said, we don't understand what you are talking about. Did they figure it out? They explained: the way you walk and the tone of your voice and your eyes are telling us, he must be right. We believe you. But we don't understand.
George: To come back to your question, Did you have a compiler team? No I never found anyone who could help me. Only exact copying. lol .
Look up George's paper on the communication-optimal FFT: a k-dimensional FFT on an m-dimensional mesh. In: PARLE '93, Parallel Architectures and Languages Europe: 5th International PARLE Conference, Munich, Germany, June 14-17, 1993, Proceedings. Berlin / New York: Springer-Verlag, ©1993.
What is a mesh? A 2D mesh = a grid: N x N is a grid of processors. A 3D mesh is a cube of processors. 4D is easy to describe, theoretically possible, and easy in the math, but no one has built one yet.
Optimal means: in correspondent communication to all other processors, the minimum imaginable is the travel distance, the diameter of the network. If you can show its communication time equals the diameter of the mesh, then that would be the fastest ever. So consider the problem of how to assign the nodes to the processors of the mesh so that the communication distance = the diameter of the mesh.
Odd even… it was mentioned in the thesis?
Two options: do your FFT in time (odd-even) or in frequency (left-right). From the Divacon DC view they are exactly the same: EO is equivalent to LR, as pre-odd-even = post-LR. But people still don't see it; the FFT algorithms are listed separately. (See NR.)
A premorphism under correspondent communication with odd-even division is equivalent to a postmorphism with the division being LR.
Yes DC uses morphism at its core. You can understand when you get the breakdown.
Bitonic sort is so confusing that its adjustment is itself a DC. I have my own algorithm, a monotonic sort, more elegant and cleaner. (Thesis 4.3.5, p. 89.)
In my monotonic sort, the first level is a postmorphism and the adjustment function in the second level is a premorphism. Alan Perlis asked: can you reverse it, making the outer one a premorphism and the nested one a postmorphism? "I still can't solve his question." With corr, correspondent communication, yes, right away; but with mirr I still have no answer. Some day I'd like to come back to this.
Remember Backus in his Turing award lecture emphasized algebraic transformation,
So GM showed how, e.g., #(last 1) transforms to # corr,
You may ask, why? Last-one communication is like this: suppose you have to call 8 people to go to a party; last 1 means one person calls all 8. But it's faster as a phone tree, which is done in 3 steps, not 7. That's one transformation shown in the thesis. Another is EO = LR, under the condition that the communication is correspondent. Many such algebraic transformations can be done quite easily.
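The phone-tree arithmetic can be sketched directly (my own toy function): each round, everyone who already knows calls one person who doesn't, so the informed set doubles, and n people need only ceil(log2 n) rounds.

```python
def broadcast_rounds(n):
    """Phone-tree broadcast: the informed set doubles every round,
    starting from one caller. Returns rounds needed to inform n people."""
    informed, rounds = 1, 0
    while informed < n:
        informed *= 2
        rounds += 1
    return rounds

broadcast_rounds(8)  # 3 rounds, versus 7 sequential calls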
Is it mere fun? No, it has much significance: by means of such a transformation, O(n) can become O(log n), for example.
Even Google people don’t have the concept of computational intensity.
CI = Local calculation count / Communications count.
Given a bit of data, how many times do you use it? If only once then it means you need to move it for every use. But it should be many times, not once, since communications are actually slow.
Looking at matrix multiplication: data is N^2 and computation is N^3, so the ratio is CI = N. (See https://sites.cs.ucsb.edu/~gilbert/cs140/notes/ComputationalIntensityOfMatMul.pdf)
So anything that has CI ~ O(1) is stupid. Like, for example, Google; like MapReduce. Consider the dot product: it's a great tool, but it has computational intensity O(1). You have to bring in each factor, then do one multiply-add; there is no reuse of the data. That is, a vector inner product, N dot N, is actually very slow, and yet the whole field is using it. Inner products are done all the time by everyone in AI, linear systems, etc.: so much waste of time, money, electricity. But who wants to hear that?
What's the trick, the better way, the improvement? Just don't use
inner products in the first place. Done the right way, NxN matrix
multiplication has CI of O(N), and it can be divided by DC: one way is
to do eight smaller multiplications (or two, or four), basically
reducing to O(m) with m the block size. DC in a different sense.
Once you understand that, throw it out and look at the
computation. What kind of communication, divide, local, combine? It
becomes so simple even a grade-school student can get it.
Build a DC using Python: implement the different PDC parts (div, comb,
comm), build it, and call it.
Then it will run.
I had 3 PhD students at Xiamen University a few years ago. I explained
things to one, in detail. Still he didn't get it: it was not running
on his own computer. A few hours later it was running, but even
watching by my side he couldn't do it. Why so hard, I don't know.
Local vs comm vs structure change.
DC is simply to identify the three.
Ma() as combination is wrong thinking.
Once divided that way it is very simple. People quickly confuse themselves.
—-
Tom thinking about a wavelet-ish Fourier transformation.
Instead of a Fourier equiwidth array of F_b's,
just don't overlap the b's: thus a
Fourier tree where every analyzed subsequence has only 360 degrees
to yield a result, i.e. no bend overlapping.
Let F_{b,N,X}[bits] = sum_{s=0..2^(N/2−b)} X[s] e^{i s T b / N}
At DC (b=0), the whole window sits on e^{i0}: F0.
At the lowest frequency, b=1, the whole window wraps on one bend: F1.
At b=2, half the window wraps for a bend, which comes out as a frequency result F10,
and the second half of the window wraps for one bend, which comes out as F11.
Recurse until a single sample pair wraps to one bend, which is the difference of the two points.
Last, a single sample at e^{i0} is the point itself.
Hence the Fourier tree includes, at the leaf level,
the data itself; preterminally, the pairwise differences (non-overlapping);
and, as you go toward the root of the tree, wider-windowed, lower-frequency
F values using more samples.
F(0|1)+ define the indexes; each has a complex value.
Only half the data informs each node at the next level: is that a problem?
If you want to represent fine detail finely you should sample faster,
for more data to estimate the higher frequencies.
Should the results at each level be subtracted from the next level?
Do we have orthogonality? If so then no. (Really?) Code it with a
yes/no flag to shift the points at each level to center on the F[bits]
complex frequency of the previous level, and see what is more
interesting or effective.
Can this be generalized to 3D for memory or scene representation, or
to 2D image or LIDAR map processing, to yield a 3D multi-scale bloblet
transform?
Mou thesis errata.
Item 4 p56 missing close square bracket.
Example 4.1 p78 last line, 28 is missing.
P87 last word identity not identify.
P88,89, algorithms 4.9,.10,.11 “mm = (“ should be “mm = PDC(“.
Algorithm 4.11 should have atom? before mm_b.
6/19 George Mou, phone call 1pm after meeting Simon.
Here, Tom, think about 1+1 = 2: do you or don't you throw out the
apple and do the math regardless of what the counted things are?
Tigers, trees, apples, or any other counted item. For FFT, all you
need to know is in the spirit of Divacon: no apple, tree, or tiger, but
communication, calculation, division (= structure manipulation).
Butterfly multiplication, two numbers…
In a good algorithm what is the division…
In the thesis are sets of PDCs: divide into two; within each, the
parts communicate; and the results go to a new result by the local
operation.
Division is structure manipulation. Cut one piece of rope. Is it by indexes? No.
Division does not do normalization.
When you cut a rope in the middle, it's a new structure. Yes, you
could say the first element of the second array was #4 of the
original, but equally valid: it has a relative index, so it's zero
on the right, which *means* 4 on the parent. It's a relative index.
Division, communication, and local computation.
None contains anything of the others.
Local op and communication and calculation.
Frequency amplitude etc, forget it.
A 2nd-order PDC should not be a 2nd-order PDC. That's
over-complicating, like 1+1−1+1. So it's not 2nd order.
The rest is a combination of the two: a composition of communication
plus local operation over the communicated data.
The adjustments are performed in the division or combination
phases: in the dividing phase it's post-adjust; in the combining phase
it's pre-adjust (I think I transcribed this backwards).
F x = y: what's done in the domain (x) is pre; if done in the y
domain then it's post.
In FFT, in the time domain it's pre; in the frequency domain it's post.
Similarly counting ones: ones of what? We don't care.
Even in applied math you might skip the units; even with Newton's F = ma, I don't need to know the force of what.
Odd-even is a premorphism. LR is a postmorphism. "Tom, you are not alone. Very smart people don't get this."
To build any PDC: for the division just choose 1 of the 2; the
communication is correspondent, i.e. butterfly communication; the
local op is coefficient times the other value, plus self.
“If you take more than one minute to write a PDC down you know you must be wrong. “
Let's consider Newton's F = ma. You might approach it with a lot of
experimental measurements, etc.; then maybe you will believe the
math. You can say the experimental work is admirable, that's okay, but
it is unnecessary.
This decomposition is so powerful.
In terms of programming FFT we have LR, CORR, MA: that's it.
Don't write a Divacon interpreter; it should be self-persuasive. To
run it for applications, okay, feel free to do it. But spend one minute
to write down the three things. Write down a PDC for FFT, and send it
to me.
Communication should feed calculation.
Do some examples in the thesis and really understand them, then come
back to this.
2nd-order PDC, a.k.a. how to make a fast algorithm slower. Don't
figure it out; just forget about that.
"A real story: at Boeing I was a pretty high-up manager. But, atypical
among managers, I did my own programming. My group said they completed
something. Usually managers wouldn't look. I looked and said, you guys
are very creative; they looked happy. This is really original work:
bigger smiles. You are very talented in the way you present this
problem to me, to work so hard to make it complicated; even in 100 years I
couldn't do something so ridiculous and slow. This requires a lot of
originality. I was so harsh."
It's like in physics: learning F = ma, doing experiments, and seeing
if the results say so; then later we throw out all the quantities and
units. PDC says throw out all the instruments and units.
One plus one equals two is a big thing to understand, many have a hard time.
Most people never graduated first grade.
Okay, I'll just try to understand something, maybe not even separately
from understanding how to compute it. Ignore the meaning of the parts
in PDC. 1+1 = ?? Okay, two is the right answer. The calculator didn't
ask what things you are counting. PDC gives you the answer, boom,
done. The result is right.
So just go to the PDC directly, write it down, and do it right.
Do some simple examples in the thesis, simple and comfortable ones. Develop abstraction and throw out all unnecessary details.
George 7/5/24
I start explaining an approach I'm trying to take to understand and do
one of his ideas. Finally he interrupts, saying: Tom, you're going in
the wrong direction. There's no point in explaining a wrong solution.
This is the way to talk about Fibonacci numbers. By definition, a
number that depends on some previous numbers, in this case two
previous numbers, with some dependency.
You can expand it to put it in matrix form: the two numbers
before it and the number itself. F(n−1) in turn depends on F(n−2) and
F(n−3).
The trick is to shift the dependency from the two previous ones.
This is captured by a 2x2 matrix: each time you shift the dependency it
changes, and the way it changes is matrix multiplication, until it
reaches all the way back to F1, F2.
For each group of 2 you align the dependency so both rely on the same two things.
Then for a group of 4 you align them so they rely on the same four
previous Fibonacci numbers.
Then align 8 and make them all align on the same thing.
Then you can shift the dependency with broadcast within the current
division group. The data representation never grows: two 2x2 matrices.
Finally the dependency is moved to the first group in the entire
sequence, and in that case you have the final solution. Shift 2x2
matrices, no larger matrices: whenever the data to be communicated
grows, you know you are wrong.
By hand, do a 16-point FFT.
#2: do scan, by hand and pencil.
Then do two more, scan-related.
Another is: do a scan, a premorphism scan. Do it and run it.
Another homework: this scan uses broadcast. On real machines it
means the data needs to go out, a fan-out of 32x for
32 receiving processors. This imposes some sequential time cost.
Can you transform the scan into an algorithm using corr as opposed to
broadcast?
So run it with the Divacon interpreter in your head, using pencil
and paper.
Make it work; then you know it's right.
T: we are tracking the types? G: It’s exactly that.
You should remember divacon is following the functional programming approach.
The basic element is the function, not numbers.
A function always has a domain and a codomain and the mapping between
them. That's why, in general, in any divide-and-conquer algorithm, the
division is in the source domain of the input, and the combination is
in the co-domain of the output results.
The + involves both communication and local operation.
It doesn't have to be bidirectional like the butterfly FFT.
+ : # comm
For any DC, this is so.
Perfect shuffle, inverse, reverse, etc.
And the same for stupid MapReduce, which is all Google understands;
beyond it they understand nothing.
Another day:
Don't just read, understand each sentence, and combine from
beginning to end in a flow; that's a start, but then you cannot
insult the work by not doing the exercises. For each day of reading,
time spent reading vs. exercises should be 1:20. If a person asks, have you
learned calculus: I learned calculus when I was sent to the
countryside, in very harsh conditions, with my schoolmates. Not only one
textbook, but one as the main text and another with 3000 problems or more, so I
spent 1.5 years to complete it and built myself really solid knowledge
and the ability to solve problems. Some schoolmates actually commented on
me: George, people are really different? Yes? Yes. One other
classmate started learning calculus and finished in three days. I
said, okay, all right. Of course, eventually the Cultural Revolution
ended and college reopened. In 1/3 Shaanxi province I was number 1;
that guy failed it. In 2 months I took the entrance exam to grad
school; there they really test the calculus problems, and I was competing with
students from before the Cultural Revolution. Score #1 was 61, #2 was 60,
mine was 59 (#3), and #4 was 39.
I'm just saying doing exercises is very important. It's not a novel.
In three days he turned over all the pages. People are doing that
and calling it good. But that's wrong, and so he never went to
college as a result.
Are you doing an open-source project? Some day, yes. I don't feel very
good. Looking forward to the CAR-T procedure. Hard for a couple of
months, but after that a new world; I imagine I can basically go back
to my old state. I used to go to the gym every day of the year, 2-3 hours
every day, and in winter a lot of skiing, in summer a lot of
rollerblading. I don't think I can completely go back to that, but at
least I can work on the research.
Going back to the proof that DST can search k-dimensional
space in constant time: I have a proof, but simpler now, which will be
shocking to the community. The previous version is not published.
Your question showed a lot of confusion about a lot of things, Tom.
A network has a topology: it can be a 2D mesh, n x n, or 3D, n x n x n, or 4D or
5D; a log-dimensional mesh is called a hypercube. Any mesh has a
diameter, referring to the shortest path between the remotest processors.
Use 2D as the easiest to hold in your head: for 2D the diameter is corner
to corner, (N−1) + (N−1). You cannot make it log N away.
A 3D cube's diameter is (n−1) x 3. In a hypercube the two furthest are log N apart.
So to pass a message to someone who is not your direct
neighbor: like electric poles in the countryside, just pass the power
to the next one.
Higher dimension, smaller diameter.
Nvidia is so stupid it is unbelievable, and so successful: the #2 biggest
enterprise in the world, with such a stupid idea. It is beyond
imagination. I can ask: can you invent a more stupid communication
model? No, I cannot think of one. I challenge you to find a stupider
model; I believe you cannot invent one. Well, you could say,
let's send it to everyone 1000 times, or send randomly to anyone, but then
you would also have to be crazy.
A paper I wrote that you can read is called "Communication-optimal
algorithm of k-d FFT on m-d mesh, for arbitrary k and m." Best paper
presented at an international conference in Munich, Germany; I got best
speaker of the conference; of course no one understood it. Search
for "Z. George Mou".
7/6/2024 phone call notes.
The key to the Fibonacci parallelization is CI, computational
intensity: the ratio of operation count to data element count.
E.g., matrix multiply has data of 2 NxN matrices and N^3 operations, so
CI = O(N).
Google MapReduce has O(1) intensity.
You want a small block of data and a lot of work. No one seems to know
this principle.
TV: are these things in anyone's curriculum?
GM: they are in my curriculum. But the general statement can be made
as follows: anything good is not in the current curriculum. Particularly when it
comes to parallel computing, almost everything in the curriculum is bad
and wrong, as wrong as could possibly be.
TV: ArrayFib: for n Fibonacci numbers, the task of computing all of
them in log n time means it needs to be done in parallel. Any one of
them can be computed in log n by itself.
GM: Fib is a special case of order-k difference equations, with order k=2.
It is also a special case of a triangular linear system: below the
diagonal the entries are all constants, above they are all zero.
New publications in the field are all stupid. Unbelievably.
Consider for example: you know automata can be considered as
transducers, right? But if I had inputs and outputs of 2^20 elements,
did you know, the same as Fib, you can do it in log n? Yes, for an
arbitrary finite-state automaton.
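The reason this works, as with Fib, is that the per-symbol transition functions compose associatively, so a balanced reduction can evaluate them in O(log n) depth. A sketch in plain Python (my own code, with a made-up parity automaton as the example; states are assumed to be 0..k−1):

```python
from functools import reduce

def run_fsa(delta, states, start, inputs):
    """Run an automaton by composing transition functions: each symbol c
    becomes the table s -> delta[s][c]; composition of tables is
    associative, so a balanced tree can do it in O(log n) depth.
    Here reduce stands in for that (conceptually parallel) tree."""
    steps = [tuple(delta[s][c] for s in states) for c in inputs]
    compose = lambda f, g: tuple(g[f[s]] for s in states)  # f then g
    whole = reduce(compose, steps, tuple(states))          # identity start
    return whole[start]

# Example: parity automaton, state flips on symbol 1.
delta = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
run_fsa(delta, (0, 1), 0, [1, 1, 0, 1])  # 1 (odd number of 1s)
```

Note the data communicated at each combine is one table of size |states|, independent of input length, the same never-grows property as the 2x2 Fibonacci matrices.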
GM: I have a principle: anything that looks sequential is actually
parallel. Neighbor dependency seems to prove sequentiality, but by GM's
principle it must be highly parallel.
The arguments always go: by induction, this process operates one by one, then …
TV: that’s nice bait you dangle before me. I am the fish you are the fisherman.
GM: somewhere in the thesis it says:
any postmorphism with correspondent communication and dlr (LR division) is
equal to a premorphism with deo and corr.
Just draw the picture, then you will see and know. Assume the local
operations are the same. It is too obvious, it is trivial. And that's why
there are two FFT algorithms. But they publish them separately,
because they don't have the vocabulary. It's so simple.
TV: So ok that’s already a big deal, but what if not corr comm?
GM: The three basic communications are: corr, broad, mirr. Correspondent, broadcast, and mirror.
Broad can be transformed to corr. Think about it.
Mirr: can you turn mirr into corr? TV: reverse first? GM: this is a
good question, but it needs to be worked out. To do that, first
show the transformation from broad to corr. So for clues to turn one
into another, look at that case, then work on mirr.
Backus, in "Can Programming Be Liberated from the von Neumann
Style?", said programs should be syntactically transformable like
mathematical expressions in calculus and algebra (e.g. associativity…).
In DC there are many such equations, e.g. this one: dlr + post + corr = deo + pre
+ corr. Lots of these exist. T: that's amazing. GM: looking at
current papers, they are not ashamed, but they give stupid solutions.
TV: what is a recipe for better papers? Maybe: do they separate the
three dimensions? GM: you can't use Divacon without separating
them. So just say it in Divacon. Just like in chemistry and physics, the
three things, proton, neutron, electron: you have to decompose to those
to understand anything. If you don't do that you have no idea. It is:
come to Divacon or fail.
TV: like Zellig Harris and henchmen at Penn thesis defenses, you will
say: how does this fit into MY theory? GM: maybe in linguistics, but
here, the fact that the universe of computation, this whole world,
consists of the three components (TV: dimensions?) is a fact.
Anything in conflict with this view goes nowhere. You don't have to be
polite. Same with DC. In fact this can be said: in the entirety of
computation there is nothing else but these three things. For any
computation, whatever it is made of is, all and only,
these three things. So for example, iteration is also divide and
conquer, indeed a stupid one, but still DC.
For example, a For loop is nothing but head-tail division. It is
DC. Ask generally: can you turn head-tail into balanced? Yes, under
certain conditions this can automatically be done. TV: what
conditions? GM: I should be careful to be fully general in the
statement, but in most cases it is trivial.
Both: Everything is either trivial or false.
So you see: pre to post, broad to corr, HT to LR and sometimes to
EO; you can go from one place to others. There are a lot of choices, and
so, as Backus said, a program should have that property.
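The HT-to-LR transformation can be made concrete in a small sketch (my own code): a For-loop reduction is head-tail division; when the operator is associative, the condition alluded to above, it equals the balanced LR version, which has O(log n) depth.

```python
import operator

def reduce_ht(op, v):
    """Head-tail division: a For loop in disguise. O(n) depth."""
    acc = v[0]
    for x in v[1:]:
        acc = op(acc, x)
    return acc

def reduce_lr(op, v):
    """Balanced left-right division: O(log n) depth. Equal to
    reduce_ht whenever op is associative."""
    if len(v) == 1:
        return v[0]
    half = len(v) // 2
    return op(reduce_lr(op, v[:half]), reduce_lr(op, v[half:]))

reduce_ht(operator.add, [1, 2, 3, 4, 5])  # 15
reduce_lr(operator.add, [1, 2, 3, 4, 5])  # 15
```

Picking among the equivalent forms for speed is exactly the circuit-design analogy that follows.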
GM: it's like in circuit design: one Boolean expression can be
realized by many circuits, and you can pick the one that fits in terms
of speed, gate count, etc. In terms of parallelism it's the same.
The world does not know this.
GM: I owe the world to talk in such a way that people do not give up
in the middle, like my Yale professors who said, we don't understand
why, even though we think you are right. I did get best dissertation
of the year at Yale, and a nomination to the ACM, but not even one
person really understood.
TV: Am I the only one to come sip at the fountain of George Mou?
GM: I had students, but it is sort of hopeless. Brandeis, Hopkins,
Xiamen University: if I gave something to them they would graduate, but
if not, they were going nowhere.
T: To me it’s like the need for homework. To become a pickle one has
to marinate like a cucumber in the salt or the vinegar of the problems
and expressions.
T: What village did they send you to when they sent you in the
cultural revolution? Chang hsing tsun village.
GM: I listen to Tchaikovsky and think, how could he have done that?!
NOBODY could have done that. I can only think, he must have gone to
another place to bring this to us. Like going there and God gives us
something to take back.
As an example: DST came from a challenge. With my teammate we said,
split the problem: George, assume the 3D space-search problem is done,
and you do your part. So GM did his part; after two weeks, great
results. But by then the other fellow had nothing. What have you
done? He had looked at hexagons. GM said, okay, that's beautiful, but what
does it give you? No answer. Does it have to be so complicated? Can
it be simpler? "That night I watched TV till 11; then by 3am it was
running, plus I spent part of the intervening time writing lots of
theorems. My friend said, if you can do this I'll give you my one
month of salary. He didn't give me the salary, of course. Even now,
after all this time, I would not change the algorithm and results at
all. I look back and think: how could I possibly have obtained all
those insights in a few hours? Even now I have difficulty
understanding it. My explanation is that someone took me to heaven to
give me something and threw me back down to earth. Later I did spend
several months teaching this to others, and even then people did not
understand, and those were PhD students. Not even close."
TV: George, where did you get your spare and mathematical writing
style? GM: I read a lot of books, and of course they influenced me.
At Yale, a little book called The Elements of Style (Strunk and White);
of course I read it carefully. I also believe in this: if you think
clearly then you can write clearly.
TV: the k-dim FFT on m-mesh paper is behind a paywall. GM: keep
looking; from some places you can download it directly. That article
is probably completely hopeless: the way I wrote it, no one, not even
one person, could ever understand it. But give it a try; it's really
good. If the entire field could understand it they could save a lot
of computational efficiency, even these many years later. Thinking
about that is very depressing.
TV: that’s why I’m here, George, to give you hope.
Ok let’s stop for now. Time for dinner.
—-
7/8
Decompose pbits with pbits.max into bit substrings that know their levels a, finally single bits 1<