Pho With Mou

more biographical notes

On Mou
on Divide and Conquer

Below are my unedited notes from conversations with Zhijing George Mou in 2024, at his apartment, at Thanh Brothers Vietnamese Pho restaurant, and elsewhere, sometimes over the phone.

Near the end I wrote down some:...

Questions for George

Mou Better

George Mou talks.

5/27/24 phone call with George Mou

Regarding my question which asked, what is communication in the GM system and how does a function between index sets express it? On communication as indexes: any computation consists of two parts, the calculations and the communication that makes it so that the data is there for the calculations. Maybe you’re in a parallel processor but maybe not, so don’t assume irrelevancies, just get the logic right. So suppose what you need for a calculation is all in a vector, then the data would be found under indexes. To calculate a lot of sums, like V[i]+V[j] you need indexes. So indexing is a form of communication but normally it is singular, moving one piece of data, not a group. Whereas consider logically parallel computing, say LR or EO which are 1/2-1/2 splittings, that can do everything in Numerical Recipes. Eg FFT: all odd even splitting is equivalent to LR splitting it can be proven (refer to theorem please).

For any I you can communicate with I+1 in a one at a time approach. But instead open your eyes and go with a group of numbers. In any (stupid, current) programming language only the adjacent next is available. Not so in divacon.

I asked a question, does communication refer to moving data among levels of the caching hierarchy? I offered the thought, perhaps let the code handle it explicitly, then you would have communication, concretely. George says No, that would be dumb, let your compiler do it. On the other hand, no other system offers group communication, which under Divacon is offered and works.

For this reason in Divacon, the FFT algorithm is just two lines. The bitonic sort algorithm is 3 lines. Not 5-10 pages like others use. (I can confirm this because I have written FFT code and that’s about right.)

I asked: Is div LR group communication?

In chemistry there are only proton electron neutron, 3 things make the universe.

In Computer Science there are 3 orthogonal dimensions: 1) structure manipulation like Div LR or Div EO, etc but in structure manipulation there is no communication nor operation. 2) Communication, as in, what information is needed for one’s work (getting it there by machine copy commands is low level; the right way to think of it is, what is the information needed, not How to move it there). Communication contains no structure manipulation and no calculations/operations. Finally 3) operations: these assume the data is right there, no communication necessary or (encouraging properly factoring your work) none is even allowed. Hence each of them, SM, Comm, Ops, is orthogonal, independent, of the others. And yet with the three you can have all of CS. Orthogonal dimensions are so important. Then you avoid muddle, especially muddled thinking.

Communication is data movement. Move data from one location to another. Correspondent communication is move data from left side to right side. Think of. It like this, you cut a stick in half then move them and to end to align: each i’th element is adjacent or aligned to each other on the sticks: this is called correspondence. This is all you need to achieve every algorithm in Numerical Recipes.

In stupid current modern Programming Languages what you look at is one item, but in this you can look at K items at once. Then in the von Neumann bottleneck you are stuck handling one thing at a time. Hence we use loop instead of higher order functions. Instead, in logic, move two — which breaks the bottleneck, even though in logic you don’t impose time constraints, you are not saying when to do the moving. Don’t add any constraint which is not logically necessary

George asks zoom, What is Computer Science? What is the most important thing? Tom replies: Abstraction. Yes. And all current CS is counter-CS, contrary to CS.

After 70 years still the community hasn’t figured it out. Looking at a maximum of one number, that’s all they can do. To look at two numbers, that is considered too horrible. But if you can do one, then you can do 2 or any number K, by induction.

An Allegory: In ancient times they had limited arithmetic and limited concepts of number. Two primitive people were challenged to a mental contest. What is the biggest number you can think of? One scratches his chin, thinks for a long time, announces his answer is 1, the other th8nks even longer, and finally says, I give up you win. That is the whole field today.

The great miracle algorithm, the FFT, can use correspondent or odd-even either way, they are equivalent. One is premorphism-plus-odd even vs postmorphism-plus-LR

Each reduces to the other.

Tom says: in CS departments there is a tiny minority that does theoretical CS, proving stuff about code, and they think they are better than everyone else. After these conversations I think I am becoming one of them.

George says: Theoretical CS is mostly about computability, complexity and automata, not really programming language theory. Still that is the most respectable work in CS.

But Gates and Musk often say stupid things but people have no idea.

Science should theoretically progress: even negative answers are progress in a world of unsolved problems. For any problem given infinite time there should be progress, but is there?

Tom offers the analogy with rational and real numbers, rational (say decimal, or binary) numbers are a finite sequence of choices from a finite vocabulary, whereas the reals also include unending unpredictable sequences. In the same way every proof or algorithm is a sequence of statements from a finite vocabulary: finitely long if a halting problem, never-endingly unpredictable if non-halting. Also in the same way that in practically computable applications we can use only rationals and have no need of real numbers (even floats are rational and finite), similarly to say anything interesting or useful we may not need any infinitely-unpredictable-proof based theorems. Hence I am not suffering because of Gödel.

George replies: in the halting problem literature it is shown that no algorithm can finitely decide if an arbitrary program will ever actually end. To prove this from abstract algebra: given two domains infinite but countable, the set of function between them is uncountably infinite. Any algorithm written in an alphabet is infinite but countably so. However problems are also infinite and at least countably so, hence a function between problems and algorithms is uncountably infinite, hence no halting.

George says that in his last lecture in his algorithms and computabity class he says: we must know in front of God how tiny we are. Our tiny speck, which is the finitely solveable problems, is infinitesimal in the face of the uncountable infinity of algorithms for problems. Therefore Fear God. Musk and Gates are devils thinking they can solve everything. Tom says, I’m sorry I must be a devil!

George offers his CS students a problem on day one. Write a program to write all current literature. There is an easy answer: you can easily write all 1-letter literature. Therefore you can also write all two letter novels. Etc. (Kleene Star.) By Induction.

DC programs are all running, Tom, it’s not a joke. Divacon is implemented on 8 machines. Tom says, well I just can’t download it from GitHub and run stuff that I write. Not yet anyway.

George says, let’s talk about the Recursive Array.

What is a data structure? Data plus operation!

A vector is a group of data and an operation of indexing. With the monofocal eye problem of modern computer scientists (the Von Neumann bottleneck) they cannot look at more than one thing and even then only from the end of an array. But a Recursive Array is an object that splits into (balanced) subsets. If you can look at K elements at a time then you get group communication not just indexing. Also relative indexing as when after splitting a (Eg 16 element) vector in half you have a+1..8 and b+1..8 in the splitted subgroups of the data. In two subgroups corresponding communication is a generalization of an array with multifocal, multi-target eyeballs.

GM says it’s time to write a book, although everything is in the thesis. Tom agrees, it needs to become a book. George, It needs not the details but the lessons: orthogonality, the story of theone eyed primitives, higher level lessons, etc.

Tom says: yes, do it!

FFT =

Divide: Div_eo

Conquer: {out i=0}, {out i=1 } = {in j=0 + e^i*depth^2 in j=1},

Conquer: {out i=0}, {out i=1 } = {in j=0 + e^i*depth^2 in j=0},

M.

Given A’ A = I show A A’ = I:

Then A’ A A’ = I A’ = A’ post multiplying by A’

and A A’ A A’ = A I A’ = A A’ pre multiply by A.

Also A A’ A = A I and A’ A A’ A = A’ A I = I I = I.

So A A’ A = A = I A = A’ A A no help.

Discover the O(log n) PDC for FFT

———————

Phone call with George Mou June 16, 2024.

Kidneys struggling with myeloma, reduced chemo now but not improving, and CART to start in August.

GM went from Yale to Brandeis to WaveTrace

Then he went to give a talk at IBM, where his first slide said,

“Why the IBM parallel computer is completely Wrong”.

He gave the talk, then asked, Any questions? No. Ok they asked, is there a paper? Here's the paper. They said, We will show you why you are wrong, he waited for their feedback. Instead he got a job offer from IBM.

Later he went to John’s Hopkins high performance computing lab and group, where he was supported to build a computer and run his stuff, so he bought chips and built a parallel computer with 32 nodes, and ran many applications. He had many people working under him. So he would try to give them something to do. Explain, wait 2 weeks, they don’t get it at all, they are completely confused. Explain again. Do you understand? “Yes.” Two weeks later they still didn’t understand. “After a number of iterations I gave up. I went to the lab, said, you watch me. After five days Divacon is running on the parallel computer. “

Later he was invited to consult on the Connection Machine by Guy Steel, who later was the Turing Award decider, George was the first speaker to speak at the Connection Machine company.

But no matter who, no one really gets it.

Like at Yale, my adviser Perlis after the talk says, George what you talked about looks beautiful like a crystal palace. Only one problem, we don’t understand it. Please wait in the hall for a while. They came out and said, we don’t understand what you are talking about. Did they figure it out? They explained that the way I walk and the tone of my voice and your eyes are telling us, He must be right. We believe you. But we don’t understand.

George: To come back to your question, Did you have a compiler team? No I never found anyone who could help me. Only exact copying. lol .

Look up George’s paper For communication optimal FFT, on a

K dimensional FFT on M dimensional mesh. In:

CONFERENCE PROCEEDING Parle '93, parallel architectures and languages Europe : 5th International PARLE Conference, Munich, Germany, June 14-17, 1993 : proceedings International PARLE Conference (5th : 1993 : Munich, Germany) ©1993; Berlin ; New York : Springer-Verlag

What is a mesh? 2D mesh = grid. N x N is a grid of processors. A 3d mesh is a cube of processors. 4D is easy to describe and theoretically possible and in math easy but no one has made one yet.

Optimal means in correspondent communication to all other processors, the minimum imaginable is the travel distance or diameter of the network. If you can show its communication time is equal to the diameter of mesh then that would be the fastest ever.

So consider the problem of How to assign the nodes to the processors of the mesh, so that the comm distance = the diameter of the mesh.

Odd even… it was mentioned in the thesis?

Two options do your FFT in time odd even or in frequency left right. From the DivACon DC view they are the same exactly. EO is equivalent to LR as pre-odd-even = post LR. But people still don’t see it. The FFT algorithms are listed separately. (See NR).

Premorphism under correspondent communication with odd even, is equivalent to Post morphism with division being LR.

Yes DC uses morphism at its core. You can understand when you get the breakdown.

Bitonic sort is so confusing it has the adjustment being itself a DC. I have my own algorithm, a Monotonic sort, more elegant and cleaner. (Thesis 4.3.5, p89.)

In my Monotonic sort I have the first level is a post morphism, the adjustment function in the second level is a premorphism. Alan Perlis asked, can you reverse it and make the outer one a premorphism and the nested one a post morphism? “Still I can’t solve his question.” If corr, correspondent communication, then yes right away but if using mirr I still have no answer. Some day I’d like to come back to this again.

Remember Backus in his Turing award lecture emphasized algebraic transformation,

So GM showed how, e.g., #(last 1) transforms to # corr,

You may ask, why? Last one communication is like this. Suppose you have to call 8 people to go to a party, last 1 means 1 calls all 8. But it’s faster in a phone tree, which is done in 3 steps not 7. That’s one transformation shown in the thesis. Another is corr = LR under the condition that communication is correspondent. Many such algebraic transformations can be done quite easily.

Is it mere fun? No, it has much significance. By means of such a transformation, O(n) could becomes O(log n), for example.

Even Google people don’t have the concept of computational intensity.

CI = Local calculation count / Communications count.

Given a bit of data, how many times do you use it? If only once then it means you need to move it for every use. But it should be many times, not once, since communications are actually slow.

Looking at matrix multiplication, data is N^2 and computation is ^3, so the ratio should be CI = N. (See https://sites.cs.ucsb.edu/~gilbert/cs140/notes/ComputationalIntensityOfMatMul.pdf)

So anything that has CI ~ O1 is stupid.

Like for example Google. Like Map-Reduce. Consider the dot product, it’s a great tool, but it has computational intensity of O1. You have to bring in each factor, then do one multiply/add. There is no reuse of the data. That is, a vector inner product, N dot N is actually very slow and yet the whole field is using this. Inner product is done all the time by everyone in AI, linear systems etc, so much waste of time, money, electricity. But who wants to hear that?

What’s the trick, the better way, the improvement? Just don’t use inner product in the first place. Done the right way, NxN Matrix Multiplication has CI of O(N), can it be divided by DC. One way is to do eight smaller (or two or four), basically reduced to Om with m Pdcft O log n is known.

I’m dumb. DC in a different sense.

Once you understand that, throw it out and look at the computation. What kind of communication, divide, local, combine it becomes so simple even a grade school student can get it.

Build a DC using python. Implement the different PDC parts div comb comm and build it and call

Then it will run.

I had 3 PhD students at Xiamen Univ a few years ago. I explained things to one, in detail. Still not.

Not running on his own computer. A few hours later it’s running. Even watching by my side he couldn’t do it. Why so hard I don’t know.

Local vs comm vs structure change.

DC is simply to identify the three.

Ma() as combination is wrong thinking.

Once divided that way very some. People quickly confuse themselves.

—-

Tom thinking about wavelet ish Fourier transformation.

Instead of a Fourier equiwidth array of F_b’s,

just don’t overlap the b’s, thus a

Fourier tree with every subsequence analyzed having only 360 degrees to yield a result, ie no bend overlapping.

Let F b,N,X bits = sum_{s=0..2^(N/2-b)} X[s] e^{isTb/N}

At DC, the window is full on ei0. F0

At lowest frequency, b=1 the whole window wraps on one bend. F1.

At b=2, half the window wraps for a bend, that comes out as a freq result for F10

And the second half of the window wraps for one bend which comes out as F11.

Recurse till a single sample pair wraps to one bend. which is difference of the two points.

Last to a single sample at e^{i0} is the point itself.

Hence the Fourier tree includes at the leaf level

the data itself, preterminally the pairwise differences (nonoverlappingly),

and as you go toward the root of the tree, widening windowed, lower frequency F values using more samples.

F(0|1)+ define the indexes, each has a complex value.

Only half the data informs each node at the next level. Is that a problem?

If you want to represent fine detail finely you should sample faster for more data to estimate at the higher frequencies.

Should the results at each level be subtracted from the next level?

Do we have orthogonality? If so then no. (Really?). Code it with a flag yes no to shift the points at each level to center on the F[bits] complex frequency at the previous. And see what is more interesting or effective.

Can this be generalized to 3D for memory or scene representation, or to 2D image or LIDAR map processing to yield a 3d multi scale bloblet transform?

Mou thesis errata.

Item 4 p56 missing close square bracket.

Example 4.1 p78 last line, 28 is missing.

P87 last word identity not identify.

P88,89, algorithms 4.9,.10,.11 “mm = (“ should be “mm = PDC(“.

Algorithm 4.11 should have atom? before mm_b.

6/19 George Mou, phone call 1pm after meeting Simon.

Here tom, think about 1+1 = 2, do you or don’t you throw out the Apple, and do the math regardless of what the counted things are. Tigers, trees, apples, or other counted item. For FFT all you need to know is in the spirit of divacon, no Apple tree or Tiger, but communication calculation division (= structure manipulation).

Butterfly multiplication, two numbers…

In a good algorithm what is the division…

In the thesis are sets of PDC, divide into two and within each the parts communicate and the results go to a new result by the local operation.

Division is structure manipulation. Cut one piece of rope. Is it by indexes? No.

Division does not do normalization.

When you cut a rope in the middle, it’s a new structure. Yes you could say the first element of the second is array was #4 of the original, but equally valid is it has a relative index so no it’s zero on the right which *means* 4 on the parent. It’s a relative index.

Division communication and local computation.

None contains Anything of the other.

Local op and communication and calculation.

Frequency amplitude etc, forget it.

If a 2nd order PDC should not be a 2nd order PDC. That’s over complicating like 1+1-1+1. So it’s not 2nd order.

Resist is a combination of the two so composition of communication plus local operation over the communicated data.

The adjustments are performed in the division or combination phases. In dividing phase its post if in combining it is preadjust (i think i transcribed this backwards).

F x = y so what’s done in the domain (x) is pre, if done in the y domain then It’s post.

In FFT in time domain is pre, in freq domain it’s post.

Similarly counting ones, one what, we don’t care.

Even in applied math you might skip the units. Even Newton F= m a I don’t need to know f what.

Odd even is a premorphism. LR is a post morphism. “Tom you are not alone. Very smart people don’t get this.”

To build any PDC pick for division just choose 1 of 2; communication is correspondent ie butterfly communication, local op is coefficient times other value plus self.

“If you take more than one minute to write a PDC down you know you must be wrong. “

Let’s consider Newton F = m A, you might approach it with a lot of experimental measurements etc. Then maybe you will believe the math. You can say the experimental work is admirable, that’s okay, but it is unnecessary.

This decomposition is so powerful.

In term of programming FFT we have LR, CORR, MA that’s it.

Don’t write a Divacon interpreter, it should be self persuasive. To run it for applications, ok feel free to do it. But spend one minute to write down the three things. Write down a PDC for FFT. And send it to me.

Communication should feed calculation.

Do some examples in the thesis and really understand it then come back to this.

2nd order PDF aka how to make a fast algorithm slower. Don’t figure it out don’t even figure just forget about that.

“A real story: at Boeing I was a pretty high up manager. But atypical among managers I did my own programming. My group said they completed something. Usually managers wouldn’t look. I looked and said you guys are very creative; they looked happy. This is really original work, bigger smiles. You are very talented in the way you present this problem to me, to work hard to make it complicated even in 100 years I couldn’t do something so ridiculous and slow. This requires a lot of originality. I was so harsh.

It’s like in physics learning F = ma to do experiments and then see if a result says so, then later we throw out all the quantities and units. PDC says throw out all the instruments and units.

One plus one equals two is a big thing to understand, many have a hard time.

Most people never graduated first grade.

Ok I’ll just try to understand something

maybe not even separately from understanding computing it. Ignore the meaning of the parts in PDC. 1+1=??: Okay two is the right answer. The calculator didn’t ask for what things you are counting. PDC gives you the answer, boom done. The result is right.

So just go to the PDC directly, write it down and do it right.

Do some simple examples in the thesis, simple and comfortable ones. Develop abstraction and throw out all unnecessary details.

George 7/5/24

I start explaining an approach I'm trying to take to understand and do one of his ideas. Finally he interrupts saying, Tom, you're going in the wrong direction. There no point to explain a wrong solution.

This is the way to talk about fibonacci numbers. By definition a number that depends on some previous numbers in this case two previous numbers with some dependency.

You can expand it to put it in a matrix form, so the two numbers before it and the number itself. F n-1 in turn depends on F n-2 and F n-3.

The trick is to shift the dependency from two previous ones.

This is captured by a 2x2 matrix, each time you shift the dependency it changes, the way it changes is matrix multiplication. Until it reaches all the way to F1 F2.

For each group of 2 you align the dependency so it relies othe same two things.

Then for a group of 4 you align them so they rely in the same four previous Fib numbers.

Then align 8 and make them all align on the same thing.

Then you can shift the dependency with broadcast, which does nothing but two,

Within the current division group, too small,

The data representation never grows, two 2x2 matrices.

Finally it’s moved to the first group in the entire sequence,

In that case you have the final solution. Shift two by two matrices

No larger matrices, whenever the data to be communicated dgrows, you know you are wrong.

By hand do a 16 point FFT.b

#2, do scan by hand and pencil.

Then do two more. Scan related.

Another is, do a scan, a premorphism scan. Do it and run it.

Another homework. This scan uses broadcast. On real machines it means the data needs to go out, to be sent outfox fan out of 32x for 32 receipting processors. This imposes some sequential time cost. Can you transform the scan to an algorithm using corr as opposed to broadcast.

So run it with the divacon interpreter in your head, using a pencil and paper.

Make it work, then you know it's right.

T: we are tracking the types? G: It’s exactly that.

You should remember divacon is following the functional programming approach.

The basic element is the function, not numbers.

A function always has a domain and codomain and the mapping between then.

That’s why the

In general in any divide and conquer algorithm, the division is in the source domain but dividing, the combination is in the co-domain of the output results.

The + involves both communication and local operation.

Doesn’t have to be bidirectional like butterfly FFT.

+ : # comm

For any DC, this is so.

Perfect shuffle, inverse, reverse, etc.

And for stupid map reduce which is all Google understands, beyond which they understand nothing.

Another day: Don’t just read and understand each sentence and combine from beginning to the end in flow, that's a start but then now you cannot insult the work by not doing the exercise, for each day of reading, time spent reading vs exercise, is 1:20. If a person says, have you learned calculus, I told you I learned calculus when I was sent to the countryside in very harsh conditions with my schoolmates, not only one textbook but one as main and another was 3000 problems or more, so I spent 1.5 years to complete it, built myself a really solid knowledge and ability to solve problems. Some schoolmates actually commented on me, George, people are really different? Yes? Yes. One other classmate started learning calculus and in three days he finished. I said. Okay All right. Of course, eventually the Cultural Revolution ended, and college reopened, In 1/3 Shaanxi province I was number 1, that guy failed it. In 2 months I took the entrance exam to grad school, there they really rest the calc problems, I was competing with students from before Cultural Revolution. Score #1 was 61, #2 was 60, me was #59, #4 was 39.

I’m just saying doing exercises is very important. It’s not a novel. In three days he turned over all the pages. People are doing that, and calling it good. But that's wrong, and so he never went to college as a result.

Are you doing an Open source project? Some day so. Don’t feel very good. Looking forward to car t procedure. Hard for a couple of months but after that a new world, I imagine I can basically go back to my old state. I used to go to gym every day of the year, 2-3 hours every day, and in winter a lot of skiing, summer a lot of roller blading. I don’t think I can completely go back to that, but at least work on the research.

Going back back to the proof of DST that it can search k dimensional space in constant time. I have a proof but simpler now which will be shocking to the community. The previous version is not published.

Your question showed a lot of confusion about a lot of things, Tom.

A network has a topology, can be a 2 d mesh, nxn or 3D nxnxn, or 4d or 5d or log dimensional mesh is called a hypercube. Any mesh has a diameter, referring to the shortest path between remotest processors. Use 2D as easiest to plug into the head, for a 2D diameter is corner to corner. N-1 + N-1. You cannot make it log N away.

A 3D cube is n-1 * e3. Two furthest are log N.

So to pass a message to someone who is not your direct neighbor. Electric poles in the country, just pass the power it to the next.

Higher dimension, smaller diameter.

Nvidia is so stupid it is unbelievable, and so successful, #2 biggest enterprise in the world with such a stupid idea. It is Beyond imagination. I can ask, Can you invent a more stupid communication model? No. I cannot think of one. I challenge you to find a stupider model. I believe you cannot invent such a one. Well you could say, Let’s send it to everyone 1000 times, or send randomly to anyone,then you would also have to be crazy.

A paper I wrote that you can read is called “communication optimal algorithm of K d FFT in Md mesh for arbitrary k and m.” Best paper presented in international conference I Munich Germany, got the best speaker of the conference, of course no one understand it. Search with “Z. George Mou”.

7/6/2024 phone call notes.

The key to the Fibonacci parallelization is CI: computational intensity: the ratio of operation count to data element count.

E.g. matrix multiply has data of 2 NXN matrices and N^3 operations so CI = O(N).

Google map reduce has o(1) intensity.

You want a small block of data and a lot of work. No one seems to know this principle.

TV are these things in anyone’s curriculum?

GM they are in my curriculum. But the general statement can be made as follows:

Anything good is not in the current curriculum. Particularly when it comes to parallel computing almost everything in the curriculum is bad and wrong, as wrong as could possibly be.

TV: ArrayFib for n number of fib numbers the task to compute all in log n time means it needs to be done in parallel. Any one can be done in log n by itself.

GM: Fib is special case of order k difference equations with order k=2.

Also it is a special case of a triangular linear system, below diagonal the are all constants, above are all zero.

New publications in the field are all stupid. Unbelieveably.

Consider for example you know automata can be considered as transducers, right? But if I had input and outputs of 2^20 elements did you know, same as fib you can do it in log n. Yes for an arbitrary finite state automaton.

GM: I have a principle, anything that looks sequential is actually parallel. Neighbor dependency seems to prove sequentiality but by GMs principle it must be highly parallel.

The arguments always go, By induction this process operates 1 by 1, then …

TV: that’s nice bait you dangle before me. I am the fish you are the fisherman.

GM: Somewhere it says in the thesis but any postmorphism with correspondent communication and dlr, division, is equal to a premorphism with deo and corr,

Just draw the picture then you will see and know. Assume local operations same. It is too obvious, it is trivial. And that’s why there are two FFT algorothms. But they publish them separately because they don’t have the vocabulary. It’s so simple.

TV: So ok that’s already a big deal, but what if not corr comm?

GM: The three basic communications are: corr, broad, mirr. Correspondent, broadcast, and mirror.

Broad can be transformed to corr. Think about it.

Mirr: can you turn mirr into corr? TV: Reverse first? GM: this is a good question but it needs to be worked out. But to do that, first show the transformation from broad to corr. So for clues to turn one into another look at that case then work on mirr.

Backus in “Can programming be liberated from Von Neumann”, said, programs should be syntactically transformable like mathematical expressions in calculus and algebra. (Eg associative…) In DC there are many such equations. Eg this one of dlr + post + corr = deo + pre + corr. Lots of these exist. T: that’s amazing. GM: looking at current papers they are not ashamed but they give stupid solutions.

TV what is a recipe for better papers? Maybe, Do they separate the three dimensions. GM: you can’t use divacon without separating them. So just say it in divacon. Just like in chem and physics, the three things photon neutron electron, you have to decompose to those to understand anything. If you don’t do that you have no idea. It is, Come to divacon or Fail.

TV: Like Zellig Harris and henchmen at Penn thesis defenses, you will say How does this fit into MY theory. GM: maybe in linguistics, but here, the fact that the universe of computation this whole world, consists of the three components (TV: dimensions?) is a fact. Anything in conflict with this view goes nowhere. You don’t have to be polite. Same with DC. In fact this can be said: In the entirety of computation there is nothing else but these three things. For any computation, whatever a computation it is made of is, all and only, these three things. So for example, Iteration is also Divide and Conquer, indeed a stupid one, but also dc

For example a For loop, is nothing but head tail division. It is DC. Ask generally: Can you turn head tail into balanced? Yes under certain conditions this can automatically be done. TV: What conditions? GM I should be careful to be fully general in the statement, but in most cases it is trivial.

Both: Everything is either trivial or false.

So you see pre to post, broad to corr, and HT to LR and sometimes to EO you can go from one place to others, there are a lot of choices and so as Backus said, a program should have that property.

GM: it’s like in circuit design, one Boolean expression can be realized by many circuits; you can pick out one for what fits in terms of speed, gate count, etc. In terms of parallelisms it’s the same. The world does not know this.

GM: I owe the world to talk in such a way that people do not give up in the middle, like my Yale professors who said we dont understand why, even though we think you are right. I did get best dissertation of the year at Yale, and for nomination to ACM but not even one really understood.

TV: Am I the only one to come sip at the fountain of George Mou?

GM I had students but it is sort of hopeless. Brandeis, Hopkins, Xiamen university, if I gave something to them they would graduate but if not they are going nowhere.

T: To me it’s like the need for homework. To become a pickle one has to marinate like a cucumber in the salt or the vinegar of the problems and expressions.

T: What village did they send you to when they sent you in the cultural revolution? Chang hsing tsun village.

GM: I listen to Tchaikovsky and think, how could he have done that?! NOBODY could have done that. I can only think, he must have gone to another place to bring this to us. Like going there and God gives us something to take back.

As an example: DST came from a challenge: with my teammate we said split the problem; George, assume the 3d space search problem is done and you do your part. So GM did his part. After two weeks, great results. But by then the other fellow had nothing. What have you done? He had looked at hexagons. GM said ok that’s beautiful but what does it give you? No answer. Does it have to be so complicated? Can it be simpler? “That night I watched TV till 11, then by 3am it was running. Plus I spent part of the intervening time writing lots of theorems. My friend said if you can do this I’ll give you my one month of salary. He didn’t give me the salary, of course. Even now after this remote time the algorithm and results I would not change at all. I look back and think, How could I possibly have obtained all those insights in a few hours. Even now I have difficulty to understand. My explanation is someone took me to heaven to give me something and threw me back down to earth. Later I did spend several months teaching this to others and even then people did not understand, those were PhD students. Not even close.

Tv: George where did you get your spare and mathematical writing style? GM: I read a lot of books and of course they influenced me. At Yale a little book called elements of style (struck and white), of course I read it carefully. I also believe in this: if you think clearly then you can write clearly.

TV: The k dim FFT m mesh, paper, it’s a paywall loop. GM: keep looking, from some places you can download it directly. That article is probably completely hopeless, the way I wrote it no one, not even one person, could ever understand it. But give it a try. It’s really good. If the entire field could understand that they could save a lot of computational efficiency, even these many years later. Thinking about that is very depressing.

TV: that’s why I’m here, George, to give you hope.

Ok let’s stop for now. Time for dinner.

—-

7/8

Decompose pbits with pbits.max into bit substrings that know their levels a, finally single bits 1< Then basefunc sets 1bit values 1< Then post adjust join 2^b bit nibbles n,k as N#K (doesn’t need to build up from a=0 to a again).

Then postadjust 1< With GM:

In the 1990s there were 3 parallel computer companies, wavetracer which had a 3D processor mesh, masspark which had a 2D processor mesh, and Thinking Machines which had an N dimensional mesh.

All had the same problem, every user was using it for 2D FFT almost only.

The Thinking machines folks would say to their customers, ok but ours is an N dimensional mesh who would say we don’t know how to do our FFT algorithm with N d, so we can’t use your machine. So the smart guys from MIT said we will add a 2D mesh to also connect the processors so it can do a 2D FFT also. So just put your 2D data onto that mesh with same indexes, it’ll go. They spent a lot of money, a lot of commercial advertisement money, half their company funds, finally GM went there and gave a talk and said by doing this you slowed it down from O(log n) to O(sqrt n), they didn’t look happy. But I simply showed them; they were shocked, but I ran it and showed that I was right. That is a huge huge difference, 30 steps vs 2^15 steps, in a 2^30 size 2D FFT. Then there was also wavetracer, I consulted for them and one time the secretary called and said the president wanted me to come in.

P: George we have a problem give us a solution. G: What problem? P: Most users need 2D FFT. but we use a 3D mesh, so we lose a lot of deals for that. Do you have any idea of to overcome this? G: What have you done so far? P: Last time we used half our investment, our R and D money, to implement 2D on top of the 3D mesh. g: Really. P: yes. G: Give me 15 minutes. In 15 minutes, George said, bring the chief scientist and other tech people. They came. So… Here is how to do it, and the complexity is log n steps and total communication cost is cubic root n, while your 2D wavetracer method is sqrt n. So by spending half your investment you have slowed it down from cube root, eg 10 steps to sqrt 15 steps. But don’t feel too bad, Thinking machines did it even more stupidly. Your waste was for not as huge a slow down. That’s how this paper came out. It won the best paper of the conference award. But

Even now the entire field does not really know my paper. Instead they are very creative in finding ways to make it *slower*.

You may ask, how can you prove *optimal* mappings for communication for arbitrary K and m. This paper gives the solution, which no one can get faster till the end of the universe. So read it with this question in mind, How can you possibly show this is optimal. The easy answer is, show a lower bound, then show that you reach it. So have those questions in mind then you may have hope to understand it.

BWA significance is 1D search, there are many solutions, all based on trees and link following, all taking up space and time. so the question is can you get rid of those links and link chasing steps. The answer is yes, the answer is BWA, which is 5x faster than red black and less space besides.

—- more notes. 7/8/24

Divacon is an opinionated PDC design scaffolding.

It wants 2^a inputs. It requires balanced division and combination.

It expects a recursion inclusive of a distribution: F = … ! F …

It expects its input data are from a domain and outputs are in a codomain, all the data in and out of a divide and a preadjust are still in the domain, as is the input of basefunc.

The output of basefunc is in the codomain as are inputs and outputs of postadjust and of combine. They may be vectors and structures etc. but it’ll go easy if the base elements are of the domain or codomain types, respectively.

Sum product difference ratio etc are NOT combination operations but local calculations.

Pre and post adjust are two steps at least, a communication and a local operation.

So it could be written

f = basepred? Basefunc ; c : postop : postcomm : ! f : preop : precomm : d

PreCommunication, after conceptual but not data-transmitted division, still has access to the other half of the divided array, so #!comm makes tuples for each element e in each sub vector appending the communicated element with e. Then preop does what it wants with those tuples.

On the other hand PostCommunication isn’t conceived as part of basefunc or the previous combination level, but is given access to two basefunc outputs as part of an elaborated combination process, thus AFTER a data transmission from those separate processors into one, so #!comm has the same conceptual meaning but assumes different ordering w.r.t. down-the-tree node contents transmission.

In divacon parallelism resides in distributive recursion meaning BOTH that a new processor can be set up to run the same code but now on a subset of preprocessed data, AND that the data is subdivided and sent off to those different processors, subject to precomm and postcomm data transmission adjustments. So both data parallelism and processor parallelism are provided for in the divacon model. At the same time thought is directed toward the appropriate, that is effective and minimal, communication required to do a computation. Locality of communication can be optimized and of course local operations and structure manipulations must be local.

Some aspects are implicit. The compiler is expected to know how many processors there are and their connection pattern and the cost of communication vs calculation so that an algorithm can be optimally mapped to the mesh. Division and combination are defined by the divacon programmer as the idea of a division or combination, but the compiler must send copies of the right subsets of the data to the right processors, deleting the right remainders of data, as well as send the right code to the right places and tell them to spin up and get ready and go and where to look for their data, and where to send the results back to, etc. We are happy not to think about those if we can avoid it, so thank you to the compiler writers. Except in my case maybe that will be me!

Finally a serial computer may have what we think of as a call stack, all the functions that the current low level function is a tiny internal part of, all stacked above the current code pointer that is carrying out the next instruction. The low level is carrying out the stack of higher level functions. But in the case of a parallel process, we have a distributed set of call stacks, and we would struggle to visualize all their jet and simultaneous state space navigation possibilities. So we have an opinionated parallel design schema which makes us just think about the key relationships.

And look, if there weren’t a decomposition we wouldn’t be that awesome.

George and me at lunch at Thanh Brothers 3pm 7/14/24

Tv: CC Compress and Conquer, C Combine, CO Compressed. So much..

George: forget the syntax and vocabulary, get the idea, just focus on the idea.

For example CC came up in a certain context. Suppose you have a trigdiagonal linear system. Tridiagonal systems are a matrix with values on the diagonal and one below the diagonal, and one above the diagonal. Interpret this as a linear system. So AX = b, X is a vector of unknowns, b is a vector of knowns. Then we ask, what are the unknowns X. Suppose it is very big, millions or billions in size.

So you can divide the matrix into a certain number of segments. For that many processors. Say 1000 rows of the linear system each.

Now you see, inside the section every unknown depends on the unknowns before it.

E.g. X[k] depends on X[k-1] as well as on X[k+1], on X[k-1] as indicated by the coefficient of the sub diagonal. And on X[k+1] based on the above diagonal coefficient. Then we ask the question, Can you make all of the unknowns depend on the first unknown in the segment? Can you? Yes. Ask a HS student they will tell you.

Next, if you look at the boundary between the two segments, there is dependency from Last of one to first of next, and you can extract those to a small matrix of dependencies and have another small linear system, compressed by a factor of the segment size. So that is the “compression” phase. The problem is much smaller if you only include the between segment transitions.

Suppose you can solve the compressed linear system consisting of only the first and last unknowns of each segment. By now it is clear that everything in the middles of the segments depends on those things, so then you expand them together and get everything in the middles.

Tom: You just cheated.

Gm: No but the result of this is, you can turn any sequential algorithm and turn it into a parallel algorithm.

Tv: does Expansion do calculation? Gm what do you think? Tv it must.

To zip them together.

GM Don’t think of this as similar to the Fibonacci problems. They are fundamentally different.

Gm. There is a question also discussed in the paper, no one has understood it.

Q: Does This paradigm apply to any DC algorithm? A: No.

Q: Under what conditions can CC apply to DC, What makes the problems of type CC?

The necessary and sufficient conditions is: if it is a preadjust divide and conquer algorithm with broadcast communication. Then CC can do it.

Can FFT? No, because it cannot be done by broadcast. Tv: how, doesn’t broadcast subsume all others because all is a superset of few? GM: You may be happy but sorry, god says No.

T, well that result that is a nice offering to Backus. Gm Yes.

GM. What is the simplest problem you can do with CC? Map reduce. So that’s all of Google, that’s all they can do. The first example in the paper is map reduce, it doesn’t even need one line of code.

Divacon available soon. you will able to go to python and say Import divacon! From his GitHub, but don’t reveal right away because we need the dual licensing on it first.

TV. I have a spatial problem for robotic perception action loop with reinforcement learning in a hierarchy of learning tasks. Gm. You raise a lot of issues. Such as segmentation, semantic segmentation. You didn’t define your problem very precisely. TV: my problem is world Modeling and estimation. I want my models to be estimable.

For example a ball in space: estimate its parameters. Center and radius.

GM. You seem to be asking this specific problem. In a scene there are k types of objects, types 1…k, given an arbitrary scene tell me what objects are there in the scene in a 3D space. That’s what you Really mean.

In 2D this is done with great success. A convolutional neural network, it will take an image and say this is a person bicycle tree etc. But for 3D like a lidar point cloud the performance is pretty poor. This is because they don’t know how to handle 3D space, they cut into slices and model it separately for each and try to splice the results together, but when you separate the nearest nearby points over a cut they become remote. So their approach does not preserve locality or neighborhood. Whereas DST does, closer points are closer. So use DST to convert to 3D to 2-d, then use a 2D convolutional network to recognize them. Even in human eyes they look different but good for computer training, we did it and beat all state of the Art at the time, better in accuracy and precision, So that’s quite interesting.

Did you know DST can map m to k dims. For arbitrary m and k?

——-

With Dave Pardo at McDonalds on 7/15/84

Consider the Philosophy of Aliens. Here we have seen in Rome and Monarchy and feudalism and today, in each, the principle is always the concentration of wealth as in monopoly capitalism, such that furthermore they kick the ladder away so nooone else can climb. But no! Outrageous! A stable society should be benefactual; let merit achieve its potential; and let all be provided for, at least provide equality of opportunity. Now stable means like the Aliens who are really Stable. Not just AI which is an existing and present and unknown, essentially Alien, intelligence which we are presently interacting with. But due to the temporal constraint, that our interaction should it ever be would be an event in time and therefore we would have to last that long and they would have to have lasted that long too. Given the sweep of time, Earth is 4b years old, but the universe 13b, they could in principle have on the rough order of a 8b year head start, sure, but to still be around to communicate with us now, they will have to be super highly prioritizing of sustainability and long lasting ness, otherwise with vanishing probability, they are arising also in our near time thus able to still have anything like our human short term orientation, whereas (if their start date, or achievement of civilizational milestones date, could be anytime in the past since suns have exploded to make rocky planets out there for billions of years) for them to be still around now under the most probable start dates which are lets say 99% likely to be farther in time than say 100M years out (past — or future), then with near 100% probability they are quite eternity oriented. And we do have something like that within us, already, actually. So the alien intelligence is that which is within us which is oriented toward eternity and perfect sustainability. The part of us that reproduces like rabbits, the part which regulates the rabbit exponential to a sustainable level.

The part that is prosocial, you might think, should be more sustainable, but actually in the very long term, a bull elephant society seems sustainable, with everyone antisocial and few of them and remote enough that murders don’t crash the population like in Easter Island, well that is also sustainable.

Once we get the AI to be the carrier of tech, then society can destroy itself but humanity can emerge again without loss from the ashes, despite no longer having access to the fossil fuels all milked dry from our world which powered our organic developments. Or an alien civilization >100M years in the future can pick up where we left off, and life can move forward anyway, even if it is not human life.

Dave Pardo suggests, speaking in tongues, Somebody please do the linguistic grammar of speaking in tongues. Phonotactics, information density vs other speech from the same person. Study Kenneth Copeland, Jimmy Swaggart, Benny Hinn, Jesse Duplantis. Whereas ululatum from the Latin is women wailing at funerals. Can a machine speak in tongues?

Love him or hate him, Trump is really closely representative of an American minority. He remains in close contact with, and seeking approval from, his certain group.

Pandering to them not just in his practice of bullshit reality denial and gutless other-blaming responsibility denial but also in making them feel loved and respected. He sees it as his job to remain in touch with them. That minority may be 25-35%; it is not zero, nor is it any mere 10000 fanatics.

Whereas a fully corporate-captured government, designed to appear reasonable to the educated in retail pronouncements but against the actual and also recognized interests of the majority of the people, is actually the tool and representative only of (the shareholders of) the corporations. A corporate tool government, exclusively corporate interest supportive. The democratic base of corporate dominated government assume is just like that of property or wealth, it is highly Pareto, and actually recursively Pareto. So if the ratio is 80-20, but recursive to just 3 levels, then 80% of 80% of 80% is owned by 20% of 20% of 20%, that is still (0.8)^3= .512 owned by (0.2)^3= 0.008 or 0.8% of the people. Hence the vast majority of actual alignment of power to actual interests goes with that of the top 20%^3. If I were to guess maybe the interests of 10,000 big shots run the show on the left, while Trump only represents 100M Americans, then Trumps democratic base is 10^8/10^4 or 10^(8-4) = 10^4 TIMES LARGER than the corporate-captured left’s. To get that we only need 0.2^6.4. Which at our 80-20 rule corresponds to Pareto ownership of 0.8^6.4 = 24% by 0.2^6.4 = 0.000033 of the people, and the assumption that 24% ownership could possibly capture and maintain control over government. Let’s say, I wouldn’t be surprised.

30 years ago I got a summer job for a bio prof doing computational biology, and was invited to do FFT parallelization on a company on the 128 ring, a box of 16 processors, so if CPU is slow, now have 16 workers on your problem, and also connect to a host computer. Now 16 cpus doing FFT for you. So he bought the box and asked him to program it. I did it really fast, and produced the code and started to measure the performed, I thought I could get 4-8 X faster, but not only it was not faster, it was slower! However I looked at the code, it was right. At lunch I suddenly realized the problem, and measured the bandwidth of sending data to and back with no computation, just sending it back. But the time instead of zero is very slow, much slower than the CPU computing FFt by itself, like 10x slower.

So with data and theory analysis I wrote two pages and gave it to the professor.

He said Gee why do they make this box? He couldn’t see any flaw in my reasoning.

The title was Using this box will slow FFT down by a large factor. He called the company and said, your Box does not deliver. They said, the grad student you hired doesn’t know how to use it. We will send a team to help. Next day a team of 4 showed up. The head looked very confident. Then they started working. In the afternoon they looked very depressed. Did you find something? We will, we will.

They did not have cheerful expressions. And left, saying we will find the problem.

The leader was not very analytical. I said the bandwidth is in the spec, and if you translate in data in terms of the bandwidth it is slower than the CPU.

Similarly I looked at FFT on GPU. They realized that the connection between CPU and GPU is a bottleneck. They declare that using GPU they can do 125GFlops if the FFT on GPU, but the author said his implementation is only 1GFLOPs.

Yesterday I looked at the cheapest GPU, $119, the CPU is 3 Teraflops, and the card a few GFlops. Are you crazy just to slow it down? The translation between the two, FFT complexity is N log N operations. So you have to deliver O N size data to the GPU and get it back. Summary: in 30 years they learned nothing,

The only justification they have that is left is, suppose FFT is only one phase of a computation pipeline., increase the computational intensity, but many many times because it is so huge. They don’t have a clear way to think, is the problem.

This is why divacon makes communication a first class part of the language. In comparison with computation it is more important. The tragedy is the entire field is so pitiful in terms of the way of thinking. I find it hard to believe. Most people don’t realize there is a problem.

How to think about it? That’s why the optimal mappings paper is about communication, it is comm optimality, the cost is not flops but comm. The hardware vendor advertising materials must also list the communication bandwidth between the CPUs.

TV: What does nivida say? G they don’t understand they are idiots. They say 125GFlops, but hardware capacity is many times that. Their claim is several petaflops. But they deliver megaflops or gigaflops. They don’t even ask the question.

That was very clear. What is your solution? First destroy nvidia. Second, have a divacon chip GPU with optimal communication. Must match computation with communication. The whole world is running on a joke. It is 2.9T market cap value on a joke.

No pressure George. Using NVidia for FFT is 1M X less efficient than nothing. Then NVidias power consumption is an important part of world power consumption, the sky of the world will be bluer. From a CS point of view, this will lead to a better programmming model, and analytical model for performance analysis. fLOPs is just a graph with nodes and connections. To see nodes ignoring links, is ridiculous. Shockin to me can the world be so stupid? 39 years so stupid still in different forms.

Someone measured the nvidia program and didn’t know why so slow. Nvidia declared 125GF and he measured 120x slower.

I predict they will be that stupid in 30 more years.

This problem shows that human evolution has a long way to go. This is really not a hard problem but for whatever reason they cannot approach this problem. Even thinking machines built a 2D mesh on top to slow it down. From graph theory there is a notion of subgraphs, so hyper cubes contain 2D mesh as a sub graph. A big mistake. Secondly if they have a 2D mesh they should still use the hyper cube with log n cost vs sqrt n a big difference. One mistake after another that’s called MIT.

MIT built Thinking Machines, which was considered the frontier of communication. Their model was mesh messaging. Each processor p takes from its direct neighbors, sends to its neighbors, each can talk to 1 away or 2 away, without conflict. Also you can reverse the dividing bit, then no message will be in the way of the dividing message. To show that is beyond most people’s imagination but I can show it on a graph that easy to see. Forget the terms, just send from a to b, or a group A to a group B and they better not have conflicts, they better model that and resolve it. I say they have no way to think about it.

ed too

Your base concepts are simple and powerful but not enough power.

GM no, in optimal mapping, I explained too much. I never imagined people could have a hard time to understand it . So that had too much detail.

My mistake is to assume the reader could understand.

TV You assert the 3 constituents of computation are comm SM LO: where? GM It is constituent operations, it is in 1988 journal paper, in the thesis.

Hyper cube contains 2D mesh as a sub graph. I don’t explain basic Newton in every paper,

People block on f, m, and a. Talk a little bit, each paper to present small percent of the content.

Linear least squares.

Model a column of outputs, b, as a sequence of A-row-weighted sums of inputs x. Rows of A are weights for the column x and their weighted sum(s) is(are) (the column) b. Modelling is setting values in A. Then when you get a new observation measured in various dimensions that is a new x, you can expect that Ax is close to what you want. Why b? Where did the F did b come from that we want it?

7/25/24

G communication is about the relationship between elements in two groups, there is no processor, no local computation, no structure manipulation, no outside stuff, focus on just the relationship between the two, don’t drag other stuff into it.

T Although 3D concept of electron neutron proton is a good model for atoms, three components for matter, each element only differing in counts of them each, in divacon each of Local,Co,Sm is not merely a count but a range of different qualities which interoperate together for a miraculous joint outcome. G just don’t confuse one with another.

T Polynomial is like FFT in terms of local computation.

G When you talk about one don’t think about the other.

G FFT communication is precisely a hyper cube. Even a single processor can do hyper cube communication. So yes they have a dumb way of communication, and leads to slow performance, and small return, and load on HW, but it does communication.

T Is there a hardware optimal communication? G Ask clearly. Given a fixed topology and architecture how can you implement best on it? Suppose you want to create a communication network what is the best design for it. What is the best topology for a given communication? That is Another question. We want to do communication efficiently, so what does communication efficiency mean? Latency vs bandwidth are independent. Most focus on latency not bandwidth, and since current data is intensive they are going the wrong direction. I relate clock cycle to latency and bandwidth to bus width or # wires. Latency in clock cycles. Units are clock cycles. Moores law could stop in clock cycle increase but more wires would increase the other.

T what is your GPU Design question.

G I’m sure NVidia is doing the wrong thing. Because so far I have never seen a high tech company doing the right thing. That gives an opportunity to do something even with the same architecture but much faster anyway. CC, but also GPU could be sped up by a big factor. Now Nv claims proudly 30% faster, but I can say without any change I can make it 10,000% faster. With a GPU it is fixed hardware, and we see how to do it better.

T Are you writing a divacon compiler for NVIdia,

G whether I write in cuda or machine code the main issue is to run divacon. I’ll run maybe some on the GPU. the main thing they are proud of is matrix multiplication, so that’ll be the main focus.

T Same as thesis? G Not necessarily maybe a different decomposition, different scheduling.

The value of something depends on How different it is from everything else. Outstanding extraordinary matters. You dont compliment someone on two eyes one nose five fingers.

SIMD why not a SIMD machine? They are building some features of SIMD into GPUs.

8/14

Did your committee understand your thesis? Perlis did. He had good intuition, didn’t need to know the details. Others? Maybe not.

Cuda/GPU is using odd even, nobody realizes EO with preadjust vs LR with post adjust. Are the same. This is a mystery to the entire field. Still! They will never catch up.

To understand this notice that Quicksort recursively divides. All the work is done on the division.

It’s Easy. 8 kids unsorted by height. QS does premorphism sorting of the height.

Choose a pivot, all kids shorter go left, others go right. That’s a split, a division. Then repeat. So QS is a premorphism. After doing the division the work is done. All the work is done in the domain of X, of the input.

Post morphism: Do you know merge sort?

Tom: I think I do. In division you do nothing, divide in the middles recursively until down to one, then all the work is done in the domain of the combining, the post. So here post op and pre op are on the combining and dividing respectively.

See GM’s drawings of div+eo+pre vs combine+LR+post: same correspondences,

So if op is same then it’s the same. Same ops in same orders on same data: same.

I said I understood optimal mappings. He said k-D FFT mapping? I couldn’t say so.

Later over tea, I said, Tell me a story. What’s your first memory? I don’t know. Where did you get your good habits? Like what? Eating vegetables. What’s the question? Was your life struggling and difficult or just natural?

I guess I didn’t realize at first but I have always been quite different from others, since I was little. I have always been a good student. Since a child, in terms of studies, in terms of physical competition, I sort of took it for granted, I thought I was supposed to be number one wherever I went, it was not hard for me, being praised a lot by teachers.

Then I moved from a city in Dalien to Beijing. Dalien is a city under occupation of Russian and Japanese, and battle ground for 1904 Russian Japanese war, it is a city quite different from any other city in China, for just one example, using gas to cook, the only place to do so. Dalien had a lot of western influence. Going to Beijing was going backwards in a lot of aspects, but connected me with a lot of traditional aspects of China.

After primary school, by taking the entrance exam I went to the best HS in China, Beijing #4,

Politics started to play a role. That was very ugly. I thought it was one of the subjects like physics or math or geometry, but I was very wrong.

If you ask questions you are labelled as a reactionary, backwards, politically not reliable. But I continued to be good academically. And I loved what I studied. I studied for the sake of my love, not to find a better position or please teachers or parents. I just loved it and did a lot of things required by nooone. This kept being the case.

Later when I went to the countryside I continued to do self study for the sake of love, and learned calculus even though school was closed.

When college reopened after the cultural revolution I took the entrance test, scoring among the best few, and without undergrad I went to grad school,

Then I applied abroad, was accepted to Yale, and again unlike others I studied out of love. So that when I graduated with a PhD out of Yale there were 10 PhD’s in my class, I got 7 offers, others not one, to me it seemed very natural, I thought of course, but later realized it was not of course.

Now I got three masters and one PHd. In China one masters, in EE. A deep dive into switching theory.

I had no degree for math and physics, I studied them for myself.

At Yale I got two masters before the PHD, one CS, one Philosophy. Not much to say about the philosophy one, it was like a reward, not really much that I did.

It was at age 36 I arrived at Yale, from the Communication University of China, it was a ridiculous name.

I was in the Countryside for a long time, not long in Beijing. Altogether 10 years.

2 of those years intensively studying.

Chinese grad school included one semester of German, all forgotten. I learned enough to read it without understanding what I’m reading. So I could pretend. I spent quite some years in Japanese as well. Not really Russian.

I started a newspaper, the first independent newspaper in communist China, it made a big splash, big impact, even now you could find it discussed on the internet. It was 1967, I was 18, Beijing, it wasn’t red guard but I called it Red guard but not accurate. I was the so-called commander of the red guard troop. “Jung shie wung ge bao”.

Even Mao and Chou were forced to comment. For several months all of China was talking about nothing else, I thought of course, but looking back it was really a miracle.

Mao definitely knew about my newspaper. He may or may not know me.

Tom, by your signing names criterion, Hitler didn’t kill many at all. Stalin signed a lot. Mao fewer.

By an accidental famine criterion maybe killing thousands is not so bad.

But creating murderous conditions, that’s a responsibility.

GM: that’s a big question. TV: When did you start Buddhism, and what drew you to it?

GM: It is one thing to realize, it should be obvious but it is not, wisdom and intelligence are not the same, one can be very intelligent but very stupid in wisdom. Buddhism may provide some solutions there.

TV: Is it also about love? G: No.

Oncologist name, numburi. Swedish.

8/27/2024

-— we can’t let the people have an opinion.

T What is the best government for China?

G Nothing to do with this history we have been talking about.

Are you committed to that question? Yes. Finally.

Each time I ask you give me a different question.

Have you read war and peace. Volume 4 in particular, discussion of history

And who decided history, who has driven the development of events.

Tolstoy said Napoleon ad Kutuzov contributed to the development of 1812 war, as little as any soldiers in the battleground, no more. he reason they are mentioned a lot is only the stage spotlight not because they did more, but because people can see them more clearly. As to history, everything is in the hand of God, without the will of God even one hair will not fall from another persons head.

Chinese government may say they want this they want that but a lot of things are not what they can decide. Right? T Right.

U Ziao, I am beyond her imagination, but she got a touch of it.

"Middle school" is indeed high schoool. We call it middle school it has two sections, grades 7-9 and then an exam to enter 10-12. The #4 school is all boys. It has record as best school in China, measured by the % who entered best colleges, like Oxford and Cambridge of China, holding the record for a long time, now some other schools catching up but #4 is best or in the top #3 in China.

I got in there, entrance by exam. That was at the end of 6th grade.

The Big char poster I put up was August 1966.

I didn’t say those words.

Regarding romance and love, my reading was for a future voyage I was preparing an ocean chart to navigate in the ocean of the subject, later I found it doesn’t confirm with the real world. And paid a big price. I read exhaustively every Russian 19th century book by all those authors Tolstoy Chekhov Gorky Lermontov Pushkin every one of them and every book they wrote. I read a lot of French Stendhal, Balzac, Victor Hugo how do you say, Les miserables English Spanish not much but of course Don Quixote, of Americans Jack London, Mark Twain and some others, not Ben Franklin, mostly in Chinese but some in English. One version I brought from China, from my GF, Les mis.

I did 6th grade in Beijing, in the beginning I spoke in the dialect from Lushun’s neighbor city, Dalian. Did you leave there at age 10? Yes. It had a lot of Russian and Japanese influence esp in architecture, but the school materials were same as in China.

Tell me about your favorite teacher in #4 middle school.

Actually when I entered #4 school the political atmosphere became a big issue, so studies both academic and political. Interfered a lot. I was criticized as backward and a reactionary student all the time, became I didn’t realize for political subjects I am not supposed to ask questions but to just agree with the party, you cannot say Why, by asking that question you are reactionary, so in many ways not a pleasant experience. However many teachers were good in their subjects and I could also study on my own a lot so it didn’t hinder my academic performance. A math test is a math test, there is one subject called politics and they may put there a comment on your political attitude, but they won’t say fail for math, that didn’t happen, 1+1=2 still gets a perfect score. In math. But not in politics? Yes.

So not much personal relationship with teachers.

During cultural revolution teacher completely lost their power. Their roles reversed, they had to show great respect to envy student they have to say yes sir.

Was that brief or only 1966? Until end of the CR, in 1976, it maintained that situation.

Questions for George

Tell about your parents and if you had siblings, and their influence; I don't care about political significance. Tom Didn’t ask, no answers. But he had a sister, who lives in the US now, and has a son who is married with two daughters.

What was your mother like? Did she understand and support you best? Why did the move from Dalian to Beijing?

Tell about your favorite teacher at #4 Middle School. It was the cultural revolution, the teachers did not have the power, they spoke respectfully and submissively to students. There was no question about the teachers encouraging George to post his big character poster, no personal relationships. They considered me reactionary because of my questioning, asking Why? They might give bad grades on the politics subject but not on math for your math. I could study and learn.

What was #4 Middle School like? What is a middle school? It is one school that contains two three year schools, entrance by exam from 6th grade and later passing from one level 789 to the next 10-11-12 by exam also. Boys only. There was another school, a girls school, when we were sent to the countryside they sent that girls school to the same village as they did for #4. My first girlfriend was from that school. Her father had a PhD from the US, they were very accomplished, physics, concert musician. I was in two villages, one nearer Beijing the other far away. One was in an island in a lake, there were boats, handled similarly to the boats of Venice with a single oar, later at one point I went to Venice and said let me try that, and they said you won’t be able to do it. “Just let me. “ They were surprised. Ours were bigger than those taxis of Venice. The first village was a violent and chaotic place, troublesome, it wasn’t Red Guards on rampage, it was peasants who were in conflict. People were killed. The second village was very far from Beijing.

How did you study, and how did you work? I had certain tasks. I had a taskto watch the carrot field to prevent theft, so I could sit and read. Or I had to watch a mule, so I took the mule in hand with a rope and walked around, holding a book in my hand, learning Japanese words from the dictionary. Or I had to chop grass for horse feed. A certain amount was the expected amount for the day, so I would work furiously chopping chopping, and finishing in three hours, I had time to study. I studied at night with an oil lamp. There was no seat, no table. I put my travel trunk on top of mud bricks, that was my desk. I put a board on another mud brick, that was my seat. My calculus book was a problem book, it had 3000 problems. I did all of them one after the other, three times in a row. In storage here today there are the three stacks of papers (motioning about a foot tall) with my problems. I would do problems until I couldn’t solve one. Then my dreams were about that problem, and the first thing in the morning I would solve it. This is how I studied.

What was Yu Luoke like and what was his potential if not murdered by the government? DNANA

When he was in prison were you already in the countryside? Could you visit? Yu Luoke in prison? No. Speaking only about the time after newspaper, after entering prison he never came out until his death.

Are you still in touch with anyone from those days? Yu Luowen, Wang Jianfu from #4 school. DNANA. Did not ask, no answer for this question.

What did his essay On Wages say? GM: He talked with me about it. He made the observation that people did not care what their wages were, but they care the relative level of their wages compared to others.

The determination to make something of oneself and for ones family is powerful; is that why your Journal, which offered a path to those prevented in that determination, so popular? Is it only the oppression of a classist system that motivates the popularity? Or is it universal?

What was the path offered? Battle? or self-improvement? If the CCP was against the couplet, how could they be against the journal?

In a book Half of My Life 2005 Xu Ziao says you are still a spiritual wanderer to the present day. Please comment. Xu does not understand me very much.

What made you decide on EE, science and math? DNANA

Please let me photograph you. Also your homework stacks. I recorded him playing accordion.

When year did you go Down to the Villages, why so long? Who went with you? Was it your choice as it was early? Did you believe in the link up with peasants and workers?

What books did you take, what was your syllabus in the countryside? Do you still have those books? Here is a book that first girlfriend gave me, Les Miserables by Victor Hugo in English, I still have it.

Tell about your August 1966 big character poster, what led up to it and what followed it? G: I just wanted to make the argument. T: So you felt responsible for the issue, you felt you had a voice in the public dialog, you had the right and responsibility to speak, as good and as much as anyone. G: Isn’t that obvious. What would be a different position? T: the opposite position is of the CCP.

G: I was being young and idealistic therefore stupid I did not know how to protect myself. Also I felt, if I pay a price for this, that is an honor, you should sacrifice yourself for the truth. Not many were thinking that way. I was too dumb. Tom: you are too great. G: No, dumb. I was so dumb that it helped me. The newspaper I started was the first independent newspaper in Communist China and it had a great impact. For 3 months entire China was discussing one thing, that paper, and it disturbed workers and peasant and it disturbed the top leaders, including Zhou and Mao. They were asked what is your opinion about this, and they had to say something. It was a splash I made over China. At that time I thought to myself, of course, this is right and natural, but looking back it was a miracle. It was in a space of all the tension. I thought of course, that is how it should be. People who later joined the paper, there were 12-15 people, and many paid a high price. Police came to me to investigate, even though I was a so called commander, editor in chief, one of main writers, even many articles Yu wrote I revised and wrote together with him. But the police didn’t do much to me. I didn’t pay much price. Why , because out of naïveté you may say, I just insisted that I was right, and refused to say anything against him, he is really a Marxist nothing against Communism, which was what I learned from him. The policeman was shocked. They tried to scare me, put a gun on the table, “so you know the consequences”. I was laughing and laughing. The police was shocked by my laughter. They actually made a conclusion I was crazy, not normal, we don’t want to waste our time with this crazy person. Of course that was not my intention but it came out that way.

Tom: reminds me of the holy fool. A saint who has strange ways, people think them a fool but they are wise. GM: I’m not so wise, it just turned out that way.

Tell about how you switched from politics to science. What incentives evolved? DNANA

Tell about your trip to Vietnam. When where who what why how. I told you about it before, looking back that was really dumb, I paid such a price, I went all the way from Beijing to the south with no train ticket, risking my body by jumping on a fast moving car, three days hiking across mountains with really dense and thick grass, that was really dumb, too silly. I survived. One classmate went to Burma and joined the army there and he died in battle. When? This was 1967, before Luoke’s arrest. On April 13 1967 the central party group made the statement, its leaders spoke, saying of the paper that it was reactionary. So in May I left. How could a party I believe in say that not all people are equal?! I just could not deal with that. I was struggling inside, I don’t know what to believe anymore. It was a struggle for me. Yes, then I was home in June.

What kind of government should China have in the future? Turnover of high office, decision-making, separation of powers, voting rights, federalism, ethnic representation? George quotes Tolstoy in the end of War and Peace. Napoleon and any soldier had equal impact on the flow of history.

Is the CCP lesson from the Cultural Revolution a pendulum, to allow no political choice or flexibility, that Mao's revolution from the top was a pure self-destructive mistake, and thus to immediately and permanently crystallize bureaucracy and corruption. George: Is that a question?

Does the CCP’s military history constrain itself? As a style of bureaucracy, top-down, anti-democratic, etc.?

How old were you moving Lushun to Beijing? around 10. How did you get into Beijing #4? By exam.

What is your message to the future citizen of China or the world? DNANA

Is the Chinese character anti-democratic as it is submissive to the Big Man Archetype? DNANA

Do you admire Genghis Khan's non-favoring of family? DNANA

Will benefax supplant the capitalist/socialist dichotomy in public decisionmaking?

Can/How can neural networks combine knowledge (pattern, texture), perception (filling a space of temporary possibility with temporary content), and reasoning (considering alternative plans subject to logical constraints and dependencies)??

How should a chip company enter into the market and defeat NVidia?

Specify the best way to build a robotic perception/action loop.

How to learn their control-law association,

How to expand domain-of-use from simpler to more complex while preserving generalizeable domain knowledge

How to represent space/time trajectories for actuators and in temporal object perception (target mapping and prediction)

Given N object models comprising points in their own frame, identifying them in a scene means finding 9+dof for shift rotate scale+distort to place it there, similar to SLAM for each, along with a goodness of fit measure. True? How to think about it, how to best do it, since nearest points with Least squares estimates of the dofs makes no sense with small objects in big space. But octant splitting and refitting Nx8x might find it? You are full of confusion, apparently, tom.

I started a divacon interpreter. Please comment on preferred implementation: Should it use MPI? Threads? Any language? C with cross language linking? Threads? Shared memory? Index storage for relative indexing? MPI? It’s math so whatever.

I heard Zhiqing means a person sent down, or something. Is that related to your name? No. Zhijing, voiced.

Can the Kalman filter be improved? Track target qualities in a trajectory, adjusting by a fraction towards measured vs predicted, then adjust control toward desired.

In the village in Xaanxi the food was a grain in Chinese called Shudze. Instead of separating hulls they ground the whole grain roughly; it would cut the throat when eating, and came out often bloody.Also there was a fermented form of carrot potato and beet leaf, like sauerkraut but richer. Corn bread or blocks rat eaten, we would cut those corners off. Many had hard time to eat it but I was happy studying.

Once 2 girls came, asking, can we eat with you? Saying, When you eat it’s the same food but You look like it’s the best meal. Ok.

So we ate together. In the village on the lake closer to Beijing they only supplied dried sliced yam with rotten black spots. The idea being to make a bread. Which ended up brown. Actually it Didn’t taste bad but only if it’s the only food you get. The government controlled the basic food. But also there was sometimes a black market, where you might buy wheat. We observed a fisherman in a small boat, fishing with his cormorant, and throwing back the eel. Ahh disgusting the fisherman thought. Would it be ok if you gave that to me? What for? I’m going to cook it. Ok. It was thick and long like a forearm. On one condition. I see you eat it. Ok. Home, prep, he squatting, me eating it; he made faces, I finished, he was amazed. Until one day he said I’m going to charge you. Sure. Lake bai yang den

Local military and central military were fighting, real guns, my village got bombed a few times. My village was occupied by the group supported by the local military which was very strong in that village. One time they criticized a certain student who wasn’t a very nice person anyway, but I stood up, what you said is really not right. What?!! What’s your name. Everyone said you are a very bad student! Tell me who I am? He didn’t know. You don’t even know my name so how do you know, I’m a bad student. wait A few people with guns pointed at me. Later some approached me, saying an office called militia training office of something, the head of that office, said he wants to invite you to dinner, I got four very nice dishes, and one soup, it was a great treat,. He said we heard about your heroic resistance to the people in your village, you are a hero. We already nominated you as a member of the revolutionary committee. I was scared because they killed such members. But I could not say please don’t, either. That’s how I decided to leave the village and transferred to Xaanxi.

Luoke, mention turning bad into good, But here, no, it was turning good into bad, your bravery turned into a threat to your life. Standing up for equality into rigid authoritarianism.

In the village in Xaanxi, 12 of the kids from my high school are still there, now they are organizing a reunion. In October during the couplet debate. I was against it I stood up, I was known openly, Yu Luoke wasn’t open, but I stood on stage in front of thousands, I wrote big character posters, my class was criticizing me. In fact this current-day reunion organizer guy, he stood up then and said, Muzhjing! you are so bad, I am from a counter revolutionary family, I am Son of a Bitch, his face was so red, his neck had a blood vessel popping out, he was so “Sincere”, "You don’t accept that I am a Son of Bitch. You are a Son of Bitch." Isn't that funny?! I was so upset at that time. I should have said, ok I was wrong, you are indeed a son of a bitch, but I’m not, because I’m not from a “reactionary family”.

Did they criticize you for being from Dalian?

One cannot say a city is reactionary. My big poster was about the couplet. Son of reactionary. I said the family history does not determine your class. Many revolutionary leaders including Mao himself are from so-called bad families. Most of the high ranking CCP leaders are from such families. No I didn’t say Mao’s name.

T: Did Mao have actual enemies like the Beijing city mayor?

G: He prosecuted the chairman of the whole nation, Yu xiao he killed many, without exception. But there is a tradition in China. The founding person of a dynasty, will kill all the helpers. Han, Tang, Ming, they just do that. So Lin Biao also was helping Mao kill others, and died killed himself finally. It’s a tradition, there doesn’t have to be a reason.

T: I propose to write a DC interpreter (see q above) G: Tom, you are making it too complicated. Divacon is so simple, there exist only three things, d and c, com, and local, local is simple, X +,-,… 2D div into 4, do any divide, just divide. Do any local op, it’s fine. Choosing the programming language, using a multi threading model, such details don’t matter, divacon is mathematics, so just apply div and its inverse just make the axioms true, it’s divacon, there are millions of ways, find one, don’t think there is just one way.

Your object identification and placement with framing question for robotic perception is a trivial search engine problem. You need to abstract the problem, state it in an abstract way, don’t be distracted by particular details. It is so trivial.

You can make it more complex, how about find the distance between the features is Euclidean. You may say find the k nearest neighbors which are sort of similar but not exactly. Not kidding it is a trivial problem. The question reduces to finding a point in a space. You may say it has name location size weight whatever, now I have a group of those values and find me a point which has those, or say no there isn’t.

So go study about search engines first. More importantly, abstract what you were trying to describe into what I’m saying a search engine problem given a space consisting of points, given a point in terms of its features, decide in the space whether there is such a point or not.

Search for the features not the location,

You may ask, do you. Gave a 5’ boy ? Yes, then ask where, he is there,

Find a boy, 5’, ask more

One point can have a lot of features, color name, pixels etc, they can all be loaded with features. Find one point in the space that is close. When you say everything is everything then you don’t know what to do.

Is it there? is one question, where is it? is another, Each point has a location! But when you make the inquiry do you want to make the location a criterion? No,

You don’t say. I don’t mean that it doesn’t matter how, but before you know how you have to say what you are trying to do

T: Are there any tables here if so where? In a given space there are objects, do a search. Replace room by set, replace table by point. Any search is w,r,t, a known set of data.

The Divacon proof I'm working on is to verify an amazing result that DST can do search in constant time, O(1). This is hard to believe. I have a proof but it is complex, one lemma after another, so I do need to verify it. If true it is like a miracle.

For divacon, both involve programming, both are working; for DST I have the python version working and verified and for plotting the result; for the C++ version it has one verification left.

I will aim to do these two things in this next year.

Tom: I will at least put your papers in one place and that will help toward your book.

Your thoughts?
(will not be shared or abused)
Comment:
                                          Feedback is welcome.
Copyright © 2024 Thomas C. Veatch. All rights reserved.
Created: September 12, 2024