Communication Notes

Lockstep Communication

Given a regularly connected lattice or other rectangular arrangement of processors, lockstep communication is a fundamental and optimal communication pattern. If each sends to a processor located any number \(n\) positions away in any dimension \(d\) then all processors can send their own message to a recipient \(n\) positions away from them in that dimension \(d\), all sending simultaneously, without queuing, and without any message blocking any other message: in lockstep each aims in the same direction sending the same distance.

Lockstep communication enables every sender to send at the same time. Lockstep communication achieves the theoretical maximum communication efficiency, completely freed from the "Von Neumann Bottleneck".

A communication system which allows messages to be sent from any to any using a central message-forwarder and polling by the recipient (as is apparently the case in nVidia GPUs) could not possibly be designed to be slower, as it allows any message to interfere with any other and any link between processors to be unuseable until any other link is cleared, and prevents delivery of any message until arbitrary communication-external conditions free the recipient to poll for a message again.

Communication Notes

Following are notes from Mou, 1990 on "Communication" in the context of computer algorithms.

Example 4.1, p78
\(\ \ \ \ c_{lr}\ :\ (id,\ !\ +)\ :\ \#(nil,\ last\ 0)\ ([1\ 3\ 6\ 10],\ [5,11,18,26])\)
\(= c_{lr}\ :\ (id,\ !\ +)\ ([1,3,6,10],\ [(5,10)\ (11,10)\ (18,10)\ (26,10)])\)
(If I had to read this out in words I'd say that apparently:
- #() is a "communication function" meaning that it primarily has to identify indexes of the things that need to be brought in for the calculations, but then in reality it needs to know what things to look for what it indexes into and what to do other than extract indexed elements, namely construct new structures and tuples and vectors etc. for downstream use ;
- "nil" seems to mean just copy the first item from the following tuple (in this case a tuple of two 4-vectors) and use that as the first item of the output of #(nil,...), which is hardly a no-op (needed 28 words to describe it!) but I guess you could see it that way.
- "# ! last 0" identifies for each element in each subvector that it wants the last element in the other subvector. means send the last element in the left vector to every element in the right vector, AND the last element in the right vector to every elementin the left vector. So "# (nil,last 0)" means don't bring anything to the elements in the left subvector (by nil being first) whereas do bring the last element of the left subvector to each element of the right subvector. (see p58, Figure 3.2) The meaning of last 0 is semantically composeable from the meanings of last and of 0; "last" makes you think of counting from the last index of the other vector, and to count down 0 more indexes. Then applying last 0 communication to a subvector copies the last element of the other vector into structures for each element in this vector.
In either case #(nil, last 0) takes 10 from the first (0'th) subvector and appends a copy of it to each element in the 2nd subvector. Very convenient to separate communication from calculation, the communication puts the right stuff in your workspace, and the calculation adds things up.
\(h_{scan} + = (id, ! +) : \#(nil, last 0)\)
"... when applied to two vectors will leave the left vector untouched but will add the last entry on the left to each entry on the right. For example, \((h_{scan} +) ([1 3], [3 7])\)
= (id, +) : \#(nil, last 0) ([1 3], [3 7])\)
= (id, +) ([1 3], [(3, 3) (7, 3)])\)
= (id [1 3], ! + [(3, 3) (7, 3)]\)
= ([1 3], [6 10])\)
Section 4.2.4 p 82
\(g_{broad} = (id, !other) : (nil, \#corr) \)
This pre-adjust "function can be used to compute broad with any balanced division on vectors including \(d_{lr}\) or \(d_{eo}\)".
Algorithm 4.7
\(... (id,!loc) : \#(nil, (last 0)), ...\)
"The (last 0) communication used above implies broadcast. It can however be replaced by correspondent communication ... by introducing a new variable and making the broadcasted value available at all entries (as in 4.8)
Algorithm 4.8
\(... loc : \# ! corr ... \)
Algorithm 4.12 2nd order parallel-DC sort, p 89ff:
\(sort = PDC(d_{lr},c_{lr},id, loc:\#!mirr, atom?, id) loc = !nest : (!min : !max) nest = PDC(d_{lr},c_{lr},(!min : !max) : \# ! corr, id, atom?, id) \)
5.2.2 p 94:
! + : #(last 0)
"When applied to two vectors will fetch the last entry of the left subvector and add (+) it to the first entry of the right subvector... thus will only effect the value of the first entry of the right subvector.
5.3.1 p97:
\(\mu (Q,R)\ =\ loc\ :\ \#(rowbr\ k)\ :\ \#(colbr\ 0)(Q,\ R)\)
(colbr for column-wise broadcast). The explanation:
"When .. #(colbr 0) is applied to (Q,R), a matrix, say Q' is returned where

Q'(i,j) = ( Q(i,j), R(0,j) )
thus broadcasting the values of row zero of R into a tuple in each cell of the Q' matrix so that later operations can use it.