George Mou on the Bit

Choose: Trivial or False

On Mou on Divide and Conquer

Zhijing George Mou keeps returning to a central idea: the deceptively simple but unimaginably powerful concept of the bit. He has found worlds in it.

0 or 1, that's a bit.

But then provide yourself with a sequence of them, and you have a set of (unsigned, binary) numbers. 00 01 10 11 are the numbers 0 1 2 3 in binary. You knew that already.

But there are amazing ramifications of this simple idea.

Storage

Consider storage: one such binary number might be the count of entries that have been jammed into a dictionary, or at least into a sorted array. Track what happens to the bits when you add one to that count, and build corresponding data vectors and operations to hold values in arrays whose sizes are successive powers of 2. The result is the Black White Array.
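Here is a minimal sketch of that binary-counter mechanism in Python. To be clear, this is my illustration, not necessarily Mou's exact Black White Array: the bits of the element count say which power-of-2 sorted runs currently exist, and inserting one element is a binary +1 whose carries are realized as merges of equal-sized runs.

    import bisect

    class BinaryCounterStore:
        # Illustrative sketch only (not necessarily Mou's exact structure):
        # runs[k] is either None or a sorted list of length 2**k, so the
        # pattern of occupied slots spells out the element count in binary.
        def __init__(self):
            self.runs = []

        def insert(self, x):
            carry = [x]  # a run of size 1: adding 1 to the count
            k = 0
            while k < len(self.runs) and self.runs[k] is not None:
                carry = self._merge(self.runs[k], carry)  # carry propagates up
                self.runs[k] = None
                k += 1
            if k == len(self.runs):
                self.runs.append(None)
            self.runs[k] = carry

        @staticmethod
        def _merge(a, b):  # merge two sorted runs into one
            out, i, j = [], 0, 0
            while i < len(a) and j < len(b):
                if a[i] <= b[j]:
                    out.append(a[i]); i += 1
                else:
                    out.append(b[j]); j += 1
            return out + a[i:] + b[j:]

        def __contains__(self, x):  # binary-search each occupied run
            for run in self.runs:
                if run:
                    i = bisect.bisect_left(run, x)
                    if i < len(run) and run[i] == x:
                        return True
            return False

    s = BinaryCounterStore()
    for v in (5, 3, 8, 1):
        s.insert(v)
    print(3 in s, 4 in s)  # True False

Insertion does exactly the carry work of a binary increment, and search is a binary search in each of at most log n occupied runs.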

Parallel Communications

Or consider communications on a network of parallel processors (a binary hypercube), each with its address as a binary number like this, and each connected to its single-bit-adjacent processors. So 00 would be connected directly to 01 and to 10 (but in different dimensions, that is, by changing a different bit), and 11 similarly to 10 and 01, because addresses that differ in just a single bit are directly connected. Now all the processors with a 0 in some bit, for example, are each connected to, and can communicate directly with, the one processor having the same address but a 1 in that bit, in unit time. If there are \(m\) bits in the processor address space (and \(2^m\) processors), then any processor can send a message to all the others in \(m\) time steps, which is 10x faster than what Thinking Machines bet the company on in the early 1990s, and 5x faster than NCube.

The point is, a bit in an address space picks out half the addresses, and establishes a correspondence between halves by swapping that bit's value. If you do that just \(m\) times you can send from anywhere to anywhere or from anywhere to everywhere.

To be extra clear, swapping a bit could be done in different contexts, with interesting ramifications.
  • In the context of a single address, keeping all its other bits fixed, swapping a bit picks out a single corresponding address in the other half.
  • In the context of all the addresses together, it picks out, for each, one unique corresponding address lying in the opposite half from the first.
  • In the context of all the addresses in just one half, it specifies not just 'the other' for every one, but all the \(2^m/2\) bidirectional correspondences between corresponding addresses.
  • Finally, in the abstract context of dimensions in general, swapping one bit also amounts to identifying one dimension that is completely independent of all the others. Which is to say: choosing any dimension amounts to choosing one of many ways of splitting the address space in half, and all these splits are completely orthogonal and independent. Each half is itself half black and half white with respect to every other bit: within it, half the addresses have a one in any other given bit, and half have a zero.
These are just different ways of looking at the same thing. None of these bullet points needed to be said, really, but a concrete-minded person like myself might not see these ramifications, or be immediately comfortable with them, on hearing the mere word "bits".
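To make those correspondences concrete, here are a few lines of Python (the names are mine): flipping a chosen bit pairs every address with its unique partner in the other half, giving \(2^m/2\) disjoint pairs per bit.

    m = 3  # address bits, so 2**m = 8 processors
    addresses = range(1 << m)

    def partner(addr, bit):
        # the unique address differing from addr in exactly that one bit
        return addr ^ (1 << bit)

    for bit in range(m):
        pairs = {tuple(sorted((a, partner(a, bit)))) for a in addresses}
        print("bit", bit, "->", sorted(pairs))  # always 2**m / 2 = 4 pairs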

In particular, to send from everywhere to everywhere, some arbitrary ordering of the available bits is chosen, and every node is given the same instruction: communicate, bit by bit in that order, each with its specific individual corresponding node, which as we know is directly connected, and whose address (which you don't need to know) is uniquely identified by swapping that bit in the first node's address. Continuing with pairwise communications across the ordering, which is to say across the \(m\) dimensions, a given message can be shared first between 2 nodes, then 4, then 8, and ultimately all \(2^m\) nodes. Before the final clock cycle there might be extraneous, pointless communications between pairs that haven't gotten the message yet, but that doesn't slow down the wildfire of what matters.
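Here is that wildfire as a small Python simulation, under the simple assumption that each pairwise exchange across one dimension costs one time step: after step \(k\) the message has reached \(2^{k+1}\) nodes, and after all \(m\) steps it has reached every node.

    def broadcast(m, source=0):
        # one-to-all on a binary m-cube, one dimension per time step
        has_msg = {source}
        for bit in range(m):  # any fixed ordering of the bits works
            has_msg |= {a ^ (1 << bit) for a in has_msg}  # pairwise exchange
        return has_msg

    assert broadcast(4) == set(range(1 << 4))  # all 16 nodes in 4 steps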

On the other hand, the other nodes might be building up something that does matter to the final outcome, and communicating their partial results too. Then the model is not one of a message spreading across a network like a wildfire, but of a joint calculation being shared across a network: a bit more like baking a cake, where all the parts go from liquid to bubbling to baked, each in its own phase but at the same time. Except that cake batter doesn't communicate; you could say that the temperature or cooked state of each bit of batter is really the size of its partial result, and when the partial result becomes the final result at each location, the cake is baked.
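The cake-baking version can be sketched the same way: every node combines its partial result with its partner's across each dimension in turn (a butterfly-style all-reduce; I assume the combining operation is associative and commutative), so after \(m\) steps every node holds the result over all \(2^m\) inputs.

    import operator

    def all_reduce(values, combine):
        # all-to-all combining on a binary m-cube; len(values) must be 2**m.
        # Each node's partial result covers 1, then 2, 4, ..., 2**m inputs.
        n = len(values)
        m = n.bit_length() - 1
        state = list(values)
        for bit in range(m):
            state = [combine(state[a], state[a ^ (1 << bit)]) for a in range(n)]
        return state

    print(all_reduce([1, 2, 3, 4, 5, 6, 7, 8], operator.add))  # eight 36s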

Data Parallelism

Or consider data parallelism. A vector of \(2^m\) data points can be divided into left and right halves using the most significant bit of their indexes, or into even and odd halves using the least significant bit, or repeatedly into left-right (or even-odd) quarters, eighths, and so on until exhaustion, by going down or up the bits of the indexes. Then the pieces can be combined in the opposite order to get back to the beginning. If subvectors of data are communicated at each division or combination, then a parallel process can operate on the division or the combination with exactly and only the data it needs, in parallel. Just draw them out by hand and you'll see.
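Drawn out in Python rather than by hand: the most significant index bit gives the left-right split, the least significant bit gives the even-odd split, and concatenating or interleaving, respectively, undoes them.

    v = list(range(8))  # indexes 000 .. 111 in binary

    left, right = v[:4], v[4:]    # split on index bit 2 (most significant)
    even, odd = v[0::2], v[1::2]  # split on index bit 0 (least significant)
    print(left, right)            # [0, 1, 2, 3] [4, 5, 6, 7]
    print(even, odd)              # [0, 2, 4, 6] [1, 3, 5, 7]

    assert left + right == v                            # combine left-right
    assert [x for p in zip(even, odd) for x in p] == v  # interleave even-odd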

Process Parallelism

Consider process parallelism. If the data is divided all the way down to onesies (subvectors of length one), the pairings are the same whether you take (1) the even-odd pairs on the dividing operation, or (2) the left-right pairs on the combining operation. The same pairs, handled in the same order by the same adjust function, yield the same results.

Draw them out, like this: [figure not reproduced: the even-odd division tree beside the left-right combination tree, pairing up the same elements]

Thus an even-odd preadjust PDC (parallel divide and conquer) is equivalent to a left-right postadjust PDC. As an example, the so-called decimation-in-time FFT and decimation-in-frequency FFT algorithms are actually the same algorithm and should be treated as such, although this is not so in Numerical Recipes or elsewhere.
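To check that equivalence concretely, here are minimal recursive sketches of both FFTs in Python (my own formulation, not Mou's notation): one divides the indexes even-odd and adjusts while combining; the other adjusts while dividing and interleaves even-odd while combining. They compute identical results on every input.

    import cmath

    def fft_dit(x):  # decimation in time: even-odd divide, adjust on the way up
        n = len(x)
        if n == 1:
            return list(x)
        e, o = fft_dit(x[0::2]), fft_dit(x[1::2])
        w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
        return ([e[k] + w[k] * o[k] for k in range(n // 2)] +
                [e[k] - w[k] * o[k] for k in range(n // 2)])

    def fft_dif(x):  # decimation in frequency: adjust on the way down, interleave
        n = len(x)
        if n == 1:
            return list(x)
        h = n // 2
        w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(h)]
        a = [x[k] + x[k + h] for k in range(h)]           # yields even outputs
        b = [(x[k] - x[k + h]) * w[k] for k in range(h)]  # yields odd outputs
        out = [0] * n
        out[0::2], out[1::2] = fft_dif(a), fft_dif(b)
        return out

    x = [complex(i) for i in range(8)]
    assert all(abs(p - q) < 1e-9 for p, q in zip(fft_dit(x), fft_dif(x)))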

Space: GIS, Robotics, SLAM

Consider multi-dimensional domain (e.g., spatial) representation and search. The Dimensional Shuffle Transform is another ingenious ramification of the idea of the bit sequence. What are the bits in a 3D spatial location, that is, a triple of X, Y, Z indexes? What if you shuffle those bits, taking the most significant bit from each first, then the second most significant from each, and so on? Then you have a single 1D binary number with 3x as many bits as each original index, which contains the same information but now maintains locality: identifiers that share a prefix are both in some corner, some subset, an octant or sub-octant of the 3D space. Hence spatial search reduces to 1D search, which will multiply the efficiency of space-related algorithms such as those in GIS, SLAM, and robotics.
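A sketch of that shuffle in Python; as described it is essentially the Morton or Z-order encoding (whether it matches Mou's Dimensional Shuffle Transform in every detail is my reading of the text): take the top bit of X, Y, and Z, then the next, and so on, producing one key whose prefixes name octants and sub-octants.

    def shuffle_3d(x, y, z, bits):
        # interleave the bits of x, y, z, most significant first
        key = 0
        for i in reversed(range(bits)):
            for coord in (x, y, z):
                key = (key << 1) | ((coord >> i) & 1)
        return key

    # Points in the same octant share a key prefix:
    print(bin(shuffle_3d(1, 1, 1, 4)))  # 0b111: low corner of the low octant
    print(bin(shuffle_3d(8, 8, 8, 4)))  # 0b111000000000: the high octant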

Summary

To summarize: by developing a simple, indeed trivial and obvious, idea, and applying it to the primary problems in theoretical computer science, you can achieve, trivially, what mere humans find impossibly complex. If you are reading this, perhaps it will be less than centuries before Mou is recognized as peer to, and greatest son of, Turing, von Neumann, Tukey, and Perlis.

Feedback is welcome.
Copyright © 2024 Thomas C. Veatch. All rights reserved.
Created: August 15, 2024