|
Zhijing George Mou keeps returning to a central
idea, the deceptively simple and but unimaginably powerful concept
of the bit. He has found worlds in it. Sure, you already know: 0 or 1, that's a bit. Provide yourself with a sequence of them, you have a set of (unsigned, binary) numbers. 00 01 10 11 are the numbers 0 1 2 3 in binary. You knew that already, too. But there are amazing ramifications of this simple idea.
StorageConsider storage: One such binary number might be the count of entries that have been jammed into a dictionary or at least into a sorted array. Track what happens to the bits when you add one to the count (if the count is a binary number) and build corresponding data vectors and operations to hold values in arrays of sizes, each the next power of 2. The result is the Black White Array.
Parallel CommunicationsOr consider communications on a mesh of parallel processors, each with its address as a binary number of \(m\) bits, and each connected to single-bit-adjacent processors. When \(m\) is just 2, see that 00 would be connected to 01 and to 10 directly (but in different dimensions, that is, by changing a different bit), and 11 similarly to 10 and 01, because you just change a single bit and that should be directly connected. Now all the processors with a 0 in a bit, for example, are connected to and can communicate directly with a different processor with the same address but a 1 in that bit, in unit time. If there are \(m\) bits in the processor address space (and \(2^m\) processors), so linked, then any processor can send a message to all the others in \(m\) time steps, which is 10x faster than what Thinking Machines bet the company on in the early 1990s, and 5x faster than NCube.The point is, a bit in an address space picks out half the addresses, and establishes a correspondence between halves by swapping that bit's value. If you do that just \(m\) times you can send from anywhere to anywhere or from anywhere to everywhere.
To be extra clear, swapping a bit could be done in different contexts, with interesting ramifications. Data ParallelismOr consider data parallelism. A vector of \(2^m\) data points can be divided into left and right halves using the most-significant bit of their indexes, or into even and odd halves using the least significant bit of their indexes, or repeatedly into left and right (or even and odd) quarters and eighths etc until exhaustion by going down or up the bits of the indexes. Then they can be combined together using the opposite order, to get back to the beginning. If subvectors of data are communicated at each division or combination, then a parallel process can operate on the division or the combination with exactly and only the data needed in parallel. Just draw them out by hand and you'll see.
Process ParallelismConsider process parallelism. If the data is onesies in a long vector, then adjacent pairs would be even-odd pairs, and in the current approach those are handled together (1) upon division if your algorithm is using an even-odd dividing operation, or (2) upon combination if your algorithm is using a left-right combining operation. The same pairs handled in the same order by the same adjust function, would yield the same results.Thus an even-odd preadjust PDC is equivalent to a left-right postadjust PDC, and as an example the so-called decimation in time FFT and decimation in frequency FFT algorithms are actually the same algorithm in terms of what data values are combined using what calculations with what other particular data values, and so they should be treated as effectively the same although this is not pointed out in Numerical Recipes or elsewhere. If you can wrap your mind around that you will be ahead of the entire field of Computer Science today, outside George's place in Seattle.
Space: GIS, Robotics, SLAMConsider multi-dimensional domain (e.g. spatial) representation and search. For example, you've got a point, and maybe it's the cat's nose, and 30,000 other points in a pile of measurements from a digital scan, so you need to find all the other points on the cat's nose to make a digital model of it for your sculpture or cartoon or computer vision system or whatever. Then you need to know the other points out of all those 30,000 that are closest to the first, or within some appropriate cube around it, or maybe in a radius around it. So you need spatial search, and by the way spatial search is about 80% of all the computation in all the spatial algorithms, according to George, so if you suck at this your whole system is going to suck. And that's why robots actually suck so bad nowadays, in case you were wondering. The big deal here is that we are going to not just try we are actually not going to suck.The Dimensional Shuffle Transform is another ingenious ramification of the idea of the bit sequence. What are the bits in a 3D spatial location, that is, a triple of \(X, Y, Z\) indexes? What if you shuffle those bits, taking the most significant from each first, then the second most significant from each after that, etc. Then you have a single 1D binary number of 3x as many bits as the original indexes, and which contains the same information but which now maintains locality: identifiers which share a prefix are both in some corner, some subset, an octant or sub*octant of the 3D space. Hence spatial search reduces to 1D search, which will multiply the efficiency of all space-related algorithms such as SLAM, robotics, etc. It means you convert space to text, regions to bit strings, and now instead of stupid near-infinitely tree-climbing space-searching, you can do regular binary search on string-sorted strings, which is log N in the length of the store, and find anything in a blink, no tree-climbing. By tree-climbing I mean, testing on this or that side of a dividing plane, climbing down, climbing up, testing, following the pointer to the place that has the data on that side of the tree, left and right, left and right, no. It's just binary search on the sorted text (bit-strings), and binary search can do 14 levels of subdivision in a single high-speed cache hit with 16k values, so we are SMOKING those other algorithms. Got it?
SummaryTo summarize: by developing a simple, indeed trivial and obvious, idea, applied to the different primary problems in theoretical computer science, you can achieve, trivially, what mere humans find impossibly complex. If you are reading this, perhaps it will be less than centuries before Mou is recognized as peer to, as the greatest son of, Turing, Von Neumann, Tukey, and Perlis. But I consider him greater, because he has made out of complexity not more complexity as did those Great Four, but simplicity itself:Hah! IF you can understand!
And that's what these pages are for, so you can understand them. The not-so-mathematical intuitive explanations for Mou Math, which doesn't speak for Mou's Math but might help you see why and to orient around so you can get the point and the essence and then understand it and then use it and then conquer the world, and maybe save it, too. |