In spatial applications, quality means precision; precision is not only sensor-limited but compute-limited, and the limits of computation are defined by algorithmic complexity. Those who use substantially worse algorithms, hampered by large and unnecessary inefficiencies, are likely to be left behind, even to be destroyed in warfare by those who use better ones.
Here I present an introductory description of George Mou's very substantially improved approach to space-capable systems, called the Dimensional Shuffle Transform, or DST. When to use it, why to use it, what it is, and how much better it is than the alternatives, you will discover here. All these ideas are his; merely the expression is mine, which I hope will serve you.
Typically your system will take observations or make measurements yielding data about what is where. In general such systems can carry out several levels of analysis and computation: removal of outliers or denoising, removal of the ground (typically in ground vehicle systems), clustering, classification, etc.
Each of these levels (modules, algorithms) has to take regions or points and search for other points in or near them, or determine that the region is empty. K-nearest neighbor, for example, is a search across the entire observed universe to select the K other points that are closest. (Out of 1M points, it could be a long search, especially if the dimensions are encoded separately.)
Now each of these levels of analysis depends essentially on spatial search, so that points can be seen as far from others (outliers), within a surface along with others (such as the ground), nearby or far from others (clustering), forming a distinctive pattern with others (classification), etc.
(By the way, if your particular software approach fails to modularize, to separate out the spatial search problem, you can be assured it underperforms; search is faster if you can search for several things at once, since if you find one while searching for another, you don't have to repeat all that searching for each.)
As I said at the outset: precision is not just sensor-limited but compute-limited, and the limits of computation are defined by algorithmic complexity. Hence if you are to succeed via improved quality of spatial systems, you must make DST your own.
We are given an archive of some number of points of spatial data, each a record comprising coordinates (the where) and optionally some value (the what). For different applications the coordinates can be of any dimensionality M>1, such as 2D (e.g., a map, or a mapped set of points of interest), 3D (e.g., a defense-area model of airborne attackers such as a drone cloud), or 4D (time × space): the method is general. Once M coordinates are specified numerically, we have a location.
If each coordinate is specified using a B-bit unsigned integer, then our space is discretized into \(2^B\) steps in each coordinate, or \(2^{MB}\) cells over all M coordinates. Given a real-world frame (an origin and M distinct directions), plus a shift and a scale, we can map M coordinates of B bits each onto a useful, and usefully precise, real-world coordinate system.
If a bounded, discrete 3D space, for example, is considered as a \(2^B \times 2^B \times 2^B\) array in which the 3 coordinates are B-bit integer indexes into the dimensions of the array, then a location can also be considered a "spatial index". "Index" is a potentially confusing word here, though: given an archive of 4 points, the index of the 4th point is 3 (since we like to count from 0), but the location, a.k.a. spatial index, of the 4th point is the one specified by that point itself, using all M*B of its bits to pick out a location within an M-dimensional space encoded with B bits per dimension.
Spatial search, then, takes a given search specification and returns all and only the points in the archive which match the specification. Common spatial searches are (1) "cubic" search based on M ranges, one range for each dimension, which can be encoded in two M-dimensional, opposite-corner locations, (2) "radius" search based on a center and a radius length, and (3) "KNN" or K Nearest Neighbor search, based on a center and a count K.
The so-called cubic search is actually a search within a rectangular solid, since the extents in each dimension need not be identical, contrary to what we learned in school as the definition of a "cube"; still, it is conventional in this context to call this "cubic" search. A pair of points, the "corners", defines the "cube": the region between the low and high ends taken from each of the coordinates. The first coordinates of the two corners define an extent in the first dimension, one typically lower than the other; the cube lies within that range, and likewise in each dimension.
The radius search searches an M-dimensional sphere (in 2D, a circle) centered on a point and defined to have the specified radius.
The KNN search yields K points (assuming the archive contains at least that many), while radius and cubic can return any number of points.
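To pin down exactly what these three searches return, here is a minimal brute-force sketch in Python (the function names are mine, not from George's libraries); DST's contribution, as we will see, is getting the same answers without scanning the whole archive:

```python
import math

# Reference semantics for the three searches, as plain O(N) scans over an
# archive of M-tuples. These define *what* is returned, not how to be fast.

def cubic_search(archive, lo_corner, hi_corner):
    """All points inside the axis-aligned box between two opposite corners."""
    return [p for p in archive
            if all(lo <= x <= hi for x, lo, hi in zip(p, lo_corner, hi_corner))]

def radius_search(archive, center, radius):
    """All points within Euclidean distance `radius` of `center`."""
    return [p for p in archive if math.dist(p, center) <= radius]

def knn_search(archive, center, k):
    """The k points nearest to `center` (fewer if the archive is smaller)."""
    return sorted(archive, key=lambda p: math.dist(p, center))[:k]
```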
Incidentally, search can also be more or less sloppy, returning points outside the specified region (reducing precision) and failing to return points inside the region (reducing recall).
"Precision" = specified & retrieved / retrieved.
"Recall" = specified & retrieved / specified.
Search is 100% precise if everything retrieved was as specified. The search retrieved no garbage: no false positives.
Search has 100% recall if everything specified was indeed retrieved. The search missed nothing: zero false negatives.
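As a toy illustration of the two formulas (the point sets here are invented):

```python
# A sloppy search over a 2D archive: one false positive, one false negative.
specified = {(1, 2), (3, 4), (5, 6)}   # points actually inside the region
retrieved = {(1, 2), (3, 4), (9, 9)}   # points the search returned

hits = specified & retrieved
precision = len(hits) / len(retrieved)  # 2/3: one retrieved point is garbage
recall    = len(hits) / len(specified)  # 2/3: one specified point was missed
```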
What is the cost of inefficiency in spatial search? Consider two estimates of how much of a spatial system's computation is spent on spatial search:
A1) About 80% (4/5) of computation in spatial systems is spatial search, according to George Mou.
A2) About 85% (6/7), according to George's benchmarks of a typical application (LIDAR out to 60 meters, ~30k points).
Considering the total cost of ownership of drone technology, a significant fraction of system cost is the computing, hardware and software, and a cost reduction of roughly 2/3 would apply there (if ~6/7 of compute is search and search gets ~7x faster, total compute falls by roughly 2/3 or more). Perhaps more significantly, given similar hardware, increased precision of operations may be enabled by an algorithm that can handle roughly 3x more data, which could potentially give roughly 3x the accuracy, or equal accuracy at 3x the range.
Furthermore, as the space searched (and the size of the archive) increases, the proportional advantage of DST over K-D Trees and FLANN increases without upper limit.
This is an amazing result, and the entire field of computer science ought immediately to shift to Z. George Mou's (patented) DST methodology in any resource-constrained or accuracy-optimizing application (that is, all applications).
For example, given a 7x faster algorithm, one can use the same compute-limited hardware to calculate spatial bounds, trajectories, and targets to 7x greater precision, in a sense, by simply doing 7x more work, thereby increasing targeting accuracy from, say, the precision of a vehicle to the precision of a wheel. Alternatively, the system designer might use dramatically cheaper hardware (e.g., a 7x slower CPU, which might be more than 7x cheaper to buy and run) to achieve results equivalent to previous, K-D Tree based performance without DST. With computing a significant line in the Bill of Materials for any autonomous navigating system such as a drone copter, boat, or "wolf", the designer with DST now has a powerful lever to optimize targeting accuracy, or system cost.
Is it an exaggeration that more-ambitious missions might be achieved by optimizing this design dimension? It is certainly no exaggeration that the number of systems deployed for any finite budget can be increased by use of this improved technology.
Let us next consider how DST works. If you are a computer scientist, you will be thrilled to discover what I'm about to say.
Let each coordinate be a number, specifically an unsigned (non-negative) integer comprising B bits, ranging over 0..\(2^B-1\), as with the familiar UINT types.
Some might object: Space is continuous! Spatial coordinates are REAL numbers, how can you use unsigned integers to represent space?
Consider: the distance from San Diego to Maine is 5,129 km or ~5 billion mm (5,129,000,000 mm), while an unsigned 32-bit integer counts from 0 to \(2^{32}-1 = 4{,}294{,}967{,}295\), ~4.3 billion. Dividing the span of the US by the number of steps encoded in a 32-bit UINT implies that each step is about 1.2 mm. Hence it is quite practical to model relevant spatial universes with unsigned integers for the coordinates.
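The arithmetic, as a two-line check:

```python
# Back-of-envelope: US span in millimeters divided by 32-bit integer steps.
span_mm = 5_129_000_000        # San Diego to Maine, ~5,129 km in millimeters
steps = 2**32                  # 4,294,967,296 values in a 32-bit UINT
print(span_mm / steps)         # ~1.19 mm of ground per integer step
```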
Now, the key idea here is the "key". We will distinguish a set of M coordinates from a single key. A key carries the same information, the same total number of bits with the same values for each bit, but organized differently, for certain miraculous purposes which we will get to below.
Now, a set of coordinates can be transformed into a key (and vice versa) as follows.
Start with a key K = 0. (Instead of a single zero you could equally say M*B zeroes, and indeed we will preserve leading zeroes in this representation.) Now we are going to copy M*B bits into K.
Let's consider the least significant bit of all M coordinates: how many are there? M, of course: 1 each, times M dimensions. In order, one after another, insert those M bits into the lowest M bits of the key. This is called a shuffle because, just as each successive card of a deck rotates among all the players before the deal comes back to the first player, here the source of each successive bit of the key rotates around all the dimensions before the same dimension contributes another bit.
Naively adjoining a set of coordinates, as we normally do with Cartesian coordinates like (x,y,z), is like stacking all the cards from one player (or dimension, say x), then all from the next (y), etc. Dimensionally shuffling the bits of the coordinates instead takes the lowest bit from every coordinate dimension before taking the second-lowest bit of any coordinate. Thus we shuffle the dimensions together: one bit from the low end of each of the M dimensions, then one from the second-lowest position of each of the M dimensions, etc.
No information is modified, but the bits are in a different order now: the most significant bits of all the dimensions are together at the high end of the key, while the least significant bits of all the dimensions are together at the low end. Or we can say that the key contains the bits shuffled into it from the coordinates.
The same process could be done not just low to high but high to low, or all at once, since each bit in each coordinate has a specific landing place in the key, and vice versa. I think it's easier to think about it from most significant to least. Why?
The most significant M bits of the key select the largest M-dimensional sub-cube in the coordinate space, which spans half of each dimension.
They pick out \(1/2^M\) of the whole space; in a 3D space, for example, the first 3 bits of the key pick out 1/8 of the entire space. The second most significant M bits pick out a subregion of the first cube, again a fraction \(1/2^M\) of it: another cube, half-sized in each of the M dimensions compared to the previous. Each M-bit nibble taken from the key divides the subcube at that level into a sub-sub-cube whose volume is \(1/2^M\) of its parent's, at every step.
Consider an example with M=2 coordinates, each made of B=3 bits. Suppose in that space we are given a point at location (010,110), and count bits from high to low, b=2..0. The most significant bits (bit b=2) are 0 and 1 respectively, so we take those and pack them into the top 2 bits of the key, which becomes 01. Continuing with b=1, the middle bits of the two coordinates are 1 and 1, so we pack those into the next 2 bits of the key, which is now 0111. Finally, with b=0, the least significant bits of the two coordinates are 0 and 0, so we pack those at the end of the key, which emerges as 011100.
To summarize, the bits of each dimension or coordinate are shuffled together, in order, to make a key. The key has M*B bits in it, here 2*3=6; a 4D space with 32-bit precision in each coordinate would yield 128-bit keys. Got it?
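As a minimal sketch in Python (my own illustration, not George's library code), the shuffle is a few lines; the assert replays the worked example above:

```python
def dst_key(coords, B):
    """Dimensionally shuffle M B-bit coordinates into a single M*B-bit key.

    Bits are taken from most to least significant, rotating through the
    dimensions, so the key's top M bits are the coordinates' top bits.
    """
    key = 0
    for b in range(B - 1, -1, -1):      # bit level, high to low
        for c in coords:                # one bit from each dimension in turn
            key = (key << 1) | ((c >> b) & 1)
    return key

# The worked example above: M=2, B=3, point (010, 110) -> key 011100.
assert dst_key((0b010, 0b110), B=3) == 0b011100
```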
If the original coordinates range over \(0..2^{16}-1\), then a DST key derived from 3D coordinates comprises 48 bits in 16 nibbles of 3 bits each. (Note that in this context a "nibble" is M bits, one from each dimension, rather than the conventional 4 bits.)
This key is now an object of great interest. Its properties and relationships with others are well-defined and the thing is a very convenient entity to make use of.
This is the Dimensional Shuffle Transform.
The DST provides a morphism for spatial computation.
You transform coordinates into a key, do your spatial search operations in the key domain instead of the coordinate domain, and the task will be simplified. Vastly simplified. Indeed there is actually no limit to the relative advantage of the simplification achieved, meaning that even bigger problems are proportionately even easier in the transformed domain. The 7x advantage of DST over K-D Trees in one application can become Nx for any N in larger spatial applications. If instead of a few thousand points in a 60-meter vehicular planning space we try a geographical search of 20 million points across an entire large country, then the advantage over K-D Trees grows far beyond 7x.
How does this transform help us?
Suppose you are searching amongst an archive of N points in an M-dimensional space. Using coordinates, holding constant all but one, you could do a linear search (\(O(N)\) in the archive size N) or at least a binary-subdivision search (\(O(\log N)\)) separately within each coordinate in order to see if there is a point in the archive within the search-specified region. Even with binary-subdivision search, \(O((\log N)^M)\) is a lot: if \(N = 2^{32}\) and \(M = 3\), then \((\log_2 N)^M = 32^3 = 32768\)... for each point in the archive! What a drag!
On the other hand suppose you are searching an archive of keys to pull out those which are in a cube, or within a radius of a point, or one of the K nearest neighbors of a point.
Then we can make use of some Very Convenient Facts.
Just as each point's discrete coordinates represent not an infinitesimal, but a finite region of space, each key uniquely identifies a single tiny cube within the M-dimensional space, which is sized according to the precision of the coordinates and the translation/scaling mapping of the coordinate space to the world space.
Notice that the first (i.e. most significant) M bits of the key tell us that the tiny region picked out by the full key is located in, contained in, just one of the \(2^M\) top-level subdivisions of the space, say, sub-cubes. (In 3D, these are octants, 8 sub-cubes making up a larger cube). If the first 3 bits are 111, then the point whether considered as 3 coordinates or a single key, is in the far corner, diagonally opposite the 000 octant.
The first bits of the key divide the entire space successively into nice fat cubical subregions, and we can make a lot of useful inferences from the first bits of a key, a key prefix, which identifies a unique subregion of a very particular scale for any particular initial string of bits of a given key. A key prefix of some number of bits can be considered as subdividing the space in half that many times, each time choosing one of the two halves to continue working with. A key prefix thus identifies a specific subregion of a specific scale (dependent on the prefix length, which grows as the subregion is halved again and again) and a specific shape. The shape is perfectly cubic, in the sense of equal-length sides, after each full multiple of M bits is used up; the next bit divides the cube in half, yielding a split cube or flat, not a cube; the bit after that divides the flat into sticks, also not cubes; and the next bit (in this M=3, 3D example) divides the stick into two equal-sided cubes again.
A single key has a lot of bits (M*B of them) and can be transformed back to M coordinates by unshuffling them: the first M bits of the key are pulled out, separated, and returned to become the first (most significant) bit of each of the M coordinates. Rinse, repeat.
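And the unshuffle, the inverse of dst_key above, in the same sketch style:

```python
def dst_coords(key, M, B):
    """Unshuffle an M*B-bit key back into its M B-bit coordinates."""
    coords = [0] * M
    for b in range(B - 1, -1, -1):          # bit level, high to low
        for i in range(M):                  # dimension order matches dst_key
            bit = (key >> (b * M + (M - 1 - i))) & 1
            coords[i] = (coords[i] << 1) | bit
    return tuple(coords)

# Round trip of the worked example: key 011100 -> point (010, 110).
assert dst_coords(0b011100, M=2, B=3) == (0b010, 0b110)
```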
Now suppose you want to pull out the points in the archive in a subregion specified by a key prefix. Do you have to run up and down all M coordinates to find them? No! Assume the archive was previously sorted in key order (consider the many bits of a key together as an M*B-bit unsigned integer, and sort the archive's keys in numerical order), which we can do as part of archive creation: take each coordinate M-tuple, shuffle-transform it into a key, insert the key into the archive at its place in numerical order among the keys already present, repeat. (George's Black White Array paper, for example, carries out this sort operation simultaneously with the insertion operation; see Bibliography.)
An archive of points stored as coordinates sorted in some coordinate order would not help you. Comparing against the coordinates of a given reference point, you could binary-search the archive in just one coordinate (\(O(\log N)\)), but that only narrows you to a nearby plane within the 3D space, a plane which still spans the whole space. You then have to start over, searching the entirety of the next dimension to narrow down to a line, and again with the entirety of the third dimension to narrow down to a point-sized subregion. After all that you can check whether a search hit is there; maybe it is, maybe not, and if nothing is there you must redo the first coordinate for the next value of the second coordinate. No wonder K-D Trees are so painful.
But in key space, the points in the same subregion actually share a key prefix. Searching the archive with a key prefix immediately (O(B)) pulls out the start and end indexes in the archive which share that key prefix, and in constant or unit time you can extract the entire set.
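A sketch of that retrieval over an archive of sorted keys, using an ordinary binary search and reusing dst_key from the sketch above (prefix_range and its arguments are my naming, not George's API):

```python
import bisect

def prefix_range(sorted_keys, prefix, prefix_bits, total_bits):
    """Indices [lo, hi) of all keys whose top prefix_bits equal `prefix`."""
    shift = total_bits - prefix_bits
    first = prefix << shift            # smallest key carrying this prefix
    past  = (prefix + 1) << shift      # first key beyond the prefix block
    lo = bisect.bisect_left(sorted_keys, first)
    hi = bisect.bisect_left(sorted_keys, past)
    return lo, hi

# Example: all keys starting with bits '01' in a 6-bit (M=2, B=3) key space.
keys = sorted(dst_key((x, y), B=3) for x in range(8) for y in range(8))
lo, hi = prefix_range(keys, prefix=0b01, prefix_bits=2, total_bits=6)
# keys[lo:hi] is the entire contents of that quarter of the 2D space.
```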
Okay, there is a little bit of a trick here: your search region might not exactly correspond to a single key prefix but to more than one, indeed to several, depending on how exactly you want to approximate the search region by more and more subregions.
Let's define a key-prefix-subregion as a subregion specified by a key prefix.
A key prefix made up of full nibbles, one bit from each of the M dimensions per nibble, specifies a (hyper)cube. Any arbitrary region of the space is then a union of key-prefix-subregions. In the following Figure, 6, 7, or 9 subregions (depending on alignment), each corresponding to a unique key prefix, together compose a 5x5 block of 2D space. But fractional nibbles are perfectly reasonable objects too. By specifying one more of the bits in the nibble, you simply split the (hyper)cube in half, pancake-wise, and take one half according to the specified bit. Specifying the value of another bit splits the selected region into strips; finally, after all M bits of the nibble are specified, one per dimension, we are down to the next-smaller hypercube. The point of this aside is that you don't need to work in full-nibble steps only: each bit has a definite geometry to it, and perhaps you can stack up some flats or sticks or other such shapes to make the overall shape your search demands, and save a bit on the calculations.
So if we want to think only about full hypercubes, a key prefix must be an integral number of nibbles; however, remembering that a prefix extended by fewer than M bits yields a flat, a stick, etc., we can also provide for subregions that are such half- and quarter-hypercubes, etc., in whatever orientation, by allowing fractional nibbles.
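For concreteness, here is one way to enumerate the covering key prefixes of a cubic search region, sketched as a recursive quadrant descent in 2D with full nibbles only (a standard decomposition strategy I am assuming for illustration, not necessarily George's):

```python
def cover_box(qx0, qx1, qy0, qy1, B, cx=0, cy=0, level=0, prefix=0, out=None):
    """Cover the inclusive box [qx0..qx1] x [qy0..qy1] on a 2^B grid with
    key-prefix-subregions (full 2-bit nibbles). Returns (prefix, n_bits) pairs."""
    if out is None:
        out = []
    size = 1 << (B - level)                   # side length of the current cell
    if qx0 > cx + size - 1 or qx1 < cx or qy0 > cy + size - 1 or qy1 < cy:
        return out                            # disjoint from the query: skip
    if qx0 <= cx and cx + size - 1 <= qx1 and qy0 <= cy and cy + size - 1 <= qy1:
        out.append((prefix, 2 * level))       # wholly inside: emit this prefix
        return out
    half = size // 2                          # else recurse into the quadrants
    for qbits, (dx, dy) in enumerate([(0, 0), (0, half), (half, 0), (half, half)]):
        cover_box(qx0, qx1, qy0, qy1, B,
                  cx + dx, cy + dy, level + 1, (prefix << 2) | qbits, out)
    return out

# Example: a 5x5 box on an 8x8 grid, as a union of key-prefix-subregions
# (the count depends on how the box aligns with the nibble grid).
regions = cover_box(2, 6, 1, 5, B=3)
```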
Are you as impressed as I am at what a miracle this is? Each blue box in the Figure is uniquely identified by a single number: a key prefix, or a key. All imaginable blue boxes of all sizes and all split-hypercubic shapes in all locations in a discrete M-dimensional bounded space: every one of them is numbered within this set of numbering systems, all mutually compatible, interconvertible, with mutual approximation relationships among sets of keys that are trivially computed and guaranteed to work, using the extremely simple bitstring operation of clipping off a number of least-significant bits. Each one's size and shape is determined exactly by the bit pattern of the key or key prefix, including how many bits it specifies and their values. In all this, a mapping from one domain, multi-dimensional space, to another domain, shuffled bit strings, is maintained with absolute simplicity and ease of comprehension, reasoning, and proof.

Furthermore, the idea of operating on organized, arbitrarily sized chunks of space, instead of only on infinitesimal points specified in multiple completely independent dimensions (the Cartesian/Newtonian arrangement), is shocking: a complete and radical overturning of commonsensical geometrical intuitions which turn out to have been hobbling computer science, theoretically and practically, for many years.

For cubic search, the number of subregions is few, as in the above examples, where a small number of key-prefix-subregions covers a 5x5 square in any position in the space. With constant or \(O(\log N)\) time cost for retrieving all the data in a key-prefix-subregion, we can afford a lot of subregion retrievals.

I find this to be truly breathtaking. It is further evidence of George Mou's incomparable genius.
More, it is something that any responsible spatial system designer MUST become familiar with and make use of if they are to do their job, because it blows away every alternative. And as I conclude in the Hunter Seeker discussion, early adoption may lead to victory in war, and potentially the protection of civilizations.
Sorry for the hyperventilating; I'm as surprised as you! It's just what things look like from here.
Continuing with the technical discussion...
By a set of key prefixes, each specifying one cube or split-cube shaped chunk of space, a joint list of them fully specifies the whole shape, all of it, to any level of precision you like, no matter what the thing is -- this is just the familiar miracle of digital representation.
We can also approximate a shape to some grosser precision by using only larger subregions. An approximating larger subregion can always be specified by a clipped key prefix, that is, by retaining only the most significant nibbles or fractional nibbles of any key-prefix-subregion. Clipping the less significant bits from a key while keeping its more significant bits specifies a larger region than the full key does; indeed this larger region comprises any and all subregions which share the same most significant bits, the same key prefix.
Not only can any arbitrary shape be approximated to any desired degree of precision as a union or enumeration of key-prefix-subregions, but the degree of precision can be reduced by replacing, in that list of key prefixes, all those sharing a clipped key prefix with the single clipped key prefix.
Therefore to approximate a detailed shape specified by potentially many keys it is sufficient to clip the subregion key prefixes to a common prefix.
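In code, coarsening is literally bit-clipping; a sketch over the (prefix, bit-count) pairs produced by cover_box above:

```python
def clip_prefixes(prefixes, keep_bits):
    """Coarsen a covering by clipping each (prefix, n_bits) to at most keep_bits
    leading bits and merging duplicates. Detail is lost; coverage only grows."""
    clipped = {(p >> max(n - keep_bits, 0), min(n, keep_bits))
               for p, n in prefixes}
    return sorted(clipped)

# E.g. clip the 5x5 covering above to 2-bit prefixes: at most four quadrants.
coarse = clip_prefixes(regions, keep_bits=2)
```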
Now, for example, a sphere in space could be approximated by a cube, using a much smaller number of key prefixes, and that may make many calculations much easier to handle. An approximation inaccurate in fingerprint detail but excellent in geographic specificity would be very useful in remote missile guidance, at distances from which the details don't matter. Do you see the value here? You can solve every problem at the right scale by using exactly as little information as the problem actually requires.
For example, in the case of radius search: for each point retrieved from the archive as located within a covering cube, one may quickly check the radial distance from the search center to the extracted key's location coordinates (transformed back from key to coordinates, if not already stored in both forms), and thereby include or exclude it from the retrieval set. The huge win in computational complexity was already achieved by never having to look at entire halves and sub-halves of the whole space, which we accomplished simply by specifying some particular key prefixes as the general area of our target region.
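Putting the sketches together, radius search becomes cover, slice, and filter (again my illustration, in 2D, reusing cover_box, prefix_range, and dst_coords from above):

```python
import math

def radius_search_dst(sorted_keys, center, radius, B):
    """Radius search sketch: cover the circle's bounding box with key-prefix
    subregions, slice each region out of the sorted archive of keys, then
    keep only the points actually within the radius."""
    cx, cy = center
    box = cover_box(int(cx - radius), int(cx + radius),
                    int(cy - radius), int(cy + radius), B)
    hits = []
    for prefix, n_bits in box:
        lo, hi = prefix_range(sorted_keys, prefix, n_bits, total_bits=2 * B)
        for key in sorted_keys[lo:hi]:        # only points in covered regions
            if math.dist(dst_coords(key, M=2, B=B), center) <= radius:
                hits.append(key)
    return hits
```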
It seems another miracle that one can search the whole space without even looking at almost all of it. But by specifying one or a few key prefixes as the zone of detailed search, that miracle is in fact achieved.
But what if your raw coordinates are fractional, or negative?
Solution: Shift the space up, so all points in it are non-negative.
Solution: Scale your measurement space so that rounding/truncation to an unsigned integer still has a useful precision.
Then those pesky fractional or negative numbers are transformed into working coordinates which are unsigned integers, without significant loss of information, assuming enough precision in your integers. Recall that a 32-bit unsigned int gives ~1.2 mm precision across the entire USA, as noted above.
In short, simply transform from float or double to unsigned integer by adding a shift, multiplying by a scale, and applying floor/round/truncate. Transform back by appending the right number of zeroes if bits were clipped (a lossy reconstitution of the truncation), dividing by the same scale, and subtracting the same shift, to send an unsigned integer back to float or double. Use a convenient scale and origin such that sufficient precision and coverage of the spatial range of interest (e.g., the Earth, US, Russia, etc.) is expressed in the resulting unsigned integers. For example, all of China's latitudes and longitudes can be represented to the nearest meter using well under 32 bits per coordinate. In this way, spatial applications aren't practically hindered by this discretization and transformation.
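A sketch of the round trip, with an illustrative frame (the numbers echo the US example above; the helper names are mine):

```python
def to_uint(x, shift, scale, bits):
    """Real -> unsigned: shift, scale, truncate, clamp into the bits-wide range."""
    u = int((x + shift) * scale)
    return max(0, min(u, (1 << bits) - 1))

def to_real(u, shift, scale):
    """Unsigned -> real: divide by the scale, subtract the shift (lossy)."""
    return u / scale - shift

# Illustrative frame: a ~5,129 km span mapped onto 32 bits, ~1.2 mm per step.
shift, scale = 0.0, 2**32 / 5_129_000.0      # meters -> integer steps
u = to_uint(1_000_000.0, shift, scale, 32)   # a point 1000 km along the axis
x = to_real(u, shift, scale)                 # back, within ~1.2 mm of the input
```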
So I will distinguish "tree" in the abstract sense of a potentially recursively subdividable structure from "tree" in the computer science sense, in which it is a data structure comprising a set of nodes and links, with one root node, some number of terminals, and links from each node to its parent and daughters (and possibly siblings), along with decision-making rules to decide which branch to take when traversing down from the root toward a leaf for some purpose.
Everything with a shared DST prefix is accessed at once (\(O(\log N)\), with N the size of the archive, using binary search), meaning that you can just ask for all the observed points within the spatial region you specify, and bam, there they are: all and only those stored values that are within your specified region. To find those in an arbitrary region, approximate the region as precisely as you wish as a union of DST subregions, each identifiable by a DST key prefix.
The miracle of this approach perhaps derives from the fact that in it you can avoid searching where the points aren't. This leads to a large and proportionately increasing speed advantage over every other multi-coordinate search algorithm including the fastest and best known to computer science.
I reiterate, none of this is my idea; these concepts and their software implementation are the invention of my friend and hero, Zhijing George Mou. It is protected by multiple US and international patents (see Bibliography) and is available for licensing/sublicensing. I hope wise heads will do business, since if they don't there is an arms race after all.
Tested, benchmarked implementations are available as C++ and Python libraries.
Contact George or Tom for details.