Definitions

A point, a line segment, a square, and a cube: these, we can comfortably imagine, are the first four hypercubes, in 0, 1, 2, and 3 dimensions respectively. But what is a hypercube in general? Let \(N\) be a number of dimensions, say 2 or 2 million, any nonnegative integer (unless you accept Negative Dimensions). Then \(C_N\) is a hypercube of \(N\) dimensions if:
1) \(C_N\) is a graph, namely, a set and a binary relation on the set.

Do you see that we could label all of the \(2^N\) corners using an address of \(N\) bits, that is to say, a string of \(N\) 0's and 1's? Because there are \(2^N\) distinct strings of \(N\) 0's and 1's. For example if \(N=3\), then 000 001 010 011 100 101 110 111 are all the possible (\(2^3=8\) distinct) strings of 3 0's and 1's. (Was that too simple? But, simple is Good!) We might consider each of the 3 bits to be associated with, or to be, one of the three discrete, binary dimensions which are the dimensions of the hypercube. I like to count from 0, so the dimensions would be dimension 0, dimension 1, and dimension 2.

Now here's a sweet trick. If we label the corners of the hypercube in a special way, we can identify all \(N\) immediate neighbors of any node by their addresses. We want to label them so that each immediate neighbor of a node \(p\) has an address that differs from \(p\)'s address by a single bit. That is the Test, below. We can do this recursively. Here's how.

Definition: an \(N\)-dimensional hypercube \(C_N\) is a graph; every point in \(C_N\) is one of its corners. Each of the \(N\) dimensions of \(C_N\) has only two values, 0 and 1. Every point in \(C_N\) is specified by a 0 or a 1 for each of the \(N\) dimensions.

Test Condition: Evaluate the Test as true if for every point \(p\) in \(C_N\), there are exactly \(N\) distinct immediate neighbors of \(p\), each with an address that differs from the address of \(p\) by a single bit.

Base: Let \(C_0 = p_0\) be a point, which by assumption is a hypercube of dimension 0, with address nil (a string of 0 bits: the empty string). Notice that there are no other points in a 0-D hypercube, so there are no neighbors in it. Take every neighbor, apply the Test, did it fail? There are no neighbors, so \(C_0\) does NOT fail.

Extension: Given a hypercube \(C_N\) of dimension \(N\) which passes the Test, for each point \(p\) in \(C_N\), refer to \(p\)'s address as \(a\), and call that the parent address. Make a full copy of \(C_N\), including corners and edges, calling the copy \(C'_N\). Rename each corner of the original by prefixing a new, most significant bit set to 0 to its original address, and rename each copied corner by prefixing a 1. Then we have two daughter addresses, 0a and 1a. Then create an edge or link (which we can think of as being in a new, (N+1)'th dimension) between the sisters, from 0a to 1a. The result all together is a new double-sized graph \(C_{N+1}\), which is a new hypercube of dimension \(N+1\), with twice as many nodes as \(C_N\); the number of links out of each node is \(N+1\), and the addresses of the points in \(C_{N+1}\) are now strings of \(N+1\) 0's and 1's.

Apply Test: Any node in \(C_{N+1}\) is either 0a or 1a, with \(a\) the address of its parent in \(C_N\), and it has an immediate neighbor in \(C_{N+1}\) which is the other of the two, namely its sister, so they have the same address except that one has a 0 and the other a 1.
Since \(C_N\) passed the Test, all the points in \(C_N\) and in \(C'_N\), before being linked to their partners, each had \(N\) immediate neighbors. Each of those \(N\) neighbors has the same 0 or 1 in the most significant bit in \(C_{N+1}\), because they came from the same side, and because the parent passed the Test there is exactly one bit in the rest of the address which differentiates a point from each such neighbor. Hence all points in \(C_{N+1}\) pass the Test, having \(N+1\) neighbors: \(N\) inherited from the parent, and 1 from the Extension partner.

Starting with Base representing a 0-D hypercube, we apply Extension recursively to get 1-D, 2-D, 3-D, ..., \(N\)-dimensional hypercubes for any \(N\). At each recursive level, the Test passes as shown above. Nice, huh? A small code sketch of the construction follows; the consequences of adding a dimension are listed in the next section.
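Here is a minimal sketch, in Python, of the Base/Extension construction and the Test. The function names (base, extend, passes_test) and the dictionary-of-neighbor-sets representation of the graph are my own illustration, not anything prescribed above.

    # A minimal sketch of the Base/Extension construction, with addresses as
    # bit strings, plus the Test that every corner of C_N has exactly N
    # neighbors, each one bit-flip away.

    def base():
        """C_0: a single corner with the empty address, and no edges."""
        return {"": set()}          # address -> set of neighbor addresses

    def extend(cube):
        """Extension: prefix 0 to the original, 1 to the copy, link each sister pair."""
        bigger = {}
        for addr, nbrs in cube.items():
            bigger["0" + addr] = {"0" + n for n in nbrs}   # original half
            bigger["1" + addr] = {"1" + n for n in nbrs}   # copied half
        for addr in cube:                                   # the new (N+1)'th dimension
            bigger["0" + addr].add("1" + addr)
            bigger["1" + addr].add("0" + addr)
        return bigger

    def passes_test(cube, n):
        """Every corner has exactly n neighbors, each differing by a single bit."""
        def one_bit_apart(a, b):
            return sum(x != y for x, y in zip(a, b)) == 1
        return all(len(nbrs) == n and all(one_bit_apart(a, b) for b in nbrs)
                   for a, nbrs in cube.items())

    cube = base()
    for n in range(1, 6):           # build C_1 .. C_5
        cube = extend(cube)
        assert len(cube) == 2 ** n
        assert passes_test(cube, n)
    print("C_5 has", len(cube), "corners")   # C_5 has 32 corners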
Consequences

\(C_{N+1}\) has twice as many points in it as \(C_N\); indeed it has \(2^{N+1}\) corners. Each point of \(C_N\) has an address of \(N\) bits: { 0, 1 } are the addresses of a 1-dimensional hypercube, { 00, 01, 10, 11 } are the addresses of a 2-dimensional hypercube, etc. Every point of \(C_N\), having some address \(a\), has \(N\) immediate neighbors, each of which is not only adjacent to \(a\) but obtained from \(a\) by flipping a single bit in the address \(a\). An enumeration of the dimensions in some order \(P\) can be thought of as a neighbor-visiting order, for each point in \(C_N\). To visit all the neighbors, any order will do.

Note that a hypercube of 4 or more dimensions is ALSO a 2-D mesh -- since a 2-D mesh has all its nodes directly connected to 2 (in the corner), 3 (along the side), or 4 (in the middle) neighbors. The 2-D mesh only uses a subset of the edges of the hypercube. Starting with a hypercube graph \(C_N\) with \(N \ge 4\) (so \(2^N\) nodes and \(N \cdot 2^{N-1}\) edges), to create a 2-D mesh just edit out some of the edges. Given that every node in the hypercube has more edges than any node in the 2-D mesh requires, a reduction in edges and a renumbering of nodes according to row and column in the mesh would seem to be all that is required when mapping a hypercube to a 2-D mesh. A construction may make this obvious if you allow that each edge is two half-edges, each attached to a single node. Since in a hypercube each node is attached to \(N\) half-edges, you may place any node in any position in the 2-D mesh, join up to 4 of its \(N\) half-edges to adjacent nodes' half-edges, and delete the unjoined half-edges, thereby constructing the mesh from the hypercube by a process of deletion. A sketch of this in the case \(N = 4\) follows.
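For \(N = 4\), here is a small Python sketch of the deletion construction, my own illustration. The Gray-code placement of nodes is an assumption I make for concreteness (the text allows any placement); it is chosen so that every mesh edge happens to coincide with a hypercube edge, and deletion alone does the job.

    # Carve a 4x4 2-D mesh out of C_4 by deleting edges. Placement: a 2-bit
    # Gray code per axis, so mesh-adjacent nodes are hypercube neighbors.

    from itertools import product

    N = 4
    gray = ["00", "01", "11", "10"]     # 2-bit Gray code: consecutive entries differ by 1 bit

    # Hypercube edges of C_4: pairs of 4-bit addresses one bit apart.
    nodes = ["".join(bits) for bits in product("01", repeat=N)]
    cube_edges = {frozenset((a, a[:i] + ("1" if a[i] == "0" else "0") + a[i+1:]))
                  for a in nodes for i in range(N)}

    # Place node gray[r] + gray[c] at mesh position (row r, column c).
    addr = {(r, c): gray[r] + gray[c] for r in range(4) for c in range(4)}

    # Mesh edges: horizontally and vertically adjacent positions.
    mesh_edges = {frozenset((addr[(r, c)], addr[(r2, c2)]))
                  for (r, c) in addr for (r2, c2) in [(r + 1, c), (r, c + 1)]
                  if (r2, c2) in addr}

    print(len(cube_edges), "hypercube edges")              # 32
    print(len(mesh_edges), "mesh edges")                   # 24
    print(mesh_edges <= cube_edges)                        # True: the mesh uses a subset
    print(len(cube_edges - mesh_edges), "edges deleted")   # 8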
Note that bit addresses no longer follow the rule that obtains in hypercubes, that adjacent nodes differ by one bit in their addresses. Of course not: the construction has placed any node in any position, essentially demanding a renumbering. Suppose we renumber so that nodes in the 2-D mesh are addressed by row and column position, using 2 nonnegative integers. Even then, a single bit-flip yields the next integer in sequence only when the first integer is even and the second is the first plus one; although this makes half of the adjacent-column and adjacent-row neighbors a single bit-flip apart in their addresses, the remaining neighbors are separated by more than one bit in address space.

Notice that the mesh diameter is always equal to or longer than the hypercube diameter. A diameter is (the length of) the shortest path between a most-distant pair of points, via a sequence of directly connected neighbors. Given \(n\) rows and \(n\) columns, the 2-D mesh diameter (the distance between opposite corners) is \((n-1)+(n-1)=2n-2\) in a mesh of \(n \times n\) nodes, whereas the shortest path between opposite corners of a hypercube \(C_N\) is \(N=\log_2(2^N)\). For example, 1M nodes in a 1000x1000 2-D mesh are arranged with a diameter of \(2000-2=1998\), while the slightly larger \(C_{20}\) hypercube with \(2^{20}\) nodes has a diameter of 20. In general, each edge or half-edge deleted when constructing a 2-D mesh from a hypercube either was a shortcut for some pair of points or it was not: a shortcut makes the diameter shorter, a non-shortcut makes no difference, so never is the diameter shorter in the 2-D mesh case. Similarly, hypercubes (with \(N \ge 6\)) strictly include 3-D meshes, and diameters are again shorter in the hypercubic case: a 1000x1000x1000 3-D mesh of 1B nodes has a diameter of 2997 steps, while the \(C_{30}\) hypercube of \(2^{30}>\)1B nodes has a diameter of 30. While 2-D and 3-D meshes superficially seem more comprehensible for 2-D image or 3-D scene processing, just consider the physical geometry separately from the processor-communication architecture by mapping one to the other (cf. the Optimal Mappings paper), and your process may be sped up, as we see, by factors of thousands.
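A quick check of those diameter numbers, using only the formulas just stated (a Python sketch; the helper names are mine):

    # Mesh diameter is d*(n-1) for a d-dimensional mesh with n nodes per side;
    # hypercube diameter is the number of address bits, ceil(log2(node count)).

    from math import ceil, log2

    def mesh_diameter(n, d):
        """Corner-to-corner distance in a d-dimensional mesh, n nodes per side."""
        return d * (n - 1)

    def hypercube_diameter(num_nodes):
        """Smallest N with 2^N >= num_nodes; the diameter of C_N is N."""
        return ceil(log2(num_nodes))

    print(mesh_diameter(1000, 2), hypercube_diameter(1000 ** 2))   # 1998 vs 20
    print(mesh_diameter(1000, 3), hypercube_diameter(1000 ** 3))   # 2997 vs 30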
Lockstep Communication

This is taken from George's Optimal Mappings paper. Consider communication in a network. Suppose every communication is bidirectional, adjacent, and takes 1 unit of time. Let every node in the hypercube communicate with its immediate neighbor, that is, another node with an address identical to its own except that one bit is flipped, and let the bit which is flipped be the same for all nodes. Then the nodes in one half of the entire set, with zero in that bit, communicate one-to-one with unique nodes in the other half of the entire set. Everyone can communicate at the same time with their partner, that immediate neighbor which differs by exactly one bit in its address. The same bit for everyone. Lockstep.

Lockstep is the big deal in communication patterns. Instead of Lockstep in modern parallel processing, we get Ethernet: blocking queues of serially-communicated packets. Do you see how there could not be any communication architecture slower than this "state of the art" approach? The measure of parallel computation is, behind von Neumann's bottleneck (cf. Backus 1977), what fraction of the participants are able to get through the bottleneck at once? One at a time, that's a bottleneck. Ethernet is not only one at a time in terms of what is transmitted, but worse, one at a time in terms of who gets to transmit. In Ethernet every multi-party channel demands a global negotiation of who gets to send now, then that one gets to send a packet, then the negotiation resets and starts again. Asynchronous and rare communications make Ethernet pretty reasonable, but in parallel processing we want to open up the bottle's neck to be as wide as the bottle itself (Mou's expression, P.C.). Every participant ideally should be able to send a message at every time step.

Lockstep allows this with the simplest concept. Instead of spaghetti communications, every sender sends at the same time to a recipient in the same, perhaps geometric, relationship to it in the mesh or hypercube. Lockstep communication doesn't require hypercubic direct-neighbor communication, but it helps to have some kind of mesh where a target geometry of communication can be defined uniformly across the mesh. Then if everyone is a white pawn for a moment, everyone can move one step forward at the same time; and if everyone is a white knight for a moment, and everyone knows to move two to the left and one forward at a certain time, again all at the same time, then no one ever interferes with anyone else, and no negotiation of who gets to communicate is ever done, because everyone communicates at the same time. Lockstep is George's genius solution for maximizing communication in parallel networks.

Have you figured out that communication is one of the three fundamental dimensions of computation, the other two being structure manipulation (division and combination) and local operations? Did you notice that no high-level computer programming language lets the programmer decide anything about two of these three dimensions? You cannot even give any hints about how best to carry out the necessary communication in parallel processed computation!
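To make the picture concrete, here is a minimal Python sketch of one lockstep step (my own illustration, not code from the Optimal Mappings paper): every node exchanges a value with the neighbor whose address differs in one agreed-upon bit, all at the same logical time, with nothing to negotiate.

    # One lockstep exchange across dimension k: node a swaps its value with
    # node a XOR (1 << k). Every pair swaps simultaneously.

    def lockstep_exchange(values, k):
        """values[a] is the datum held by node a; returns the state after the swap."""
        n_nodes = len(values)                    # must be a power of two
        return [values[a ^ (1 << k)] for a in range(n_nodes)]

    state = list(range(8))                       # C_3: nodes 0..7 hold their own ids
    print(lockstep_exchange(state, 0))           # [1, 0, 3, 2, 5, 4, 7, 6]
    print(lockstep_exchange(state, 2))           # [4, 5, 6, 7, 0, 1, 2, 3]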
The reason this is so exciting, and horrible, is the ratio between the delay of a local operation and the delay of data communication. Did you notice that moving data depends on both transmission latency and bandwidth? In computers, the time to get some data from one level of memory to another can be hundreds or thousands of times the time it takes to carry out a local operation within the processor.

Take some permutation \(P\) of the dimensions, say \(P=[0, \ldots, N-1]\), or \(P=[N-1, \ldots, 0]\), or in fact any order. Communicating across the first bit in \(P\) (which divides the hypercube into two populations of nodes, each node with an immediate neighbor in the other population) takes, let's say, unit time; across the second bit in \(P\), also unit time; and so on. This creates a communication path from any node to any other node in just \(N\) units of time. To cover \(2^{30} \approx 10^9\) nodes requires just 30 units of time, for any-to-all or all-to-all or any-to-any communication. Lockstep is the fastest communication method for parallel processor architectures. Considering what passes for a communication architecture in a parallel computer nowadays, it is vastly, vastly faster. Got it? Reference: see George Mou's Optimal Mappings paper.
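Here is a sketch (Python, my own, under the unit-time-per-dimension assumption above) of sweeping the dimensions in some order \(P\). The concrete pattern shown is a recursive-doubling all-gather, one standard way to realize all-to-all on a hypercube: after the \(N\) lockstep steps, every node holds every node's datum.

    # All-to-all (allgather) on C_N by sweeping the dimensions in any order P.

    def allgather(payloads, P):
        """payloads[a] is node a's own datum; P is any permutation of range(N)."""
        n_nodes = len(payloads)
        held = [{a: payloads[a]} for a in range(n_nodes)]    # what each node has so far
        for k in P:                                          # one lockstep step per bit
            held = [{**held[a], **held[a ^ (1 << k)]} for a in range(n_nodes)]
        return held

    N = 4
    result = allgather([f"data{a}" for a in range(2 ** N)], P=[3, 0, 2, 1])
    print(all(len(h) == 2 ** N for h in result))   # True: 16 items at every node, in N=4 steps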
Story Time

Here's a nice story from George to go with this. It's about George's first Yale student internship, in the biology department, where the professor had bought a parallel-processing computer for technical computing in the biological sciences. George, being the parallel programming expert, was hired for the summer, perhaps. He struggled until he discovered that using the GPU array to do nothing took longer than using only the CPU to do the whole algorithm. To understand and prove this he changed his code so that results were immediately returned without ANY distributed computation, leaving only the communication of the data to be used for the different calculations to the different processors. Wow: that communication task took longer on that particular parallel computer than the CPU took to do the entire algorithm in a single-threaded, single-processor system.

Hence communication is a big deal, and must be done efficiently. All-to-all simultaneously beats one-to-one serially, for example. Hence think about Hypercubic Communication. Okay? It's the kind of communication expressed in Divacon between vectors related by \(d_{lr}, c_{lr}, d_{eo}, c_{eo}\), or, as I am thinking, \(d_{P}, c_{P}\), which carry out division and combination according to any bit-permutation whatsoever. Then the number of bits in the addresses is the number of units of time required for a complete all-to-all or one-to-all communication in the system. Which is great, amazing, and superior. And it's all George; where the heck have the rest of you been? Okay?
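For a flavor of what \(d_{lr}\), \(d_{eo}\), and a general \(d_{P}\) might look like on a vector of length \(2^N\), here is a sketch in Python. This is my own reading of the notation, not George's code: \(d_{lr}\) splits on the most significant index bit, \(d_{eo}\) on the least significant, and \(d_{P}\) on whichever bit the permutation \(P\) considers first.

    # Dividing a vector of length 2^N by one address bit: left-right uses the
    # most significant index bit, even-odd the least significant. (A sketch of
    # the idea only, not the Divacon implementation.)

    def divide_by_bit(vec, k):
        """Split vec (length 2^N) into the elements whose index has bit k = 0 or 1."""
        zeros = [x for i, x in enumerate(vec) if not (i >> k) & 1]
        ones  = [x for i, x in enumerate(vec) if (i >> k) & 1]
        return zeros, ones

    v = list(range(8))                       # N = 3
    print(divide_by_bit(v, 2))               # d_lr: ([0, 1, 2, 3], [4, 5, 6, 7])
    print(divide_by_bit(v, 0))               # d_eo: ([0, 2, 4, 6], [1, 3, 5, 7])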