Optimal Mapping

Solving the k-D FFT on an m-D parallel processor mesh

On Mou
on Divide and Conquer

Foreword

We model the parallel computer as a set of processors connected in some kind of preferably regular mesh or lattice, an array of data, and a PDC (parallel divide-and-conquer) program copied to all the nodes, so they know what to do. The processors are preferably connected in a hypercube architecture, described as follows. Pick a number of dimensions, \(m\). Let there be \(2^m\) processors, each labelled with a number or address in \(0..2^m-1\), that is, with an \(m\)-bit number. Each bit, say the \(i\)'th, is the \(i\)'th coordinate of any given processor's address. Further, for any bit, half of the processors have 0 in that position and half have 1, so each bit divides the processor set in half. Each bit or dimension is fully independent of all the others. And in a hypercube architecture (I suppose) each half of the full processor set identified by a bit's value is directly connected to the other half; that is, corresponding processors in the two halves have the same address except that they differ in that one bit.
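To make the addressing concrete, here is a minimal Python sketch (my own illustration, not code from the paper): a processor's neighbors are found by flipping one bit of its address, so every processor has exactly \(m\) direct connections.

    # Minimal sketch (mine, not from the paper): hypercube addressing.
    m = 3                      # dimensions
    N = 2 ** m                 # 8 processors, addresses 0..7

    def neighbors(p):
        """Processors directly connected to p: flip each of the m address bits."""
        return [p ^ (1 << i) for i in range(m)]

    for p in range(N):
        print(f"{p:0{m}b} -> {[format(q, f'0{m}b') for q in neighbors(p)]}")
    # e.g. 000 -> ['001', '010', '100']: each neighbor differs in one bit/dimension.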

Motivation: a Narrative Introduction

To motivate me, George told me the story behind his Optimal Mappings paper.

After getting his PhD in 1990, George went to be a professor at Brandeis University, in the Boston area near MIT. This was the golden age of parallel processing, and that was the golden place for it too. Along the famous Highway 128 ring around the suburbs of Boston were three different parallel computer companies, whose machines had a 2-dimensional, 3-dimensional, and m-dimensional architecture for the mesh connecting the processors: MasPar, NCube, and Thinking Machines, respectively. The market for parallel computers was mainly driven by the need to do 2-D FFTs (think of image processing, finding spatial frequency patterns in the images). Hence MasPar was getting all the business, because they could map the FFT data and tasks and communications to their 2-D mesh of processors without further thinking, whereas mapping data and tasks and communications to a 3-D or m-D mesh requires some thought.

At some point the CEO of Thinking Machines invited George in for a consult. George arrived and asked: What is the problem? They said: We can't sell our machines because they are m-D meshed and the customers want 2-D meshed. So we took our startup money, many millions of dollars, and redesigned a 2-D mesh communication system on top of our m-D mesh system, so the customers would be happy. But we find we have slowed down the algorithms by a factor of 10. Please help us.

George said: Give me five minutes. After five minutes, he said: Call your CTO and engineers. They came. George said: Mapping a 2-D mesh on top of an m-D mesh introduces queueing, blocking, and interference between processors. Further, the path between processors follows the Manhattan distance: a message must travel on the order of \(\sqrt{2^m}\) steps instead of \(m\) steps. If \(m=15\), say, then \(\sqrt{2^m} = \sqrt{32768} \approx 181\), which is indeed more than 10x slower than 15 steps. Since communication delays dominate the computing time, you have successfully spent your startup funds to slow down the system by 10x. Congratulations.

Later, George got the call from NCube, the company with the 3-D meshes. The same story had unfolded: they too had built a 2-D mesh on top of their 3-D mesh, to avoid the mapping question. But what is the path length between processors on a 3-D mesh? Divide the \(2^m\) processors up into 3 dimensions, and there are \(\sqrt[3]{2^m}\) values in each dimension (\(=32\) with \(m=15\) again), so that communicating from one corner to the opposite requires \(32-1\) steps in each dimension, times 3 dimensions, thus 93 steps instead of 15 steps. Congratulations, you have only slowed down your algorithms by a factor of 6 by spending the R&D money to add a 2-D mesh on top of your 3-D processor mesh, whereas Thinking Machines achieved twice the slowdown.
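To check this back-of-the-envelope arithmetic, here is a small Python sketch (my own, not from the paper) comparing worst-case hop counts for the same \(2^{15}\) processors under the three topologies:

    import math

    m = 15                     # hypercube dimensions
    N = 2 ** m                 # 32768 processors

    # Hypercube: one hop per differing address bit, so the diameter is m.
    hypercube = m

    # 2-D mesh: side ~ sqrt(N); worst-case Manhattan distance is
    # (side - 1) steps per axis, over 2 axes.
    side2 = math.isqrt(N)                  # 181 (2^15 is not a perfect square)
    mesh2 = 2 * (side2 - 1)                # 360

    # 3-D mesh: side = cube root of N = 32 exactly when m = 15.
    side3 = round(N ** (1 / 3))            # 32
    mesh3 = 3 * (side3 - 1)                # 93

    print(hypercube, mesh2, mesh3)               # 15 360 93
    print(mesh2 / hypercube, mesh3 / hypercube)  # 24.0 6.2

The worst-case figures come out even harsher than the anecdote's factors of 10 and 6, which presumably reflect average traffic rather than corner-to-corner messages.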

George is quite ironical, as you see.

So this was the context in which the Optimal Mappings paper came out. George thought he would solve the problem, once and for all, of how to map the FFT (which is what everyone was using these machines for) onto a mesh of any number of dimensions. And in this paper he proved exactly how to do it, except that there are \(m!\) equivalent ways to do it. Taking just communication as an example: you can communicate processor-to-processor from one entire half of the set to the other entire half, choosing any bit to divide the two halves, and send messages from every processor in one half to its corresponding processor in the other half. The evens to the odds, for example, or the first half to the second half, or use any bit in the address space to pair up one half with the other. Do this once for each of the bits, in any order, and everyone has communicated with everyone in only \(m\) steps. And yes, it is optimal; from now to eternity you won't do asymptotically better.
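Concretely, "corresponding processor in the other half" just means flipping the chosen bit of the address. A minimal Python sketch (mine, not the paper's notation):

    m = 4
    procs = range(2 ** m)

    def partner(p, bit):
        """The processor whose address differs from p's in exactly the given bit."""
        return p ^ (1 << bit)

    # Bit 0 pairs the evens with the odds; the top bit pairs the first
    # half of the address space with the second half.
    print([(p, partner(p, 0)) for p in procs if p % 2 == 0])
    print([(p, partner(p, m - 1)) for p in procs if p < 2 ** (m - 1)])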

So everyone desperately needed to know how to map FFTs, with their calculations and data and communications, onto a network of processors; whoever couldn't figure it out stood to lose their whole company. And Thinking Machines indeed went under in 1994.

George said that when he gave the paper at a conference it received the Best Paper award; but he also said he doesn't think anyone has read and understood the optimal mappings paper and applied its lessons.

So I thought I'd give it a shot: to understand and express at least a part of this seminal paper.

Dimensional communication is optimal

Let's give the pattern described in this section a name: dimensional communication.

One of the points of George's paper concerns how communication works in a parallel program when messages pass between adjacent processors, that is, pairs whose addresses differ in a single bit. If entire halves of the processor set communicate with their corresponding processors in the other half (odds to evens; left side to right side; all processors with a given address bit zero to all those with that bit one), then a communication pattern between a full set of pairs in one dimension can be redirected and repeated in any other dimension, until all dimensions have been exhausted, thus reaching from all processors to all processors in a maximum of \(m\) time steps. Conveniently, all the processors are looking and sending/receiving in the same direction at the same time, so corresponding-processor communications can run simultaneously over the entire set, one half with the other half, distinguished by any given single address bit. Therefore there need not be any queueing or blocking or interference between messages, and in \(m\) steps everyone talks with everyone.
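Here is a minimal simulation (my own sketch, not code from the paper) of dimensional communication: in round \(i\), every processor exchanges everything it knows with its partner across bit \(i\), and after \(m\) rounds every processor has heard from every other.

    m = 4
    N = 2 ** m
    knows = [{p} for p in range(N)]   # each processor starts knowing only itself

    for bit in range(m):              # one round per dimension
        nxt = [set(k) for k in knows]
        for p in range(N):
            q = p ^ (1 << bit)        # partner across this dimension
            nxt[p] |= knows[q]        # pairwise exchange, no contention
        knows = nxt

    assert all(k == set(range(N)) for k in knows)
    print("all-to-all complete in", m, "rounds")

Each round doubles what every processor knows (1, 2, 4, ..., \(2^m\) sources), which is why \(m\) rounds suffice and why no fewer can.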

Wasn't that simple? It's another application of the surprisingly profound idea of the bit, applied to the question of communications in a network. And yet before you read this, you couldn't have conceived it; worse yet, the entire field of theoretical computer science has never understood it to this day.

A bit is a dimension, independent of other bits, other dimensions. A lot of bits gives you a high-dimensional space. But merely \(n\) bits, while generating an address space of \(2^n\), needs only \(n\) steps to navigate between opposite corners, one bit, one dimension, at a time. With just \(n=100\) we are doing unimaginably huge problems in unbelievably short times. Do you get it?

Once you get it, there is no going back. Without dimensional communication, and the divacon concept generally, you cannot even conceive these ideas, and yet with them, you can conquer the world.

May I say, Study and learn with love. Because this is the result of George's study with love.

Bibliography

Z. George Mou and Xiaojing Wang, "Optimal Mappings of m-Dimensional FFT Communication to k-Dimensional Mesh for Arbitrary m and k," in PARLE '93: Parallel Architectures and Languages Europe, Proceedings of the 5th International PARLE Conference, Munich, Germany, June 1993. Lecture Notes in Computer Science #694, Springer-Verlag, pp. 104-119.

 

Copyright © 2024 Thomas C. Veatch. All rights reserved.
Created: July 22, 2024