Let's watch reverse unroll in a Divacon
computation/communication graph:
By this method, reverse takes log-N steps (here, three steps), each with a division, a combination, and a full set of parallel exchange-and-select-other operations between corresponding elements between pairs of 1-arrays, 2 arrays, and 4 arrays. Notice that from each position there is a communication once in each of log-N directions. The remarkable characteristic of this communication pattern, expressed in the term "lockstep", is that an entire half of the set of all positions communicates with a second entire half of the set, all in one timestep. The whole marching band takes one step in one direction at once. Conveniently you can cut the whole set into halves 3 different orthogonal ways, like into 8 1-arrays, 4 2-arrays, and 2 4-arrays. Talk between halves that are cut up this way, then that way, then the other way, you're done. Is that a miracle? I think it's a miracle. I can't see it at all, then it seems perfectly obvious, then I can't see it again. At all. So go through the steps of the unrolled computation graph, and see if any part isn't perfectly obvious. Then have a cup of tea, and begin to accept it, and soon, very soon, try some more examples, and maybe some exercises, and try to do some for yourself.
|