Relative Arrays

or, thinking clearly so that thinking is possible

from Mou 1990

On Mou
on Divide and Conquer

Every solution is either trivial or false.
-- Zhijing George Mou.

The "relative array" concept is essential if you want to understand the details of Divacon, or more, to implement a Divacon compiler or interpreter.

Arrays can be absolute, relative, and related.

  • ABSOLUTE arrays are regular arrays in current programming languages. We like zero-based indexing (Mou 1990 p7) "for the easy grasp of their index domains".

  • RELATIVE arrays always count from 0 to A.L-1 and have the property that they can be divided and combined, the result also being a relative array counting from 0 to L-1. Relative arrays aid in clear thinking about the conceptual operations in CS without getting bogged down in index arithmetic. Relative arrays seem to be unknown in computer science outside the work of George Mou.

  • RELATED arrays are pairs of relative arrays which are the immediate result of a division operation or the immediate inputs of a combine operation. Another way to say it is \(d\ V\ =\ L\ R\) and \(c\ L\ R\ =\ V\), where L and R are related arrays and L, R, and V are relative arrays.

Incidentally, COMMUNICATION within Divacon is well-defined as to endpoints and comprehensible as to meaning when it is between Related arrays; for example #!corr defines a communication pattern which means that, particularly referring to related arrays L and R, each element in L communicates a copy of itself to the corresponding (same relatively indexed) element in R, and vice versa.

Implementation

Would it help to give C declarations for members of these objects?

  • 1) Absolute arrays contain
    int sz; // element type or size
    elementType *dataPtr; // a region of memory, defined by a pointer to the first element and a length.
    int L; // length (indexes are in the range 0..L-1)
    int B; // *    log_2 L = B
    // *: value derived from other parameters.
  • 2) A relative array contains:

    int sz; // element type or size
    int d; // the dividing bit (or in a permutation of address bits, the last)
    int L; // *   length \(L=2^d\). (indexes are consecutive in the range 0..L-1)
    elemType *dataPtr; // a memory region for elements, addressed by indexes in 0..L-1
    int s; // *    start address (always 0 in relative(-to-self) indexing)
    int t; // *    terminal address (always L-1 in relative(-to-self) indexing)
  • 3) Related arrays are just relative arrays, except that we also consider their context of use. When a parent is divided, its daughters are related arrays. When combination is to be done, the arrays to be combined are related arrays. Given a parent has dividing bit \(d_p\), the new dividing bit \(d_d\) of each daughter related array is one less than that of its parent. Given two daughters to be combined, they must have the same value dividing bit \(d_d\), and the new dividing bit of the combined result is again \(d_p\): \(d_p = d_d + 1\).

Relative Array relationships

Let A be a relative array with \(2^d\) elements.

The Left-Right division operation, \(d_{LR}\), divides A into (related) daughters L, R such that L contains all the elements of A with addresses containing a '0' in the d-1'th least significant (0'th most significant) bit of the binary value of the address within A, while R contains all the elements taken from A from addresses having a '1' in that address bit.

L.d = R.d = A.d-1, since L and R have half the number of elements as compared with A. That is, the bits in an address in L or R are one fewer than the bits in an address in A. Further, the addresses within L or R of an element taken from A are exactly the same address as the element had in A, except the most significant bit is removed. In a sense, nothing moves, but array address bookkeeping can simply be made to note that the L and R addresses are one bit shorter, and the L (R) elements are those with a 0 (1) in the parent array address's high bit.

A combine operation combines relative arrays L, R into a larger relative array A containing all elements of both L and R. It puts elements from L (R) into A at the positions having an address comprised by the element's address in L (R) prefixed with a '0' ('1').

Where is the data?

One (toy) implementation of relative arrays is in a uniprocessor with a single contiguous array of memory. In that case, all A.L elements are addressable in the absolute array as values A[i] with i in 0..L-1.

Another implementation is with some number of processors accessing some sections of memory each with their absolute or relative properties, sizes, etc.; such details are the compiler's task to manage.

The entire input array may (should?!) never need to be fully copied over into any one processor's memory; the system can do its divisions merely conceptually, by just updating pointers and indexes and descriptive data such as d. Only after division down into small, even atomic subarrays, might actual movement occur of any data from storage to one or another processor for actual processing. No assumption is made even that such storage is a single location. Indeed many-to-many communication is the theme of this work. Lockstep!

Listen: To blindly assume all the data must be copied to the highest level processor, and halves thereof to other processors that receive divided arrays, is an unimaginative approach that unnecessarily congeals the imagined system with an infinity of serial data copying, yet that assumption is quite unnecessary.

Indeed, as results are calculated on combining subarrays in for example a postadjust PDC, it may emerge conveniently that only atomic arrays may be sent to processors for their local operations to occur, and similarly small amounts of data, namely summary results from adjusting and combining subarrays (subresults!), might get passed to the processor of the combined array. Then no message is larger than the local data needed for a single local operation.

Yes the compiler must solve these details, but do we need to assume it will do the slowest and least effective thing? No!

An Exercise

Question: Given a relative array V implemented over an absolute array A. Indexes within A range maximally between 0 and \((2^B)-1\) inclusive. If V is the result of \(d\ge 0\) divisions on the maximal relative array, then what are the largest and smallest indexes for elements in V?

Answer: Indexes can only range from 0 to \((2^{B-D})-1\).

Check my arithmetic!

Your thoughts?
(will not be shared or abused)
Comment:
                                          Feedback is welcome.
Copyright © 2024 Thomas C. Veatch. All rights reserved.
Created: December 19, 2024