SRNNs are not really neural networks, although they look like them. Instead of learning weights, the learning is only in the link lengths and perhaps the relative twists of normals across each successive link.
Other than its brief, one-time dependence on learning these dependent, fixed quantities, the SRNN concept is simply a tree-structured geometrical information transform.
Before I transcribe my notes on SRNNs here, let me mention the space controversy. My argument is that math and symbolic logic are derivative of spatial perception; once an organism has some analog of space in its cognitive/representational capabilities, it also has the entire basis for hard logical reasoning, for it can say something is here, and not there, and respond differently in the two cases, which may differ only spatially: as one thing in two locations or as two things in one location. Therefore it can reason with the confidence of direct spatial perception, which is to say, life-committing certainty, about what is where, and what isn't. Even if all space reduces to a single pair of positions, occupiable by zero or one occupants, which is pretty minimal, an organism with that representational capability is going to be able to do essentially logical reasoning by merely examining its representations. Ah, my thing is here, not there, or there, not here; by responding differently in the two cases, it is carrying out operations which have all the character of logic: NOT, IMPLIES, AND, OR, XOR.
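To make the minimal case concrete, here is a toy sketch (my illustration, not part of the SRNN machinery) in which the only primitive is a presence test at one of two locations; the standard connectives fall out of comparing and responding to those tests:

```python
# Toy model: a "space" of two positions, occupied or not.
# The only primitive operation is a presence test at a location;
# the Boolean connectives are just patterns of differential response.

def present(world, place):
    """Primitive spatial perception: is my thing at this place?"""
    return place in world

def NOT(world, p):        return not present(world, p)
def AND(world, p, q):     return present(world, p) and present(world, q)
def OR(world, p, q):      return present(world, p) or present(world, q)
def XOR(world, p, q):     return present(world, p) != present(world, q)
def IMPLIES(world, p, q): return (not present(world, p)) or present(world, q)

world = {"here"}  # one thing in one of two locations: here, not there
```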
A consequence of this view is that symbolic representations are derivative; they multiply innumerably without adding meaning; their meaning is transmutable into spatiality. To those who consider linguistic representations Platonically otherworldly and particularly non-spatial: please have a look at Veatch 1991, on Time (and Space) in Linguistic Theory, which should put that to rest.
I think I've made this point elsewhere, but upon searching for Newton's original treatment of the Newton-Raphson method, I came across his Method of Fluxions and Infinite Series, in which the point is repeated by the translator:
The chief Principle, upon which this Method of Fluxions is here built, is this very simple one, taken from Rational Mechanicks; which is, that Mathematical Quantity, particularly Extension, may be conceived as generated by continued local Motion, and that all Quantities whatever, at least by analogy and accommodation, may be conceived as generated after a like manner. (Preface by John Colson, trans. 1736 of Method of Fluxions, by Isaac Newton, 1671.)

Motion = Space × Time. We organisms start with (temporal-)spatial representations, and then we reason on top of those. Thus mathematics, thus logic, thus symbolisms, thus our natural confidence in and self-persuasion by thinking itself -- because we do not question, but fully believe, what is spatially present to our senses and represented perceptually in our internal analog of space, whatever that may be; therefore, the things we read off of space we believe with equal certainty.
A pulse timed within a transmission frame's period represents a corresponding distance from the timer-generating node. Two copies of the pulse on a stereo transmission channel may be shifted in time to represent a positive or negative number similar to a degree of stereo left-vs-right shift, or bend angle \(b\). They may also be scaled relative to one another, with the difference or ratio (log difference) representing a twist angle \(\theta\), with equality representing that the bend remains within a reference plane, and inequality representing that the bend is in a plane rotated from the reference plane, around a reference direction or pole.

But before we go off the deep end with Twist and Bend, let us consider the more normal approaches, and ask why they might need improvement.

Cartesian coordinates? Given a center \(C=(cx,cy,cz)\) and radius \(r\), Cartesian coordinates for the points on the surface are in \(\{\ (x,y,z)\ |\ (x-cx)^2+(y-cy)^2+(z-cz)^2=r^2\ \}\). This is correct, general, and usable, but not the simplest, and not as easily transmitted as other options. Wouldn't you need to transmit three numbers, instead of two?
Both biological/empirical and theoretical arguments could be made for and against Twist-and-Bend encoding.

Geographical coordinates? This requires a sphere and two selected reference points on its surface to serve as North Pole and as Greenwich Observatory. Its center \(C\) and radius \(r\) are implicit; and the equator can be derived as \(\frac{\pi}{2}\) away from the pole in any direction.

Shannon's Information Theory would interpret this as follows: The actual number of bits of information reliably transmitted in the form of two scaled/shifted copies of a unit pulse is surely calculable and countable, based on the distinguishability-preserving density of packing of pulses into signals, and on the precision to which the receiving end is able to distinguish degrees of scaling and shifting (assuming that the sending end always has the ability to be more precise, which in a degrading-copies world means the greatest detail and precision always resides at the origin of the information). Receiver precision defines the bit rate. That number of bits could be used to transmit any arbitrary bits you might want, from passwords to PDFs, not to exclude sensory or control data, within this particular packaging. Shannon showed us that bits are fungible: from the perspective of a communication channel, what the bits mean doesn't matter; what matters is that every channel has a limited bit rate.
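As a concrete, if toy, version of the bit-counting: suppose (the numbers are mine, purely illustrative) a receiver can distinguish 64 delay steps and 32 relative-scale steps per pulse pair; the channel's per-pulse information content follows immediately:

```python
import math

# Illustrative receiver precision (assumed, not measured): how many
# shift positions and relative scalings it can reliably tell apart.
delay_levels = 64   # distinguishable time shifts within the frame (bend)
scale_levels = 32   # distinguishable relative scalings (twist)

# Each doubly-encoded pulse pair then carries this many fungible bits:
bits_per_pulse_pair = math.log2(delay_levels) + math.log2(scale_levels)
print(bits_per_pulse_pair)  # 11.0 bits, usable for coordinates or PDFs alike
```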
To apply this to a theoretical transmission of Cartesian coordinates within neural networks, first we would have to imagine that neurons carry out an arbitrary packaging and unpackaging of what are in some essential way Cartesian coordinates, in specific contrast to other representations, which I somehow doubt. Worms curve, joints rotate, and heads turn; they don't internally count \(x\) units rectilinearly to the right, \(y\) units up, and \(z\) units forward on a grid, unless they are carpenters with plans, although all of these may do something similar and transformably equivalent in rotations rather than rectilinear steps, as polar and spherical coordinate systems teach us. Second, a global coordinate system in which the \((x,y,z)\)'s are to be interpreted is not how organisms work. They have their own locally body-centered orientation frames; indeed, one for each part. Just as organisms must very generally and by bitter Darwinian logic maintain and prioritize self, so even cells can only operate within the spatial and informational bounds of their local dimensions and interactions. Self-oriented, selfish, limited-information subjectivity is a basic organizational principle and selectional constraint of life itself. So we can assume our representations are local and subjective, not global, not even out-of-body Cartesian coordinates.
And third, Cartesian coordinates make no use of known local constants (distances and local orientations) for the construction or interpretation of location information. Nor do they use the natural signal-modifying effects of transmission down neural links and across synapses, which themselves, naturally and without additional side processing, may carry out adjustments and shifts that correspond to local projections and transformations, such that the more centrally perceived locations are correctly (faithfully) placed in the centrally established local orientation frame, and the more peripherally actuated regions or locations naturally capture the information intended for, or corresponding to, their region's local spatial labors. To do all that in a minimum-bits Shannon framework would be, it seems to me, to seek out something very much like the Twist-and-Bend representation and interpretation system given here. So much for Shannon.
But outside these theoretical considerations of Shannon entropy, signal processing, and digital modems, there is the biological consideration, which has been my motivator, although you may think me more an engineer.
In DSPs, GPUs, and robots, sure, you can use 4x4 matrices to transform to and between Cartesian coordinate systems, and that would be quite efficient in a computer. As an engineer, I'd stop there. But as a scientist, with enough of an engineering bent to demand something that would actually work, I want to capture what nervous systems actually do, or at least might do. In an ostensibly biological model of a space-representing nervous system, we want to know what the actual neural signal encodings are (spike trains!), and we want to understand how those might carry and map to locally-interpretable location information, and how those are transmitted, combined or divided, and internally represented, for perception and actuation. Here I'm proposing a scheme which something similar to real neurons could imaginably, evolvably, use to encode, combine, transmit, and decode relative locations within and across connected local frames, with such simple and direct relationships between signal characteristics and source location information that a visual examination of the signals would let you read the locations directly out of them, or, imaginably, an auditory presentation of them, or a smoothed transform of them, would let you hear the locations.
No more or less than the Hebbian model underlying computational neural networks with backpropagation, or the CMAC or cerebellar model articulation controller, this is a biologically inspired guess at how things might work. It also has engineering potential for use as a foundation for effective robotic perception/action loops, by providing the "embedding" space for machine learning models which acquire skills that require representations of space (that is, all skills).

It prompts the hypothesis that audio signals are shaped by scale-and-shift adjustments:

- not only in stereo audition, in which a single source impulse goes in one ear but must travel around the other side of the head, thus arriving with delay, perceived as bend within the default orientation frame of the listener's head;
- but also (check this out!) by the wavefront-shaping effects of the pinna and its parts: helix, anti-helix, concha, tragus, earlobe. These have definite and oriented spatial periodicities which imply frequency-specific and source-direction-specific amplification effects, as the sloshing of air-density waves of different spatial and temporal frequencies is held up at a distance from the doorway just long enough to be added to the later-arriving adjacent bit of the same wave at the same frequency, producing an amplification of a certain frequency of sound coming from a source in a certain direction, as they meet in the middle to enter the ear canal. The highest of frequencies from a certain direction, front and to the side, might double in amplitude after a half-period delay holds them up across the tiny projecting tragus, that pointy bit which covers the entrance to the ear canal, so that the slosh around the tragus adds to the directly aimed adjacent bit of the same wave just a half or quarter inch to the side. This pattern repeats anatomically on the curlicued surface of the ever so strange, cat-unlike human ear, in at least three distinct frequencies and directions. This frequency-specific directional signal amplification effect can potentially explain the phenomenon of three-dimensional auditory perception. Sharp or percussive impulse sounds of wide frequency bandwidth and little amplitude modulation across frequencies, and even pulse trains like human speech, containing many frequencies added together, will arrive with a strange amplitude modulation depending on frequency and direction. In the case of human speech (dare I mention, Vowels), each pulse's harmonics will be adjusted in amplitude based on their frequency and their physical direction from that ear.
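The half-period arithmetic can be checked on the back of an envelope. Assuming a sound speed of 343 m/s (my assumption; the quarter- and half-inch detour lengths are the figures above), the frequency whose half-period equals the detour delay is:

```python
def reinforced_frequency_hz(detour_inches, c_m_per_s=343.0):
    """Frequency whose half-period equals the extra travel time around
    a pinna feature with the given path-length difference."""
    d = detour_inches * 0.0254          # inches -> meters
    delay = d / c_m_per_s               # extra travel time, seconds
    return 1.0 / (2.0 * delay)          # half-period == delay

for inches in (0.25, 0.5):
    khz = reinforced_frequency_hz(inches) / 1000.0
    print(f'{inches}" detour -> ~{khz:.1f} kHz reinforced')
```

A quarter inch lands around 27 kHz and a half inch around 13.5 kHz, which suggests, if these toy assumptions hold, that the audible effects come from the longer detours.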
Cauliflower-eared wrestlers are hereby predicted to have lost some part of their auditory directional-hearing precision, though not the basic stereo dimension; this offhand thought should be refined into a prediction of which frequencies and which directions, based on a physical model, and that should be empirically testable. For a scientist of the phenomenon of human speech, a phonetician, this is an exciting day.
Twist-and-bend won't be wrong unless both Shannon and biology reject it, and even if it's wrong, it won't be useless until robots can't make good use of it either.
Then two angles, a polar angle or longitude \(o\) such that \(-\pi \lt o \le \pi\) (let's use radians), and an azimuth angle or latitude \(a\) such that \(-\frac{\pi}{2}\le a\le \frac{\pi}{2}\), uniquely specify any given point on the surface as \((a,o)\). The meaning of \((a,o)\) is as follows: Take the plane through the pole, the center of the globe \(C\), and any point on the reference longitude, call it \(g\) for Greenwich; then cut the full plane into half-planes along the center-to-pole line, and call the \(g\) side the \(pCg\) half-plane. Next, take a second half-plane, also having its edge on the line from center to pole, but making an interior angle \(o\) \((-\pi \lt o \le \pi)\) with \(pCg\). Within this second half-plane travel \(-\frac{\pi}{2}\le a\le\frac{\pi}{2}\) around the sphere away from the equator and toward the pole (if negative, away from the pole).
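As a sanity check on that construction, here is a small sketch (my code; it assumes the pole and Greenwich directions are given as orthonormal unit vectors) mapping \((a,o)\) to a Cartesian point:

```python
import math

def add(p, q):   return tuple(x + y for x, y in zip(p, q))
def scale(p, s): return tuple(x * s for x in p)
def cross(p, q):
    return (p[1]*q[2] - p[2]*q[1],
            p[2]*q[0] - p[0]*q[2],
            p[0]*q[1] - p[1]*q[0])

def geo_point(C, r, pole, greenwich, o, a):
    """Point at longitude o, latitude a (radians) on the sphere with
    center C and radius r. pole and greenwich are assumed orthonormal."""
    east = cross(pole, greenwich)                 # completes the frame
    direction = add(scale(greenwich, math.cos(a) * math.cos(o)),
                    add(scale(east, math.cos(a) * math.sin(o)),
                        scale(pole, math.sin(a))))
    return add(C, scale(direction, r))
```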
An equivalent, slightly different encoding seems natural for our task of sending relative position as a time-domain signal, as must be done by neural network transmissions, which are themselves time-domain signals. This natural encoding would be similar to geographical coordinates, using twist (scale) and bend (shift) for polar angle and azimuth, except with zeroes appropriate to the interpretation of two equal-timed, equal-scaled pulses as indicating a location straight down the middle, neither twisted, nor bent. How, then?
Twist-and-Bend, which I'm using for this SRNN local position encoding, is essentially a re-centered geographical coordinate system, with Twist \(\theta\) defined as \(\theta ≜ o\) interpreted relative to the reference normal \(n\), and Bend \(b\) defined as \(b ≜ a+\frac{\pi}{2}\). So:
- Instead of a pole, we have a parent node location, \(u\).
- Instead of a center, we have a daughter node location, \(v\).
- Instead of a radius, we have a link \(uv\) and its length \(|uv|\).
- Instead of a longitude reference, or Greenwich, we are given an arbitrary reference direction, as a normal vector \(n_v\) at \(v\), where \(n_v \perp uv\).
- It may be convenient to consider \(n_v\) as a point on the equator, and for left-to-right readers to consider the globe as on its side, pole to the left, so that the branching of the SRNN tree from parent to daughters can be read as proceeding from left to right. Then \(n_v\) may also be arbitrarily located at the top of the local equator and referred to as "Up".
- \((\theta,b)\) is a Twist/Bend encoding of an arbitrary point, \(w\), on the sphere where:
  - The twist \(\theta\) \((-\pi\lt \theta\le \pi)\) is the plane angle between \(uvn_v\), the reference half-plane with edge on \(uv\) including \(n_v\), and \(uvw\), the \(w\)-including half-plane, also on \(uv\) but including \(w\).
  - The bend \(b\), \(0\le b\lt\pi\), can be expressed in different ways:
    - If \(b' = \angle uvw\) then \(b = \pi - b'\).
    - \(b\) is the "exterior angle" between \(uv\) and \(vw\).
    - If \(u'\) is the opposite pole from \(u\), where the continuation of \(uv\) intersects the sphere on the opposite side, then \(b\) is the interior angle between \(u'v\) and \(vw\).
    - \(b = \angle u'vw\)
(Incidentally, as \(b\)→0 or \(\pi\), \(\theta\) needs less precision.)
Twist/Bend covers all of space by supplementing the direction \((\theta,b)\) with \(r\) \((r\gt 0)\), the distance \(|vw|\) from the center to the point.
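Under the definitions above, a Twist-and-Bend triple can be decoded back into a Cartesian point, which is handy as a reference implementation for checking the recursive transforms that follow. This is my sketch; it assumes \(n_v\) is given as a unit vector perpendicular to \(uv\):

```python
import math

def sub(p, q):   return tuple(x - y for x, y in zip(p, q))
def add(p, q):   return tuple(x + y for x, y in zip(p, q))
def scale(p, s): return tuple(x * s for x in p)
def cross(p, q):
    return (p[1]*q[2] - p[2]*q[1],
            p[2]*q[0] - p[0]*q[2],
            p[0]*q[1] - p[1]*q[0])
def unit(p):
    n = math.sqrt(sum(x * x for x in p))
    return tuple(x / n for x in p)

def tb_point(u, v, n_v, theta, b, r):
    """Point w for Twist-and-Bend (theta, b, r) in the uv frame.
    b = 0 continues the uv line past v; b = pi points back toward u.
    Assumes n_v is a unit vector with n_v perpendicular to uv."""
    k = unit(sub(v, u))                  # link direction, u toward v and beyond
    e = cross(k, n_v)                    # completes the right-handed frame at v
    m = add(scale(n_v, math.cos(theta)), scale(e, math.sin(theta)))  # twist
    d = add(scale(k, math.cos(b)), scale(m, math.sin(b)))            # bend
    return add(v, scale(d, r))
```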
We will use Twist-and-Bend geometry to transform \((\theta,b,r)\) coordinates from daughter's frame of reference to parent's frame of reference, recursively from periphery towards the root of an SRNN tree.
But first let's describe the tree, the signals, and their origin and transmission; then it will make sense why we transform between each level of the tree.
Given bend/twist/distance information for each pulse originated at \(x\), arriving from node \(w\) to node \(v\) with parent \(u\), the transmission needs to transform the pulse information into a new bend/twist/distance encoding, which is that pulse's geometrical position relative to \(v\) instead of \(w\).
With such a re-encoding for all pulses coming upstream across all nodes such that they capture the central view, a rich spatial representation for all the sources is made available at the higher levels.
Then: each time-bounded signal \(s_u\) is a time-domain representation of space.
\(s_u\) is an echo-space.
Compare this with echoed sound impulse copies returning to an originating spatial location with delays and scales proportional to the distances and sizes of nearby sound-reflecting surfaces. That information is not enough, because the directionality or orientation of the returning impulse copies is lost.
But if, as in this system, a pulse can be associated with \(b_w,\theta_w,n_u\), then direction is retained.
Given \(n_v\) at each daughter node \(v\), \(n_v\perp uv\), then a twist \(\theta\) from \(n_v\) around \(uv\) (at \(v\)), and a bend \(b\) from 0° (\(uvw\) collinear, continuing past \(v\)) to 180° (\(w\) back in the direction of \(u\)), specifies the pulse-originating location in the reference frame of \(u\), looking at \(v\) with "up" in the direction \(n_v\). The lengths \(|uv|\) and \(|vP_w|\) must also be combined by the alignment function \(a\) so that the timing position within \(s_u\) correlates to the expanded, central-to-peripheral distance \(|uP_w|\).
I'll do the math here, and my expectation is that the local neural response to inputs follows the resulting formulas, but I make no imputation that the neurons do math, just that what they do, does: they receive, copy/transform, and send signals in some biochemical cascade which itself follows the formulas. Perhaps further insight may emerge here as to how and why.
Actually \(n_x\) will be \(n_y\) if \(x\) is not a direct daughter of \(w\) but instead there is a different node, call it \(y\), which is the immediate daughter of \(w\) on the path down the tree which ends at \(x\). It doesn't matter, so long as it is available to the calculations below, because the half-plane containing \(v,w,x\) is at the angle \(\theta_{vwx}\) away from \(n_x\). From \(vw\)'s perspective we need the downstream reference normal in order to interpret \(\theta\) relative to it, and it just seems a little simpler to call it \(n_x\), which, if \(x\) is a direct daughter, would indeed be correct. You can use \(n_y\) if you prefer.

All that is anatomical context, which the system can calculate on its own by introspection, as described above, and keep as local parameters within this Space Representing geometrical tree.
Now we go live.
At some time period, we are given a Twist-and-Bend representation \(wx\) for a leaf location \(x\) at a recipient node \(w\) within an orientation frame \(vwn_w\) namely \(wx = (\theta_{vwx},b_{vwx},|wx|)\).
Translating to a geographic representation, from \(vw\), we are looking through a single-lens camera on the surface of an imaginary and transparent earth, at the north pole at location \(v\), with the camera pointing down at the center of the earth \(w\), which is a radius \(|vw|\) away from us at \(v\) and with the top of the camera pointing in the direction \(n_w\), which is parallel to what we will call the same vector from the center of the earth toward some place on the equator.
Going live, we get a Twist-and-Bend encoded signal from \(w\) saying that \(w\) knows about a location \(x\) in its frame of reference, which we will find at \((\theta_{vwx},b_{vwx},|wx|)\) relative to \(w\) and \(n_w\). From our perspective, this location \(x\) is bent away from our center-of-view line \(vw\) at a bend angle \(b_{vwx}\) within a plane which is itself rotated away from our "Up" (camera top) direction \(n_w\) by a twist angle \(\theta_{vwx}\). Its distance on that twisted-bent vector rooted at \(w\) is \(|wx|\). Can you visualize it?
Then we re-set, moving back up the tree from \(v\) to \(w\), and we have to transform \(wx\) from the \(vw\) frame into a wider view at the parent node \(u\), namely into a new Twist-and-Bend representation, \(ux\) as interpreted from within the \(uv\) frame. So there's a new center of the earth now at \(v\), a new north pole vector from \(v\) to \(u\), and we have to find the adjusted values for \((\theta, b, r)\) from a higher-parent \(uv\)'s perspective.
First, does the problem make sense? We want to move the information up the tree, and if we can move one location up the tree one level, then we can do it repeatedly at every node and thereby move all the locations up the tree all the way, from the sensory periphery at the tree's terminals or leaves, to the root of the tree where central processing of spatial information can be carried out on rich, hopefully accurate, or accurate-enough-to-be-useful, sensory data. That's our job.
So this bit we are developing here is the recursive process that happens in going from each daughter node to each parent node with every signal transmitted upstream. If we can get it right, then the whole thing will be a miracle and we can make it memorize shapes, remember shapes, perceive known-or-expected shapes including from low-resolution information given high-resolution memories, we can plan responses to shapes, including changing trajectories of shapes over time. If we can do the reverse mapping, then we can have parallel networks for choreographed actuator control trajectories going from the central spatial-processor at the root of the tree, and branching down to the actuators, muscle-twitch controllers within the same anatomy, so as to produce learned movements appropriate to the perceptual information sent up the tree. Coordinated action, baby.
So I hope that makes sense. We want to transform the locally-transmitted geometrical information from a twist-and-bend representation at a daughter node, to a twist-and-bend representation of the same location information at a parent node.
Then the second question arises: Can we? I'm not sure, at this writing, but I think so.
To boil things down to clarity, we are given our local constants and the current Twist-and-Bend datum passed now from \(w\) to \(v\), being meaningful in the context of the frame \(vw\). Our task, now, is to find \((\theta_{uvx}, b_{uvx}, |vx|)\), which is a Twist-and-Bend encoding for \(x\) within the frame \(uv\).
\(b_{uvx}\) and \(\theta_{uvx}\) are not shown in the figure because it got too busy. I think you can see what I mean.
- \(wx = (\theta_{vwx}, b_{vwx}, |wx|)\) is given, being our Twist-and-Bend encoding for \(x\) at \(w\), received at \(v\).
- \(|uv|,n_v,|vw|,n_w,n_x\) are given by introspection, described above.
- \(n'_x\) was established at the previous level using its local constants and input signals (\(n'_w\) will be found at this level).
$$|vx|^2 = |vw|^2 + |wx|^2 + 2|vw||wx|\cos(b_{vwx})$$
by the law of cosines in the plane of the triangle \(vwx\); the sign is \(+\) rather than \(-\) because the bend \(b_{vwx}\) is the exterior angle, \(\pi - \angle vwx\), and \(\cos(\pi - \angle vwx) = -\cos(\angle vwx)\). That was shockingly easy.
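A quick numerical check of the sign convention (the specific values are my example), comparing the exterior-angle formula against a direct vector distance:

```python
import math

# v and w on the x-axis; x placed so that the exterior bend at w is
# 60 degrees (wx deviates 60 degrees from the continuation of vw).
v, w = (0.0, 0.0), (3.0, 0.0)
b = math.radians(60.0)
wx_len = 2.0
x = (w[0] + wx_len * math.cos(b), wx_len * math.sin(b))

direct = math.dist(v, x)
formula = math.sqrt(math.dist(v, w)**2 + wx_len**2
                    + 2 * math.dist(v, w) * wx_len * math.cos(b))
print(direct, formula)   # both equal sqrt(19)
```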
But in general \(u\) is not in the \(vwx\) plane, which will haunt us below, until we achieve clarity. Generally, we have to project \(n_x, n_w\) to \(n'_x, n'_w\) between the two equatorial circles around \(w\) defined by \(vw\), \(n_w\) and by \(wx\), \(n_x\). One is looking from parent \(v\) to child \(w\): a circle in the plane perpendicular to \(vw\), centered on \(w\) and containing \(n_w\). The other is looking from \(w\) down toward \(x\): a circle in the plane perpendicular to \(wx\), also centered on \(w\) but containing \(n_x\).
But the plane \(vwx\) is not only \(\theta_{vwx}\) away from \(n'_x\) at \(w\); it is also \(\theta_{vwx}\) plus the angle from \(n'_x\) to \(n_w\) away from \(n_w\) at \(w\). Adding that difference, the angle from \(n'_x\) to \(n_w\), to \(\theta_{vwx}\) gives us the plane angle of \(vwx\) in the perspective of \(vw\), which is a step up the tree.
That's fine, but \(n'_x\) is not the same as \(n_x\). \(n'_x\perp vw\) but \(n_x\perp wx\), so in general they are not the same. They would be the same if \(wx\) were along the line \(vw\), that is, with zero bend, \(b_{vwx}=0\). We need a projection from \(n_x\) on the circle around \(w\) perpendicular to \(wx\) on to a projected normal \(n'_x\) on the circle around \(w\) perpendicular to \(vw\).
If \(n_x\) were in the plane \(vwx\), then \(n'_x\) would be \(n_x\) rotated by \(b_{vwx}\). If \(n_x\) were perpendicular to the plane \(vwx\) (at \(w\)), then \(n'_x\) would coincide with \(n_x\). This is compatible with a rotation by \(b_{vwx}\) multiplied by a cosine of the angle from the plane \(vwx\) to the downstream normal \(n_x\), which is great, except we haven't specified the plane \(vwx\) until after we know \(wx\) in the frame \(vw\).
Or do we instead have \(n_x\) and \(n_w\) and \(\theta_{vwx}\) and \(b_{vwx}\) and need to project \(n_w\) onto \(n'_w\), so as to figure out \(b_{uvw}\), and \(\theta_{uvw}\)? I suspect that's the nugget of it.
But it's bedtime and it seems I have to dream about this to get more clarity, given my aging, fried, and underslept brain. 5:09am, good night..
Okay, I am still awake and I think I got a part of it.
Take the two equatorial equal-radius circles around \(v\) perpendicular to \(uv\) and \(vw\) respectively; call them \(Cu\) and \(Cw\). Notice that irrespective of their orientations, \(Cu\) and \(Cw\) intersect at two points (unless they are the same circle, in which case our problem is already solved, with \(n'_w = n_w\)). Pick one of the two points of intersection, either one, and call it Top. Now find the minimum angle from Top to \(n_w\) on \(Cw\); call that angle \(X\), and call its direction (tangent to \(Cw\) at Top) \(dTop_w\). Now go to Top again, which is also on \(Cu\). Choose the direction \(dTop_u\), tangent to \(Cu\) at Top and in the plane of \(Cu\), which is closest to \(dTop_w\) (that is, with \(dTop_u \cdot dTop_w > 0\)). Going around \(Cu\) in the direction of \(dTop_u\) will take you closest to \(n_w\). When \(Cu\) and \(Cw\) are at 90 degrees to each other, either direction will do equally well. Now travel from Top, starting in direction \(dTop_u\), around \(Cu\) for angle \(X\) times, maybe, \(\cos(b_{uvw})\) or something. That's your projection onto \(Cu\), call it \(n'_w\), of the daughter-link's reference normal vector \(n_w\). Okay, now 5:26, going to bed for real now..
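For what it's worth, one candidate formalization of that projection (my suggestion, not a transcription of the Top/dTop construction, and it leaves the \(\cos(b_{uvw})\) scaling question open): drop \(n_w\)'s component along the new axis and renormalize, which keeps \(n'_w\) as close as possible to \(n_w\) while making it perpendicular to \(uv\):

```python
import math

def sub(p, q):   return tuple(x - y for x, y in zip(p, q))
def dot(p, q):   return sum(x * y for x, y in zip(p, q))
def scale(p, s): return tuple(x * s for x in p)
def unit(p):
    n = math.sqrt(dot(p, p))
    return tuple(x / n for x in p)

def project_normal(n_w, u, v):
    """Candidate n'_w: remove n_w's component along uv, then renormalize.
    Degenerates when n_w is parallel to uv (flat would be zero)."""
    axis = unit(sub(v, u))
    flat = sub(n_w, scale(axis, dot(n_w, axis)))
    return unit(flat)
```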
In the Twist-and-Bend hypothesis, pulses are doubly represented, as in an echo-carrying stereo signal channel, with relative amplitude encoding Twist, and relative delay encoding Bend. Delay encodes Bend because a signal going around the outside of a curve will arrive later than its initially equal-timed copy travelling around the inside of the curve. Twist might or might not actually be encoded by relative amplitude differences between copies of the same original pulse, but it makes intuitive sense that a signal arriving more strongly on one side of a stereo channel would be more to-the-side, and if encoded rotationally rather than merely laterally, would represent a twist parameter.
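A toy codec for this hypothesis; the specific mappings (bend linearly to inter-channel delay, twist to the log of the amplitude ratio) and the constants are my assumptions for illustration, since the text only posits the pairings:

```python
import math

MAX_DELAY = 1e-3   # assumed inter-channel delay (s) at full bend b = pi
TWIST_GAIN = 0.5   # assumed log-amplitude units per radian of twist

def encode(theta, b):
    """(twist, bend) -> (relative delay, L/R amplitude ratio)."""
    delay = (b / math.pi) * MAX_DELAY      # later on the outside of the bend
    ratio = math.exp(TWIST_GAIN * theta)   # stronger on the twisted-toward side
    return delay, ratio

def decode(delay, ratio):
    """Invert encode: recover (twist, bend) from the stereo pulse pair."""
    b = (delay / MAX_DELAY) * math.pi
    theta = math.log(ratio) / TWIST_GAIN
    return theta, b
```

The point of the round trip is just that the two signal parameters carry the two angles independently, so a receiver with enough delay and amplitude precision can read the location back out.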
Then a transformation of this information, which represents original event location, moving from one perspective in space to a different perspective one level up in the tree, would allow spatial information to be sent up the tree for central processing and listening/viewing, preserving the relative locations of the original events.