So far, Rayleigh's rule has been applied to explain established facts. Let us now use it to predict new facts, which to my knowledge have not previously been observed. Consider the effect of articulating an apical obstruent at various locations between the antinode at the lips and the F2 node around or behind the palate. By the theory, the farther back the constriction, the greater the effect of F2 node constriction relative to that of antinode constriction, and thus the higher the balance point, or locus frequency, of the second formant. In a uniform tube, varying the constriction location in this way corresponds to the articulatory difference between the more posterior retroflex and alveolar obstruents and the more anterior dental obstruents.
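As a rough illustration of this node-antinode reasoning, the sketch below uses the standard small-perturbation approximation for a lossless uniform tube. The tube is closed at the glottis and open at the lips, with an assumed length of 17.5 cm. In that approximation, narrowing the tube where kinetic (velocity) energy exceeds potential (pressure) energy lowers a resonance, and narrowing it where potential energy dominates raises it. The function name `f2_sensitivity` and the tube length are illustrative assumptions, not part of the original study.

```python
import math

def f2_sensitivity(dist_from_lips_cm: float, tract_len_cm: float = 17.5) -> float:
    """Normalized F2 sensitivity to a small constriction in a uniform tube.

    Assumption: lossless uniform tube, closed at the glottis, open at the lips.
    Returns (kinetic - potential) energy density for the F2 mode, scaled to
    [-1, 1]. Positive: a constriction here lowers F2 (near the velocity
    antinode). Negative: a constriction here raises F2 (near the velocity node).
    """
    x = tract_len_cm - dist_from_lips_cm   # position measured from the glottis
    k = 3 * math.pi / (2 * tract_len_cm)   # wavenumber of the three-quarter-wave F2 mode
    velocity = math.sin(k * x)             # volume velocity: zero at the closed glottis
    pressure = math.cos(k * x)             # pressure: zero at the open lips
    return velocity ** 2 - pressure ** 2

# Constriction at the lips, one sixth of the way back, and one third of the way back.
for d in (0.0, 17.5 / 6, 17.5 / 3):
    print(f"{d:5.2f} cm from lips: {f2_sensitivity(d):+.2f}")
```

Under these assumptions the sensitivity falls from about +1 at the lips, through 0 one sixth of the way back (the balance point between node and antinode), to about -1 one third of the way back. So moving a constriction backward within this region shifts it from F2-lowering toward F2-raising.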
An experiment was carried out to explore the different effects of
various apical places of articulation on the formant transitions into
the consonant. The measurements are similar to those of Sussman
(1991).
Data and Method:
I produced voiced dental, alveolar, and retroflex stops in repetitive, equally stressed CV syllables, four to an utterance, varying the vowel among /iy, ey, I, , æ, , , ow, uw, U, /. One utterance, for example, was of the form ``dddd.'' Two sets of four repetitions of each CV combination were spoken and digitized (16 bits, 10 kHz), resulting in 2 sets × 4 repetitions × 3 places × 11 vowels = 264 tokens. Two signal-processing operations were performed: an RMS energy contour was calculated, to be used in locating the measurement points, and the signals were formant-tracked to measure formant frequencies.
The onset and nucleus time points were located by a nearly automatic procedure. The onset and offset of the acoustic vowel in each CV syllable were located automatically using an energy threshold: the vowel onset is the upward threshold crossing in the RMS energy contour, and the offset is the downward threshold crossing. The crucial choice of an effective threshold is made by examining the RMS energy values at the onsets and offsets of a number of syllables and picking a value intermediate between those for bursts and closures and those for vocalic segments. Occasional jitter back and forth across the threshold was eliminated by a ``sloppy-crossing'' algorithm, in which a ``true'' threshold crossing is one after which the signal stays on the other side of the threshold for some fixed time. The exact crossing location is linearly interpolated between frames, so that if frame t is just under the threshold and frame t+1 is far above it, the crossing location is closer to the center time of frame t than to that of frame t+1. This automatic segmentation not only saves labor but also yields more consistent results: the same criterion is applied throughout, whereas in hand segmentation the criterion may drift slightly from the beginning to the end of a lengthy measurement session.
The threshold-based onset and offset locations were then checked by eye against the waveform. With the chosen threshold, only 6 of the 264 tokens had even slightly unreasonable segmentation points; these were corrected by hand.
Onset formant frequencies were taken from the formant-track frame whose center time fell within the centisecond immediately following the onset location. Nucleus formant frequencies were taken from the frame halfway between the vowel onset and offset locations.
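The choice of measurement frames can be sketched in the same way. This sketch assumes frame center times in seconds. For the nucleus, it interprets "the frame halfway between" as the frame nearest the vowel midpoint. `measurement_frames` is a hypothetical helper, not the original analysis code.

```python
from typing import Optional, Sequence, Tuple

def measurement_frames(onset: float, offset: float, frame_times: Sequence[float],
                       window: float = 0.010) -> Tuple[Optional[int], int]:
    """Pick onset and nucleus frame indices for one token.

    Onset frame: the first frame whose center time lies in [onset, onset + window),
    i.e., within the centisecond after the vowel onset (None if no frame qualifies).
    Nucleus frame: the frame whose center time is closest to the vowel midpoint.
    """
    onset_idx = next((i for i, t in enumerate(frame_times)
                      if onset <= t < onset + window), None)
    midpoint = (onset + offset) / 2
    nucleus_idx = min(range(len(frame_times)),
                      key=lambda i: abs(frame_times[i] - midpoint))
    return onset_idx, nucleus_idx
```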
For each of the three categories (dental, alveolar, and retroflex), onset F2 is plotted against nucleus F2 in charts displaying F2 transitions, and a line was fitted to each class of tokens. The results are in Figure .
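The sketch below fits one class's locus equation and computes its Y=X-intercept. It assumes the fitted line is an ordinary least-squares regression of onset F2 on nucleus F2. `fit_locus_equation` is a hypothetical name, and the usage values are synthetic, not measurements from this experiment.

```python
from typing import Sequence, Tuple

def fit_locus_equation(nucleus_f2: Sequence[float],
                       onset_f2: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of onset F2 = slope * nucleus F2 + intercept.

    Also returns the Y=X-intercept (locus frequency), the nucleus F2 at which
    the fitted line crosses onset = nucleus. It is undefined when slope == 1.
    """
    n = len(nucleus_f2)
    mx = sum(nucleus_f2) / n
    my = sum(onset_f2) / n
    sxx = sum((x - mx) ** 2 for x in nucleus_f2)
    sxy = sum((x - mx) * (y - my) for x, y in zip(nucleus_f2, onset_f2))
    slope = sxy / sxx
    intercept = my - slope * mx
    if abs(1.0 - slope) < 1e-12:
        raise ValueError("slope is 1: the line is parallel to Y=X and has no locus")
    locus = intercept / (1.0 - slope)
    return slope, intercept, locus

# Synthetic tokens lying exactly on onset = 0.5 * nucleus + 900 (Hz).
nuc = [800, 1200, 1600, 2000, 2400]
on = [1300, 1500, 1700, 1900, 2100]
print(fit_locus_equation(nuc, on))  # → (0.5, 900.0, 1800.0)
```

With these synthetic values, nuclei below 1800 Hz have onsets above them and nuclei above 1800 Hz have onsets below them, which is the locus pattern described in the text.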
These graphs display transitions between the onset and the nucleus. They are similar to those found in Sussman (1991), following Lindblom (1963a). Let us consider how they may be interpreted.
If the onset is higher than the nucleus for a given token, that token lies above the Y=X line. If there is no transition, so that the onset frequency equals the nucleus frequency, the token lies on the Y=X line at its nucleus frequency. If the onset is lower than the nucleus, the token lies below the Y=X line. The magnitude of the transition equals the vertical distance of the token above or below the line. If, for a given consonant class, the onset were lower than the nucleus in every transition, independent of nucleus F2 frequency, then all the tokens would lie below the Y=X line. Bilabial consonants, with an onset F2 lower than the nucleus F2, should be arrayed in this way.
In these graphs, tokens with low-frequency F2 nuclei lie above the line (the onset is higher than the nucleus), while those with high-frequency F2 nuclei lie below it (the onset is lower than the nucleus). The point at which the regression line intersects the Y=X line, called the ``Y=X-intercept'' in the chart, is interpretable as the classic Haskins ``locus'' frequency. For nucleus F2 frequencies above it, onsets are lower than nuclei; below it, onsets are higher than nuclei; and exactly at the Y=X-intercept there is no transition.
The slope of the line shows the degree to which the locus theory applies to the data. If the line is horizontal (slope = 0), the onset starts at the same frequency no matter what the nucleus; this would occur if there were a fully realized locus from which all transitions begin. If the line has slope = 1 (that is, it is parallel to the Y=X line), all the onsets lie at a fixed direction and distance from the nucleus. For example, if F2 is 30 Hz lower at the release of a bilabial consonant than in the nucleus, independent of the nucleus F2 frequency, the fitted line in the transition graph would have slope 1 and lie 30 Hz below the Y=X line. Consonants whose locus equation has a slope between 0 and 1 have a ``virtual locus,'' which is not attained when the nucleus F2 is far from the locus frequency. The closer the slope is to 0, the more actual, and less virtual, the locus is; the closer the slope is to 1, the more the transitions follow a constant pattern, moving a relatively fixed distance in a fixed direction. For slopes greater than 1, the range of onset frequencies should exceed the range of nucleus frequencies, so that the transitions form a reversed fan that spreads out going backward from the vowel nucleus into the preceding consonant.
Consonants exhibiting such a pattern would actually have an anti-locus, a frequency from which formants move away in the transition from a vowel to a consonant. For example, if the nucleus is a little distant in frequency from the anti-locus, the onset is even farther away from it. Such a pattern would be a surprising finding, completely incompatible with a locus theory of consonant-vowel coarticulation.
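The interpretation of slope and intercept can be made explicit with a little algebra. Write the fitted locus equation as onset F2 $F_{\mathrm{on}}$ against nucleus F2 $F_{\mathrm{nuc}}$, with slope $k$ and vertical intercept $c$; these symbols are introduced here for illustration. The Y=X-intercept $F^{*}$ and the size of each transition then follow directly:

```latex
\begin{align}
F_{\mathrm{on}} &= k\,F_{\mathrm{nuc}} + c \\
F^{*} &= k\,F^{*} + c \;\Rightarrow\; F^{*} = \frac{c}{1-k}, \qquad k \neq 1 \\
F_{\mathrm{on}} - F_{\mathrm{nuc}} &= (k-1)\,F_{\mathrm{nuc}} + c = (k-1)\left(F_{\mathrm{nuc}} - F^{*}\right)
\end{align}
```

For $0 \le k < 1$ the factor $k-1$ is negative. The onset therefore lies above the nucleus when $F_{\mathrm{nuc}} < F^{*}$ and below it when $F_{\mathrm{nuc}} > F^{*}$; at $k = 0$ every onset equals $F^{*}$. For $k > 1$ the sign flips and the transition grows with distance from $F^{*}$, which is the anti-locus pattern. At $k = 1$ the transition is the constant $c$ (for example, $c = -30$ Hz in the bilabial illustration), and no locus exists.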
The three consonant classes examined here fall into two groups: alveolar and retroflex on the one hand, and dental on the other. The difference between the two groups is exactly as predicted by the node-antinode theory. A consonant whose constriction is closer to the lips and farther from the palate contributes more to the sum of antinode constrictions than to that of node constrictions. It should therefore have a lower F2 onset in general than a consonant whose constriction is farther back in the mouth, closer to the F2 node. Accordingly, the Y=X-intercept, i.e., the locus frequency, is lower for the dentals than for the alveolars and retroflexes, by about 160 Hz.
Another difference between the two classes is that the locus equation's slope is smaller (the line is closer to horizontal) for the retroflex and alveolar classes than for the dentals. By the interpretation above, the locus for the retroflex and alveolar consonants is closer to being fully realized than that for the dental consonants. Conversely, the dentals have F2 transitions that are more nearly constant in pattern across different nucleus F2 values. In short, there is less of a locus effect with dentals than with alveolar or retroflex consonants. This is to be expected if the alveolar and retroflex constrictions lie closer than the dental constriction to the midpoint between node and antinode. At that midpoint, a constriction makes a maximally balanced contribution to both the node-constriction and antinode-constriction sums, so the node-antinode balancing that constitutes the locus effect is maximal. As the constriction moves away from the midpoint toward the antinode at the lips, as for the dentals, the balancing is lessened and the constriction acts more purely as one kind or the other.
With the difference between the dentals and the other coronals studied here interpreted, a question remains: why are retroflex and alveolar consonants so similar? The slopes of their fitted lines are within 2 percent of each other, and their locus frequencies (Y=X-intercepts) are identical to three decimal places. This near-identity is quite puzzling, since the retroflex consonants were felt proprioceptively to be articulated farther back in the mouth than the alveolars, and they also sounded more retroflex. Other acoustic cues, such as F3 differences, may distinguish them.
This pattern correlates interestingly with the fact that speakers of Hindi, which has both dental and retroflex stops, classify English alveolars with their retroflex stops rather than with their dentals. The retroflex production of English coronal stops is a salient part of an Indian English accent. Perhaps native Hindi speakers learning English classify English alveolar stops as retroflex on the basis of this acoustic similarity.
However, the node-antinode theory does not predict that alveolar and retroflex stops will be acoustically identical unless the F2 node lies between the alveolar and retroflex places of articulation. Since in a uniform tube the F2 node lies one third of the way from the lips to the glottis, this seems unlikely. Perhaps constrictions elsewhere in the vocal tract counterbalanced the F2-raising effect of moving the constriction back toward the node.
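The one-third figure follows from the standing-wave pattern of F2 in a lossless uniform tube of length $L$, closed at the glottis and open at the lips. The derivation below interprets the node as the F2 volume-velocity node, which matches the velocity antinode at the open lips. Measure $x$ from the glottis and let $c$ be the speed of sound:

```latex
\begin{align}
F_2 &= \frac{3c}{4L}, \qquad k_2 = \frac{2\pi F_2}{c} = \frac{3\pi}{2L} \\
U(x) &\propto \sin(k_2 x) = 0, \quad 0 \le x \le L
  \;\Rightarrow\; k_2 x \in \{0, \pi\}
  \;\Rightarrow\; x \in \left\{0, \tfrac{2L}{3}\right\} \\
L - \tfrac{2L}{3} &= \tfrac{L}{3} \quad \text{(distance of the interior node from the lips)}
\end{align}
```

With assumed values $L = 17.5$ cm and $c = 35{,}000$ cm/s, $F_2 = 1500$ Hz and the node lies about 5.8 cm behind the lips.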