The arguments in this book make use of a particular kind of inference, namely, that significant formant frequency differences reflect real differences in phonetic vowel quality. This section attempts to show that this kind of inference is valid. To do so, we must consider how ``same'' and ``different'' relate to each other in two realms, perceivable sound quality and acoustic measurement, as displayed in Table .
Two classes of sounds may be the same on both grounds (Same), or different on both grounds (Different). In these cases, measured differences reflect auditory differences, and overlapping measurements reflect auditory identity. These constitute the ideal situations, in which both ways of characterizing vowel quality are mutually consistent.
But there are also non-ideal cases. Two kinds of possible mismatch may occur between F1-F2 measurements and auditory quality. The measurements may be indistinguishable while the classes sound different (A), and the classes may sound the same while the measurements are different (B). In case A, we generally assume that other features of the sounds besides measurements of F1 and F2 at the nucleus allow them to be perceived as different. For example, measurements of the nuclei of bait and bit can be entirely overlapping, but one is longer and up-gliding, while the other is shorter and in-gliding. The result is that they sound completely different, although their nuclei may lie in the same part of F1-F2 space. Similarly, Lisker's pap and pep tokens may have differed in duration, gliding, etc.
Case B, in which the measurements are different but the sounds are perceived to be the same, comes in two flavors. In one flavor, the perceived sameness of the sounds is due to their phonological identity, not to their phonetic identity. When undergoing raising, /æ/ as in pat or pan commonly varies in natural speech in Northern U.S. dialects from [æ] to [e], depending on consonantal context. Native speakers of these dialects have learned to classify these different sounds as the ``same sound'', i.e., as the same phonological unit, despite the significant phonetic differences between them. But despite this linguistic effect in perception, by which different sounds are perceived as identical, the sounds remain phonetically different. Children, trained phoneticians, adults listening to the sounds taken out of a linguistic context, and speakers of dialects where the phonetic change has risen to popular consciousness and become stigmatized may all differentiate them. Thus the first flavor of the mismatch between perception and measurement is one in which perception is wrong: the sounds really are phonetically different, as the measurements show, but the linguistic system of the listener may inhibit the recognition of this fact. Here, the inference from ``acoustically different'' to ``phonetically different'' is valid, despite the protests of phonetically confused native speakers.
The second possible flavor of this mismatch is where the measurements are wrong: the sounds are measured as different, but they are actually auditorily indistinguishable. This situation can arise in two trivial ways, and apparently not otherwise. First, the measurements can simply be erroneous. If the measurement procedure leads to picking the wrong formant, for example, one vowel can appear as another. Thus if a nasal formant at 400 Hz is present as well as an F1 at 300 Hz, then [i], where (F1,F2) = (300,2200), can appear as [u], with (F1,F2) = (300,400), if these two resonances are chosen as F1 and F2. Such errors can be avoided by carefully matching the formants chosen with what is known about the relation between vowel quality and formant frequency. This makes formant-tracking an art rather than a science, in which the phonetician must go back and forth between looking at the spectrogram and formant-tracks and listening to the sound itself. Cases of bad formant tracking must be identified auditorily.
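The formant-picking error just described can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names `naive_pick`, `informed_pick` and `is_suspect` are hypothetical, the `expected` value stands for whatever (F1,F2) region the phonetician associates with the vowel actually heard, and the city-block distance in Hz is a crude stand-in for that judgment. It does not replace listening.

```python
from itertools import combinations


def naive_pick(peaks):
    """Take the two lowest spectral peaks as (F1, F2), as a careless tracker might."""
    lowest = sorted(peaks)[:2]
    return lowest[0], lowest[1]


def informed_pick(peaks, expected):
    """Choose the pair of peaks closest to the (F1, F2) expected for the vowel heard.

    `expected` is an assumption supplied by the analyst after listening;
    closeness is measured as a simple sum of absolute differences in Hz.
    """
    def distance(pair):
        return abs(pair[0] - expected[0]) + abs(pair[1] - expected[1])

    return min(combinations(sorted(peaks), 2), key=distance)


def is_suspect(peaks, expected):
    """Flag a token whose naive formant choice disagrees with the informed one."""
    return naive_pick(peaks) != informed_pick(peaks, expected)


# The example from the text: F1 at 300 Hz, a nasal formant at 400 Hz,
# and the true F2 of [i] at 2200 Hz.
peaks = [300, 400, 2200]
print(naive_pick(peaks))                    # the [u]-like mismeasurement
print(informed_pick(peaks, (300, 2200)))    # the pair matching the [i] that is heard
print(is_suspect(peaks, (300, 2200)))
```

A token flagged this way is only a candidate for re-inspection; the decision about which resonances are F1 and F2 still rests on the spectrogram and the ear.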
The second trivial source of the second flavor of mismatch between perceived and measured vowel quality derives from the inherently limited precision of both measurement and perception. If two measurements, expressed numerically to the nth degree, differ by less than the measurement error, then the small numerical difference between them is not significant. Likewise, the human perceptual system has a limited sensitivity to formant-frequency differences, not very different in size from the measurement error. Differences below this ``difference limen'' (Flanagan 1955) are insignificant.
The last conceivable situation in which this kind of mismatch could occur is one in which the mismatch is genuine, that is, in which the sounds really are auditorily identical despite non-erroneous, sufficiently large (greater than the difference limen), statistically significant differences in formant frequencies. I know of no good evidence that such cases occur. The most common class of evidence is that one can't hear a difference that one can perfectly well measure and see. But this can be attributed to a lack of phonetic sensitivity on the part of the individual listener, rather than to the lack of an actual phonetic difference. In fact, listeners with a low degree of phonetic sensitivity, who believe that two quite different sounds (that may be phonologically the same) sound the same, can often be convinced of the distinction by rapid interactive playback. One method for learning to perceive subtle differences is to make an interactive vowel chart. In an interactive vowel chart, tokens are displayed on a computer screen as labelled buttons located on an F1-F2 plot. When a button is activated by selecting it (with a mouse), the sound associated with the button is played out over a loudspeaker. Rapidly going back and forth between two or more tokens, playing them repeatedly and listening for the differences in quality between them, brings out the finest audible differences. Practice with interactive vowel charts can help develop listeners' phonetic sensitivity.
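An interactive vowel chart of this kind might be sketched as follows. This is a minimal illustration, not a description of any particular program: the token fields (`label`, `f1`, `f2`, `path`), the function names, and the `play` callback are all hypothetical, and audio playback is left to a caller-supplied function. The layout assumes the conventional chart orientation, with F2 decreasing from left to right and F1 increasing from top to bottom.

```python
def chart_position(f1, f2, f1_range, f2_range, width, height, margin=40):
    """Map (F1, F2) in Hz to canvas pixel coordinates.

    Assumed convention: high F2 on the left, low F1 at the top.
    """
    f1_lo, f1_hi = f1_range
    f2_lo, f2_hi = f2_range
    f1_span = (f1_hi - f1_lo) or 1.0   # avoid division by zero for a single token
    f2_span = (f2_hi - f2_lo) or 1.0
    x = margin + (f2_hi - f2) / f2_span * (width - 2 * margin)
    y = margin + (f1 - f1_lo) / f1_span * (height - 2 * margin)
    return round(x), round(y)


def build_chart(tokens, play, width=800, height=600):
    """Create a window with one button per token, placed on an F1-F2 plot.

    tokens: list of dicts with keys 'label', 'f1', 'f2', 'path' (hypothetical).
    play:   caller-supplied function that plays the sound file at a given path.
    """
    import tkinter as tk  # imported here so chart_position works without a display

    f1_values = [t["f1"] for t in tokens]
    f2_values = [t["f2"] for t in tokens]
    f1_range = (min(f1_values), max(f1_values))
    f2_range = (min(f2_values), max(f2_values))

    root = tk.Tk()
    root.title("Interactive vowel chart")
    canvas = tk.Canvas(root, width=width, height=height, bg="white")
    canvas.pack()
    for t in tokens:
        x, y = chart_position(t["f1"], t["f2"], f1_range, f2_range, width, height)
        # Bind the path as a default argument so each button plays its own token.
        button = tk.Button(canvas, text=t["label"],
                           command=lambda p=t["path"]: play(p))
        canvas.create_window(x, y, window=button)
    return root
```

Calling `build_chart(tokens, play).mainloop()` with a suitable `play` function would open the chart; clicking two neighbouring buttons in quick alternation gives the rapid comparison described above.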
True class-B mismatches do not seem to occur. For an F1-F2 difference between two sound classes to be judged genuine, it must be both statistically significant and greater than the threshold of audible differences (Flanagan 1955). When I find F1-F2 differences between sound classes that meet these criteria, I will infer that they reflect genuine, audible phonetic differences.
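The two-part criterion can be sketched in code. The sketch rests on assumptions that are not part of the text: significance is tested with a simple permutation test on class means (any appropriate test could be substituted), the difference limen is modelled as a fixed fraction of the formant frequency, and the value `LIMEN_FRACTION = 0.04` is a placeholder rather than a figure quoted here. All function names are hypothetical.

```python
import random
from statistics import mean

# Assumed relative difference limen; a placeholder, not a value from this text.
LIMEN_FRACTION = 0.04


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference between two class means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


def exceeds_limen(a, b, limen_fraction=LIMEN_FRACTION):
    """True if the difference in means is larger than the assumed limen."""
    reference = (mean(a) + mean(b)) / 2
    return abs(mean(a) - mean(b)) > limen_fraction * reference


def genuine_difference(class_a, class_b, alpha=0.05):
    """class_a, class_b: lists of (F1, F2) tuples in Hz.

    Returns True if at least one formant differs both significantly
    and by more than the assumed difference limen.
    """
    for i in (0, 1):  # 0 = F1, 1 = F2
        a = [token[i] for token in class_a]
        b = [token[i] for token in class_b]
        if permutation_p_value(a, b) < alpha and exceeds_limen(a, b):
            return True
    return False
```

With invented measurements, two classes whose F1 means lie 200 Hz apart would pass both tests, while two classes offset by only 2 Hz would fail the limen test however small the p-value.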