The linguistic and psychological significance of acoustic segments like those defined above must be explored empirically. The reason for supplying these definitions is that acoustic studies commonly investigate the properties of such segments, and it clarifies matters to define what we're talking about. This book, in particular, is about the properties of acoustic vowels. There is certainly information about vowel identity beyond the temporal limits of the acoustic vowel; whispered vowels can be identified, yet they do not constitute acoustic vowels; similarly, one can often identify vowels from the spectral information contained in acoustic consonant segments (for example, F2 is often clearly identifiable during adjacent turbulent noise segments). However, in phonetics, unlike in phonology, information is redundantly present. If a vowel's identity is signalled in many different ways, then a listener who is sensitive to all the redundant cues will be able to identify it more easily and confidently. Indeed, if some cues are effectively eliminated through environmental noise, a listener's poor hearing, etc., then it is only the redundancy in the signal that allows the sounds to be correctly identified and communication to function robustly.
The fact that information is contained in very short transitions into and out of a vowel does not imply that the ``nucleus'' of the acoustic vowel contributes nothing to perception. Similarly, the absence of an acoustic vowel corresponding to a phonological vowel in an utterance does not imply that there are no phonetic cues for listeners to make use of in identifying that phonological vowel. Many disparate sources of information about the underlying linguistic forms may be used to understand spoken language. The features may be ``distinctive'' or not, but as long as they are useful to listeners in constructing the linguistic form of an utterance, they are real, phonetic features.
This argument can be taken too far. It remains important to ask, What are the true cues? It is well and good to say that many redundant cues are present, but constructing the psychologically correct inventory of cues is another matter. Any phonetic measurement that can be made constitutes a potential member of this inventory, and many are clearly false members. For example, the formant frequencies during the thirteenth pitch period after voice-onset in a stop-vowel sequence constitute a potential cue for vowel identification. Indeed, that information may often be enough to identify vowels correctly, or certainly to reduce the set of possibilities. However, depending on rate, pitch, and other factors, there may be fewer than thirteen pitch pulses in the realization of a given vowel; or the thirteenth pitch pulse may still be in the onset transition. An infinity of such possible measurements is derivable from the acoustic vowel's formant trajectories and duration (e.g., the first pitch period, the second, the third, ...). The catalog of cues may be analytically reduced to a smaller set from which the larger set is derivable, and this reduction is an important part of the progress of speech science. There is a balance between a senseless proliferation of cues and an overly abstract view of speech perception and production: the catalog should include all the cues that make a perceptible difference, and exclude all measurements that can be derived from others.
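The thirteenth-pitch-period example can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a model proposed in the text: F0 is treated as constant, the F2 trajectory is idealized as a linear onset transition followed by a steady state, and every function name and numeric value is hypothetical.

```python
from typing import Optional


def pitch_period_count(duration_ms: int, f0_hz: int) -> int:
    """Complete pitch periods in a vowel, assuming a constant F0 (an idealization)."""
    return (duration_ms * f0_hz) // 1000


def f2_at_period(n: int, duration_ms: int, f0_hz: int,
                 onset_hz: float, target_hz: float,
                 transition_ms: float) -> Optional[float]:
    """F2 at the start of the nth pitch period (1-indexed).

    Hypothetical trajectory: F2 moves linearly from onset_hz to target_hz
    over transition_ms, then stays at target_hz. Returns None when the
    vowel contains fewer than n pitch periods.
    """
    if n > pitch_period_count(duration_ms, f0_hz):
        return None  # the measurement does not exist for this token
    t_ms = (n - 1) * 1000 / f0_hz  # time at which the nth period begins
    if t_ms >= transition_ms:
        return float(target_hz)  # already in the steady state
    # still inside the onset transition: interpolate linearly
    return onset_hz + (target_hz - onset_hz) * t_ms / transition_ms


# A 100 ms vowel at 100 Hz has only 10 pitch periods: no thirteenth period.
short_low = f2_at_period(13, 100, 100, 1200.0, 1800.0, 50.0)
# The same vowel at 200 Hz has 20 periods; the thirteenth begins at 60 ms.
steady = f2_at_period(13, 100, 200, 1200.0, 1800.0, 50.0)
# With a slower (80 ms) transition, the thirteenth period is still in transition.
in_transition = f2_at_period(13, 100, 200, 1200.0, 1800.0, 80.0)
```

Under these assumptions, the value at any numbered pitch period follows from the trajectory, the duration, and F0, which is the sense in which such a measurement is derivable rather than an independent member of the catalog.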
The question then arises whether patterns of formant frequencies during acoustic vowels constitute a legitimate part of the ultimate space of phonetic dimensions. If the mechanism of speech perception includes an F2-tracker, for example, it is possible that it is insensitive to rapid fluctuations in amplitude and to the noise source, and thus that it doesn't stop tracking F2 at the exact moment that voicing ceases (i.e., at the edges of acoustic vowels). Thus segmentation points that are acoustically well-defined may be irrelevant at various levels of the speech perception process. Similarly, in production, the ongoing gestures of tongue, jaw, and lips that give rise to the acoustic realization of a vowel may partly occur during segments that are unvoiced, and hence outside the acoustic vowel, because the glottal voicing gesture is absent. The articulatory gesture for a vowel may start and stop at points that are only indirectly related to the temporal bounds of the acoustic vowel. Thus the particular properties of acoustic vowels chosen for measurement and presentation here are not necessarily the ultimately correct cues.
However, in the process of learning about how speech production and perception work, we must make progress where we can, by investigating promising sources of information, and by exploiting the knowledge that we do have, in order to learn more. The patterns uncovered in an exploration of formant structure in acoustic vowels must be determined in some way by phonetic grammar. The aim here is not necessarily to resolve immediately the ultimate psychological questions about the true catalog of phonetic cues for speech perception, but to find interesting patterns in measurements that call out for explanation, to model them (by writing programs or rule-systems that generate similar patterns, for example), and ultimately to predict unobserved patterns from the understanding gained.