The central statistical problem of these studies is that of determining whether the distributions of two sound classes in F1-F2 space are the same or different. As argued earlier, different distributions in this acoustic space are evidence for some linguistic difference, whether phonetic or phonological. The problem of characterizing a difference as significant or insignificant is therefore crucial.

If two categories of data are normally distributed, then
well-understood analytical methods are available to test the
hypothesis that the two categories are different. Such methods exist
even for multivariate data, and even for log-normal distributions.
However, formant-frequency data is frequently *not* normally
distributed. For example, the distribution of // in the Chicago
chapter (page ) is quite non-normal, as any number of charts of
raw formant distributions show.
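One rough way to put a number on such departures from normality is to compute the sample skewness and excess kurtosis of a set of measurements, both of which are close to zero for a large normal sample. The following is a minimal sketch only, not a procedure used in these studies; the function name and the F1 values are hypothetical and made up for illustration.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population moment estimators.

    Both quantities are near 0 for a large sample from a normal distribution.
    Raises ZeroDivisionError if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical F1 measurements in Hz (invented for illustration only).
f1_hz = [520, 535, 540, 548, 555, 560, 572, 590, 640, 735]
skew, excess_kurtosis = skewness_and_excess_kurtosis(f1_hz)
print(f"skewness = {skew:.2f}, excess kurtosis = {excess_kurtosis:.2f}")
```

A clearly positive skewness, as with the long upper tail in these invented values, is one sign that normal-theory tests may be inappropriate. With samples this small the statistics are themselves quite variable, so they are best read alongside a plot of the raw distribution.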

For one-dimensional data, a test exists that addresses the question "Do two distributions have different means?" without depending on the normality assumption: the Wilcoxon test. But tests of this nature for multi-dimensional data are less well known (though see Maekawa 1989 for a two-dimensional t-test, not used here). I use two methods to deal with this problem. The first is a technique for displaying differences visually in a way that is easily interpreted. The second numerically estimates the statistical significance of the difference between two sets of measurements.
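For reference, the one-dimensional Wilcoxon test mentioned above can be sketched as follows. This is an illustration of the rank-sum form of the test applied to a single formant dimension, using the large-sample normal approximation with a tie correction and no continuity correction. It is not either of the two methods used in these studies, and the function names and F2 values are hypothetical.

```python
import math


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, z, p), where W is the sum of the ranks of x in the
    combined sample. The approximation is rough for very small samples.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = midranks(combined)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2.0
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        # All observations identical: no evidence of a difference.
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return w, z, p


# Hypothetical F2 measurements in Hz for two sound classes (invented).
class_a = [1810, 1850, 1875, 1902, 1930, 1964, 2001, 2040]
class_b = [1620, 1688, 1705, 1742, 1799, 1823, 1860, 1888]
w, z, p = rank_sum_test(class_a, class_b)
print(f"W = {w:.1f}, z = {z:.2f}, p = {p:.4f}")
```

Because the test operates on ranks, it makes no use of the normality assumption, but it handles only one dimension at a time. Applying it separately to F1 and F2 does not amount to a test of the difference in the two-dimensional space, which is the gap the two methods described here are meant to fill.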