
st: RE: Simple way to calculate modality of non-normal variables

From   "Nick Cox" <>
To   <>
Subject   st: RE: Simple way to calculate modality of non-normal variables
Date   Fri, 11 Jul 2008 11:43:27 +0100

Your title may seem more naïve than your specific details imply. I doubt that there is any simple way to calculate, or even display, modality that is entirely problem-free. 

I suspect, although you do not say so, that your interest is in (approximately) continuous variables. Assessing modality necessarily has a different flavour for discrete variables than for continuous ones. 

I'll give a broader answer than I suspect you want or need, in the hope that it might interest, or provoke, people other than yourself. 

1. -tabulate- shows frequencies and can be used to identify modes. -modes- (-search modes, sj- for locations) is a tool more directly designed for the purpose. Naturally these commands treat values exactly as entered, are more appropriate for discrete variables than continuous, and are descriptive only. 
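For readers outside Stata, the frequency-based idea behind -tabulate- and -modes- can be sketched in a few lines of Python. This is a minimal illustration, not the interface of either command; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in sorted order.

    Values are compared exactly as entered, so this suits discrete data;
    with continuous data nearly every value is unique and all of them tie.
    """
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
```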

2. The classical statistician believes in multimodality (including bimodality and trimodality as special cases) only when 

(a) it is evident on histograms, or equivalent displays, with some jiggling of bin origin and width to guard against binning artefacts; 

(b) some measure of scepticism about apparent modes has been applied; and 

(c) some substantive knowledge implies that a mixture of different populations (sexes, rock types, whatever), or possibly just heterogeneity, is responsible for the multimodality in the first place. 

The classical statistician is not up-to-date but will usually get it right. 
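The histogram check in (a), with its jiggling of origin and width, can be mechanised crudely. The Python sketch below (all function names hypothetical) counts peaks in bin counts for several bin widths and several origins shifted by fractions of a width. A peak here is a run of equal counts higher than both neighbours, which is only one reasonable definition, and a peak count is no substitute for the scepticism in (b) or the substantive knowledge in (c).

```python
import math


def histogram_counts(values, origin, width):
    """Counts for bins [origin + j*width, origin + (j+1)*width) spanning the data."""
    lo = math.floor((min(values) - origin) / width)
    hi = math.floor((max(values) - origin) / width)
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[math.floor((v - origin) / width) - lo] += 1
    return counts


def count_peaks(counts):
    """Number of runs of equal counts strictly higher than both neighbours."""
    runs = [c for j, c in enumerate(counts) if j == 0 or c != counts[j - 1]]
    padded = [-1] + runs + [-1]  # edge bins are compared against a dummy -1
    return sum(1 for j in range(1, len(padded) - 1)
               if padded[j] > padded[j - 1] and padded[j] > padded[j + 1])


def peak_counts(values, widths, n_origins=4):
    """For each bin width, peak counts as the origin shifts by fractions of a width."""
    return {w: [count_peaks(histogram_counts(values, k * w / n_origins, w))
                for k in range(n_origins)]
            for w in widths}


print(peak_counts([1, 1, 1, 1, 9, 9, 9, 9], [1, 10]))
# {1: [2, 2, 2, 2], 10: [1, 1, 1, 1]}
```

The toy example shows bimodality at width 1 vanishing at width 10, a reminder that bin width alone can make or break apparent modes.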

3. The modern statistician behaves similarly, but with kernel density estimates and some jiggling of kernel type and width. Something that is quite easy, but seems rarely to be done, is to estimate densities on some transformed scale and then back-transform to the original scale. There is some discussion of this in 

SJ-4-1  gr0003  . . . . . . . . . . . . Speaking Stata: Graphing distributions
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/04   SJ 4(1):66--88                                   (no commands)
        a review of official and user-written commands for
        graphing univariate distributions; includes tricks
        beyond what is obviously and readily available

which is accessible via

For very skewed distributions, it may make sense to check that estimation on raw and transformed scales gives consistent indications of multimodality. 
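Estimating on a transformed scale and back-transforming needs only the change-of-variables rule: if Y = log X, then f_X(x) = f_Y(log x) / x. The Python sketch below assumes a Gaussian kernel, a log transform, and strictly positive data, and counts modes on a grid on the raw scale either way, so the two indications can be compared. Function names, bandwidths, and data are illustrative assumptions, not anything from -kdensity-.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return f(t), a Gaussian kernel density estimate with the given bandwidth."""
    n = len(data)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def f(t):
        return sum(math.exp(-0.5 * ((t - d) / bandwidth) ** 2) for d in data) / norm

    return f


def log_scale_kde(data, bandwidth):
    """Estimate the density of log(x), then back-transform to the original scale.

    If Y = log X, the change-of-variables rule gives f_X(x) = f_Y(log x) / x.
    Data must be strictly positive; the bandwidth is on the log scale.
    """
    f_log = gaussian_kde([math.log(d) for d in data], bandwidth)
    return lambda x: f_log(math.log(x)) / x


def count_modes(f, grid):
    """Count interior local maxima of f over an increasing grid of points."""
    y = [f(t) for t in grid]
    return sum(1 for j in range(1, len(y) - 1) if y[j - 1] < y[j] >= y[j + 1])


data = [0.9, 1.0, 1.1, 1.2, 9.5, 10.0, 10.5, 11.0]
grid = [0.1 * i for i in range(1, 151)]
raw = count_modes(gaussian_kde(data, 1.0), grid)
logged = count_modes(log_scale_kde(data, 0.3), grid)
print(raw, logged)  # 2 2
```

Here both scales agree on two modes; disagreement between them would be a signal to be cautious.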

4. Although it may seem perverse, I think it makes sense to look at the quantile function or the cumulative distribution function for multimodality, as well as at density or frequency representations. One argument is that either of the former is relatively free of binning artefacts. Another is that integration automatically smooths over some irregularities. A mode shows up as a shoulder on a quantile function, for example: the slope of the quantile function is the reciprocal of the density, so a region of high density appears as a relatively flat stretch. -qplot- and -distplot- (-search- for locations) are possibilities. 
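Because the slope of the quantile function is 1/density, flat stretches can be located crudely from spacings of order statistics. In this Python sketch (names and the half-window `k` are assumptions, not options of -qplot- or -distplot-), the slope at the i-th order statistic is estimated by (x[i+k] - x[i-k]) / (2k/n), and local minima of that slope are reported as candidate modes. With real data the spacings are noisy, so this is a prompt for looking at the plot, not a replacement for it.

```python
def quantile_slopes(values, k):
    """Estimate dQ/dp at each interior order statistic; small slope = high density."""
    x = sorted(values)
    n = len(x)
    return [(x[i], (x[i + k] - x[i - k]) / (2 * k / n)) for i in range(k, n - k)]


def flat_stretches(values, k=1):
    """Locations where the quantile function is locally flattest (candidate modes)."""
    runs = []  # collapse consecutive equal slopes into one run
    for loc, s in quantile_slopes(values, k):
        if runs and runs[-1][1] == s:
            runs[-1][0].append(loc)
        else:
            runs.append(([loc], s))
    found = []
    for j, (locs, s) in enumerate(runs):
        left = runs[j - 1][1] if j > 0 else float("inf")
        right = runs[j + 1][1] if j + 1 < len(runs) else float("inf")
        if s < left and s < right:
            found.append(locs[len(locs) // 2])  # middle of the flat run
    return found


data = list(range(10)) + [20, 30, 40] + list(range(50, 60))
print(flat_stretches(data, 1))  # [5, 55]
```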

Smoothing of quantiles can help too. See -hdquantile- on SSC for an implementation of the Harrell-Davis method. Distinguished Statalister Tony Lachenbruch did some work in a very similar spirit, although not necessarily with assessing modality as a motive. 
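-hdquantile- is the Stata route. To show what the Harrell-Davis estimator does, here is an independent Python sketch, not a port of that command: the p-th quantile is a weighted mean of all order statistics, with weights W_i = I(i/n) - I((i-1)/n), where I is the Beta((n + 1)p, (n + 1)(1 - p)) distribution function. The standard library lacks the regularized incomplete beta function, so it is computed here by the usual continued-fraction method; names are hypothetical and 0 < p < 1 is assumed.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step, then odd step, of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def harrell_davis(values, p):
    """Harrell-Davis estimate of the p-th quantile (0 < p < 1)."""
    x = sorted(values)
    n = len(x)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    weights = [betainc(a, b, i / n) - betainc(a, b, (i - 1) / n) for i in range(1, n + 1)]
    return sum(w * xi for w, xi in zip(weights, x))


print(harrell_davis([1, 2, 3, 4, 5], 0.5))  # close to 3, by symmetry
```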

5. The -hsmode- program on SSC gives what I think is a nice way of getting at modes, although I would say that. It won't help directly with assessing multimodality. 
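As its name suggests, -hsmode- works with half-sample modes. Assuming the usual definition (repeatedly keep the shortest interval containing half of the remaining sorted values until at most three remain, then settle on the closest pair), a Python sketch looks like this. The function name is hypothetical, and the tie-breaking rule (the first shortest half wins) is this sketch's own choice, which may differ from what -hsmode- does.

```python
def half_sample_mode(values):
    """Half-sample mode: repeatedly keep the shortest half of the sorted data."""
    x = sorted(values)
    if not x:
        raise ValueError("need at least one value")
    while len(x) > 3:
        h = (len(x) + 1) // 2  # half-sample size, ceil(n / 2)
        # start of the window of h consecutive values with the smallest range
        start = min(range(len(x) - h + 1), key=lambda i: x[i + h - 1] - x[i])
        x = x[start:start + h]
    if len(x) == 3:
        left, right = x[1] - x[0], x[2] - x[1]
        if left < right:
            return (x[0] + x[1]) / 2
        if right < left:
            return (x[1] + x[2]) / 2
        return x[1]
    return sum(x) / len(x)  # one or two values left


print(half_sample_mode([1, 2, 2, 2, 3, 10, 20]))  # 2.0
```

As the text says, this yields a single location, not an assessment of multimodality.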

6. -group1d- on SSC might be used to look for clusters in order statistics. 
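One way to look for clusters in order statistics is optimal one-dimensional grouping: split the sorted values into k contiguous groups so that the total within-group sum of squares is as small as possible, which dynamic programming solves exactly. Whether -group1d- uses precisely this criterion is an assumption here; the Python below is an independent sketch with hypothetical names, taking O(k n^2) time, and k must be chosen by the user.

```python
def optimal_groups(values, k):
    """Split sorted values into k contiguous groups minimising within-group SS."""
    x = sorted(values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # prefix sums give the sum of squares of any run x[i:j] in O(1)
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def ss(i, j):
        """Within-group sum of squares of x[i:j] (requires i < j)."""
        total = s[j] - s[i]
        return s2[j] - s2[i] - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]  # cost[g][j]: best SS for x[:j] in g groups
    start = [[0] * (n + 1) for _ in range(k + 1)]   # where the last group begins
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = cost[g - 1][i] + ss(i, j)
                if c < cost[g][j]:
                    cost[g][j], start[g][j] = c, i
    groups, j = [], n
    for g in range(k, 0, -1):  # walk back through the recorded split points
        i = start[g][j]
        groups.append(x[i:j])
        j = i
    return groups[::-1]


print(optimal_groups([9.2, 1, 5.1, 1.2, 9, 1.1, 5], 3))
# [[1, 1.1, 1.2], [5, 5.1], [9, 9.2]]
```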

7. I distinguish between granularity, often reflecting resolution of measurement, digit preferences, etc., and multimodality. If a criterion is needed, it is that granularity disappears or is much reduced on mild smoothing, whereas multimodality becomes clearer. 
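The criterion above can be tried out by counting the modes of a kernel density estimate as the bandwidth increases. In the Python sketch below (Gaussian kernel, hypothetical names, illustrative bandwidths), integer-valued data give a spike at every distinct value when the bandwidth is very small. With mild smoothing the spikes vanish for a granular but unimodal sample, whereas two genuine clusters remain as two modes.

```python
import math


def kde_mode_count(data, bandwidth, grid):
    """Interior local maxima of a Gaussian KDE on a grid.

    The normalising constant is omitted because it does not move the modes.
    """
    def f(t):
        return sum(math.exp(-0.5 * ((t - d) / bandwidth) ** 2) for d in data)

    y = [f(t) for t in grid]
    return sum(1 for j in range(1, len(y) - 1) if y[j - 1] < y[j] >= y[j + 1])


def mode_counts_by_bandwidth(data, bandwidths, step=0.05, pad=3.0):
    """Map each bandwidth to a mode count, on a grid padded beyond the data range."""
    lo, hi = min(data) - pad, max(data) + pad
    grid = [lo + j * step for j in range(int((hi - lo) / step) + 1)]
    return {h: kde_mode_count(data, h, grid) for h in bandwidths}


granular = [1] + [2] * 3 + [3] * 5 + [4] * 3 + [5]  # unimodal, recorded as integers
bimodal = [1, 2, 2, 3, 8, 9, 9, 10]                  # two clusters, also integers
print(mode_counts_by_bandwidth(granular, [0.1, 1.0]))  # {0.1: 5, 1.0: 1}
print(mode_counts_by_bandwidth(bimodal, [0.1, 1.0]))   # {0.1: 6, 1.0: 2}
```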

However, I don't know of any Stata implementations of the dip test, Silverman's test, or the excess mass test, which were what you specifically asked about. 
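To give a flavour of one of those tests, here is a rough Python sketch of Silverman's test of H0: at most k modes. It finds the critical bandwidth, the smallest Gaussian-kernel bandwidth at which the density estimate has at most k modes (bisection is valid because, for the Gaussian kernel, the number of modes cannot increase as the bandwidth grows), then draws smoothed bootstrap samples at that bandwidth and reports the fraction showing more than k modes. The grid-based mode count, the variance rescaling of bootstrap samples, and all names are this sketch's own simplified choices, not a Stata implementation. The grid must be fine relative to the bandwidths explored and wide enough to cover the data and the bootstrap samples, and the data must not all be equal.

```python
import math
import random


def n_modes(data, h, grid):
    """Interior local maxima of an (unnormalised) Gaussian KDE evaluated on a grid."""
    y = [sum(math.exp(-0.5 * ((t - d) / h) ** 2) for d in data) for t in grid]
    return sum(1 for j in range(1, len(y) - 1) if y[j - 1] < y[j] >= y[j + 1])


def critical_bandwidth(data, k, grid, iters=40):
    """Smallest h (to bisection accuracy) whose KDE has at most k modes."""
    lo = 2 * (grid[1] - grid[0])  # below this the grid cannot resolve the KDE
    hi = (max(data) - min(data)) or 1.0
    while n_modes(data, hi, grid) > k:  # make sure hi is large enough
        hi *= 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if n_modes(data, mid, grid) <= k:
            hi = mid
        else:
            lo = mid
    return hi


def silverman_pvalue(data, k, grid, n_boot=200, seed=12345):
    """Smoothed-bootstrap p-value for H0: the density has at most k modes."""
    rng = random.Random(seed)
    h = critical_bandwidth(data, k, grid)
    n = len(data)
    mean = sum(data) / n
    var = sum((d - mean) ** 2 for d in data) / n
    shrink = 1.0 / math.sqrt(1.0 + h * h / var)  # keeps bootstrap variance close to var
    exceed = 0
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in range(n)]
        centre = sum(resample) / n
        sample = [centre + shrink * (x - centre + h * rng.gauss(0.0, 1.0)) for x in resample]
        if n_modes(sample, h, grid) > k:
            exceed += 1
    return exceed / n_boot


data = [0.1 * i for i in range(10)] + [10 + 0.1 * i for i in range(10)]
grid = [-5 + 0.1 * j for j in range(210)]
h1 = critical_bandwidth(data, 1, grid)
p = silverman_pvalue(data, 1, grid, n_boot=20)
print(h1, p)
```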


Jordan Silberman

I'm relatively new to Stata--I apologize in advance for my Stata  

I'm struggling to find a way to have Stata calculate the number of  
modes in a non-normally distributed variable.

I've searched for a way to calculate this with procedures like the Dip-test, Silverman's test, Excess Mass test, etc., and I can't seem to find a straightforward way to simply calculate the number of modes in a single non-normal variable.
