



st: RE: Adding normal density to overlayed histograms


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Adding normal density to overlayed histograms
Date   Thu, 21 Oct 2010 13:09:22 +0100

Michael Mitchell and Ulrich Kohler explained what is going on in Stata terms and gave excellent and essentially identical solutions to the problem posed. Here I broaden the discussion. 
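For readers who have not seen those replies, here is a sketch of one workaround for the problem posed (quoted at the end of this message). It is not necessarily the code Michael and Ulrich posted. Inside -addplot()-, -hist- is parsed as -twoway histogram-, which has no -normal- option, so the normal curves can instead be drawn with -twoway function- and -normalden()-. The auto data, with -mpg- for domestic and foreign cars, stand in for the original -x- and -x2-; that choice, the bin settings and the colours are all illustrative assumptions. Run it from a do-file, as the /// continuation lines do not work interactively.

```stata
* Sketch: two overlaid histograms, each with its own normal density curve
sysuse auto, clear

* Mean and SD of each group, saved in locals for the density curves
summarize mpg if foreign == 0
local m0 = r(mean)
local s0 = r(sd)
summarize mpg if foreign == 1
local m1 = r(mean)
local s1 = r(sd)

* twoway histogram shows densities by default, so normalden() is on the same scale
* fcolor(none) leaves bars unfilled, which reduces occlusion of one by the other
twoway (histogram mpg if foreign == 0, width(2) start(10) fcolor(none) lcolor(blue)) ///
       (histogram mpg if foreign == 1, width(2) start(10) fcolor(none) lcolor(red))  ///
       (function y = normalden(x, `m0', `s0'), range(10 45) lcolor(blue))            ///
       (function y = normalden(x, `m1', `s1'), range(10 45) lcolor(red)),            ///
       legend(order(1 "Domestic" 2 "Foreign" 3 "Normal, domestic" 4 "Normal, foreign"))
```

The same layers could be passed to -histogram- through -addplot()-, but writing everything as one -twoway- call keeps the bin settings for both groups visible in one place.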

A histogram has some advantages and some disadvantages. This list is a personal take and naturally not intended to be definitive or complete: 

+1. It is likely to seem familiar to analyst and audience. 

+2. People can focus on modes and on the left and right tails. 

-1. When histograms are overlaid, one can easily occlude part of the other unless you do a lot of work. 

-2. More generally, the result can easily look a bit of a mess. 

-3. Histograms depend on choices about bin width and bin starts, even if those choices are automated; such choices can be hard to optimise. 

-4. Linked to that, you can lose detail that might be important. 

-5. If the normal is a reference, the comparison is of a curve with a set of bars, which is not the easiest comparison to get right. (Sometimes, the graph is a propaganda graph presented in the spirit "Look, it's roughly normal", when a more critical look would show important features, such as heavier tails or a mild outlier.)  
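Point -3 is easy to check for yourself. A minimal sketch, again with the auto data as a stand-in: the same variable drawn with three binning choices, side by side. The particular widths, starts and graph names are arbitrary illustrations; note that -start()- may not exceed the minimum of the data (12 for -mpg-).

```stata
sysuse auto, clear
* Same data, three binning choices; only width() and start() change
histogram mpg, width(2) start(10) normal name(w2, replace)
histogram mpg, width(5) start(10) normal name(w5, replace)
histogram mpg, width(5) start(7.5) normal name(w5s, replace)
* Put the three versions next to each other for comparison
graph combine w2 w5 w5s, rows(1)
```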

Now, in terms of alternatives: 

I mention first -histogram, by() normal-, which eases some of the problems, notably occlusion, by giving each group its own panel. 
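As a concrete instance, with the auto data standing in for the original variables (an assumption made only so the example runs):

```stata
sysuse auto, clear
* One panel per group, each with a normal curve based on that group's mean and SD
histogram mpg, by(foreign) normal
```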

A very different approach is to use quantile-quantile plots. Stata's own -qnorm- is very limited (one variable, one group), but it is easy enough 

(a) to do it yourself or 
(b) to exploit user-written programs. 

On (a), see 

SJ-7-2  gr0027  . .  Stata tip 47: Quantile-quantile plots without programming
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/07   SJ 7(2):275--279                                 (no commands)
        tip on producing various quantile-quantile (Q-Q) plots

The .pdf of that short paper is accessible to all via 

http://www.stata-journal.com/sjpdf.html?articlenum=gr0027

so I'll not repeat the exposition, other than to underline that the first worked example is precisely that raised in this posting, two groups and whether they are normally distributed. 
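For a quick taste of the do-it-yourself approach, here is a minimal sketch, assuming the auto data and plotting positions (i - 0.5)/n. It illustrates the idea and is not necessarily the code in the tip. Within each group, the sorted values are plotted against standard normal quantiles, so points lying roughly on a straight line suggest a roughly normal distribution. It also assumes no missing values in the variable, since _N counts all observations in each group.

```stata
sysuse auto, clear
* Sort within each group and turn ranks into plotting positions (i - 0.5)/n
bysort foreign (mpg): generate double pp = (_n - 0.5) / _N
* Standard normal quantile for each plotting position
generate double z = invnormal(pp)
* One panel per group; a roughly straight pattern suggests a roughly normal shape
scatter mpg z, by(foreign) xtitle("Standard normal quantile")
```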

On (b), -qplot- offers one-liners such as 

. qplot mpg, over(foreign) trscale(invnormal(@)) 

Use -search qplot, sj- to find the relevant publications and download sources. 

Nick 
n.j.cox@durham.ac.uk 

Dorothy Bridges

I am overlaying two histograms and would like Stata to add a normal
density curve for each.

hist x, normal addplot(hist x2)

works fine, but

hist x, normal addplot(hist x2, normal)

tells me that normal is not an option.  Any ideas as to why this is happening?



