Statalist



st: update hangroot


From   Maarten Buis <maartenbuis@yahoo.co.uk>
To   stata list <statalist@hsphsun2.harvard.edu>
Subject   st: update hangroot
Date   Fri, 30 Nov 2007 17:58:25 +0000 (GMT)

Thanks to Kit, an update of the -hangroot- package is now available from
-ssc-. This update adds two new features: the option to add confidence
intervals and support for additional distributions. To update, type
-ssc install hangroot, replace- or -adoupdate, update-.

-hangroot- creates a hanging rootogram (Tukey 1977, Wainer 1974), which
compares a theoretical distribution with an empirical one by "hanging"
the histogram from the theoretical distribution instead of "standing"
it on the x-axis. This way, deviations are shown as deviations from a
horizontal line (y=0) instead of deviations from a curve (the density
curve), which makes it easier to spot patterns in the deviations. Also,
the y-axis is scaled as the square root of the frequency instead of the
frequency itself, so that deviations in the tails show up more clearly.
To quote one happy user: "-hangroot- is really fun to play with".
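To make the "hanging" concrete, here is a minimal sketch in Python. Python is used only for illustration; -hangroot- itself is a Stata command and this is not its code. The sketch assumes you already have the observed and expected (theoretical) frequency for each bin, and the function name hanging_rootogram is hypothetical. Each bar hangs from the square root of the expected frequency and extends downward by the square root of the observed frequency, so a bar whose bottom lands exactly on y=0 matches the theoretical distribution in that bin.

```python
import math


def hanging_rootogram(observed, expected):
    """Return a (top, bottom) pair for each bar of a hanging rootogram.

    Each bar hangs from sqrt(expected) and extends downward by
    sqrt(observed); a bottom of exactly 0 means observed == expected.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    bars = []
    for obs, exp in zip(observed, expected):
        top = math.sqrt(exp)           # the theoretical curve, on the root scale
        bottom = top - math.sqrt(obs)  # where the hanging bar ends
        bars.append((top, bottom))
    return bars


# Hypothetical frequencies for three bins, purely for illustration
for top, bottom in hanging_rootogram([9, 36, 25], [16, 25, 25]):
    print(f"top = {top:.1f}, bottom = {bottom:.1f}")
# top = 4.0, bottom = 1.0
# top = 5.0, bottom = -1.0
# top = 5.0, bottom = 0.0
```

A bottom above zero means the bin holds fewer observations than the theory predicts; a bottom below zero means it holds more.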

This update makes it possible to add confidence intervals around the
bottoms of the bars. These confidence intervals assume that the number
of observations in a bin follows a multinomial distribution, and use
Goodman's (1965) approximation of the simultaneous confidence intervals.
They do not take into account that the parameters of the theoretical
distribution are also estimated. Nor do they take into account that
nearby bins are likely to be similar, as was suggested by Vermeesch
(2005). However, I would consider this latter point a feature, as it
corresponds with the simple non-parametric logic behind the histogram
and the (hanging) rootogram.
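A minimal Python sketch of Goodman's (1965) simultaneous intervals for multinomial proportions follows. With k bins, N observations in total and n_i observations in bin i, it uses A, the upper alpha/k quantile of a chi-squared distribution with 1 degree of freedom, and the bounds (A + 2n_i +/- sqrt(A(A + 4n_i(N - n_i)/N))) / (2(N + A)). The mapping onto the bar bottoms (scale the proportion bounds by N, then take sqrt(expected) - sqrt(count)) is an assumed, natural approach and not a description of -hangroot-'s internals; the function names are hypothetical. Like the intervals described above, the sketch ignores both parameter estimation and the similarity of neighbouring bins.

```python
import math
from statistics import NormalDist


def goodman_intervals(counts, alpha=0.05):
    """Goodman's (1965) simultaneous confidence intervals for multinomial
    proportions; returns one (lower, upper) pair per cell."""
    k = len(counts)
    n_total = sum(counts)
    if k < 2 or n_total <= 0 or not 0 < alpha < 1:
        raise ValueError("need >= 2 cells, a positive total and 0 < alpha < 1")
    # A: upper alpha/k quantile of chi-squared(1), i.e. the squared
    # standard normal quantile at 1 - alpha / (2k)
    a = NormalDist().inv_cdf(1 - alpha / (2 * k)) ** 2
    denom = 2 * (n_total + a)
    intervals = []
    for n_i in counts:
        centre = a + 2 * n_i
        half = math.sqrt(a * (a + 4 * n_i * (n_total - n_i) / n_total))
        # clamp to [0, 1] to guard against floating-point round-off
        intervals.append((max(0.0, (centre - half) / denom),
                          min(1.0, (centre + half) / denom)))
    return intervals


def bottom_intervals(counts, expected, alpha=0.05):
    """Assumed mapping onto a hanging rootogram: scale each proportion
    bound by N and express it as sqrt(expected) - sqrt(count)."""
    n_total = sum(counts)
    result = []
    for (low, high), exp in zip(goodman_intervals(counts, alpha), expected):
        top = math.sqrt(exp)
        # a larger count pushes the bottom further down, so the bounds swap
        result.append((top - math.sqrt(n_total * high),
                       top - math.sqrt(n_total * low)))
    return result
```

For example, goodman_intervals([50, 50]) gives roughly (0.39, 0.61) for each cell. Because the alpha/k adjustment accounts for all bins at once, these intervals are wider than separate per-bin intervals would be.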

This update also increases the number of supported theoretical
distributions. In addition to the normal (Gaussian), beta, Pareto, and
Poisson distributions that were already implemented, -hangroot- now
also supports the exponential, Laplace, uniform, and geometric
distributions.
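Supporting a distribution essentially means being able to compute the expected frequency in each bin. The Python sketch below shows one way to do that for the newly added distributions, assuming the parameters are already known: for the continuous ones (exponential, Laplace, uniform) the expected count in a bin is N times the difference in the cumulative distribution function at the bin edges, and for the geometric it is N times the probability of each count. The geometric parameterization (number of failures before the first success) and all function names are assumptions; -hangroot- may parameterize or bin differently, for example by evaluating the density at bin midpoints.

```python
import math


def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate (1 / mean)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def laplace_cdf(x, mu, scale):
    """CDF of the Laplace distribution with location mu and scale > 0."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / scale)
    return 1.0 - 0.5 * math.exp(-(x - mu) / scale)


def uniform_cdf(x, low, high):
    """CDF of the continuous uniform distribution on [low, high]."""
    return min(1.0, max(0.0, (x - low) / (high - low)))


def expected_bin_counts(edges, cdf, n_obs):
    """Expected frequency in each bin between consecutive edges:
    n_obs * (F(upper edge) - F(lower edge))."""
    return [n_obs * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges[:-1], edges[1:])]


def geometric_expected_counts(max_k, p, n_obs):
    """Expected frequency of k = 0, ..., max_k under a geometric distribution
    counting failures before the first success (assumed parameterization)."""
    return [n_obs * (1 - p) ** k * p for k in range(max_k + 1)]


# Illustrative calls with made-up parameters
print(expected_bin_counts([0, 1, 2, 3, 4], lambda x: uniform_cdf(x, 0, 4), 100))
# [25.0, 25.0, 25.0, 25.0]
print(geometric_expected_counts(2, 0.5, 8))
# [4.0, 2.0, 1.0]
```

Passing the CDF as a function keeps the binning logic the same for every continuous distribution, so adding another one only requires its CDF.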

Two example datasets are available. One file records the number of
deaths due to being kicked by horses in 14 Prussian cavalry units
between 1875 and 1894, as collected by Bortkiewicz (1898). This closely
follows a Poisson distribution. The second file records the proportions
of Dutch city budgets spent on different categories in 2005. These are
not too far from a beta distribution. Use the -net get- command to get
these files.
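For count data such as the horse-kick deaths, a Poisson fit can be sketched in Python as follows: estimate lambda by the sample mean (its maximum-likelihood estimate) and compare observed with expected frequencies for each count. The function name and the example values are hypothetical (they are not the Bortkiewicz data), and this is not necessarily how -hangroot- fits the Poisson distribution.

```python
import math


def poisson_fit(values):
    """Fit a Poisson distribution by maximum likelihood (lambda = sample mean)
    and return lambda with the observed and expected frequencies of the
    counts 0, 1, ..., max(values). values must be non-negative integers."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    lam = sum(values) / n
    max_k = max(values)
    observed = [values.count(k) for k in range(max_k + 1)]
    expected = [n * math.exp(-lam) * lam ** k / math.factorial(k)
                for k in range(max_k + 1)]
    return lam, observed, expected


# Hypothetical counts (deaths per unit-year), not the Bortkiewicz data
lam, observed, expected = poisson_fit([0, 0, 1, 1, 2, 0])
print(lam, observed, [round(e, 2) for e in expected])
```

The expected frequencies stop at the largest observed count, so they sum to less than N; the remainder is the expected frequency in the upper tail.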

-- Maarten

References

    Bortkiewicz, L. (1898), Das Gesetz der kleinen Zahlen. Teubner,
Leipzig.

    Goodman, Leo A. (1965), "On Simultaneous Confidence Intervals for
Multinomial Proportions". Technometrics, 7(2), pp. 247-254.

    Tukey, John W. (1977), Exploratory Data Analysis. Addison-Wesley.

    Vermeesch, Pieter (2005), "Statistical Uncertainty Associated with
Histograms in the Earth Sciences". Journal of Geophysical Research -
Solid Earth, 110, B02211.

    Wainer, Howard (1974), "The Suspended Rootogram and Other Visual
Displays: An Empirical Validation". The American Statistician, 28(4),
pp. 143-145.


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------





