From | Maarten buis <maartenbuis@yahoo.co.uk> |
To | statalist@hsphsun2.harvard.edu |
Subject | RE: st: Modeling % data |
Date | Fri, 24 Sep 2010 08:29:43 +0000 (GMT) |
--- Austin Nichols
> -qreg- requires no "fix" using higher powers of X in the
> general case.

It does in the following sense: if we call -qreg prop X- then we assume that X has a linear effect on the median of prop. If prop is bounded, which proportions are, then such a linear effect will eventually lead to predictions less than 0 or more than 1. This is not a real problem when that happens way outside the observed range of X, but it is a problem when that happens inside the observed range of X.

Moreover, even if we do not get negative predictions, we would still expect the effect to "slow down" near the lower boundary in anticipation of that boundary (and similarly near the upper boundary). All this is more likely to occur when you have lots of observations near either of the boundaries.

The example below illustrates how such a linear model for the median can go wrong. This is intended to be an illustration, so I created the data such that the problem is very clear. In real data I would expect it to be less pronounced.

*------------ begin example ---------------
set seed 123456789
clear
set obs 500

* simulate a proportion y from a beta distribution with mean mu,
* where mu follows a logit curve in x and sits close to 0
gen x = rnormal()
gen mu = invlogit(-3 + x)
local phi = exp(3.5)
gen a = mu * `phi'
gen b = (1-mu)*`phi'
gen y = rbeta(a, b)

* median regression that is linear in x, and its fitted medians
qreg y x
predict med

* data with the fitted median line on top
twoway scatter y x ||       ///
       line med x, sort     ///
       lpattern(solid)      ///
       lcolor(red)
*------------ end example ---------------

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
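As a follow-up check on the example above, the lines below are a minimal sketch of how one could see where the linear median fit leaves the unit interval. It assumes the example has just been run in the same session, so that y, x, mu and med are still in memory. The extra overlay of mu and the legend labels are additions for illustration only, not part of the original example.

```stata
* assumes the example above was just run: y, x, mu and med exist

* how many fitted medians fall below the lower bound of a proportion?
count if med < 0

* over which values of x does that happen (if it happens at all)?
summarize x if med < 0

* overlay the linear fit and the curve mu used to generate y;
* the horizontal line marks the lower bound at 0
twoway scatter y x, msymbol(oh) ||                          ///
       line med x, sort lpattern(solid) lcolor(red) ||      ///
       line mu x, sort lpattern(dash) lcolor(blue)          ///
       yline(0)                                             ///
       legend(order(1 "y" 2 "linear median fit" 3 "mu"))
```

If the count is greater than zero, the summarize output shows whether those values of x lie inside the observed range, which is the case described above as a real problem.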