RE: st: Modeling % data

From   Maarten Buis
Subject   RE: st: Modeling % data
Date   Fri, 24 Sep 2010 08:29:43 +0000 (GMT)

--- Austin Nichols
> -qreg- requires no "fix" using higher powers of X in the
> general case.

It does in the following sense: if we call -qreg prop X-,
then we assume that X has a linear effect on the median of
prop. If prop is bounded, which proportions are, then such
a linear effect will eventually lead to predictions less than
0 or more than 1. This is not a real problem when that
happens way outside the observed range of X, but it is a
problem when that happens inside the observed range of X.
Moreover, even if we do not get predictions outside the
bounds, we would still expect the effect to "slow down" near
the lower boundary in anticipation of that boundary (and
similarly near the upper boundary). All this is more likely
to occur when you have lots of observations near either of
the boundaries.

The example below illustrates how such a linear model in the
median can go wrong. This is intended to be an illustration,
so I created the data such that the problem is very clear. In
real data I would expect this to be less pronounced.

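*------------ begin example ---------------
clear
set seed 123456789
set obs 500

* x is standard normal; the logit link keeps the mean mu inside (0,1)
gen x = rnormal()
gen mu = invlogit(-3 + x)

* draw y from a beta distribution with mean mu and scale phi
local phi = exp(3.5)
gen a = mu * `phi'
gen b = (1-mu)*`phi'
gen y = rbeta(a, b)

* linear model for the median of y, and its fitted values
qreg y x
predict med

* plot the data with the fitted median line
twoway scatter y x ||    ///
       line med x, sort  ///
       lpattern(solid)
*------------ end example ---------------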
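
As a possible follow-up to the example above (a sketch only, not
part of the original example), one could check directly whether
the fitted median leaves the unit interval inside the observed
range of x. It assumes the data and the -qreg- results from the
example are still in memory. The scalar names cross0 and cross1
are made up for this illustration, and mu is the mean rather than
the median of the simulated beta distribution, so it serves only
as an approximate reference curve.

*------------ begin example ---------------
* number of fitted medians outside the unit interval
count if (med < 0 | med > 1) & !missing(med)

* values of x at which the fitted line crosses 0 and 1
scalar cross0 = -_b[_cons]/_b[x]
scalar cross1 = (1 - _b[_cons])/_b[x]
display "fitted median crosses 0 at x = " cross0
display "fitted median crosses 1 at x = " cross1

* compare the crossing points with the observed range of x
summarize x

* overlay the curve used to generate the data (approximate reference)
twoway scatter y x ||          ///
       line med x, sort ||     ///
       line mu x, sort         ///
       legend(order(2 "qreg fit" 3 "mean mu"))
*------------ end example ---------------

If either crossing point lies between the minimum and maximum
reported by -summarize-, the linear model predicts impossible
proportions for some of the observed values of x.
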
Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

