

From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Modeling % data
Date: Fri, 24 Sep 2010 13:00:56 -0400

Maarten-- I think you are asserting that if X is unbounded and y is bounded, the conditional median of y cannot be linear in X, except in one trivial case with zero slope. I agree. But this is not what you claimed in prior posts. In particular, this claim is false in its generality: "If prop is bounded, which proportions are, then such a linear effect will eventually lead to prediction less than 0 or more than than 1" since it depends on X being unbounded (not continuous).

The assertion that I agree with (if X is unbounded and y is bounded, the conditional median of y cannot be linear in X) does not mean that -qreg- is not a good idea for data where y is bounded *in general* as I pointed out--for a continuous X which is bounded (say between 0 and 100, e.g. for age in the sample, or 0 and 25, for education, or whatever the bounds on "amount brain that is dysfunctional (mm^3)" might be) there is not necessarily any problem. E.g. in particular for the case I outlined where 10% of the observed y are piled up on the lower bound at lower values of X and 10% of the observed y are piled up on the upper bound at higher values of X, there is no real problem running -qreg- at all. The only difficulty the original poster (Marlis Gonzalez Fernandez <mgonzal5@jhmi.edu>) would have is that the interpretation is different: conditional median of y instead of conditional mean.

The example you gave is a straw man--you generate data where the conditional median is not linear in X, and then show that a model that assumes the conditional median is linear in X performs badly. Better to generate data where the conditional median is linear in X, and the conditional mean is not, and maybe there is censoring or some other problem, and see whether -qreg- or -glm- or any -tobit- type of command can recover a reasonable estimate of the slope... but none of this addresses the poster's original question, since we are not given any further details on the data.
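A minimal sketch of the censoring point above, written in Python rather than Stata and using made-up parameters. The constants A, B, SIGMA and the function name censored_summary are illustrative assumptions, not anything from the thread. A latent outcome whose conditional median is linear in a bounded X is censored to [0, 1], and the sample median and mean of the censored y are compared at the low end, middle, and high end of the X range.

```python
import random
import statistics

# Hypothetical data-generating process (assumed for illustration only):
# latent y* = A + B*x + e, e ~ Normal(0, SIGMA), with x bounded in [0, 100].
A, B, SIGMA = 0.1, 0.008, 0.15


def censored_summary(x, n=20001, seed=1):
    """Simulate y = y* censored to [0, 1] at a fixed x.

    Returns (median of y, mean of y, share of y piled on a bound).
    """
    rng = random.Random(seed)
    latent = [A + B * x + rng.gauss(0.0, SIGMA) for _ in range(n)]
    y = [min(max(v, 0.0), 1.0) for v in latent]
    piled = sum(v in (0.0, 1.0) for v in y) / n
    return statistics.median(y), sum(y) / n, piled


if __name__ == "__main__":
    for x in (0, 50, 100):
        med, mean, piled = censored_summary(x)
        print(f"x={x:3d}  linear median={A + B * x:.3f}  "
              f"sample median={med:.3f}  sample mean={mean:.3f}  "
              f"share at bounds={piled:.3f}")
```

Under these assumptions the median of the censored y stays close to A + B*x wherever fewer than half the observations at that x sit on a bound, while the mean is pulled toward the interior of [0, 1] near either end of the X range. This is the sense in which the conditional median can be linear in X when the conditional mean is not.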

On Fri, Sep 24, 2010 at 4:29 AM, Maarten buis <maartenbuis@yahoo.co.uk> wrote:
> --- Austin Nichols
>> -qreg- requires no "fix" using higher powers of X in the
>> general case.
>
> It does in the following sense: If we call -qreg prop X-
> then we assume that X has a linear effect on the median of
> prop. If prop is bounded, which proportions are, then such
> a linear effect will eventually lead to prediction less than
> 0 or more than than 1. This is not a real problem when that
> happens way outside the observed range of X, but this is a
> problem when that happens inside the observed range of X.
> Moreover, even if we do not get negative predictions, we
> would still expect the effect to "slow down" near the
> boundary in anticipation of the boundary (and similarly
> near the upper boundary). All this is more likely to occur
> when you have lots of observations near either of the
> boundaries.
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**References**:

- **RE: st: Modeling % data** *From:* Nick Cox <n.j.cox@durham.ac.uk>
- **RE: st: Modeling % data** *From:* Maarten buis <maartenbuis@yahoo.co.uk>
