



Re: st: Modeling % data


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Modeling % data
Date   Fri, 24 Sep 2010 13:00:56 -0400

Maarten--
I think you are asserting that if X is unbounded and y is bounded, the
conditional median of y cannot be linear in X, except in one trivial
case with zero slope.  I agree. But this is not what you claimed in
prior posts. In particular, this claim is false in its generality: "If
prop is bounded, which proportions are, then such a linear effect will
eventually lead to prediction less than 0 or more than 1" since
it depends on X being unbounded (not continuous).
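
To make the dependence on the range of X concrete, here is a minimal Python sketch. The intercept, slope, and X ranges are made-up numbers for illustration and do not come from any data in this thread. The sketch only checks whether a linear conditional median a + b*X stays inside [0, 1] over a given range of X.

```python
def linear_median(a, b, x):
    """Hypothetical linear conditional median of y given X = x."""
    return a + b * x


def stays_in_unit_interval(a, b, x_min, x_max):
    """True if a + b*x lies in [0, 1] for every x in [x_min, x_max].

    A linear function reaches its extremes at the endpoints of an
    interval, so checking the two endpoints is enough.
    """
    ends = (linear_median(a, b, x_min), linear_median(a, b, x_max))
    return all(0.0 <= v <= 1.0 for v in ends)


# Illustrative values only: X is age, bounded between 0 and 100.
print(stays_in_unit_interval(0.1, 0.008, 0, 100))

# Same line, but X is allowed to run to 200: the line leaves [0, 1].
print(stays_in_unit_interval(0.1, 0.008, 0, 200))
```

With the same slope, the answer changes only because the range of X changes, which is the sense in which the quoted claim needs X to be unbounded (or at least wide enough).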

The assertion that I agree with (if X is unbounded and y is bounded,
the conditional median of y cannot be linear in X) does not mean that
-qreg- is not a good idea for data where y is bounded *in general* as
I pointed out--for a continuous X which is bounded (say between 0 and
100, e.g. for age in the sample, or 0 and 25, for education, or
whatever the bounds on "amount brain that is dysfunctional (mm^3)"
might be) there is not necessarily any problem.  E.g. in particular
for the case I outlined where 10% of the observed y are piled up on
the lower bound at lower values of X and 10% of the observed y are
piled up on the upper bound at higher values of X, there is no real
problem running -qreg- at all.  The only difficulty the original
poster (Marlis Gonzalez Fernandez <mgonzal5@jhmi.edu>) would have is
that the interpretation is different: conditional median of y instead
of conditional mean.
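
The reason piling up at the bounds need not hurt can be sketched in Python. Censoring at 0 and 1 is a nondecreasing transformation, so it leaves the median unchanged as long as the median itself lies inside the bounds, i.e. fewer than half of the observations at that X are censored. The data-generating process below is an assumption for illustration only: a latent outcome with linear median 0.1 + 0.8*X, X between 0 and 1, and normal noise with standard deviation 0.08, chosen so that roughly 10% of observations pile up on a bound at the extremes of X.

```python
import random
import statistics


def censor(y, lower=0.0, upper=1.0):
    """Pile values outside [lower, upper] up on the nearest bound."""
    return min(max(y, lower), upper)


def compare_at(x, n=1001, seed=12345):
    """Simulate n draws at a fixed X = x (hypothetical setup).

    Returns (median of latent y, median of censored y, share censored).
    n is odd so the sample median is a single order statistic.
    """
    rng = random.Random(seed)
    latent = [0.1 + 0.8 * x + rng.gauss(0.0, 0.08) for _ in range(n)]
    observed = [censor(y) for y in latent]
    share = sum(1 for y, o in zip(latent, observed) if y != o) / n
    return statistics.median(latent), statistics.median(observed), share


for x in (0.0, 0.5, 1.0):
    med_latent, med_observed, share = compare_at(x)
    print(x, round(med_latent, 3), round(med_observed, 3), round(share, 3))
```

In this setup the two medians coincide at every X, so the conditional median of the observed y is still linear in X even though the conditional mean is pulled toward the interior near the bounds.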

The example you gave is a straw man--you generate data where the
conditional median is not linear in X, and then show that a model that
assumes the conditional median is linear in X performs badly.  Better
to generate data where the conditional median is linear in X, and the
conditional mean is not, and maybe there is censoring or some other
problem, and see whether -qreg- or -glm- or any -tobit- type of
command can recover a reasonable estimate of the slope...  but none of
this addresses the poster's original question, since we are not given
any further details on the data.

On Fri, Sep 24, 2010 at 4:29 AM, Maarten Buis <maartenbuis@yahoo.co.uk> wrote:
> --- Austin Nichols
>> -qreg- requires no "fix" using higher powers of X in the
>> general case.
>
> It does in the following sense: If we call -qreg prop X-
> then we assume that X has a linear effect on the median of
> prop. If prop is bounded, which proportions are, then such
> a linear effect will eventually lead to predictions less than
> 0 or more than 1. This is not a real problem when that
> happens way outside the observed range of X, but this is a
> problem when that happens inside the observed range of X.
> Moreover, even if we do not get negative predictions, we
> would still expect the effect to "slow down" near the
> lower boundary in anticipation of the boundary (and similarly
> near the upper boundary). All this is more likely to occur
> when you have lots of observations near either of the
> boundaries.
>