Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org, is already up and running.
RE: st: Modeling % data
Nick Cox <email@example.com>
Thu, 23 Sep 2010 19:43:04 +0100
I'd say rather that -qreg- knows nothing of any bounds on the response. This ignorance may or may not bite you. As the original poster was quite explicit that values at the bounds occur in the data, the probability of being bitten seems higher.
The debate recalls that over the linear probability model vs logit or probit. Perhaps the linear probability model need not be as bad as often painted, but watch out nevertheless. When it produces ludicrous predictions, it really goes to town.
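To make the linear probability model point concrete, here is a small sketch in Python. It is an illustration, not code from the thread. The data are made-up toy values with observations at both bounds, and `ols`, `frac_logit` and `invlogit` are hypothetical helper names. The straight-line fit is ordinary least squares. The logit fit is a fractional logit solved by Newton-Raphson on the Bernoulli quasi-likelihood, which uses the same estimating equations as a binomial-family, logit-link GLM (e.g. -glm- with family(binomial) link(logit)).

```python
import math

# Made-up toy data: x and an outcome proportion y in [0, 1],
# with observations sitting exactly at both bounds.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0.0, 0.0, 0.05, 0.10, 0.30, 0.50, 0.70, 0.90, 0.95, 1.0, 1.0]

def ols(xs, ys):
    """Simple linear regression (linear probability model): (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def invlogit(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def frac_logit(xs, ys, iters=50):
    """Fractional logit via Newton-Raphson on the Bernoulli quasi-likelihood
    (the estimating equations of a binomial-family, logit-link GLM)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = invlogit(b0 + b1 * x)
            r = y - p            # residual: score contribution
            w = p * (1.0 - p)    # GLM working weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + (X'WX)^{-1} X'(y - p), 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

a, b = ols(xs, ys)
c0, c1 = frac_logit(xs, ys)
for x in (0, 10, 12):
    # columns: x, straight-line prediction, fractional-logit prediction
    print(x, round(a + b * x, 3), round(invlogit(c0 + c1 * x), 3))
```

With these toy numbers the straight line already predicts below 0 at x = 0 and above 1 at x = 10, inside the observed range, while the logit predictions stay strictly between 0 and 1 by construction.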
-qreg- requires no "fix" using higher powers of X in the general case.
I referred the original poster to -glm- as well, in
but your objection to -qreg- is unfounded in the case I outlined: if X
is a continuous variable (did you mean unbounded, maybe?), there is no
reason it cannot have a linear effect on the conditional median of y,
even if y is bounded between 0 and 1, and even if there is a nonzero
fraction at the boundaries. Of course, if a significant fraction of
the data piles up at the boundary, neither -qreg- nor -glm- will be a
particularly good model, and the typical researcher may prefer an MLE
that has a two-part flavor to it (requiring some strong assumptions
about the distribution of errors).
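One way to see the point about the conditional median is that quantiles are equivariant under non-decreasing transformations, and censoring at 0 and 1 is such a transformation. The sketch below assumes, purely for illustration, a latent outcome whose median is 0.1 + 0.8x with normal noise, observed only after clipping to [0, 1]. The names `clip` and `medians_at` are hypothetical. Even with a sizeable share of observations on a bound, the median of the clipped data equals the clipped latent median, so it stays linear in x as long as fewer than half the observations at a given x sit on the same bound.

```python
import random
import statistics

def clip(v, lo=0.0, hi=1.0):
    """Censor a value to [lo, hi], mimicking pile-up at the boundaries."""
    return min(max(v, lo), hi)

def medians_at(x, n=1001, sd=0.3, rng=random):
    """Hypothetical setup: latent y* = 0.1 + 0.8*x + N(0, sd^2) noise,
    observed y = clip(y*). Returns the share of y on a bound and both medians."""
    latent = [0.1 + 0.8 * x + rng.gauss(0, sd) for _ in range(n)]
    observed = [clip(v) for v in latent]
    share_at_bounds = sum(v in (0.0, 1.0) for v in observed) / n
    # median_low returns an actual data point, so the equality below is exact
    return (share_at_bounds,
            statistics.median_low(latent),
            statistics.median_low(observed))

random.seed(1)
for x in (0.0, 0.5, 1.0):
    share, m_latent, m_observed = medians_at(x)
    # Clipping is non-decreasing, so the median of clipped data is the clipped median
    assert m_observed == clip(m_latent)
    print(f"x={x}: share at bounds={share:.2f}, "
          f"latent median={m_latent:.3f}, observed median={m_observed:.3f}")
```

In this setup a bit over a third of the observations are expected to sit on the lower bound at x = 0 and on the upper bound at x = 1, yet the observed median still tracks 0.1 + 0.8x. Once half or more of the observations at some x sit on one bound, the clipped median is stuck at that bound and the linear form breaks down there.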
On Wed, Sep 22, 2010 at 12:19 PM, Maarten buis <firstname.lastname@example.org> wrote:
> --- Austin Nichols wrote:
>> I don't see how data approaching the boundaries is a problem in
>> -qreg-, as long as the fraction at the boundary itself is not too
>> large (though that in itself is more an indictment of the outcome
>> measure than a necessary problem for quantile regression). If 10% of
>> the outcomes are at the lower boundary (zero) for low X and 10% of the
>> outcomes are at the upper boundary (100) for high X, how is that a
>> problem for estimating how the conditional median changes with X?
> The problem in those cases is that if X is a continuous
> variable, it is probably not going to have a linear effect. That is
> what the boundary does. If you are approaching one boundary, then you
> might get away with adding squares, but if you are approaching both
> boundaries, as in the original question, things would
> get much harder (though not impossible). However, in those cases I
> would just go for models in Stata that were written for this type of
> data, like the ones I referred to earlier, rather than try to "fix"