
Re: st: Dependent var is a proportion, with large spike in .95+

From   David Airey <david.airey@Vanderbilt.Edu>
Subject   Re: st: Dependent var is a proportion, with large spike in .95+
Date   Thu, 4 Sep 2008 06:56:53 -0400


Here is an article I used for a spiked distribution. It is probably not the same situation as yours, however.

Genetics. 2003 Mar;163(3):1169-75.

Mapping quantitative trait loci in the case of a spike in the phenotype

Broman KW.

Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205,

A common departure from the usual normality assumption in QTL mapping concerns a
spike in the phenotype distribution. For example, in measurements of tumor mass,
some individuals may exhibit no tumors; in measurements of time to death after a
bacterial infection, some individuals may recover from the infection and fail to
die. If an appreciable portion of individuals share a common phenotype value
(generally either the minimum or the maximum observed phenotype), the standard
approach to QTL mapping can behave poorly. We describe several alternative
approaches for QTL mapping in the case of such a spike in the phenotype
distribution, including the use of a two-part parametric model and a
nonparametric approach based on the Kruskal-Wallis test. The performance of the
proposed procedures is assessed via computer simulation. The procedures are
further illustrated with data from an intercross experiment to identify QTL
contributing to variation in survival of mice following infection with Listeria

PMCID: PMC1462498
PMID: 12663553 [PubMed - indexed for MEDLINE]
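
As a rough illustration of the two-part idea mentioned in the abstract (a sketch only, not Broman's implementation and not Stata code), the data can be split into "in the spike" versus "below the spike", and each part summarized separately by factor level. The cutoff of .95, the function name, and the toy data below are all assumptions made for the example. In a real analysis the first part would typically be a logit/probit fit on the spike indicator, and the second part a separate model (for instance -betafit-) on the remaining observations.

```python
import math
from collections import defaultdict

SPIKE_CUTOFF = 0.95  # assumption: values at or above this count as "in the spike"


def two_part_summary(records, cutoff=SPIKE_CUTOFF):
    """Summarize a spiked proportion in two parts, per factor level.

    records: iterable of (factor_level, proportion) pairs.
    Returns {level: (n, spike_share, mean_logit_below)}, where
    mean_logit_below is None if a level has no usable values below the cutoff.
    """
    groups = defaultdict(list)
    for level, y in records:
        groups[level].append(y)

    summary = {}
    for level, ys in groups.items():
        below = [y for y in ys if y < cutoff]
        # Part 1: how often does this level land in the spike?
        spike_share = (len(ys) - len(below)) / len(ys)
        # Part 2: location of the non-spike values on the logit scale.
        # The logit is undefined at exactly 0, so such values are skipped.
        logits = [math.log(y / (1 - y)) for y in below if y > 0]
        mean_logit = sum(logits) / len(logits) if logits else None
        summary[level] = (len(ys), spike_share, mean_logit)
    return summary


# Made-up illustrative data, not taken from the thread.
data = [("A", 0.97), ("A", 0.99), ("A", 0.60), ("A", 0.50),
        ("B", 0.98), ("B", 0.40), ("B", 0.30), ("B", 0.50)]

result = two_part_summary(data)
for level in sorted(result):
    print(level, result[level])
```

A factor can then matter in two distinct ways: by shifting the share of observations in the spike, or by shifting the values below it. A single model fitted to all the data would blur those two effects together.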

On Sep 3, 2008, at 3:22 PM, Dan Weitzenfeld wrote:

Hi Statalist,
I am trying to determine which testing factors drive a proportion
dependent variable, PercentNoise.
In searching the archives, I came across -betafit-, and a link to the
FAQ: "How do you fit a model when the dependent variable is a
proportion?"  In that response, Allen McDowell and Nick Cox write, "In
practice, it is often helpful to look at the frequency distribution: a
marked spike at zero or one may well raise doubt about a single model
fitted to all data."
That describes my situation exactly:  I have a marked spike in my
histogram at the top bin, roughly .95 - 1.  I am wondering how to
account for this.
Does -betafit- take such a possibility into account?
Can someone briefly describe how I could use multiple models to fit
all the data, as implied in the FAQ response?
My fallback is setting a pass/fail bar and converting my proportions
to a binary, then using probit/logit.  But the obvious drawback is
that I am throwing away information by collapsing the continuous
(albeit bounded) proportion variable to a binary.

Thanks in advance for any suggestions,