
From: "R. Allan Reese" <R.A.Reese@hull.ac.uk>
To: statalist <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Proportion as a dependent variable
Date: Thu, 17 Jul 2003 15:33:39 +0100 (BST)

On Thu, 17 Jul 2003, Vickers, Andrew J./Integrative Medicine wrote:

> Ronnie Babigumira asked whether linear regression was appropriate for a
> proportion. Many wrote back to point out that proportions involved
> binary data and linear regression is for continuous outcomes....
> ... many areas in
> medical research and psychometrics have similar properties to the
> problem Ronnie raises. For example, pain is often measured on a 0 - 100
> scale; .... Biostatisticians have used linear regression for many
> years without worrying too much about it,
> .... If the dependent variable is normally distributed
> with a mean of 0.5 and an SD of 0.1, linear regression is probably going
> to work fine.

With respect, what a strange set of comments, which seem fixed in the 1960s. It has long been my view that much damage is done in service statistics courses by teaching a ragbag of approaches, each with its own name. Since the 1970s we have had software that enables a general approach to modelling, so that distinctions between "regression", "anova", "probit", etc. become flim-flam. You may sensibly model relationships where the dependent and predictor variables are any combination of interval, ordinal or nominal, provided appropriate functions and assumptions are used.

People may be talking at cross-purposes, in that "regression" to one person may imply "linear, normal errors" but to another means a general approach. For example, outside the US we refer to GLMs, which are Generalized Linear Models (cf. McCullagh & Nelder), but within the US the acronym often means General Linear Models, with only normal errors. Binary responses (as observed) may be modelled with reference to the underlying probability function.

I held back from commenting on the original problem, which was about modelling the take-up of a new seed type in relation to social and economic factors.
But it seemed to me worth discussing, without expecting a rigid or mathematical answer, whether to model the proportion of land planted with the new seed or the absolute area. Indeed, it seems feasible that a farmer who was totally committed to the new seed might show a response of more than 100%: if, say, he rented extra land to plant with it, compared with his previous holding. The absolute area for each farmer might be determined by the availability of new seed, or the proportion might be limited by risk management.

Is this an application where it is piously hoped that some magical analysis of a quantitative survey will generate insights, when a more fruitful approach might be to interview farmers and *ask* what made them decide on the proportion of land to use for the new seed?

R. Allan Reese
Email: r.a.reese@hull.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
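A minimal sketch of the general modelling approach described in the message, included only as an illustration and not taken from the thread: a proportion response is fitted as a GLM with a binomial variance function and a logit link by iteratively reweighted least squares (IRLS), and compared with an ordinary normal-errors linear fit. The data are made-up toy values, the function names `fit_logit_glm`, `predict_logit` and `fit_ols` are hypothetical, and the sketch omits standard errors and convergence checks. Stata's `glm` command offers `family(binomial)` and `link(logit)` options for this kind of model.

```python
import math


def fit_logit_glm(x, y, iterations=25):
    """Fit logit(E[y]) = b0 + b1*x by IRLS, with y a proportion in [0, 1].

    Uses the binomial variance mu*(1 - mu) and the canonical logit link.
    Sketch only: fixed iteration count, no standard errors.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Weighted sums for the 2x2 normal equations (X'WX) b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                 # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))  # inverse logit: always in (0, 1)
            w = mu * (1.0 - mu)                # IRLS weight for the canonical link
            z = eta + (yi - mu) / w            # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least-squares system by Cramer's rule.
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


def predict_logit(b0, b1, xi):
    """Fitted proportion from the logit-link model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def fit_ols(x, y):
    """Ordinary least squares line y = a + b*x (linear, normal errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical toy data: a predictor x and observed proportions y.
x = [0, 1, 2, 3, 4, 5]
y = [0.05, 0.10, 0.30, 0.55, 0.80, 0.90]

b0, b1 = fit_logit_glm(x, y)
a, b = fit_ols(x, y)
for xi in (0, 5, 8):
    # Columns: x, logit-link fitted proportion, straight-line prediction.
    print(xi, round(predict_logit(b0, b1, xi), 3), round(a + b * xi, 3))
```

With these toy values the straight-line fit already predicts a slightly negative proportion at x = 0 and a value above 1 at x = 8, whereas the logit-link fit keeps every prediction strictly between 0 and 1. The design choice is simply that the link function, not the data, enforces the bounds.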

**References**: **st: Proportion as a dependent variable** *From:* "Vickers, Andrew J./Integrative Medicine" <vickersa@mskcc.org>


© Copyright 1996–2017 StataCorp LLC