The Stata listserver

Re: st: Proportion as a dependent variable

From   "R. Allan Reese"
To   statalist
Subject   Re: st: Proportion as a dependent variable
Date   Thu, 17 Jul 2003 15:33:39 +0100 (BST)

On Thu, 17 Jul 2003, Vickers, Andrew J./Integrative Medicine wrote:
> Ronnie Babigumira asked whether linear regression was appropriate for a
> proportion. Many wrote back to point out that proportions involved
> binary data and linear regression is for continuous outcomes....
> ... many areas in
> medical research and psychometrics have similar properties to the
> problem Ronnie raises. For example, pain is often measured on a 0 - 100
> scale; .... Biostatisticians have used linear regression for many
> years without worrying too much about it,
> .... If the dependent variable is normally distributed
> with a mean of 0.5 and an SD of 0.1, linear regression is probably going
> to work fine.

With respect, these are strange comments that seem stuck in the
1960s.  It has long been my view that much damage is done in service
statistics courses by teaching a ragbag of approaches, each with its
own name.  Since the 1970s, we have had software that enables a
general approach to modelling, so that distinctions between "regression",
"anova", "probit", etc. become flim-flam.  You may sensibly model
relationships where the dependent and predictor variables are any
combination of interval, ordinal or nominal, provided appropriate
functions and assumptions are used.  People may be talking at
cross-purposes, in that "regression" to one person may imply "linear,
normal errors" but to another person is a general approach.  For example,
outside the US we refer to GLMs, which are Generalized Linear Models (cf.
McCullagh & Nelder), but within the US the acronym often means General
Linear Models with only normal errors.  Binary responses (as observed) may
be modelled through the underlying probability, e.g. with a logit or
probit link function.
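To make the GLM point concrete for a proportion outcome, here is a minimal sketch in Python (not Stata) of one such model: a logit link with a Bernoulli quasi-likelihood, often called a fractional logit. It contrasts a straight-line least-squares fit, whose fitted values can fall outside [0, 1], with the logit-link fit, whose fitted values cannot. The data are invented purely for illustration and the function names are hypothetical. In Stata a comparable model can be fitted with `glm y x, family(binomial) link(logit) vce(robust)`; the sketch reproduces only the point estimates, not the robust standard errors.

```python
import math

def logistic(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ols(x, y):
    """Ordinary least squares for y = a0 + a1*x (linear, normal errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return my - a1 * mx, a1

def fit_fractional_logit(x, y, iters=50, tol=1e-10):
    """Fit E[y|x] = logistic(b0 + b1*x) for proportions y in [0, 1]
    by Newton-Raphson on the Bernoulli quasi-log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = logistic(b0 + b1 * xi)
            w = mu * (1.0 - mu)   # binomial variance function
            r = yi - mu           # raw residual
            g0 += r               # score for the intercept
            g1 += r * xi          # score for the slope
            h00 += w              # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Invented example data: x is some covariate, y a proportion in [0, 1].
x = [0, 1, 2, 3, 4, 5, 6]
y = [0.02, 0.05, 0.15, 0.40, 0.70, 0.90, 0.97]

a0, a1 = fit_ols(x, y)
b0, b1 = fit_fractional_logit(x, y)
for xi in x:
    # columns: covariate, straight-line fit, logit-link fit
    print(xi, round(a0 + a1 * xi, 3), round(logistic(b0 + b1 * xi), 3))
```

With these made-up numbers the straight line gives a negative fitted value at x = 0 and one just above 1 at x = 6, whereas every logit-link fitted value stays strictly between 0 and 1. Because the model has an intercept, the score equations also force the mean fitted proportion to equal the observed mean.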

I held back from commenting on the original problem, which was about
modelling the take-up of a new seed type in relation to social and
economic factors.  But it seemed to me important to discuss, without
expecting a rigid or mathematical answer, whether one should model the
proportion of land planted with the new seed or the absolute area.
Indeed, a farmer who was totally committed to the new seed might plausibly
show a response of more than 100%: if, say, he rented extra land so that
he planted more with the new seed than his whole previous area.  The
absolute area for each farmer might be determined by the availability of
new seed, or the proportion might be limited by risk management.  Is this
an application where it is piously hoped that some magical analysis of a
quantitative survey will generate insights, when a more fruitful approach
might be to interview farmers and *ask* what made them decide on the
proportion of land to use for the new seed?

R. Allan Reese

