The Stata listserver

st: Proportion as a dependent variable

From   "Vickers, Andrew J./Integrative Medicine"
Subject   st: Proportion as a dependent variable
Date   Thu, 17 Jul 2003 09:56:44 -0400

Ronnie Babigumira asked whether linear regression was appropriate for a
proportion. Many wrote back to point out that proportions involve
binary data and linear regression is for continuous outcomes. Ronnie
then clarified that the proportion was a single value between 0 and 1
for each observation, in this case the share of field space
allocated to new variety maize for each farmer.

My tuppence, with an open call for comment, is that many areas in
medical research and psychometrics have similar properties to the
problem Ronnie raises. For example, pain is often measured on a 0-100
scale; quality of life scales such as the SF36 convert various numerical
scores into a proportion of the maximum score to give a quality of life
score between 0 and 100. Biostatisticians have used linear regression for many
years without worrying too much about it, unless there was a particular
reason: as Nick Cox put it, it all depends on the data and the use to
which it is being put. If the dependent variable is normally distributed
with a mean of 0.5 and an SD of 0.1, linear regression is probably going
to work fine. If the dependent variable has many 0's and/or 1's, as
might well be the case with the maize data, you might have a problem,
in particular that your regression will make predictions outside the
0 to 1 range. My guess is that with the maize data, differences between,
say, 55% and 65% are neither important nor likely, as farmers will plant
certain whole areas with a particular crop. Thus you could categorize
the data into four bands (0-24.9%, 25%-49.9%, 50%-74.9%, 75%-100%) and
then do an ordinal regression.
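
To make the two points above concrete, here is a minimal sketch in plain
Python (standard library only). The data are invented for illustration and
are not the maize data; `farm_size`, `ols_fit` and `band` are hypothetical
names. It fits a one-predictor least-squares line to a proportion with many
0's and 1's, picks out the fitted values that fall below 0 or above 1, and
then recodes the proportion into the four ordered bands suggested above.

```python
# Illustrative only: invented data, not the maize data from the thread.

def ols_fit(x, y):
    """Return (intercept, slope) for a least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def band(share):
    """Map a proportion in [0, 1] to one of four ordered bands, coded 1 to 4."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    # Each comparison adds 1 once the share reaches that cut-point.
    return 1 + (share >= 0.25) + (share >= 0.50) + (share >= 0.75)


# Hypothetical predictor and a share of land piled up at 0 and 1.
farm_size = [1, 2, 3, 4, 5, 6, 7, 8]
share = [0.0, 0.0, 0.0, 0.3, 0.7, 1.0, 1.0, 1.0]

intercept, slope = ols_fit(farm_size, share)
fitted = [intercept + slope * x for x in farm_size]

# Fitted values that are impossible for a proportion.
out_of_range = [f for f in fitted if f < 0 or f > 1]

# Ordered categories that could serve as the outcome of an ordinal model.
bands = [band(s) for s in share]

print("fitted values:", [round(f, 3) for f in fitted])
print("outside [0, 1]:", [round(f, 3) for f in out_of_range])
print("bands:", bands)
```

The sketch stops at the recoding step. The band codes would then be the
outcome in an ordinal regression routine (in Stata, for instance, `ologit`),
which is not reproduced here.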

Andrew Vickers
Memorial Sloan-Kettering Cancer Center 
