Ronnie Babigumira asked whether linear regression was appropriate for a proportion. Many wrote back to point out that proportions involve binary data and that linear regression is for continuous outcomes. Ronnie then clarified that the proportion was a single value between 0 and 1 for each observation, in this case the percentage of field space each farmer allocated to new variety maize.

My tuppence, with an open call for comment, is that many areas in medical research and psychometrics have similar properties to the problem Ronnie raises. For example, pain is often measured on a 0-100 scale; quality of life scales such as the SF36 convert various numerical scores into a proportion of the maximum score to give a quality of life between 0 and 100. Biostatisticians have used linear regression on such outcomes for many years without worrying too much about it, unless there was a particular reason to: as Nick Cox put it, it all depends on the data and the use to which it is being put.

If the dependent variable is normally distributed with a mean of 0.5 and an SD of 0.1, linear regression is probably going to work fine. If the dependent variable has many 0s and/or 1s, as might well be the case with the maize data, you might have a problem, particularly that your regression will make predictions that fall outside the possible range of 0 to 1.

My guess is that with the maize data, differences between, say, 55% and 65% are neither important nor likely, as farmers will plant certain whole areas with a particular crop. Thus you could categorize the data into four bands (0-24.9%, 25%-49.9%, 50%-74.9%, 75%-100%) and then do an ordinal regression.

Andrew Vickers
Memorial Sloan-Kettering Cancer Center
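
A rough illustration of the two points about boundary values and categorization, as a minimal Python sketch. The data are made up, and the names `band`, `ols_line`, `size` and `share` are hypothetical, as is the choice of farm size as a predictor. The ordinal regression itself is not shown, only the categorization that would feed it.

```python
from bisect import bisect_right

# Cut points for the four bands suggested above, expressed as proportions.
CUTS = [0.25, 0.50, 0.75]

def band(share):
    """Return an ordered category 0..3 for a proportion in [0, 1]."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    return bisect_right(CUTS, share)

def ols_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data: a hypothetical predictor (farm size) and the share of
# field space under new variety maize, with several exact 0s and 1s.
size = [1, 2, 3, 4, 5, 6, 7, 8]
share = [0.0, 0.0, 0.0, 0.3, 0.6, 1.0, 1.0, 1.0]

intercept, slope = ols_line(size, share)
fitted = [intercept + slope * s for s in size]

# With this many boundary values, some fitted values leave the 0-1 range.
print([round(f, 2) for f in fitted])

# The same outcome recoded into four ordered bands.
print([band(s) for s in share])
```

With these invented numbers the straight-line fit gives a negative fitted share for the smallest farm and a fitted share above 1 for the largest, whereas the banded outcome stays within its four ordered categories by construction.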

