Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: linear probability model

From: "stefan.duke@gmail.com"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: linear probability model
Date: Wed, 23 Jun 2010 23:35:22 +0200

```As usual it depends a bit on which part of the forest you are coming
from and the tools and experience you have.

When your data are not very extreme, i.e. the predictors are not too
discriminating, then the linear regression approximates the middle part
of the logistic curve pretty well (see
http://en.wikipedia.org/wiki/Logistic_function for a picture).
So the estimated probabilities for well-behaved data don't differ
much (and OLS runs better on old, say 20-year-old, software).
As you (should) use well-behaved data, your coefficient estimates
should be approximately normally distributed, and hence you can draw
inference (test for significance) from your OLS model, in particular
when the sample size is large (asymptotically, as it goes to infinity).

On the other hand, your model is not robust (for less well-behaved
data), and a better, more appropriate model (logit, probit) is out
there, which requires you to check fewer assumptions and never
gives you implausible probabilities (below 0 or above 1).

So, in a nutshell: if you have a large sample analyzing the effect of
gender on smoking behavior in an advanced market society for young
cohorts (predictors not too discriminating) and do the analysis for,
say, a political scientist who learnt some applied statistics 30 years
ago and has since stopped reading statistics books, the linear
probability model should work well enough.
If, on the other hand, you analyze the effect of gender on consumption
of, say, lipstick in a society with more traditional gender roles
(I hope this isn't too sexist) and do the analysis for somebody who got
his PhD in econometrics some 5 years ago, you will be in trouble. For
everything between the two extremes, you are on your own.
HTH,
Stefan

On Wed, Jun 23, 2010 at 7:11 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I think this is far from the central issue. With continuous responses it
> can be just as important as with binary responses to ensure that
> predictions stay within the bounds of 0 and 1.
>
> Conversely a linear model might seem justifiable if predictions outside
> those bounds only occurred way beyond the range of the data and if
> linear, logit and probit give similar predictions.
>
> This is like anything else. I often argue, especially to students, that
> choosing a qualitatively correct model precedes estimating the
> parameters and focusing on quantitative fit. But little in this
> territory seems absolute. I wouldn't turn down a Gaussian fit to human
> heights if it fitted well merely because it predicts a positive
> probability of negative heights, even though that is completely
> unbiological.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Scott Millis
>
> The fundamental issue is the type of response variable that you have.
> If it is binary, you would want to use a logit or probit model---not a
> linear model.  If your response variable is continuous, you would use a
> linear model.
>
>
>> What are the advantages of the linear probability model over
>> probit and logit? I have read somewhere that the linear
>> probability model fits best for very large samples, where maximum
>> likelihood with probit and logit does not work. Can anyone
>> explain this?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
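Stefan's point is that a straight line tracks the middle of the logistic curve but eventually leaves [0, 1]. This can be checked numerically. The Python sketch below is illustrative only and rests on an assumption not made in the thread. It uses the tangent line at the midpoint, 0.5 + z/4, as a stand-in for a fitted linear probability model; the logistic function has value 1/2 and slope 1/4 at z = 0. An actual OLS fit to data would give a somewhat different line. The function names are hypothetical helpers.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic CDF: 1 / (1 + exp(-z)); always strictly inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_approx(z: float) -> float:
    """Tangent line to the logistic curve at z = 0 (value 1/2, slope 1/4).

    Assumption: used here as a stand-in for a linear probability fit.
    """
    return 0.5 + z / 4.0


def compare(zs):
    """Return (z, logistic value, linear value, absolute gap) for each z."""
    rows = []
    for z in zs:
        p = logistic(z)
        q = linear_approx(z)
        rows.append((z, p, q, abs(p - q)))
    return rows


if __name__ == "__main__":
    for z, p, q, gap in compare([-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3]):
        # Flag linear "probabilities" that fall outside the unit interval.
        flag = "" if 0.0 <= q <= 1.0 else "  <- outside [0, 1]"
        print(f"z={z:5.1f}  logistic={p:.3f}  linear={q:.3f}  |gap|={gap:.3f}{flag}")
```

In the printed table, the gap stays below 0.02 for |z| ≤ 1 and grows beyond that. The tangent line leaves [0, 1] once |z| > 2, whereas the logistic value never does.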
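Nick's criterion is whether predictions outside 0 and 1 occur only well beyond the range of the data. This can be checked directly after a linear fit. The sketch below makes three assumptions. It fits simple OLS in closed form, uses a small hypothetical binary dataset (invented for illustration, not from the thread), and relies on hypothetical helper names that are not part of Stata or any library. It reports where the fitted line crosses 0 and 1 and which observed x values get fitted "probabilities" outside [0, 1].

```python
def ols_fit(xs, ys):
    """Closed-form simple OLS of y on x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # requires at least two distinct x values
    return mean_y - slope * mean_x, slope


def crossing_points(intercept, slope):
    """x values where the fitted line equals 0 and 1 (requires slope != 0)."""
    return -intercept / slope, (1.0 - intercept) / slope


def out_of_bounds(xs, intercept, slope):
    """Observed x values whose fitted 'probability' lies outside [0, 1]."""
    return [x for x in xs if not 0.0 <= intercept + slope * x <= 1.0]


if __name__ == "__main__":
    # Hypothetical data, invented for illustration: binary y at x = 1..10.
    xs = list(range(1, 11))
    ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
    a, b = ols_fit(xs, ys)
    x0, x1 = crossing_points(a, b)
    print(f"fitted line: p = {a:.3f} + {b:.3f} * x")
    print(f"p = 0 at x = {x0:.2f}; p = 1 at x = {x1:.2f}")
    print(f"observed x range: {min(xs)} to {max(xs)}")
    print("x with fitted p outside [0, 1]:", out_of_bounds(xs, a, b))
```

With this made-up data, the fitted line already leaves [0, 1] at both ends of the observed range (x = 1 and x = 10), which is the situation Nick warns about. With a flatter relationship, both crossings would lie outside the data.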