Statalist



st: RE: probit questions


From   "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: probit questions
Date   Wed, 25 Jun 2008 12:35:11 -0400

Sun, Yan (IFPRI) wrote:
>I have a couple of questions about the probit model. My dependent
>variable is a 0/1 binary choice (1=invest in technology, 0=no
>investment) for user groups; the independent variables are the user
>groups' characteristics (around 20).

>1) Which model is the correct one: probit or logit? What is the Stata
>command for checking this?

Unless you have a very large sample (which you don't), the two are nearly
indistinguishable. In general there is reason to prefer logit to probit
when you have potentially extreme probabilities, because the logistic
distribution has heavier tails than the normal. In shape it is very much
like a t with about 10 df.
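
To put rough numbers on the tail difference, here is a minimal Python sketch (not Stata) comparing upper-tail probabilities of the standard normal with those of a logistic distribution rescaled to unit variance. Matching the two at equal variance (scale sqrt(3)/pi) is an assumption made for illustration, and the function names are hypothetical.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Scale that gives a zero-mean logistic distribution variance 1.
# Assumption: the two curves are compared at equal variance.
LOGISTIC_UNIT_VAR_SCALE = math.sqrt(3.0) / math.pi

def logistic_upper_tail(z, scale=LOGISTIC_UNIT_VAR_SCALE):
    """P(Z > z) for a zero-mean logistic distribution with the given scale."""
    return 1.0 / (1.0 + math.exp(z / scale))

# Print both tail probabilities and their ratio at a few cut points.
for z in (0.5, 1.0, 2.0, 3.0, 4.0):
    p_norm = normal_upper_tail(z)
    p_logis = logistic_upper_tail(z)
    print(f"z={z:3.1f}  normal={p_norm:.5f}  logistic={p_logis:.5f}  "
          f"ratio={p_logis / p_norm:.1f}")
```

Near the centre the two curves stay within a few percentage points of each other. The ratio only grows far out in the tails, where a small sample has almost no observations.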

The classic example of being able to tell the difference comes from
chess ratings. The Elo system is, essentially, based on logistic
regression. It was originally based on the probit, but in practice it
turned out that the probit didn't make enough extreme predictions.
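
For reference, a minimal Python sketch of the logistic expected-score curve as it is commonly published for Elo ratings. The base-10, 400-point scaling is the usual convention and is not something stated in this thread. The function name and the ratings used are illustrative only.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the logistic Elo curve.

    Assumes the conventional base-10 form with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Expected score for the stronger player at a few rating gaps.
for gap in (0, 100, 200, 400, 800):
    print(gap, round(elo_expected_score(1500 + gap, 1500), 3))
```

This is the same functional form as a logit model with the rating difference as the single predictor and a fixed slope.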


>2) I have few observations (170 in total, but only around 60 are valid
>for all independent variables). Sometimes the regression does not
>report the "Wald chi2" statistic. What is the reason for this?
>3) I got a note right after the regression, which says "8 failures and
>7 successes completely determined". What does this mean?

Simply put, you have too many independent variables for your sample. It
sounds like you have missing data as well, since the number of valid
observations is much smaller than the total number of observations. The
failure to report standard errors and Wald statistics is one sign. The
perfectly predicted ("completely determined") outcomes are another. You
need to deal with the missing data (-findit ice-), and even then you have
WAY too many independent variables for 170 observations. Very roughly
speaking, you should have at least 10 observations per variable, and
probably more for binary data, which don't carry much information per
observation. Either get more data or get rid of variables.
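
The arithmetic behind that advice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy rows and variable names are made up, missing values are represented as None, and the 10-per-variable figure is the rough rule of thumb above, not a hard threshold.

```python
def complete_cases(rows):
    """Keep only rows with no missing (None) values, as listwise deletion does."""
    return [row for row in rows if all(v is not None for v in row.values())]

def max_predictors(n_obs, obs_per_variable=10):
    """Rough ceiling on the number of predictors for a given sample size."""
    return n_obs // obs_per_variable

def required_obs(n_predictors, obs_per_variable=10):
    """Rough minimum sample size for a given number of predictors."""
    return n_predictors * obs_per_variable

# Toy data (hypothetical): two of the four rows have a missing predictor.
rows = [
    {"invest": 1, "size": 12, "age": 5},
    {"invest": 0, "size": None, "age": 7},
    {"invest": 1, "size": 30, "age": None},
    {"invest": 0, "size": 8, "age": 3},
]
usable = complete_cases(rows)
print("rows:", len(rows), "usable after listwise deletion:", len(usable))

# Numbers from the question: 170 total, about 60 complete, about 20 predictors.
print("predictors supported by 170 obs:", max_predictors(170))
print("predictors supported by 60 obs:", max_predictors(60))
print("obs needed for 20 predictors:", required_obs(20))
```

By this rule, about 60 complete cases support roughly 6 predictors, 170 observations support roughly 17, and 20 predictors call for about 200 observations.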




