# st: STATA help for GLM misleading?

From: Rijo John
To: Statalist
Subject: st: STATA help for GLM misleading?
Date: Mon, 14 Nov 2005 12:04:22 +0530 (IST)

Hi all,

The Stata help for glm with the family(binomial) link(logit) options says:

"For family(binomial) link(logit) models, we recommend using the logistic
command in preference to glm. Both produce the same answers, but logistic
provides useful post-estimation commands". (This is from Stata/SE 8.2;
sorry, I don't know whether it has been corrected in Stata 9.)

This is actually misleading. When the dependent variable is a fraction that
can take any value between 0 and 1, including 0 and 1 themselves, using
family(binomial) link(logit) together with the robust option is certainly
different from logistic regression. That IS the essence of the paper by
Papke and Wooldridge (1996), "Econometric methods for fractional response
variables with an application to 401(k) plan participation rates," Journal
of Applied Econometrics, Vol. 11, No. 6, pp. 619-632.
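For reference, here is the fractional logit setup in the usual notation, with x_i a 1xK row vector that includes a constant and \hat G_i = G(x_i\hat\beta). The estimator maximizes the Bernoulli log-likelihood even though y_i is not binary; only the conditional mean is assumed correct, so inference uses the sandwich form. This is a summary written for the logit link, not a quotation from the paper:

```latex
\mathrm{E}(y_i \mid x_i) = G(x_i\beta), \qquad
G(z) = \frac{\exp(z)}{1+\exp(z)}, \qquad 0 \le y_i \le 1

\ell_i(\beta) = y_i \log G(x_i\beta) + (1 - y_i)\log\bigl[1 - G(x_i\beta)\bigr]

\widehat{\mathrm{Avar}}(\hat\beta) = \hat A^{-1}\hat B\,\hat A^{-1}, \qquad
\hat A = \sum_i \hat G_i (1-\hat G_i)\, x_i' x_i, \qquad
\hat B = \sum_i \hat u_i^2\, x_i' x_i, \qquad
\hat u_i = y_i - \hat G_i
```

Dropping the robust correction amounts to using \hat A^{-1} alone, which is valid only if Var(y_i | x_i) = G(x_i\beta)[1 - G(x_i\beta)], i.e. if y_i really behaves like a 0/1 outcome.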

I recently had to estimate fractional logit models with the glm command,
and when I looked at the Stata help I was unsure whether to use
family(binomial) link(logit) or family(gaussian) link(logit). The help
text quoted above more or less asserts that family(binomial) link(logit)
gives the same results as logistic, which suggests that Stata treats every
nonzero value of the dependent variable as 1, turning it into a 0/1
Bernoulli outcome. In my case, however, family(binomial) link(logit) with
the robust option gave better results than the logistic command. The
linktest I ran after glm returned the message "Model is ordinary
regression, use regress instead", but by all the other model selection
criteria family(binomial) link(logit) gave me a better fit.
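To make the "every nonzero value becomes 1" reading concrete, here is a tiny Python illustration (not Stata code; the helper name `dichotomize` is hypothetical). It recodes a fractional response the way Wooldridge's reply says Stata's logit command treats it, with anything above zero set to one; per the same reply, glm no longer does this. The recoding throws away nearly all of the information in the shares:

```python
def dichotomize(y):
    """Recode a response as described in the thread for Stata's logit:
    any value greater than zero becomes 1, zero stays 0."""
    return [1 if v > 0 else 0 for v in y]


if __name__ == "__main__":
    shares = [0.0, 0.05, 0.20, 0.40, 0.45, 0.80, 1.00]  # illustrative fractions
    binary = dichotomize(shares)
    print(binary)  # → [0, 1, 1, 1, 1, 1, 1]
    # Sample means before and after recoding: the fractional information is lost
    print(sum(shares) / len(shares), sum(binary) / len(binary))
```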

I corresponded with Papke and Wooldridge about this, and here is the
clarification I received from Wooldridge, reproduced verbatim:

\begin{verbatim}
The glm command, glm y x1 x2... xk, family(binomial) link(logit) robust is
the correct one.  It does flogit with robust standard errors.  It's true
that, IF y is binary, and we drop "robust", then the results are identical
to the usual logit.  If y is a fraction and we drop robust, the resulting
standard errors are actually too LARGE. Whoever wrote the manual thinks
that people only use "family(binomial) link(logit)" for binary responses.
The point of our paper was that this can be used when y is a fraction,
too.  But robust standard errors are needed. In older versions of Stata, y
would be turned into a zero-one variable, and that's why we had to write
our own Gauss code.  Fortunately, even though the description in the Stata
manual is misleading, they now allow for a nonbinary y in glm.  (Not in
"logit," though.  Anything bigger than zero is set to one.)  But they give
a warning message, as if you shouldn't use glm for a fractional response.
The warning should be ignored. If one uses glm y x1 x2... xk,
family(gaussian) link(logit) robust this will be nonlinear least squares
with robust standard errors, which is okay, but known to be inefficient
always, whereas the quasi-MLE is known to be efficient sometimes.
\end{verbatim}
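To see why the robust option matters, here is a minimal, self-contained Python sketch (standard library only) of the Bernoulli quasi-MLE for a fractional response with one regressor. It is not Stata's implementation, and all function names are hypothetical. It fits E(y|x) = G(b0 + b1*x) by Newton-Raphson and reports two sets of standard errors: conventional ones, which assume Var(y|x) = G(1 - G) as in a true 0/1 model, and sandwich ("robust") ones with no finite-sample adjustment, so the numbers need not match glm's output exactly.

```python
import math


def logistic(z):
    """Logistic CDF G(z) = exp(z) / (1 + exp(z)), computed stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def moments(beta, x, y):
    """Score, A = sum G(1-G) x'x and B = sum u^2 x'x for the Bernoulli
    quasi-log-likelihood sum_i [y_i log G_i + (1 - y_i) log(1 - G_i)]."""
    score = [0.0, 0.0]
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        row = (1.0, xi)                      # constant and regressor
        g = logistic(beta[0] + beta[1] * xi)
        u = yi - g                           # residual; y may be any value in [0, 1]
        for j in range(2):
            score[j] += u * row[j]
            for k in range(2):
                a[j][k] += g * (1.0 - g) * row[j] * row[k]
                b[j][k] += u * u * row[j] * row[k]
    return score, a, b


def fractional_logit(x, y, max_iter=100, tol=1e-10):
    """Return (beta, conventional SEs, robust SEs) for E(y|x) = G(b0 + b1*x)."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        score, a = moments(beta, x, y)[:2]
        a_inv = inv2(a)
        # Newton-Raphson step: beta <- beta + A^{-1} * score
        step = [a_inv[j][0] * score[0] + a_inv[j][1] * score[1] for j in range(2)]
        beta = [beta[j] + step[j] for j in range(2)]
        if max(abs(s) for s in step) < tol:
            break
    _, a, b = moments(beta, x, y)
    a_inv = inv2(a)                          # conventional variance A^{-1}
    sandwich = matmul2(matmul2(a_inv, b), a_inv)  # robust variance A^{-1} B A^{-1}
    # max(..., 0.0) clamps tiny negative values caused by floating-point rounding
    se_conv = [math.sqrt(max(a_inv[j][j], 0.0)) for j in range(2)]
    se_rob = [math.sqrt(max(sandwich[j][j], 0.0)) for j in range(2)]
    return beta, se_conv, se_rob


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [0.05, 0.20, 0.40, 0.45, 0.80, 1.00]  # fractions, including the boundary value 1
    beta, se_conv, se_rob = fractional_logit(xs, ys)
    print("beta:        ", beta)
    print("conventional:", se_conv)
    print("robust:      ", se_rob)
```

When the fractions are less dispersed than a 0/1 outcome with the same mean, the conventional standard errors overstate the sampling variability, which is the sense in which they are "too LARGE" in the note above; the sandwich estimator does not rely on the Bernoulli variance assumption.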

Best Regards,
Rijo John.

***************************************************
Rijo M. John, Research Scholar
Indira Gandhi Institute of Development Research,
Mumbai, India-400065.
contact: (+91)9892412476
URL: http://rijojohn.bizhat.com
