Statalist



Re: st: SAS versus Stata, Panel Study Logit models


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   "Statalist" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: SAS versus Stata, Panel Study Logit models
Date   Thu, 20 Dec 2007 02:46:12 +0900

Richard Williams wrote:

A student of mine has used SAS to estimate various panel study
logistic regressions.  I can perfectly replicate her regular logistic
regression results using -logit-, but I am having a few problems
replicating her panel analysis.  A couple of Qs (which may or may not
be answerable if you don't also know SAS):

* She uses "Alternating Logistic Regression" for part of the
analysis.  Is that possible in Stata (perhaps it goes by a different name)?

* She also uses GEE with unstructured correlations.  I come very
close but not quite to replicating her results with -xtgee- and with
-xtlogit-.  Is it reasonable to think that differences in algorithms
might produce small differences in results?  Or is SAS perhaps using
some kind of different defaults or methods than Stata uses?  And if
so could I specify the necessary changes in Stata, e.g. change the
tolerances or the maximization technique? I'm guessing the
differences are just due to algorithms but it is always possible
there is something more than that.

Specifically, I give commands like

xtlogit  drug age white black hispanic male modal2 severity time ///
    gentime modtime sevtime, corr(uns) pa
xtgee  drug age white black hispanic male modal2 severity time ///
    gentime modtime sevtime, corr(uns) fam(binom) link(logit)

Here is some of her SAS code, which is for a slightly different model
but I believe similar code was used for the final models:

PROC GENMOD data=Drug descending;
CLASS id2 time2 race modal;
MODEL drug=  modal severity time modal*time severity*time/ dist=binomial
link=logit;
REPEATED subject=id2 / withinsubject=time2  type=un covb corrw modelse;
RUN;

--------------------------------------------------------------------------------

Alternating logistic regression is not available in Stata as far as I know.

There are a couple of FAQs about potential differences in GEE results
between SAS and Stata, especially with regard to the standard errors:
www.stata.com/support/faqs/stat/xtgeesas.html and
www.stata.com/support/faqs/stat/xtgeetech.html.  I don't know whether these
will address the small differences that you observe, but they might be a
start.
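
As a rough illustration of the standard-error point: as I understand it, the REPEATED statement in PROC GENMOD reports empirical (robust) standard errors by default, with MODELSE adding the model-based ones, whereas -xtgee- reports model-based standard errors unless vce(robust) is specified. The sketch below uses Stata's shipped union dataset rather than the student's data; the variables and the exchangeable correlation structure are placeholders chosen only so that the example runs. It is meant to show where the two kinds of standard errors come from, not to reproduce the SAS output.

```stata
* Illustrative only: Stata's shipped union data, not the data in this thread.
* The /// line continuations work in a do-file, not at the command line.
webuse union, clear
xtset idcode year

* Model-based (conventional) standard errors: the -xtgee- default.
xtgee union age grade not_smsa south, family(binomial) link(logit) ///
    corr(exchangeable)
estimates store model_se

* Robust (sandwich) standard errors.
xtgee union age grade not_smsa south, family(binomial) link(logit) ///
    corr(exchangeable) vce(robust)
estimates store robust_se

* Point estimates agree; only the standard errors differ.
estimates table model_se robust_se, b(%9.4f) se(%9.4f)
```

-xtgee- also has -nmp- and -rgf- options that change the divisors used in the variance estimates; whether any of these is needed to match a particular SAS run is something to check against the FAQs rather than assume.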

In addition, my understanding is that SAS's handling of categorical
predictor variables in logistic regression (CLASS id2 time2 race modal;)
will drop the highest category to form the reference category à la PROC
GLM.  If I read the documentation correctly, which category SAS considers
highest depends upon a variety of gotchas.

Nevertheless, if there are only two categories for a given predictor, then
any SAS-versus-Stata difference would at worst only result in a change of a
coefficient's sign (inversion of the OR).  Are the numerical differences in
coefficients you're seeing with multiple (>2) categories?  If they're in
two-category predictors, are the small differences perhaps in odds ratios
close to 1, e.g., an OR of 0.98 in one package and 1.02 in the other?  Are the
differences observed in categorical-by-continuous-predictor interaction
terms?

(This refers to predictor coding, not the SAS-versus-rest-of-world
difference in how the response variable's coding is construed in logistic
regression--the student is already accommodating the latter with the
DESCENDING work-around.)
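
The sign flip for a two-category predictor can be seen with a small sketch. It uses Stata's shipped low-birth-weight dataset and an ordinary -logit- fit, purely for illustration; the variables are not those in the student's model, and the factor-variable notation (ib0., ib(last).) requires a release of Stata that supports it. The second fit takes the highest level as the reference, which is my reading of what CLASS does by default.

```stata
* Illustrative only: Stata's shipped low-birth-weight data.
webuse lbw, clear

* Reference category = lowest level (Stata's default for factor variables).
logit low ib0.smoke age
scalar b_first = _b[1.smoke]

* Reference category = highest level (assumed here to mimic SAS's CLASS).
logit low ib(last).smoke age
scalar b_last = _b[0.smoke]

* The coefficients should be equal in magnitude and opposite in sign,
* so the odds ratios are reciprocals of each other.
display "base = first: b = " b_first "  OR = " exp(b_first)
display "base = last:  b = " b_last  "  OR = " exp(b_last)
```

With more than two categories the change of reference level is no longer a simple sign change, so the coefficients have to be compared contrast by contrast.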

Joseph Coveney




