Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: multiple weights per person in GEE?


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: multiple weights per person in GEE?
Date   Wed, 22 Jul 2009 00:05:55 +0900

Steve Samuels wrote:

The WEIGHT used by SAS GENMOD is a dispersion parameter weight, which
should be equivalent to Stata's -aweight-.   Weights for IPTW should
be proportional to probability weights, as in the reference Joseh
quotes. Ariel did not show us her Stata commands and output, as the
Statalist FAQ specify, but it's quite likely that she is comparing
different  things.

--------------------------------------------------------------------------------

You make a good point.  Without the specifics from Ariel it will be impossible
to delve into what might be behind the large differences in the regression
coefficient standard errors between SAS and Stata.  

I wonder whether it has much to do with weights, though.  In the do-file below,
which explores -glm- with varying within-panel weights, -pweight- (-iweight-)
and -aweight- yield identical coefficients and standard errors.  The same
obtains with -iweight- and -pweight- with -xtgee- in the dataset having constant
within-panel weights.  (There is no -aweight- in -xtgee-.)

More important, the M. A. Hernán in the _Stata Journal_ article that I cited is
the same Miguel Hernán in the SAS literature that Ariel cites.  I'm guessing
that if there were a caveat arising from different meanings of "weight" between
the two packages in this context, then it would have been brought out by the
advocates of the marginal structural model approach (or the journal's reviewrs)
who are familiar with both.  I suspect that a comparison of the output (deviance
and pseudolikelihood, coefficients and their standard errors, etc.) from
following SAS code on the PROC IMPORTed CSV dataset file produced by the do-file
below would indicate that SAS is handling the weights much as Stata does with
-pweight- (-iweight-).

PROC GENMOD DATA=DUMMY DESCENDING;
    CLASS PATIENT TREATMENT TIME;
    MODEL SCORE=TREATMENT TIME / DIST=BIN;
    REPEATED SUBJECT=PATIENT / CORR=IND;
    SCWGT WEIGHT;
RUN;

I'd wager that the coefficient standard errors would be essentially the same as
those from Stata.  They might differ in the third significant figure, or so,
because of the convergence properties of the algorithms used by Stata and SAS
for fitting these models, default tolerance settings, and so on.  But I doubt
that they would show the 0.484 versus 0.013 difference that Ariel reports in
SEs.

Like you say, without the specifics, we can only guess at the problem.  In my
experience, GLMs are very sensitive to collinearity among the predictors, for
example.  PROC GENMOD is reported* to be coy about convergence failures when it
encounters compelete or quasicomplete separation with binomial data, for another
example.  It might pay Ariel to take a closer look at the SAS log (and at the
iteration log in Stata) in the discrepant case if he hasn't already done so.

Joseph Coveney

* Paul D. Allison, Convergence Failures in Logistic Regression. Paper 360-2008.
_SAS Global Forum 2008_
www2.sas.com/proceedings/forum2008/360-2008.pdf 


clear *
set more off
version 10.1

matrix define Mu = J(1, 5, 0.5)
matrix define Sigma = J(5, 5, 0.5) + I(5) * 0.5
// findit ovbd
// findit ridder
ovbd , stub(score) means(Mu) corr(Sigma) ///
    seed(`=date("2009-07-21", "YMD")') n(100) clear
generate byte treatment = 0

tempfile tmpfil0
save `tmpfil0'

matrix input Mu = (0.4 0.45 0.5 0.55 0.6)
ovbd , stub(score) means(Mu) corr(Sigma) n(100) clear
generate byte treatment = 1
append using `tmpfil0'

generate int patient = _n
reshape long score, i(patient) j(time)
generate double weight = runiform()

outsheet patient treatment time score weight using dummy.csv, comma names

insheet patient treatment time score weight using dummy.csv, comma names clear

tabulate time, generate(time)
tabulate treatment, generate(treatment)

log using comparem.log, text

glm score treatment1 time1-time4 [iweight=weight], cluster(patient) ///
    family(binomial) link(logit) nolog

glm score treatment1 time1-time4 [aweight=weight], cluster(patient) ///
    family(binomial) link(logit) nolog

glm score treatment1 time1-time4 [pweight=weight], cluster(patient) ///
    family(binomial) link(logit) nolog

bysort patient (time): replace weight = weight[1]

xtgee score treatment1 time1-time4 [iweight=weight], i(patient) ///
    family(binomial) link(logit) corr(independent) robust nolog

xtgee score treatment1 time1-time4 [pweight=weight], i(patient) ///
    family(binomial) link(logit) corr(independent) robust nolog

glm score treatment1 time1-time4 [pweight=weight], cluster(patient) ///
    family(binomial) link(logit) nolog

log close

exit



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index