Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: multiple weights per person in GEE?


From   "Ariel Linden" <ariel.linden@gmail.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: multiple weights per person in GEE?
Date   Wed, 22 Jul 2009 08:56:53 -0700

Steve and Joseph,

Thank you both for your continued devotion to this problem. Here is the code
I used for the comparative models GLM in Stata vs GEE in SAS.



**** GLM in Stata:****

. glm rx tx [aweight = iptw], family(gaussian) link(identity) vce(cluster
personnumber)
(sum of wgt is   4.1810e+05)

Iteration 0:   log pseudolikelihood = -459638.49

Generalized linear models                          No. of obs      =
188831
Optimization     : ML                              Residual df     =
188829
                                                   Scale parameter =
7.617026
Deviance         =  1438315.445                    (1/df) Deviance =
7.617026
Pearson          =  1438315.445                    (1/df) Pearson  =
7.617026

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]

                                                   AIC             =
4.868274
Log pseudolikelihood = -459638.4948                BIC             =
-855694

                        (Std. Err. adjusted for 7868 clusters in
personnumber)
----------------------------------------------------------------------------
--
             |               Robust
          rx |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
Interval]
-------------+--------------------------------------------------------------
--
 tx       |   2.476136   .4841721     5.11   0.000     1.527176    3.425096
 _cons |   1.145776   .0553768    20.69   0.000      1.03724    1.254313
----------------------------------------------------------------------------
--

 

**** GEE in SAS:*****
 

PROC GENMOD DATA=DATA;

    CLASS PERSONNU MONTH;

    MODEL  RX = TX;

    WEIGHT IPTW;

    REPEATED  subject=PERSONNU / type=exch;

  run;



GEE (in SAS) IPTW

                                 Standard     Wald 95% Confidence       Chi-
    Parameter    DF    Estimate       Error           Limits           
Square    Pr > ChiSq

    Intercept     1      1.1458      0.0094      1.1273      1.1642   
14816.1        <.0001
    SCCENROL      1      2.4761      0.0128      2.4511      2.5011   
37695.3        <.0001
    Scale         1      4.1067      0.0067      4.0936      4.1198










Date: Tue, 21 Jul 2009 12:22:35 -0400

From: sjsamuels@gmail.com

Subject: Re: st: multiple weights per person in GEE?

- --

Thank you, Joseph. I think that you are right. After I posted, I did some
comparisons of aweights and pweights in -glm-. I saw one extreme case
(weight = response) in which the standard errors differed by a factor of
2-4, but in the other comparisons, the standard errors were within 20% of
each other. I saw nothing like Ariel's discrepancies.

In the absence of a -vce- or -cluster- option, pweights lead to robust
standard errors, whereas aweights lead to OIM (Observed Information

Matrix) standard errors. Perhaps it's time for me to purchase Hardin

and Hilbe's book on Generalized Linear Models.

Here was the extreme case:

sysuse auto,clear

glm rep78 mpg [aweight=rep78], family(poisson) glm rep78 mpg
[pweight=rep78], family(poisson)

- -Steve

On Tue, Jul 21, 2009 at 11:05 AM, Joseph Coveney<jcoveney@bigplanet.com>
wrote:

> Steve Samuels wrote:

>

> The WEIGHT used by SAS GENMOD is a dispersion parameter weight, which 

> should be equivalent to Stata's -aweight-.   Weights for IPTW should 

> be proportional to probability weights, as in the reference Joseh 

> quotes. Ariel did not show us her Stata commands and output, as the 

> Statalist FAQ specify, but it's quite likely that she is comparing 

> different  things.

>

> ----------------------------------------------------------------------

> ----------

>

> You make a good point.  Without the specifics from Ariel it will be 

> impossible to delve into what might be behind the large differences in 

> the regression coefficient standard errors between SAS and Stata.

>

> I wonder whether it has much to do with weights, though.  In the 

> do-file below, which explores -glm- with varying within-panel weights, 

> -pweight- (-iweight-) and -aweight- yield identical coefficients and 

> standard errors.  The same obtains with -iweight- and -pweight- with 

> -xtgee- in the dataset having constant within-panel weights.  (There 

> is no -aweight- in -xtgee-.)

>

> More important, the M. A. Hernán in the _Stata Journal_ article that I 

> cited is the same Miguel Hernán in the SAS literature that Ariel 

> cites.  I'm guessing that if there were a caveat arising from 

> different meanings of "weight" between the two packages in this 

> context, then it would have been brought out by the advocates of the 

> marginal structural model approach (or the journal's reviewrs) who are 

> familiar with both.  I suspect that a comparison of the output 

> (deviance and pseudolikelihood, coefficients and their standard 

> errors, etc.) from following SAS code on the PROC IMPORTed CSV dataset 

> file produced by the do-file below would indicate that SAS is handling 

> the weights much as Stata does with

> -pweight- (-iweight-).

>

> PROC GENMOD DATA=DUMMY DESCENDING;

>    CLASS PATIENT TREATMENT TIME;

>    MODEL SCORE=TREATMENT TIME / DIST=BIN;

>    REPEATED SUBJECT=PATIENT / CORR=IND;

>    SCWGT WEIGHT;

> RUN;

>

> I'd wager that the coefficient standard errors would be essentially 

> the same as those from Stata.  They might differ in the third 

> significant figure, or so, because of the convergence properties of 

> the algorithms used by Stata and SAS for fitting these models, default 

> tolerance settings, and so on.  But I doubt that they would show the 

> 0.484 versus 0.013 difference that Ariel reports in SEs.

>

> Like you say, without the specifics, we can only guess at the problem.  

> In my experience, GLMs are very sensitive to collinearity among the 

> predictors, for example.  PROC GENMOD is reported* to be coy about 

> convergence failures when it encounters compelete or quasicomplete 

> separation with binomial data, for another example.  It might pay 

> Ariel to take a closer look at the SAS log (and at the iteration log in
Stata) in the discrepant case if he hasn't already done so.

>

> Joseph Coveney

>

> * Paul D. Allison, Convergence Failures in Logistic Regression. Paper
360-2008.

> _SAS Global Forum 2008_

> www2.sas.com/proceedings/forum2008/360-2008.pdf

>

>

> clear *

> set more off

> version 10.1

>

> matrix define Mu = J(1, 5, 0.5)

> matrix define Sigma = J(5, 5, 0.5) + I(5) * 0.5 // findit ovbd // 

> findit ridder ovbd , stub(score) means(Mu) corr(Sigma) ///

>    seed(`=date("2009-07-21", "YMD")') n(100) clear generate byte 

> treatment = 0

>

> tempfile tmpfil0

> save `tmpfil0'

>

> matrix input Mu = (0.4 0.45 0.5 0.55 0.6) ovbd , stub(score) means(Mu) 

> corr(Sigma) n(100) clear generate byte treatment = 1 append using 

> `tmpfil0'

>

> generate int patient = _n

> reshape long score, i(patient) j(time) generate double weight = 

> runiform()

>

> outsheet patient treatment time score weight using dummy.csv, comma 

> names

>

> insheet patient treatment time score weight using dummy.csv, comma 

> names clear

>

> tabulate time, generate(time)

> tabulate treatment, generate(treatment)

>

> log using comparem.log, text

>

> glm score treatment1 time1-time4 [iweight=weight], cluster(patient) 

> ///

>    family(binomial) link(logit) nolog

>

> glm score treatment1 time1-time4 [aweight=weight], cluster(patient) 

> ///

>    family(binomial) link(logit) nolog

>

> glm score treatment1 time1-time4 [pweight=weight], cluster(patient) 

> ///

>    family(binomial) link(logit) nolog

>

> bysort patient (time): replace weight = weight[1]

>

> xtgee score treatment1 time1-time4 [iweight=weight], i(patient) ///

>    family(binomial) link(logit) corr(independent) robust nolog

>

> xtgee score treatment1 time1-time4 [pweight=weight], i(patient) ///

>    family(binomial) link(logit) corr(independent) robust nolog

>

> glm score treatment1 time1-time4 [pweight=weight], cluster(patient) 

> ///

>    family(binomial) link(logit) nolog

>

> log close

>

> exit

>

>

>

> *

> *   For searches and help try:

> *   http://www.stata.com/help.cgi?search
<http://www.stata.com/help.cgi?search> 

> *   http://www.stata.com/support/statalist/faq
<http://www.stata.com/support/statalist/faq> 

> *   http://www.ats.ucla.edu/stat/stata/
<http://www.ats.ucla.edu/stat/stata/> 

>

 

 

- --

Steven Samuels

sjsamuels@gmail.com

18 Cantine's Island

Saugerties NY 12477

USA

845-246-0774



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index