Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: path analysis sureg, cmdok, boostrap with imputed data?

From	Stas Kolenikov <[email protected]>
To	[email protected]
Subject	Re: st: path analysis sureg, cmdok, boostrap with imputed data?
Date	Thu, 14 Apr 2011 10:39:28 -0500
I am not sure I quite follow the flow there: where you are
taking the bootstrap sample, where you are imputing, and which comes
first. You are hacking -mi- guts, like regenerating the observation
IDs, and I would consider that a dangerous practice; if
you are able to cite the [MI] manuals by heart, then such code is probably
OK, but I doubt even Yulia M can do that :). I understand that you
want to estimate the total effect. Here's what I would do, in order
of preference for simplicity.

1. You can do it just as well with -sureg- (I would actually use
-reg3-, but it sometimes gives underidentification messages that freak
people out) with -nlcom-.

2. You can do it with -mi- if you have missing data; if it does not
support the command you need, you can do it with the -ice/mim- combo...
again followed by -nlcom- or -mi testtransform-, depending on what
worked out. (If -mi estimate- does not support something, it will tell
you so and won't run, so if it did run, it is probably OK.) Your 40
imputations are all you need; you don't need to bootstrap them.

3. If you really need to use the bootstrap (e.g., you have a stubborn
referee who does not want to hear otherwise), then the sequence should
be:

i. take a bootstrap sample
ii. run one imputation on that bootstrapped sample
iii. obtain the point estimate

Steps ii and iii can be put together into an -eclass- program; step i
and the cycling over the resamples are obviously -bootstrap-'s
business. The results are to be summarized by -bootstrap- rather
than -mi-, as the bootstrap takes both the sampling uncertainty and the
imputation uncertainty into account. You would need to run at least a
couple of hundred replications to get reliable standard errors.
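
A minimal sketch of that sequence, not a tested implementation. It
assumes that the original, un-imputed data are in memory and are not
yet -mi set-, that m1 and y are the variables with missing values, that
-mi impute mvn- is an acceptable imputation model, and that the cov4
dummies were created beforehand with -xi i.cov4-. The program name
bootimp, the seed, and the choice of imputation model are hypothetical.

```stata
capture program drop bootimp
program define bootimp, eclass
    version 11
    preserve                        // the imputed copy is discarded on exit
    * step ii: a single imputation on the current (bootstrap) sample
    mi set wide
    mi register imputed m1 y        // assumption: these have the missing values
    mi impute mvn m1 y = iv cov1 cov2 cov3 _Icov4_*, add(1)
    mi extract 1, clear             // keep only the one completed data set
    * step iii: point estimates; -sureg- leaves its results in e()
    sureg (m1 iv cov1 cov2 cov3 _Icov4_*) (y m1 iv cov1 cov2 cov3 _Icov4_*)
end

* step i and the cycling over the resamples are left to -bootstrap-
xi i.cov4
bootstrap indirect = ([m1]_b[iv] * [y]_b[m1]), ///
    reps(500) seed(12345) nodrop: bootimp
estat bootstrap, all
```

-nodrop- is there so that -bootstrap- resamples all observations,
including those with missing values, rather than restricting itself to
e(sample). There is no strata(_mi_m) and no -mi estimate- inside the
program: each replicate sees exactly one imputation.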

Reference: http://www.citeulike.org/user/ctacmo/article/1269394.
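
For options 1 and 2, a rough sketch of what the commands might look
like, using the variable names from the original post. The labels
indirect and total are mine, and I have not run this against the data;
the -mi- part assumes the 40 imputations are already -mi set-.

```stata
* Option 1: complete cases, delta-method standard errors from -nlcom-
xi: sureg (m1 iv cov1 cov2 cov3 i.cov4) (y m1 iv cov1 cov2 cov3 i.cov4)
nlcom (indirect: [m1]_b[iv] * [y]_b[m1]) ///
      (total:    [y]_b[iv] + [m1]_b[iv] * [y]_b[m1])

* Option 2: the same product, combined across the imputations by -mi-
xi: mi estimate (indirect: [m1]_b[iv] * [y]_b[m1]), cmdok: ///
    sureg (m1 iv cov1 cov2 cov3 i.cov4) (y m1 iv cov1 cov2 cov3 i.cov4)
mi testtransform indirect
```

The product [m1]_b[iv]*[y]_b[m1] is what the posted program returns as
indtotal; adding the direct path [y]_b[iv] gives the total effect.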

On Thu, Apr 14, 2011 at 5:13 AM, Catharine Morgan
<[email protected]> wrote:
> Hi Statalist users,
>
> I am trying to test the effect of an independent variable (iv) through a mediator (m1) on a dependent variable (y), all of which are continuous variables. In addition I wish to add further covariates, three continuous and one categorical. I have successfully used a complete case dataset using the sureg command. However I now wish to test the imputed data set (40 imputations) and have used the sureg command using cmdok and mi in STATA 11.
>
> I have the following queries:
>
> 1) I have read there may be issues with some commands not supported by mi but sureg with cmdok appears to be working, are there any indicators from my output to suggest otherwise or obvious signs I should look out for?
>
> 2) I have run bootstrapping (with the help of a colleague) and wanted to check about the warning STATA returns. I am wondering why strata 41 appears on the output rather than 40. I think it may be using m=0 data also. Do you know how to correct this if this is incorrect?
>
> 3) Does anyone know how I can add in multiple independent variables to test them simultaneously on the same mediator and dependent variable?
>
>
> I have listed the syntax and output below. Many thanks for your time, Cathy
>
> ******************************************************************************************************************************************
> This is the syntax I have used:
>
> xi:mi estimate, cmdok: sureg (m1 iv cov1 cov2 cov3 i.cov4) (y m1 iv cov1 cov2 cov3 i.cov4)
>
> capture program drop bootcm
>
> program bootcm, rclass
> drop _mi_id
> bys _mi_m: gen _mi_id = _n
> xi:mi estimate, cmdok: sureg (m1 iv cov1 cov2 cov3 i.cov4) (y m1 iv cov1 cov2 cov3 i.cov4)
> matrix b = e(b_mi)
> return scalar ind1 = el("b", 1, colnumb("b", "m1:iv"))
> return scalar ind2 = el("b", 1, colnumb("b", "y:m1"))
> return scalar indtotal = el("b", 1, colnumb("b", "m1:iv"))*el("b", 1, colnumb("b", "y:m1"))
> end
>
> bootstrap r(ind1) r(ind2) r(indtotal), reps(500) strata(_mi_m) :bootcm
> estat bootstrap, all
>
>
> The output STATA is returning:
>
> . xi:mi estimate, cmdok: sureg (m1 iv cov1 cov2 cov3 i.cov4) (y m1 iv cov1 cov2 cov3 i.cov4)
> i.depriv_grps_i   _Idepriv_gr_1-4     (naturally coded; _Idepriv_gr_1 omitted)
>
> Multiple-imputation estimates                     Imputations     =         40
>                                                  Number of obs   =        425
>                                                  Average RVI     =     0.1076
> DF adjustment:   Large sample                     DF:     min     =     650.87
>                                                          avg     =   30595.89
>                                                          max     =  162393.01
>
> ------------------------------------------------------------------------------
>             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> m1           |
>          iv |    .263259   .0405813     6.49   0.000     .1836816    .3428365
>        cov1 |   .6513497   .3615691     1.80   0.072    -.0573187    1.360018
>        cov2 |   .0063562    .013241     0.48   0.631    -.0195965    .0323088
>        cov3 |   .0120347    .015811     0.76   0.447    -.0189553    .0430248
>    _Icov4~2 |  -.1198675   .4329068    -0.28   0.782    -.9683557    .7286206
>    _Icov4~3 |    .202258   .4486146     0.45   0.652    -.6770235     1.08154
>    _Icov4~4 |   .0695109   .4543683     0.15   0.878    -.8210761    .9600979
>       _cons |   13.70543   1.352386    10.13   0.000     11.05437     16.3565
> -------------+----------------------------------------------------------------
> y            |
>          m1 |   1.828701   .1590137    11.50   0.000     1.516739    2.140663
>          iv |  -.0633312   .1370068    -0.46   0.644    -.3323599    .2056975
>        cov1 |   1.952565   1.141438     1.71   0.087    -.2856063    4.190736
>        cov2 |   .1639724    .042429     3.86   0.000     .0807423    .2472025
>        cov3 |   -.073326   .0499443    -1.47   0.142    -.1712756    .0246236
>    _Icov4~2 |  -1.714443   1.367262    -1.25   0.210    -4.395557    .9666698
>    _Icov4~3 |   .3836167    1.38614     0.28   0.782    -2.333895    3.101129
>    _Icov4~4 |   -1.18265    1.39184    -0.85   0.396    -3.911344    1.546045
>       _cons |   26.85021   4.797198     5.60   0.000     17.43863    36.26179
> ------------------------------------------------------------------------------
>
> . capture program drop bootcm
>
> . program bootcm, rclass
>  1.    drop _mi_id
>  2.    bys _mi_m: gen _mi_id = _n
>  3.  xi:mi estimate, cmdok: sureg (m1 iv cov1 cov2 cov3 i.cov4) (y m1 iv cov1 cov2 cov3 i.cov4)
>  4.    matrix b = e(b_mi)
>  5.    return scalar ind1 = el("b", 1, colnumb("b", "m1:iv"))
>  6.    return scalar ind2 = el("b", 1, colnumb("b", "y:m1"))
>  7.    return scalar indtotal =  el("b", 1, colnumb("b", "m1:iv"))*el("b", 1, colnumb("b", "y:m1"))
>  8.  end
>
> . bootstrap r(ind1) r(ind2) r(indtotal), reps(500) strata(_mi_m) :bootcm
> (running bootcm on estimation sample)
>
> Warning:  Because bootcm is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations
>          are used in calculating the statistics and so assumes that all observations are used.  This means that no observations will
>          be excluded from the resampling because of missing values or other reasons.
>
>          If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded.  Be sure that
>          the dataset in memory contains only the relevant data.
>
> Bootstrap results
>
> Number of strata   =        41                  Number of obs      =     17425
>                                                Replications       =       500
>
>      command:  bootcm
>        _bs_1:  r(ind1)
>        _bs_2:  r(ind2)
>        _bs_3:  r(indtotal)
>
> ------------------------------------------------------------------------------
>             |   Observed   Bootstrap                         Normal-based
>             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       _bs_1 |    .263259    .040638     6.48   0.000       .18361     .342908
>       _bs_2 |   1.828701   .2084724     8.77   0.000     1.420103      2.2373
>       _bs_3 |   .4814221   .0772151     6.23   0.000     .3300833    .6327608
> ------------------------------------------------------------------------------
>
> . estat bootstrap, all
>
> Bootstrap results
> Number of strata   =        41                  Number of obs      =     17425
>                                                Replications       =       500
>
>      command:  bootcm
>        _bs_1:  r(ind1)
>        _bs_2:  r(ind2)
>        _bs_3:  r(indtotal)
>
> ------------------------------------------------------------------------------
>             |    Observed               Bootstrap
>             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       _bs_1 |   .26325901   -.021616     .040638      .18361    .342908   (N)
>             |                                       .1613802    .314101   (P)
>             |                                       .1995111   .3861749  (BC)
>       _bs_2 |   1.8287012  -.2486033   .20847243    1.420103     2.2373   (N)
>             |                                       1.132552    1.99977   (P)
>             |                                       1.691838   2.138577  (BC)
>       _bs_3 |   .48142207  -.1006636   .07721506    .3300833   .6327608   (N)
>             |                                       .2299137   .5390809   (P)
>             |                                       .4234512   .6634043  (BC)
> ------------------------------------------------------------------------------
> (N)    normal confidence interval
> (P)    percentile confidence interval
> (BC)   bias-corrected confidence interval
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
