
From: "Stas Kolenikov" <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Mediating variables
Date: Mon, 6 Oct 2008 12:45:19 -0500

You would want to read up on econometric systems of simultaneous equations, which Stata estimates with -reg3-. As I have said on SEMNET a couple of times, it is amazing that this foundational methodology is not taught in the standard social science quant sequences, and is not covered enough in SEM courses.

Following the econometric guidelines, we can see that the equation for r is identified, since it has no endogenous variables on the right-hand side. The equation for Y, however, is underidentified: it fails the order condition, which says that the number of excluded exogenous variables (here, none) should be at least as great as the number of included endogenous variables (here, one). You can estimate the equation for r with OLS, but the equation for Y cannot be estimated consistently as it stands. In terms of instrumental variables, your prediction r-hat from the first-stage regression will be perfectly collinear with x, and hence the second-stage linear regression for the first equation will break down.

If you specify this as a structural equation model, then you have six moments (three variances and three covariances) but seven parameters (d, b, c, Var[x], which is exactly identified, and three elements of the variance-covariance matrix of the epsilons). You can again show that the second equation is (exactly) identified -- that's an OLS, in the end. But you cannot identify the first equation unless you impose some additional assumptions, such as zero correlation of the epsilons. This is what Mplus or other SEM software might be doing implicitly, but econometric techniques insist on having the epsilons correlated, and you seem to be interested in that, too. If you do impose the zero-correlation restriction, though, your model will be just identified, and you won't have any degrees of freedom to test whether the correlation is indeed zero. So I have no idea how this was done in the earlier work on mediation you mentioned, if the model is underidentified.
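A small simulation may make the collinearity argument concrete. This is only a sketch under stated assumptions: the data are simulated, the coefficient values, the error correlation of 0.5, and the seed are arbitrary, and q is a hypothetical excluded exogenous variable that is first ignored and then brought in for contrast.

```stata
* Simulated data; every name and parameter value here is an illustrative assumption
clear
set seed 20081006
set obs 1000

* Structural errors with correlation 0.5
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm eps1 eps2, corr(C) double

generate double x = rnormal()
generate double q = rnormal()                  // hypothetical: affects r, not y
generate double r = 1 + 0.8*x + 0.6*q + eps2   // r = a2 + d*x + ... + epsilon2
generate double y = 2 + 0.5*r + 0.3*x + eps1   // Y = a1 + b*r + c*x + epsilon1

* First stage with x as the only exogenous variable (q treated as unavailable)
regress r x
predict double rhat, xb

* rhat is an exact linear function of x, so the correlation is 1 up to rounding
correlate rhat x

* Manual second stage: rhat and x cannot both be kept in the regression
regress y rhat x

* For contrast: once q is excluded from the y equation, the system can be fit
reg3 (y r x) (r x q)
```

In the manual second stage, Stata should flag one of rhat and x as omitted because of collinearity, which is the breakdown described above. The manual two-step is shown only to expose the problem; its standard errors would not be valid even in an identified model, so -reg3- or -ivregress- is the tool for actual estimation.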
As the model stands, there is no workaround short of bringing in extra information: a restriction on the errors like the one above, or an excluded exogenous variable.

As for the panel aspect of your data, I have not seen this done in a panel way, although I imagine it is known in econometrics. Again, with covariance structure modeling and balanced panels, you can represent your model in the "wide" format, with variables x1, r1, Y1 for the first period, and so on through xT, rT, YT for the last period, and come up with a covariance structure model with tons of parameter restrictions (all the parameters being the same in all periods). To be true to your data, you would need to incorporate panel effects u1 and u2 in the two equations that are common to all time periods, with epsilon1 and epsilon2 being distinct in each time period. Establishing identification of such a model will be difficult to extremely difficult, although I imagine you can just go along the lines of the "all observed"/simultaneous equations system, incorporating the known restrictions. My intuition says that IF this model is identified for large enough T, you might need at least three time periods to get anything sensible.

It does not seem like you can get enough leverage out of -reg3- on this occasion: my brief look through it suggests that you cannot specify a parametric structure for your residual covariance matrix (which will have some sort of block/Kronecker product structure based on the covariances of the unique errors epsilon and the panel-level errors u). You should be able to set this up as a GLLAMM model with three levels: level 1, the response variables; level 2, a single time occasion; level 3, the person (or whatever your longitudinal unit is). For GLLAMM, you would need to represent your data in long format, with a single response variable holding all of the r's and Y's in all time periods... and all the accompanying mess of specifying GLLAMM models. You probably could write down your own likelihood for -ml- (method d0), but it should be easier just to figure out GLLAMM for this.
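The two data layouts mentioned above can be sketched with -reshape-. The panel below is simulated and deliberately simple: the variable names id, t, resp, and eqn are my own choices, three periods and 200 units are arbitrary, and the panel effects u1 and u2 are left out. It shows only the data management, not a GLLAMM specification.

```stata
* Hypothetical balanced panel in long form: 200 units, 3 periods
clear
set seed 12345
set obs 200
generate long id = _n
expand 3
bysort id: generate byte t = _n
generate double x = rnormal()
generate double r = 0.8*x + rnormal()
generate double y = 0.5*r + 0.3*x + rnormal()

* "Wide" layout for covariance structure modeling:
* one row per unit, with x1 r1 y1 x2 r2 y2 x3 r3 y3
reshape wide x r y, i(id) j(t)
describe

* Back to one row per unit and period
reshape long x r y, i(id) j(t)

* Stacked layout of the kind GLLAMM expects: a single response variable
* holding both r and y, with eqn marking which equation a row belongs to
rename r resp1
rename y resp2
reshape long resp, i(id t) j(eqn)
list in 1/6, sepby(id t)
```

After the last step each unit-period pair contributes two rows, with eqn equal to 1 for the r response and 2 for the y response, while x is repeated on both rows.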
Now, to the updates you posted: if you have another exogenous variable q that affects r but does not affect Y, then the order condition mentioned above is satisfied. You would still need to check the rank condition based on some matrices, but it looks like your system will then be identified. The other piece of good news is that it will be estimable with -reg3- -- at least if you had i.i.d. data; if you have panel data, then you might get more efficient estimates by using that panel structure (assuming that the panel errors u are not correlated with anything else in your model).

Suggested references in econometrics:

  Davidson and MacKinnon, Estimation and Inference in Econometrics
  (http://www.citeulike.org/user/ctacmo/article/1616299)

  Wooldridge, Econometric Analysis of Cross Section and Panel Data
  (http://www.citeulike.org/user/ctacmo/article/106152)

  Greene, Econometric Analysis
  (http://www.citeulike.org/user/ctacmo/article/3051932)

The last of these is probably the lightest of them all, and has the best explanation of the procedures to establish the rank and order conditions of identification.

Suggested references on SEM vs multilevel/panel models:

  Bauer, D, Estimating multilevel models as SEMs
  (JEBS, http://www.citeulike.org/user/ctacmo/article/1768596)

  Curran, P, Have multilevel models been SEMs all along?
  (MBR, http://www.citeulike.org/user/ctacmo/article/3046574)

Suggested reading on GLLAMM: see http://www.gllamm.org and the Stata Press books by Rabe-Hesketh and Skrondal.

On Sun, Oct 5, 2008 at 1:17 PM, Jaime Gómez <jaime.gomez@unizar.es> wrote:
> Dear Stata users
>
> I have a model in which the relationship between a predictor "x" and an
> outcome "y" is mediated by three factors ("r", "s" and "t"). I am only able
> to test whether one of the predictors ("r") mediates the relationship
> between "x" and "y" (I only have data on this mediating variable and I
> cannot get data on the other two). I would like to implement Baron and Kenny
> (1986)'s test for mediation. At least, this involves estimating the
> following system:
>
> Y=a1+b*r+c*x+epsilon1
> r=a2+d*x+epsilon2
>
> Given that the errors of the two equations are potentially correlated, it
> has been suggested that a 2SLS approach should be used. I have seen that
> this could be done with ivregress, provided that I can find data on at least
> one variable that affects "r" and does not affect "y". My doubts are the
> following:
>
> 1) Given that I have a triangular system, do I have to use the
> traditional approach implemented by ivregress or the "modified" approach
> proposed in http://www.stata.com/support/faqs/stat/ivr_faq.html ? Are both
> valid?
>
> 2) How do I test the hypothesis that the errors are correlated? I
> have seen that the use of a Hausman test is suggested in the literature, but
> I do not know how to implement this in Stata (especially in the case I use
> the "modified" approach)
>
> 3) Given that I have panel data, could I take advantage of the panel
> structure of my data to correct for the fact that I do not have information
> on two of the mediating variables ("s" and "t")? Is there a procedure in
> Stata for that?
>
> Thanks a lot
>
> Jaime Gómez

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
