



RE: st: PCA or SEM for 2-indicator latent?


From   "Lin, Tin-Chi" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: PCA or SEM for 2-indicator latent?
Date   Thu, 9 Jan 2014 15:55:45 +0000

Thanks very much William and Nick. I supply more details below:

The longitudinal data have four waves, each containing 1000-1200 observations. The conceptual explanatory variable is "physical inactivity", which is measured by two indicators: S1 = time spent sitting in a given period, and S2 = level of general physical activity in a given period.

S1 and S2 are highly correlated (cov(S1, S2) = -0.77). Hence, if I put both into the regression model, their coefficient estimates look very odd. Given the collinearity between S1 and S2, I started to think about strategies for data reduction.

Below is the analysis: PCA to reduce the two indicators to a single index, followed by a fixed-effects regression.

pca s1 s2                  /* principal component analysis */

predict pc1 pc2      /* PCA postestimation. Only pc1 will be used, as it explains the overall tendency. See Filmer & Pritchett, Demography (2001) */

xtreg dep_var pc1 control, i(id) fe  /* fixed-effects regression */
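
As a side note on what pc1 contains when there are only two indicators, the algebra can be written out. This is a sketch under two assumptions that the post does not confirm: that the -0.77 reported above is a correlation r rather than a raw covariance, and that -pca- is run on the correlation matrix (the Stata default). Here z_1 and z_2 denote the standardized s1 and s2.

```latex
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix},
\qquad
\lambda_1 = 1 + |r|, \quad \lambda_2 = 1 - |r|,
\qquad
\mathrm{pc}_1 = \frac{z_1 + \operatorname{sgn}(r)\, z_2}{\sqrt{2}}, \quad
\mathrm{pc}_2 = \frac{z_1 - \operatorname{sgn}(r)\, z_2}{\sqrt{2}}

% With r = -0.77 (assumed to be a correlation):
\lambda_1 = 1.77, \qquad \frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{1.77}{2} = 0.885,
\qquad \mathrm{pc}_1 = \frac{z_1 - z_2}{\sqrt{2}} \ \text{(up to sign)}
```

Under those assumptions, pc1 is simply an equal-weight contrast of the two standardized indicators, whatever the size of r; the data determine only the sign and the share of variance explained.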


Then I recalled that SEM may handle measurement error in the explanatory variable better than a typical regression model; moreover, "the combination that maximizes the variance may not necessarily be the combination that best works in the regression model" (quoted from http://www.stata.com/statalist/archive/2012-09/msg01050.html). So I first came up with a measurement model:

sem (s1 <- S@myconstraint) ///
    (s2 <- S@myconstraint), var(S@1)
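
One point about this specification may be worth checking. With both loadings constrained to the same value (call it lambda) and var(S) fixed at 1, the model-implied covariance between the two indicators is lambda^2, which cannot be negative. If s1 and s2 enter in their original, negatively related direction, one of them would need to be reverse-scored first. The lines below are only a sketch of that idea: z1, z2, z2_rev, S_hat and the constraint name lambda are hypothetical names, and standardizing before imposing equal loadings is an assumption, not something stated in the post.

```stata
* Sketch only: z1, z2, z2_rev and S_hat are hypothetical new variables
egen z1 = std(s1)                 // standardize so equal loadings are on a common scale
egen z2 = std(s2)
generate z2_rev = -z2             // reverse-score activity so both indicators point toward inactivity

* Same measurement model as above, fitted to the rescaled indicators
sem (z1 <- S@lambda) (z2_rev <- S@lambda), var(S@1)

* Predicted factor scores for the latent variable S
predict S_hat, latent(S)
```

The predicted S_hat could then be compared with pc1 from the PCA step to see how much the two indices actually differ in this two-indicator case.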

I got stuck here yesterday. Even though the measurement model is fine, I realized that there is a lot more to consider if I wish to construct a regression model for the longitudinal data using SEM. For example, how do I deal with the fact that I have constrained the variance of S to be 1 in the longitudinal setting? What is my emphasis in the model: adjusting for within-individual clustering over the years, or controlling for between-individual differences? And how do I translate these concepts into SEM? But these are probably questions for the next stage. The first thing is probably to figure out whether using PCA is sensible or not.

Thanks very much,

Tinchi 








-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, January 09, 2014 7:48 AM
To: [email protected]
Subject: Re: st: PCA or SEM for 2-indicator latent?

"while SEM is generally preferred to PCA"

This is like saying "cars are generally preferred to boats".

For what? If it's a flood out there, I'd prefer a boat. Less obliquely, if you want a modelling technique, as you seem to do, PCA is not a modelling technique. You might use PCA results in a model, but that's a different story.

Nick
[email protected]


On 9 January 2014 02:40, Lin, Tin-Chi <[email protected]> wrote:
> Dear Statalisters,
>
> My data is longitudinal and I am facing a catch-22 in choosing a method to (1) construct the explanatory variable (let's call it S), and (2) perform the regression modeling overall. SEM (structural equations modeling) and PCA (principal component analysis) are among the methods that I am considering.
>
> The dilemma is that while SEM is generally preferred to PCA (see a short but good summary at http://www.stata.com/statalist/archive/2012-09/msg01050.html), there are only two indicators (S1, S2) available for S in each wave. From another website, I learned that when there are only two indicators for a single latent variable, the regression model will be very sensitive to model misspecification (http://www.statmodel.com/discussion/messages/11/4965.html?1261084141), and I think the problem will get worse in a longitudinal setting.
>
> Another question is, if I am going to use SEM and the primary explanatory variable is latent, is it possible to run a fixed-effects-like model to control for between-individual differences? I had this thought because at the very beginning my plan was to run PCA, get the prediction for S, and then use a fixed-effects model to get rid of "contamination" from the unobserved differences. I know we can "translate" a fixed-effects model to SEM when all the x-variables are indicators, but I am not sure if we can still do so when an explanatory variable is latent.
>
> Thanks very much
>
> Tin-chi
>
>
> Tin-chi Lin
> Liberty Mutual Research Institute for Safety
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


