Statalist



RE: st: RE: Aren't distinct factors from factor analysis or PCA orthogonal to each other?


From   Cameron McIntosh <[email protected]>
To   STATA LIST <[email protected]>
Subject   RE: st: RE: Aren't distinct factors from factor analysis or PCA orthogonal to each other?
Date   Tue, 18 Aug 2009 13:24:28 -0400

Agreed, thanks for calling me on that first point - factor analysis is not *only* applicable to psychometric measures, but such measures are usually exemplars of the classical "reflective" assumption of FA, namely that some latent causal entity (e.g., a personality attribute, general intelligence) is responsible for the associations among the observables (scale items, ability tests). Too often researchers apply reflective models to correlation matrices without thinking about how those correlations are actually produced. For example, income, education, and occupational prestige are correlated, yet they do not truly "reflect" a common factor, socio-economic status; rather, they "form" it, or education increases employment opportunities and hence income. Those interested in the relevant methodological and substantive debates can have a look at:
 
Fayers, P.M., & Hand, D.J. (2002) Causal variables, indicator variables, and measurement scales: an example from quality of life. Journal of the Royal Statistical Society, Series A, 165(2), 233-261.
 
Fayers, P.M., & Hand, D.J. (1997) Factor analysis, causal indicators, and quality of life. Quality of Life Research, 6, 139-150.

Also Psychological Methods, 12(2) and Journal of Business Research, 61(12) are essential reads.
 
Cam
----------------------------------------
> Subject: RE: st: RE: Aren't distinct factors from factor analysis or PCA orthogonal to each other?
> Date: Tue, 18 Aug 2009 17:18:53 +0100
> From: [email protected]
> To: [email protected]
>
> I guess Cameron does not mean quite what he says, which is that factor
> analysis can only be used on psychometric measures. In principle I can
> readily imagine fruitful applications on quite different kinds of data.
> But I largely agree with the spirit of his comment, which I take to be
> -- my words not his -- that expecting factor analysis to see structure
> in a mess independently of some understanding is likely to be expecting
> far too much. However, my impression is that this is exactly what almost
> all users of factor analysis seem to expect!
>
> I've found occasional use of PCA in the following way.
>
> 1. Plot the data.
>
> 2. Calculate correlations, etc.
>
> 3. Look at the results: get some ideas.
>
> 4. Calculate PCs.
>
> 5. Use PCs to help structure understanding of #1 and #2 in terms of
> variables that go together, variables that are singletons, etc.
> Sometimes, results of #1 and #2 now make more sense in their own terms.
> (For example, a reordering of a scatter plot matrix or correlation
> matrix makes it easier to see what is going on.) Often it is useful here
> to look at a table of correlations between original variables and new
> PCs. -cpcorr- from SSC helps with that.
>
> 6. Now discard PC results and proceed with modelling.
>
> As in some fields every minor variation on a technique is blessed with a
> name, I'll dub this disposable principal component analysis.
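
Steps 4 and 5 of Nick's recipe can be sketched in a few lines. This is only a minimal illustration in Python, not Stata and not the -cpcorr- code: it assumes just two made-up variables (x1 and x2 are hypothetical), for which the principal components of the correlation matrix are simply the scaled sum and difference of the standardized variables. It shows the two things the thread is about: the PC scores are uncorrelated with each other, and a small table of correlations between the original variables and the PCs tells you which variables go together.

```python
from math import sqrt

def standardize(v):
    """Center v and scale it to unit sample standard deviation."""
    n = len(v)
    m = sum(v) / n
    sd = sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / sd for a in v]

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Made-up values for two hypothetical variables (not real data).
x1 = [1.0, 2.0, 4.0, 3.0, 5.0]
x2 = [2.0, 1.0, 5.0, 3.0, 4.0]

z1, z2 = standardize(x1), standardize(x2)
r = corr(x1, x2)

# With two standardized, positively correlated variables, PC1 is the
# scaled sum and PC2 the scaled difference.
pc1 = [(a + b) / sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / sqrt(2) for a, b in zip(z1, z2)]

# PC scores are uncorrelated by construction.
pc_corr = corr(pc1, pc2)

# Correlations between each original variable and each PC.
loadings = {
    "x1": (corr(x1, pc1), corr(x1, pc2)),
    "x2": (corr(x2, pc1), corr(x2, pc2)),
}

print("corr(x1, x2)   =", round(r, 4))
print("corr(PC1, PC2) =", round(pc_corr, 4))
for name, (c1, c2) in loadings.items():
    print(name, "PC1:", round(c1, 4), "PC2:", round(c2, 4))
```

With more than two variables the components come from a full eigen-decomposition of the correlation matrix, which this sketch deliberately avoids.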
>
> Nick
> [email protected]
>
> Cameron McIntosh
>
> Adrian, I think it would be a complete travesty to just feed that whole
> dataset into a factor analysis. Sure, it'll lump together variables with
> high correlations, but most of the time this doesn't reflect what's
> going on underneath the data (e.g., a web of direct and indirect causal
> relations that generated the observed associations/covariance matrix),
> and this type of situation is what tends to give factor analysis a "bad
> name" among statisticians. Factor analysis is typically only appropriate
> for reflective psychometric measures written specifically to assess an
> underlying trait (e.g., self-esteem, anxiety), not datasets like yours.
> I think there are probably complex causal relations among your variables
> that you should think hard about (using your theoretical knowledge about
> these variables) and maybe come up with a path-analytic model or growth
> curve model (say, GDP trajectory and its predictors). You could also
> compare models across countries.
>
> From: [email protected]
>
>> Thank you to Cameron, Bob and everybody else for the references.
>>
>> I have a response to Jay and a couple more questions for everybody, if
> you can still help me...
>
> Jay wrote:
>>> Before you go any further I think you have a big problem to consider:
> 100 variables on, say 200 countries means you have WAY more covariances
> (or correlations) than you have countries. This means your correlation
> matrix is singular.
>>
>>
>> I don't think I have that problem because I don't have 200 countries.
> I only have about 30+ countries.
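
The singularity Jay mentions can be checked numerically. As a minimal sketch in Python with made-up numbers (the three variables below are hypothetical), a correlation matrix of p variables computed from n observations has rank at most n - 1, so it is singular whenever n <= p. Here three variables observed on three units give a determinant of zero up to rounding, while a fourth observation makes the matrix invertible.

```python
from math import sqrt

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corr_matrix(variables):
    """Correlation matrix for a list of variables (each a list of values)."""
    return [[corr(u, v) for v in variables] for u in variables]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Three hypothetical variables on only three units: n = p = 3.
few_obs = [
    [1.0, 2.0, 4.0],
    [3.0, 1.0, 2.0],
    [5.0, 9.0, 6.0],
]

# The same variables with one more (made-up) observation: n = 4 > p = 3.
more_obs = [
    [1.0, 2.0, 4.0, 3.0],
    [3.0, 1.0, 2.0, 5.0],
    [5.0, 9.0, 6.0, 7.0],
]

det_few = det3(corr_matrix(few_obs))    # zero up to rounding: singular
det_more = det3(corr_matrix(more_obs))  # clearly positive: invertible

print("determinant with n = 3:", det_few)
print("determinant with n = 4:", det_more)
```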
>>
>> However, even if I had 200 countries, I don't understand exactly what
> the problem would be because I have all 100 variables for country i and
> all 100 variables for country j stacked on one another. So, I have:
>>
>> country year GDP inflation reserves
>> Argentina 1990 2.3 6.4 100
>> Argentina 1991 2.8 7.4 250
>> Argentina 1992 2.6 7.0 200
>> ...
>> Argentina 2006 3.2 8.0 400
>> Brazil 1990 1.7 5.4 120
>> Brazil 1991 2.1 6.3 140
>> Brazil 1992 2.5 7.0 180
>> ...
>>
>>
>> So the variables I enter into my factor analysis are GDP, inflation,
> and reserves... and so the -factor- command in Stata knows nothing about
> the panel/time-series structure of my data. I can see why it should be
> relevant to account for the underlying panel structure of the data --
> for instance, that jump in GDP/inflation/reserves and any other
> variables between Argentina in 2006 and Brazil in 1990 may be a bit
> strange to account for.
>>
>> So, the first question is: do I need to take this panel structure into
> account? And if so, how?
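
The thread does not settle this question, but one rough way to see whether the stacking matters is to compare the pooled correlation with the correlation after removing each country's own mean. The sketch below is in Python, uses only the rows actually shown in the table above (the elided years are left out), and the helper demean_by_group is a hypothetical name introduced for illustration. It is a diagnostic, not a recommended model.

```python
from collections import defaultdict
from math import sqrt

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def demean_by_group(groups, values):
    """Subtract each group's mean from its own values (hypothetical helper)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]

# (country, GDP, reserves) for the rows shown in the post.
rows = [
    ("Argentina", 2.3, 100.0),
    ("Argentina", 2.8, 250.0),
    ("Argentina", 2.6, 200.0),
    ("Argentina", 3.2, 400.0),
    ("Brazil", 1.7, 120.0),
    ("Brazil", 2.1, 140.0),
    ("Brazil", 2.5, 180.0),
]
country = [r[0] for r in rows]
gdp = [r[1] for r in rows]
reserves = [r[2] for r in rows]

# Pooled: ignores which country each row belongs to.
pooled = corr(gdp, reserves)

# Within: uses only variation around each country's own mean.
within = corr(demean_by_group(country, gdp),
              demean_by_group(country, reserves))

print("pooled correlation:", round(pooled, 4))
print("within-country correlation:", round(within, 4))
```

If the two numbers differ a lot, differences in level between countries are driving part of the pooled correlation, and a factor analysis of the stacked data would pick that up as well.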
>>
>> The other question is, do units matter? For instance, I know that
> factor analysis or PCA are all based on a variance-covariance matrix...
> but if I have two variables, x and y, and I take the covariance between
> the two of them, that'll be different than if I take the covariance of,
> say 2x and y:
>>
>> cov(x,y) <> cov(2x,y)
>>
>> and so what would happen if I express my GDP in dollars for all
> countries or in local-currency units?? Or in millions or in billions???
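
The units question can be checked directly. The following is a small Python sketch, using the GDP and inflation rows shown earlier in the post and a hypothetical rescaling factor of 1000 (say, billions to millions). The covariance changes with the units, but the correlation does not, so an analysis based on the correlation matrix is unaffected by a change of units, whereas one based on the covariance matrix is not.

```python
from math import sqrt

def cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def corr(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return cov(x, y) / sqrt(cov(x, x) * cov(y, y))

# Rows shown in the post (the elided years are omitted).
gdp = [2.3, 2.8, 2.6, 3.2, 1.7, 2.1, 2.5]
inflation = [6.4, 7.4, 7.0, 8.0, 5.4, 6.3, 7.0]

# Same variable in different units; the factor 1000 is an assumption.
gdp_rescaled = [1000 * g for g in gdp]

# Covariance scales with the units: cov(1000x, y) = 1000 * cov(x, y).
cov_ratio = cov(gdp_rescaled, inflation) / cov(gdp, inflation)

# Correlation is unchanged by the rescaling.
corr_change = corr(gdp_rescaled, inflation) - corr(gdp, inflation)

print("cov(1000x, y) / cov(x, y)     =", cov_ratio)
print("corr(1000x, y) - corr(x, y)   =", corr_change)
```

Note that converting local-currency units to dollars with an exchange rate that differs by country or year is not a single rescaling of the variable, so this invariance does not cover that case.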
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/


