Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Seed, Paul" <paul.seed@kcl.ac.uk>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Identifying the best scale without a "gold standard"
Date: Mon, 14 Nov 2011 20:53:01 +0000
Dear Cameron,

Thank you for all this information. I may have given the wrong impression. In the real data (results below), there is only one factor with eigenvalue > 1, made up of six highly correlated measurements of breathlessness. There is no room (as I understand it) for a second-order factor analysis. The six individual measurements are all well-established, validated scales, and are treated as single measurements for the purposes of the analysis. It is therefore not entirely surprising that they agree so well.

The research problem is to identify the best single scale for measuring breathlessness from the six candidates. I am therefore interested in a valid test of the agreement of each individual measure with a latent factor to which they all contribute.

Best wishes,

Paul T Seed, Senior Lecturer in Medical Statistics,
Division of Women's Health, King's College London
Women's Health Academic Centre KHP
020 7188 3642

"I see no reason to address the comments of your anonymous expert ... I prefer to publish the paper elsewhere" - Albert Einstein

*********************************************************************************
. local vars overallNRSave overallMRC overallBorgave overalldyspnoea12 overallCRQMastery overallCRQDyspnoea

. factor `vars'
(obs=103)

Factor analysis/correlation                        Number of obs    =      103
    Method: principal factors                      Retained factors =        3
    Rotation: (unrotated)                          Number of params =       15

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      2.68006      2.52737            1.0799       1.0799
        Factor2  |      0.15269      0.02393            0.0615       1.1414
        Factor3  |      0.12876      0.22397            0.0519       1.1933
        Factor4  |     -0.09520      0.08428           -0.0384       1.1549
        Factor5  |     -0.17948      0.02553           -0.0723       1.0826
        Factor6  |     -0.20501            .           -0.0826       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  206.74 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness
    -------------+------------------------------+--------------
    overallNRS~e |   0.6421    0.2124    0.0122 |      0.5424
      overallMRC |   0.6465   -0.1168    0.1992 |      0.5288
    overallBor~e |   0.5869    0.1940    0.1212 |      0.6033
    overalldy~12 |   0.7569    0.0510   -0.1620 |      0.3982
    overallCRQ~y |  -0.6479    0.0963    0.2079 |      0.5277
    overallCRQ~a |  -0.7160    0.2108   -0.0692 |      0.4381
    -----------------------------------------------------------

Cameron McIntosh <cnm100@hotmail.com> wrote:

Paul,

You should be using a second-order or bifactor model for this analysis:

Koufteros, X., Babbar, S., & Kaighobadi, M. (2009). A paradigm for examining second-order factor models employing structural equation modeling. International Journal of Production Economics, 120(2), 633-652.

Rindskopf, D., & Rose, T. (1988). Some theory and applications of confirmatory second-order factor analysis. Multivariate Behavioral Research, 23(1), 51-67.

Chen, F.F., West, S.G., & Sousa, K.H. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41(2), 189-225. http://www.iapsych.com/articles/chen2006.pdf

Chen, F.F., Hayes, A., Carver, C.S., Laurenceau, J.-P., & Zhang, Z. (2011). Modeling general and specific variance in multifaceted constructs: A comparison of the bifactor model to other approaches. Journal of Personality, accepted.

Then the most promising scale might be the one with the highest first-order factor loading on the second-order factor (in the second-order analysis), or the one with the least amount of subscale-specific explained variance (bifactor model). But that's mainstream stuff... I would also be curious about what automated search routines might tell you about the most tenable structures for the observed variables.
Although it may occasionally be plausible, the common factor model is often something we force on multi-item scales without ever considering alternative generating structures:

Landsheer, J.A. (2010). The specification of causal models with Tetrad IV: a review. Structural Equation Modeling, 17(4), 703-711. http://www.phil.cmu.edu/projects/tetrad/

Zheng, Z.E., & Pavlou, P.A. (2010). Toward a causal interpretation from observational data: A new Bayesian networks method for structural models with latent variables. Information Systems Research, 21(2), 365-391. http://www.utdallas.edu/~ericz/ISR09.pdf

Xu, L. (2010). Machine learning problems from optimization perspective. Journal of Global Optimization, 47, 369-401. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/ml-opt10.pdf

Tu, S., & Xu, L. (2011a). Parameterizations make different model selections: Empirical findings from factor analysis. Frontiers of Electrical and Electronic Engineering in China, 6(2), 256-274. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/11FEE-tsk-two.pdf

Tu, S., & Xu, L. (2011b). An investigation of several typical model selection criteria for detecting the number of signals. Frontiers of Electrical and Electronic Engineering in China, 6(2), 245-255. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/11FEE-tsk-sev.pdf

Xu, L. (2011). Codimensional matrix pairing perspective of BYY harmony learning: hierarchy of bilinear systems, joint decomposition of data-covariance, and applications of network biology.
Frontiers of Electrical and Electronic Engineering in China, 6(1), 86-119. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/byy11.pdf

Cam

----------------------------------------
> From: paul.seed@kcl.ac.uk
> To: statalist@hsphsun2.harvard.edu
> Date: Fri, 11 Nov 2011 11:49:57 +0000
> Subject: st: Identifying the best scale without a "gold standard"
>
> Dear Statalist,
>
> I have six scales, all of which are supposed to measure the same thing (breathlessness).
> Factor analysis confirms a single large factor, with different weightings for the different scales.
> I can now say in general terms which scales are best & which worst, but I
> would like to confirm that the observed differences are not due to chance.
>
> Method 1: Extract the main factor & use Richard Goldstein's -corcor- to compare
> correlations between the scales & factors.
>
> Method 2: Take the simple average of the scales & use -corcor- as before.
>
> This gives me very different answers:
>
> ******** example code ****************
> version 11.2
> webuse bg2
> factor bg2cost1-bg2cost6
>
> * Method 1: score of the first factor
> predict f1
> foreach v of varlist bg2cost1-bg2cost5 {
>     corcor f1 `v' bg2cost6
> }
>
> * Method 2: simple (unweighted) average of the six scales
> gen mean_bgcost = (bg2cost1 + bg2cost2 + bg2cost3 + bg2cost4 + bg2cost5 + bg2cost6)/6
> foreach v of varlist bg2cost1-bg2cost5 {
>     corcor mean_bgcost `v' bg2cost6
> }
>
> ******** end example ****************
>
> I am fairly sure that method 2 is better, as
> in method 1 there is a circularity in
> using the weighted average and then showing that
> the variable with the biggest weighting also has
> the biggest correlation.
>
> However, is there also a flaw in method 2
> (apart from the multiple-testing issues)?
> Is there a better approach?
> Any thoughts, references, programs appreciated.
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
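Both of Paul's methods come down to testing whether two correlations that share a variable (the factor score or the scale mean) differ. As a minimal sketch of one such test, the Stata program below computes Williams' t statistic for H0: corr(j,k) = corr(j,h), in the form presented by Steiger (1980, Psychological Bulletin), with N - 3 degrees of freedom. The program name -williams_t- is hypothetical, and whether -corcor- uses exactly this statistic is an assumption not confirmed in the thread. The test takes the correlations as given, so it does not by itself remove the circularity discussed above.

```stata
* Hypothetical helper (a sketch, not -corcor-): Williams' t test of
* H0: corr(j,k) = corr(j,h), where j is the shared variable (factor
* score or scale mean) and k, h are the two scales being compared.
capture program drop williams_t
program define williams_t, rclass
    version 11
    syntax varlist(min=3 max=3 numeric)
    tokenize `varlist'
    tempname C n rjk rjh rkh detR rbar t
    * correlations among j, k and h (casewise deletion of missing values)
    quietly correlate `1' `2' `3'
    matrix `C' = r(C)
    scalar `n'   = r(N)
    scalar `rjk' = `C'[2,1]
    scalar `rjh' = `C'[3,1]
    scalar `rkh' = `C'[3,2]
    * determinant of the 3 x 3 correlation matrix
    scalar `detR' = 1 - `rjk'^2 - `rjh'^2 - `rkh'^2 + 2*`rjk'*`rjh'*`rkh'
    * mean of the two correlations being compared
    scalar `rbar' = (`rjk' + `rjh')/2
    scalar `t' = (`rjk' - `rjh') * sqrt((`n'-1)*(1+`rkh') / ///
        (2*((`n'-1)/(`n'-3))*`detR' + `rbar'^2*(1-`rkh')^3))
    display as text "corr(`1',`2') = " as result %6.4f `rjk' ///
        as text "   corr(`1',`3') = " as result %6.4f `rjh'
    display as text "Williams t(" as result `n'-3 as text ") = " ///
        as result %7.3f `t' as text "   two-sided p = " ///
        as result %6.4f 2*ttail(`n'-3, abs(`t'))
    return scalar N  = `n'
    return scalar df = `n' - 3
    return scalar t  = `t'
    return scalar p  = 2*ttail(`n'-3, abs(`t'))
end
```

For Paul's example, after creating mean_bgcost as in Method 2, -williams_t mean_bgcost bg2cost1 bg2cost6- would compare corr(mean_bgcost, bg2cost1) with corr(mean_bgcost, bg2cost6). Swapping the last two variables only flips the sign of t, which is a quick sanity check on the calculation.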