
RE: st: Identifying the best scale without a "gold standard"

From: "Seed, Paul"
To: "statalist@hsphsun2.harvard.edu"
Subject: RE: st: Identifying the best scale without a "gold standard"
Date: Mon, 14 Nov 2011 20:53:01 +0000

```
Dear Cameron,

Thank you for all this information.  I may have given the wrong impression.
In the real data (results below),  there is only one factor with eigenvalue > 1,
made up of six highly correlated measurements of breathlessness.
There is no space (as I understand it) for a second order factor analysis.

The six individual measurements are all well-established and validated scales,
and are treated as single measurements for the purposes of the analysis.
It is therefore not entirely surprising that they agree so well.

The research problem is to identify the best single scale for measuring breathlessness
from the six candidates.  I was therefore interested in a valid test for
identifying agreement of individual measures with a latent factor
to which they all contributed.

Best wishes,
Paul T Seed, Senior Lecturer in Medical Statistics,
Division of Women's Health, King's College London
020 7188 3642.

"I see no reason to address the comments of your anonymous expert ... I prefer to publish the paper elsewhere" - Albert Einstein

*********************************************************************************

. local vars overallNRSave overallMRC overallBorgave overalldyspnoea12 overallCRQMastery overallCRQDyspnoea

. factor `vars'
(obs=103)

Factor analysis/correlation                        Number of obs    =      103
    Method: principal factors                      Retained factors =        3
    Rotation: (unrotated)                          Number of params =       15

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      2.68006      2.52737            1.0799       1.0799
        Factor2  |      0.15269      0.02393            0.0615       1.1414
        Factor3  |      0.12876      0.22397            0.0519       1.1933
        Factor4  |     -0.09520      0.08428           -0.0384       1.1549
        Factor5  |     -0.17948      0.02553           -0.0723       1.0826
        Factor6  |     -0.20501            .           -0.0826       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  206.74 Prob>chi2 = 0.0000

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness
    -------------+------------------------------+--------------
    overallNRS~e |   0.6421    0.2124    0.0122 |      0.5424
      overallMRC |   0.6465   -0.1168    0.1992 |      0.5288
    overallBor~e |   0.5869    0.1940    0.1212 |      0.6033
    overalldy~12 |   0.7569    0.0510   -0.1620 |      0.3982
    overallCRQ~y |  -0.6479    0.0963    0.2079 |      0.5277
    overallCRQ~a |  -0.7160    0.2108   -0.0692 |      0.4381
    -----------------------------------------------------------

Cameron McIntosh <cnm100@hotmail.com> wrote:
Paul,
You should be using a second-order or bifactor model for this analysis:

Koufteros, X., Babbar, S., & Kaighobadi, M. (2009). A paradigm for examining second-order factor models employing structural equation modeling. International Journal of Production Economics, 120(2), 633-652.

Rindskopf, D., & Rose, T. (1988). Some theory and applications of confirmatory second-order factor analysis. Multivariate Behavioral Research, 23(1), 51-67.

Chen, F.F., West, S.G., & Sousa, K.H. (2006). A Comparison of Bifactor and Second-Order Models of Quality of Life. Multivariate Behavioral Research, 41(2), 189-225. http://www.iapsych.com/articles/chen2006.pdf

Chen, F.F., Hayes, A., Carver, C.S., Laurenceau, J.-P., & Zhang, Z. (2011). Modeling General and Specific Variance in Multifaceted Constructs: A Comparison of the Bifactor Model to Other Approaches. Journal of Personality, Accepted.

Then the most promising scale might be the one with the highest first-order factor loading on the second-order factor (in the second-order analysis), or the one with the least amount of subscale-specific explained variance (bifactor model).

But that's mainstream stuff... I would also be curious about what automated search routines might tell you about the most tenable structures for the observed variables. Although it may occasionally be plausible, often the common factor model is something we force on multi-item scales without ever considering alternative generating structures:

Landsheer, J.A. (2010). The specification of causal models with Tetrad IV: a review. Structural Equation Modeling, 17(4), 703-711. http://www.phil.cmu.edu/projects/tetrad/

Zheng, Z.E., & Pavlou, P.A. (2010). Toward a Causal Interpretation from Observational Data: A New Bayesian Networks Method for Structural Models with Latent Variables. Information Systems Research, 21(2), 365-391. http://www.utdallas.edu/~ericz/ISR09.pdf

Xu, L. (2010). Machine learning problems from optimization perspective. Journal of Global Optimization, 47, 369-401. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/ml-opt10.pdf

Tu, S., & Xu, L. (2011a). Parameterizations make different model selections: Empirical findings from factor analysis. Frontiers of Electrical and Electronic Engineering in China, 6(2), 256-274. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/11FEE-tsk-two.pdf

Tu, S., & Xu, L. (2011b). An investigation of several typical model selection criteria for detecting the number of signals. Frontiers of Electrical and Electronic Engineering in China, 6(2), 245-255. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/11FEE-tsk-sev.pdf

Xu, L. (2011). Codimensional matrix pairing perspective of BYY harmony learning: hierarchy of bilinear systems, joint decomposition of data-covariance, and applications of network biology. Frontiers of Electrical and Electronic Engineering in China, 6(1), 86-119. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/byy11.pdf

Cam
----------------------------------------
> From: paul.seed@kcl.ac.uk
> To: statalist@hsphsun2.harvard.edu
> Date: Fri, 11 Nov 2011 11:49:57 +0000
> Subject: st: Identifying the best scale without a "gold standard"
>
> Dear Statalist,
>
> I have six scales, all of which are supposed to measure the same thing (breathlessness).
> Factor analysis confirms a single large factor, with different weightings by the different scales.
> I can now say, in general terms, which scales are best & which worst, but I
> would like to confirm that the observed differences are not due to chance.
>
> Method 1: Extract the main factor & use Richard Goldstein's -corcor- to compare
> correlations between the scales & factors
>
> Method 2 : take the simple average of the scales & use -corcor- as before.
>
> This gives me very different answers:
>
> ******** example code ****************
> version 11.2
> webuse bg2
> factor bg2cost1-bg2cost6
>
> * Method 1
> predict f1
> foreach v of varlist bg2cost1-bg2cost5 {
> corcor f1 `v' bg2cost6
> }
>
> * Method 2
> gen mean_bgcost = ( bg2cost1+ bg2cost2+ bg2cost3+ bg2cost4+ bg2cost5+ bg2cost6)/6
> foreach v of varlist bg2cost1-bg2cost5 {
> corcor mean_bgcost `v' bg2cost6
> }
>
> ******** end example ****************
>
> I am fairly sure that method 2 is better, as
> in method 1 there is a circularity about
> using the weighted average; and then showing that
> the variable with the biggest weighting also has
> the biggest correlation.
>
> However, is there also a flaw in method 2?
> (apart from the multiple testing issues)
> Is there a better approach?
> Any thoughts, references, programs appreciated.
>

```
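
One issue with Method 2 in the quoted example is that mean_bgcost contains the very scale it is being correlated with, so each correlation is partly that scale's correlation with itself. A minimal sketch of one possible workaround, not taken from the thread, is to correlate each scale with the mean of the other five (an item-rest correlation). The names others and rest are made up for illustration, and the bg2 example data stand in for the real breathlessness scales:

```stata
* Sketch only: item-rest correlations on the -webuse bg2- example data.
* -others- and -rest- are illustrative names, not part of the original code.
webuse bg2, clear
forvalues i = 1/6 {
    * build the list of the five scales other than bg2cost`i'
    local others
    forvalues j = 1/6 {
        if `j' != `i' {
            local others `others' bg2cost`j'
        }
    }
    * row mean of the other five scales (rowmean() skips missing values)
    egen double rest = rowmean(`others')
    quietly correlate bg2cost`i' rest
    display "bg2cost`i': item-rest r = " %6.3f r(rho)
    drop rest
}
```

The official -alpha- command with the item option also reports item-rest correlations. This sketch only removes the part-whole overlap; it does not by itself test whether the differences between scales exceed chance. In the real data the two CRQ scales load negatively on Factor1, so they would need reversing before any averaging.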