Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: Identifying the best scale without a "gold standard"

From	"Seed, Paul" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: st: Identifying the best scale without a "gold standard"
Date	Mon, 14 Nov 2011 20:53:01 +0000

Dear Cameron, 

Thank you for all this information.  I may have given the wrong impression. 
In the real data (results below),  there is only one factor with eigenvalue > 1, 
made up of six highly correlated measurements of breathlessness.  
There is no space (as I understand it) for a second order factor analysis.

The six individual measurements are all well-established and validated scales, 
and are treated as single measurements for the purposes of the analysis. 
It is therefore not entirely surprising that they agree so well.  

The research problem is to identify the best single scale for measuring breathlessness 
from the six candidates.  I was therefore interested in a valid test for 
identifying agreement of individual measures with a latent factor
to which they all contributed.

Best wishes, 
Paul T Seed, Senior Lecturer in Medical Statistics, 
Division of Women's Health, King's College London
Women's Health Academic Centre KHP
020 7188 3642.

"I see no reason to address the comments of your anonymous expert ... I prefer to publish the paper elsewhere" - Albert Einstein







*********************************************************************************

. local vars overallNRSave overallMRC overallBorgave overalldyspnoea12 overallCRQMastery overallCRQDyspnoea 

. factor `vars' 
(obs=103)

Factor analysis/correlation                        Number of obs    =      103
    Method: principal factors                      Retained factors =        3
    Rotation: (unrotated)                          Number of params =       15

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      2.68006      2.52737            1.0799       1.0799
        Factor2  |      0.15269      0.02393            0.0615       1.1414
        Factor3  |      0.12876      0.22397            0.0519       1.1933
        Factor4  |     -0.09520      0.08428           -0.0384       1.1549
        Factor5  |     -0.17948      0.02553           -0.0723       1.0826
        Factor6  |     -0.20501            .           -0.0826       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  206.74 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness 
    -------------+------------------------------+--------------
    overallNRS~e |   0.6421    0.2124    0.0122 |      0.5424  
      overallMRC |   0.6465   -0.1168    0.1992 |      0.5288  
    overallBor~e |   0.5869    0.1940    0.1212 |      0.6033  
    overalldy~12 |   0.7569    0.0510   -0.1620 |      0.3982  
    overallCRQ~y |  -0.6479    0.0963    0.2079 |      0.5277  
    overallCRQ~a |  -0.7160    0.2108   -0.0692 |      0.4381  
    -----------------------------------------------------------
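
Editorial note: the "Proportion" column above exceeds 1 for Factor1 because the principal-factors method can return negative eigenvalues, and each proportion is the eigenvalue divided by the sum of all six. The uniqueness column is one minus the sum of squared loadings on the retained factors. The short Python sketch below simply re-does that arithmetic from the numbers printed in the output; it is not part of the original post, the function names are made up for illustration, and agreement is only to about four decimals because the inputs are rounded.

```python
# Values copied from the -factor- output above (principal factors, 3 retained).
eigenvalues = [2.68006, 0.15269, 0.12876, -0.09520, -0.17948, -0.20501]

# Loadings on Factor1..Factor3, keyed by the full variable names from the
# -local vars- line (the output table abbreviates them).
loadings = {
    "overallNRSave":      (0.6421,  0.2124,  0.0122),
    "overallMRC":         (0.6465, -0.1168,  0.1992),
    "overallBorgave":     (0.5869,  0.1940,  0.1212),
    "overalldyspnoea12":  (0.7569,  0.0510, -0.1620),
    "overallCRQMastery":  (-0.6479, 0.0963,  0.2079),
    "overallCRQDyspnoea": (-0.7160, 0.2108, -0.0692),
}


def proportions(eigs):
    """Each eigenvalue as a share of the sum of all eigenvalues."""
    total = sum(eigs)
    return [e / total for e in eigs]


def uniqueness(row):
    """One minus the communality (sum of squared retained loadings)."""
    return 1.0 - sum(l * l for l in row)


props = proportions(eigenvalues)
uniq = {name: uniqueness(row) for name, row in loadings.items()}

if __name__ == "__main__":
    for i, p in enumerate(props, start=1):
        print(f"Factor{i}: proportion = {p:.4f}")
    for name, u in uniq.items():
        print(f"{name}: uniqueness = {u:.4f}")
```

Because the three negative eigenvalues reduce the denominator, the first factor's share comes out above 100%, which is a property of the method rather than a sign of a data problem.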


Cameron McIntosh <[email protected]> wrote: 
Paul,
You should be using a second-order or bifactor model for this analysis:
Koufteros, X., Babbar, S., & Kaighobadi, M. (2009). A paradigm for examining second-order factor models employing structural equation modeling. International Journal of Production Economics, 120(2), 633-652.
Rindskopf, D., & Rose, T. (1988). Some theory and applications of confirmatory second-order factor analysis. Multivariate Behavioral Research, 23(1), 51-67.
Chen, F.F., West, S.G., & Sousa, K.H. (2006). A Comparison of Bifactor and Second-Order Models of Quality of Life. Multivariate Behavioral Research, 41(2), 189-225. http://www.iapsych.com/articles/chen2006.pdf
Chen, F.F., Hayes, A., Carver, C.S., Laurenceau, J.-P., Zhang, Z. (2011). Modeling General and Specific Variance in Multifaceted Constructs: A Comparison of the Bifactor Model to Other Approaches. Journal of Personality, Accepted.
Then the most promising scale might be the one with the highest first-order factor loading on the second-order factor (in the second-order analysis), or the one with the least amount of subscale-specific explained variance (bifactor model).
But that's mainstream stuff... I would also be curious about what automated search routines might tell you about the most tenable structures for the observed variables. Although it may occasionally be plausible, often the common factor model is something we force on multi-item scales without ever considering alternative generating structures:
Landsheer, J.A. (2010). The specification of causal models with Tetrad IV: a review. Structural Equation Modeling, 17(4), 703-711. http://www.phil.cmu.edu/projects/tetrad/
Zheng, Z.E., & Pavlou, P.A. (2010). Toward a Causal Interpretation from Observational Data: A New Bayesian Networks Method for Structural Models with Latent Variables. Information Systems Research, 21(2), 365-391. http://www.utdallas.edu/~ericz/ISR09.pdf
Xu, L. (2010). Machine learning problems from optimization perspective. Journal of Global Optimization, 47, 369-401. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/ml-opt10.pdf
Tu, S., & Xu, L. (2011a). Parameterizations make different model selections: Empirical findings from factor analysis. Frontiers of Electrical and Electronic Engineering in China, 6(2), 256-274. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/11FEE-tsk-two.pdf
Tu, S., & Xu, L. (2011b). An investigation of several typical model selection criteria for detecting the number of signals. Frontiers of Electrical and Electronic Engineering in China, 6(2), 245-255. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/11FEE-tsk-sev.pdf
Xu, L. (2011). Codimensional matrix pairing perspective of BYY harmony learning: hierarchy of bilinear systems, joint decomposition of data-covariance, and applications of network biology. Frontiers of Electrical and Electronic Engineering in China, 6(1), 86-119. http://www.cse.cuhk.edu.hk/~lxu/papers/journal/byy11.pdf
Cam
----------------------------------------
> From: [email protected]
> To: [email protected]
> Date: Fri, 11 Nov 2011 11:49:57 +0000
> Subject: st: Identifying the best scale without a "gold standard"
>
> Dear Statalist,
>
> I have six scales, all of which are supposed to measure the same thing (breathlessness).
> Factor analysis confirms a single large factor, with different weightings by the different scales.
> I can now say in general terms which scales are best & which worst, but I
> would like to confirm that the observed differences are not due to chance.
>
> Method 1: Extract the main factor & use Richard Goldstein's -corcor- to compare
> correlations between the scales & factors
>
> Method 2 : take the simple average of the scales & use -corcor- as before.
>
> This gives me very different answers:
>
> ******** example code ****************
> version 11.2
> webuse bg2
> factor bg2cost1-bg2cost6
>
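> * Method 1: compare correlations of each scale with the predicted first factor
> predict f1
> foreach v of varlist bg2cost1-bg2cost5 {
>     corcor f1 `v' bg2cost6
> }
>
> * Method 2: compare correlations of each scale with the simple average of all six
> gen mean_bgcost = (bg2cost1 + bg2cost2 + bg2cost3 + bg2cost4 + bg2cost5 + bg2cost6)/6
> foreach v of varlist bg2cost1-bg2cost5 {
>     corcor mean_bgcost `v' bg2cost6
> }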
>
> ******** end example ****************
>
> I am fairly sure that method 2 is better, as
> in method 1 there is a circularity about
> using the weighted average; and then showing that
> the variable with the biggest weighting also has
> the biggest correlation.
>
> However, is there also a flaw in method 2?
> (apart from the multiple testing issues)
> Is there a better approach?
> Any thoughts, references, programs appreciated.
>
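
Editorial note on the question about Method 2: one general property worth keeping in mind is part-whole overlap. When an item is itself included in the average it is correlated with, the item-total correlation can never be smaller than the correlation of that item with the average of the remaining items (the "item-rest" correlation), so some of the apparent agreement is built in. The sketch below is not from the original thread; it uses small made-up scores (not the bg2 data), and the helper names are hypothetical. It only illustrates the two quantities side by side and does not test whether differences between scales are due to chance.

```python
from math import sqrt

# Hypothetical scores for three items on six subjects (illustration only).
items = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 1, 4, 3, 6, 5],
    "c": [1, 3, 2, 5, 4, 6],
}


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def row_means(columns):
    """Per-subject mean across the given item columns."""
    return [sum(vals) / len(vals) for vals in zip(*columns)]


def item_total_and_rest(data):
    """For each item, return (item-total r, item-rest r)."""
    total = row_means(list(data.values()))
    result = {}
    for name, scores in data.items():
        rest = row_means([v for k, v in data.items() if k != name])
        result[name] = (pearson(scores, total), pearson(scores, rest))
    return result


correlations = item_total_and_rest(items)

if __name__ == "__main__":
    for name, (r_total, r_rest) in correlations.items():
        print(f"{name}: item-total r = {r_total:.3f}, item-rest r = {r_rest:.3f}")
```

The same overlap applies, with unequal weights, when the comparison target is a predicted factor score, which is the circularity raised for Method 1.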


Follow-Ups:
- Re: st: Identifying the best scale without a "gold standard"
  - From: Stas Kolenikov <[email protected]>
- Re: st: Identifying the best scale without a "gold standard"
  - From: Ronan Conroy <[email protected]>
