
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Bootstrapping factor loadings


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Bootstrapping factor loadings
Date   Tue, 28 Feb 2012 12:15:58 +0000

The underlying point is that the signs in factor analysis results are
arbitrary, as your linear algebra text may or may not explain. Here's a
dopey example: the loading associated with -mpg- flips sign depending on
the order in which the variables are given. That's not bootstrap
sampling variation, but it is also true that, with enough bootstrap
replications, some of the loadings will differ in sign from the majority
vote. This is one of many bootstrap problems in which it is salutary to
look at the sampling distribution, not just the confidence intervals.

Nick


. sysuse auto
(1978 Automobile Data)

. factor weight mpg
(obs=74)

Factor analysis/correlation                        Number of obs    =       74
    Method: principal factors                      Retained factors =        1
    Rotation: (unrotated)                          Number of params =        1

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.45871      1.61435            1.1194       1.1194
        Factor2  |     -0.15564            .           -0.1194       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(1)  =   76.43 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    ---------------------------------------
        Variable |  Factor1 |   Uniqueness
    -------------+----------+--------------
          weight |  -0.8540 |      0.2706
             mpg |   0.8540 |      0.2706
    ---------------------------------------

. factor mpg weight
(obs=74)

Factor analysis/correlation                        Number of obs    =       74
    Method: principal factors                      Retained factors =        1
    Rotation: (unrotated)                          Number of params =        1

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.45871      1.61435            1.1194       1.1194
        Factor2  |     -0.15564            .           -0.1194       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(1)  =   76.43 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    ---------------------------------------
        Variable |  Factor1 |   Uniqueness
    -------------+----------+--------------
             mpg |  -0.8540 |      0.2706
          weight |   0.8540 |      0.2706
    ---------------------------------------
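A small sketch, not part of the original posts, of why the sign is arbitrary: with the same auto data, the common part of the fitted correlation matrix depends on the loadings only through the product L*L', and flipping every loading of a factor leaves that product unchanged. The matrix names L, Lneg, R1 and R2 are illustrative.

```stata
* Loadings L and -L imply the same common part of the fitted correlations,
* so the data cannot tell the two sign choices apart.
sysuse auto, clear
quietly factor weight mpg, factors(1)
matrix L = e(L)          // 2 x 1 unrotated loadings
matrix Lneg = -L         // the same factor with every sign flipped
matrix R1 = L*L'         // common part of the fitted correlation matrix
matrix R2 = Lneg*Lneg'   // identical, because (-a)*(-b) = a*b
display mreldif(R1, R2)  // zero: both sign choices fit equally well
```

So a bootstrap replication that happens to return -L is just as valid a solution as one that returns L, which is why sign flips show up as spurious spread in the bootstrap distribution.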



On Tue, Feb 28, 2012 at 12:06 PM, Nick Cox <[email protected]> wrote:
> I'd add a note of caution here. Factors can get flipped around as part
> of sampling variation, which will change the sign of the loadings and
> -- especially when loadings are large and more interesting -- inflate
> the bootstrap error.
>
> We could debate whether this is part of the problem or it makes more
> sense to look at the absolute values of the loadings.
>
> Nick
>
> On Tue, Feb 28, 2012 at 10:15 AM, Grant, Robert
> <[email protected]> wrote:
>> Following an earlier thread (http://www.stata.com/statalist/archive/2012-02/msg00036.html), a fellow Statalister asked me off-list about extending this to more than one factor. This is pretty easy to do once you have grasped what -bstat- requires, but I include my suggested code here in case it is of use to anyone in the future:
>>
>> If you have more than one factor, the e(r_L) matrix will have more than one column, one for each factor. If you are using -pca- instead, the same loadings matrix will be called e(b). You need to rearrange them into a single vector, which here I call obs, containing the point estimates that -bstat- will then access. If you are not interested in inference for extra quantities such as the % variance explained, then it is simple:
>>
>> // example begins -------------------------------------------
>> // first, get the observed point estimates:
>> factor var1 var2 var3 ... var26, pcf factors(4) // here there are 4 factors and 26 variables
>> rotate, promax // I hope this makes sense - I "don't do" oblique rotations
>> matrix obsload=e(r_L)
>> forvalues i=1/4 {
>>        matrix obsload`i'=obsload[1..26,`i'] // break the loadings matrix up
>> }
>> matrix obs=(obsload1 \ obsload2 \ obsload3 \ obsload4)' // put it back together, transposed to a single row
>>
>> // then carry on with the program...
>> // example ends --------------------------------------------
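For what it's worth, Stata's vec() matrix function stacks the columns of a matrix, which gives the same column-by-column order as the loop above without splitting the matrix by hand. A minimal sketch, assuming a small stand-in model on the auto data (six variables, two factors) rather than the 26-variable, 4-factor model in the post; the matrix names are illustrative:

```stata
* Stand-in model: 6 variables, 2 factors, promax-rotated as in the post
sysuse auto, clear
quietly factor price mpg weight length turn displacement, pcf factors(2)
quietly rotate, promax
matrix obsload = e(r_L)      // 6 x 2 rotated loadings
matrix obs = vec(obsload)    // 12 x 1: factor 1 loadings, then factor 2 loadings
matrix obs = obs'            // 1 x 12 row vector of point estimates
matrix list obs
```

With 26 variables and 4 factors the same lines would give a 1 x 104 vector, and the -simulate- expressions would then have to follow this column-by-column order.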
>>
>> Or if you need extra stuff, have a loop for columns within each loop for rows:
>>
>> // example begins -------------------------------------------
>> // first, get the observed point estimates:
>> factor var1 var2 var3 ... var26, pcf factors(4) // here there are 4 factors and 26 variables
>> rotate, promax // I hope this makes sense - I "don't do" oblique rotations
>> matrix obsload=e(r_L)
>> forvalues i=1/26 {
>>        forvalues j=1/4 {
>>               scalar obsload`i'_`j'=obsload[`i',`j'] // break the loadings matrix up
>>        }
>> }
>> // I was interested in % variance explained - you might want to add other stats in.
>> scalar varexpl=e(rho)
>> // now put it back together:
>> matrix obs=(obsload1_1 , obsload1_2 , obsload1_3 , obsload1_4 , ///
>>            obsload2_1 , obsload2_2 , obsload2_3 , obsload2_4 , ///
>>            obsload3_1 , obsload3_2 , obsload3_3 , obsload3_4 , ///
>>                                .
>>                                .
>>                                .
>>                                .
>>            obsload25_1 , obsload25_2 , obsload25_3 , obsload25_4 , ///
>>            obsload26_1 , obsload26_2 , obsload26_3 , obsload26_4 , ///
>>            varexpl)
>> /* then carry on with the program...
>> but be very careful to list the individual loadings and stats within -simulate- in exactly the same order as above; here I have gone across each row (all factors for one variable) before moving down to the next variable, which looks nicer because there are fewer factors than variables (j<i), but is slightly unconventional for loadings, I suppose */
>>
>> // and here comes the program...
>> capture program drop myboot
>> program define myboot, rclass
>>        preserve
>>        bsample
>>        factor var1 var2 var3 ... var26, pcf factors(4) // same model as the observed estimates
>>        rotate, promax
>>        matrix bootload=e(r_L)
>>        forvalues i=1/26 {
>>               forvalues j=1/4 {
>>                      scalar bootload`i'_`j'=bootload[`i',`j'] // same row-then-column order as obs
>>               }
>>        }
>>        scalar bootexp=e(rho)
>>        restore
>> end
>>
>> // now you use -simulate- to run the -myboot- program, creating one resample each time.
>> // the expressions must follow the same order as the elements of obs
>> simulate load1_1=bootload1_1 load1_2=bootload1_2 load1_3=bootload1_3 load1_4=bootload1_4 ... ///
>>        load26_4=bootload26_4 explained=bootexp, noisily reps(1000) seed(1234) ///
>>        saving(myboot_loadings.dta, replace): myboot
>> bstat, stat(obs) n(999) // replace 999 with the original number of observations
>> estat bootstrap, all
>>
>> // example ends --------------------------------------------
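Pulling the two posts together, here is a sketch (not Grant's code) that bootstraps all rotated loadings, applies one simple sign-alignment rule to address Nick's caution about flipped factors, and builds the -simulate- expression list with loops so that its order cannot drift from obs. Assumptions, all illustrative: the six-variable, two-factor auto model stands in for the real one; the program name mybootsign, the scalars bl*, and the variables load* are hypothetical; the rule (flip a bootstrap factor whenever its loadings have a negative inner product with the observed factor's loadings) handles sign flips but not factors that swap places between replications. Like Grant's code, it assumes -bstat- matches the columns of stat() to the variables in memory by position.

```stata
* Stand-in model on the auto data: 6 variables, 2 promax-rotated factors
sysuse auto, clear
local vars price mpg weight length turn displacement
local p : word count `vars'
local k 2

* Observed loadings, stacked column by column into a 1 x (p*k) row vector
quietly factor `vars', pcf factors(`k')
quietly rotate, promax
matrix obsload = e(r_L)
matrix obs = vec(obsload)
matrix obs = obs'

capture program drop mybootsign
program define mybootsign
    syntax varlist, Reference(name) [Factors(integer 1)]
    tempname s
    local p : word count `varlist'
    preserve
    bsample
    quietly factor `varlist', pcf factors(`factors')
    quietly rotate, promax
    matrix bootload = e(r_L)
    forvalues j = 1/`factors' {
        * inner product of bootstrap factor j with observed factor j
        scalar `s' = 0
        forvalues i = 1/`p' {
            scalar `s' = `s' + bootload[`i',`j'] * `reference'[`i',`j']
        }
        * flip factor j if it points the opposite way from the observed one
        local sign = cond(`s' < 0, -1, 1)
        forvalues i = 1/`p' {
            scalar bl`i'_`j' = `sign' * bootload[`i',`j']
        }
    }
    restore
end

* Build the -simulate- list with the same loops, so the order matches obs
local explist
forvalues j = 1/`k' {
    forvalues i = 1/`p' {
        local explist `explist' load`i'_`j'=bl`i'_`j'
    }
}

simulate `explist', reps(200) seed(1234) saving(myboot_loadings, replace): ///
    mybootsign `vars', factors(`k') reference(obsload)
bstat, stat(obs) n(74)

* Look at the sampling distributions, not just the intervals
histogram load1_1
```

Nick's other suggestion, working with absolute loadings, would amount to storing abs(bootload[`i',`j']) (and taking absolute values of obs) instead of the sign-aligned values.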
