Statalist



Re: st: confirmatory factor analysis with binary variables


From   "Stas Kolenikov" <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: confirmatory factor analysis with binary variables
Date   Sat, 13 Dec 2008 10:15:27 -0600

If you are really interested in a single-factor model, then yes,
-gllamm- may be your best option. Fitting it is more of a data-handling
issue than anything else, though:

* arrange your data to be of the form ID response1 response2 ... response15
reshape long response , i(ID) j(item)
* ID is your id, item is the number of the item, response is the 0/1 answer
xi, noomit i.item
* generates _Iitem_1 ... _Iitem_15, dummy-coding the items (no base omitted)
unab items : _Iitem*
eq load : `items'
* all item dummies go into this equation; the first loading is fixed at 1
gllamm response `items' , nocons i(ID) eqs(load) link(logit) family(bin) adapt
* nocons: the 15 dummies already give one intercept per item
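As a sketch of what the command above estimates (the symbols are
introduced here for illustration and do not appear in -gllamm- output):
writing y_ij for the 0/1 response of person j to item i, beta_i for the
item intercepts, lambda_i for the loadings, and eta_j for the factor with
variance psi, and assuming the default -gllamm- identification with the
first loading fixed at 1, this is a two-parameter logistic IRT model:

```latex
\operatorname{logit} \Pr(y_{ij} = 1 \mid \eta_j) = \beta_i + \lambda_i \eta_j,
\qquad \eta_j \sim N(0, \psi),
\qquad \lambda_1 = 1,
\qquad i = 1, \dots, 15
```

That gives 15 intercepts, 14 free loadings and one factor variance.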

I won't be surprised if this bare-bones pseudo-code works more or less
as is :)). I am sure Rabe-Hesketh and Skrondal explain more in their
manual when they talk about fitting IRT models (which is exactly what we
are doing here). If you are interested in things like fit indices,
though, you would have to compute them by hand. I am not exactly sure
those measures make much sense even in the continuous case, and I am
even less sure about the discrete case. The saturated log-likelihood is
the sum, over all cells of your multivariate table, of
(# units in the cell)*ln(# units in the cell / sample size), with empty
cells contributing nothing. With your original, un-reshaped data, this
is

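* run this on the original wide data, before -reshape-
bysort response1-response15 (ID) : g int ngroup = _N if _n == _N
* ngroup = # units with this response pattern, stored once per pattern
gen double llsat = sum( ngroup*ln( ngroup/_N ) )
* outside -by-, _N is the sample size; sum() treats missing ngroup as 0
scalar ll_sat = llsat[_N]
di ll_sat

so you can form the chi-square (likelihood ratio) goodness-of-fit test
as -2 times (the log-likelihood from -gllamm- minus this saturated
figure), after running -gllamm-:

di -2*( e(ll) - ll_sat )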
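In formula terms (notation assumed here for illustration): with n_c the
number of units in cell c of the table of response patterns, N the
sample size, and l_model the -gllamm- log-likelihood e(ll), the
computation above amounts to

```latex
\ell_{\mathrm{sat}} = \sum_{c:\; n_c > 0} n_c \ln\!\left(\frac{n_c}{N}\right),
\qquad
G^2 = -2\left(\ell_{\mathrm{model}} - \ell_{\mathrm{sat}}\right)
```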

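The saturated model has (2^#items - 1) degrees of freedom (one
probability per cell of the 2^#items table, less one because they sum
to 1). The -gllamm- model has #items intercepts + (#items - 1) free
slopes (the first loading is fixed at 1 by default) + the variance of
the factor = 2*#items parameters, and -gllamm- should show that number
somewhere in the output (it should also be the dimension of the e(b)
vector). The test then has (2^#items - 1) - 2*#items degrees of freedom.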
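For the 15 items in this example, the arithmetic works out as follows
(p denotes the number of items, notation introduced here). It counts one
probability per cell of the 2^p table, less one, for the saturated model,
and p intercepts + (p - 1) free loadings + one factor variance = 2p
parameters for the -gllamm- model. With this many cells relative to any
realistic sample size, most cells will be empty, so the chi-square
reference distribution is likely to be a rough approximation at best.

```latex
df = \left(2^{p} - 1\right) - 2p,
\qquad
p = 15:\quad df = (32768 - 1) - 30 = 32737
```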

[Wow, I never thought of doing that with my -gllamm- exercises :)).
Happy -gllamm-ing! ].

You can also try my -polychoric- command, which will produce a
correlation matrix that you could feed into -factormat-, but for EFA
purposes only.

Otherwise, most standard SEM packages (Mx, which is free; Mplus, which
is the most powerful; LISREL; EQS; I am less sure about AMOS) will all
handle binary/ordinal data well, too.

On 12/12/08, Buckley, Gillian J. <gbuckley@jhsph.edu> wrote:
> Dear Statalist,
>
>  I am trying to do a confirmatory factor analysis on data that are all binary (0 = no, 1 = yes). I have downloaded the tetrachoric command and used it to find the tetrachoric correlations. Is it possible to do confirmatory factor analysis with these data using the cfa1 command in Stata 9? If so, can anyone explain how?


-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP