The Stata listserver

RE: st: -factor- with binary variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: -factor- with binary variables
Date   Sun, 28 Nov 2004 16:05:41 -0000

This is somewhat orthogonal to the main question, 
but the historical remarks in Clive's reply are perhaps misleading.

Karl Pearson put a lot of effort not only 
into what we now call the Pearson correlation
coefficient but also into seeking ways of getting 
at (analogues of) correlations in situations where the 
data came as, for example, binary variables. 

Patricia seems to be seeking a technical answer to 
this question, but the reverse question is to 
ask her whether she thinks that underneath 
her indicators there are continuous latent variables 
which make sense in scientific terms. If so, 
her position is close to Pearson's. 
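One of the analogues Pearson worked on is the tetrachoric correlation, which assumes each binary indicator is a continuous latent (normal) variable cut at some threshold and estimates the correlation between those latent variables. The exact estimate needs a bivariate normal integral; as a minimal sketch, the Python below uses only the classic cosine-pi approximation, r ≈ cos(π / (1 + √(ad/bc))), for a 2×2 table with cell counts a, b, c, d. The function name and cell layout are assumptions made for illustration; this is not how any particular Stata command computes it.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Assumed cell layout (a convention chosen for this sketch):

                 y = 1   y = 0
        x = 1      a       b
        x = 0      c       d
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("degenerate table: both diagonals contain a zero")
    if b * c == 0:
        return 1.0   # empty off-diagonal cell: odds ratio infinite, r -> +1
    if a * d == 0:
        return -1.0  # empty diagonal cell: odds ratio zero, r -> -1
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))


print(round(tetrachoric_cos_pi(40, 10, 10, 40), 3))  # 0.809
```

For that table the approximation gives about 0.81, whereas a Pearson correlation computed directly on the 0/1 codes (the phi coefficient) gives 0.6: the tetrachoric figure estimates the correlation between the assumed latent variables, not between the indicators themselves.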

I agree with Clive that if factor analysis is no 
good here, principal component analysis will be no 
better, as the distinction between them is not based
on the data input. 
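The point that the distinction between factor analysis and PCA does not lie in the data input can be made concrete: a PCA of standardized variables needs nothing but their correlation matrix, and the usual factor-analysis methods start from that same matrix. Whatever produced the correlations, binary indicators included, the arithmetic is identical. Below is a minimal pure-Python sketch, assuming a small symmetric correlation matrix. It uses power iteration to get only the first component and its share of total variance, a method chosen here for brevity (real software typically does a full eigendecomposition), and the function names are hypothetical.

```python
import math


def first_component(corr, iters=1000, tol=1e-12):
    """Leading eigenvalue and unit eigenvector of a symmetric
    correlation matrix, found by power iteration.

    Sketch only: assumes the largest eigenvalue is strictly larger
    than the others and that the start vector is not orthogonal to
    its eigenvector.
    """
    k = len(corr)
    # Deliberately uneven start vector, so it is unlikely to be an
    # eigenvector of a "nice" matrix by accident.
    v = [1.0 + 0.1 * i for i in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigval = 0.0
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v'Rv estimates the eigenvalue.
        new_val = sum(v[i] * corr[i][j] * v[j]
                      for i in range(k) for j in range(k))
        if abs(new_val - eigval) < tol:
            return new_val, v
        eigval = new_val
    return eigval, v


def share_of_variance(corr):
    """Share of total variance carried by the first principal component.
    For a correlation matrix the total variance is the trace, i.e. k."""
    eigval, _ = first_component(corr)
    return eigval / sum(corr[i][i] for i in range(len(corr)))


# Any correlation matrix will do; this one could equally have been
# computed from 0/1 indicators.
R = [[1.0, 0.6],
     [0.6, 1.0]]
print(round(share_of_variance(R), 6))  # 0.8
```

Nothing in the decomposition knows, or can know, whether the correlations came from continuous scores or from binary indicators.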

Nick 
n.j.cox@durham.ac.uk 

Clive Nicholas replied to Patricia Sourdin 

> > Just a query on -factor-.
> > I am trying to construct an index where I have five
> > variables which are binary indicators.
> > I have read somewhere that it is not appropriate to use
> > factor analysis if the variables are binary. Can anyone confirm, please?

> Well, if you ever read what Chatfield and Collins (1980) had to say
> (or, should I say, spit?) on CFA, as they prefer to call it, you'd
> think it such a useless and unreliable method of data analysis
> (largely, they say, because it's difficult to replicate) that there's
> little point in wasting your time doing it! I don't entirely share
> this view, however. :)
> 
> The whole point of factor analysis, as I understand it, is to explore
> (in a preliminary fashion) correlations between variables that appear
> to 'hang together' and that _could_ in turn be combined into new
> variables in further analysis, if it were both valid and desirable to
> do so.
> 
> It's no accident that FA was part-invented by Karl Pearson back in the
> early 1930s. Strictly speaking, you're not meant to run Pearson
> correlations between binary/discrete variables, because they are
> designed for continuous variables only. You use chi-square tests for
> two binary variables and eta-coefficient tests if one variable is
> continuous and the other is discrete. But as Eric Morecambe would have
> said, "Come on now, be honest!" How many of us have run Pearson
> correlations inappropriately? I know I have, and I'm not proud of
> myself, either.
> 
> I have flicked through perhaps one of the most accessible books around
> on factor analysis (Kline, 1994). Although he does not say that the use
> of binary variables is disallowed, _all_ of the examples in the book use
> either scale measures or naturally continuous scores, such as years of
> education or age. Therefore, I would advise against using binary
> variables in factor analysis.
> 
> > Also, would -pca- be an alternative in this case?
> 
> Principal components analysis is similar to FA in that it's a data
> reduction technique. However, the 'factors' extracted in FA are
> hypothetical: it's left to you to explain why the variables that form
> the factor(s) just extracted have something in common. PCA is rather
> more mechanical in its approach: in practice, it's all about estimating
> how much variance the first couple of PCs account for, regardless of
> whether they _really_ have something in common or not. Thus, dropping
> binary variables into a PCA instead would be no more valid, in my view.
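The contrast drawn above between Pearson correlations and chi-square tests for two binary variables is less sharp than it may look: the Pearson correlation of two 0/1 variables is the phi coefficient, and the Pearson chi-square statistic for their 2×2 table (without continuity correction) equals n·φ². A minimal pure-Python check on hypothetical data follows; the helper names pearson_r and chi2_2x2 are made up for this sketch.

```python
import math


def pearson_r(x, y):
    """Ordinary Pearson correlation of two equal-length numeric lists.
    Assumes neither list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chi2_2x2(x, y):
    """Pearson chi-square (no continuity correction) for two 0/1 variables.
    Assumes every row and column total of the 2x2 table is positive."""
    n = len(x)
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    rows = {i: counts[(i, 0)] + counts[(i, 1)] for i in (0, 1)}
    cols = {j: counts[(0, j)] + counts[(1, j)] for j in (0, 1)}
    stat = 0.0
    for (i, j), observed in counts.items():
        expected = rows[i] * cols[j] / n
        stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical binary indicators for 100 respondents:
# 40 with (1,1), 10 with (1,0), 10 with (0,1), 40 with (0,0).
x = [1] * 40 + [1] * 10 + [0] * 10 + [0] * 40
y = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40

phi = pearson_r(x, y)
print(round(phi, 6))                # 0.6
print(round(chi2_2x2(x, y), 6))     # 36.0
print(round(len(x) * phi ** 2, 6))  # 36.0
```

The chi-square statistic discards only the sign of the association, which the correlation keeps.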

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP