# Re: st: -factor- with binary variables

 From "Clive Nicholas" To statalist@hsphsun2.harvard.edu Subject Re: st: -factor- with binary variables Date Sun, 28 Nov 2004 10:08:48 -0000 (GMT)

```
Patricia Sourdin wrote:

> just a query on -factor-.
> I am trying to construct an index where I have five variables which are
> binary indicators.
> I have read somewhere that it is not appropriate to use factor analysis if
> the variables are binary.  Can anyone confirm, please?

Well, if you read what Chatfield and Collins (1980) have to say (or should
I say, spit?) about FA, or CFA as they prefer to call it, you'd conclude
that it is such a useless and unreliable method of data analysis (largely,
they say, because its results are difficult to replicate) that there's
little point in wasting your time on it! I don't entirely share this view,
however. :)

The whole point of factor analysis, as I understand it, is to explore, in
a preliminary fashion, the correlations among variables that appear to
'hang together' and that _could_ then be combined into new variables for
further analysis, if it were both valid and desirable to do so.

It's no accident that FA is so bound up with Karl Pearson's work: the
technique starts from a matrix of Pearson product-moment correlations.
Strictly speaking, you're not meant to compute Pearson correlations
between binary/discrete variables, because they are designed for
continuous variables only. For two binary variables you use a chi-square
test, and if one variable is continuous and the other discrete you use the
eta coefficient (the correlation ratio). But as Eric Morecambe would have
said, "Come on now, be honest!" How many of us have run Pearson
correlations inappropriately? I know I have, and I'm not proud of myself,
either.

Having flicked through what is perhaps one of the most accessible books
around on factor analysis (Kline, 1994), I notice that although he does
not say the use of binary variables is disallowed, _all_ of his examples
use either scale measures or naturally continuous scores, such as years of
education or age. I would therefore advise against using binary variables
in factor analysis.

> Also, would -pca- be an alternative in this case?

Principal components analysis is similar to FA in that it's a data
reduction technique. However, the 'factors' extracted in FA are
hypothetical: it's left to you to explain why the variables that load on
the factor(s) just extracted have something in common. PCA is rather more
mechanical in its approach: in practice, it's all about estimating how much
of the total variance the first couple of PCs account for, regardless of
whether the variables _really_ have something in common or not. Thus, in
my view, putting binary variables into a PCA would be no more valid than
putting them into an FA.

CLIVE NICHOLAS        |t: 0(044)7903 397793
Politics              |e: clive.nicholas@ncl.ac.uk
Newcastle University  |http://www.ncl.ac.uk/geps

References:

Chatfield C and Collins AJ (1980) INTRODUCTION TO MULTIVARIATE ANALYSIS,
London: Chapman and Hall.

Kline P (1994) AN EASY GUIDE TO FACTOR ANALYSIS, London: Routledge.
```
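The chi-square test and eta coefficient mentioned in the reply are close relatives of Pearson's r. The minimal Python sketch below (the function names `pearson`, `chi2_2x2` and `eta` and the toy data are hypothetical, for illustration only) shows that for two 0/1 variables Pearson's r is simply the phi coefficient, and n·r² reproduces the uncorrected Pearson chi-square of the 2×2 table. It also shows that for a binary grouping variable the eta coefficient equals |r|.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def chi2_2x2(x, y):
    """Pearson chi-square (no continuity correction) for two 0/1 variables."""
    n = len(x)
    # Observed counts for each cell (i, j) of the 2x2 table.
    obs = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(x, y):
        obs[(a, b)] += 1
    rows = {i: obs[(i, 0)] + obs[(i, 1)] for i in (0, 1)}
    cols = {j: obs[(0, j)] + obs[(1, j)] for j in (0, 1)}
    chi2 = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = rows[i] * cols[j] / n
            chi2 += (obs[(i, j)] - expected) ** 2 / expected
    return chi2

def eta(groups, values):
    """Correlation ratio: square root of between-group SS over total SS."""
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    by_group = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                     for vs in by_group.values())
    return sqrt(ss_between / ss_total)

# Hypothetical data: two 0/1 indicators and an age variable for ten people.
x = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
age = [23, 35, 41, 29, 52, 38, 44, 30, 27, 61]

phi = pearson(x, y)
print("phi (Pearson r on 0/1 data):", round(phi, 3))
print("chi-square:", round(chi2_2x2(x, y), 3), " n * phi^2:", round(len(x) * phi ** 2, 3))
print("eta(x, age):", round(eta(x, age), 3), " |r(x, age)|:", round(abs(pearson(x, age)), 3))
```

The sketch says nothing about whether binary variables belong in a factor analysis. It only makes concrete how the suggested alternatives relate to the correlation coefficient itself.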
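The reply describes PCA as being about "how much variance the first couple of PCs account for". Here is a minimal pure-Python sketch of that arithmetic, assuming PCA on the correlation matrix. It builds the Pearson correlation matrix of the items, extracts its eigenvalues with cyclic Jacobi rotations (standing in for a proper linear-algebra library), and divides each eigenvalue by their sum. The function names and the five 0/1 items are hypothetical. As far as I know, this is the quantity reported in the Proportion column of Stata's -pca- output, which works on the correlation matrix by default.

```python
from math import atan2, cos, sin, sqrt

def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length columns (none may be constant)."""
    centred = []
    for col in columns:
        mean = sum(col) / len(col)
        centred.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(row) for row in zip(*x)]

def jacobi_eigenvalues(a, max_sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle chosen so that the (p, q) entry of J^T A J becomes zero.
                theta = 0.5 * atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = cos(theta), sin(theta)
                j = [[float(i == k) for k in range(n)] for i in range(n)]
                j[p][p], j[q][q], j[p][q], j[q][p] = c, c, s, -s
                a = matmul(transpose(j), matmul(a, j))
    return sorted((a[i][i] for i in range(n)), reverse=True)

def variance_explained(columns):
    """Share of total variance carried by each principal component."""
    eig = jacobi_eigenvalues(correlation_matrix(columns))
    total = sum(eig)  # equals the number of variables for a correlation matrix
    return [e / total for e in eig]

# Five hypothetical 0/1 indicators for eight respondents.
items = [
    [1, 1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
for k, share in enumerate(variance_explained(items), start=1):
    print(f"PC{k}: {share:.1%}")
```

Nothing in this arithmetic checks whether the inputs are binary or continuous. That is the sense in which the reply calls PCA mechanical.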