Stata The Stata listserver

RE: st: RE: Econometrics Theory Questions on Dummies and Correlation Analysis


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: Econometrics Theory Questions on Dummies and Correlation Analysis
Date   Mon, 18 Apr 2005 19:20:43 +0100

There is much good advice here, but it still 
goes further than I would, and it rests on a more 
literal reading than seems defensible of the 
assertions of Stanley Smith Stevens 

http://www.nap.edu/openbook/0309022452/html/424.html

and others on nominal, ordinal, interval and ratio
scales, and what you can do with them.  

Also, arguments about what was designed to do what 
don't help much here. The techniques work the
way they work because of the mathematics of what is 
being done, not according to what was in the 
inventor's mind at the time. Anyway, historically,  
this is a most dangerous tack, as it was (Karl) Pearson 
above all others who thought that correlations could 
be pulled out of categorical data in all sorts of ways: 
you just needed the right formula to do it. 

Regression (correlation if anyone insists, but the logic 
is the same)  can't discern the categorical origins 
of dummy variables. It just sees 0s and 1s. 

At one extreme, suppose you have two identical 
dummy variables (and some variation in each). 
In terms of a scatter plot, you have two clusters, 
one at the origin (0,0) and one at (1,1), like this 


                  * 




     *  


and a straight line is a perfect summary of such 
data, and so the Pearson correlation is identically 1. 
Also, two such variables together on the RHS of a model 
are perfectly collinear, which has implications
for the model. In practice, as Paul emphasises, you 
would do well to count the numbers as well, but this 
result holds irrespective of coding and it is perfectly 
sensible statistically. 
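
The identical-dummies case can be checked numerically. What follows is 
only a minimal sketch, written in Python rather than Stata purely for 
illustration; the data and the helper name pearson() are made up for 
the example and are not from this thread. 

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: two identical dummies, with some variation in each.
d1 = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]
d2 = list(d1)

r_01 = pearson(d1, d2)

# Recode the second dummy from 0/1 to 5/9. An increasing linear
# recoding like this should leave the correlation unchanged.
d2_recoded = [5 + 4 * v for v in d2]
r_recoded = pearson(d1, d2_recoded)

print(r_01, r_recoded)
```

Both values should come out as 1, up to floating-point rounding. Note 
that a recoding which reverses the order of the two codes in just one 
of the variables would flip the sign to -1. 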

More generally, for paired dummies you have clusters of zero or
more data at (0,0), (0,1), (1,0) and (1,1) 
and the correlation you get will depend on the 
"votes cast" by each of those clusters. In many 
cases, the results won't be especially easy 
to interpret, but they are not crazy or stupid. 
Mind you, almost no correlation is easy to 
interpret without looking at the corresponding scatter plot, 
so nothing has changed there. 
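
The "votes cast" idea can be made concrete: for two 0/1 variables the 
Pearson correlation depends only on the four cluster counts, and it 
coincides with what is usually called the phi coefficient of the 2 x 2 
table. A minimal sketch in Python follows; the counts and the function 
names pearson() and phi_from_counts() are hypothetical, chosen just 
for this illustration. 

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def phi_from_counts(n00, n01, n10, n11):
    """Correlation of two 0/1 variables from the four cluster counts.

    nab is the number of observations with x == a and y == b.
    """
    num = n11 * n00 - n10 * n01
    den = sqrt((n10 + n11) * (n00 + n01) * (n01 + n11) * (n00 + n10))
    return num / den

# Hypothetical cluster sizes at (0,0), (0,1), (1,0) and (1,1).
counts = {(0, 0): 30, (0, 1): 10, (1, 0): 5, (1, 1): 15}

# Expand the counts into raw paired observations.
x, y = [], []
for (a, b), n in counts.items():
    x.extend([a] * n)
    y.extend([b] * n)

r_data = pearson(x, y)
r_counts = phi_from_counts(counts[(0, 0)], counts[(0, 1)],
                           counts[(1, 0)], counts[(1, 1)])

print(r_data, r_counts)
```

The two numbers should agree up to rounding. Changing any one of the 
four counts changes the correlation, which is all that is meant by 
each cluster casting its votes. 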

I don't think the case of Spearman correlation 
needs much extra discussion. Note that binary scales 
are always ordinal. In correlating, the signs may 
be arbitrary, but the magnitudes of Spearman 
correlations won't be.  

In fact, in many cases binary variables 
are counts too, in a perhaps strained sense (how 
many women inside this person? answer: either 0 or 1). 
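
The point about Spearman correlation can also be checked directly. 
With only two distinct values, the midranks of a binary variable are 
just a linear recoding of its 0s and 1s, so Spearman and Pearson 
should agree, and swapping the codes of one variable should change 
the sign only. This is a minimal sketch in Python with made-up data; 
the helpers pearson(), midranks() and spearman() are written here for 
illustration and are not any package's API. 

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def midranks(v):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of midranks."""
    return pearson(midranks(x), midranks(y))

# Hypothetical pair of dummies.
x = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
y = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]

rho = spearman(x, y)
r = pearson(x, y)

# Swap the coding of y (0 <-> 1): the sign flips, the magnitude stays.
y_flipped = [1 - v for v in y]
rho_flipped = spearman(x, y_flipped)

print(rho, r, rho_flipped)
```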

Note that no one, to the best of my knowledge, argues
that logit regression is inapplicable to binary 
responses because you can't (shouldn't) apply such techniques 
to "nominal" data! 

Nick 
n.j.cox@durham.ac.uk 

Paul Millar
 
> on Dummies and Correlation Analysis...
> 
> 1. Is there any theory that prohibits one from undertaking a
>    correlation analysis (i.e., correlation matrix) with either
>    Pearson or Spearman rank correlation test on variables,
>    which are all dummies?
> 
> Although technically there doesn't seem to be anything 
> preventing the kind of analysis you propose, from a 
> theoretical (or at least methodological) point of view you 
> wouldn't normally use this method for at least two reasons.
> 1) The level of measurement of the variables does not 
> coincide with the level of measurement of the techniques.  
> Pearson correlations are designed for interval (or ratio) 
> measures and Spearman for ordinal.  You have nominal measures 
> (or so it seems).
> 2) It is more complex than required, and potentially 
> obscures, rather than helps, understanding of the 
> relationships between the variables.  A series of simple 
> crosstabs might be more illuminating.
> From a methodological point of view, a compelling reason to 
> overcome these objections would be advisable to make your 
> choice of method more defensible.
> 
> 2. If there is no prohibition, theory wise, can the bivariate
>    correlation coefficients for the dummy variables be interpreted
>    in the same way as one would do with continuous variables?
> As stated above, the interpretation would require that you 
> treat your nominal measures as if they are interval or 
> ordinal.  You need to justify this treatment before 
> interpretation, at least if you are picky picky picky.
> 
> - Paul Millar
> Sociology
> University of Calgary
> 
> ----- Original Message -----
> From: Nick Cox <n.j.cox@durham.ac.uk>
> Date: Monday, April 18, 2005 10:05 am
> Subject: st: RE: Econometrics Theory Questions on Dummies and 
> Correlation Analysis
> 
> > Please note various points about 
> > Statalist procedure: 
> > 
> > 1. This message is just a repeat of 
> > one sent yesterday. 
> > 
> > 2. Please don't send email junk 
> > like vcards with your postings. 
> > 
> > As for your question, I don't think 
> > there is anything to prohibit you 
> > doing this. The results won't necessarily 
> > be very helpful or meaningful, except
> > in the extreme cases in which variables 
> > are identical, or nearly so, which will
> > produce correlations that are +1, or nearly 
> > so. 
> > 
> > Nick 
> > n.j.cox@durham.ac.uk 
> > 
> > Dr. Stephen Owusu-Ansah
> > 
> > > I have econometric/statistical theory questions about dummy
> > > variables and correlation analysis:
> > > 
> > > 1. Is there any theory that prohibits one from undertaking a
> > > correlation analysis (i.e., correlation matrix) with either
> > > Pearson or Spearman rank correlation test on variables,
> > > which are all dummies?
> > > 
> > > 2. If there is no prohibition, theory wise, can the bivariate
> > > correlation coefficients for the dummy variables be interpreted
> > > in the same way as one would do with continuous variables?
> > > 
> > > Thanks for your usual cooperation.



