The Stata listserver

st: RE: Re: Econometrics Theory Questions on Dummies and Correlation Analysis

From   "Nick Cox" <>
To   <>
Subject   st: RE: Re: Econometrics Theory Questions on Dummies and Correlation Analysis
Date   Tue, 19 Apr 2005 10:33:48 +0100

Paul seems to be implying that whether a binary variable
is nominal is somehow deeper or more fundamental than it 
being binary. I don't accept that at all. 

To repeat an earlier example: 

Suppose you have two identical 
dummy variables (and some variation in each). 
In terms of a scatter plot, you have two clusters, 
one at the origin (0,0) and one at (1,1), like this 



and a straight line is a perfect summary of such 
data, and so the Pearson correlation is identically 1. 
The graph above is label-free, and deliberately so, 
as the result holds irrespective of coding. I could 
code the two levels as 7 and 42 or any other distinct 
numbers and the correlation is unchanged.  And 
I don't see any objection to calling that a linear 
relationship. 
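
The coding claim is easy to check numerically. What follows is 
only a minimal sketch in Python (not Stata, and not part of the 
original message): the data values and the helper name pearson() 
are made up for illustration. It computes the Pearson correlation 
of two identical dummies, then recodes both from 0/1 to 7/42 with 
the same mapping and computes it again.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation from deviations about the means (illustrative helper)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Two identical dummy variables, with some variation in each (made-up data).
x = [0, 0, 0, 1, 1]
y = list(x)
r01 = pearson(x, y)

# Recode both variables with the same mapping: 0 -> 7, 1 -> 42.
recode = {0: 7, 1: 42}
x_recoded = [recode[v] for v in x]
y_recoded = [recode[v] for v in y]
r_recoded = pearson(x_recoded, y_recoded)

print(r01, r_recoded)
```

Both correlations come out as 1 up to floating-point rounding, 
because the recoding is an increasing linear transformation of 
each variable.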


Paul Millar
> What fun this all is!   Who'd have thought!  Thanks for the 
> fun with fundamentals!
> I think what Sam was getting at is that with binary 
> variables, once you have the mean, you can throw away the 
> data since the variance is directly derived from the mean.  
> Nothing further is required, even to calculate confidence intervals.
> And I think Nick's response indicates why the level of 
> measurement is relevant.  If the LOM is nominal, there is no 
> linear relationship, strictly speaking.  Only when the scales 
> are equi-interval does a linear relationship, and thus the 
> correlation make theoretical sense; the correlation being a 
> summary of the linear relationship, as Nick points out.  
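
The point about the mean of a binary variable can be illustrated 
with a small sketch. It is in Python rather than Stata, the data 
are made up, and the interval shown is the simple Wald 
(normal-approximation) interval, which is an assumption here since 
the thread does not say which interval is meant. Note that the 
sample variance and the interval need the sample size n as well as 
the mean.

```python
from math import sqrt
from statistics import mean, pvariance, variance

# A made-up 0/1 variable.
x = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]
n = len(x)
p = mean(x)  # proportion of ones

# Variance implied by the mean alone (divisor n).
var_pop = p * (1 - p)

# Sample variance (divisor n - 1) needs n as well.
var_sample = n / (n - 1) * p * (1 - p)

# Wald 95% interval for the proportion, from p and n only.
se = sqrt(p * (1 - p) / n)
ci = (p - 1.96 * se, p + 1.96 * se)

# Compare with the variances computed directly from the data.
print(var_pop, pvariance(x))
print(var_sample, variance(x))
print(ci)
```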
