
From: Joseph Coveney <jcoveney@bigplanet.com>
To: Statalist <statalist@hsphsun2.harvard.edu>
Subject: RE: RE: st: RE: Econometrics Theory Questions on Dummies and Correlation Analysis
Date: Tue, 19 Apr 2005 18:05:54 +0900

SamL wrote:

I did not (mean to) indicate that the variance of a binary variable is undefined or meaningless. The variance of a binary variable just doesn't tell you anything you don't know by knowing the mean, i.e., it is redundant.

Also, yes, I know there are lots of ways to think of the correlation coefficient. I indicated in my note that I was talking about one way. To be constructive, let me first be honest--I haven't studied many of the other ways. It strikes me that at least one of those other ways might provide grounds for strong support for reporting Pearson corr coeffs with binary variables. But I am at the limit of my knowledge. Anyone know a way to derive and defend the Pearson's correlation coeff for binary variables as an unbiased indicator of association? I'd love to read it here or be pointed to a citation--it would help me in my own work and in my teaching.

----------------------------------------------------------------------------

I'm not sure that I follow what unbiased indicator of association means in this context. My understanding is that Pearson's correlation coefficient can be used with binary variables without any apology or defense. It's a correlation coefficient by virtue of the arithmetic alone. (This was one of Nick Cox's points, I thought.)

If you would like to see an instance of it used as a correlation coefficient for binary variables (if I recall correctly), take a look at A. D. Lunn and S. J. Davies, A note on generating correlated binary variables. _Biometrika_ 85(2):487-490, 1998.

In any event, the do-file below might be helpful in the classroom or computer laboratory in illustrating some of the various measures of correlation, association or agreement applicable to binary variables that are available in Stata.
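Separately from the do-file, the two arithmetic points above can be checked on a small made-up dataset. The following is only a minimal sketch: the variables x and y, the seed and the probabilities are assumptions for illustration and appear nowhere else in this thread. It shows that the sample variance of a 0/1 variable follows from its mean, and that Pearson's r computed on the 0/1 codes has the same magnitude as the phi coefficient, sqrt(chi2/N), from the fourfold table.

```stata
* Sketch only: x, y, the seed and the probabilities are made up for illustration.
clear
set seed 12345
set obs 200
generate byte x = uniform() < 0.4
generate byte y = uniform() < 0.3 + 0.4 * x

* Variance of a 0/1 variable is p(1 - p), times N/(N - 1) for the sample variance.
quietly summarize x
scalar var_x = r(Var)
scalar var_p = r(mean) * (1 - r(mean)) * r(N) / (r(N) - 1)
display "sample variance = " var_x "   from the mean = " var_p

* Pearson's r on the 0/1 codes has the same magnitude as phi = sqrt(chi2/N).
quietly correlate x y
scalar r_xy = r(rho)
quietly tabulate x y, chi2
scalar phi = sqrt(r(chi2) / r(N))
display "Pearson r = " r_xy "   sqrt(chi2/N) = " phi
```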
I naively think of them as falling into two groups: those that measure correlation between the binary variables and those that estimate correlation of the latent variables that underlie (or that can be tacitly conceived as underlying) the binary variables. The do-file illustrates that, in the context of a fourfold table, many familiar coefficients and indexes of association turn out to represent the former whether by coincidence or equivalence. The former could also in some broad sense encompass concordance measures like Goodman and Kruskal's gamma / Yule's Q.

In the do-file below, Pearson's correlation coefficient of the binary variables is returned as rho_manvar (rho of the manifest variables, as opposed to that of the latent variables, which is rho_latvar). Several of the measures illustrated are from user-written commands, for example, -somersd-, -polychoric-, -tetrac- (an approximation of what -polychoric- gives in this case) and -reoprob-. You'll need to have these (as well as -slist-) installed.
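To make the manifest-versus-latent distinction concrete without any of the user-written commands, here is a rough sketch using only official Stata. The latent correlation of 0.6, the sample size, the seed and the names z0, z1, b0 and b1 are all assumptions chosen for illustration. It relies on the standard result that, when two standard bivariate normal variables are both cut at zero, the correlation of the resulting 0/1 variables is (2/pi)*asin(rho), so that sin(pi*r/2) approximately recovers the latent correlation in that special case.

```stata
* Sketch only: the latent correlation 0.6, the sample size and all names are assumed.
clear
set seed 2005
matrix C = (1, 0.6 \ 0.6, 1)
drawnorm z0 z1, n(20000) corr(C)

* Dichotomize both latent normals at zero to get the manifest 0/1 variables.
generate byte b0 = z0 > 0
generate byte b1 = z1 > 0

quietly correlate z0 z1
scalar rho_lat = r(rho)
quietly correlate b0 b1
scalar rho_man = r(rho)

* With both cuts at zero, the manifest correlation is about (2/pi)*asin(rho),
* so sin(pi*rho_man/2) roughly recovers the latent correlation.
display "latent rho       = " rho_lat
display "manifest rho     = " rho_man "   (2/pi)*asin(0.6) = " (2/_pi) * asin(0.6)
display "back-transformed = " sin(_pi * rho_man / 2)
```

The manifest correlation comes out noticeably smaller than the latent one, which is why the two groups of measures should not be expected to agree numerically.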
Joseph Coveney

clear
set more off
set seed `=date("2005-04-18", "ymd")'
program define corbingen, rclass
    version 8.2
    drawnorm mu0 mu1, corr(1 `1' \ `1' 1) n(200) clear
    correlate mu0 mu1
    return scalar rho_latvar = r(rho)
    replace mu0 = mu0 > 0
    replace mu1 = mu1 > 0
    compress
    tabulate mu*, all
    return scalar gamma = r(gamma)
    return scalar CramersV = r(CramersV)
    return scalar taub = r(taub)
    somersd mu0 mu1
    matrix A = e(b)
    return scalar somersd = A[1,1]
    correlate mu*
    return scalar rho_manvar = r(rho)
    kap mu0 mu1
    return scalar kappa = r(kappa)
    dprobit mu0 mu1
    matrix A = e(dfdx)
    return scalar dFdx = A[1,1]
    tetrac mu0 mu1
    return scalar tetrac = r(tetra)
    polychoric mu0 mu1
    return scalar polychoric = r(rho)
    generate int rec = _n
    reshape long mu, i(rec) j(tim)
    reoprob mu tim, i(rec)
    matrix A = e(b)
    return scalar reprobit_rho = A[1,3]
end
*
forvalues rho = 0.1(0.1)0.9 {
    local R = round(10 * `rho', 1)
    tempfile sim`R'
    simulate "corbingen `rho'" rho_latvar = r(rho_latvar) ///
      gamma = r(gamma) tetrac = r(tetrac) ///
      polychoric = r(polychoric) ///
      reprobit_rho = r(reprobit_rho) ///
      rho_manvar = r(rho_manvar) CramersV = r(CramersV) taub = r(taub) ///
      somersd = r(somersd) dFdx = r(dFdx) ///
      kappa = r(kappa), reps(1) saving(`sim`R'')
}
forvalues R = 1/8 {
    append using `sim`R''
}
sort rho_latvar
slist, noobs decimal(3)
assert rho_manvar == taub
assert rho_manvar == CramersV
pause on
graph7 kappa dFdx gamma rho_manvar rho_manvar, ///
  xlabel ylabel connect(..LL) symbol(oTii)
pause
graph7 gamma reprobit_rho polychoric rho_latvar ///
  rho_latvar, xlabel ylabel connect(...L) symbol(oTxi)
exit

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

