

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Confirming whether a variable is binary or continuous
Date: Mon, 19 Mar 2012 10:11:45 +0000

I agree with Cameron to this extent: There isn't a precise soluble problem here without a precise definition of binary variable and, implicitly or explicitly, of continuous variable. Bert did flag an interest in 0s and 1s, but other people might have interest in other definitions, so the thread has broadened beyond his question.

I don't get the impression that Bert wants to do statistical analysis on his data to investigate what measurement scale it is, or might be. I get the impression he wants to do data management.

I don't want to get bogged down in terminology here, but binary variables have also been called dichotomous, indicator, dummy, quantal, Boolean and logical and no doubt other names too. (If you know of other names used in English, I'd like to add them to my collection.)

Tying together various earlier comments, and making some others:

String included or numeric only?
A string variable with values "male" and "female" is binary in many people's eyes and can easily be mapped to a numeric binary variable.

Two distinct values?
One weak criterion is that just two distinct values occur. That would admit a variable whose only values are 24 and 42. The obvious comment is that this criterion cannot tell apart variables that are binary in principle from variables that just happen to have two distinct values but in principle could have many more.

Zero or one only?
One strict definition is that the values must be restricted to 0 or 1. What if there is only one distinct value in practice? That could be a problem, as various analyses won't be possible, but you have to decide what to do.

Missing values?
Either criterion needs the exception that missing values may occur. No data management program in Stata is serious unless it copes intelligently with missing data when they occur.

It's implicit in Bert's postings that for his purposes, if a variable isn't binary, then it's continuous. I'll comment only that many researchers use much more elaborate taxonomies and terminologies.
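The weak criterion above (exactly two distinct nonmissing values, string or numeric) can be illustrated with a minimal sketch. It assumes the auto dataset shipped with Stata and uses -tabulate- to count distinct values; the macro name twovals is hypothetical and not part of the thread.

```stata
* Sketch only: list variables with exactly two distinct nonmissing values.
* Assumes the auto dataset shipped with Stata; twovals is a hypothetical name.
sysuse auto, clear

local twovals
foreach v of varlist _all {
    * -tabulate- ignores missing values by default and accepts string variables;
    * -capture- guards against variables that cannot be tabulated
    * (no observations, or too many distinct values)
    capture tabulate `v'
    if _rc == 0 & r(r) == 2 local twovals `twovals' `v'
}

display "Two distinct values: `twovals'"
```

A variable passing this check need not be binary in principle; as noted above, it may just happen to show two values in the data at hand.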
Here is a sketch of one possible program:

    program isbinary, rclass
        version 10.1
        syntax [varlist]

        * keep numeric variables only; string variables are ignored
        qui ds `varlist', has(type numeric)
        local varlist `r(varlist)'

        * binary here means 0, 1 or missing, and nothing else
        foreach v of local varlist {
            capture assert missing(`v') | inlist(`v', 0, 1)
            if _rc == 0 local binary `binary' `v'
        }

        if "`binary'" != "" describe `binary', simple
        return local varlist "`binary'"
    end

Notes.

1. No -varlist- need be specified, but either way string variables are ignored.

2. This example program encapsulates one choice (binary variables are numeric, and may be 0, 1 or missing (and nothing else)). Other choices are, as emphasised, clearly possible.

3. To be useful (most) data management programs leave saved results too.

4. Continuous variables are just the complement on this definition. See -ds- or -findname- (SJ, SSC) for tools here.

Incidentally, options -min()- and -max()- had already been added to -distinct- (SSC, SJ) before this thread got under way. A version with those options is in press from the Stata Journal. I'll send a copy to Kit Baum for SSC. With this version of -distinct-

    distinct, max(2)
    distinct, min(2) max(2)

would be other answers to this question. Note: as just mentioned, you can't do this yet with any -distinct- that you have unless you are one of the program authors.

I can't see that there is a Stata way to tell apart a variable which is just 0 or 1 in practice from one which can only be 0 or 1 in principle. (Again, missing values aside.) It's a subject-matter decision.

Nick

On Mon, Mar 19, 2012 at 3:02 AM, Cameron McIntosh <cnm100@hotmail.com> wrote:

> I think that the only way to decide how to proceed is to first approach this issue conceptually (i.e., think about it) -- based on your content area expertise, is the covariate in question truly binary (qualitative) or do the observed categories merely discretize a latent continuous process? If the former, you can use observed categorical variable methodology to examine the covariate distributions by treatment group (e.g., chi-square tests of independence and related methods for contingency tables); if the latter, then you may be into tetrachorics and the like.
>
> MacDonald, P.L., & Gardner, R.C. (2000). Type I Error Rate Comparisons of Post Hoc Procedures for I × J Chi-Square Tables. Educational and Psychological Measurement, 60(5), 735-754.
>
> Bentler, P.M. (2011). Can Interval-level Scores be Obtained from Binary Responses? UCLA Preprint #622. http://preprints.stat.ucla.edu/622/Bentler%20Interval%20Scores%20from%20Binary%20Responses.pdf
>
> Ulrich, R., & Wirtz, M. (2004). On the correlation of a naturally and an artificially dichotomized variable. British Journal of Mathematical and Statistical Psychology, 57(2), 235–251.
>
> Ledesma, R.D., Macbeth, G., & Valero-Mora, P. (2011). Software for Computing the Tetrachoric Correlation Coefficient. Revista Latinoamericana de Psicología, 43(1), 181-189. http://openjournal.konradlorenz.edu.co/index.php/rlpsi/article/viewFile/459/463
>
> Greer, T., Dunlap, W.P., & Beatty, G.O. (2003). A Monte Carlo Evaluation of the Tetrachoric Correlation Coefficient. Educational and Psychological Measurement, 63(6), 931-950.
>
> Bonett, D.G., & Price, R.M. (2005). Inferential Methods for the Tetrachoric Correlation Coefficient. Journal of Educational and Behavioral Statistics, 30(2), 213-225.
>
> Long, M.A., Berry, K.J., & Mielke, P.W., Jr. (2009). Tetrachoric Correlation: A Permutation Alternative. Educational and Psychological Measurement, 69(3), 429-437.
>
> Genest, C., & Lévesque, J.-M. (2009). Estimating correlation from dichotomized normal variables. Journal of Statistical Planning and Inference, 139(11), 3785-3794.
>
> Choi, J., Peters, M., & Mueller, R.O. (2010). Correlational analysis of ordinal data: from Pearson’s r to Bayesian polychoric correlation. Asia Pacific Education Review, 11(4), 459-466.
>
> Cam
>
>> Date: Mon, 19 Mar 2012 00:28:27 +0000
>> Subject: Re: st: Confirming whether a variable is binary or continuous
>> From: njcoxstata@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> Your program just echoes its own input, confirming that what you
>> specify is a binary variable is indeed binary and what you specify is
>> a continuous variable is indeed continuous. It does no checking
>> whatsoever.
>>
>> I am puzzled about why you think that is useful and indeed in what
>> sense it is a solution to your original problem.
>>
>> Nick
>>
>> On Sun, Mar 18, 2012 at 5:07 PM, Bert Jung <bjung59@gmail.com> wrote:
>>
>> > Thanks all for these helpful insights. I wanted to share my solution
>> > which, if clumsy, works for me. The basic idea is to check whether a
>> > particular variable is part of the continuous or binary varlist and
>> > then proceed as appropriate.
>> >
>> > This approach keeps intact the order specified in varlist. I am
>> > collecting estimation output and wanted the order to remain as
>> > specified by the user.
>> >
>> > This is just a minimum working example, obviously various checks and
>> > balances are in order.
>> >
>> > Cheers Bert
>> >
>> >
>> > cap program drop varcheck
>> > program varcheck, nclass
>> >
>> >     syntax varlist, contvars(varlist) binaryvars(varlist)
>> >
>> >     * Loop over all variables in varlist; this approach keeps the order in -varlist- intact
>> >     foreach v of local varlist {
>> >
>> >         * (a) Is variable part of the variables specified in "contvars"?
>> >         local contvar: list v in contvars
>> >
>> >         if `contvar'==1 {
>> >             di "`v' is specified as continuous variable"
>> >         }
>> >
>> >         * (b) Is variable part of the variables specified in "binaryvars"?
>> >         local propvar: list v in binaryvars
>> >
>> >         if `propvar'==1 {
>> >             di "`v' specified as binary variable"
>> >         }
>> >     }
>> >
>> > end
>> >
>> >
>> > sysuse auto, clear
>> >
>> > varcheck mpg price foreign weight, contvars(mpg price weight) binaryvars(foreign)
>> >
>> >
>> >> On 03/16/12, Bert Jung <bjung59@gmail.com> wrote:
>> >>
>> >>> I am writing a short program to make a balance table that compares
>> >>> covariates across a treatment and control group. I am looking for a
>> >>> way to confirm whether a variable is binary in order to use -prtest-
>> >>> for proportions rather than -ttest- for continuous variables.
>> >>>
>> >>> One option is to check the actual data values and do -prtest- if there
>> >>> are only 0's and 1's. But a continuous but rare outcome could
>> >>> accidentally also take these values, e.g. the number of
>> >>> hospitalizations in the past 3 months.
>> >>>
>> >>> Is there a safer way to confirm that a variable is binary?
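The original aim in the thread, choosing -prtest- or -ttest- for each covariate while keeping the user's variable order, might be sketched as follows. This is an illustration under stated assumptions, not a posted solution: it uses the auto dataset, treats foreign as a stand-in for the treatment indicator, and creates a hypothetical 0/1 variable highrep purely for the example. The test for "binary" is the 0, 1 or missing rule discussed above.

```stata
* Sketch only: choose -prtest- or -ttest- per covariate with the 0/1-or-missing rule.
* Assumptions: auto dataset; foreign stands in for the treatment indicator;
* highrep is a hypothetical indicator created for illustration.
sysuse auto, clear
generate byte highrep = rep78 >= 4 if !missing(rep78)

foreach v of varlist mpg price highrep {
    * the assertion succeeds only if every value is 0, 1 or missing
    capture assert missing(`v') | inlist(`v', 0, 1)
    if _rc == 0 {
        display as text "`v': binary, using -prtest-"
        prtest `v', by(foreign)
    }
    else {
        display as text "`v': treated as continuous, using -ttest-"
        ttest `v', by(foreign)
    }
}
```

The caveat raised in the thread still applies: a rare count that happens to take only the values 0 and 1 would be sent to -prtest- by this rule, so the final decision remains a subject-matter one.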
