

Re: st: Confirming whether a variable is binary or continuous


From   daniel klein <klein.daniel.81@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Confirming whether a variable is binary or continuous
Date   Fri, 16 Mar 2012 23:28:24 +0100

Bert,

as you have already realized, there is no way to tell whether a
variable is intended to be a binary indicator or merely happens to
take only the values 0 and 1. To decide this, you need more
information about the variable. An option that lets the user declare
continuous variables seems to be a good idea.
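A minimal sketch of such an option, assuming a hypothetical r-class
program called -binlist- with an option continuous() (both names are
illustrative only, not part of your program or of Stata): -syntax-
parses the option, and the extended macro function -: list A - B-
drops the declared continuous variables from the list still to be
checked.

```stata
* Hypothetical sketch: -binlist- and its continuous() option are
* illustrative names, not an existing command.
capture program drop binlist
program define binlist, rclass
	version 12
	syntax varlist(numeric) [, CONTinuous(varlist)]
	// keep only the variables the user has not declared continuous
	local candidates : list varlist - continuous
	return local candidates "`candidates'"
end
```

For example, -binlist x1 x2 x3, continuous(x3)- leaves r(candidates)
equal to "x1 x2"; only those variables then need the binary check.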

However, I would like to add some thoughts here.

For checking for binary variables, -tabulate- is useful, but r(r), the
number of distinct values, is not all it has to offer. Note that a
variable with values 1 and 2 also results in r(r) = 2 and would
therefore be declared binary by your program. Here is how I checked
for binary variables in one of my programs, using -tabulate- with the
-matrow()- option, which stores the distinct values in a matrix:

[...]
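tempname M
// `var' holds the name of the variable being checked; -capture- traps
// errors from string variables or too many distinct values
capture quietly tabulate `var', matrow(`M')
if _rc | (r(r) != 2) {
	di "`var' is not a binary variable"
}
else if (`M'[1, 1] != 0) | (`M'[2, 1] != 1) {
	// rows of the matrix hold the distinct values in ascending order
	di "`var' is not a binary variable"
}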
[...]

Because option -matrow()- is not allowed with string variables, you
will have to make sure the variable is numeric (for example, with
-confirm numeric variable-) or -capture- the error. If you do not want
to check this, you can use -levelsof-, which returns the distinct
values of any variable, numeric or string. In any case, no
user-written software is required here (although, as far as I know,
the first versions of -levelsof- were at least partly user-written by
Nick Cox).
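A sketch of the -levelsof- route on a toy dataset (note that it clears
the data in memory; d and s are made-up names). For a numeric variable
whose only nonmissing values are 0 and 1, -levelsof- returns the list
"0 1". Levels of a string variable are returned in compound double
quotes by default, so a string variable holding "0" and "1" is not
reported as binary.

```stata
* Toy data (clears memory): d holds 0/1 as numbers, s holds "0"/"1" as text
clear
set obs 4
generate byte d = mod(_n, 2)
generate str1 s = string(mod(_n, 2))
local binvars
foreach var in d s {
	quietly levelsof `var', local(levels)
	// numeric 0/1 variable: levels are exactly "0 1"; string levels
	// are wrapped in compound double quotes, so they never match
	if `"`levels'"' == "0 1" {
		local binvars `binvars' `var'
	}
}
display "binary (0/1) variables: `binvars'"
```

Here only d is listed.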

I would not use -compress-, because in general it is a bad idea to
make any changes to the user's dataset unless those changes are the
very purpose of your program. You could use -preserve- to avoid
permanent changes, but my guess is that your program will run faster
if you simply use -tabulate- (as shown above) in a loop over all
numeric variables that the user has not declared "continuous".
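A sketch of that loop, run on Stata's auto example dataset (-sysuse
auto, clear- replaces the data in memory). The local continuous stands
in for whatever the user would declare through an option; treating
price as continuous is purely an assumption for illustration.
-tabulate- leaves the data untouched, so neither -compress- nor
-preserve- is needed, and -capture- skips variables that -tabulate-
cannot handle (for example, too many distinct values).

```stata
sysuse auto, clear
local continuous price              // hypothetical user-declared list
quietly ds, has(type numeric)
local numvars `r(varlist)'
local tocheck : list numvars - continuous
tempname M
local binvars
foreach v of local tocheck {
	capture quietly tabulate `v', matrow(`M')
	if !_rc & r(r) == 2 {
		// matrow() stores the distinct values in ascending order
		if `M'[1, 1] == 0 & `M'[2, 1] == 1 {
			local binvars `binvars' `v'
		}
	}
}
display "binary (0/1) variables: `binvars'"
```

In the auto data, only foreign is reported; rep78 (values 1 to 5) and
the other numeric variables fail the check.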

Best
Daniel

