
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Confirming whether a variable is binary or continuous


From   "Dimitriy V. Masterov" <[email protected]>
To   [email protected]
Subject   Re: st: Confirming whether a variable is binary or continuous
Date   Fri, 16 Mar 2012 18:01:47 -0400

Bert,

Personally, I would use a t-test on a binary variable as long as I had
enough data; with a small sample, I would use -bitest- (an exact
binomial test) instead.
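A minimal sketch of both options, using the auto data shipped with Stata. The 0/1 covariate heavy is hypothetical, foreign stands in for the treatment indicator, and the null proportion 0.5 is arbitrary. For a 0/1 variable, the difference in means that -ttest- tests is the difference in proportions, which is why -ttest- and -prtest- tend to agree in large samples. -bitest- tests a single proportion, so here it is applied to one group against an assumed value.

```stata
* Illustration only: heavy is a made-up 0/1 covariate and foreign
* plays the role of the treatment indicator
sysuse auto, clear
generate byte heavy = weight > 3000

* Large samples: two-sample t-test on the 0/1 variable (difference in
* means = difference in proportions), and the z-test of proportions
ttest heavy, by(foreign)
prtest heavy, by(foreign)

* Small samples: exact binomial test of one group's proportion
* against a hypothesized value (0.5 is an arbitrary, hypothetical null)
bitest heavy == 0.5 if foreign == 1
```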

I've never encountered a continuous variable with only 2 levels before
in my work, but I can see how that's possible. That seems more like a
data-cleaning issue than a statistical one.

You can -compress- your data and then use each variable's storage type
as an additional check on your binary variables to filter out those
cases (see -help ds-, in particular its has(type ...) option).
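A minimal sketch of that storage-type filter on the auto data. After -compress-, a variable holding only 0, 1, and missing is stored as byte, while a two-level variable with non-integer values stays float. Other small-integer variables also become byte, though, so byte storage is necessary rather than sufficient. The loop therefore also checks the values themselves, combining Eric's -tabulate-/r(r) idea with an explicit 0/1 check. The variable heavy and the local names are hypothetical. As Bert notes, no check on the data alone can tell a variable that is binary by definition from a count that happens to take only 0 and 1.

```stata
sysuse auto, clear
generate heavy = weight > 3000   // hypothetical 0/1 covariate, float by default
compress                         // demotes heavy (and other small integers) to byte

* Candidates: byte variables only
ds, has(type byte)
local candidates `r(varlist)'

local binary
foreach x of local candidates {
    * Number of distinct nonmissing values
    quietly tabulate `x'
    local nvals = r(r)
    * Every value must be 0, 1, or missing
    capture assert inlist(`x', 0, 1) | missing(`x')
    if `nvals' == 2 & _rc == 0 {
        local binary `binary' `x'
        display "`x' looks binary (0/1)"
    }
    else display "`x' is not 0/1 binary"
}
display "0/1 candidates: `binary'"
```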

DVM

On Fri, Mar 16, 2012 at 5:43 PM, Bert Jung <[email protected]> wrote:
> Thanks Eric and Dimitriy,
>
> That would work, but is it legitimate?  It would seem to me that the
> correct test for a continuous variable that just happens to have 2
> levels should be -ttest-.
>
> I guess my problem cannot be resolved without prior knowledge about
> the variable.  Purely from the data one wouldn't be able to tell if
> the variable is binary by definition or by chance.  I will add an
> option to my program that allows the user to specify this distinction
> ex ante and then double-check using your suggestion with -tab-.
>
> Apologies for not thinking through this properly before posting.
>
> Thanks!
> Bert
>
>
>
> On Fri, Mar 16, 2012 at 5:28 PM, Eric Booth <[email protected]> wrote:
>>
>> One way is to -tabulate- the variable and then use the stored result r(r) to tell how many distinct values it has.  You could also get this count from the user-written packages -egenmore- (from SJ; see its nvals() function) and -distinct- (from SSC).
>>
>>
>> Example:
>>
>> *********
>>
>> sysuse auto, clear
>>
>> ds, has(type numeric)
>> foreach x in `r(varlist)' {
>>     quietly tabulate `x'
>>     if r(r) == 2 display as error `"`x' is binary"'
>>     else display "`x' is not binary"
>> }
>> *********
>>
>> - Eric
>>
>> __
>> Eric A. Booth
>> Public Policy Research Institute
>> Texas A&M University
>> [email protected]
>> +979.845.6754
>>
>> On Mar 16, 2012, at 4:18 PM, Bert Jung wrote:
>>
>>> Dear Statalisters,
>>>
>>> I am writing a short program to make a balance table that compares
>>> covariates across a treatment and control group.  I am looking for a
>>> way to confirm whether a variable is binary in order to use -prtest-
>>> for proportions rather than -ttest- for continuous variables.
>>>
>>> One option is to check the actual data values and use -prtest- if
>>> there are only 0s and 1s.  But a rare outcome that is not binary by
>>> definition could also happen to take only these values, e.g. the
>>> number of hospitalizations in the past 3 months.
>>>
>>> Is there a safer way to confirm that a variable is binary?
>>>
>>> Thanks for any pointers,
>>> Bert
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC