Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

Re: st: Code to generate dummy variable from several categorical variables?

From   Nick Cox
Subject   Re: st: Code to generate dummy variable from several categorical variables?
Date   Tue, 17 Jan 2012 09:34:24 +0000

Although terminology is often a small nightmare, I think everyone
agrees that "dummy variable" and "indicator variable" are just
different names for the same thing; there is room for discussion over
which is the better term, but that is not an issue here. Also, it is
more than a matter of terminology that one dummy corresponds to
precisely one indicator.

I personally prefer the term "indicator", so I will stick to that.

Your binary variables _already_ are indicator variables; but what
could bite you is that observations with missing values on any of
those variables will be omitted from statistical commands. You say
that you want the missing values "included", but do not say exactly
what that means.

gen A_is_one = A == 1

maps A = 1 to 1, and both A = 0 and missing A to 0.

gen A_is_zero = A == 0

maps A = 0 to 1, and both A = 1 and missing A to 0.

You can also calculate (e.g.)

gen A_is_one_or_missing = A != 0
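
If "included" instead means that a missing A should stay missing in the
new variable, one possible sketch follows. The toy data and the name
A_is_one_or_na are hypothetical, chosen only for illustration; the
-if !missing(A)- qualifier is what leaves the result missing wherever A
is missing.

```stata
* Hypothetical toy data: A is 1, 0, or missing
clear
input A
1
0
.
end

* Assumption: a missing A should remain missing in the indicator
gen A_is_one_or_na = A == 1 if !missing(A)

* Inspect the result next to the original variable
list, sep(0)
```

Which of these mappings is appropriate depends on what the missing
values are taken to mean in the analysis.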

Creating a variable and then recoding it is not the way here. You got
into a mess when you tried that; you need a direct approach.
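
As a minimal sketch of the direct approach, applied to the A, B and C
pattern in the question quoted below: the first seven rows copy the
example table, the eighth row (missing A) is a hypothetical addition to
show the treatment of missing values, and the names abnlX1 to abnlX3
follow the question. This version maps missing to 0, which is an
assumption about what is wanted.

```stata
* Example data: rows 1-7 from the question; row 8 is hypothetical
clear
input A B C
1 0 0
1 1 0
1 1 1
0 1 0
0 1 1
0 0 1
0 0 0
. 1 0
end

* One indicator per source variable; the indicators are not mutually
* exclusive, so a subject can be 1 on several of them at once
gen abnlX1 = A == 1
gen abnlX2 = B == 1
gen abnlX3 = C == 1

list, sep(0)
```

Because each indicator is generated independently, subject 2 counts
toward both abnlX1 and abnlX2 without any recoding step.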


On Tue, Jan 17, 2012 at 12:46 AM, DEBORAH L. HUANG wrote:
> Thank you for input and to clarify what I'm trying to do:
> I'm trying to generate a dummy variable with 4 indicators; the values of the
> indicators are to be determined by 3 other binary variables which are not
> mutually exclusive. If generating a categorical variable could be done more
> easily that would be fine. I've already tried generating a composite
> categorical variable but have recoding problems as A, B and C are not
> mutually exclusive.
> For example, possible values for binary variables A, B and C as follows:
>    A     B     C
> 1.  1     0     0
> 2.  1     1     0
> 3.  1     1     1
> 4.  0     1     0
> 5.  0     1     1
> 6.  0     0     1
> 7.  0     0     0
> So I'd like to generate dummy variable abnlX, where
> - abnlX1 includes all subjects where A=1
> - abnlX2 includes all subjects where B=1
> - abnlX3 includes all subjects where C=1
> My difficulty is in figuring out how to code in order to re-categorize
> subjects #2, #3 and #5 into all the appropriate categories (e.g., subject #2
> should count toward abnlX1 and abnlX2). Additionally, there are some missing
> values for any of the variables A, B or C (subject may be missing value for
> A but have values for B and C, etc.) but I would still like to be able to
> include the available values.
> Hopefully my question makes more sense. Thank you!
