Re: st: Code to generate dummy variable from several categorical variables?

From   David Hoaglin
Subject   Re: st: Code to generate dummy variable from several categorical variables?
Date   Tue, 17 Jan 2012 07:22:05 -0500

It would help to have further clarification.

As Nick pointed out, an indicator variable (aka dummy variable) has
two (non-missing) values: 0 and 1.  Please explain what you mean by "a
dummy variable with 4 indicators" and then give an explicit definition
of the desired "dummy variable" in terms of A, B, and C.

If you actually want a categorical variable with 4 categories (which
would necessarily be mutually exclusive), please define those
categories in terms of A, B, and C.

Your explanation of the "dummy variable" abnlX lists three indicator
variables.  If you intend abnlX to be a categorical variable, those
three indicators are not mutually exclusive.

It would help if you described the role that the new variable will
play in an analysis.  Some regression models, for example, could
include the binary variables A, B, and C as they stand; they would not
need to be mutually exclusive.

BTW, three binary variables yield 8 possible combinations.  The one
not in your list is A=1, B=0, C=1.  Why is it necessary to
re-categorize a subject with that pattern and subjects #2, #3, and #5?
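
The count of combinations can be checked mechanically. The following is a
minimal sketch in Python (not Stata, and not part of the original exchange)
that enumerates every pattern of three binary variables and reports which
ones are absent from the list in the question. The name `listed` and the
tuple layout (A, B, C) are assumptions made for illustration.

```python
from itertools import product

# Patterns listed in the original question, as (A, B, C) tuples.
listed = [
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
    (0, 1, 0),
    (0, 1, 1),
    (0, 0, 1),
    (0, 0, 0),
]

# All 2**3 = 8 combinations of three binary variables.
all_patterns = list(product((0, 1), repeat=3))

# Patterns that do not appear in the question's list.
missing_patterns = [p for p in all_patterns if p not in listed]

print(len(all_patterns))   # 8
print(missing_patterns)    # [(1, 0, 1)]
```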

David Hoaglin

On Mon, Jan 16, 2012 at 7:46 PM, DEBORAH L. HUANG wrote:
> Thank you for the input. To clarify what I'm trying to do:
> I'm trying to generate a dummy variable with 4 indicators; the values of the
> indicators are to be determined by 3 other binary variables which are not
> mutually exclusive. If generating a categorical variable could be done more
> easily that would be fine. I've already tried generating a composite
> categorical variable but have recoding problems as A, B and C are not
> mutually exclusive.
> For example, possible values for binary variables A, B and C are as follows:
>    A     B     C
> 1.  1     0     0
> 2.  1     1     0
> 3.  1     1     1
> 4.  0     1     0
> 5.  0     1     1
> 6.  0     0     1
> 7.  0     0     0
> So I'd like to generate dummy variable abnlX, where
> - abnlX1 includes all subjects where A=1
> - abnlX2 includes all subjects where B=1
> - abnlX3 includes all subjects where C=1
> My difficulty is in figuring out how to code in order to re-categorize
> subjects #2, #3 and #5 into all the appropriate categories (e.g., subject #2
> should count toward abnlX1 and abnlX2). Additionally, some values of A, B
> or C are missing (a subject may be missing a value for A but have values
> for B and C, etc.), but I would still like to be able to include the
> available values.
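
As described, abnlX1, abnlX2, and abnlX3 are three separate indicators, each
determined by a single source variable, so a subject can belong to more than
one of them and a missing value in one variable need not affect the others.
The sketch below illustrates that reading in Python (not Stata). It is one
possible interpretation of the request, not a confirmed solution: `None`
stands in for a missing value, and subject 8 is a hypothetical case added
only to show the missing-value handling.

```python
# (A, B, C) for each subject; None marks a missing value (an assumption).
subjects = {
    1: (1, 0, 0),
    2: (1, 1, 0),
    3: (1, 1, 1),
    4: (0, 1, 0),
    5: (0, 1, 1),
    6: (0, 0, 1),
    7: (0, 0, 0),
    8: (None, 1, 0),  # hypothetical subject with A missing
}


def indicator(value):
    """Return 1 if value is 1, 0 if value is 0, and None if it is missing."""
    if value is None:
        return None
    return 1 if value == 1 else 0


# Each indicator depends on exactly one source variable, so the three
# indicators overlap freely and missingness stays local to one variable.
abnl = {
    sid: {
        "abnlX1": indicator(a),
        "abnlX2": indicator(b),
        "abnlX3": indicator(c),
    }
    for sid, (a, b, c) in subjects.items()
}


def count(name):
    """Number of subjects for whom the named indicator equals 1."""
    return sum(1 for row in abnl.values() if row[name] == 1)


print(abnl[2])          # subject 2 counts toward abnlX1 and abnlX2
print(abnl[8])          # abnlX1 is missing, abnlX2 and abnlX3 are kept
print(count("abnlX2"))  # subjects 2, 3, 4, 5 and 8
```

Under this reading no re-categorization is needed: the three indicators are
simply A, B, and C under new names, which is consistent with the point in the
reply that the indicators are not mutually exclusive.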
