
Re: st: Code to generate dummy variable from several categorical variables?

From: David Hoaglin
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Code to generate dummy variable from several categorical variables?
Date: Tue, 17 Jan 2012 16:22:51 -0500

```
Deborah,

I would describe your planned ANOVAs as a preliminary analysis,
comparing the continuous demographic variables among groups defined by
the three outcome variables A, B, and C (jointly).

In a one-way ANOVA, the groups must be mutually exclusive.  From your
initial message, some subjects have both A=1 and B=1 (and other
combinations in which more than one of the outcome variables is not
0).  As a result, the groups defined by your three indicator variables
are not mutually exclusive.

Since you want to consider the three outcome variables together, I
think you have two main choices.  Either you can enumerate the
combinations of A, B, and C that occur in your data (all 8 or only
some of the 8?), define a categorical variable that has a distinct
value for each of those mutually exclusive groups, and use that
variable to define the groups in a one-way ANOVA; or you can consider
a three-way ANOVA with A, B, and C as the factors and decide which
terms to include in the model (only main effects, main effects and
two-factor interactions, or main effects and two-factor and
three-factor interactions).

Once you have settled on the mutually exclusive groups (and before any
ANOVA), it would be a good idea to check whether each of the
demographic variables is suitable for an ANOVA or should be
transformed.  Making boxplots of the demographic variable by group
would be one way to start.

I hope this discussion helps.

David Hoaglin

On Tue, Jan 17, 2012 at 2:39 PM, DEBORAH L. HUANG
<huangdx@u.washington.edu> wrote:
> Basically what I'm hoping to do is "collapse" the outcome variables A, B and
> C (all binary) into the new outcome indicator variable abnlX for ANOVA
> (e.g., comparison mean age across indicators, among other continuous
> demographic variables).
>
> The new outcome variable abnlX would have 3 indicators (my mistake in the
> earlier message). As an indicator variable abnlX would be defined as
> follows:
>
> abnlX indicator #1 =0 if A is 0 or missing, B is 0/1/missing, C is
> 0/1/missing; =1 if A is 1, B is 0/1/missing, C is 0/1/missing
> abnlX indicator #2 =0 if B is 0 or missing, A is 0/1/missing, C is
> 0/1/missing; =1 if B is 1, A is 0/1/missing, C is 0/1/missing
> abnlX indicator #3 =0 if C is 0 or missing, A is 0/1/missing, B is
> 0/1/missing; =1 if C is 1, A is 0/1/missing, B is 0/1/missing
>
> Alternately for a categorical outcome variable abnlX it would be defined as
> follows:
> abnlX=0 if A=0 or missing & B=0 or missing & C=0 or missing
> abnlX=1 if A=1 & B=0/1/missing & C=0/1/missing
> abnlX=2 if B=1 & A=0/1/missing & C=0/1/missing
> abnlX=3 if C=1 & A=0/1/missing & B=0/1/missing
>
> Thank you again to everyone for your input, and hopefully this further
> clarifies my question.
>
> Deborah Huang
```
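
The two choices described in the reply above (one categorical variable over the observed combinations of A, B, and C, or a three-way ANOVA with A, B, and C as factors) might be sketched in Stata roughly as follows. This is only an illustration under assumptions: the variable names `A`, `B`, `C` follow the thread, while `age` and `abnl_grp` are hypothetical names chosen here, and the handling of missing values is left at Stata's defaults rather than recoded to 0 as in the original question.

```stata
* Assumed variables: A, B, C are 0/1 outcomes; age is a continuous
* demographic variable. abnl_grp is a hypothetical name for the new variable.

* One mutually exclusive group per observed combination of A, B, and C.
* By default egen's group() leaves abnl_grp missing whenever A, B, or C
* is missing; adding the missing option would treat missing as its own level.
egen abnl_grp = group(A B C), label

* See which of the 8 possible combinations actually occur, and how often
tabulate abnl_grp

* Before any ANOVA: inspect the distribution of age within each group
graph box age, over(abnl_grp)

* Choice 1: one-way ANOVA across the mutually exclusive groups
oneway age abnl_grp, tabulate

* Choice 2: three-way ANOVA with A, B, and C as factors
* (a) main effects only
anova age A B C
* (b) main effects and two-factor interactions
anova age A B C A#B A#C B#C
* (c) full factorial, including the three-factor interaction
anova age A##B##C
```

If some combinations of A, B, and C do not occur in the data, the models with interactions will have empty cells, so the `tabulate` step is worth checking before choosing which terms to include.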