


Re: st: Useful labelling of dummy variables following logit

From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Useful labelling of dummy variables following logit
Date   Wed, 24 Aug 2011 11:42:34 +0100

In addition to Maarten's excellent advice, see also -dummieslab- from SSC:

Generating dummy variables from categorical variable using value label names

        dummieslab varname [if exp] [in range]
                 [, word(integer) from(string) to(string) template(string)
                 truncate(integer) novarlabel ]


    dummieslab generates a set of dummy variables from a categorical variable.
    One dummy variable is created for each level of the original variable.
    Names for the dummy variables are derived from the value labels of the
    categorical variable. (Raw (unlabelled) values are used if the categorical
    variable has no value labels attached.)

    Two different behaviours can be chosen for the variable names: (i) use
    full value labels; (ii) use the sth word of the label. In both cases, all
    invalid characters are stripped from the new variable names.

    Any user-defined prefix and/or suffix can be added using the template
    option.

    In all cases, no new variable will be generated unless all implied new
    names are valid.

    dummieslab applied to variables with no label appends the level to the
    original variable name (very much like what tabulate does).


    word(s) requests that the sth word of the label be used as the new
        variable name. Note the use of word(-1) to specify the last word of
        the label.

    from(string) and to(string) are used together to make replacements to the
        strings used to create the new variables. from(string) contains a
        list of words to be replaced by the list of words supplied in
        to(string), i.e. the first item in from is substituted by the first
        item in to, the second item in from is substituted by the second item
        in to, etc.  By default, all invalid characters are dropped from the
        value labels to create new variable names. This behaviour can be
        overridden by the use of from(string) and to(string). For example,
        use from(" ") and to("_") to replace all blanks by underscores.

    template(word) specifies a template for the new variable name. @ is used
        as a placeholder for inserting the extracted label. This option is
        used to insert a prefix (anything before @ in word) and/or a suffix
        (anything after @ in word).

    truncate(n) truncates new variable names after n characters.

    novarlabel prevents automated variable labelling of the generated dummies.

Saved results

      r(names)   List of names of created dummies
       r(from)   Name of the original categorical variable


    . sysuse auto
    . label define newfor 0 "Domestic car" 1 "Foreign (European or Japanese) car"
    . label values foreign newfor
    . dummieslab foreign
    . dummieslab foreign, word(1)
    . dummieslab foreign, word(-1)
    . dummieslab foreign, from(" ") to("_")
    . dummieslab foreign, from(car or Foreign) to("" "_" "")
    . dummieslab foreign, from(car Foreign or) to("" "" "_")
    . dummieslab foreign, word(1) template("My_@_car")
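
    The saved results listed above can be picked up after any of these calls.
    What follows is only a minimal sketch, assuming that dummieslab has been
    installed from SSC and that the lines are run together from a do-file (so
    that the local macros persist). The macro names newdummies and source are
    made up for illustration.

```stata
* Assumption: dummieslab is installed, e.g. via: ssc install dummieslab
sysuse auto, clear

* Create one dummy per level of foreign, named from its value labels
dummieslab foreign

* Copy the saved results into locals before another r-class command
* overwrites them (newdummies and source are hypothetical names)
local newdummies `r(names)'
local source `r(from)'

* Inspect what was created
display "dummies created from `source': `newdummies'"
describe `newdummies'
```

    The list in the local macro could then be passed to an estimation
    command in place of typing the new variable names by hand.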


    Patrick Joly made helpful suggestions on the first version of dummieslab,
    which led to the addition of the from and to options. Daniel Klein
    suggested option novarlabel.


    Philippe Van Kerm, CEPS/INSTEAD, Differdange, G.-D. Luxembourg
    [email protected]

    Nicholas J. Cox, Durham University, U.K.
    [email protected]

Also see

    On-line:  tabulate
    On-line (if installed):  dummies

On Wed, Aug 24, 2011 at 11:30 AM, Maarten Buis <[email protected]> wrote:
> On Wed, Aug 24, 2011 at 12:12 PM, Tim Evans wrote:
>> I'm running a logistic regression analysis (logit) in Stata 11.2 and capturing the output in a log file, to which I intend to refer even when not using Stata. However, the dummy variables in the table output are not user friendly, in that I need to be looking at Stata to decode them, and I wanted to know whether there is a way to get Stata to label the dummy variables. I'm using the following command:
>> xi: logit Early1 i.eth2 age i.invsurg i.region i.dep if dep!=9 & sex==2, or
> If you are not going to use post-estimation commands like -margins-
> then I would just create the dummies myself, that way I have complete
> control over how they are named. This is what I used to do in Stata <
> 11, I hardly ever used -xi-.
> If you want to use post-estimation commands like -margins-, I would
> leave out -xi- but otherwise leave the command unchanged thus using
> the factor variable notation, see -help fvvarlist-. The output will be
> a bit clearer, but it will still not contain labels. You could use
> -label list- below your regression to add a "legend" below your
> output.
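
The two routes described in the quoted advice might look roughly like this. It is a sketch only, using the auto data shipped with Stata rather than the original poster's variables. The outcome highprice and the dummy foreign_car are invented for illustration. origin is the value label attached to foreign in that dataset.

```stata
sysuse auto, clear

* Hypothetical binary outcome, for illustration only
generate byte highprice = price > 6000 if !missing(price)

* Route 1: build the dummy by hand, with a self-explanatory name and label
generate byte foreign_car = (foreign == 1) if !missing(foreign)
label variable foreign_car "Foreign car (1 = yes)"
logit highprice foreign_car mpg, or

* Route 2: factor-variable notation without -xi-, so -margins- still works
logit highprice i.foreign mpg, or
margins foreign

* Print the value labels underneath as a legend in the log
label list origin
```

With the first route the variable name itself documents the coding in the log. With the second, the -label list- output placed after the table does that job.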
