
From: vwiggins@stata.com (Vince Wiggins, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Stata 11
Date: Sun, 05 Jul 2009 21:52:30 -0500

There have been a few interrelated questions recently about the new
factor variables in Stata 11.

David Airey <david.airey@Vanderbilt.Edu> asks,

> Will factor variables support different coding schemes like
> indicator coding and effect coding?

Joseph Coveney <jcoveney@bigplanet.com> echoes David's question,

> A question that I have, too.  The coding schemes, at least those
> that interest me most, such as reverse Helmert, can easily be done
> from conventional dummy indicators with -lincom-, afterward. [...]
> Maybe the new -margins- can do this sort of thing, too [...]

The short answer is not yet.  We gave thorough consideration to
codings when designing factor variables.  I can't say when we will
undertake codings in a serious way, but both the syntax and internal
workings of factor variables are compatible with a future
implementation of codings.

Roger Newson <r.newson@imperial.ac.uk> writes,

> I too am keen to know the answer to David's query. I routinely use
> the -noomit- option of -xi- in Stata 10 to fit multi-intercept, and have not
> found any mention of a corresponding option [...]

Roger is referring to the fact that typically one category (level) of
a factor variable is omitted when creating the indicators for each
level.  We must omit one level because, if we include indicators for
all levels, the indicators will be collinear with the constant in our
regressions.

In Stata 11, there are two ways to control which level is used as the
base and whether a base is used at all.

In a variable list, the ib. operator designates the base and can be
abbreviated b.  The default base is the lowest level.  So if we type

    . regress mpg i.rep78

the base level is 1 because 1 is the smallest of the levels of rep78
-- 1, 2, 3, 4, and 5.  If instead we type

    . regress mpg b3.rep78

the base level is 3, and our regression will not include an indicator
for rep78==3.

The operator bn. specifies that we do not want a base.  That is to
say, we want indicators for all levels of a variable.  Typing

    . regress mpg bn.rep78, noconstant

runs a regression with indicators for all 5 levels of rep78.

We could also type -b(last).rep78- to make the last level (5) of rep78
the base.  Typing -b(freq).rep78- makes the most frequent level (3)
the base.

The b. operator can also be used on interactions.

You can also set the base on your variables permanently.  Typing

    . fvset base none rep78 foreign

sets -rep78- and -foreign- to have no base.

    . fvset base 3 rep78

sets the base of -rep78- to 3.  Typing

    . fvset base last _all

tells Stata that whenever any variable is used as a factor variable,
the variable's largest value in the sample is to be used as the base.

If you save your dataset, the -fvset-ings are saved too.

There are other aspects of factor variables that we haven't discussed.
For example, if we type

    . regress y x1 x2 5.country

then a single indicator for country==5 will be added to the model.

Indicator variables for each level of a factor variable can be thought
of as virtual variables that always exist in our data.  That means
they can also be used in expressions like -if 1.foreign-.


-- Vince
   vwiggins@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
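
The commands discussed in the message above can be tried together as a short do-file. This is only a minimal sketch, assuming Stata 11 or later and the auto dataset that ships with Stata (the message does not name the dataset, although mpg, rep78, and foreign are variables in it). The sketch uses the unabbreviated ib. forms of the operators; -fvset report-, -fvset clear-, and the choice of weight as a covariate are additions for illustration and are not mentioned in the message.

```stata
* Sketch only: assumes Stata 11+ and the shipped auto dataset.
sysuse auto, clear

* Default base: the lowest level of rep78 (1) is omitted.
regress mpg i.rep78

* Level 3 as the base, for this command only.
regress mpg ib3.rep78

* No base: indicators for every level, so the constant is dropped.
regress mpg ibn.rep78, noconstant

* Base chosen by rule: the last level, then the most frequent level.
regress mpg ib(last).rep78
regress mpg ib(freq).rep78

* Set the base permanently and list the current settings.
fvset base 3 rep78
fvset report

* Undo the permanent setting (illustrative addition).
fvset clear rep78

* A single indicator as a regressor (weight is an arbitrary covariate),
* and a virtual indicator used inside an -if- expression.
regress mpg weight 5.rep78
summarize mpg if 1.foreign
```

Because -fvset- settings are stored with the data once the dataset is saved, clearing them before saving avoids carrying a non-default base into later sessions unintentionally.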
