
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: dummy variable generation [was: simple question]
Date: Tue, 5 Feb 2008 19:31:24 -0000

This "simple question" has generated a thread with eight replies so far, so it clearly poses a challenge. By the way, please use more informative titles for your postings, Renuka!

Maarten Buis, Svend Juul, Martin Weiss and E. Paul Wileyto all made good points, but none gave my favoured solution, leaving scope for a ninth reply.

Solution first, then comment:

    gen byte redundism = cond(missing(zredundab, zdismissa), ., (zredundab == 1 | zdismissa == 1))

For a problem like this we seek first correctness and then as far as possible clarity, conciseness and efficiency.

Paul and Maarten flagged that missing values need to be handled properly whenever they exist. Coding on the assumption that missings might be present is always safe. missing(a, b) will evaluate to 1, meaning true, whenever one or both of a or b is missing. Hence the first two arguments of the call to -cond(,)- above:

    missing(zredundab, zdismissa), .

yield missing results for the dummy if either variable is missing. The third argument

    (zredundab == 1 | zdismissa == 1)

will evaluate to 1 when true and 0 when false (as Martin stressed), completing the assignment in a single command. The FAQ on true and false in Stata

    2/03    What is true and false in Stata?
            http://www.stata.com/support/faqs/data/trueorfalse.html

gives a longer discussion with more examples.

Insisting on a -byte- variable is for efficiency in storage. If you generate lots of floats for dummies, it may be Stata that will bite you when you run into memory problems. More bytes, fewer bites.

As Svend signalled, considering all the cross-combinations in a truth table is good technique.

                     zredundab
                      0     1
    zdismissa    0    a     b
                 1    c     d

(zredundab == 1 | zdismissa == 1) covers cells b, c and d of the table above. That leaves just cell a, which is defined by (zredundab == 0 & zdismissa == 0). But you need not puzzle that out. Just negating the condition would solve the problem. That is, the two conditions

    (zredundab == 1 | zdismissa == 1)

and

    !(zredundab == 1 | zdismissa == 1)

are complementary and divide up the field. Just parenthesising and negating is especially useful as conditions get more and more complicated.

I know that some people may want to spell out each step

    gen redundism = 1 if (zredundab == 1 | zdismissa == 1)
    replace redundism = 0 if !(zredundab == 1 | zdismissa == 1)
    replace redundism = . if missing(zredundab, zdismissa)

but that is an advantage only when it appears clearer to you or your readers. It is best just to internalise the Stata fact that logical conditions evaluate to 0 or 1 as soon as you can, as it is so useful.

Finally, -egen- is evidently not needed here. Its use to compute a row sum of two variables is very inefficient, replacing one command by dozens once -egen- is interpreted.

Nick
n.j.cox@durham.ac.uk

Renuka Metcalfe

I want to create a dummy variable redundism which equals 1 if the establishment has had any dismissals or redundancies in the past 12 months. I would be grateful if anyone would let me know if the following is the correct way to do it. There is a debate amongst us about whether in the second line it should be "|" or "&". I would be grateful if you would confirm whether the following is correct or whether it should be an "&".

    ge redundism=.
    replace redundism=1 if zredundab==1|zdismissa==1
    replace redundism=0 if zredundab==0|zdismissa==0
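A minimal sketch for checking the code in this thread on a toy dataset. Only the variable names zredundab, zdismissa and redundism come from the postings; the nine rows of made-up values (every combination of 0, 1 and missing) and the helper names check and posted are assumptions for illustration.

```stata
* Toy data: every combination of 0, 1 and missing (illustrative values only)
clear
input byte(zredundab zdismissa)
0 0
0 1
1 0
1 1
0 .
1 .
. 0
. 1
. .
end

* One-line solution: missing if either input is missing, else 0/1
gen byte redundism = cond(missing(zredundab, zdismissa), ., (zredundab == 1 | zdismissa == 1))

* Step-by-step version under a hypothetical name; it should agree row by row
gen byte check = 1 if (zredundab == 1 | zdismissa == 1)
replace check = 0 if !(zredundab == 1 | zdismissa == 1)
replace check = . if missing(zredundab, zdismissa)
assert redundism == check

* The code as originally posted, under a hypothetical name for comparison
gen byte posted = .
replace posted = 1 if zredundab == 1 | zdismissa == 1
replace posted = 0 if zredundab == 0 | zdismissa == 0

* Show the rows where the posted code and the one-liner disagree
list zredundab zdismissa redundism posted if posted != redundism, noobs
```

In the rows where one variable is 1 and the other is 0, posted ends up as 0, because the second -replace- with "|" overwrites the 1 just assigned. That is the practical argument for negating the whole condition rather than writing a second condition by hand.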

**References**: **st: simple question**, *From:* Renuka Metcalfe <rm18203@yahoo.co.uk>

