
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: Check if variables have same value
Date: Sun, 29 Jul 2007 16:48:36 +0100

On strings: this is an FAQ.

FAQ . . . . . . . . . Counting distinct strings across a set of variables
7/04    How do I count the number of distinct strings across a set of
        variables?
        http://www.stata.com/support/faqs/data/distinctstrings.html

There is a typo in the very last line of code.

    if `v'[`i'] != . & trim(`v'[`i']) != ""

should be

    if `v'[`i'] != "." & trim(`v'[`i']) != ""

Nick
n.j.cox@durham.ac.uk

Friedrich Huebler

> Sorry, I should have been more precise. I would like to tag individual
> observations if certain variables do not contain the same values for
> that particular observation.
>
> The purpose is error checking in household survey data. Assume every
> woman is asked about her age and every man is asked about his wife's
> age. The information is stored in separate files. When the files are
> merged, every woman has one age (if she is not married) or two ages. I
> would like to identify cases where the ages are not the same.
>
> -egen, rowmin()- and -egen, rowmax()- work for numeric variables like
> age but I hope there is a solution that also works with strings.

Nick Cox

> > Tagging in what sense?
> >
> > How do you tell which soldiers are out of step?
> > Majority vote? How do you split a 50:50
> > agreement? Three variables say "Stata" and three
> > say "SAS"? (No, that's an easy one to identify
> > which are incorrect.)
> >
> > (You didn't mention strings; I guess you don't
> > care about strings.)
> >
> > [...]
> >
> > Friedrich Huebler
> >
> > > I would like to compare a set of variables and tag those that do not
> > > contain the same values. Missing values should be ignored. -egen
> > > newvar = diff(varlist)- is not an option because it does not skip
> > > missing values. The last command in the example below works but it
> > > becomes impractical with a longer list of variables.
> > >
> > > . sysuse auto
> > > . gen mpg2 = mpg if foreign==0
> > > . gen mpg3 = mpg if foreign==1
> > > . replace mpg3 = mpg+1 if rep78==2
> > > . egen tag = diff(mpg mpg2 mpg3)
> > > . gen tag2 = (mpg!=mpg2 & mpg<. & mpg2<. | mpg!=mpg3 & mpg<. & mpg3<.
> > >   | mpg2!=mpg3 & mpg2<. & mpg3<.)
> > >
> > > The -egen- command tags all observations, the -gen- command only those
> > > that I expect to be tagged. Are there better solutions that can also
> > > be used with ten or more variables?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
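
The -egen, rowmin()- and -egen, rowmax()- idea mentioned in the thread can be sketched as follows, together with one possible loop for string variables. This is only an illustration under stated assumptions, not the code from the FAQ and not either poster's solution. The names rmin, rmax, tag3, s1, s2, s3, first and stag are made up for the example, and the string variables are built from the auto data purely so that the loop has something to run on. An empty string is treated as missing here.

```stata
* Numeric case: rebuild the variables from the example in the thread
sysuse auto, clear
gen mpg2 = mpg if foreign == 0
gen mpg3 = mpg if foreign == 1
replace mpg3 = mpg + 1 if rep78 == 2

* rowmin() and rowmax() ignore missing values, so the two results
* differ only when at least two nonmissing values disagree
egen rmin = rowmin(mpg mpg2 mpg3)
egen rmax = rowmax(mpg mpg2 mpg3)
gen byte tag3 = rmin != rmax

* String case: hypothetical variables s1 s2 s3 (illustration only)
gen s1 = make
gen s2 = make if foreign == 0
gen s3 = "other" if rep78 == 2

* Remember the first nonempty value in each observation and tag the
* observation if any later nonempty value differs from it
gen first = ""
gen byte stag = 0
foreach v of varlist s1 s2 s3 {
    replace stag = 1 if trim(`v') != "" & first != "" & trim(`v') != first
    replace first = trim(`v') if first == "" & trim(`v') != ""
}
```

Both parts scale to ten or more variables by extending the variable list. Neither says which variable is the one in error, which is the question raised above about splitting a 50:50 agreement.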

