Statalist



RE: st: RE: Check if variables have same value


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: Check if variables have same value
Date   Sun, 29 Jul 2007 16:48:36 +0100

On strings: this is an FAQ. 

FAQ     . . . . . . . . .  Counting distinct strings across a set of variables
        7/04    How do I count the number of distinct strings
                across a set of variables?
                http://www.stata.com/support/faqs/data/distinctstrings.html

There is a typo in the very last line of code in that FAQ.

if `v'[`i'] != . & trim(`v'[`i']) != "" 

should be 

if `v'[`i'] != "." & trim(`v'[`i']) != "" 
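
For the tagging problem itself, here is a minimal sketch of one way to flag observations whose nonmissing string values disagree. It is not the FAQ's code. The data, the variable names s1-s3, firstval and tag, and the rule that only empty or blank strings count as missing are all assumptions made for illustration.

```stata
* Hypothetical example data: three string variables, "" treated as missing
clear
input str5 s1 str5 s2 str5 s3
"Stata" "Stata" ""
"Stata" "SAS"   "Stata"
""      ""      ""
"R"     ""      "R"
end

gen firstval = ""       // first nonmissing value seen so far in each observation
gen byte tag = 0        // set to 1 when two nonmissing values differ

foreach v of varlist s1 s2 s3 {
    * flag a nonmissing value that differs from the first nonmissing value
    replace tag = 1 if trim(`v') != "" & firstval != "" & trim(`v') != firstval
    * remember the first nonmissing value
    replace firstval = trim(`v') if firstval == "" & trim(`v') != ""
}
drop firstval
list
```

Comparing every value with the first nonmissing one is enough: if all nonmissing values equal the first, they agree with each other, and any disagreement must show up as a difference from the first. If "." should also count as missing, add that condition wherever trim(`v') != "" appears.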

Nick 
n.j.cox@durham.ac.uk 

Friedrich Huebler
 
> Sorry, I should have been more precise. I would like to tag individual
> observations if certain variables do not contain the same values for
> that particular observation.
> 
> The purpose is error checking in household survey data. Assume every
> woman is asked about her age and every man is asked about his wife's
> age. The information is stored in separate files. When the files are
> merged, every woman has one age (if she is not married) or two ages. I
> would like to identify cases where the ages are not the same.
> 
> -egen, rowmin()- and -egen, rowmax()- work for numeric variables like
> age but I hope there is a solution that also works with strings.
 
Nick Cox

> > Tagging in what sense?
> >
> > How do you tell which soldiers are out of step?
> > Majority vote? How do you split a 50:50
> > agreement? Three variables say "Stata" and three
> > say "SAS"? (No, that's an easy one to identify
> > which are incorrect.)
> >
> > (You didn't mention strings; I guess you don't
> > care about strings.)
> >
> > [...]
> >
> > Friedrich Huebler
> >
> > > I would like to compare a set of variables and tag those that do not
> > > contain the same values. Missing values should be ignored. -egen
> > > newvar = diff(varlist)- is not an option because it does not skip
> > > missing values. The last command in the example below works but it
> > > becomes impractical with a longer list of variables.
> > >
> > > . sysuse auto
> > > . gen mpg2 = mpg if foreign==0
> > > . gen mpg3 = mpg if foreign==1
> > > . replace mpg3 = mpg+1 if rep78==2
> > > . egen tag = diff(mpg mpg2 mpg3)
> > > . gen tag2 = (mpg!=mpg2 & mpg<. & mpg2<. | mpg!=mpg3 & mpg<. & mpg3<. | mpg2!=mpg3 & mpg2<. & mpg3<.)
> > >
> > > The -egen- command tags all observations, the -gen- command only those
> > > that I expect to be tagged. Are there better solutions that can also
> > > be used with ten or more variables?
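
For numeric variables, the -egen, rowmin()- and -egen, rowmax()- route mentioned later in the thread scales to any number of variables, because both functions ignore missing values. A minimal sketch using the example above follows; the names vmin, vmax and tag3 are made up for illustration.

```stata
sysuse auto, clear
gen mpg2 = mpg if foreign == 0
gen mpg3 = mpg if foreign == 1
replace mpg3 = mpg + 1 if rep78 == 2

* row minimum and maximum over the nonmissing values only
egen vmin = rowmin(mpg mpg2 mpg3)
egen vmax = rowmax(mpg mpg2 mpg3)

* the nonmissing values disagree exactly when minimum and maximum differ
gen byte tag3 = vmin != vmax
drop vmin vmax
```

If every variable is missing in an observation, both vmin and vmax are missing and tag3 is 0. With ten or more variables only the varlist changes, and a wildcard such as mpg* would serve.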
