
From: Ekaterina Hertog <ekaterina.hertog@sociology.ox.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: how to check that demographic values within the same zipcode are identical?
Date: Thu, 20 Aug 2009 14:50:05 +0100

Dear all,

thank you very much for the advice!

Sincerely yours,
Ekaterina

In message <200908171320.n7HDKad21323@hsphsun2.harvard.edu> statalist@hsphsun2.harvard.edu writes:
> Hi,
>
> Nick Cox already mentioned the 'duplicates' command and it's just a
> little twist to use it to find non-duplicates. "duplicates" is easy to
> set up and works with different types of vars.
>
>     duplicates tag zipcode var1-var5, gen(dup)
>
> "dup" counts the number of copies in each zipcode group starting with
> the second identical case.
> If var1-var5 in a zipcode group are constant, dup + 1 is equal to the
> number of cases in the group (_N).
>
>     bysort zipcode : assert _N == dup+1
>
> In case of errors there may be many ways to spot and correct them,
> depending on the size of the dataset, the number of vars to compare and
> possible sources of error. It may be feasible to create a variable for
> _N in each zipcode group:
>
>     bysort zipcode : gen N = _N
>
> The following code tabulates non-constant vars by zipcode:
>
>     levelsof zipcode if N != dup + 1, local(ziperror)
>     foreach x of local ziperror {
>         di "Zipcode: `x'"
>         foreach y of varlist var1-var5 {
>             // only to check if the var has more than one non-missing value
>             qui tab `y' if zipcode == "`x'"
>             // tabulates var if it has more than one value
>             if r(r) > 1 & r(r) <. tab `y' if zipcode == "`x'"
>         }
>     }
>
>
> *** An example with an additional string var and some errors (the assert
> command is commented out)
>
>
>     clear
>     input str10 zipcode var1 var2 var3 var4 var5 str1 var6
>     "0182801" 1252 144 115 113 29 "A"
>     "0182801" 1253 144 115 123 29 "A"
>     "0182801" 1253 144 115 113 29 "B"
>     "0182801" 1253 144 115 113 29 "A"
>     "0183204" 91 8 8 8 0 "C"
>     "0183204" 90 8 8 8 0 "D"
>     "0183331" 772 81 64 62 17 "E"
>     "0183331" 772 81 64 62 17 "F"
>     "0183331" 772 81 64 62 17 "E"
>     "0183505" 1716 262 218 211 44 "A"
>     "0183505" 1716 262 218 211 44 "A"
>     end
>
>     duplicates tag zipcode var1-var6, gen(dup)
>     * bysort zipcode : assert _N == dup+1
>     bysort zipcode : gen N = _N
>     levelsof zipcode if N != dup + 1, local(ziperror)
>     foreach x of local ziperror {
>         di ""
>         di "Zipcode: `x'"
>         foreach y of varlist var1-var6 {
>             // only to check if the var has more than one value
>             qui tab `y' if zipcode == "`x'"
>             // show vars with more than one value
>             if r(r) > 1 & r(r) <. tab `y' if zipcode == "`x'"
>         }
>     }
>
>
>
> Best wishes
> Stefan Gawrich
> Dillenburg
> Germany
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
Ekaterina Hertog (nee Korobtseva)
Nissan Institute of Japanese Studies
27 Winchester Road,
Oxford OX2 6NA
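
For comparison, here is a minimal sketch of another way to run the same check without -duplicates-. It is not taken from the quoted message: it sorts each variable within zipcode and compares the first and last values, which differ exactly when the variable is not constant in that group. The variable names follow the quoted example, the data are a shortened copy of it, and the flag name "anydiff" is a hypothetical choice. A numeric missing value sorts last, so a group that mixes missing and non-missing values would also be flagged.

```stata
* Sketch only: shortened copy of the example data from the quoted message
clear
input str10 zipcode var1 var2 var3 var4 var5
"0182801" 1252 144 115 113 29
"0182801" 1253 144 115 123 29
"0183505" 1716 262 218 211 44
"0183505" 1716 262 218 211 44
end

* anydiff (hypothetical name) marks every case in a zipcode group
* in which at least one of var1-var5 is not constant
gen byte anydiff = 0
foreach v of varlist var1-var5 {
    * after sorting on `v' within zipcode, the first and last values
    * differ if and only if `v' varies within the group
    bysort zipcode (`v') : replace anydiff = 1 if `v'[1] != `v'[_N]
}

* list the zipcodes that contain inconsistent values
tabulate zipcode if anydiff
```

Unlike the -duplicates- approach, this loop compares one variable at a time, so the body of the loop could also record which variable triggered the flag.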

**References**: **st: RE: how to check that demographic values within the same zipcode are identical?** *From:* Stefan.Gawrich@hlpug.hessen.de

