
From |
Stefan.Gawrich@hlpug.hessen.de |

To |
statalist@hsphsun2.harvard.edu |

Subject |
st: RE: how to check that demographic values within the same zipcode are identical? |

Date |
Mon, 17 Aug 2009 15:11:36 +0200 |

Hi,

Nick Cox already mentioned the 'duplicates' command, and it takes only a small twist to use it to find non-duplicates. 'duplicates' is easy to set up and works with both numeric and string variables.

    duplicates tag zipcode var1-var5, gen(dup)

For each observation, "dup" holds the number of other observations that are identical to it on zipcode and var1-var5 (0 if the observation is unique). If var1-var5 are constant within a zipcode group, dup + 1 equals the number of cases in the group (_N) for every observation in that group:

    bysort zipcode : assert _N == dup+1

If the assert fails, there are many ways to spot and correct the errors, depending on the size of the dataset, the number of variables to compare and the possible sources of error. One feasible approach is to store _N for each zipcode group in a variable:

    bysort zipcode : gen N = _N

The following code then tabulates the non-constant variables by zipcode. It assumes that zipcode is a string variable, as in the example further down.

    levelsof zipcode if N != dup + 1, local(ziperror)
    foreach x of local ziperror {
        di "Zipcode: `x'"
        foreach y of varlist var1-var5 {
            // run quietly, only to check whether the var has more than one non-missing value
            qui tab `y' if zipcode == "`x'"
            if r(r) > 1 & r(r) < . {
                // tabulate the var if it has more than one value
                tab `y' if zipcode == "`x'"
            }
        }
    }

*** An example with an additional string var and some errors
*** (the assert command is commented out)

    clear
    input str10 zipcode var1 var2 var3 var4 var5 str1 var6
    "0182801" 1252 144 115 113 29 "A"
    "0182801" 1253 144 115 123 29 "A"
    "0182801" 1253 144 115 113 29 "B"
    "0182801" 1253 144 115 113 29 "A"
    "0183204" 91 8 8 8 0 "C"
    "0183204" 90 8 8 8 0 "D"
    "0183331" 772 81 64 62 17 "E"
    "0183331" 772 81 64 62 17 "F"
    "0183331" 772 81 64 62 17 "E"
    "0183505" 1716 262 218 211 44 "A"
    "0183505" 1716 262 218 211 44 "A"
    end

    duplicates tag zipcode var1-var6, gen(dup)
    * bysort zipcode : assert _N == dup+1
    bysort zipcode : gen N = _N

    levelsof zipcode if N != dup + 1, local(ziperror)
    foreach x of local ziperror {
        di ""
        di "Zipcode: `x'"
        foreach y of varlist var1-var6 {
            // run quietly, only to check whether the var has more than one value
            qui tab `y' if zipcode == "`x'"
            if r(r) > 1 & r(r) < . {
                // show vars with more than one value
                tab `y' if zipcode == "`x'"
            }
        }
    }

Best wishes
Stefan Gawrich
Dillenburg
Germany

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
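
As a possible complement to the tabulation loop above, the sketch below flags non-constant variables one at a time by sorting each variable within zipcode and comparing the first and last value of the group. This is not part of the 'duplicates' approach described in the message; it is an alternative check. The flag variables bad_* and anybad are hypothetical names introduced only for this illustration, and the small dataset is a cut-down version of the example data.

```stata
clear
input str10 zipcode var1 var2 str1 var6
"0182801" 1252 144 "A"
"0182801" 1253 144 "A"
"0183331"  772  81 "E"
"0183331"  772  81 "F"
"0183331"  772  81 "E"
"0183505" 1716 262 "A"
"0183505" 1716 262 "A"
end

* bad_* and anybad are hypothetical helper names, not from the original post
foreach v of varlist var1 var2 var6 {
    * after sorting on `v' within zipcode, a constant variable has
    * identical first and last values in every group
    bysort zipcode (`v') : gen byte bad_`v' = `v'[1] != `v'[_N]
}

* anybad is 1 for every observation in a zipcode with at least one
* non-constant variable
egen byte anybad = rowmax(bad_*)

list zipcode var1 var2 var6 if anybad, sepby(zipcode)
```

With this data, the list shows the observations of zipcodes 0182801 (var1 differs) and 0183331 (var6 differs). Note that a numeric missing value sorts last, so a group that mixes missing and non-missing values is flagged as non-constant as well.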
