Statalist


Re: st: RE: how to check that demographic values within the same zipcode are identical?


From   Ekaterina Hertog <ekaterina.hertog@sociology.ox.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: how to check that demographic values within the same zipcode are identical?
Date   Thu, 20 Aug 2009 14:50:05 +0100

Dear all,
thank you very much for the advice!
Sincerely yours,
Ekaterina

In message <200908171320.n7HDKad21323@hsphsun2.harvard.edu> statalist@hsphsun2.harvard.edu writes:
> Hi,
> 
> Nick Cox already mentioned the 'duplicates' command, and it takes only a
> small twist to use it to find non-duplicates. "duplicates" is easy to
> set up and works with both numeric and string variables.
> 
> duplicates tag zipcode var1-var5, gen(dup)
> 
> "dup" holds, for each observation, the number of other observations that
> are identical on zipcode and var1-var5 (0 if there is no copy).
> If var1-var5 are constant within a zipcode group, dup + 1 equals the
> number of cases in the group (_N):
> 
> bysort zipcode : assert _N == dup+1 
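> 
> As a complementary sketch (not from the thread's tested code), the same
> check can be run one variable at a time by sorting within zipcode and
> comparing the first and last values. The toy data and the variable name
> "differs" are made up for illustration.
> 
> ```stata
> * Sketch only: per-variable check that values are constant within zipcode.
> * The toy data and the variable name "differs" are hypothetical.
> clear
> input str7 zipcode var1 var2
> "0182801" 1252 144
> "0182801" 1253 144
> "0183505" 1716 262
> "0183505" 1716 262
> end
> 
> foreach v of varlist var1 var2 {
>     * sort within zipcode by the current variable; the first and last
>     * values differ only if the variable varies within the group
>     bysort zipcode (`v') : gen byte differs = `v'[1] != `v'[_N]
>     quietly count if differs
>     display "`v': " r(N) " observation(s) in zipcodes where it is not constant"
>     drop differs
> }
> ```
> 
> Unlike the duplicates approach, this reports which variable varies, at the
> cost of re-sorting the data once per variable.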
> 
> In case of errors there are many ways to spot and correct them,
> depending on the size of the dataset, the number of variables to compare,
> and the possible sources of error. It can help to store _N for each
> zipcode group in a variable:
> 
> bysort zipcode : gen N = _N 
> 
> The following code tabulates the non-constant variables for each affected zipcode:
> 
> levelsof zipcode if N != dup + 1, local(ziperror)
> foreach x of local ziperror {
>     di "Zipcode: `x'"
>     foreach y of varlist var1-var5 {
>         * run quietly first, only to get the number of distinct non-missing values in r(r)
>         qui tab `y' if zipcode == "`x'"
>         * tabulate the variable only if it has more than one value
>         if r(r) > 1 & r(r) <. tab `y' if zipcode == "`x'"
>     }
> }
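> 
> If the full records are more useful than per-variable tabulations, a
> shorter sketch is to list every observation from the affected zipcodes.
> The toy data below is made up for illustration; dup and N are built in
> the same way as above.
> 
> ```stata
> * Sketch only: list all observations from zipcodes that are not constant.
> * Hypothetical toy data; dup and N follow the construction shown above.
> clear
> input str7 zipcode var1 var2
> "0183204" 91 8
> "0183204" 90 8
> "0183505" 1716 262
> "0183505" 1716 262
> end
> 
> duplicates tag zipcode var1 var2, gen(dup)
> bysort zipcode : gen N = _N
> * sepby() draws a separator line between zipcode groups
> list zipcode var1 var2 if N != dup + 1, sepby(zipcode)
> ```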
> 
> 
> *** An example with an additional string var and some errors
> *** (the assert command is commented out)
> 
> 
> clear
> input str10 zipcode var1 /* 
>  */ var2 var3 var4 var5 str1 var6 
> "0182801"	1252	144	115	113	29 "A"
> "0182801"	1253	144	115	123	29 "A"
> "0182801"	1253	144	115	113	29 "B"
> "0182801"	1253	144	115	113	29 "A"
> "0183204"	91	8	8	8	0 "C"
> "0183204"	90	8	8	8	0 "D"
> "0183331"	772	81	64	62	17 "E"
> "0183331"	772	81	64	62	17 "F"
> "0183331"	772	81	64	62	17 "E"
> "0183505"	1716	262	218	211	44 "A"
> "0183505"	1716	262	218	211	44 "A"
> end
> 
> duplicates tag zipcode var1-var6, gen(dup) 
> * bysort zipcode : assert _N == dup+1 
> bysort zipcode : gen N = _N 
> levelsof zipcode if N != dup + 1, local(ziperror)
> foreach x of local ziperror {
>     di ""
>     di "Zipcode: `x'"
>     foreach y of varlist var1-var6 {
>         * run quietly first, only to get the number of distinct values in r(r)
>         qui tab `y' if zipcode == "`x'"
>         * show the variable only if it has more than one value
>         if r(r) > 1 & r(r) <. tab `y' if zipcode == "`x'"
>     }
> }
> 
> 
> 
> Best wishes 
> Stefan Gawrich
> Dillenburg
> Germany
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
Ekaterina Hertog (nee Korobtseva)
Nissan Institute of Japanese Studies
27 Winchester Road, Oxford
OX2 6NA
