Statalist



st: RE: how to check that demographic values within the same zipcode are identical?


From   [email protected]
To   [email protected]
Subject   st: RE: how to check that demographic values within the same zipcode are identical?
Date   Mon, 17 Aug 2009 15:11:36 +0200

Hi,

Nick Cox already mentioned the 'duplicates' command and it's just a
little twist to use it to find non-duplicates. "duplicates" is easy to
set up and works with different types of vars.

duplicates tag zipcode var1-var5, gen(dup)

For each observation, "dup" holds the number of other observations that
are identical to it on zipcode and var1-var5 (0 if it is unique).
If var1-var5 are constant within a zipcode group, dup + 1 equals the
number of cases in the group (_N), so this assertion passes:

bysort zipcode : assert _N == dup+1 
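
A related idiom, offered only as a sketch and not as part of the
approach above, checks one variable at a time: sort on the variable
within zipcode and compare the first and last values of each group. The
toy data and the variable name "differs" are made up for illustration.

```stata
* Minimal sketch with made-up data: is var1 constant within zipcode?
clear
input str7 zipcode var1
"0182801" 1252
"0182801" 1253
"0183505" 1716
"0183505" 1716
end

* Sort on var1 within zipcode; if the first and last values agree,
* every value in the group agrees
bysort zipcode (var1) : gen byte differs = var1[1] != var1[_N]

* List the groups in which var1 varies
list zipcode var1 if differs, sepby(zipcode)
```

Missing values sort last, so a group that mixes missing and non-missing
values is flagged as well.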

If the assertion fails, there may be many ways to spot and correct the
errors, depending on the size of the dataset, the number of vars to
compare and the possible sources of error.  It may help to store _N for
each zipcode group in a variable:

bysort zipcode : gen N = _N 

The following code tabulates non-constant vars by zipcode

levelsof zipcode if N != dup + 1, local(ziperror)
foreach x of local ziperror {
    di "Zipcode: `x'"
    foreach y of varlist var1-var5 {
        * quiet tab, only to get r(r), the number of distinct
        * non-missing values of the var in this zipcode
        qui tab `y' if zipcode == "`x'"
        * tabulate the var if it has more than one value
        if r(r) > 1 & r(r) < . {
            tab `y' if zipcode == "`x'"
        }
    }
}


*** An example with an additional string var and some errors
*** (the assert command is commented out)


clear
input str10 zipcode var1 /* 
 */ var2 var3 var4 var5 str1 var6 
"0182801"	1252	144	115	113	29 "A"
"0182801"	1253	144	115	123	29 "A"
"0182801"	1253	144	115	113	29 "B"
"0182801"	1253	144	115	113	29 "A"
"0183204"	91	8	8	8	0 "C"
"0183204"	90	8	8	8	0 "D"
"0183331"	772	81	64	62	17 "E"
"0183331"	772	81	64	62	17 "F"
"0183331"	772	81	64	62	17 "E"
"0183505"	1716	262	218	211	44 "A"
"0183505"	1716	262	218	211	44 "A"
end

duplicates tag zipcode var1-var6, gen(dup)
* bysort zipcode : assert _N == dup+1
bysort zipcode : gen N = _N
levelsof zipcode if N != dup + 1, local(ziperror)
foreach x of local ziperror {
    di ""
    di "Zipcode: `x'"
    foreach y of varlist var1-var6 {
        * quiet tab, only to check if the var has more than one value
        qui tab `y' if zipcode == "`x'"
        * show vars with more than one value
        if r(r) > 1 & r(r) < . {
            tab `y' if zipcode == "`x'"
        }
    }
}
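
To see the offending rows themselves and not just the tabulations, the
same dup and N logic can feed a list command. This is only a sketch
under assumptions: the small dataset below is made up and is not the
example data above.

```stata
* Minimal sketch with made-up data (not the example dataset above)
clear
input str7 zipcode var1 var2
"0182801" 1252 144
"0182801" 1253 144
"0182801" 1253 144
"0183505" 1716 262
"0183505" 1716 262
end

* Same flagging logic as above: dup + 1 equals _N only in groups
* where all vars are constant
duplicates tag zipcode var1 var2, gen(dup)
bysort zipcode : gen N = _N

* Show every row of each inconsistent zipcode, one block per zipcode
list zipcode var1 var2 dup N if N != dup + 1, sepby(zipcode)
```

In a non-constant group every row has dup + 1 < N, so the whole group is
listed and the deviating rows can be compared side by side.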



Best wishes 
Stefan Gawrich
Dillenburg
Germany




