
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: RE: making data duplicate in terms of several variables in case of a given variable taking identical values


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: making data duplicate in terms of several variables in case of a given variable taking identical values
Date   Tue, 6 Jul 2010 13:07:10 +0100

Note that Richard Boylan asked essentially the same question on 30 June:

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1006/date/article-1650.html>

Richard's question was about string variables, not numeric variables,
but that difference is quite secondary to the main problem. 

See the subsequent thread for suggestions by Martin Weiss and myself,
most conveniently

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1007/date/article-11.html>

Two very simple morals arise:

1. Reading Statalist as well as writing to it will reveal tricks useful
to you. 

2. Regardless of that, the underlying techniques are already covered in
two FAQs, named in the posting just referred to, so going directly to
the FAQs would identify a solution. 

Nick 
[email protected] 

Martin Weiss

"I think that the only cases where prefecture, towncode and areacode vary
while zipcodes are identical are when prefecture, towncode and areacode
are sometimes missing and sometimes not, but I would like to check that
before I do the necessary replacements."


You have to check those conditions one by one:


***********
clear*

input str10(zipcode prefecture) int(towncode areacode)
"0010027"   "hokkaido"   100   1100
"0010029"   "hokkaido"   100   1100
"0010029"   ""           .     .
"0010030"   "hokkaido"   100   1100
"0200822"   "iwate"      201   3201
"0200823"   ""           .     .
"0200823"   "iwate"      201   3201
"0200831"   "iwate"      201   3201
end

compress
li, noo sepby(zipcode)


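// sort on each variable within zipcode, so that comparing the first and
// last values detects variation however many observations a zipcode has
bys zipcode (prefecture): gen byte prefvaries=prefecture[1]!=prefecture[_N]
bys zipcode (towncode): gen byte townvaries=towncode[1]!=towncode[_N]
bys zipcode (areacode): gen byte areavaries=areacode[1]!=areacode[_N]
// number of observations per zipcode with any of the three variables missing
by zipcode: egen missings=total(mi(prefecture,towncode, areacode))
// some, but not all, observations of the zipcode have missing values
by zipcode: gen byte onlysomemiss=missings!=_N & missings!=0 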
drop missings
//all conditions fulfilled?
gen byte complies=prefvaries+townvaries+areavaries+onlysomemiss==4
li, noo sepby(zipcode) ab(15)
***********
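
Once the check passes, the replacement step mentioned in the question could be sketched as follows. This is only a minimal sketch and not part of the reply above. It assumes the variable names from the example and that all nonmissing values agree within each zipcode; the -assert- lines stop with an error if they do not. It relies on Stata's sort order, in which empty strings sort first and numeric missing values sort last.

```stata
// stop with an error if nonmissing values disagree within a zipcode
bysort zipcode (prefecture): assert prefecture == prefecture[_N] if !missing(prefecture)
bysort zipcode (towncode): assert towncode == towncode[1] if !missing(towncode)
bysort zipcode (areacode): assert areacode == areacode[1] if !missing(areacode)

// copy a nonmissing value into the missing observations of the same zipcode
// strings: "" sorts first, so the last value is nonmissing if any value is
bysort zipcode (prefecture): replace prefecture = prefecture[_N] if missing(prefecture)
// numerics: missing sorts last, so the first value is nonmissing if any value is
bysort zipcode (towncode): replace towncode = towncode[1] if missing(towncode)
bysort zipcode (areacode): replace areacode = areacode[1] if missing(areacode)
```

A zipcode whose values are missing in every observation is left unchanged, since there is nothing to copy from.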

Ekaterina Hertog

I have some data which looks like this:

zipcode    prefecture    towncode    areacode
0010027    hokkaido      100         1100
0010029    hokkaido      100         1100
0010029    .             .           .
0010030    hokkaido      100         1100
0200822    iwate         201         3201
0200823    .             .           .
0200823    iwate         201         3201
0200831    iwate         201         3201

I use Stata 11.

I would like to make my observations identical in terms of prefecture, 
towncode and areacode when they are identical in terms of zipcode. I 
think that the only cases where prefecture, towncode and areacode vary 
while zipcodes are identical are when prefecture, towncode and areacode 
are sometimes missing and sometimes not, but I would like to check that 
before I do the necessary replacements.
I looked into the -duplicates- commands, but did not find a good 
solution. I would be most grateful for any pointers.


