
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: RE: making data duplicate in terms of several variables in case of a given variable taking identical values

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: making data duplicate in terms of several variables in case of a given variable taking identical values
Date	Tue, 6 Jul 2010 13:07:10 +0100

Note that Richard Boylan asked essentially the same question on 30 June:


<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1006/date/article-1650.html>

Richard's question was about string variables, not numeric variables,
but that difference is quite secondary to the main problem. 

See the subsequent thread for suggestions by Martin Weiss and myself,
most conveniently 

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1007/date/article-11.html>

Two very simple morals arise:

1. Reading Statalist as well as writing to it will reveal tricks useful
to you. 

2. Regardless of that, the underlying techniques are already covered in
two FAQs, named in the posting just referred to, so going directly to
the FAQs would identify a solution. 

Nick 
[email protected] 

Martin Weiss

"I think that the only cases where prefecture, towncode and areacode vary
while zipcodes are identical are when prefecture, towncode and areacode
are sometimes missing and sometimes not, but I would like to check that
before I do the necessary replacements."


You have to check those conditions one by one:


***********
clear*

input str10(zipcode prefecture) int(towncode areacode)
"0010027"   "hokkaido"    100        1100
"0010029"   "hokkaido"    100        1100
"0010029"   ""            .          .
"0010030"   "hokkaido"    100        1100
"0200822"   "iwate"       201        3201
"0200823"   ""            .          .
"0200823"   "iwate"       201        3201
"0200831"   "iwate"       201        3201
end

compress
li, noo sepby(zipcode)


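// sort within zipcode on each variable in turn, so that the first and
// last values differ whenever any two values in the group differ
bys zipcode (prefecture): gen byte prefvaries=prefecture[1]!=prefecture[_N]
bys zipcode (towncode): gen byte townvaries=towncode[1]!=towncode[_N]
bys zipcode (areacode): gen byte areavaries=areacode[1]!=areacode[_N]
// number of observations per zipcode with any of the three missing
by zipcode: egen missings=total(mi(prefecture,towncode, areacode))
// some, but not all, observations in the zipcode have missings
by zipcode: gen byte onlysomemiss=missings!=_N & missings!=0
drop missings
//all conditions fulfilled?
gen byte complies=prefvaries+townvaries+areavaries+onlysomemiss==4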
li, noo sepby(zipcode) ab(15)
***********
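
The flags above treat a missing value as just another value. As a complementary check, and only as a sketch, one could count the distinct nonmissing values of each variable within zipcode: a count above 1 would signal a genuine conflict rather than a gap. The names tag_*, n_* and conflict are invented for this illustration. The sketch relies on -egen, tag()- ignoring missing values (including empty strings) unless its missing option is specified.

```stata
clear
input str10(zipcode prefecture) int(towncode areacode)
"0010027"   "hokkaido"    100        1100
"0010029"   "hokkaido"    100        1100
"0010029"   ""            .          .
"0010030"   "hokkaido"    100        1100
"0200822"   "iwate"       201        3201
"0200823"   ""            .          .
"0200823"   "iwate"       201        3201
"0200831"   "iwate"       201        3201
end

foreach v of varlist prefecture towncode areacode {
    // tag one observation per distinct nonmissing value within zipcode
    egen byte tag_`v' = tag(zipcode `v')
    // number of distinct nonmissing values in each zipcode
    egen byte n_`v' = total(tag_`v'), by(zipcode)
    drop tag_`v'
}

// conflict: some variable takes two or more nonmissing values in a zipcode
gen byte conflict = n_prefecture > 1 | n_towncode > 1 | n_areacode > 1
list if conflict, noobs sepby(zipcode) abbreviate(15)
count if conflict
```

If -count- reports 0, all variation within zipcode comes from missing values alone, which is the situation the original poster expected.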

Ekaterina Hertog

I have some data which looks like this:

zipcode    prefecture    towncode    areacode
0010027    hokkaido      100         1100
0010029    hokkaido      100         1100
0010029    .             .           .
0010030    hokkaido      100         1100
0200822    iwate         201         3201
0200823    .             .           .
0200823    iwate         201         3201
0200831    iwate         201         3201

I use Stata 11.

I would like to make my observations identical in terms of prefecture, 
towncode and areacode when they are identical in terms of zipcode. I 
think that the only cases where prefecture, towncode and areacode vary 
while zipcodes are identical are when prefecture, towncode and areacode 
are sometimes missing and sometimes not, but I would like to check that 
before I do the necessary replacements.
I looked into the -duplicates- commands, but did not find a good 
solution. I would be most grateful for any pointers.
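
If the check described above confirms that nonmissing values agree within each zipcode, the replacement itself might look like the following. This is a minimal sketch on the example data, not a tested solution for the full dataset. It assumes prefecture is a string variable and towncode and areacode are numeric, and it uses the fact that numeric missings sort last while empty strings sort first.

```stata
clear
input str10(zipcode prefecture) int(towncode areacode)
"0010027"   "hokkaido"    100        1100
"0010029"   "hokkaido"    100        1100
"0010029"   ""            .          .
"0010030"   "hokkaido"    100        1100
"0200822"   "iwate"       201        3201
"0200823"   ""            .          .
"0200823"   "iwate"       201        3201
"0200831"   "iwate"       201        3201
end

// numeric missing sorts last, so the first value in each zipcode
// is nonmissing whenever a nonmissing value exists
foreach v of varlist towncode areacode {
    bysort zipcode (`v'): replace `v' = `v'[1] if missing(`v')
}

// the empty string sorts first, so copy from the last value instead
bysort zipcode (prefecture): replace prefecture = prefecture[_N] if missing(prefecture)

list, noobs sepby(zipcode)
```

A zipcode in which a variable is missing in every observation stays missing, since there is nothing to copy from. If nonmissing values disagree within a zipcode, this would silently pick one of them, hence the check beforehand.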


