

From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: making data duplicate in terms of several variables in case of a given variable taking identical values
Date: Tue, 6 Jul 2010 13:07:10 +0100

Note that Richard Boylan asked essentially the same question on 30 June:

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1006/date/article-1650.html>

Richard's question was about string variables, not numeric variables, but that difference is quite secondary to the main problem. See the subsequent thread for suggestions by Martin Weiss and myself, most conveniently

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1007/date/article-11.html>

Two very simple morals arise:

1. Reading Statalist as well as writing to it will reveal tricks useful to you.

2. Regardless of that, the underlying techniques are already covered in two FAQs, named in the posting just referred to, so going directly to the FAQs would identify a solution.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

"I think that the only cases where prefecture, towncode and areacode vary while zipcodes are identical are when prefecture, towncode and areacode are sometimes missing and sometimes not, but I would like to check that before I do the necessary replacements."

You have to check those conditions one by one:

***********
clear*
input str10(zipcode prefecture) int(towncode areacode)
"0010027" "hokkaido" 100 1100
"0010029" "hokkaido" 100 1100
"0010029" "" . .
"0010030" "hokkaido" 100 1100
"0200822" "iwate" 201 3201
"0200823" "" . .
"0200823" "iwate" 201 3201
"0200831" "iwate" 201 3201
end

compress
li, noo sepby(zipcode)

bys zipcode: gen byte prefvaries=prefecture[1]!=prefecture[_N]
by zipcode: gen byte townvaries=towncode[1]!=towncode[_N]
by zipcode: gen byte areavaries=areacode[1]!=areacode[_N]
by zipcode: egen missings=total(mi(prefecture,towncode, areacode))
by zipcode: gen byte onlysomemiss=missings!=_N & missings!=0
drop missings

//all conditions fulfilled?
gen byte complies=prefvaries+townvaries+areavaries+onlysomemiss==4
li, noo sepby(zipcode) ab(15)
***********

Ekaterina Hertog

I have some data which looks like this

zipcode   prefecture   towncode   areacode
0010027   hokkaido     100        1100
0010029   hokkaido     100        1100
0010029   .            .          .
0010030   hokkaido     100        1100
0200822   iwate        201        3201
0200823   .            .          .
0200823   iwate        201        3201
0200831   iwate        201        3201

I use Stata 11. I would like to make my observations identical in terms of prefecture, towncode and areacode when they are identical in terms of zipcode. I think that the only cases where prefecture, towncode and areacode vary while zipcodes are identical are when prefecture, towncode and areacode are sometimes missing and sometimes not, but I would like to check that before I do the necessary replacements. I looked into duplicate commands, but did not seem to find a good solution. I would be most grateful for any pointers.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
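For readers who want the check and the replacement in one place, here is a minimal sketch using the example data from this thread. It is not taken from the FAQs or from any poster's code; the variable names `miss_*`, `differs_*` and `conflict_*` are made up for illustration. The idea, under those assumptions, is to compare the first and last value of each variable within zipcode, separately for missing and nonmissing observations, and to fill in a missing value only where the nonmissing values within that zipcode agree.

```stata
clear
input str10(zipcode prefecture) int(towncode areacode)
"0010027" "hokkaido" 100 1100
"0010029" "hokkaido" 100 1100
"0010029" ""         .   .
"0010030" "hokkaido" 100 1100
"0200822" "iwate"    201 3201
"0200823" ""         .   .
"0200823" "iwate"    201 3201
"0200831" "iwate"    201 3201
end

foreach v of varlist prefecture towncode areacode {
    * hypothetical helper: 1 if the value is missing ("" for strings)
    gen byte miss_`v' = missing(`v')

    * within zipcode and missing status, sort on the variable itself:
    * the values are constant if and only if the first equals the last
    bysort zipcode miss_`v' (`v'): gen byte differs_`v' = `v'[1] != `v'[_N]

    * flag every observation of a zipcode whose nonmissing values conflict
    bysort zipcode: egen byte conflict_`v' = max(differs_`v')

    * how many observations sit in conflicting zipcodes?
    count if conflict_`v'

    * nonmissing observations sort first (miss_`v' == 0), so `v'[1] holds
    * a nonmissing value whenever the zipcode has one
    bysort zipcode (miss_`v'): replace `v' = `v'[1] if miss_`v' & !conflict_`v'
}

drop miss_* differs_* conflict_*
list, noobs sepby(zipcode)
```

Sorting on the missing indicator rather than on the variable itself sidesteps the fact that empty strings sort first while numeric missings sort last, so the same line works for both storage types. Zipcodes in which a variable is missing throughout are left unchanged.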

**References**:

- **st: making data duplicate in terms of several variables in case of a given variable taking identical values** *From:* Ekaterina Hertog <ekaterina.hertog@sociology.ox.ac.uk>
- **st: RE: making data duplicate in terms of several variables in case of a given variable taking identical values** *From:* "Martin Weiss" <martin.weiss1@gmx.de>
