Statalist The Stata Listserver



st: RE: Merge two long datasets? and re: stopping loops


From   "Scott Merryman" <smerryman@kc.rr.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Merge two long datasets? and re: stopping loops
Date   Thu, 24 Aug 2006 10:26:59 -0500

Claire,

Yes, it is possible to merge on more than one variable.  In fact, in the
example that I posted yesterday, -merge- used both momid and year to merge
the data set back to the original (which I called foo.dta).
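For readers outside Stata, the idea behind a two-key -merge- can be sketched in Python with the standard library. This is an illustration of the logic, not Stata's implementation. The function name merge_long, the record layout and the sample values are hypothetical, and the "merge" code only loosely mimics _merge (1 = master only, 2 = using only, 3 = matched).

```python
# Illustrative sketch only: a 1:1 merge of two long-format tables on the
# composite key (momid, year). Names and data are hypothetical.

def merge_long(master, using, keys=("momid", "year")):
    """Full outer merge of two lists of dicts on a composite key.

    Each output row gets a "merge" code loosely modelled on Stata's
    _merge: 1 = master only, 2 = using only, 3 = matched.
    Stata 9's -merge- printed a note and carried on when keys were not
    unique; this sketch raises ValueError instead.
    """
    def key(row):
        return tuple(row[k] for k in keys)

    index = {}
    for row in using:
        k = key(row)
        if k in index:
            raise ValueError(f"{keys} do not uniquely identify using rows: {k}")
        index[k] = row

    out, seen = [], set()
    for row in master:
        k = key(row)
        if k in seen:
            raise ValueError(f"{keys} do not uniquely identify master rows: {k}")
        seen.add(k)
        if k in index:
            # Matched: where both sides share a variable, the master value
            # is kept, as with Stata's default -merge- behaviour.
            out.append({**index[k], **row, "merge": 3})
        else:
            out.append({**row, "merge": 1})
    for k, row in index.items():
        if k not in seen:
            out.append({**row, "merge": 2})
    return sorted(out, key=key)


# Hypothetical long-format data: no reshape to wide form is needed.
status = [{"momid": 1, "year": 2000, "status": "Married"},
          {"momid": 1, "year": 2001, "status": "Married"}]
income = [{"momid": 1, "year": 2001, "income": 30000},
          {"momid": 2, "year": 1990, "income": 25000}]
merged = merge_long(status, income)
for r in merged:
    print(r)
```

Because the key is the pair (momid, year), the same call works on long data directly, and the keys tuple can hold any number of variables.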


Scott

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Claire M. Kamp Dush
> Sent: Thursday, August 24, 2006 9:43 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Merge two long datasets? and re: stopping loops
> 
> Thanks Scott for the tip.  I did make a mistake in copying my data, so
> thanks for pointing that out.  I have one follow-up and one new question:
> 
> First, there is the -continue- command for breaking out of loops (plain
> -continue- skips to the next iteration; -continue, break- exits the
> loop).  I just found it in the Stata 9 Programming manual, so anyone
> trying to figure this out might want to check [P] continue.  I wish I
> had found it earlier.
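As [P] continue describes it, plain -continue- skips the remaining commands in the current iteration and resumes at the top of the loop, while -continue, break- resumes with the command after the loop. Python's continue and break behave the same way. The hypothetical function below uses both to find the first year in which a change is reported; the field names are assumptions for illustration.

```python
# Python analogue of Stata's -continue- and -continue, break-.
# Field names and values are hypothetical.

def first_reported_change(rows):
    """Return (year, type) for the first row reporting a change, else None."""
    found = None
    for row in rows:
        if not row.get("type1"):
            continue   # skip the rest of this iteration (Stata: -continue-)
        found = (row["year"], row["type1"])
        break          # leave the loop entirely (Stata: -continue, break-)
    return found


rows = [{"year": 1989, "type1": ""},
        {"year": 1990, "type1": ""},
        {"year": 1996, "type1": "Divorced"},
        {"year": 2000, "type1": "Remarried"}]
print(first_reported_change(rows))   # prints (1996, 'Divorced')
```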
> 
> Second, before I found that command, I did as you advised, and managed to
> merge the data together beautifully.  However, this poses another
> question:
> 
> Is it possible to merge on two variables?  That is, can I merge two
> data files by momid AND by year at the same time?  Or is it always
> necessary to convert both datasets back to wide form, merge, and then
> reshape the new dataset to long?  This is what I did.  I have done some
> digging to try to figure out how to merge long datasets, and I have
> always come up short.
> 
> Claire
> 
> 
> 
> At 03:27 PM 8/23/2006, smerryman@kc.rr.com wrote:
> >I didn't read through all your code, but perhaps -merge- can
> >accomplish your goal.  See the example below.  Also, it is not clear
> >how, starting with your initial data set, you get the 1991 Divorced
> >and 1992 Remarried for momid 2 in your final data set.
> >
> >Scott
> >
> >
> >. l , noobs sepby(mom)
> >
> >   +---------------------------------------------------+
> >   | momid   year       type1     y1      type2     y2 |
> >   |---------------------------------------------------|
> >   |     1   2000     Married   2000                 . |
> >   |     1   2001                  .                 . |
> >   |     1   2002   Separated   2001   Divorced   2001 |
> >   |     1   2003                  .                 . |
> >   |     1   2004                  .                 . |
> >   |---------------------------------------------------|
> >   |     2   1988     Married   1987                 . |
> >   |     2   1989                  .                 . |
> >   |     2   1990                  .                 . |
> >   |     2   1991                  .                 . |
> >   |     2   1992                  .                 . |
> >   |     2   1993                  .                 . |
> >   |     2   1994                  .                 . |
> >   |     2   1995                  .                 . |
> >   |     2   1996    Divorced   1993                 . |
> >   |     2   1997                  .                 . |
> >   |     2   1998                  .                 . |
> >   |     2   1999                  .                 . |
> >   |     2   2000   Remarried   1998                 . |
> >   +---------------------------------------------------+
> >
> >. drop year
> >
> >. rename y1 year
> >
> >. sort mom year
> >
> >. merge mom year using "C:\Documents and Settings\scott.merryman\Desktop\foo.dta"
> >variables momid year do not uniquely identify observations in the master data
> >
> >. drop if year ==.
> >(13 observations deleted)
> >
> >. drop _m
> >
> >. sort mom year
> >
> >. order mom year type1 y1 type2 y2
> >
> >. l, noob sepby(mom)
> >
> >   +---------------------------------------------------+
> >   | momid   year       type1     y1      type2     y2 |
> >   |---------------------------------------------------|
> >   |     1   2000     Married   2000                 . |
> >   |     1   2001   Separated      .   Divorced   2001 |
> >   |     1   2002   Separated   2001   Divorced   2001 |
> >   |     1   2003                  .                 . |
> >   |     1   2004                  .                 . |
> >   |---------------------------------------------------|
> >   |     2   1987     Married      .                 . |
> >   |     2   1988     Married   1987                 . |
> >   |     2   1989                  .                 . |
> >   |     2   1990                  .                 . |
> >   |     2   1991                  .                 . |
> >   |     2   1992                  .                 . |
> >   |     2   1993    Divorced      .                 . |
> >   |     2   1994                  .                 . |
> >   |     2   1995                  .                 . |
> >   |     2   1996    Divorced   1993                 . |
> >   |     2   1997                  .                 . |
> >   |     2   1998   Remarried      .                 . |
> >   |     2   1999                  .                 . |
> >   |     2   2000   Remarried   1998                 . |
> >   +---------------------------------------------------+
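Scott's steps amount to a merge on (momid, year) after re-keying: every row that reports a change is moved to the year stored in y1 and merged back onto the original panel, with master values kept where both sides have data. The note that momid and year do not uniquely identify observations comes from the rows with no reported change; after -rename y1 year- they all have year missing, and they are removed by -drop if year ==.-. Below is a minimal Python sketch of the same steps. It assumes the field names from his listing, None in place of Stata's missing value, a hypothetical helper name spread_to_change_year, and a trimmed copy of the momid 1 and 2 rows; like his example, it handles slot 1 only.

```python
# Sketch of Scott's -drop year- / -rename y1 year- / -merge- steps in Python.
# Field names follow his listing; None plays the role of Stata's missing
# value (.) and "" an empty string. spread_to_change_year is hypothetical.

def spread_to_change_year(panel):
    """Copy each slot-1 change onto the row for the year it occurred.

    Assumes (momid, year) is unique in `panel` and that no mom reports
    two slot-1 changes with the same y1.
    """
    using = {(r["momid"], r["year"]): r for r in panel}

    # "drop year" + "rename y1 year": re-key rows that report a change.
    # Rows with y1 missing are skipped, as -drop if year ==.- does later.
    master = {}
    for r in panel:
        if r["y1"] is None:
            continue
        key = (r["momid"], r["y1"])
        if key in master:
            raise ValueError(f"two slot-1 changes dated {key}")
        moved = {k: v for k, v in r.items() if k not in ("year", "y1")}
        moved["year"] = r["y1"]
        master[key] = moved

    out = []
    for key in sorted(set(using) | set(master)):
        if key in master and key in using:
            # Matched: master values win; y1 is not in the master, so the
            # original row's y1 is kept.
            out.append({**using[key], **master[key]})
        elif key in master:
            # A year not in the panel (e.g. 1987 for momid 2): y1 is missing.
            out.append({**master[key], "y1": None})
        else:
            out.append(dict(using[key]))
    return out


def row(momid, year, type1="", y1=None, type2="", y2=None):
    """Build one panel row with the columns from Scott's listing."""
    return {"momid": momid, "year": year, "type1": type1, "y1": y1,
            "type2": type2, "y2": y2}


panel = [row(1, 2000, "Married", 2000), row(1, 2001),
         row(1, 2002, "Separated", 2001, "Divorced", 2001),
         row(1, 2003), row(1, 2004),
         row(2, 1988, "Married", 1987), row(2, 1993),
         row(2, 1996, "Divorced", 1993)]
result = spread_to_change_year(panel)
for r in result:
    print(r["momid"], r["year"], r["type1"], r["y1"], r["type2"], r["y2"])
```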
> >
> >
> >
> >----- Original Message -----
> >From: "Claire M. Kamp Dush" <cmk54@cornell.edu>
> >Date: Wednesday, August 23, 2006 12:52 pm
> >Subject: st: programming: stopping loops?
> >To: statalist@hsphsun2.harvard.edu
> >
> > > Hello, I feel embarrassed to post this because I am sure the answer
> > > to this is obvious, but I have been puzzling over this issue for a
> > > few hours.  I am trying to recode the family structure data in the
> > > NLSY 79 through 2004, going back to fill in data for missing years
> > > based on the marital changes reported at later follow-up interviews.
> > > For instance, if an individual was interviewed in 1991 but not in
> > > 1992, in 1993 they are asked to report up to 3 marital changes since
> > > the last time they were interviewed.  My data are stacked, with each
> > > individual having 26 lines of data, for years 1979 through 2004.  The
> > > id variable is momid and the year variable is year.  change1type,
> > > change2type, and change3type are measured in each year in which the
> > > respondent has data; each is a categorical variable with categories
> > > including married, divorced, separated, widowed, etc.  changey1_,
> > > changey2_, and changey3_ are the years in which each change is said
> > > to occur.  Here is an example of what the data look like:
> > >
> > > momid  year  change1type  changey1_  change2type  changey2_
> > > 1      2000  Married      2000
> > > 1      2001
> > > 1      2002  Separated    2001       Divorced     2001
> > > 1      2003
> > > 1      2004
> > > 2      1988  Married      1987
> > > 2      1989
> > > 2      1990
> > > 2      1991
> > > 2      1992
> > > 2      1993
> > > 2      1994
> > > 2      1995
> > > 2      1996  Divorced     1993
> > > 2      1997
> > > 2      1998
> > > 2      1999
> > > 2      2000  Remarried    1998
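Because each interview row can carry up to three changes, it may help to stack the slots first, so that every reported change becomes its own record that can later be keyed by (momid, change year); this is roughly what a -reshape long- of the change slots would give. The Python sketch below only illustrates the idea: the function name stack_changes and the output fields interview_year, slot, type and change_year are hypothetical.

```python
# Sketch: one record per reported change, roughly what a -reshape long-
# of the change slots would give. Output field names are hypothetical.

def stack_changes(panel, slots=(1, 2, 3)):
    """Turn change{s}type / changey{s}_ columns into one record per change."""
    out = []
    for r in panel:
        for s in slots:
            ctype = r.get(f"change{s}type", "")
            cyear = r.get(f"changey{s}_")
            if not ctype and cyear is None:
                continue   # slot s is empty in this interview year
            out.append({"momid": r["momid"], "interview_year": r["year"],
                        "slot": s, "type": ctype, "change_year": cyear})
    return out


panel = [{"momid": 1, "year": 2002,
          "change1type": "Separated", "changey1_": 2001,
          "change2type": "Divorced", "changey2_": 2001},
         {"momid": 1, "year": 2003}]
stacked = stack_changes(panel)
for rec in stacked:
    print(rec)
```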
> > >
> > > My goal is to have my data look like the following:
> > >
> > > momid  year  change1type  changey1_  change2type  changey2_  change1misstype  change2misstype
> > > 1      2000  Married      2000                               Married
> > > 1      2001                                                  Separated        Divorced
> > > 1      2002  Separated    2001       Divorced     2001
> > > 1      2003
> > > 1      2004
> > > 2      1987                                                  Married
> > > 2      1988  Married      1987
> > > 2      1989
> > > 2      1990
> > > 2      1991  Divorced     1991                               Divorced         Remarried
> > > 2      1992  Remarried    1991
> > > 2      1993                                                  Divorced
> > > 2      1994
> > > 2      1995
> > > 2      1996  Divorced     1993
> > > 2      1997
> > > 2      1998                                                  Remarried
> > > 2      1999
> > > 2      2000  Remarried    1998
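Scott's approach moves whole rows; the goal table above instead keeps the original variables and adds change1misstype and change2misstype on the row for the year each change occurred. A minimal Python sketch of that variant follows, assuming the variable names from the post and a hypothetical function name fill_missing_year_types. On the full example data it would produce the misstype entries for 2000 and 2001 (momid 1) and 1987, 1993 and 1998 (momid 2), but not the 1991 and 1992 entries, which, as Scott noted, do not follow from the original data.

```python
# Sketch: record each reported change on the row for the year it occurred,
# in new change{s}misstype columns. Function name and data are hypothetical.

def fill_missing_year_types(panel, slots=(1, 2, 3)):
    """Return the panel with change{s}misstype set on (momid, change year).

    Rows for change years not already in the panel are added. If two
    reports place the same slot on the same (momid, year), the later one
    in `panel` order overwrites the earlier.
    """
    rows = {(r["momid"], r["year"]): dict(r) for r in panel}
    for r in panel:
        for s in slots:
            y = r.get(f"changey{s}_")
            if y is None:
                continue
            target = rows.setdefault((r["momid"], y),
                                     {"momid": r["momid"], "year": y})
            target[f"change{s}misstype"] = r.get(f"change{s}type", "")
    return [rows[k] for k in sorted(rows)]


panel = [{"momid": 1, "year": 2000, "change1type": "Married", "changey1_": 2000},
         {"momid": 1, "year": 2001},
         {"momid": 1, "year": 2002,
          "change1type": "Separated", "changey1_": 2001,
          "change2type": "Divorced", "changey2_": 2001},
         {"momid": 2, "year": 1988, "change1type": "Married", "changey1_": 1987},
         {"momid": 2, "year": 1993},
         {"momid": 2, "year": 1996, "change1type": "Divorced", "changey1_": 1993}]
result = fill_missing_year_types(panel)
for r in result:
    print(r["momid"], r["year"],
          r.get("change1misstype", ""), r.get("change2misstype", ""))
```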
> > >
> >
> >
> >*
> >*   For searches and help try:
> >*   http://www.stata.com/support/faqs/res/findit.html
> >*   http://www.stata.com/support/statalist/faq
> >*   http://www.ats.ucla.edu/stat/stata/
> 
> Claire M. Kamp Dush, Ph.D.
> Postdoctoral Fellow, Evolving Family Theme Project
> Cornell University
> Bronfenbrenner Life Course Center
> Bebee Hall
> Ithaca, NY  14853
> 607-255-9908
> http://www.socialsciences.cornell.edu/0407/evolv_fam_desc.html
> 



© Copyright 1996–2014 StataCorp LP