Statalist The Stata Listserver



st: RE: Merge two long datasets? and re: stopping loops


From   "White, Justin" <JWhite@yesvirginia.org>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Merge two long datasets? and re: stopping loops
Date   Thu, 24 Aug 2006 10:51:42 -0400

Yes.  You can merge on two variables at once.  See the Stata help for
merge (type "help merge"); it includes an example that uses more than one
match variable.  Remember that both data sets must be sorted by the
variables you merge on.  For instance, if you want to merge on momid and
year, make sure both data sets are sorted by momid and year before you
merge:

sort momid year
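
The sort-then-merge sequence described above can be sketched end to end
as follows.  This is only an illustration: the toy data, the income
variable, and the tempfile names are made up, and the merge line uses the
Stata 9/10 syntax that appears elsewhere in this thread.

```stata
* Hypothetical toy data: two long files keyed on momid and year.
clear
input momid year str10 type1
1 2000 "Married"
1 2002 "Separated"
2 1988 "Married"
end
sort momid year
tempfile master
save `master'

clear
input momid year income
1 2000 100
1 2002 110
2 1988 90
end
sort momid year              // the using file must be sorted on the keys
tempfile extra
save `extra'

use `master', clear
sort momid year              // so must the master file
merge momid year using `extra'
tabulate _merge              // _merge == 3 marks rows found in both files

* Stata 11 and later also accept: merge 1:1 momid year using `extra'
* (that form does not need the data sorted beforehand).
```

Both files stay in long form throughout, so no reshape to wide is needed.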

Alternatively, you can create a single key variable from the two.  For
instance:

Assuming momid and year are both string variables:
gen prim_key = momid + year

If they are not strings, convert them to strings first:
tostring momid year, replace
gen prim_key = momid + year
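
As a rough sketch of the combined-key approach, the lines below build
the key on made-up data.  The "_" separator is an addition that is not
in the suggestion above; it is there only so the two pieces stay
readable and unambiguous if their lengths vary.  isid then checks that
the key identifies each observation uniquely.

```stata
* Hypothetical toy data with numeric momid and year.
clear
input momid year
1 2000
1 2001
12 1988
end

tostring momid year, replace             // numeric -> string, in place
generate prim_key = momid + "_" + year   // "_" separator is an assumption
isid prim_key                            // errors out if the key is not unique
list, noobs
```

Note that tostring with replace changes the storage type of momid and
year, so keep a numeric copy first if you still need to sort or compute
on them as numbers.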


Hope this helps.


Justin White

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Claire M.
Kamp Dush
Sent: Thursday, August 24, 2006 10:43 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Merge two long datasets? and re: stopping loops

Thanks Scott for the tip.  I did make a mistake in copying my data, so 
thanks for pointing that out.  I have one follow-up and one new
question:

First, there is the continue command for breaking out of loops.  I just
found it in the Stata 9 Programming manual.  So, anyone who is trying to
figure that out might want to check out that manual [P] under continue.
I wish I had found it earlier.
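
For anyone looking for the pattern, here is a minimal sketch of leaving
a loop early with continue.  The loop bounds and the stopping condition
are made up for illustration.

```stata
* Stop a forvalues loop once a condition is met.
local n = 0
forvalues i = 1/10 {
    if `i' > 3 {
        continue, break      // exit the loop entirely
    }
    local n = `n' + 1
    display "iteration `i'"
}
display "completed `n' iterations"
```

Without the break option, continue skips only the rest of the current
iteration and moves on to the next one.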

Second, before I found that command, I did as you advised, and managed
to merge the data together beautifully.  However, this poses another
question:

Is it possible to merge on two variables?  That is, can I merge two
datafiles by momid AND by year at the same time?  Or is it always
necessary to convert both datasets back to wide form, then merge, then
reconvert the new dataset to long?  This is what I did.  I have done
some digging to try to figure out how to merge long datasets, and I have
always come up short.

Claire



At 03:27 PM 8/23/2006, smerryman@kc.rr.com wrote:
>I didn't read through all your code, but perhaps, using -merge- can
>accomplish your goal. See the example below.  Also it is not clear
>how, starting with your initial data set, you get the 1991 Divorced
>and 1992 Remarried for momid 2 in your final data set.
>
>Scott
>
>
>. l , noobs sepby(mom)
>
>   +---------------------------------------------------+
>   | momid   year       type1     y1      type2     y2 |
>   |---------------------------------------------------|
>   |     1   2000     Married   2000                 . |
>   |     1   2001                  .                 . |
>   |     1   2002   Separated   2001   Divorced   2001 |
>   |     1   2003                  .                 . |
>   |     1   2004                  .                 . |
>   |---------------------------------------------------|
>   |     2   1988     Married   1987                 . |
>   |     2   1989                  .                 . |
>   |     2   1990                  .                 . |
>   |     2   1991                  .                 . |
>   |     2   1992                  .                 . |
>   |     2   1993                  .                 . |
>   |     2   1994                  .                 . |
>   |     2   1995                  .                 . |
>   |     2   1996    Divorced   1993                 . |
>   |     2   1997                  .                 . |
>   |     2   1998                  .                 . |
>   |     2   1999                  .                 . |
>   |     2   2000   Remarried   1998                 . |
>   +---------------------------------------------------+
>
>. drop year
>
>. rename y1 year
>
>. sort mom year
>
>. merge mom year using "C:\Documents and
>Settings\scott.merryman\Desktop\foo.dta"
>variables momid year do not uniquely identify observations in the
>master data
>
>. drop if year ==.
>(13 observations deleted)
>
>. drop _m
>
>. sort mom year
>
>. order mom year type1 y1 type2 y2
>
>. l, noob sepby(mom)
>
>   +---------------------------------------------------+
>   | momid   year       type1     y1      type2     y2 |
>   |---------------------------------------------------|
>   |     1   2000     Married   2000                 . |
>   |     1   2001   Separated      .   Divorced   2001 |
>   |     1   2002   Separated   2001   Divorced   2001 |
>   |     1   2003                  .                 . |
>   |     1   2004                  .                 . |
>   |---------------------------------------------------|
>   |     2   1987     Married      .                 . |
>   |     2   1988     Married   1987                 . |
>   |     2   1989                  .                 . |
>   |     2   1990                  .                 . |
>   |     2   1991                  .                 . |
>   |     2   1992                  .                 . |
>   |     2   1993    Divorced      .                 . |
>   |     2   1994                  .                 . |
>   |     2   1995                  .                 . |
>   |     2   1996    Divorced   1993                 . |
>   |     2   1997                  .                 . |
>   |     2   1998   Remarried      .                 . |
>   |     2   1999                  .                 . |
>   |     2   2000   Remarried   1998                 . |
>   +---------------------------------------------------+
>
>
>
>----- Original Message -----
>From: "Claire M. Kamp Dush" <cmk54@cornell.edu>
>Date: Wednesday, August 23, 2006 12:52 pm
>Subject: st: programming: stopping loops?
>To: statalist@hsphsun2.harvard.edu
>
> > Hello, I feel embarrassed to post this because I am sure the answer to
> > this is obvious, but I have been puzzling over this issue for a few
> > hours.  I am trying to recode the family structure data in the NLSY 79
> > through 2004.  I am trying to go back and recode the data for missing
> > years based on reports of marital changes between interviews at
> > follow-ups.  For instance, if an individual was interviewed in 1991 and
> > not in 1992, in 1993 they are asked to report up to 3 marital changes
> > since the last time they were interviewed.  My data is stacked, with
> > each individual having 26 lines of data, for years 1979 through 2004.
> > The id variable is momid and the year variable is year.  change1type,
> > change2type, and change3type are measured each year where the
> > respondent has data, and each is a categorical variable with categories
> > including married, divorced, separated, widowed, etc.  changey1_,
> > changey2_, and changey3_ are the years in which each change is said to
> > occur.  Here is an example of what the data look like:
> >
> > momid  year  change1type  changey1_  change2type  changey2_
> > 1      2000  Married      2000
> > 1      2001
> > 1      2002  Separated    2001       Divorced     2001
> > 1      2003
> > 1      2004
> > 2      1988  Married      1987
> > 2      1989
> > 2      1990
> > 2      1991
> > 2      1992
> > 2      1993
> > 2      1994
> > 2      1995
> > 2      1996  Divorced     1993
> > 2      1997
> > 2      1998
> > 2      1999
> > 2      2000  Remarried    1998
> >
> > My goal is to have my data look like the following:
> >
> > momid  year  change1type  changey1_  change2type  changey2_  change1misstype  change2misstype
> > 1      2000  Married      2000                               Married
> > 1      2001                                                  Separated        Divorced
> > 1      2002  Separated    2001       Divorced     2001
> > 1      2003
> > 1      2004
> > 2      1987                                                  Married
> > 2      1988  Married      1987
> > 2      1989
> > 2      1990
> > 2      1991  Divorced     1991                               Divorced         Remarried
> > 2      1992  Remarried    1991
> > 2      1993                                                  Divorced
> > 2      1994
> > 2      1995
> > 2      1996  Divorced     1993
> > 2      1997
> > 2      1998                                                  Remarried
> > 2      1999
> > 2      2000  Remarried    1998
> >
>
>
>*
>*   For searches and help try:
>*   http://www.stata.com/support/faqs/res/findit.html
>*   http://www.stata.com/support/statalist/faq
>*   http://www.ats.ucla.edu/stat/stata/

Claire M. Kamp Dush, Ph.D.
Postdoctoral Fellow, Evolving Family Theme Project
Cornell University
Bronfenbrenner Life Course Center
Bebee Hall
Ithaca, NY  14853
607-255-9908
http://www.socialsciences.cornell.edu/0407/evolv_fam_desc.html



