



st: Re: Can I use Many to Many merge for this case


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: Can I use Many to Many merge for this case
Date   Mon, 27 Aug 2012 08:21:20 +0900

Nan Z wrote:

I know that the Stata manual suggests avoiding many-to-many merges. I would like to know what I should do in the following situation. Any suggestion is appreciated.

My research question requires to determine the main activity of an individual in a month. For example, if a person spends more than half a month working, he will be in the status of Job; if he spends more time on training, he will be in the status of training.

I have two datasets, training and job, as follows. As you can see, each individual has more than one observation, which implies that each person has at least one job or participates in training at least once during the survey period. tid/jid gives the training id or job id; t_sta/end is the start/end time of training, and j_sta/end is the corresponding pair for jobs.

My question is whether I can use a many-to-many merge, or whether there is a better way to do it. Thanks for any suggestions.

[sample data omitted]

--------------------------------------------------------------------------------

No, you should not do a many-to-many merge.  

Your research question tells you the variables on which to merge your datasets:
"My research question requires to determine the main activity of an individual
in a month."  It looks like you're comparing working and training times on a
month-by-month basis throughout the calendar year.  So, your uniquely
identifying variables are individual ("Id"), year (. . ., 2010, 2011, . . .) and
month (1, 2, . . ., 11, 12).  
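As a small illustration of where those keys come from: if a dataset has one row per person per day stored as a Stata daily date, the functions year() and month() give the year and month keys, and -isid- confirms that Id, year and month uniquely identify observations once the data are reduced to one row per person-month. This is only a sketch; the variable names sday, day and ndays and all the values are hypothetical, invented for illustration.

```stata
* Hypothetical daily-level rows (invented values, for illustration only).
clear
input Id str9 sday
1 "15jan2010"
1 "16jan2010"
1 "03feb2010"
2 "28dec2011"
end

* Convert the string to a Stata daily date, then derive the merge keys.
generate day = date(sday, "DMY")
format day %td
generate year  = year(day)
generate month = month(day)

* Reduce to one row per person-month; -contract- stores the number of
* rows collapsed (here, days) in the new variable ndays.
contract Id year month, freq(ndays)

* -isid- stops with an error unless Id year month uniquely identify rows.
isid Id year month
list
```

If -isid- fails at this point, a 1:1 merge on these keys would fail too, which is a cheap check to run on each dataset before merging.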

One approach is to (1) compute the number of days spent training for each person
for each calendar year and month from your Training dataset, (2) -save- this
dataset containing only variables for Id, calendar year and month, and
days-spent-training, (3) compute the number of days spent working for each
person for each calendar year and month from your Job dataset, (4) merge this
latter dataset (in memory) with the saved training-time dataset using

merge 1:1 Id year month using TrainingTimes

and finally (5) compare the days-spent-training to days-spent-working variables
to determine the main activity for each month for each person.
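A minimal sketch of steps (1) through (5) follows. It rests on assumptions the thread does not state: the start and end variables are Stata daily dates (named t_sta/t_end and j_sta/j_end here), a spell includes both its start and end days, and days covered by two overlapping spells of the same kind should be counted once. The toy spells and the names days_train, days_job and status are hypothetical. Spells are expanded to one row per day with -expand-, counted per person-month with -collapse-, and the two summaries are then joined with the 1:1 merge given above.

```stata
* ---- Step (1): days spent training per person-month ----
* Hypothetical toy spells; string dates are converted to daily dates.
clear
input Id tid str9 t_sta_s str9 t_end_s
1 1 "20jan2010" "05feb2010"
2 1 "01mar2010" "10mar2010"
end
generate t_sta = date(t_sta_s, "DMY")
generate t_end = date(t_end_s, "DMY")
* one observation per person per training day (start and end inclusive)
expand t_end - t_sta + 1
bysort Id tid: generate day = t_sta + _n - 1
* count a day once even if two training spells overlap
duplicates drop Id day, force
generate year  = year(day)
generate month = month(day)
collapse (count) days_train = day, by(Id year month)

* ---- Step (2): save only Id, year, month and the day count ----
save TrainingTimes, replace

* ---- Step (3): days spent working per person-month ----
clear
input Id jid str9 j_sta_s str9 j_end_s
1 1 "01feb2010" "28feb2010"
3 1 "15mar2010" "31mar2010"
end
generate j_sta = date(j_sta_s, "DMY")
generate j_end = date(j_end_s, "DMY")
expand j_end - j_sta + 1
bysort Id jid: generate day = j_sta + _n - 1
duplicates drop Id day, force
generate year  = year(day)
generate month = month(day)
collapse (count) days_job = day, by(Id year month)

* ---- Step (4): 1:1 merge; -merge- stops with an error if
* Id year month do not uniquely identify rows in either file ----
merge 1:1 Id year month using TrainingTimes
* a month present in only one file had zero days of the other activity
replace days_train = 0 if missing(days_train)
replace days_job   = 0 if missing(days_job)
drop _merge

* ---- Step (5): main activity per person-month ----
* every row has at least one positive count, so equal counts are a tie
generate str8 status = "tie"
replace status = "job"      if days_job > days_train
replace status = "training" if days_train > days_job
sort Id year month
list Id year month days_job days_train status, sepby(Id)
```

Expanding to one row per day is easy to check by hand but can produce a large dataset when spells are long and people are many. Labeling equal counts as "tie" is a choice made for this sketch; the original question does not say how such months should be classified.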

Nick Cox wrote an article on spells in time that might be helpful to you here.
It appeared in _The Stata Journal_ (
http://www.stata-journal.com/article.html?article=dm0029 ) some years ago, and
is now available without charge.

Joseph Coveney




