Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Joseph Coveney" <jcoveney@bigplanet.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Re: Can I use Many to Many merge for this case
Date: Mon, 27 Aug 2012 08:21:20 +0900

Nan Z wrote:

I know that STATA manual suggest avoiding many to many merge. I would like to know for the following situation what I should do. Any suggestion is appreciated.

My research question requires to determine the main activity of an individual in a month. For example, if one person spends more than half a month in working, he will be in the status of Job. Or he will be in the status of training if he takes more time on training.

I have two datasets-- training and job as following. As you can see that each individual has more than one observation. This implies that they have at least one job or participate in training at least once in the survey period. The tid/jid tells us the training id or job id; t_sta/end is the start/ending time of training and j_sta/end correspond to jobs.

My question is whether I can use many to many merge. Or there is other better way to do it. Thanks for any suggestions.

[sample data omitted]

--------------------------------------------------------------------------------

No, you should not do a many-to-many merge.

Your research question tells you the variables on which to merge your datasets: "My research question requires to determine the main activity of an individual in a month." It looks like you're comparing working and training times on a month-by-month basis throughout the calendar year. So, your uniquely identifying variables are individual ("Id"), year (. . ., 2010, 2011, . . .) and month (1, 2, . . ., 11, 12).
One approach is to

(1) compute the number of days spent training for each person for each calendar year and month from your Training dataset,
(2) -save- this dataset containing only variables for Id, calendar year and month, and days-spent-training,
(3) compute the number of days spent working for each person for each calendar year and month from your Job dataset,
(4) merge this latter dataset (in memory) with the saved training-time dataset using

    merge 1:1 Id year month using TrainingTimes

and finally
(5) compare the days-spent-training to days-spent-working variables to determine the main activity for each month for each person.

Nick Cox wrote an article on spells in time that might be helpful to you here. It appeared in _The Stata Journal_ ( http://www.stata-journal.com/article.html?article=dm0029 ) some years ago, and is now available without charge.

Joseph Coveney
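
The five steps in the reply above can also be sketched outside Stata. The following is a minimal illustration in plain Python, not the Stata code the reply has in mind. It makes several assumptions that the thread does not state: spells are given as (Id, start date, end date) tuples, both the start and end day count, overlapping spells of the same kind are counted once per day, and a month with equal day counts is labelled "Tie". The function names `days_per_month` and `main_activity` are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def days_per_month(spells):
    """Count days per (Id, year, month) from (Id, start, end) spells.

    Assumptions: start and end days are both included, and a day covered
    by two overlapping spells of the same person is counted only once.
    """
    covered = set()
    for person_id, start, end in spells:
        day = start
        while day <= end:
            covered.add((person_id, day))
            day += timedelta(days=1)

    totals = defaultdict(int)
    for person_id, day in covered:
        totals[(person_id, day.year, day.month)] += 1
    return dict(totals)


def main_activity(training_spells, job_spells):
    """Return {(Id, year, month): (training_days, job_days, label)}."""
    training_days = days_per_month(training_spells)  # steps (1) and (2)
    job_days = days_per_month(job_spells)            # step (3)

    result = {}
    # Step (4): each key is unique in both tables, so this is a 1:1 match
    # on Id, year and month; a key missing from one side counts as 0 days.
    for key in sorted(set(training_days) | set(job_days)):
        t = training_days.get(key, 0)
        j = job_days.get(key, 0)
        # Step (5): compare the two day counts.
        if j > t:
            label = "Job"
        elif t > j:
            label = "Training"
        else:
            label = "Tie"  # assumption: the thread does not define ties
        result[key] = (t, j, label)
    return result


if __name__ == "__main__":
    # Hypothetical example data, not taken from the original post.
    training = [(1, date(2010, 1, 5), date(2010, 1, 20))]
    jobs = [(1, date(2010, 1, 21), date(2010, 2, 10))]
    for key, value in main_activity(training, jobs).items():
        print(key, value)
```

Collapsing each dataset to one row per Id, year and month before matching is what makes the key unique on both sides, which is why a one-to-one match is enough and a many-to-many merge is unnecessary.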

**References**: **st: Can I use Many to Many merge for this case** *From:* Nan Z <nanz_mont@yahoo.ca>
