[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Eric G. Wruck" <ewruck@econalytics.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Merge Problem |

Date |
Thu, 4 Aug 2005 00:58:42 -0400 |

I kind of hacked away at one way to do this. I assumed what you wanted was a larger data set -- one that had, for example, id1 id2 112 5100 112 5101 112 5102 . . . . 112 5199 If this is correct, one approach would call for use of the expandcl statement. Here you have a cluster whenever you have a dash in id2. So gen cluster = 1 if strpos(id2,"-") Then you'll need to know how many observations to add/expand. Here gen expand = 10^(4 - (strpos(id2,"-") - 1)) if cluster I also save the observation number as a new variable n gen n = _n And now expandcl expand if cluster == 1, cluster(n) generate(newcl) But you're still not there since you need to create new id2 variables. Here you'll want something like this: gen id3 = substr(id2,1,strpos(id2,"-")-1) * expand if cluster and then sort id1 id2 cluster by id1 id2 cluster: gen ct = sum(cluster) - 1 replace id3 = id3 + ct if cluster == 1 replace id3 = real(id2) if cluster ~= 1 And now your id3 should be what you wanted for id2. +-------------------+ | id1 id2 id3 | |-------------------| 1. | 111 4569 4569 | 2. | 111 4574 4574 | 3. | 112 51-- 5100 | 4. | 112 51-- 5101 | 5. | 112 51-- 5102 | |-------------------| 6. | 112 51-- 5103 | 7. | 112 51-- 5104 | 8. | 112 51-- 5105 | 9. | 112 51-- 5106 | 10. | 112 51-- 5107 | |-------------------| 11. | 112 51-- 5108 | 12. | 112 51-- 5109 | 13. | 112 51-- 5110 | 14. | 112 51-- 5111 | 15. | 112 51-- 5112 | +-------------------+ Now, if you have an embedded dash, such as 3-75, then the task is somewhat more complicated. Let me know how you fare. Eric >Would someone kindly help me with this problem? > >I'm trying to join two datasets. The matching is one to many. Let's call >identifiers in the two datasets id1 and id2. I have a concordance of the >form (all variables are string): > >id1 id2 >111 4569 >111 4574 >112 51-- >112 5689 >113 411- >113 4145 >113 4310 > >The problem is where I have an entry for id2 like 51--. This means all >four digits that start with 51 should be matched to the id1 value of 112. >Likewise, 411- means all four digits starting with 411 correspond to 113. > >If I had just one entry like that, then I could go into dataset2 and >create another id variable for which all id2s starting with 51 or 411 have >the same value, and come back and make the appropriate change in the >concordance. But I have many cases like this where many id2s match with >id1. Would there be an easy way to take care of this? Perhaps some way to >automate the creation of an alternative id2 in the second dataset? > >Thank you very much for your help. > >Jason -- =================================================== Eric G. Wruck Econalytics 2535 Sherwood Road Columbus, OH 43209 ph: 614.231.5034 cell: 614.330.8846 eFax: 614.573.6639 eMail: ewruck@econalytics.com website: http://www.econalytics.com ==================================================== * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: Merge Problem***From:*Jason Hwang <jjhwang@fas.harvard.edu>

- Prev by Date:
**Re: st: using *one* (random) observation meeting an -if- criteria** - Next by Date:
**st: Competing Hazard Models with Multiple Failures** - Previous by thread:
**st: Merge Problem** - Next by thread:
**st: Competing Hazard Models with Multiple Failures** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |