Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Identify observations that appear in a list |
Date | Thu, 13 Mar 2014 11:50:26 +0000 |
This is an FAQ, at least in the sense that this is frequently asked here. One approach is just to -merge- the data with a reduced copy of itself, with the important twist that you -rename- what you want as an identifier. The slogan I use to remind myself of this trick is "-merge- is for finding intersections as well as unions" and you're welcome to pin or write it on a board near you. http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/ is also relevant. . clear . input str5 CustomerIndustry str5 SupplierIndustry Input Custome~y Supplie~y Input 1. 1000A 4000B 100 2. 1000A 3000A 200 3. 1000A 3000B 100 4. 1000B 4000B 50 5. 1000B 2000A 8 6. 4000B 3000A 19 7. 4000B 2000A 20 8. 3000A 3000B 18 9. 3000A 3000D 12 10. 2000A 1000D 25 11. end . save tostart file tostart.dta saved . bysort SupplierIndustry: keep if _n == 1 (4 observations deleted) . keep SupplierIndustry . rename SupplierIndustry CustomerIndustry . merge 1:m CustomerIndustry using tostart Result # of obs. ----------------------------------------- not matched 8 from master 3 (_merge==1) from using 5 (_merge==2) matched 5 (_merge==3) ----------------------------------------- . tab _merge _merge | Freq. Percent Cum. ------------------------+----------------------------------- master only (1) | 3 23.08 23.08 using only (2) | 5 38.46 61.54 matched (3) | 5 38.46 100.00 ------------------------+----------------------------------- Total | 13 100.00 . end of do-file . list if _merge==3 +-------------------------------------------+ | Custom~y Suppli~y Input _merge | |-------------------------------------------| 2. | 2000A 1000D 25 matched (3) | 3. | 3000A 3000B 18 matched (3) | 6. | 4000B 3000A 19 matched (3) | 12. | 3000A 3000D 12 matched (3) | 13. | 4000B 2000A 20 matched (3) | +-------------------------------------------+ Nick njcoxstata@gmail.com On 13 March 2014 02:12, R Zhang <r05zhang@gmail.com> wrote: > I have the following data set (HAVE) (only provide a few observations > as illustration). The input variable gives the dollar input sold by > supplier to customer. You will notice that customer industry 4000B, > 3000A also appear in SupplierIndustry. This indicates that some > industries can be both suppliers and customer. > > +++++++++++++++++++++++ > > HAVE > > CustomerIndustry SupplierIndustry Input > > 1000A 4000B 100 > > 1000A 3000A 200 > > 1000A 3000B 100 > > 1000B 4000B 50 > > 1000B 2000A 8 > > 4000B 3000A 19 > > 4000B 2000A 20 > > 3000A 3000B 18 > > 3000A 3000D 12 > > 2000A 1000D 25 > > +++++++++++++++++++++++ > > I want to create a dataset that list all customer industries that are > also supplier industry, i.e., my output shall appear as : > > CustomerIndustry SupplierIndustry Input > > 4000B 3000A 19 > > 4000B 2000A 20 > > 3000A 3000B 18 > > 3000A 3000D 12 > > 2000A 1000D 25 > > I am asking for your help on coding this. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/