Nick Cox <n.j.cox@durham.ac.uk>

statalist@hsphsun2.harvard.edu

RE: st: recognizing patterns within two columns of data

Thu, 7 Jul 2011 11:11:08 +0100

The advice here sounds an appropriate caution, but much bigger problems with this solution are not mentioned.

Note that -vallist- (SSC) doesn't do anything here that -levelsof- (official Stata) does not do. In fact, there is much more engineering behind -levelsof-, which is just -vallist- made official, and it is much more tested for larger sets of values. (The main reasons for -vallist- to continue to be visible have nothing to do with anything used here.)

Further, commands like

local temp1=r(list)

will just truncate their arguments at 244 characters, so this code won't work for any serious dataset. Fixing this by something like

local temp1 `r(list)'

would remove that problem.

The sticking-point for this solution then becomes the same kind of problem in another guise, namely an assumption that a list of holders can be held within a string variable, which cannot be more than 244 characters long. Without knowing anything about Dalhia's real data, my guess is that such an assumption may bite, so watch out.

Nick
n.j.cox@durham.ac.uk

P.S. On a matter of style, note that Subrata's code

egen group_hold=group(hold_list)
tostring group_hold, replace
vallist group_hold
local temp3=r(list)
foreach x of local temp3{
    vallist company if group_hold=="`x'"
    local temp4=r(list)
    replace comp_list="`temp4'" if group_hold=="`x'"
}

incorporates some needless to-and-fro, turning a well-behaved integer variable into a string and then calling up -vallist- when the answer is predictable in advance:

egen group_hold=group(hold_list)
su group_hold, meanonly
forval x = 1/`r(max)' {
    vallist company if group_hold==`x'
    replace comp_list="`r(list)'" if group_hold==`x'
}

should have the same effect. However, this is just tinkering, as the larger problems mentioned above still remain.
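The two fixes suggested above (-levelsof- in place of -vallist-, and avoiding the truncating copy made by -local temp1=r(list)-) can be sketched together as follows. This is only a minimal sketch, not tested on Dalhia's data. It assumes that string variables company and holder already exist, as in Subrata's code, and that no name contains embedded spaces. The macro names companies and holders are made up for illustration.

```stata
* Sketch only. Assumes string variables company and holder exist
* and that no company or holder name contains spaces.
gen hold_list = ""

* -levelsof- puts the distinct values directly into a local macro,
* so no "local ... = r(list)" copy (and no truncation by it) is needed.
* The clean option drops the compound quotes around string values.
levelsof company, local(companies) clean

foreach c of local companies {
    levelsof holder if company == "`c'", local(holders) clean
    * Still limited by the maximum length of a string variable:
    * a long list of holders will be cut off here.
    replace hold_list = "`holders'" if company == "`c'"
}
```

Note that this only removes the macro truncation. The second problem, that a list of holders has to fit inside a string variable, is untouched by this sketch.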
SUBRATA BHATTACHARYYA

You might want to try this (though you would need the package vallist for this; please use -findit- to locate and install it).

I stored the data (you provided) in a variable named comp_hold and then split them into company and holder. Then I used vallist to identify distinct observations and used that in a macro to get this output:

     +-----------------------------------+
     | hold_list          comp_list      |
     |-----------------------------------|
  1. | holderA holderB    compA compB    |
  2. | holderB            compC          |
     +-----------------------------------+

I hope this works. This is what I wrote:

split comp_hold
ren comp_hold1 company
ren comp_hold2 holder
sort company holder
gen hold_list=""
gen comp_list=""
vallist company
local temp1=r(list)
foreach x of local temp1{
    vallist holder if company=="`x'"
    local temp2=r(list)
    replace hold_list="`temp2'" if company=="`x'"
}
egen group_hold=group(hold_list)
tostring group_hold, replace
vallist group_hold
local temp3=r(list)
foreach x of local temp3{
    vallist company if group_hold=="`x'"
    local temp4=r(list)
    replace comp_list="`temp4'" if group_hold=="`x'"
}
duplicates drop hold_list, force
list hold_list comp_list

I hope this works for you. FYI, I used Stata 11.2.

Just one small piece of advice: please be sure that vallist can capture all the company names or holder names in one go. I am not sure whether it can return a full list of the names if your data set is too large. In that case, you might want to split your file into manageable pieces.

On Thu, Jul 7, 2011 at 11:37 AM, Dalhia <ggs_da@yahoo.com> wrote:

> Hello,
> Thanks. But egen group won't work since the holders are not the same.
> CompA and B (which I want grouped together) are owned by holderA and
> by holderB. The link is that these two companies are owned by people
> who also own shares in the other company - holderA owns shares in
> compA and also compB; similarly holderB owns shares in compA and also
> in compB. I want to identify those companies that are linked by
> multiple common owners.
>
> Example:
> compA holderA
> compB holderA
> compA holderB
> compB holderB
> compC holderB
>
> What I want:
> compA group1
> compB group1
>
> Thanks for your help.

--- On Wed, 7/6/11, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> From: Nick Cox <n.j.cox@durham.ac.uk>
> Subject: RE: st: recognizing patterns within two columns of data
> To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
> Date: Wednesday, July 6, 2011, 7:50 PM
> -egen, group()- ?
>
> Nick
> n.j.cox@durham.ac.uk
>
>
> Austin Nichols
>
> Do you want to make an identifier as in
> http://www.stata.com/statalist/archive/2011-07/msg00170.html
> ?
>
> On Wed, Jul 6, 2011 at 10:12 AM, Dalhia <ggs_da@yahoo.com>
> wrote:
>
> > I would like some advice on how to do the following. Here is how
> > the data looks:
> >
> > compA holderA
> > compB holderA
> > compC holderL
> > compD holderH
> > compA holderB
> > compB holderB
> > compC holderB
> >
> > Above, there was more than one instance where compA and compB had
> > the same holder. In a large database, how do I identify instances
> > where a set of comps appear repeatedly with the same holders?

