Thank you very much, Nick! The explanation is very helpful, since I now understand more about how to program when there are no ready-to-use solutions out there.
(The joke is well taken; I suppose my name really calls for it)
Regards,
Socrates
In message <031173627889364697C50B3B266CBB8A01C07C34@GEOGMAIL.geog.ad.dur.ac.uk> statalist@hsphsun2.harvard.edu writes:
> Socrates asks some tough questions, as Plato also said.
> (Sorry, couldn't resist.)
>
> -fndmtch2- is mine and on SSC and dates from 2000. Its ugly name
> reflects the fact that some Stata users were still working
> with platforms limited to 8.3 or filename.ext filenames,
> and also that there is a -fndmtch- too.
>
> Socrates found a bug. Thanks for pointing it out. I got rid
> of it by rewriting the program from scratch, almost. The
> original program used a grotesque backward logic that produces
> the right answer for examples given, but falls over
> for Socrates' example, which isn't exotic. In retrospect
> that bug is shocking, but Stata is much more powerful now than
> it was in 2000, and I have more experience, but fewer brain cells.
> The net result is still positive. Anyway, in the original
> I evidently made a hidden assumption which just isn't true
> in general.
>
> I have a -findmatch- now that produces correct answers
> in this case, and in previous ones too. I'll send it to Kit
> Baum. But more interesting, and probably more useful, is
> to talk about a direct attack on Socrates' problem
> so that he gets to see how to do it himself.
>
> The "find a match" problem here has this flavour: for
> different values of -var1-, how many values of -var2-
> are the same? They can be anywhere in the dataset,
> unless you want to slap on -if- or -in- restrictions.
>
> There is going to be a loop over the distinct values
> in my solutions. Each time round the loop I am going
> to do a -count-, and put the result into a variable
> in the right place(s). To do that I need to have a
> variable to put it in.
>
> gen long count = 0
>
> initialises a counter variable. The -long- is paranoid,
> just in case the counts get really big. Initialising
> it to missing is another good way.
>
> For toy examples, I can use -levelsof- confidently.
> In Socrates' case, -var1- and -var2- are both string,
> so let's focus on that situation.
>
> levelsof var1, local(levels)
>
> puts the distinct values into a local macro.
>
> quietly foreach l of local levels {
>     count if `"`l'"' == var2
>     replace count = r(N) if var1 == `"`l'"'
> }
>
> That's a first solution. I slapped on compound
> double quotes `" "' just in case there are double
> quotes lurking in the strings. That's paranoid too,
> but does no harm. Just because you're paranoid
> doesn't mean the data aren't trying to get you.
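
The first solution quoted above can be tried on a toy dataset. This is only a minimal sketch: the three observations are copied from the example at the end of this thread, and the names var1 and var2 are assumed to stand for the Owner and Inter columns respectively.

```stata
* Toy data: var1 plays the role of Owner, var2 the role of Inter (assumed names)
clear
input str1 var1 str1 var2
"g" "r"
"t" "r"
"r" "t"
end

* Counter variable, then one -count- per distinct value of var1
gen long count = 0
levelsof var1, local(levels)
quietly foreach l of local levels {
    count if `"`l'"' == var2
    replace count = r(N) if var1 == `"`l'"'
}

list var1 var2 count, noobs
```

Here "g" never occurs in var2, "t" occurs once and "r" occurs twice, so count should come out as 0, 1 and 2 for the three observations.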
>
> Now this pivots on both variables being string. Also,
> in an industrial-strength solution, you wouldn't want
> to rely on all the distinct values fitting into a macro,
> so -levelsof- may be set on one side. One thing we
> can always do is map the distinct values to successive
> integers:
>
> egen group = group(var1)
> su group, meanonly
> local ngroup = r(max)
>
> -egen, group()- maps the distinct values of -var1- to the
> integers 1,...,#groups; and we can retrieve #groups by a
> -summarize- and then peeking at the saved results.
> Initialise as before:
>
> gen long count = 0
>
> Another variable will come in useful, holding the
> observation numbers:
>
> gen long obs = _n
>
> qui forval i = 1/`ngroup' {
>     su obs if group == `i', meanonly
>     local first = r(min)
>     count if var1[`first'] == var2
>     replace count = r(N) if group == `i'
> }
>
> The loop uses a look-up technique. When we
> are focusing on -group == 1-, for example, how do
> we know what value of -var1- we are dealing with?
> (By construction, -var1- is constant for each
> distinct value of -group- -- we set up a one-to-one
> mapping -- but what is that constant?) Notice that
> it is not general enough to go
>
> su var1 if group == `i'
>
> and look at the saved results, because in general
> -var1- could be a string (and it is in Socrates'
> example). We have to be one step more devious.
> We just need to find the observation number for any
> observation in a particular group, and then we can
> get at the corresponding value of -var1-. That
> is where the -obs- variable comes in useful.
> There are two saved results that will work, the
> minimum or the maximum, and you can choose. (The
> mean won't work in general: consider, for example,
> a group with just two representatives, in observation
> 8 and observation 10: the mean at 9 does not
> identify a representative.)
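
The point about the minimum, maximum and mean can be checked on a small made-up dataset. This is a sketch under assumptions: the three values of var1 below are invented for illustration, and the local macro names lo, hi and mid are hypothetical.

```stata
* Invented data: the value "a" sits in observations 1 and 3, "b" in observation 2
clear
input str1 var1
"a"
"b"
"a"
end

egen group = group(var1)
gen long obs = _n

* Observation numbers for group 1, which corresponds to "a"
su obs if group == 1, meanonly
local lo  = r(min)
local hi  = r(max)
local mid = r(mean)

* The minimum and maximum both point at an "a"; the mean points at observation 2
display "min  -> obs `lo': "  var1[`lo']
display "max  -> obs `hi': "  var1[`hi']
display "mean -> obs `mid': " var1[`mid']
```

In this layout the minimum is 1 and the maximum is 3, both holding "a", while the mean is 2, which holds "b" and so belongs to the other group.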
>
> So here is some code for Socrates' example:
>
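> * Variable names are case-sensitive: type Owner and Inter
> * instead if that is how they appear in the dataset.
> egen group = group(owner)
> su group, meanonly
> local ngroup = r(max)
> gen long match = 0
> gen long obs = _n
> qui forval i = 1/`ngroup' {
>     * look up one observation in the current group, then count its matches
>     su obs if group == `i', meanonly
>     local first = r(min)
>     count if owner[`first'] == inter
>     replace match = r(N) if group == `i'
> }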
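
As a check, the same steps can be run on the three observations from the original question. This is a minimal sketch that assumes lower-case variable names firms, inter and owner and a fresh dataset in which match does not yet exist.

```stata
* The three observations from the original question (lower-case names assumed)
clear
input str1 firms str1 inter str1 owner
"c" "r" "g"
"c" "r" "t"
"b" "t" "r"
end

* Map distinct owners to 1,...,#groups and loop over the groups
egen group = group(owner)
su group, meanonly
local ngroup = r(max)
gen long match = 0
gen long obs = _n
qui forval i = 1/`ngroup' {
    su obs if group == `i', meanonly
    local first = r(min)
    count if owner[`first'] == inter
    replace match = r(N) if group == `i'
}

list firms inter owner match, noobs
```

With these data match should be 0, 1 and 2, so the third observation gets the 2 that was expected in the original question.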
>
> Nick
> n.j.cox@durham.ac.uk
>
> Socrates Mokkas
>
> > I seem to have a problem with the command fndmtch2.
> > My data is a huge sample of companies. They have the form of:
> >
> > Firms  Inter  Owner  match
> > c      r      g      0
> > c      r      t      1
> > b      t      r      1
> >
> > I want to find whether companies that are "Owners" are also
> > included in the category of "Inter". I run the command
> > fndmtch2, which gives me the variable "match".
> > The command I run is:
> > fndmtch2 Owner Inter, generate(match3) count miss
> >
> > What I do not understand is why match is not 2 for the
> > 3rd observation, since the element "r" occurs twice
> > (1st and 2nd observations) in the "Inter" variable. Thank you
> > very much!
>
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>