Statalist The Stata Listserver



Re: st: RE: Question about fndmtch2


From   Socrates Mokkas <socrates.mokkas@economics.oxford.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Question about fndmtch2
Date   Fri, 03 Nov 2006 12:51:10 +0000

Thank you very much, Nick! The explanation is very helpful, since I now understand more about programming when there are no ready-2-use solutions out there.
(The joke is well taken; I suppose my name really calls for it)

Regards,
Socrates



In message <031173627889364697C50B3B266CBB8A01C07C34@GEOGMAIL.geog.ad.dur.ac.uk> statalist@hsphsun2.harvard.edu writes:
> Socrates asks some tough questions, as Plato also said. 
> (Sorry, couldn't resist.) 
> 
> -fndmtch2- is mine and on SSC and dates from 2000. Its ugly name 
> reflects the fact that some Stata users were still working 
> with platforms limited to 8.3 or filename.ext filenames, 
> and also that there is a -fndmtch- too. 
> 
> Socrates found a bug. Thanks for pointing it out. I got rid 
> of it by rewriting the program from scratch, almost. The 
> original program used a grotesque backward logic that produces 
> the right answer for examples given, but falls over 
> for Socrates' example, which isn't exotic. In retrospect
> that bug is shocking, but Stata is much more powerful now than
> it was in 2000, and I have more experience, but fewer brain cells.  
> The net result is still positive. Anyway, in the original 
> I evidently made a hidden assumption which just isn't true 
> in general. 
> 
> I have a -findmatch- now that produces correct answers
> in this case, and in previous ones too. I'll send it to Kit
> Baum. But more interesting, and probably more useful, is 
> to talk about a direct attack on Socrates' problem 
> so that he gets to see how to do it himself. 
> 
> The "find a match" problem here has this flavour: for 
> each distinct value of -var1-, how many observations have 
> that same value in -var2-? The matches can be anywhere in the 
> dataset, unless you want to slap on -if- or -in- restrictions. 
> 
> There is going to be a loop over the distinct values
> in my solutions. Each time round the loop I am going
> to do a -count-, and put the result into a variable
> in the right place(s). To do that I need to have a 
> variable to put it in. 
> 
> gen long count = 0 
> 
> initialises a counter variable. The -long- is paranoid, 
> just in case the counts get really big. Initialising 
> it to missing is another good way. 
> 
> For toy examples, I can use -levelsof- confidently. 
> In Socrates' case, -var1- and -var2- are both string, 
> so let's focus on that situation. 
> 
> levelsof var1, local(levels) 
> 
> puts the distinct values into a local macro. 
> 
> quietly foreach l of local levels { 
> 	count if `"`l'"' == var2 
> 	replace count = r(N) if var1 == `"`l'"' 
> } 
> 
> That's a first solution. I slapped on compound
> double quotes `" "' just in case there are double 
> quotes lurking in the strings. That's paranoid too, 
> but does no harm. Just because you're paranoid
> doesn't mean the data aren't trying to get you. 
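> 
> As a minimal sketch, here is that first solution run on a toy 
> dataset. The five observations are made up for illustration 
> (they are not Socrates' data); the variable names follow the 
> text above. 
> 
> ```stata
> * Hypothetical toy data: two string variables
> clear
> input str1 var1 str1 var2
> "a" "b"
> "b" "a"
> "c" "a"
> "a" "c"
> "d" "a"
> end
> 
> * Counter variable, then loop over the distinct values of var1
> gen long count = 0
> levelsof var1, local(levels)
> quietly foreach l of local levels {
> 	* how many observations have this value in var2?
> 	count if `"`l'"' == var2
> 	* store the result wherever var1 takes this value
> 	replace count = r(N) if var1 == `"`l'"'
> }
> 
> * "a" occurs 3 times in var2, "b" and "c" once each, "d" never,
> * so count is 3, 1, 1, 3, 0 for observations 1 to 5
> list, noobs
> ```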
> 
> Now this pivots on both variables being string. Also, 
> in an industrial-strength solution, you wouldn't want
> to rely on all the distinct values fitting into a macro, 
> so -levelsof- may be set on one side. One thing we 
> can always do is map the distinct values to successive
> integers: 
> 
> egen group = group(var1) 
> su group, meanonly 
> local ngroup = r(max) 
> 
> -egen, group()- maps the distinct values of -var1- to the 
> integers 1,...,#groups; and we can retrieve #groups by a 
> -summarize- and then peeking at the saved results. 
> Initialise as before: 
> 
> gen long count = 0 
> 
> Another variable will come in useful, holding the 
> observation numbers: 
> 
> gen long obs = _n 
> 
> qui forval i = 1/`ngroup' { 
> 	su obs if group == `i', meanonly 
> 	local first = r(min) 
> 	count if var1[`first'] == var2 
> 	replace count = r(N) if group == `i' 
> } 
> 
> The loop uses a look-up technique. When we 
> are focusing on -group == 1-, for example, how 
> do we know what value of -var1- we are dealing with? 
> (By construction, -var1- is constant for each 
> distinct value of -group- -- we set up a one-to-one
> mapping -- but what is that constant?) Notice that 
> it is not general enough to go 
> 
> 	su var1 if group == `i' 
> 
> and look at the saved results, because in general
> -var1- could be a string (and it is in Socrates' 
> example). We have to be one step more devious. 
> We just need to find the observation number for any 
> observation in a particular group, and then we can 
> get at the corresponding value of -var1-. That 
> is where the -obs- variable comes in useful. 
> There are two saved results that will work, the
> minimum or the maximum, and you can choose. (The 
> mean won't work in general: consider, for example, 
> a group with just two representatives, in observation
> 8 and observation 10: the mean at 9 does not 
> identify a representative.) 
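> 
> The look-up can be sketched on a small made-up dataset that 
> mirrors the case just described: one group with representatives 
> in observations 8 and 10 only. The values "x" and "y" are 
> hypothetical, chosen only for illustration. 
> 
> ```stata
> * Hypothetical data: "x" sits in observations 8 and 10, "y" elsewhere
> clear
> set obs 10
> gen str1 var1 = "y"
> replace var1 = "x" in 8
> replace var1 = "x" in 10
> 
> egen group = group(var1)    // "x" maps to 1, "y" to 2
> gen long obs = _n
> 
> * Observation numbers occupied by group 1
> su obs if group == 1, meanonly
> local first = r(min)        // 8
> local last  = r(max)        // 10
> local mid   = r(mean)       // 9
> 
> display var1[`first']       // x: a valid representative
> display var1[`last']        // x: also valid
> display var1[`mid']         // y: observation 9 is not in group 1
> ```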
> 
> So here is some code for Socrates' example: 
> 
> * Variable names are case-sensitive in Stata; capitalised here
> * to match the names in Socrates' -fndmtch2- call
> egen group = group(Owner) 
> su group, meanonly 
> local ngroup = r(max) 
> gen long match = 0 
> gen long obs = _n 
> qui forval i = 1/`ngroup' { 
> 	su obs if group == `i', meanonly 
> 	local first = r(min) 
> 	count if Owner[`first'] == Inter  
> 	replace match = r(N) if group == `i' 
> } 
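> 
> As a check, here is a self-contained sketch that runs this 
> approach on the three observations Socrates posted. It assumes 
> string variables named with the capitalisation used in his 
> -fndmtch2- call (Owner, Inter); adjust the names if yours differ. 
> 
> ```stata
> * The three observations from Socrates' post
> clear
> input str1 Firms str1 Inter str1 Owner
> "c" "r" "g"
> "c" "r" "t"
> "b" "t" "r"
> end
> 
> egen group = group(Owner)
> su group, meanonly
> local ngroup = r(max)
> gen long match = 0
> gen long obs = _n
> qui forval i = 1/`ngroup' {
> 	* find one observation in this group, then look up its Owner value
> 	su obs if group == `i', meanonly
> 	local first = r(min)
> 	count if Owner[`first'] == Inter
> 	replace match = r(N) if group == `i'
> }
> 
> * "g" never occurs in Inter, "t" occurs once, "r" twice,
> * so match is 0, 1, 2 and the third observation gets the expected 2
> list, noobs
> ```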
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> Socrates Mokkas
>  
> > I seem to have a problem with the command fndmtch2.
> > My data is a huge sample of companies. They have the form of:
> > 
> > Firms	Inter	Owner	match
> > c	r	g	0
> > c	r	t	1
> > b	t	r	1
> > 
> > I want to find whether companies that are "Owners" are also 
> > included in the category of "Inter". I run the command 
> > fndmtch2, which gives me the variable "match".
> > The command I run is:
> > fndmtch2  Owner Inter, generate(match3) count miss
> > 
> > What I do not understand is why match is not 2 for the 3rd 
> > observation, since the element "r" occurs twice (1st and 2nd 
> > observation) in the "Inter" variable. Thank you very much!
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


