Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Robert Picard <picard@netbox.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Re: how to search every observation of one variable in another variable |
Date | Thu, 13 Jun 2013 12:02:04 -0400 |
There are two issues that are not illustrated in the OP's sample data.
First, each patent may cite more than one patent. Also, there are
patents that are owned jointly. Depending on the structure of the real
data, this could lead to multiple records per patent identifier. This
complicates the problem of identifying if a patent is citing a patent
from the same owner. Here's a solution that uses -joinby- instead of
-merge- to match the cited patents to their owner(s):

* -------------- begin example ----------------------------
clear
input cited patent str1 owner
10 20 a
11 20 a
11 21 a
11 21 b
11 21 d
21 22 a
20 23 a
20 24 b
24 25 b
25 26 b
1 27 c
3 28 c
5 29 c
end

* a patent may cite more than one other patent. a patent
* may also have joint ownership
sort patent owner cited
by patent owner: gen N = _N
list, sepby(patent) noobs

* make a database of patent ownership. adjust variable
* names to merge back with the cited identifiers.
preserve
keep patent owner
sort patent owner
by patent owner: keep if _n == 1
list , sepby(patent)
rename (patent owner) (cited cowner)
tempfile cowners
qui save "`cowners'"
restore

* since we have multiple owners per patent, we want
* to avoid a m:m merge; use -joinby- instead to form
* all pairwise combinations
joinby cited using "`cowners'", unmatched(master)
drop _merge

* flag all observations where the owner of the cited patent
* is the same as the owner of the citing patent
gen indicator = cowner == owner

* if the cited patent has more than one owner, then
* note the frequency of the owner match(es). reduce to one
* observation to recover the original observation count.
sort patent owner cited indicator
by patent owner cited: gen ind_freq = sum(indicator) / _N
by patent owner cited: keep if _n == _N
list, sepby(patent) noobs
* -------------- end example ------------------------------

On Thu, Jun 13, 2013 at 8:16 AM, Joseph Coveney <stajc2@gmail.com> wrote:
> ibrahim bostan wrote:
>
> the code you gave did not work because citing patent no is not unique
> identifier, can it be fixed?
>
> it gave this error;
> "variable citing_pt_no does not uniquely identify observations in the
> using data"
>
>
> --------------------------------------------------------------------------------
>
> I assume that you're referring to the result of the -isid- command. Put the
> following just before it:
>
> contract citing_pt_no patent_owner
>
> and then proceed with -isid-.
>
> As Sergiy Radyakin shows later in the thread, you don't need to decompose the
> dataset into the two components and then reconstruct it. You can just
> -merge- the one back into the original dataset. Unless you're confident that
> there aren't any errors in the dataset (such as a citing patent's having one
> owner early in the dataset and by accident having another owner later on in the
> dataset), I recommend applying the constraints and checks (-contract-, -isid-,
> -merge 1:... assert()-, etc.) that I showed.
>
> Also, unless your dataset has a dummy entry of each cited patent's citing itself
> in order to assure that every cited patent has an owner, you might want to
> consider setting the indicator variable to .u (for "Unknown") for observations
> where the cited-patent owners are "Other" (my code; probably would have been
> better as "Unknown" and not "Other") or blank (Sergiy's code). That way you can
> keep track of citing patents where your dataset doesn't really allow you to say
> whether the owner owns the cited patent.
>
> Joseph Coveney
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
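
A possible way to combine Joseph's .u suggestion with the -joinby-
approach is sketched below. This is only an illustration under stated
assumptions: the four-row dataset is made up for the sketch, and it
assumes that a blank cowner after -joinby ..., unmatched(master)-
means the cited patent never appears as a citing patent, so its owner
is not known from the data.

* -------------- begin example ----------------------------
* hypothetical mini dataset, for illustration only
clear
input cited patent str1 owner
10 20 a
20 23 a
20 24 b
1 27 c
end

* ownership lookup keyed on the cited identifier
preserve
keep patent owner
duplicates drop
rename (patent owner) (cited cowner)
tempfile cowners
qui save "`cowners'"
restore

joinby cited using "`cowners'", unmatched(master)
drop _merge

* cowner is blank when the cited patent has no owner on record
* (assumption of this sketch); mark those as .u rather than 0
gen indicator = cowner == owner
replace indicator = .u if cowner == ""
list, sepby(patent) noobs
* -------------- end example ------------------------------

Here patents 20 and 27 cite patents (10 and 1) that have no owner
record, so their indicator is set to .u instead of 0. If ind_freq is
computed afterwards as in the main example, note that -sum()- treats
missing values as zero, so ind_freq would also need to be set to .u
for those observations.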