Re: st: Re: how to search every observation of one variable in another variable

From   Robert Picard
Subject   Re: st: Re: how to search every observation of one variable in another variable
Date   Thu, 13 Jun 2013 12:02:04 -0400

Two issues are not illustrated in the OP's sample data. First, a
patent may cite more than one patent. Second, some patents are owned
jointly. Depending on the structure of the real data, either can lead
to multiple records per patent identifier, which complicates the
problem of identifying whether a patent cites a patent from the same
owner. Here's a solution that uses -joinby- instead of -merge- to
match the cited patents to their owner(s):

* -------------- begin example ----------------------------
clear
input cited patent str1 owner
 10 20 a
 11 20 a
 11 21 a
 11 21 b
 11 21 d
 21 22 a
 20 23 a
 20 24 b
 24 25 b
 25 26 b
 1 27 c
 3 28 c
 5 29 c
end

* a patent may cite more than one other patent. a patent
* may also have joint ownership
sort  patent owner cited
by patent owner: gen N = _N
list, sepby(patent) noobs

* make a database of patent ownership. adjust variable
* names to merge back with the cited identifiers. -preserve-
* and -restore- keep the citation data in memory afterwards.
preserve
keep patent owner
sort patent owner
by patent owner: keep if _n == 1
list , sepby(patent)
rename (patent owner) (cited cowner)
tempfile cowners
qui save "`cowners'"
restore

* since we have multiple owners per patent, we want
* to avoid a m:m merge; use -joinby- instead to form
* all pairwise combinations
joinby cited using "`cowners'", unmatched(master)
drop _merge

* flag all observations where the owner of the cited patent
* is the same as the owner of the citing patent
gen indicator = cowner == owner

* if the cited patent has more than one owner, then
* note the frequency of the owner match(es). reduce to one
* observation to recover the original observation count.
sort patent owner cited indicator
by patent owner cited: gen ind_freq = sum(indicator) / _N
by patent owner cited: keep if _n == _N

list, sepby(patent) noobs
* -------------- end example ------------------------------
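
As a rough illustration of what -joinby- does in the step above, the
sketch below pairs two citing records with the three owners of patent
21 from the example. Citing patent 30 is made up for the illustration
and is not in the OP's data.

```stata
* ownership of cited patent 21 (three joint owners, as in the example)
clear
input cited str1 cowner
 21 a
 21 b
 21 d
end
tempfile cowners
quietly save "`cowners'"

* two citing records; patent 30 is hypothetical
clear
input cited patent str1 owner
 21 22 a
 21 30 b
end

* -joinby- forms every master x using pair within each value of cited
joinby cited using "`cowners'"
gen indicator = cowner == owner

list, sepby(patent) noobs
```

Each citing record is repeated once per owner of the cited patent, so
this should leave 2 x 3 = 6 observations, with exactly one owner match
per citing patent.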

On Thu, Jun 13, 2013 at 8:16 AM, Joseph Coveney wrote:
> ibrahim bostan wrote:
> the code you gave did not work because citing patent no is not unique
> identifier, can it be fixed?
> it gave this error;
> "variable citing_pt_no does not uniquely identify observations in the
> using data"
> --------------------------------------------------------------------------------
> I assume that you're referring to the result of the -isid- command.  Put the
> following just before it:
>     contract citing_pt_no patent_owner
> and then proceed with -isid-.
> As Sergiy Radyakin shows later in the thread, you don't need to decompose the
> dataset into the two components and then reconstruct it.  You can just
> -merge- the one back into the original dataset.  Unless you're confident that
> there aren't any errors in the dataset (such as a citing patent's having one
> owner early in the dataset and by accident having another owner later on in the
> dataset), I recommend applying the constraints and checks (-contract-, -isid-,
> -merge 1:... assert()-, etc.) that I showed.
> Also, unless your dataset has a dummy entry of each cited patent's citing itself
> in order to assure that every cited patent has an owner, you might want to
> consider setting the indicator variable to .u (for "Unknown") for observations
> where the cited-patent owners are "Other" (my code; probably would have been
> better as "Unknown" and not "Other") or blank (Sergiy's code).  That way you can
> keep track of citing patents where your dataset doesn't really allow you to say
> whether the owner owns the cited patent.
> Joseph Coveney
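
Joseph's suggestion of marking unknown owners could be grafted onto the
-joinby- approach roughly as follows. This is only a sketch on three
records from the example. It assumes that a cited patent with no match
in the ownership file (_merge == 1 after -joinby-) is one whose owner
is simply not recorded in the data.

```stata
clear
input cited patent str1 owner
 10 20 a
 20 23 a
 20 24 b
end

* ownership lookup built from the citing side of the data
preserve
keep patent owner
duplicates drop
rename (patent owner) (cited cowner)
tempfile cowners
quietly save "`cowners'"
restore

joinby cited using "`cowners'", unmatched(master)

* assumption: _merge == 1 means the cited patent never appears as a
* citing patent, so its owner is unknown rather than different
gen indicator = cowner == owner
replace indicator = .u if _merge == 1
drop _merge

list, sepby(patent) noobs
```

Here patent 20 cites patent 10, which never appears as a citing patent,
so its indicator is .u; patents 23 and 24 both cite patent 20 (owner a)
and get 1 and 0 respectively.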
