
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Re: how to search every observation of one variable in another variable


From   ibrahim bostan <[email protected]>
To   [email protected]
Subject   Re: st: Re: how to search every observation of one variable in another variable
Date   Thu, 13 Jun 2013 20:08:57 -0400

Joseph, Sergiy, Robert,
The code worked just fine.
Thanks a lot!
IB

On Thu, Jun 13, 2013 at 12:02 PM, Robert Picard <[email protected]> wrote:
> There are two issues that are not illustrated in the OP's sample data.
> First, each patent may cite more than one patent. Also, there are
> patents that are owned jointly. Depending on the structure of the real
> data, this could lead to multiple records per patent identifier. This
> complicates the problem of identifying if a patent is citing a patent
> from the same owner. Here's a solution that uses -joinby- instead of
> -merge- to match the cited patents to their owner(s):
>
> * -------------- begin example ----------------------------
> clear
> input cited patent str1 owner
>  10 20 a
>  11 20 a
>  11 21 a
>  11 21 b
>  11 21 d
>  21 22 a
>  20 23 a
>  20 24 b
>  24 25 b
>  25 26 b
>  1 27 c
>  3 28 c
>  5 29 c
> end
>
> * a patent may cite more than one other patent. a patent
> * may also have joint ownership
> sort  patent owner cited
> by patent owner: gen N = _N
> list, sepby(patent) noobs
>
> * make a database of patent ownership. adjust variable
> * names to merge back with the cited identifiers.
> preserve
> keep patent owner
> sort patent owner
> by patent owner: keep if _n == 1
> list , sepby(patent)
> rename (patent owner) (cited cowner)
> tempfile cowners
> qui save "`cowners'"
> restore
>
> * since we have multiple owners per patent, we want
> * to avoid a m:m merge; use -joinby- instead to form
> * all pairwise combinations
> joinby cited using "`cowners'", unmatched(master)
> drop _merge
>
> * flag all observations where the owner of the cited patent
> * is the same as the owner of the citing patent
> gen indicator = cowner == owner
>
> * if the cited patent has more than one owner, then
> * note the frequency of the owner match(es). reduce to one
> * observation to recover the original observation count.
> sort patent owner cited indicator
> by patent owner cited: gen ind_freq = sum(indicator) / _N
> by patent owner cited: keep if _n == _N
>
> list, sepby(patent) noobs
> * -------------- end example ------------------------------
>
>
> On Thu, Jun 13, 2013 at 8:16 AM, Joseph Coveney <[email protected]> wrote:
>> ibrahim bostan wrote:
>>
>> The code you gave did not work because the citing patent number is not a
>> unique identifier. Can it be fixed?
>>
>> It gave this error:
>> "variable citing_pt_no does not uniquely identify observations in the
>> using data"
>>
>>
>> --------------------------------------------------------------------------------
>>
>> I assume that you're referring to the result of the -isid- command.  Put the
>> following just before it:
>>
>>     contract citing_pt_no patent_owner
>>
>> and then proceed with -isid-.
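
A minimal sketch of that step, assuming the key checked by -isid- is citing_pt_no alone (the thread does not show the exact call) and using made-up sample rows:

```stata
clear
* hypothetical sample data: patent 20 appears twice with the same
* owner, patent 21 appears with two different owners
input citing_pt_no str1 patent_owner
 20 "a"
 20 "a"
 21 "a"
 21 "b"
end

* collapse to one row per distinct patent-owner pair
* (-contract- also adds a _freq count variable)
contract citing_pt_no patent_owner

* -isid- exits with an error if citing_pt_no is still not unique;
* -capture noisily- shows the message without stopping the do-file
capture noisily isid citing_pt_no

list, noobs
```

In this sketch -contract- removes the repeated row for patent 20, while patent 21 still has two owner rows, so -isid- flags it. That is the kind of case (joint or inconsistent ownership) the check is meant to surface.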
>>
>> As Sergiy Radyakin shows later in the thread, you don't need to decompose the
>> dataset into the two components and then reconstruct it.  You can just
>> -merge- the one back into the original dataset.  Unless you're confident that
>> there aren't any errors in the dataset (such as a citing patent's having one
>> owner early in the dataset and by accident having another owner later on in the
>> dataset), I recommend applying the constraints and checks (-contract-, -isid-,
>> -merge 1:... assert()-, etc.) that I showed.
>>
>> Also, unless your dataset has a dummy entry of each cited patent's citing itself
>> in order to assure that every cited patent has an owner, you might want to
>> consider setting the indicator variable to .u (for "Unknown") for observations
>> where the cited-patent owners are "Other" (my code; probably would have been
>> better as "Unknown" and not "Other") or blank (Sergiy's code).  That way you can
>> keep track of citing patents where your dataset doesn't really allow you to say
>> whether the owner owns the cited patent.
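
A minimal sketch of the .u idea, assuming the variable names from Robert Picard's example (owner, cowner, indicator) and made-up rows in which a blank cowner means the cited patent's owner is not in the dataset:

```stata
clear
* hypothetical rows: cowner is the owner of the cited patent,
* blank when the cited patent has no ownership record
input cited patent str1 owner str1 cowner
 10 20 "a" ""
 20 23 "a" "a"
 20 24 "b" "a"
end

* 1 if the citing and cited patents share an owner, 0 otherwise
generate indicator = (cowner == owner)

* where the cited patent's owner is unknown, the comparison is
* uninformative: record the extended missing value .u instead of 0
replace indicator = .u if missing(cowner)

list, noobs
```

Because .u is a missing value, these observations drop out of tabulations and means of indicator by default, but they can still be counted separately with a condition such as indicator == .u.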
>>
>> Joseph Coveney
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

