Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: Re: how to search every observation of one variable in another variable
From 
 
"Joseph Coveney" <[email protected]> 
To 
 
<[email protected]> 
Subject 
 
st: Re: how to search every observation of one variable in another variable 
Date 
 
Thu, 13 Jun 2013 10:58:24 +0900 
ibrahim bostan wrote:
 I have a dataset which have id number of patents applied by a firm and
 the id number of patents cited by this patent. I am trying to have an
 indicator, as below shown, which will equal one if patent applied by
 the firm is citing a patent which is applied by the same firm.
 cited pt_no   citing_pt_no      patent owner            indicator
 10              20                      a                       0
 11              21                      a                       0
 21              22                      a                       1
 20              23                      a                       1
 20              24                      b                       0
 24              25                      b                       1
 25              26                      b                       1
 1               27                      c                       0
 3               28                      c                       0
 5               29                      c                       0
--------------------------------------------------------------------------------
You could try something like that below. 
 -merge- can't . . .  JOIN . . . ON A.one_column = B.another_column, and so the
example involves explicitly forming the Cartesian product followed by the
restriction.
You could try an in-memory hash-table approach (
www.stata.com/statalist/archive/2013-06/msg00569.html  ), too, if your dataset
is small enough.
Joseph Coveney
. version 12.1
. 
. clear *
. set more off
. 
. input cited_pt_no citing_pt_no str1 patent_owner
     cited_p~o  citing_~o  patent_~r
  1.  10 20 a
  2.  11 21 a
  3.  21 22 a
  4.  20 23 a
  5.  20 24 b
  6.  24 25 b
  7.  25 26 b
  8.  1 27 c
  9.  3 28 c
 10.  5 29 c
 11. end
. 
. // Create dataset of citing-patent owners
. preserve
. isid citing_pt_no
. rename patent_owner citing_owner
. list citing*, noobs abbreviate(20)
  +-----------------------------+
  | citing_pt_no   citing_owner |
  |-----------------------------|
  |           20              a |
  |           21              a |
  |           22              a |
  |           23              a |
  |           24              b |
  |-----------------------------|
  |           25              b |
  |           26              b |
  |           27              c |
  |           28              c |
  |           29              c |
  +-----------------------------+
. tempfile tmpfil0
. quietly save `tmpfil0'
. 
. // Create a dataset of cited-patent owners
. restore
. preserve
. contract cited, freq(discard)
. rename cited_pt_no citing_pt_no
. merge 1:1 citing_pt_no using `tmpfil0', nogenerate noreport
. replace cited_pt_no = citing_pt_no
(15 real changes made)
. rename citing_owner cited_owner
. quietly replace cited_owner = "Other" if mi(cited_owner)
. keep cited*
. list , noobs separator(0) abbreviate(20)
  +---------------------------+
  | cited_pt_no   cited_owner |
  |---------------------------|
  |           1         Other |
  |           3         Other |
  |           5         Other |
  |          10         Other |
  |          11         Other |
  |          20             a |
  |          21             a |
  |          24             b |
  |          25             b |
  |          22             a |
  |          23             a |
  |          26             b |
  |          27             c |
  |          28             c |
  |          29             c |
  +---------------------------+
. 
. // Compare cited owner to citing owner
. cross using `tmpfil0'
. quietly save `tmpfil0', replace
. restore
. contract *pt_no, freq(discard)
. merge 1:m citing_pt_no cited_pt_no using `tmpfil0', ///
>   assert(match using) keep(match) nogenerate noreport
. generate byte indicator = cited_owner == citing_owner
. list *_pt_no *_owner indicator, noobs separator(0) abbreviate(20)
  +---------------------------------------------------------------------+
  | cited_pt_no   citing_pt_no   cited_owner   citing_owner   indicator |
  |---------------------------------------------------------------------|
  |          10             20         Other              a           0 |
  |          11             21         Other              a           0 |
  |          21             22             a              a           1 |
  |          20             23             a              a           1 |
  |          20             24             a              b           0 |
  |          24             25             b              b           1 |
  |          25             26             b              b           1 |
  |           1             27         Other              c           0 |
  |           3             28         Other              c           0 |
  |           5             29         Other              c           0 |
  +---------------------------------------------------------------------+
. 
. exit
end of do-file
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/