Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: Re: how to search every observation of one variable in another variable


From   "Joseph Coveney" <stajc2@gmail.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: how to search every observation of one variable in another variable
Date   Thu, 13 Jun 2013 10:58:24 +0900

ibrahim bostan wrote:

 I have a dataset which have id number of patents applied by a firm and
 the id number of patents cited by this patent. I am trying to have an
 indicator, as below shown, which will equal one if patent applied by
 the firm is citing a patent which is applied by the same firm.



 cited pt_no   citing_pt_no      patent owner            indicator
 10              20                      a                       0
 11              21                      a                       0
 21              22                      a                       1
 20              23                      a                       1
 20              24                      b                       0
 24              25                      b                       1
 25              26                      b                       1
 1               27                      c                       0
 3               28                      c                       0
 5               29                      c                       0

--------------------------------------------------------------------------------

You could try something like that below. 

 -merge- can't . . .  JOIN . . . ON A.one_column = B.another_column, and so the
example involves explicitly forming the Cartesian product followed by the
restriction.

You could try an in-memory hash-table approach (
www.stata.com/statalist/archive/2013-06/msg00569.html  ), too, if your dataset
is small enough.

Joseph Coveney

. version 12.1

. 
. clear *

. set more off

. 
. input cited_pt_no citing_pt_no str1 patent_owner

     cited_p~o  citing_~o  patent_~r
  1.  10 20 a
  2.  11 21 a
  3.  21 22 a
  4.  20 23 a
  5.  20 24 b
  6.  24 25 b
  7.  25 26 b
  8.  1 27 c
  9.  3 28 c
 10.  5 29 c
 11. end

. 
. // Create dataset of citing-patent owners
. preserve

. isid citing_pt_no

. rename patent_owner citing_owner

. list citing*, noobs abbreviate(20)

  +-----------------------------+
  | citing_pt_no   citing_owner |
  |-----------------------------|
  |           20              a |
  |           21              a |
  |           22              a |
  |           23              a |
  |           24              b |
  |-----------------------------|
  |           25              b |
  |           26              b |
  |           27              c |
  |           28              c |
  |           29              c |
  +-----------------------------+

. tempfile tmpfil0

. quietly save `tmpfil0'

. 
. // Create a dataset of cited-patent owners
. restore

. preserve

. contract cited, freq(discard)

. rename cited_pt_no citing_pt_no

. merge 1:1 citing_pt_no using `tmpfil0', nogenerate noreport

. replace cited_pt_no = citing_pt_no
(15 real changes made)

. rename citing_owner cited_owner

. quietly replace cited_owner = "Other" if mi(cited_owner)

. keep cited*

. list , noobs separator(0) abbreviate(20)

  +---------------------------+
  | cited_pt_no   cited_owner |
  |---------------------------|
  |           1         Other |
  |           3         Other |
  |           5         Other |
  |          10         Other |
  |          11         Other |
  |          20             a |
  |          21             a |
  |          24             b |
  |          25             b |
  |          22             a |
  |          23             a |
  |          26             b |
  |          27             c |
  |          28             c |
  |          29             c |
  +---------------------------+

. 
. // Compare cited owner to citing owner
. cross using `tmpfil0'

. quietly save `tmpfil0', replace

. restore

. contract *pt_no, freq(discard)

. merge 1:m citing_pt_no cited_pt_no using `tmpfil0', ///
>   assert(match using) keep(match) nogenerate noreport

. generate byte indicator = cited_owner == citing_owner

. list *_pt_no *_owner indicator, noobs separator(0) abbreviate(20)

  +---------------------------------------------------------------------+
  | cited_pt_no   citing_pt_no   cited_owner   citing_owner   indicator |
  |---------------------------------------------------------------------|
  |          10             20         Other              a           0 |
  |          11             21         Other              a           0 |
  |          21             22             a              a           1 |
  |          20             23             a              a           1 |
  |          20             24             a              b           0 |
  |          24             25             b              b           1 |
  |          25             26             b              b           1 |
  |           1             27         Other              c           0 |
  |           3             28         Other              c           0 |
  |           5             29         Other              c           0 |
  +---------------------------------------------------------------------+

. 
. exit

end of do-file

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index