Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Identify observations that appear in a list

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Identify observations that appear in a list
Date	Thu, 13 Mar 2014 11:50:26 +0000

This is an FAQ, at least in the sense that this is frequently asked here.

One approach is just to -merge- the data with a reduced copy of
itself, with the important twist that you -rename- the variable you
want to use as the identifier, so that it matches the key in the full
dataset.

The slogan I use to remind myself of this trick is

"-merge- is for finding intersections as well as unions"

and you're welcome to pin or write it on a board near you.

http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/
is also relevant.

. clear

. input str5 CustomerIndustry  str5 SupplierIndustry Input

     Custome~y  Supplie~y      Input
  1. 1000A    4000B    100
  2. 1000A    3000A    200
  3. 1000A    3000B    100
  4. 1000B    4000B    50
  5. 1000B    2000A    8
  6. 4000B    3000A    19
  7. 4000B    2000A    20
  8. 3000A    3000B    18
  9. 3000A    3000D    12
 10. 2000A    1000D    25
 11. end

. save tostart
file tostart.dta saved

. bysort SupplierIndustry: keep if _n == 1
(4 observations deleted)

. keep SupplierIndustry

. rename SupplierIndustry CustomerIndustry

. merge 1:m CustomerIndustry using tostart

    Result                           # of obs.
    -----------------------------------------
    not matched                             8
        from master                         3  (_merge==1)
        from using                          5  (_merge==2)

    matched                                 5  (_merge==3)
    -----------------------------------------

. tab _merge

                 _merge |      Freq.     Percent        Cum.
------------------------+-----------------------------------
        master only (1) |          3       23.08       23.08
         using only (2) |          5       38.46       61.54
            matched (3) |          5       38.46      100.00
------------------------+-----------------------------------
                  Total |         13      100.00

.
end of do-file

. list if _merge==3

     +-------------------------------------------+
     | Custom~y   Suppli~y   Input        _merge |
     |-------------------------------------------|
  2. |    2000A      1000D      25   matched (3) |
  3. |    3000A      3000B      18   matched (3) |
  6. |    4000B      3000A      19   matched (3) |
 12. |    3000A      3000D      12   matched (3) |
 13. |    4000B      2000A      20   matched (3) |
     +-------------------------------------------+
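
The listing above only displays the matched observations. A minimal
sketch of how the session might be finished, so that the data in memory
are reduced to the requested output, is given here. It assumes the
-merge- above has just been run and that _merge is still in memory; the
file name both_roles is hypothetical.

```stata
* Continues from the session above: _merge was created by -merge-.

* Keep only customer rows whose industry also appears as a supplier.
keep if _merge == 3

* The match indicator is no longer needed.
drop _merge

* Tidy up the order and inspect the result.
sort CustomerIndustry SupplierIndustry
list, sepby(CustomerIndustry)

* Hypothetical output file name.
save both_roles, replace
```

Observations with _merge == 1 are supplier industries that never appear
as customers, and those with _merge == 2 are customer industries that
never appear as suppliers, so both groups are dropped.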

Nick
[email protected]


On 13 March 2014 02:12, R Zhang <[email protected]> wrote:

> I have the following dataset (HAVE); only a few observations are
> shown as illustration. The Input variable gives the dollar value of
> input sold by the supplier to the customer. You will notice that
> customer industries 4000B and 3000A also appear in SupplierIndustry.
> This indicates that some industries can be both suppliers and
> customers.
>
> +++++++++++++++++++++++
>
> HAVE
>
> CustomerIndustry    SupplierIndustry    Input
> 1000A               4000B               100
> 1000A               3000A               200
> 1000A               3000B               100
> 1000B               4000B               50
> 1000B               2000A               8
> 4000B               3000A               19
> 4000B               2000A               20
> 3000A               3000B               18
> 3000A               3000D               12
> 2000A               1000D               25
>
> +++++++++++++++++++++++
>
> I want to create a dataset that lists all observations whose customer
> industry is also a supplier industry, i.e., my output should appear as:
>
> CustomerIndustry    SupplierIndustry    Input
> 4000B               3000A               19
> 4000B               2000A               20
> 3000A               3000B               18
> 3000A               3000D               12
> 2000A               1000D               25
>
> I am asking for your help in coding this.