Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Identify observations that appear in a list


From   R Zhang <[email protected]>
To   [email protected]
Subject   Re: st: Identify observations that appear in a list
Date   Fri, 14 Mar 2014 16:34:01 -0400

Hi Nick,

your last post works perfectly. I try to incorporate what I learned
from you in an earlier post

http://www.stata.com/statalist/archive/2014-03/msg00232.html

that is, I tried the code below, I want to use howmany variable to
count how many instances

+++++++++++ CODE Begins+++++
clear

input str5 CustomerIndustry  str5 SupplierIndustry Input


"1000A"    "4000B"    100
"1000A"    "3000A"    200
"1000A"    "3000B"    100
"1000B"    "4000B"    50
"1000B"    "2000A"    8
"4000B"    "3000A"    19
"4000B"    "2000A"    20
end


gen howmany = 0

quie forval i=1/`=_N' {
count if SupplierIndustry =CustomerIndustry [`i]
replace howmany=r(N) in `i'
}

++++++++CODE Ends++++++++

++++++++ERROR (See below) +++++++

quie forval i=1/`=_N' {
type mismatch



what did I do wrong (i.e. causing the error)?

thanks ,

R.

On Fri, Mar 14, 2014 at 12:57 AM, R Zhang <[email protected]> wrote:
> Thank you for being so helpful !!!
>
> Warm regards,
>
> Rochelle
>
> On Thu, Mar 13, 2014 at 7:50 AM, Nick Cox <[email protected]> wrote:
>> This is an FAQ, at least in the sense that this is frequently asked here.
>>
>> One approach is just to -merge- the data with a reduced copy of
>> itself, with the important twist that you -rename- what you want as an
>> identifier.
>>
>> The slogan I use to remind myself of this trick is
>>
>> "-merge- is for finding intersections as well as unions"
>>
>> and you're welcome to pin or write it on a board near you.
>>
>> http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/
>> is also relevant.
>>
>> . clear
>>
>> . input str5 CustomerIndustry  str5 SupplierIndustry Input
>>
>>      Custome~y  Supplie~y      Input
>>   1. 1000A    4000B    100
>>   2. 1000A    3000A    200
>>   3. 1000A    3000B    100
>>   4. 1000B    4000B    50
>>   5. 1000B    2000A    8
>>   6. 4000B    3000A    19
>>   7. 4000B    2000A    20
>>   8. 3000A    3000B    18
>>   9. 3000A    3000D    12
>>  10. 2000A    1000D    25
>>  11. end
>>
>> . save tostart
>> file tostart.dta saved
>>
>> . bysort SupplierIndustry: keep if _n == 1
>> (4 observations deleted)
>>
>> . keep SupplierIndustry
>>
>> . rename SupplierIndustry CustomerIndustry
>>
>> . merge 1:m CustomerIndustry using tostart
>>
>>     Result                           # of obs.
>>     -----------------------------------------
>>     not matched                             8
>>         from master                         3  (_merge==1)
>>         from using                          5  (_merge==2)
>>
>>     matched                                 5  (_merge==3)
>>     -----------------------------------------
>>
>> . tab _merge
>>
>>                  _merge |      Freq.     Percent        Cum.
>> ------------------------+-----------------------------------
>>         master only (1) |          3       23.08       23.08
>>          using only (2) |          5       38.46       61.54
>>             matched (3) |          5       38.46      100.00
>> ------------------------+-----------------------------------
>>                   Total |         13      100.00
>>
>> .
>> end of do-file
>>
>> . list if _merge==3
>>
>>      +-------------------------------------------+
>>      | Custom~y   Suppli~y   Input        _merge |
>>      |-------------------------------------------|
>>   2. |    2000A      1000D      25   matched (3) |
>>   3. |    3000A      3000B      18   matched (3) |
>>   6. |    4000B      3000A      19   matched (3) |
>>  12. |    3000A      3000D      12   matched (3) |
>>  13. |    4000B      2000A      20   matched (3) |
>>      +-------------------------------------------+
>>
>> Nick
>> [email protected]
>>
>>
>> On 13 March 2014 02:12, R Zhang <[email protected]> wrote:
>>
>>> I have the following data set (HAVE) (only provide a few observations
>>> as illustration). The input variable gives the dollar input sold by
>>> supplier to customer. You will notice that customer industry 4000B,
>>> 3000A also appear in SupplierIndustry. This indicates that some
>>> industries can be both suppliers and customer.
>>>
>>> +++++++++++++++++++++++
>>>
>>> HAVE
>>>
>>> CustomerIndustry           SupplierIndustry              Input
>>>
>>> 1000A    4000B    100
>>>
>>> 1000A    3000A    200
>>>
>>> 1000A    3000B    100
>>>
>>> 1000B    4000B    50
>>>
>>> 1000B    2000A    8
>>>
>>> 4000B    3000A    19
>>>
>>> 4000B    2000A    20
>>>
>>> 3000A    3000B    18
>>>
>>> 3000A    3000D    12
>>>
>>> 2000A    1000D    25
>>>
>>> +++++++++++++++++++++++
>>>
>>> I want to create a dataset that list all customer industries that are
>>> also supplier industry, i.e., my output shall appear as :
>>>
>>> CustomerIndustry           SupplierIndustry              Input
>>>
>>> 4000B    3000A    19
>>>
>>> 4000B    2000A    20
>>>
>>> 3000A    3000B    18
>>>
>>> 3000A    3000D    12
>>>
>>> 2000A    1000D    25
>>>
>>> I am asking for your help on coding this.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index