Statalist



st: Stata code for identifying similar observations within group id


From   "Chirantan Chatterjee" <[email protected]>
To   [email protected]
Subject   st: Stata code for identifying similar observations within group id
Date   Sat, 31 May 2008 16:03:48 -0400 (EDT)

Hi, 

I am a graduate student at Carnegie Mellon and have been working over the summer on a dataset of European patents: patents that have at least one inventor (var: inv_cou) belonging to an Eastern European country. I have run into some problems, hence this email. 

Allow me to quickly sketch out the data for you. There are 21 such EE countries, identified by two-letter country codes. Thus for the patent EP1701504 there are 5 inventors, 4 of whom are German, identified by the string "DE" in the variable inv_cou, and the 5th is a Polish inventor, identified by the string "PL" in inv_cou on that observation. May I also note that, apart from EE countries, the inventors on a multi-inventor patent in this dataset also come from OECD countries, again identified by country codes in the same variable: DE for Germany, MX for Mexico, KR for Korea, and so on.

Another point to note is that observations are not uniquely identified by the patent identifier, pub_nbr or publication number. Thus for the patent EP1701504, the value EP1701504 appears under Pub_nbr, stacked one upon another for each of its 5 inventors. A patent has some other characteristics that also come in the dataset; I have attached a sample in the email as an .xls. If you cannot see the .xls, here is a shortened sketch of the data: the patent EP0000287, identified by Pub_nbr, the patent identifier, has two inventors stacked one upon another. 
-- 
Pub_nbr    inv_name        inv_city      inv_cou  inv_total  app_city  app_cou  app_name
EP0000287  Szabó, Sándor   Budapest XI   HU       5          Budapest  HU       AUTÓIPARI
EP0000287  Vad, László     Visegrád      HU       5          Budapest  HU       Ikarus

My objective is the following, and perhaps you could suggest some relevant steps. I want to create dummy variables telling me whether the inventors who created the patent are:

a. Located in the same country.
b. Resident in multiple countries, but all of the countries are EE countries. (I have the EE country code set.)
c. Resident in multiple countries, and at least one of the countries is an OECD member state. (I have the OECD country code set.)
d. And a dummy variable that tells me whether the patent applicant is located in an OECD country; app_cou identifies the applicant's country in the same way as the inventor countries, as you will see in the attached sample of the data. 

What is the best way to create each of the four dummy variables? Here is a thought; perhaps you can correct it. I wanted to sort by Pub_nbr and then create a dummy equal to 1 when the value in observation _n matches those in _n-1 and _n+1, by Pub_nbr. Another option might be the 'distinct' or 'duplicates' command, but I am not exactly clear how to make either work for the 4 requirements above. Could you suggest something more specific? Many thanks for your kind attention. 
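
Here is a rough, untested sketch of the sort of thing I have in mind, assuming the variable names above, that inv_cou and app_cou hold two-letter country codes, and with the EE and OECD code lists abbreviated in the locals below (run as one do-file so the locals stay visible):

* abbreviated code lists; the full EE and OECD sets would go here
local ee_list   "PL HU CZ SK RO BG"
local oecd_list "DE FR GB US JP KR MX"

* flag whether each inventor's country is in the EE / OECD sets
gen byte is_ee   = strpos(" `ee_list' ",   " " + inv_cou + " ") > 0
gen byte is_oecd = strpos(" `oecd_list' ", " " + inv_cou + " ") > 0

* (a) all inventors of the patent located in the same country
bysort Pub_nbr (inv_cou): gen byte same_cou = inv_cou[1] == inv_cou[_N]

* (b) multiple countries, all of them EE
bysort Pub_nbr: egen byte all_ee = min(is_ee)
gen byte multi_all_ee = !same_cou & all_ee

* (c) multiple countries, at least one of them OECD
bysort Pub_nbr: egen byte any_oecd = max(is_oecd)
gen byte multi_oecd = !same_cou & any_oecd

* (d) applicant located in an OECD country
gen byte app_oecd = strpos(" `oecd_list' ", " " + app_cou + " ") > 0

Is something along these lines sensible, or is there a cleaner route via 'duplicates' or 'distinct'?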

With Best Wishes, Chirantan.
PhD Student Policy & Mgmt, Heinz College-Carnegie Mellon.








> I mean creating variables that are the first differences of the variables
> of interest: i.e.
> 
> dlnq22 = D.lnq22
> dnmds15a2 = D.nmds15a2
> 
> You can also take differences of the time dummies; you will have to drop
> one more of them in the FD model.
> 
> This is the model that one could fit in a linear fashion with xtivreg, fd
> (using a nonsensical list of excluded instruments and the regress option).
> Like the FE transformation, the FD transformation removes firm-specific
> unobserved heterogeneity.
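
A minimal sketch of that first-difference step, assuming the panel has been tsset on a firm identifier and year (variable names here follow the do-file described below and are illustrative):

tsset firmid year
gen dlnq22     = D.lnq22
gen dnrnds15a2 = D.nrnds15a2
* differenced time dummies; one more of them must be dropped in the FD model
foreach d of varlist d90-d05 {
    gen D`d' = D.`d'
}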
> 
> In what you have sent me, there appear to be blank lines in between each
> line of the command: i.e.
> 
> nl (LNQ22 = LN(1+ {B=1.443}*NRNDS15A2) + {a1=-0.340}*d90 + {a2=-0.281} 
> *d91 + {a3=0.0}*d92 + {a4=-0.289}*d93 + {a5=-0.143}*d94 + {a6=-0.115} *d95
> + {a7=-0.414}*d96 + {a8=-0.531}*d97 + {a9=-0.502}*d98 + {a10=-0.389}*d99 +
> {a11=-0.358}*d00 + {a12=-0.573}*d01 + {a13=-0.557} *d02 + {a14=-0.604}*d03
> + {a15=-0.524}*d04 + {a16=-0.287}*d05 /// + {s1=-0.275}*firmid_dummy1 +
> {s2=-0.120}*firmid_dummy2 +{s3=-0.670} *firmid_dummy3 +
> {s4=-0.251}*firmid_dummy4 + {s5=-0.288} *firmid_dummy5 + ///
> 
> {s6=1.042}*firmid_dummy6 + {s7=-0.517}*firmid_dummy7 + {s8=-.245} 
> *firmid_dummy8 + {s9=0.045}*firmid_dummy9 + {s10=-1.481} *firmid_dummy10 +
> ///
> 
> {s11=-0.147}*firmid_dummy11 + {s12=-0.761}*firmid_dummy12 + 
> {s13=-0.106}*firmid_dummy13 + {s14=-0.0823}*firmid_dummy14 + 
> {s15=-0.112}*firmid_dummy15 + ///
> 
> You might try removing those blank lines.  But I do not think it should be
> necessary (and certainly not desirable) to use all of those firm dummies.
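
Written without the blank lines, the continuation style would look like this (truncated to the first few terms for illustration, with variable names as described below; run from a do-file):

nl (lnq22 = ln(1 + {b=1.443}*nrnds15a2)      ///
    + {a1=-0.340}*d90 + {a2=-0.281}*d91      ///
    + {a3=0.0}*d92 + {a4=-0.289}*d93)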
> 
> 
> Kit Baum, Boston College Economics http://ideas.repec.org/e/pba1.html An
> Introduction to Modern Econometrics Using Stata: 
> http://www.stata-press.com/books/imeus.html
> 
> 
> On Dec 10, 2006, at 11:33 PM, Chirantan Chatterjee wrote:
> 
>> Dear Professor Baum,
>> 
>> I am not sure what you mean by: "using the first difference 
>> transformation manually".
>> 
>> But I have attached a do-file to illustrate what I am doing.
>> 
>> lnq22 is my LHS variable; nrnds15a2 is my RHS variable.
>> 
>> d90 to d05 are my time dummies. There are 320 firms, so I have 
>> generated 320 firm dummies, firmid_dummy1-firmid_dummy320.
>> 
>> For starting values of all the parameters I use the coefficient
>> estimates from an OLS regression with fixed effects.
>> 
>> Whenever I run this command, as in the do-file, the error is either
>> (a) varlist not allowed, or (b) parentheses unbalanced.
>> 
>> I have successfully executed this command only with time dummies. Now
>> that I want to absorb firm effects using firm dummies, my sense is that
>> with so many lines in this command something is going wrong on a line
>> somewhere; besides, this seems like a rather crude method too...
>> 
>> Perhaps you can suggest something simpler... if you think it would help,
>> I can call you sometime tomorrow and take your advice.
>> 
>> Thanks for replying; I much appreciate your guidance...
>> 
>> Best Wishes, Chirantan.
>> 
>> 
>> 
>> 
>>> Chirantan,
>>> 
>>> Is there a reason why you can't use the first difference 
>>> transformation (manually) to remove the firm effects?
>>> 
>>> 
>>> Kit Baum, Boston College Economics http://ideas.repec.org/e/pba1.html
>>> An Introduction to Modern Econometrics Using Stata: 
>>> http://www.stata-press.com/books/imeus.html
>>> 
>>> 
>>> On Dec 6, 2006, at 10:10 PM, Chirantan Chatterjee wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I am a second-year PhD student in economics and public policy, and I
>>>> found your email address on the North American 2006 users group pages
>>>> of www.stata.com.
>>>> 
>>>> I had a query on how to use firm fixed effects in my nonlinear
>>>> regressions, given the dataset I am working on. I have a panel of 320
>>>> firms with 16 years of data, if you will please allow me to say more.
>>>> 
>>>> Essentially, I am following this approach for nonlinear regressions:
>>>> 
>>>> nl (lnY = ln(1+ {b=1.443}*X) + {various as=specified values}*(time 
>>>> dummies))
>>>> 
>>>> --- for absorbing the time effects on my dataset.
>>>> 
>>>> Now I also want to absorb the firm effects; with 320 firms, I was
>>>> trying to create dummies for the firms and introduce them into the
>>>> equation, à la the time dummies...
>>>> 
>>>> But that isn't working with the nl command.
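
In case the dummy-creation step matters, one common sketch (assuming the firm identifier variable is called firmid) is:

* creates firmid_dummy1-firmid_dummy320, one 0/1 dummy per firm
tabulate firmid, generate(firmid_dummy)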
>>>> 
>>>> Perhaps you can help by suggesting a specific way of introducing
>>>> firm fixed effects in a nonlinear regression command on my dataset.
>>>> 
>>>> I very much appreciate your time and attention; should it be needed,
>>>> I can call and explain further.
>>>> 
>>>> 
>>>> Thanks, Chirantan
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Doctoral Student
>>>> H. John Heinz III School of Public Policy & Management
>>>> Carnegie Mellon University
>>>> 5000 Forbes Avenue, Pittsburgh, PA 15213-3890
>>>> Tel Office: 001-412-268-1663
>>>> Email: here or at [email protected]
>>>> --
>>>> 
>>>> "If you are irritated by every rub, how will you be polished?." - 
>>>> Jalaluddin Rumi, a Sufi poet.
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> -- <for professor.do>
> 
> 
> 



Attachment: sample.xls
Description: MS-Excel spreadsheet



