
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: summarizing data for each panel over chosen time windows


From   R Zhang <[email protected]>
To   [email protected]
Subject   Re: st: summarizing data for each panel over chosen time windows
Date   Mon, 17 Mar 2014 23:23:45 -0400

Dear all,
I tried to output the data but got an "invalid syntax" error after the
line of code "save E:\Data\Patents\pat_store, replace".

Below is sample data; the real data have 17 million firmid-year combinations.

patentID is the identification number of firm AA's patent, and citedID
is the identification number of a patent cited by that focal patent. I
want to generate a dummy that flags each citedID under the following
condition:

dummy = 1 if the cited patent (e.g., 100002, cited in 1995) was firm
AA's own patent filed in the previous 5 years (year-5 through year-1),
or was a patent cited by firm AA in the previous 5 years.



*************  code  *************

clear
input ///
year       str2 firmid    patentID              citedID
1995      "AA"           100001            100002
1995      "AA"           100001            100003
1995      "AA"           100001            100004
1994      "AA"           110001            100002
1994     "AA"           110001            100005
1994     "AA"           110001            120001
1993      "AA"           120001            100006
1993      "AA"           120001            100007
1992      "AA"           130001            100008
1992      "AA"           130001            100009
1991      "AA"           140001            100010
1991      "AA"           140001            100011
1989     "AA"           140001            100011
1988     "AA"           140001            100011
1995      "BB"           100001            100002
1995      "BB"           100001            100003
1995      "BB"           100001            100004
1994      "BB"           110001            100002
1994     "BB"           110001            100005
1994     "BB"           110001            120001
1993      "BB"           120001            100006
1993      "BB"           120001            100007
1992      "BB"           130001            100008
1992      "BB"           130001            100009
1991      "BB"           140001            100010
1991      "BB"           140001            100011
end

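* number the firms 1, 2, ... (groupid is numeric; firmid is a string)
egen groupid = group(firmid)
* -egen- leaves nothing in r(), so get the number of firms explicitly
summarize groupid, meanonly
local nfirms = r(max)

gen howmany = 0
save E:\Data\Patents\howmany, replace

quietly forvalues n = 1/`nfirms' {
    use E:\Data\Patents\howmany, clear
    keep if groupid == `n'
    local nobs = _N
    forvalues i = 1/`nobs' {
        * count this firm's observations in the previous five years that
        * filed or cited the patent cited in the current observation
        count if (patentID == citedID[`i'] | citedID == citedID[`i']) ///
              & inrange(year, year[`i']-5, year[`i']-1)
        replace howmany = r(N) in `i'
    }
    * pat_store does not exist until the first firm has been saved
    if `n' > 1 append using E:\Data\Patents\pat_store
    save E:\Data\Patents\pat_store, replace
}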

*************  code  *************

I am trying to save the data after each loop because there will be
millions of iterations, and if the computer shuts down I would otherwise
have to start over.
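
A minimal sketch of the checkpointing idea, assuming one results file per firm instead of one ever-growing pat_store file, so that a rerun skips firms already finished (detected with -capture confirm file-). The checkpoint folder (the working directory here) and the firmN.dta names are hypothetical placeholders, and the toy data plus the tempfile stand in for E:\Data\Patents\howmany; on real data the source must be a permanent file, because tempfiles vanish when Stata exits.

```stata
* Sketch only: toy data; the tempfile stands in for howmany.dta
clear
input year str2 firmid patentID citedID
1995 "AA" 100001 100002
1994 "AA" 110001 100002
1994 "AA" 110001 120001
1993 "AA" 120001 100006
1995 "BB" 100001 100002
1994 "BB" 110001 100002
end
egen groupid = group(firmid)
gen howmany = 0
tempfile source
save "`source'"
summarize groupid, meanonly
local nfirms = r(max)

* hypothetical checkpoint folder; use one that survives a restart
local dir "`c(pwd)'"

forvalues n = 1/`nfirms' {
    * skip firms whose results were saved by an earlier run
    capture confirm file "`dir'/firm`n'.dta"
    if !_rc continue
    quietly {
        use if groupid == `n' using "`source'", clear
        local nobs = _N
        forvalues i = 1/`nobs' {
            count if (patentID == citedID[`i'] | citedID == citedID[`i']) ///
                & inrange(year, year[`i']-5, year[`i']-1)
            replace howmany = r(N) in `i'
        }
        save "`dir'/firm`n'.dta", replace
    }
}

* combine the per-firm files once every firm is finished
clear
forvalues n = 1/`nfirms' {
    append using "`dir'/firm`n'.dta"
}
```

Writing one small file per firm also avoids re-saving the whole accumulated result at every iteration; the final -append- loop runs only once, after all firms are done.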

My code may also not be efficient, and I got an "invalid syntax" error
after the line of code "save E:\Data\Patents\pat_store, replace".
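
One way to avoid the observation-by-observation -count- loop, offered as a sketch under assumptions rather than a tested recipe: stack every patent ID a firm filed or cited, together with its year, then use -joinby- to pair each observation with the same firm's records that mention its citedID, and keep the pairs whose year lies in year-5 to year-1. The variable names obs, srcobs, seenyear and cited5 are illustrative only, and -joinby- can create many pairs when a patent is cited often within one firm, so memory use should be checked on a subset of the 17 million observations first.

```stata
* Sketch only: a subset of the sample data from the post
clear
input year str2 firmid patentID citedID
1995 "AA" 100001 100002
1995 "AA" 100001 100003
1994 "AA" 110001 100002
1994 "AA" 110001 120001
1993 "AA" 120001 100006
1993 "AA" 120001 100007
1991 "AA" 140001 100011
1989 "AA" 140001 100011
1995 "BB" 100001 100002
1994 "BB" 110001 100005
end
gen long obs = _n
tempfile main seen
save "`main'"

* one "seen" record per ID a firm filed (patentID) or cited (citedID),
* tagged with its year and the observation it came from
rename (obs year patentID citedID) (srcobs seenyear id1 id2)
reshape long id, i(srcobs) j(src)
keep firmid seenyear srcobs id
rename id citedID
save "`seen'"

* pair each observation with the same firm's records of its citedID,
* then keep only records from the previous five years
use "`main'", clear
joinby firmid citedID using "`seen'"
keep if inrange(seenyear, year-5, year-1)

* count distinct source observations per observation (as -howmany-)
bysort obs srcobs: keep if _n == 1
by obs: gen howmany = _N
by obs: keep if _n == 1
keep obs howmany

* restore observations with no match and give them a count of 0
merge 1:1 obs using "`main'", nogenerate
replace howmany = 0 if missing(howmany)
gen byte cited5 = howmany > 0
sort obs
```

Counting distinct srcobs rather than rows keeps the result equal to the loop's -count-, which counts each earlier observation once even if both its patentID and its citedID match.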


-Rochelle

On Mon, Mar 17, 2014 at 10:58 PM, R Zhang <[email protected]> wrote:
> Dear all,
>
> I have panel data with 17 million observations (firm-year
> combinations). I am creating a count over the past five years for
> each firm. My original posting was
> http://www.stata.com/statalist/archive/2014-03/msg00215.html
>
> Please also refer to Nick's response. His code works just fine for
> the hypothetical data I posted.
>
> input ///
> year       str2 firmid    patentID              citedID
> 1995      "AA"           100001            100002
> 1995      "AA"           100001            100003
> 1995      "AA"           100001            100004
> 1994      "AA"           110001            100002
> 1994     "AA"           110001            100005
> 1994     "AA"           110001            120001
> 1993      "AA"           120001            100006
> 1993      "AA"           120001            100007
> 1992      "AA"           130001            100008
> 1992      "AA"           130001            100009
> 1991      "AA"           140001            100010
> 1991      "AA"           140001            100011
> 1989     "AA"           140001            100011
> 1988     "AA"           140001            100011
> 1995      "BB"           100001            100002
> 1995      "BB"           100001            100003
> 1995      "BB"           100001            100004
> 1994      "BB"           110001            100002
> 1994     "BB"           110001            100005
> 1994     "BB"           110001            120001
> 1993      "BB"           120001            100006
> 1993      "BB"           120001            100007
> 1992      "BB"           130001            100008
> 1992      "BB"           130001            100009
> 1991      "BB"           140001            100010
> 1991      "BB"           140001            100011
> end
>
> The issue I have now is that the real data have 17 million
> observations. The computer ran for several days and then shut down
> suddenly, so I had to rerun the program, and it is still going.
>
> My question is: should I output the data in batches so that an
> unexpected computer shutdown does not force me to start over?
> What is good practice when running a program on a huge dataset?
>
> Any suggestions would be greatly appreciated!
>
> -Rochelle
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2018 StataCorp LLC