



st: summarizing data for each panel over chosen time windows


From   R Zhang <[email protected]>
To   [email protected]
Subject   st: summarizing data for each panel over chosen time windows
Date   Mon, 17 Mar 2014 22:58:56 -0400

Dear all,

I have panel data with 17 million observations (firm-year combinations).
I am creating, for each firm, a count over the past five years. My
original posting was
http://www.stata.com/statalist/archive/2014-03/msg00215.html

Please also refer to Nick's response. His code works fine for the
hypothetical data I posted:

clear
input year str2 firmid patentID citedID
1995  "AA"  100001  100002
1995  "AA"  100001  100003
1995  "AA"  100001  100004
1994  "AA"  110001  100002
1994  "AA"  110001  100005
1994  "AA"  110001  120001
1993  "AA"  120001  100006
1993  "AA"  120001  100007
1992  "AA"  130001  100008
1992  "AA"  130001  100009
1991  "AA"  140001  100010
1991  "AA"  140001  100011
1989  "AA"  140001  100011
1988  "AA"  140001  100011
1995  "BB"  100001  100002
1995  "BB"  100001  100003
1995  "BB"  100001  100004
1994  "BB"  110001  100002
1994  "BB"  110001  100005
1994  "BB"  110001  120001
1993  "BB"  120001  100006
1993  "BB"  120001  100007
1992  "BB"  130001  100008
1992  "BB"  130001  100009
1991  "BB"  140001  100010
1991  "BB"  140001  100011
end
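Nick's code is not reproduced here. As an alternative sketch (not his method), and assuming the count wanted is the number of citation rows per firm in years t-4 to t, one way to avoid looping over millions of rows is to reduce the data to one row per firm-year first, then take differences of a running total within each firm. The variable names nrows, ncites, fid, cum and count5 are hypothetical; the input below rebuilds the same firm-year row counts as the example data above.

```stata
* Sketch only. Assumption: "count" = number of citation rows per firm
* in year t and the four preceding years (t-4 to t).
clear
input str2 firmid year nrows
"AA" 1995 3
"AA" 1994 3
"AA" 1993 2
"AA" 1992 2
"AA" 1991 2
"AA" 1989 1
"AA" 1988 1
"BB" 1995 3
"BB" 1994 3
"BB" 1993 2
"BB" 1992 2
"BB" 1991 2
end
expand nrows                 // one row per citation, as in the posted data
drop nrows

* 1. Reduce to one row per firm-year, counting the rows in each
contract firmid year, freq(ncites)

* 2. Declare the panel and fill interior gaps (AA has no 1990 rows)
encode firmid, generate(fid)
tsset fid year
tsfill
replace ncites = 0 if missing(ncites)

* 3. Running total within firm; window sum = cum(t) - cum(t-5),
*    treating a lag that falls before the firm's first year as 0
bysort fid (year): generate double cum = sum(ncites)
generate double count5 = cum - cond(missing(L5.cum), 0, L5.cum)
```

If "past five years" should exclude the current year, the window t-5 to t-1 is L.cum minus L6.cum, again treating missing lags as 0.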

The issue I have now is that the real data have 17 million observations.
The program ran for several days until the computer shut down
unexpectedly; I had to rerun it, and it is still running.

My question is: should I save the results in batches so that an
unexpected shutdown does not force me to start over? What is good
practice when working with a very large dataset?
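One common pattern, sketched here under assumptions, is checkpointing: split the firms into batches, save each batch's result to its own file, and skip any batch whose file already exists, so a rerun after a shutdown resumes where it stopped. The file names (citations_demo.dta, batch_1.dta, ...), the number of batches, and the -contract- step standing in for the real computation are all hypothetical. Batches are formed by firm so that each firm's five-year window stays inside a single batch.

```stata
* Sketch only: checkpointed batch processing.
* Hypothetical names: citations_demo.dta (input), batch_#.dta (results).
clear
input str2 firmid year
"AA" 1995
"AA" 1995
"AA" 1994
"BB" 1995
"CC" 1993
end

* Assign whole firms to batches so no firm is split across files
local nbatch 2
egen long fgroup = group(firmid)
generate int batch = mod(fgroup, `nbatch') + 1
save citations_demo, replace

forvalues b = 1/`nbatch' {
    * Skip batches already finished in an earlier (interrupted) run
    capture confirm file "batch_`b'.dta"
    if _rc == 0 {
        display "batch `b' already done, skipping"
        continue
    }
    use if batch == `b' using citations_demo, clear
    * Stand-in for the real per-firm computation
    contract firmid year, freq(nrows)
    save "batch_`b'.dta"
}

* Combine the finished batches into one dataset
clear
forvalues b = 1/`nbatch' {
    append using "batch_`b'.dta"
}
```

Choosing the number of batches so that each one finishes in a modest amount of time bounds how much work a shutdown can cost; rerunning the same do-file then redoes only the batches that have no saved file.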

Any suggestions would be greatly appreciated!

-Rochelle

