Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | R Zhang <r05zhang@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: summarizing data for each panel over chosen time windows |
Date | Mon, 17 Mar 2014 23:23:45 -0400 |
Dear all,

I am trying to save the data but get an "invalid syntax" error after the line of code "save E:\Data\Patents\pat_store, replace".

Below is sample data; the real data have 17 million firmid-year combinations. patentID is the identification number of company AA’s patent; citedID is the identification number of a patent that was cited by the focal patent.

I want to generate a dummy that flags the citedID under the following condition: the dummy equals 1 if the cited patent (e.g. 100002 in 1995) was firm AA’s own patent filed over the past 5 years, or was a patent cited by firm AA over the past 5 years.

************* code *************
clear
input ///
year str2 firmid patentID citedID
1995 "AA" 100001 100002
1995 "AA" 100001 100003
1995 "AA" 100001 100004
1994 "AA" 110001 100002
1994 "AA" 110001 100005
1994 "AA" 110001 120001
1993 "AA" 120001 100006
1993 "AA" 120001 100007
1992 "AA" 130001 100008
1992 "AA" 130001 100009
1991 "AA" 140001 100010
1991 "AA" 140001 100011
1989 "AA" 140001 100011
1988 "AA" 140001 100011
1995 "BB" 100001 100002
1995 "BB" 100001 100003
1995 "BB" 100001 100004
1994 "BB" 110001 100002
1994 "BB" 110001 100005
1994 "BB" 110001 120001
1993 "BB" 120001 100006
1993 "BB" 120001 100007
1992 "BB" 130001 100008
1992 "BB" 130001 100009
1991 "BB" 140001 100010
1991 "BB" 140001 100011
end

egen groupid=group(firmid)
gen howmany = 0
save E:\Data\Patents\howmany,replace
local nfirms=r(max)

quie forval n = 1/`nfirms' {
    use E:\Data\Patents\howmany, clear
    keep if firmid==`n'
    local nobs=_N
    forval i=1/`nobs' {
        count if (patentID == citedID[`i'] | citedID == citedID[`i']) ///
            & inrange(year, year[`i']-5, year[`i']-1)
        replace howmany = r(N) in `i'
    }
    append using E:\Data\Patents\pat_store
    save E:\Data\Patents\pat_store, replace
************* code *************

I am trying to save the data after each pass through the loop, since there will be millions of iterations and I would otherwise have to start over after a computer shutdown.
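A minimal sketch of how the posted loop might be repaired, offered as an illustration under stated assumptions rather than a tested fix for the real data. Reading the code as posted, three things stand out: the outer forval block is never closed with a brace, r(max) is a result of summarize and no summarize is run before it is read, and firmid is a string being compared with a number. The sketch also skips the append on the first pass, when pat_store does not yet exist. The six-row toy data set and the file names in the working directory are assumptions made to keep the example self-contained.

```stata
* Toy data (assumed): a six-row subset of the posted example
clear
input year str2 firmid patentID citedID
1995 "AA" 100001 100002
1994 "AA" 110001 100002
1994 "AA" 110001 120001
1993 "AA" 120001 100006
1995 "BB" 100001 100002
1994 "BB" 110001 100005
end

egen groupid = group(firmid)
gen howmany = 0
* Assumed file name, written to the current working directory
save howmany, replace

* r(max) comes from summarize, so run it before reading the result
summarize groupid, meanonly
local nfirms = r(max)

forvalues n = 1/`nfirms' {
    use howmany, clear
    * firmid is a string; groupid is its numeric code
    keep if groupid == `n'
    local nobs = _N
    forvalues i = 1/`nobs' {
        quietly count if (patentID == citedID[`i'] | citedID == citedID[`i']) ///
            & inrange(year, year[`i'] - 5, year[`i'] - 1)
        quietly replace howmany = r(N) in `i'
    }
    * On the first pass there is no pat_store to append
    if `n' > 1 {
        append using pat_store
    }
    save pat_store, replace
    * The brace below closes the outer loop; it is missing in the posted code
}
```

With the toy data, the 1995 AA row citing 100002 gets howmany = 1 because AA cited the same patent in 1994, and the 1994 AA row citing 120001 gets howmany = 1 because 120001 is AA's own 1993 patent. All other rows get 0.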
My code may not be efficient, and I get an "invalid syntax" error after the line of code "save E:\Data\Patents\pat_store, replace".

-Rochelle

On Mon, Mar 17, 2014 at 10:58 PM, R Zhang <r05zhang@gmail.com> wrote:
> Dear all,
>
> I have a 17 million observation panel data (firm year combination). I
> am creating a count for past five years for each firm. My original
> posting was
> http://www.stata.com/statalist/archive/2014-03/msg00215.html
>
> please also refer to Nick's response. His coding works just fine for
> the hypothetical data I posted.
>
> input ///
> year str2 firmid patentID citedID
> 1995 "AA" 100001 100002
> 1995 "AA" 100001 100003
> 1995 "AA" 100001 100004
> 1994 "AA" 110001 100002
> 1994 "AA" 110001 100005
> 1994 "AA" 110001 120001
> 1993 "AA" 120001 100006
> 1993 "AA" 120001 100007
> 1992 "AA" 130001 100008
> 1992 "AA" 130001 100009
> 1991 "AA" 140001 100010
> 1991 "AA" 140001 100011
> 1989 "AA" 140001 100011
> 1988 "AA" 140001 100011
> 1995 "BB" 100001 100002
> 1995 "BB" 100001 100003
> 1995 "BB" 100001 100004
> 1994 "BB" 110001 100002
> 1994 "BB" 110001 100005
> 1994 "BB" 110001 120001
> 1993 "BB" 120001 100006
> 1993 "BB" 120001 100007
> 1992 "BB" 130001 100008
> 1992 "BB" 130001 100009
> 1991 "BB" 140001 100010
> 1991 "BB" 140001 100011
> end
>
> the issue I have now is the real data has 17 million observations. The
> computer ran for several days, and a sudden shutdown, I have to rerun
> the program, and it is still going.
>
> My question is : should I output the data in batch to prevent the
> discontinuation of the program due to unexpected computer shutdown?
> What is a good practice when you run a huge dataset ?
>
> Any suggestions would be greatly appreciated !!!
>
> -Rochelle

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
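On the question of writing results in batches: one possible pattern is to save each firm's results to its own file and, on restart, skip any firm whose file already exists. The sketch below is a hypothetical illustration of that idea, not a tested solution for the 17 million observation data. The file names (firms_in, done_1.dta, done_2.dta, pat_store_all) and the six-row toy data are assumptions, and all files are written to the current working directory.

```stata
* Toy data (assumed): a six-row subset of the posted example
clear
input year str2 firmid patentID citedID
1995 "AA" 100001 100002
1994 "AA" 110001 100002
1994 "AA" 110001 120001
1993 "AA" 120001 100006
1995 "BB" 100001 100002
1994 "BB" 110001 100005
end

egen groupid = group(firmid)
gen howmany = 0
save firms_in, replace

summarize groupid, meanonly
local nfirms = r(max)

forvalues n = 1/`nfirms' {
    * A finished firm already has a result file, so skip it on a rerun
    capture confirm file "done_`n'.dta"
    if _rc == 0 {
        continue
    }
    * Load only this firm's observations
    use if groupid == `n' using firms_in, clear
    local nobs = _N
    forvalues i = 1/`nobs' {
        quietly count if (patentID == citedID[`i'] | citedID == citedID[`i']) ///
            & inrange(year, year[`i'] - 5, year[`i'] - 1)
        quietly replace howmany = r(N) in `i'
    }
    * Checkpoint: one small file per firm
    save "done_`n'.dta"
}

* Stack the per-firm files once every firm is done
clear
forvalues n = 1/`nfirms' {
    append using "done_`n'.dta"
}
save pat_store_all, replace
```

After an interruption, rerunning the same do-file redoes only the firms that have no done_ file yet. The skip test checks only that a file exists, so the done_ files would need to be deleted by hand whenever the input data change.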