Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: sxpose -not possible; would exceed present limit on number of variables |
Date | Thu, 20 Feb 2014 00:50:54 +0000 |
Simplified code: gen j = substr(v1, -1, 1) if v1 != "EntityID" gen which = subinstr(v1, j, "", 1) if v1 != "EntityID" gen EntityID = v2 if v1 == "EntityID" replace EntityID = EntityID[_n-1] if missing(EntityID) drop if v1 == "EntityID" drop v1 reshape wide v2, i(EntityID j) j(which) string renpfix v2 expand endyr - begyr + 1 rename begyr year bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1 drop endyr j l Nick njcoxstata@gmail.com On 20 February 2014 00:34, Nick Cox <njcoxstata@gmail.com> wrote: > In your sample data, blocks *3 *4 *5 seem to be the same information repeated. > > With the sample data, this is code to play with > > gen j = substr(word(v1, 1), -1, 1) if word(v1, 1) != "EntityID" > gen which = subinstr(v1, j, "", 1) if word(v1, 1) != "EntityID" > gen EntityID = v2 if word(v1, 1) == "EntityID" > replace EntityID = EntityID[_n-1] if missing(EntityID) > drop if word(v1,1) == "EntityID" > drop v1 > reshape wide v2, i(EntityID j) j(which) string > renpfix v2 > expand endyr - begyr + 1 > rename begyr year > bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1 > drop endyr j > l > Nick > njcoxstata@gmail.com > > > On 19 February 2014 21:06, R Zhang <r05zhang@gmail.com> wrote: >> Hi Statalisters, >> >> >> My data has 13,458 observation and 21 variables. >> EntityID corpid1 begyr1 gvkey1 endyr1 corpid2 begyr2 gvkey2 endyr2 >> corpid3 begyr3 gvkey3 endyr3 corpid4 begyr4 gvkey4 endyr4 corpid5 >> begyr5 gvkey5 endyr5 >> 100091 8101 1961 1000 1970 8091 1971 1000 1973 8011 1974 1001 2000 >> 8012 2000 1001 2002 8012 2003 1001 2005 >> >> >> for each unique EntityID, the corresponding gvkey and corpid could >> vary over time as indicated by begyr and endyr, >> >> what I want is a dataset that give me the gvkey and corpid for each >> time period, so I can match it to another dataset that has company >> specific financial data , the match variable will be gvkey, year. >> >> as of now, i thought I should reshape the data, Someone on the forum >> kindly offered me the following program to reshape my data. sample >> code (see below) works for his hypothetical data, but when i ran with >> my data (13,458 observation and 21 variables.). I got an error "not >> possible; would exceed present limit on number of variables", could >> you shed light on this? >> >> ***************** >> input str20 v1 v2 >> EntityID 100091 >> corpid1 8101 >> begyr1 1961 >> gvkey1 1000 >> endyr1 1970 >> corpid2 8091 >> begyr2 1971 >> gvkey2 1000 >> endyr2 1973 >> corpid3 8011 >> begyr3 1974 >> gvkey3 1001 >> endyr3 2000 >> corpid4 8011 >> begyr4 1974 >> gvkey4 1001 >> endyr4 2000 >> corpid5 8011 >> begyr5 1974 >> gvkey5 1001 >> endyr5 2000 >> end >> >> compress >> sxpose, clear firstnames force >> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd) >> *********************** >> >> what I ultimately want is : >> EntityID corpid year gvkey >> 100091 8101 1961 1000 >> 100091 8101 1962 1000 >> 100091 8101 1963 1000 >> 100091 8101 1964 1000 >> 100091 8101 1965 1000 >> 100091 8101 1966 1000 >> ... >> 100091 8091 1971 1000 >> 100091 8091 1972 1000 >> 100091 8091 1973 1000 >> 100091 8091 1974 1000 >> >> p.s if you think there is a better way , please also share. >> >> thanks!!! >> >> -R >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/faqs/resources/statalist-faq/ >> * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/