Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: sxpose -not possible; would exceed present limit on number of variables |
Date | Thu, 20 Feb 2014 07:59:43 +0000 |
Sorry, but I don't understand the question. The data you show clearly includes -gvkey- and -year-, the variables you say you want. Nick njcoxstata@gmail.com On 20 February 2014 03:08, R Zhang <r05zhang@gmail.com> wrote: > thanks again. > > running your code, I got > +++++++++++++++++++++ > v1 v2 j which EntityID > corpid1 8101 1 corpid 100091 > begyr1 1961 1 begyr 100091 > gvkey1 1000 1 gvkey 100091 > endyr1 1970 1 endyr 100091 > corpid2 8091 2 corpid 100091 > begyr2 1971 2 begyr 100091 > gvkey2 1000 2 gvkey 100091 > endyr2 1973 2 endyr 100091 > corpid3 8011 3 corpid 100091 > begyr3 1974 3 begyr 100091 > gvkey3 1001 3 gvkey 100091 > endyr3 2000 3 endyr 100091 > corpid4 8011 4 corpid 100091 > begyr4 1974 4 begyr 100091 > gvkey4 1001 4 gvkey 100091 > endyr4 2000 4 endyr 100091 > corpid5 8011 5 corpid 100091 > begyr5 1974 5 begyr 100091 > gvkey5 1001 5 gvkey 100091 > endyr5 2000 5 endyr 100091 > +++++++++++++++++++++ > > from the above output, is it easy to get > > +------------------------------------+ > > EntityID corpid year gvkey > > 100091 8101 1961 1000 > 100091 8101 1962 1000 > ... > 100091 8101 1970 1000 > 100091 8091 1971 1000 > 100091 8091 1972 1000 > 100091 8091 1973 1000 > ..... > 100091 8012 2003 1001 > +-------------------------------------------+ > > I would like the corresponding gvkey and year so I can match merge > with another dataset using thse two variables. > > -Rochelle > > On Wed, Feb 19, 2014 at 8:47 PM, Nick Cox <njcoxstata@gmail.com> wrote: >> I am not using -sxpose- (SSC) or the -reshape- following. The key >> point is that that approach will not work for you, as already >> established. My code starts after the sample data is read in by >> >> input str20 v1 v2 >> EntityID 100091 >> corpid1 8101 >> begyr1 1961 >> gvkey1 1000 >> endyr1 1970 >> corpid2 8091 >> begyr2 1971 >> gvkey2 1000 >> endyr2 1973 >> corpid3 8011 >> begyr3 1974 >> gvkey3 1001 >> endyr3 2000 >> corpid4 8012 >> begyr4 2001 >> gvkey4 1001 >> endyr4 2002 >> corpid5 8013 >> begyr5 2003 >> gvkey5 1001 >> endyr5 2004 >> end >> >> Nick >> njcoxstata@gmail.com >> >> >> On 20 February 2014 01:43, R Zhang <r05zhang@gmail.com> wrote: >>> Hi, Nick >>> >>> I fixed the 3, 4, 5 repeated information . I ran your Simplified code >>> and got an error " v1 not found", please let me know if I >>> misunderstood your email. >>> >>> *********************** >>> clear >>> input str20 v1 v2 >>> EntityID 100091 >>> corpid1 8101 >>> begyr1 1961 >>> gvkey1 1000 >>> endyr1 1970 >>> corpid2 8091 >>> begyr2 1971 >>> gvkey2 1000 >>> endyr2 1973 >>> corpid3 8011 >>> begyr3 1974 >>> gvkey3 1001 >>> endyr3 2000 >>> corpid4 8012 >>> begyr4 2001 >>> gvkey4 1001 >>> endyr4 2002 >>> corpid5 8013 >>> begyr5 2003 >>> gvkey5 1001 >>> endyr5 2004 >>> end >>> >>> compress >>> sxpose, clear firstnames force >>> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd) >>> >>> gen j = substr(v1, -1, 1) if v1 != "EntityID" >>> gen which = subinstr(v1, j, "", 1) if v1 != "EntityID" >>> gen EntityID = v2 if v1 == "EntityID" >>> replace EntityID = EntityID[_n-1] if missing(EntityID) >>> drop if v1 == "EntityID" >>> >>> *********************** >>> >>> thanks again, >>> >>> On Wed, Feb 19, 2014 at 7:50 PM, Nick Cox <njcoxstata@gmail.com> wrote: >>>> Simplified code: >>>> >>>> gen j = substr(v1, -1, 1) if v1 != "EntityID" >>>> gen which = subinstr(v1, j, "", 1) if v1 != "EntityID" >>>> gen EntityID = v2 if v1 == "EntityID" >>>> replace EntityID = EntityID[_n-1] if missing(EntityID) >>>> drop if v1 == "EntityID" >>>> drop v1 >>>> reshape wide v2, i(EntityID j) j(which) string >>>> renpfix v2 >>>> expand endyr - begyr + 1 >>>> rename begyr year >>>> bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1 >>>> drop endyr j >>>> l >>>> >>>> Nick >>>> njcoxstata@gmail.com >>>> >>>> >>>> On 20 February 2014 00:34, Nick Cox <njcoxstata@gmail.com> wrote: >>>>> In your sample data, blocks *3 *4 *5 seem to be the same information repeated. >>>>> >>>>> With the sample data, this is code to play with >>>>> >>>>> gen j = substr(word(v1, 1), -1, 1) if word(v1, 1) != "EntityID" >>>>> gen which = subinstr(v1, j, "", 1) if word(v1, 1) != "EntityID" >>>>> gen EntityID = v2 if word(v1, 1) == "EntityID" >>>>> replace EntityID = EntityID[_n-1] if missing(EntityID) >>>>> drop if word(v1,1) == "EntityID" >>>>> drop v1 >>>>> reshape wide v2, i(EntityID j) j(which) string >>>>> renpfix v2 >>>>> expand endyr - begyr + 1 >>>>> rename begyr year >>>>> bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1 >>>>> drop endyr j >>>>> l >>>>> Nick >>>>> njcoxstata@gmail.com >>>>> >>>>> >>>>> On 19 February 2014 21:06, R Zhang <r05zhang@gmail.com> wrote: >>>>>> Hi Statalisters, >>>>>> >>>>>> >>>>>> My data has 13,458 observation and 21 variables. >>>>>> EntityID corpid1 begyr1 gvkey1 endyr1 corpid2 begyr2 gvkey2 endyr2 >>>>>> corpid3 begyr3 gvkey3 endyr3 corpid4 begyr4 gvkey4 endyr4 corpid5 >>>>>> begyr5 gvkey5 endyr5 >>>>>> 100091 8101 1961 1000 1970 8091 1971 1000 1973 8011 1974 1001 2000 >>>>>> 8012 2000 1001 2002 8012 2003 1001 2005 >>>>>> >>>>>> >>>>>> for each unique EntityID, the corresponding gvkey and corpid could >>>>>> vary over time as indicated by begyr and endyr, >>>>>> >>>>>> what I want is a dataset that give me the gvkey and corpid for each >>>>>> time period, so I can match it to another dataset that has company >>>>>> specific financial data , the match variable will be gvkey, year. >>>>>> >>>>>> as of now, i thought I should reshape the data, Someone on the forum >>>>>> kindly offered me the following program to reshape my data. sample >>>>>> code (see below) works for his hypothetical data, but when i ran with >>>>>> my data (13,458 observation and 21 variables.). I got an error "not >>>>>> possible; would exceed present limit on number of variables", could >>>>>> you shed light on this? >>>>>> >>>>>> ***************** >>>>>> input str20 v1 v2 >>>>>> EntityID 100091 >>>>>> corpid1 8101 >>>>>> begyr1 1961 >>>>>> gvkey1 1000 >>>>>> endyr1 1970 >>>>>> corpid2 8091 >>>>>> begyr2 1971 >>>>>> gvkey2 1000 >>>>>> endyr2 1973 >>>>>> corpid3 8011 >>>>>> begyr3 1974 >>>>>> gvkey3 1001 >>>>>> endyr3 2000 >>>>>> corpid4 8011 >>>>>> begyr4 1974 >>>>>> gvkey4 1001 >>>>>> endyr4 2000 >>>>>> corpid5 8011 >>>>>> begyr5 1974 >>>>>> gvkey5 1001 >>>>>> endyr5 2000 >>>>>> end >>>>>> >>>>>> compress >>>>>> sxpose, clear firstnames force >>>>>> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd) >>>>>> *********************** >>>>>> >>>>>> what I ultimately want is : >>>>>> EntityID corpid year gvkey >>>>>> 100091 8101 1961 1000 >>>>>> 100091 8101 1962 1000 >>>>>> 100091 8101 1963 1000 >>>>>> 100091 8101 1964 1000 >>>>>> 100091 8101 1965 1000 >>>>>> 100091 8101 1966 1000 >>>>>> ... >>>>>> 100091 8091 1971 1000 >>>>>> 100091 8091 1972 1000 >>>>>> 100091 8091 1973 1000 >>>>>> 100091 8091 1974 1000 >>>>>> >>>>>> p.s if you think there is a better way , please also share. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/