Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
R Zhang <r05zhang@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: sxpose -not possible; would exceed present limit on number of variables |

Date |
Wed, 19 Feb 2014 20:43:42 -0500 |

Hi, Nick I fixed the 3, 4, 5 repeated information . I ran your Simplified code and got an error " v1 not found", please let me know if I misunderstood your email. *********************** clear input str20 v1 v2 EntityID 100091 corpid1 8101 begyr1 1961 gvkey1 1000 endyr1 1970 corpid2 8091 begyr2 1971 gvkey2 1000 endyr2 1973 corpid3 8011 begyr3 1974 gvkey3 1001 endyr3 2000 corpid4 8012 begyr4 2001 gvkey4 1001 endyr4 2002 corpid5 8013 begyr5 2003 gvkey5 1001 endyr5 2004 end compress sxpose, clear firstnames force reshape long corpid begyr gvkey endyr, i(EntityID) j(pd) gen j = substr(v1, -1, 1) if v1 != "EntityID" gen which = subinstr(v1, j, "", 1) if v1 != "EntityID" gen EntityID = v2 if v1 == "EntityID" replace EntityID = EntityID[_n-1] if missing(EntityID) drop if v1 == "EntityID" *********************** thanks again, On Wed, Feb 19, 2014 at 7:50 PM, Nick Cox <njcoxstata@gmail.com> wrote: > Simplified code: > > gen j = substr(v1, -1, 1) if v1 != "EntityID" > gen which = subinstr(v1, j, "", 1) if v1 != "EntityID" > gen EntityID = v2 if v1 == "EntityID" > replace EntityID = EntityID[_n-1] if missing(EntityID) > drop if v1 == "EntityID" > drop v1 > reshape wide v2, i(EntityID j) j(which) string > renpfix v2 > expand endyr - begyr + 1 > rename begyr year > bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1 > drop endyr j > l > > Nick > njcoxstata@gmail.com > > > On 20 February 2014 00:34, Nick Cox <njcoxstata@gmail.com> wrote: >> In your sample data, blocks *3 *4 *5 seem to be the same information repeated. >> >> With the sample data, this is code to play with >> >> gen j = substr(word(v1, 1), -1, 1) if word(v1, 1) != "EntityID" >> gen which = subinstr(v1, j, "", 1) if word(v1, 1) != "EntityID" >> gen EntityID = v2 if word(v1, 1) == "EntityID" >> replace EntityID = EntityID[_n-1] if missing(EntityID) >> drop if word(v1,1) == "EntityID" >> drop v1 >> reshape wide v2, i(EntityID j) j(which) string >> renpfix v2 >> expand endyr - begyr + 1 >> rename begyr year >> bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1 >> drop endyr j >> l >> Nick >> njcoxstata@gmail.com >> >> >> On 19 February 2014 21:06, R Zhang <r05zhang@gmail.com> wrote: >>> Hi Statalisters, >>> >>> >>> My data has 13,458 observation and 21 variables. >>> EntityID corpid1 begyr1 gvkey1 endyr1 corpid2 begyr2 gvkey2 endyr2 >>> corpid3 begyr3 gvkey3 endyr3 corpid4 begyr4 gvkey4 endyr4 corpid5 >>> begyr5 gvkey5 endyr5 >>> 100091 8101 1961 1000 1970 8091 1971 1000 1973 8011 1974 1001 2000 >>> 8012 2000 1001 2002 8012 2003 1001 2005 >>> >>> >>> for each unique EntityID, the corresponding gvkey and corpid could >>> vary over time as indicated by begyr and endyr, >>> >>> what I want is a dataset that give me the gvkey and corpid for each >>> time period, so I can match it to another dataset that has company >>> specific financial data , the match variable will be gvkey, year. >>> >>> as of now, i thought I should reshape the data, Someone on the forum >>> kindly offered me the following program to reshape my data. sample >>> code (see below) works for his hypothetical data, but when i ran with >>> my data (13,458 observation and 21 variables.). I got an error "not >>> possible; would exceed present limit on number of variables", could >>> you shed light on this? >>> >>> ***************** >>> input str20 v1 v2 >>> EntityID 100091 >>> corpid1 8101 >>> begyr1 1961 >>> gvkey1 1000 >>> endyr1 1970 >>> corpid2 8091 >>> begyr2 1971 >>> gvkey2 1000 >>> endyr2 1973 >>> corpid3 8011 >>> begyr3 1974 >>> gvkey3 1001 >>> endyr3 2000 >>> corpid4 8011 >>> begyr4 1974 >>> gvkey4 1001 >>> endyr4 2000 >>> corpid5 8011 >>> begyr5 1974 >>> gvkey5 1001 >>> endyr5 2000 >>> end >>> >>> compress >>> sxpose, clear firstnames force >>> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd) >>> *********************** >>> >>> what I ultimately want is : >>> EntityID corpid year gvkey >>> 100091 8101 1961 1000 >>> 100091 8101 1962 1000 >>> 100091 8101 1963 1000 >>> 100091 8101 1964 1000 >>> 100091 8101 1965 1000 >>> 100091 8101 1966 1000 >>> ... >>> 100091 8091 1971 1000 >>> 100091 8091 1972 1000 >>> 100091 8091 1973 1000 >>> 100091 8091 1974 1000 >>> >>> p.s if you think there is a better way , please also share. >>> >>> thanks!!! >>> >>> -R >>> >>> * >>> * For searches and help try: >>> * http://www.stata.com/help.cgi?search >>> * http://www.stata.com/support/faqs/resources/statalist-faq/ >>> * http://www.ats.ucla.edu/stat/stata/ > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: sxpose -not possible; would exceed present limit on number of variables***From:*Nick Cox <njcoxstata@gmail.com>

**References**:**st: sxpose -not possible; would exceed present limit on number of variables***From:*R Zhang <r05zhang@gmail.com>

**Re: st: sxpose -not possible; would exceed present limit on number of variables***From:*Nick Cox <njcoxstata@gmail.com>

**Re: st: sxpose -not possible; would exceed present limit on number of variables***From:*Nick Cox <njcoxstata@gmail.com>

- Prev by Date:
**Re: st: cumul, relrank and pweights** - Next by Date:
**Re: st: sxpose -not possible; would exceed present limit on number of variables** - Previous by thread:
**Re: st: sxpose -not possible; would exceed present limit on number of variables** - Next by thread:
**Re: st: sxpose -not possible; would exceed present limit on number of variables** - Index(es):