Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: sxpose -not possible; would exceed present limit on number of variables

From	Carlos Avellaneda Suárez <[email protected]>
To	[email protected]
Subject	Re: st: sxpose -not possible; would exceed present limit on number of variables
Date	Thu, 20 Feb 2014 10:03:25 -0500

Hi Rochelle,
You forgot to run the second part of Nick's excellent code:
drop v1
reshape wide v2, i(EntityID j) j(which) string
renpfix v2
expand endyr - begyr + 1
rename begyr year
bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1
drop endyr j
l

2014-02-20 2:59 GMT-05:00 Nick Cox <[email protected]>:
> Sorry, but I don't understand the question. The data you show clearly
> includes -gvkey- and -year-, the variables you say you want.
>
> Nick
> [email protected]
>
> On 20 February 2014 03:08, R Zhang <[email protected]> wrote:
>
>> thanks again.
>>
>> running your code, I got
>> +++++++++++++++++++++
>> v1 v2 j which EntityID
>> corpid1 8101 1 corpid 100091
>> begyr1 1961 1 begyr 100091
>> gvkey1 1000 1 gvkey 100091
>> endyr1 1970 1 endyr 100091
>> corpid2 8091 2 corpid 100091
>> begyr2 1971 2 begyr 100091
>> gvkey2 1000 2 gvkey 100091
>> endyr2 1973 2 endyr 100091
>> corpid3 8011 3 corpid 100091
>> begyr3 1974 3 begyr 100091
>> gvkey3 1001 3 gvkey 100091
>> endyr3 2000 3 endyr 100091
>> corpid4 8011 4 corpid 100091
>> begyr4 1974 4 begyr 100091
>> gvkey4 1001 4 gvkey 100091
>> endyr4 2000 4 endyr 100091
>> corpid5 8011 5 corpid 100091
>> begyr5 1974 5 begyr 100091
>> gvkey5 1001 5 gvkey 100091
>> endyr5 2000 5 endyr 100091
>> +++++++++++++++++++++
>>
>> from the above output, is it easy to get
>>
>>  +------------------------------------+
>>
>>  EntityID    corpid  year gvkey
>>
>> 100091   8101    1961 1000
>> 100091   8101    1962 1000
>> ...
>> 100091   8101    1970 1000
>> 100091   8091   1971 1000
>> 100091   8091   1972 1000
>> 100091   8091   1973 1000
>> .....
>> 100091     8012  2003 1001
>> +-------------------------------------------+
>>
>> I would like the corresponding gvkey and year so I can match merge
>> with another dataset using thse two variables.
>>
>> -Rochelle
>>
>> On Wed, Feb 19, 2014 at 8:47 PM, Nick Cox <[email protected]> wrote:
>>> I am not using -sxpose- (SSC) or the -reshape- following. The key
>>> point is that that approach will not work for you, as already
>>> established. My code starts after the sample data is read in by
>>>
>>> input str20 v1 v2
>>> EntityID 100091
>>> corpid1 8101
>>> begyr1 1961
>>> gvkey1 1000
>>> endyr1 1970
>>> corpid2 8091
>>> begyr2 1971
>>> gvkey2 1000
>>> endyr2 1973
>>> corpid3 8011
>>> begyr3 1974
>>> gvkey3 1001
>>> endyr3 2000
>>> corpid4 8012
>>> begyr4 2001
>>> gvkey4 1001
>>> endyr4 2002
>>> corpid5 8013
>>> begyr5 2003
>>> gvkey5 1001
>>> endyr5 2004
>>> end
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 20 February 2014 01:43, R Zhang <[email protected]> wrote:
>>>> Hi, Nick
>>>>
>>>> I fixed the 3, 4, 5 repeated information . I ran your Simplified code
>>>> and got an error " v1 not found", please let me know if  I
>>>> misunderstood your email.
>>>>
>>>> ***********************
>>>> clear
>>>> input str20 v1 v2
>>>> EntityID 100091
>>>> corpid1 8101
>>>> begyr1 1961
>>>> gvkey1 1000
>>>> endyr1 1970
>>>> corpid2 8091
>>>> begyr2 1971
>>>> gvkey2 1000
>>>> endyr2 1973
>>>> corpid3 8011
>>>> begyr3 1974
>>>> gvkey3 1001
>>>> endyr3 2000
>>>> corpid4 8012
>>>> begyr4 2001
>>>> gvkey4 1001
>>>> endyr4 2002
>>>> corpid5 8013
>>>> begyr5 2003
>>>> gvkey5 1001
>>>> endyr5 2004
>>>> end
>>>>
>>>> compress
>>>> sxpose, clear firstnames force
>>>> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd)
>>>>
>>>> gen j = substr(v1, -1, 1) if v1 != "EntityID"
>>>> gen which = subinstr(v1, j, "", 1) if v1 != "EntityID"
>>>> gen EntityID = v2 if v1 == "EntityID"
>>>> replace EntityID = EntityID[_n-1] if missing(EntityID)
>>>> drop if v1 == "EntityID"
>>>>
>>>> ***********************
>>>>
>>>> thanks again,
>>>>
>>>> On Wed, Feb 19, 2014 at 7:50 PM, Nick Cox <[email protected]> wrote:
>>>>> Simplified code:
>>>>>
>>>>> gen j = substr(v1, -1, 1) if v1 != "EntityID"
>>>>> gen which = subinstr(v1, j, "", 1) if v1 != "EntityID"
>>>>> gen EntityID = v2 if v1 == "EntityID"
>>>>> replace EntityID = EntityID[_n-1] if missing(EntityID)
>>>>> drop if v1 == "EntityID"
>>>>> drop v1
>>>>> reshape wide v2, i(EntityID j) j(which) string
>>>>> renpfix v2
>>>>> expand endyr - begyr + 1
>>>>> rename begyr year
>>>>> bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1
>>>>> drop endyr j
>>>>> l
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 20 February 2014 00:34, Nick Cox <[email protected]> wrote:
>>>>>> In your sample data, blocks *3 *4 *5 seem to be the same information repeated.
>>>>>>
>>>>>> With  the sample data, this is code to play with
>>>>>>
>>>>>> gen j = substr(word(v1, 1), -1, 1) if word(v1, 1) != "EntityID"
>>>>>> gen which = subinstr(v1, j, "", 1) if word(v1, 1) != "EntityID"
>>>>>> gen EntityID = v2 if word(v1, 1) == "EntityID"
>>>>>> replace EntityID = EntityID[_n-1] if missing(EntityID)
>>>>>> drop if word(v1,1) == "EntityID"
>>>>>> drop v1
>>>>>> reshape wide v2, i(EntityID j) j(which) string
>>>>>> renpfix v2
>>>>>> expand endyr - begyr + 1
>>>>>> rename begyr year
>>>>>> bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1
>>>>>> drop endyr j
>>>>>> l
>>>>>> Nick
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> On 19 February 2014 21:06, R Zhang <[email protected]> wrote:
>>>>>>> Hi Statalisters,
>>>>>>>
>>>>>>>
>>>>>>> My data has 13,458 observation and 21 variables.
>>>>>>> EntityID corpid1 begyr1 gvkey1 endyr1 corpid2 begyr2 gvkey2 endyr2
>>>>>>> corpid3 begyr3 gvkey3 endyr3 corpid4 begyr4 gvkey4 endyr4 corpid5
>>>>>>> begyr5 gvkey5 endyr5
>>>>>>> 100091 8101 1961 1000 1970 8091 1971 1000 1973 8011 1974 1001 2000
>>>>>>> 8012 2000 1001 2002 8012 2003 1001 2005
>>>>>>>
>>>>>>>
>>>>>>> for each unique EntityID, the corresponding gvkey and corpid could
>>>>>>> vary over time as indicated by begyr and endyr,
>>>>>>>
>>>>>>> what I want is a dataset that give me the gvkey and corpid for each
>>>>>>> time period, so I can match it to another dataset that has company
>>>>>>> specific financial data , the match variable will be gvkey, year.
>>>>>>>
>>>>>>> as of now, i thought I should reshape the data, Someone on the forum
>>>>>>> kindly offered me the following program to reshape my data. sample
>>>>>>> code (see below) works for his hypothetical data, but when i ran with
>>>>>>> my data (13,458 observation and 21 variables.). I got an error "not
>>>>>>> possible; would exceed present limit on number of variables", could
>>>>>>> you shed light on this?
>>>>>>>
>>>>>>> *****************
>>>>>>> input str20 v1 v2
>>>>>>> EntityID 100091
>>>>>>> corpid1 8101
>>>>>>> begyr1 1961
>>>>>>> gvkey1 1000
>>>>>>> endyr1 1970
>>>>>>> corpid2 8091
>>>>>>> begyr2 1971
>>>>>>> gvkey2 1000
>>>>>>> endyr2 1973
>>>>>>> corpid3 8011
>>>>>>> begyr3 1974
>>>>>>> gvkey3 1001
>>>>>>> endyr3 2000
>>>>>>> corpid4 8011
>>>>>>> begyr4 1974
>>>>>>> gvkey4 1001
>>>>>>> endyr4 2000
>>>>>>> corpid5 8011
>>>>>>> begyr5 1974
>>>>>>> gvkey5 1001
>>>>>>> endyr5 2000
>>>>>>> end
>>>>>>>
>>>>>>> compress
>>>>>>> sxpose, clear firstnames force
>>>>>>> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd)
>>>>>>> ***********************
>>>>>>>
>>>>>>> what I ultimately want is :
>>>>>>> EntityID corpid year gvkey
>>>>>>> 100091 8101 1961 1000
>>>>>>> 100091 8101 1962 1000
>>>>>>> 100091 8101 1963 1000
>>>>>>> 100091 8101 1964 1000
>>>>>>> 100091 8101 1965 1000
>>>>>>> 100091 8101 1966 1000
>>>>>>> ...
>>>>>>> 100091 8091 1971 1000
>>>>>>> 100091 8091 1972 1000
>>>>>>> 100091 8091 1973 1000
>>>>>>> 100091 8091 1974 1000
>>>>>>>
>>>>>>> p.s if you think there is a better way , please also share.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: sxpose -not possible; would exceed present limit on number of variables
  - From: R Zhang <[email protected]>
- Re: st: sxpose -not possible; would exceed present limit on number of variables
  - From: Nick Cox <[email protected]>
- Re: st: sxpose -not possible; would exceed present limit on number of variables
  - From: Nick Cox <[email protected]>
- Re: st: sxpose -not possible; would exceed present limit on number of variables
  - From: R Zhang <[email protected]>
- Re: st: sxpose -not possible; would exceed present limit on number of variables
  - From: Nick Cox <[email protected]>
- Re: st: sxpose -not possible; would exceed present limit on number of variables
  - From: R Zhang <[email protected]>
- Re: st: sxpose -not possible; would exceed present limit on number of variables
  - From: Nick Cox <[email protected]>

Prev by Date: Re: st: insheet and dropping cases
Next by Date: Re: st: About taking log on zero values
Previous by thread: Re: st: sxpose -not possible; would exceed present limit on number of variables
Next by thread: st: Update to - domin - on SSC
Index(es):
- Date
- Thread