Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: sxpose -not possible; would exceed present limit on number of variables


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: sxpose -not possible; would exceed present limit on number of variables
Date   Thu, 20 Feb 2014 07:59:43 +0000

Sorry, but I don't understand the question. The data you show clearly
includes -gvkey- and -year-, the variables you say you want.

Nick
[email protected]

On 20 February 2014 03:08, R Zhang <[email protected]> wrote:

> thanks again.
>
> running your code, I got
> +++++++++++++++++++++
> v1 v2 j which EntityID
> corpid1 8101 1 corpid 100091
> begyr1 1961 1 begyr 100091
> gvkey1 1000 1 gvkey 100091
> endyr1 1970 1 endyr 100091
> corpid2 8091 2 corpid 100091
> begyr2 1971 2 begyr 100091
> gvkey2 1000 2 gvkey 100091
> endyr2 1973 2 endyr 100091
> corpid3 8011 3 corpid 100091
> begyr3 1974 3 begyr 100091
> gvkey3 1001 3 gvkey 100091
> endyr3 2000 3 endyr 100091
> corpid4 8011 4 corpid 100091
> begyr4 1974 4 begyr 100091
> gvkey4 1001 4 gvkey 100091
> endyr4 2000 4 endyr 100091
> corpid5 8011 5 corpid 100091
> begyr5 1974 5 begyr 100091
> gvkey5 1001 5 gvkey 100091
> endyr5 2000 5 endyr 100091
> +++++++++++++++++++++
>
> from the above output, is it easy to get
>
>  +------------------------------------+
>
>  EntityID    corpid  year gvkey
>
> 100091   8101    1961 1000
> 100091   8101    1962 1000
> ...
> 100091   8101    1970 1000
> 100091   8091   1971 1000
> 100091   8091   1972 1000
> 100091   8091   1973 1000
> .....
> 100091     8012  2003 1001
> +-------------------------------------------+
>
> I would like the corresponding gvkey and year so I can match merge
> with another dataset using thse two variables.
>
> -Rochelle
>
> On Wed, Feb 19, 2014 at 8:47 PM, Nick Cox <[email protected]> wrote:
>> I am not using -sxpose- (SSC) or the -reshape- following. The key
>> point is that that approach will not work for you, as already
>> established. My code starts after the sample data is read in by
>>
>> input str20 v1 v2
>> EntityID 100091
>> corpid1 8101
>> begyr1 1961
>> gvkey1 1000
>> endyr1 1970
>> corpid2 8091
>> begyr2 1971
>> gvkey2 1000
>> endyr2 1973
>> corpid3 8011
>> begyr3 1974
>> gvkey3 1001
>> endyr3 2000
>> corpid4 8012
>> begyr4 2001
>> gvkey4 1001
>> endyr4 2002
>> corpid5 8013
>> begyr5 2003
>> gvkey5 1001
>> endyr5 2004
>> end
>>
>> Nick
>> [email protected]
>>
>>
>> On 20 February 2014 01:43, R Zhang <[email protected]> wrote:
>>> Hi, Nick
>>>
>>> I fixed the 3, 4, 5 repeated information . I ran your Simplified code
>>> and got an error " v1 not found", please let me know if  I
>>> misunderstood your email.
>>>
>>> ***********************
>>> clear
>>> input str20 v1 v2
>>> EntityID 100091
>>> corpid1 8101
>>> begyr1 1961
>>> gvkey1 1000
>>> endyr1 1970
>>> corpid2 8091
>>> begyr2 1971
>>> gvkey2 1000
>>> endyr2 1973
>>> corpid3 8011
>>> begyr3 1974
>>> gvkey3 1001
>>> endyr3 2000
>>> corpid4 8012
>>> begyr4 2001
>>> gvkey4 1001
>>> endyr4 2002
>>> corpid5 8013
>>> begyr5 2003
>>> gvkey5 1001
>>> endyr5 2004
>>> end
>>>
>>> compress
>>> sxpose, clear firstnames force
>>> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd)
>>>
>>> gen j = substr(v1, -1, 1) if v1 != "EntityID"
>>> gen which = subinstr(v1, j, "", 1) if v1 != "EntityID"
>>> gen EntityID = v2 if v1 == "EntityID"
>>> replace EntityID = EntityID[_n-1] if missing(EntityID)
>>> drop if v1 == "EntityID"
>>>
>>> ***********************
>>>
>>> thanks again,
>>>
>>> On Wed, Feb 19, 2014 at 7:50 PM, Nick Cox <[email protected]> wrote:
>>>> Simplified code:
>>>>
>>>> gen j = substr(v1, -1, 1) if v1 != "EntityID"
>>>> gen which = subinstr(v1, j, "", 1) if v1 != "EntityID"
>>>> gen EntityID = v2 if v1 == "EntityID"
>>>> replace EntityID = EntityID[_n-1] if missing(EntityID)
>>>> drop if v1 == "EntityID"
>>>> drop v1
>>>> reshape wide v2, i(EntityID j) j(which) string
>>>> renpfix v2
>>>> expand endyr - begyr + 1
>>>> rename begyr year
>>>> bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1
>>>> drop endyr j
>>>> l
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 20 February 2014 00:34, Nick Cox <[email protected]> wrote:
>>>>> In your sample data, blocks *3 *4 *5 seem to be the same information repeated.
>>>>>
>>>>> With  the sample data, this is code to play with
>>>>>
>>>>> gen j = substr(word(v1, 1), -1, 1) if word(v1, 1) != "EntityID"
>>>>> gen which = subinstr(v1, j, "", 1) if word(v1, 1) != "EntityID"
>>>>> gen EntityID = v2 if word(v1, 1) == "EntityID"
>>>>> replace EntityID = EntityID[_n-1] if missing(EntityID)
>>>>> drop if word(v1,1) == "EntityID"
>>>>> drop v1
>>>>> reshape wide v2, i(EntityID j) j(which) string
>>>>> renpfix v2
>>>>> expand endyr - begyr + 1
>>>>> rename begyr year
>>>>> bysort EntityID j : replace year = year[_n-1] + 1 if _n > 1
>>>>> drop endyr j
>>>>> l
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 19 February 2014 21:06, R Zhang <[email protected]> wrote:
>>>>>> Hi Statalisters,
>>>>>>
>>>>>>
>>>>>> My data has 13,458 observation and 21 variables.
>>>>>> EntityID corpid1 begyr1 gvkey1 endyr1 corpid2 begyr2 gvkey2 endyr2
>>>>>> corpid3 begyr3 gvkey3 endyr3 corpid4 begyr4 gvkey4 endyr4 corpid5
>>>>>> begyr5 gvkey5 endyr5
>>>>>> 100091 8101 1961 1000 1970 8091 1971 1000 1973 8011 1974 1001 2000
>>>>>> 8012 2000 1001 2002 8012 2003 1001 2005
>>>>>>
>>>>>>
>>>>>> for each unique EntityID, the corresponding gvkey and corpid could
>>>>>> vary over time as indicated by begyr and endyr,
>>>>>>
>>>>>> what I want is a dataset that give me the gvkey and corpid for each
>>>>>> time period, so I can match it to another dataset that has company
>>>>>> specific financial data , the match variable will be gvkey, year.
>>>>>>
>>>>>> as of now, i thought I should reshape the data, Someone on the forum
>>>>>> kindly offered me the following program to reshape my data. sample
>>>>>> code (see below) works for his hypothetical data, but when i ran with
>>>>>> my data (13,458 observation and 21 variables.). I got an error "not
>>>>>> possible; would exceed present limit on number of variables", could
>>>>>> you shed light on this?
>>>>>>
>>>>>> *****************
>>>>>> input str20 v1 v2
>>>>>> EntityID 100091
>>>>>> corpid1 8101
>>>>>> begyr1 1961
>>>>>> gvkey1 1000
>>>>>> endyr1 1970
>>>>>> corpid2 8091
>>>>>> begyr2 1971
>>>>>> gvkey2 1000
>>>>>> endyr2 1973
>>>>>> corpid3 8011
>>>>>> begyr3 1974
>>>>>> gvkey3 1001
>>>>>> endyr3 2000
>>>>>> corpid4 8011
>>>>>> begyr4 1974
>>>>>> gvkey4 1001
>>>>>> endyr4 2000
>>>>>> corpid5 8011
>>>>>> begyr5 1974
>>>>>> gvkey5 1001
>>>>>> endyr5 2000
>>>>>> end
>>>>>>
>>>>>> compress
>>>>>> sxpose, clear firstnames force
>>>>>> reshape long corpid begyr gvkey endyr, i(EntityID) j(pd)
>>>>>> ***********************
>>>>>>
>>>>>> what I ultimately want is :
>>>>>> EntityID corpid year gvkey
>>>>>> 100091 8101 1961 1000
>>>>>> 100091 8101 1962 1000
>>>>>> 100091 8101 1963 1000
>>>>>> 100091 8101 1964 1000
>>>>>> 100091 8101 1965 1000
>>>>>> 100091 8101 1966 1000
>>>>>> ...
>>>>>> 100091 8091 1971 1000
>>>>>> 100091 8091 1972 1000
>>>>>> 100091 8091 1973 1000
>>>>>> 100091 8091 1974 1000
>>>>>>
>>>>>> p.s if you think there is a better way , please also share.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index