
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Re: reshape


From   "Airey, David C" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: Re: reshape
Date   Tue, 7 Aug 2012 15:48:15 -0500


reshape does seem very slow for data sets with many variables. I get 4-5 minutes for a wide-to-long reshape of a 300-row by 2000-variable data set. Maybe the command can be sped up. Thanks for the code alternative. I was looking into using Perl too.
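
A minimal sketch for timing a reshape of this size on your own machine. The variable layout above is not stated, so the single stub x1-x2000, the random values, and the j() name "time" are all assumptions made for illustration; real data with many stubs may behave differently.

```stata
* Hypothetical test data: 300 rows, one stub with 2000 wide columns.
clear
set seed 12345
set obs 300
generate long id = _n
forvalues j = 1/2000 {
    generate double x`j' = runiform()
}

* Time only the wide-to-long reshape.
timer clear 1
timer on 1
reshape long x, i(id) j(time)
timer off 1
timer list 1
```

The elapsed seconds are reported by timer list and left in r(t1), which makes it easy to compare against an alternative approach timed the same way.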

-Dave

> 
> When reshaping datasets from wide to long with very many variables and rows, is there any speed gain from reshaping fewer variables or rows at a time and combining the pieces later, versus letting reshape do its thing on the whole data set?
> I don't know about that, but...
> 
> 
>> 
>> The reshape command is inexplicably slow. Take a wide dataset with variables id and x2001-x2010. Then the command:
>>   reshape long x, i(id) j(year)
>> 
>> 
>> takes about 20 seconds per million observations. But you can write out a separate file for each year of data, and then concatenate them into one long dataset in about 2 seconds. For example:
>>   forvalues year = 2001/2010 {
>>     * load only id and this year's column; -use- takes clear, not replace
>>     use id x`year' using widedata, clear
>>     rename x`year' x
>>     * record the year that reshape would have stored in j()
>>     generate int year = `year'
>>     save "/tmp/reshape`year'", replace
>>   }
>>   clear
>>   forvalues year = 2001/2010 {
>>     append using "/tmp/reshape`year'"
>>   }
>> 
>> 
>> Obviously, the additional code isn't worthwhile unless you have multi-millions of observations, or are reshaping many times, but
>> sometimes that is what you have.
>> 
>> dan feenberg
>> NBER
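
A variation on the write-and-append approach quoted above, sketched with tempfiles instead of fixed /tmp paths so that it also runs where /tmp does not exist and cleans up after itself. The toy dataset (1000 ids, random x values) is hypothetical and only there to make the example self-contained; the final cf call is a check I have added, not part of the quoted method.

```stata
* Hypothetical toy wide dataset: id plus x2001-x2010.
clear
set seed 12345
set obs 1000
generate long id = _n
forvalues year = 2001/2010 {
    generate double x`year' = runiform()
}
tempfile wide
save "`wide'"

* Write one temporary file per year, each already in long layout.
forvalues year = 2001/2010 {
    use id x`year' using "`wide'", clear
    rename x`year' x
    generate int year = `year'
    tempfile part`year'
    save "`part`year''"
}

* Stack the yearly pieces, then sort into the order reshape produces.
clear
forvalues year = 2001/2010 {
    append using "`part`year''"
}
sort id year
tempfile stacked
save "`stacked'"

* Check: compare reshape's own result against the stacked file.
use "`wide'", clear
reshape long x, i(id) j(year)
cf _all using "`stacked'"
```

The sort is needed because appending year by year leaves the data ordered by year and then id, whereas reshape long returns it ordered by id and then year. On large data that sort adds to the run time, so it is worth including in any timing comparison.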


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

