Re: st: reshape


From   Daniel Feenberg <feenberg@nber.org>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: reshape
Date   Tue, 7 Aug 2012 15:19:59 -0400 (EDT)


On Tue, 7 Aug 2012, Airey, David C wrote:

When reshaping datasets from wide to long with very many variables and rows, is there any speed gain from reshaping subsets of the variables or rows and combining them later, versus letting reshape do its thing on the whole dataset?

I don't know about that, but...


The reshape command is inexplicably slow. Take a wide dataset with variables id and x2001-x2010. Then the command:

  reshape long x, i(id) j(year)

takes about 20 seconds per million observations. But you can write out a separate file for each year of data, and then concatenate them into one long dataset in about 2 seconds. For example:

  forvalues year = 2001/2010 {
    * load only id and this year's column from the wide file
    use id x`year' using widedata, clear
    rename x`year' x
    * the wide file has no year variable, so create it here
    generate int year = `year'
    save "/tmp/reshape`year'", replace
  }
  clear
  forvalues year = 2001/2010 {
    append using "/tmp/reshape`year'"
  }
  * optional: match the id-year ordering that reshape produces
  sort id year

Obviously, the additional code isn't worthwhile unless you have many millions of observations or are reshaping many times, but sometimes that is what you have.
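
A rough way to check the difference on your own machine is sketched below. It is only a sketch: the data are simulated, and the number of observations, the seed, and the tempfile names (wide, part2001, ...) are assumptions made for illustration, not part of the example above. Timings will vary with hardware and Stata version.

  * simulate a wide dataset: id plus x2001-x2010 (hypothetical size)
  clear
  set seed 12345
  set obs 100000
  generate long id = _n
  forvalues year = 2001/2010 {
    generate double x`year' = runiform()
  }
  tempfile wide
  save "`wide'"

  * method 1: built-in reshape
  timer clear
  timer on 1
  reshape long x, i(id) j(year)
  timer off 1

  * method 2: one temporary file per year, then append
  timer on 2
  forvalues year = 2001/2010 {
    use id x`year' using "`wide'", clear
    rename x`year' x
    generate int year = `year'
    tempfile part`year'
    save "`part`year''"
  }
  clear
  forvalues year = 2001/2010 {
    append using "`part`year''"
  }
  sort id year
  timer off 2

  * elapsed seconds for timers 1 and 2
  timer list

Using tempfile rather than fixed names under /tmp leaves nothing behind to clean up, since Stata erases temporary files when the do-file ends.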

dan feenberg
NBER