
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: reshape


From   Daniel Feenberg <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: reshape
Date   Tue, 7 Aug 2012 15:19:59 -0400 (EDT)


On Tue, 7 Aug 2012, Airey, David C wrote:


When reshaping datasets from wide to long with very many variables and rows, is there any gain in speed of reshaping fewer variables or rows and then later combining versus letting reshape do its thing on the whole data set?

I don't know about that, but...


The reshape command is inexplicably slow. Take a wide dataset with variables id and x2001-x2010. Then the command:

  reshape long x, i(id) j(year)

takes about 20 seconds per million observations. But you can write out a separate file for each year of data, and then concatenate them into one long dataset in about 2 seconds. For example:

  * write one small file per year, holding id, x and year
  forvalues year = 2001/2010 {
    use id x`year' using widedata, clear
    rename x`year' x
    generate int year = `year'
    save "/tmp/reshape`year'", replace
  }
  * stack the yearly files into one long dataset
  clear
  forvalues year = 2001/2010 {
    append using "/tmp/reshape`year'"
  }

Obviously, the additional code isn't worthwhile unless you have multi-millions of observations, or are reshaping many times, but sometimes that is what you have.
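
If you want to check whether the gain holds on your own data, something like the following do-file sketch might do. It assumes a file widedata.dta containing only id and x2001-x2010 (no year variable yet), and it swaps in Stata tempfiles for the /tmp paths so nothing is left on disk. The timer numbers 1 and 2 are arbitrary choices of mine, and I have not timed this variant myself.

```stata
* Sketch only: assumes widedata.dta holds id and x2001-x2010.
timer clear

* Timer 1: split into one file per year, then append.
timer on 1
forvalues year = 2001/2010 {
    use id x`year' using widedata, clear
    rename x`year' x
    generate int year = `year'
    tempfile part`year'          // hypothetical macro names part2001 ... part2010
    save `part`year''
}
clear
forvalues year = 2001/2010 {
    append using `part`year''
}
sort id year                     // order the rows by id and year
timer off 1

* Timer 2: the built-in command on the same data.
timer on 2
use widedata, clear
reshape long x, i(id) j(year)
timer off 2

* Report elapsed seconds for both approaches.
timer list
```

Run it as a do-file rather than line by line, so the local macros holding the tempfile names stay in scope for the second loop.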

dan feenberg
NBER