
From: "Supnithadnaporn, Anupit" <gtg065t@mail.gatech.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: RE: Use foreach or forvalues to create the long form data
Date: Sat, 18 Oct 2008 20:38:26 -0400 (EDT)

Thank you for all the suggestions. I will try all of them and report the results later.

Anupit

----- "Friedrich Huebler" <fhuebler@gmail.com> wrote:
> Anupit,
>
> -reshape- is indeed slow. You can change the structure of your data by
> saving the contents of the variables in individual files that are
> subsequently combined with -append-. The difference between the two
> approaches can be demonstrated with the auto data.
>
> First, create a dataset with 148,000 observations in 10 variables,
> plus an identifier.
>
>     sysuse auto, clear
>     drop make foreign
>     local i = 1
>     foreach var of varlist * {
>         ren `var' var`i'
>         local ++i
>     }
>     expand 2000
>     gen i = _n
>
> We can now -reshape- the data from wide to long.
>
>     reshape long var, i(i) j(j)
>
> The alternative solution does not rely on -reshape-. Instead, we use
> -forvalues- in combination with -preserve-, -keep-, -save-, -restore-
> and -append-.
>
>     d, s
>     local j = r(k) - 1
>     forvalues i = 1/`j' {
>         preserve
>         keep i var`i'
>         rename var`i' var
>         gen j = `i'
>         tempfile var`i'
>         save `var`i''
>         restore
>     }
>     use `var1', clear
>     forvalues i = 2/`j' {
>         append using `var`i''
>     }
>
> -reshape- is more convenient because it only takes one line of code.
> This convenience comes at the cost of processing time and memory
> requirements. On my PC the first solution with -reshape- takes about
> 11 seconds. The second solution takes less than 2 seconds and also
> needs less memory.
>
> Friedrich
>
> On Thu, Oct 16, 2008 at 10:57 AM, Supnithadnaporn, Anupit
> <gtg065t@mail.gatech.edu> wrote:
> > Hello,
> >
> > Martin and Nick, thank you so much. Your suggestion works very well.
> >
> > I am sorry for being unclear about my question.
> > I will clarify it now. I have a large dataset around 1 million records. There are
> > around 20 variables. I would like to run the clogit regression, which requires me
> > to reshape the data into the long form. Basically, it is the model of a person
> > choosing a product from the choice set of 12. Thus, the total records would be of
> > 12 million after reshaping.
> >
> > In the beginning, I tried with the smaller sample and reshape worked very slow.
> > Then, I thought I should start with only 2 ID variables: person ID and choice ID (1-12).
> > I created the wide form data of only 2 ID variables, reshaped it to the long form, and
> > lastly merged other variables that are associate with person and choice respectively.
> >
> > However, even with the only 2 ID variables, it took a long time for reshape to finish
> > for the small sample of the total 1 million records. That is why I try to find the
> > faster way to create the empty dataset with 2 ID variables first. Then my next step
> > is to merge the information about a person and the choice.
> >
> > I hope this is clear enough. And if you and others have other better approach to
> > prepare the data like this, please let me know. There are a lot more for me to learn
> > from all of you.
> >
> > Thank you,
> > Anupit
> >
> >
> > ----- "Nick Cox" <n.j.cox@durham.ac.uk> wrote:
> >
> >> Martin's advice looks good.
> >>
> >> But Anupit's question doesn't hang together for me. The specific
> >> example, and even longer ones of the same form, don't strike me as
> >> -reshape- questions at all as they involve creating new data in
> >> structured form.
> >>
> >> By the way, for large datasets make sure to use -egen long- or -egen
> >> double- if you need to.
> >>
> >> But if you had a -reshape- question, strict sense, I doubt you could
> >> speed things up much by programming it yourself with -forvalues- or
> >> -foreach-. That would, broadly speaking, mean that you were a better
> >> Stata programmer than the Stata developers. There could well be
> >> exceptions, but I'd guess that this statement would be true much more
> >> often than its converse.
> >>
> >> Nick
> >> n.j.cox@durham.ac.uk
> >>
> >> Martin Weiss
> >>
> >> - h egen,seq()-
> >>
> >> Supnithadnaporn, Anupit
> >>
> >> Would you please suggest me how to create data in the long form
> >> by *not* using reshape? I would like to avoid reshape because reshape
> >> takes very very long time. In fact, the final & total number of records
> >> that I have to create would be around 12,000,000.
> >>
> >> I think foreach and forvalues can do this work.
> >> But, I am a novice in Stata programming and could not figure out so far.
> >>
> >> In the beginning, I have only Obsid which is created by
> >>
> >>     gen Obsid = _n
> >>
> >> The desired data would look like this:
> >>
> >>     Obsid  Vid  Imp
> >>         1    1    1
> >>         2    1    2
> >>         3    1    3
> >>         4    1    4
> >>         5    2    1
> >>         6    2    2
> >>         7    2    3
> >>         8    2    4
> >>
> >>     ...
> >>
> >>       100   25    4
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
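
For readers following Martin's pointer to -egen, seq()-, here is a minimal sketch of how the Obsid/Vid/Imp layout in the original question might be built directly, with no -reshape- at all. It assumes the dimensions shown in the example table (25 values of Vid, each with 4 values of Imp, 100 observations in total); it is an illustration under those assumptions, not code posted by anyone in the thread.

```stata
* Sketch only: sizes (25 x 4 = 100) are taken from the example table in the question.
clear
set obs 100

* Running observation identifier, as in the original post.
gen long Obsid = _n

* Vid: 1,1,1,1,2,2,2,2,...,25 (each value repeated in blocks of 4).
egen long Vid = seq(), from(1) to(25) block(4)

* Imp: 1,2,3,4,1,2,3,4,... (cycles within each block).
egen byte Imp = seq(), from(1) to(4)

* Inspect the first two blocks.
list in 1/8, noobs sepby(Vid)
```

For the clogit case described above (about 1 million persons with 12 choices each), the same pattern would presumably use -set obs 12000000- with to(1000000) block(12) for the person identifier and to(12) for the choice identifier. The -long- storage type follows Nick's advice for large datasets.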

