Statalist



Re: st: RE: RE: Use foreach or forvalues to create the long form data


From   "Supnithadnaporn, Anupit" <gtg065t@mail.gatech.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: RE: Use foreach or forvalues to create the long form data
Date   Thu, 16 Oct 2008 10:57:17 -0400 (EDT)

Hello,

Martin and Nick, thank you so much. Your suggestion works very well.

I am sorry for being unclear about my question.
I will clarify it now. I have a large dataset of around 1 million records, with
around 20 variables. I would like to run a clogit regression, which requires me
to reshape the data into long form. Basically, it is a model of a person
choosing a product from a choice set of 12. Thus, the total number of records would be
12 million after reshaping.

In the beginning, I tried a smaller sample, and reshape ran very slowly.
Then I thought I should start with only 2 ID variables: person ID and choice ID (1-12).
I created wide-form data with only the 2 ID variables, reshaped it to long form, and
lastly merged in the other variables that are associated with the person and the choice respectively.

However, even with only the 2 ID variables, reshape took a long time to finish
on a small sample of the 1 million records. That is why I am trying to find a
faster way to create the empty dataset with the 2 ID variables first. My next step
is then to merge in the information about the person and the choice.
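
For illustration, the two steps described above (build the person-by-choice skeleton, then merge in attributes) could be sketched without -reshape- by using -expand-. This is only a sketch under assumptions: the variable names person_id, choice_id and price, the sample size of 1,000, and the temporary file are all hypothetical, and the -merge m:1- syntax needs Stata 11 or later.

```stata
* Sketch only: person_id, choice_id, price and the tempfile are hypothetical.
clear
set obs 1000                          // stand-in for the ~1 million persons
gen long person_id = _n

* Turn one row per person into 12 rows per person, without -reshape-
expand 12
bysort person_id: gen byte choice_id = _n

* Hypothetical choice-level attributes, one row per choice
preserve
clear
set obs 12
gen byte choice_id = _n
gen double price = 10 + choice_id     // made-up attribute
tempfile choices
save `choices'
restore

* Attach the choice attributes (merge m:1 requires Stata 11 or later)
merge m:1 choice_id using `choices', nogenerate
sort person_id choice_id
```

Person-level variables could be attached the same way with a second -merge m:1- on person_id. Whether this beats -reshape- on the full data is something to time on a sample first.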

I hope this is clear enough. And if you and others have other better approach to 
prepare the data like this, please let me know. There are a lot more for me to learn
from all of you.

Thank you,
Anupit


----- "Nick Cox" <n.j.cox@durham.ac.uk> wrote:

> Martin's advice looks good. 
> 
> But Anupit's question doesn't hang together for me. The specific
> example, and even longer ones of the same form, don't strike me as
> -reshape- questions at all as they involve creating new data in
> structured form. 
> 
> By the way, for large datasets make sure to use -egen long- or -egen
> double- if you need to. 
> 
> But if you had a -reshape- question, strict sense, I doubt you could
> speed things up much by programming it yourself with -forvalues- or
> -foreach-. That would, broadly speaking, mean that you were a better
> Stata programmer than the Stata developers. There could well be
> exceptions, but I'd guess that this statement would be true much more
> often than its converse. 
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> Martin Weiss
> 
> - h egen,seq()-
> 
> Supnithadnaporn, Anupit
> 
> Would you please suggest how to create data in the long form
> by *not* using reshape? I would like to avoid reshape because it
> takes a very long time. In fact, the final total number of records
> that I have to create would be around 12,000,000.
> 
> I think foreach and forvalues can do this work. 
> But I am a novice in Stata programming and have not been able to
> figure it out so far.
> 
> In the beginning, I have only Obsid which is created by
> 
> gen Obsid = _n
> 
> The desired data would look like this:
> 
> Obsid   Vid     Imp
> 1       1       1
> 2       1       2
> 3       1       3
> 4       1       4
> 5       2       1
> 6       2       2
> 7       2       3
> 8       2       4
> 
> ...
> 
> 
> 100     25      4
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
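
As an illustration of the -egen, seq()- suggestion quoted above, the 100-row layout in the original question (Obsid, Vid, Imp) could be produced along these lines. This is a minimal sketch that assumes 25 values of Vid with 4 values of Imp each, as in the example table. The storage types follow the advice to use long where needed.

```stata
* Sketch only: reproduces the Obsid / Vid / Imp layout from the quoted example.
clear
set obs 100
gen long Obsid = _n

* Vid runs 1..25, each value repeated in a block of 4 rows
egen long Vid = seq(), from(1) to(25) block(4)

* Imp cycles 1..4 and restarts after every fourth row
egen byte Imp = seq(), from(1) to(4)

list in 1/8, noobs sepby(Vid)
```

The same two columns can also be written directly as ceil(_n/4) and mod(_n-1, 4) + 1 with -generate-. For the full problem, the sizes would change to 12 choices per person.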




