RE: st: generate


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: generate
Date   Fri, 8 Oct 2010 10:31:59 +0100

I haven't got 9.2 on my machine any more, but I imagine that #variables is much more likely to bite you than #observations. Again, look at -help limits- to see what the numbers are in your case. ("9.2" is not enough information to pin it down.) 

Hence the sentiment "you are better off long than wide" still applies to you, at least more than the converse. 

Nick 
n.j.cox@durham.ac.uk 
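
As a complement to -help limits-, the limits can also be read off programmatically. The lines below are a minimal sketch, assuming the c-class values c(flavor), c(max_k_theory), c(max_N_theory), c(k) and c(N) are available in your release; -creturn list- shows which ones your copy of Stata actually reports.

```stata
* Theoretical limits for the installed flavor of Stata
display "flavor:            " c(flavor)
display "max variables:     " c(max_k_theory)
display "max observations:  " c(max_N_theory)

* Size of the dataset currently in memory
display "variables in use:  " c(k)
display "observations:      " c(N)
```

Comparing the last two numbers with the first two indicates which limit, variables or observations, is closer for a given layout of the data.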

Mirriam Gee

I am referring to the limit on the number of observations. I am using
Stata version 9.2.

On Thu, Oct 7, 2010 at 1:17 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> It's the same data, wide or long. Which limit, observations or variables, do you imagine will bite first? Look at -help limits- for your version of Stata (not stated here).
>
> Before you replied, I was going to reinforce Dimitriy's advice. I would reach for -reshape- in this instance and I would keep the data in long form, at least on the information you have given.
>
> In a concurrent thread, I have commented:
>
> Some things are easier with a wide structure, but most things are easier otherwise.
>
> There is much more discussion in
>
> SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
>        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
>        Q1/09   SJ 9(1):137--157
>        shows how to exploit functions, egen functions, and Mata
>        for working rowwise; rowsort and rowranks are introduced
>
> Although that column shows that you can do many things rowwise, the underlying theme is that it isn't usually trivial.

Mirriam Gee

> Thank you very much, Dimitriy, for your suggestion. It worked perfectly
> well, but my main worry is that I have many hid (30000) and many g
> variables (eventually I will work with over 2000 variables), so I will
> end up having memory limitation problems if I use the reshape command,
> unless of course I also divide my dataset into smaller groups.
>
> On Wed, Oct 6, 2010 at 10:55 PM, Dimitriy V. Masterov wrote:
>
>> Mirriam Gee wants to:
>>> generate new variable(s) X1-X20 which contain the first 20
>>> numbers (excluding the zeros) from g1-g100. For example:
>>
>> There's probably a more elegant way of doing this, but this can be
>> accomplished with the -reshape- command to make your data easier to
>> work with, and then reshaping it again to get it like you want it for
>> your analysis. First, preserve the data and then reshape long to get
>> the X variable. Then, reshape wide and save the X variables. Restore
>> the G variables data, and merge the Xs back in with the Gs:
>>
>> #delimit;
>> /* Preserve your data */
>> preserve;
>>
>> /* Create the x variables with 2 reshapes */
>> keep hid g*;
>> reshape long g, i(hid) j(which_g);
>>
>> drop if g==0;
>> rename g x;
>> bys hid (which_g): gen t=_n;
>> drop which_g;
>>
>> reshape wide x, i(hid) j(t);
>>
>> tempfile temp;
>> save "`temp'";
>>
>> /* Restore data */
>> restore;
>>
>> /* Merge the x variables with the g variables */
>> merge 1:1 hid using "`temp'";
>> drop x21-_merge;
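
For readers worried about the size of the long dataset, the same result can be sketched without -reshape- by looping over the g variables in place. This is a different technique from the one posted above, not Dimitriy's method, and it rests on assumptions: the toy data, the local macros ng and nx, and the counter variable k are all made up for illustration, and missing values are skipped along with zeros. With the data in the thread, ng would be 100 and nx would be 20.

```stata
* Toy data standing in for hid and g1-g100 (illustrative values only)
clear
input hid g1 g2 g3 g4 g5
1 0 7 0 3 9
2 4 0 0 0 2
3 0 0 0 0 5
end

* Hypothetical settings: 5 g variables, keep the first 2 nonzero values
local ng 5
local nx 2

* Start with empty x variables and a running count of nonzero values seen
forvalues j = 1/`nx' {
    generate x`j' = .
}
generate int k = 0

* Walk across g1, g2, ... and copy the j-th nonzero value into xj
forvalues i = 1/`ng' {
    quietly replace k = k + 1 if g`i' != 0 & !missing(g`i')
    forvalues j = 1/`nx' {
        quietly replace x`j' = g`i' if k == `j' & g`i' != 0 & !missing(g`i')
    }
}
drop k

list hid x1 x2
```

The loop never creates more observations than the original data, so only the variable limit matters; the cost is ng times nx passes through the data, which may be slow when both numbers are large.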
