
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Splitting Dataset - Save by unique identifier

From   Daniel Feenberg <>
Subject   Re: st: Splitting Dataset - Save by unique identifier
Date   Sun, 28 Oct 2012 09:56:24 -0400 (EDT)

On Sat, Oct 27, 2012 at 5:28 PM, Tim Streibel <> wrote:
Hey all,

I have a question. I am currently computing abnormal returns in a way that requires opening a large dataset (about 2 million observations) about 400 times, which I think costs a lot of time.

So my idea is to create small datasets (one dataset per stock). Is there a way to quickly create a dataset containing only the observations of one stock (uniquely identified by Permno)?

Currently my only idea is to open the large dataset, drop all observations except those of one stock, and save it. But doing that for every stock forces me to open the large dataset 10,000 times, so it doesn't really save me time.

Some combination of by (permno) and save would be cool.

While -save- does not allow -if- or -in- qualifiers, -outsheet- does. Depending on the exact details of your dataset, the conversion overhead (writing text files and reading them back in) might be worthwhile. Of course, -by- would be even better, but I don't see how to get that advantage. Just reducing the I/O with -outsheet- will likely be a big help, though.
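
A minimal sketch of the -outsheet- idea, assuming the variable is named permno and is numeric. The toy data, the other variable names (date, ret), and the output file names (stock_<permno>.csv) are made up for illustration; in practice the large dataset would be loaded once with -use- instead of the -input- block.

```stata
* Toy stand-in for the large dataset (hypothetical values)
clear
input permno date ret
10001 1  .01
10001 2 -.02
10002 1  .03
10002 2  .00
end

* Collect the distinct stock identifiers into a local macro
levelsof permno, local(stocks)

* Write one comma-separated file per stock; the data stay in memory,
* so the large dataset is read from disk only once
foreach p of local stocks {
    outsheet using "stock_`p'.csv" if permno == `p', comma replace
}
```

Each pass of the loop still scans every observation in memory to evaluate the -if- condition, so the saving comes from avoiding repeated disk reads, not from avoiding the loop. Each file can later be read back with -insheet-. In Stata 13 and later, -export delimited- supersedes -outsheet- and also accepts -if- and -in-.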

Note that rules of thumb (such as avoiding looping over Stata
statements) are only rules of thumb, and when datasets get very large,
they may no longer hold. In your case I might examine the possibility
of using the -file open- and -file write- statements in a double loop.
It might be worth the programming effort, depending on how often you
will want to do this.
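
One way the -file open- / -file write- double loop might look, as a rough sketch only. It assumes the data are sorted by permno so that each stock's observations are contiguous; the toy data, the variable names date and ret, the file handle name fh, and the output file names are all hypothetical.

```stata
* Toy stand-in for the large dataset (hypothetical values)
clear
input permno date ret
10001 1  .01
10001 2 -.02
10002 1  .03
10002 2  .00
end

sort permno date
local N = _N
local i = 1

* Outer loop: one iteration (and one output file) per stock
while `i' <= `N' {
    local p = permno[`i']
    file open fh using "stock_`p'.txt", write text replace
    file write fh "permno" _tab "date" _tab "ret" _n

    * Inner loop: write rows until the stock changes or the data end
    while `i' <= `N' & permno[`i'] == `p' {
        file write fh (permno[`i']) _tab (date[`i']) _tab (ret[`i']) _n
        local ++i
    }
    file close fh
}
```

This touches each observation exactly once, but every row costs an interpreted loop iteration, so whether it beats repeated -outsheet- calls probably depends on the number of stocks and observations and is worth timing on a subset first.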

daniel feenberg