Statalist



Re: st: Re: Saving 1 observation


From   "Sergiy Radyakin" <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: Saving 1 observation
Date   Fri, 30 May 2008 11:05:02 -0400

Thank you to everyone who responded to my question about saving one
observation (or a small number of observations) in Stata.

Michael Blasnik suggested using the -post- command, which creates
native Stata datasets, although without labels (I am adding them later
in a separate loop).
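
For readers unfamiliar with -post-, here is a minimal sketch of the idea
for a single observation. It is an illustration only, not the code used
in this thread: the variables id and price, the observation number k,
and the output file Portion1 are all hypothetical.

```stata
* Sketch only: copy observation k of the data in memory into its own .dta file.
* id, price and Portion1 are hypothetical names chosen for illustration.
local k = 1
tempname ph
postfile `ph' long id double price using Portion1, replace
post `ph' (id[`k']) (price[`k'])
postclose `ph'
```

The dataset in memory is left untouched, so no -preserve-/-restore- is
needed. The posted file carries no labels, which is why they have to be
applied in a separate step afterwards.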

Jeph Herrin noted, however, that while -save- does not support if/in,
-outsheet- does, and that this could be exploited to solve the problem
at hand.

Both suggestions improve the performance of my initial program, which
is what I was looking for. But as Nick Cox has noted, the problem
appears to be long known, and no more efficient solution has been found
so far.

So, as a wish for Stata 11, may I suggest adding if/in to the syntax of
-save-? (Or a special export command for this purpose, if if/in could
cause any compatibility issues.)

Thank you,
   Sergiy Radyakin




On 5/29/08, Jeph Herrin <junk@spandrel.net> wrote:
> An imperfect solution might be to use -outsheet-, which
> allows the -if- qualifier. First, save your labels:
>
>  label save using mylabels, replace
>  forv i=1/N {
>        outsheet using Portion`i'.csv if Needed`i', replace
>  }
>  clear
>  forv i=1/N {
>        insheet using Portion`i'.csv, clear
>        do mylabels
>        save Portion`i', replace
>  }
>
> The first loop saves the fragmentary datasets as CSV files;
> the second reads them in, applies the labels, and saves them
> as Stata files.
>
> It is clumsy, but I think will be much faster than your
> current solution.
>
> hth,
> Jeph
>
>
>
>
>
> Sergiy Radyakin wrote:
> > Dear Michael,
> >
> > thank you very much for your suggestions. Just as you wrote, the first
> > one saves about half the time needed and is a good improvement. The
> > second one is a bit complicated, since I don't immediately see how the
> > labels can be declared/saved with this approach.
> >
> > So, I am thinking about saving the labels first with -label save-,
> > then dumping the data into several files with -post-, then open them
> > one-by-one and apply the saved labels and resave. Would that be the
> > fastest way to do it?
> >
> > Thank you,
> >   Sergiy Radyakin
> >
> >
> >
> >
> > On 5/28/08, Michael Blasnik <michael.blasnik@verizon.net> wrote:
> >
> > > ...
> > >
> > > I have two suggestions that may be worth exploring:
> > >
> > > 1) use -restore, preserve- instead of -restore- and you will save the
> > > time required to preserve the dataset next time.
> > >
> > > 2) a little more tricky, but you could employ -post-  to post an
> > > observation to a dataset.  I'm not sure how much time this would save
> > > but it may be worth a try.
> > >
> > > Michael Blasnik
> > >
> > >
> > > ----- Original Message ----- From: "Sergiy Radyakin"
> > > <serjradyakin@gmail.com>
> > > To: <statalist@hsphsun2.harvard.edu>
> > > Sent: Wednesday, May 28, 2008 6:31 PM
> > > Subject: st: Saving 1 observation
> > >
> > >
> > >
> > > > Hello All!
> > > >
> > > > I have a large dataset (to be specific ~ 1mln observations, 600MB).
> > > >
> > > > I need to (repeatedly) save several small portions of it (small can be
> > > > as small as 1 observation) into separate files.
> > > >
> > > > So far it is done similarly to this
> > > >
> > > > preserve
> > > >  keep if Needed1
> > > >  save "Portion1"
> > > > restore
> > > >
> > > > preserve
> > > >  keep if Needed2
> > > >  save "Portion2"
> > > > restore
> > > >
> > > > ... etc ...
> > > >
> > > > where variables Needed1 and Needed2 are dummies generated earlier in
> > > > the code.
> > > >
> > > > This works. But it is painfully slow.
> > > >
> > > > The problem is that it will necessarily have to preserve/restore the
> > > > whole large dataset.
> > > > -save-  does not support -if- and -in- modifiers, otherwise my ideal
> > > > choice would be:
> > > >
> > > > save "Portion1" if Needed1
> > > > save "Portion2" if Needed2
> > > >
> > > > As an alternative I was thinking of saving the dataset directly (by
> > > > generating Stata file byte-by-byte), but since I need labels to be
> > > > preserved together with the data, this becomes more tricky, and
> > > > reinventing what is already [well] done, does not sound like a good
> > > > idea.
> > > >
> > > > To pose a specific question: how to save one observation 1<=K<=_N
> > > > (with labels) to a Stata file, without having to save the whole
> > > > dataset?
> > > >
> > > > Version of Stata: Stata 10/ Windows
> > > >
> > > > Thank you,
> > > >  Sergiy Radyakin
> > > >
> > > >
> > > *
> > > *   For searches and help try:
> > > *   http://www.stata.com/support/faqs/res/findit.html
> > > *   http://www.stata.com/support/statalist/faq
> > > *   http://www.ats.ucla.edu/stat/stata/


