Statalist



Re: st: Re: Saving 1 observation


From   Jeph Herrin <junk@spandrel.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: Saving 1 observation
Date   Thu, 29 May 2008 09:34:22 -0400

An imperfect solution might be to use -outsheet-, which
allows the -if- qualifier. First, save your labels:

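 * save the value-label definitions to mylabels.do
 label save using mylabels, replace
 * N stands for the number of portions; replace it with an actual number
 forv i=1/N {
	* -nolabel- writes the numeric codes rather than the label text
	outsheet using Portion`i'.csv if Needed`i', comma nolabel replace
 }
 clear
 forv i=1/N {
	insheet using Portion`i'.csv, comma clear
	* defines the labels; attach them with -label values- as needed
	do mylabels
	save Portion`i', replace
 }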

The first loop saves the fragmentary datasets as CSV files;
the second reads them in, applies the labels, and saves them
as Stata files.
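
One gap in this approach is that -label save- writes only the -label define- statements, so variable labels and the link between each variable and its value label are lost in the CSV round trip. A possible sketch for recording them in a second do-file before exporting follows. The file name relabel.do is hypothetical. It also assumes that the CSV files hold numeric codes (for example, written with -outsheet-'s -nolabel- option), that the labels contain no unusual quote characters, and that variable names survive the round trip unchanged (-insheet- lowercases names by default).

```stata
* Sketch: write -label variable- and -label values- commands to relabel.do
* (relabel.do is a hypothetical file name)
tempname fh
file open `fh' using relabel.do, write text replace
foreach v of varlist _all {
    local vl  : variable label `v'
    local lbl : value label `v'
    * record the variable label, if there is one
    if `"`vl'"' != "" {
        file write `fh' `"label variable `v' `"`vl'"'"' _n
    }
    * record which value label is attached, if any
    if "`lbl'" != "" {
        file write `fh' "label values `v' `lbl'" _n
    }
}
file close `fh'
```

In the second loop, -do relabel- would then run right after -do mylabels-, so that the labels are defined before they are attached.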

It is clumsy, but I think it will be much faster than your
current solution.

hth,
Jeph

	
	

Sergiy Radyakin wrote:
Dear Michael,

thank you very much for your suggestions. Just as you wrote, the first
one saves about half the time needed and is a good improvement. The
second one is a bit complicated, since I don't immediately see how the
labels can be declared/saved with this approach.

So, I am thinking about saving the labels first with -label save-,
then dumping the data into several files with -post-, then opening
them one by one, applying the saved labels, and resaving. Would that
be the fastest way to do it?
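
A minimal sketch of the -post- step for a single observation, assuming three hypothetical numeric variables (id, wage, sector) and an observation number K; none of these names come from the thread. -postfile- declares the variables and their storage types, so each variable has to be listed explicitly, and a string variable would need a str# type.

```stata
* Sketch: write observation K of the data in memory to Portion1.dta
* (id, wage, and sector are hypothetical variable names)
local K = 1
tempname h
postfile `h' long id double wage byte sector using Portion1, replace
post `h' (id[`K']) (wage[`K']) (sector[`K'])
postclose `h'
```

The posted file carries no labels, so it would still have to be reopened afterwards to run the do-file written by -label save- and to reattach the labels with -label values-.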

Thank you,
   Sergiy Radyakin




On 5/28/08, Michael Blasnik <michael.blasnik@verizon.net> wrote:
...

I have two suggestions that may be worth exploring:

1) use -restore, preserve- instead of -restore- and you will save the time
required to preserve the dataset next time.

2) a little more tricky, but you could employ -post-  to post an observation
to a dataset.  I'm not sure how much time this would save but it may be
worth a try.
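
Suggestion 1 could look roughly like the sketch below, assuming two portions selected by the dummies Needed1 and Needed2 from the original post. The data are preserved once; each -restore, preserve- reloads the full dataset while keeping the preserved copy for the next pass.

```stata
* Sketch: preserve once, then reuse the preserved copy on each pass
preserve
forvalues i = 1/2 {
    keep if Needed`i'
    save "Portion`i'", replace
    restore, preserve    // reload the full data, keep the preserved copy
}
restore, not             // full data are already in memory; just cancel
```

The final -restore, not- only discards the preserved copy, since the last -restore, preserve- has already brought the full dataset back.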

Michael Blasnik


----- Original Message -----
From: "Sergiy Radyakin" <serjradyakin@gmail.com>
To: <statalist@hsphsun2.harvard.edu>
Sent: Wednesday, May 28, 2008 6:31 PM
Subject: st: Saving 1 observation


Hello All!

I have a large dataset (to be specific, about 1 million observations, 600MB).

I need to (repeatedly) save several small portions of it (small can be
as small as 1 observation) into separate files.

So far it is done similarly to this

preserve
 keep if Needed1
 save "Portion1"
restore

preserve
 keep if Needed2
 save "Portion2"
restore

... etc ...

where variables Needed1 and Needed2 are dummies generated earlier in the
code.
This works. But it is painfully slow.

The problem is that it will necessarily have to preserve/restore the
whole large dataset.
-save- does not support the -if- and -in- qualifiers; otherwise my ideal
choice would be:

save "Portion1" if Needed1
save "Portion2" if Needed2

As an alternative I was thinking of writing the dataset directly (by
generating the Stata file byte by byte), but since I need the labels to
be preserved together with the data, this becomes more tricky, and
reinventing what is already [well] done does not sound like a good
idea.

To pose a specific question: how can I save one observation K, where
1<=K<=_N, (with labels) to a Stata file, without having to save the
whole dataset?

Version of Stata: Stata 10/ Windows

Thank you,
  Sergiy Radyakin

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



