
From: "Rodini, Mark" <mrodini@compasslexecon.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: using the first n observations in a dataset w/o evaluating the whole thing?
Date: Thu, 3 Apr 2008 17:28:04 -0700

Thanks for the replies, and the long explanation. (Yes, I did mean little _n: typo!)

Anyway, I tried the suggestion:

    use in 1/99 using mydata

and I did indeed find it took time. In fact, I tried to apply the idea to a 20MB dataset after setting memory to only 10MB, and it completely froze up. It only worked on the 20MB dataset when I set memory to more than 20MB, and even then it was slow, as though it were reading the whole thing first. Oh well.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Kantor
Sent: Thursday, April 03, 2008 5:20 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: using the first n observations in a dataset w/o evaluating the whole thing?

At 07:31 PM 4/3/2008, Mark Rodini wrote:
>Greetings.
>
>Suppose I have a large Stata dataset (e.g. 3,000,000 observations) and I
>only wish to read in the first, say, 100 observations.
>
>I have tried the code, which works:
>
>use mydata if ( _N<100 )
>
>However, evidently, this code goes through ALL 3 million observations to
>evaluate the expression in parentheses, which can be very time consuming
>(and sort of defeats the purpose). Is there a way to only read the
>first 100 observations without having to evaluate the entire dataset?
>
>Perhaps some application of the "set obs 100"? But I have not been
>successful.
>
>Thank you.
>-Mark

First, that is not officially valid syntax, though it is accepted. I find that it gets you 0 observations, though it does read through the whole file. You probably mean _n (little n), rather than _N. (I suppose _N is . during the loading process, so _N<100 is false.)

Officially correct syntax is

    use if _n<100 using mydata

or, better yet,

    use in 1/99 using mydata

(This latter syntax is much more efficient.)

But in any case, my experience has been that it always reads through the whole file, and you can tell it's doing that if you have 3,000,000 observations.

The reason is that, in the file structure, there are some important elements that come after the data (value labels, I believe, for example), so there is a reason to have to read the whole file. At least that's how it's been as far as I know; I don't know whether the file structure changed in that regard in version 10.

I may have written to Stata Corp. about this some time in the past. If I had my way, there would either be nothing after the end of the data segment, or be some way to jump directly to the part of the file that lies after the data. (The latter idea may or may not work, depending on file-system issues.) In either case, I would want it not to read the whole file if you asked for an initial subset. But as things stand now, we are stuck with this behavior.

The only thing you can do, if you plan to experiment on a small segment of the file (and want to load it many times), is to load a small segment and save it under a different name. That way you go through the lengthy process just once:

    use in 1/99 using mydata
    save mydata_shortversion

Later,

    use mydata_shortversion

should load quickly.

Hope this helps.
--David

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
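A minimal Stata sketch of David's suggestion, written so that the slow read of the full file happens only the first time the do-file runs. The stand-in dataset at the top (3,000 observations with a hypothetical id variable) and the file names demo_bigfile and demo_bigfile_short are assumptions so the sketch runs on its own; with real data, point the bigfile macro at your own file and delete the stand-in block. Whether the first use still scans the whole file depends on the Stata version and file format, as described in the thread.

```stata
* Stand-in for a large dataset so the sketch runs on its own.
* Assumption: with real data, delete this block and set bigfile below.
clear
set obs 3000
generate long id = _n
save demo_bigfile, replace

* Hypothetical file names; change these to your own
local bigfile   "demo_bigfile"
local shortfile "demo_bigfile_short"

* Build the short working copy only if it does not exist yet.
* This is the slow step: per the thread, Stata may still read through
* the whole of the big file even though only 99 observations are kept.
capture confirm file "`shortfile'.dta"
if _rc {
    use in 1/99 using `bigfile', clear
    save `shortfile'
}

* Later runs skip the block above and load only the small file
use `shortfile', clear
```

Running the do-file a second time finds the short file already on disk, skips the if block, and reads only the 99-observation copy.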

**References**:
- **st: using the first n observations in a dataset w/o evaluating the whole thing?** *From:* "Rodini, Mark" <mrodini@compasslexecon.com>
- **Re: st: using the first n observations in a dataset w/o evaluating the whole thing?** *From:* David Kantor <kantor.d@att.net>


© Copyright 1996–2015 StataCorp LP