
From: Maarten buis <[email protected]>
To: [email protected]
Subject: Re: st: Use a few observations from a tab-delimited or csv file
Date: Wed, 20 Aug 2008 15:59:42 +0100 (BST)

It should not be too much of a problem. Using the formulae from
http://www.stata.com/support/faqs/data/howbig.html you can see that if
-insheet- stores your variables as floats, your dataset should be about
672 MB, which is too much for your computer, but if the variables can be
stored as bytes, the size reduces to 168 MB, which is well within the
limit of your computer.

If your dataset contains many dummy variables, you could import the data
easily using Stat/Transfer, or you could split the data up into two
parts, -insheet- them separately, -compress- them, and -merge-/-append-
the parts to create one dataset.

Even before you start considering this, you should seriously think about
whether you really need all 2,200 variables...

-- Maarten

--- "Todd D. Kendall" <[email protected]> wrote:

> Dear Statlisters,
>
> I have a file that is currently in csv format (or I could easily
> convert it to tab-delimited). It is fairly large: roughly 80,000
> observations and 2,200 variables.
>
> In fact, it is too large to fit into Stata (I am running Stata 9.2 on
> a Windows XP machine with 1 GB of RAM). The maximum memory I can
> allocate to Stata is -set mem 636m-. When I try to simply insheet the
> file at this setting, I get only 16,276 observations read in -- not
> anywhere close to the whole file, so I don't think there are any easy
> tweaks to make this work.
>
> However, it turns out that, for roughly the last 2,000 variables, I
> really don't need every single variable; instead, I just need a few
> summary statistics calculated over these 2,000 variables (e.g., the
> mean or standard deviation). My idea is to write a simple do file
> that loads in, say, the first 15,000 observations, computes the mean
> and standard deviation of the 2,000 variables, then drops these
> variables and saves as a .dta file. I would then repeat on the next
> 15,000 observations, and so on. Then I could just append all the
> little files together, and I would assume I could fit this into Stata,
> as it would only have around 200 variables instead of 2,200.
>
> My problem is that insheet doesn't work with "in" -- i.e., I can't
> write -insheet filename.csv in 1/15000-. Alternatively, if I could
> convert the file from csv into a fixed format, I could write a
> dictionary and use infix, but my Google search for how to convert a
> csv file into a fixed-column file has come up pretty dry.
>
> Am I barking up the wrong tree completely here, or am I missing
> something obvious? I greatly appreciate any suggestions.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
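
The two size figures in the reply above can be checked with a few lines of arithmetic. The sketch below is in Python rather than Stata, and it assumes the FAQ's approximation is (N*V*W + 4*N) / 1024^2 megabytes, where N is the number of observations, V the number of variables, and W the width of each variable in bytes (4 for a float, 1 for a byte). The function name `stata_size_mb` is made up for illustration.

```python
def stata_size_mb(n_obs, n_vars, bytes_per_var):
    """Approximate dataset size in MB, assuming (N*V*W + 4*N) / 1024^2."""
    return (n_obs * n_vars * bytes_per_var + 4 * n_obs) / 1024**2


# Figures from the original question: roughly 80,000 observations, 2,200 variables.
N_OBS = 80_000
N_VARS = 2_200

size_float = stata_size_mb(N_OBS, N_VARS, 4)  # every variable stored as a float
size_byte = stata_size_mb(N_OBS, N_VARS, 1)   # every variable stored as a byte

print(f"all floats: {size_float:.0f} MB")
print(f"all bytes:  {size_byte:.0f} MB")
```

Rounded to whole megabytes, this gives 672 for floats and 168 for bytes, matching the figures in the reply. Real datasets mix storage types, so the actual size falls somewhere between the two.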

**References**: **st: Use a few observations from a tab-delimited or csv file** *From:* "Todd D. Kendall" <[email protected]>

