
From: "Martin Weiss" <martin.weiss@uni-tuebingen.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Use a few observations from a tab-delimited or csv file
Date: Wed, 20 Aug 2008 16:56:57 +0200

Well, to create a dataset of summary statistics, use -h collapse-. If you had access to Stat/Transfer, that would ease your problem with the size of the file. Excel could probably take care of the conversion as well, but it is usually frowned upon on the list...

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Todd D. Kendall
Sent: Wednesday, August 20, 2008 4:41 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Use a few observations from a tab-delimited or csv file

Dear Statalisters,

I have a file that is currently in csv format (or I could easily convert it to tab-delimited). It is fairly large: roughly 80,000 observations and 2,200 variables. In fact, it is too large to fit into Stata (I am running Stata 9.2 on a Windows XP machine with 1 GB of RAM). The maximum memory I can allocate to Stata is -set mem 636m-. When I try to simply insheet the file at this setting, only 16,276 observations are read in, which is nowhere close to the whole file, so I don't think there are any easy tweaks to make this work.

However, it turns out that I don't really need every one of roughly the last 2,000 variables; instead, I just need a few summary statistics calculated over these 2,000 variables (e.g., the mean or standard deviation).

My idea is to write a simple do-file that loads in, say, the first 15,000 observations, computes the mean and standard deviation of the 2,000 variables, then drops these variables and saves the result as a .dta file. I would then repeat this on the next 15,000 observations, and so on. Then I could just append all the little files together, and I assume the result would fit into Stata, as it would have only around 200 variables instead of 2,200.

My problem is that insheet doesn't work with "in", i.e., I can't write -insheet filename.csv in 1/15000-.

Alternatively, if I could convert the file from csv into a fixed format, I could write a dictionary and use infix, but my Google search for how to convert a csv file into a fixed-column file has come up pretty dry.

Am I barking up the wrong tree completely here, or am I missing something obvious? I greatly appreciate any suggestions.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
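
One possible way around the memory limit is to shrink the file before it ever reaches Stata: stream the csv one row at a time, keep the leading variables, and replace the trailing ones with their row-wise mean and standard deviation. The sketch below is not from the thread. It assumes Python 3.8 or later is available, that the file has a header row, that the first n_keep columns are the ones to retain, and that all remaining columns are numeric with empty cells meaning missing. The names summarize_rows, tail_mean and tail_sd are made up for illustration.

```python
import csv
import io
import statistics


def summarize_rows(src, dst, n_keep):
    """Copy the first n_keep columns of each row and replace the rest
    with their row-wise mean and sample standard deviation."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header[:n_keep] + ["tail_mean", "tail_sd"])
    for row in reader:
        # Empty cells are treated as missing and skipped.
        values = [float(cell) for cell in row[n_keep:] if cell.strip()]
        mean = statistics.fmean(values) if values else ""
        # The sample standard deviation needs at least two values.
        sd = statistics.stdev(values) if len(values) > 1 else ""
        writer.writerow(row[:n_keep] + [mean, sd])


# Tiny in-memory demonstration; a real run would pass open file objects.
demo_in = io.StringIO("id,grp,x1,x2,x3\n1,a,1,2,3\n2,b,4,,8\n")
demo_out = io.StringIO()
summarize_rows(demo_in, demo_out, n_keep=2)
print(demo_out.getvalue())
```

Because only one row is held in memory at a time, the size of the input file should not matter. For the file described above, one would presumably open the csv for reading and a new csv for writing (both with newline="") and call the function with n_keep set to about 200; the reduced file could then be read with -insheet- in the usual way.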

