Statalist



st: RE: Use a few observations from a tab-delimited or csv file


From   "Martin Weiss" <martin.weiss@uni-tuebingen.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Use a few observations from a tab-delimited or csv file
Date   Wed, 20 Aug 2008 16:56:57 +0200

Well, to create a dataset of summary statistics, see -h collapse-. If you had
access to Stat/Transfer, that would ease your problem with the size of the
file. Excel could probably take care of the conversion as well, but is
usually frowned upon on the list...

HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Todd D. Kendall
Sent: Wednesday, August 20, 2008 4:41 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Use a few observations from a tab-delimited or csv file

Dear Statalisters,

I have a file that is currently in csv format (or I could easily
convert it to tab-delimited).  It is fairly large: roughly 80,000
observations and 2,200 variables.

In fact, it is too large to fit into Stata (I am running Stata 9.2 on
a Windows XP machine with 1 GB of RAM).  The maximum memory I can
allocate to Stata is -set mem 636m-.  When I try to simply insheet the
file at this setting, I get only 16,276 observations read in -- not
anywhere close to the whole file, so I don't think there are any easy
tweaks to make this work.

However, it turns out that, for roughly the last 2,000 variables, I
really don't need every single variable; instead, I just need a few
summary statistics calculated over these 2,000 variables (e.g., the
mean or standard deviation).  My idea is to write a simple do file
that loads in, say, the first 15,000 observations, computes the mean
and standard deviation of the 2,000 variables, then drops these
variables and saves as a .dta file.  I would then repeat on the next
15,000 observations, and so on.  Then I could just append all the
little files together, and I would assume I could fit this into Stata,
as it would only have around 200 variables instead of 2,200.
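
One way to sketch this reduction outside Stata is a short Python script that
streams the csv file one row at a time, so the whole file never has to fit in
memory. Python itself, the function name reduce_rows, the output column names
row_mean and row_sd, and the assumption that the variables to summarize are
the trailing columns are all illustrative choices, not something stated in
this thread. Blank or non-numeric cells are treated as missing.

```python
import csv
import statistics


def reduce_rows(src_path, dst_path, n_keep):
    """Stream src_path row by row; keep the first n_keep columns and
    replace the remaining columns with their row-wise mean and sd.

    Hypothetical helper: the name and the output column names are
    assumptions made for illustration.
    """
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        writer.writerow(header[:n_keep] + ["row_mean", "row_sd"])
        for row in reader:
            values = []
            for cell in row[n_keep:]:
                try:
                    values.append(float(cell))
                except ValueError:
                    pass  # blank or non-numeric cell: treat as missing
            # Leave the summary cell empty when it cannot be computed.
            mean = statistics.mean(values) if values else ""
            # statistics.stdev is the sample sd (n - 1 denominator).
            sd = statistics.stdev(values) if len(values) > 1 else ""
            writer.writerow(row[:n_keep] + [mean, sd])
```

With roughly 200 variables to keep, a call such as
reduce_rows("filename.csv", "reduced.csv", 200) would write a file with about
202 columns, which should be far easier to read with insheet.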

My problem is that insheet doesn't work with "in" -- i.e., I can't
write -insheet filename.csv in 1/15000-.  Alternatively, if I could
convert the file from csv into a fixed format, I could write a
dictionary and use infix, but my Google search for how to convert a
csv file into a fixed-column file has come up pretty dry.
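
Since insheet cannot take an "in" range, another option is to cut the csv file
into smaller files beforehand and insheet each piece separately. The sketch
below does that in Python; the function name split_csv and the output naming
scheme (a prefix followed by 1, 2, ...) are assumptions made for illustration.
Each piece repeats the header line so that variable names survive.

```python
import csv


def split_csv(src_path, prefix, rows_per_file):
    """Write src_path out as prefix1.csv, prefix2.csv, ..., each holding
    the header line plus at most rows_per_file data rows.
    Returns the list of file names written.

    Hypothetical helper: name and file-naming scheme are assumptions.
    """
    written = []
    dst = None
    writer = None
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        for i, row in enumerate(reader):
            if i % rows_per_file == 0:
                # Start a new piece: close the previous one, if any.
                if dst is not None:
                    dst.close()
                name = f"{prefix}{len(written) + 1}.csv"
                dst = open(name, "w", newline="")
                writer = csv.writer(dst)
                writer.writerow(header)
                written.append(name)
            writer.writerow(row)
    if dst is not None:
        dst.close()
    return written
```

For example, split_csv("filename.csv", "part", 15000) would produce part1.csv,
part2.csv, and so on, each of which could then be read with
-insheet using part1.csv- and processed in a loop.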

Am I barking up the wrong tree completely here, or am I missing
something obvious?  I greatly appreciate any suggestions.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


