
st: RE: Use a few observations from a tab-delimited or csv file

From   "Martin Weiss" <>
To   <>
Subject   st: RE: Use a few observations from a tab-delimited or csv file
Date   Wed, 20 Aug 2008 16:56:57 +0200

Well, to create a dataset of summary statistics, use -collapse- (see -h collapse-). If you had
access to Stat/Transfer, that would help with the size of the file.
Excel could probably take care of the conversion as well, but its use is
usually frowned upon on the list...


-----Original Message-----
[] On Behalf Of Todd D. Kendall
Sent: Wednesday, August 20, 2008 4:41 PM
Subject: st: Use a few observations from a tab-delimited or csv file

Dear Statlisters,

I have a file that is currently in csv format (or I could easily
convert it to tab-delimited).  It is fairly large: roughly 80,000
observations and 2,200 variables.

In fact, it is too large to fit into Stata (I am running Stata 9.2 on
a Windows XP machine with 1 GB of RAM).  The maximum memory I can
allocate to Stata is -set mem 636m-.  When I try to simply insheet the
file at this setting, I get only 16,276 observations read in -- not
anywhere close to the whole file, so I don't think there are any easy
tweaks to make this work.

However, it turns out that, for roughly the last 2,000 variables, I
really don't need every single variable; instead, I just need a few
summary statistics calculated over these 2,000 variables (e.g., the
mean or standard deviation).  My idea is to write a simple do file
that loads in, say, the first 15,000 observations, computes the mean
and standard deviation of the 2,000 variables, then drops these
variables and saves the result as a .dta file.  I would then repeat on the next
15,000 observations, and so on.  Then I could just append all the
little files together, and I would assume I could fit this into Stata,
as it would only have around 200 variables instead of 2,200.
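A minimal sketch of the same reduction done outside Stata, streaming the CSV one row at a time so that no chunking is needed. It assumes (these are not from the post) that the file has a header row, that the variables to keep come first (n_keep, e.g. 200), that the trailing columns are numeric, and that empty cells are missing. The function names are hypothetical.

```python
import csv
import statistics


def summarize_rows(rows, n_keep):
    """Yield each row's first n_keep cells plus the row-wise mean and SD of the rest.

    Assumption (hypothetical): the first row is a header, trailing cells are
    numeric, and empty cells are treated as missing.
    """
    rows = iter(rows)
    header = next(rows, None)
    if header is None:  # empty input: nothing to write
        return
    yield header[:n_keep] + ["row_mean", "row_sd"]
    for row in rows:
        values = [float(x) for x in row[n_keep:] if x.strip() != ""]
        mean = statistics.mean(values) if values else ""
        # statistics.stdev is the sample SD and needs at least two values
        sd = statistics.stdev(values) if len(values) > 1 else ""
        yield row[:n_keep] + [mean, sd]


def summarize_wide_csv(in_path, out_path, n_keep):
    """Stream in_path to out_path; only one row is held in memory at a time."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(summarize_rows(csv.reader(src), n_keep))
```

For example, summarize_wide_csv("filename.csv", "summary.csv", 200) would leave a file of about 202 columns that -insheet using summary.csv- should be able to read in one go.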

My problem is that insheet doesn't work with "in" -- i.e., I can't
write -insheet using filename.csv in 1/15000-.  Alternatively, if I could
convert the file from csv into a fixed format, I could write a
dictionary and use infix, but my Google search for how to convert a
csv file into a fixed-column file has come up pretty dry.
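One way to do the CSV-to-fixed-column conversion is a small two-pass script. Here is a sketch: the first pass finds the widest cell in each column, and the second pass right-aligns every cell to that width. It assumes a header row and no newlines embedded inside quoted fields. All function names are hypothetical. The 1-based "start-end" ranges it prints are the form -infix- uses for column positions. String variables would still need a str type in the dictionary.

```python
import csv


def column_widths(rows):
    """First pass: widest cell seen in each column (at least 1 character)."""
    widths = []
    for row in rows:
        for j, cell in enumerate(row):
            if j >= len(widths):
                widths.append(1)
            widths[j] = max(widths[j], len(cell))
    return widths


def fixed_line(row, widths):
    """Right-align each cell so every column starts at a fixed position;
    short rows are padded with blanks (read as missing)."""
    padded = list(row) + [""] * (len(widths) - len(row))
    return "".join(cell.rjust(w) for cell, w in zip(padded, widths))


def infix_ranges(widths):
    """1-based 'start-end' column ranges for each column."""
    ranges, start = [], 1
    for w in widths:
        ranges.append(f"{start}-{start + w - 1}")
        start += w
    return ranges


def csv_to_fixed(in_path, out_path):
    """Two streaming passes; returns the header names and column widths."""
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, [])
        widths = column_widths(reader)
    with open(in_path, newline="") as src, open(out_path, "w") as dst:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row again
        for row in reader:
            dst.write(fixed_line(row, widths) + "\n")
    return header, widths
```

Printing name and range pairs from zip(header, infix_ranges(widths)) gives the lines for a dictionary file. As far as I can tell from the manual, -infix using dictfile- accepts an -in- range, so 1/15000 could then be read at a time.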

Am I barking up the wrong tree completely here, or am I missing
something obvious?  I greatly appreciate any suggestions.
*   For searches and help try:

© Copyright 1996–2015 StataCorp LP