Re: st: Problems in load large data or read several fields from CSV data

From   Steven Samuels
Subject   Re: st: Problems in load large data or read several fields from CSV data
Date   Wed, 21 Jan 2009 08:10:58 -0500

There was a superfluous but harmless option in my first shell statement.


You might also try pre-processing the text file with awk. Here's an example. I put the shell command into a Stata do-file, but you could write a script for your system outside of Stata.

**************************CODE BEGINS**************************
* Data is in seven comma-separated fields in source.txt.
* We want fields 3 and 6
shell awk 'BEGIN {FS=","; OFS =","} ; {print($3, $6)}' source.txt > in.txt;
insheet x3 x6 using in.txt, comma
***************************CODE ENDS***************************
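
For the fields named in the original question (the first, thirteenth and thirtieth), the same idea could be run as a stand-alone script outside Stata. This is only a sketch: the file names and the generated sample data are hypothetical, and it assumes no field contains an embedded comma, because awk splits on every comma, including those inside quoted strings.

```shell
#!/bin/sh
# Sketch: keep only fields 1, 13 and 30 of a comma-separated file.
# Assumption: no field contains an embedded comma.
# All file names below are hypothetical.

# Build a tiny 30-field sample (header + 2 rows) so the script runs on its own.
workdir=$(mktemp -d)
src="$workdir/source.csv"
awk 'BEGIN {
    for (r = 0; r <= 2; r++) {
        line = ""
        for (c = 1; c <= 30; c++) {
            if (r == 0) cell = "v" c          # header row: v1 ... v30
            else        cell = r * 100 + c    # data rows: 101..130, 201..230
            if (c > 1) line = line ","
            line = line cell
        }
        print line
    }
}' > "$src"

# FS and OFS set the input and output field separators;
# the main rule prints the three wanted fields from every line.
awk 'BEGIN { FS = ","; OFS = "," } { print $1, $13, $30 }' "$src" > "$workdir/subset.csv"

cat "$workdir/subset.csv"
```

Because the header line passes through unchanged, -insheet- should be able to pick up the variable names from the first row of the reduced file.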


On Jan 21, 2009, at 6:36 AM, <> <> wrote:

Dear Joseph,

Many thanks for all your advice.

Best wishes,
-----Original Message-----
[] On Behalf Of Joseph
Sent: 20 January 2009 14:56
To: Statalist
Subject: Re: st: Problems in load large data or read several fields from
CSV data

Haiyan Gao wrote:

I have a very large dataset in CSV format, about 486,000 KB in size. The data
contain more than 100 fields and more than 300,000 records. I have
tried to open this file with set mem 500M (or 1000M) and used

insheet using filename.csv, clear

The error message says that there is not enough memory to load the data.
Could anyone advise me on the following?

1) How to read only several fields from this CSV data file, say the
first, thirteenth and thirtieth?
2) What command should I try to load the whole data?

------------------------------------------------------------------------

The easiest and most convenient way is to use Stat/Transfer for this
kind of problem, especially if you're going to encounter it regularly.

Absent that, you could use Stata's -file- command to -read- a limited
number of records from the CSV file, turn right around and -write- them
to a -tempfile-, then -insheet- that and save it as an intermediate
Stata dataset. Repeat with successive chunks of the original CSV file
(reading the first record each time in order to get the variable
names), and -append- the pieces (the intermediate Stata datasets). To
automate the process, you'd put it in a -while- loop that has -file-
look for the end-of-file marker. You can use -file- to read in the
first, thirteenth and thirtieth logical records (rows), as well.
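
The chunking idea can also be sketched outside Stata, in the spirit of the awk suggestion elsewhere in this thread. To be clear, this is a shell/awk alternative, not the -file- loop described above. The file names, the sample data and the chunk size are hypothetical, and it assumes no quoted field contains a line break.

```shell
#!/bin/sh
# Sketch: split a large CSV into pieces of at most $rows data lines each,
# repeating the header line at the top of every piece.
# Assumption: every record sits on exactly one line.
# All file names below are hypothetical.

# Tiny sample file (header + 5 records) so the script runs on its own.
workdir=$(mktemp -d)
src="$workdir/big.csv"
printf 'id,x\n1,a\n2,b\n3,c\n4,d\n5,e\n' > "$src"

rows=2   # data lines per piece; choose a size that fits in memory

awk -v rows="$rows" -v prefix="$workdir/chunk" '
NR == 1 { header = $0; next }          # remember the header line
(NR - 2) % rows == 0 {                 # time to start a new piece
    if (out != "") close(out)          # close the previous piece first
    out = sprintf("%s%03d.csv", prefix, ++n)
    print header > out                 # each piece gets its own header
}
{ print > out }                        # copy the current data line
' "$src"

ls "$workdir"
```

Each chunkNNN.csv could then be read with -insheet-, saved as an intermediate dataset, and combined with -append-, as in the steps described above.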

Joseph Coveney

P.S.  It's considered better form to avoid replying to a previous post
when starting a new thread.

*   For searches and help try:
