
Re: st: Problems in load large data or read several fields from CSV data

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: Problems in load large data or read several fields from CSV data
Date	Wed, 21 Jan 2009 08:10:58 -0500

There was a superfluous but harmless option in my first shell statement.



Haiyan,

You might also try to pre-process the text file with awk. Here's an example. I put the shell command into a Stata do-file, but you could write a script for your system outside of Stata.


**************************CODE BEGINS**************************
* Data is in seven comma-separated fields in source.txt.
* We want fields 3 and 6
*"1,2,3,4,55,6,77"
*"11,22,33,44,5,66,7"
****************************************************

shell awk 'BEGIN {FS=","; OFS =","} ; {print($3, $6)}' source.txt >in.txt;

insheet x3 x6 using in.txt, comma
list
***************************CODE ENDS***************************
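
For the fields named in the original question (the first, thirteenth and thirtieth), the same idea could be sketched as a standalone shell script. This is only a sketch under assumptions: the file names wide.csv and subset.csv and the generated test values are made up for illustration, and it assumes that no field contains a quoted comma, because awk's plain FS="," split would miscount such records.

```shell
#!/bin/sh
# Build a small hypothetical test file: 2 records, 30 comma-separated fields each.
# (wide.csv and subset.csv are illustrative names, not from the thread.)
awk 'BEGIN {
    for (r = 1; r <= 2; r++) {
        line = ""
        for (f = 1; f <= 30; f++)
            line = line (f > 1 ? "," : "") "r" r "f" f
        print line
    }
}' > wide.csv

# Keep only fields 1, 13 and 30; assumes no quoted commas inside fields.
awk 'BEGIN { FS = ","; OFS = "," } { print $1, $13, $30 }' wide.csv > subset.csv

cat subset.csv
```

The reduced file could then be read in Stata in the same way as above, for example with "insheet x1 x13 x30 using subset.csv, comma".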


-Steve

On Jan 21, 2009, at 6:36 AM, <[email protected]><[email protected]> wrote:

Dear Joseph,

Many thanks for all your advice.

Best wishes,
Haiyan
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Joseph Coveney
Sent: 20 January 2009 14:56
To: Statalist

Subject: Re: st: Problems in load large data or read several fields from CSV data


Haiyan Gao wrote:

I have a very large dataset in CSV format, about 486,000 KB in size. The data
contain more than 100 fields and more than 300,000 records. I have
tried to open this file with set mem 500M (or 1000M) and used

insheet using filename.csv, clear

The error message shows that there is not enough memory to load the data.

Could anyone advise me on the following?

1) How to read only several fields from this CSV data file, say the
first, thirteenth and thirtieth?
2) What command should I use to load the whole dataset?

--------------------------------------------------------------------------------

The easiest and most convenient way is to use Stat/Transfer
( www.stattransfer.com ) for this kind of problem, especially if you're
going to encounter it regularly.

Absent that, you could make use of Stata's -file- command to -read- in a
limited number of records of the CSV file, turn right around and -write-
them to a -tempfile-, then -insheet- that, and save it to an
intermediate Stata dataset; repeat (reading the first record each time
in order to read the variable names) with successive chunks of the
original CSV file, and -append- the pieces (the intermediate Stata
datasets). In order to automate the process, you'd put it in a -while-
loop having -file- look for the end-of-file marker. You can use -file-
to read in the first, thirteenth and thirtieth logical records (rows),
as well.
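
The chunking idea can also be sketched outside Stata. The script below is not the -file- approach described above; it swaps in awk to cut the CSV file into pieces that each begin with the header record, so that every piece can be read on its own. The file names big.csv and chunk_N.csv, the sample records and the chunk size are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical input: big.csv with a header row and 5 data records.
printf '%s\n' 'id,x,y' '1,10,a' '2,20,b' '3,30,c' '4,40,d' '5,50,e' > big.csv

# Records per chunk; tiny here for illustration, far larger in practice.
chunk=2

# Write chunk_1.csv, chunk_2.csv, ... each starting with the header row.
awk -v n="$chunk" '
NR == 1 { header = $0; next }            # remember the variable names
(NR - 2) % n == 0 {                      # first data record of a new chunk
    if (out != "") close(out)
    out = "chunk_" (++k) ".csv"
    print header > out
}
{ print > out }
' big.csv

ls chunk_*.csv
```

Each piece can then be read with -insheet-, saved as an intermediate Stata dataset, and combined with -append-.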

Joseph Coveney

P.S.  It's considered better form to avoid replying to a previous post
when starting a new thread.


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




