Statalist



RE: st: Problems in load large data or read several fields from CSV data


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: Problems in load large data or read several fields from CSV data
Date   Wed, 21 Jan 2009 14:09:29 -0000

Note that awk might not be installed on some systems, especially under
Windows, but public domain versions should be downloadable for all
platforms on which Stata is supported. 

Nick 
[email protected] 

Steven Samuels

There was a superfluous but harmless option in my first shell statement.

Haiyan,

You might also try to pre-process the text file with awk.  Here's an
example.  I put the shell command into a Stata do-file, but you could
write a script for your system outside of Stata.

**************************CODE BEGINS**************************
* Data is in seven comma-separated fields in source.txt.
* We want fields 3 and 6
*"1,2,3,4,55,6,77"
*"11,22,33,44,5,66,7"
****************************************************
shell awk 'BEGIN {FS=","; OFS=","} ; {print($3, $6)}' source.txt > in.txt
insheet x3 x6 using in.txt, comma
list
***************************CODE ENDS***************************
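The same pattern can be adapted to the original question, which asked for the first, thirteenth and thirtieth fields. What follows is only a minimal sketch, assuming a POSIX shell with awk and a file with no quoted fields containing embedded commas. The file names sample.csv and subset.csv are hypothetical, and the sample data are generated here only to make the example self-contained.

```shell
#!/bin/sh
# Build a small hypothetical 30-field CSV so the sketch is self-contained.
# Each record has the form r<row>c1,r<row>c2,...,r<row>c30.
awk 'BEGIN {
    for (r = 1; r <= 3; r++) {
        line = ""
        for (c = 1; c <= 30; c++)
            line = line (c > 1 ? "," : "") "r" r "c" c
        print line
    }
}' > sample.csv

# Keep only fields 1, 13 and 30. Plain comma splitting is assumed,
# so quoted fields with embedded commas would be split incorrectly.
awk 'BEGIN { FS = ","; OFS = "," } { print $1, $13, $30 }' sample.csv > subset.csv

cat subset.csv
```

The reduced file could then be read in Stata along the lines of the example above, for instance with -insheet x1 x13 x30 using subset.csv, comma-.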

On Jan 21, 2009, at 6:36 AM, <[email protected]>  

> Many thanks for all your advice.

Joseph Coveney

> Haiyan Gao wrote:
>
> I have a very large dataset in CSV format, about 486,000 KB. The data
> contain more than 100 fields and more than 300,000 records. I have
> tried to open this file after -set mem 500M- (or 1000M), using
>
> insheet using filename.csv, clear
>
> The error message says that there is not enough memory to load the
> data. Could anyone advise me on the following?
>
> 1) How to read only several fields from this CSV data file, say the
> first, thirteenth and thirtieth?
> 2) What command should I try to load the whole data?
>
> ----------------------------------------------------------------------
>
> The easiest and most convenient way is to use Stat/Transfer
> ( www.stattransfer.com ) for this kind of problem, especially if
> you're going to encounter it regularly.
>
> Absent that, you could make use of Stata's -file- command to -read-
> in a limited number of records of the CSV file, turn right around and
> -write- them to a -tempfile-, then -insheet- that, and save it to an
> intermediate Stata dataset; repeat (reading the first record each time
> in order to read the variable names) with successive chunks of the
> original CSV file, and -append- the pieces (the intermediate Stata
> datasets).  In order to automate the process, you'd put it in a
> -while- loop having -file- look for the end-of-file marker.  You can
> use -file- to read in the first, thirteenth and thirtieth logical
> records (rows), as well.
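The chunking idea can also be carried out outside Stata. The sketch below is not the -file- loop described above: it swaps in awk to do the same splitting, copying the header record to the top of each piece so that every piece can be read on its own. It assumes a POSIX shell with awk and one record per line. The file names (big.csv, chunk_1.csv, ...) and the chunk size are hypothetical, and the small input file is generated only to keep the example self-contained.

```shell
#!/bin/sh
# Hypothetical input: a header record followed by 7 data records.
printf 'id,x,y\n' > big.csv
i=1
while [ "$i" -le 7 ]; do
    printf '%s,%s,%s\n' "$i" "$((i * 10))" "$((i * 100))" >> big.csv
    i=$((i + 1))
done

# Split into pieces of at most `size` data records each, repeating the
# header record at the top of every piece.
awk -v size=3 '
NR == 1 { header = $0; next }            # remember the variable names
(NR - 2) % size == 0 {                   # first data record of a new piece
    if (out != "") close(out)
    out = "chunk_" (++n) ".csv"
    print header > out
}
{ print > out }
' big.csv

ls chunk_*.csv
```

Each piece could then be read with -insheet-, saved as an intermediate Stata dataset, and combined with -append-. For a file of the size described in the question, a much larger chunk size than 3 would be the natural choice.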
