Statalist



RE: st: Problems in load large data or read several fields from CSV data


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Problems in load large data or read several fields from CSV data
Date   Wed, 21 Jan 2009 14:09:29 -0000

Note that awk might not be installed on some systems, especially under
Windows, but public domain versions should be downloadable for all
platforms on which Stata is supported. 

Nick 
n.j.cox@durham.ac.uk 

Steven Samuels

There was a superfluous but harmless option in my first shell
statement.

Haiyan,

You might also try to pre-process the text file with awk.  Here's an
example.  I put the shell command into a Stata do-file, but you could
write a script for your system outside of Stata.

**************************CODE BEGINS**************************
* Data is in seven comma-separated fields in source.txt.
* We want fields 3 and 6
*"1,2,3,4,55,6,77"
*"11,22,33,44,5,66,7"
****************************************************
shell awk 'BEGIN {FS=","; OFS=","} {print($3, $6)}' source.txt > in.txt
insheet x3 x6 using in.txt, comma
list
***************************CODE ENDS***************************
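
For the fields asked about in the original question (the first, thirteenth and thirtieth), the same idea can be sketched as a stand-alone shell script run outside Stata. This is only a minimal sketch: the two-row sample file is invented for illustration, and it assumes that no field contains a quoted comma, which a plain FS="," split would not handle.

```shell
# Hypothetical sample: two records of 30 comma-separated fields each.
seq 1 30    | paste -sd, - >  source.txt
seq 101 130 | paste -sd, - >> source.txt

# Keep only fields 1, 13 and 30, still comma-separated.
awk 'BEGIN {FS = ","; OFS = ","} {print $1, $13, $30}' source.txt > in.txt

cat in.txt
# 1,13,30
# 101,113,130
```

The reduced file can then be read as in the example above, for instance with -insheet x1 x13 x30 using in.txt, comma-.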

On Jan 21, 2009, at 6:36 AM, <Haiyan.Gao@uclh.nhs.uk> wrote:

> Many thanks for all your advice.

Joseph Coveney

> Haiyan Gao wrote:
>
> I have a very large dataset in CSV format with 486,000 KB. The data
> contains more than 100 fields and more than 300,000 records. I have
> tried to open this file with set mem 500M (or 1000M) and used
>
> insheet using filename.csv, clear
>
> The error message shows that there is not enough memory to load the
> data.
> Could anyone advise me on the following?
>
> 1) How to read only several fields from this CSV data file, say the
> first, thirteenth and thirtieth?
> 2) What command should I try to load the whole data?
>
> ----------------------------------------------------------------------
>
> The easiest and most convenient way is to use Stat/Transfer
> ( www.stattransfer.com ) for this kind of problem, especially if you're
> going to encounter it regularly.
>
> Absent that, you could make use of Stata's -file- command to -read- in
> a limited number of records of the CSV file, turn right around and
> -write- them to a -tempfile-, then -insheet- that, and save it to an
> intermediate Stata dataset; repeat (reading the first record each time
> in order to read the variable names) with successive chunks of the
> original CSV file, and -append- the pieces (the intermediate Stata
> datasets).  In order to automate the process, you'd put it in a -while-
> loop having -file- look for the end-of-file marker.  You can use -file-
> to read in the first, thirteenth and thirtieth logical records (rows),
> as well.
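
The chunking idea described above can also be sketched outside Stata. The following is not the -file- approach itself but an awk substitute for the splitting step, under stated assumptions: the sample file, the chunk size n, and the chunk_NNN.csv names are all hypothetical. Each chunk file repeats the first record so that the variable names are available every time.

```shell
# Hypothetical sample: a header line plus five data records.
printf 'id,x\n1,a\n2,b\n3,c\n4,d\n5,e\n' > source.csv

# n is the number of data records per chunk (an assumed, tunable value).
awk -v n=2 '
NR == 1 { header = $0; next }        # remember the variable-name line
(NR - 2) % n == 0 {                  # first data record of a new chunk
    if (out != "") close(out)        # finish the previous chunk file
    out = sprintf("chunk_%03d.csv", ++k)
    print header > out               # every chunk starts with the header
}
{ print > out }                      # copy the data record into the chunk
' source.csv

ls chunk_*.csv
```

Each chunk could then be read with -insheet-, saved as an intermediate Stata dataset, and combined with -append-, as described above.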
