

Re: st: CSV read with limits

From	Mike Lacy <[email protected]>
To	[email protected]
Subject	Re: st: CSV read with limits
Date	Sat, 12 Mar 2011 16:24:39 -0700


Argyn Kuketayev <[email protected]> wrote:

>I have CSV file (comma separated). I need Stata to read the 1st line
>with variable names, then import only selected variables. Also I want
>to limit the number of observations to read.
>
>I cant figure out how to do it in Stata. In SAS it would be easy with
>DATA, var list and OBS option.
>
>thanks
>
>- --
>Argyn Kuketayev

I agree with Argyn that this is more difficult than it should be, as I've faced similar problems myself. Considering that many large data sets are distributed as CSV, having an option on -insheet- to read a limited number of lines and/or variables would be natural. That being said, here's a way to solve Argyn's problem with available tools.

The user-written program -chunky- can break CSV files into chunks, while retaining the header on each one. It can be used easily (if a bit inefficiently) to address the current problem, as follows:



// Make a large-ish CSV file to work with
clear
set obs 10000
forval i = 1/200 {
   gen x`i' = 100 * runiform()
}
outsheet using c:\temp\big.csv, comma names nolabel replace
//
//

// Get the user-written program -chunky- and use it to break up the file into chunks

ssc install chunky
// ********** Real work starts here
cd c:\temp  // need somewhere to put the chunks
// choose # of bytes in each chunk; larger is faster
local size = 10000000

chunky using c:\temp\big.csv, chunksize(`size') header(include) stub(piece) replace
insheet using "piece0001.txt", clear comma names // chunky names files consecutively

keep x1 x100 x150 x200
keep in 1/500 // retain the desired lines
//
foreach f in `s(filelist)' {
   erase `f'
}

The preceding would be less clumsy if -chunky- had options to
1) allow reading of just one chunk, with size specified by line length;
2) use tempfiles to store the pieces.

These are not complaints, just some thoughts about useful options that I suspect are consistent with the way -chunky- works.
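
For what it's worth, the two wished-for behaviors (read only the first so many lines, and keep the intermediate piece in a tempfile) can be approximated without -chunky- by using Stata's -file- command. What follows is only a rough sketch, not a description of how -chunky- works: it assumes the c:\temp\big.csv file created above, that each line fits within your Stata flavor's macro length limit, and the macro names (nlines, fin, fout, head, n, line) are just ones I made up for illustration.

```stata
// Sketch: copy the header plus the first `nlines' data lines of the CSV
// into a tempfile, then -insheet- only that small file.
local nlines = 500          // number of data lines wanted (assumed)
tempfile head               // temporary file; Stata erases it automatically
tempname fin fout           // temporary file handles

file open `fin'  using "c:\temp\big.csv", read text
file open `fout' using "`head'", write text replace

local n = 0                 // lines copied so far (the header is line 0)
file read `fin' line
while r(eof) == 0 & `n' <= `nlines' {
    file write `fout' `"`macval(line)'"' _n
    local ++n
    file read `fin' line
}
file close `fin'
file close `fout'

insheet using "`head'", clear comma names
keep x1 x100 x150 x200      // retain the desired variables
```

Only the first 501 lines of the big file are read and written, and there are no leftover pieces to erase afterwards. All 200 variables in those lines still pass through -insheet- before the -keep-, so this limits observations at read time but not variables.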


Regards,




=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lacy, Assoc. Prof.
Soc. Dept., Colo. State. Univ.
Fort Collins CO 80523 USA

(970)-491-6721


