st: Re: reading a txt file that loops

From	Mike Lacy <[email protected]>
To	[email protected]
Subject	st: Re: reading a txt file that loops
Date	Sun, 17 Apr 2011 09:45:32 -0600


[email protected] wrote

>Are there any shortcuts to reading a data file that has the following format
>other than to reorganize the data before importing?

Here's a simple approach with ordinary Stata machinery. The general idea is to read each line of the data as a string and then lightly massage it into something that can be -insheet-ed in the form

"FIPS, y1, y2, y3, y4, locname, decstart"

I presume the data has the following form and is in "loopdata.csv"

FIPS        1990       1980       1970       1960
00000  248709873  226545805  203211926  179323175 United States
18000    5544159    5490224    5193669    4662498 Indiana
18001      31095      29619      26871      24643 Adams County
18003     300836     294335     280455     232196 Allen County
18005      63657      65088      57022      48198 Bartholomew County
FIPS        1950       1940       1930       1920
00000  151325798  132164569   12320262  106021537 United States
18000    3934224    3427796    3238503    2930390 Indiana
18001      22393      21254      19957      20503 Adams County
18003     183722     155084     146743     114303 Allen County
18005      36108      28276      24864      23887 Bartholomew County
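
To see what the massaging does before running the whole script, here is a minimal sketch on a single line of the data above. The local macro names raw and squeezed are made up for this illustration and are not part of the script that follows.

```stata
// Illustration only: one data line pushed through the same two string functions
local raw "18001      31095      29619      26871      24643 Adams County"
local squeezed = itrim("`raw'")               // collapse runs of blanks to a single blank
display subinstr("`squeezed'", " ", ",", 5)   // only the first 5 blanks become commas
// displays: 18001,31095,29619,26871,24643,Adams County
```

Because only the first five blanks are replaced, the blank inside "Adams County" survives and the location name stays in one field.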

//
//clear
insheet using loopdata.csv // each line of the data becomes a string in v1
replace v1 = itrim(v1)  //multiple spaces are a nuisance
// Mark cases according to starting decade
gen decstart = (word(v1,2)) if (strpos(v1, "FIPS") > 0)
replace decstart = decstart[_n-1] if missing(decstart)

// We want something comma delimited, with only one header line with variable names.

replace v1 = subinstr(v1, " ", ",", 5) // 5 is unique to this data, of course
drop if ((strpos(v1, "FIPS") > 0) & (_n > 1))
replace v1 = "FIPS, y1, y2, y3, y4, locname, decstart" if _n ==1
replace v1 = v1 + ", " + decstart if _n > 1
drop decstart
// Save data file as a csv, then reimport
tempfile temp
outsheet using `temp', nonames noquote
clear
insheet using `temp', names comma
//  A long data structure makes sense, I'm guessing
reshape long y, i(fips decstart) j(year)
replace year = decstart - (year-1) * 10
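
As an optional check on the result, something like the following could be run afterwards. It is only a sketch and assumes the script above has just finished, so that fips, locname, year and y are in memory. The last line of the script counts back in ten-year steps from the first year on each header line, so with decstart = 1990 the values j = 1, 2, 3, 4 become 1990, 1980, 1970, 1960.

```stata
// Sanity check (not part of the original script)
isid fips year                          // stops with an error if any place-year pair repeats
sort fips year
list fips locname year y, sepby(fips)   // eyeball the reshaped data, one block per place
```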

Regards,

=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lacy, Assoc. Prof.
Soc. Dept., Colo. State. Univ.
Fort Collins CO 80523 USA

(970)-491-6721

