st: Re: reading a txt file that loops


From   Mike Lacy <Michael.Lacy@colostate.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: reading a txt file that loops
Date   Sun, 17 Apr 2011 09:45:32 -0600


searsgeneral@indy.rr.com wrote:

>Are there any shortcuts to reading a data file that has the following format
>other than to reorganize the data before importing?

Here's a simple approach with ordinary Stata machinery. The general idea is to read each line of the data as a string and then lightly massage it into something that can be -insheet-ed in the form
"FIPS, y1, y2, y3, y4, locname, decstart"

I presume the data have the following form (space-separated, despite the extension) and are in "loopdata.csv":

FIPS        1990       1980       1970       1960
00000  248709873  226545805  203211926  179323175 United States
18000    5544159    5490224    5193669    4662498 Indiana
18001      31095      29619      26871      24643 Adams County
18003     300836     294335     280455     232196 Allen County
18005      63657      65088      57022      48198 Bartholomew County
FIPS        1950       1940       1930       1920
00000  151325798  132164569   12320262  106021537 United States
18000    3934224    3427796    3238503    2930390 Indiana
18001      22393      21254      19957      20503 Adams County
18003     183722     155084     146743     114303 Allen County
18005      36108      28276      24864      23887 Bartholomew County

// Run -clear- first if there are already data in memory.
// clear
insheet using loopdata.csv // each line of the data becomes a string in v1
replace v1 = itrim(v1)  //multiple spaces are a nuisance
// Mark cases according to starting decade
gen decstart = (word(v1,2)) if (strpos(v1, "FIPS") > 0)
replace decstart = decstart[_n-1] if missing(decstart)
// We want something comma delimited, with only one header line with variable names.
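// Only the first 5 spaces become commas, so spaces inside place names survive.
replace v1 = subinstr(v1, " ", ",", 5) // 5 is unique to this data, of course
// Drop the repeated header lines, keeping only the first one.
drop if ((strpos(v1, "FIPS") > 0) & (_n > 1))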
replace v1 = "FIPS, y1, y2, y3, y4, locname, decstart" if _n ==1
replace v1 = v1 + ", " + decstart if _n > 1
drop decstart
// Save data file as a csv, then reimport
tempfile temp
outsheet using `temp', nonames noquote
clear
insheet using `temp', names comma
//  A long data structure makes sense, I'm guessing
reshape long y, i(fips decstart) j(year)
// year is now 1..4, the column position within a block; convert it to the census year
replace year = decstart - (year-1) * 10
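
A few optional checks on the result can be sketched as follows. This is an addition, not part of the steps above. It assumes those steps ran as written, so the variables are fips, locname, decstart, year, and y. It also assumes -insheet- read fips as numeric, which would drop the leading zeros; fips5 is just an illustrative name.

```stata
// Optional checks after the reshape (a sketch under the assumptions stated above)
isid fips year                     // errors out unless each place-year pair is unique
sort fips year
list fips locname year y in 1/8    // eyeball the first place across the census years

// Assumes fips is numeric: rebuild a zero-padded 5-character code, so 0 becomes "00000"
gen str5 fips5 = string(fips, "%05.0f")
```

If -isid- complains, the likely culprit is a header line that was not dropped or a block whose starting decade was not carried forward.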

Regards,

=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lacy, Assoc. Prof.
Soc. Dept., Colo. State. Univ.
Fort Collins CO 80523 USA
(970)-491-6721


