Re: st: reading a txt file that loops


From   Robert Picard <picard@netbox.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: reading a txt file that loops
Date   Sun, 17 Apr 2011 12:47:33 -0400

I suspect that the data you are describing can be found at:

http://www.census.gov/population/cencounts/in190090.txt

Since there are only 3 segments of data, you could simply use an
editor to copy and paste each part into a separate file, but you would
still be stuck having to input fixed-format data into Stata. Here's an
approach that processes the whole thing all at once:

*--------------------------- begin example -----------------------
* original data from:
* http://www.census.gov/population/cencounts/in190090.txt

clear
#delimit ;
infix str5 fips 1-5
      str11 y1 6-16
      str11 y2 17-27
      str11 y3 28-38
      str11 y4 39-49
      str20 place 51-70
using "in190090.txt";
#delimit cr
compress

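* drop blank lines, then number the segments by counting header rows
drop if fips == ""
gen segment = sum(fips=="FIPS")
* discard anything that precedes the first header row
drop if segment == 0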
tempfile main
save "`main'"

* Loop over each segment and rename data vars
sum segment, meanonly
local n = r(max)
forvalues i = 1/`n' {
	use "`main'", clear
	keep if segment == `i'
	drop segment
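	* row 1 of each segment is its header, so y`j'[1] holds the year;
	* columns with no year in this segment are dropped
	forvalues j = 1/4 {
		if y`j'[1] != "" rename y`j' pop`=y`j'[1]'
		else drop y`j'
	}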
	sort fips
	tempfile part`i'
	save "`part`i''"
}

use "`part1'", clear
forvalues i = 2/`n' {
	merge 1:1 fips using "`part`i''", nogen
}

* the header rows have merged into one "FIPS" observation; remove it
drop if fips == "FIPS"
destring pop*, replace
order pop*, last alpha

*--------------------- end example --------------------------
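
In case the two tricks inside the loop are not obvious, here is a
minimal sketch on a few rows typed in by hand. The values are copied
from the sample quoted below; the cut-down layout with a single data
column (y1) is only an assumption for illustration and is not how the
real file is read.

```stata
*--------------------------- begin sketch ------------------------
* toy rows typed in by hand (values copied from the quoted sample)
clear
input str5 fips str11 y1
"FIPS"  "1990"
"00000" "248709873"
"18000" "5544159"
"FIPS"  "1950"
"00000" "151325798"
"18000" "3934224"
end

* the running sum goes up by one at each "FIPS" header row,
* so every row is tagged with the segment it belongs to
gen segment = sum(fips=="FIPS")
list, sepby(segment)

* within one segment, the header row (observation 1) holds the year;
* `=exp' evaluates y1[1] on the fly to build the new variable name
keep if segment == 2
rename y1 pop`=y1[1]'
describe pop*
*--------------------------- end sketch --------------------------
```

The first three rows should be tagged segment 1 and the last three
segment 2, and after the rename the data column is called pop1950.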



On Sat, Apr 16, 2011 at 8:35 AM, Sears Generic <searsgeneral@indy.rr.com> wrote:
> Are there any shortcuts to reading a data file that has the following format
> other than to reorganize the data before importing?  The data file is for
> population by year by geographic location (e.g. United States, Indiana, then
> 3 counties in Indiana).  "FIPS" is a unique identifier for each county.  The
> problem is that the text file loops (i.e. only provides 4 decades of data
> before starting over) on a new line.  In the example below I've reduced the
> issue to the United States, Indiana, and 3 counties, but the full dataset
> has every county for every state so the looping does not recur in a
> consistent way.  Any suggestions would be appreciated.
>
>
> FIPS        1990       1980       1970       1960
> 00000  248709873  226545805  203211926  179323175 United States
>
> 18000    5544159    5490224    5193669    4662498 Indiana
> 18001      31095      29619      26871      24643 Adams County
> 18003     300836     294335     280455     232196 Allen County
> 18005      63657      65088      57022      48198 Bartholomew County
>
> FIPS        1950       1940       1930       1920
> 00000  151325798  132164569   12320262  106021537 United States
>
> 18000    3934224    3427796    3238503    2930390 Indiana
> 18001      22393      21254      19957      20503 Adams County
> 18003     183722     155084     146743     114303 Allen County
> 18005      36108      28276      24864      23887 Bartholomew County
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
