Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: Re: Reading non-standard data


From   "Michael Blasnik" <[email protected]>
To   <[email protected]>
Subject   st: Re: Reading non-standard data
Date   Mon, 15 Mar 2004 12:21:06 -0500

There are a few ways that you could approach this, but I think the simplest
may be to just read in the data as text strings and then manipulate it.  If
the rows are no more than 80 characters (or if you hace Stata/SE, then you
could change the first line to read in through column 244 instead of 80),
then this might work:

infix str line 1-80 using rawdata.txt
* identify lines with Grid-ref info
gen gridref=line if index(line,"Grid")
gen gridgroup=sum(gridref!="")
replace gridref=gridref[_n-1] if gridref==""
drop if index(line,"Grid")
split line, gen(month) destring
mvdecode month*, mv(-999)
gen year=1900+gridgroup
replace gridref=subinstr(gridref,"Grid-ref=","",.)
split gridref, gen(coord) parse(,) destring
rename coord1 xcoord
rename coord2 ycoord
order year xcoord ycoord
drop gridref gridgroup line

This code should give you:
xcoord and ycoord numeric vars,
year (numeric var running 1901 through 2000)
month1, month2, month3, ...month12 : numeric variables containing the data
values

If it doesn't work as expected, get rid of the last line (that drops the
working variables) and take a look at the data where i doesn't seem to work.
You may have problems with the destring option on the split command if there
are non-numeric characters.

Good luck,

Michael Blasnik
[email protected]

----- Original Message ----- 
From: "Rio, Martin" <[email protected]>
To: <[email protected]>
Sent: Monday, March 15, 2004 11:35 AM
Subject: st: Reading non-standard data


> I need to read a climate dataset into Stata. My problem is that the
dataset
> in not 1-observation per row, but the information in each row depends on a
> previous row. The dataset contains monthly measurements for each cell on a
> grid from 1901-2000. The grid is big, but for the purpose of illustration
> suppose it is 10x10 (10 cells wide by 10 cells tall).
>
> Each cell of the grid is identified by a header observation that states
its
> x, y coordinated and is followed by rows that contain the data
corresponding
> to that grid, which are 100 rows (years 1901-2000) and 12 columns (months
> Jan-Dec). The format is fixed in the sense that there is always a header
row
> followed by 100 rows with 12 columns sepparated by a space (missing values
> are coded -999, so that there are always values for columns)
>
> This is an example of how the dataset looks:
>
> Grid-ref=   1, 1
>    40   73  139   97   66  338  523  265  172   57   96   72
>    86   87  125  106   74  233  275  419   89   37   67   72
>     ...
>    60  112   58   22  145  273  576  451   82   91  194   61
> Grid-ref=   1, 2
>    32  103  108   68  126  312  565  486  213   54   76   29
>    13  106   42   93  112  344  244  422   88  104   59   73
>   ...
>   101  104   57   46   91  165  601  107  163   60  161   59
>    [and so on, until]
> Grid-ref=   10, 10
>    13  102   19   98   58  309  247  303  181   48   68   81
>    123   60  133  107  107  208  321  502   48  107   80  103
>     ...
>    101   89  187   74  201  515  509  490  102   86  100   52
>
> I would like to read these data into a Stata dataset with columns:
> x-coordinate, y-coordinate, year, jan, feb, ..., dec.
>
> I've been trying for a while, but I am really at a loss and would
appreciate
> any help. If someone wants to help me and finds this email confusing,
please
> e-mail me for clarification.
>
> Thanks in advance,
>
> Mart�n Rio
> Charles River Associates
> Washington, DC
> phone: (202) 662-3957
> fax: (202) 662-3910


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index