Re: st: need help on reading big ASCII file to make a panel data set on temperature across geographic coordinates


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: need help on reading big ASCII file to make a panel data set on temperature across geographic coordinates
Date   Wed, 4 Jan 2012 09:41:04 +0000

See Statalist FAQ on requesting private replies. I have copied this
directly to Jose Ramon.

This is a very clearly posed question.

The only real awkwardness is the occurrence of lines explaining the
data structure. So, get rid of them. In my favourite text editor, Vim,
that is something like

:g/rows/d
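
Outside an editor, the same clean-up can be scripted. What follows is a
minimal Python sketch, not part of the Stata solution. It assumes, as in
the sample header lines quoted in the question, that every descriptive
line contains the word "rows" and that no data line does. The function
name and the output file name are hypothetical.

```python
def drop_header_lines(lines):
    """Keep only the lines that do not contain the word 'rows'."""
    return [line for line in lines if "rows" not in line]


if __name__ == "__main__":
    import os

    src = "hadcrut3v.txt"        # file name from the original question
    dst = "hadcrut3v_clean.txt"  # hypothetical output name (assumption)
    if os.path.exists(src):
        with open(src) as f:
            kept = drop_header_lines(f)
        with open(dst, "w") as f:
            f.writelines(kept)
```

Writing to a separate file keeps the original download intact. The
Stata commands would then need to read the cleaned file.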

Then this worked in Stata:

infile temp1-temp72 using hadcrut3v.txt
di 69552/161
di 36 * 12
egen year = seq(), from(1850) block(432)
egen month = seq(), from(1) to(12) block(36)
egen id = seq(), from(1) to(36)
mvdecode *, mv(-1e30)

The -display- lines are not needed but I leave them here to show what
I did. That is, Stata told me I had read in 69552 lines. I knew that
there should be 161 years, so division showed me 432 observations per
year, and I checked that 36 * 12 is 432. (I can do that in my head
too.) Then -egen-'s -seq()- function comes in useful for creating
-year- and -month-.
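
The block arithmetic behind those three -seq()- calls can be checked
outside Stata. Here is a minimal Python sketch, assuming observations
are counted from 0 in file order. The function and constant names are
hypothetical, and this illustrates the indexing only, not how -egen-
is implemented.

```python
OBS_PER_MONTH = 36                  # latitude rows per month
OBS_PER_YEAR = OBS_PER_MONTH * 12   # 432 observations per year


def label_observation(n, first_year=1850):
    """Return (year, month, id) for the n-th observation, counting from 0."""
    year = first_year + n // OBS_PER_YEAR    # new year every 432 observations
    month = (n // OBS_PER_MONTH) % 12 + 1    # new month every 36, cycling 1..12
    row_id = n % OBS_PER_MONTH + 1           # cycles 1..36 within each month
    return year, month, row_id
```

With 69552 observations, the last one (n = 69551) is labelled
(2010, 12, 36), which matches the final row of December 2010.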

There is still much restructuring and renaming to do, but that yields to
appropriate commands such as -reshape- and -rename-.
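
To show what the restructuring has to achieve, here is a minimal Python
sketch that turns one month's 36 x 72 grid into the single record of
2592 values asked for in the question. It is not the Stata commands
themselves. The row-by-row ordering of Var1 to Var2592 is an assumption,
since the question does not say how the cells should be numbered, and
the function names are hypothetical.

```python
MISSING = -1e30          # missing-value code stated in the file's header lines
N_ROWS, N_COLS = 36, 72  # latitude bands x longitude bands


def flatten_month(grid):
    """Turn one month's 36 x 72 grid into a list of 2592 values.

    Values equal to the missing code become None.
    Assumption: cells are numbered row by row (all 72 columns of
    row 1 first, then row 2, and so on).
    """
    if len(grid) != N_ROWS or any(len(row) != N_COLS for row in grid):
        raise ValueError("expected a 36 x 72 grid")
    return [None if v == MISSING else v for row in grid for v in row]


def var_number(row_id, col):
    """1-based position of cell (row_id, col) in the flattened record."""
    return (row_id - 1) * N_COLS + col
```

Under that numbering, row 1, column 1 is Var1 and row 36, column 72 is
Var2592.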

In versions of Stata before 12, you will probably need to -set memory- first.

Nick

On Wed, Jan 4, 2012 at 1:01 AM, Jose Ramon Albert <jrgalbert@gmail.com> wrote:
> i have a big TEMPERATURE data set
>
> http://dl.dropbox.com/u/308664/hadcrut3v.txt
>
> that pertains to temperature data for 2592 = 36 x 72 geographic
> coordinates across
> months from the years 1850 to 2010
>
> for year = 1850 to 2010
>  for month = 1 to 12
>   format(2i6) year, month
>   for row = 1 to 36 (85-90N,80-85N,75-80N,...75-80S,80-85S,85-90S)
>    format(72(e10.3,1x)) 180W-175W,175W-170W,...,175-180E
>
> the first row in the txt file prior to the actual data,
>
>  1850     1     1    36 rows     72 columns. Missing=-1.000e+30
>
> announces that the data following it are for month 1 (jan) 1850 which i
> would like to ignore, and then the next line
> again has a description
>
>  1850     2     1    36 rows     72 columns. Missing=-1.000e+30
>
> which then is succeeded by the feb 1850 data... and this goes on and on
> to Dec 2010.
>
> i need to construct a panel database that will look like this:
>
> Year Month  Var1 Var 2 ... Var2592
> 1850   1       DATA read from file
> 1850   2 ..... (Data read from file)
>
> etc.
>
>
> can anyone help me out ? i understand this may need the use of
> the file, read command, but i have not used this before...
> grateful for help.
>
> please respond directly to
> jrgalbert@gmail.com
>
