st: Re: need help on reading big ASCII file to make a panel data set on temperature across geographic coordinates

From	Gordon Hughes <[email protected]>
To	[email protected]
Subject	st: Re: need help on reading big ASCII file to make a panel data set on temperature across geographic coordinates
Date	Wed, 04 Jan 2012 11:00:22 +0000

The HadCRUT3 dataset is relatively straightforward to handle once you have made a minor edit to the header lines. All you really need to do is a global replacement of "rows" by " " and "columns. Missing=" by " ", so that each header line contains only numbers. You can use Notepad++ for this if your regular text editor cannot handle files of 60+ MB.
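If you would rather not leave Stata, the same header edit can be sketched with -filefilter-, which copies a file while replacing one pattern per pass. This is only a sketch: the file names "hadcrut3_raw.txt" and "hadcrut3_tmp.txt" are hypothetical, and the exact header text (including the spacing in "columns. Missing=") is an assumption that should be checked against your copy of the file before running it.

```stata
* Run with the default (carriage-return) delimiter.
* hadcrut3_raw.txt = downloaded file (hypothetical name)
* hadcrut3_tmp.txt = intermediate file (hypothetical name)

* Pass 1: blank out the word "rows" in every header line
filefilter hadcrut3_raw.txt hadcrut3_tmp.txt, from("rows") to(" ") replace

* Pass 2: blank out "columns. Missing=" (ASSUMED spelling; verify in your file)
filefilter hadcrut3_tmp.txt hadcrut3.txt, from("columns. Missing=") to(" ") replace
```

-filefilter- reports the number of occurrences it changed, so a count of zero on the second pass suggests that the header text in your file differs from the pattern assumed here.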

Once you have done this, the code below will do almost everything you want. [I wrote this to handle a version of the HadCRUT3 dataset up to mid-2010.] The edited version of the original data is stored in the file "hadcrut3.txt". The panel_id and time_id variables are defined at the end of the code. Replace the command -save- by -saveold- if you want to save the data as a Stata 9 dataset.


Gordon Hughes

========

#delimit ;
capture cd "g:\CRU_Data";
capture log close;
log using "hadcrut3_grid_data.log", replace;
infile year month unit nrows ncolumns missval
  row1_1-row1_72 row2_1-row2_72 row3_1-row3_72 row4_1-row4_72
  row5_1-row5_72 row6_1-row6_72 row7_1-row7_72 row8_1-row8_72
  row9_1-row9_72 row10_1-row10_72 row11_1-row11_72 row12_1-row12_72
  row13_1-row13_72 row14_1-row14_72 row15_1-row15_72 row16_1-row16_72
  row17_1-row17_72 row18_1-row18_72 row19_1-row19_72 row20_1-row20_72
  row21_1-row21_72 row22_1-row22_72 row23_1-row23_72 row24_1-row24_72
  row25_1-row25_72 row26_1-row26_72 row27_1-row27_72 row28_1-row28_72
  row29_1-row29_72 row30_1-row30_72 row31_1-row31_72 row32_1-row32_72
  row33_1-row33_72 row34_1-row34_72 row35_1-row35_72 row36_1-row36_72
  using hadcrut3.txt;
drop unit nrows ncolumns missval;
compress;

reshape long row1_ row2_ row3_ row4_ row5_ row6_ row7_ row8_ row9_
  row10_ row11_ row12_ row13_ row14_ row15_ row16_ row17_ row18_
  row19_ row20_ row21_ row22_ row23_ row24_ row25_ row26_ row27_
  row28_ row29_ row30_ row31_ row32_ row33_ row34_ row35_ row36_,
  i(year month) j(hgrid);

forvalues n=1/36 {;
  rename row`n'_ dtemp`n';
  };
reshape long dtemp, i(year month hgrid) j(vgrid);
replace dtemp=. if dtemp <= -1000000;
sort vgrid hgrid year month;
by vgrid hgrid: egen cell_nobs=count(dtemp);
drop if cell_nobs <= 0;
* panel_id: 5 deg grid cells starting from 90-85N, 180-175W = 1;
gen panel_id=(vgrid-1)*72+hgrid;
* time_id: months starting with Jan 1850 = 1;
gen time_id=(year-1850)*12+month;
sort panel_id year time_id;
compress;
describe;
save "hadcrut3_grid_data.dta", replace;

==================
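Once the dataset has been saved, a few follow-up steps may be useful. The sketch below checks that panel_id and time_id jointly identify each observation, declares the panel, and recovers approximate cell-centre coordinates from the grid indices. The variable names lat, lon and mdate are hypothetical, and the coordinate formulas assume that rows run from north to south and columns from west to east, starting at the 90-85N, 180-175W cell noted in the code comment.

```stata
* Run with the default (carriage-return) delimiter.
use "hadcrut3_grid_data.dta", clear

* Each (panel_id, time_id) pair should identify exactly one observation
isid panel_id time_id

* Declare the panel structure
xtset panel_id time_id

* Cell centres in degrees (ASSUMES rows run N to S and columns W to E):
* vgrid = 1 gives 87.5 (90-85N); hgrid = 1 gives -177.5 (180-175W)
gen lat = 92.5 - 5*vgrid
gen lon = -182.5 + 5*hgrid

* Stata monthly date, as an alternative to time_id for labelling
gen mdate = ym(year, month)
format mdate %tm
```

If -isid- fails, the most likely cause is a header line that was not fully cleaned, so that -infile- read the values out of step.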
