st: Re: need help on reading big ASCII file to make a panel data set on temperature across geographic coordinates

From   Gordon Hughes
Subject   st: Re: need help on reading big ASCII file to make a panel data set on temperature across geographic coordinates
Date   Wed, 04 Jan 2012 11:00:22 +0000

The HadCRUT3 dataset is relatively straightforward to handle once you have made a minor edit to the header lines. All you really need to do is globally replace "rows" with " " and "columns. Missing=" with " ", so that each header line contains only numbers. You can use Notepad++ for this if your regular text editor cannot handle files of 60+ MB.
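If you would rather not use a text editor, the same two replacements can probably be done inside Stata with -filefilter-. This is only a sketch: the raw file name "hadcrut3_raw.txt" is my assumption (use whatever name you downloaded the file under), and -filefilter- handles one pattern per pass, hence the temporary intermediate file.

```stata
* Sketch only: "hadcrut3_raw.txt" is an assumed name for the unedited download
tempfile step1
* Pass 1: blank out the word "rows" in every header line
filefilter "hadcrut3_raw.txt" "`step1'", from("rows") to(" ") replace
* Pass 2: blank out "columns. Missing=" and write the edited file
filefilter "`step1'" "hadcrut3.txt", from("columns. Missing=") to(" ") replace
```

-filefilter- reports the number of occurrences changed on each pass, which should equal the number of monthly header lines in the file.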

Once you have done this, the code below will do almost everything you want. [I wrote this to handle a version of the HadCRUT3 dataset up to mid-2010.] The edited version of the original data is stored in the file "hadcrut3.txt". The panel_id and time_id variables are defined at the end of the code. Replace the command -save- by -saveold- if you want to save the data as a Stata 9 dataset.

Gordon Hughes


#delimit ;
capture cd "g:\CRU_Data";
capture log close;
log using "hadcrut3_grid_data.log", replace;
infile year month unit nrows ncolumns missval
  row1_1-row1_72 row2_1-row2_72 row3_1-row3_72 row4_1-row4_72
  row5_1-row5_72 row6_1-row6_72 row7_1-row7_72 row8_1-row8_72
  row9_1-row9_72 row10_1-row10_72 row11_1-row11_72 row12_1-row12_72
  row13_1-row13_72 row14_1-row14_72 row15_1-row15_72 row16_1-row16_72
  row17_1-row17_72 row18_1-row18_72 row19_1-row19_72 row20_1-row20_72
  row21_1-row21_72 row22_1-row22_72 row23_1-row23_72 row24_1-row24_72
  row25_1-row25_72 row26_1-row26_72 row27_1-row27_72 row28_1-row28_72
  row29_1-row29_72 row30_1-row30_72 row31_1-row31_72 row32_1-row32_72
  row33_1-row33_72 row34_1-row34_72 row35_1-row35_72 row36_1-row36_72
  using hadcrut3.txt;
drop unit nrows ncolumns missval;
reshape long row1_ row2_ row3_ row4_ row5_ row6_ row7_ row8_ row9_ row10_ row11_ row12_ row13_ row14_ row15_ row16_ row17_ row18_ row19_ row20_ row21_ row22_ row23_ row24_ row25_ row26_ row27_ row28_ row29_ row30_ row31_ row32_ row33_ row34_ row35_ row36_, i(year month) j(hgrid);
forvalues n=1/36 {;
  rename row`n'_ dtemp`n';
};
reshape long dtemp, i(year month hgrid) j(vgrid);
replace dtemp=. if dtemp <= -1000000;
sort vgrid hgrid year month;
by vgrid hgrid: egen cell_nobs=count(dtemp);
drop if cell_nobs <= 0;
* panel_id: 5 deg grid cells starting from 90-85N, 180-175W = 1;
gen panel_id=(vgrid-1)*72+hgrid;
* time_id: months starting with Jan 1850 = 1;
gen time_id=(year-1850)*12+month;
sort panel_id year time_id;
save "hadcrut3_grid_data.dta", replace;
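
To use the saved file as a panel, something along the following lines should work. It is a sketch to be run as a separate do-file with the default delimiter. The variable names lat, lon and mdate are my own, and the cell-centre formulas assume the numbering given in the comments above (vgrid = 1 is the 90-85N band, hgrid = 1 is the 180-175W band, both in 5-degree steps).

```stata
* Assumes hadcrut3_grid_data.dta was created by the code above
use "hadcrut3_grid_data.dta", clear

* Cell-centre coordinates implied by the grid numbering (assumption:
* vgrid counts 5-degree bands southwards from 90N,
* hgrid counts 5-degree bands eastwards from 180W)
gen lat = 87.5 - 5*(vgrid - 1)
gen lon = -177.5 + 5*(hgrid - 1)

* Monthly date variable, for display only
gen mdate = ym(year, month)
format mdate %tm

* Declare the panel: one grid cell per panel_id, one month per time_id
xtset panel_id time_id
```

With these formulas panel_id 1 has its centre at 87.5N, 177.5W, which matches the 90-85N, 180-175W cell.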

