Statalist



Re: st: Reshape --- it takes so bloodey long.


From   Daniel Wilde <dgw24@bath.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Reshape --- it takes so bloodey long.
Date   Wed, 10 Oct 2007 13:06:08 +0100

All,

I forgot to paste the do file. Here it is:


set more off


* This do file creates ten years' worth of time series data on infant, under-three and child mortality from a Demographic and Health Survey (DHS).

* This do file has to be run for each survey. The results are then pasted into the panel data time series Excel sheet, which also contains information from WDI, and converted back into Stata.

* Lines that have to change between surveys are noted.

* this particular do file is for

* The file is always loaded from C:\DHSEX, so the relevant DHS survey has to be placed there.

* Step 1 Setting up Stata

* The next line tells Stata to behave as version 8.2, because the laptop has the full version of Stata 8 but WED has version 9

version 8.2

* The next command increases the memory available to Stata because we are going to be working with a lot of variables

set mem 250m

* The next command increases the number of variables it can handle

set maxvar 30000

* Step 2 loading the data

* the next command loads the data file. The file has to be put in

use C:\DHSEX, clear

* Step 3 Recoding

* Recode the variable that tells you whether the woman interviewed is from a rural or an urban area (urban stays 1, rural becomes 0)

recode v102 2 = 0

*rescale the sample weight

replace v005 = v005/1000000

* Step 4 Extracting the relevant information for each child

* The first stage is to keep only the relevant variables, because the reshape command crashes if we keep everything. The commented-out keep line below retains only 6 kids because this is what Stifel's SPS program does, but we need to check with him that this is right.

sort caseid
g wid=_n

* The keep command below will have to change so that it gets the right number of births

keep caseid v001 v002 v003 v005 v006 v007 v011 v008 v101 v102 bidx_01-bidx_20 bord_01-bord_20 b0_01-b0_20 b3_01-b3_20 b4_01-b4_20 b7_01-b7_20 b10_01-b10_20

*keep caseid v001 v002 v003 v005 v006 v007 v011 v008 v101 v102 bidx_01-bidx_06 bord_01-bord_06 b0_01-b0_06 b3_01-b3_06 b4_01-b4_06 b7_01-b7_06 b10_01-b10_06

* The second stage is to create a macro listing the stub names of the variables that reshape will make long

global varx "bidx_ bord_ b0_ b3_ b4_ b7_ b10_ "
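
* A possible speed-up before reshaping, sketched here as a suggestion rather than a tested fix. Dropping women with no births and compressing storage types leaves reshape less data to move. This assumes bidx_01 is missing only for women with an empty birth history, which is worth checking against the survey documentation.

drop if bidx_01 == .
compress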

* The third stage is to reshape so that each child is on its own line

reshape long $varx, i(caseid) j(j 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20)


* Now let's drop missing observations (birth slots with no child)

drop if bidx == .


* Let's create variables with new names

rename v001 hcluster

rename v002 hhnumber

rename v003 mother

rename v005 wgt

rename v006 monthint

rename v007 yearint

rename v008 dateint

rename v011 mdob

rename v101 region

rename v102 urban

gen twin_ = b0_


gen dob_ = b3_


gen sex_ = b4_


gen aged_ = b7_


gen flag_ = b10_


* Now let's drop all variables that are not relevant

keep caseid hcluster hhnumber mother wgt monthint yearint dateint mdob region urban bidx_ bord_ twin_ dob_ sex_ aged_


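* Now let's create variables showing the mother's age at the child's birth (in years), the child's age at interview (in months) and the child's year of birth

* dob_ is a century month code, so a December birth is an exact multiple of 12. Subtracting 1 before dividing keeps December births in the correct year.

gen magekdob = (dob_ - mdob)/12
gen age = dateint - dob_
gen yob = int((dob_ - 1)/12)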
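
* A quick check of the century month code (CMC) arithmetic, added as an illustration. It assumes the usual DHS definition, CMC = 12*(year - 1900) + month, so January 2000 is 1201 and December 2000 is 1212. Both lines should display 2000.

display 1900 + int((1201 - 1)/12)
display 1900 + int((1212 - 1)/12)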

* Now let's sort the data

sort hcluster hhnumber mother bidx_

* From here on is Stifel's do file

#delimit ;

* Keep only those kids born to mothers whose age was 15-35 at the time of birth, as coded in the drop below;

scalar year = 2000;
gen ageint = dateint - dob;
drop if ageint<12;
keep yob aged magekdob wgt ageint;
drop if magekdob<15 | magekdob>35;
gen yob1 = yob + 1900;
drop yob;
rename yob1 yob;

save c:\DHSTEMP1, replace;

*------------------------------------------------*
| |
| Program to create 10 scalars indicating |
| the year in which each cohort of kids |
| was born. |
| |
| For example, if the survey year was 1988, |
| |
| surv1 = 1978 |
| surv2 = 1979 |
| . |
| . |
| surv10 = 1987 |
| |
| Syntax for the program is simply "survgen" |
| |
*------------------------------------------------*;


capture program drop survgen;
program define survgen;
local i = 1;
while `i' <= 10 {;
scalar surv`i' = year - 11 + `i';
local i = `i' + 1;
};
drop if yob == year | yob < surv1;
end;

*------------------------------------------------*
| |
| Program to calculate IMR & CMR rates |
| for cohorts of kids born in each of the |
| 10 years prior to the survey |
| |
| Saves the merged output (year, cmr, imr) |
| as one dataset: |
| c:\DHSTEMP1 |
| |
| Syntax for the program is simply "cimrgen" |
| |
*------------------------------------------------*;

capture program drop cimrgen;
program define cimrgen;
local i = 1;
local thsnd = 1000;
qui gen imrv=0;
qui replace imrv = 1 if aged <= 12;
qui gen cmrv = 0;
qui replace cmrv = 1 if aged <= 36;

sum imrv cmrv;

while `i' <= 10 {;
matrix yr`i' = surv`i';
qui gen x = imrv;
qui replace x = . if yob ~= surv`i';
qui summ x [aw=wgt], meanonly;
local imrl = r(mean);
matrix imr`i' = `imrl' * `thsnd';
drop x;
if `i' == 2 {;
matrix imr = ( yr1 , imr1 \ yr2 , imr2 );
};
if `i' > 2 {;
matrix imrn = yr`i' , imr`i';
matrix imr = imr \ imrn;
};

if `i' <= 8 {;
qui gen x = cmrv;
qui replace x = . if yob ~= surv`i';
qui summ x [aw=wgt], meanonly;
local cmrl = r(mean);
matrix cmr`i' = `cmrl' * `thsnd';
drop x;

if `i' == 2 {;
matrix cmr = ( yr1 , cmr1 \ yr2 , cmr2 );
};
if `i' > 2 {;
matrix cmrn = yr`i' , cmr`i';
matrix cmr = cmr \ cmrn;
};
};

local i = `i' + 1;
};

matrix colnames imr = year imr;
matrix colnames cmr = year cmr;
drop _all;
svmat cmr, name(col);
sort year;
save c:\DHSTEMP1, replace;
drop _all;
svmat imr, name(col);
sort year;
merge year using c:\DHSTEMP1;
drop _merge;
save c:\DHSTEMP1, replace;
list;
graph twoway scatter imr cmr year;

end;

*********************************************;

survgen;
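
* Optional check, not in the original do file. It lists the first and last cohort years created by survgen. Since surv_i = year - 11 + i, a survey year of 2000 should give 1990 and 1999;

scalar list surv1 surv10;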
cimrgen;

save c:\Cameroon2004output, replace;

capture log close;
clear;
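
On the speed question itself, one commonly suggested workaround is to skip reshape and stack the birth slots by hand. The following is a rough sketch only and has not been timed. It assumes the wide data left by the keep in Step 4 are in memory, that all seven stubs exist for all 20 slots (so b4_07 to b4_20 must have been kept as well), and that c:\DHSWIDE and c:\DHSLONG are hypothetical scratch files that are safe to overwrite. It has to be run with the default carriage-return delimiter, not under #delimit ;.

```stata
* save the wide file once so that each pass can reload it
save c:\DHSWIDE, replace
local first = 1
foreach j in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 {
    use c:\DHSWIDE, clear
    * mother-level variables plus the child variables for this birth slot
    keep caseid v001 v002 v003 v005 v006 v007 v008 v011 v101 v102 bidx_`j' bord_`j' b0_`j' b3_`j' b4_`j' b7_`j' b10_`j'
    * empty slots would be dropped after reshape anyway
    drop if bidx_`j' == .
    * strip the slot suffix so that every pass has the same variable names
    foreach s in bidx_ bord_ b0_ b3_ b4_ b7_ b10_ {
        rename `s'`j' `s'
    }
    gen byte j = `j'
    * add the slots stacked so far, then save the running total
    if `first' == 0 {
        append using c:\DHSLONG
    }
    save c:\DHSLONG, replace
    local first = 0
}
sort caseid j
```

The result should match what reshape long followed by the drop of missing bidx_ produces, one line per child with the slot number in j, so the rest of the do file can carry on from the renaming step.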


--On 10 October 2007 12:46 +0100 Daniel Wilde <dgw24@bath.ac.uk> wrote:


All,

I have created the following do file to create ten years of mortality
data from each DHS survey. It works, but the reshape command takes ages,
about 15 minutes, on my version of Stata 8 on my IBM ThinkPad laptop.
Is there some way I can speed up the command?

Thanks

Daniel Wilde

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/





