
From: Daniel Wilde <dgw24@bath.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: st: DHS ten years of data
Date: Thu, 04 Oct 2007 15:37:49 +0100

All,

I am trying to create ten years of infant and child mortality data from a single Demographic and Health Survey (DHS). I originally had a SAS program that selected the relevant records and variables, and a Stata do file that then created the two series. With help from several people on Statalist (thanks), I was able to combine the two into a single Stata do file. This do file now runs all the way through, but it does not always produce ten years of data for both series. For example, for the Bangladesh 1996 survey (individual recode) it produces ten years of data for one series but only seven years for the other. For Kenya 1989 (individual recode) it produces only a few years' worth of data. Can anyone take a look at the do file and tell me why this is happening? I have pasted all three files below:

Any help would be greatly appreciated.
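
A minimal sketch, on made-up data, of one check that may help narrow this down. It repeats the window arithmetic from survgen (surv`i' = year - 11 + `i') with the scalar year left at 96, and counts how many of the ten cohorts contain any births when the children were in fact born in 79-88, as might be the case for a 1989 survey. The toy data and the scalar name nonempty are hypothetical, and a mismatch between year and the survey is only one possible cause, not a confirmed one.

```stata
* Toy data (made up): one child per birth year 79-88
clear
set obs 10
gen yob = 78 + _n

* Same window arithmetic as survgen, with year left at 96
scalar year = 96
scalar nonempty = 0
forvalues i = 1/10 {
    scalar surv`i' = year - 11 + `i'
    quietly count if yob == surv`i'
    local n = r(N)
    if `n' > 0 {
        scalar nonempty = nonempty + 1
    }
    display "cohort " surv`i' ": " `n' " births"
}

* Number of the ten cohorts that contain any births (3 with these toy data)
display "cohorts with data: " nonempty
```

With these toy data only cohorts 86, 87 and 88 contain births. Separately, as written cimrgen fills the CMR matrix only while i <= 8, so that series has at most eight rows whatever the survey.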

**** DO FILE 1 COMBINED STATA FILE ********

*Explanation of Stata do file and SAS File

set more off

* let's open a log

log using c:\DHSLOG1, replace

* This do file creates ten years' worth of time series data for infant, under three and child mortality from a Demographic and Health Survey.

* This do file has to be run for each survey. Then the results are pasted into the panel data time series Excel sheet, which also contains information from WDI, then the results are converted back into Stata.

* The file is always loaded from the desktop DHSEX

* Step 1 Setting up Stata

* The next line tells stata we are using 8.2; because you are using your full laptop version of STATA 8, but WED has version 9

version 8.2

* The next command increases the memory available to Stata because we are going to be working with a lot of variables

set mem 36m

* The next command increases the maximum matrix size Stata can handle

set matsize 3500

* Step 2 loading the data

* the next command loads the data file. The file has to be put in

use C:\DHSEX.DTA, clear

* Step 3 Recoding

*Recode the variable that tells you whether the woman interviewed is from a rural or urban area, so that urban = 1 and rural = 0

recode v102 2 = 0

*rescale the sample weight

replace v005 = v005/1000000

* Step 4 Extracting the relevant information for each child

* the first stage is to keep only the relevant variables because the reshape command crashes if we have everything. We are only keeping 6 kids because this is what Stifel's SPS prog does, but we need to check with him that this is right.

sort caseid

g wid=_n

keep caseid v001 v002 v003 v005 v006 v007 v011 v008 v101 v102 bidx_01-bidx_06 bord_01-bord_06 b0_01-b0_06 b3_01-b3_06 b4_01-b4_06 b7_01-b7_06 b10_01-b10_06

* the second stage is to create a macro with all the information we want in it apart from the variable that we are making long

global varx "bidx_ bord_ b0_ b3_ b4_ b7_ b10_ "

* the third stage is to reshape so each observation is its own line

reshape long $varx, i(caseid) j(j 01 02 03 04 05 06)

* Let's create variables with new names

gen hcluster = v001

gen hhnumber = v002

gen mother = v003

gen wgt = v005

gen monthint = v006

gen yearint = v007

gen dateint = v008

gen mdob = v011

gen region = v101

gen urban = v102

gen twin = b0_

gen dob_ = b3_

gen sex_ = b4_

gen aged_ = b7_

gen flag_ = b10_

*now let's drop all variables that are not relevant

keep caseid hcluster hhnumber mother wgt monthint yearint dateint mdob region urban bidx_ bord_ twin dob_ sex_ aged_

* now let's drop missing observations

drop if bidx == .

* now let's create variables showing the child's age

gen magekdob = (dob_ - mdob)/12

gen age = dateint - dob

gen yob = int(dob/12)

* now let's sort the variables

sort hcluster hhnumber mother bidx

* from here on is Stifel's do file

#delimit;

*Load dataset and keep only those kids born to mothers whose age was 15-39 at the time of birth;

* Two-digit survey year used by survgen, to be set for each survey (96 is for the 1996 survey);

scalar year = 96;

gen ageint = dateint - dob;

drop if ageint<12;

keep yob aged magekdob wgt ageint;

drop if magekdob<15 | magekdob>35;

save c:\DHSTEMP1, replace;

*------------------------------------------------*

| |

| Program to create 10 scalars indicating |

| the year in which each cohort of kids |

| was born. |

| |

| For example, if the survey year was 1988, |

| |

| surv1 = 78 |

| surv2 = 79 |

| . |

| . |

| surv10 = 87 |

| |

| Syntax for the program is simply "survgen" |

| |

*------------------------------------------------*;

capture program drop survgen;

program define survgen;

local i = 1;

while `i' <= 10 {;

scalar surv`i' = year - 11 + `i';

local i = `i' + 1;

};

drop if yob == year | yob < surv1;

end;

*------------------------------------------------*

| |

| Program to calculate IMR & CMR rates |

| for cohorts of kids born in each of the |

| 10 years prior to the survey |

| |

| Saves the output as two datasets: |

| c:\DHSTEMP1 -- IMR |

| c:\temp\DHSTEMP2 -- CMR |

| |

| Syntax for the program is simply "cimrgen" |

| |

*------------------------------------------------*;

capture program drop cimrgen;

program define cimrgen;

local i = 1;

local thsnd = 1000;

qui gen imrv=0;

qui replace imrv = 1 if aged <= 12;

qui gen cmrv = 0;

qui replace cmrv = 1 if aged <= 36;

sum imrv cmrv;

while `i' <= 10 {;

matrix yr`i' = surv`i';

qui gen x = imrv;

qui replace x = . if yob ~= surv`i';

qui summ x [aw=wgt], meanonly;

local imrl = r(mean);

matrix imr`i' = `imrl' * `thsnd';

drop x;

if `i' == 2 {;

matrix imr = ( yr1 , imr1 \ yr2 , imr2 );

};

if `i' > 2 {;

matrix imrn = yr`i' , imr`i';

matrix imr = imr \ imrn;

};

if `i' <= 8 {;

qui gen x = cmrv;

qui replace x = . if yob ~= surv`i';

qui summ x [aw=wgt], meanonly;

local cmrl = r(mean);

matrix cmr`i' = `cmrl' * `thsnd';

drop x;

if `i' == 2 {;

matrix cmr = ( yr1 , cmr1 \ yr2 , cmr2 );

};

if `i' > 2 {;

matrix cmrn = yr`i' , cmr`i';

matrix cmr = cmr \ cmrn;

};

};

local i = `i' + 1;

};

matrix colnames imr = year imr;

matrix colnames cmr = year cmr;

drop _all;

svmat cmr, name(col);

sort year;

save c:\DHSTEMP1, replace;

drop _all;

svmat imr, name(col);

sort year;

merge year using c:\DHSTEMP1;

drop _m;

save c:\DHSTEMP1, replace;

list;

graph imr cmr year;

end;

*********************************************;

survgen;

cimrgen;

save c:\DHSEX2, replace;

log close;

clear;
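
A note on the line gen yob = int(dob/12) in the do file above, with a minimal worked example. DHS dates such as b3 and v008 are century month codes, CMC = 12*(year - 1900) + month, so dividing by 12 and truncating places December births in the following year. The scalar names dec95 and jan96 are hypothetical, and the alternative in the last two lines is only a suggestion, not what either original program does.

```stata
* CMC for December 1995 and January 1996: CMC = 12*(year - 1900) + month
scalar dec95 = 12*95 + 12
scalar jan96 = 12*96 + 1

* As in the do file: both dates are assigned to cohort 96
display int(dec95/12)
display int(jan96/12)

* Subtracting one month first keeps December 1995 in cohort 95
display int((dec95 - 1)/12)
display int((jan96 - 1)/12)
```

The four display lines show 96, 96, 95 and 96. If the shift matters for the series, the same change would also apply to yob = int(dob/12) in the SAS program.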

******** DO FILE 2 ORIGINAL SAS FILE ************************************************

/* SAS program to create SAS dataset of Mortality */

/* of children in country (cc) in year (YY) from DHS */

libname user '/home4/ds52/aadata';

libname work1 '/home4/ds52/temp';

options ls=132 ps=54 nocenter;

data work1.temp0;

set user.bdir3afl;

* Recode urban so that urban = 1 and rural = 0;

if v102 = 2 then v102 = 0;

* Rescale the weights;

v005 = v005/1000000;

********************************************

Extract Info Relevant to Each Child

********************************************;

array bidx_a{20} bidx_01-bidx_20;

array bord_a{20} bord_01-bord_20;

array b0_a{20} b0_01- b0_20;

array b3_a{20} b3_01- b3_20;

array b4_a{20} b4_01- b4_20;

array b7_a{20} b7_01- b7_20;

array b10_a{20} b10_01- b10_20;

do i=1 to 6 ;

if bidx_a{i}>0 then do;

hcluster = v001;

hhnumber = v002;

mother = v003;

wgt = v005;

monthint = v006;

yearint = v007;

dateint = v008;

mdob = v011;

region = v101;

urban = v102;

bidx = bidx_a{i};

bord = bord_a{i};

twin = b0_a{i};

dob = b3_a{i};

sex = b4_a{i};

aged = b7_a{i};

flag = b10_a{i};

keep caseid hcluster hhnumber mother wgt monthint yearint dateint

mdob region urban bidx bord twin dob sex aged;

if bidx_a{i}>0 then output;

end;

end;

run;

data work1.temp0;

set work1.temp0;

magekdob = (dob - mdob)/12;

age = dateint - dob;

yob = int(dob/12);

run;

proc sort; by hcluster hhnumber mother bidx; run;

proc contents;

proc means;

weight wgt;

title "Bangladesh 1996 (DHS) Mortality Data of Kids";

run;

libname trn1 xport '/home4/ds52/aadata/bd96mort.v5x'; **;

proc copy in=work1 out=trn1;

select temp0;

run;

endsas;
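
A minimal sketch of what the six-birth limit does to older cohorts. The SAS loop above (do i=1 to 6) and the keep of bidx_01-bidx_06 in the combined do file both use only the first six entries of the birth history. Assuming, as in the standard DHS recode, that entry 01 is the most recent birth, a woman with more than six births loses her earliest children. The toy data below (one woman, eight births) are made up.

```stata
* Toy birth history (made up): one woman, 8 births, bidx 1 = most recent
clear
set obs 8
gen bidx = _n

* Births in 95, 94, ..., 88
gen yob = 96 - _n

* Keep only the first six entries, as the SAS loop and the do file do
keep if bidx <= 6

* Earliest birth year still in the data
quietly summarize yob
display "earliest cohort kept: " r(min)
```

Here the births in 88 and 89 are dropped and the earliest cohort left is 90. Whether this matters for a given survey depends on how many women report more than six births.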

**** DO FILE THREE --- ORIGINAL STATA DO FILE *********************************************

version 6.0

clear

#delimit ;

set matsize 350;

set more off;

log using c:\dstifel\amort\rates\bd96imr.log, replace;

*************************************************************

Load dataset and keep only those kids born to mothers

whose age was 15-39 at the time of birth

*************************************************************;

scalar year = 96;

use c:\dstifel\amort\data\bd96mort;

gen ageint = dateint - dob;

drop if ageint<12;

keep yob aged magekdob wgt ageint;

drop if magekdob<15 | magekdob>35;

save c:\temp\temp1, replace;

*------------------------------------------------*

| |

| Program to create 10 scalars indicating |

| the year in which each cohort of kids |

| was born. |

| |

| For example, if the survey year was 1988, |

| |

| surv1 = 78 |

| surv2 = 79 |

| . |

| . |

| surv10 = 87 |

| |

| Syntax for the program is simply "survgen" |

| |

*------------------------------------------------*;

capture program drop survgen;

program define survgen;

local i = 1;

while `i' <= 10 {;

scalar surv`i' = year - 11 + `i';

local i = `i' + 1;

};

drop if yob == year | yob < surv1;

end;

*------------------------------------------------*

| |

| Program to calculate IMR & CMR rates |

| for cohorts of kids born in each of the |

| 10 years prior to the survey |

| |

| Saves the output as two datasets: |

| c:\temp\temp1 -- IMR |

| c:\temp\temp2 -- CMR |

| |

| Syntax for the program is simply "cimrgen" |

| |

*------------------------------------------------*;

capture program drop cimrgen;

program define cimrgen;

local i = 1;

local thsnd = 1000;

qui gen imrv=0;

qui replace imrv = 1 if aged <= 12;

qui gen cmrv = 0;

qui replace cmrv = 1 if aged <= 36;

sum imrv cmrv;

while `i' <= 10 {;

matrix yr`i' = surv`i';

qui gen x = imrv;

qui replace x = . if yob ~= surv`i';

qui summ x [aw=wgt], meanonly;

local imrl = r(mean);

matrix imr`i' = `imrl' * `thsnd';

drop x;

if `i' == 2 {;

matrix imr = ( yr1 , imr1 \ yr2 , imr2 );

};

if `i' > 2 {;

matrix imrn = yr`i' , imr`i';

matrix imr = imr \ imrn;

};

if `i' <= 8 {;

qui gen x = cmrv;

qui replace x = . if yob ~= surv`i';

qui summ x [aw=wgt], meanonly;

local cmrl = r(mean);

matrix cmr`i' = `cmrl' * `thsnd';

drop x;

if `i' == 2 {;

matrix cmr = ( yr1 , cmr1 \ yr2 , cmr2 );

};

if `i' > 2 {;

matrix cmrn = yr`i' , cmr`i';

matrix cmr = cmr \ cmrn;

};

};

local i = `i' + 1;

};

matrix colnames imr = year imr;

matrix colnames cmr = year cmr;

drop _all;

svmat cmr, name(col);

sort year;

save c:\temp\temp1, replace;

drop _all;

svmat imr, name(col);

sort year;

merge year using c:\temp\temp1;

drop _m;

save c:\temp\temp1, replace;

list;

graph imr cmr year;

end;

*********************************************;

survgen;

cimrgen;

save c:\dstifel\amort\rates\bd96imr.dta, replace;

log close;

clear;
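
A minimal sketch of the rate calculation inside cimrgen, on made-up data, showing what it returns for a cohort with no births. It uses an if qualifier in place of the temporary x variable, which selects the same observations. The toy values and the scalar names imr90 and imr92 are hypothetical.

```stata
* Toy data (made up): aged = age at death in months, missing if the child is alive
clear
input yob aged wgt
90 3 1
90 . 1
90 . 2
91 . 1
end

* Same indicator as cimrgen: 1 if the child died at or before 12 months
gen imrv = 0
replace imrv = 1 if aged <= 12

* Weighted rate per 1,000 for a cohort with births
quietly summarize imrv [aw=wgt] if yob == 90, meanonly
scalar imr90 = r(mean)*1000
display imr90

* Same calculation for a cohort with no births in the data
quietly summarize imrv [aw=wgt] if yob == 92, meanonly
scalar imr92 = r(mean)*1000
display imr92
```

The first rate is 250 (weight 1 out of a total weight of 4). The second is missing, so any cohort year that has no births left in the data after the drop commands ends up as a missing value in the imr or cmr matrix.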

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/
