Statalist



Re: st: DHS sampling weight command


From   <Emma.Slaymaker@lshtm.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: DHS sampling weight command
Date   Mon, 20 Jul 2009 21:00:49 +0100

Dear Nikh,

I wouldn't rely on the information in v022 to determine the strata. To find the appropriate strata for a DHS survey, you generally have to check the survey report to see how the strata were defined. Most surveys are stratified by province (v024) and urban/rural residence (v025), so a combination of those two variables is appropriate. Others are designed (very) differently. There is sometimes additional information in the dataset documentation (in the zip file), and some datasets even include extra variables. The information given in v022 is often a relic of the old DHS data processing system, so it is not what Stata expects the strata to be; whether this applies depends on how old the data are.
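
A minimal sketch of the province-by-residence strata, assuming the survey report confirms that design. The made-up values below only stand in for a DHS recode file (with real data you would load your own file instead of using `input`), and `stratum` and `wt` are hypothetical variable names.

```stata
* Made-up stand-in for a DHS recode file (illustration only):
* 8 clusters, 2 in each province x urban/rural cell
clear
input v021 v024 v025 v005
1 1 1  800000
2 1 1 1200000
3 1 2  900000
4 1 2 1100000
5 2 1 1000000
6 2 1 1000000
7 2 2  700000
8 2 2 1300000
end

* One stratum per observed province (v024) x urban/rural (v025) combination
egen stratum = group(v024 v025)

* v005 is stored multiplied by 1,000,000
gen double wt = v005/1000000

* Clusters (v021) as PSUs, scaled weight as the probability weight
svyset v021 [pweight = wt], strata(stratum)
```

`egen group()` numbers the observed v024 x v025 combinations consecutively; running `svydescribe` afterwards is a quick way to check that every stratum has at least two PSUs.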

Some surveys aren't stratified, in which case you can svyset without strata. If you are combining data from several years, you can give all the observations from the unstratified survey the same stratum number so that they form a single stratum.
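
A sketch of that pooling step, assuming two surveys in one file identified by a hypothetical variable `survey`, where survey 1 is unstratified and survey 2 is stratified by v024 x v025. The data are made up; `stratum`, `psu` and `wt` are hypothetical names.

```stata
* Made-up pooled file (illustration only): survey 1 is unstratified,
* survey 2 is stratified by province (v024) x urban/rural (v025)
clear
input survey v021 v024 v025 v005
1 1 1 1  900000
1 2 1 2 1100000
1 3 2 1 1000000
1 4 2 2 1000000
2 1 1 1  800000
2 2 1 1 1200000
2 3 2 2  950000
2 4 2 2 1050000
end

* Strata for the stratified survey only (missing for survey 1)
egen stratum = group(v024 v025) if survey == 2

* All of survey 1 goes into one extra stratum with an unused number
quietly summarize stratum
replace stratum = r(max) + 1 if survey == 1

* Cluster numbers repeat across surveys, so give each survey's clusters
* their own IDs (a precaution; the strata already differ between surveys)
egen psu = group(survey v021)

gen double wt = v005/1000000
svyset psu [pweight = wt], strata(stratum)
```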

The weight variable (v005) should be divided by 1000000 before use, because DHS supply it multiplied by 1,000,000 to avoid precision problems across different software packages (you'll notice the variable label mentions 6 decimals).
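
The effect of that factor can be checked directly. In this sketch (made-up data; `x` is a hypothetical 0/1 indicator and `wt` a hypothetical name), dividing v005 by 1000000 leaves a weighted mean unchanged but divides a weighted total by the same factor, which is why the scale matters for totals like those discussed in Stas's message.

```stata
* Made-up data (illustration only); x is a hypothetical 0/1 indicator
clear
input v005 x
 800000 1
1200000 0
 900000 1
1100000 0
end

* DHS stores the weight multiplied by 1,000,000 ("6 decimals")
gen double wt = v005/1000000

* A weighted mean does not depend on the scale of the weights ...
quietly mean x [pweight = v005]
scalar m_raw = _b[x]
quietly mean x [pweight = wt]
scalar m_scaled = _b[x]

* ... but a weighted total is multiplied by the same constant
quietly total x [pweight = v005]
scalar t_raw = _b[x]
quietly total x [pweight = wt]
scalar t_scaled = _b[x]

display "mean: " m_raw "  " m_scaled "   total: " t_raw "  " t_scaled
```

In the quoted summary of v005 further down the thread, the mean is exactly 1000000, so for that file the scaled weight averages 1.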

Best wishes,
Emma


>>> nikh 2000 <nikh.2000@gmail.com> 20/07/09 17:20 >>>
Thanks, Stas Kolenikov.
As per Stas Kolenikov's advice, I have added the labels and summary statistics
of the relevant variables.

Hi, I am using the following commands to set up DHS (Demographic and
Health Survey) data for analysis:

gen psu =    v021
gen strata = v022
gen sampwt = v005/1000000   // as per DHS instructions
svyset psu [pw = sampwt], strata(strata)

Where,
v005         sample weight
v021         primary sampling unit
v022         sample stratum number

. sum v005 v021 v022
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        v005 |     11440     1000000    479282.7      55728    2707592
        v021 |     11440    223.3237    163.2414          1        550
        v022 |     11440    89.80385    51.64129          1        177


I have two questions:
1. Is this the right way to set up the data?
2. For one year of the dataset I am using, the variable v022 is missing.
   What other variable(s) can I use instead of v022?

On Mon, Jul 20, 2009 at 9:52 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
> Nikh, this is not terribly informative -- give the labels of the
> variables. (As the list FAQ says, don't assume that everybody
> knows your data and your literature as well as you do.) You may not
> like the idea of having weights like 10,000 if you are used to thinking
> about the weight variable as something close to 1, or maybe something
> close to 1/n. But if you want to estimate the total number of people
> in the country who don't have access to clean water, those 10,000
> weights are the right ones to use: weights of 1 are going to give
> you the total number of people in the sample who don't have access to
> clean water, and you cannot put that sort of thing into your country
> report. Check the DHS documentation again for the survey settings.
>
> To my knowledge, stratification does not change in DHS from year to
> year, so you can keep the strata IDs from other years if you can match
> the clusters. If you have any new PSUs, though, it may not be possible
> to determine where they come from; you could create a separate
> stratum for all of them. Finally, you can ignore stratification
> altogether, and lose some precision/efficiency by doing so.
>
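
One way to carry strata over from another year, sketched on made-up data. It assumes, as the advice above does, that a given v021 value identifies the same cluster in both years; that should be verified against the survey reports before relying on it. The `merge m:1 ... keep() nogenerate` syntax requires Stata 11 or later, and `lookup` and `wt` are hypothetical names.

```stata
* Made-up lookup (illustration only): cluster -> stratum, taken from a
* year in which v022 is available
clear
input v021 v022
1 1
2 1
3 2
4 2
end
tempfile lookup
save `lookup'

* Made-up year without v022; clusters 5 and 6 are new PSUs
clear
input v021 v005
1 1000000
2 1000000
3 1000000
4 1000000
5 1000000
6 1000000
end

* Attach the stratum of each matching cluster
merge m:1 v021 using `lookup', keep(master match) nogenerate

* New PSUs without a match all go into one extra stratum
quietly summarize v022
replace v022 = r(max) + 1 if missing(v022)

gen double wt = v005/1000000
svyset v021 [pweight = wt], strata(v022)

* Alternative: ignore stratification (at some cost in precision)
* svyset v021 [pweight = wt]
```
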
> On Mon, Jul 20, 2009 at 10:21 AM, nikh 2000 <nikh.2000@gmail.com> wrote:
>> Hi, I am using the following commands to set up DHS (Demographic and
>> Health Survey) data for analysis:
>>
>> gen psu =    v021
>> gen strata = v022
>> gen sampwt = v005/1000000
>>
>> svyset psu [pw = sampwt], strata(strata)
>>
>> I have two questions:
>>
>> 1. Is this the right way to set up the data?
>> 2. For one year of the dataset I am using, the variable v022 is missing.
>>    What other variable(s) can I use instead of v022?
>
>
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name 
> Small print: I use this email account for mailing lists only.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search 
> *   http://www.stata.com/support/statalist/faq 
> *   http://www.ats.ucla.edu/stat/stata/ 
>


© Copyright 1996–2014 StataCorp LP