
Re: st: DHS sampling weight command

From	nikh 2000 <[email protected]>
To	[email protected]
Subject	Re: st: DHS sampling weight command
Date	Mon, 20 Jul 2009 16:49:51 -0600

Thanks Emma, this is very helpful.

Nikh

On Mon, Jul 20, 2009 at 2:00 PM, <[email protected]> wrote:
> Dear Nikh,
>
> I wouldn't rely on the information in v022 to determine the strata. To find the appropriate strata for a DHS you generally have to check the survey report to see how the strata were defined. Most surveys are stratified by province (v024) and urban/rural (v025), so a combination of those two variables is appropriate. Others are stratified (very) differently. There is sometimes additional information in the dataset documentation (in the zip file), and some datasets even include extra variables. The information given in v022 is often a relic of the old DHS data processing system and so is not what Stata expects strata to be. It depends on how old the data are.
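>
> As a rough sketch of that combination, assuming the survey report confirms stratification by province and urban/rural and that v024 and v025 are both in the file (the name strata_rc is made up for illustration):
>
> ```stata
> * v024 = province/region, v025 = urban/rural (check the survey report first)
> egen strata_rc = group(v024 v025), label   // one stratum per province-by-residence cell
> tab strata_rc                              // compare the number of cells with the report
> ```
>
> strata(strata_rc) would then take the place of strata(strata) in your svyset command.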
>
> Some surveys aren't stratified, in which case you can svyset without strata.  If you are combining data from several years you can give all the observations from the unstratified survey the same stratum number so they form one stratum.
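>
> A sketch for the pooled case, assuming a variable that identifies the survey (called survey here, a made-up name) alongside the psu, strata and sampwt variables from your commands, with strata missing for the unstratified survey:
>
> ```stata
> replace strata = 1 if missing(strata)      // unstratified survey: one stratum
> egen strata_all = group(survey strata)     // keep stratum codes distinct across surveys
> egen psu_all = group(survey psu)           // likewise for cluster numbers
> svyset psu_all [pw = sampwt], strata(strata_all)
> ```
>
> For a single unstratified survey, svyset psu [pw = sampwt] with no strata() option is enough.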
>
> The weight variable (v005) should be divided by 1000000 before use because DHS supplies it multiplied by 1000000 to avoid decimal precision problems across different software (you'll notice the label says something about 6 decimals).
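>
> A quick check, going by the numbers in your -sum- output (v005 has mean 1000000), is that the rescaled weight averages about 1. Storing it as a double is my own suggestion, not a DHS instruction:
>
> ```stata
> gen double sampwt = v005/1000000
> summarize sampwt        // mean should be about 1 for this file
> ```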
>
> Best wishes,
> Emma
>
>
>>>> nikh 2000 <[email protected]> 20/07/09 17:20 >>>
> Thanks  Stas Kolenikov.
> As per Stas Kolenikov's advice I have added labels, summary statistics
> of the relevant vars.
>
> Hi, I am using the following commands to set up DHS (Demographic and
> Health Survey) data for analysis
>
> gen psu =    v021
> gen strata = v022
> gen sampwt = v005/1000000  // as per DHS instructions
> svyset psu [pw = sampwt], strata(strata)
>
> Where,
> v005         sample weight
> v021         primary sampling unit
> v022         sample stratum number
>
> . sum v005 v021 v022
>    Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>        v005 |     11440     1000000    479282.7      55728    2707592
>        v021 |     11440    223.3237    163.2414          1        550
>        v022 |     11440    89.80385    51.64129          1        177
>
>
> I have two questions:
> 1. Is this the right way to set up the data?
> 2. For the data set I am using, var v022 is missing for one year.
>
> What other var(s) can I use instead of v022?
>
> On Mon, Jul 20, 2009 at 9:52 AM, Stas Kolenikov<[email protected]> wrote:
>> Nikh, this is not terribly informative -- give the labels of the
>> variables. (As the FAQ of the list says, don't assume that everybody
>> knows your data and your literature as well as you do.) You may not
>> like the idea of having weights like 10,000 if you are used to thinking
>> about the weight variable as something close to 1, or maybe something
>> close to 1/n. But if you want to estimate the total number of people
>> in the country that don't have access to clean water, those 10,000
>> weights are the right ones to use: the weight of 1 is going to give
>> you the total number of people in the sample that don't have access to
>> clean water, and you cannot put that sort of stuff into your country
>> report. Check DHS documentation again on the survey settings.
>>
>> To my knowledge, stratification does not change in DHS from year to
>> year, so you can keep strata ID from other years if you can match the
>> clusters. If you have any new PSUs, it may not be possible to
>> determine where they are coming from though; you could create a
>> separate stratum for all of them. Finally, you can ignore
>> stratification altogether, and lose some precision/efficiency with
>> that.
>>
>> On Mon, Jul 20, 2009 at 10:21 AM, nikh 2000<[email protected]> wrote:
>>> Hi, I am using the following commands to set up DHS (Demographic and
>>> Health Survey) data for analysis
>>>
>>> gen psu =    v021
>>> gen strata = v022
>>> gen sampwt = v005/1000000
>>>
>>> svyset psu [pw = sampwt], strata(strata)
>>>
>>> I have two questions:
>>>
>>> 1. Is this the right way to set up the data?
>>> 2. For the data set I am using, var v022 is missing for one year.
>>> What other var(s) can I use instead of v022?
>>
>>
>>
>> --
>> Stas Kolenikov, also found at http://stas.kolenikov.name
>> Small print: I use this email account for mailing lists only.
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>

References:
- st: DHS sampling weight command
  - From: nikh 2000 <[email protected]>
- Re: st: DHS sampling weight command
  - From: Stas Kolenikov <[email protected]>
- Re: st: DHS sampling weight command
  - From: nikh 2000 <[email protected]>
- Re: st: DHS sampling weight command
  - From: <[email protected]>
