

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Cluster Robust Standard Errors for Cross Country Data
Date: Thu, 5 Jul 2012 09:52:49 -0500

I forgot an interesting thread for comparing weighted and unweighted means
that was started at:

http://www.stata.com/statalist/archive/2011-06/msg00405.html

Austin Nichols suggested the DuMouchel-Duncan and Winship-Radbill references.
Stas Kolenikov mentioned the important paper by Pfeffermann (1993), which can
be found at:

http://www.stat.cmu.edu/~brian/905-2008/papers/Pfeffermann-ISR-1993.pdf

Reference:

Pfeffermann, D. (1993). The role of sampling weights when modeling survey
data. International Statistical Review/Revue Internationale de Statistique,
317-337.

Steve
sjsamuels@gmail.com

> In response to my request to see the codebook that advised against using
> weights in the Demographic and Health Surveys, Abekah Nkrumah privately sent me
> a document:
>
> GUIDE TO DHS STATISTICS
> Shea Oscar Rutstein, Ph.D. Guillermo Rojas, M.C.S., M.A.
> Demographic and Health Surveys ORC Macro Calverton, Maryland
> September 2006
>
> which states on page 14 (in reverse order):
>
> "5. Use of sample weights biases estimates of confidence intervals in most
> statistical packages since the number of weighted cases is taken to produce the
> confidence interval instead of the true number of observations. For oversampled
> areas or groups, use of the sample weights will drastically overestimate
> sampling variances and confidence intervals for those groups."
>
> My response: This paragraph refers to the fact that some statistical packages
> are not survey-aware and so treat all weights as frequency weights. It is not
> an argument against probability weighting.
>
> "4. Use of sample weights is inappropriate for estimating relationships, such as
> regression and correlation coefficients."
>
> My response:
>
> I'm not surprised that the authors gave no justification for their assertion.
> It's not true in general (see any advanced text) and I see no reason why it
> would apply to the DHS without qualification. There are certainly some
> situations where an unweighted analysis is preferable. Abekah should review at a
> minimum the downloadable abstract of Winship and Radbill (1994) and the
> downloadable reference by DuMouchel and Duncan (1983). Groves (1989) presents
> an interesting example and argument. (I am traveling and so do not have a
> page reference.) To sum up: Unless Abekah can provide substantive justification
> for doing otherwise, he should use the weights.
>
>
> References:
>
> W DuMouchel & G Duncan (1983) “Using Sample Survey Weights in Multiple
> Regression Analysis.” Journal of the American Statistical Association
> 78(383):535-543. Download at:
> www.stat.cmu.edu/~brian/905-2008/papers/DumouchelDuncan-JASA-1983.pdf
>
> Groves, R. M. (1989). Survey errors and survey costs, New York: Wiley.
>
> Winship, C., & Radbill, L. (1994). Sampling Weights and Regression Analysis.
> Sociological Methods & Research, 23(2), 230-257. Abstract at: smr.sagepub.com/content/23/2/230.refs
>
> Steve
> sjsamuels@gmail.com
>
>

You are welcome, Gordon. Could you please post a link to the study and to the
codebook that advises that weights are not necessary?

Thanks,

Steve

On Jul 3, 2012, at 5:34 AM, Abekah Nkrumah wrote:

Dear Steve,

Thank you for the response. In response to your question: YES, the data has
within-country sample weights and strata. The strata is the cluster_var. Each
country is divided into clusters, and from within each cluster households are
sampled for interviews. So the strata variable is the same as the cluster
variable. That being the case, what will then constitute cluster_var in the
survey command that you gave?

Secondly, I have already done some estimations at the country level without
using the survey command but correcting for possible intra-cluster
correlations using the cluster variable. So for consistency I would want to
continue the cross-country analysis without survey commands. I did not use
the survey commands for simplicity, and secondly the data codebook advises
that it is not necessary to include sample weights in estimations. The issue
then is just correcting the intra-cluster correlations arising from the
within-country cluster correlations at a cross-country level. Nonetheless, I
will appreciate your answer to the first question as well so I can try the
two and see what differences there might be.

Regards

Gordon

On Mon, Jul 2, 2012 at 10:56 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>
> It's quite all right to combine surveys.
>
> Some questions for you:
>
> Are sampling weights provided? I'll assume
> so below. If not, what do you know about the sample weighting?
> Are sampling strata within countries identified?
>
> I suggest that you -svyset- the data
>
> ***************************
> svyset cluster_var [pw = sampling_weight], strata(country)
> ***************************
>
> If there were within-country strata, then define
> ***************************
> egen super_strat = group(country stratum_var)
> ***************************
> and substitute "strata(super_strat)" in the -svyset- statement.
>
> Then use commands that take a -svy- prefix. To see Stata's official survey-aware
> commands type "help svy_estimation"
>
> Steve
>
> On Jul 2, 2012, at 5:35 PM, Abekah Nkrumah wrote:
>
> Dear Mark,
>
> Thank you very much for the response. Reading your response I was
> wondering what the difference will be if I decide to cluster on the
> cluster id instead of the household id. As I indicated in my earlier
> mail, there is actually a cluster variable for each country. This
> cluster variable contains the different clusters for each country from
> which households were sampled. In my dataset the country with the
> lowest number of clusters is about 412.
>
> Thank you very much
>
> On Mon, Jul 2, 2012 at 4:08 PM, Schaffer, Mark E <M.E.Schaffer@hw.ac.uk> wrote:
>> Gordon,
>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
>>> Abekah Nkrumah
>>> Sent: 02 July 2012 10:32
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: st: Cluster Robust Standard Errors for Cross Country Data
>>>
>>> Dear Stata List,
>>>
>>> I have pooled cross-section household datasets from 20
>>> countries. For each of these countries, the data was
>>> collected via cluster sampling meaning there will be
>>> intra-cluster correlations which will affect the validity of
>>> the standard errors. If I were carrying out my estimations on
>>> a single country I know that I could correct for the possible
>>> bias in the standard errors by using the variable containing
>>> the cluster ids to estimate cluster robust standard errors.
>>>
>>> In the present case where I have pooled (i.e., appended as in
>>> Stata) the household cross-section data from 20 different
>>> countries, will it be right to still use the variable
>>> containing the cluster ids to estimate the cluster robust
>>> standard errors? Note that now the cluster ids will be for
>>> all 20 countries.
>>
>> This is problematic. The consistency of the cluster-robust covariance
>> estimator is asymptotic in the number of clusters, and 20 isn't very far
>> on the way to infinity. Clustering on country is probably not a great
>> idea.
>>
>> An alternative is to cluster on household ID and to use country dummies
>> when you pool the data. This would allow for arbitrary within-household
>> correlation (via clustering on household ID) and invariant
>> within-country correlation (via the country dummies).
>>
>> HTH,
>> Mark
>>
>>> I will appreciate your help.
>>>
>>> Thank you very much
>>>
>>> Gordon
>>>
>>> --
>>> **********************************************
>>> Dept. of Public Admin & Health Serv. Mgt University of Ghana
>>> Business School P.O. Box LG 78 Legon-Accra Ghana
>>> Tel: ++ 233 21 500159 Ext. 6247
>>> ++ 233 21 502258 Ext. 6247
>>> ++ 233 21 502255 Ext. 6247
>>> Cell: ++233 243 198 313
>>>
>>> Email: gankrumah@ug.edu.gh
>>> ankrumah@gmail.com
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
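
The two routes discussed in this thread (a survey-aware analysis via -svyset-, and an ordinary regression with cluster-robust standard errors plus country dummies) could be set up roughly as follows. This is a minimal sketch and not code posted by anyone in the thread. It assumes that `country` is a numeric variable, that cluster numbers may repeat from one country to the next, and that `y`, `x1`, `x2`, and `psu_id` are hypothetical names; `cluster_var` and `sampling_weight` are the placeholder names used in the messages above.

```stata
* Sketch only. Hypothetical names: y, x1, x2, psu_id.
* Placeholder names from the thread: country, cluster_var, sampling_weight.
* Assumes country is numeric (needed for the i.country factor notation).

* vce(cluster) groups observations by the value of the cluster variable
* alone, so a cluster number that appears in two countries would be pooled
* into one cluster. Build an identifier that is unique across countries.
egen long psu_id = group(country cluster_var)

* Route 1: survey-aware analysis, with countries as strata.
* If within-country strata exist, use strata(super_strat) as described
* in the thread instead of strata(country).
svyset psu_id [pw = sampling_weight], strata(country)
svy: regress y x1 x2 i.country

* Route 2: no svy prefix; country dummies plus cluster-robust standard
* errors on the sampling clusters. This version is unweighted; append
* [pw = sampling_weight] before the comma to use the weights.
regress y x1 x2 i.country, vce(cluster psu_id)
```

Comparing the coefficient estimates from the weighted and unweighted fits is one informal way to see whether the weights matter for a given model, in the spirit of the DuMouchel and Duncan reference cited above.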

