
From: Jean-Gael Collomb <jg@ufl.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Fwd: st: Unexpected proportions after survey commands
Date: Wed, 13 May 2009 18:10:39 -0400

From: Jean-Gael Collomb <jg@ufl.edu>
Date: May 13, 2009 3:50:02 PM EDT
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Unexpected proportions after survey commands

Thanks to the Stata listserv members who replied to my query. I have recalculated weights based on your suggestions, and I get values which look more like what I expected but still a little off. I'll follow up off list for a contact on my campus.

Cheers,

Jean-Gael "JG" Collomb
PhD candidate
School of Natural Resources and Environment / School of Forest Resources and Conservation
University of Florida
jgcollomb@gmail.com
jg@ufl.edu
+1 (352) 870 6696

On May 9, 2009, at 10:03 PM, sjsamuels@gmail.com wrote:

--- I meant: "A probability weight is the number of people represented by a sample member."

On Sat, May 9, 2009 at 8:20 PM, <sjsamuels@gmail.com> wrote:

Jean-Gael:

A probability weight is the number of people represented by those in a sample member. Your weights look nothing like numbers of people. In your first sample, the HH probability weights (before non-response adjustments) should be 10.0, because you took a 10% sample of HH. If you interviewed every adult in the HH, they retain the HH weight. If you interviewed 1/K in a household, the person weight is the HH weight x K.

It's not clear whether your frame of tourist workers (sample 2) was of HH or people. If people, then you should be interviewing only people who work in tourism, not their HH members--as HH members would not have been in the frame. Since I don't know your sampling scheme, I don't know how to compute the sampling weight.

When you have 2 samples, as you did here, treat each one as coming from a different stratum. Transfer the people in sample 1 who work in tourism to the 2nd stratum, and retain their original sampling weight. If villages are strata, then you have 2x10 = 20 sampling strata. However, it sounds like the 10 villages are themselves a convenience sample. If so, then keep the two samples as strata. Your PSU should probably be HH.
However, if you interviewed only one person per HH, then PSU can be person.

After computing the sampling weights, you can, as Michael states, use the -poststratify- option in Stata to reproduce the tourism counts. Your post-stratification totals (tourism workers, non-tourism workers) should add to the estimated population totals in the 10 villages; 0.84% should be tourism workers, and 99.16% should be non-tourism workers. If you want separate estimates of impact in each village, then you can use the villages to also define your post-strata: 10 villages x 2 tourist-worker-status strata.

Finally, unless one goal is to compare tourism and non-tourism workers, it was not necessary to enhance your sample with tourism workers. Tourism workers are obviously greatly affected by tourism, compared to non-tourism workers. However, they constitute only 0.84% of the population, so contribute minimally to the overall effects of tourism on the population.

If you need further assistance, the University of Florida has a number of faculty with experience in survey sampling.

-Steve

On Sat, May 9, 2009 at 5:13 PM, Jean-Gael Collomb <JG@ufl.edu> wrote:

Hello all,

I have a question about using post-stratification weights and using Stata's survey commands. After setting the weights, I do not get the proportions I expected.

My overall research question is to see if tourism (TOURIND) influences quality of life in several communities in a rural province of Namibia. My aim was to conduct individual interviews in a sample of 10% of all households in each community.
I obtained household census counts from key informants within the community and my own double checks during field work. This random sample yielded 395 interviews, of which only 9 (2.3%) were conducted with individuals working in tourism. Given this very low number of respondents who worked in tourism and my interest in trying to understand the impact of tourism, I established a sampling frame restricted to individuals working in tourism and interviewed 72 individuals. [Two of those interviews were conducted with individuals not employed in tourism but living in a household where someone was.] In total, I thus interviewed 467 people, among which 79 worked in tourism.

My full sample oversampled tourism employees and I think it would be wrong to derive from it that 17% (79/467*100) of the population works in tourism. I think post-stratification weights should be assigned to my data set to correct for the oversampling. In fact, the percentage of the population working in tourism varies by community, and thus different weights should be calculated for different communities. I used existing reports documenting total numbers of community residents employed by local tourism operators and total population size as a basis to calculate the "true" distribution of tourism employees (weight2). The weights were calculated by dividing the “true” percentage by the “oversampled” percentage.

The problem is that when I apply the weights in Stata, I do not get the proportion I expected. Specifically, I expected that after svyset _n [pweight = samplewt2] and svy: tab tourind, I would find that 0.84% of the population could be labeled TOURIND, but Stata returns a value of 3.25% (and similar discrepancies for each community).

I am not sure whether I am doing something wrong in calculating the weights, assigning the weights to my dataset, or entering the tab commands in svy mode.
I’d greatly appreciate your help in moving past this and taking advantage of survey commands in Stata.

Thank you very much if you have time to give me some feedback or point me towards the best information source (textbook?).

Cheers,
Jean-Gael Collomb, jg@ufl.edu
(PS. I run Stata 10 in Mac OSX)

Stata code entered:

*ASSIGNING POST STRATIFICATION WEIGHTS
*-------------------------------------
gen samplewt2=0
label var samplewt2 "Post Stratification sample weight 2"
replace samplewt2=0.99975204562360500 if conservancy==1 & sample==1
replace samplewt2=0.04357333333333330 if conservancy==2 & sample==2
replace samplewt2=1.39197814207650000 if conservancy==2 & sample==1
replace samplewt2=0.10144078144078100 if conservancy==3 & sample==2
replace samplewt2=1.18320139407518000 if conservancy==3 & sample==1
replace samplewt2=0.05683908045977010 if conservancy==4 & sample==2
replace samplewt2=1.47985380116959000 if conservancy==4 & sample==1
replace samplewt2=0.01906976744186050 if conservancy==5 & sample==2
replace samplewt2=1.05030411449016000 if conservancy==5 & sample==1
tab tourind
bysort conservancy: tab tourind
*applying weight2 (those derived from IRDNC data)
svyset _n [pweight = samplewt2]
svy: tab tourind, percent
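
For readers following the thread, the weighting approach Steve describes in his reply might be set up roughly as below. This is a sketch under stated assumptions, not the setup anyone in the thread actually ran. The variables hhid, k_adults, N_frame, n_frame, and pop_total are hypothetical and do not appear in the original data set; tourind is assumed to be coded 1 for tourism workers and 0 otherwise; and the tourism-worker frame is assumed to have been a simple random sample of people, which the thread does not confirm. The svyset options poststrata() and postweight() are used here as the way to request post-stratification.

```stata
* Sketch only. Hypothetical variables (not in the original data set):
*   hhid      household identifier, defined for every respondent
*   k_adults  number of eligible adults in the household (1 interviewed)
*   N_frame   number of people listed in the tourism-worker frame
*   n_frame   number of people sampled from that frame
*   pop_total estimated population count of the respondent's tourind category

* 10% household sample: each sampled household represents 10 households.
* With 1 of k_adults interviewed, the person weight is HH weight x K.
gen double pwt = 10 * k_adults if sample == 1

* ASSUMPTION: the tourism frame was a simple random sample of people,
* so each respondent represents N_frame / n_frame frame members.
replace pwt = N_frame / n_frame if sample == 2

* Treat the two samples as strata. Tourism workers found in the household
* sample move to stratum 2 but keep their original weight.
gen byte stratum = sample
replace stratum = 2 if sample == 1 & tourind == 1

* Declare the design, post-stratifying on tourism-worker status.
svyset hhid [pweight = pwt], strata(stratum) poststrata(tourind) postweight(pop_total)

* Weighted proportion of tourism workers
svy: tabulate tourind, percent
```

Because post-stratification rescales the weights so that each tourind category sums to its pop_total value, the tabulated shares should match the shares implied by those totals. The two sample 2 respondents who do not work in tourism would need a separate decision, since Steve notes they would not have been in the frame.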

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2016 StataCorp LP