# Re: st: Unexpected proportions after survey commands

 From sjsamuels@gmail.com To statalist@hsphsun2.harvard.edu Subject Re: st: Unexpected proportions after survey commands Date Sat, 9 May 2009 22:03:11 -0400

```---

I meant: "A probability weight is the number of people represented by
a sample member."

On Sat, May 9, 2009 at 8:20 PM,  <sjsamuels@gmail.com> wrote:
> Jean-Gael:
>
>
> A probability weight is the number of people represented by those in a
> sample member.  Your weights look nothing like numbers of people.  In
> your first sample, the HH probability weights (before non-response
> adjustments) should be 10.0, because you took a 10% sample of HH.  If
> you interviewed every adult in the HH, they retain the HH weight.  If
> you interviewed 1/K in a household, the person weight is the HH weight
> x K.
>
> It's not clear whether your frame of tourist workers (sample 2)  was
> of HH or people.  If people, then you should be interviewing only
> people who work in tourism, not their HH members--as HH members would
> not have been in the frame.  Since I don't know your sampling scheme,
> I don't know how to compute the sampling weight.
>
> When you have 2 samples, as you did here, treat each one as coming
> from a different stratum.  Transfer the people in sample 1 who work in
> tourism to the 2nd stratum, and retain their original sampling weight.
>
> If villages are strata, then you have 2x10 = 20 sampling strata.
> However it sounds like 10 villages are themselves a convenience
> sample.  If so, then keep the two samples as strata.  Your PSU should
> probably be HH.  However if you interviewed only one person per HH,
> then PSU can be  person.
>
> After computing the sampling weights, you can,  as Michael states, use
> the -poststratify- option in Stata to reproduce the tourism counts.
> Your post-stratification totals (tourism workers, non-tourism workers)
> should add to the estimated population totals in the 10 villages;
> 0.84% should be tourism workers, and 99.16% should be non-tourism
> workers.   If you want separate estimates of impact in each village,
> then you can use the villages to also define your post-strata: 10
> villages x 2 tourist-worker-status strata.
>
> Finally, unless one goal is to compare tourism and non-tourism
> workers, it was not necessary to enhance your sample with tourism
> workers.   Tourism workers are obviously greatly affected by tourism,
> compared to non-tourism workers.  However, they constitute only 0.84%
> of the population,  so contribute minimally to the overall effects of
> tourism on the population.
>
> If you need further assistance, the University of Florida has a number
> of faculty with experience in survey sampling.
>
> -Steve
>
>
>
> On Sat, May 9, 2009 at 5:13 PM, Jean-Gael Collomb <JG@ufl.edu> wrote:
>> Hello all,
>>
>> I have a question about using post stratification weights and using Stata's
>> survey commands. After setting the weights, I do not get the proportions I
>> expected.
>>
>> My overall research question is to see if tourism (TOURIND) influences
>> quality of life in several communities in a rural province of Namibia. My
>> aim was to conduct individual interviews in a sample of 10% of all
>> households in each community. I obtained household census counts from key
>> informants within the community and my own double checks during field work.
>>  This random sample yielded 395 interviews, of which only
>> 9 (2.3%) were conducted with individuals working in tourism. Given this very
>> low number of respondents who worked in tourism and my interest in trying to
>> understand the impact of tourism, I established a sampling frame restricted
>> to individuals working in tourism and interviewed 72 individuals. [Two of
>> those interviews were conducted with individuals not employed in tourism but
>> living in a household where someone was]. In total, I thus interviewed 467
>> people, among which 79 worked in tourism. My full sample oversampled tourism
>> employees and I think it would be wrong to derive from it that 17%
>> (79/467*100) of the population works in tourism. I think post-stratification
>> weights should be assigned to my data set to correct for the oversampling.
>> In fact, the percentage of the population working in tourism varies by
>> communities and thus different weights should be calculated for different
>> communities. I used existing reports documenting total numbers of community
>> residents employed by local tourism operators and total population size as a
>> basis to calculate the "true" distribution of tourism employees (weight2).
>> The weights were calculated by dividing the “true” percentage by the
>> “oversampled” percentage.
>>
>> The problem is that when I apply the weights in Stata, I do not get the
>> proportion I expected. Specifically, I expected that after svyset _n
>> [pweight = samplewt2] and svy: tab tourind, I would find that 0.84% of the
>> population could be labeled TOURIND, but Stata returns a value of 3.25% (and
>> similar discrepancies for each community).
>>
>> I am not sure I am doing something wrong in calculating the weights,
>> assigning the weights to my dataset, or entering the tab commands in svy
>> mode. I’d greatly appreciate your help in moving past this and taking
>> advantage of survey commands in Stata.
>>
>> Thank you very much if you have time to give me some feedback or point me
>> towards the best information source (textbook?).
>>
>> Cheers,
>>
>> Jean-Gael Collomb, jg@ufl.edu
>>
>> (PS. I run Stata 10 in Mac OSX)
>>
>>
>>
>> Stata code entered:
>>
>> *ASSIGNING POST STRATIFICATION WEIGHTS
>>
>> *-------------------------------------
>>
>> gen samplewt2=0
>>
>> label var samplewt2 "Post Stratification sample weight 2"
>>
>> replace samplewt2=0.99975204562360500 if conservancy==1 & sample==1
>>
>> replace samplewt2=0.04357333333333330 if conservancy==2 & sample==2
>>
>> replace samplewt2=1.39197814207650000 if conservancy==2 & sample==1
>>
>> replace samplewt2=0.10144078144078100 if conservancy==3 & sample==2
>>
>> replace samplewt2=1.18320139407518000 if conservancy==3 & sample==1
>>
>> replace samplewt2=0.05683908045977010 if conservancy==4 & sample==2
>>
>> replace samplewt2=1.47985380116959000 if conservancy==4 & sample==1
>>
>> replace samplewt2=0.01906976744186050 if conservancy==5 & sample==2
>>
>> replace samplewt2=1.05030411449016000 if conservancy==5 & sample==1
>>
>> tab tourind
>>
>> bysort conservancy: tab tourind
>>
>> *applying weight2 (those derived from IRDNC data)
>>
>> svyset _n [pweight = samplewt2]
>>
>> svy: tab tourind, percent
>>
>>
>>
>> Jean-Gael "JG" Collomb
>>
>> PhD candidate
>>
>> School of Natural Resources and Environment / School of Forest Resources and
>> Conservation
>>
>> University of Florida
>>
>> jgcollomb@gmail.com
>>
>> jg@ufl.edu
>>
>> +1 (352) 870 6696
>>
>>
>>
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>

```
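The design Steve describes (two samples treated as strata, households as PSUs, post-stratification on tourism-worker status) might be declared roughly as follows. This is only a sketch under stated assumptions, not the code either poster used. The variables `pwt`, `hhid`, `stratum`, and `pop_total` are hypothetical and do not appear in the original post, while `sample` and `tourind` come from the quoted code. It also assumes that `tourind` is numeric and that person-level sampling weights have already been computed as numbers of people represented, which the thread leaves open for sample 2.

```stata
* Hypothetical variables (assumptions, not in the original post):
*   pwt        person-level sampling weight = number of people represented
*              (for sample 1: HH weight of 10, times K if 1/K of the
*              household was interviewed)
*   hhid       household identifier, used as the PSU
*   pop_total  known population count for the respondent's post-stratum
*              (tourism workers vs. non-tourism workers)

* Treat the two samples as strata; tourism workers found in sample 1
* move to the second stratum and keep their original sampling weight
gen byte stratum = sample
replace stratum = 2 if sample == 1 & tourind == 1

* Declare the design and post-stratify on tourism-worker status
svyset hhid [pweight = pwt], strata(stratum) poststrata(tourind) postweight(pop_total)

* Weighted percentages should now match the post-stratum population shares
svy: tabulate tourind, percent
```

With `poststrata()` and `postweight()`, Stata rescales the sampling weights within each post-stratum so that they sum to the supplied population total. The tourism share from `svy: tabulate` should therefore equal the share implied by `pop_total`. If only one person per household was interviewed, the PSU could instead be the person, as noted in the reply.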