
# Re: st: sampling weight

From: Nick Cox | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: sampling weight | Date: Thu, 27 Sep 2012 15:20:03 +0100

```
I think there is some misunderstanding here. Stas wants you to
describe the sampling design that was used to produce your dataset,
not to design a survey yourself. This means exactly how the data were
collected.

More broadly, neither Stas nor anybody else can give good advice to
you on how to analyse your dataset without an idea of how that dataset
was generated. (I am guessing you did not visit the cities and select
the people yourself.) Perhaps this is not even documented clearly,
but the point remains. Using any kind of weights is dubious unless you
know from documentation of the survey that those weights make sense.

Nick

On Thu, Sep 27, 2012 at 3:07 PM, Lynn Lee <lynn09v@gmail.com> wrote:

> I have no idea about sampling design (I never learned it). What follows
> is just my own idea about choosing simple weights. I generated a new variable,
> which is the total number of individuals in each city in the data set, and I
> chose this new variable as the weight. When I type -[pweight=total]-, it looks
> like Stata 11 can run this weighted regression for me, but I cannot figure out
> how Stata 11 does the weighting. Could you please suggest some basics
> of sampling design (or a web link)? I am new to sampling design, so I do
> not know how to describe it in full detail.

Stas Kolenikov

> These are steps in the right direction. Please describe your sampling design
> in full detail, so that we could brainstorm and see what the right
> specifications should be.

Lynn Lee

>> I just want to do simple sampling.
>>
>> Take "webuse total" for example. I am wondering how "swgt" was generated. I
>> guess: obs 1 has her corresponding sampling weight, swgt=25964, which is the
>> total population in her group; obs 4 has his corresponding sampling weight,
>> swgt=4312, which is the total population in his group; etc. Is that right?
>>
>> So, if I use this logic in my downloaded survey data sets, I can group all
>> the obs into different sampling weights by place of residence and gender.
>> For example, I calculate the total number of individuals in the dataset
>> by their city of residence: say, the total number of individuals in city
>> 1 is 1000 in the dataset, and the total number of individuals in city n is
>> 400 in the dataset. Then I generate this city-total-individuals as a new
>> variable (weight). (Or I can be even more detailed: total number of people in
>> the data set by city, gender, and age.) In the regression, I simply use the command
>> "reg y x1 x2 x3 [pweight=total]".
>> Can this approach partly correct for an unweighted data set?
>>
>> Suppose the mean of total (the weights) is 500, the min is 100, and the max
>> is 800. Then weighted analysis will give at most 800/100 times the weight to
>> potentially under-sampled observations. Do I understand correctly?

Stas Kolenikov

>> If Lynn obtained her sample in a rigorous way by enumerating the dwellings,
>> she should have all the inputs into the probability of selection, and the
>> baseline sampling weight is the inverse of that.
>> Then she would want to correct for non-response, which would be the fraction
>> of those responding to the survey among those sampled.
>>
>> If Lynn is interested in a specific population (females of reproductive age,
>> say), and that's who the survey collected the data on, then she would need
>> to get the total population counts for that specific population (which may
>> prove even more difficult).
>>
>> If she does not have these figures, then I don't really know what to do. As
>> they say, when you approach a statistician with collected data in hand, they
>> can only tell you what killed your study.

>> On Wed, Sep 26, 2012 at 8:15 AM, JVerkuilen (Gmail)

>>> On Wed, Sep 26, 2012 at 2:49 AM, Lynn Lee <lynn09v@gmail.com> wrote:
>>>
>>>> Any suggestions on which weight is better? Or might other types of weights
>>>> be better than population weights?
>>>
>>> Do you have a few accurately observed variables such as the population
>>> age and gender breakdown? If so you can often create
>>> post-stratification weights (through a process called "raking") that
>>> make your samples align with the associations observed in those
>>> tables.
>>>
>>> A quick -findit raking- turned up a program -ipfraking- written by
>>> Stas Kolenikov and available from his website. Hopefully he'll chime
>>> in.
```
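
Stas's description in the thread (a baseline sampling weight equal to the inverse of the selection probability, followed by a correction for non-response) can be illustrated with a small sketch. This is not from the thread: it is written in Python rather than Stata purely for illustration, the city names and all counts are hypothetical, and it assumes simple random sampling within each city, which may not match the actual survey design.

```python
# Hypothetical figures for illustration only; none of these come from the thread.
cities = {
    "city_1": {"population": 50_000, "sampled": 1_000, "responded": 800},
    "city_n": {"population": 10_000, "sampled": 400, "responded": 300},
}


def final_weight(population, sampled, responded):
    """Inverse-probability weight with a simple non-response adjustment."""
    # Assumption: simple random sampling within the city, so every resident
    # has the same selection probability sampled / population.
    p_select = sampled / population
    base_weight = 1.0 / p_select          # baseline sampling weight
    response_rate = responded / sampled   # fraction of the sampled who responded
    return base_weight / response_rate    # respondents also stand in for non-respondents


weights = {name: final_weight(**counts) for name, counts in cities.items()}

for name, w in weights.items():
    print(f"{name}: each respondent represents about {w:.1f} people")
```

Under these assumptions, the weights of the respondents in a city sum to that city's population. The number of sampled individuals per city is therefore not a sampling weight on its own: the population counts are needed as well, which is the point Stas makes about obtaining total population figures.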