Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement is already up and running.

st: sampling weight

From   "Lynn Lee" <>
To   <>
Subject   st: sampling weight
Date   Fri, 28 Sep 2012 09:37:20 +0800

Thanks, JVerkuilen and Steve.

Best Regards,
Lynn Lee

-----Original Message-----
[] On Behalf Of JVerkuilen
Sent: Friday, September 28, 2012 12:05 AM
Subject: Re: st: sampling weight

The reason I suggested post-stratification way back at the beginning was
because you can generate cross-tables of things like gender by age structure
and village and can hopefully get that from census information. Then you can
make weights that line up with the observed population values on a few variables.
It's not a cure for what ails a survey but it can help. Check the
documentation first, though, of course.
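
The cell-based idea above can be sketched with plain arithmetic. This is only an illustrative sketch, not the method of any particular Stata command: the function name `poststratification_weights`, the gender-by-age cells, and all counts are hypothetical. It assumes each respondent falls in exactly one cell and that a census count is available for every cell; each weight is the census count for the cell divided by the number of respondents in that cell.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_counts):
    """Return one weight per respondent: census count / sample count of that cell."""
    sample_counts = Counter(sample_cells)
    # Every cell seen in the sample needs a known population count.
    missing = set(sample_counts) - set(population_counts)
    if missing:
        raise ValueError(f"no population count for cells: {sorted(missing)}")
    return [population_counts[cell] / sample_counts[cell] for cell in sample_cells]


# Hypothetical sample: one (gender, age group) cell label per respondent.
sample_cells = (
    [("female", "young")] * 3
    + [("female", "old")] * 1
    + [("male", "young")] * 2
    + [("male", "old")] * 2
)

# Hypothetical census counts for the same cells.
population_counts = {
    ("female", "young"): 300,
    ("female", "old"): 200,
    ("male", "young"): 250,
    ("male", "old"): 250,
}

weights = poststratification_weights(sample_cells, population_counts)

# The weighted sample total in each cell should reproduce the census count.
weighted_totals = Counter()
for cell, weight in zip(sample_cells, weights):
    weighted_totals[cell] += weight

for cell in sorted(weighted_totals):
    print(cell, weighted_totals[cell])
```

Note that the weight is the population count divided by the sample count in the cell, not the sample count itself, so cells that are under-represented in the data get larger weights. Raking extends the same idea to the case where only the margins of the table are known, not the full cross-table.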

You might simply have to indicate this as a study limitation.

On Thu, Sep 27, 2012 at 11:03 AM, Lynn Lee <> wrote:
> Thanks, Nick and Stas.
> The data sets were downloaded from a web site. The codebook gives no
> information about how the data were collected; it just reminds
> users that the "frequencies for variables are not weighted". So, in
> this case, I should contact the survey group about how the sample was
> drawn. But suppose I cannot get any information about how the samples
> were drawn. Can I still use the idea I described before?
> Any suggestion is appreciated.
> Best Regards,
> Lynn Lee
> -----Original Message-----
> From:
> [] On Behalf Of Stas 
> Kolenikov
> Sent: Thursday, September 27, 2012 10:33 PM
> To:
> Subject: Re: st: sampling weight
> On Thu, Sep 27, 2012 at 9:20 AM, Nick Cox <> wrote:
>> I think there is some misunderstanding here. Stas wants you to 
>> describe the sampling design that was used to produce your dataset, 
>> not to design a survey yourself. This means exactly how the data were 
>> collected.
>> More broadly, neither Stas nor anybody else can give good advice to 
>> you on how to analyse your dataset without an idea of how that 
>> dataset was generated. (I  am guessing you did not visit the cities 
>> and select the people yourself.)  Perhaps this is not even documented 
>> clearly, but the point remains. Using any kind of weights is dubious 
>> unless you know from documentation of the survey that those weights make
> If you did go select the people yourself, you had to have known 
> something about sampling to have done this properly. If a survey 
> organization collected the data, they should have provided you a 
> methods report describing how the sample was drawn, and how the data 
> were collected. If they have not, it should have been their 
> responsibility, and you are in a position to ask them. Of course, I 
> can imagine a number of worst case scenarios when the data were 
> collected, but no proper report was written, and the person who oversaw
> the data collection left the organization, etc.
> But usually there are ways to find out about how the sample was drawn.
> --
> -- Stas Kolenikov, PhD, PStat (SSC)  ::
> -- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at 
> srbi dot com
> -- Opinions stated in this email are mine only, and do not reflect the 
> position of my employer
>> Nick
>> On Thu, Sep 27, 2012 at 3:07 PM, Lynn Lee <> wrote:
>>> I have no idea about sampling design (I have never studied it). Below
>>> is just my idea for choosing simple weights. I generate a new variable,
>>> which is the total number of individuals in each city in the data set,
>>> and I use this new variable as the weight by typing -[pweight=total]-.
>>> It looks like Stata 11 can run this weighted regression for me, but I
>>> cannot figure out how Stata 11 does the weighting. Could you please
>>> suggest some basics of sampling design (or a web link)? I am new to
>>> sampling design and do not know how to describe it in full detail.
>> Stas Kolenikov
>>> These are steps in the right direction. Please describe your sampling
>>> design in full detail, so that we could brainstorm and see what the
>>> right specifications should be.
>> Lynn Lee
>>>> I just want to do simple sampling.
>>>> Take "webuse total" for example. I am wondering how "swgt" was
>>>> generated. My guess: obs 1 has her corresponding sampling weight,
>>>> swgt=25964, which is the total population in her group; obs 4 has his
>>>> corresponding sampling weight, swgt=4312, which is the total
>>>> population in his group; etc. Is that right?
>>>> If I apply this logic to my downloaded survey data sets, I can group
>>>> all the obs into different sampling weights by place of residence and
>>>> gender. For example, I calculate the total number of individuals in
>>>> the dataset by city of residence: say the total number of individuals
>>>> in city 1 is 1000 in the dataset, and the total number of individuals
>>>> in city n is 400 in the data set. Then I generate this
>>>> city-total-individuals as a new variable (weight). (Or I can be even
>>>> more detailed: total number of people in the data set by city, gender,
>>>> and age.) In the regression, I simply use the command
>>>> "reg y x1 x2 x3 [pweight=total]". Can this partly correct for the
>>>> unweighted data set?
>>>> Suppose the mean of total (the weights) is 500, the min is 100, and
>>>> the max is 800. Then the weighted analysis will give at most 800/100
>>>> times the weight to potentially under-sampled observations. Do I
>>>> understand correctly?
>> Stas Kolenikov
>>>> If Lynn obtained her sample in a rigorous way by enumerating the
>>>> dwellings, she should have all the inputs into the probability of
>>>> selection, and the baseline sampling weight is the inverse of that.
>>>> Then she would want to correct for non-response, which would be the
>>>> fraction of those responding to the survey among those sampled.
>>>> If Lynn is interested in a specific population (females of
>>>> reproductive age, say), and that's who the survey collected the data
>>>> on, then she would need to get the total population counts for that
>>>> specific population (which may prove even more difficult).
>>>> If she does not have these figures, then I don't really know what to
>>>> As they say, when you approach a statistician with collected data in
>>>> hand, they can only tell you what killed your study.
>>>> On Wed, Sep 26, 2012 at 8:15 AM, JVerkuilen (Gmail)
>>>>> On Wed, Sep 26, 2012 at 2:49 AM, Lynn Lee <> wrote:
>>>>>> Any suggestions on which weight is better? Or might other types of
>>>>>> weights be better than population weights?
>>>>> Do you have a few accurately observed variables such as the 
>>>>> population age and gender breakdown? If so you can often create 
>>>>> post-stratification weights (through a process called "raking") 
>>>>> that make your samples align with the associations observed in 
>>>>> those tables.
>>>>> A quick -findit raking- turned up a program -ipfraking- written by 
>>>>> Stas Kolenikov and available from his website. Hopefully he'll 
>>>>> chime in.
>> *
>> *   For searches and help try:
>> *
>> *
>> *

JVVerkuilen, PhD

"Out beyond ideas of wrong-doing and right-doing there is a field.
I'll meet you there. When the soul lies down in that grass the world is too
full to talk about." ---Rumi

© Copyright 1996–2016 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index