
From: Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: variance when using svy: mean
Date: Mon, 3 Dec 2007 12:11:10 -0500

David, it doesn't sound like your study is a probability sample; if not, you don't need -svy- commands. Instead, use non-survey commands and assign an -iweight- or other weight variable to properly represent your population.

If your data do arise from a probability sample, your 'areas' appear to be strata, not primary sampling units (psu's). Strata are units which partition a population. A psu is the highest stage unit selected by random numbers within a stratum. Standard errors for survey data depend mainly on the number of psu's, not on the number of observations within them.
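To make the last point concrete, here is a minimal sketch in Python (not Stata) of the usual with-replacement linearization variance for a weighted mean. It assumes a single stratum when the areas are treated as psu's, and one stratum per area with every observation its own psu when the areas are treated as strata. The function names and the `make_area` helper are illustrative only, and the data mirror the two simulated trials quoted below.

```python
import math

def linearized_se(psus):
    """Weighted mean and its SE when each list in `psus` is one PSU.

    Each PSU is a list of (weight, y) pairs; single stratum, no FPC.
    """
    W = sum(w for psu in psus for w, _ in psu)
    ybar = sum(w * y for psu in psus for w, y in psu) / W
    # One score per PSU: the weighted residual total, scaled by W.
    z = [sum(w * (y - ybar) for w, y in psu) / W for psu in psus]
    n = len(z)
    zbar = sum(z) / n
    var = n / (n - 1) * sum((zj - zbar) ** 2 for zj in z)
    return ybar, math.sqrt(var)

def stratified_se(strata):
    """Same estimator with each list a stratum and each observation a PSU."""
    W = sum(w for s in strata for w, _ in s)
    ybar = sum(w * y for s in strata for w, y in s) / W
    var = 0.0
    for s in strata:
        # One score per observation, centered within its stratum.
        z = [w * (y - ybar) / W for w, y in s]
        n = len(z)
        zbar = sum(z) / n
        var += n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return ybar, math.sqrt(var)

def make_area(n_obs, n_ones, area_weight):
    """Hypothetical helper: one area with obs_weight = area_weight / n_obs."""
    w = area_weight / n_obs
    return [(w, 1)] * n_ones + [(w, 0)] * (n_obs - n_ones)

trial1 = [make_area(50, 25, 1), make_area(50, 10, 5)]
trial2 = [make_area(900, 450, 1), make_area(100, 20, 5)]

for name, data in (("trial 1", trial1), ("trial 2", trial2)):
    mean, se_psu = linearized_se(data)
    _, se_strata = stratified_se(data)
    print(f"{name}: mean={mean:.4f}  "
          f"SE(areas as psu)={se_psu:.7f}  SE(areas as strata)={se_strata:.7f}")
```

With the areas as psu's, only the two psu totals enter the variance, and they are identical in both trials, so the standard error is about .0833333 each time. With the areas as strata the standard error does fall when the number of observations grows. In Stata terms the second design corresponds to naming the area variable in -svyset-'s strata() option rather than as the psu.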

-Steven

On Dec 3, 2007, at 11:19 AM, David Merriman wrote:

Dear Statalisters:

I am a long-time Stata user but new to svy: commands and am quite confused. I apologize if this is long-winded; I am trying to say it as concisely as possible.

I have collected primary data in several geographic areas. Each of the geographic areas has a different weight so that my entire sample should be representative of the population. In each geographic area I have collected a number of observations, but the number of observations in the area tells me nothing about the density of the activity in the area. I want to estimate the population mean (for all geographic areas) and the variance of that estimate. The problem is that while I get sensible means, the variances do not seem to be a function of the number of observations I have. Intuitively I think that the variance ought to change (fall) as the number of observations increases.

I tried using

svyset psuedo_psu [pweight=obs_weight]
svy: mean psuedo_chicago_tax_paid

where psuedo_psu is the variable indicating the primary sampling unit, obs_weight is the psu_weight divided by the number of observations in that psu, and psuedo_chicago_tax_paid is the (zero-one) variable for which I want to estimate the mean and variance.

I created a simulated data set (the real one is more complex) with 2 psus. In the first trial, each psu had 50 observations. psu 1 had a weight of 1 and a 50 percent chance of a 1. psu 2 had a weight of 5 and a 20 percent chance of a 1. I get a sensible mean of .25 and a standard error of .0833333.

In the second trial, I also had two psus. psu 1 had 900 observations and psu 2 had 100 observations. psu 1 had a weight of 1 and a 50 percent chance of a 1. psu 2 had a weight of 5 and a 20 percent chance of a 1. I get a sensible mean of .25 but the same standard error of .0833333 as in case 1. This does not make sense to me. I have more observations in case 2, so I think I should get a smaller variance.

I imagine I am not using the correct design. Can anyone help? Below, I show the computer code for my simulation (fake data set), but you don't need to read this if you understand the comments above. Thanks so much.

#delimit ;
****************************************************************
* created the simulated data
***********************************************************;
set obs 100;

****************************************************************
* generate psu
***********************************************************;
gen psuedo_psu=1 if _n<51;
replace psuedo_psu=2 if _n>=51;

****************************************************************
* generate chicago_tax_paid
***********************************************************;
gen psuedo_chicago_tax_paid=1 if _n<=25;
replace psuedo_chicago_tax_paid=0 if _n>25 & _n<=50;
replace psuedo_chicago_tax_paid=1 if _n>50 & _n<61;
replace psuedo_chicago_tax_paid=0 if _n>=61;

****************************************************************
* generate psu weights
***********************************************************;
gen sample_weight=1 if psuedo_psu==1;
replace sample_weight=5 if psuedo_psu==2;
summarize;

****************************************************************
* generate OBSERVATION weights
***********************************************************;
sort psuedo_psu;
by psuedo_psu: gen obs_weight= sample_weight/_N;
summarize;
svyset psuedo_psu [pweight=obs_weight];

**********************************************************
* psu1 has a mean of .5 and a weight of 1
* psu2 has a mean of .2 and a weight of 5
* (5*.2)+(1*.5)=1.5
* 1.5/6=.25
*
* so the mean estimate makes sense to me
*******************************************************;
svy : mean psuedo_chicago_tax_paid;
mean psuedo_chicago_tax_paid;

*********************************************************
* do a second round with unequal size groups
*****************************************************;
clear;
#delimit ;

****************************************************************
* created the simulated data
***********************************************************;
set obs 1000;

****************************************************************
* generate psu
***********************************************************;
gen psuedo_psu=1 if _n<901;
replace psuedo_psu=2 if _n>=901;

****************************************************************
* generate chicago_tax_paid
***********************************************************;
gen psuedo_chicago_tax_paid=1 if _n<=450;
replace psuedo_chicago_tax_paid=0 if _n>450 & _n<=900;
replace psuedo_chicago_tax_paid=1 if _n>900 & _n<921;
replace psuedo_chicago_tax_paid=0 if _n>=921;

****************************************************************
* generate PSU weights
***********************************************************;
gen sample_weight=1 if psuedo_psu==1;
replace sample_weight=5 if psuedo_psu==2;

****************************************************************
* generate OBSERVATION weights
***********************************************************;
sort psuedo_psu;
by psuedo_psu: gen obs_weight= sample_weight/_N;
summarize;
svyset psuedo_psu [pweight=obs_weight];

**********************************************************
* psu1 has a mean of .5 and a weight of 1
* psu2 has a mean of .2 and a weight of 5
*
* I get the same answer for the mean in case 1 and case 2
* which I think is correct but
* I also get the same answer for the variance which I think is not correct
*
* I think I should have a lower variance in case 2
*******************************************************;
svy : mean psuedo_chicago_tax_paid;
mean psuedo_chicago_tax_paid;

--
David Merriman
dmerrim@gmail.com

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

Steven Samuels
sjhsamuels@earthlink.net
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441

