
From: "David Merriman" <dmerrim@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: variance when using svy: mean
Date: Mon, 3 Dec 2007 10:19:12 -0600

Dear Statalisters:

I am a long-time Stata user but new to the svy: commands, and I am quite confused. I apologize if this is long-winded; I am trying to say it as concisely as possible.

I have collected primary data in several geographic areas. Each geographic area has a different weight, so that my entire sample should be representative of the population. In each geographic area I have collected a number of observations, but the number of observations in an area tells me nothing about the density of the activity in that area.

I want to estimate the population mean (across all geographic areas) and the variance of that estimate. The problem is that, while I get sensible means, the variances do not seem to be a function of the number of observations I have. Intuitively, I think the variance ought to fall as the number of observations increases.

I tried using

    svyset psuedo_psu [pweight=obs_weight]
    svy: mean psuedo_chicago_tax_paid

where psuedo_psu is the variable indicating the primary sampling unit (PSU), obs_weight is the PSU weight divided by the number of observations in that PSU, and psuedo_chicago_tax_paid is the (zero-one) variable whose mean and variance I want to estimate.

I created a simulated data set (the real one is more complex) with 2 PSUs.

In the first trial, each PSU had 50 observations. PSU 1 had a weight of 1 and a 50 percent chance of a 1. PSU 2 had a weight of 5 and a 20 percent chance of a 1. I get a sensible mean of .25 and a standard error of .0833333.

In the second trial, I again had two PSUs. PSU 1 had 900 observations and PSU 2 had 100 observations. PSU 1 had a weight of 1 and a 50 percent chance of a 1. PSU 2 had a weight of 5 and a 20 percent chance of a 1. I get a sensible mean of .25 but the same standard error of .0833333 as in the first trial.

This does not make sense to me. I have more observations in the second trial, so I think I should get a smaller variance. I imagine I am not specifying the correct design. Can anyone help?
Below, I show the computer code for my simulation (fake data set), but you don't need to read this if you understand the comments above. Thanks so much.

    #delimit ;
    ****************************************************************
    * created the simulated data
    ***********************************************************;
    set obs 100;
    ****************************************************************
    * generate psu
    ***********************************************************;
    gen psuedo_psu=1 if _n<51;
    replace psuedo_psu=2 if _n>=51;
    ****************************************************************
    * generate chicago_tax_paid
    ***********************************************************;
    gen psuedo_chicago_tax_paid=1 if _n<=25;
    replace psuedo_chicago_tax_paid=0 if _n>25 & _n<=50;
    replace psuedo_chicago_tax_paid=1 if _n>50 & _n<61;
    replace psuedo_chicago_tax_paid=0 if _n>=61;
    ****************************************************************
    * generate psu weights
    ***********************************************************;
    gen sample_weight=1 if psuedo_psu==1;
    replace sample_weight=5 if psuedo_psu==2;
    summarize;
    ****************************************************************
    * generate OBSERVATION weights
    ***********************************************************;
    sort psuedo_psu;
    by psuedo_psu: gen obs_weight= sample_weight/_N;
    summarize;
    svyset psuedo_psu [pweight=obs_weight];
    **********************************************************
    * psu1 has a mean of .5 and a weight of 1
    * psu2 has a mean of .2 and a weight of 5
    * (5*.2)+(1*.5)=1.5
    * 1.5/6=.25
    *
    * so the mean estimate makes sense to me
    *******************************************************;
    svy : mean psuedo_chicago_tax_paid;
    mean psuedo_chicago_tax_paid;
    *********************************************************
    * do a second round with unequal size groups
    *****************************************************;
    clear;
    #delimit ;
    ****************************************************************
    * created the simulated data
    ***********************************************************;
    set obs 1000;
    ****************************************************************
    * generate psu
    ***********************************************************;
    gen psuedo_psu=1 if _n<901;
    replace psuedo_psu=2 if _n>=901;
    ****************************************************************
    * generate chicago_tax_paid
    ***********************************************************;
    gen psuedo_chicago_tax_paid=1 if _n<=450;
    replace psuedo_chicago_tax_paid=0 if _n>450 & _n<=900;
    replace psuedo_chicago_tax_paid=1 if _n>900 & _n<921;
    replace psuedo_chicago_tax_paid=0 if _n>=921;
    ****************************************************************
    * generate PSU weights
    ***********************************************************;
    gen sample_weight=1 if psuedo_psu==1;
    replace sample_weight=5 if psuedo_psu==2;
    ****************************************************************
    * generate OBSERVATION weights
    ***********************************************************;
    sort psuedo_psu;
    by psuedo_psu: gen obs_weight= sample_weight/_N;
    summarize;
    svyset psuedo_psu [pweight=obs_weight];
    **********************************************************
    * psu1 has a mean of .5 and a weight of 1
    * psu2 has a mean of .2 and a weight of 5
    *
    * I get the same answer for the mean in case 1 and case 2
    * which I think is correct but
    * I also get the same answer for the variance which I think is not correct
    *
    * I think I should have a lower variance in case 2
    *******************************************************;
    svy : mean psuedo_chicago_tax_paid;
    mean psuedo_chicago_tax_paid;

--
David Merriman
dmerrim@gmail.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
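The figures reported in the post can be cross-checked by hand. The sketch below is in Python rather than Stata so that it runs independently of the svy: commands. It rests on an assumption the post does not state: that with only a PSU variable declared (no strata, no finite-population correction), the standard error is computed from the spread of linearized PSU totals, treating the PSUs as a with-replacement sample from a single stratum. The function name `svy_mean_se` and the tuple layout are hypothetical, chosen for illustration only.

```python
import math


def svy_mean_se(psus):
    """Weighted mean and linearized standard error for a 0/1 variable.

    psus: list of (psu_weight, n_obs, n_ones) tuples, one per PSU.
    Assumption: PSUs are treated as a with-replacement sample from a
    single stratum, so only between-PSU variation enters the variance.
    """
    n_psu = len(psus)
    total_w = sum(w for w, _, _ in psus)

    # Each observation carries weight psu_weight / n_obs, so a PSU's
    # weighted total of y is psu_weight * (n_ones / n_obs).
    mean = sum(w * ones / n for w, n, ones in psus) / total_w

    # Linearized PSU totals for the ratio estimator:
    # z_h = sum_i w_i * (y_i - mean) / total_w
    z = [w * (ones / n - mean) / total_w for w, n, ones in psus]
    z_bar = sum(z) / n_psu

    # Between-PSU variance estimator with the n/(n-1) correction.
    var = n_psu / (n_psu - 1) * sum((zh - z_bar) ** 2 for zh in z)
    return mean, math.sqrt(var)


# (psu_weight, n_obs, n_ones), taken from the two simulated data sets.
trial_1 = [(1, 50, 25), (5, 50, 10)]
trial_2 = [(1, 900, 450), (5, 100, 20)]

for name, design in (("trial 1", trial_1), ("trial 2", trial_2)):
    mean, se = svy_mean_se(design)
    print(f"{name}: mean = {mean:.4f}, se = {se:.7f}")
```

Under that assumption both trials give a mean of .25 and a standard error of .0833333, matching the values quoted in the post. The within-PSU sample sizes cancel out of the PSU totals, which would explain why adding observations inside the same two PSUs leaves the standard error unchanged.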

**Follow-Ups**:
- **Re: st: variance when using svy: mean** *From:* Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net>

