
# st: -svy- commands with a pps sample vs. a simple random sample

From: Mike Lacy
To: statalist@hsphsun2.harvard.edu
Subject: st: -svy- commands with a pps sample vs. a simple random sample
Date: Fri, 13 May 2011 15:58:58 -0600

Greetings,

I'm getting standard errors for means and regression coefficients from the -svy- commands that surprise me enough to make me wonder whether I am using them correctly. What I'm finding is that SE(mean) and SE(b) are smaller with a simple random sample than with a probability-proportional-to-size (pps) sample, even though the pps sample is constructed using a variable correlated about 0.9 with the outcome of interest. Below, I have some code with simulated data that shows what I am doing.

Background: I'm simulating data for an electrical utility usage reduction experiment. I've made the simulated distribution of kwh usage look like the real distribution. I assume that the percent of kwh usage saved (savepct) following an experiment with the users is of the form y = b0 + b1*x + b2*sqrt(x), where x is kwh usage, and that this is the function of interest to be estimated.
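For reference, the data-generating model and the two sets of sampling probabilities used in the code below can be written out as follows. This is only a restatement of the code: the numeric values are read off the -gen- and -replace- lines, while the symbols $x_i$ (kwh), $y_i$ (savepct), $\pi_i$ (selection probability), $w_i$ (weight), $n$ (sample size), and $N$ (population size) are notation introduced here for illustration.

```latex
\begin{align*}
\log x_i &\sim \mathcal{N}(6.4,\ 0.65^2), \\
y_i &= -0.61 - 0.00014\,x_i + 0.14\sqrt{x_i} + \varepsilon_i,
      \qquad \varepsilon_i \sim \mathcal{N}(0,\ 0.5^2), \\[4pt]
\text{pps:}\quad \pi_i &= \frac{n\,x_i}{\sum_{j=1}^{N} x_j},
      \qquad w_i = \frac{1}{\pi_i}, \\[4pt]
\text{SRS:}\quad \pi_i &= \frac{n}{N} = \frac{500}{25000} = 0.02,
      \qquad w_i = \frac{1}{\pi_i} = 50 .
\end{align*}
```

Under pps the weights vary inversely with kwh, whereas under simple random sampling every sampled unit carries the same weight.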
```
// Create the simulated data
clear
set obs 25000
local sampleN = 500
set seed 83573
gen kwh = exp(rnormal(6.4, 0.65))  // kwh usage
gen savepct = -0.61 - 0.00014*kwh + 0.14 * sqrt(kwh)  // looks realistic to me
replace savepct = savepct + rnormal(0,0.5)  // gives r = 0.9 with kwh
// Population regression relationship
gen sqrtk = sqrt(kwh)
regress savepct kwh sqrtk   // The true population relationship
//
// Sample the data, pps, and run a regression model
quiet summ kwh, detail
gen pps = `sampleN' * kwh/r(sum)  // sampling prob to get pps and n = 500
// User written -gsample- , see -findit gsample-
gsample `sampleN' [aw = pps],  gen(picked_pps) wor
gen pwt = 1/pps
svyset _n [pweight = pwt]
svy: mean savepct if picked_pps
svy: regress savepct kwh sqrtk if picked_pps
//
// Repeat analysis with simple random sampling
svyset, clear
gsample `sampleN',  gen(picked_psrs) wor
gen psrs = `sampleN'/`=_N' // sampling prob
replace pwt = 1/psrs
svyset _n [pweight = pwt]
svy: mean savepct if picked_psrs
svy: regress savepct kwh sqrtk if picked_psrs
```

Thanks,

=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lacy, Assoc. Prof.
Soc. Dept., Colo. State. Univ.
Fort Collins CO 80523 USA
(970)-491-6721