


st: -svy- commands with a pps sample vs. a simple random sample

From   Mike Lacy
Subject   st: -svy- commands with a pps sample vs. a simple random sample
Date   Fri, 13 May 2011 15:58:58 -0600


I'm getting standard errors for means and regression coefficients from the -svy- commands that surprise me enough to make me wonder whether I am using them correctly. SE(mean) and SE(b) come out smaller with a simple random sample than with a probability-proportional-to-size (pps) sample, even though the pps sample is drawn using a size variable that correlates about 0.9 with the outcome of interest. The code below, which uses simulated data, shows what I am doing.

Background: I'm simulating data for an electrical utility usage reduction experiment. I've made the simulated distribution of kwh usage look like the real distribution. I assume that the percent of kwh usage saved (savepct) following an experiment with the users is of the form savepct = b0 + b1*kwh + b2*sqrt(kwh), and that is the function of interest to be estimated.

// Create the simulated data
clear   // -set obs- fails if a dataset is already in memory
set obs 25000
local sampleN = 500
set seed 83573
gen kwh = exp(rnormal(6.4, 0.65))  // kwh usage
gen savepct = -0.61 - 0.00014*kwh + 0.14 * sqrt(kwh)  // looks realistic to me
replace savepct = savepct + rnormal(0,0.5)  // gives r = 0.9 with kwh
// Population regression relationship
gen sqrtk = sqrt(kwh)
regress savepct kwh sqrtk   // The true population relationship
// Sample the data, pps, and run a regression model
quiet summ kwh, detail
gen pps = `sampleN' * kwh/r(sum)  // sampling prob to get pps and n = 500
// User written -gsample- , see -findit gsample-
gsample `sampleN' [aw = pps],  gen(picked_pps) wor
gen pwt = 1/pps
svyset _n [pweight = pwt]
svy: mean savepct if picked_pps
svy: regress savepct kwh sqrtk if picked_pps
// Repeat analysis with simple random sampling
svyset, clear
gsample `sampleN',  gen(picked_psrs) wor
gen psrs = `sampleN'/`=_N' // sampling prob
replace pwt = 1/psrs
svyset _n [pweight = pwt]
svy: mean savepct if picked_psrs
svy: regress savepct kwh sqrtk if picked_psrs
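
One check that may be worth running before comparing the two designs is whether the pps inclusion probabilities behave as intended: they should sum to the sample size, and none should reach 1. The lines below are only a sketch and not part of the original analysis. They assume they run in the same do-file as the code above, so that the variable pps and the local macro sampleN are still defined. The 0.01 tolerance is an arbitrary allowance for float rounding.

```stata
// Sanity check on the pps inclusion probabilities (illustrative only)
quietly summarize pps
display "sum of pps = " r(sum) "   max pps = " r(max)
assert r(max) < 1                        // no unit is selected with certainty
assert abs(r(sum) - `sampleN') < 0.01    // probabilities sum to about n = 500
```

If either assert fails, the weights 1/pps would not correspond to the intended design.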


Mike Lacy, Assoc. Prof.
Soc. Dept., Colo. State. Univ.
Fort Collins CO 80523 USA
