



st: -svy- commands with a pps sample vs. a simple random sample


From   Mike Lacy <Michael.Lacy@colostate.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: -svy- commands with a pps sample vs. a simple random sample
Date   Fri, 13 May 2011 15:58:58 -0600


Greetings,

I'm getting standard errors for means and regression coefficients from the -svy- commands that surprise me enough to make me wonder whether I am using them correctly. What I'm finding is that SE(mean) and SE(b) are smaller with a simple random sample than with a probability-proportional-to-size (pps) sample, even though the pps sample is drawn using a size variable (kwh) that is correlated about 0.9 with the outcome of interest. Below is some code with simulated data that shows what I am doing.
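One standard way to frame this comparison is Kish's approximate design effect due to unequal weighting. It is added here for illustration and is not part of the original question. It applies to a weighted estimate such as a mean and assumes the weights are roughly unrelated to the outcome, so it is only a heuristic in this setting, where savepct is tied to kwh. Here w_i is the weight of sampled unit i, n = 500 is the sample size, N = 25000 is the population size, and CV is computed with divisor n:

```latex
\mathrm{deff}_w \approx 1 + \mathrm{CV}^2(w)
  = \frac{n \sum_{i=1}^{n} w_i^2}{\left(\sum_{i=1}^{n} w_i\right)^2},
\qquad w_i = \frac{1}{\pi_i},
\qquad \pi_i = \frac{n\,\mathrm{kwh}_i}{\sum_{j=1}^{N} \mathrm{kwh}_j}
```

Under simple random sampling all w_i are equal and deff_w = 1. Under pps the weights vary inversely with kwh_i, so deff_w > 1 whenever the sampled kwh values differ.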


Background: I'm simulating data for an electric utility usage-reduction experiment, and I've made the simulated distribution of kwh usage look like the real distribution. I assume that the percent of kwh usage saved (savepct) following an experiment with the users is of the form y = b0 + b1*x + b2*sqrt(x), where x is kwh usage; that is the function of interest to be estimated.
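Written out in the variable names of the simulation code, with the parameter values that code uses, the assumed population model is as follows. Note that the second argument of rnormal() is a standard deviation. Here i = 1, ..., N with N = 25000:

```latex
\mathrm{savepct}_i = \beta_0 + \beta_1\,\mathrm{kwh}_i + \beta_2\sqrt{\mathrm{kwh}_i} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0,\ 0.5^2),
\qquad \ln \mathrm{kwh}_i \sim N(6.4,\ 0.65^2),
\qquad (\beta_0, \beta_1, \beta_2) = (-0.61,\ -0.00014,\ 0.14)
```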

// Create the simulated data
clear
set obs 25000
local sampleN = 500
set seed 83573
gen kwh = exp(rnormal(6.4, 0.65))  // kwh usage
gen savepct = -0.61 - 0.00014*kwh + 0.14 * sqrt(kwh)  // looks realistic to me
replace savepct = savepct + rnormal(0,0.5)  // gives r = 0.9 with kwh
// Population regression relationship
gen sqrtk = sqrt(kwh)
regress savepct kwh sqrtk   // the true population relationship
//
// Sample the data, pps, and run a regression model
quiet summ kwh, detail
gen pps = `sampleN' * kwh/r(sum)  // inclusion prob. proportional to kwh; sums to n = 500
// User-written -gsample-; see -findit gsample-
gsample `sampleN' [aw = pps],  gen(picked_pps) wor
gen pwt = 1/pps
svyset _n [pweight = pwt]
svy: mean savepct if picked_pps
svy: regress savepct kwh sqrtk if picked_pps
//
// Repeat analysis with simple random sampling
svyset, clear
gsample `sampleN',  gen(picked_psrs) wor
gen psrs = `sampleN'/`=_N' // sampling prob
replace pwt = 1/psrs
svyset _n [pweight = pwt]
svy: mean savepct if picked_psrs
svy: regress savepct kwh sqrtk if picked_psrs
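A Monte Carlo check can show whether the -svy- standard errors above track the actual sampling variability of each design. The sketch below is hypothetical and self-contained. It rebuilds the population from the same model, then repeatedly draws a pps sample and an equal-probability sample and records the mean of savepct from each. Several assumptions apply:

- Poisson pps sampling (keep unit i with probability pps_i) stands in for -gsample ..., wor-.
- Bernoulli sampling with probability 500/N stands in for simple random sampling, so realized sample sizes vary around 500.
- The program name onedraw and the file popdata.dta are made up for illustration.
- The random draws will not reproduce the original seed's data.

```stata
// Sketch: Monte Carlo comparison of the sampling variability of mean(savepct)
// ASSUMPTIONS: Poisson pps sampling replaces -gsample-, Bernoulli sampling
// replaces SRS; "onedraw" and "popdata" are hypothetical names.
clear all
set seed 83573
set obs 25000
gen kwh = exp(rnormal(6.4, 0.65))
gen savepct = -0.61 - 0.00014*kwh + 0.14*sqrt(kwh) + rnormal(0, 0.5)
quietly summarize kwh
gen pps = 500 * kwh / r(sum)    // inclusion probabilities, summing to 500
save popdata, replace           // writes popdata.dta to the working directory

program define onedraw, rclass
    use popdata, clear
    // pps draw: weighted (Hajek) mean with design weights 1/pps
    quietly summarize savepct if runiform() < pps [aw = 1/pps]
    return scalar m_pps = r(mean)
    // equal-probability draw: unweighted mean
    quietly summarize savepct if runiform() < 500/_N
    return scalar m_srs = r(mean)
end

simulate m_pps = r(m_pps) m_srs = r(m_srs), reps(200) nodots: onedraw
// The Std. Dev. column estimates the true SE of each design's mean
summarize m_pps m_srs
```

The standard deviations of m_pps and m_srs across replications can then be set beside the SE(mean) values reported by -svy: mean- for each design.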


Thanks,


=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Lacy, Assoc. Prof.
Soc. Dept., Colo. State. Univ.
Fort Collins CO 80523 USA
(970)-491-6721


