Statalist


Re: st: Svy subsamples


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Svy subsamples
Date   Sat, 24 Nov 2007 20:38:14 -0500

Richard--
I think there is a legit way to make your dataset smaller, but it
*really can be* that horrible if you just extract cases rather than
using the subpop option:

webuse nhanes2f, clear
svy, subpop(highlead): logit heartatk female weight diabetes
keep if highlead
svy: logit heartatk female weight diabetes

though this is a somewhat perverse case because of the missing values.
If you're careful about strata left with a single PSU, the subset
should give you identical coefs and SEs:

webuse nhanes2f, clear
svy, subpop(highlead): logit heartatk female weight diabetes
est sto correct
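keep if highlead==1
* flag strata in which every remaining obs has the same PSU
bys strat (psu): g coll=psu[1]==psu[_N]
* PSU id that is unique across strata
egen c=group(strat psu)
* pool the single-PSU strata into one new stratum, coded 33
g mstr=cond(coll==1,33,strat)
svyset c [pw=finalw], strat(mstr)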
svy: logit heartatk female weight diabetes
est sto approx
esttab correct approx, mti nogaps sca(N_pop N_subpop F)
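
The coll/mstr lines are there because subsetting can leave a stratum with only one PSU. One way to see which strata are affected is -svydescribe- with the single option. This is only a sketch, and it assumes the same example data and model variables as above:

```stata
webuse nhanes2f, clear
keep if highlead==1
* list only the strata with a single PSU that has complete data
* on the model variables
svydescribe heartatk female weight diabetes, single
```

Giving -svydescribe- the model's varlist matters here, since the count of usable PSUs depends on which obs have complete data.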

Note that in the esttab output only the population size and F are now off; this too is fixable:

webuse nhanes2f, clear
svy, subpop(highlead): logit heartatk female weight diabetes
est sto correct
preserve
* build one row per stratum/psu/highlead cell from the excluded obs
keep if highlead!=1
keep if !mi(heartatk,female,weight,diabetes)
collapse (sum) finalw, by(strat psu highlead)
* nonmissing model variables, so these rows are not dropped as incomplete
foreach v in heartatk female weight diabetes {
 g `v'=0
 }
tempfile tmp
save `tmp'
restore
* subset-only fit, as before
keep if highlead==1
bys strat (psu): g coll=psu[1]==psu[_N]
egen c=group(strat psu)
g mstr=cond(coll==1,33,strat)
svyset c [pw=finalw], strat(mstr)
svy: logit heartatk female weight diabetes
est sto approx
* add the collapsed rows and return to the original design with subpop()
append using `tmp'
svyset psu [pw=finalw], strat(strat)
svy, subpop(highlead): logit heartatk female weight diabetes
est sto better
esttab correct approx better, mti nogaps sca(N_pop N_subpop F)

Note that the "better" data add just one obs for each stratum/psu/highlead
cell, holding the sum of weights for the excluded obs, thus reducing the
total size of the data. It is tempting to write a -svysubset- package
to automate this subsetting procedure, but for any given model, the
pattern of missing values might be different, which means the
automatic-subsetting package could offer no savings in general over
keeping all the data in memory.  Your student might be able to do
something like the above once, though, and then safely use the subset
for multiple analyses.
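
Whether the subset is safe for a second model comes down to that missing-value pattern. A rough check could look like the sketch below. The second model (adding age) is purely hypothetical, and miss_old/miss_new are names made up for illustration:

```stata
webuse nhanes2f, clear
* number of missing values per obs among the original model's variables
egen miss_old = rowmiss(heartatk female weight diabetes)
* same count for a hypothetical second model that adds age
egen miss_new = rowmiss(heartatk female weight diabetes age)
* obs that are complete cases under one model but not the other
count if (miss_old==0) != (miss_new==0)
```

A count of zero suggests the same subset could serve both models, provided the collapsed rows also get a nonmissing value for any added variable.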

On 11/21/07, Richard Williams <Richard.A.Williams.5@nd.edu> wrote:
> I know that when using the svy: prefix, you should use the subpop
> option when analyzing subsamples, rather than using if or dropping
> cases.  However, I have a student who has this monstrous data set and
> she only wants to analyze a small subset of it.  I'm afraid that if
> she has to keep all these unused cases in her file, her
> not-so-powerful computer is going to have problems.
>
> Is there a legit way to extract only the cases you want?  Is it all
> that horrible if you extract cases rather than use subpop?
>
> Thanks for any info.  And Happy Thanksgiving to all those who celebrate it.
>