On Nov 24, 2007, at 8:38 PM, Austin Nichols wrote:

Richard-- I think there is a legit way to make your dataset smaller, but it *really can be* that horrible if you just extract cases rather than using the subpop option:

webuse nhanes2f, clear
svy, subpop(highlead): logit heartatk female weight diabetes
keep if highlead
svy: logit heartatk female weight diabetes

though this is a somewhat perverse case because of the missing values. If you're careful, the subset should give you identical coefs and SEs:

webuse nhanes2f, clear
svy, subpop(highlead): logit heartatk female weight diabetes
est sto correct
keep if highlead==1
bys strat (psu): g coll=psu[1]==psu[_N]
egen c=group(strat psu)
g mstr=cond(coll==1,33,strat)
svyset c [pw=finalw], strat(mstr)
svy: logit heartatk female weight diabetes
est sto approx
esttab correct approx, mti nogaps sca(N_pop N_subpop F)

Note that now only the pop size and F are off--this too is fixable:

webuse nhanes2f, clear
svy, subpop(highlead): logit heartatk female weight diabetes
est sto correct
preserve
keep if highlead!=1
keep if !mi(heartatk,female,weight,diabetes)
collapse (sum) finalw, by(strat psu highlead)
foreach v in heartatk female weight diabetes {
 g `v'=0
}
tempfile tmp
save `tmp'
restore
keep if highlead==1
bys strat (psu): g coll=psu[1]==psu[_N]
egen c=group(strat psu)
g mstr=cond(coll==1,33,strat)
svyset c [pw=finalw], strat(mstr)
svy: logit heartatk female weight diabetes
est sto approx
append using `tmp'
svyset psu [pw=finalw], strat(strat)
svy, subpop(highlead): logit heartatk female weight diabetes
est sto better
esttab correct approx better, mti nogaps sca(N_pop N_subpop F)

Note that the "better" data just contain one obs for each stratum/psu containing the sum of weights for excluded obs, thus reducing the total size of the data.
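To see why the first example is "perverse", it may help to count what -keep if highlead- actually retains. Stata treats a missing value as nonzero, so an -if- condition on a 0/1 variable is also true where the variable is missing. The sketch below is only a quick check and assumes highlead is coded 0/1 with some missing values; the local macro names are made up for illustration. The last two lines use -svydescribe, single- to list any strata left with a single PSU after subsetting, if there are any, which is the situation the coll/mstr lines in the second example are written to handle.

```stata
webuse nhanes2f, clear

* obs that -keep if highlead- would retain (nonzero, including missing)
count if highlead
local n_if = r(N)

* obs that are actually in the subpopulation
count if highlead==1
local n_eq1 = r(N)

* obs with highlead missing
count if mi(highlead)
local n_mi = r(N)

display "kept by -if highlead-: `n_if'   highlead==1: `n_eq1'   missing: `n_mi'"

* after a careful subset, list strata with only one PSU (if any)
keep if highlead==1
svydescribe, single
```

If the first count exceeds the second by the number of missing values, the naive subset contains cases that are not in the subpopulation at all, which is why the later examples use -keep if highlead==1-.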
It is tempting to write a -svysubset- package to automate this subsetting procedure, but for any given model, the pattern of missing values might be different, which means the automatic-subsetting package could offer no savings in general over keeping all the data in memory. Your student might be able to do something like the above once, though, and then safely use the subset for multiple analyses.

On 11/21/07, Richard Williams <Richard.A.Williams.5@nd.edu> wrote:

I know that when using the svy: prefix, you should use the subpop option when analyzing subsamples, rather than using if or dropping cases. However, I have a student who has this monstrous data set and she only wants to analyze a small subset of it. I'm afraid that if she has to keep all these unused cases in her file, her not-so-powerful computer is going to have problems.

Is there a legit way to extract only the cases you want? Is it all that horrible if you extract cases rather than use subpop?

Thanks for any info. And Happy Thanksgiving to all those who celebrate it.

