Statalist



Re: st: seeking answer to survey set question


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: seeking answer to survey set question
Date   Fri, 28 Aug 2009 15:14:54 -0500

Very simply put, the survey characteristic that matches the sample to
your population is the sampling weight. Strata and PSUs are used to
obtain correct standard errors; if stratification is ignored (as Austin
implicitly suggests below), you will get conservative (i.e., too large)
standard errors. You should try to find out from the documentation why
those 1,406 observations have missing design information. Were they not
sampled originally, but added as substitutes? Are they part of a
replenishment sample? Are they students who transferred to another
school, and hence aren't part of the original PSU in the later waves?
What is the story behind them?
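To illustrate the point about stratification, here is a minimal sketch on simulated data. The 10 strata, 4 PSUs per stratum, equal weights, and the outcome are hypothetical assumptions, not NELS. When strata explain much of the variation in the outcome, dropping them from the design (while keeping PSU identifiers unique) inflates the standard error of a mean:

```stata
* Hypothetical simulated design: 10 strata x 4 PSUs x 25 observations
clear
set seed 12345
set obs 10
generate strata = _n
expand 4
bysort strata: generate psu = _n
expand 25
* Outcome with a large between-stratum component (assumption for illustration)
generate y = 5*strata + rnormal()
generate pw = 1

* Full design: PSU identifiers are nested within strata
svyset psu [pw=pw], strata(strata)
svy: mean y
scalar se_strat = _se[y]

* Stratification ignored: PSUs need identifiers that are unique across strata
egen psu_id = group(strata psu)
svyset psu_id [pw=pw]
svy: mean y
scalar se_nostrat = _se[y]

display "SE with strata:    " scalar(se_strat)
display "SE without strata: " scalar(se_nostrat)
```

With this setup the second standard error is much larger, because the between-stratum variation is now treated as sampling variability across PSUs.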

On Fri, Aug 28, 2009 at 11:04 AM, Austin Nichols <austinnichols@gmail.com> wrote:
> James <jpsanders@wsu.edu> :
> Here's what I would do:
>
> egen c=group(strata psu), m
> ologit depvar indvar1 indvar2 [pw=pw], cluster(c)
>
> which puts all the people with missing strata and PSU into a single cluster.
>
> On Fri, Aug 28, 2009 at 11:59 AM, Sanders, James Parry <jpsanders@wsu.edu> wrote:
>> Hello,
>> NELS (educational) data comes packaged with psu, pw, and strata variables.  When I svyset the data, I am told that 1,406 cases have missing values in the survey characteristics (all 1,406 are missing psu and strata data).  Thus, when I run a survey command (e.g., svy: ologit), these 1,406 are excluded from the analysis.  Alternatively, I can keep the 1,406 in by running a standard command that includes two of the three design variables (the weight and the PSU) but leaves out the strata (e.g., ologit depvar indvar1 indvar2 [pw=pw], cluster(psu)).  Either way the results are essentially the same.
>>
>> My questions are these:  Which way is preferred?  The first way includes all design variables but drops 8% of the sample who otherwise have complete data.  The second way keeps everyone but doesn't include the strata and may thus not be fully representative of the population.  Is there a way to keep cases lacking psu and strata data in svy commands?  Alternatively, is there a way to include the strata in the non-survey command?
>> Thanks for any help,
>> James
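A minimal simulated sketch contrasting the two approaches discussed in this thread. All data are hypothetical (200 observations, 4 strata, 20 PSUs, with the last 20 observations given missing design information to mimic the NELS situation), and the clustering option is written in its `vce(cluster c)` form:

```stata
* Hypothetical simulated data: 200 obs, 4 strata, 20 PSUs of 10 obs each
clear
set seed 2009
set obs 200
generate strata = ceil(_n/50)
generate psu = ceil(_n/10)
generate pw = 1 + 2*runiform()
generate x = rnormal()
* Three-category ordinal outcome
generate y = 1 + (x + rnormal() > -0.5) + (x + rnormal() > 0.5)
* Last 20 observations lack design information
replace strata = . in 181/200
replace psu = . in 181/200
count if missing(strata) | missing(psu)

* (1) svy estimation: observations with missing design variables are dropped
svyset psu [pw=pw], strata(strata)
svy: ologit y x
scalar N_svy = e(N)

* (2) Austin's approach: missing strata/PSU values form one extra cluster
egen c = group(strata psu), missing
ologit y x [pw=pw], vce(cluster c)
scalar N_cl = e(N)
scalar G_cl = e(N_clust)

display "svy N = " scalar(N_svy) ", clustered N = " scalar(N_cl) ", clusters = " scalar(G_cl)
```

Here (1) uses 180 observations, while (2) keeps all 200 in 19 clusters (18 complete PSUs plus one group for the missing cases). Approach (2) omits the strata from the variance estimation, so its standard errors will tend to be conservative.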


-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP