
From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Svy regress using subpop and incorrect number of obs
Date: Tue, 22 Jul 2008 10:31:47 -0500

Heather Ridolfo <evd7@CDC.GOV> is using -svy: regress- with the -subpop()-
option, and noticed that the reported sample size is smaller than the number
of observations in her dataset:

> I am using survey regression with the subpop command, see below.
>
> svyset psuscid [pweight = gswgt1], strata(region)
> svy, subpop(allsp): regress esteem white3 hisp child_sex
>
> However the number of observations in the output does not match the
> total number of cases in my dataset. I have 18924 cases in the original
> dataset, here the number of observations is only 18768.
> However, the subpopulation number of observations does appear correct
> (10224).
>
> Survey: Linear regression
>
> Number of strata   =         4          Number of obs      =     18768
> Number of PSUs     =       132          Population size    =  22000302
>                                         Subpop. no. of obs =     10224
>                                         Subpop. size       =  12582072
>                                         Design df          =       128
>                                         F(   3,    126)    =     47.41
>                                         Prob > F           =    0.0000
>                                         R-squared          =    0.0331
>
>
> In another output not only are the number of observations incorrect
> (should be 10244) but the PSUs are also lower.
>
> svyset psuscid [pweight = gswgt1], strata(region)
> svy, subpop(if bhsp == 1): regress esteem racebh
>
>
> Number of strata   =         4          Number of obs      =      6973
> Number of PSUs     =       126          Population size    = 5955176.3
>                                         Subpop. no. of obs =      4017
>                                         Subpop. size       = 3346350.8
>                                         Design df          =       122
>                                         F(   1,    122)    =     33.36
>                                         Prob > F           =    0.0000
>                                         R-squared          =    0.0253
>
> There are missing cases in some of the variables in my regression. Is
> stata dropping these cases from the number of original observations? I
> do specify in my subpop command to not include cases with missing data.
> If STATA is dropping observations from my original dataset due to
> incomplete data, is the survey design information from these
> observations retained in the calculation of the standard errors?
> Every example I have found of stata output using survey regression with
> the subpop command the number of observations matches the total number
> of cases in the dataset.
Heather should check that her Stata is fully up-to-date. On 02apr2008, we
posted an ado-file update that fixed a problem similar to what Heather is
describing above. Here is the corresponding entry from -help whatsnew-:

    5.  svy's linearized variance estimator was marking out observations
        that had missing values in the independent variables for
        observations outside the subpopulation.  This affects the estimated
        variance values when the primary sampling units were the individual
        observations and could decrease the design degrees of freedom.
        Both of these effects are very slight and inversely related to the
        sample size.  This has been fixed.

Note that, prior to this update, entire PSUs can be dropped if each
observation within the PSU contains a missing value in one of the variables
in the model fit.

With an updated Stata, only observations containing missing values within
the subpop are dropped. Here is a simple experiment, using the auto data:

. sysuse auto

. gen sub = for & !missing(rep78)

. tab rep78 sub

    Repair |
    Record |          sub
      1978 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2
         2 |         8          0 |         8
         3 |        27          3 |        30
         4 |         9          9 |        18
         5 |         2          9 |        11
         . |         5          0 |         5
-----------+----------------------+----------
     Total |        53         21 |        74

. svyset _n

. svy, subpop(sub): regress mpg rep78
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1          Number of obs      =        74
Number of PSUs     =        74          Population size    =        74
                                        Subpop. no. of obs =        21
                                        Subpop. size       =        21
                                        Design df          =        73
                                        F(   1,     73)    =      0.60
                                        Prob > F           =    0.4409
                                        R-squared          =    0.0285

------------------------------------------------------------------------------
             |             Linearized
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |   1.486111   1.917962     0.77   0.441    -2.336381    5.308604
       _cons |   18.91667   7.106727     2.66   0.010      4.75298    33.08035
------------------------------------------------------------------------------

--Jeff
jpitblado@stata.com
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
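
A rough sketch of how the advice in this reply could be checked on Heather's data. It reuses the variable names from her post (esteem, white3, hisp, child_sex, allsp) and assumes that allsp is a 0/1 indicator; nmiss is a hypothetical helper variable introduced only for this illustration.

    * See whether official updates (including the 02apr2008 one) are installed
    update query

    * Assumption: nmiss is a new helper variable, not part of the original data
    * Count missing values per observation across the model variables
    egen nmiss = rowmiss(esteem white3 hisp child_sex)

    * Incomplete cases inside the subpopulation
    count if nmiss > 0 & allsp == 1

    * Incomplete cases outside the subpopulation
    * (allsp != 1 also picks up observations where allsp itself is missing)
    count if nmiss > 0 & allsp != 1

If the description above applies, then after updating the difference between the dataset size and the reported "Number of obs" should be explained by the first count alone, provided the survey design variables (psuscid, gswgt1, region) have no missing values of their own.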
