Statalist



st: Svy regress using subpop and incorrect number of obs


From   "Ridolfo, Heather E. (CDC/CCHIS/NCHS) (CTR)" <evd7@CDC.GOV>
To   statalist@hsphsun2.harvard.edu
Subject   st: Svy regress using subpop and incorrect number of obs
Date   Tue, 22 Jul 2008 08:31:27 -0400

All,

I am using survey regression with the subpop() option; see below.

svyset psuscid [pweight = gswgt1], strata(region)
svy, subpop(allsp): regress esteem white3 hisp child_sex

However, the number of observations in the output does not match the
total number of cases in my dataset: I have 18924 cases in the original
dataset, but the output reports only 18768 observations. The
subpopulation number of observations does appear correct (10224).

Survey: Linear regression

Number of strata   =         4                  Number of obs      =     18768
Number of PSUs     =       132                  Population size    =  22000302
                                                Subpop. no. of obs =     10224
                                                Subpop. size       =  12582072
                                                Design df          =       128
                                                F(   3,    126)    =     47.41
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0331


In another model, not only is the number of observations incorrect
(it should be 10244), but the number of PSUs is also lower.

svyset psuscid [pweight = gswgt1], strata(region)
svy, subpop(if bhsp == 1): regress  esteem racebh


Number of strata   =         4                  Number of obs      =      6973
Number of PSUs     =       126                  Population size    = 5955176.3
                                                Subpop. no. of obs =      4017
                                                Subpop. size       = 3346350.8
                                                Design df          =       122
                                                F(   1,    122)    =     33.36
                                                Prob > F           =    0.0000
                                                R-squared          =    0.0253

There are missing values in some of the variables in my regression. Is
Stata dropping these cases from the original number of observations? I
do specify in my subpop() option that cases with missing data should
not be included. If Stata is dropping observations from my original
dataset because of incomplete data, is the survey design information
from these observations retained in the calculation of the standard
errors? In every example of Stata output I have found that uses survey
regression with the subpop() option, the number of observations matches
the total number of cases in the dataset.
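One way to test the missing-data explanation is to count directly how many observations have a missing value in at least one model variable. The difference in the first model is 18924 - 18768 = 156, so if the count below equals 156 for the real data, missing values in the model variables would account for the gap. This is a minimal sketch assuming the variable names from the first model and that allsp is a 0/1 indicator. The `input` lines build a small hypothetical dataset purely for illustration; with the survey data already in memory, skip them and run only the `egen` and `count` lines.

```stata
* Hypothetical toy data for illustration only;
* skip these lines when the survey dataset is already in memory.
clear
input esteem white3 hisp child_sex allsp
3.2 1 0 1 1
.   0 1 0 1
2.8 1 0 . 0
3.5 0 0 1 0
3.0 0 1 0 1
end

* Number of model variables with a missing value, per observation
egen byte nmiss = rowmiss(esteem white3 hisp child_sex)

* Observations with at least one missing model variable
count if nmiss > 0

* Complete cases inside the subpopulation (assumes allsp is coded 0/1)
count if allsp == 1 & nmiss == 0

drop nmiss
```

If the first count does not cover the whole difference, missing values in the design variables (psuscid, gswgt1, region) may be worth checking the same way, for example with `count if missing(psuscid, gswgt1, region)`.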

Thanks,
Heather


