I have come across a problem that I can't figure out. I don't know why I am "losing" 873 observations with one Stata command versus the other when estimating mean age by race for age>=18. Both age and race are 100% present in the dataset, both overall (N=21,004) and for age>=18 (N=11,441).

This is complex survey data and, of course, I would use the survey commands for analysis, but it still doesn't make sense why the non-survey command would use fewer observations (10,568 instead of 11,441), even though the point estimates are identical.

Any insight into this would be greatly appreciated.

Here is the output for both:

svy, subpop(if age>=18): mean age, over(race)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =      28        Number of obs      =     21004
Number of PSUs   =      57        Population size    =   2.8e+08
                                  Subpop. no. obs    =     11441
                                  Subpop. size       =   2.1e+08
                                  Design df          =        29

1: race = 1
2: race = 2
3: race = 3
4: race = 4
5: race = 5

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
age          |
           1 |   46.87004   .3829944      46.08673    47.65335
           2 |   42.29357   .4316838      41.41068    43.17646
           3 |   36.81336   .5968429      35.59268    38.03404
           4 |   41.05272   .9506344      39.10846    42.99699
           5 |   41.83675   1.234968      39.31096    44.36254
--------------------------------------------------------------

mean age [pweight = mecwt4] if age>=18, over(race)

Mean estimation                   Number of obs      =     10568

1: race = 1
2: race = 2
3: race = 3
4: race = 4
5: race = 5

--------------------------------------------------------------
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
age          |
           1 |   46.87004   .2514365      46.37718     47.3629
           2 |   42.29357    .343518      41.62021    42.96693
           3 |   36.81336   .2793463      36.26579    37.36094
           4 |   41.05272   .9437899      39.20272    42.90273
           5 |   41.83675   .7637752      40.33961    43.33389
--------------------------------------------------------------
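
One check that might help narrow this down is to count the adults whose weight is missing or zero, since I believe such observations would not contribute to a weighted mean and may be dropped by the non-survey command. This is only a sketch, and it assumes that mecwt4 is also the weight declared in svyset, which the output above does not show.

```stata
* Sketch only: look for adults that a weighted, non-svy command might exclude.
* Assumption: mecwt4 is the same pweight that was declared with svyset.

* Adults in the data (should match the subpopulation count of 11441)
count if age >= 18 & !missing(age)

* Adults whose weight is missing or zero
count if age >= 18 & !missing(age) & (missing(mecwt4) | mecwt4 == 0)

* Display the current survey design settings to confirm which weight svy uses
svyset
```

If the second count came back as 873 (11441 - 10568), that would account for the gap between the two commands.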


