# RE: st: Analyzing a subpopulation in Stata 10.1

 From "Karadogan, Figen" To "statalist@hsphsun2.harvard.edu" Subject RE: st: Analyzing a subpopulation in Stata 10.1 Date Tue, 30 Jun 2009 15:49:41 +0000

```
Michael and Jeff,

________________________________________
Sent: Monday, June 29, 2009 10:51 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Analyzing a subpopulation in Stata 10.1

subpopulation estimation:

> Thank you so much for your detailed response. It was very helpful in
> demonstrating the mechanics of Stata's poststratification adjustment to
> weights. I now understand how they work, but I'm not sure what they do
> makes sense for subpopulations.

> I can look at Table T4 to get estimated #s of women who have given
> birth, who have not given birth, and who we don't know about, all adding
> up to the subpopulation size. I can create a new weight (with some
> work), still based on the poststrata, that produces estimated counts of
> the women who have given birth and those who have not, which also add up
> to the subpopulation size.

> The counts in Table T3, however, don't add up to the subpopulation size
> and don't have a straightforward interpretation. Since they total to
> 2251, the table implies that 2374-2251 = 123 women have unknown status.
> It's unclear why that's a better number than the 260 estimated in T4. To
> me, the numbers in T3 have no substantive meaning ... and by extension
> proportions, regressions, etc., will be weighted in a manner that has no
> obvious interpretation.

> It seems to me that the right thing to do is either drop the missing
> data, like we do in T4 or ordinarily would if we were not using
> poststratification, or to produce estimates that sum to subpopulation
> totals through reweighting at the subpopulation level. Can you tell me
> why I'm wrong? Thanks.

The thing to keep in mind here is that the poststratification adjustment must
be applied to the entire estimation sample.

It is not possible to reweight at the subpopulation level unless there is
poststratification information at that level, i.e., if we had the poststratum
population sizes for the four cells defined by sex and native status.

In table T3, -svy: tabulate- applies the weight adjustment to the 184
observations in the estimation sample.  The only way to prevent that is to fix
the adjusted weights ahead of time (see -help svygen-), but that isn't always
a good solution.  The poststratified sampling weights are designed to reduce
bias in the point estimates; however, when it is given the poststratum IDs,
-svy linearized- can produce more efficient variance estimates than it can
without them, and precomputed weights alone give up that gain.

Ultimately, it is the researcher or data analyst who has the responsibility,
and the power, to choose the most appropriate analysis.

I can imagine real survey data where there are any number of different
poststratification adjustments one could apply for a given analysis.  Some
will make much more substantive sense than others.

Suppose we had a variable called -ns_postid- that simultaneously identified
the native status and sex of each individual in the dataset, and another
variable called -ns_postw- that contained the population size for the
corresponding group.  I think it is clear that this poststratification
information could be applied more broadly than the one in Michael's simulated
dataset.

--Jeff

PS.  There is an undocumented -svygen- command that will generate
poststratification-adjusted sampling weights; see -help svygen-.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
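Jeff's point that the poststratification adjustment is computed over the estimation sample (unless the adjusted weights are fixed ahead of time) can be illustrated with a toy calculation. The sketch below is in Python rather than Stata and is not Stata's internal code; it only shows the usual ratio adjustment, where each weight in a poststratum is multiplied by that poststratum's known population size divided by the sum of weights in it over whichever sample the adjustment is computed on. The records, the cell totals, and the `poststratify` helper are all hypothetical; the sex-by-native cells play the role of the hypothetical -ns_postid-/-ns_postw- variables. It contrasts (1) fixing adjusted weights on the full sample and then dropping missing outcomes with (2) recomputing the adjustment on the estimation sample.

```python
from collections import defaultdict


def poststratify(weights, cells, cell_totals):
    """Ratio-adjust weights so the weighted count in each poststratum
    (cell) equals that cell's known population size."""
    sums = defaultdict(float)
    for w, c in zip(weights, cells):
        sums[c] += w
    return [w * cell_totals[c] / sums[c] for w, c in zip(weights, cells)]


# Hypothetical toy data: (sex, native status, base weight, outcome y).
# None marks a missing outcome.
records = [
    ("F", "native", 10.0, 1),
    ("F", "native", 30.0, None),
    ("F", "native", 40.0, 0),
    ("F", "foreign", 20.0, 1),
    ("F", "foreign", 20.0, None),
    ("M", "native", 40.0, 1),
    ("M", "foreign", 35.0, 0),
]

# Hypothetical population sizes for the four sex-by-native cells.
cell_totals = {("F", "native"): 400.0, ("F", "foreign"): 120.0,
               ("M", "native"): 480.0, ("M", "foreign"): 140.0}

cells = [(sex, nat) for sex, nat, _, _ in records]
weights = [w for _, _, w, _ in records]
is_woman = [sex == "F" for sex, _, _, _ in records]
observed = [y is not None for _, _, _, y in records]

# (1) Fix the adjusted weights ahead of time on the full sample,
#     then drop observations with missing y.
fixed = poststratify(weights, cells, cell_totals)
fixed_women = sum(w for w, f, o in zip(fixed, is_woman, observed) if f and o)

# (2) Compute the adjustment within the estimation sample only
#     (all observations, women or men, with nonmissing y).
keep = [i for i, o in enumerate(observed) if o]
est = poststratify([weights[i] for i in keep], [cells[i] for i in keep],
                   cell_totals)
est_women = sum(w for w, i in zip(est, keep) if is_woman[i])

print(fixed_women, est_women)  # 310.0 520.0
```

In (2), the women's subpopulation is a union of complete poststrata, so its weighted total stays pinned at the known 400 + 120 = 520 no matter which observations are dropped for missing y. In (1), the 310 falls short of 520, and the 210 difference is the weighted count of women whose outcome is unknown.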