Statalist


Re: st: testing equality of means for survey data


From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: testing equality of means for survey data
Date: Fri, 23 Nov 2007 18:23:33 -0500

DEEPANKAR BASU <basu.15@osu.edu>:
Quite wrong, I'm afraid: Your variance calculation is incorrect, and
it looks like your subpop option does not restrict to boys (or girls),
but to children living in households with any boys (or any girls).
Try instead:

svyreg alive female
svymean alive, by(female)
test [alive]0=[alive]1

in Stata 8, or in Stata 10:

svy: reg alive female
svy: mean alive, over(female)
test [alive]0=[alive]1

for a t-test of the difference in the mean of "alive" between girls
and boys (the coefficient on female in the regressions is the
girl-minus-boy difference, so the p-value in that row of output offers
a test of no difference).  Since the dependent variable is a count, a
better specification is

svypoisson alive female

though the tests should produce near-identical results in practice.
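
To spell out the variance problem, as I read the quoted code: e(V)
after svymean is already the estimated variance of the mean (the
squared standard error), so dividing it by e(N) again shrinks the
denominator and inflates the t-statistic.  The two means are also
estimated from the same PSUs, so they are not independent and a
covariance term belongs in the denominator.  In the notation below
(mine, not Stata's), \hat\mu_g and \hat\mu_b are the estimated means
for girls and boys:

```latex
t = \frac{\hat\mu_g - \hat\mu_b}
         {\sqrt{\widehat{\mathrm{Var}}(\hat\mu_g)
                + \widehat{\mathrm{Var}}(\hat\mu_b)
                - 2\,\widehat{\mathrm{Cov}}(\hat\mu_g, \hat\mu_b)}}
```

Two separate svymean calls cannot supply the covariance; a single
estimation with by() or over(), or the regression, does.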

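For reference, a minimal sketch of the whole Stata 10 sequence.  It
assumes one observation per child, a 0/1 indicator female without
value labels, and the design variables from the original post (v005,
v021, v022); the lincom line is my addition, reporting the difference
with its standard error rather than only a test:

```stata
* Sketch only (Stata 10 syntax); variable names are assumptions as noted above.
* Skip the gen line if smpwt already exists.
gen double smpwt = v005/1000000
svyset v021 [pweight=smpwt], strata(v022)

* Both group means come from one estimation, so e(V) holds their covariance.
svy: mean alive, over(female)
lincom [alive]1 - [alive]0

* Count-model version of the same comparison.
svy: poisson alive female
```
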
On 11/23/07, DEEPANKAR BASU <basu.15@osu.edu> wrote:
> I am working with Stata 8. I am working with survey data (DHS data) and am studying the fertility behaviour of families. I have complete birth history data for each family in the sample. I wish to test the following hypothesis: girls have, on average, a larger number of siblings.
>
> This is how I proceed. I calculate the number of boys and girls in each family (*nboy* and *ngirl*); then, I do:
>
> quietly gen alive = nboy + ngirl
> quietly gen sibg = (alive - 1) if ngirl > 0
> quietly gen sibb = (alive - 1) if nboy > 0
>
> Thus, *sibg* is the number of siblings for girls and *sibb* is the number of siblings for boys. Then, I do:
>
> gen smpwt = v005/1000000
> svyset [pweight=smpwt], psu(v021) strata(v022)
>
> svymean sibg, subpop(ngirl)
> matrix t1 = e(b)
> matrix t2 = e(V)
> local t11 = e(N)
>
> svymean sibb, subpop(nboy)
> matrix t3 = e(b)
> matrix t4 = e(V)
> local t33 = e(N)
>
> gen sibeff = t1[1,1] - t3[1,1]
> local g1 = (t1[1,1] - t3[1,1])/sqrt((t2[1,1]/`t11')+(t4[1,1]/`t33'))
>
> Thus, *sibeff* gives me the difference in the average number of siblings for girls and boys, and *g1* gives me the t-statistic for testing whether *sibeff* is significantly different from zero.
>
> The t-statistic I get is much larger than I expected; it is much smaller if I do not correct for the survey design and simply assume that I have a simple random sample. This is making me a little suspicious. My questions:
>
> 1) Am I making any mistake in my computation or reasoning?
> 2) Is there a better way to conduct this t-test?
>
> I looked at: http://www.ats.ucla.edu/STAT/stata/faq/svyttest.htm
> but did not find it useful.
>
> Thanks in advance.
>