Re: st: subpops vs. over & lincom t vs. regress t in svyset data

From   Stas Kolenikov
Subject   Re: st: subpops vs. over & lincom t vs. regress t in svyset data
Date   Sun, 22 Jan 2012 23:01:03 -0500

I thought it was answered, and here's what I remember from that answer
(and what I understand about the problem). The option -over(whatever)-
omits the missing values in the -whatever- variable, essentially
working as the -if !missing(whatever)- filter:

sysuse auto, clear
svyset [pw=weight]
svy : mean mpg
svy : mean mpg, over( rep78 )
svy , subpop( if rep78 == 1 ) : mean mpg

Note the difference in the number of observations (and hence
population size), as well as the standard errors, between the last two
commands, even though they report the same point estimate for the
rep78 == 1 group. The -over()- standard error can be reproduced by
brute force with

svy , subpop( if rep78 == 1 ) : mean mpg if !missing( rep78 )

although one shall never use -if- with -svy-! See for a very clear
explanation of the issues involved; it makes no sense for me to repeat
these arguments on the list.
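
As a quick check of the above, one can compare the stored sample sizes
after each command. This is only a sketch: it assumes the usual stored
results e(N) and e(N_sub) are available after -svy: mean-, and it
reuses the auto data purely for illustration.

sysuse auto, clear
svyset [pw=weight]
* how many observations have a missing grouping variable?
count if missing(rep78)
* -over()-: the estimation sample should exclude the missing rep78 cases
svy : mean mpg, over( rep78 )
display "N with over():   " e(N)
* -subpop()-: all observations stay in the estimation sample
svy , subpop( if rep78 == 1 ) : mean mpg
display "N with subpop(): " e(N) ", of which in subpop: " e(N_sub)

If the two values of e(N) differ by the count of missing values, the
missing cases in the grouping variable are the likely source of the
discrepancy in the standard errors.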

On Tue, Jan 3, 2012 at 1:14 PM, Michael Costello wrote:
> Happy New Year Statalisters!
> I'm working with many many similar survey weighted datasets of
> international education data.  Often I am tasked with creating tables
> of statistics (means, variances, counts, t-statistics, effect size,
> etc.) for many subpopulations and over several phases (baseline,
> midterm, final).
> We had been calculating our statistics using -svy: mean varname,
> over(subpops)- rather than running many -svy, subpop(subpops): mean
> varname- commands in quick succession, as the returned values were
> equal.  In a more recent database, the values are not equal, and I'm
> wondering why that is.  The subpopulation I was working with was
> gender (female=1, male=0).  Could the discrepancies be due to the
> handful of observations with gender = . (missing), or is there some
> other difference in the calculations?  It appears that using the
> -subpop- option treats those observations as non-existent.  How does
> -over- treat them?
> I'm also trying to find out the difference between the t-statistic
> that is printed when I run -lincom- and the t-statistic that is
> printed when I run -regress-.  For example:
> svy: regress score gender
> vs.
> svy: mean score, over(gender)
> lincom [score]Male - [score]Female
> I believe that -regress- uses a pooled standard error (SE), while
> -lincom- uses an unpooled calculation, but I was hoping for some
> confirmation on that.
> Thanks so much for all your help and advice!  You folks are always so
> helpful and informative.
> -Michael
> --
> Michael Costello
> "To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of."  -Sir Ronald Aylmer Fisher,

Stas Kolenikov, also found at
Small print: I use this email account for mailing lists only.
