
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: RE: subpops vs. over & lincom t vs. regress t in svyset data

From   "Scholes, Shaun" <>
To   "" <>
Subject   st: RE: subpops vs. over & lincom t vs. regress t in svyset data
Date   Tue, 10 Jan 2012 10:02:24 +0000

Michael, your first question is dealt with in Stata's (version 12) survey data manual (pp. 62-63). The over() option excludes observations with missing values of the over variable (here, gender == .).
Whether the subpop() option excludes them depends on how the variable used to define the subpopulation handles missing observations.
Using the manual's example, if some observations have missing race, then these two alternative definitions:

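generate nonblack = (race == 0) if !missing(race)   // first approach: nonblack is missing wherever race is missing
generate nonblack = (race == 0)                     // second approach: nonblack is 0 wherever race is missing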

will give different SEs when using:

svy, subpop(nonblack): mean birthwgt

The manual clearly states that the first approach should be used (observations with missing values then make no contribution to the variance).
With nonblack defined that way, there should be no difference in the results for the nonblack group between svy: mean birthwgt, over(race) and svy, subpop(nonblack): mean birthwgt.
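The following is a sketch of the standard ratio form of a subpopulation mean under Taylor linearization, which is Stata's default variance method for -svy-. It shows the arithmetic behind the subpopulation discussion. The notation is ours, not the manual's: w_j is the sampling weight, \delta_j is the subpopulation indicator and z_j is the linearized score.

```latex
\hat{\bar{y}}_S = \frac{\sum_j w_j \,\delta_j\, y_j}{\sum_j w_j \,\delta_j},
\qquad
\delta_j =
\begin{cases}
1 & \text{if observation } j \text{ is in the subpopulation} \\
0 & \text{otherwise}
\end{cases}

z_j = \frac{\delta_j \left( y_j - \hat{\bar{y}}_S \right)}{\hat{W}_S},
\qquad
\hat{W}_S = \sum_j w_j \,\delta_j
```

Observations outside the subpopulation have z_j = 0, but they remain in the sample, so their PSUs and strata still enter the variance computation. This is the difference from simply dropping them with an if qualifier. With over(race), each level's mean is the same ratio, with \delta_j set by that level of race.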

I am better able to answer the first question than the second, but I note that the following code gives me identical SEs and t-statistics:

svy:regress birthwgt race
svy:mean birthwgt,over(race)
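The point estimates in this comparison agree for a simple algebraic reason. The sketch below assumes that race is a 0/1 indicator, as the nonblack definition above suggests, and writes it as x_j. The symbols are ours. In a weighted least-squares fit with an intercept and a single 0/1 regressor, the fitted values are the weighted group means, so the slope is their difference:

```latex
y_j = \beta_0 + \beta_1 x_j + e_j, \qquad x_j \in \{0, 1\}

\bar{y}_{(g)} = \frac{\sum_{j:\,x_j = g} w_j\, y_j}{\sum_{j:\,x_j = g} w_j}, \qquad g \in \{0, 1\}

\hat{\beta}_0 = \bar{y}_{(0)}, \qquad \hat{\beta}_1 = \bar{y}_{(1)} - \bar{y}_{(0)}
```

So the regression coefficient is the same contrast that -lincom- forms from the -over()- means, up to sign depending on which group is subtracted. This covers only the point estimate; the matching SEs are the empirical observation reported above.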

Best wishes

-----Original Message-----
From: [] On Behalf Of Michael Costello
Sent: 03 January 2012 19:15
To: statalist
Subject: st: subpops vs. over & lincom t vs. regress t in svyset data

Happy New Year Statalisters!

I'm working with many similar survey-weighted datasets of international education data.  Often I am tasked with creating tables of statistics (means, variances, counts, t-statistics, effect sizes, etc.) for many subpopulations and over several phases (baseline, midterm, final).

We had been calculating our statistics using -svy: mean varname, over(subpops)- rather than running many -svy, subpop(subpops): mean varname- commands in quick succession, as the returned values were equal.  In a more recent database, the values are not equal, and I'm wondering why.  The subpopulation variable I was working with was gender (female=1, male=0).  Could the discrepancies be due to the handful of observations with gender = . (missing), or is there some other difference in the calculations?  It appears that the -subpop- option treats those observations as nonexistent.  How does -over()- treat them?

I'm also trying to find out the difference between the t-statistic that is printed after a -lincom- command and the t-statistic that is printed after a -regress- command.  For example:

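svy: regress score gender                 // coefficient on gender: female (1) mean minus male (0) mean
svy: mean score, over(gender)             // separate weighted means for each gender
lincom [score]Male - [score]Female        // male mean minus female mean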

I believe that -regress- uses a pooled standard error (SE), while -lincom- uses an unpooled calculation, but I was hoping for some confirmation on that.
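For reference, these are the textbook pooled and unpooled (Welch-type) two-sample t statistics for unweighted data from a simple random sample. The notation is ours: group means \bar{y}_g, variances s_g^2 and sizes n_g for g = 0, 1. Under -svy-, Stata computes design-based variances instead, so neither formula applies directly to survey-weighted output.

```latex
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_0 - 1)\, s_0^2}{n_1 + n_0 - 2}

t_{\text{pooled}} = \frac{\bar{y}_1 - \bar{y}_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_0}}}

t_{\text{unpooled}} = \frac{\bar{y}_1 - \bar{y}_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}}
```

In that unweighted setting, ordinary (non-svy) -regress- on a 0/1 regressor reproduces the pooled statistic, because it assumes a common error variance for both groups.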

Thanks so much for all your help and advice!  You folks are always so helpful and informative.

Michael Costello

"To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of."  -Sir Ronald Aylmer Fisher, FRS

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC