Re: st: RE: How to calculate standardized difference in means with survey weighted data?

From   Steve Samuels
Subject   Re: st: RE: How to calculate standardized difference in means with survey weighted data?
Date   Mon, 5 Mar 2012 11:39:02 -0500

The post that Shaun refers to was from Jeff Pitblado at:
(Shaun: as per the FAQ, please always give the URL for references.)

There are many ways to do this computation.  The "pooled" SD is simply the
 estimated population SD, so calculation of the subpopulation SDs is not needed.
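
For reference, the quantity computed in the code that follows can be written out as below. The notation is my own shorthand rather than anything from the earlier posts: w_i is the survey weight, g = 0, 1 indexes the two groups, and \hat{\sigma} stands for the estimated population SD that -estat sd- reports.

```latex
\[
\bar{x}_g = \frac{\sum_{i \in g} w_i x_i}{\sum_{i \in g} w_i},
\qquad
d = \frac{\bar{x}_1 - \bar{x}_0}{\hat{\sigma}}
\]
```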

Try the following with one variable. When you understand the output,
use the -quietly- block (now commented out) to suppress all but the
display of standardized differences.

*************CODE BEGINS*************
sysuse auto, clear
svyset _n [pw = turn]
// set up variable list
local vlist length price weight headroom

foreach v of varlist `vlist'{
// qui {
svy: mean `v'

estat sd  //population SD: available in Stata 10
local sd = el(r(sd),1,1)
// or  local sd  = sqrt(e(N)*el(e(V_srs),1,1))  

svy: mean `v', over(foreign)
local sdiff1 = (_b[`v':Foreign] - _b[`v':Domestic])/`sd'

// Difference in means with less typing, more output
svy: reg `v' foreign
local sdiff2 = _b[foreign]/`sd'
di `sdiff1' " " `sdiff2'  //same
// }  end quiet block

di _column(3)  "Stand. Diff " "`v'" _column(25) " = " _column(28) `sdiff1'
}  // end foreach loop
**************CODE ENDS**************
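
Since the computation has to be run twice (before and after the propensity score weighting), one option is to wrap it in a small program and call it once per weight variable. The following is only a sketch: the program name -stdiff- and its options group() and wvar() are made up for illustration, and group() is assumed to be a 0/1 indicator. It uses the -svy: regress- route so that it does not depend on how -svy: mean, over()- labels its coefficients, which I believe differs across Stata releases.

```stata
// Hypothetical helper: -stdiff- and its option names are illustrative only
capture program drop stdiff
program define stdiff
    syntax varlist(numeric), GRoup(varname) WVar(varname)
    quietly svyset _n [pw = `wvar']
    foreach v of varlist `varlist' {
        // estimated population SD of `v'
        quietly svy: mean `v'
        quietly estat sd
        local sd = el(r(sd), 1, 1)
        // weighted difference in means = coefficient on the 0/1 group indicator
        quietly svy: regress `v' `group'
        local sdiff = _b[`group'] / `sd'
        display _column(3) "Stand. Diff `v'" _column(25) " = " _column(28) `sdiff'
    }
end

sysuse auto, clear
stdiff length price weight headroom, group(foreign) wvar(turn)
```

Calling -stdiff- a second time with the propensity-score weight variable in wvar() would give the "after" column.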


On Mar 5, 2012, at 5:49 AM, Scholes, Shaun wrote:

This was a lot trickier than I thought...and I cannot answer with 100% confidence (or maybe not even 95%). 
But I found a posting called "RE: st: svy: mean and descriptive tables " from July 2007.

Following that posting, I was able to replicate the results of -estat sd- as follows (I am using Stata SE v12.0):

webuse nhanes2
svyset psu [pw=finalwgt], strata(strata)
svy: mean smsa height weight bpsystol bpdiast tcresult tgresult
estat sd

// e(N) is a scalar: the unweighted number of observations

di sqrt(e(N) * el(e(V_srs), 1, 1))
di sqrt(e(N) * el(e(V_srs), 2, 2))
di sqrt(e(N) * el(e(V_srs), 7, 7))

So the quantities you may (I stress may) need are the scalar e(N) and the diagonal of e(V_srs), the variance matrix computed as if the data were a simple random sample. (The reported standard errors come from e(V), not e(V_srs).)
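
To make the comparison easier to read, the same check can be looped over all the variables, printing the -estat sd- value next to the hand-computed one. This is just a sketch with a shorter variable list; the matrix name SD is arbitrary.

```stata
webuse nhanes2, clear
svyset psu [pw=finalwgt], strata(strata)
svy: mean height weight bpsystol
estat sd
matrix SD = r(sd)          // 1 x k row vector of SDs from -estat sd-
local k = colsof(SD)
forvalues j = 1/`k' {
    // left: -estat sd- value; right: sqrt(e(N) * j-th diagonal of e(V_srs))
    display el(SD, 1, `j') "   " sqrt(e(N) * el(e(V_srs), `j', `j'))
}
```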

Then I tried the same in the context of using -svy- to estimate mean BMI over two groups in this simple example:

clear
input id bmi group
1	25.1	1
2	26.2	2
3	26.4	2
4	25.4	1
5	25.7	2
6	25.3	1
7	26.3	2
8	26.3	1
9	25.9	2
10	24.1	2
11	25.6	1
12	25.4	1
13	28.3	2
14	27.2	1
15	26.2	2
16	26.2	1
17	28.2	1
18	27.1	2
19	28.1	1
20	28.4	2
end

gen weight=1
svyset [pweight=weight]
svy: mean bmi, over(group)
estat sd
mat A = e(_N)
di sqrt(A[1,1]* el(e(V_srs), 1, 1))
di sqrt(A[1,2]* el(e(V_srs), 2, 2))

If this is along the right lines, then perhaps you could adapt the programmes in the July 2007 posting to put all of these into tables?
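
Another possible route to the group-specific SDs is to pick them up from the results that -estat sd- leaves behind in r(mean) and r(sd), addressing the columns by position. The sketch below uses the auto data rather than the BMI example, and the "pooled" SD shown (square root of the average of the two group variances) is only one common convention, assumed here for illustration.

```stata
sysuse auto, clear
svyset _n [pw = turn]
svy: mean price, over(foreign)
estat sd
matrix M = r(mean)         // 1 x 2: group means, in order of the over() levels
matrix S = r(sd)           // 1 x 2: group SDs, same order
local diff   = M[1,2] - M[1,1]
// assumed pooling convention: average of the two group variances
local pooled = sqrt((S[1,1]^2 + S[1,2]^2) / 2)
display "Standardized difference = " `diff' / `pooled'
```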

Hope this helps 
Best wishes

-----Original Message-----
From: [] On Behalf Of Lok Wong
Sent: 05 March 2012 00:46
Subject: st: How to calculate standardized difference in means with survey weighted data?

I need to calculate the standardized bias (the difference in means divided by the pooled standard deviation) with survey weighted data using Stata.
I am comparing the means of 2 groups (Y: treatment and control) for a list of X predictor variables. The purpose is to evaluate differences before and after propensity score weighting (not matching so I cannot use PSMATCH2 or other similar packages).

This is as far as I got:

svy: mean X, over (Y)
estat sd
lincom [X]1 - [X]0

I calculated the means by treatment/control group, then obtained the standard deviation for each mean (since svy: mean reports only the SE). I used lincom to obtain the difference in means from the svy post-estimation results.

How do I now extract the stored standard deviations for the 2 means, so I can divide the difference in means by the pooled standard deviation?

I need to do this twice for 20 variables, so I don't want to just read the output results and calculate by hand.

Any suggestions?

I did see an earlier posting (from 2005) on Standardized Response Mean, but the suggested code (diff in change score / sd of change score) does not address how to use survey weighted data.


Lok Wong Samson
Doctoral Candidate