Statalist The Stata Listserver


st: Re: Correction: How to test for equality of variance in data with sampling weights

From   Steven Samuels
Subject   st: Re: Correction: How to test for equality of variance in data with sampling weights
Date   Fri, 25 May 2007 22:25:44 -0400

The code I posted previously had an error; the corrected version is below.
Yohannes wrote to me privately that he has a household indicator, say hh_id; that household is the only PSU he can identify in the data set; and that in a single urban setting, there is no stratum variable. In that case he would set up his analysis with:
"svyset hh_id [pweight=finalwgt]"

As Stas indicated in his recent email, one can compute an SD as a function of expectations and use either -testnl- or (my choice) -nlcom-, because:

SD(income) = sqrt( E(income^2) - (E(income))^2 )

However, -nlcom- will produce an erroneous standard error unless the variable is first centered by subtracting off the mean, inc = income - mean(income), and then squared, inc2 = inc*inc. Then E(inc2) = var(income), and sqrt(E(inc2)) estimates the SD of income.
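For reference, the centering step can be written out as follows. This is only a sketch of the algebra; mu is my shorthand for E(income), and in practice it is replaced by the estimated group mean.

```latex
\begin{align*}
\text{inc} &= \text{income} - \mu, \qquad \mu = E(\text{income}) \\
E(\text{inc}^2) &= E(\text{income}^2) - 2\mu\,E(\text{income}) + \mu^2
                 = E(\text{income}^2) - \mu^2
                 = \operatorname{Var}(\text{income}) \\
\operatorname{SD}(\text{income}) &= \sqrt{E(\text{inc}^2)}
\end{align*}
```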

As the SD is apt to have an asymmetric distribution, I suggest that Johannes estimate the SE for log(SD) and then convert back to the SD scale.
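The log-scale interval can be sketched with the usual first-order (delta-method) approximation. The symbols here are my own notation: V-hat is the estimated variance (the estimate of E(inc2)), theta-hat is the log of the estimated SD, and df is the design degrees of freedom.

```latex
\begin{align*}
\hat\theta &= \log \widehat{SD} = \tfrac{1}{2}\log \hat V \\
\widehat{SE}(\hat\theta) &\approx \frac{\widehat{SE}(\hat V)}{2\,\hat V} \\
\text{95\% CI for the SD:}\quad
  &\exp\!\left(\hat\theta \pm t_{0.975,\,df}\;\widehat{SE}(\hat\theta)\right)
\end{align*}
```

Because the limits are exponentiated, the resulting interval is asymmetric about the estimated SD and cannot go below zero.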

Johannes actually wants to compare SD's in two groups, taken here, for illustration, to be Male and Female. In that case, I recommend that he compute a CI for the ratio, rather than for the difference, and that he do this on the log scale and then convert back to the ratio scale.
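For the two-group comparison, the same first-order approximation gives the following sketch. V_M and V_F are my notation for the estimated variances in the two groups; the covariance term is kept because both estimates come from the same survey.

```latex
\begin{align*}
\hat\rho &= \log\frac{\widehat{SD}_M}{\widehat{SD}_F}
          = \tfrac{1}{2}\left(\log \hat V_M - \log \hat V_F\right) \\
\widehat{\operatorname{Var}}(\hat\rho) &\approx \tfrac{1}{4}\left[
   \frac{\widehat{\operatorname{Var}}(\hat V_M)}{\hat V_M^{2}}
 + \frac{\widehat{\operatorname{Var}}(\hat V_F)}{\hat V_F^{2}}
 - \frac{2\,\widehat{\operatorname{Cov}}(\hat V_M,\hat V_F)}{\hat V_M \hat V_F}
 \right] \\
\text{95\% CI for the ratio of SD's:}\quad
  &\exp\!\left(\hat\rho \pm t_{0.975,\,df}\,
   \sqrt{\widehat{\operatorname{Var}}(\hat\rho)}\right)
\end{align*}
```

An interval for the ratio that excludes 1 corresponds to rejecting equality of the two SD's at the 5% level.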

Below is code that should work. It uses the linearization method. Johannes might also wish to try a jackknife estimate of the variance-covariance matrix.

/***************************CODE FOLLOWS*********************************************/

capture program drop _all

/* First a little program to back transform calculations done on the log scale after -nlcom- */
/* Run it immediately after -nlcom-: it reads r(b) and r(V) left by -nlcom- and e(df_r) left by the -svy- command */
program antilog
    local lparm = el(r(b),1,1)
    local se = sqrt(el(r(V),1,1))
    local bound = invttail(e(df_r),.025)*`se' //For 95% CI's
    local parm = exp(`lparm')
    local ll = exp(`lparm' - `bound')
    local ul = exp(`lparm' + `bound')
    di "parm = " `parm' " ll = " `ll' " ul = " `ul'
end

/* Get Estimate of the Mean for each Group */

svy: mean income, over(gender)

/* If gender has value labels (e.g. 1=Male 2=Female) use the following syntax */
gen inc=income-[income]Male if gender==1
replace inc=income-[income]Female if gender==2

/* Use this syntax INSTEAD (run only one of the two versions) if gender has no value label, but values 1 & 2 as above */
gen inc=income-[income]1 if gender==1
replace inc=income-[income]2 if gender==2

/* Now compute the square term */

gen inc2=inc*inc

svy: mean inc2, over(gender) //estimate for inc2 is the estimated Variance of income

/* Individual SD's. Log Scale. -antilog- converts each result back to the SD scale */
nlcom .5*log([inc2]Male)
antilog
nlcom .5*log([inc2]Female)
antilog

/* CI for the ratio of SD's--No Log */
nlcom sqrt([inc2]Male/[inc2]Female)

/* CI for ratio of SD's after Log Transformation. Omit 0.5 for ratio of Variances.
The t-statistic is apt to be very different from that of the no-log version above */

nlcom 0.5*log([inc2]Male/[inc2]Female)
antilog

/*--------END CODE-----------*/
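
The jackknife alternative mentioned before the code could be sketched as follows. This is a minimal sketch, not tested on the actual data: it assumes the -svyset- declaration and the inc2 variable created above, and it treats the group means used for centering as fixed rather than recomputing them in each replicate.

```stata
/* Sketch only: jackknife variance estimation in place of linearization.
   Assumes svyset has been run and inc2 has been created as above. */
svy jackknife: mean inc2, over(gender)

/* Same log-scale ratio of SD's as before, now based on the jackknife VCE */
nlcom 0.5*log([inc2]Male/[inc2]Female)
```

With households as the PSUs, the jackknife drops one household per replicate, so the run time grows with the number of households.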
