# Re: st: How to test for equality of variance in data with sampling weights

| From | To | Subject | Date |
|---|---|---|---|
| jmetzler@worldbank.org | statalist@hsphsun2.harvard.edu | Re: st: How to test for equality of variance in data with sampling weights | Wed, 30 May 2007 19:01:01 -0400 |

```
Dear Steven and Stas,
once I have tried!
Kind regards,
Johannes

Steven Samuels <ssamuels@albany.edu>
Sent by: owner-statalist@hsphsun2.harvard.edu
05/25/2007 06:10 PM

To: statalist@hsphsun2.harvard.edu
cc:
Subject: Re: st: How to test for equality of variance in data with sampling weights

Johannes wrote to me privately that he has a household indicator, say
hh_id; that household is the only PSU he can identify in the data
set; and that in a single urban setting, there is no stratum
variable. In that case he would set up his analysis with:
"svyset hh_id [pweight=finalwgt]"

As Stas indicated in his recent email, one can compute an SD as a
function of expectations and use either -testnl- or (my choice)
-nlcom-, because:

SD(income) = sqrt( E(income^2) - (E(income))^2 )

However, -nlcom- will produce an erroneous standard error unless the
variable is first centered by subtracting off the mean,
inc = income - mean(income), and then squared, inc2 = inc*inc. Then
E(inc2) = var(income), and sqrt(E(inc2)) estimates the SD of income.

As the SD is apt to have an asymmetric distribution, I suggest that
Johannes estimate the SE for log(SD) and then convert back to
the SD scale.

Johannes actually wants to compare SD's in two groups, assumed to be
Male & Female gender here for illustration.   In that case, I
recommend that he compute a CI for the ratio, rather than for the
difference, and that he do this on the log scale and then convert
back to the ratio scale.

Below is code that should work.  This utilizes the linearization
method.  Possibly Johannes might wish to try a jackknife estimate of
the variance-covariance matrix.
Steve

/*************************** CODE FOLLOWS ***************************/

capture program drop _all

/* First a little program to back transform calculations done on the
log scale after -nlcom- */
program antilog
local lparm  el(r(b),1,1)
local se     sqrt(el(r(V),1,1))
local bound  invttail(e(df_r),.025)*`se'  //For 95% CI's
local parm   exp(`lparm')
local ll     exp(`lparm'  - `bound')
local ul     exp( `lparm' + `bound')
di  "parm =" `parm'  "    ll = " `ll'  "   ul = " `ul'
end

/* Get Estimate of the Mean for each Group */

svy: mean income, over(gender)

/* If gender has value labels (e.g. 1=Male 2=Female) use the
following syntax */
gen     inc=income-[income]Male   if gender==1
replace inc=income-[income]Female if gender==2

/* Use this syntax INSTEAD if gender has no value label, but values
1 & 2 as above. Run only one of the two versions, because -gen-
fails if inc already exists.
gen     inc=income-[income]1 if gender==1
replace inc=income-[income]2 if gender==2
*/

/* Now compute the square term */

gen inc2=inc*inc

svy: mean inc2, over(gender)   //estimate for inc2 is the estimated variance of income

/* Individual SD's. Log Scale */
nlcom  .5*log([inc2]Male)
antilog
nlcom  .5*log([inc2]Female)
antilog

/* CI for the ratio of SD's--No Log */
nlcom sqrt([inc2]Male/[inc2]Female)

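/* CI for ratio of SD's after Log Transformation. Note that
log(A^.5)-log(B^.5) = .5*(log(A)-log(B)). The factor .5 does not
change the t-statistic, but it is kept here so that -antilog-
returns the ratio of SD's rather than the ratio of variances.
The t-statistic is apt to be very different from that of the no-log
version above*/

nlcom .5*log([inc2]Male/[inc2]Female)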
antilog

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
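The steps in Steve's message can be tried end to end on data that ships with Stata. What follows is only a sketch under stated assumptions, not Steve's code: it uses the `nhanes2` example dataset in place of Johannes's data, with `weight` standing in for income and `sex` for gender. The helper names (`gm`, `g_m`, `g_f`, `d2m`, `d2f`, `nb`, `nV`, `s_hw`) are made up for illustration. It also swaps in a slightly different device: each group variance is written as a ratio of two overall means (centered square times group indicator, divided by the group indicator), so the coefficient names passed to -nlcom- do not depend on whether the grouping variable has value labels.

```stata
* Sketch only: NHANES II example data, already svyset (psu, strata, finalwgt)
webuse nhanes2, clear

* Step 1: weighted group means; sex is coded 1 = Male, 2 = Female
svy: mean weight, over(sex)
matrix gm = e(b)               // gm[1,1] = mean for sex==1, gm[1,2] = mean for sex==2

* Step 2: group indicators and centered squares (zero outside the group)
generate byte   g_m = (sex == 1)
generate byte   g_f = (sex == 2)
generate double d2m = g_m * (weight - gm[1,1])^2
generate double d2f = g_f * (weight - gm[1,2])^2

* Step 3: overall means; group variance = mean(d2m)/mean(g_m), same for females
svy: mean d2m d2f g_m g_f

* Step 4: log of the SD ratio (the factor .5 converts variance ratio to SD ratio)
nlcom .5*(ln(_b[d2m]/_b[g_m]) - ln(_b[d2f]/_b[g_f]))
matrix nb = r(b)
matrix nV = r(V)

* Step 5: back-transform with a 95% t-based interval, as -antilog- does
scalar s_hw = invttail(e(df_r), .025) * sqrt(nV[1,1])
display "SD ratio = " exp(nb[1,1]) ///
        "   ll = " exp(nb[1,1] - s_hw) "   ul = " exp(nb[1,1] + s_hw)
```

As in the original code, the group means from step 1 are treated as fixed when the centered squares are formed. A jackknife, as Steve suggests, would be one way to check how much that matters.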