

re: Re: st: Constructing a variable from standard deviations

From   Maarten buis <[email protected]>
To   [email protected]
Subject   re: Re: st: Constructing a variable from standard deviations
Date   Mon, 22 Nov 2010 19:19:38 +0000 (GMT)

So now Mathijs has three solutions:
[Stas] run one -regress-, -predict- the residuals, and compute the
       standard deviation of these residuals for each occupation separately.
[me]   use maximum likelihood to estimate a model with a different
       residual standard deviation for each occupation.
[Kit]  estimate a separate regression for each occupation, predict the
       residuals, and compute their standard deviation.

The solutions by me and Kit have the advantage that the model explicitly
allows for differences in the residual variance. Stas' solution works, but
it is substantively awkward: it uses a model that assumes that the
residual variance is constant across occupations in order to estimate
differences in the residual variance across occupations.

However, when comparing these methods in the simulation below, all three
seem to perform reasonably well. The bias is larger than one would have
expected given the randomness inherent in Monte Carlo simulations (compare
the biases with the MCSE), but the biases are so small that, based on this
simulation, I would be happy to use any of the three methods.
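For readers who want to check the logic outside Stata, here is a rough
Python analogue of the three estimators on one simulated data set (a
sketch only, not the code used below; the ML fit is done by alternating
weighted least squares with per-group residual variances, which for this
model is coordinate ascent on the likelihood):

```python
import numpy as np

# One simulated data set mirroring the Stata -sim- program below:
# four occupations, true residual sd of occupation 2 is 0.6.
rng = np.random.default_rng(42)
n = 500
occ = rng.integers(1, 5, size=n)
x = rng.standard_normal(n)
sd = np.select([occ == 1, occ == 2, occ == 3], [.2, .6, .4], default=.8)
y = 1 + 2*(occ == 2) + .5*(occ == 3) + 3*(occ == 4) + sd*rng.standard_normal(n)

# Design matrix: intercept, x, and dummies for occupations 2-4
X = np.column_stack([np.ones(n), x] +
                    [(occ == k).astype(float) for k in (2, 3, 4)])

# [Stas] one pooled OLS, then the residual sd within occupation 2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
stas = (y - X @ beta)[occ == 2].std(ddof=1)

# [Kit] a separate OLS using only occupation 2
m = occ == 2
X2 = np.column_stack([np.ones(m.sum()), x[m]])
b2, *_ = np.linalg.lstsq(X2, y[m], rcond=None)
kit = (y[m] - X2 @ b2).std(ddof=1)

# [Maarten] ML for the heteroskedastic normal model, by iterating:
# WLS for the mean given the variances, then per-group mean squared
# residuals for the variances given the mean
sig2 = np.ones(4)
for _ in range(100):
    w = 1.0 / sig2[occ - 1]
    Xw = X * w[:, None]
    b = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    r = y - X @ b
    sig2 = np.array([np.mean(r[occ == k]**2) for k in (1, 2, 3, 4)])
maarten = np.sqrt(sig2[1])

print(stas, kit, maarten)  # all three should land close to 0.6
```
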

The example below requires Ian White's -simsum-, which you can install
by typing -ssc install simsum- in Stata. You can read more about it in
Ian White (2010), "simsum: Analyses of simulation studies including
Monte Carlo error", The Stata Journal, 10(3): 369--385.
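In case -simsum- is unfamiliar: given the vector of simulated estimates,
it reports (among other things) the bias and the Monte Carlo standard
error of that bias. A minimal Python sketch of those two quantities,
using toy draws rather than the actual simulation output:

```python
import numpy as np

# Toy stand-in for the 10,000 simulated estimates of the residual sd
# of occupation 2 (true value 0.6); NOT the real simulation output.
rng = np.random.default_rng(1)
est = 0.6 + 0.04 * rng.standard_normal(10_000)

true = 0.6
bias = est.mean() - true                    # average estimation error
mcse = est.std(ddof=1) / np.sqrt(len(est))  # Monte Carlo se of the bias
print(bias, mcse)
```
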

*--------------------------- begin simulation ------------------------
program drop _all
program define mynormal_lf
	version 11
	args lnfj mu ln_sigma
	quietly replace `lnfj' = ///
                ln(normalden($ML_y1, `mu', exp(`ln_sigma')))
end

program define sim, rclass
// create data
	drop _all
	set obs 500
	gen occ = ceil(4*runiform())
	gen x = rnormal()
	gen sd = cond(occ==1,.2, ///
             cond(occ==2,.6, ///
             cond(occ==3,.4, .8)))
	gen y = 1 + 2*(occ==2) + .5*(occ==3) + ///
            3*(occ==4) + sd*rnormal()

// Stas
	reg y x i.occ
	predict double resid, resid
	sum resid if occ == 2
	return scalar stas = r(sd)

// Maarten
	tempname b0 rmse 
	matrix `b0' = e(b)           
	scalar `rmse' = ln(e(rmse))  

	ml model lf mynormal_lf                            /// 
		(mu: y = x i.occ) ///
		(ln_sigma: i.occ)

	ml init `b0'
	ml init ln_sigma:_cons = `= `rmse' '

	ml maximize

	return scalar maarten = exp( [ln_sigma]_b[2.occ] + ///
                                 [ln_sigma]_b[_cons] )

// Kit
	reg y x if occ==2
	qui predict double eps if e(sample), resid
	sum eps if occ == 2
	return scalar kit = r(sd)
end

simulate stas=r(stas) maarten=r(maarten) kit=r(kit), reps(10000) : sim
simsum stas maarten kit, true(.6) mcse bias
twoway kdensity stas || kdensity maarten || kdensity kit
*--------------------------- end simulation ----------------------------

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

