re: Re: st: coefficient test in different regression models

From   Maarten buis <>
Subject   re: Re: st: coefficient test in different regression models
Date   Mon, 4 Oct 2010 20:44:02 +0100 (BST)

--- On Mon, 4/10/10, Christopher F Baum wrote:
> Maarten suggests estimating the two models by pooling. Not
> a bad idea, but it does impose one additional constraint:
> that the sigma^2 are the same across equations. For that
> reason one should at minimum use robust VCE in that case.
> An alternative is to use -suest-. Notice that you estimate
> the individual equations with classical VCE and apply robust
> on -suest- if desired.
> It might be interesting to do some simulations of
> the two approaches to see where they will agree or differ

That is true, so I made a first stab at such a simulation, in
particular to see whether my "pooled regression" approach still
works when the residual variance actually differs across the
sub-populations. In the simulation below there is virtually no
difference in the point estimates. That is no surprise for the
robust and non-robust pooled regressions, as it is built into the
program (they are the same regression and differ only in how the
standard errors are computed), but as far as I understand it, this
did not have to be true for -suest- (though it does not really
surprise me either).
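
The equality of the point estimates can be checked on a single draw.
The sketch below uses the same data-generating process as the
simulation further down (the scalar names b_a, b_b, dif_suest,
dif_reg and dif_rob are just my own choices for illustration). It
fits the two separate regressions, -suest-, and the pooled regression
with and without robust standard errors, and displays the slope
difference from each. Because c.x##i.d lets both the constant and the
slope differ between the groups, the interaction term should equal
the difference between the two separate slopes up to rounding error.

*------------------------ begin example --------------------------
* one draw from the same data-generating process as the simulation
clear
set seed 12345
set obs 200
gen d = _n <= 100
gen x = rnormal()
gen y = d + x + x*d + .25*(d + 1)*rnormal()

* separate regressions: keep the slopes and store results for -suest-
quietly reg y x if d
est store a
scalar b_a = _b[x]
quietly reg y x if !d
est store b
scalar b_b = _b[x]

* slope difference according to -suest- and the pooled regressions
quietly suest a b
scalar dif_suest = _b[a_mean:x] - _b[b_mean:x]
quietly reg y c.x##i.d
scalar dif_reg = _b[1.d#c.x]
quietly reg y c.x##i.d, vce(robust)
scalar dif_rob = _b[1.d#c.x]

display "separate regressions: " b_a - b_b
display "suest:                " dif_suest
display "pooled, classical:    " dif_reg
display "pooled, robust:       " dif_rob
*------------------------- end example ---------------------------

Only the standard errors, and hence the tests, can differ between
these approaches.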

The area where I expected the method might matter was the test
statistic. The simulation returns the p-values of tests of a true
null hypothesis. These p-values should be uniformly distributed.
That way, if we choose a significance level of .05, we will find a
p-value less than .05 in 5% of the simulations, and if we choose a
significance level of .10, we will find a p-value less than .10 in
10% of the simulations, etc. In other words, we would then get the
correct coverage regardless of what significance level we have
chosen. I checked this with the -hangroot- program, which can be
downloaded from SSC by typing in Stata: -ssc install hangroot-. The
confidence intervals shown in the graphs can then be interpreted as
the region in which we might expect the simulated p-values to fall
due to the randomness inherent in simulation.
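
The link between uniform p-values and the rejection rate can also be
checked numerically, without graphs. The sketch below is only an
illustration with a deliberately simple case: the hypothetical
program -nullp- (my own name, not part of any package) draws 200
observations from a standard normal distribution and uses -ttest- to
test the true null hypothesis that the mean is 0. The share of
simulated p-values below .01, .05, and .10 should then be close to
.01, .05, and .10 respectively.

*------------------------ begin example --------------------------
set seed 12345
capture program drop nullp
program define nullp, rclass
	drop _all
	set obs 200
	* the true mean is 0, so H0: mean = 0 is true
	gen x = rnormal()
	ttest x == 0
	return scalar p = r(p)
end

simulate p = r(p), reps(2000) nodots : nullp

* under a true null the share of p-values below alpha should be near alpha
foreach a in .01 .05 .10 {
	count if p < `a'
	display "alpha = `a': rejection rate = " %5.3f r(N)/_N
}
*------------------------- end example ---------------------------

With 2000 replications the simulation error of each share is about
sqrt(.05*.95/2000), or roughly .005 at alpha = .05, so small
deviations from the nominal level are to be expected.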

What surprised me is that in this simulation the regular
regression without robust standard errors seems to do best. A
possible reason is the sample size: I chose 200 because with that
sample size there is still some random variation, resulting in
more interesting pictures, but robust standard errors and -suest-
rely on asymptotic arguments, and 200 may not be large enough.
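
One way to probe the sample-size explanation is to rerun the tests
with a larger sample. The sketch below is a hypothetical variant of
the -sim- program in the simulation below: the name -simn- and its
nobs() option are my own, and I dropped the point estimates to keep
it short. It reports the rejection rate of the true null hypothesis
at the .05 level for N = 200 and N = 2000. If the explanation is
right, the robust and -suest- tests should come closer to .05 at the
larger sample size; 1000 replications per sample size is an arbitrary
choice to keep the run time down.

*------------------------ begin example --------------------------
set seed 12345
capture program drop simn
program define simn, rclass
	* same data-generating process as -sim-, but nobs() sets the sample size
	syntax [, nobs(integer 200)]
	drop _all
	set obs `nobs'
	gen d = _n <= `nobs'/2
	gen x = rnormal()
	gen y = d + x + x*d + .25*(d + 1)*rnormal()

	* separate regressions combined with -suest-
	quietly reg y x if d
	est store a
	quietly reg y x if !d
	est store b
	quietly suest a b
	test _b[a_mean:x] - _b[b_mean:x] = 1
	return scalar p_suest = r(p)

	* pooled regression, classical and robust VCE
	quietly reg y c.x##i.d
	test _b[1.d#c.x] = 1
	return scalar p_reg = r(p)
	quietly reg y c.x##i.d, vce(robust)
	test _b[1.d#c.x] = 1
	return scalar p_rob = r(p)
end

foreach n in 200 2000 {
	clear
	simulate p_suest=r(p_suest) p_reg=r(p_reg) p_rob=r(p_rob), ///
		reps(1000) nodots : simn, nobs(`n')
	foreach v in p_suest p_reg p_rob {
		count if `v' < .05
		display "N = `n', `v': rejection rate at .05 = " %5.3f r(N)/_N
	}
}
*------------------------- end example ---------------------------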

*------------------------- begin simulation ----------------------
set seed 12345
set more off
program drop _all
program define sim, rclass
	drop _all
	set obs 200
	* d splits the sample into two groups of 100
	gen d = _n <=100
	gen x = rnormal()
	* true slope is 1 if d==0 and 2 if d==1, so the true difference is 1;
	* the residual sd is .25 if d==0 and .5 if d==1
	gen y = d + x + x*d + .25*(d + 1)*rnormal()

	* separate regressions combined with -suest-
	reg y x if d
	est store a
	reg y x if !d
	est store b
	suest a b
	test _b[a_mean:x] - _b[b_mean:x] = 1
	return scalar dif_suest = _b[a_mean:x] - _b[b_mean:x]
	return scalar p_suest = r(p)

	* pooled regression with interaction, classical VCE
	reg y c.x##i.d
	test _b[1.d#c.x] = 1
	return scalar dif_reg = _b[1.d#c.x]
	return scalar p_reg = r(p)

	* pooled regression with interaction, robust VCE
	reg y c.x##i.d, vce(robust)
	test _b[1.d#c.x] = 1
	return scalar dif_rob = _b[1.d#c.x]
	return scalar p_rob = r(p)
end

* each replication tests a true null hypothesis
simulate dif_suest=r(dif_suest) p_suest=r(p_suest) ///
         dif_reg  =r(dif_reg)   p_reg  =r(p_reg)   ///
         dif_rob  =r(dif_rob)   p_rob  =r(p_rob),  ///
         reps(10000) : sim
sum dif*
* compare the distribution of the p-values with the uniform distribution
hangroot p_suest, susp notheor ci dist(uniform) name(suest, replace)
hangroot p_reg, susp notheor ci dist(uniform) name(reg, replace)
hangroot p_rob, susp notheor ci dist(uniform) name(rob, replace)
*----------------------- end simulation --------------------------
(For more on examples I sent to the Statalist see: )

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

