
Re: st: simulate consequences of selection bias 101

From   Thomas Gschwend
Subject   Re: st: simulate consequences of selection bias 101
Date   Tue, 01 Apr 2008 15:47:35 +0200

I meant to say "selection on the dependent variable". I wanted to let the students see that we might even get a sign flip if we select on the dependent variable and run a regression of the range-restricted Y on X.

Thanks for the references.


Austin Nichols wrote:

Thomas Gschwend:

You seem to be using the term "selection bias" in a somewhat
nonstandard way ("virtues of selection bias" is certainly an odd turn
of phrase)--do you have in mind selection on the dependent variable?
Or the classic form of selection bias (selection on unobservables, or
omitted "confounding" variables, leading to endogeneity of X) which
could be modeled as a neglected nonlinearity in X for your case?

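clear
* seed value is arbitrary; set only so the draws are reproducible
set seed 12345
* 100 grid points for x between -3 and 6
range x -3 6 100
* 80 copies of each observation with x<0, so the x<0 region dominates
expand 80 if x<0
* the true relationship is quadratic in x
g y=x^2 +invnorm(uniform())
reg y x
reg y x if y>10
reg y x if x>0
lpoly y x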

In this simple case, the omitted variable is clearly just X^2.
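A minimal sketch of that point, reusing the data-generating lines above. The seed value and the variable name x2 are illustrative choices, not part of the original post. If the omitted variable really is X^2, adding it to the regression should give a coefficient near 1 on x2 and near 0 on x.

```stata
clear
* seed value is arbitrary; set only so the draws are reproducible
set seed 2008
range x -3 6 100
expand 80 if x<0
g y = x^2 + invnorm(uniform())
* x2 is a hypothetical name for the omitted squared term
g x2 = x^2
* linear fit: the slope on x is driven by the heavily replicated x<0 region
reg y x
* with the omitted term included, the fit matches how y was generated
reg y x x2
```

Comparing the two sets of estimates side by side makes the omitted-variable story concrete before moving on to selection on y.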

See SJ7(4):507-541 for an
inventory of common solutions for endogeneity of X.

A nice example of sign reversal due to omitted variables that students
can easily understand is given in Julious and Mullee (1994) citing
Charig et al. (1986):

Tell students they each have a kidney stone. In past cases, treatment
OS (open surgery) had a success rate of 78% while treatment PN
(percutaneous nephrolithotomy) had a success rate of 83% overall.  Ask
them which treatment they would choose.  Now tell them the success
rates look rather different when stone size is taken into account. For
smaller stones (diameter <2 cm), 93% of cases treated with OS were
successful compared with just 87% of cases treated with PN. For larger
stones (diameter >=2 cm), the success rate of OS was 73% and the
success rate of PN was 69%.  Now which would they choose, even without
knowing which size stone they have?
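The reversal can be checked by hand in Stata. This is only a sketch: the cell counts below are the ones usually quoted from Charig et al. (1986) and are not given in this thread, so treat them as an assumption and verify them against the paper. The variable names are made up for the illustration.

```stata
clear
* counts as commonly reported from Charig et al. (1986); verify before use
input str2 treat str5 size success n
"OS" "small"  81  87
"OS" "large" 192 263
"PN" "small" 234 270
"PN" "large"  55  80
end
* success rate (percent) within each treatment-by-stone-size cell
g rate = 100*success/n
list treat size success n rate, sepby(treat)
* pool over stone size: sum successes and cases by treatment
collapse (sum) success n, by(treat)
g rate = 100*success/n
list
```

With these counts OS comes out ahead within each stone size (81/87 vs 234/270, and 192/263 vs 55/80), while PN comes out ahead after pooling (289/350 vs 273/350), because most PN cases were small stones and most OS cases were large ones.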

Always good to put death on the table as a possible outcome of omitted
variables bias in regression.

Steven A. Julious and Mark A. Mullee. 1994. "Confounding and Simpson's
paradox". British Medical Journal 309(6967): 1480–1481.

C. R. Charig, D. R. Webb, S. R. Payne, and J. E. A. Wickham. 1986.
"Comparison of treatment of renal calculi by open surgery,
percutaneous nephrolithotomy, and extracorporeal shockwave
lithotripsy". British Medical Journal 292(6524): 879–882.

On Tue, Apr 1, 2008 at 3:32 AM, Thomas Gschwend wrote:
Dear all,
 prompted by a student's question when teaching about the virtues of
 selection bias I would like to simulate some data which fulfills the
 following requirements, whereby Y = b0 + b1*X

 1)      When regressing Y on X (for the full sample)
 b1 = -.5 and significantly < 0

 2)      When regressing Y on X (for a subsample, say for Y > 10)
 b1 = +2 and significantly > 0

 I am not sure how to simulate data that fulfills both requirements.

 Any help is greatly appreciated.


Thomas Gschwend
Professor for Quantitative Methods in the Social Sciences
Center for Doctoral Studies in Social & Behavioral Sciences (CDSS)
Graduate School of Economic & Social Sciences (GESS)
University of Mannheim
68131 Mannheim
0621.181.2087 (direct)
0621.181.2414 (assistant)
0621.181.3699 (fax)

© Copyright 1996–2014 StataCorp LP