# Re: st: simulate consequences of selection bias 101

From: "Austin Nichols"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: simulate consequences of selection bias 101
Date: Tue, 1 Apr 2008 09:18:00 -0400

```Thomas Gschwend <gschwend@uni-mannheim.de>:

You seem to be using the term "selection bias" in a somewhat
nonstandard way ("virtues of selection bias" is certainly an odd turn
of phrase)--do you have in mind selection on the dependent variable?
Or the classic form of selection bias (selection on unobservables, or
omitted "confounding" variables, leading to endogeneity of X) which
could be modeled as a neglected nonlinearity in X for your case?

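* Simulate y = x^2 + noise, with the x<0 observations heavily overweighted
clear
* Arbitrary seed, added only so the results are reproducible
set seed 12345
range x -3 6 100
expand 80 if x<0
g y=x^2 +invnorm(uniform())
* Full sample: the many x<0 observations pull the slope negative
reg y x
* Selection on the dependent variable: the slope turns positive
reg y x if y>10
* Selection on x: the slope is positive here too
reg y x if x>0
* Local polynomial smooth shows the neglected curvature
lpoly y x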

In this simple case, the omitted variable is clearly just X^2.
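
* Sketch, not part of the original post: the data are regenerated so
* that these lines run on their own, and x2 is a hypothetical variable
* name. With the squared term included, the coefficient on x should
* sit near 0 and the coefficient on x2 near 1, both in the full sample
* and for x>0.
clear
range x -3 6 100
expand 80 if x<0
g y=x^2 +invnorm(uniform())
g x2=x^2
reg y x x2
reg y x x2 if x>0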

See SJ7(4):507-541
[http://www.stata-journal.com/article.html?article=st0136] for an
inventory of common solutions for endogeneity of X.

A nice example of sign reversal due to omitted variables that students
can easily understand is given in Julious and Mullee (1994) citing
Charig et al. (1986):

Tell students they each have a kidney stone. In past cases, treatment
OS (open surgery) had a success rate of 78% while treatment PN
(percutaneous nephrolithotomy) had a success rate of 83%. Ask
them which treatment they would choose.  Now tell them the success
rates look rather different when stone size is taken into account. For
smaller stones (diameter <2 cm), 93% of cases treated with OS were
successful compared with just 87% of cases treated with PN. For larger
stones (diameter >=2 cm), the success rate of OS was 73% and the
success rate of PN was 69%.  Now which would they choose, even not
knowing which size stone they have?

Always good to put death on the table as a possible outcome of omitted
variables bias in regression.

Steven A. Julious and Mark A. Mullee. 1994. "Confounding and Simpson's
paradox". British Medical Journal 309(6967): 1480–1481.
[http://www.bmj.com/cgi/content/full/309/6967/1480]

C. R. Charig, D. R. Webb, S. R. Payne, O. E. Wickham. 1986.
"Comparison of treatment of renal calculi by operative surgery,
percutaneous nephrolithotomy, and extracorporeal shock wave
lithotripsy". British Medical Journal 292 (6524): 879–882.
[http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=3083922]

On Tue, Apr 1, 2008 at 3:32 AM, Thomas Gschwend
<gschwend@uni-mannheim.de> wrote:
> Dear all,
>  prompted by a student's question when teaching about the virtues of
>  selection bias I would like to simulate some data which fulfills the
>  following requirements, whereby Y = b0 + b1*X
>
>  1)      When regressing Y on X (for the full sample)
>  b1 = -.5 and significantly < 0
>
>  2)      When regressing Y on X (for a subsample, say for Y > 10)
>  b1 = +2 and significantly > 0
>
>  I am not sure how to simulate data that fulfills both requirements.
>
>  Any help is greatly appreciated.
>
>  Thomas

```
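
The kidney stone reversal described in the post can be checked with a few lines of Stata. This is only a sketch: the success and case counts below are the ones commonly reported from Charig et al. (1986) and do not appear in the post itself, so treat them as an assumption, and the variable names (`size`, `treat`, `success`, `cases`, `rate`) are made up for the illustration.

```stata
* Counts (successes, cases) as commonly reported from Charig et al. (1986);
* they are an assumption here, since the post quotes only percentages.
clear
input str5 size str2 treat success cases
"small" "OS"  81  87
"small" "PN" 234 270
"large" "OS" 192 263
"large" "PN"  55  80
end

* Success rate within each stone-size stratum: OS is ahead in both
gen rate = 100*success/cases
list size treat success cases rate, sepby(size) noobs

* Pool over stone size: PN now looks better, because most PN cases
* were small stones while most OS cases were large stones
collapse (sum) success cases, by(treat)
gen rate = 100*success/cases
list treat success cases rate, noobs
```

The first listing compares treatments within each stone size and the second pools over size. Stone size plays the role of the omitted variable: it is related both to which treatment was chosen and to the chance of success.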