Re: st: Biased estimates?

From: "Michael I. Lichter"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Biased estimates?
Date: Wed, 04 Mar 2009 17:30:43 -0500

Mike,

Again, if you've limited your target population to those who are not in low-response areas, you don't have a bias. If you're not imposing this limitation, then yes, the most likely effect on the parameter estimates is that they will be more positive than they otherwise would be, resulting in higher predicted probabilities of success for some or all cases. (You can simulate this, if you like, to see how large the effect is likely to be.) However, given your specific application, does it really matter whether the predicted probability for case XXX is 0.015 or 0.019?
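
A minimal sketch of such a simulation follows, written in Python rather than Stata and using only the standard library. Everything numeric in it is an assumption made for illustration: 200 zip codes with 200 invitees each, zip-level log-odds of joining drawn around -3 (roughly a 5% response rate), and a purge of the lowest 20% of zip codes by true propensity as a stand-in for the real purge, which was based on past response rates. It compares only the overall response rate, which is the fitted probability of an intercept-only logit, so it shows the direction of the shift rather than its size in a model with covariates.

```python
import math
import random


def simulate_purge(n_zips=200, per_zip=200, purge_share=0.2, seed=1):
    """Return (full_rate, kept_rate): response rates with and without a purge."""
    rng = random.Random(seed)
    zips = []
    for _ in range(n_zips):
        # Assumed zip-level propensity: log-odds drawn around -3 (about 5%).
        eta = rng.gauss(-3.0, 0.8)
        p = 1.0 / (1.0 + math.exp(-eta))
        joined = sum(rng.random() < p for _ in range(per_zip))
        zips.append((p, joined))

    # Stand-in for the purge: drop the zips with the lowest true propensity.
    # (The real purge used response rates from earlier years instead.)
    zips.sort(key=lambda z: z[0])
    kept = zips[int(purge_share * n_zips):]

    full_rate = sum(j for _, j in zips) / (n_zips * per_zip)
    kept_rate = sum(j for _, j in kept) / (len(kept) * per_zip)
    return full_rate, kept_rate


def log_odds(rate):
    """Intercept of an intercept-only logit fitted to a sample with this rate."""
    return math.log(rate / (1.0 - rate))


full_rate, kept_rate = simulate_purge()
print(f"all zips:    rate={full_rate:.4f}  intercept={log_odds(full_rate):.3f}")
print(f"purged data: rate={kept_rate:.4f}  intercept={log_odds(kept_rate):.3f}")
```

Changing `purge_share` or the spread of the zip-level log-odds gives a rough sense of how much a purge of this kind could move the estimates under these assumptions.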

Michael

Mike Wazowski wrote:
```thank you michael.  one problem that i am facing is that i have no idea how many low response observations were dropped in the initial mailing as i do not have them in the dataset (we receive data from a third party).  so it is not possible to determine whether and how the dropped cases differ from the retained observations.

is my intuition correct that because the dropped observations are likely nonresponders, then my estimates are biased upward?  that is, students who are less likely to respond are driven into a higher predicted probability of responding?

thank you,

mike

--- On Wed, 3/4/09, Michael I. Lichter <mlichter@buffalo.edu> wrote:

```
```From: Michael I. Lichter <mlichter@buffalo.edu>
Subject: Re: st: Biased estimates?
To: statalist@hsphsun2.harvard.edu
Date: Wednesday, March 4, 2009, 8:32 PM
Mike,

Whether or not you've biased your results by throwing
out cases depends on whether or not those cases differ
systematically from the general population of cases. If they
do, you can (arguably) compensate by giving additional
weight to cases that are like the ones you dropped. For
example, if you dropped 100 cases in very low response areas
but retained 200 cases in moderately low response areas, you
could give those 200 cases each a weight of 1.5 in your
analysis.

On the other hand, if you're not mailing to the very
low response areas in the second round, it was right to
exclude them from the analysis and there's no bias in
your predictions for people in areas that have more than
very low response rates.

In any event, since success is a rare outcome in your study
(only 5%), you might consider using Gary King's rare
events logit (-relogit-) available at
http://GKing.Harvard.Edu.

Michael

Mike Wazowski wrote:
```
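
The weighting arithmetic in the reply quoted above (100 dropped cases, 200 retained similar cases, a weight of 1.5 each) can be sketched as follows. The function name and arguments are hypothetical, and the calculation assumes the number of dropped cases is known, which the follow-up message says is not the case for this dataset.

```python
def compensating_weight(retained_similar, dropped):
    """Weight per retained case so that the retained cases also stand in
    for the dropped cases they are assumed to resemble."""
    if retained_similar <= 0:
        raise ValueError("need at least one retained case to carry the weight")
    if dropped < 0:
        raise ValueError("dropped must be non-negative")
    return (retained_similar + dropped) / retained_similar


# The example from the thread: 200 retained cases represent 300 in total.
weight = compensating_weight(retained_similar=200, dropped=100)
print(weight)
```

With no dropped cases the weight is 1, so the retained cases count only for themselves.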
```Hello Statalisters, I am hoping that somebody can help
me with the following.

I have data on invitations mailed to students to join
an honor society, as well as on who responded (joined the
society).  There are two mailing campaigns: a preliminary
mailing to about 10% of the eligible students, and then a
secondary mailing to the remaining 90% of students.  Since
the response rate is low (around 5%), my task is to build a
predictive model, based on the first round of mailing, of
who is likely to join so that we can minimize the cost of
the second mailing.

The problem is that the data for the first mailing are
already purged of "bad" zip codes - those from
which the response rate was close to zero in the previous
year or two (the data for the second round contain all the
zip codes, although I can delete the "bad" ones
too, if necessary).  I am using a logit model to estimate
coefficients based on the first round and will use those for an
out-of-sample prediction of the probability of response in
the second round.

My question is whether the data purge for the first
mailing biases the results.  I am getting many respondents
classified as very low probability respondents. Are there
any statistical procedures to correct for the deletion of
bad zip codes in the first mailing?

Thank you,

Mike

```
```
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
```-- Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research
Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / E-Mail:
mlichter@buffalo.edu

```
```
--
Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / E-Mail: mlichter@buffalo.edu

```