
Re: st: Biased estimates?

From   "Michael I. Lichter" <>
Subject   Re: st: Biased estimates?
Date   Wed, 04 Mar 2009 15:32:33 -0500


Whether you've biased your results by throwing out cases depends on whether those cases differ systematically from the general population of cases. If they do, you can (arguably) compensate by giving additional weight to retained cases that are like the ones you dropped. For example, if you dropped 100 cases in very low response areas but retained 200 cases in moderately low response areas, you could give those 200 cases each a weight of 1.5 in your analysis, so that together they stand in for all 300 cases (300/200 = 1.5).
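As a rough illustration of the arithmetic in that example, here is a minimal Python sketch. The function name and variable names are hypothetical, and the counts are the illustrative ones from the paragraph above, not figures from the actual data. It assumes the retained cases are a reasonable stand-in for the dropped ones, which is the arguable part.

```python
# Illustrative counts from the example above (not real data).
dropped_very_low = 100   # cases removed from very low response areas
retained_mod_low = 200   # cases kept from moderately low response areas


def compensation_weight(retained: int, dropped: int) -> float:
    """Weight that lets `retained` cases stand in for `retained + dropped` cases."""
    if retained <= 0:
        raise ValueError("need at least one retained case to reweight")
    return (retained + dropped) / retained


weight = compensation_weight(retained_mod_low, dropped_very_low)
print(weight)  # 1.5
```

In Stata, a weight computed this way would normally be stored in a variable and passed to the estimation command as a weight; whether a probability weight is the right type depends on the design.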

On the other hand, if you're not mailing to the very low response areas in the second round, then it was right to exclude them from the analysis, and there's no bias in your predictions for people in areas with more than very low response rates.

In any event, since success is a rare outcome in your study (only 5%), you might consider using Gary King's rare events logit (-relogit-) available at http://GKing.Harvard.Edu.


Mike Wazowski wrote:
Hello Statalisters, I am hoping that somebody can help me with the following.

I have data on invitations mailed to students to join an honor society, as well as on who responded (joined the society).  There are two mailing campaigns: a preliminary mailing to about 10% of the eligible students, and then a secondary mailing to the remaining 90% of students.  Since the response rate is low (around 5%), my task is to build a predictive model, based on the first round of mailing, of who is likely to join so that we can minimize the cost of the second mailing.

The problem is that the data for the first mailing is already purged of "bad" zip codes - those from which the response rate was close to zero in the previous year or two (the data for the second round contains all the zip codes, although I can delete the "bad" ones too, if necessary).  I am using a logit model to estimate coefficients based on the first round, and I use those coefficients for an out-of-sample prediction of the probability of response in the second round.

My question is whether the data purge for the first mailing biases the results.  I am getting many students classified as having a very low probability of responding. Are there any statistical procedures to correct for the deletion of bad zip codes in the first mailing?

Thank you,



Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / E-Mail:
