
RE: st: Biased estimates?

From   "Lachenbruch, Peter"
Subject   RE: st: Biased estimates?
Date   Wed, 4 Mar 2009 13:35:53 -0800

The ideas behind marginal structural models may also be of some help.
These models estimate the probability of response from predictor
variables and then weight each case by the inverse of that estimated
probability. You may need to do a bit of tinkering to adapt them to your
problem. Try the Current Index to Statistics for references. I don't
have the references at hand, but Babette Brumback and James Robins have
published on this.
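A minimal sketch of the inverse-probability weighting step that marginal structural models rely on, written in Python rather than Stata. It assumes a logit model for the probability that a case was retained has already been fitted elsewhere; the intercept, coefficient, and predictor values below are hypothetical placeholders, not estimates from anyone's data. Each retained case gets weight 1/p, which is only defined when p > 0, so cases that had no chance at all of being retained cannot be stood in for this way.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def inclusion_probabilities(rows, intercept, coefs):
    """Pr(case retained | predictors) from a logit fitted elsewhere.

    intercept and coefs are hypothetical inputs supplied by the caller.
    """
    return [
        logistic(intercept + sum(b * x for b, x in zip(coefs, row)))
        for row in rows
    ]


def inverse_probability_weights(probs):
    """Weight each retained case by 1 / Pr(retained)."""
    weights = []
    for p in probs:
        if not 0.0 < p <= 1.0:
            # A zero probability means the case could never be observed,
            # so no finite weight can represent it.
            raise ValueError(f"inclusion probability {p} is not in (0, 1]")
        weights.append(1.0 / p)
    return weights


def weighted_mean(values, weights):
    """Weighted average, e.g. of a 0/1 response indicator."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: one predictor per case and a 0/1 response indicator.
rows = [[0.0], [2.0], [-1.0]]
probs = inclusion_probabilities(rows, intercept=0.0, coefs=[1.0])
weights = inverse_probability_weights(probs)
responded = [1, 0, 0]
print(weighted_mean(responded, weights))
```

Cases that were unlikely to be retained get large weights, so a handful of them can dominate the estimate; in practice people often check the spread of the weights before relying on them.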


Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
On Behalf Of Michael I. Lichter
Sent: Wednesday, March 04, 2009 12:33 PM
Subject: Re: st: Biased estimates?


Whether or not you've biased your results by throwing out cases depends 
on whether or not those cases differ systematically from the general 
population of cases. If they do, you can (arguably) compensate by giving
additional weight to cases that are like the ones you dropped. For 
example, if you dropped 100 cases in very low response areas but 
retained 200 cases in moderately low response areas, you could give 
those 200 cases each a weight of 1.5 in your analysis.
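The 1.5 comes from asking the 200 retained moderately-low-response cases to stand in for themselves plus the 100 dropped cases: (200 + 100) / 200 = 1.5. A small Python sketch of that arithmetic follows; the group labels and the 700-case "other" group are made up for illustration.

```python
def compensating_weights(counts):
    """Return one analysis weight per group of retained cases.

    counts maps a group label to (retained, dropped), where 'dropped' is the
    number of discarded cases that the retained group is meant to represent.
    """
    weights = {}
    for group, (retained, dropped) in counts.items():
        if retained <= 0:
            raise ValueError(f"group {group!r} has no retained cases to reweight")
        # Retained cases carry their own weight plus a share of the dropped ones.
        weights[group] = (retained + dropped) / retained
    return weights


# Hypothetical counts: 200 moderately-low-response cases standing in for
# 100 dropped very-low-response cases; other areas lost nothing.
counts = {"moderately low": (200, 100), "other": (700, 0)}
weights = compensating_weights(counts)
print(weights)  # {'moderately low': 1.5, 'other': 1.0}
```

As a check, the weighted count for the moderately low group is 200 * 1.5 = 300, the number of cases it is supposed to represent, while groups that lost no cases keep a weight of 1.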

On the other hand, if you're not mailing to the very low response areas 
in the second round, it was right to exclude them from the analysis and 
there's no bias in your predictions for people in areas that have more 
than very low response rates.
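If the very low response zip codes are left out of the second mailing, the scoring file can be restricted the same way before predicting, so the model is only applied to the kind of areas it was estimated on. A minimal Python sketch; the record layout (an "id" and a "zip" field) and the zip codes themselves are hypothetical.

```python
def restrict_to_modeled_areas(records, excluded_zips):
    """Keep only records whose zip code was eligible in the first mailing."""
    excluded = set(excluded_zips)
    return [r for r in records if r["zip"] not in excluded]


# Hypothetical second-round records and near-zero-response zip codes.
second_round = [
    {"id": 1, "zip": "11111"},
    {"id": 2, "zip": "22222"},
    {"id": 3, "zip": "33333"},
]
bad_zips = ["22222"]
kept = restrict_to_modeled_areas(second_round, bad_zips)
print([r["id"] for r in kept])  # [1, 3]
```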

In any event, since success is a rare outcome in your study (only 5%), 
you might consider using Gary King's rare events logit (-relogit-) 
available at http://GKing.Harvard.Edu.


Mike Wazowski wrote:
> Hello Statalisters, I am hoping that somebody can help me with the
> following. I have data on invitations mailed to students to join an
> honor society, as well as on who responded (joined the society). There
> are two mailing campaigns: a preliminary one to about 10% of the
> eligible students, and then a secondary mailing to the remaining 90%.
> Since the response rate is low (around 5%), my task is to build a
> predictive model, based on the first round of mailing, of who is likely
> to join, so that we can minimize the cost of the second mailing.
>
> The problem is that the data for the first mailing have already been
> purged of "bad" zip codes, those where the response rate was close to
> zero in the previous year or two. (The data for the second round contain
> all the zip codes, although I can delete the "bad" ones too, if
> necessary.) I am using a logit model to estimate coefficients on the
> first round and then using those for an out-of-sample prediction of the
> probability of response in the second round.
>
> My question is whether the data purge for the first mailing biases the
> results. I am getting many respondents classified as very low
> probability respondents. Are there any statistical procedures to correct
> for the deletion of bad zip codes in the first mailing?
>
> Thank you,
> Mike
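The approach Mike describes (fit a logit on the first mailing, then score the second-round file out of sample) amounts to computing a predicted probability for each student and mailing those at or above some cutoff. A minimal Python sketch, assuming the coefficients were estimated elsewhere (for example with Stata's -logit-); the coefficient values, predictor names, students, and the 0.05 cutoff are all hypothetical.

```python
import math


def predicted_probability(x, intercept, coefs):
    """Out-of-sample logit prediction: 1 / (1 + exp(-(b0 + x'b)))."""
    xb = intercept + sum(coefs[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-xb))


def select_for_mailing(students, intercept, coefs, cutoff):
    """Return (id, probability) pairs at or above the cutoff, highest first."""
    scored = [
        (s["id"], predicted_probability(s["x"], intercept, coefs))
        for s in students
    ]
    chosen = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(chosen, key=lambda pair: pair[1], reverse=True)


# Hypothetical first-round estimates and second-round students.
intercept = -6.0
coefs = {"gpa": 0.8, "honors_courses": 0.3}
students = [
    {"id": "A", "x": {"gpa": 3.9, "honors_courses": 4}},
    {"id": "B", "x": {"gpa": 2.5, "honors_courses": 0}},
    {"id": "C", "x": {"gpa": 3.4, "honors_courses": 2}},
]
mailing_list = select_for_mailing(students, intercept, coefs, cutoff=0.05)
for student_id, p in mailing_list:
    print(student_id, round(p, 3))
```

A fixed probability cutoff is only one choice; ranking everyone and mailing the top k that the budget allows works the same way with the sorted list.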

Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751

*   For searches and help try:

© Copyright 1996–2021 StataCorp LLC