
Re: st: How low can the percentage of uncensored cases be in heckprob?

From   Steven Samuels
Subject   Re: st: How low can the percentage of uncensored cases be in heckprob?
Date   Tue, 11 Nov 2008 16:17:48 -0500

I am not familiar with -heckprob-, but I doubt if the *percent* of uncensored observations matters much.

-heckprob- fits two probit models. I know of results related to Margaret's question only for logit models. For a single logistic regression model, the relevant sample size is the smaller of the number of events and the number of non-events. Peduzzi et al. (1996) showed that the ratio of this number to the number of predictors should be at least 10:1 to avoid bias from over-fitting.
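As a rough illustration of that rule of thumb, here is a minimal Python sketch applied to the selection equation of the manual example quoted below (95 observations, 59 uncensored). It assumes the logit result carries over to a probit selection equation, which is not established here. The function names are hypothetical, and the minimum ratio is left as a parameter because published rules of thumb differ.

```python
def limiting_sample_size(events, non_events):
    """Return the smaller of the two outcome counts (the 'relevant sample size')."""
    return min(events, non_events)


def max_predictors(events, non_events, min_ratio):
    """Largest number of predictors that keeps limiting size / predictors >= min_ratio."""
    return limiting_sample_size(events, non_events) // min_ratio


# Selection equation of the manual example: selected vs. not selected.
total, uncensored = 95, 59
censored = total - uncensored  # 36 censored observations

# min_ratio values are assumptions; try whichever rule of thumb you prefer.
for min_ratio in (10, 15):
    k = max_predictors(uncensored, censored, min_ratio)
    print(f"min ratio {min_ratio}:1 -> at most {k} predictors")
```

The same calculation for the outcome equation would use the number of events and non-events among the uncensored observations only, which the thread does not report.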


Peduzzi PN, Concato J, Holford TR, Feinstein AR. (1995) The importance of events per independent variable in multivariable analysis, II: accuracy and precision of regression estimates. J Clin Epidemiol; 48: 1503–10.

Peduzzi PN, Concato J, Kemper E, Holford TR, Feinstein AR. (1996) A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol; 49: 1373–9.

Babyak M. (2004) What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med; 66: 411–21. Full text:

On Nov 11, 2008, at 12:15 PM, Maarten Buis wrote:

--- "Tyler, Margaret C D" wrote:
In the example in the Stata reference -H heckprob, there are 95 total
and 59 uncensored observations, so 62% are uncensored. In my own
situation I have only about 19% uncensored. Is it still appropriate
to use heckprob for my analysis? I have run the equations and gotten
what seem to be valid results. rho is non-significant.

You are obviously pushing your luck with that many censored cases. It
is no longer very popular to make statements like you need at least N
observations or p% uncensored cases for technique t to be appropriate
(whatever appropriate may mean). So I don't think you will get the
answer you are looking for. However, what you can do is run some
simulations and see how well (or badly) your estimator behaves with a
small number of uncensored cases. At the last Summer North American
Stata Users' Group meeting I gave a talk on using Stata for this
type of simulation; you can get the materials from:

Hope this helps,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

*   For searches and help try:
