
# Re: st: Interval regression: Categories with 100% censoring

| Field | Value |
|---|---|
| From | Gillian.Frost@hsl.gov.uk |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Interval regression: Categories with 100% censoring |
| Date | Mon, 20 Feb 2012 09:57:46 +0000 |

```Hello Stas,

Gillian

From:   Stas Kolenikov <skolenik@gmail.com>
To:     statalist@hsphsun2.harvard.edu
Date:   20/02/2012 03:33
Subject:        Re: st: Interval regression: Categories with 100% censoring
Sent by:        owner-statalist@hsphsun2.harvard.edu

In this case, the regression parameter for the region may not be
identified. The numeric maximization algorithm pushes this parameter
to the right until the probability of falling below the censoring
point becomes as small as possible, i.e., machine zero (or
c(epsdouble), if you like). Hence, the probability of being above the
censoring point becomes, in machine terms, 1-c(epsdouble). If this is
the case, then (1) you would see absurdly large standard errors on the
shift parameter for this region; and (2) the point estimate would
likely be a few sigmas (e(sigma)) away from where it should have been:

. display abs( invnorm( c(epsfloat) ) )
5.1665781
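
* Sketch (an addition, not part of the original post): the same bound
* computed for double precision, plus a check that the float figure
* above inverts back to c(epsfloat). invnormal() is the documented
* name of invnorm(); no output is shown because it was not run here.
display abs( invnormal( c(epsdouble) ) )
* The ratio below should be approximately 1:
display normal(-5.1665781) / c(epsfloat)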

Simple demonstration:

clear
set obs 20
gen byte k = mod(_n,4)
set seed 12345
gen e1 = rnormal()
gen e2 = rnormal()
gen y1 = min( k + e1, k + e2 )
gen y2 = max( k+e1, k+e2 )
replace y1 = . in 15/20
replace y2 = . in 1/7
replace y2 = . if k==1

Then we have

. intreg y1 y2 ibn.k, nocons nolog

Interval regression                               Number of obs   =         19
                                                  Wald chi2(4)    =     207.14
Log likelihood = -4.4521754                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           k |
          0  |   .1423818   .2701057     0.53   0.598    -.3870157    .6717792
          1  |   2.570719   233.2821     0.01   0.991    -454.6537    459.7952
          2  |   2.460023   .9607074     2.56   0.010     .5770706    4.342974
          3  |   3.053332   .2163009    14.12   0.000      2.62939    3.477274
-------------+----------------------------------------------------------------
    /lnsigma |  -1.227765   .5440687    -2.26   0.024     -2.29412   -.1614097
-------------+----------------------------------------------------------------
       sigma |   .2929467   .1593831                      .1008501    .8509434
------------------------------------------------------------------------------

  Observation summary:         5  left-censored observations
                               0     uncensored observations
                               9 right-censored observations
                               5       interval observations

. display (_b[1.k]-1)/e(sigma)
5.3617906
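
* Sketch (an addition, not part of the original post), assuming the
* demonstration data above are still in memory. The variable name
* -cens- is hypothetical. Classify each observation the way the
* -intreg- observation summary does:
gen byte cens = .
* 0 = uncensored, 1 = left-censored, 2 = right-censored, 3 = interval
replace cens = 0 if y1 == y2 & !missing(y1)
replace cens = 1 if missing(y1) & !missing(y2)
replace cens = 2 if !missing(y1) & missing(y2)
replace cens = 3 if y1 < y2 & !missing(y1, y2)
* A level of k whose observations all fall in column 1, or all in
* column 2, is the situation described above: its shift parameter
* has no finite maximum-likelihood estimate.
tabulate k cens
* One possible response is to refit without the fully right-censored
* category (k==1 in this demonstration):
intreg y1 y2 ibn.k if k != 1, nocons nolog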

On Sun, Feb 19, 2012 at 1:49 PM,  <Gillian.Frost@hsl.gov.uk> wrote:
> Hello all,
>
>
> A number of water samples have been collected, and a microbiological
> examination undertaken to assess the number of colony forming units per
> 100ml (CFU).  Some observations are right censored, some are left
> censored, and the censoring point is not always the same.  I have
> therefore been using interval regression (-intreg-) to look for regional
> differences in the organism levels.
>
> Based on previous advice from Statalist (thank you!), my dependent
> variable is log10 colony forming units and my independent variable is
> the categorical variable of region.  Some samples were collected from
> the same location.  My command is as follows:
>
> intreg depvar1 depvar2 i.region, vce(cluster location)
>
> This seems to be working quite nicely and the results seem sensible.
> However, when I have a region where all observations are, say, right
> censored, then the predicted log10 CFU for this region is substantially
> higher than the other regions (statistically significantly so).  But its
> value does not seem sensible - for example, all regions generally
> predict around 3-4 CFU, whereas the region with all censored values has
> a predicted value of around 7 CFU!
>
> I am guessing that this is happening because all of the observations are
> right censored and so the model doesn't really have sufficient
> information to estimate a reliable coefficient.  But I cannot find this
> written anywhere.
>
> Do you think it is justifiable to say that "results can be unreliable
> when all observations are censored, and so regions where this happens
> were excluded from the analysis", or something along those lines?
>
> I would be grateful for any advice.
>
> Many thanks,
>
> Gillian
>
> ------------------------------------------------------------------------
> ATTENTION:
>
> This message contains privileged and confidential information intended
> for the addressee(s) only. If this message was sent to you in error,
> you must not disseminate, copy or take any action in reliance on it and
> we request that you notify the sender immediately by return email.
>
> Opinions expressed in this message and any attachments are not
> necessarily those held by the Health and Safety Laboratory or any person
> connected with the organisation, save those by whom the opinions were
> expressed.
>
> Please note that any messages sent or received by the Health and Safety
> Laboratory email system may be monitored and stored in an information
> retrieval system.
> ------------------------------------------------------------------------
> Think before you print - do you really need to print this email?
> ------------------------------------------------------------------------
>
> ------------------------------------------------------------------------
> Scanned by MailMarshal - Marshal's comprehensive email content security
> ------------------------------------------------------------------------
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

