
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Interval regression: Categories with 100% censoring

From	Stas Kolenikov <[email protected]>
To	[email protected]
Subject	Re: st: Interval regression: Categories with 100% censoring
Date	Sun, 19 Feb 2012 21:23:37 -0600

In this case, the regression parameter for the region may not be
identified. The numeric maximization algorithm pushes this parameter
to the right until the probability of falling below the censoring
point becomes as small as possible, i.e., machine zero (or
c(epsdouble), if you like). Hence, the probability of being above the
censoring point becomes machine 1-c(epsdouble). If this is the case,
then (1) you would see absurdly large standard errors on the shift
parameter for this region; (2) the point estimate would likely be a
few sigmas (e(sigma)) away from where it should have been, e.g.,
taking c(epsfloat) as the cutoff:

. display abs( invnorm( c(epsfloat) ) )
5.1665781
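
The argument can be written out for the right-censored case as a rough sketch, holding sigma fixed. The symbols a_i (the lower censoring bounds), \beta (the shift for the fully censored category), and \varepsilon (the smallest probability the optimizer can tell apart from zero) are notation introduced here for illustration. Where the optimizer actually stops also depends on its convergence tolerances.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i} \log\left[ 1 - \Phi\!\left( \frac{a_i - \beta}{\sigma} \right) \right]
             = \sum_{i} \log \Phi\!\left( \frac{\beta - a_i}{\sigma} \right), \\
\frac{\partial \ell}{\partial \beta} &= \frac{1}{\sigma} \sum_{i}
    \frac{\phi\big( (\beta - a_i)/\sigma \big)}{\Phi\big( (\beta - a_i)/\sigma \big)} > 0
    \quad \text{for every finite } \beta, \\
\Phi\!\left( \frac{a_i - \hat\beta}{\sigma} \right) \approx \varepsilon
  &\;\Longrightarrow\; \hat\beta \approx a_i + \sigma \left| \Phi^{-1}(\varepsilon) \right| .
\end{align*}
```

Since the log likelihood is strictly increasing in \beta, it has no finite maximizer, and the reported estimate only reflects where the numerical search gave up. With \varepsilon = c(epsfloat), the factor |\Phi^{-1}(\varepsilon)| is the 5.17 displayed above.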

Simple demonstration:

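clear
set obs 20
* four categories, k = 0, 1, 2, 3
gen byte k = mod(_n,4)
set seed 12345
gen e1 = rnormal()
gen e2 = rnormal()
* lower and upper interval bounds scattered around the category mean k
gen y1 = min( k + e1, k + e2 )
gen y2 = max( k + e1, k + e2 )
* missing lower bound = left-censored; missing upper bound = right-censored
replace y1 = . in 15/20
replace y2 = . in 1/7
* leave no k==1 observation with an upper bound
replace y2 = . if k==1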

Then we have

. intreg y1 y2 ibn.k, nocons nolog

Interval regression                               Number of obs   =         19
                                                  Wald chi2(4)    =     207.14
Log likelihood = -4.4521754                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           k |
          0  |   .1423818   .2701057     0.53   0.598    -.3870157    .6717792
          1  |   2.570719   233.2821     0.01   0.991    -454.6537    459.7952
          2  |   2.460023   .9607074     2.56   0.010     .5770706    4.342974
          3  |   3.053332   .2163009    14.12   0.000      2.62939    3.477274
-------------+----------------------------------------------------------------
    /lnsigma |  -1.227765   .5440687    -2.26   0.024     -2.29412   -.1614097
-------------+----------------------------------------------------------------
       sigma |   .2929467   .1593831                      .1008501    .8509434
------------------------------------------------------------------------------

  Observation summary:         5  left-censored observations
                               0     uncensored observations
                               9 right-censored observations
                               5       interval observations

. display (_b[1.k]-1)/e(sigma)
5.3617906
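
To spot such categories before interpreting the coefficients, one could cross-tabulate the censoring type by category. This is a minimal sketch against the simulated data above; the variable name ctype and its value labels are made up for illustration. The final refit mirrors the exclusion raised in the original question and is shown only as one way to check how the remaining estimates behave, not as a recommendation.

```stata
* classify each observation by which bounds are present
gen byte ctype = .
replace ctype = 1 if  missing(y1) & !missing(y2)   // left-censored
replace ctype = 2 if !missing(y1) &  missing(y2)   // right-censored
replace ctype = 3 if !missing(y1) & !missing(y2)   // interval (or point if y1==y2)
label define ctype 1 "left" 2 "right" 3 "interval"
label values ctype ctype

* a category whose observations all sit in one censored column is suspect
tabulate k ctype

* refit without the fully censored category to see how the other estimates move
intreg y1 y2 ibn.k if k != 1, nocons nolog
```

In the simulated data, every k==1 observation that still has a lower bound is right-censored by construction, so its row in the table should be confined to the "right" column.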



On Sun, Feb 19, 2012 at 1:49 PM,  <[email protected]> wrote:
> Hello all,
>
> I would appreciate some advice regarding the output from -intreg-.
>
> A number of water samples have been collected, and a microbiological
> examination undertaken to assess the number of colony forming units per
> 100ml (CFU).  Some observations are right censored, some are left
> censored, and the censoring point is not always the same.  I have
> therefore been using interval regression (-intreg-) to look for regional
> differences in the organism levels.
>
> Based on previous advice from Statalist (thank you!), my dependent
> variable is log10 colony forming units and my independent variable is the
> categorical variable of region.  Some samples were collected from the same
> location.  My command is as follows:
>
> intreg depvar1 depvar2 i.region, vce(cluster location)
>
> This seems to be working quite nicely and the results seem sensible.
> However, when I have a region where all observations are, say, right
> censored, then the predicted log10 CFU for this region is substantially
> higher than the other regions (statistically significantly so).  But its
> value does not seem sensible - for example, all regions generally predict
> around 3-4 CFU, whereas the region with all censored values has a
> predicted value of around 7 CFU!
>
> I am guessing that this is happening because all of the observations are
> right censored and so the model doesn't really have sufficient information
> to estimate a reliable coefficient.  But I cannot find this written
> anywhere.
>
> Do you think it is justifiable to say that "results can be unreliable when
> all observations are censored, and so regions where this happens were
> excluded from the analysis", or something along those lines?
>
> I would be grateful for any advice.
>
> Many thanks,
>
> Gillian



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
