Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Stas Kolenikov <skolenik@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Interval regression: Categories with 100% censoring |
Date | Sun, 19 Feb 2012 21:23:37 -0600 |
In this case, the regression parameter for the region may not be
identified. The numeric maximization algorithm pushes this parameter to
the right until the probability of falling below the censoring point
becomes as small as possible, i.e., machine zero (or c(epsdouble), if
you like). Hence, the probability of being above the censoring point
becomes, in machine terms, 1 - c(epsdouble). If this is the case, then
(1) you would see absurdly large standard errors on the shift parameter
for this region; (2) the point estimate would likely be a few sigmas
(e(sigma)) away from where it should have been:

. display abs( invnorm( c(epsfloat) ) )
5.1665781

Simple demonstration:

clear
set obs 20
gen byte k = mod(_n,4)
set seed 12345
gen e1 = rnormal()
gen e2 = rnormal()
gen y1 = min( k + e1, k + e2 )
gen y2 = max( k+e1, k+e2 )
replace y1 = . in 15/20
replace y2 = . in 1/7
replace y2 = . if k==1

Then we have

. intreg y1 y2 ibn.k, nocons nolog

Interval regression                               Number of obs   =         19
                                                  Wald chi2(4)    =     207.14
Log likelihood = -4.4521754                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           k |
          0  |   .1423818   .2701057     0.53   0.598    -.3870157    .6717792
          1  |   2.570719   233.2821     0.01   0.991    -454.6537    459.7952
          2  |   2.460023   .9607074     2.56   0.010     .5770706    4.342974
          3  |   3.053332   .2163009    14.12   0.000      2.62939    3.477274
-------------+----------------------------------------------------------------
    /lnsigma |  -1.227765   .5440687    -2.26   0.024     -2.29412   -.1614097
-------------+----------------------------------------------------------------
       sigma |   .2929467   .1593831                      .1008501    .8509434
------------------------------------------------------------------------------

  Observation summary:         5  left-censored observations
                               0     uncensored observations
                               9 right-censored observations
                               5       interval observations

. display (_b[1.k]-1)/e(sigma)
5.3617906

On Sun, Feb 19, 2012 at 1:49 PM, <Gillian.Frost@hsl.gov.uk> wrote:
> Hello all,
>
> I would appreciate some advice regarding the output from -intreg-.
>
> A number of water samples have been collected, and a microbiological
> examination undertaken to assess the number of colony forming units per
> 100ml (CFU). Some observations are right censored, some are left
> censored, and the censoring point is not always the same. I have
> therefore been using interval regression (-intreg-) to look for regional
> differences in the organism levels.
>
> Based on previous advice from Statalist (thank you!), my dependent
> variable is log10 colony forming units and my independent variable is the
> categorical variable of region. Some samples were collected from the same
> location. My command is as follows:
>
> intreg depvar1 depvar2 i.region, vce(cluster location)
>
> This seems to be working quite nicely and the results seem sensible.
> However, when I have a region where all observations are, say, right
> censored, then the predicted log10 CFU for this region is substantially
> higher than the other regions (statistically significantly so). But its
> value does not seem sensible - for example, all regions generally predict
> around 3-4 CFU, whereas the region with all censored values has a
> predicted value of around 7 CFU!
>
> I am guessing that this is happening because all of the observations are
> right censored and so the model doesn't really have sufficient information
> to estimate a reliable coefficient. But I cannot find this written
> anywhere.
>
> Do you think it is justifiable to say that "results can be unreliable when
> all observations are censored, and so regions where this happens were
> excluded from the analysis", or something along those lines?
>
> I would be grateful for any advice.
>
> Many thanks,
>
> Gillian

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/