Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: Interval regression with skewed data


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Interval regression with skewed data
Date   Mon, 9 Jan 2012 16:08:24 +0000

I'd be more worried about violating linearity of functional form than normality of errors, but you say nothing about that. Nor do you say anything about what your predictors are. 

I can't see from your discussion that it can be a choice between interval regression and some kind of survival analysis. What you have doesn't sound to me at all like a survival analysis problem. 

However, assuming interval regression, you could transform the response before you use -intreg-. Your limits just transform to limits on your transformed scale. From other experience with hydrological data, I would reach for a logarithmic transform as first port of call. You would need to back-transform afterwards.
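
As a rough sketch of the transform-then-fit idea, something like the following could be tried. The variable names (cfu, cens, region, lo, hi, median_cfu) are hypothetical and do not come from the original data, and treating exact zero counts as left-censored below 1 CFU/ml is an assumption made here only so that they survive the log transform; whether that is sensible depends on how the zeros arose.

```stata
* Hypothetical variables (not from the original post):
*   cfu    reported value or censoring limit, in CFU/ml
*   cens   -1 = left-censored ("<cfu"), 0 = observed, 1 = right-censored (">cfu")
*   region integer-coded region of the UK

* Limits on the log scale; -intreg- reads a missing limit as unbounded.
generate double lo = ln(cfu) if cens >= 0
generate double hi = ln(cfu) if cens <= 0

* ASSUMPTION: an exact zero is treated as "<1 CFU/ml". ln(0) is missing in
* Stata, so lo is already missing; only the upper limit ln(1) = 0 is set.
replace hi = ln(1) if cfu == 0 & cens == 0

* Interval regression on the log scale, comparing regions
intreg lo hi i.region

* Back-transform the linear prediction to the original scale
predict double xb, xb
generate double median_cfu = exp(xb)
```

If the errors are roughly normal on the log scale, exp(xb) estimates a median rather than a mean on the CFU/ml scale, and exponentiated region coefficients are ratios of medians relative to the base region.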

Nick 
n.j.cox@durham.ac.uk 

Gillian.Frost@hsl.gov.uk

I am struggling with an analysis and would like your insight.  I think 
that I am looking at using interval regression but there are certain 
aspects of the data that are worrying me.  First some background...

A number of water samples have been taken from around the UK, and a 
microbiological examination of the water has been undertaken.  Whenever a 
sample is sent to a lab, a whole suite of tests is done to count the 
number of colony forming units of various organisms.  I therefore have a 
number of outcomes, whose units are the number of colony forming units per 
ml.  The aim of this part of the analysis is to compare the organism 
levels found in different regions of the UK.

Some observations are left censored (0-6% depending on the outcome), i.e. 
<1 CFU/ml or <10 CFU/ml, and some are right censored (0-59%), i.e. >3000 
CFU/ml.  The censoring point varies, and so I thought that I would have to 
use interval regression (Stata's -intreg-).

However, the data are not Normally distributed (which is an assumption of 
interval regression), but are positively skewed with some outcomes having 
a high number of zero counts (one has 75% zeros!).  In the book by J S 
Long (Regression models for categorical and limited dependent variables, 
2007), there was a discussion about how accelerated failure time (AFT) 
models can be used to perform interval regression when the data are not 
Normally distributed, but there was no example of how to do this. 
Unfortunately I no longer have the book to provide you with the page 
reference.

I have found a user written command -intcens-, which can perform 
interval-censored survival analysis and fits a number of different 
distributions, but I cannot find any documentation or examples of its use 
(apart from the help file).

Does anyone have any examples of using AFT models to perform interval 
regression or examples of using -intcens-?  Or do you think that there is 
a better way I could be handling the data?

