Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Gillian.Frost@hsl.gov.uk
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Interval regression with skewed data
Date: Tue, 10 Jan 2012 14:06:00 +0000

Hello Nick,

Thank you for your response. I think that I'll end up using the transformation you suggest.

Many thanks,

Gillian

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Date: 10/01/2012 08:53
Subject: Re: st: RE: Interval regression with skewed data
Sent by: owner-statalist@hsphsun2.harvard.edu

So, your problem is essentially whether the variably censored distribution of your outcome differs by region. Linearity is thus not an issue as you would need to represent your categorical variable region by a set of indicator variables.

Depending on how much censoring there is you might be able to work out median and quartiles and produce partial box plots to get a feeling for skewness.

Some people in ecology use log(count + 1) as a transform. People outside that field tend to be sniffy about it.

Alan's suggestion remains a very good one in principle.

Nick

On Tue, Jan 10, 2012 at 8:40 AM, <Gillian.Frost@hsl.gov.uk> wrote:
> Hello Nick, Alan,
>
> Thank you both for your replies.
>
> Nick, I apologise for not being clear in my original posting. My
> outcome/dependent variable is the number of colony forming units per ml,
> and my predictor/independent variable is the region (North West, North
> East, South East England,...) within which the sample was taken. I
> gravitated towards interval regression because I have some observations
> that are left censored and some that are right censored but the censoring
> value is not always the same, and I started to think about survival
> analysis because I had seen a suggestion where this could be used to
> perform interval regression when the Normality assumption was violated.
> Unfortunately, my outcome has some zero counts and so I cannot really use
> the logarithmic transform. I am more than happy to consider other methods
> of analysis if you have any ideas?
>
> Alan, I am afraid that what you suggest is probably outside of my
> programming and statistical expertise, and would also take me longer than
> the time I have to look at this problem.
>
> Many thanks,
>
> Gillian
>
> From: Nick Cox <n.j.cox@durham.ac.uk>
> To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
> Date: 09/01/2012 16:15
> Subject: st: RE: Interval regression with skewed data
> Sent by: owner-statalist@hsphsun2.harvard.edu
>
> I'd be more worried about violating linearity of functional form than
> normality of errors, but you say nothing about that. Nor do you say
> anything about what your predictors are.
>
> I can't see from your discussion that it can be a choice between interval
> regression and some kind of survival analysis. What you have doesn't sound
> to me at all like a survival analysis problem.
>
> However, assuming the first, you could transform before you use -intreg-.
> Your limits just transform to limits on your transformed scale. From other
> experiences with hydrological data I would reach for a logarithmic
> transform as first port of call. You would need to back-transform
> afterwards.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Gillian.Frost@hsl.gov.uk
>
> I am struggling with an analysis and would like your insight. I think
> that I am looking at using interval regression but there are certain
> aspects of the data that are worrying me. First some background...
>
> A number of water samples have been taken from around the UK, and a
> microbiological examination of the water has been undertaken. Whenever a
> sample is sent to a lab, a whole suite of tests are done to count the
> number of colony forming units of various organisms. I therefore have a
> number of outcomes, whose units are the number of colony forming units per
> ml. The aim of this part of the analysis is to compare the organism
> levels found in different regions of the UK.
>
> Some observations are left censored (0-6% depending on the outcome) - ie
> <1 CFU/ml, or <10 CFU/ml - and some are right censored (0-59%) - ie. >3000
> CFU/ml. The censoring point varies, and so I thought that I would have to
> use interval regression (Stata's -intreg-).
>
> However, the data are not Normally distributed (which is an assumption of
> interval regression), but are positively skewed with some outcomes having
> a high number of zero counts (one has 75% zeros!). In the book by J S
> Long (Regression models for categorical and limited dependent variables,
> 2007), there was a discussion about how accelerated failure time (AFT)
> models can be used to perform interval regression when the data are not
> Normally distributed, but there was no example of how to do this.
> Unfortunately I no longer have the book to provide you with the page
> reference.
>
> I have found a user written command -intcens-, which can perform
> interval-censored survival analysis and fits a number of different
> distributions, but I cannot find any documentation or examples of its use
> (apart from the help file).
>
> Does anyone have any examples of using AFT models to perform interval
> regression or examples of using -intcens-? Or do you think that there is
> a better way I could be handling the data?
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
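
Nick's remark that the censoring limits "just transform to limits on your transformed scale" can be illustrated with a short sketch. It is not taken from the thread: it is written in Python rather than Stata, it assumes the log(count + 1) transform mentioned above so that zero counts stay defined, and the function names and sample values are hypothetical. An open end of an interval is marked with `None`, mirroring the way -intreg- uses a missing value for the lower limit of a left-censored observation and for the upper limit of a right-censored one.

```python
import math


def to_log1p_interval(lower, upper):
    """Map one (lower, upper) pair from the count scale to the log(count + 1) scale.

    None marks an open end: lower=None means left censored,
    upper=None means right censored (an assumption of this sketch).
    """
    lo = None if lower is None else math.log1p(lower)
    hi = None if upper is None else math.log1p(upper)
    return lo, hi


def back_transform(value):
    """Invert log(count + 1): return exp(value) - 1."""
    return math.expm1(value)


# Hypothetical observations in CFU/ml, one (lower, upper) pair per sample.
samples = [
    (12, 12),      # observed exactly
    (0, 0),        # observed zero count
    (None, 10),    # reported as <10 CFU/ml (left censored)
    (3000, None),  # reported as >3000 CFU/ml (right censored)
]

# Because log1p is strictly increasing, each limit maps to a limit.
transformed = [to_log1p_interval(lo, hi) for lo, hi in samples]

for original, (lo, hi) in zip(samples, transformed):
    print(original, "->", (lo, hi))
```

The two transformed columns would then play the role of the two dependent variables in an interval regression on region indicators. Estimates on the transformed scale can be returned to CFU/ml with `back_transform`, bearing in mind that a back-transformed mean of log(count + 1) is not the mean count.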

