



Re: st: Fw: influential observations


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Fw: influential observations
Date   Tue, 12 Apr 2011 11:23:40 +0100

As I said, different fields have different arguments and issues. In
the Earth and environmental sciences, big rivers exist and should be
treated as genuine observations -- which is not to say that they are
easy to measure. In crystallography, my guess is that some
observations may well just look bad and throwing them out is what you
do: easy come, easy go. With clinical data, the call may well be
different.

You haven't reached the end of any road. Perhaps you could transform
one or more of your predictors.

Zeros in response are not necessarily incompatible with an inverse
link, any more than they are incompatible with the logarithmic link
you reported earlier. But I don't think they ever make things easier.
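
A minimal Stata sketch of these two suggestions. The variable names y, x1
and x2 are hypothetical (the thread does not name the actual variables), and
the log transform assumes that x1 is strictly positive:

```stata
* hypothetical variable names: y (response), x1 and x2 (predictors)

* model as reported earlier in the thread: gamma family, log link
glm y x1 x2, family(gamma) link(log)

* same model with one predictor transformed (assumes x1 > 0)
generate double log_x1 = ln(x1)
glm y log_x1 x2, family(gamma) link(log)

* reciprocal (inverse) link, written as a power link with exponent -1;
* iterate() caps the number of iterations so that a fit that is not
* converging stops instead of running on
glm y x1 x2, family(gamma) link(power -1) iterate(100)
```

Whether a log transform, or any other, suits a given predictor is a
judgment call that depends on the data; the cap of 100 iterations is
arbitrary.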

Nick

On Tue, Apr 12, 2011 at 10:56 AM, Arti Pandey <rtpandey@yahoo.com> wrote:

> Thanks Nick. I "think" I see what you mean, with the river example. My lack of
> understanding stems from automation, using software that would do everything and
> all we were looking at was the chi2 values. The field being x-ray
> crystallography, there were several hundred thousand observations, and there
> were always some that had been excluded at the end of convergence of data. I
> never worried about it.
> What I am dealing with now is very few (89, and precious) observations, which
> are clinical measurements. So if a statistical software tells me to look at
> certain observations, I do that, find they are all right, no reason to exclude
> and hit a dead end.
> For the inverse link, the glm goes on endlessly with iterations, I guess because
> I have zeroes in my response variable.
> Arti
>
>
> ----- Original Message ----
> From: Nick Cox <njcoxstata@gmail.com>
> To: statalist@hsphsun2.harvard.edu
> Sent: Tue, April 12, 2011 2:43:55 PM
> Subject: Re: st: Fw: influential observations
>
> Do Anscombe residuals come out normal with non-normal families? I am
> away from any pertinent literature.
>
> On Tue, Apr 12, 2011 at 8:51 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> I don't think it is a good idea to expect a firm statistical answer
>> based on this information.
>>
>> 1. Isn't there science that will throw light on this question for you?
>> For example, in my field, the Amazon is often an influential
>> observation, as are other very big rivers. But throwing them out just
>> because they might make modelling awkward would usually be very
>> strange science. They deserve their votes. Your field, whatever it is,
>> will presumably have its own arguments and issues.
>>
>> 2. When there are influential observations in a -glm-, considering a
>> different link, e.g. reciprocal, is often a good way forward.
>>
>> 3. There are many situations in which one predictor that is
>> insignificant at conventional levels deserves its place in a model if
>> it has a logical role.
>>
>> 4. I don't see why you expect normally distributed residuals when the
>> family is gamma!!! I think that overall plots of residual vs fitted,
>> observed vs fitted, variance of residual vs fitted, etc., are worth
>> more attention than the marginal distribution.
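>>
>> The checks in points 2 and 4 can be sketched in Stata as follows. This
>> is only an illustration: y, x1 and x2 are hypothetical variable names,
>> and listing three observations simply mirrors the three flagged in the
>> original question.
>>
>> ```stata
>> * hypothetical variable names: y (response), x1 and x2 (predictors)
>> glm y x1 x2, family(gamma) link(log)
>>
>> * fitted means, deviance residuals and Cook's distances
>> predict double mu, mu
>> predict double dres, deviance
>> predict double cd, cooksd
>>
>> * residual vs fitted, with a reference line at zero
>> scatter dres mu, yline(0)
>>
>> * observed vs fitted, with a line of equality over the fitted range
>> scatter y mu || function y = x, range(mu)
>>
>> * the three observations with the largest Cook's distances
>> gsort -cd
>> list y x1 x2 mu cd in 1/3
>> ```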
>>
>> Nick
>>
>> On Tue, Apr 12, 2011 at 6:17 AM, Arti Pandey <rtpandey@yahoo.com> wrote:
>>> Hello
>>> A belated thank you to Maarten Buis and David Greenberg for suggestions to my
>>> previous query.
>>> I decided to go with -glm- for my model and have been trying to understand
>>> the different procedures for checking the model.
>>> The Anscombe residuals and deviance are normally distributed, but there are
>>> three influential observations based upon cooksd.
>>> On removing these observations, the BIC rises by 10, and one of the
>>> predictors also becomes insignificant.
>>> Is the model fitting because of these influential observations now and
>>> therefore not correct?
>>> I have continuous response data and used gamma distribution with log link.
>>> Any recommendations for information on model checking after glm are also
>>> appreciated. The book "glm and extensions" by Hardin and Hilbe is out of
>>> my reach, unless an electronic copy is available.
