Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: re: st: Ocratio gives neither AIC nor BIC

From	Marcos Almeida <[email protected]>
To	[email protected]
Subject	Re: re: st: Ocratio gives neither AIC nor BIC
Date	Wed, 1 Jan 2014 11:56:22 -0200

Hello Phil and all Statalisters,

Dear Phil, thank you very much for your reply, for the suggestions,
and for the research you did on heart rate variability. Regarding
pnn50, as I underlined, we're dealing with long-term (24 h) "real-life"
electrocardiograms, and the value represents the mean over the whole
period. It is therefore different from the recordings we'd take over
15 minutes with patients resting on a bed. There is no standard for
the elderly population, and my aim is to shed some light on how this
variable changes across age groups, taking a few covariates into
consideration.

The "problem" was that there were many outliers, and I felt we should
not simply "dismiss" them, for many reasons, not least the very fact
that we still don't have a standard for the elderly population,
particularly for 24 h recordings.

I absolutely agree with your and David's recommendations, and I guess
that, to some extent, as I mentioned, they were put into practice,
including the log transformation and the checking of residuals. To my
dismay, I'm afraid these steps weren't enough to cope with such huge
variation. That prompted me to accept losing some information by
employing the quartiles, with the reward of a more "representative"
estimation. Or "stable", if you will.

So, for the first time, I "plunged" into a whole bunch of multinomial
response models. Unfortunately, the proportional-odds assumption
didn't favour the ordered logit model, nor did the IIA assumption
favour the multinomial logit model. Then I tested the generalized
ordered outcome model, and the fit of the partial proportional odds
model pleased me most. I did this by installing the user-written gologit2
and adding the option "autofit" to the command. The reason was that
some covariates didn't violate the proportional-odds assumption, so I
found it reasonable to constrain their coefficients to be equal across
the equations, leaving only the remaining coefficients unrestricted.

By the way, I guess I dutifully abided by the instructions in the book
I mentioned: Generalized Linear Models and Extensions, Hardin and
Hilbe, Stata Press, third edition, 2012.

Indeed, no sooner had I read the book's chapter on continuation-ratio
models than I decided to give them a try and compare the results with
my "collection" of AICs and BICs from the models evaluated so far.

I installed the user-written command ocratio as suggested on pages
339-344. According to the output shown in the book, we'd easily get
the AIC by just typing "aic" after the estimation. That didn't happen,
though, not even when I typed "estat ic" or installed the
user-written "fitstat". All I got was the message in red: "estimates
not found".
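
For reference, AIC and BIC need only three numbers: the log likelihood, the number of estimated parameters, and the number of observations. The following is a minimal sketch, not the book's method. It uses ologit on the auto dataset shipped with Stata (not the pnn50 data) so that the hand calculation can be checked against "estat ic". Whether ocratio prints a log likelihood that is comparable with the other models is an assumption that would need checking; if it does, the same two formulas could be applied to the values read off its output, counting the cutpoints among the parameters.

```stata
* Illustration only: auto dataset, ordered logit
sysuse auto, clear
ologit rep78 mpg weight

* Built-in report, for comparison
estat ic

* The same quantities by hand, from the stored results
* e(ll)   = log likelihood
* e(rank) = number of estimated parameters (slopes plus cutpoints)
* e(N)    = number of observations used
scalar ll  = e(ll)
scalar k   = e(rank)
scalar n   = e(N)
scalar aic = -2*ll + 2*k
scalar bic = -2*ll + k*ln(n)
display "AIC = " %9.3f aic "   BIC = " %9.3f bic
```

When a command leaves no e() results, the three scalars can instead be typed in by hand from the printed output before computing aic and bic.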

Since my software is Stata 13 IC, updated weekly, and I gather the
version used in the above-mentioned book might well be Stata 11 or 12,
I decided to share this "situation" on Statalist, in case it turns out
to be a troubleshooting matter.

Hopefully you can give me some further advice on how to finally get
the AIC after ocratio.

Thank you again for all the consideration and thoughtful suggestions!

I wish all Statalisters an excellent 2014!

Best regards,

Marcos Almeida
Associate Professor of Medicine
UNIT
Brazil




Date: Mon, 30 Dec 2013 12:09:36 -0600
From: Phil Schumm <[email protected]>
Subject: Re: st: Ocratio gives neither AIC nor BIC

On Dec 29, 2013, at 1:33 PM, Marcos Almeida <[email protected]> wrote:
> The dependent variable relates to heart rate variability software in time domain analysis. It is called pnn50. Pnn50 is a result of (validated) computerized measurements done over a 24 hour electrocardiogram and conveys the parasympathetic flow: the higher the values, the higher the parasympathetic flow.
>
> In this dataset, the mean of pnn50 is 9, the SD is 15, the minimum is 0.01, and the maximum is 213.


My understanding was that pNN50 is a percentage.  The mean and SD you
cite sound plausible (e.g., Ramaekers et al. 1998), but the minimum
(and obviously the maximum) do not.  How does your measure differ from
the standard pNN50 calculation?  As David said, understanding how your
dependent variable is measured/calculated is typically the first step
in determining a reasonable model.  For example, if you are indeed
modeling a percentage, then a logistic model (i.e., glm with logit
link and binomial variance function) might be a plausible candidate.
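
A minimal sketch of the glm specification mentioned above, run on the auto dataset shipped with Stata. The outcome "prop" is a made-up proportion, built only so that the example runs. With real data one would first rescale a percentage to the 0-1 range (for example pnn50/100, assuming the measure really is bounded by 0 and 100). The vce(robust) option is an addition of this sketch; it is commonly used when the outcome is a proportion rather than a count of successes.

```stata
* Illustration only: auto dataset, hypothetical proportion outcome
sysuse auto, clear

* Made-up proportion strictly inside (0,1)
generate double prop = (mpg - 10)/40
summarize prop

* Logit link with binomial variance function, robust standard errors
glm prop weight foreign, family(binomial) link(logit) vce(robust)
```

Stata notes that the dependent variable has noninteger values and then fits the model; the coefficients are on the logit scale.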

Another important aspect of choosing an appropriate model is the goal
of your analysis.  Are you looking merely to test a null hypothesis,
or are you looking for a richer description of the relationships in
the data?  Are you primarily interested in estimating how the response
changes (possibly on a specific scale) according to changes in your
covariates?  Are you looking to replicate a previous analysis, or to
provide information that can be used in subsequent studies or possibly
for clinical purposes (e.g., a nomogram)?  Or, are you primarily
interested in prediction?  Having a clear idea of your analytic
goal(s) is also an important part of model-building.

David gave excellent advice WRT using plots to examine the
distribution of your dependent variable *conditional on* the
covariates (as opposed to only the marginal distribution of the
dependent variable).  The most important features here are the mean of
the distribution (which determines the appropriate link function) and
the variance (which determines the appropriate variance function or
distribution family).  In particular, so-called component plus
residual plots are excellent for examining how the mean changes with
the covariates, while a smoothed plot of the absolute (standardized)
residuals can be helpful in identifying an appropriate variance
function (I presume that these strategies or similar alternatives are
discussed in Hardin and Hilbe's book).  Whether you ultimately
transform the dependent variable or use an appropriate combination of
link/family to obtain a model depends in part on your analytic goals,
but either should, if properly performed, give similar conclusions.
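
The two diagnostic plots described above might be sketched as follows. This uses regress on the auto dataset shipped with Stata as the simplest stand-in, because cprplot is a postestimation command for regress; the variable names fit, rstd, and absr are chosen here for illustration.

```stata
* Illustration only: auto dataset, linear regression as a stand-in
sysuse auto, clear
regress price mpg weight

* Component-plus-residual plot for mpg, with a lowess smooth,
* to see how the mean changes with the covariate
cprplot mpg, lowess

* Smoothed absolute standardized residuals against fitted values,
* to see how the spread changes with the mean
predict double fit, xb
predict double rstd, rstandard
generate double absr = abs(rstd)
lowess absr fit
```

A roughly flat smooth in the second plot suggests constant variance; a rising one suggests a variance function that grows with the mean.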

Personally, I wouldn't transform pNN50 into quartiles (at least not
without a compelling reason for doing so).  This throws away
information, and ties your conclusions to the observed quartiles in
your sample, which may not be relevant in other samples/populations.
On a related note, I'm not surprised that the proportional odds model
doesn't fit, since that assumes an underlying logistic distribution
(albeit conditional on the covariates), and from what you've said, it
doesn't sound like your dependent variable is symmetric (as is the
logistic distribution).

As an alternative, if you want to explore how the quantiles of your
dependent variable are related to your covariates, you could use
quantile regression (-qreg- in Stata).  Koenker (2005) illustrates the
value of plotting the coefficient from quantile regression against the
quantile, which can be very informative.
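
A minimal sketch of the coefficient-against-quantile plot, again on the auto dataset shipped with Stata rather than the pnn50 data. The grid of quantiles (0.1 to 0.9) and the names q, b, and qres are choices of this sketch. Because it uses local macros and a temporary file, it should be run in one go from a do-file.

```stata
* Illustration only: auto dataset
sysuse auto, clear

* Collect the coefficient on weight at each quantile
tempname h
tempfile qres
postfile `h' double(q b) using `qres'
forvalues i = 10(10)90 {
    local q = `i'/100
    quietly qreg price weight, quantile(`q')
    post `h' (`q') (_b[weight])
}
postclose `h'

* Plot the coefficient against the quantile
use `qres', clear
twoway connected b q, xtitle("Quantile") ytitle("Coefficient on weight")
```

A flat line would indicate that the covariate shifts the whole distribution equally; a slope indicates that its effect differs between the lower and upper parts of the distribution.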

In sum, your dataset is large enough (n = 1,800) to provide a lot of
information about which models fit well (and which do not), and, once
you have a good model, to yield relatively precise estimates.  My
advice (consistent with David's) would be to spend some more time
thinking about the precise nature of your dependent variable, and
examining how the (conditional) distribution of this changes with the
covariates.  Based on this, it is likely that you can come up with a
reasonable regression model (either by transforming the dependent
variable or by an appropriate choice of link/family), which will serve
as a good baseline model even if you decide to pursue other
approaches.


-- Phil

References
----------

Hardin, J. W. & Hilbe, J. M. (2007). Generalized Linear Models and
Extensions, 2nd edition. College Station, TX: Stata Press.

Koenker, R. (2005). Quantile Regression. Cambridge: Cambridge University Press.

Ramaekers, D., Ector, H., Aubert, A. E., Rubens, A., & Van de Werf, F.
(1998). Heart rate variability and heart rate in healthy volunteers.
European Heart Journal, 19, 1334-1341.
