Re: st: Ocratio gives neither AIC nor BIC

From   Phil Schumm
To   Statalist
Subject   Re: st: Ocratio gives neither AIC nor BIC
Date   Mon, 30 Dec 2013 12:09:36 -0600

On Dec 29, 2013, at 1:33 PM, Marcos Almeida wrote:
> The dependent variable relates to heart rate variability software in time domain analysis. It is called pnn50. Pnn50 is a result of (validated) computerized measurements done over a 24 hour electrocardiogram and conveys the parasympathetic flow: the higher the values, the higher the parasympathetic flow.
> In this dataset, pnn50 has mean = 9, SD = 15, min = 0.01, and max = 213.

My understanding was that pNN50 is a percentage.  The mean and SD you cite sound plausible (e.g., Ramaekers et al. 1998), but the minimum (and obviously the maximum) do not.  How does your measure differ from the standard pNN50 calculation?  As David said, understanding how your dependent variable is measured/calculated is typically the first step in determining a reasonable model.  For example, if you are indeed modeling a percentage, then a logistic model (i.e., glm with logit link and binomial variance function) might be a plausible candidate.
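
For reference, here is a minimal sketch of the standard pNN50 calculation as it is usually defined: the percentage of successive NN-interval differences that exceed 50 ms. The function name `pnn50` and the interval values are made up for illustration and are not taken from the poster's software. By construction the result lies between 0 and 100, which is why a reported maximum of 213 raises the question above.

```python
def pnn50(nn_intervals_ms):
    """Percentage of successive NN-interval differences larger than 50 ms."""
    # Absolute differences between each interval and the one that follows it.
    diffs = [abs(b - a) for a, b in zip(nn_intervals_ms, nn_intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two NN intervals")
    # Share of differences above 50 ms, expressed as a percentage (0 to 100).
    return 100.0 * sum(d > 50 for d in diffs) / len(diffs)


# Hypothetical NN intervals in milliseconds (not real data).
example_nn = [800, 860, 855, 790, 800, 905, 900]
example_pnn50 = pnn50(example_nn)  # 3 of the 6 differences exceed 50 ms
```

A value outside the 0 to 100 range would suggest that the software reports something other than this percentage (for example, a count or a differently scaled quantity).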

Another important aspect of choosing an appropriate model is the goal of your analysis.  Are you looking merely to test a null hypothesis, or are you looking for a richer description of the relationships in the data?  Are you primarily interested in estimating how the response changes (possibly on a specific scale) according to changes in your covariates?  Are you looking to replicate a previous analysis, or to provide information that can be used in subsequent studies or possibly for clinical purposes (e.g., a nomogram)?  Or, are you primarily interested in prediction?  Having a clear idea of your analytic goal(s) is also an important part of model-building.

David gave excellent advice WRT using plots to examine the distribution of your dependent variable *conditional on* the covariates (as opposed to only the marginal distribution of the dependent variable).  The most important features here are the mean of the distribution (which determines the appropriate link function) and the variance (which determines the appropriate variance function or distribution family).  In particular, so-called component plus residual plots are excellent for examining how the mean changes with the covariates, while a smoothed plot of the absolute (standardized) residuals can be helpful in identifying an appropriate variance function (I presume that these strategies or similar alternatives are discussed in Hardin and Hilbe's book).  Whether you ultimately transform the dependent variable or use an appropriate combination of link/family to obtain a model depends in part on your analytic goals, but either should, if properly performed, give similar conclusions.

Personally, I wouldn't transform pNN50 into quartiles (at least not without a compelling reason for doing so).  This throws away information, and ties your conclusions to the observed quartiles in your sample, which may not be relevant in other samples/populations.  On a related note, I'm not surprised that the proportional odds model doesn't fit, since that assumes an underlying logistic distribution (albeit conditional on the covariates), and from what you've said, it doesn't sound like your dependent variable is symmetric (as is the logistic distribution).
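
The two costs of quartile coding mentioned above can be seen in a small sketch. All numbers are made up, and the helper names `quartile_cuts` and `quartile_group` are hypothetical; the cut points come from Python's `statistics.quantiles` with its default method, which may differ slightly from the rule a given statistics package uses.

```python
import bisect
import statistics


def quartile_cuts(values):
    """Return the three sample quartile cut points (Q1, median, Q3)."""
    return statistics.quantiles(values, n=4)


def quartile_group(value, cuts):
    """Map a value to quartile group 1-4 using the given cut points."""
    return bisect.bisect_right(cuts, value) + 1


# Two hypothetical samples (made-up numbers); sample_b is sample_a doubled.
sample_a = [0.5, 1, 2, 3, 5, 8, 12, 20, 35, 60, 90, 95]
sample_b = [2 * v for v in sample_a]

cuts_a = quartile_cuts(sample_a)
cuts_b = quartile_cuts(sample_b)

# Information loss: 12 and 35 differ almost threefold but share a group.
same_group = quartile_group(12, cuts_a) == quartile_group(35, cuts_a)

# Sample dependence: the same raw value is labeled differently once the
# cut points are taken from the other sample.
label_in_a = quartile_group(12, cuts_a)
label_in_b = quartile_group(12, cuts_b)
```

Here a value of 12 falls in the third quartile group of the first sample but in the second group of the other, so a statement about "the third quartile" does not carry over from one sample to the next.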

As an alternative, if you want to explore how the quantiles of your dependent variable are related to your covariates, you could use quantile regression (-qreg- in Stata).  Koenker (2005) illustrates the value of plotting the coefficient from quantile regression against the quantile, which can be very informative.
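
To illustrate why the coefficient can vary with the quantile, here is a toy sketch under strong simplifying assumptions: a single binary covariate and made-up data. Quantile regression minimizes the check (pinball) loss; with one binary covariate the model is saturated, so the loss separates by group and the slope at quantile tau is simply the difference between the two groups' tau-th quantiles. The function names are hypothetical, and this is not how -qreg- is implemented, only the same objective solved by brute force.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return residual * tau if residual >= 0 else residual * (tau - 1)


def fit_quantile(values, tau):
    """Intercept-only quantile fit by brute-force check-loss minimization.

    The loss is piecewise linear with kinks at the data points, so a
    minimizer can always be found among the observed values.
    """
    return min(
        sorted(values),
        key=lambda q: sum(check_loss(y - q, tau) for y in values),
    )


# Made-up responses for two groups (covariate = 0 and covariate = 1).
group0 = [1, 2, 3, 4, 5]
group1 = [2, 4, 6, 8, 30]

# Slope of the saturated quantile regression at each quantile level.
slope_by_tau = {
    tau: fit_quantile(group1, tau) - fit_quantile(group0, tau)
    for tau in (0.25, 0.5, 0.75)
}
```

In this toy example the slope grows from 2 at the 25th percentile to 4 at the 75th, the kind of pattern that a plot of coefficient against quantile is meant to reveal and that a single mean-regression coefficient would hide.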

In sum, your dataset is large enough (n = 1,800) to provide a lot of information about which models fit well (and which do not), and, once you have a good model, to yield relatively precise estimates.  My advice (consistent with David's) would be to spend some more time thinking about the precise nature of your dependent variable, and examining how the (conditional) distribution of this changes with the covariates.  Based on this, it is likely that you can come up with a reasonable regression model (either by transforming the dependent variable or by an appropriate choice of link/family), which will serve as a good baseline model even if you decide to pursue other approaches.

-- Phil


Hardin, J. W. & Hilbe, J. M. (2007). Generalized Linear Models and Extensions, 2nd edition. College Station, TX: Stata Press.

Koenker, R. (2005). Quantile Regression. Cambridge: Cambridge University Press.

Ramaekers, D., Ector, H., Aubert, A. E., Rubens, A., & Van de Werf, F. (1998). Heart rate variability and heart rate in healthy volunteers. European Heart Journal, 19, 1334-1341.
