
Re: st: Graphing Quadratic Interactions

From   Ronan Conroy <>
To   "" <>
Subject   Re: st: Graphing Quadratic Interactions
Date   Tue, 16 Dec 2008 10:53:56 +0000

This request has generated no replies as yet, and it's not hard to see why. Note especially "I need a pretty graph".

I looked up "pretty graph" in Stata and there's no routine available, so we'll have to think this one out. Apologies - this is a long post.

On 14 Dec 2008, at 16:41, Susanna Khavul wrote:

I would like to graph a quadratic interaction after running rreg
(Robust Regression). The model is as follows:

xi:rreg Y i.X1 X2 X3 X4 X5 X5squared X5*X4 X5squared*X4
(interaction terms are centered)

X5 -- independent variable
X5squared--square of the independent variable
X5*X4--independent variable x moderator
X5squared*X4--independent variable squared x moderator

The rest (X1 X2 X3) are controls which I want to account for as well.

One standard deviation above and below the mean would be fine.

I want to include both components of the interaction (linear and
quadratic). Is there a module that will do it?  If not, what is the
optimum graphing command.

Any help would be much appreciated. I need a pretty graph.

It appears that X2, X3 and X4 are binary and that X1 has more than two categories. Even more difficult, X5 enters two interaction terms.

Of course, there's no general answer to this, and a graph can only be constructed on the basis of the scientific question that the analysis answers.

The first question is whether any of the predictor variables defines subsets of the data which are of primary interest. There may be two reasons for this:

1. The relationship is different in each subset. For example, in our National Health and Lifestyle Survey, we found that the relationship of well-being to age was different in men and women. In men, it declined continuously, while in women it declined sharply initially, then recovered in middle and later life.

2. There may be a priori reasons for testing the hypothesis in two subsets. We've got a paper in press looking at the effect of worry on quality of life. Because of its association with depression, we have graphed this relationship in the depressed and the non-depressed participants, side by side, to show that the relationship is similar in both groups, but that the depressed have a worse baseline quality of life. (Interestingly, non-depressed severe worriers have a quality of life just as bad as non-worried depressed people.)

If either of these two conditions is true, you probably need to construct a chart using -by- to make separate graphs for each subgroup.
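
As a minimal sketch of that side-by-side layout, assuming Stata's shipped auto.dta as a stand-in (mpg, weight and foreign are that dataset's variables, not the poster's), one panel per subgroup with the data and a quadratic fit overlaid might look like this:

```stata
* Illustration only: auto.dta stands in for the real data
sysuse auto, clear

* One panel per level of foreign, showing the raw data
* with a quadratic fit of mpg on weight overlaid
twoway (scatter mpg weight) (qfit mpg weight), by(foreign)
```

Note that -qfit- is fitted separately within each panel here, so the curves are subgroup-specific fits rather than predictions from a single joint model.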

The second question is whether any of the covariates can be 'standardised out'. By this I mean constructing the graphs at a chosen value of the covariate.

For example, worry declines very significantly with age, as does quality of life. For this reason, we constructed our graphs of quality of life for a person aged 75 (using -adjust-). With continuous predictors which are not of primary interest, but which must be controlled as confounders, graphing relationships at fixed (sensible) values of the predictors may be the best way of displaying the data.

The same logic may be applied to binary predictors, again using -adjust- to generate predictions for fixed prevalences of the predictors.
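
The idea of predicting at fixed covariate values can be sketched by hand, without relying on any particular -adjust- syntax. This is only an illustration under stated assumptions: auto.dta stands in for the poster's data, with weight playing the role of X5, foreign the binary moderator X4, and length a control held at its mean; -regress- is used for simplicity, and all generated variable names are made up for the example.

```stata
* Illustration only: auto.dta stands in for the poster's data.
* weight plays the role of X5, foreign the binary moderator X4,
* and length a control to be held fixed.
sysuse auto, clear

* Centre the predictor, then build the square and both interactions by hand
summarize weight, meanonly
generate double w   = weight - r(mean)
generate double w2  = w^2
generate double fw  = foreign * w
generate double fw2 = foreign * w2

regress price length w w2 foreign fw fw2

* Fix the control at its mean so that predictions vary only with
* weight and the moderator (preserve keeps the original data safe)
preserve
summarize length, meanonly
replace length = r(mean)
predict double yhat, xb

* One fitted curve per level of the moderator
twoway (line yhat weight if foreign == 0, sort) ///
       (line yhat weight if foreign == 1, sort), ///
       legend(order(1 "Domestic" 2 "Foreign")) ///
       ytitle("Predicted price") xtitle("Weight (lbs.)")
restore
```

The same steps should carry over to a model fitted with -rreg-, since -predict- with the xb option is available after it as well. Which value to fix each control at is a scientific choice, not a statistical one.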

This leaves us with the final question of the form of the graph.

Again, without knowing the science behind the question, it's impossible to answer. However, adding one standard deviation above and below the mean is probably not a good data display. The standard deviation measures the scatter of observations, and if you're going to show scatter, then you are probably better off graphing the actual data, whose scatter can be quite different to that implied by the standard deviation. Adding boxes or means plus confidence intervals allows you to superimpose a data summary (both implemented in Nick Cox's remarkably useful -stripplot-, which is the first user-written graphic routine I require my students to download).
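
A rough sketch of such a display, again using auto.dta as a stand-in. -stripplot- is user-written and has to be installed from SSC; the option names below are given from memory of its help file and should be checked against -help stripplot- for the installed version.

```stata
* stripplot is user-written: install it once from SSC
ssc install stripplot

* Illustration only: auto.dta stands in for the real data
sysuse auto, clear

* Every observation of mpg, by repair record, with box plots superimposed
stripplot mpg, over(rep78) vertical box

* The same data with means and confidence intervals instead of boxes
stripplot mpg, over(rep78) vertical bar
```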

That's as much as I can say about the graph based on general principle. A good graph shows something interesting about the data. To make a good graph, you must first identify that interesting something. And this, clearly, is impossible in the absence of knowing either the hypothesis or the results of the analysis.

However, I would be wary of -rreg- as a primary analysis tool. It is a good procedure for reassuring yourself that the results of your analysis would not change substantively if influential observations were down-weighted, but as a model-building tool it suffers, I think, from a fundamental difficulty: that it violates the logic of hypothesis testing.
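
Using -rreg- as a check rather than as the main analysis could look something like the following sketch, which fits the same model both ways and puts the estimates side by side. The dataset and variables are stand-ins from auto.dta, and the stored-estimate names ols and rr are arbitrary.

```stata
* Illustration only: auto.dta stands in for the real data
sysuse auto, clear

* Primary analysis: ordinary least squares
regress price weight length foreign
estimates store ols

* Sensitivity check: same model with influential observations down-weighted
rreg price weight length foreign
estimates store rr

* Coefficients and standard errors side by side
estimates table ols rr, b(%9.2f) se(%9.2f)
```

If the two columns tell the same substantive story, the least-squares results can be reported with more confidence; if they differ sharply, that is a prompt to look at the influential observations themselves.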

In a hypothesis test, the investigator specifies the form of the model and then calculates the model parameters. For example, the model

baby weight = a constant + (mother's height x something1) + (gestational age x something2) + error

needs three parameters calculated. For each parameter, we can test whether the proportional reduction in error is statistically significant.

However, robust regression rebuilds the model as a complex equation in which individual observations are entered in a reweighted form. Thus, the investigator is leaving the selection of the model itself to chance features of the data. As such, the hypothesis tests which follow violate the central assumption of any such test - that the model was specified independently of the data.
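
The reweighting can be inspected directly, since -rreg- can save the final weight it gives each observation through its genwt() option. A small sketch on auto.dta, where rrwt is a made-up name for the new variable:

```stata
* Illustration only: auto.dta stands in for the real data
sysuse auto, clear

* Keep the final weight that rreg assigns to each observation
rreg price weight length foreign, genwt(rrwt)

* Weights near 1 mean the observation counts almost fully;
* weights near 0 (or missing) mean it was effectively set aside
summarize rrwt, detail
list make price rrwt if rrwt < 0.5 | missing(rrwt)
```

As I understand the documentation, a missing weight marks an observation dropped at the initial screening step; that reading is worth confirming in the -rreg- manual entry. Either way, the listing shows which observations the data, rather than the investigator, chose to discount.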

Perhaps I'm being a little jaundiced here, but I use -rreg- for support, not illumination.

Ronan Conroy
Royal College of Surgeons in Ireland
Epidemiology Department,
Beaux Lane House, Dublin 2, Ireland
+353 (0)1 402 2431
+353 (0)87 799 97 95
+353 (0)1 402 2764 (Fax - remember them?)

Before printing, think about the environment
