
# Re: st: Quantile regression

- From: David Hoaglin
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Quantile regression
- Date: Sat, 22 Sep 2012 09:07:06 -0400

```
Dear Vasan,

Your explanation makes it easier for me to offer suggestions, from the
perspective of a statistician.  I agree with most of Nick's comments,
but not all.  For example, if your main interest is in separate
results for men and women, you can do separate analyses.  An analysis
of  the data from both men and women, with interactions in the model,
may also be of interest, but it would be optional.

I would start by exploring the relation between the outcome variable
(glucose) and each of the predictor variables (BMI, waist, body fat,
weight, etc.), without partitioning any predictor into quartiles.
Nick's suggestion of a scatter plot with a lowess curve is a good way
to do it.  The data may suggest that you work with glucose in the log
scale (I prefer base-10 logs), or your interest in "fold" differences
may support using the log scale (a ratio or "fold" difference in the
original scale corresponds to a constant difference in the log scale).
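
* A minimal Stata sketch of this exploration, offered as an illustration
* rather than as part of the original advice.  It assumes the variable
* names glucose, bmi, and sex from the script quoted below; glucose_log10
* is a hypothetical new variable.
generate glucose_log10 = log10(glucose)
* A difference of d in glucose_log10 is a 10^d-fold difference in glucose,
* so 0.301 corresponds to roughly a doubling.
* Scatter plot with a lowess curve, drawn separately for each sex.
twoway (scatter glucose_log10 bmi) (lowess glucose_log10 bmi), by(sex)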

The aim is to see whether a linear relation between the outcome and
the predictor is a satisfactory summary, or whether the relation is
more complicated.

If a linear relation is satisfactory, you can fit a regression (line)
to the full data.  Then you can compare the predicted outcome at a
suitable value of the predictor in Q4 against the predicted outcome at
a suitable value of the predictor in Q1.
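
* One way to sketch this comparison in Stata (an illustration under
* assumptions, not the only reading of "suitable value"): take the 12.5th
* and 87.5th percentiles of BMI, which are roughly the medians of Q1 and
* Q4, and estimate the difference in predicted outcome between them.
* glucose_log and bmi are the names used in the script quoted below; the
* same steps could be repeated within each sex.
regress glucose_log bmi
_pctile bmi, percentiles(12.5 87.5)
local d = r(r2) - r(r1)
* Because the fit is linear, the difference in predictions is d times
* the slope; lincom reports it with a standard error and interval.
lincom `d'*bmi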

It may be reasonable to fit a simple linear regression of the outcome
on each of the potential predictors and use the values of R^2 to
decide which predictor is "best."  In the process, you should make a
scatter plot of the residuals from each regression (against the
predictor).  Also, it would be wise to check whether any observations
are particularly influential.
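
* A hedged Stata sketch of these checks.  The names waist, bodyfat, and
* weight are assumed for illustration; glucose_log and bmi come from the
* script quoted below, and cooksd_bmi is a hypothetical new variable.
* R-squared values are comparable only if each regression uses the same
* observations, so watch for missing values.
foreach x of varlist bmi waist bodyfat weight {
    quietly regress glucose_log `x'
    display "`x': R-squared = " %5.3f e(r2)
}
* Residuals against the predictor, and Cook's distance as one measure
* of influence, shown here for BMI.
regress glucose_log bmi
rvpplot bmi, yline(0)
predict cooksd_bmi, cooksd
gsort -cooksd_bmi
list bmi glucose_log cooksd_bmi in 1/5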

In an alternative analysis, you could partition the predictor variable
into quartiles, treat those quartiles as four groups, and use one-way
analysis of variance.  That analysis would use the full data, not just
the mean (or median) within each quartile.  The usual output for a
one-way ANOVA will show whether the means of the four groups differ
significantly (in any way).  To get the difference between the mean in
Q4 and the mean in Q1, you can set up the analysis as a regression,
with Q1 as the reference category and dummy variables for the other
three quartiles.  Then the significance of the coefficient of the
dummy variable for Q4 is what you are interested in.
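
* A minimal Stata sketch of this setup, assuming bmi_q has been created
* with xtile as in the script quoted below and that glucose_log and sex
* are the variables used there.
* One-way ANOVA across the four quartile groups, separately by sex.
bysort sex: oneway glucose_log bmi_q, tabulate
* The same model as a regression, with Q1 as the reference category.
* The coefficient on 4.bmi_q is the Q4 mean minus the Q1 mean, and its
* t test is the Q4-versus-Q1 comparison.
bysort sex: regress glucose_log ib1.bmi_q
* If glucose_log is a base-10 log, 10^(that coefficient) is the "fold"
* difference, expressed as a ratio of geometric means.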

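I am often uncomfortable with sharp boundaries, such as quartile cut
points, because they distort differences between observations: two
observations just on either side of a boundary are treated as
different, whereas two at opposite ends of the same quartile are
treated as alike.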

If you want to consider several predictors simultaneously, you will
need a multiple regression model.  The results will tell you about the
contribution of each predictor, adjusting for the contributions of the
other predictors.  It may be messy to set up quartiles on all of the
predictors and use the sets of dummy variables in a regression model,
but it would not be impossible.  One compromise would use dummy
variables for one predictor and linear expressions for the others.
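
* A sketch of that compromise in Stata, under assumed variable names
* (waist and bodyfat are hypothetical; bmi_q, glucose_log, age, and sex
* follow the script quoted below): quartile dummies for BMI, linear
* terms for the other predictors.
bysort sex: regress glucose_log ib1.bmi_q c.waist c.bodyfat c.age
* Each coefficient is then adjusted for the other predictors in the model.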

I don't yet understand what you mean by "the unit change observed in
the outcome variable."

I hope this discussion is useful.

David Hoaglin

On Sat, Sep 22, 2012 at 4:05 AM, Vasan Kandaswamy
<vasan.kandaswamy@ki.se> wrote:
> Dear David,
> Thank you very much. I sincerely apologize for not having made my question clear.
>
> The scientific questions that I would like to address are:
> 1. How large a fold increase in the outcome variable (glucose) is observed from Quartile 1 to Quartile 4 of the predictor variable (BMI), and is this difference across quartiles significant?
> 2. How much is the unit change observed in the outcome variable?
> 3. With various predictors (BMI, waist, body fat, weight, etc.), I want to see which one best predicts the outcome variable.
> 4. I would like to see all analyses separately for men and women.
>
> 1. derived mean/median of outcome variable in each quartile
> 2. To compare the mean of glucose across quartiles of BMI for males (not to compare the male mean and female mean in each quartile), I intend to do a one-way ANOVA (but a two-way was suggested).
> 3. To observe the unit change across quartiles, I wanted to do a regression model using qreg.
> 4. Finally, I am not sure as to how to go about with finding out which is the best predictor of the outcome. ( If I am not mistaken, I do not think I can do a standardized beta in qreg).
>
> The script I used is:
> xtile bmi_q = bmi, nquantiles(4)
> bysort bmi_q sex:sum glucose, detail
> bysort sex: anova glucose_log bmi_q
> bysort sex: qreg bmi glucose age
>
> I hope I have made it more understandable now.
> Would be really very useful if I have your suggestions on these.
```