
# FW: st: Quantile regression

 From "Seed, Paul" To "statalist@hsphsun2.harvard.edu" Subject FW: st: Quantile regression Date Sun, 23 Sep 2012 23:50:10 +0100

```
Vasan Kandaswamy <vasan.kandaswamy@ki.se> asked:
<snip>
The scientific questions that I would like to address are:
1. What fold increase in the outcome variable (glucose) is observed from Quartile 1 to Quartile 4 of the predictor variable (BMI), and is this difference across quartiles significant?
2. How much is the unit change observed in the outcome variable?
3. With various predictors (BMI, waist, body fat, weight etc.), I want to see which one best predicts the outcome variable.
4. I would like to see all analyses separately for men and women.
................................................................
I find this approach surprising for several scientific and statistical reasons:

1. BMI is a very well-explored predictor, and there are standard World Health Organisation
definitions for underweight, healthy weight, overweight, obese etc.  Defining your own
categories based on your current data set just adds unnecessary confusion, and reduces
2. The relationship between clinical outcomes (and sometimes biomarkers such as glucose)
and BMI is often non-linear, sometimes with a minimum
around 20-25 kg/m2.  Comparing Q1 & Q3 (the first and third quartiles - there is no fourth quartile)
will not pick this up.  Nor will comparing the first and fourth quarters
(defined as BMI < Q1 and BMI > Q3), which I assume is what you meant.

3. Although serum glucose has a skewed distribution, a log transformation
will get you much closer to Normality (and help to identify
outliers due to inadequate samples, data entry errors etc.).
4. For a lognormal distribution the observed medians and geometric means are
estimates of the same population parameter, with the geometric mean having
the smaller variance.
5. Linear regression on the logged glucose values can be used to estimate
the ratios of the geometric mean between BMI groups.
6. Why separately for men & women?  Do you have good a priori reasons
(preferably including both biochemical theory and published data)
for thinking that glucose and BMI behave
totally differently in men and women (not just have different average values)?
7. It looks as though you are warming up to saying "The effect exists
for men, but not for women", based only on p<0.05 (M) and p>0.2 (F),
or something similar.
This is a basic error known as comparing p-values, and it is far too common
in published papers.  Unless you carry out proper interaction tests, there
is no justification for such conclusions.
8. The power for the separate M & F estimates will obviously be less
than for an estimate on the combined data set. Likewise, the use of ranks instead of exact values amounts to coarsening the
data, or throwing information away, and there will be a corresponding
loss of power.

BW

Paul T Seed
King's College London, Division of Women's Health
```
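Point 1 refers to the standard WHO adult BMI cut-offs. As a minimal sketch (in Python rather than Stata, with a hypothetical helper name `who_bmi_category`), the usual four-way grouping at 18.5, 25 and 30 kg/m2 could be coded as follows. WHO further subdivides obesity into classes I to III, which this sketch deliberately does not do.

```python
from collections import Counter


def who_bmi_category(bmi: float) -> str:
    """Map an adult BMI (kg/m2) to a broad WHO category.

    Hypothetical helper: uses the WHO cut-offs 18.5, 25 and 30 kg/m2
    and does not split obesity into classes I-III.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "healthy weight"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Hypothetical BMI values, for illustration only.
bmis = [17.9, 21.4, 24.99, 25.0, 28.3, 31.7, 36.2]
counts = Counter(who_bmi_category(b) for b in bmis)
print(counts)
```

Unlike sample quartiles, these cut-offs stay the same from one data set to the next, so group definitions remain comparable across studies.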
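Point 2 can be illustrated with a small constructed example. The data below are hypothetical: BMI is spread evenly and symmetrically around an assumed minimum of 22.5 kg/m2, and the outcome follows an assumed U-shaped curve. Splitting at the sample quartiles (Python's `statistics.quantiles`, whose default method may differ slightly from Stata's percentile definitions) and comparing BMI < Q1 with BMI > Q3 shows no difference at all, even though the outcome clearly depends on BMI.

```python
import statistics

# Hypothetical BMI values: 15.0, 15.5, ..., 30.0, symmetric about 22.5 kg/m2.
bmi = [15.0 + 0.5 * i for i in range(31)]


def outcome(b: float) -> float:
    """Assumed U-shaped outcome with its minimum at BMI 22.5 (illustrative only)."""
    return 5.0 + 0.02 * (b - 22.5) ** 2


y = [outcome(b) for b in bmi]

# Sample quartiles Q1, Q2 (median) and Q3 of BMI.
q1, q2, q3 = statistics.quantiles(bmi, n=4)

lower = [yi for b, yi in zip(bmi, y) if b < q1]    # first quarter: BMI < Q1
upper = [yi for b, yi in zip(bmi, y) if b > q3]    # fourth quarter: BMI > Q3
middle = [yi for b, yi in zip(bmi, y) if q1 <= b <= q3]

mean_lower = statistics.mean(lower)
mean_upper = statistics.mean(upper)
mean_middle = statistics.mean(middle)

print(f"Q1 = {q1}, Q3 = {q3}")
print(f"mean outcome, BMI < Q1:   {mean_lower:.4f}")
print(f"mean outcome, BMI > Q3:   {mean_upper:.4f}")
print(f"mean outcome, Q1 to Q3:   {mean_middle:.4f}")
```

The two extreme quarters agree exactly, while the middle half sits lower; only a model that lets the outcome curve (for example a quadratic or spline in BMI) would reveal the minimum.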
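Point 4 can be checked by simulation. The sketch below assumes lognormal data with arbitrary, hypothetical parameters (mu = 1.6 and sigma = 0.5 on the log scale) and draws repeated samples with Python's standard `random` module. Both the sample median and the sample geometric mean estimate exp(mu), the population median; across replicates the geometric mean should show the smaller variance.

```python
import math
import random
import statistics


def compare_estimators(mu=1.6, sigma=0.5, n=50, reps=2000, seed=1):
    """Simulate lognormal samples and summarise two estimators of exp(mu).

    Parameters are hypothetical; each replicate draws n values and records
    the sample median and the sample geometric mean.
    """
    rng = random.Random(seed)
    medians, gmeans = [], []
    for _ in range(reps):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        medians.append(statistics.median(sample))
        gmeans.append(statistics.geometric_mean(sample))
    return {
        "target": math.exp(mu),                  # population median = geometric mean
        "mean_median": statistics.fmean(medians),
        "mean_gm": statistics.fmean(gmeans),
        "var_median": statistics.variance(medians),
        "var_gm": statistics.variance(gmeans),
    }


r = compare_estimators()
print(f"target exp(mu)     = {r['target']:.3f}")
print(f"average median     = {r['mean_median']:.3f}, variance {r['var_median']:.5f}")
print(f"average geom. mean = {r['mean_gm']:.3f}, variance {r['var_gm']:.5f}")
```

For normally distributed log values, the large-sample variance of the median is about pi/2 (roughly 1.57) times that of the mean, which is the efficiency loss this simulation reflects.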