FW: st: Quantile regression


From   "Seed, Paul" <paul.seed@kcl.ac.uk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   FW: st: Quantile regression
Date   Sun, 23 Sep 2012 23:50:10 +0100

Vasan Kandaswamy <vasan.kandaswamy@ki.se> asked:
<snip>
The scientific questions that I would like to address are:
1. What fold increase in the outcome variable (glucose) is observed from Quartile 1 to Quartile 4 of the predictor variable (BMI), and is this difference across quartiles significant?
2. How large is the unit change observed in the outcome variable?
3. With various predictors (BMI, waist, body fat, weight etc.), I want to see which one best predicts the outcome variable.
4. I would like to run all analyses separately for men and women.
................................................................
I find this approach surprising for several scientific and statistical reasons:

1. BMI is a very well explored predictor, and there are standard World Health Organisation 
definitions for underweight, healthy weight, overweight, obese etc.  Defining your own 
categories based on your current data set just adds unnecessary confusion, and reduces 
the usefulness of your results.
2. The relationship between clinical outcomes (and sometimes biomarkers such as glucose) 
and BMI is often non-linear, sometimes with a minimum 
around 20-25 kg/m2.  Comparing Q1 & Q3 (the first and third quartiles - there is no fourth quartile)
will not pick this up.  Nor will comparing the first and fourth quarters 
(defined as BMI < Q1 and BMI > Q3), which I assume is what you meant.

3. Although serum glucose has a skewed distribution, a log transformation 
will get you much closer to Normality (and help to identify 
outliers due to inadequate samples, data entry errors etc.).
4. For a lognormal distribution the observed medians and geometric means are 
estimates of the same population parameter, with the geometric mean 
having the smaller variance.
5. Linear regression on the logged glucose values can be used to estimate 
the ratios of the geometric mean between BMI groups.
6. Why separately for men & women?  Do you have good a priori reasons 
(preferably including both biochemical theory and published data) 
for thinking that glucose and BMI behave 
totally differently (not just have different average values)?
7. It looks as though you are warming up to saying "The effect exists 
for men, but not for women", based only on p<0.05 (M) and p>0.2 (F);
or something similar.
8. The power for the separate M & F estimates will obviously be less 
than for an estimate based on the combined data set.  Likewise, the use of 
ranks instead of exact values amounts to coarsening the 
data, or throwing information away; and there will be a corresponding 
loss of power.
Concluding that an effect exists in one sex but not the other on the 
strength of separate p-values is a basic error known as comparing 
p-values, and is far too common 
in published papers.  Unless you carry out proper interaction tests, there 
is no justification for such conclusions.
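
The ratio of geometric means in point 5 and the interaction test mentioned 
above can be sketched as follows.  This is only a minimal illustration, in 
Python rather than Stata and using just the standard library.  The function 
names and every number in it are made up for the example and do not come 
from any data set.  The z-test shown is one simple way of comparing two 
independent subgroup estimates; fitting a single model with a sex-by-BMI 
interaction term would be the more usual route.

```python
import math
from statistics import NormalDist, fmean


def geometric_mean_ratio(group_a, group_b):
    """Ratio of geometric means = exp(difference in mean log values).

    This is the quantity exp(coefficient) estimates when log(glucose)
    is regressed on a 0/1 indicator for group membership.
    """
    mean_log_a = fmean(math.log(v) for v in group_a)
    mean_log_b = fmean(math.log(v) for v in group_b)
    return math.exp(mean_log_a - mean_log_b)


def two_sided_p(estimate, se):
    """Two-sided Normal-approximation p-value for estimate / se."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


def interaction_p(est_1, se_1, est_2, se_2):
    """Test whether two independent subgroup estimates differ.

    The difference has standard error sqrt(se_1^2 + se_2^2), so it can
    be tested directly instead of comparing the two separate p-values.
    """
    diff = est_1 - est_2
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)
    return two_sided_p(diff, se_diff)


# Hypothetical glucose values (mmol/L) in two BMI groups, illustration only.
obese = [5.0, 6.0, 7.2]
healthy = [4.0, 5.0, 6.25]
ratio = geometric_mean_ratio(obese, healthy)

# Hypothetical slopes of log(glucose) on BMI, with standard errors.
p_men = two_sided_p(0.10, 0.04)      # "significant" on its own
p_women = two_sided_p(0.05, 0.04)    # "not significant" on its own
p_int = interaction_p(0.10, 0.04, 0.05, 0.04)

print(f"ratio of geometric means: {ratio:.2f}")
print(f"p (men) = {p_men:.3f}, p (women) = {p_women:.3f}")
print(f"p (men vs women interaction) = {p_int:.2f}")
```

With these made-up numbers the separate tests give p<0.05 for men and p>0.2 
for women, yet the test of the difference between the two slopes is nowhere 
near significant (p about 0.38).  That is exactly the situation in which 
"the effect exists for men, but not for women" would be unjustified.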

BW 

Paul T Seed 
King's College London, Division of Women's Health