FW: st: Quantile regression

From	"Seed, Paul" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	FW: st: Quantile regression
Date	Sun, 23 Sep 2012 23:50:10 +0100

Vasan Kandaswamy <[email protected]> asked:
<snip>
The scientific questions that I would like to address are:
1. How much fold increase in the outcome variable (glucose) is observed from Quartile 1 to Quartile 4 of the predictor variable (BMI), and whether this difference across quartiles is significant.
2. How much is the unit change observed in the outcome variable.
3. With various predictors (BMI, waist, body fat, weight etc.), I want to see which one best predicts the outcome variable.
4. All analyses I would like to see separately for men and women.
................................................................
I find this approach surprising for several scientific and statistical reasons:

1. BMI is a very well explored predictor, and there are standard World Health Organisation
definitions for underweight, healthy weight, overweight, obese etc. Defining your own
categories based on your current data set just adds unnecessary confusion, and reduces
the usefulness of your results.
2. The relationship between clinical outcomes (and sometimes biomarkers such as glucose)
and BMI is often non-linear, sometimes with a minimum
around 20-25 kg/m2. Comparing Q1 & Q3 (the first and third quartiles - there is no fourth quartile)
will not pick this up. Nor will comparing the first and fourth quarters
(defined as BMI < Q1 and BMI > Q3), which I assume is what you meant.

3. Although serum glucose has a skewed distribution, a log transformation
will get you much closer to Normality (and help to identify
outliers due to inadequate samples, data entry errors etc.).
4. For a lognormal distribution the observed medians and geometric means are
estimates of the same population parameter; the geometric mean has
the smaller variance.
5. Linear regression on the logged glucose values can be used to estimate
the ratios of the geometric means between BMI groups.
6. Why separately for men & women? Do you have good a priori reasons
(preferably including both biochemical theory and published data)
for thinking that glucose and BMI behave
totally differently (not just have different average values)?
7. It looks as though you are warming up to saying "The effect exists
for men, but not for women", based only on p<0.05 (M) and p>0.2 (F),
or something similar. This is a basic error known as comparing p-values,
and is far too common in published papers. Unless you carry out proper
interaction tests, there is no justification for such conclusions.
8. The power for the separate M & F estimates will obviously be less
than for an estimate on the combined data set. Likewise, the use of ranks
instead of exact values amounts to coarsening the data, or throwing
information away, and there will be a corresponding loss of power.
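
To make two of the points above concrete, here is a rough sketch in Python (not Stata, and not taken from the original analysis). It assumes made-up glucose values and made-up slope estimates; every number and function name in it is hypothetical. The first function shows that the difference in mean log glucose between two BMI groups, exponentiated, is the ratio of their geometric means, which is what the coefficient on a group indicator in a regression of log glucose estimates. The second shows a simple normal-approximation test for the difference between two independent slopes, as one possible form of interaction test.

```python
import math
from statistics import NormalDist, mean


def geometric_mean_ratio(reference, comparison):
    """Ratio of geometric means: exp(mean log comparison - mean log reference)."""
    log_ref = [math.log(x) for x in reference]
    log_cmp = [math.log(x) for x in comparison]
    return math.exp(mean(log_cmp) - mean(log_ref))


def two_sided_p(estimate, se):
    """Two-sided p-value for estimate/se under a normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(estimate / se)))


def slope_difference_test(b_men, se_men, b_women, se_women):
    """Test the difference between two slopes from independent samples."""
    diff = b_men - b_women
    se_diff = math.sqrt(se_men ** 2 + se_women ** 2)
    return diff, se_diff, two_sided_p(diff, se_diff)


# Hypothetical glucose values (mmol/L) in two BMI groups.
healthy = [4.0, 5.0, 6.25]
obese = [5.0, 6.25, 7.8125]
ratio = geometric_mean_ratio(healthy, obese)

# Hypothetical slopes of log glucose on BMI, with standard errors.
p_men = two_sided_p(0.10, 0.05)      # "significant" in men
p_women = two_sided_p(0.06, 0.05)    # "not significant" in women
diff, se_diff, p_interaction = slope_difference_test(0.10, 0.05, 0.06, 0.05)

print(f"geometric mean ratio (obese / healthy): {ratio:.3f}")
print(f"p (men) = {p_men:.3f}, p (women) = {p_women:.3f}")
print(f"slope difference = {diff:.3f} (SE {se_diff:.3f}), p = {p_interaction:.3f}")
```

With these invented numbers the slope is "significant" in men and "non-significant" in women, yet the test of the difference between the two slopes gives no evidence that they differ. In practice the interaction would normally be tested within a single regression model on the combined data rather than from two separate fits.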

Paul T Seed
King's College London, Division of Women's Health