

From: "Seed, Paul" <paul.seed@kcl.ac.uk>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: FW: st: Quantile regression
Date: Sun, 23 Sep 2012 23:50:10 +0100

Vasan Kandaswamy <vasan.kandaswamy@ki.se> asked:

<snip>

The scientific questions that I would like to address are:

1. How much fold increase in the outcome variable (glucose) is observed from Quartile 1 to Quartile 4 of the predictor variable (BMI), and I want to see if this difference across quartiles is significant.
2. How much is the unit change observed in the outcome variable.
3. With various predictors (BMI, waist, body fat, weight etc.), I want to see which one best predicts the outcome variable.
4. All analyses I would like to see separately for men and women.

................................................................

I find this approach surprising for several scientific and statistical reasons:

1. BMI is a very well explored predictor, and there are standard World Health Organisation definitions for underweight, healthy weight, overweight, obese etc. Defining your own categories based on your current data set just adds unnecessary confusion, and reduces the usefulness of your results.
2. The relationship between clinical outcomes (and sometimes biomarkers such as glucose) and BMI is often non-linear, sometimes with a minimum around 20-25 kg/m2. Comparing Q1 & Q3 (the first and third quartiles - there is no fourth quartile) will not pick this up. Nor will comparing the first and fourth quarters (defined as BMI < Q1 and BMI > Q3), which I assume is what you meant.
3. Although serum glucose has a skewed distribution, a log transformation will get you much closer to Normality (and help to identify outliers due to inadequate samples, data entry errors etc.).
4. For a lognormal distribution the observed medians and geometric means are estimates of the same population parameter, with the geometric mean having the smaller variance.
5. Linear regression on the logged glucose values can be used to estimate the ratios of the geometric means between BMI groups.
6. Why separately for men & women? Do you have good a priori reasons (preferably including both biochemical theory and published data) for thinking that glucose and BMI behave totally differently in men and women (not just have different average values)?
7. It looks as though you are warming up to saying "The effect exists for men, but not for women", based only on p<0.05 (M) and p>0.2 (F), or something similar.
8. The power for the separate M & F estimates will obviously be less than for an estimate on the combined data set. Likewise, the use of ranks instead of exact values amounts to coarsening the data, or throwing information away, and there will be a corresponding loss of power.

The conclusion anticipated in point 7 is a basic error known as comparing p-values, and is FAR TOO common in published papers. Unless you carry out proper interaction tests, there is no justification for such conclusions.

BW

Paul T Seed
King's College London, Division of Women's Health
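Two of the points above (the ratio of geometric means from logged values, and testing an interaction rather than comparing p-values) can be illustrated with a small sketch. This is not from the original post: it is written in Python using only the standard library, all numbers and function names are hypothetical, and the interaction test shown is a simple z-test for the difference between two independent subgroup estimates. In Stata one would more likely fit a single regression on logged glucose with a sex-by-BMI interaction term.

```python
import math
from statistics import NormalDist, fmean, geometric_mean


def gm_ratio(group_a, group_b):
    """Ratio of geometric means, as exp(difference in mean log values)."""
    mean_log_a = fmean([math.log(x) for x in group_a])
    mean_log_b = fmean([math.log(x) for x in group_b])
    return math.exp(mean_log_a - mean_log_b)


def two_sided_p(estimate, se):
    """Two-sided p-value for estimate/se under a standard Normal reference."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


def interaction_p(est_1, se_1, est_2, se_2):
    """z-test for the difference between two independent estimates."""
    se_diff = math.hypot(se_1, se_2)  # sqrt(se_1**2 + se_2**2)
    return two_sided_p(est_1 - est_2, se_diff)


# Hypothetical glucose values (mmol/L) in two BMI groups.
obese = [5.8, 6.4, 7.9, 5.5, 6.1]
healthy = [4.9, 5.2, 5.6, 5.0, 6.3]
print("GM ratio:", gm_ratio(obese, healthy))
print("check   :", geometric_mean(obese) / geometric_mean(healthy))

# Hypothetical log-scale BMI effects (estimate, standard error) by sex.
men = (0.10, 0.05)
women = (0.06, 0.05)
print("p (men)        :", two_sided_p(*men))
print("p (women)      :", two_sided_p(*women))
print("p (interaction):", interaction_p(*men, *women))
```

With these made-up inputs the men's estimate falls below p = 0.05 and the women's above p = 0.2, yet the test of the difference between the two estimates is far from significant, which is why the separate p-values alone do not support "the effect exists for men, but not for women".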

