

From: David Hoaglin <dchoaglin@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Quantile regression
Date: Sat, 22 Sep 2012 09:07:06 -0400

Dear Vasan,

Your explanation makes it easier for me to offer suggestions, from the perspective of a statistician.

I agree with most of Nick's comments, but not all. For example, if your main interest is in separate results for men and women, you can do separate analyses. An analysis of the data from both men and women, with interactions in the model, may also be of interest, but it would be optional.

I would start by exploring the relation between the outcome variable (glucose) and each of the predictor variables (BMI, waist, body fat, weight, etc.), without partitioning any predictor into quartiles. Nick's suggestion of a scatter plot with a lowess curve is a good way to do it. The data may suggest that you work with glucose in the log scale (I prefer base-10 logs), or your interest in "fold" differences may support using the log scale (a ratio or "fold" difference in the original scale corresponds to a constant difference in the log scale).

Those explorations should help you to determine whether a linear relation between the outcome and the predictor is a satisfactory summary, or whether the relation is more complicated.

If a linear relation is satisfactory, you can fit a regression (line) to the full data. Then you can compare the predicted outcome at a suitable value of the predictor in Q4 against the predicted outcome at a suitable value of the predictor in Q1.

It may be reasonable to fit a simple linear regression of the outcome on each of the potential predictors and use the values of R^2 to decide which predictor is "best." In the process, you should make a scatter plot of the residuals from each regression (against the predictor). Also, it would be wise to check whether any observations are particularly influential.

In an alternative analysis, you could partition the predictor variable into quartiles, treat those quartiles as four groups, and use one-way analysis of variance. That analysis would use the full data, not just the mean (or median) within each quartile. The usual output for a one-way ANOVA will show whether the means of the four groups differ significantly (in any way). To get the difference between the mean in Q4 and the mean in Q1, you can set up the analysis as a regression, with Q1 as the reference category and dummy variables for the other three quartiles. Then the significance of the coefficient of the dummy variable for Q4 is what you are interested in. I am often uncomfortable with sharp boundaries because they distort differences between observations.

If you want to consider several predictors simultaneously, you will need a multiple regression model. The results will tell you about the contribution of each predictor, adjusting for the contributions of the other predictors. It may be messy to set up quartiles on all of the predictors and use the sets of dummy variables in a regression model, but it would not be impossible. One compromise would use dummy variables for one predictor and linear expressions for the others.

I don't yet understand what you mean by "the unit change observed in the outcome variable."

I hope this discussion is useful.

David Hoaglin

On Sat, Sep 22, 2012 at 4:05 AM, Vasan Kandaswamy <vasan.kandaswamy@ki.se> wrote:
> Dear David,
> Thank you very much. I sincerely apologize for not having made my question clear.
>
> The scientific question that I would like to address are:
> 1. How much fold increase in outcome variable ( glucose) is observed from Quartile 1 to Quartile 4 of predictor variable (BMI) and want to see if this difference across quartiles is significant.
> 2. How much is the unit change observed in outcome variable.
> 3. With various predictors ( BMI, waist, body fat, weight etc) , I want to see which one best predicts the outcome variable
> 4. All analysis I would like to see seperately for men and women
>
> To address these : I went about this way
> 1. derived mean/median of outcome variable in each quartile
> 2. To compare the mean of glucose across quartiles of BMI for males ( not compare male mean and female mean in each quartile)- I intend to do an one way ANOVA ( but was suggested a two way)
> 3. To observe the unit change across quartiles, I wanted to do a regression model using qreg.
> 4. Finally, I am not sure as to how to go about with finding out which is the best predictor of the outcome. ( If I am not mistaken, I do not think I can do a standardized beta in qreg).
>
> The script I used are
> xtile bmi_q = bmi, nquantiles(4)
> bysort bmi_q sex:sum glucose, detail
> bysort sex: anova glucose_log bmi_q
> bysort sex: qreg bmi glucose age
>
> I hope I have made it more understandable now.
> Would be really very useful if I have your suggestions on these.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
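
The steps suggested in the reply above might be sketched in Stata roughly as follows. This is an illustration only, not code posted in the thread. It assumes the variables glucose, bmi, and sex from Vasan's script; the names log10_glucose, resid, and cooksd are hypothetical and introduced here. The quartile cutpoints come from the pooled sample, as in the original xtile call, which may or may not be what is wanted for sex-specific analyses.

```stata
* Illustrative sketch only; assumes glucose, bmi, and sex exist in memory.
* log10_glucose, resid, and cooksd are hypothetical names introduced here.

* 1. Explore the relation with a scatter plot and lowess curve, by sex
twoway (scatter glucose bmi) (lowess glucose bmi), by(sex)

* 2. Work with glucose in the log scale (base 10)
generate log10_glucose = log10(glucose)

* 3. If a line looks adequate, fit a simple linear regression by sex
bysort sex: regress log10_glucose bmi

* 4. Residuals against the predictor, and an influence check
*    (shown for the pooled fit to keep the sketch short)
regress log10_glucose bmi
predict resid, residuals
scatter resid bmi, yline(0)
predict cooksd, cooksd
summarize cooksd, detail

* 5. Alternative: quartiles as groups, with Q1 as the reference category
xtile bmi_q = bmi, nquantiles(4)
bysort sex: regress log10_glucose i.bmi_q

* 6. Q4 versus Q1 as a "fold" difference (pooled fit):
*    the coefficient on 4.bmi_q is a difference in mean log10 glucose,
*    so 10^coefficient is a ratio of geometric means
regress log10_glucose i.bmi_q
display 10^_b[4.bmi_q]
```

Because the outcome is in base-10 logs, the Q4 coefficient back-transforms to a ratio of geometric means rather than a ratio of arithmetic means. Repeating step 1 through step 4 for each candidate predictor (waist, body fat, weight) would allow the R^2 comparison described in the reply.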

