
# Re: st: Quantile regression

From: "JVerkuilen (Gmail)"; To: statalist@hsphsun2.harvard.edu; Subject: Re: st: Quantile regression; Date: Sat, 22 Sep 2012 09:05:59 -0400

```
The best way is to use the quartiles AFTER you've modeled the predictor (BMI).

To my eye, quantile regression seems like a pretty good bet for you
given how nonlinear things are likely to be. A good though not
infallible strategy for model building is:

(1) Pick some control variables. These are the variables you know are
important but don't fundamentally care about. These could be age or
gender, for instance.
(2) Pick your focal variables (such as BMI).
(3) Think about some interactions, which may be between variables that
are otherwise controls.

Now fit models ranging from the simplest (controls only) to the most
complex (control+focal+interactions). I highly recommend -sqreg- in
this regard as you can choose other quantiles besides the median and
get a very good summary of the entire distribution of glucose, not
just the median.

Use predicted values of different scenarios to generate a good,
parsimonious (most likely graphical) summary of the effect and its
associated uncertainty. There is a substantial amount of art and
effort in this. If a picture is worth a thousand words, think about
how long it takes to write a *good* thousand words.

On Sat, Sep 22, 2012 at 6:55 AM, Vasan Kandaswamy
<vasan.kandaswamy@ki.se> wrote:
> Thank you very much, Nick, for your response.
> It is very useful in taking the analysis forward. Is there any way that variability within quartile groups can be addressed?
> Best regards,
> Vasan
>
> ________________________________________
> From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Nick Cox [njcoxstata@gmail.com]
> Sent: Saturday, September 22, 2012 11:52 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Quantile regression
>
> I will number your commands for ease of discussion.
>
> 1. xtile bmi_q = bmi, nquantiles(4)
>
> 2. bysort bmi_q sex:sum glucose, detail
>
> 3. bysort sex: anova glucose_log bmi_q
>
> 4. bysort sex: qreg bmi glucose age
>
> #2 gives descriptive statistics, which no doubt could be useful. I
> would expect graphs to be as or more useful, e.g.
>
> scatter glucose bmi || lowess glucose bmi, by(sex)
>
> #1 and #3 are choices that seem very hard to defend in any statistical
> discussion. You are throwing away information on variability within
> quartile groups of -bmi- and degrading the data.
>
> #4 is puzzling too. Why expect a linear relation between -bmi- and its
> predictors? If there are different relationships according to -sex-,
> the most usual tactic is not to fit separate models, but to fit a
> joint model with interactions between age and sex.
>
> If -glucose- is the response, it should not be the predictor in #4.
>
> Why is glucose treated as linear in one model and logged in another?
>
> This is not my field, but I find it difficult to imagine that the
> science _demands_ thinking in terms of quartiles. Quartiles are at best
> a convenient categorisation and at worst an arbitrary and inefficient
> one.
>
> Identifying a best predictor is never easy and often futile.
>
> Nick
>
> On Sat, Sep 22, 2012 at 9:05 AM, Vasan Kandaswamy
> <vasan.kandaswamy@ki.se> wrote:
>
>> Thank you very much. I sincerely apologize for not having made my question clear.
>>
>> The scientific questions that I would like to address are:
>> 1. What fold increase in the outcome variable (glucose) is observed from Quartile 1 to Quartile 4 of the predictor variable (BMI), and is this difference across quartiles significant?
>> 2. What is the unit change observed in the outcome variable?
>> 3. With various predictors (BMI, waist, body fat, weight, etc.), I want to see which one best predicts the outcome variable.
>> 4. I would like to run all analyses separately for men and women.
>>
>> 1. Derived the mean/median of the outcome variable in each quartile.
>> 2. To compare the mean of glucose across quartiles of BMI for males (not to compare the male mean and female mean in each quartile), I intend to do a one-way ANOVA (but a two-way was suggested).
>> 3. To observe the unit change across quartiles, I wanted to fit a regression model using qreg.
>> 4. Finally, I am not sure how to go about finding out which is the best predictor of the outcome. (If I am not mistaken, I do not think I can get a standardized beta in qreg.)
>>
>> The commands I used are
>> xtile bmi_q = bmi, nquantiles(4)
>> bysort bmi_q sex:sum glucose, detail
>> bysort sex: anova glucose_log bmi_q
>> bysort sex: qreg bmi glucose age
>>
>> I hope I have made it more understandable now.
>> It would be very useful to have your suggestions on these.
>
> David Hoaglin [dchoaglin@gmail.com]
>
>> I'm puzzled.  From the way in which you described your analysis in
>> your first message, I don't understand why you would use quantile
>> regression.  As I recall, you wanted to compare the means of some
>> variables across quartiles of BMI for males and females.  In that
>> description, it was not clear to me whether you wanted to compare the
>> mean of a variable in data from males among the quartiles of BMI and
>> similarly in data from females, or whether you wanted to compare the
>> female mean and the male mean within each quartile of BMI, or whether
>> you wanted to make both of these types of comparisons.  I did not see
>> any mention of the numbers of observations or the source of the data
>> or, importantly, the scientific question that you are addressing.
>>
>> As I read the command below, you are asking -qreg- to fit a
>> regression model to the median of BMI with predictors fast_glucose,
>> etc. (the median is the default quantile in -qreg-).  This seems far
>> from what you set out to do.
>>
>> Those of us who are following this thread would be better able to
>> advise you if you went back to the beginning and gave us more
>> information on the data and the context.  I do not know, for example,
>> whether the data that you are analyzing are suitable for ANOVA.  They
>> may be (perhaps after a transformation), and you may have given up on
>> ANOVA too quickly.
>
>> On Wed, Sep 19, 2012 at 5:33 PM, Vasan Kandaswamy
>
>>> Now, I have given up on ANOVA since I cannot derive p values for each gender separately, but did a regression.
>>>
>>> A quantile regression comes out this way
>>> bysort bmi_q sex:sum g0mmol
>>> bysort sex: qreg bmi fast_glucose age pr ( adjusted for age)
>>>
>>> I tabulate the output this way
>>> BMI      Q1    Q2    Q3    Q4    Beta (95% CI)        P value
>>> Male     5.3   5.4   5.5   5.6   2.61 (1.46, 3.76)    8.91 x 10^-06
>>> Female   5.4   5.4   5.4   5.7   0.36 (-0.15, 0.86)   0.168
>>>
>>> If you actually look at the mean glucose values in Q1-Q5, there is not much difference, but the regression shows a clear difference, with the p value significant for males but not for females.
>>>
>>> Could you please explain if my approach is correct?
>>> The basic question I would like to ask is whether the fold change from Q1 to Q5 is significant.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
JVVerkuilen, PhD
jvverkuilen@gmail.com

"Out beyond ideas of wrong-doing and right-doing there is a field.
I'll meet you there. When the soul lies down in that grass the world
is too full to talk about." ---Rumi
```
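The model-building sequence suggested at the top of this reply (controls only, then controls plus the focal variable, then interactions) could be sketched in Stata roughly as follows. This is only an illustration under stated assumptions: the variable names `glucose`, `bmi`, `age`, and `sex` are placeholders taken loosely from the thread, `sex` is assumed to be a numeric 0/1 variable, the choice of quantiles and the number of bootstrap replications are arbitrary, and the Stata version is assumed to accept factor-variable notation in -sqreg-.

```stata
* Hypothetical variable names: glucose (outcome), bmi (focal predictor),
* age and sex (controls; sex assumed to be coded 0/1).
* The quantiles, seed, and reps() below are illustrative choices only.

* -sqreg- uses bootstrapped standard errors, so set a seed for reproducibility
set seed 12345

* (1) Controls only
sqreg glucose age i.sex, quantiles(.25 .5 .75) reps(100)

* (2) Controls + focal variable
sqreg glucose age i.sex bmi, quantiles(.25 .5 .75) reps(100)

* Is the BMI slope the same at the 25th and 75th percentiles of glucose?
test [q25]bmi = [q75]bmi

* (3) Controls + focal variable + a sex-by-BMI interaction
sqreg glucose age i.sex##c.bmi, quantiles(.25 .5 .75) reps(100)
```

Note that BMI enters as a continuous predictor here rather than being cut into quartile groups with -xtile-, in line with the point made earlier in the thread that categorising discards information on within-group variability.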