Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Is it valid to use the individual ratios (i.e. Xi/Yi) in the dependent or independent part of a regression model?

| From | Tirthankar Chakravarty |
| --- | --- |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Is it valid to use the individual ratios (i.e. Xi/Yi) in the dependent or independent part of a regression model? |
| Date | Sat, 26 May 2012 01:26:09 -0700 |

```
They estimate two different quantities - you decide which one you want:

*******************************************
webuse census2, clear

// ratio of means: mean(death)/mean(pop), i.e. sum(death)/sum(pop)
ratio (deathrate: death/pop)
* or, more transparently
mean death pop
di _b[death]/_b[pop]

// mean of ratios: average of the state-level rates death_i/pop_i
g deathrate = death/pop
reg deathrate
* or, more transparently
mean deathrate
*******************************************

T

On Sat, May 26, 2012 at 12:19 AM,  <guhjy@kmu.edu.tw> wrote:
> My point is that the mean and se obtained by the "ratio" command
> (which is supposedly more accurate) differ from those obtained by the
> "regress" command. Thus, the results obtained by the "regress" command
> may be invalid. My question is: how should ratios be analyzed as
> dependent or independent variables in regression if the mean and se of
> (Xi/Yi) are incorrect?
> For example:
>
> . webuse census2, clear
> (1980 Census data by state)
>
> .
> . gen drate1=death/pop
>
> .
> . reg drate1
>
>      Source |       SS       df       MS              Number of obs =      50
> -------------+------------------------------           F(  0,    49) =    0.00
>       Model |           0     0           .           Prob > F      =       .
>    Residual |  .000083179    49  1.6975e-06           R-squared     =  0.0000
> -------------+------------------------------           Adj R-squared =  0.0000
>       Total |  .000083179    49  1.6975e-06           Root MSE      =   .0013
>
> ------------------------------------------------------------------------------
>      drate1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       _cons |    .008436   .0001843    45.78   0.000     .0080657    .0088063
> ------------------------------------------------------------------------------
>
> .
> . ratio (deathrate: death/pop)
>
> Ratio estimation                    Number of obs    =      50
>
>    deathrate: death/pop
>
> --------------------------------------------------------------
>             |             Linearized
>             |      Ratio   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
>   deathrate |   .0087368   .0002052      .0083244    .0091492
> --------------------------------------------------------------
>
>
> Thank you.
>
> Sincerely Yours,
> Jinn-Yuh Guh, M.D.
> Division of Nephrology
> Department of Internal Medicine
> Kaohsiung Medical University
> 100 Zihyou 1st Rd.
> Kaohsiung, Taiwan 80756
> E-mail:guhjy@kmu.edu.tw
> TEL: 886-7-3121101 EXT.7353~12
> FAX: 886-7-3228721
>
>
> 2012/5/26 Steve Samuels <sjsamuels@gmail.com>:
>>
>> Rich Goldstein's nice summary contains a reference to Dick Kronmal's article:
>>
>> Kronmal, R. A. (1993). Spurious correlation and the fallacy of the ratio standard
>> revisited. Journal of the Royal Statistical Society. Series A (Statistics in
>> Society), 156(3), 379-392.
>>
>> Dick's thinking (and title) were inspired by:
>>
>> Tanner, J. M. (1949). Fallacy of per-weight and per-surface area standards,
>> and their relation to spurious correlation. Journal of Applied Physiology, 2(1), 1-15.
>>
>> Happily, Tanner's article is available online:
>>
>> http://0-jap.physiology.org.library.pcc.edu/content/2/1/1.full.pdf+html
>>
>> Steve
>> sjsamuels@gmail.com
>>
>>
>> Your opening statement is more nearly incorrect than correct. In
>> general, X / Y is indeterminate whenever Y is 0; if X and Y are
>> normally distributed that is an event with probability 0 (which still
>> means possible) but the ratio is otherwise well defined.
>>
>> If Y is ever 0 in your data then the ratio X / Y is unlikely to make
>> scientific sense and so the question of what you can and can't do with
>> it statistically doesn't really arise.
>>
>> I don't think there is a simple answer to whether you should use
>> ratios in regression. Often it is scientifically natural; often it is
>> pretty dangerous.
>>
>> For one statement of various pitfalls see list member Richard
>> Goldstein on ratios:
>>
>> http://biostat.mc.vanderbilt.edu/wiki/pub/Main/BioMod/goldstein.ratios.pdf
>>
>> Better advice might depend on your giving more details on what you
>> want to do, mentioning the scientific or medical context as well.
>>
>> Nick
>>
>> On Fri, May 25, 2012 at 5:36 AM,  <guhjy@kmu.edu.tw> wrote:
>>
>>> The ratio of two normally distributed variables (X and Y) has no mean
>>> or variance.
>>> 1. Why is it valid that the "ratio" command estimates the mean and se of ratios?
>>> 2. Is it valid to use the individual ratios (i.e. Xi/Yi) in the
>>> dependent or independent part of a regression model?
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/

--
Tirthankar Chakravarty
tchakravarty@ucsd.edu
tirthankar.chakravarty@gmail.com

```
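
The distinction drawn in the reply above can be checked numerically. The following is only a minimal sketch in Python rather than Stata, using made-up numbers for three hypothetical regions (not the census2 data); all variable names are illustrative. It computes the ratio of means and the mean of ratios, and also shows that the ratio of means equals the mean of the individual ratios weighted by the denominator, which is one way to see why the two estimates differ whenever the rate varies with population size.

```python
# Hypothetical toy data (NOT the census2 dataset): deaths and population
# for three made-up regions.
deaths = [10.0, 30.0, 200.0]
pop = [1000.0, 2000.0, 10000.0]

n = len(deaths)

# Individual ratios death_i / pop_i.
rates = [d / p for d, p in zip(deaths, pop)]

# Ratio of means: mean(deaths) / mean(pop), the same as sum(deaths) / sum(pop).
ratio_of_means = (sum(deaths) / n) / (sum(pop) / n)

# Mean of ratios: unweighted average of the individual rates.
mean_of_ratios = sum(rates) / n

# Population-weighted mean of the individual rates; algebraically this
# equals the ratio of means.
weighted_mean_of_ratios = sum(p * r for p, r in zip(pop, rates)) / sum(pop)

print("ratio of means:         ", ratio_of_means)
print("mean of ratios:         ", mean_of_ratios)
print("weighted mean of ratios:", weighted_mean_of_ratios)
```

In this toy example the larger regions have higher rates, so the ratio of means exceeds the unweighted mean of ratios. Neither figure is wrong; they answer different questions, as the reply says.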
