Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: difference in medians . Raw vs calculated

 From Steve Samuels To statalist@hsphsun2.harvard.edu Subject Re: st: difference in medians . Raw vs calculated Date Sat, 19 Jan 2013 21:02:48 -0500

```
Richard-

Thanks for illustrating your problem with an accessible data set. Too
few posters do.  That said, nothing strange is going on here.

1. -cendif- estimates the "generalized Hodges-Lehmann median
difference": the median of the differences between all possible pairs
of observations, one drawn from each population. This is not the same
as the "difference in medians".

2. The output for -cid- clearly states that the command is computing a
difference in means, not medians.

3. Example 1 in the -help- for -qreg- discusses why the estimated
regression coefficient might not be the difference in medians.

4. Roger Newson's -bpmedian- package (SSC) estimates Bonett-Price CIs
for medians and for contrasts between them, such as a difference in
medians.

By the way, the p50 for a group is _not_ necessarily the sample median:

. tab weight if foreign

     Weight |
     (lbs.) |      Freq.     Percent        Cum.
------------+-----------------------------------
      1,760 |          1        4.55        4.55
      1,830 |          1        4.55        9.09
      1,930 |          1        4.55       13.64
      1,980 |          1        4.55       18.18
      1,990 |          1        4.55       22.73
      2,020 |          1        4.55       27.27
      2,040 |          1        4.55       31.82
      2,050 |          1        4.55       36.36
      2,070 |          1        4.55       40.91
      2,130 |          1        4.55       45.45
      2,160 |          1        4.55       50.00
      2,200 |          1        4.55       54.55
      2,240 |          1        4.55       59.09
      2,280 |          1        4.55       63.64
      2,370 |          1        4.55       68.18
      2,410 |          1        4.55       72.73
      2,650 |          1        4.55       77.27
      2,670 |          1        4.55       81.82
      2,750 |          1        4.55       86.36
      2,830 |          1        4.55       90.91
      3,170 |          1        4.55       95.45
      3,420 |          1        4.55      100.00
------------+-----------------------------------
      Total |         22      100.00

Notice n = 22, an even number of observations, so the median is not
unique. By convention, it is the midpoint between the two middle
observations, the 11th and 12th, which for these data is
(2160 + 2200)/2 = 2180. But any value between 2160 and 2200 would
satisfy the definition of a median.

Steve

Steven J. Samuels
Consultant in Statistics
18 Cantine's Island
Saugerties NY 12477 USA
Voice: 845-246-0774

> On Jan 19, 2013, at 8:00 PM, Richard Hiscock wrote:
>
> I wish to derive a 95% CI for the difference in medians and noticed that the difference in raw median values between groups didn't equal that calculated using the packages cendif (R. Newson) and cid (P. Royston). Clearly I'm missing something and would be grateful for an explanation.
>
> I suspect it relates to a transformation performed prior to calculation of the difference and subsequent back-transformation to original units.
>
> However, it is hard to present raw median values and a difference in medians (and CI) that do not agree. In my data set (plasma protein assay) the raw difference in medians is 0.5, whereas the difference calculated by cid or cendif is 0.33, making it hard to explain to readers.
>
> Illustrated using the auto data set:
>
> . sysuse auto
>
> . tabstat weight, by(foreign) stats(p50)
>
> Summary for variables: weight by categories of: foreign (Car type)
>
>  foreign |       p50
> ---------+----------
> Domestic |      3360
>  Foreign |      2180
> ---------+----------
>    Total |      3190
> --------------------
>
> *difference = 1180
>
> . cendif weight, by(foreign)
> Y-variable: weight (Weight (lbs.))
> Grouped by: foreign (Car type)
> Group numbers:
>
>    Car type |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>    Domestic |         52       70.27       70.27
>     Foreign |         22       29.73      100.00
> ------------+-----------------------------------
>       Total |         74      100.00
> Transformation: Fisher's z
> 95% confidence interval(s) for percentile difference(s)
> between values of weight in first and second groups:
>  Percent    Pctl_Dif     Minimum     Maximum
>       50        1095         750        1330
>
> . cid weight, by(foreign) unpaired
>
> Normal-based confidence interval for difference in means by foreign
>
> Variable |     Obs     Estimate    Std. Err.       [95% Conf. Interval]
> ---------+-------------------------------------------------------------
>   weight |      74     1001.206    160.2876        681.6788    1320.734
>
> . qreg weight foreign
> Iteration  1:  WLS sum of weighted deviations =  34840.693
>
> Iteration  1: sum of abs. weighted deviations =      34860
> note:  alternate solutions exist
> Iteration  2: sum of abs. weighted deviations =      34620
> note:  alternate solutions exist
> Iteration  3: sum of abs. weighted deviations =      34580
>
> Median regression                                    Number of obs =        74
>   Raw sum of deviations    48860 (about 3180)
>   Min sum of deviations    34580                     Pseudo R2     =    0.2923
>
> ------------------------------------------------------------------------------
>       weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>      foreign |      -1150   223.2969    -5.15   0.000    -1595.134   -704.8659
>        _cons |       3350   121.7526    27.51   0.000     3107.291    3592.709
> ------------------------------------------------------------------------------

```
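
Two of the points in the reply can be checked numerically. The following is a minimal Python sketch, not part of the original thread and not a description of how -cendif- is implemented. The 22 foreign-car weights are copied from the -tab- output in the reply. The two small samples `x` and `y` are hypothetical values chosen only to show that the median of pairwise differences can differ from the difference in medians, and the helper names `middle_pair`, `hodges_lehmann_diff`, and `diff_in_medians` are illustrative assumptions.

```python
from itertools import product
from statistics import median

# Foreign-car weights (lbs.), copied from the -tab weight if foreign- output.
foreign = [1760, 1830, 1930, 1980, 1990, 2020, 2040, 2050, 2070, 2130, 2160,
           2200, 2240, 2280, 2370, 2410, 2650, 2670, 2750, 2830, 3170, 3420]


def middle_pair(values):
    """Return the two middle order statistics of an even-sized sample."""
    s = sorted(values)
    n = len(s)
    if n % 2 != 0:
        raise ValueError("sample size must be even")
    return s[n // 2 - 1], s[n // 2]


def hodges_lehmann_diff(a, b):
    """Median of all pairwise differences a_i - b_j (unweighted sketch)."""
    return median([ai - bj for ai, bj in product(a, b)])


def diff_in_medians(a, b):
    """Median of a minus median of b."""
    return median(a) - median(b)


# With n = 22 the conventional median is the midpoint of the 11th and 12th values.
low, high = middle_pair(foreign)
print(low, high, median(foreign))  # 2160 2200 2180.0

# Hypothetical toy samples (not from the auto data).
x = [0, 1, 10]
y = [0, 5, 6]
print(hodges_lehmann_diff(x, y))  # 0
print(diff_in_medians(x, y))      # -4
```

In the toy samples the nine pairwise differences are -6, -5, -5, -4, 0, 1, 4, 5, 10, so their median is 0, while the two sample medians are 1 and 5, giving a difference of -4. The two summaries answer different questions, which is why the -cendif- estimate need not match the raw difference in p50 values.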