st: RE: Median calculation for interval data

 From "Nick Cox" To Subject st: RE: Median calculation for interval data Date Fri, 9 Mar 2007 11:13:56 -0000

```One specific comment is that using -preserve- is
a bit heavy-handed for a problem like this.

A more general comment is that
gearing your program closely to the immediate details of
the problem -- even on a cosmetic level like option names
such as -hholds()- -- makes it more difficult to see the
generic problem -- and to reuse this code later for some
related problem.

I would base this more on what -summarize-
will do for you in any case. As I understand it,
you have a variable and what are in effect frequency
weights. The frequency weights are in your data
the number of households in each interval.

su myvar [fw=freq], detail

leaves behind r(p50).

Then you can count, e.g.

su freq if myvar < r(p50), meanonly
local lower = r(sum)
su freq if myvar = r(p50), meanonly
local equal = r(sum)
su freq if myvar > r(p50), meanonly
local upper = r(sum)

A program might start looking like this:

program imedian, rclass
version 8
syntax varname(numeric) [fweight aweight] [if] [in] ///
, lower(varname numeric)

marksample touse
qui count if `touse'
if r(N) == 0 error 2000

quietly {
tempvar wt
tempname median fu fe fl

su `varlist' if `touse', detail
scalar `median' = r(p50)

gen double `wt' = `exp' if `touse'
su `wt' if `varlist' > r(p50) & `touse', meanonly
scalar `fu' = r(sum)
su `wt' if `varlist' == r(p50) & `touse', meanonly
scalar `fe' = r(sum)
su `wt' if `varlist' < r(p50) & `touse', meanonly
scalar `fl' = r(sum)

levelsof `lower' if `touse', local(levels)
tokenize `levels'
local i = 1
while `median' > ``i'' {
local ++i
}
local lo = ``i''
local ++i
local hi = ``i''

}

di as txt "median: " as res `median'
return scalar median = `median'
end

Nick
n.j.cox@durham.ac.uk

Chris Ruebeck

> I have written an ado file to calculate a version of the median for
> interval data as described below.  A synopsis: when there are many
> observations with the median value, we may believe there is some
> information in the distribution of observations above, below, and
> within the median value.
>
> My question for Statalist: is there an existing Stata ado
> file that I
> could have used?
>
> I would also appreciate any comments on the method that I used.

> Method: Calculate the fraction of the median interval above the
> median interval's lower bound necessary to have half of all
> observations above and half below, assuming that in the median
> interval the observations are evenly distributed.  Thus, if
> there are
> 25 observations above the median interval and 75 observations below
> it, 80 observations in it, and the median interval is [10, 15), then
> the "median" is
>
> 	10 + (15-10)*((25 + 80 + 75)/2 - 75)/80 = 10.9375.
>
> capture program drop intervalMedian
> program define intervalMedian, rclass
> 	syntax if, ///
> 		  lowlim(varname numeric) /// Lower limit of interval
> 		  hholds(varname numeric) // Number of
> households in this interval
> 	preserve
> 	marksample touse
> 	keep if `touse'
> 	keep `lowlim' `hholds'
> 	tempvar runSum /// The runing sum
> 		  markMed /// 0, -1, 2 marker for below, median, above
> 		  upper // upper limit of interval
>
> 	// Get the upper limit for each one
> 	sort `lowlim'
> 	generate `upper' = `lowlim'[_n+1]
>
> 	// Find the median interval
> 	generate `runSum' = sum(`hholds') // Final observation is total
> 	local halfObs = `runSum'[_N]/2 // The index of the median
> 	generate `markMed' = `runSum' - `halfObs' // Negative
> below median
> interval
> 	replace `markMed' = cond(`markMed'<0,0,2) // Marks at &
> above median
> interval
> 	sort `markMed' `lowlim' // Already in this order, but
> Stata doesn't
> know
> 	by `markMed': replace `markMed' = -1 if _n==1 &
> `markMed'==2 // The
> median
>
> 	// Collect values necessary for calculation (could be 1
> of 3)
> 	sort `markMed' // The median interval is now the first
> observation
> 	local countBelow = `runSum' - `hholds' // # below
> median interval
> 	local intervalBelow = `halfObs' - `countBelow' // #
> below median in
> interval
> 	local theMedian = (`intervalBelow'/`hholds')*(`upper' -
> `lowlim') +
> `lowlim'
> 	return local median `theMedian'
> 	display "The median: `theMedian'"
>
> 	restore
> end

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```