Statalist The Stata Listserver



st: Median calculation for interval data


From   Chris Ruebeck <[email protected]>
To   [email protected]
Subject   st: Median calculation for interval data
Date   Thu, 8 Mar 2007 20:34:18 -0500

I have written an ado-file that calculates a version of the median for interval (grouped) data, as described below. In brief: when many observations fall in the interval that contains the median, the counts of observations below, within, and above that interval carry some information about where in the interval the median lies.

My question for Statalist: is there an existing Stata ado file that I could have used?

I would also appreciate any comments on the method that I used.

Thanks,
Chris


Method: Assuming that observations are evenly distributed within the median interval, calculate how far above that interval's lower bound the median must lie so that half of all observations fall below it and half above. For example, if there are 75 observations below the median interval, 80 observations in it, and 25 observations above it, and the median interval is [10, 15), then the "median" is

10 + (15-10)*((25 + 80 + 75)/2 - 75)/80 = 10.9375.
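
The same calculation can be written in general form. The symbols here are introduced only for illustration and do not appear in the program: L and U are the lower and upper bounds of the median interval, B is the number of observations below it, n the number in it, A the number above it, and N the total.

```latex
\[
  m \;=\; L + (U - L)\,\frac{N/2 - B}{n},
  \qquad N = B + n + A
\]
% With the numbers from the example: L = 10, U = 15, B = 75, n = 80, A = 25
\[
  m \;=\; 10 + (15 - 10)\,\frac{180/2 - 75}{80}
    \;=\; 10 + 5 \cdot \frac{15}{80}
    \;=\; 10.9375
\]
```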


capture program drop intervalMedian
program define intervalMedian, rclass
    syntax [if] [in], ///
        lowlim(varname numeric) /// Lower limit of interval
        hholds(varname numeric) // Number of households in this interval
    preserve
    marksample touse
    keep if `touse'
    keep `lowlim' `hholds'
    tempvar runSum /// The running sum
        markMed /// Marker: 0 below, -1 median interval, 2 above
        upper // Upper limit of interval

    // Get the upper limit for each interval: the next interval's lower limit
    // (missing for the top interval, which has no upper bound in the data)
    sort `lowlim'
    generate `upper' = `lowlim'[_n+1]

    // Find the median interval
    generate double `runSum' = sum(`hholds') // Final observation is total
    local halfObs = `runSum'[_N]/2 // Half of the total count
    generate `markMed' = `runSum' - `halfObs' // Negative below median interval
    replace `markMed' = cond(`markMed'<0,0,2) // Marks at & above median interval
    sort `markMed' `lowlim' // Already in this order, but Stata doesn't know
    by `markMed': replace `markMed' = -1 if _n==1 & `markMed'==2 // The median

    // Collect values necessary for calculation (could be 1 line instead of 3)
    sort `markMed' // The median interval is now the first observation
    local countBelow = `runSum' - `hholds' // # below median interval
    local intervalBelow = `halfObs' - `countBelow' // # below median in interval
    local theMedian = (`intervalBelow'/`hholds')*(`upper' - `lowlim') + `lowlim'
    return local median `theMedian'
    display "The median: `theMedian'"

    restore
end
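
A minimal usage sketch, assuming the program above has been defined in the current session. The dataset is hypothetical and simply re-creates the worked example (75 observations below, 80 in, and 25 above the interval [10, 15)); the variable names lowlim and hholds are chosen here for illustration.

```stata
* Hypothetical grouped data: one row per interval (illustrative values only)
clear
input lowlim hholds
 0 75
10 80
15 25
end

* An -if- qualifier is supplied explicitly in this call
intervalMedian if !missing(hholds), lowlim(lowlim) hholds(hholds)

* The result is also left behind in r(median)
return list
```

With these inputs the program should reproduce the hand calculation, 10.9375. Note that the top interval has no upper limit in the data, so the result would be missing if the median fell in that interval.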