Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: Median calculation for interval data


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Median calculation for interval data
Date   Fri, 9 Mar 2007 11:13:56 -0000

One specific comment is that using -preserve- is 
a bit heavy-handed for a problem like this. 

A more general comment is that 
gearing your program closely to the immediate details of
the problem -- even on a cosmetic level like option names
such as -hholds()- -- makes it more difficult to see the 
generic problem -- and to reuse this code later for some 
related problem. 

I would base this more on what -summarize- 
will do for you in any case. As I understand it, 
you have a variable and what are in effect frequency 
weights. The frequency weights are in your data
the number of households in each interval. 

su myvar [fw=freq], detail 

leaves behind r(p50). 

Then you can count, e.g. 

su freq if myvar < r(p50), meanonly 
local lower = r(sum) 
su freq if myvar = r(p50), meanonly 
local equal = r(sum) 
su freq if myvar > r(p50), meanonly 
local upper = r(sum) 

A program might start looking like this: 

program imedian, rclass 
	version 8 
	syntax varname(numeric) [fweight aweight] [if] [in] ///
	, lower(varname numeric)

	marksample touse 
	qui count if `touse' 
	if r(N) == 0 error 2000 

	quietly {
		tempvar wt 
		tempname median fu fe fl

		su `varlist' if `touse', detail 
		scalar `median' = r(p50) 

		gen double `wt' = `exp' if `touse' 
		su `wt' if `varlist' > r(p50) & `touse', meanonly 
		scalar `fu' = r(sum)
		su `wt' if `varlist' == r(p50) & `touse', meanonly 
		scalar `fe' = r(sum)
		su `wt' if `varlist' < r(p50) & `touse', meanonly 
		scalar `fl' = r(sum) 

		levelsof `lower' if `touse', local(levels)
		tokenize `levels' 
		local i = 1 
		while `median' > ``i'' { 
			local ++i 
		} 
		local lo = ``i'' 
		local ++i 
		local hi = ``i'' 

		scalar `median' = <your formula> 
	}

	di as txt "median: " as res `median' 
	return scalar median = `median'
end 	

Nick 
n.j.cox@durham.ac.uk 

Chris Ruebeck
 
> I have written an ado file to calculate a version of the median for  
> interval data as described below.  A synopsis: when there are many  
> observations with the median value, we may believe there is some  
> information in the distribution of observations above, below, and  
> within the median value.
> 
> My question for Statalist: is there an existing Stata ado 
> file that I  
> could have used?
> 
> I would also appreciate any comments on the method that I used.
 
> Method: Calculate the fraction of the median interval above the  
> median interval's lower bound necessary to have half of all  
> observations above and half below, assuming that in the median  
> interval the observations are evenly distributed.  Thus, if 
> there are  
> 25 observations above the median interval and 75 observations below  
> it, 80 observations in it, and the median interval is [10, 15), then  
> the "median" is
> 
> 	10 + (15-10)*((25 + 80 + 75)/2 - 75)/80 = 10.9375.
> 
> capture program drop intervalMedian
> program define intervalMedian, rclass
> 	syntax if, ///
> 		  lowlim(varname numeric) /// Lower limit of interval
> 		  hholds(varname numeric) // Number of 
> households in this interval
> 	preserve
> 	marksample touse
> 	keep if `touse'
> 	keep `lowlim' `hholds'
> 	tempvar runSum /// The runing sum
> 		  markMed /// 0, -1, 2 marker for below, median, above
> 		  upper // upper limit of interval
> 
> 	// Get the upper limit for each one
> 	sort `lowlim'
> 	generate `upper' = `lowlim'[_n+1]
> 
> 	// Find the median interval
> 	generate `runSum' = sum(`hholds') // Final observation is total
> 	local halfObs = `runSum'[_N]/2 // The index of the median
> 	generate `markMed' = `runSum' - `halfObs' // Negative 
> below median  
> interval
> 	replace `markMed' = cond(`markMed'<0,0,2) // Marks at & 
> above median  
> interval
> 	sort `markMed' `lowlim' // Already in this order, but 
> Stata doesn't  
> know
> 	by `markMed': replace `markMed' = -1 if _n==1 & 
> `markMed'==2 // The  
> median
> 
> 	// Collect values necessary for calculation (could be 1 
> line instead  
> of 3)
> 	sort `markMed' // The median interval is now the first 
> observation
> 	local countBelow = `runSum' - `hholds' // # below 
> median interval
> 	local intervalBelow = `halfObs' - `countBelow' // # 
> below median in  
> interval
> 	local theMedian = (`intervalBelow'/`hholds')*(`upper' - 
> `lowlim') +  
> `lowlim'
> 	return local median `theMedian'
> 	display "The median: `theMedian'"
> 
> 	restore
> end

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index