# st: re: egen to calculate industry medians with own firm excluded

From: Kit Baum <[email protected]>
To: [email protected]
Subject: st: re: egen to calculate industry medians with own firm excluded
Date: Fri, 21 Dec 2007 11:02:10 -0500

To complete the discussion of Friedrich Huebler's query, I present here three solutions to the problem (none of which I have written). The first is the implementation of Nick Cox's suggestion from his FAQ:
http://www.stata.com/support/faqs/data/members.html
Nick graciously provided a correction to my imperfect implementation of the FAQ's suggestions, which now works properly.

The second is Friedrich's corrected code.

The third is Ben Jann's. When this problem was posed I immediately thought of jackknife computations, as that's what a jackknife does: it calculates a statistic leaving one observation out. It is a bit trickier here, naturally, but Ben shows how his functions from -moremata- that calculate medians and jackknife statistics may be used for a Mata-based solution. That might seem unnecessarily complex, but Mata's speed advantage over do-file code might make an important difference to someone trying to do something like this on a really large data set.
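
As a hand-checkable illustration of the leave-one-out median, the following sketch applies the egen-based approach to a made-up data set of seven observations. The variable names (group, x, pid, work, loomedian) and all values are hypothetical and chosen only so the results can be verified by eye.

```stata
clear
input group x
1 10
1 20
1 30
1 40
2 5
2 7
2 9
end

// number the observations within each group
bysort group: gen pid = _n
gen loomedian = .

// for each within-group position i, blank out observation i and take the
// group median of what remains; keep the result only for observation i
summarize pid, meanonly
forvalues i = 1/`r(max)' {
    egen work = median(cond(pid != `i', x, .)), by(group)
    replace loomedian = work if pid == `i'
    drop work
}
list, sepby(group)
```

Working the medians by hand, group 1 (10, 20, 30, 40) should show 30, 30, 20, 20 and group 2 (5, 7, 9) should show 8, 7, 6.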

---------
sysuse auto, clear
keep mpg rep78
drop if rep78==.
egen group = group(rep78 )
gen medianNJC = .

// per NJC FAQ, as interpreted by Nick Cox
bys group: gen pid=_n
su pid, mean
qui forvalues i = 1/`r(max)' {
    // Note that in this case it is crucial to define the entry as missing if own-entry
    egen work = median(cond(pid ~= `i', mpg, .)), by(group)
    replace medianNJC = work if pid==`i'
    drop work
}

// per Friedrich Huebler, corrected
gen medianFH = .
count
local n = r(N)
sort group
quietly forvalues i = 1/`n' {
    gen temp = mpg if `i' != _n
    by group: egen temp2 = median(temp)
    replace medianFH = temp2 in `i'
    drop temp temp2
}

// per Ben Jann, making use of Mata and his -moremata- package functions for
// median and jackknife calculations (install with: ssc install moremata)

prog drop _all
mata: mata clear

*! version 1.0.0 20dec2007 Ben Jann
prog jkmedian, byable(onecall)
    syntax varname [if] [in], Generate(name)
    qui gen `generate' = .
    if _by() local by "by `_byvars':"
    qui `by' _jkmedian `varlist' `if' `in', generate(`generate')
end
prog _jkmedian, byable(recall)
    syntax varname [if] [in], Generate(name)
    marksample touse
    mata: _jkmedian()
end
mata:
void _jkmedian()
{
    real scalar touse
    real colvector x
    struct mm_jkstats scalar jk

    touse = st_varindex(st_local("touse"))
    st_view(x, ., st_local("varlist"), touse)

    jk = mm_jk(&__jkmedian(), x, 1, 1)
    st_store(., st_local("generate"), touse, jk.rstat)
}
real scalar __jkmedian(real colvector x, real colvector w)
{
    return(mm_median(select(x, w:>0)))
}
end
bysort group: jkmedian mpg, gen(medianBJ)

// compare the three approaches

g diffCH = medianNJC-medianFH
g diffCJ = medianNJC-medianBJ
g diffHJ = medianFH-medianBJ
list mpg med* diff*, sepby(group)

--------
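
If only agreement matters, the visual comparison can be replaced by assertions. This is a minimal sketch that assumes the three variables created above (medianNJC, medianFH, medianBJ) are still in memory. -assert- stops with an error if any observation differs, and two missing values compare as equal.

```stata
// each assert exits with an error if the two variables differ anywhere
assert medianNJC == medianFH
assert medianNJC == medianBJ
```

Since equality is transitive, two assertions are enough to cover all three pairs.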

Best wishes of the season
Kit

Kit Baum, Boston College Economics and DIW Berlin
http://ideas.repec.org/e/pba1.html
An Introduction to Modern Econometrics Using Stata:
http://www.stata-press.com/books/imeus.html
