
st: re: egen to calculate industry medians with own firm excluded

From   Kit Baum <[email protected]>
To   [email protected]
Subject   st: re: egen to calculate industry medians with own firm excluded
Date   Fri, 21 Dec 2007 11:02:10 -0500

To complete the discussion of Friedrich Huebler's query, I present here three solutions to the problem (none of which I have written). The first is the implementation of Nick Cox's suggestion from his FAQ.
Nick graciously provided a correction to my imperfect implementation of the FAQ's suggestions, which made it work properly.

The second is Friedrich's corrected code.

The third is Ben Jann's. When this problem was posed I immediately thought of jackknife computations, as that is what a jackknife does: it calculates a statistic leaving one observation out. It is a bit trickier here, naturally, but Ben shows how his functions from -moremata- that calculate medians and jackknife statistics may be used for a Mata-based solution. That might seem unnecessarily complex, but Mata's speed advantage over do-file code might make an important difference to someone trying to do something like this on a really large data set.
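
To fix ideas, here is a minimal sketch of a leave-one-out median on a toy vector in plain Mata. It is only an illustration and not part of any of the three solutions: toy_median() is a hypothetical helper written for this sketch (it is not a -moremata- function), and the data are made up.

```stata
mata:
// hypothetical helper for this sketch: median of a column vector
real scalar toy_median(real colvector v)
{
    real colvector s
    real scalar    n

    s = sort(v, 1)
    n = rows(s)
    // middle value if n is odd, mean of the two middle values if n is even
    return(mod(n, 2) ? s[(n + 1)/2] : (s[n/2] + s[n/2 + 1])/2)
}

x   = (10 \ 20 \ 30 \ 40)          // made-up data
loo = J(rows(x), 1, .)
for (i = 1; i <= rows(x); i++) {
    // drop observation i, then take the median of what is left
    loo[i] = toy_median(select(x, (1::rows(x)) :!= i))
}
loo
end
```

For this toy vector the leave-one-out medians should be 30, 30, 20 and 20: dropping 10 or 20 leaves 30 as the middle value, and dropping 30 or 40 leaves 20. Each of the three solutions computes this quantity separately within each group.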

sysuse auto, clear
keep mpg rep78
drop if rep78==.
egen group = group(rep78 )
gen medianNJC = .

// per NJC FAQ, as interpreted by Nick Cox
bys group: gen pid=_n
su pid, mean
qui forvalues i = 1/`r(max)' {
    // Note that in this case it is crucial to define the entry as missing if own-entry
    egen work = median(cond(pid ~= `i', mpg, .)), by(group)
    replace medianNJC = work if pid==`i'
    drop work
}

// per Friedrich Huebler, corrected
gen medianFH = .
// _N is the number of observations; r(N) may have been overwritten by the egen calls above
local n = _N
sort group
quietly forvalues i = 1/`n' {
    gen temp = mpg if `i' != _n
    by group: egen temp2 = median(temp)
    replace medianFH = temp2 in `i'
    drop temp temp2
}

// per Ben Jann, making use of Mata and his -moremata- package functions for
// median and jackknife calculations

prog drop _all
mata: mata clear

*! version 1.0.0 20dec2007 Ben Jann
// wrapper: creates the new variable once, then runs _jkmedian for each by-group
prog jkmedian, byable(onecall)
    syntax varname [if] [in], Generate(name)
    qui gen `generate' = .
    if _by() local by "by `_byvars':"
    qui `by' _jkmedian `varlist' `if' `in', generate(`generate')
end

// worker: marks the estimation sample and hands it to Mata
prog _jkmedian, byable(recall)
    syntax varname [if] [in], Generate(name)
    marksample touse
    mata: _jkmedian()
end

mata:
void _jkmedian()
{
    real scalar touse
    real colvector x
    struct mm_jkstats scalar jk

    touse = st_varindex(st_local("touse"))
    st_view(x, ., st_local("varlist"), touse)

    jk = mm_jk(&__jkmedian(), x, 1, 1)
    st_store(., st_local("generate"), touse, jk.rstat)
}

// median of the observations that carry positive weight
real scalar __jkmedian(real colvector x, real colvector w)
{
    return(mm_median(select(x, w:>0)))
}
end

bysort group: jkmedian mpg, gen(medianBJ)

// compare the three approaches

g diffCH = medianNJC-medianFH
g diffCJ = medianNJC-medianBJ
g diffHJ = medianFH-medianBJ
list mpg med* diff*, sepby(group)


Best wishes of the season

Kit Baum, Boston College Economics and DIW Berlin
An Introduction to Modern Econometrics Using Stata:
