Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: RE: Histograms (was: Multiple (overlaid) Histogram)


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: Histograms (was: Multiple (overlaid) Histogram)
Date   Fri, 30 May 2003 20:58:16 +0100

Marcello Pagano

> As I have said before, I would very much like, with Allan Reese, an
> option for equi-probability histograms.  I think that this
> is especially
> useful when thinking of the histogram as an estimator of an
> underlying
> density function of a continuous variable. I do not see a strong
> argument, other than tradition (laziness?), for being
> constrained to
> histograms with equi-spaced bins.

An interesting problem.

I see one key question arising.

If I understand this properly, we just need to find so many quantiles
x_(i) equally spaced on the probability scale and calculate bar
heights proportional to 1 / (x_(i+1) - x_(i)). In practice, it is
not that uncommon to get ties, implying infinite bar heights. We
can fudge this by interpolating linearly between different quantiles.

What lies below is a hack. The number of bins is wired in at no
more than 20. My experiments indicate that you need
a lot of data to avoid somewhat bizarre details. Essentially
you are bound to see the kind of instability inherent in crude
numerical derivatives.

Don't point out that density estimates are nicer! I know that.
This is an attempt at what Marcello wants.

Nick
[email protected]

Example:

sysuse auto
foreach v of var price-gear {
	eqprhistogram `v'
	more
}

program eqprhistogram
*! for Marcello, eppur si muove!
*! NJC 1.0.0 30 May 2003
	version 8
	syntax varname(numeric) [if] [in] [ , bin(numlist int >0 <=20) * ]

	// #bins <= 20
	if "`bin'" == "" local bin = 20
	local binp1 = `bin' + 1
	local binm1 = `bin' - 1

	// enough data?
	marksample touse
	qui count if `touse'
	if `r(N)' < `binp1' {
		di as err "insufficient observations"
		error 2000
	}

	tempvar quantile qnum qid qadd width x height id
	qui {
		// get quantiles
		su `varlist' if `touse', meanonly
		generate `quantile' = r(min) in 1
		replace `quantile' = r(max) in `binp1'
		_pctile `varlist' if `touse', nq(`bin')
		forval i = 1/`binm1' {
			replace `quantile' = r(r`i')  in `= `i' + 1'
		}

		// prepare data set for graph: massage tied quantiles
		preserve
		keep if `quantile' < .
		bysort `quantile' : gen byte `qnum' = _N
		by `quantile' : gen byte `qid' = _n
		gen `qadd' = `quantile'[_n + `qnum' - `qid' + 1] - `quantile'
		replace `qadd' = `qadd' * (`qid' - 1) / `qnum'
		replace `quantile' = `quantile' + `qadd' if `qid' > 1

		gen `width' = `quantile'[_n+1] - `quantile'
		gen `x' = (`quantile'[_n+1] + `quantile')/2
		gen `height' = `qnum' / (`bin' * `width')

		// vertices of bars
        	expand 6
	        sort `quantile'
        	gen byte `id' = mod(_n,6)
	        replace `height' = 0  if (`id' <= 1 | `id' >= 4)
        	replace `x' = `x' - 0.5 * `width' ///
	        	if `id' == 1 | `id' == 2 | `id' == 5
	        replace `x' = `x' + 0.5 * `width' ///
	        	if `id' == 3 | `id' == 4 | `id' == 0
	}

	// graph
	_crcslbl `x' `varlist'
	label var `height' "density"
    	twoway area `height' `x', bstyle(histogram) `options'
end


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index