Daniel Waxman asked:
I have generated an empiric frequency distribution, using -contract-, such
that I have two variables-- the variable of interest (hour of the day that a
patient is discharged from the hospital), and the cumulative frequency of
from a set of empiric observations.
How can I generate a function which picks random hours in a frequency
proportional to this distribution?
--------------------------------------------------------------------------------
You can test whether a uniform random variate is at least as great as each
level of the cumulative relative frequency and increment the hour variable
if it is.
I've illustrated it below, first creating a fictional hospital-discharge
dataset to use as the starting point. Following your path, it first
uses -contract- to get the empirical cumulative percent distribution.
To create a random dataset that follows the distribution, first
use -levelsof- to put the cumulative percentage values into a macro. Then,
generate a random uniform variate for as large a dataset as you wish (the
illustration uses 100 000). To fill in the hour variable, test whether the
uniform random variate exceeds the respective cumulative percentage point
stored in the macro, and increment the hour variable whenever it does.
Joseph Coveney
clear
set memory 50M
set more off
* Creating fictional ca. 10000-patient
* hospital discharge dataset
* with peak discharge rate in midafternoon
set obs 14
generate byte hour = 7 + _n // Discharges during hour ending at . . .
generate count = floor( `=1e4'* (cos(2 * _pi * (_n) / _N - _pi) + 1 ) / _N)
graph7 count hour, xlabel(8(1)21) // No discharges after 8:00 p.m.
*
* BEGIN HERE:
* 1. Get empirical cumulative distribution function
contract hour [fweight=count], cpercent(cumulative_distribution) float
levelsof cumulative_distribution, local(cut_points)
*
* 2. Create random variate (
drop _all
set seed `=date("2006-02-17", "ymd")'
set obs `=10e5'
generate randu = uniform()
*
* 3. Wrap-up: populate with randomly drawn hours
* that follow empirical distribution function
generate byte hour = 8
foreach cut_point of local cut_points {
replace hour = hour + (randu > `cut_point' / 100)
}
graph7 hour, bin(14) xlabel(8(1)20)
tabulate hour
exit
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/