[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
wgould@stata.com (William Gould, StataCorp LP) |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Generate Random Variable Given Percentile Values |

Date |
Tue, 17 Feb 2009 13:27:44 -0600 |

Peter Graven <grave165@umn.edu> asks, > How do you create a random variable given just the percentile > statistics for a distribution? For example the 1%, 5%, 10%, 25%, 50%, > 75,%, 90%, 95%, 99% percentiles are, respectively, 795, 1834, 2172, > 2641, 3174, 3822, 4900, 5981, 8522. That is, I would like to create a > variable with this distribution in my dataset. What a great question! Stata really should have a function that will do this for you, but it doesn't. So we'll have to write our own. First, let's figure out how we are going to proceed. Forgive me for being so didactic;I'm talking to myself. We have, q .01 .05 .10 .25 .50 .75 .90 .95 .99 ------------------------------------------------------ y 795 1834 2172 2641 3174 3822 4900 4981 8522 Let's imagining pulling a value u at random from uniform(). If that value were .01, or .05, or ..., we know what our funtion should return: 795, or 1834, or ... What if we get a value between two of the predefined quantiles? Well, we could assume the values that y can take on are fixed and those values are 695, 1834, ..., but looking at those values, I bet y is continuous, so let's linearly interpolate. That leaves on more problem. What if we u=.002 or u=.995? Peter didn't tell us the range of y, but we'll need to know it. I'm going to pretend the table reads q 0 .01 .05 .10 .25 .50 .75 .90 .95 .99 1 ---------------------------------------------------------------- y 0 795 1834 2172 2641 3174 3822 4900 4981 8522 10000 So I'm asserting that y is bounded by [0, 10000). Peter can substitute more reasonable end points. Okay, I know what I need to code. We could write this as a Stata program -- an ado-file -- but the result would run slowly and be difficult to read. I'm going to use Mata. Here's my program: ------------------------------------------------ myprog.do ------------------- capture mata mata drop u_empirical() mata: real colvector u_empirical(real scalar n, real vector q, real vector y) { real scalar i, k real scalar u, a real colvector result if (length(q)!=length(y)) _error(3200) if (length(q)<2) _error(3300) result = J(n, 1, .) for (i=1; i<=n; i++) { u = runiform(1,1) if (u==0) result[i] = y[1] else { for (k=2; 1; k++) { if (q[k] > u) break } a = (u-q[k-1])/(q[k]-q[k-1]) result[i] = (1-a)*y[k-1] + a*y[k] } } return(result) } end ------------------------------------------------------------------------------ Let's try it, . do myprog (output omitted) . mata : q = ( 0, .1, .5, .1, .25, .5, .75, .90, .95, .99, 1) : y = ( 0, 795,1834,2172,2641,3174,3822,4900,5981,8522,10000) : random = u_empirical(5, q, y) 1 +---------------+ 1 | 891.0661438 | 2 | 3545.227971 | 3 | 3323.821994 | 4 | 3445.628493 | 5 | 3651.384132 | +---------------+ : end I think that's right. Actually, I'm pretty sure it's right because, behind the scenes, I included the line -u, result[i]- at the end of the outside for loop so that I could see the value of u chosen and verify that that the program calculated the correct result. Doubtlessly Peter wants these values a Stata dataset: . drop _all . set obs 50 obs was 0, now 50 . gen x = . (50 missing values generated) . set seed 20399 . mata : st_view(myx=., ., "x") : myx[.] = u_empirical(50, q, y) : end . list in 1/5 +----------+ | x | |----------| 1. | 3954.497 | 2. | 1015.753 | 3. | 7914.299 | 4. | 1564.397 | 5. | 995.6785 | +----------+ -- Bill wgould@stata.com * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**Re: st: RE: graph hbox with ylabel(angle())** - Next by Date:
**Re: st: estout a matrix** - Previous by thread:
**st: RE: Generate Random Variable Given Percentile Values** - Next by thread:
**st: How can estimate dicount rate for each household in the panel data** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |