Stata: The Stata listserver

st: Clumsy solution: skewed distributions

From   Reza C Daniels <>
Subject   st: Clumsy solution: skewed distributions
Date   Thu, 29 Sep 2005 16:20:24 +0200


I've come up with a fairly clumsy solution to the problem in the thread "Generating skewed distributions on closed intervals" that we have been discussing today.

clear
set obs 1000
gen u1=uniform()
gen b1=invibeta(4,4,uniform()) /*symmetric*/
gen b2=invibeta(4,2,uniform()) /*negative skew*/
gen b3=invibeta(2,4,uniform()) /*positive skew*/

/* Apply the following linear transformation to map each variable onto the desired interval [20,30] */

gen u2=20+(10*u1)

/* Repeat the transformation above for b1-b3 */
foreach v of varlist b1 b2 b3 {
    gen `v's=20+(10*`v')
}


A problem with this is that I get observations in the range (20,30), not [20,30]. I can't find a way to specify the range of the beta distribution explicitly with the inverse beta function.

Any suggestions for improving the code above are of course welcome.
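For a continuous distribution, the probability of drawing exactly 20 or exactly 30 is zero. Getting values in (20,30) rather than [20,30] should therefore make no practical difference. The Stata sketch below maps central beta draws linearly onto a closed interval [lo, hi] and uses -summarize, detail- to confirm the direction of skew. It assumes the central beta inverse -invibeta(a,b,p)- and -runiform()-, the newer name for -uniform()-. The seed, the locals lo and hi, and the variable names are illustrative choices, not part of the original post.

```stata
* Sketch: central beta draws rescaled onto the interval [lo, hi]
* ASSUMPTION: seed, locals and variable names are hypothetical
clear
set seed 12345
set obs 1000
local lo = 20
local hi = 30
* invibeta(a, b, p) inverts the central beta CDF, giving values in [0,1]
gen double bpos = invibeta(2, 4, runiform())   // beta(2,4): positive skew
gen double bneg = invibeta(4, 2, runiform())   // beta(4,2): negative skew
* Linear map from [0,1] onto [lo, hi]; the shape (and skew) is unchanged
gen double agepos = `lo' + (`hi' - `lo')*bpos
gen double ageneg = `lo' + (`hi' - `lo')*bneg
quietly summarize agepos, detail
scalar sk_pos = r(skewness)
quietly summarize ageneg, detail
scalar sk_neg = r(skewness)
display "skewness (positive case): " scalar(sk_pos) "   (negative case): " scalar(sk_neg)
```

Equal shape parameters give a symmetric shape, and invibeta(1,1,p) is simply the uniform. Changing the two parameters is therefore the only thing needed to move between flat, symmetric and skewed shapes on the same interval.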


Maarten Buis wrote:

There are various skewed distributions to choose from: the lognormal, chi-square, F and gamma. These all range from 0 to positive infinity.
A distribution that can be skewed, symmetric or flat depending on its parameters, and that stays within a fixed range, is the beta distribution.

-----Original Message-----
From: []On Behalf Of Reza C Daniels
Sent: Thursday 29 September 2005 13:29
Subject: Re: st: RE: Generating skewed distributions on closed intervals

Hi Maarten,

My problem is exactly one of data coarsening, as explained by Heitjan
and Rubin (JASA, 1991). The exception is that they applied this to
heights and I want to apply it to age.

I am also aware of the need to multiply impute. However, I wanted to
generate the uniform, normal and skewed distributions before imputing,
so that once I obtained the multiply imputed estimates, I would have
something to compare them with.


Maarten Buis wrote:

Hi Reza,

Will you be using this new age variable as a
dependent/explained/y-variable or as an independent/explanatory/x-variable?

If you are using age as an explained variable, you will probably end
up in survival analysis, which has good techniques for dealing with
discrete time, so I see no need to invent something new there.
See "An Introduction to Survival Analysis Using Stata" by Mario
Cleves, William W. Gould and Roberto Gutierrez, available from Stata Press.

If you will be using age as an explanatory variable, then it is good
to know that even very coarsely categorized variables often produce
good estimates. If you still want to do something about the
categorisation, then you would probably want to do some form of
multiple imputation. The way to think about it is that there is one
age distribution, which was chopped up into bits. You don't want to
use different distributions for each age band, since then you would
assume a very bumpy overall age distribution. So you would first
estimate the parameters of this age distribution. Then, if you wanted
to draw an age for a person in category 20-30, you would draw a value
from this distribution truncated between 20 and 30. You would create
multiple datasets this way, estimate the regression coefficient or
other parameter of interest in each of these datasets, and the mean of
these estimates would be your estimate controlling for the
categorisation of age. However, I repeat that this is probably more
trouble than it's worth.
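A draw from a distribution truncated to an age band can be made by inverse-CDF sampling. First draw a uniform number between F(20) and F(30), where F is the CDF of the age distribution, then invert F. The minimal Stata sketch below assumes a lognormal age distribution. Its parameters on the log scale, mu = 3.4 and sd = 0.4, are hypothetical stand-ins for the estimates from the first step. It creates five imputed variables, age1-age5, for a single band.

```stata
* Sketch: multiple draws from a lognormal age distribution truncated to [20,30]
* ASSUMPTION: mu and sd are hypothetical stand-ins for estimated parameters
clear
set seed 2005
set obs 500
local mu = 3.4
local sd = 0.4
local lo = 20
local hi = 30
* Lognormal CDF at the band limits, kept in double-precision scalars
scalar F_lo = normal((ln(`lo') - `mu')/`sd')
scalar F_hi = normal((ln(`hi') - `mu')/`sd')
forvalues m = 1/5 {
    * Uniform draw on [F_lo, F_hi), mapped back through the inverse lognormal CDF
    gen double age`m' = exp(`mu' + `sd'*invnormal(scalar(F_lo) + runiform()*(scalar(F_hi) - scalar(F_lo))))
}
summarize age1-age5
```

In an actual analysis, every person in the band would get such draws. Each completed dataset would then be analysed separately and the estimates combined, as described above.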

I'd like to be sure that this is what you want before I spend an
afternoon writing Stata code for you.


-----Original Message-----
From: [] On Behalf Of Reza C Daniels
Sent: Thursday 29 September 2005 12:34
To:
Subject: Re: st: RE: Generating skewed distributions on closed intervals

Hi Maarten,

I tried this in the following way:

set obs 100
gen z1=invnorm(uniform())
gen z2=ln(z1)   /* defined only where z1>0; negatively skewed */
gen z3=exp(z1)  /* positively skewed (lognormal) */

As I'm sure you know, this gives me the correct shape of the
distributions I'm looking for, but the incorrect range.

So, I still can't solve it.

Thanks anyway, Reza
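The direction of skew produced by these two transforms can be checked directly. The Stata sketch below computes the sample skewness of ln(z1) over positive z1 and of exp(z1). The first is negative, because the log stretches the values near zero into a long left tail. The second, the lognormal, is positive. The seed and sample size are arbitrary. -invnormal()- and -runiform()- are the newer names for -invnorm()- and -uniform()-.

```stata
* Sketch: sign of the skewness of ln(z) (z > 0) and exp(z) for standard normal z
clear
set seed 1
set obs 10000
gen double z1 = invnormal(runiform())
gen double z2 = ln(z1) if z1 > 0    // log of the positive half of the normal
gen double z3 = exp(z1)             // lognormal
quietly summarize z2, detail
scalar sk_ln = r(skewness)
quietly summarize z3, detail
scalar sk_exp = r(skewness)
display "skewness of ln(z1): " scalar(sk_ln) "   skewness of exp(z1): " scalar(sk_exp)
```

Neither transform confines the values to a closed interval. That is why the shapes look right but the range does not.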

Maarten Buis wrote:

It reminds me of an ordered probit problem: you have one unobserved
distribution, which is being carved up. Only now you also have
information about where the cuts are made. This should be solvable.
You might want to look at the lognormal instead of the normal,
though, since no one can be, or has ever been, aged -2 (even with
plastic surgery).
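Maarten's ordered-probit analogy, with known cut points, corresponds to interval regression, which Stata fits with -intreg-. The sketch below shows one way to estimate the parameters of the underlying age distribution from banded data; -intreg- is swapped in here for illustration and was not proposed in the thread. It simulates hypothetical lognormal ages (log-scale mean 3.4, standard deviation 0.4), coarsens them into 10-year bands, and fits -intreg- to the log band limits. Because ln(0) is missing in Stata, the lowest band has a missing lower limit, which -intreg- treats as left-censored.

```stata
* Sketch: estimate a lognormal age distribution from 10-year age bands
* ASSUMPTION: simulated data; the true log-scale mean 3.4 and sd 0.4 are hypothetical
clear
set seed 29092005
set obs 2000
gen double age = exp(3.4 + 0.4*invnormal(runiform()))
* Coarsen: keep only the band [lo, lo+10) that contains each age
gen double lo = 10*floor(age/10)
gen double hi = lo + 10
* Band limits on the log scale; ln(0) is missing, i.e. an open lower bound
gen double lnlo = ln(lo)
gen double lnhi = ln(hi)
* Interval regression with no covariates: _cons estimates the log-scale mean
intreg lnlo lnhi
```

The estimated constant and sigma could then serve as the parameters for truncated draws within each band.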

-----Original Message-----
From: [] On Behalf Of Nick Cox
Sent: Thursday 29 September 2005 11:09
To:
Subject: RE: st: RE: Generating skewed distributions on closed intervals

Well, I guess wildly the literature you are unaware of holds better
solutions, but that's an empty comment as I don't know what it is.
The idea that an age distribution is a bunch of little truncated
Gaussians sitting next to each other on a line sounds at best
strange to me, but as I said, I don't understand what your problem is.


Reza C Daniels wrote:

There is a literature on this problem that I am aware of. I'm
just having trouble with the code in Stata to generate my
required results.

Whatever your problem is, it is difficult to believe that there
is not a literature on it, e.g. in demography, actuarial
science, population ecology.

* *   For searches and help try: * * *

© Copyright 1996–2017 StataCorp LLC