
# Re: st: Fwd: stata question

| | |
|---|---|
| From | Nick Cox |
| To | "statalist@hsphsun2.harvard.edu" |
| Subject | Re: st: Fwd: stata question |
| Date | Thu, 30 May 2013 11:21:42 +0100 |

```Maarten gave a good guess at what is wanted here, but

1. If the distribution is really skew, fitting a normal remains a bad
idea, even when you have done your best to account for truncation.

2. Something like number of employees is clearly bounded below,
regardless of what is reported, and to that extent no normal
distribution is right in principle. Whether it in practice gives a
rough approximation that is not too lousy for the researcher's purpose
is naturally too difficult to judge by remote sensing.

Nick
njcoxstata@gmail.com

On 30 May 2013 08:58, Maarten Buis <maartenlbuis@gmail.com> wrote:
> ---Bernard Alex wrote me privately:
>> I found your contact on Statalist, read about you on your
>> website and thought you can easily help me get through
>> an exercise I need to perform.
>
> That is not the way Statalist works. Questions should not be sent to
> its individual members but to the list. There are very good reasons
> for that that are listed here:
> <http://www.stata.com/support/faqs/resources/statalist-faq/#private>
>
>> I have the size distribution of a group of firms. I know size
>> is not normally distributed; the distribution is skewed and
>> both left and right truncated (50-499 employees).
>>
>> Now, what I would like to get is the underlying data of size
>> assuming it was normally distributed (the green line in the graph).
>>
>> Accordingly I tried to get the normal distribution given the true
>> mean and SD: gen normala=invnorm(uniform())*97.27415+146.2396
>>
>> 97.27415 is the standard deviation and 146.2396 the mean.
>>
>> Question: is it possible to impose the support of the distribution?
>> I would like to have the normal distribution between the two
>> extremes 50 and 499.
>
> That would mean you want to fit a truncated normal distribution to
> your data and sample from that distribution. You can fit the
> parameters of that distribution (for fixed truncation points, in your
> case ll(50) and ul(499)) using -truncreg-. Then you can sample from
> that distribution like in the example below:
>
> *------------------ begin example ------------------
> sysuse nlsw88, clear
>
> // find the mean and standard deviation for the
> // non-truncated normal
> truncreg wage, ll(2) ul(40)
>
> tempname mu sigma alpha beta diff
> scalar `mu'    = _b[_cons]
> scalar `sigma' = [sigma]_b[_cons]
> scalar `alpha' = normal(( 2 - `mu') / `sigma')
> scalar `beta'  = normal((40 - `mu') / `sigma')
> scalar `diff'  = `beta' - `alpha'
>
> // create 19 simulated variables from this distribution:
> forvalues i = 1/19 {
>     gen sim`i' = invnormal(                  ///
>                  `alpha' + runiform()*`diff' ///
>                  )*`sigma' + `mu'
> }
>
> // compare observed distribution with simulated distribution
> local opts "sort lpattern(solid) lcolor(gs8)"
> forvalues i = 1/19 {
>     cumul sim`i', gen(c`i')
>     local graph "`graph' line c`i' sim`i', `opts' ||"
> }
> cumul wage, gen(c)
> twoway `graph' scatter c wage, msymbol(oh) ///
>        legend(order(20 "observed" 1 "simulated"))
> *------------------- end example -------------------
> (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )
```
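
The sampling line in Maarten's example (`invnormal(alpha + runiform()*diff)*sigma + mu`) can be read as an inverse-CDF draw from a truncated normal. The following is only a sketch of that standard argument, not something stated in the thread; the symbols $a$, $b$ (the truncation points, 2 and 40 in the example) and $U$ (a uniform draw on $(0,1)$) are notation introduced here.

```latex
% Truncated normal on [a, b] with untruncated parameters \mu, \sigma:
\alpha = \Phi\!\left(\frac{a-\mu}{\sigma}\right), \qquad
\beta  = \Phi\!\left(\frac{b-\mu}{\sigma}\right)

% Its CDF for a \le x \le b:
F(x) = \frac{\Phi\!\left(\frac{x-\mu}{\sigma}\right) - \alpha}{\beta - \alpha}

% Setting F(x) = U with U \sim \mathrm{Uniform}(0,1) and solving for x:
x = \mu + \sigma\,\Phi^{-1}\!\bigl(\alpha + U\,(\beta - \alpha)\bigr)
```

Because $\alpha + U(\beta - \alpha)$ always lies between $\alpha$ and $\beta$, every simulated value falls between $a$ and $b$, which is the sense in which the support is imposed. It does not address Nick's point that a normal may be a poor fit for skewed data in the first place.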