Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Nick Cox <njcoxstata@gmail.com>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: Re: st: Fwd: stata question

Date: Thu, 30 May 2013 11:21:42 +0100

Maarten gave a good guess at what is wanted here, but

1. If the distribution is really skew, fitting a normal remains a bad
idea, even when you have done your best to account for truncation.

2. Something like number of employees is clearly bounded below,
regardless of what is reported, and to that extent no normal
distribution is right in principle. Whether it in practice gives a
rough approximation that is not too lousy for the researcher's purpose
is naturally too difficult to judge by remote sensing.

Nick
njcoxstata@gmail.com

On 30 May 2013 08:58, Maarten Buis <maartenlbuis@gmail.com> wrote:
> ---Bernard Alex wrote me privately:
>> I found your contact on Statalist, read about you on your
>> website and thought you can easily help me get through
>> an exercise I need to perform.
>
> That is not the way Statalist works. Questions should not be sent to
> its individual members but to the list. There are very good reasons
> for that, which are listed here:
> <http://www.stata.com/support/faqs/resources/statalist-faq/#private>
>
>> I have the size distribution of a group of firms. I know size
>> is not normally distributed; the distribution is skewed and
>> both left and right truncated (50-499 employees).
>>
>> Now, what I would like to get is the underlying data of size
>> assuming it was normally distributed (the green line in the graph).
>>
>> Accordingly I tried to get the normal distribution given the true
>> mean and SD: gen normala=invnorm(uniform())*97.27415+146.2396
>>
>> 97.27415 is the standard deviation and 146.2396 the mean.
>>
>> Question: is it possible to impose the support of the distribution?
>> I would like to have the normal distribution between the two
>> extremes 50 and 499.
>
> That would mean you want to fit a truncated normal distribution to
> your data and sample from that distribution. You can fit the
> parameters of that distribution (for fixed truncation points, in your
> case ll(50) and ul(499)) using -truncreg-. Then you can sample from
> that distribution like in the example below:
>
> *------------------ begin example ------------------
> sysuse nlsw88, clear
>
> // find the mean and standard deviation for the
> // non-truncated normal
> truncreg wage, ll(2) ul(40)
>
> tempname mu sigma alpha beta diff
> scalar `mu' = _b[_cons]
> scalar `sigma' = [sigma]_b[_cons]
> scalar `alpha' = normal(( 2 - `mu') / `sigma')
> scalar `beta' = normal((40 - `mu') / `sigma')
> scalar `diff' = `beta' - `alpha'
>
> // create 19 simulated variables from this distribution:
> forvalues i = 1/19 {
>     gen sim`i' = invnormal( ///
>         `alpha' + runiform()*`diff' ///
>     )*`sigma' + `mu'
> }
>
> // compare observed distribution with simulated distribution
> local opts "sort lpattern(solid) lcolor(gs8)"
> forvalues i = 1/19 {
>     cumul sim`i', gen(c`i')
>     local graph "`graph' line c`i' sim`i', `opts' ||"
> }
> cumul wage, gen(c)
> twoway `graph' scatter c wage, msymbol(oh) ///
>     legend(order(20 "observed" 1 "simulated"))
> *------------------- end example -------------------
> (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
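
The `gen sim` line in Maarten's example appears to be inverse-CDF sampling from a truncated normal. A short derivation may make that step easier to follow. The symbols \mu, \sigma, \alpha and \beta correspond to the scalars of the same names in the example. The truncation points a and b, the uniform draw U and the simulated value X are notation introduced here for illustration only (a = 2, b = 40 in the example; a = 50, b = 499 in the firm-size question).

```latex
% Truncated normal on [a, b] with underlying parameters \mu and \sigma,
% where \Phi is the standard normal CDF.
\begin{align*}
\alpha &= \Phi\!\left(\frac{a - \mu}{\sigma}\right), \qquad
\beta   = \Phi\!\left(\frac{b - \mu}{\sigma}\right), \\[4pt]
F(x) &= \frac{\Phi\!\left(\frac{x - \mu}{\sigma}\right) - \alpha}{\beta - \alpha},
\qquad a \le x \le b, \\[4pt]
% Set F(X) = U with U ~ Uniform(0, 1) and solve for X:
\Phi\!\left(\frac{X - \mu}{\sigma}\right) &= \alpha + U\,(\beta - \alpha), \\[4pt]
X &= \mu + \sigma\,\Phi^{-1}\!\bigl(\alpha + U\,(\beta - \alpha)\bigr).
\end{align*}
```

Because \alpha + U(\beta - \alpha) always lies between \alpha and \beta, every simulated X falls between a and b. Note that \mu and \sigma here are the parameters of the underlying non-truncated normal, as estimated by -truncreg-, and not the mean and standard deviation of the truncated sample.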

**References**: **st: Fwd: stata question** *From:* Maarten Buis <maartenlbuis@gmail.com>
