Re: st: Trying to simulate sampling distribution of mean

From   Nick Cox
Subject   Re: st: Trying to simulate sampling distribution of mean
Date   Tue, 29 Jan 2013 23:51:32 +0000

Your program -ybar- does exactly the same thing every time, so
inevitably the results are the same. If you look again at the help for
-simulate- you will see that the example program -lnsim- includes its
own random variate generation. Conversely, you do use -sample 0.1- but
you use it outside your program.

Put another way, -simulate- does not itself do any stochastic simulation;
it is just a framework that runs a program you write and collates its
results, and that program must do the simulation.

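As a minimal sketch of that point, here is one way -ybar- could be rewritten so that every call draws its own random sample. It assumes big.dta holds the variable age, borrows the shuffle idea (sort on a fresh random variable, then summarize the first 100 observations), and uses an arbitrary seed; treat it as an illustration, not the only way to do this.

```stata
capture program drop ybar
program define ybar, rclass
        // draw fresh random numbers on every call, so each call differs
        tempvar u
        generate double `u' = runiform()
        sort `u'
        // after the random sort, the first 100 observations are a random sample
        summarize age in 1/100, meanonly
        return scalar m_y = r(mean)
end

use big.dta, clear
set seed 2803
simulate m_age = r(m_y), reps(5) nodots saving(meandata, replace): ybar
```

When -simulate- finishes, the data in memory are the 5 values of m_age, and the same values are saved in meandata.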
In your case, there is an easy way of getting random samples from your
dataset. Just chop the dataset into blocks randomly and summarize each
block.

If you shuffle your data

set seed 2803
gen random = runiform()
sort random

and create blocks of size 100

gen block = ceil(_n/100)

then compute the mean of each block and list one mean per block

egen mean = mean(age), by(block)
egen tag = tag(block)
l mean if tag

that will give you 1000 means, one for each block of size 100. For some
reason, it seems that you want only 5, so you can throw the other
995 away.
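If only 5 means are wanted in a dataset called meandata, the block approach can be carried through as in the sketch below. It assumes big.dta contains age, keeps the first 5 blocks (any 5 would do, since the blocks were formed at random), and names the result m_age to match your code.

```stata
use big.dta, clear
set seed 2803
gen random = runiform()
sort random
gen block = ceil(_n/100)
egen mean = mean(age), by(block)
egen tag = tag(block)
// keep one observation for each of the first 5 blocks
keep if tag & block <= 5
keep mean
rename mean m_age
save meandata, replace
```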


On Tue, Jan 29, 2013 at 11:15 PM, krishanu karmakar wrote:

> The following is my code
> ==== code start =====
> program define ybar, rclass
>         syntax [,]
>         replace y1 = y2
>         summarize y1
>         return scalar m_y = r(mean)
> end
> local reps 5
>         quietly use big.dta, clear
>         generate y2 = age
>         sample 0.1
>         quietly{
>         gen y1=.
>         simulate m_age=r(m_y), saving(meandata, replace) nodots reps(`reps'): ybar
> }
> ==== code ends =====
> What I am trying to do.
> I have a dataset named "big.dta" with 100,000 observations. The only
> variable in this dataset is "age".
> I want to first draw a sample of size 100 from this dataset and
> calculate the mean for the variable "age". I want to draw 5 such
> samples and store the mean of "age" from each sample as the variable
> "m_age" in a new dataset called "meandata". So this dataset will have
> 5 observations.
> My code is running, but wrongly. I am getting stata to save the
> "meandata", but all the five observations (mean of age from 5
> different samples) are stored as equal in value. That means stata is
> not drawing 5 different samples, but only one sample. Could anyone
> help by showing which line my code should I change?