Re: st: Trying to simulate sampling distribution of mean

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Trying to simulate sampling distribution of mean
Date	Tue, 29 Jan 2013 23:51:32 +0000

Your program -ybar- does exactly the same thing every time, so
inevitably the results are the same. If you look again at the help for
-simulate- you will see that the example program -lnsim- includes its
own random variate generation. Conversely, you do use -sample 0.1- but
you use it outside your program.

Otherwise put, -simulate- does not actually do stochastic simulation;
it is just a framework that runs a program you write and collates its
results -- and that program must do the simulation.

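As a minimal sketch of that point (not tested against your data), the sampling can be moved inside the program so that every replication draws afresh. The program name -ybar2- is hypothetical; big.dta, age, m_age and meandata are taken from your post, and big.dta is assumed to be in the working directory.

```stata
* Sketch only: the random draw happens inside the program,
* so each call made by -simulate- sees a different sample.
capture program drop ybar2
program define ybar2, rclass
        * reload the full dataset on every replication
        use big.dta, clear
        * keep 100 randomly chosen observations
        sample 100, count
        * compute the mean without printing output
        summarize age, meanonly
        return scalar m_y = r(mean)
end

set seed 2803
simulate m_age = r(m_y), reps(5) nodots saving(meandata, replace): ybar2
```

Reloading the file on each replication is slow for many replications, but it keeps the program self-contained. Setting the seed once, before -simulate-, makes the run reproducible.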
In your case, there is an easy way of getting random samples from your
dataset. Just chop the dataset into blocks randomly and summarize each
block.

If you shuffle your data

set seed 2803
gen random = runiform()
sort random

and create blocks of size 100

gen block = ceil(_n/100)

then

egen mean = mean(age), by(block)
egen tag = tag(block)
l mean if tag

that will give you 1000 means, one for each block of size 100. For some
reason, it seems that you only want 5, and that means you can throw
995 away.
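If you want those 5 means saved as m_age in meandata, as in your post, one possible way to finish the block approach is sketched here. It assumes the full big.dta (100,000 observations) is loaded rather than the output of -sample 0.1-, and it uses -collapse- in place of the -egen- steps above.

```stata
* Sketch only: shuffle, cut into blocks of 100, keep 5 blocks,
* and reduce to one mean per block.
use big.dta, clear
set seed 2803
gen random = runiform()
sort random
gen block = ceil(_n/100)
keep if block <= 5
collapse (mean) m_age = age, by(block)
save meandata, replace
```

Because the blocks do not overlap, this samples without replacement across the 5 samples, which differs slightly from drawing each sample of 100 independently from the full dataset.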

Nick

On Tue, Jan 29, 2013 at 11:15 PM, krishanu karmakar
<[email protected]> wrote:

> The following is my code
>
> ==== code start =====
>
> program define ybar, rclass
>         syntax [,]
>         replace y1 = y2
>         summarize y1
>         return scalar m_y = r(mean)
> end
>
>
> local reps 5
>
>         quietly use big.dta, clear
>         generate y2 = age
>         sample 0.1
>
>         quietly{
>         gen y1=.
>         simulate m_age=r(m_y), saving(meandata, replace) nodots reps(`reps'): ybar
> }
>
> ==== code ends =====
>
> What I am trying to do.
> I have a dataset named "big.dta" with 100,000 observations. The only
> variable in this dataset is "age".
>
> I want to first draw a sample of size 100 from this dataset and
> calculate the mean for the variable "age". I want to draw 5 such
> samples and store the mean of "age" from each sample as the variable
> "m_age" in a new dataset called "meandata". So this dataset will have
> 5 observations.
>
> My code is running, but wrongly. I am getting stata to save the
> "meandata", but all the five observations (mean of age from 5
> different samples) are stored as equal in value. That means stata is
> not drawing 5 different samples, but only one sample. Could anyone
> help by showing which line my code should I change?